Improve Performance on Android with Kernel Task Scheduler Tweaking [Part 2]

Welcome back to the tweaking guide for another kernel Task Scheduler for Android. In the previous article, we discussed some basic Task Scheduler configuration options, which can be manually set to improve performance on Android devices. If you haven't read the article, we strongly recommend doing it before continuing, because this includes some useful information to better understand how the Task Scheduler really works.

Apart from the basic parameters discussed in the previous article, the Linux Task Scheduler implements several special features that try to improve the distribution of fair processing power. These features do not produce the same results on all systems and use-cases, so they can be activated or deactivated at runtime. They are available through kernel debugfs. Not all kernel packages support the debugfs scheduler tweaking feature. You can learn if your kernel supports changing these features by entering the following command in the Terminal Application (as a super user):

Boost Performance on Android by Tweaking the Kernel’s Task Scheduler [Part 2]

cat /sys/kernel/debug/sched_features

If you get an output like the following, then your kernel supports changing debug features at runtime:

Boost Performance on Android by Tweaking the Kernel’s Task Scheduler [Part 2]

If you get an output error "Unlike files or directories", you can try to mount the debugfs kernel, if available but not mounted at boot:

mount -t debugfs none /sys/kernel/debug
If this command does not produce an error, you can try passing the first command again.

Toggling A Scheduler Debug Feature
To enable a scheduler debug feature, you just need to write its name inside the sched_features file. For example:

echo GENTLE_FAIR_SLEEPERS > /sys/kernel/debug/sched_features
To disable a feature, you must also write its name inside the same file, but with a “NO_” in front. For example:

echo NO_GENTLE_FAIR_SLEEPERS > /sys/kernel/debug/sched_features
Most Common Scheduler Debug Features
It is not guaranteed that a specific kernel will support all scheduler features discussed below. Support for features varies between kernel versions and between device implementations. All features supported are those contained in the output of command ‘cat /sys/kernel/debug/sched_features’. Features that start with a “NO_” are those supported but currently disabled.


This feature tries to reduce run time of sleepers (processes that tend to sleep for a long time). As a result, more runtime is distributed to active tasks. Some users report that disabling this feature might improve responsiveness on low-end devices.


Put a task that wakes up on the same CPU as the task that woke it. This assumes that the new task will work on the same data in memory as the previous one. Scheduling it on the same CPU will improve cache locality(read below for more information about cache locality).


We need some deeper Task Scheduling knowledge to understand this feature: On Linux, there are different task scheduling priorities. A task’s scheduling priority defines how important that task is and how much resources will get from the scheduler. Most tasks run using SCHED_OTHER (SCHED_NORMAL on newer kernels) priority. Other priorities are SCHED_BATCH, SCHED_RR, SCHED_FIFO and SCHED_DEADLINE. Last three of these priorities are real-time priorities, designed for tasks that need to run as fast as possible and as soon as they are created.

Task Scheduler assigns each CPU two special parameters: rt_runtime and rt_period. Rt_period is the scheduling period that is equivalent to 100% CPU bandwidth. Rt_runtime is the time in which each CPU runs only real-time tasks. All other tasks run for [rt_period – rt_runtime] time. Typically, the kernel Task Scheduler gives 95% of a CPU’s time to real-time tasks and 5% to all others.

RT_RUNTIME_SHARE allows a CPU to run a real-time task up to 100% of time, by borrowing rt_runtime from other CPUs. CPUs that lend rt_runtime can then run non-realtime tasks for more time. Some custom kernel developers disable this feature by default, as it can render a CPU unable to service non-realtime tasks strictly scheduled on it and decrease performance.


Generally on Linux, when a task wakes up, it preempts (=takes the place of) the task that was running before on the CPU (wakeup preemption). NEXT_BUDDY handles the case when the task that wakes up does not cause preemption. When NEXT_BUDDY is enabled, the task that just woke up will run on the next scheduling event. This feature also improves cache locality.


When wakeup preemption succeeds, prefer to schedule next the task that ran just before preemption. LAST_BUDDY improves cache locality.


This handles wakeup preemption discussed above. Most kernels enable this behavior by default. Disabling it might help on heavy loaded systems.

Cache Memory and Cache Locality

The key point in modern CPU designs is minimizing latencies between fetching data from memory devices and processing them. RAM memory, even in its latest generations, is quite slow to operate. It usually requires hundreds of CPU cycles to get some data from an area inside RAM storage.

To overcome this situation, modern CPUs come packed with some special type of memory, known as cache memory. There are usually several cache memory devices connected in series, named L1, L2, L3 etc. Letter “L” stands for level, ie L1 cache is the one closest to the processor. The higher the level of cache, the slower it is to operate. On the other hand, higher level caches are less expensive to manufacture. However, all cache memories are a lot more expensive than typical RAM memory. That is why most CPUs only pack small amounts of cache memory (some MBs).

Typically, when a CPU needs to get some data from memory, it will first look in cache memories in increasing order. Every time it jumps to a higher level cache memory during the fetching process, latency increases. If data is not found inside the cache memory, CPU relies on RAM storage, which largely increases processing work.

Android and Cache Locality

Android devices that ship with a number of the same type of processors in one cluster (eg. a Qualcomm msm8226 device), usually have a shared L2 cache. Each processor has its own L1 cache. These devices are easier to manage by the kernel Task Scheduler. In contrast, devices that implement the big.Little architecture (Asymmetric Multi-Processing, AMP) are more complex. They pack two clusters of processors, each one featuring its own L2 cache. The Linux Task Scheduler includes support for these types of devices, but custom manufacturer implementations need code reworks for them to operate efficiently.

Final Thoughts

All features discussed above are quite complex to understand but can have different performance impact on different devices. For example, low-end devices might show better performance with GENTLE_FAIR_SLEEPERS and RT_RUNTIME_SHARE disabled. Enabling WAKEUP_PREEMPTION might also increase throughput on these devices.

Things are different on big.Little platforms, which feature more than one CPU clusters. Scheduling a child process or thread from a big core on a little core, or vice-versa introduces cache-memory copying latency. You can minimize this effect by enabling features that improve cache locality, AFFINE_WAKEUPS, and LAST_BUDDY.

Also


Delivered by FeedBurner