From: "Michael Kerrisk (man-pages)"
To: Mike Galbraith
Cc: mtk.manpages@gmail.com, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, linux-man, lkml
Subject: RFC [v2]: documenting autogroup, group scheduling, and interactions with nice
Date: Tue, 29 Nov 2016 18:04:14 +0100

Hello Mike and others,

This is my second version of an attempt to document the autogroup feature that you added in 2.6.38. As well as reworking and extending the autogroup text, in this round I've added text describing group scheduling. I've also noted the changes (somewhat surprising for users) that implicit autogrouping brought about for the operation of the nice(1) command.

Could you please take a look, and let me know if anything needs fixing.

Cheers,

Michael

For the sched(7) man page:

The autogroup feature

Since Linux 2.6.38, the kernel provides a feature known as autogrouping to improve interactive desktop performance in the face of multiprocess, CPU-intensive workloads. An example is building the Linux kernel with large numbers of parallel build processes (i.e., the make(1) -j flag).

This feature operates in conjunction with the CFS scheduler and requires a kernel that is configured with CONFIG_SCHED_AUTOGROUP.
On a running system, this feature is enabled or disabled via the file /proc/sys/kernel/sched_autogroup_enabled. A value of 0 disables the feature, while a value of 1 enables it. The default value in this file is 1, unless the kernel was booted with the noautogroup parameter.

A new autogroup is created when a new session is created via setsid(2); this happens, for example, when a new terminal window is started. A new process created by fork(2) inherits its parent's autogroup membership. Thus, all of the processes in a session are members of the same autogroup. An autogroup is automatically destroyed when the last process in the group terminates.

When autogrouping is enabled, all of the members of an autogroup are placed in the same kernel scheduler "task group". The CFS scheduler employs an algorithm that equalizes the distribution of CPU cycles across task groups. The following example shows how this benefits interactive desktop performance.

Suppose that there are two autogroups competing for the same CPU. (That is, presume either a single-CPU system or the use of taskset(1) to confine all the processes to the same CPU on an SMP system.) The first group contains ten CPU-bound processes from a kernel build started with make -j10. The other contains a single CPU-bound process: a video player. The effect of autogrouping is that the two groups will each receive half of the CPU cycles. That is, the video player will receive 50% of the CPU cycles. Without autogrouping it would receive just 9% of the cycles (one of 11 equally weighted CPU-bound processes), which would likely lead to degraded video playback.

The situation on an SMP system is more complex, but the general effect is the same. The scheduler distributes CPU cycles across task groups such that an autogroup that contains a large number of CPU-bound processes does not end up hogging CPU cycles at the expense of the other jobs on the system.
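The arithmetic in the make -j10 versus video player example can be checked with a short model. This is a minimal sketch under stated assumptions: an idealized single CPU, equal nice values, processes that are always runnable, and perfectly proportional sharing. It reproduces only the proportions described above, not the kernel's CFS implementation, and the function names are hypothetical.

```python
from fractions import Fraction


def flat_shares(group_sizes):
    """Per-process CPU share for each group when every CPU-bound process
    competes directly with all the others (no group scheduling): 1/N each."""
    total = sum(group_sizes)
    return [Fraction(1, total) for _ in group_sizes]


def grouped_shares(group_sizes):
    """Per-process CPU share for each group when CPU time is first split
    equally among task groups, then equally among the processes in a group."""
    per_group = Fraction(1, len(group_sizes))
    return [per_group / n for n in group_sizes]


if __name__ == "__main__":
    # Group 0: ten processes from "make -j10"; group 1: one video player.
    groups = [10, 1]
    print("without autogroup:", [f"{float(s):.1%}" for s in flat_shares(groups)])
    print("with autogroup:", [f"{float(s):.1%}" for s in grouped_shares(groups)])
```

Running it prints 9.1% for every process without autogrouping. With autogrouping, each make process gets 5.0% and the video player gets 50.0%. Fraction keeps the shares exact, which makes the 1/11 versus 1/2 comparison easy to see.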
A process's autogroup (task group) membership can be viewed via the file /proc/[pid]/autogroup:

    $ cat /proc/1/autogroup
    /autogroup-1 nice 0

This file can also be used to modify the CPU bandwidth allocated to an autogroup. To do this, write a number in the "nice" range to the file, which sets the autogroup's nice value. The allowed range is from +19 (low priority) to -20 (high priority). (Writing values outside of this range causes write(2) to fail with the error EINVAL.)

The autogroup nice setting has the same meaning as the process nice value. However, it applies to the distribution of CPU cycles to the autogroup as a whole, based on the relative nice values of other autogroups. The CPU cycles that a process inside an autogroup receives are a product of two factors: the autogroup's nice value (compared to other autogroups) and the process's nice value (compared to other processes in the same autogroup).

The use of the cgroups(7) CPU controller to place processes in cgroups other than the root CPU cgroup overrides the effect of autogrouping.

The autogroup feature groups only processes scheduled under non-real-time policies (SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE). It does not group processes scheduled under real-time and deadline policies. Those processes are scheduled according to the rules described earlier.

The nice value and group scheduling

The CFS scheduler schedules non-real-time processes (i.e., those scheduled under the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies) using a technique known as "group scheduling". This applies if the kernel was configured with the CONFIG_FAIR_GROUP_SCHED option (which is typical).

Under group scheduling, threads are scheduled in "task groups". Task groups have a hierarchical relationship, rooted under the initial task group on the system, known as the "root task group". Task groups are formed in the following circumstances:

* All of the threads in a CPU cgroup form a task group. The parent of this task group is the task group of the corresponding parent cgroup.

* If autogrouping is enabled, then all of the threads that are (implicitly) placed in an autogroup (i.e., the same session, as created by setsid(2)) form a task group. Each new autogroup is thus a separate task group. The root task group is the parent of all such autogroups.

* If autogrouping is enabled, then the root task group consists of all processes in the root CPU cgroup that were not otherwise implicitly placed into a new autogroup.

* If autogrouping is disabled, then the root task group consists of all processes in the root CPU cgroup.

* If group scheduling was disabled (i.e., the kernel was configured without CONFIG_FAIR_GROUP_SCHED), then all of the processes on the system are notionally placed in a single task group.

Under group scheduling, a thread's nice value affects scheduling decisions only relative to other threads in the same task group. This has some surprising consequences in terms of the traditional semantics of the nice value on UNIX systems. In particular, autogrouping is enabled by default. In that case, employing setpriority(2) or nice(1) on a process affects scheduling only relative to other processes executed in the same session (typically: the same terminal window).

Conversely, consider two processes that are (for example) the sole CPU-bound processes in different sessions (e.g., different terminal windows, whose jobs are each tied to a different autogroup). Modifying the nice value of the process in one of the sessions has no effect on the scheduler's decisions relative to the process in the other session.

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/