Hi Peter,
In 8a99b683 ("sched: Move SCHED_DEBUG sysctl to debugfs"), you moved
sched_min_granularity_ns to debugfs, citing that it is debug-only (true)
and undocumented (it is documented in sched-design-CFS.rst, under
the old name).
This breaks my application, Scylla[1]. We use sched_min_granularity_ns
to reduce the chances that a high networking backlog will starve the
application thread. It is a thread-per-core design, so we won't find another
core for the application: they are all busy (and besides, the application
threads are pinned).
In addition to sched_min_granularity_ns, we also tune a few other
sysctls:
# Prevent auto-scaling from doing anything to our tunables
kernel.sched_tunable_scaling = 0
# Preempt sooner
kernel.sched_min_granularity_ns = 500000
# Don't delay unrelated workloads
kernel.sched_wakeup_granularity_ns = 450000
# Schedule all tasks in this period
kernel.sched_latency_ns = 1000000
# autogroup seems to prevent sched_latency_ns from being respected
kernel.sched_autogroup_enabled = 0
# Disable numa balancing
kernel.numa_balancing = 0
While we can adapt to the move, I would much prefer it if the old location
were restored. I think it even makes sense to make this a non-debug tunable;
it helps the application to be more responsive without using the realtime
class, which is its own can of worms (and will likely result in reduced
throughput).
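For illustration, adapting to the move would look roughly like the sketch
below: try the old sysctl location first, then fall back to debugfs. This is
only a sketch under assumptions. The debugfs directory
(/sys/kernel/debug/sched) and the file names without the "sched_" prefix are
what I believe the commit creates, and the helper names tunable_path and
apply_tunables are made up for this example. It covers only the four
scheduler tunables; autogroup and numa_balancing are left out.

```python
import os

# Assumed locations: the legacy sysctl directory, and the debugfs directory
# that the tunables were moved to. Both mappings are assumptions.
PROC_DIR = "/proc/sys/kernel"
DEBUGFS_DIR = "/sys/kernel/debug/sched"

# Values taken from the sysctl settings listed above.
TUNABLES = {
    "tunable_scaling": 0,
    "min_granularity_ns": 500000,
    "wakeup_granularity_ns": 450000,
    "latency_ns": 1000000,
}


def tunable_path(name, proc_dir=PROC_DIR, debugfs_dir=DEBUGFS_DIR):
    """Return the first existing path for a tunable, or None if absent."""
    candidates = [
        os.path.join(proc_dir, "sched_" + name),  # old sysctl location
        os.path.join(debugfs_dir, name),          # assumed debugfs location
    ]
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


def apply_tunables(tunables=TUNABLES, proc_dir=PROC_DIR,
                   debugfs_dir=DEBUGFS_DIR):
    """Write each value where it is found; return the names not found."""
    missing = []
    for name, value in tunables.items():
        path = tunable_path(name, proc_dir, debugfs_dir)
        if path is None:
            missing.append(name)
            continue
        with open(path, "w") as f:
            f.write(str(value))
    return missing
```

Calling apply_tunables() as root would apply the values and return the
tunables it could not find. Note that the fallback only works when debugfs is
mounted and the kernel is built with SCHED_DEBUG, which is part of why I
would rather not depend on it.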
[1] https://github.com/scylladb/scylla