Hi all,
Just wondering if what I'm seeing is expected. I'm using the CentOS 7
RT kernel with boot args of "skew_tick=1 irqaffinity=0 rcu_nocbs=1-27
nohz_full=1-27" among others.
Normally, when I run cyclictest, it sets /dev/cpu_dma_latency to zero,
which gives a worst-case latency of around 6usec.
If I instead set /dev/cpu_dma_latency to something large and then set
/sys/devices/system/cpu/cpu${num}/power/pm_qos_resume_latency_us to "2"
for the CPUs that cyclictest is running on, the worst-case latency
jumps to around 16usec.
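
For anyone wanting to reproduce this, here is a minimal Python sketch of
the two settings described above. It assumes the documented PM QoS
interfaces: /dev/cpu_dma_latency takes a 32-bit value and the request
lasts only while the file descriptor stays open, and the per-CPU sysfs
file takes a decimal string. The function names and the sysfs_root/dev
parameters are made up for illustration, and the "large" value is left
as a parameter since the exact number is not important here.

```python
import os
import struct
from pathlib import Path


def encode_dma_latency(usec):
    """Pack a latency request as the native 32-bit signed integer
    that /dev/cpu_dma_latency accepts."""
    return struct.pack("=i", usec)


def hold_dma_latency(usec, dev="/dev/cpu_dma_latency"):
    """Register a global CPU latency request.

    Returns the open file descriptor. The request is dropped as soon
    as the descriptor is closed, so the caller must keep it open for
    the duration of the test.
    """
    fd = os.open(dev, os.O_WRONLY)
    os.write(fd, encode_dma_latency(usec))
    return fd


def resume_latency_path(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Path of the per-CPU resume latency file for one CPU."""
    return Path(sysfs_root) / f"cpu{cpu}" / "power" / "pm_qos_resume_latency_us"


def set_resume_latency(cpus, usec, sysfs_root="/sys/devices/system/cpu"):
    """Write the same resume latency limit (in usec) for each listed CPU."""
    for cpu in cpus:
        resume_latency_path(cpu, sysfs_root).write_text(f"{usec}\n")
```

Run as root, something like hold_dma_latency(some_large_value) followed
by set_resume_latency(cyclictest_cpus, 2) should correspond to the
configuration that shows the higher latency, where some_large_value and
cyclictest_cpus are placeholders.
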
If I set pm_qos_resume_latency_us to "2" for all CPUs on the system,
the worst-case latency comes back down. Setting it only for the CPUs on
the same socket as cyclictest is not sufficient.
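
To double-check which CPUs actually carry the constraint in each of
these cases, a small read-back sketch like the following may help. It
assumes the standard /sys/devices/system/cpu/present file with the usual
kernel CPU-list syntax (for example "0-27"); the function names and the
sysfs_root parameter are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0,2-4" into [0, 2, 3, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_resume_latencies(sysfs_root="/sys/devices/system/cpu"):
    """Map each present CPU to its pm_qos_resume_latency_us setting.

    The value is returned as the raw string from sysfs, or None if
    the file does not exist for that CPU.
    """
    root = Path(sysfs_root)
    result = {}
    for cpu in parse_cpu_list((root / "present").read_text()):
        path = root / f"cpu{cpu}" / "power" / "pm_qos_resume_latency_us"
        result[cpu] = path.read_text().strip() if path.exists() else None
    return result
```

Printing read_resume_latencies() before each cyclictest run makes it
easy to confirm whether the limit is set on a subset of CPUs, one
socket, or the whole system.
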
Setting cpuset.sched_load_balance to zero for the cpuset containing
cyclictest does not seem to make any difference to the worst-case
latency.
(All cpusets but one have cpuset.sched_load_balance set to zero, and
that one doesn't include the CPUs that cyclictest runs on.)
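
The cpuset layout described above can be verified with a sketch along
these lines. It assumes a cgroup v1 cpuset hierarchy, and the mount
point /sys/fs/cgroup/cpuset is a guess at the usual location, so adjust
cpuset_root as needed. The function name is hypothetical.

```python
import os


def load_balance_report(cpuset_root="/sys/fs/cgroup/cpuset"):
    """Walk a cgroup v1 cpuset hierarchy and report, for each cpuset,
    whether load balancing is enabled and which CPUs it contains.

    Returns {relative_cpuset_path: (load_balance_enabled, cpus_string)}.
    The root cpuset appears under the key ".".
    """
    report = {}
    for dirpath, _dirs, files in os.walk(cpuset_root):
        if "cpuset.sched_load_balance" not in files:
            continue
        with open(os.path.join(dirpath, "cpuset.sched_load_balance")) as f:
            enabled = f.read().strip() == "1"
        with open(os.path.join(dirpath, "cpuset.cpus")) as f:
            cpus = f.read().strip()
        report[os.path.relpath(dirpath, cpuset_root)] = (enabled, cpus)
    return report
```

If the setup is as intended, exactly one entry should show load
balancing enabled, and its CPU list should not overlap the CPUs that
cyclictest runs on.
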
Looking at the latency traces, there does not appear to be any single
culprit. I've seen cases where it appears to take extra time in
migrate_task_rq_fair(), tick_do_update_jiffies64(), rcu_irq_enter(), and
enqueue_entity().
I'm trying to dynamically isolate CPUs from the system for running RT
tasks, but it seems like the rest of the system still affects the
isolated CPUs.
Any comments/suggestions would be appreciated.
Thanks,
Chris