2012-11-17 13:10:33

by Zlatko Calusic

Subject: High context switch rate, ksoftirqd's chewing cpu

Hello Tejun et al.

This week I spent some hours tracking a regression in 3.7 kernel that
was producing high context switch rate on one of my machines. I
carefully bisected between 3.6 and 3.7-rc1 and eventually found this
commit to be the culprit:

commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79
Author: Tejun Heo <[email protected]>
Date: Tue Aug 21 13:18:24 2012 -0700

workqueue: use mod_delayed_work() instead of __cancel + queue

Now that mod_delayed_work() is safe to call from IRQ handlers,
__cancel_delayed_work() followed by queue_delayed_work() can be
replaced with mod_delayed_work().

Most conversions are straight-forward except for the following.

* net/core/link_watch.c: linkwatch_schedule_work() was doing a quite
elaborate dancing around its delayed_work. Collapse it such that
linkwatch_work is queued for immediate execution if LW_URGENT and
existing timer is kept otherwise.

Signed-off-by: Tejun Heo <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Tomi Valkeinen <[email protected]>

Then I carefully reverted chunk by chunk to find out what exact change
is responsible for the regression. You can find it attached as wq.patch
(to preserve whitespace). Very simple modification with wildly different
behavior on only one of my machines, weird. I'm also attaching ctxt/s
graph that shows the impact nicely. I'll gladly provide any additional
info that could help you resolve this.

Please Cc: on reply (not subscribed to lkml).

Regards,
--
Zlatko


Attachments:
wq.patch (521.00 B)
context-weekly.png (11.51 kB)

2012-11-30 22:52:09

by Tejun Heo

Subject: Re: High context switch rate, ksoftirqd's chewing cpu

Hello, Zlatko.

Sorry about the delay. Your message was in my spam folder. The
attachment seems to have confused the filter.

On Sat, Nov 17, 2012 at 02:01:29PM +0100, Zlatko Calusic wrote:
> This week I spent some hours tracking a regression in 3.7 kernel
> that was producing high context switch rate on one of my machines. I
> carefully bisected between 3.6 and 3.7-rc1 and eventually found this
> commit to be the culprit:
>
> commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79
> Author: Tejun Heo <[email protected]>
> Date: Tue Aug 21 13:18:24 2012 -0700
>
> workqueue: use mod_delayed_work() instead of __cancel + queue
...
>
> Then I carefully reverted chunk by chunk to find out what exact
> change is responsible for the regression. You can find it attached
> as wq.patch (to preserve whitespace). Very simple modification with
> wildly different behavior on only one of my machines, weird. I'm
> also attaching ctxt/s graph that shows the impact nicely. I'll
> gladly provide any additional info that could help you resolve this.
>
> Please Cc: on reply (not subscribed to lkml).
>
> Regards,
> --
> Zlatko

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 4b4dbdf..4b8b606 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -319,10 +319,8 @@ EXPORT_SYMBOL(__blk_run_queue);
> */
> void blk_run_queue_async(struct request_queue *q)
> {
> - if (likely(!blk_queue_stopped(q))) {
> - __cancel_delayed_work(&q->delay_work);
> - queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
> - }
> + if (likely(!blk_queue_stopped(q)))
> + mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
> }
> EXPORT_SYMBOL(blk_run_queue_async);

That's interesting. Is there anything else noticeably different besides
the ctxsw counts? e.g. CPU usage, IO throughput / latency, etc...
Also, can you please post the kernel boot log from the machine? I
assume that the issue is readily reproducible? Are you up for trying
some debug patches?

Thanks.

--
tejun

2012-11-30 22:55:51

by Tejun Heo

Subject: Re: High context switch rate, ksoftirqd's chewing cpu

Hello, again.

Can you please try this patch? Thanks!

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 042d221..26368ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1477,7 +1477,10 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
} while (unlikely(ret == -EAGAIN));

if (likely(ret >= 0)) {
- __queue_delayed_work(cpu, wq, dwork, delay);
+ if (!delay)
+ __queue_work(cpu, wq, &dwork->work);
+ else
+ __queue_delayed_work(cpu, wq, dwork, delay);
local_irq_restore(flags);
}

2012-11-30 23:12:07

by Zlatko Calusic

Subject: Re: High context switch rate, ksoftirqd's chewing cpu

On 30.11.2012 23:52, Tejun Heo wrote:
> Hello, Zlatko.
>
> Sorry about the delay. Your message was in my spam folder. The
> attachment seems to have confused the filter.
>
> On Sat, Nov 17, 2012 at 02:01:29PM +0100, Zlatko Calusic wrote:
>> This week I spent some hours tracking a regression in 3.7 kernel
>> that was producing high context switch rate on one of my machines. I
>> carefully bisected between 3.6 and 3.7-rc1 and eventually found this
>> commit to be the culprit:
>>
>> commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79
>> Author: Tejun Heo <[email protected]>
>> Date: Tue Aug 21 13:18:24 2012 -0700
>>
>> workqueue: use mod_delayed_work() instead of __cancel + queue
> ...
>>
>> Then I carefully reverted chunk by chunk to find out what exact
>> change is responsible for the regression. You can find it attached
>> as wq.patch (to preserve whitespace). Very simple modification with
>> wildly different behavior on only one of my machines, weird. I'm
>> also attaching ctxt/s graph that shows the impact nicely. I'll
>> gladly provide any additional info that could help you resolve this.
>>
>> Please Cc: on reply (not subscribed to lkml).
>>
>> Regards,
>> --
>> Zlatko
>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 4b4dbdf..4b8b606 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -319,10 +319,8 @@ EXPORT_SYMBOL(__blk_run_queue);
>> */
>> void blk_run_queue_async(struct request_queue *q)
>> {
>> - if (likely(!blk_queue_stopped(q))) {
>> - __cancel_delayed_work(&q->delay_work);
>> - queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
>> - }
>> + if (likely(!blk_queue_stopped(q)))
>> + mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
>> }
>> EXPORT_SYMBOL(blk_run_queue_async);
>
> That's interesting. Is there anything else noticeably different besides
> the ctxsw counts? e.g. CPU usage, IO throughput / latency, etc...
> Also, can you please post the kernel boot log from the machine? I
> assume that the issue is readily reproducible? Are you up for trying
> some debug patches?
>
> Thanks.
>

Hey Tejun! Thanks for replying.

It's an older C2D machine, I've attached the kernel boot log. Funny
thing is that on the other half a dozen machines I don't observe any
problems, only on this one. And it's reproducible every time. I don't
see any other anomalies besides the two I already mentioned: high
context switch rate and the ksoftirqd threads eating more CPU, probably
as a consequence.

I'll gladly try your patch and send my observations tomorrow, as I've
just started an md resync on the machine, which will take a couple of
hours.

Regards,
--
Zlatko


Attachments:
dmesg.txt (23.15 kB)