LinuxLists.cc - Re: multi-second application stall in open()

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Thu, Jun 21, 2012 at 02:26:13PM -0500, Josh Hunt wrote:
> On Thu, Mar 8, 2012 at 6:16 PM, Josh Hunt <[email protected]> wrote:
> > On Thu, Mar 8, 2012 at 5:40 PM, Vivek Goyal <[email protected]> wrote:
> >> On Thu, Mar 08, 2012 at 04:22:31PM -0600, Josh Hunt wrote:
> >>
> >> [..]
> >>> A crude bisection seems to show that if I revert "blkio: Set
> >>> must_dispatch only if we decided to not dispatch the request"
> >>> (bf7919371025412978268efca4b09dd847acb395) I no longer see the stalls
> >>> in 2.6.35. However this does not seem to solve the problem if I revert
> >>> it in 2.6.38.
> >>
> >> Strange. If this was the problem it should have fixed it in 2.6.38 also.
> >> BTW, the blktrace you sent was from 2.6.38 or 2.6.35 kernels.
> >>
> >
> > The blktrace is from 2.6.35. I just attached a portion of it earlier
> > b/c I thought it would be too much to send the whole thing. I've gone
> > ahead and attached the full output (bzip'd), along with the same
> > workload with slice_idle set to 0 for comparison.
> >
> >>>
> >>> By setting slice_idle to 0, does this basically disable plugging?
> >>
> >> It disables idling and not plugging.
> >
> > OK that's what I thought. I'm just confused about the differences in
> > the blktrace outputs wrt plugging with slice_idle set to 0.
> >
> >>
> >>> Based on the blktrace info it seems that something is going wrong with
> >>> plugging with my testcase. I'm just wondering why setting slice_idle
> >>> to 0 seems to resolve my issue? Also, since we see unplugs in the
> >>> blktrace how could the requests still not be getting sent to the disk?
> >>
> >> Unplug will just try to kick the queue. That does not mean that request
> >> will be dispatched. And that's the question that why are we not
> >> dispatching requests.
> >>
> >> I had another look at traces and I think it is not just async write, but
> >> there is was sync write request queued and we have not dispatched that
> >> too for a long time.
> >>
> >> Added request here.
> >>
> >> ?8,0 ? ?1 ? ?36921 ?5028.492019664 ? 162 ?A ?WS 63 + 8 <- (8,1) 0
> >> ?8,0 ? ?1 ? ?36922 ?5028.492021620 ? 162 ?Q ?WS 63 + 8 [sync_supers]
> >> ?8,0 ? ?1 ? ?36923 ?5028.492029721 ? 162 ?G ?WS 63 + 8 [sync_supers]
> >> ?8,0 ? ?1 ? ?36924 ?5028.492040617 ? 162 ?I ? W 63 + 8 ( ? 10896)
> >> [sync_supers]
> >> ?8,0 ? ?1 ? ? ? ?0 ?5028.492044807 ? ? 0 ?m ? N cfq162 insert_request
> >> ?8,0 ? ?1 ? ? ? ?0 ?5028.492046763 ? ? 0 ?m ? N cfq162 add_to_rr
> >>
> >> And after a long time we dispatched the request.
> >>
> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116841906 ? ? 0 ?m ? N cfq162 set_active wl_prio:0 wl_type:1
> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116844979 ? ? 0 ?m ? N cfq162 fifo=ffff8800e8787aa0
> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116846655 ? ? 0 ?m ? N cfq162 dispatch_insert
> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116849728 ? ? 0 ?m ? N cfq162 dispatched a
> >> request
> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116851683 ? ? 0 ?m ? N cfq162 activate rq, drv=1
> >> ?8,0 ? ?0 ? ?36518 ?5050.116853360 ? 166 ?D ? W 63 + 8 (21624812743)
> >> [kblockd/0]
> >> ?8,0 ? ?0 ? ?36519 ?5050.117236650 ?9671 ?C ? W 63 + 8 ( ?383290) [0]
> >>
> >> So it is not async requestss being starved by sync request issue, most
> >> likely.
> >>
> >
> > I wanted to point out again that this is a very small io load. On the
> > avg of a few KB/s.
> >
> >> Are you using any of the blk cgroup stuff?
> >
> > No, none of that is enabled in the kernel.
> >
> >>
> >> Can you put some more trace messages to figure out what's happening.
> >> I think you can try putting some trace messages in following functions.
> >>
> >> __blk_run_queue()
> >> cfq_select_queue()
> >>
> >> and try to narrow down why CFQ refuses to dispatch the request when
> >> this happens.
> >
> > Sure. I'll add some more traces here and see if it sheds any light on the issue.
> >
> > Thanks again for your help.
> >
> >>
> >> Thanks
> >> Vivek
> >
> > --
> > Josh
>
> It looks like I am hitting this issue again. I upgraded our system to
> the 3.0.13 kernel and it appeared to resolve the issues I was seeing.
> Unfortunately it is still occurring now just less frequently.
>
> I've gone ahead and instrumented cfq_select_queue and have some more
> information. To quickly recap my issue - using CFQ we are seeing an
> application's async writes stall for 5-15s. This is using a sata_svw
> controller and a rotating drive which does not have NCQ enabled. If I
> change slice_idle to 0 the stalls appear to go away. I've also tried
> using the deadline scheduler and do not see the issue with that. I dug
> through CFQ and identified the line which with slice_idle set to 0
> gets bypassed and allows my workload to run without stalling (I added
> in the blktrace logging for my debugging):
>
> if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
> service_tree = cfqq->service_tree;
> cfq_log_cfqq(cfqd, cfqq, "keep_queue st->count:%d, dispatch:%u",
> service_tree->count, cfqq->dispatched);
> cfqq = NULL;
> goto keep_queue;
> }
>
> When I have slice_idle set to 0, this gets bypassed and my workload
> runs fine. I've verified this by removing this code and running the
> workload with that kernel. I'm not saying it should be removed, just
> that bypassing this code helps my workload for whatever reason.
>
> Here are some snippets from my blktrace log (attached):
>
> Insert:
> 8,0 1 0 4466.115833485 0 m N cfq5864 insert_request
> 8,0 1 0 4466.115835161 0 m N cfq20720 Idling.
> st->count:2, hw_tag:0
> 8,0 1 1499412 4466.115836278 6023 I W 7078615 + 256 (
> 311495) [flush-8:0]
> 8,0 1 0 4466.115837675 0 m N cfq5864 insert_request
> 8,0 1 0 4466.115840190 0 m N cfq20720 Idling.
> st->count:2, hw_tag:0
> 8,0 1 1499413 4466.115841307 6023 U N [flush-8:0] 8
> 8,0 1 0 4466.115842983 0 m N cfq20720 Idling.
> st->count:2, hw_tag:0
> 8,0 1 0 4466.115844660 0 m N cfq20720 keep_queue
> st->count:2, dispatch:1
>
> Dispatch:
> 8,0 0 0 4478.127274376 0 m N cfq5864 complete rqnoidle 0
> 8,0 0 0 4478.127318796 0 m N cfq5864 Not idling. st->count:1
> 8,0 0 0 4478.127320472 0 m N cfq5864 dispatch_insert
> 8,0 0 0 4478.127322148 0 m N cfq5864 dispatched a request
> 8,0 0 0 4478.127324104 0 m N cfq5864 activate rq, drv=2
> 8,0 0 2876000 4478.127325221 7232 D W 7078615 + 256
> (12011488943) [testd]
> 8,0 0 2876001 4478.128414478 7232 C W 7078135 + 264 ( 1250173) [0]
> 8,0 0 0 4478.128445767 0 m N cfq5864 complete rqnoidle 0
>
> Also in the blktrace I see gaps of 3-4s where nothing happens:
> 8,0 0 0 4466.893959719 0 m N cfq20723 del_from_rr
> 8,0 0 0 4466.893961395 0 m N cfq schedule dispatch
> 8,0 1 1499539 4470.000135552 174 A WS 63 + 8 <- (8,1) 0
> 8,0 1 1499540 4470.000138066 174 Q WS 63 + 8 [sync_supers]
>
> Meanwhile the requests issued earlier are just sitting on the queue
> waiting to be dispatched.
>
> To accelerate the issue I'm constantly dropping the cache - while :;
> do echo 1 > /proc/sys/vm/drop_caches; done. The issue occurs without
> it, but this helps reproduce it in about an hour. I also wanted to
> point out that I see a decent amount of syslogd activity in the trace
> and know that it does do fsync(). I'm not sure if that's helping the
> stalling, but wanted to mention that.

This is strange. I can see in logs that for some reason writes are not
being dispatched and queue is doing nothing.

8,0 0 0 4466.132851211 0 m N cfq schedule dispatch
8,0 0 0 4466.132854564 0 m N cfq20720 keep_queue
timer_pending st->count:2, dispatch:0
8,0 0 0 4466.139951919 0 m N cfq idle timer fired
8,0 0 0 4466.139954713 0 m N cfq20720 slice expired
t=0
8,0 0 0 4466.139958066 0 m N cfq20720 sl_used=17
disp=21 charge=17 iops=0 sect=176
8,0 0 0 4466.139959742 0 m N cfq20720 del_from_rr
8,0 0 0 4466.139963653 0 m N cfq schedule dispatch
8,0 1 1499521 4466.791207736 7570 A R 7258191 + 8 <- (8,1)
7258128
8,0 1 1499522 4466.791209692 7570 Q R 7258191 + 8 [httpdmon]
8,0 1 1499523 4466.791217514 7570 G R 7258191 + 8 [httpdmon]
8,0 1 1499524 4466.791221705 7570 I R 7258191 + 8 ( 4191)
[httpdmon]
8,0 1 0 4466.791227572 0 m N cfq7570 insert_request
8,0 1 0 4466.791229248 0 m N cfq7570 add_to_rr
8,0 1 0 4466.791235953 0 m N cfq20719 set_active
wl_prio:0 wl_type:2

Here we deleted queue 20720 and did nothing for .6 seconds and from
previous logs it is visible that writes are pending and queued.

For some reason cfq_schedule_dispatch() did not lead to kicking queue
or queue was kicked but somehow write queue was not selected for
dispatch (A case of corrupt data structures?).

Are you able to reproduce this issue on latest kernels (3.5-rc2?). I would
say put some logs in select_queue() and see where did it bail out. That
will confirm that select queue was called and can also give some details
why we did not select async queue for dispatch. (Note: select_queue is called
multiple times so putting trace point there makes logs very verbose).

Thanks
Vivek

2012-06-21 20:36:20

[permalink] [raw]

Subject: Re: multi-second application stall in open()

Hey, Vivek.

On Thu, Jun 21, 2012 at 04:32:17PM -0400, Vivek Goyal wrote:
> Here we deleted queue 20720 and did nothing for .6 seconds and from
> previous logs it is visible that writes are pending and queued.
>
> For some reason cfq_schedule_dispatch() did not lead to kicking queue
> or queue was kicked but somehow write queue was not selected for
> dispatch (A case of corrupt data structures?).
>
> Are you able to reproduce this issue on latest kernels (3.5-rc2?). I would
> say put some logs in select_queue() and see where did it bail out. That
> will confirm that select queue was called and can also give some details
> why we did not select async queue for dispatch. (Note: select_queue is called
> multiple times so putting trace point there makes logs very verbose).

Some people are putting in watchdog timers in block layer to kick cfq
when it stalls with pending requests. The cfq code there has diverged
quite a bit from upstream so I have no idea whether it's caused by the
same issue. The symptom sounds exactly the same tho. So, yeah, I
think it isn't too unlikely that we have a cfq logic bug leading to
stalls. :(

--
tejun

2012-06-21 21:11:21

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Thu, Jun 21, 2012 at 3:32 PM, Vivek Goyal <[email protected]> wrote:
> On Thu, Jun 21, 2012 at 02:26:13PM -0500, Josh Hunt wrote:
>> On Thu, Mar 8, 2012 at 6:16 PM, Josh Hunt <[email protected]> wrote:
>> > On Thu, Mar 8, 2012 at 5:40 PM, Vivek Goyal <[email protected]> wrote:
>> >> On Thu, Mar 08, 2012 at 04:22:31PM -0600, Josh Hunt wrote:
>> >>
>> >> [..]
>> >>> A crude bisection seems to show that if I revert "blkio: Set
>> >>> must_dispatch only if we decided to not dispatch the request"
>> >>> (bf7919371025412978268efca4b09dd847acb395) I no longer see the stalls
>> >>> in 2.6.35. However this does not seem to solve the problem if I revert
>> >>> it in 2.6.38.
>> >>
>> >> Strange. If this was the problem it should have fixed it in 2.6.38 also.
>> >> BTW, the blktrace you sent was from 2.6.38 or 2.6.35 kernels.
>> >>
>> >
>> > The blktrace is from 2.6.35. I just attached a portion of it earlier
>> > b/c I thought it would be too much to send the whole thing. I've gone
>> > ahead and attached the full output (bzip'd), along with the same
>> > workload with slice_idle set to 0 for comparison.
>> >
>> >>>
>> >>> By setting slice_idle to 0, does this basically disable plugging?
>> >>
>> >> It disables idling and not plugging.
>> >
>> > OK that's what I thought. I'm just confused about the differences in
>> > the blktrace outputs wrt plugging with slice_idle set to 0.
>> >
>> >>
>> >>> Based on the blktrace info it seems that something is going wrong with
>> >>> plugging with my testcase. I'm just wondering why setting slice_idle
>> >>> to 0 seems to resolve my issue? Also, since we see unplugs in the
>> >>> blktrace how could the requests still not be getting sent to the disk?
>> >>
>> >> Unplug will just try to kick the queue. That does not mean that request
>> >> will be dispatched. And that's the question that why are we not
>> >> dispatching requests.
>> >>
>> >> I had another look at traces and I think it is not just async write, but
>> >> there is was sync write request queued and we have not dispatched that
>> >> too for a long time.
>> >>
>> >> Added request here.
>> >>
>> >> ?8,0 ? ?1 ? ?36921 ?5028.492019664 ? 162 ?A ?WS 63 + 8 <- (8,1) 0
>> >> ?8,0 ? ?1 ? ?36922 ?5028.492021620 ? 162 ?Q ?WS 63 + 8 [sync_supers]
>> >> ?8,0 ? ?1 ? ?36923 ?5028.492029721 ? 162 ?G ?WS 63 + 8 [sync_supers]
>> >> ?8,0 ? ?1 ? ?36924 ?5028.492040617 ? 162 ?I ? W 63 + 8 ( ? 10896)
>> >> [sync_supers]
>> >> ?8,0 ? ?1 ? ? ? ?0 ?5028.492044807 ? ? 0 ?m ? N cfq162 insert_request
>> >> ?8,0 ? ?1 ? ? ? ?0 ?5028.492046763 ? ? 0 ?m ? N cfq162 add_to_rr
>> >>
>> >> And after a long time we dispatched the request.
>> >>
>> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116841906 ? ? 0 ?m ? N cfq162 set_active wl_prio:0 wl_type:1
>> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116844979 ? ? 0 ?m ? N cfq162 fifo=ffff8800e8787aa0
>> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116846655 ? ? 0 ?m ? N cfq162 dispatch_insert
>> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116849728 ? ? 0 ?m ? N cfq162 dispatched a
>> >> request
>> >> ?8,0 ? ?0 ? ? ? ?0 ?5050.116851683 ? ? 0 ?m ? N cfq162 activate rq, drv=1
>> >> ?8,0 ? ?0 ? ?36518 ?5050.116853360 ? 166 ?D ? W 63 + 8 (21624812743)
>> >> [kblockd/0]
>> >> ?8,0 ? ?0 ? ?36519 ?5050.117236650 ?9671 ?C ? W 63 + 8 ( ?383290) [0]
>> >>
>> >> So it is not async requestss being starved by sync request issue, most
>> >> likely.
>> >>
>> >
>> > I wanted to point out again that this is a very small io load. On the
>> > avg of a few KB/s.
>> >
>> >> Are you using any of the blk cgroup stuff?
>> >
>> > No, none of that is enabled in the kernel.
>> >
>> >>
>> >> Can you put some more trace messages to figure out what's happening.
>> >> I think you can try putting some trace messages in following functions.
>> >>
>> >> __blk_run_queue()
>> >> cfq_select_queue()
>> >>
>> >> and try to narrow down why CFQ refuses to dispatch the request when
>> >> this happens.
>> >
>> > Sure. I'll add some more traces here and see if it sheds any light on the issue.
>> >
>> > Thanks again for your help.
>> >
>> >>
>> >> Thanks
>> >> Vivek
>> >
>> > --
>> > Josh
>>
>> It looks like I am hitting this issue again. I upgraded our system to
>> the 3.0.13 kernel and it appeared to resolve the issues I was seeing.
>> Unfortunately it is still occurring now just less frequently.
>>
>> I've gone ahead and instrumented cfq_select_queue and have some more
>> information. To quickly recap my issue - using CFQ we are seeing an
>> application's async writes stall for 5-15s. This is using a sata_svw
>> controller and a rotating drive which does not have NCQ enabled. If I
>> change slice_idle to 0 the stalls appear to go away. I've also tried
>> using the deadline scheduler and do not see the issue with that. I dug
>> through CFQ and identified the line which with slice_idle set to 0
>> gets bypassed and allows my workload to run without stalling (I added
>> in the blktrace logging for my debugging):
>>
>> ? ? ? ? if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
>> ? ? ? ? ? ? ? ? service_tree = cfqq->service_tree;
>> ? ? ? ? ? ? ? ? cfq_log_cfqq(cfqd, cfqq, "keep_queue st->count:%d, dispatch:%u",
>> ? ? ? ? ? ? ? ? ? ? ? ? service_tree->count, cfqq->dispatched);
>> ? ? ? ? ? ? ? ? cfqq = NULL;
>> ? ? ? ? ? ? ? ? goto keep_queue;
>> ? ? ? ? }
>>
>> When I have slice_idle set to 0, this gets bypassed and my workload
>> runs fine. I've verified this by removing this code and running the
>> workload with that kernel. I'm not saying it should be removed, just
>> that bypassing this code helps my workload for whatever reason.
>>
>> Here are some snippets from my blktrace log (attached):
>>
>> Insert:
>> ? 8,0 ? ?1 ? ? ? ?0 ?4466.115833485 ? ? 0 ?m ? N cfq5864 insert_request
>> ? 8,0 ? ?1 ? ? ? ?0 ?4466.115835161 ? ? 0 ?m ? N cfq20720 Idling.
>> st->count:2, hw_tag:0
>> ? 8,0 ? ?1 ?1499412 ?4466.115836278 ?6023 ?I ? W 7078615 + 256 (
>> 311495) [flush-8:0]
>> ? 8,0 ? ?1 ? ? ? ?0 ?4466.115837675 ? ? 0 ?m ? N cfq5864 insert_request
>> ? 8,0 ? ?1 ? ? ? ?0 ?4466.115840190 ? ? 0 ?m ? N cfq20720 Idling.
>> st->count:2, hw_tag:0
>> ? 8,0 ? ?1 ?1499413 ?4466.115841307 ?6023 ?U ? N [flush-8:0] 8
>> ? 8,0 ? ?1 ? ? ? ?0 ?4466.115842983 ? ? 0 ?m ? N cfq20720 Idling.
>> st->count:2, hw_tag:0
>> ? 8,0 ? ?1 ? ? ? ?0 ?4466.115844660 ? ? 0 ?m ? N cfq20720 keep_queue
>> st->count:2, dispatch:1
>>
>> Dispatch:
>> ? 8,0 ? ?0 ? ? ? ?0 ?4478.127274376 ? ? 0 ?m ? N cfq5864 complete rqnoidle 0
>> ? 8,0 ? ?0 ? ? ? ?0 ?4478.127318796 ? ? 0 ?m ? N cfq5864 Not idling. st->count:1
>> ? 8,0 ? ?0 ? ? ? ?0 ?4478.127320472 ? ? 0 ?m ? N cfq5864 dispatch_insert
>> ? 8,0 ? ?0 ? ? ? ?0 ?4478.127322148 ? ? 0 ?m ? N cfq5864 dispatched a request
>> ? 8,0 ? ?0 ? ? ? ?0 ?4478.127324104 ? ? 0 ?m ? N cfq5864 activate rq, drv=2
>> ? 8,0 ? ?0 ?2876000 ?4478.127325221 ?7232 ?D ? W 7078615 + 256
>> (12011488943) [testd]
>> ? 8,0 ? ?0 ?2876001 ?4478.128414478 ?7232 ?C ? W 7078135 + 264 ( 1250173) [0]
>> ? 8,0 ? ?0 ? ? ? ?0 ?4478.128445767 ? ? 0 ?m ? N cfq5864 complete rqnoidle 0
>>
>> Also in the blktrace I see gaps of 3-4s where nothing happens:
>> ?8,0 ? ?0 ? ? ? ?0 ?4466.893959719 ? ? 0 ?m ? N cfq20723 del_from_rr
>> ? 8,0 ? ?0 ? ? ? ?0 ?4466.893961395 ? ? 0 ?m ? N cfq schedule dispatch
>> ? 8,0 ? ?1 ?1499539 ?4470.000135552 ? 174 ?A ?WS 63 + 8 <- (8,1) 0
>> ? 8,0 ? ?1 ?1499540 ?4470.000138066 ? 174 ?Q ?WS 63 + 8 [sync_supers]
>>
>> Meanwhile the requests issued earlier are just sitting on the queue
>> waiting to be dispatched.
>>
>> To accelerate the issue I'm constantly dropping the cache - while :;
>> do echo 1 > /proc/sys/vm/drop_caches; done. The issue occurs without
>> it, but this helps reproduce it in about an hour. I also wanted to
>> point out that I see a decent amount of syslogd activity in the trace
>> and know that it does do fsync(). I'm not sure if that's helping the
>> stalling, but wanted to mention that.
>
> This is strange. I can see in logs that for some reason writes are not
> being dispatched and queue is doing nothing.
>
> ?8,0 ? ?0 ? ? ? ?0 ?4466.132851211 ? ? 0 ?m ? N cfq schedule dispatch
> ?8,0 ? ?0 ? ? ? ?0 ?4466.132854564 ? ? 0 ?m ? N cfq20720 keep_queue
> timer_pending st->count:2, dispatch:0
> ?8,0 ? ?0 ? ? ? ?0 ?4466.139951919 ? ? 0 ?m ? N cfq idle timer fired
> ?8,0 ? ?0 ? ? ? ?0 ?4466.139954713 ? ? 0 ?m ? N cfq20720 slice expired
> t=0
> ?8,0 ? ?0 ? ? ? ?0 ?4466.139958066 ? ? 0 ?m ? N cfq20720 sl_used=17
> disp=21 charge=17 iops=0 sect=176
> ?8,0 ? ?0 ? ? ? ?0 ?4466.139959742 ? ? 0 ?m ? N cfq20720 del_from_rr
> ?8,0 ? ?0 ? ? ? ?0 ?4466.139963653 ? ? 0 ?m ? N cfq schedule dispatch
> ?8,0 ? ?1 ?1499521 ?4466.791207736 ?7570 ?A ? R 7258191 + 8 <- (8,1)
> 7258128
> ?8,0 ? ?1 ?1499522 ?4466.791209692 ?7570 ?Q ? R 7258191 + 8 [httpdmon]
> ?8,0 ? ?1 ?1499523 ?4466.791217514 ?7570 ?G ? R 7258191 + 8 [httpdmon]
> ?8,0 ? ?1 ?1499524 ?4466.791221705 ?7570 ?I ? R 7258191 + 8 ( ? ?4191)
> [httpdmon]
> ?8,0 ? ?1 ? ? ? ?0 ?4466.791227572 ? ? 0 ?m ? N cfq7570 insert_request
> ?8,0 ? ?1 ? ? ? ?0 ?4466.791229248 ? ? 0 ?m ? N cfq7570 add_to_rr
> ?8,0 ? ?1 ? ? ? ?0 ?4466.791235953 ? ? 0 ?m ? N cfq20719 set_active
> wl_prio:0 wl_type:2
>
> Here we deleted queue 20720 and did nothing for .6 seconds and from
> previous logs it is visible that writes are pending and queued.
>
> For some reason cfq_schedule_dispatch() did not lead to kicking queue
> or queue was kicked but somehow write queue was not selected for
> dispatch (A case of corrupt data structures?).
>
> Are you able to reproduce this issue on latest kernels (3.5-rc2?). I would

I cannot do it with my current application, but will try doing it another way.

> say put some logs in select_queue() and see where did it bail out. That

Well I did add some instrumentation in select_queue, the "keep_queue
st->count:%d, dispatch:%u" line I mentioned above, but I will add more
and retest.

I'm attaching a similar run with no stalls when I set slice_idle to 0.

> will confirm that select queue was called and can also give some details
> why we did not select async queue for dispatch. (Note: select_queue is called
> multiple times so putting trace point there makes logs very verbose).
>
> Thanks
> Vivek
--
Josh

Attachments:

blktrace-slice_idle-0.bz2 (117.84 kB)

2012-06-21 21:28:26

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Thu, Jun 21, 2012 at 3:36 PM, Tejun Heo <[email protected]> wrote:
> Hey, Vivek.
>
> On Thu, Jun 21, 2012 at 04:32:17PM -0400, Vivek Goyal wrote:
>> Here we deleted queue 20720 and did nothing for .6 seconds and from
>> previous logs it is visible that writes are pending and queued.
>>
>> For some reason cfq_schedule_dispatch() did not lead to kicking queue
>> or queue was kicked but somehow write queue was not selected for
>> dispatch (A case of corrupt data structures?).
>>
>> Are you able to reproduce this issue on latest kernels (3.5-rc2?). I would
>> say put some logs in select_queue() and see where did it bail out. That
>> will confirm that select queue was called and can also give some details
>> why we did not select async queue for dispatch. (Note: select_queue is called
>> multiple times so putting trace point there makes logs very verbose).
>
> Some people are putting in watchdog timers in block layer to kick cfq
> when it stalls with pending requests. ?The cfq code there has diverged
> quite a bit from upstream so I have no idea whether it's caused by the
> same issue. ?The symptom sounds exactly the same tho. ?So, yeah, I
> think it isn't too unlikely that we have a cfq logic bug leading to
> stalls. ?:(
>
> --
> tejun
Tejun

When you say the code has diverged from upstream, do you mean from 3.0
to 3.5? Or maybe I'm misunderstanding what you're getting at. Also, if
you have any links to the watchdog timer code you're referring to I
would appreciate it.

Thanks
--
Josh

2012-06-21 21:32:40

[permalink] [raw]

Subject: Re: multi-second application stall in open()

Hello,

On Thu, Jun 21, 2012 at 04:28:24PM -0500, Josh Hunt wrote:
> When you say the code has diverged from upstream, do you mean from 3.0
> to 3.5?

It's based on something diverged from 2.6.X, so an ancient thing.

> Or maybe I'm misunderstanding what you're getting at. Also, if
> you have any links to the watchdog timer code you're referring to I
> would appreciate it.

Rakesh is the one who observed the bug and wrote the watchdog code.
Rakesh, I think Josh is seeing similar cfqq hang issue. Did the
watchdog code reveal why that happened? Or was it mainly to just kick
the queue and keep it going?

Thanks.

--
tejun

2012-06-21 21:48:06

by Rakesh Iyer

[permalink] [raw]

Subject: Re: multi-second application stall in open()

-- Resending because my mail went out in html text and got bounced by
the list, apologies if you see it twice --

Hello,

I coded up the watchdog and dropped it in but never did get the time
to go looking for evidence of stalls so no confirmed evidence of what
the cause was.

Chad and I did manage to stare at the code long and hard and sort of
convince ourselves that cfq_cfqq_wait_busy & associated logic could be
the cause of the stall (strictly in my opinion - that logic can be
fully be fully folded into the idling logic, but that's a discussion
for another day).

Hope that helps.
-Rakesh

On Thu, Jun 21, 2012 at 2:32 PM, Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Thu, Jun 21, 2012 at 04:28:24PM -0500, Josh Hunt wrote:
> > When you say the code has diverged from upstream, do you mean from 3.0
> > to 3.5?
>
> It's based on something diverged from 2.6.X, so an ancient thing.
>
> > Or maybe I'm misunderstanding what you're getting at. Also, if
> > you have any links to the watchdog timer code you're referring to I
> > would appreciate it.
>
> Rakesh is the one who observed the bug and wrote the watchdog code.
> Rakesh, I think Josh is seeing similar cfqq hang issue. ?Did the
> watchdog code reveal why that happened? ?Or was it mainly to just kick
> the queue and keep it going?
>
> Thanks.
>
> --
> tejun

2012-06-22 14:12:18

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Thu, Jun 21, 2012 at 04:11:18PM -0500, Josh Hunt wrote:

[..]
> > say put some logs in select_queue() and see where did it bail out. That
>
> Well I did add some instrumentation in select_queue, the "keep_queue
> st->count:%d, dispatch:%u" line I mentioned above, but I will add more
> and retest.

Actually before the stall we expired the current queue. That means there
is no active queue in cfq now. So keep_queue trace will help only if there
is an active queue and we decide to keep that queue.

8,0 0 0 4466.139959742 0 m N cfq20720 del_from_rr
8,0 0 0 4466.139963653 0 m N cfq schedule dispatch
8,0 1 1499521 4466.791207736 7570 A R 7258191 + 8 <- (8,1)
7258128

Thing to figure out here is that why cfq is not picking a new queue
despite the fact there are pending requests and there is no active
queue (hence cfq is not obiviously not seeing that queue).

>
> I'm attaching a similar run with no stalls when I set slice_idle to 0.

At this point I am not sure why slice_idle=0 is not seeing this issue. It
does not see to be directly related to idling. Because when stall happens
CFQ is not idling on anything. It has expired active queue and for some
reason it thinks that I have no more requests to dispatch, so I have
nothing to do until a new request comes in.

Thanks
Vivek

2012-06-22 20:05:52

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Fri, Jun 22, 2012 at 9:12 AM, Vivek Goyal <[email protected]> wrote:
> On Thu, Jun 21, 2012 at 04:11:18PM -0500, Josh Hunt wrote:
>
> [..]
>> > say put some logs in select_queue() and see where did it bail out. That
>>
>> Well I did add some instrumentation in select_queue, the "keep_queue
>> st->count:%d, dispatch:%u" line I mentioned above, but I will add more
>> and retest.
>
> Actually before the stall we expired the current queue. That means there
> is no active queue in cfq now. So keep_queue trace will help only if there
> is an active queue and we decide to keep that queue.
>
> ?8,0 ? ?0 ? ? ? ?0 ?4466.139959742 ? ? 0 ?m ? N cfq20720 del_from_rr
> ?8,0 ? ?0 ? ? ? ?0 ?4466.139963653 ? ? 0 ?m ? N cfq schedule dispatch
> ?8,0 ? ?1 ?1499521 ?4466.791207736 ?7570 ?A ? R 7258191 + 8 <- (8,1)
> 7258128
>
> Thing to figure out here is that why cfq is not picking a new queue
> despite the fact there are pending requests and there is no active
> queue (hence cfq is not obiviously not seeing that queue).
>
>>
>> I'm attaching a similar run with no stalls when I set slice_idle to 0.
>
> At this point I am not sure why slice_idle=0 is not seeing this issue. It
> does not see to be directly related to idling. Because when stall happens
> CFQ is not idling on anything. It has expired active queue and for some
> reason it thinks that I have no more requests to dispatch, so I have
> nothing to do until a new request comes in.
>
> Thanks
> Vivek

Got some more info. It actually looks like every idle timer fire there
is some sort of stall:

8,0 0 0 4807.015613266 0 m N cfq idle timer fired
8,0 0 0 4807.015616897 0 m N cfq7646
idle_slice_timer: active queue
8,0 0 0 4807.015618853 0 m N cfq7646
idle_slice_timer: clear_cfqq_deep
8,0 0 0 4807.015620529 0 m N cfq7646 slice expired t=0
8,0 0 0 4807.015622205 0 m N cfq7646 del timer
8,0 0 0 4807.015623323 0 m N cfq7646 update idle time stats
8,0 0 0 4807.015626396 0 m N cfq7646 sl_used=8
disp=1 charge=8 iops=0 sect=8
8,0 0 0 4807.015627793 0 m N cfq7646 slice_expired
del cfqq from rr
8,0 0 0 4807.015628910 0 m N cfq7646 del_from_rr
8,0 0 0 4807.015630307 0 m N cfq7646 slice_expired
resort_rr_list
8,0 0 0 4807.015631983 0 m N cfq7646 slice_expired
setting active_queue to NULL
8,0 0 0 4807.015633380 0 m N cfq7646 slice_expired
setting active_cic to NULL
8,0 0 438262 4807.717719091 6076 A W 1639679 + 8 <- (8,1) 1639616
8,0 0 438263 4807.717720767 6076 Q W 1639679 + 8 [flush-8:0]
8,0 0 438264 4807.717727192 6076 G W 1639679 + 8 [flush-8:0]
8,0 0 438265 4807.717730265 6076 P N [flush-8:0]

You can see in the attached log that it happens every timer fire to some extent.

We get to here in the code:
static void cfq_idle_slice_timer(unsigned long data)
{
...
/*
* Queue depth flag is reset only when the idle didn't succeed
*/
cfq_clear_cfqq_deep(cfqq);
cfq_log_cfqq(cfqd, cfqq, "idle_slice_timer: clear_cfqq_deep");
}
expire:
cfq_slice_expired(cfqd, timed_out);
out_kick:
cfq_schedule_dispatch(cfqd);
out_cont:
spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
}

and from the traces we are going through cfq_slice_expired() and then
cfq_schedule_dispatch(), but the schedule_dispatch print is not there,
which means cfqd->busy_queues is false.

static inline void cfq_schedule_dispatch(struct cfq_data *cfqd)
{
if (cfqd->busy_queues) {
cfq_log(cfqd, "schedule dispatch");
kblockd_schedule_work(cfqd->queue, &cfqd->unplug_work);
}
}

I'm still going through the traces, but figured I'd pass along this
info for now. If we're stalling after every timer fire here, then it's
understandable why I'm not seeing the stalls when I set slice_idle to
0.

Will update if I have more. Let me know if there's something else
you'd like me to instrument.

--
Josh

Attachments:

sda.parsed-moreverbose2.bz2 (306.03 kB)

2012-06-22 20:22:04

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Fri, Jun 22, 2012 at 3:05 PM, Josh Hunt <[email protected]> wrote:
> On Fri, Jun 22, 2012 at 9:12 AM, Vivek Goyal <[email protected]> wrote:
>> On Thu, Jun 21, 2012 at 04:11:18PM -0500, Josh Hunt wrote:
>>
>> [..]
>>> > say put some logs in select_queue() and see where did it bail out. That
>>>
>>> Well I did add some instrumentation in select_queue, the "keep_queue
>>> st->count:%d, dispatch:%u" line I mentioned above, but I will add more
>>> and retest.
>>
>> Actually before the stall we expired the current queue. That means there
>> is no active queue in cfq now. So keep_queue trace will help only if there
>> is an active queue and we decide to keep that queue.
>>
>> ?8,0 ? ?0 ? ? ? ?0 ?4466.139959742 ? ? 0 ?m ? N cfq20720 del_from_rr
>> ?8,0 ? ?0 ? ? ? ?0 ?4466.139963653 ? ? 0 ?m ? N cfq schedule dispatch
>> ?8,0 ? ?1 ?1499521 ?4466.791207736 ?7570 ?A ? R 7258191 + 8 <- (8,1)
>> 7258128
>>
>> Thing to figure out here is that why cfq is not picking a new queue
>> despite the fact there are pending requests and there is no active
>> queue (hence cfq is not obiviously not seeing that queue).
>>
>>>
>>> I'm attaching a similar run with no stalls when I set slice_idle to 0.
>>
>> At this point I am not sure why slice_idle=0 is not seeing this issue. It
>> does not see to be directly related to idling. Because when stall happens
>> CFQ is not idling on anything. It has expired active queue and for some
>> reason it thinks that I have no more requests to dispatch, so I have
>> nothing to do until a new request comes in.
>>
>> Thanks
>> Vivek
>
> Got some more info. It actually looks like every idle timer fire there
> is some sort of stall:
>
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015613266 ? ? 0 ?m ? N cfq idle timer fired
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015616897 ? ? 0 ?m ? N cfq7646
> idle_slice_timer: active queue
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015618853 ? ? 0 ?m ? N cfq7646
> idle_slice_timer: clear_cfqq_deep
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015620529 ? ? 0 ?m ? N cfq7646 slice expired t=0
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015622205 ? ? 0 ?m ? N cfq7646 del timer
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015623323 ? ? 0 ?m ? N cfq7646 update idle time stats
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015626396 ? ? 0 ?m ? N cfq7646 sl_used=8
> disp=1 charge=8 iops=0 sect=8
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015627793 ? ? 0 ?m ? N cfq7646 slice_expired
> del cfqq from rr
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015628910 ? ? 0 ?m ? N cfq7646 del_from_rr
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015630307 ? ? 0 ?m ? N cfq7646 slice_expired
> resort_rr_list
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015631983 ? ? 0 ?m ? N cfq7646 slice_expired
> setting active_queue to NULL
> ?8,0 ? ?0 ? ? ? ?0 ?4807.015633380 ? ? 0 ?m ? N cfq7646 slice_expired
> setting active_cic to NULL
> ?8,0 ? ?0 ? 438262 ?4807.717719091 ?6076 ?A ? W 1639679 + 8 <- (8,1) 1639616
> ?8,0 ? ?0 ? 438263 ?4807.717720767 ?6076 ?Q ? W 1639679 + 8 [flush-8:0]
> ?8,0 ? ?0 ? 438264 ?4807.717727192 ?6076 ?G ? W 1639679 + 8 [flush-8:0]
> ?8,0 ? ?0 ? 438265 ?4807.717730265 ?6076 ?P ? N [flush-8:0]
>
> You can see in the attached log that it happens every timer fire to some extent.
>
> We get to here in the code:
> static void cfq_idle_slice_timer(unsigned long data)
> {
> ...
> ? ? ? ? ? ? ? ?/*
> ? ? ? ? ? ? ? ? * Queue depth flag is reset only when the idle didn't succeed
> ? ? ? ? ? ? ? ? */
> ? ? ? ? ? ? ? ?cfq_clear_cfqq_deep(cfqq);
> ? ? ? ? ? ? ? ?cfq_log_cfqq(cfqd, cfqq, "idle_slice_timer: clear_cfqq_deep");
> ? ? ? ?}
> expire:
> ? ? ? ?cfq_slice_expired(cfqd, timed_out);
> out_kick:
> ? ? ? ?cfq_schedule_dispatch(cfqd);
> out_cont:
> ? ? ? ?spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
> }
>
> and from the traces we are going through cfq_slice_expired() and then
> cfq_schedule_dispatch(), but the schedule_dispatch print is not there,
> which means cfqd->busy_queues is false.
>
> static inline void cfq_schedule_dispatch(struct cfq_data *cfqd)
> {
> ? ? ? ?if (cfqd->busy_queues) {
> ? ? ? ? ? ? ? ?cfq_log(cfqd, "schedule dispatch");
> ? ? ? ? ? ? ? ?kblockd_schedule_work(cfqd->queue, &cfqd->unplug_work);
> ? ? ? ?}
> }
>
> I'm still going through the traces, but figured I'd pass along this
> info for now. If we're stalling after every timer fire here, then it's
> understandable why I'm not seeing the stalls when I set slice_idle to
> 0.
>
> Will update if I have more. Let me know if there's something else
> you'd like me to instrument.
>
> --
> Josh

It looks like my statement about busy_queues isn't entirely accurate.
I'm seeing this later on in the trace:

8,0 0 0 4842.798611807 0 m N cfq idle timer fired
8,0 0 0 4842.798614600 0 m N cfq11441
idle_slice_timer: active queue
8,0 0 0 4842.798616277 0 m N cfq11441
idle_slice_timer: clear_cfqq_deep
8,0 0 0 4842.798617674 0 m N cfq11441 slice expired t=0
8,0 0 0 4842.798619350 0 m N cfq11441 del timer
8,0 0 0 4842.798620747 0 m N cfq11441 update idle time stats
8,0 0 0 4842.798624658 0 m N cfq11441 sl_used=8
disp=2 charge=8 iops=0 sect=16
8,0 0 0 4842.798625775 0 m N cfq11441
slice_expired del cfqq from rr
8,0 0 0 4842.798627172 0 m N cfq11441 del_from_rr
8,0 0 0 4842.798628569 0 m N cfq11441
slice_expired resort_rr_list
8,0 0 0 4842.798629686 0 m N cfq11441
slice_expired setting active_queue to NULL
8,0 0 0 4842.798631083 0 m N cfq11441
slice_expired setting active_cic to NULL
8,0 0 0 4842.798632201 0 m N cfq schedule dispatch
8,0 1 296335 4842.809550488 11443 A R 13463743 + 8 <- (8,2) 884848
8,0 1 296336 4842.809552443 11443 Q R 13463743 + 8 [cron]

So it is getting called at times.

--
Josh

2012-06-22 20:42:14

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Fri, Jun 22, 2012 at 03:22:01PM -0500, Josh Hunt wrote:

[..]
> It looks like my statement about busy_queues isn't entirely accurate.
> I'm seeing this later on in the trace:

Yes it is being called some times. But it is interesting to see that in
the instance you pasted in last mail, we did not see "schedule_dispatch"
message. Are you sure that other requests were queued at that point of
time and there were busy_queues? You can try printing cfqd->rq_queued
to verify that.

Also put some message in cfq_select_queue() to see why it did not select
a new queue before the stall.

Thanks
Vivek

2012-06-22 20:53:09

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Fri, Jun 22, 2012 at 3:42 PM, Vivek Goyal <[email protected]> wrote:
> On Fri, Jun 22, 2012 at 03:22:01PM -0500, Josh Hunt wrote:
>
> [..]
>> It looks like my statement about busy_queues isn't entirely accurate.
>> I'm seeing this later on in the trace:
>
> Yes it is being called some times. But it is interesting to see that in
> the instance you pasted in last mail, we did not see "schedule_dispatch"
> message. ?Are you sure that other requests were queued at that point of
> time and there were busy_queues? You can try printing cfqd->rq_queued
> to verify that.
>
> Also put some message in cfq_select_queue() to see why it did not select
> a new queue before the stall.
>
> Thanks
> Vivek

I think you may be correct that at those points in time there weren't
any busy queues. I do know that there were requests waiting during
this time:

Insert:
8,0 1 296677 4847.796879805 6076 P N [flush-8:0]
8,0 1 296678 4847.796881202 6076 I W 32800441 + 8 (
2794) [flush-8:0]
8,0 1 0 4847.796883157 0 m N cfq5917 insert_request
8,0 1 296679 4847.796884834 6076 U N [flush-8:0] 1

Stall (one of many b/t insert and dispatch):
8,0 1 296851 4852.776942694 8216 U N [httpdmon] 1
8,0 1 0 4852.776945488 0 m N cfq select_queue: !cfqq
8,0 1 0 4852.776946885 0 m N cfq select_queue:
cfq_choose_cfqg
8,0 1 0 4852.776948281 0 m N cfq select_queue:
setting active queue to new_cfqq:NULL
8,0 1 0 4852.776950237 0 m N cfq8216 set_active
wl_prio:0 wl_type:1
8,0 1 0 4852.776951634 0 m N cfq8216 del timer
8,0 1 0 4852.776952751 0 m N cfq8216 update idle time stats
8,0 1 0 4852.776954427 0 m N cfq8216 Idling on
sync. st->count:1
8,0 1 0 4852.776957780 0 m N cfq8216 fifo= (null)
8,0 1 0 4852.776959456 0 m N cfq8216 dispatch_insert
8,0 1 0 4852.776961412 0 m N cfq8216 remove request
8,0 1 0 4852.776963647 0 m N cfq8216 dispatched a request
8,0 1 0 4852.776965043 0 m N cfq8216 activate rq, drv=1
8,0 1 296852 4852.776966161 8216 D R 13529455 + 8 (
34642) [httpdmon]
8,0 1 0 4852.777044943 0 m N cfq8216 Idling on
sync. st->count:1
8,0 1 0 4852.777046340 0 m N cfq8216 select_queue:
dispatched and should_idle - keep_queue
8,0 0 440281 4852.777130150 7284 C R 13529455 + 8 ( 163989) [0]
8,0 0 0 4852.777145236 0 m N cfq8216 complete rqnoidle 0
8,0 0 0 4852.777147471 0 m N cfq8216 set_slice=100
8,0 0 0 4852.777149427 0 m N cfq8216 arm slice timer
8,0 0 0 4852.777151103 0 m N cfq8216 Idling on
sync. st->count:1
8,0 0 0 4852.777153058 0 m N cfq8216 arm_idle: 8
group_idle: 0
8,0 0 0 4847.817116995 0 m N cfq schedule dispatch
8,0 0 0 4847.817120068 0 m N cfq7646 select_queue:
timer_pending - keep_queue
8,0 0 0 4847.824613293 0 m N cfq idle timer fired
8,0 0 0 4847.824616366 0 m N cfq7646
idle_slice_timer: active queue
8,0 0 0 4847.824617763 0 m N cfq7646
idle_slice_timer: clear_cfqq_deep
8,0 0 0 4847.824619439 0 m N cfq7646 slice expired t=0
8,0 0 0 4847.824621395 0 m N cfq7646 del timer
8,0 0 0 4847.824622792 0 m N cfq7646 update idle time stats
8,0 0 0 4847.824625865 0 m N cfq7646 sl_used=12
disp=3 charge=12 iops=0 sect=24
8,0 0 0 4847.824627262 0 m N cfq7646 slice_expired
del cfqq from rr
8,0 0 0 4847.824628379 0 m N cfq7646 del_from_rr
8,0 0 0 4847.824629776 0 m N cfq7646 slice_expired
resort_rr_list
8,0 0 0 4847.824631452 0 m N cfq7646 slice_expired
setting active_queue to NULL
8,0 0 0 4847.824632849 0 m N cfq7646 slice_expired
setting active_cic to NULL
8,0 0 0 4847.824634246 0 m N cfq schedule dispatch
8,0 1 296749 4848.309940967 7646 A R 33545872 + 8 <- (8,4) 2152
8,0 1 296750 4848.309956053 7646 Q R 33545872 + 8 [httpdmon]

Dispatch:
8,0 0 0 4857.832310760 0 m N cfq5917 activate rq, drv=2
8,0 0 440336 4857.832312157 7284 D W 32800441 + 8
(10035430955) [testd]
8,0 0 440337 4857.832447371 7284 C W 2076271 + 8 ( 687247) [0]
8,0 0 0 4857.832457987 0 m N cfq5917 complete rqnoidle 0

In this case you see the schedule dispatch message so there were busy
queues, but for whatever reason nothing gets run. Will keep digging.

--
Josh

2012-06-22 20:57:50

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Fri, Jun 22, 2012 at 3:53 PM, Josh Hunt <[email protected]> wrote:
> On Fri, Jun 22, 2012 at 3:42 PM, Vivek Goyal <[email protected]> wrote:
>> On Fri, Jun 22, 2012 at 03:22:01PM -0500, Josh Hunt wrote:
>>
>> [..]
>>> It looks like my statement about busy_queues isn't entirely accurate.
>>> I'm seeing this later on in the trace:
>>
>> Yes it is being called some times. But it is interesting to see that in
>> the instance you pasted in last mail, we did not see "schedule_dispatch"
>> message. ?Are you sure that other requests were queued at that point of
>> time and there were busy_queues? You can try printing cfqd->rq_queued
>> to verify that.
>>
>> Also put some message in cfq_select_queue() to see why it did not select
>> a new queue before the stall.
>>
>> Thanks
>> Vivek
>
> I think you may be correct that at those points in time there weren't
> any busy queues. I do know that there were requests waiting during
> this time:

Sorry. Please disregard the prev mail. Looks like I must have scrolled
my buffer while pasting in the traces... Here's what I meant to send:

Insert:
8,0 1 296677 4847.796879805 6076 P N [flush-8:0]
8,0 1 296678 4847.796881202 6076 I W 32800441 + 8 (
2794) [flush-8:0]
8,0 1 0 4847.796883157 0 m N cfq5917 insert_request
8,0 1 296679 4847.796884834 6076 U N [flush-8:0] 1
8,0 1 296680 4847.796901037 6076 A R 32499249 + 8 <- (8,3) 11534424
8,0 1 296681 4847.796902154 6076 Q R 32499249 + 8 [flush-8:0]
8,0 1 296682 4847.796904389 6076 G R 32499249 + 8 [flush-8:0]
8,0 1 296683 4847.796906066 6076 I R 32499249 + 8 (
1677) [flush-8:0]
8,0 1 0 4847.796907462 0 m N cfq6076 insert_request
8,0 1 0 4847.796908859 0 m N cfq6076 add_to_rr

Stall:
8,0 1 0 4847.816870313 0 m N cfq7646 insert_request
8,0 1 0 4847.816873665 0 m N cfq7646 del timer
8,0 1 0 4847.816874783 0 m N cfq7646 update idle time stats
8,0 1 0 4847.816876738 0 m N cfq7646 select_queue:
active requests/not expired - keep_queue
8,0 1 0 4847.816878414 0 m N cfq7646 Idling on
sync. st->count:1
8,0 1 0 4847.816879811 0 m N cfq7646 dispatch_insert
8,0 1 0 4847.816882046 0 m N cfq7646 remove request
8,0 1 0 4847.816884002 0 m N cfq7646 dispatched a request
8,0 1 0 4847.816885399 0 m N cfq7646 activate rq, drv=1
8,0 1 296748 4847.816887075 7646 D R 32499241 + 8 (
20115) [httpdmon]
8,0 1 0 4847.816945463 0 m N cfq7646 Idling on
sync. st->count:1
8,0 1 0 4847.816946580 0 m N cfq7646 select_queue:
dispatched and should_idle - keep_queue
8,0 1 0 4847.816948536 0 m N cfq7646 Idling on
sync. st->count:1
8,0 1 0 4847.816949653 0 m N cfq7646 select_queue:
dispatched and should_idle - keep_queue
8,0 0 440261 4847.817104144 7284 C R 32499241 + 8 ( 217069) [0]
8,0 0 0 4847.817110290 0 m N cfq7646 complete rqnoidle 0
8,0 0 0 4847.817112525 0 m N cfq7646 arm slice timer
8,0 0 0 4847.817113922 0 m N cfq7646 Idling on
sync. st->count:1
8,0 0 0 4847.817115878 0 m N cfq7646 arm_idle: 8
group_idle: 0
8,0 0 0 4847.817116995 0 m N cfq schedule dispatch
8,0 0 0 4847.817120068 0 m N cfq7646 select_queue:
timer_pending - keep_queue
8,0 0 0 4847.824613293 0 m N cfq idle timer fired
8,0 0 0 4847.824616366 0 m N cfq7646
idle_slice_timer: active queue
8,0 0 0 4847.824617763 0 m N cfq7646
idle_slice_timer: clear_cfqq_deep
8,0 0 0 4847.824619439 0 m N cfq7646 slice expired t=0
8,0 0 0 4847.824621395 0 m N cfq7646 del timer
8,0 0 0 4847.824622792 0 m N cfq7646 update idle time stats
8,0 0 0 4847.824625865 0 m N cfq7646 sl_used=12
disp=3 charge=12 iops=0 sect=24
8,0 0 0 4847.824627262 0 m N cfq7646 slice_expired
del cfqq from rr
8,0 0 0 4847.824628379 0 m N cfq7646 del_from_rr
8,0 0 0 4847.824629776 0 m N cfq7646 slice_expired
resort_rr_list
8,0 0 0 4847.824631452 0 m N cfq7646 slice_expired
setting active_queue to NULL
8,0 0 0 4847.824632849 0 m N cfq7646 slice_expired
setting active_cic to NULL
8,0 0 0 4847.824634246 0 m N cfq schedule dispatch
8,0 1 296749 4848.309940967 7646 A R 33545872 + 8 <- (8,4) 2152

Dispatch:
8,0 0 0 4857.832309642 0 m N cfq5917 dispatched a request
8,0 0 0 4857.832310760 0 m N cfq5917 activate rq, drv=2
8,0 0 440336 4857.832312157 7284 D W 32800441 + 8
(10035430955) [testd]
8,0 0 440337 4857.832447371 7284 C W 2076271 + 8 ( 687247) [0]
8,0 0 0 4857.832457987 0 m N cfq5917 complete rqnoidle 0

--
Josh

2012-06-22 21:34:09

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Fri, Jun 22, 2012 at 3:57 PM, Josh Hunt <[email protected]> wrote:
> On Fri, Jun 22, 2012 at 3:53 PM, Josh Hunt <[email protected]> wrote:
>> On Fri, Jun 22, 2012 at 3:42 PM, Vivek Goyal <[email protected]> wrote:
>>> On Fri, Jun 22, 2012 at 03:22:01PM -0500, Josh Hunt wrote:
>>>
>>> [..]
>>>> It looks like my statement about busy_queues isn't entirely accurate.
>>>> I'm seeing this later on in the trace:
>>>
>>> Yes it is being called some times. But it is interesting to see that in
>>> the instance you pasted in last mail, we did not see "schedule_dispatch"
>>> message. ?Are you sure that other requests were queued at that point of
>>> time and there were busy_queues? You can try printing cfqd->rq_queued
>>> to verify that.
>>>
>>> Also put some message in cfq_select_queue() to see why it did not select
>>> a new queue before the stall.
>>>
>>> Thanks
>>> Vivek
>>
>> I think you may be correct that at those points in time there weren't
>> any busy queues. I do know that there were requests waiting during
>> this time:
>
> Sorry. Please disregard the prev mail. Looks like I must have scrolled
> my buffer while pasting in the traces... Here's what I meant to send:
>
> Insert:
> ?8,0 ? ?1 ? 296677 ?4847.796879805 ?6076 ?P ? N [flush-8:0]
> ?8,0 ? ?1 ? 296678 ?4847.796881202 ?6076 ?I ? W 32800441 + 8 (
> 2794) [flush-8:0]
> ?8,0 ? ?1 ? ? ? ?0 ?4847.796883157 ? ? 0 ?m ? N cfq5917 insert_request
> ?8,0 ? ?1 ? 296679 ?4847.796884834 ?6076 ?U ? N [flush-8:0] 1
> ?8,0 ? ?1 ? 296680 ?4847.796901037 ?6076 ?A ? R 32499249 + 8 <- (8,3) 11534424
> ?8,0 ? ?1 ? 296681 ?4847.796902154 ?6076 ?Q ? R 32499249 + 8 [flush-8:0]
> ?8,0 ? ?1 ? 296682 ?4847.796904389 ?6076 ?G ? R 32499249 + 8 [flush-8:0]
> ?8,0 ? ?1 ? 296683 ?4847.796906066 ?6076 ?I ? R 32499249 + 8 (
> 1677) [flush-8:0]
> ?8,0 ? ?1 ? ? ? ?0 ?4847.796907462 ? ? 0 ?m ? N cfq6076 insert_request
> ?8,0 ? ?1 ? ? ? ?0 ?4847.796908859 ? ? 0 ?m ? N cfq6076 add_to_rr
>
>
> Stall:
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816870313 ? ? 0 ?m ? N cfq7646 insert_request
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816873665 ? ? 0 ?m ? N cfq7646 del timer
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816874783 ? ? 0 ?m ? N cfq7646 update idle time stats
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816876738 ? ? 0 ?m ? N cfq7646 select_queue:
> active requests/not expired - keep_queue
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816878414 ? ? 0 ?m ? N cfq7646 Idling on
> sync. st->count:1
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816879811 ? ? 0 ?m ? N cfq7646 dispatch_insert
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816882046 ? ? 0 ?m ? N cfq7646 remove request
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816884002 ? ? 0 ?m ? N cfq7646 dispatched a request
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816885399 ? ? 0 ?m ? N cfq7646 activate rq, drv=1
> ?8,0 ? ?1 ? 296748 ?4847.816887075 ?7646 ?D ? R 32499241 + 8 (
> 20115) [httpdmon]
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816945463 ? ? 0 ?m ? N cfq7646 Idling on
> sync. st->count:1
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816946580 ? ? 0 ?m ? N cfq7646 select_queue:
> dispatched and should_idle - keep_queue
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816948536 ? ? 0 ?m ? N cfq7646 Idling on
> sync. st->count:1
> ?8,0 ? ?1 ? ? ? ?0 ?4847.816949653 ? ? 0 ?m ? N cfq7646 select_queue:
> dispatched and should_idle - keep_queue
> ?8,0 ? ?0 ? 440261 ?4847.817104144 ?7284 ?C ? R 32499241 + 8 ( ?217069) [0]
> ?8,0 ? ?0 ? ? ? ?0 ?4847.817110290 ? ? 0 ?m ? N cfq7646 complete rqnoidle 0
> ?8,0 ? ?0 ? ? ? ?0 ?4847.817112525 ? ? 0 ?m ? N cfq7646 arm slice timer
> ?8,0 ? ?0 ? ? ? ?0 ?4847.817113922 ? ? 0 ?m ? N cfq7646 Idling on
> sync. st->count:1
> ?8,0 ? ?0 ? ? ? ?0 ?4847.817115878 ? ? 0 ?m ? N cfq7646 arm_idle: 8
> group_idle: 0
> ?8,0 ? ?0 ? ? ? ?0 ?4847.817116995 ? ? 0 ?m ? N cfq schedule dispatch
> ?8,0 ? ?0 ? ? ? ?0 ?4847.817120068 ? ? 0 ?m ? N cfq7646 select_queue:
> timer_pending - keep_queue
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824613293 ? ? 0 ?m ? N cfq idle timer fired
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824616366 ? ? 0 ?m ? N cfq7646
> idle_slice_timer: active queue
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824617763 ? ? 0 ?m ? N cfq7646
> idle_slice_timer: clear_cfqq_deep
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824619439 ? ? 0 ?m ? N cfq7646 slice expired t=0
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824621395 ? ? 0 ?m ? N cfq7646 del timer
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824622792 ? ? 0 ?m ? N cfq7646 update idle time stats
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824625865 ? ? 0 ?m ? N cfq7646 sl_used=12
> disp=3 charge=12 iops=0 sect=24
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824627262 ? ? 0 ?m ? N cfq7646 slice_expired
> del cfqq from rr
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824628379 ? ? 0 ?m ? N cfq7646 del_from_rr
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824629776 ? ? 0 ?m ? N cfq7646 slice_expired
> resort_rr_list
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824631452 ? ? 0 ?m ? N cfq7646 slice_expired
> setting active_queue to NULL
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824632849 ? ? 0 ?m ? N cfq7646 slice_expired
> setting active_cic to NULL
> ?8,0 ? ?0 ? ? ? ?0 ?4847.824634246 ? ? 0 ?m ? N cfq schedule dispatch
> ?8,0 ? ?1 ? 296749 ?4848.309940967 ?7646 ?A ? R 33545872 + 8 <- (8,4) 2152
>
> Dispatch:
> ?8,0 ? ?0 ? ? ? ?0 ?4857.832309642 ? ? 0 ?m ? N cfq5917 dispatched a request
> ?8,0 ? ?0 ? ? ? ?0 ?4857.832310760 ? ? 0 ?m ? N cfq5917 activate rq, drv=2
> ?8,0 ? ?0 ? 440336 ?4857.832312157 ?7284 ?D ? W 32800441 + 8
> (10035430955) [testd]
> ?8,0 ? ?0 ? 440337 ?4857.832447371 ?7284 ?C ? W 2076271 + 8 ( ?687247) [0]
> ?8,0 ? ?0 ? ? ? ?0 ?4857.832457987 ? ? 0 ?m ? N cfq5917 complete rqnoidle 0
>
> --
> Josh

This doesn't seem right to me, but I could be misunderstanding
something. The stall I pasted in the previous mail shows an insert to
cfq5917:

> 8,0 1 296678 4847.796881202 6076 I W 32800441 + 8 (
> 2794) [flush-8:0]
> 8,0 1 0 4847.796883157 0 m N cfq5917 insert_request

I then grepped my previously attached log file for changes to rr for
this queue from the time of the insert to the dispatch:

8,0 1 0 4847.030323122 0 m N cfq5917 add_to_rr
8,0 1 0 4847.030333459 0 m N cfq5917 slice_expired
resort_rr_list
8,0 1 0 4847.797539953 0 m N cfq5917 slice_expired
del cfqq from rr
8,0 1 0 4847.797541070 0 m N cfq5917 del_from_rr
8,0 1 0 4847.797542467 0 m N cfq5917 slice_expired
resort_rr_list
8,0 1 0 4847.802484220 0 m N cfq5917 add_to_rr
8,0 0 0 4847.808710791 0 m N cfq5917 slice_expired
del cfqq from rr
8,0 0 0 4847.808711908 0 m N cfq5917 del_from_rr
8,0 0 0 4847.808713305 0 m N cfq5917 slice_expired
resort_rr_list
8,0 1 0 4847.809024801 0 m N cfq5917 add_to_rr
8,0 1 0 4847.810320233 0 m N cfq5917 slice_expired
del cfqq from rr
8,0 1 0 4847.810321351 0 m N cfq5917 del_from_rr
8,0 1 0 4847.810322748 0 m N cfq5917 slice_expired
resort_rr_list
8,0 1 0 4847.810447905 0 m N cfq5917 add_to_rr
8,0 1 0 4847.812490649 0 m N cfq5917 slice_expired
del cfqq from rr
8,0 1 0 4847.812492046 0 m N cfq5917 del_from_rr
8,0 1 0 4847.812493442 0 m N cfq5917 slice_expired
resort_rr_list
8,0 1 0 4852.819290238 0 m N cfq5917 slice_expired
resort_rr_list
8,0 0 0 4857.832506038 0 m N cfq5917 slice_expired
del cfqq from rr
8,0 0 0 4857.832507156 0 m N cfq5917 del_from_rr
8,0 0 0 4857.832508553 0 m N cfq5917 slice_expired
resort_rr_list
8,0 1 0 4857.832548782 0 m N cfq5917 add_to_rr
8,0 0 0 4857.832843796 0 m N cfq5917 slice_expired
del cfqq from rr
8,0 0 0 4857.832844913 0 m N cfq5917 del_from_rr
8,0 0 0 4857.832846310 0 m N cfq5917 slice_expired
resort_rr_list
8,0 1 0 4857.833060306 0 m N cfq5917 add_to_rr

Here's the dispatch for the request above:
> 8,0 0 0 4857.832310760 0 m N cfq5917 activate rq, drv=2
> 8,0 0 440336 4857.832312157 7284 D W 32800441 + 8
> (10035430955) [testd]

Shouldn't the queue stay on the RR list until it is empty?
--
Josh

2012-06-25 13:30:58

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Fri, Jun 22, 2012 at 04:34:07PM -0500, Josh Hunt wrote:

[..]
> Shouldn't the queue stay on the RR list until it is empty?

This does look odd. cfqq should stay on service tree as long as it has
requests.

Can you attach the full log again. Also make sure that blktrace is not
dropping any trace events.

In slice_expire() we check following.

if (cfq_cfqq_on_rr(cfqq) && RB_EMPTY_ROOT(&cfqq->sort_list))
cfq_del_cfqq_rr(cfqd, cfqq);

So for some reason RB_EMPTY_ROOT() is returning true. But we must have
added the request and it should not have been empty.

cfq_insert_request()
cfq_add_rq_rb()
elv_rb_add()

So may be little more tracing after request addition will help. Just check
that RB_EMPTY_ROOT() is not true after addition of request and also print
number of requests queued.

In slice_expired() we can probably put a BUG_ON() which checks following.

BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && (cfqq->queued[0] || cfqq->queued[1]));

Thanks
Vivek

2012-06-25 16:22:36

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Mon, Jun 25, 2012 at 8:30 AM, Vivek Goyal <[email protected]> wrote:
> On Fri, Jun 22, 2012 at 04:34:07PM -0500, Josh Hunt wrote:
>
> [..]
>> Shouldn't the queue stay on the RR list until it is empty?
>
> This does look odd. cfqq should stay on service tree as long as it has
> requests.
>
> Can you attach the full log again. Also make sure that blktrace is not
> dropping any trace events.
>
> In slice_expire() we check following.
>
> ? ? ? ?if (cfq_cfqq_on_rr(cfqq) && RB_EMPTY_ROOT(&cfqq->sort_list))
> ? ? ? ? ? ? ? ?cfq_del_cfqq_rr(cfqd, cfqq);
>
> So for some reason RB_EMPTY_ROOT() is returning true. But we must have
> added the request and it should not have been empty.
>
> cfq_insert_request()
> ?cfq_add_rq_rb()
> ? ?elv_rb_add()
>
> So may be little more tracing after request addition will help. Just check
> that RB_EMPTY_ROOT() is not true after addition of request and also print
> number of requests queued.
>
> In slice_expired() we can probably put a BUG_ON() which checks following.
>
> BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && (cfqq->queued[0] || cfqq->queued[1]));
>
> Thanks
> Vivek

Vivek

First off thanks for all the time you've spent helping me on this :)

I'm attaching the log. I will add more instrumentation based on your
mail and let you know the results.

Thanks
--
Josh

Attachments:

sda.parsed-moreverbose2.bz2 (306.03 kB)

2012-06-25 17:26:07

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Mon, Jun 25, 2012 at 09:30:47AM -0400, Vivek Goyal wrote:
> In slice_expired() we can probably put a BUG_ON() which checks following.
>
> BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && (cfqq->queued[0] || cfqq->queued[1]));

Nit: Please use WARN_ON_ONCE() instead. Multi-sec stalls tend to be
better than panic. :)

--
tejun

2012-06-25 21:18:58

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Mon, Jun 25, 2012 at 11:22:32AM -0500, Josh Hunt wrote:
> On Mon, Jun 25, 2012 at 8:30 AM, Vivek Goyal <[email protected]> wrote:
> > On Fri, Jun 22, 2012 at 04:34:07PM -0500, Josh Hunt wrote:
> >
> > [..]
> >> Shouldn't the queue stay on the RR list until it is empty?
> >
> > This does look odd. cfqq should stay on service tree as long as it has
> > requests.
> >
> > Can you attach the full log again. Also make sure that blktrace is not
> > dropping any trace events.
> >
> > In slice_expire() we check following.
> >
> > ? ? ? ?if (cfq_cfqq_on_rr(cfqq) && RB_EMPTY_ROOT(&cfqq->sort_list))
> > ? ? ? ? ? ? ? ?cfq_del_cfqq_rr(cfqd, cfqq);
> >
> > So for some reason RB_EMPTY_ROOT() is returning true. But we must have
> > added the request and it should not have been empty.
> >
> > cfq_insert_request()
> > ?cfq_add_rq_rb()
> > ? ?elv_rb_add()
> >
> > So may be little more tracing after request addition will help. Just check
> > that RB_EMPTY_ROOT() is not true after addition of request and also print
> > number of requests queued.
> >
> > In slice_expired() we can probably put a BUG_ON() which checks following.
> >
> > BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && (cfqq->queued[0] || cfqq->queued[1]));
> >
> > Thanks
> > Vivek
>
> Vivek
>
> First off thanks for all the time you've spent helping me on this :)
>
> I'm attaching the log. I will add more instrumentation based on your
> mail and let you know the results.

I suspect that cfq5917 are actually two queues. One sync and other async.
In latest kernels, trace messages append "A" or "S" in queue name to
represent async/sync queue. Why do I think so.

- I see that cfq5917 has dispatched both "W" and "R" requests. "R" don't
get queued in sync queues and vice a versa.

- Once cfq5917 has preempted cfq5917. A queue can't preempt itself.

8,0 1 296615 4847.030318652 5917 I R 1572951 + 8 ( 2793)
[syslogd]
8,0 1 0 4847.030321446 0 m N cfq5917 insert_request
8,0 1 0 4847.030323122 0 m N cfq5917 add_to_rr
8,0 1 0 4847.030325636 0 m N cfq5917 preempt
8,0 1 0 4847.030327033 0 m N cfq5917 slice expired t=1
8,0 1 0 4847.030328710 0 m N cfq5917 resid=39
8,0 1 0 4847.030331783 0 m N cfq5917 sl_used=1 disp=7
charge=1 iops=0 sect=96

Hence I think that it is just a chance that both sync and async queues
have got same cfqq->pid. If you just append anohter character to represent
sync and async property of queue, that can cofirm it.

Thanks
Vivek

2012-06-25 23:06:02

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Mon, Jun 25, 2012 at 4:18 PM, Vivek Goyal <[email protected]> wrote:
> On Mon, Jun 25, 2012 at 11:22:32AM -0500, Josh Hunt wrote:
>> On Mon, Jun 25, 2012 at 8:30 AM, Vivek Goyal <[email protected]> wrote:
>> > On Fri, Jun 22, 2012 at 04:34:07PM -0500, Josh Hunt wrote:
>> >
>> > [..]
>> >> Shouldn't the queue stay on the RR list until it is empty?
>> >
>> > This does look odd. cfqq should stay on service tree as long as it has
>> > requests.
>> >
>> > Can you attach the full log again. Also make sure that blktrace is not
>> > dropping any trace events.
>> >
>> > In slice_expire() we check following.
>> >
>> > ? ? ? ?if (cfq_cfqq_on_rr(cfqq) && RB_EMPTY_ROOT(&cfqq->sort_list))
>> > ? ? ? ? ? ? ? ?cfq_del_cfqq_rr(cfqd, cfqq);
>> >
>> > So for some reason RB_EMPTY_ROOT() is returning true. But we must have
>> > added the request and it should not have been empty.
>> >
>> > cfq_insert_request()
>> > ?cfq_add_rq_rb()
>> > ? ?elv_rb_add()
>> >
>> > So may be little more tracing after request addition will help. Just check
>> > that RB_EMPTY_ROOT() is not true after addition of request and also print
>> > number of requests queued.
>> >
>> > In slice_expired() we can probably put a BUG_ON() which checks following.
>> >
>> > BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && (cfqq->queued[0] || cfqq->queued[1]));
>> >
>> > Thanks
>> > Vivek
>>
>> Vivek
>>
>> First off thanks for all the time you've spent helping me on this :)
>>
>> I'm attaching the log. I will add more instrumentation based on your
>> mail and let you know the results.
>
> I suspect that cfq5917 are actually two queues. One sync and other async.
> In latest kernels, trace messages append "A" or "S" in queue name to
> represent async/sync queue. Why do I think so.
>
> - I see that cfq5917 has dispatched both "W" and "R" requests. "R" don't
> ?get queued in sync queues and vice a versa.
>
> - Once cfq5917 has preempted cfq5917. A queue can't preempt itself.
>
> ?8,0 ? ?1 ? 296615 ?4847.030318652 ?5917 ?I ? R 1572951 + 8 ( ? ?2793)
> [syslogd]
> ?8,0 ? ?1 ? ? ? ?0 ?4847.030321446 ? ? 0 ?m ? N cfq5917 insert_request
> ?8,0 ? ?1 ? ? ? ?0 ?4847.030323122 ? ? 0 ?m ? N cfq5917 add_to_rr
> ?8,0 ? ?1 ? ? ? ?0 ?4847.030325636 ? ? 0 ?m ? N cfq5917 preempt
> ?8,0 ? ?1 ? ? ? ?0 ?4847.030327033 ? ? 0 ?m ? N cfq5917 slice expired t=1
> ?8,0 ? ?1 ? ? ? ?0 ?4847.030328710 ? ? 0 ?m ? N cfq5917 resid=39
> ?8,0 ? ?1 ? ? ? ?0 ?4847.030331783 ? ? 0 ?m ? N cfq5917 sl_used=1 disp=7
> charge=1 iops=0 sect=96
>
> Hence I think that it is just a chance that both sync and async queues
> have got same cfqq->pid. If you just append anohter character to represent
> sync and async property of queue, that can cofirm it.
>
> Thanks
> Vivek

You are right about the sync/async queues :) The idea that the queue
was being deleted with requests still on it was incorrect. I've
attached an updated log with some more details. It's starting to seem
like the problem is that the kblockd thread is not kicking the queue.
I will try and look in that direction some more.

--
Josh

Attachments:

sda.parsed-stall (281.96 kB)

2012-06-26 04:01:51

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Mon, Jun 25, 2012 at 6:05 PM, Josh Hunt <[email protected]> wrote:
> On Mon, Jun 25, 2012 at 4:18 PM, Vivek Goyal <[email protected]> wrote:
>> On Mon, Jun 25, 2012 at 11:22:32AM -0500, Josh Hunt wrote:
>>> On Mon, Jun 25, 2012 at 8:30 AM, Vivek Goyal <[email protected]> wrote:
>>> > On Fri, Jun 22, 2012 at 04:34:07PM -0500, Josh Hunt wrote:
>>> >
>>> > [..]
>>> >> Shouldn't the queue stay on the RR list until it is empty?
>>> >
>>> > This does look odd. cfqq should stay on service tree as long as it has
>>> > requests.
>>> >
>>> > Can you attach the full log again. Also make sure that blktrace is not
>>> > dropping any trace events.
>>> >
>>> > In slice_expire() we check following.
>>> >
>>> > ? ? ? ?if (cfq_cfqq_on_rr(cfqq) && RB_EMPTY_ROOT(&cfqq->sort_list))
>>> > ? ? ? ? ? ? ? ?cfq_del_cfqq_rr(cfqd, cfqq);
>>> >
>>> > So for some reason RB_EMPTY_ROOT() is returning true. But we must have
>>> > added the request and it should not have been empty.
>>> >
>>> > cfq_insert_request()
>>> > ?cfq_add_rq_rb()
>>> > ? ?elv_rb_add()
>>> >
>>> > So may be little more tracing after request addition will help. Just check
>>> > that RB_EMPTY_ROOT() is not true after addition of request and also print
>>> > number of requests queued.
>>> >
>>> > In slice_expired() we can probably put a BUG_ON() which checks following.
>>> >
>>> > BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && (cfqq->queued[0] || cfqq->queued[1]));
>>> >
>>> > Thanks
>>> > Vivek
>>>
>>> Vivek
>>>
>>> First off thanks for all the time you've spent helping me on this :)
>>>
>>> I'm attaching the log. I will add more instrumentation based on your
>>> mail and let you know the results.
>>
>> I suspect that cfq5917 are actually two queues. One sync and other async.
>> In latest kernels, trace messages append "A" or "S" in queue name to
>> represent async/sync queue. Why do I think so.
>>
>> - I see that cfq5917 has dispatched both "W" and "R" requests. "R" don't
>> ?get queued in sync queues and vice a versa.
>>
>> - Once cfq5917 has preempted cfq5917. A queue can't preempt itself.
>>
>> ?8,0 ? ?1 ? 296615 ?4847.030318652 ?5917 ?I ? R 1572951 + 8 ( ? ?2793)
>> [syslogd]
>> ?8,0 ? ?1 ? ? ? ?0 ?4847.030321446 ? ? 0 ?m ? N cfq5917 insert_request
>> ?8,0 ? ?1 ? ? ? ?0 ?4847.030323122 ? ? 0 ?m ? N cfq5917 add_to_rr
>> ?8,0 ? ?1 ? ? ? ?0 ?4847.030325636 ? ? 0 ?m ? N cfq5917 preempt
>> ?8,0 ? ?1 ? ? ? ?0 ?4847.030327033 ? ? 0 ?m ? N cfq5917 slice expired t=1
>> ?8,0 ? ?1 ? ? ? ?0 ?4847.030328710 ? ? 0 ?m ? N cfq5917 resid=39
>> ?8,0 ? ?1 ? ? ? ?0 ?4847.030331783 ? ? 0 ?m ? N cfq5917 sl_used=1 disp=7
>> charge=1 iops=0 sect=96
>>
>> Hence I think that it is just a chance that both sync and async queues
>> have got same cfqq->pid. If you just append anohter character to represent
>> sync and async property of queue, that can cofirm it.
>>
>> Thanks
>> Vivek
>
> You are right about the sync/async queues :) The idea that the queue
> was being deleted with requests still on it was incorrect. I've
> attached an updated log with some more details. It's starting to seem
> like the problem is that the kblockd thread is not kicking the queue.
> I will try and look in that direction some more.
>
> --
> Josh

So this really seems like a problem with kblockd not kicking in. I've
instrumented every path in select_queue and it's not getting hit after
schedule dispatch. Everything seems to stall at that point until a new
request comes in.

In the previous log I attached I see the following:

Insert:
8,0 1 12593 443.141157429 5853 I W 2132327 + 8 (
17879) [syslogd]
8,0 1 0 443.141159385 0 m N cfq5853A insert_request
8,0 1 0 443.141160782 0 m N cfq5853A cfq_add_rq_rb: sync:0
8,0 1 0 443.141161899 0 m N cfq5853A add_to_rr
8,0 1 0 443.141164134 0 m N cfq7591S Idling on
sync. st->count:1
8,0 1 0 443.141165810 0 m N cfq5853A
insert_request: rb_empty_root:0 queued[0]:1 [1]:0
8,0 1 12594 443.141166928 5853 U N [syslogd] 1

Stall:
8,0 0 0 443.677751758 0 m N cfq idle timer fired
8,0 0 0 443.677754272 0 m N cfq7591S
idle_slice_timer: active queue
8,0 0 0 443.677755949 0 m N cfq7591S
idle_slice_timer: clear_cfqq_deep
8,0 0 0 443.677757625 0 m N cfq7591S slice expired t=0
8,0 0 0 443.677759301 0 m N cfq7591S del timer
8,0 0 0 443.677760698 0 m N cfq7591S update idle time stats
8,0 0 0 443.677763212 0 m N cfq7591S sl_used=13
disp=12 charge=13 iops=0 sect=96
8,0 0 0 443.677764888 0 m N cfq7591S
slice_expired del cfqq from rr - on_rr:1 rb empty:1
8,0 0 0 443.677766285 0 m N cfq7591S del_from_rr
queued[0]:0 queued[1]:0
8,0 0 0 443.677767682 0 m N cfq7591S
slice_expired resort_rr_list
8,0 0 0 443.677769079 0 m N cfq7591S
slice_expired setting active_queue to NULL
8,0 0 0 443.677771314 0 m N cfq7591S
slice_expired setting active_cic to NULL
8,0 0 0 443.677772431 0 m N cfq schedule dispatch
8,0 1 12761 446.156017951 7591 A R 2151 + 8 <- (8,1) 2088

Dispatch:
8,0 0 0 449.232185100 0 m N cfq5853A dispatched a request
8,0 1 13089 449.232185938 8053 Q R 14575511 + 16 [bash]
8,0 0 0 449.232186776 0 m N cfq5853A activate rq, drv=1
8,0 0 52929 449.232188173 7251 D W 2132327 + 8
(6091030744) [testd]

Here is what I believe is happening. When the idle timer fires it goes
through cfq_idle_slice_timer() at the end of which it calls
cfq_slice_expired() and then cfq_schedule_dispatch(). Everything up to
this point appears to behave correctly based on the traces I've
captured. The only thing which does not work is the kblockd thread
never seems to call cfq_kick_queue(). At this point we're relying on
the kblockd thread to get things going again, but for whatever reason
it does not. This causes us to stall until a new request comes in
which seems to wake everything up until the next time a timer fires.

I do see that kblockd kicks in correctly sometimes:

8,0 0 0 442.285752298 0 m N cfq idle timer fired
8,0 0 0 442.285756209 0 m N cfq8050S
idle_slice_timer: active queue
8,0 0 0 442.285758165 0 m N cfq8050S
idle_slice_timer: clear_cfqq_deep
8,0 0 0 442.285760120 0 m N cfq8050S slice expired t=0
8,0 0 0 442.285762076 0 m N cfq8050S del timer
8,0 0 0 442.285763473 0 m N cfq8050S update idle time stats
8,0 0 0 442.285766546 0 m N cfq8050S sl_used=8
disp=1 charge=8 iops=0 sect=8
8,0 0 0 442.285768222 0 m N cfq8050S
slice_expired del cfqq from rr - on_rr:1 rb empty:1
8,0 0 0 442.285769619 0 m N cfq8050S del_from_rr
queued[0]:0 queued[1]:0
8,0 0 0 442.285771295 0 m N cfq8050S
slice_expired resort_rr_list
8,0 0 0 442.285772692 0 m N cfq8050S
slice_expired setting active_queue to NULL
8,0 0 0 442.285774927 0 m N cfq8050S
slice_expired setting active_cic to NULL
8,0 0 0 442.285776603 0 m N cfq schedule dispatch
8,0 0 0 443.083035229 0 m N cfq kick queue
8,0 0 0 443.083038302 0 m N cfq select_queue: !cfqq

Here you can see schedule dispatch is called and then we see kick
queue, etc. There is still ~800ms b/f the queue is kicked, but at
least it does kick the queue in this instance.

I do not see the stalls when I set slice_idle to 0. With slice_idle
set to 0 we do not traverse the path through cfq_idle_slice_timer(),
and it appears we do not rely on kblockd to get things going for us as
much?

The last kernel where we did not see stalls of this magnitude was
2.6.32. I believe there was some reworking of the workqueue infra
after this point and I'm wondering if this may be causing the issue?

Thoughts?
--
Josh

2012-06-26 13:00:00

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Mon, Jun 25, 2012 at 11:01:48PM -0500, Josh Hunt wrote:

[..]
> So this really seems like a problem with kblockd not kicking in. I've
> instrumented every path in select_queue and it's not getting hit after
> schedule dispatch. Everything seems to stall at that point until a new
> request comes in.

Ok, that's cool. So now we need to find out why queued work is not being
scheduled.

I think there are some workqueue related trace points. If you enable those
along with blktraces, that should give tejun some data to look at.

Thanks
Vivek

2012-06-26 15:18:06

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Tue, Jun 26, 2012 at 7:59 AM, Vivek Goyal <[email protected]> wrote:
> On Mon, Jun 25, 2012 at 11:01:48PM -0500, Josh Hunt wrote:
>
> [..]
>> So this really seems like a problem with kblockd not kicking in. I've
>> instrumented every path in select_queue and it's not getting hit after
>> schedule dispatch. Everything seems to stall at that point until a new
>> request comes in.
>
> Ok, that's cool. So now we need to find out why queued work is not being
> scheduled.
>
> I think there are some workqueue related trace points. If you enable those
> along with blktraces, that should give tejun some data to look at.
>
> Thanks
> Vivek

Tejun

Do you have any suggestions on how to debug this?

I did "perf record -a -e workqueue:*" and grabbed some tracepoint
data, but it's hard to correlate when these events are occurring in
the blktrace logs. Will keep investigating.
--
Josh

2012-06-26 15:53:38

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Tue, Jun 26, 2012 at 10:18:04AM -0500, Josh Hunt wrote:
> On Tue, Jun 26, 2012 at 7:59 AM, Vivek Goyal <[email protected]> wrote:
> > On Mon, Jun 25, 2012 at 11:01:48PM -0500, Josh Hunt wrote:
> >
> > [..]
> >> So this really seems like a problem with kblockd not kicking in. I've
> >> instrumented every path in select_queue and it's not getting hit after
> >> schedule dispatch. Everything seems to stall at that point until a new
> >> request comes in.
> >
> > Ok, that's cool. So now we need to find out why queued work is not being
> > scheduled.
> >
> > I think there are some workqueue related trace points. If you enable those
> > along with blktraces, that should give tejun some data to look at.
> >
> > Thanks
> > Vivek
>
> Tejun
>
> Do you have any suggestions on how to debug this?
>
> I did "perf record -a -e workqueue:*" and grabbed some tracepoint
> data, but it's hard to correlate when these events are occurring in
> the blktrace logs. Will keep investigating.

If you capture blktrace logs through trace_pipe and not blktrace
tool, you will get both workqueue and block traces with time stamps and
then correlating these becomes easier. So just enable "blk" tracer, enable
tracing on that particular device and then enable certaion workqueue
related trace events and capture trace_pipe output.

thanks
Vivek

2012-06-26 20:37:35

[permalink] [raw]

Subject: Re: multi-second application stall in open()

On Tue, Jun 26, 2012 at 10:53 AM, Vivek Goyal <[email protected]> wrote:
> On Tue, Jun 26, 2012 at 10:18:04AM -0500, Josh Hunt wrote:
>> On Tue, Jun 26, 2012 at 7:59 AM, Vivek Goyal <[email protected]> wrote:
>> > On Mon, Jun 25, 2012 at 11:01:48PM -0500, Josh Hunt wrote:
>> >
>> > [..]
>> >> So this really seems like a problem with kblockd not kicking in. I've
>> >> instrumented every path in select_queue and it's not getting hit after
>> >> schedule dispatch. Everything seems to stall at that point until a new
>> >> request comes in.
>> >
>> > Ok, that's cool. So now we need to find out why queued work is not being
>> > scheduled.
>> >
>> > I think there are some workqueue related trace points. If you enable those
>> > along with blktraces, that should give tejun some data to look at.
>> >
>> > Thanks
>> > Vivek
>>
>> Tejun
>>
>> Do you have any suggestions on how to debug this?
>>
>> I did "perf record -a -e workqueue:*" and grabbed some tracepoint
>> data, but it's hard to correlate when these events are occurring in
>> the blktrace logs. Will keep investigating.
>
> If you capture blktrace logs through trace_pipe and not blktrace
> tool, you will get both workqueue and block traces with time stamps and
> then correlating these becomes easier. So just enable "blk" tracer, enable
> tracing on that particular device and then enable certaion workqueue
> related trace events and capture trace_pipe output.
>
> thanks
> Vivek

Thanks Vivek! Always nice to learn some new debugging tricks! :)

I don't know if these will paste in very well so I'm attaching some
snippets from the log I captured as well. Here is the bad case:

testd-7261 [000] 9527.900483: 8,0 m N cfq14995S
complete rqnoidle 0
testd-7261 [000] 9527.900485: 8,0 m N cfq14995S
arm slice timer
testd-7261 [000] 9527.900486: 8,0 m N cfq14995S
Idling. st->count:1, hw_tag:0
testd-7261 [000] 9527.900488: 8,0 m N cfq14995S
arm_idle: 8 group_idle: 0
testd-7261 [000] 9527.900489: 8,0 m N cfq schedule
dispatch: busy_queues:1 rq_queued:0 rq_in_driver:0 rq_in_flight[0]:0
[1]:0
testd-7261 [000] 9527.900494: 8,0 m N cfq
select_queue: !rq_queued
testd-7261 [000] 9527.907998: 8,0 m N cfq idle timer fired
testd-7261 [000] 9527.908003: 8,0 m N cfq14995S
idle_slice_timer: active queue
testd-7261 [000] 9527.908004: 8,0 m N cfq14995S
idle_slice_timer: clear_cfqq_deep
testd-7261 [000] 9527.908006: 8,0 m N cfq14995S
slice expired t=0
testd-7261 [000] 9527.908007: 8,0 m N cfq14995S del timer
testd-7261 [000] 9527.908008: 8,0 m N cfq14995S
update idle time stats
testd-7261 [000] 9527.908011: 8,0 m N cfq14995S
sl_used=9 disp=2 charge=9 iops=0 sect=16
testd-7261 [000] 9527.908012: 8,0 m N cfq14995S
slice_expired del cfqq from rr - on_rr:1 rb empty:1
testd-7261 [000] 9527.908014: 8,0 m N cfq14995S
del_from_rr queued[0]:0 queued[1]:0
testd-7261 [000] 9527.908015: 8,0 m N cfq14995S
slice_expired resort_rr_list
testd-7261 [000] 9527.908016: 8,0 m N cfq14995S
slice_expired setting active_queue to NULL
testd-7261 [000] 9527.908017: 8,0 m N cfq14995S
slice_expired setting active_cic to NULL
testd-7261 [000] 9527.908019: 8,0 m N cfq schedule
dispatch: busy_queues:1 rq_queued:1 rq_in_driver:0 rq_in_flight[0]:0
[1]:0
flush-8:0-6082 [001] 9528.075769: 8,0 A W 28304865 + 8
<- (8,3) 7340040
flush-8:0-6082 [001] 9528.075773: block_bio_remap: 8,0 W
28304865 + 8 <- (8,3) 7340040

The print "cfq schedule dispatch: busy_queues:1 rq_queued:1
rq_in_driver:0 rq_in_flight[0]:0 [1]:0" is inside the if to call
kblockd_schedule_work() so it is getting called, but you can see that
we never see the workqueue_queue_work tracepoint being hit.

Here's what it looks like in the good case:

<idle>-0 [000] 9523.042437: 8,0 m N cfq idle timer fired
<idle>-0 [000] 9523.042441: 8,0 m N cfq6082S
idle_slice_timer: active queue
<idle>-0 [000] 9523.042442: 8,0 m N cfq6082S
idle_slice_timer: clear_cfqq_deep
<idle>-0 [000] 9523.042443: 8,0 m N cfq6082S
slice expired t=0
<idle>-0 [000] 9523.042444: 8,0 m N cfq6082S del timer
<idle>-0 [000] 9523.042445: 8,0 m N cfq6082S
update idle time stats
<idle>-0 [000] 9523.042447: 8,0 m N cfq6082S
sl_used=8 disp=1 charge=8 iops=0 sect=8
<idle>-0 [000] 9523.042449: 8,0 m N cfq6082S
slice_expired del cfqq from rr - on_rr:1 rb empty:1
<idle>-0 [000] 9523.042450: 8,0 m N cfq6082S
del_from_rr queued[0]:0 queued[1]:0
<idle>-0 [000] 9523.042451: 8,0 m N cfq6082S
slice_expired resort_rr_list
<idle>-0 [000] 9523.042452: 8,0 m N cfq6082S
slice_expired setting active_queue to NULL
<idle>-0 [000] 9523.042453: 8,0 m N cfq6082S
slice_expired setting active_cic to NULL
<idle>-0 [000] 9523.042455: 8,0 m N cfq schedule
dispatch: busy_queues:1 rq_queued:1 rq_in_driver:0 rq_in_flight[0]:0
[1]:0
<idle>-0 [000] 9523.042456: workqueue_queue_work: work
struct=ffff880108db8a48 function=cfq_kick_queue
workqueue=ffff88010cf5e800 req_cpu=0 cpu=0
<idle>-0 [000] 9523.042456: workqueue_activate_work:
work struct ffff880108db8a48
kworker/0:1-12940 [000] 9523.042462: workqueue_execute_start:
work struct ffff880108db8a48: function cfq_kick_queue
kworker/0:1-12940 [000] 9523.042462: 8,0 m N cfq kick queue
kworker/0:1-12940 [000] 9523.042464: 8,0 m N cfq
select_queue: !cfqq
kworker/0:1-12940 [000] 9523.042464: 8,0 m N cfq
select_queue: cfq_choose_cfqg
kworker/0:1-12940 [000] 9523.042466: 8,0 m N cfq workload slice:40
kworker/0:1-12940 [000] 9523.042466: 8,0 m N cfq
select_queue: setting active queue to new_cfqq:NULL
kworker/0:1-12940 [000] 9523.042468: 8,0 m N cfq5979A
set_active wl_prio:0 wl_type:0

Tejun - if you could let me know what other data I should be looking
for/need to enable in the traces I would appreciate it.

Thanks!
--
Josh

Attachments:

workqueue.trace (249.42 kB)

2012-06-26 20:43:27

[permalink] [raw]

Subject: Re: multi-second application stall in open()

Hello,

On Tue, Jun 26, 2012 at 10:18:04AM -0500, Josh Hunt wrote:
> Do you have any suggestions on how to debug this?
>
> I did "perf record -a -e workqueue:*" and grabbed some tracepoint
> data, but it's hard to correlate when these events are occurring in
> the blktrace logs. Will keep investigating.

Yeah, as Vivek said, it would be better to get both traces through
perf instead of using blktrace. queue_work and execute_start TPs
print out the function pointer. Only printing out entries which match
the unplug_work would reduce a lot of noise - you can grep
cfq_kick_queue's address from /proc/kallsyms and set up
/sys/kernel/debug/tracing/events/workqueue/EVENT/filter - "echo
function==0xFUNCADDR > /sys/kernel/debug.../filter". Not sure how to
do that with perf tho.

Ooh, you posted another message. Will continue there.

Thanks.

--
tejun

2012-06-26 20:56:57