2014-01-21 16:06:21

by Frank Mayhar

[permalink] [raw]
Subject: Crash in elevator_dispatch_fn() (e.g. deadline_dispatch()) when changing elevators.

On Sat, 2014-01-18 at 09:31 -0500, Tejun Heo wrote:
> Hello, Frank.
>
> On Fri, Jan 17, 2014 at 01:59:36PM -0800, Frank Mayhar wrote:
> > After investigation, it's clear that the elevator switch code is trying
> > to quiesce the request queue and sets the bypass flag. Unfortunately,
> > scsi_mode_sense() and friends call blk_execute_rq() directly, which pays
> > no attention to said flag.
>
> Ouch.
>
> > In fact, the bypass flag, as far as I can tell, is only checked in the
> > blk_cgroup.c, in blkg_lookup() and blkg_lookup_create().
>
> Hmmm? IIRC the main place is in __get_request(). If the queue is
> bypassing, the request doesn't get REQ_ELVPRIV which basically
> indicates that the request shouldn't go through elevator. I don't
> think that part is broken even for scsi_execute_rq() - it uses
> blk_get_request() which should handle it properly.
>
> > So my question is: Is this a simple oversight? Should blk_execute_rq()
>
> Yeah, it's a bug. It's not supposed to crash like that.

Clearly. :-)

> > care about the bypass flag? Should it perhaps hold off the I/O until
> > the bypass flag is cleared? Since at that level it has nothing to do
> > with cgroups I kinda don't think so but I'm still trying to get my head
> > around how all this stuff goes together.
>
> The cgroup part isn't really relevant. That's just because cgroup
> also uses bypassing to quiesce the queue when it's changing blkcg
> policies. The thing relevant to the elveator is the check in
> __get_request().
>
> Hmm.... the culprit is that we don't have any check in the dispatch
> path. It doesn't really matter who's calling blk_peek_request(), that
> function is called as part of all request dispatches and should never
> crash. I think the only reason we haven't noticed yet is because
> calling evelator dispatch_fn doesn't crash in most cases even if the
> elevator isn't in fully working condition.
>
> Probably what we need is replacing blk_queue_dying() with
> blk_queue_bypass() test in __elv_next_request() before invoking the
> elevator dispatch function.

Replacing? Or adding to? Is BYPASS always set when DYING is set? (My
guess is not but I haven't done an exhaustive analysis.) So the
relevant code snippet in __elv_next_request() would be:
if (unlikely(blk_queue_dying(q)) ||
unlikely(blk_queue_bypass(q)) ||
!q->elevator->type->ops.elevator_dispatch_fn(q, 0))
return NULL;

> Can you please reply w/ Jens and lkml cc'd with the original
> description and my response? No reason to keep this private.

Sure, doing so. I just wanted to make sure my understanding was correct
before bringing others into it.

> Thanks a lot for the report!

No problem. Thanks for the quick response. Sorry for my
three-day-weekend-delayed reply. :-)


Jens, others, my original email follows:

> Hey, Tejun. I have a question about stuff going on in the block layer
> that's not quite as straightforward as one might wish. We have a
> situation in which we need to use something other than the CFQ I/O
> scheduler so after we create a block device (iSCSI, as it happens) we
> switch it to use the deadline scheduler. Unfortunately we've run into a
> crash when we do this in which a request completion (in some cases) or a
> sync I/O (in others) calls blk_peek_request() which in turn calls
> __elv_next_request() which in its turn calls
> q->elevator->type->ops.elevator_dispatch_fn(), which happens at the
> point of the call be deadline_dispatch_requests(), which promptly falls
> over because we're in the middle of the elevator switch and
> q->elevator->elevator_data is NULL.
>
> After investigation, it's clear that the elevator switch code is trying
> to quiesce the request queue and sets the bypass flag. Unfortunately,
> scsi_mode_sense() and friends call blk_execute_rq() directly, which pays
> no attention to said flag.
>
> In fact, the bypass flag, as far as I can tell, is only checked in the
> blk_cgroup.c, in blkg_lookup() and blkg_lookup_create().
>
> So my question is: Is this a simple oversight? Should blk_execute_rq()
> care about the bypass flag? Should it perhaps hold off the I/O until
> the bypass flag is cleared? Since at that level it has nothing to do
> with cgroups I kinda don't think so but I'm still trying to get my head
> around how all this stuff goes together.
>
> Any hints would be _greatly_ appreciated. Thanks!
--
Frank Mayhar
310-460-4042


2014-01-22 15:47:10

by Frank Mayhar

[permalink] [raw]
Subject: Re: Crash in elevator_dispatch_fn() (e.g. deadline_dispatch()) when changing elevators.

On Tue, 2014-01-21 at 07:58 -0800, Frank Mayhar wrote:
> Replacing? Or adding to? Is BYPASS always set when DYING is set? (My
> guess is not but I haven't done an exhaustive analysis.) So the
> relevant code snippet in __elv_next_request() would be:
> if (unlikely(blk_queue_dying(q)) ||
> unlikely(blk_queue_bypass(q)) ||
> !q->elevator->type->ops.elevator_dispatch_fn(q, 0))
> return NULL;

FYI, I've made this change and tested it. I can't say for certain that
it fixes the crash (since it's one of those races that's difficult to
reproduce), but it does seem to pass all the tests I've thrown at it so
far.
--
Frank Mayhar
310-460-4042

2014-01-23 18:38:40

by Frank Mayhar

[permalink] [raw]
Subject: Re: Crash in elevator_dispatch_fn() (e.g. deadline_dispatch()) when changing elevators.

On Wed, 2014-01-22 at 07:46 -0800, Frank Mayhar wrote:
> On Tue, 2014-01-21 at 07:58 -0800, Frank Mayhar wrote:
> > Replacing? Or adding to? Is BYPASS always set when DYING is set? (My
> > guess is not but I haven't done an exhaustive analysis.) So the
> > relevant code snippet in __elv_next_request() would be:
> > if (unlikely(blk_queue_dying(q)) ||
> > unlikely(blk_queue_bypass(q)) ||
> > !q->elevator->type->ops.elevator_dispatch_fn(q, 0))
> > return NULL;
>
> FYI, I've made this change and tested it. I can't say for certain that
> it fixes the crash (since it's one of those races that's difficult to
> reproduce), but it does seem to pass all the tests I've thrown at it so
> far.

Um, does anyone care about this? Tejun? Jens? Anyone?

This is a real crash; it would be nice if someone would weigh in.
--
Frank Mayhar
310-460-4042

2014-01-23 18:56:47

by Tejun Heo

[permalink] [raw]
Subject: Re: Crash in elevator_dispatch_fn() (e.g. deadline_dispatch()) when changing elevators.

On Thu, Jan 23, 2014 at 10:38:33AM -0800, Frank Mayhar wrote:
> On Wed, 2014-01-22 at 07:46 -0800, Frank Mayhar wrote:
> > On Tue, 2014-01-21 at 07:58 -0800, Frank Mayhar wrote:
> > > Replacing? Or adding to? Is BYPASS always set when DYING is set? (My
> > > guess is not but I haven't done an exhaustive analysis.) So the
> > > relevant code snippet in __elv_next_request() would be:
> > > if (unlikely(blk_queue_dying(q)) ||
> > > unlikely(blk_queue_bypass(q)) ||
> > > !q->elevator->type->ops.elevator_dispatch_fn(q, 0))
> > > return NULL;
> >
> > FYI, I've made this change and tested it. I can't say for certain that
> > it fixes the crash (since it's one of those races that's difficult to
> > reproduce), but it does seem to pass all the tests I've thrown at it so
> > far.
>
> Um, does anyone care about this? Tejun? Jens? Anyone?
>
> This is a real crash; it would be nice if someone would weigh in.

Yeah, we're gonna fix this and I *think* replacing dying with bypass
is the right thing to do as a queue is always bypassing when killed.
It's probably just that we're in the earlier part of the merge window
and I have some other things on my plate. Will post a patch in a
couple days.

Thanks.

--
tejun

2014-01-23 21:14:44

by Frank Mayhar

[permalink] [raw]
Subject: Re: Crash in elevator_dispatch_fn() (e.g. deadline_dispatch()) when changing elevators.

On Thu, 2014-01-23 at 13:56 -0500, Tejun Heo wrote:
> On Thu, Jan 23, 2014 at 10:38:33AM -0800, Frank Mayhar wrote:
> > On Wed, 2014-01-22 at 07:46 -0800, Frank Mayhar wrote:
> > > On Tue, 2014-01-21 at 07:58 -0800, Frank Mayhar wrote:
> > > > Replacing? Or adding to? Is BYPASS always set when DYING is set? (My
> > > > guess is not but I haven't done an exhaustive analysis.) So the
> > > > relevant code snippet in __elv_next_request() would be:
> > > > if (unlikely(blk_queue_dying(q)) ||
> > > > unlikely(blk_queue_bypass(q)) ||
> > > > !q->elevator->type->ops.elevator_dispatch_fn(q, 0))
> > > > return NULL;
> > >
> > > FYI, I've made this change and tested it. I can't say for certain that
> > > it fixes the crash (since it's one of those races that's difficult to
> > > reproduce), but it does seem to pass all the tests I've thrown at it so
> > > far.
> >
> > Um, does anyone care about this? Tejun? Jens? Anyone?
> >
> > This is a real crash; it would be nice if someone would weigh in.
>
> Yeah, we're gonna fix this and I *think* replacing dying with bypass
> is the right thing to do as a queue is always bypassing when killed.
> It's probably just that we're in the earlier part of the merge window
> and I have some other things on my plate. Will post a patch in a
> couple days.

Thanks!

For the record, I've seriously beaten on the change above (despite it
maybe being redundant, it seems to do the right thing) and have seen no
problems so far. I've changed it to _just_ check bypass and will beat
on that now. If you're correct that a dying queue always has bypass set
then I anticipate no problems.
--
Frank Mayhar
310-460-4042