Hello,
Commit
9d5a4e946ce5352f19400b6370f4cd8e72806278
block: skip elevator data initialization for flush requests
Skip elevator initialization for flush requests by passing priv=0 to
blk_alloc_request() in get_request(). As such elv_set_request() is
never called for flush requests.
introduced priv flag, to skip elevator_private data init for FLUSH requests.
This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
which requires elevator_private to be set:
1 [ 78.982169] Call Trace:
2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
4 [ 78.982189] [<ffffffff81218c4a>] elv_insert+0x212/0x265
5 [ 78.982192] [<ffffffff81218ced>] __elv_add_request+0x50/0x52
6 [ 78.982195] [<ffffffff8121b44a>] flush_plug_list+0xce/0x12f
7 [ 78.982199] [<ffffffff8121b4c0>] __blk_flush_plug+0x15/0x21
8 [ 78.982205] [<ffffffff8146b321>] schedule+0x43e/0xbea
9 [ 78.982211] [<ffffffff8106f8bd>] ? __lock_acquire+0x149d/0x1576
10 [ 78.982215] [<ffffffff8121ba9e>] ? drive_stat_acct+0x1b6/0x1c3
11 [ 78.982218] [<ffffffff8121b92c>] ? drive_stat_acct+0x44/0x1c3
12 [ 78.982223] [<ffffffff8121f641>] ? __make_request+0x268/0x2bf
13 [ 78.982226] [<ffffffff8146bf66>] schedule_timeout+0x35/0x3b8
14 [ 78.982230] [<ffffffff810705ed>] ? mark_held_locks+0x4b/0x6d
15 [ 78.982234] [<ffffffff8146e768>] ? _raw_spin_unlock_irq+0x28/0x56
16 [ 78.982239] [<ffffffff81033bc1>] ? get_parent_ip+0xe/0x3e
17 [ 78.982244] [<ffffffff81471822>] ? sub_preempt_count+0x90/0xa3
18 [ 78.982247] [<ffffffff8146acbb>] wait_for_common+0xc3/0x141
19 [ 78.982251] [<ffffffff8103823a>] ? default_wake_function+0x0/0xf
20 [ 78.982254] [<ffffffff8146ad51>] wait_for_completion+0x18/0x1a
21 [ 78.982258] [<ffffffff8122087b>] blkdev_issue_flush+0xcb/0x11a
22 [ 78.982264] [<ffffffff811a9d65>] ext4_sync_file+0x2b3/0x302
23 [ 78.982268] [<ffffffff81129e25>] vfs_fsync_range+0x55/0x75
24 [ 78.982271] [<ffffffff81129e84>] generic_write_sync+0x3f/0x41
25 [ 78.982278] [<ffffffff810c6c6c>] generic_file_aio_write+0x8c/0xb9
26 [ 78.982281] [<ffffffff811a99a9>] ext4_file_write+0x1dc/0x237
27 [ 78.982285] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
28 [ 78.982288] [<ffffffff811a97cd>] ? ext4_file_write+0x0/0x237
29 [ 78.982292] [<ffffffff81104859>] do_sync_readv_writev+0xb4/0xf9
30 [ 78.982298] [<ffffffff8120af11>] ? security_file_permission+0x1e/0x84
31 [ 78.982302] [<ffffffff8110418e>] ? rw_verify_area+0xab/0xc8
32 [ 78.982305] [<ffffffff81104ac6>] do_readv_writev+0xb8/0x17d
33 [ 78.982309] [<ffffffff81105b5e>] ? fget_light+0x166/0x30b
34 [ 78.982312] [<ffffffff81104bcb>] vfs_writev+0x40/0x42
35 [ 78.982315] [<ffffffff81104e1c>] sys_pwritev+0x55/0x99
36 [ 78.982320] [<ffffffff81474a52>] system_call_fastpath+0x16/0x1b
37
Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
(like below)?
---
block/elevator.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/block/elevator.c b/block/elevator.c
index c387d31..b17e577 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
+ } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
+ where = ELEVATOR_INSERT_FLUSH;
} else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
where == ELEVATOR_INSERT_SORT)
where = ELEVATOR_INSERT_BACK;
On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
> Hello,
>
> Commit
> 9d5a4e946ce5352f19400b6370f4cd8e72806278
> block: skip elevator data initialization for flush requests
>
> Skip elevator initialization for flush requests by passing priv=0 to
> blk_alloc_request() in get_request(). As such elv_set_request() is
> never called for flush requests.
>
> introduced priv flag, to skip elevator_private data init for FLUSH requests.
> This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
> which requires elevator_private to be set:
>
> 1 [ 78.982169] Call Trace:
> 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
> 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
> Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
> (like below)?
>
> ---
>
> block/elevator.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/block/elevator.c b/block/elevator.c
> index c387d31..b17e577 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
> q->end_sector = rq_end_sector(rq);
> q->boundary_rq = rq;
> }
> + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
> + where = ELEVATOR_INSERT_FLUSH;
> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> where == ELEVATOR_INSERT_SORT)
> where = ELEVATOR_INSERT_BACK;
Thanks. That solves all (corruption-) problems that I reported earlier in an other
thread.
--
Markus
On Fri, Mar 25 2011 at 11:22am -0400,
Markus Trippelsdorf <[email protected]> wrote:
> On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
> > Hello,
> >
> > Commit
> > 9d5a4e946ce5352f19400b6370f4cd8e72806278
> > block: skip elevator data initialization for flush requests
> >
> > Skip elevator initialization for flush requests by passing priv=0 to
> > blk_alloc_request() in get_request(). As such elv_set_request() is
> > never called for flush requests.
> >
> > introduced priv flag, to skip elevator_private data init for FLUSH requests.
> > This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
> > which requires elevator_private to be set:
> >
> > 1 [ 78.982169] Call Trace:
> > 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
> > 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
>
> > Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
> > (like below)?
> >
> > ---
> >
> > block/elevator.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/block/elevator.c b/block/elevator.c
> > index c387d31..b17e577 100644
> > --- a/block/elevator.c
> > +++ b/block/elevator.c
> > @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
> > q->end_sector = rq_end_sector(rq);
> > q->boundary_rq = rq;
> > }
> > + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
> > + where = ELEVATOR_INSERT_FLUSH;
> > } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> > where == ELEVATOR_INSERT_SORT)
> > where = ELEVATOR_INSERT_BACK;
>
> Thanks. That solves all (corruption-) problems that I reported earlier in an other
> thread.
So the flush-merge changes introduced ELEVATOR_INSERT_FLUSH (via commit
ae1b1539). And the flush bio will now get ELEVATOR_INSERT_FLUSH in
__make_request().
So it is interesting that the flush is getting inserted in the elevator
at all. AFAIK that shouldn't be (and historically hasn't been) the
case.
Combination of onstack plug changes?
1 [ 78.982169] Call Trace:
2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
4 [ 78.982189] [<ffffffff81218c4a>] elv_insert+0x212/0x265
5 [ 78.982192] [<ffffffff81218ced>] __elv_add_request+0x50/0x52
6 [ 78.982195] [<ffffffff8121b44a>] flush_plug_list+0xce/0x12f
7 [ 78.982199] [<ffffffff8121b4c0>] __blk_flush_plug+0x15/0x21
8 [ 78.982205] [<ffffffff8146b321>] schedule+0x43e/0xbea
Mike
On 2011-03-25 16:40, Mike Snitzer wrote:
> On Fri, Mar 25 2011 at 11:22am -0400,
> Markus Trippelsdorf <[email protected]> wrote:
>
>> On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
>>> Hello,
>>>
>>> Commit
>>> 9d5a4e946ce5352f19400b6370f4cd8e72806278
>>> block: skip elevator data initialization for flush requests
>>>
>>> Skip elevator initialization for flush requests by passing priv=0 to
>>> blk_alloc_request() in get_request(). As such elv_set_request() is
>>> never called for flush requests.
>>>
>>> introduced priv flag, to skip elevator_private data init for FLUSH requests.
>>> This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
>>> which requires elevator_private to be set:
>>>
>>> 1 [ 78.982169] Call Trace:
>>> 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
>>> 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
>>
>>> Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
>>> (like below)?
>>>
>>> ---
>>>
>>> block/elevator.c | 2 ++
>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/block/elevator.c b/block/elevator.c
>>> index c387d31..b17e577 100644
>>> --- a/block/elevator.c
>>> +++ b/block/elevator.c
>>> @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
>>> q->end_sector = rq_end_sector(rq);
>>> q->boundary_rq = rq;
>>> }
>>> + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
>>> + where = ELEVATOR_INSERT_FLUSH;
>>> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
>>> where == ELEVATOR_INSERT_SORT)
>>> where = ELEVATOR_INSERT_BACK;
>>
>> Thanks. That solves all (corruption-) problems that I reported earlier in an other
>> thread.
>
> So the flush-merge changes introduced ELEVATOR_INSERT_FLUSH (via commit
> ae1b1539). And the flush bio will now get ELEVATOR_INSERT_FLUSH in
> __make_request().
>
> So it is interesting that the flush is getting inserted in the elevator
> at all. AFAIK that shouldn't be (and historically hasn't been) the
> case.
>
> Combination of onstack plug changes?
It is, it forces a sort insert. I'll fix this up, I'm relieved we have a
good handle on this issue now.
--
Jens Axboe
On 2011-03-25 16:22, Markus Trippelsdorf wrote:
> On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
>> Hello,
>>
>> Commit
>> 9d5a4e946ce5352f19400b6370f4cd8e72806278
>> block: skip elevator data initialization for flush requests
>>
>> Skip elevator initialization for flush requests by passing priv=0 to
>> blk_alloc_request() in get_request(). As such elv_set_request() is
>> never called for flush requests.
>>
>> introduced priv flag, to skip elevator_private data init for FLUSH requests.
>> This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
>> which requires elevator_private to be set:
>>
>> 1 [ 78.982169] Call Trace:
>> 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
>> 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
>
>> Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
>> (like below)?
>>
>> ---
>>
>> block/elevator.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/block/elevator.c b/block/elevator.c
>> index c387d31..b17e577 100644
>> --- a/block/elevator.c
>> +++ b/block/elevator.c
>> @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
>> q->end_sector = rq_end_sector(rq);
>> q->boundary_rq = rq;
>> }
>> + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
>> + where = ELEVATOR_INSERT_FLUSH;
>> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
>> where == ELEVATOR_INSERT_SORT)
>> where = ELEVATOR_INSERT_BACK;
>
> Thanks. That solves all (corruption-) problems that I reported earlier
> in an other thread.
That's great. I'm surprised that this would cause silent corruption for
you, should have been accompanied by an oops. Or was that with noop
only?
--
Jens Axboe
On 2011.03.25 at 16:57 +0100, Jens Axboe wrote:
> On 2011-03-25 16:22, Markus Trippelsdorf wrote:
> > On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
> >> Hello,
> >>
> >> Commit
> >> 9d5a4e946ce5352f19400b6370f4cd8e72806278
> >> block: skip elevator data initialization for flush requests
> >>
> >> Skip elevator initialization for flush requests by passing priv=0 to
> >> blk_alloc_request() in get_request(). As such elv_set_request() is
> >> never called for flush requests.
> >>
> >> introduced priv flag, to skip elevator_private data init for FLUSH requests.
> >> This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
> >> which requires elevator_private to be set:
> >>
> >> 1 [ 78.982169] Call Trace:
> >> 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
> >> 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
> >
> >> Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
> >> (like below)?
> >>
> >> ---
> >>
> >> block/elevator.c | 2 ++
> >> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/block/elevator.c b/block/elevator.c
> >> index c387d31..b17e577 100644
> >> --- a/block/elevator.c
> >> +++ b/block/elevator.c
> >> @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
> >> q->end_sector = rq_end_sector(rq);
> >> q->boundary_rq = rq;
> >> }
> >> + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
> >> + where = ELEVATOR_INSERT_FLUSH;
> >> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> >> where == ELEVATOR_INSERT_SORT)
> >> where = ELEVATOR_INSERT_BACK;
> >
> > Thanks. That solves all (corruption-) problems that I reported earlier
> > in an other thread.
>
> That's great. I'm surprised that this would cause silent corruption for
> you, should have been accompanied by an oops. Or was that with noop
> only?
The corruption happend during my bisection with cfq and also with noop
on the latest git kernel.
--
Markus
On Fri, Mar 25 2011 at 11:50am -0400,
Jens Axboe <[email protected]> wrote:
> On 2011-03-25 16:40, Mike Snitzer wrote:
> > On Fri, Mar 25 2011 at 11:22am -0400,
> > Markus Trippelsdorf <[email protected]> wrote:
> >
> >> On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
> >>> Hello,
> >>>
> >>> Commit
> >>> 9d5a4e946ce5352f19400b6370f4cd8e72806278
> >>> block: skip elevator data initialization for flush requests
> >>>
> >>> Skip elevator initialization for flush requests by passing priv=0 to
> >>> blk_alloc_request() in get_request(). As such elv_set_request() is
> >>> never called for flush requests.
> >>>
> >>> introduced priv flag, to skip elevator_private data init for FLUSH requests.
> >>> This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
> >>> which requires elevator_private to be set:
> >>>
> >>> 1 [ 78.982169] Call Trace:
> >>> 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
> >>> 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
> >>
> >>> Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
> >>> (like below)?
> >>>
> >>> ---
> >>>
> >>> block/elevator.c | 2 ++
> >>> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/block/elevator.c b/block/elevator.c
> >>> index c387d31..b17e577 100644
> >>> --- a/block/elevator.c
> >>> +++ b/block/elevator.c
> >>> @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
> >>> q->end_sector = rq_end_sector(rq);
> >>> q->boundary_rq = rq;
> >>> }
> >>> + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
> >>> + where = ELEVATOR_INSERT_FLUSH;
> >>> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> >>> where == ELEVATOR_INSERT_SORT)
> >>> where = ELEVATOR_INSERT_BACK;
> >>
> >> Thanks. That solves all (corruption-) problems that I reported earlier in an other
> >> thread.
> >
> > So the flush-merge changes introduced ELEVATOR_INSERT_FLUSH (via commit
> > ae1b1539). And the flush bio will now get ELEVATOR_INSERT_FLUSH in
> > __make_request().
> >
> > So it is interesting that the flush is getting inserted in the elevator
> > at all. AFAIK that shouldn't be (and historically hasn't been) the
> > case.
> >
> > Combination of onstack plug changes?
>
> It is, it forces a sort insert. I'll fix this up, I'm relieved we have a
> good handle on this issue now.
Should we also add a safety net to avoid the potential for future silent
corruption, etc? E.g.:
---
block/elevator.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/block/elevator.c b/block/elevator.c
index c387d31..86d258e 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -657,6 +657,9 @@ void elv_quiesce_end(struct request_queue *q)
void elv_insert(struct request_queue *q, struct request *rq, int where)
{
+ BUG_ON(rq->cmd_flags & (REQ_FLUSH | REQ_FUA) &&
+ where != ELEVATOR_INSERT_FLUSH);
+
trace_block_rq_insert(q, rq);
rq->q = q;
On 2011-03-25 19:54, Mike Snitzer wrote:
> On Fri, Mar 25 2011 at 11:50am -0400,
> Jens Axboe <[email protected]> wrote:
>
>> On 2011-03-25 16:40, Mike Snitzer wrote:
>>> On Fri, Mar 25 2011 at 11:22am -0400,
>>> Markus Trippelsdorf <[email protected]> wrote:
>>>
>>>> On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
>>>>> Hello,
>>>>>
>>>>> Commit
>>>>> 9d5a4e946ce5352f19400b6370f4cd8e72806278
>>>>> block: skip elevator data initialization for flush requests
>>>>>
>>>>> Skip elevator initialization for flush requests by passing priv=0 to
>>>>> blk_alloc_request() in get_request(). As such elv_set_request() is
>>>>> never called for flush requests.
>>>>>
>>>>> introduced priv flag, to skip elevator_private data init for FLUSH requests.
>>>>> This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
>>>>> which requires elevator_private to be set:
>>>>>
>>>>> 1 [ 78.982169] Call Trace:
>>>>> 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
>>>>> 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
>>>>
>>>>> Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
>>>>> (like below)?
>>>>>
>>>>> ---
>>>>>
>>>>> block/elevator.c | 2 ++
>>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/block/elevator.c b/block/elevator.c
>>>>> index c387d31..b17e577 100644
>>>>> --- a/block/elevator.c
>>>>> +++ b/block/elevator.c
>>>>> @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
>>>>> q->end_sector = rq_end_sector(rq);
>>>>> q->boundary_rq = rq;
>>>>> }
>>>>> + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
>>>>> + where = ELEVATOR_INSERT_FLUSH;
>>>>> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
>>>>> where == ELEVATOR_INSERT_SORT)
>>>>> where = ELEVATOR_INSERT_BACK;
>>>>
>>>> Thanks. That solves all (corruption-) problems that I reported earlier in an other
>>>> thread.
>>>
>>> So the flush-merge changes introduced ELEVATOR_INSERT_FLUSH (via commit
>>> ae1b1539). And the flush bio will now get ELEVATOR_INSERT_FLUSH in
>>> __make_request().
>>>
>>> So it is interesting that the flush is getting inserted in the elevator
>>> at all. AFAIK that shouldn't be (and historically hasn't been) the
>>> case.
>>>
>>> Combination of onstack plug changes?
>>
>> It is, it forces a sort insert. I'll fix this up, I'm relieved we have a
>> good handle on this issue now.
>
> Should we also add a safety net to avoid the potential for future silent
> corruption, etc? E.g.:
Yes, I was thinking about something like that. I consider the patch
merged an immediate stop gap, we need to improve this situation. It's
not exactly pretty to have this sort of condition in both
__make_request() and flush_plug_list(). Clearly it should be handled
further down.
--
Jens Axboe
On Fri, Mar 25 2011 at 3:50pm -0400,
Jens Axboe <[email protected]> wrote:
> On 2011-03-25 19:54, Mike Snitzer wrote:
> > On Fri, Mar 25 2011 at 11:50am -0400,
> > Jens Axboe <[email protected]> wrote:
> >
> >> On 2011-03-25 16:40, Mike Snitzer wrote:
> >>> On Fri, Mar 25 2011 at 11:22am -0400,
> >>> Markus Trippelsdorf <[email protected]> wrote:
> >>>
> >>>> On 2011.03.25 at 17:15 +0200, Sergey Senozhatsky wrote:
> >>>>> Hello,
> >>>>>
> >>>>> Commit
> >>>>> 9d5a4e946ce5352f19400b6370f4cd8e72806278
> >>>>> block: skip elevator data initialization for flush requests
> >>>>>
> >>>>> Skip elevator initialization for flush requests by passing priv=0 to
> >>>>> blk_alloc_request() in get_request(). As such elv_set_request() is
> >>>>> never called for flush requests.
> >>>>>
> >>>>> introduced priv flag, to skip elevator_private data init for FLUSH requests.
> >>>>> This, I guess, lead to NULL pointer deref on my machine in cfq_insert_request,
> >>>>> which requires elevator_private to be set:
> >>>>>
> >>>>> 1 [ 78.982169] Call Trace:
> >>>>> 2 [ 78.982178] [<ffffffff8122d1fe>] cfq_insert_request+0x4e/0x47d
> >>>>> 3 [ 78.982184] [<ffffffff8123e139>] ? do_raw_spin_lock+0x6b/0x122
> >>>>
> >>>>> Should we in that case use ELEVATOR_INSERT_FLUSH for REQ_FLUSH | REQ_FUA requests
> >>>>> (like below)?
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> block/elevator.c | 2 ++
> >>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/block/elevator.c b/block/elevator.c
> >>>>> index c387d31..b17e577 100644
> >>>>> --- a/block/elevator.c
> >>>>> +++ b/block/elevator.c
> >>>>> @@ -734,6 +734,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
> >>>>> q->end_sector = rq_end_sector(rq);
> >>>>> q->boundary_rq = rq;
> >>>>> }
> >>>>> + } else if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
> >>>>> + where = ELEVATOR_INSERT_FLUSH;
> >>>>> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> >>>>> where == ELEVATOR_INSERT_SORT)
> >>>>> where = ELEVATOR_INSERT_BACK;
> >>>>
> >>>> Thanks. That solves all (corruption-) problems that I reported earlier in an other
> >>>> thread.
> >>>
> >>> So the flush-merge changes introduced ELEVATOR_INSERT_FLUSH (via commit
> >>> ae1b1539). And the flush bio will now get ELEVATOR_INSERT_FLUSH in
> >>> __make_request().
> >>>
> >>> So it is interesting that the flush is getting inserted in the elevator
> >>> at all. AFAIK that shouldn't be (and historically hasn't been) the
> >>> case.
> >>>
> >>> Combination of onstack plug changes?
> >>
> >> It is, it forces a sort insert. I'll fix this up, I'm relieved we have a
> >> good handle on this issue now.
> >
> > Should we also add a safety net to avoid the potential for future silent
> > corruption, etc? E.g.:
>
> Yes, I was thinking about something like that. I consider the patch
> merged an immediate stop gap, we need to improve this situation. It's
> not exactly pretty to have this sort of condition in both
> __make_request() and flush_plug_list(). Clearly it should be handled
> further down.
OK, and btw my patch was too restrictive. blk_kick_flush()
elv_insert()s a flush request with ELEVATOR_INSERT_REQUEUE.
Should blk_kick_flush() process the flush request without calling
elv_insert() -- like is done with open coded list_add() in
blk_insert_flush()?
Or should blk_insert_flush() use elv_insert() with
ELEVATOR_INSERT_REQUEUE too?
Mike
Hey,
On Sat, Mar 26, 2011 at 12:21:56AM -0400, Mike Snitzer wrote:
> > Yes, I was thinking about something like that. I consider the patch
> > merged an immediate stop gap, we need to improve this situation. It's
> > not exactly pretty to have this sort of condition in both
> > __make_request() and flush_plug_list(). Clearly it should be handled
> > further down.
>
> OK, and btw my patch was too restrictive. blk_kick_flush()
> elv_insert()s a flush request with ELEVATOR_INSERT_REQUEUE.
>
> Should blk_kick_flush() process the flush request without calling
> elv_insert() -- like is done with open coded list_add() in
> blk_insert_flush()?
>
> Or should blk_insert_flush() use elv_insert() with
> ELEVATOR_INSERT_REQUEUE too?
Hmmm... I would prefer the latter. Given that INSERT_REQUEUE and
FRONT are no longer different, it would probably be better to use
FRONT tho. The only reason REQUEUE is used there is to avoid kicking
the queue from elv_insert(), which is gone now.
Thanks.
--
tejun
On Mon, Mar 28 2011 at 4:23am -0400,
Tejun Heo <[email protected]> wrote:
> On Sat, Mar 26, 2011 at 12:21:56AM -0400, Mike Snitzer wrote:
> > Should blk_kick_flush() process the flush request without calling
> > elv_insert() -- like is done with open coded list_add() in
> > blk_insert_flush()?
> >
> > Or should blk_insert_flush() use elv_insert() with
> > ELEVATOR_INSERT_REQUEUE too?
>
> Hmmm... I would prefer the latter. Given that INSERT_REQUEUE and
> FRONT are no longer different, it would probably be better to use
> FRONT tho. The only reason REQUEUE is used there is to avoid kicking
> the queue from elv_insert(), which is gone now.
OK, I came up with the following patch.
Jens, this is just a natural cleanup given the code that resulted from
the flush-merge and onstack plugging changes coming together.
From: Mike Snitzer <[email protected]>
Subject: block: eliminate ELEVATOR_INSERT_REQUEUE
elv_insert() no longer has a need to differentiate between
ELEVATOR_INSERT_REQUEUE and ELEVATOR_INSERT_FRONT. The onstack plugging
changes eliminated the need to avoid unplugging the queue (via
ELEVATOR_INSERT_REQUEUE).
Also, in blk_insert_flush(), use elv_insert() with ELEVATOR_INSERT_FRONT
rather than open-coding the equivalent.
Signed-off-by: Mike Snitzer <[email protected]>
---
block/blk-flush.c | 4 ++--
block/elevator.c | 3 +--
include/linux/elevator.h | 5 ++---
3 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 93d5fd8..2c43044 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -261,7 +261,7 @@ static bool blk_kick_flush(struct request_queue *q)
q->flush_rq.end_io = flush_end_io;
q->flush_pending_idx ^= 1;
- elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_REQUEUE);
+ elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_FRONT);
return true;
}
@@ -312,7 +312,7 @@ void blk_insert_flush(struct request *rq)
*/
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
- list_add(&rq->queuelist, &q->queue_head);
+ elv_insert(q, rq, ELEVATOR_INSERT_FRONT);
return;
}
diff --git a/block/elevator.c b/block/elevator.c
index c387d31..df7c4ba 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -610,7 +610,7 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)
rq->cmd_flags &= ~REQ_STARTED;
- elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE);
+ elv_insert(q, rq, ELEVATOR_INSERT_FRONT);
}
void elv_drain_elevator(struct request_queue *q)
@@ -662,7 +662,6 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
rq->q = q;
switch (where) {
- case ELEVATOR_INSERT_REQUEUE:
case ELEVATOR_INSERT_FRONT:
rq->cmd_flags |= REQ_SOFTBARRIER;
list_add(&rq->queuelist, &q->queue_head);
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index d93efcc445..0b4a354 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -164,9 +164,8 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
#define ELEVATOR_INSERT_FRONT 1
#define ELEVATOR_INSERT_BACK 2
#define ELEVATOR_INSERT_SORT 3
-#define ELEVATOR_INSERT_REQUEUE 4
-#define ELEVATOR_INSERT_FLUSH 5
-#define ELEVATOR_INSERT_SORT_MERGE 6
+#define ELEVATOR_INSERT_FLUSH 4
+#define ELEVATOR_INSERT_SORT_MERGE 5
/*
* return values from elevator_may_queue_fn
On 2011-03-29 00:15, Mike Snitzer wrote:
> On Mon, Mar 28 2011 at 4:23am -0400,
> Tejun Heo <[email protected]> wrote:
>
>> On Sat, Mar 26, 2011 at 12:21:56AM -0400, Mike Snitzer wrote:
>>> Should blk_kick_flush() process the flush request without calling
>>> elv_insert() -- like is done with open coded list_add() in
>>> blk_insert_flush()?
>>>
>>> Or should blk_insert_flush() use elv_insert() with
>>> ELEVATOR_INSERT_REQUEUE too?
>>
>> Hmmm... I would prefer the latter. Given that INSERT_REQUEUE and
>> FRONT are no longer different, it would probably be better to use
>> FRONT tho. The only reason REQUEUE is used there is to avoid kicking
>> the queue from elv_insert(), which is gone now.
>
> OK, I came up with the following patch.
>
> Jens, this is just a natural cleanup given the code that resulted from
> the flush-merge and onstack plugging changes coming together.
That looks nice and clean. What kind of testing has been done?
--
Jens Axboe
Mike Snitzer <[email protected]> writes:
> OK, I came up with the following patch.
>
> Jens, this is just a natural cleanup given the code that resulted from
> the flush-merge and onstack plugging changes coming together.
>
>
> From: Mike Snitzer <[email protected]>
> Subject: block: eliminate ELEVATOR_INSERT_REQUEUE
>
> elv_insert() no longer has a need to differentiate between
> ELEVATOR_INSERT_REQUEUE and ELEVATOR_INSERT_FRONT. The onstack plugging
> changes eliminated the need to avoid unplugging the queue (via
> ELEVATOR_INSERT_REQUEUE).
>
> Also, in blk_insert_flush(), use elv_insert() with ELEVATOR_INSERT_FRONT
> rather than open-coding the equivalent.
What you change by doing the call to elv_insert is that now the request
will have REQ_SOFTBARRIER set. I don't think that affects anything,
though (I checked). The rest looks pretty straight-forward.
Reviewed-by: Jeff Moyer <[email protected]>
On Tue, Mar 29 2011 at 7:56am -0400,
Jens Axboe <[email protected]> wrote:
> On 2011-03-29 00:15, Mike Snitzer wrote:
> > On Mon, Mar 28 2011 at 4:23am -0400,
> > Tejun Heo <[email protected]> wrote:
> >
> >> On Sat, Mar 26, 2011 at 12:21:56AM -0400, Mike Snitzer wrote:
> >>> Should blk_kick_flush() process the flush request without calling
> >>> elv_insert() -- like is done with open coded list_add() in
> >>> blk_insert_flush()?
> >>>
> >>> Or should blk_insert_flush() use elv_insert() with
> >>> ELEVATOR_INSERT_REQUEUE too?
> >>
> >> Hmmm... I would prefer the latter. Given that INSERT_REQUEUE and
> >> FRONT are no longer different, it would probably be better to use
> >> FRONT tho. The only reason REQUEUE is used there is to avoid kicking
> >> the queue from elv_insert(), which is gone now.
> >
> > OK, I came up with the following patch.
> >
> > Jens, this is just a natural cleanup given the code that resulted from
> > the flush-merge and onstack plugging changes coming together.
>
> That looks nice and clean. What kind of testing has been done?
I successfully tested it with that fsync-heavy ffsb workload (xfs on
mpath device).
Mike
On Tue, Mar 29, 2011 at 10:13:05AM -0400, Jeff Moyer wrote:
> Mike Snitzer <[email protected]> writes:
>
> > OK, I came up with the following patch.
> >
> > Jens, this is just a natural cleanup given the code that resulted from
> > the flush-merge and onstack plugging changes coming together.
> >
> >
> > From: Mike Snitzer <[email protected]>
> > Subject: block: eliminate ELEVATOR_INSERT_REQUEUE
> >
> > elv_insert() no longer has a need to differentiate between
> > ELEVATOR_INSERT_REQUEUE and ELEVATOR_INSERT_FRONT. The onstack plugging
> > changes eliminated the need to avoid unplugging the queue (via
> > ELEVATOR_INSERT_REQUEUE).
> >
> > Also, in blk_insert_flush(), use elv_insert() with ELEVATOR_INSERT_FRONT
> > rather than open-coding the equivalent.
>
> What you change by doing the call to elv_insert is that now the request
> will have REQ_SOFTBARRIER set. I don't think that affects anything,
> though (I checked). The rest looks pretty straight-forward.
What is the significance of REQ_SOFTBARRIER? Why it should be set for all
INSERT_FRONT and requeue requests.
With flush logic we got rid of all hardbarriers and hence there is no
notion of ordering as such. One has to wait for request to finish to
enforce ordering.
Should we get rid of softbarriers too?
Thanks
Vivek
Btw, is there any need to actually keep the elv_insert interface?
Itw has two callers in current mainline, two of them hardcode
ELEVATOR_INSERT_REQUEUE as the reason and could opencode it, and
the other cases could be merged into __elv_add_request.
On Tue, Mar 29, 2011 at 01:54:58PM -0400, Vivek Goyal wrote:
> What is the significance of REQ_SOFTBARRIER? Why it should be set for all
> INSERT_FRONT and requeue requests.
>
> With flush logic we got rid of all hardbarriers and hence there is no
> notion of ordering as such. One has to wait for request to finish to
> enforce ordering.
>
> Should we get rid of softbarriers too?
REQ_SOFTBARRIER makes sure that other requests don't pass the barrier
ones when dispatched from the elevator to the dispatch queue. When
the request is being inserted explicitly at front, dispatching other
requests in front of it is a bit, umm, weird.
IIRC, at least in the requeue path, some drivers depend on front
queueing for forward progress guarantee. Forward progress for
prepping is guaranteed by mempool (or something like that) and when
the request is retried, it should stay at the front of the queue;
otherwise, prepping can stall with prepped requests stuck behind
unprepped ones.
Although flush requests don't need REQ_SOFTBARRIER anymore, I don't
think removing it would bring much benefit either, so no reason to
change it just for that reason.
Thanks.
--
tejun
On Tue, Mar 29, 2011 at 02:25:55PM -0400, Christoph Hellwig wrote:
> Btw, is there any need to actually keep the elv_insert interface?
>
> Itw has two callers in current mainline, two of them hardcode
> ELEVATOR_INSERT_REQUEUE as the reason and could opencode it, and
> the other cases could be merged into __elv_add_request.
Agreed, at this point, the function seems a bit pointless.
Thanks.
--
tejun
On 2011-03-29 20:25, Christoph Hellwig wrote:
> Btw, is there any need to actually keep the elv_insert interface?
>
> Itw has two callers in current mainline, two of them hardcode
> ELEVATOR_INSERT_REQUEUE as the reason and could opencode it, and
> the other cases could be merged into __elv_add_request.
Agree, I merged the two and killed elv_insert().
--
Jens Axboe
On Wed, Mar 30, 2011 at 09:41:23AM +0200, Tejun Heo wrote:
> IIRC, at least in the requeue path, some drivers depend on front
> queueing for forward progress guarantee. Forward progress for
> prepping is guaranteed by mempool (or something like that) and when
> the request is retried, it should stay at the front of the queue;
> otherwise, prepping can stall with prepped requests stuck behind
> unprepped ones.
After writing the above, I think the current flush implementation
actually is violating the above. I think the problem is over-use of
front insertion. Flush can just append to the dispatch queue. Front
insertion should only be used by requeueing, really.
Thanks.
--
tejun
On 2011-03-30 09:57, Tejun Heo wrote:
> On Wed, Mar 30, 2011 at 09:41:23AM +0200, Tejun Heo wrote:
>> IIRC, at least in the requeue path, some drivers depend on front
>> queueing for forward progress guarantee. Forward progress for
>> prepping is guaranteed by mempool (or something like that) and when
>> the request is retried, it should stay at the front of the queue;
>> otherwise, prepping can stall with prepped requests stuck behind
>> unprepped ones.
>
> After writing the above, I think the current flush implementation
> actually is violating the above. I think the problem is over-use of
> front insertion. Flush can just append to the dispatch queue. Front
> insertion should only be used by requeueing, really.
Pure front insert should be used for requeue and internal commands (like
spin up this drive, or get error information). Flush should append to
the dispatch list.
--
Jens Axboe
Hello, Jens.
On Wed, Mar 30, 2011 at 09:59:09AM +0200, Jens Axboe wrote:
> Pure front insert should be used for requeue and internal commands (like
> spin up this drive, or get error information). Flush should append to
> the dispatch list.
Yeah, right. The reason I used REQUEUE/FRONT was because BACK
insertion involves draining the elevator and then appending the
request at the end of the dispatch queue, which is unnecessary and
inefficient. So, front insertion was a quick work around that. If
we're removing elv_insert(), we can just append directly to the
dispatch queue from flush code.
Thanks.
--
tejun
On 2011-03-30 10:02, Tejun Heo wrote:
> Hello, Jens.
>
> On Wed, Mar 30, 2011 at 09:59:09AM +0200, Jens Axboe wrote:
>> Pure front insert should be used for requeue and internal commands (like
>> spin up this drive, or get error information). Flush should append to
>> the dispatch list.
>
> Yeah, right. The reason I used REQUEUE/FRONT was because BACK
> insertion involves draining the elevator and then appending the
> request at the end of the dispatch queue, which is unnecessary and
> inefficient. So, front insertion was a quick work around that. If
> we're removing elv_insert(), we can just append directly to the
> dispatch queue from flush code.
How does this look? It's really two patches, but rolled up into one for
easier posting here.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 93d5fd8..6a16fea 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -261,7 +261,7 @@ static bool blk_kick_flush(struct request_queue *q)
q->flush_rq.end_io = flush_end_io;
q->flush_pending_idx ^= 1;
- elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_REQUEUE);
+ __elv_add_request(q, &q->flush_rq, ELEVATOR_INSERT_FLUSH);
return true;
}
@@ -281,7 +281,7 @@ static void flush_data_end_io(struct request *rq, int error)
* blk_insert_flush - insert a new FLUSH/FUA request
* @rq: request to insert
*
- * To be called from elv_insert() for %ELEVATOR_INSERT_FLUSH insertions.
+ * To be called from __elv_add_request() for %ELEVATOR_INSERT_FLUSH insertions.
* @rq is being submitted. Analyze what needs to be done and put it on the
* right queue.
*
@@ -312,7 +312,7 @@ void blk_insert_flush(struct request *rq)
*/
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
- list_add(&rq->queuelist, &q->queue_head);
+ list_add_tail(&rq->queuelist, &q->queue_head);
return;
}
diff --git a/block/elevator.c b/block/elevator.c
index c387d31..0cdb4e7 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -610,7 +610,7 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)
rq->cmd_flags &= ~REQ_STARTED;
- elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE);
+ __elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
}
void elv_drain_elevator(struct request_queue *q)
@@ -655,12 +655,25 @@ void elv_quiesce_end(struct request_queue *q)
queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
}
-void elv_insert(struct request_queue *q, struct request *rq, int where)
+void __elv_add_request(struct request_queue *q, struct request *rq, int where)
{
trace_block_rq_insert(q, rq);
rq->q = q;
+ BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
+
+ if (rq->cmd_flags & REQ_SOFTBARRIER) {
+ /* barriers are scheduling boundary, update end_sector */
+ if (rq->cmd_type == REQ_TYPE_FS ||
+ (rq->cmd_flags & REQ_DISCARD)) {
+ q->end_sector = rq_end_sector(rq);
+ q->boundary_rq = rq;
+ }
+ } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
+ where == ELEVATOR_INSERT_SORT)
+ where = ELEVATOR_INSERT_BACK;
+
switch (where) {
case ELEVATOR_INSERT_REQUEUE:
case ELEVATOR_INSERT_FRONT:
@@ -722,24 +735,6 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
BUG();
}
}
-
-void __elv_add_request(struct request_queue *q, struct request *rq, int where)
-{
- BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
-
- if (rq->cmd_flags & REQ_SOFTBARRIER) {
- /* barriers are scheduling boundary, update end_sector */
- if (rq->cmd_type == REQ_TYPE_FS ||
- (rq->cmd_flags & REQ_DISCARD)) {
- q->end_sector = rq_end_sector(rq);
- q->boundary_rq = rq;
- }
- } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
- where == ELEVATOR_INSERT_SORT)
- where = ELEVATOR_INSERT_BACK;
-
- elv_insert(q, rq, where);
-}
EXPORT_SYMBOL(__elv_add_request);
void elv_add_request(struct request_queue *q, struct request *rq, int where)
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index d93efcc445..21a8ebf 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -101,7 +101,6 @@ extern void elv_dispatch_sort(struct request_queue *, struct request *);
extern void elv_dispatch_add_tail(struct request_queue *, struct request *);
extern void elv_add_request(struct request_queue *, struct request *, int);
extern void __elv_add_request(struct request_queue *, struct request *, int);
-extern void elv_insert(struct request_queue *, struct request *, int);
extern int elv_merge(struct request_queue *, struct request **, struct bio *);
extern int elv_try_merge(struct request *, struct bio *);
extern void elv_merge_requests(struct request_queue *, struct request *,
--
Jens Axboe
Hello,
On (03/30/11 12:16), Jens Axboe wrote:
> How does this look? It's really two patches, but rolled up into one for
> easier posting here.
>
Nope, doesn't work for me. fsck.ext4 crashed the system.
__elv_add_request
blk_flush_complete_seq
blk_insert_flush
__elv_add_request
__make_request
generic_make_request
?bio_alloc_bioset
blkdev_issue_flush
blkdev_fsync
vfs_fsync
[..]
RIP is on blk_insert_flush + 0x54, which is one of the BUG_ONs
Dump of assembler code for function blk_insert_flush:
0x00000000000004a4 <+0>: push %rbp
0x00000000000004a5 <+1>: xor %esi,%esi
0x00000000000004a7 <+3>: mov %rdi,%r8
0x00000000000004aa <+6>: mov 0x38(%rdi),%rax
0x00000000000004ae <+10>: mov %rsp,%rbp
0x00000000000004b1 <+13>: mov 0x6d4(%rax),%edx
0x00000000000004b7 <+19>: test $0x800000,%edx
0x00000000000004bd <+25>: je 0x4ee <blk_insert_flush+74>
0x00000000000004bf <+27>: mov 0x40(%rdi),%ecx
0x00000000000004c2 <+30>: xor %esi,%esi
0x00000000000004c4 <+32>: mov 0x54(%rdi),%r9d
0x00000000000004c8 <+36>: test $0x800000,%ecx
0x00000000000004ce <+42>: setne %sil
0x00000000000004d2 <+46>: mov %esi,%edi
0x00000000000004d4 <+48>: or $0x2,%edi
0x00000000000004d7 <+51>: shr $0x9,%r9d
0x00000000000004db <+55>: cmovne %edi,%esi
0x00000000000004de <+58>: test $0x10,%dh
0x00000000000004e1 <+61>: jne 0x4ee <blk_insert_flush+74>
0x00000000000004e3 <+63>: mov %esi,%edi
0x00000000000004e5 <+65>: or $0x4,%edi
0x00000000000004e8 <+68>: and $0x10,%ch
0x00000000000004eb <+71>: cmovne %edi,%esi
0x00000000000004ee <+74>: cmpq $0x0,0x148(%r8)
0x00000000000004f6 <+82>: je 0x4fa <blk_insert_flush+86>
0x00000000000004f8 <+84>: ud2a
^^^^^^^^^^^^
0x00000000000004fa <+86>: mov 0x60(%r8),%rcx
0x00000000000004fe <+90>: test %rcx,%rcx
0x0000000000000501 <+93>: je 0x509 <blk_insert_flush+101>
0x0000000000000503 <+95>: cmp 0x68(%r8),%rcx
0x0000000000000507 <+99>: je 0x50b <blk_insert_flush+103>
0x0000000000000509 <+101>: ud2a
Sergey
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 93d5fd8..6a16fea 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -261,7 +261,7 @@ static bool blk_kick_flush(struct request_queue *q)
> q->flush_rq.end_io = flush_end_io;
>
> q->flush_pending_idx ^= 1;
> - elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_REQUEUE);
> + __elv_add_request(q, &q->flush_rq, ELEVATOR_INSERT_FLUSH);
> return true;
> }
>
> @@ -281,7 +281,7 @@ static void flush_data_end_io(struct request *rq, int error)
> * blk_insert_flush - insert a new FLUSH/FUA request
> * @rq: request to insert
> *
> - * To be called from elv_insert() for %ELEVATOR_INSERT_FLUSH insertions.
> + * To be called from __elv_add_request() for %ELEVATOR_INSERT_FLUSH insertions.
> * @rq is being submitted. Analyze what needs to be done and put it on the
> * right queue.
> *
> @@ -312,7 +312,7 @@ void blk_insert_flush(struct request *rq)
> */
> if ((policy & REQ_FSEQ_DATA) &&
> !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
> - list_add(&rq->queuelist, &q->queue_head);
> + list_add_tail(&rq->queuelist, &q->queue_head);
> return;
> }
>
> diff --git a/block/elevator.c b/block/elevator.c
> index c387d31..0cdb4e7 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -610,7 +610,7 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)
>
> rq->cmd_flags &= ~REQ_STARTED;
>
> - elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE);
> + __elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
> }
>
> void elv_drain_elevator(struct request_queue *q)
> @@ -655,12 +655,25 @@ void elv_quiesce_end(struct request_queue *q)
> queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
> }
>
> -void elv_insert(struct request_queue *q, struct request *rq, int where)
> +void __elv_add_request(struct request_queue *q, struct request *rq, int where)
> {
> trace_block_rq_insert(q, rq);
>
> rq->q = q;
>
> + BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
> +
> + if (rq->cmd_flags & REQ_SOFTBARRIER) {
> + /* barriers are scheduling boundary, update end_sector */
> + if (rq->cmd_type == REQ_TYPE_FS ||
> + (rq->cmd_flags & REQ_DISCARD)) {
> + q->end_sector = rq_end_sector(rq);
> + q->boundary_rq = rq;
> + }
> + } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> + where == ELEVATOR_INSERT_SORT)
> + where = ELEVATOR_INSERT_BACK;
> +
> switch (where) {
> case ELEVATOR_INSERT_REQUEUE:
> case ELEVATOR_INSERT_FRONT:
> @@ -722,24 +735,6 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
> BUG();
> }
> }
> -
> -void __elv_add_request(struct request_queue *q, struct request *rq, int where)
> -{
> - BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
> -
> - if (rq->cmd_flags & REQ_SOFTBARRIER) {
> - /* barriers are scheduling boundary, update end_sector */
> - if (rq->cmd_type == REQ_TYPE_FS ||
> - (rq->cmd_flags & REQ_DISCARD)) {
> - q->end_sector = rq_end_sector(rq);
> - q->boundary_rq = rq;
> - }
> - } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> - where == ELEVATOR_INSERT_SORT)
> - where = ELEVATOR_INSERT_BACK;
> -
> - elv_insert(q, rq, where);
> -}
> EXPORT_SYMBOL(__elv_add_request);
>
> void elv_add_request(struct request_queue *q, struct request *rq, int where)
> diff --git a/include/linux/elevator.h b/include/linux/elevator.h
> index d93efcc445..21a8ebf 100644
> --- a/include/linux/elevator.h
> +++ b/include/linux/elevator.h
> @@ -101,7 +101,6 @@ extern void elv_dispatch_sort(struct request_queue *, struct request *);
> extern void elv_dispatch_add_tail(struct request_queue *, struct request *);
> extern void elv_add_request(struct request_queue *, struct request *, int);
> extern void __elv_add_request(struct request_queue *, struct request *, int);
> -extern void elv_insert(struct request_queue *, struct request *, int);
> extern int elv_merge(struct request_queue *, struct request **, struct bio *);
> extern int elv_try_merge(struct request *, struct bio *);
> extern void elv_merge_requests(struct request_queue *, struct request *,
>
> --
> Jens Axboe
>
Hello, Jens.
On Wed, Mar 30, 2011 at 12:16:13PM +0200, Jens Axboe wrote:
> @@ -261,7 +261,7 @@ static bool blk_kick_flush(struct request_queue *q)
> q->flush_rq.end_io = flush_end_io;
>
> q->flush_pending_idx ^= 1;
> - elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_REQUEUE);
> + __elv_add_request(q, &q->flush_rq, ELEVATOR_INSERT_FLUSH);
Shouldn't this be list_add_tail(&rq->queuelist, &q->queue_head)?
__elv_add_request(FLUSH) calls back into blk_insert_flush() which
decomposes the flush request from FS into flush sequence.
blk_kick_flush() is used to insert the decomposed sequence requests
into the dispatch queue.
Thanks.
--
tejun
On 2011-03-30 13:21, Sergey Senozhatsky wrote:
> Hello,
>
> On (03/30/11 12:16), Jens Axboe wrote:
>> How does this look? It's really two patches, but rolled up into one for
>> easier posting here.
>>
>
> Nope, doesn't work for me. fsck.ext4 crashed the system.
>
> __elv_add_request
> blk_flush_complete_seq
> blk_insert_flush
> __elv_add_request
> __make_request
> generic_make_request
> ?bio_alloc_bioset
> blkdev_issue_flush
> blkdev_fsync
> vfs_fsync
> [..]
Ah, this addon should behave better.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 6a16fea..eba4a27 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -261,7 +261,7 @@ static bool blk_kick_flush(struct request_queue *q)
q->flush_rq.end_io = flush_end_io;
q->flush_pending_idx ^= 1;
- __elv_add_request(q, &q->flush_rq, ELEVATOR_INSERT_FLUSH);
+ list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
return true;
}
--
Jens Axboe
On 2011-03-30 13:20, Tejun Heo wrote:
> Hello, Jens.
>
> On Wed, Mar 30, 2011 at 12:16:13PM +0200, Jens Axboe wrote:
>> @@ -261,7 +261,7 @@ static bool blk_kick_flush(struct request_queue *q)
>> q->flush_rq.end_io = flush_end_io;
>>
>> q->flush_pending_idx ^= 1;
>> - elv_insert(q, &q->flush_rq, ELEVATOR_INSERT_REQUEUE);
>> + __elv_add_request(q, &q->flush_rq, ELEVATOR_INSERT_FLUSH);
>
> Shouldn't this be list_add_tail(&rq->queuelist, &q->queue_head)?
> __elv_add_request(FLUSH) calls back into blk_insert_flush() which
> decomposes the flush request from FS into flush sequence.
> blk_kick_flush() is used to insert the decomposed sequence requests
> into the dispatch queue.
Mid-air collision, I just sent that addon out to Sergey who tested. Yes
of course it should be a regular list_add, the above is a braino.
--
Jens Axboe
On (03/30/11 13:22), Jens Axboe wrote:
> > Nope, doesn't work for me. fsck.ext4 crashed the system.
> >
> > __elv_add_request
> > blk_flush_complete_seq
> > blk_insert_flush
> > __elv_add_request
> > __make_request
> > generic_make_request
> > ?bio_alloc_bioset
> > blkdev_issue_flush
> > blkdev_fsync
> > vfs_fsync
> > [..]
>
> Ah, this addon should behave better.
>
Seems to work fine for me.
Sergey
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 6a16fea..eba4a27 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -261,7 +261,7 @@ static bool blk_kick_flush(struct request_queue *q)
> q->flush_rq.end_io = flush_end_io;
>
> q->flush_pending_idx ^= 1;
> - __elv_add_request(q, &q->flush_rq, ELEVATOR_INSERT_FLUSH);
> + list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
> return true;
> }
>
>
> --
> Jens Axboe
>
On Wed, Mar 30, 2011 at 10:02:03AM +0200, Tejun Heo wrote:
> Hello, Jens.
>
> On Wed, Mar 30, 2011 at 09:59:09AM +0200, Jens Axboe wrote:
> > Pure front insert should be used for requeue and internal commands (like
> > spin up this drive, or get error information). Flush should append to
> > the dispatch list.
>
> Yeah, right. The reason I used REQUEUE/FRONT was because BACK
> insertion involves draining the elevator and then appending the
> request at the end of the dispatch queue, which is unnecessary and
> inefficient. So, front insertion was a quick work around that. If
> we're removing elv_insert(), we can just append directly to the
> dispatch queue from flush code.
Hi Tejun,
With ordering semantics gone, do we still need to drain the elevator before
queuing flush at the end of request queue.
Thanks
Vivek
Hello,
On Wed, Mar 30, 2011 at 09:46:58AM -0400, Vivek Goyal wrote:
> With ordering semantics gone, do we still need to drain the elevator
> before queuing flush at the end of request queue.
No, not at all, which was why I was cheating with front insertion.
Anyways, with Jens' patch posted in this thread, this is no longer a
problem.
Thanks.
--
tejun
On Wed, Mar 30 2011 at 4:02am -0400,
Tejun Heo <[email protected]> wrote:
> Hello, Jens.
>
> On Wed, Mar 30, 2011 at 09:59:09AM +0200, Jens Axboe wrote:
> > Pure front insert should be used for requeue and internal commands (like
> > spin up this drive, or get error information). Flush should append to
> > the dispatch list.
>
> Yeah, right. The reason I used REQUEUE/FRONT was because BACK
> insertion involves draining the elevator and then appending the
> request at the end of the dispatch queue, which is unnecessary and
> inefficient. So, front insertion was a quick work around that. If
> we're removing elv_insert(), we can just append directly to the
> dispatch queue from flush code.
I'm trying to follow along but unrolling what was said above is
challenging considering we're not getting rid of elv_insert()'s
functionality; it was just folded into __elv_add_request() -- offering
no functional change AFAIK. So placing special meaning on getting rid
of elv_insert() is confusing me.
Why can we all of a sudden append the flush to the dispatch queue _but_
not have any concern about queue draining? Seems that avoiding use of
BACK, by using list_add_tail, is enabling that. Couldn't we have always
done that? The folding of elv_insert() into __elv_add_request() seems
irrelevant.
Can we take a step back and be more explicit about the implications of
having inserted the flush with REQUEUE/FRONT? Seems to me that having
_not_ inserted the flush at the end of the dispatch queue is cause for
potential corruption (preceding data hasn't been issued to the device
yet).
And just to be clear: none of this is a concern for stable right? It is
just the flush-merge code introduced for 2.6.39 that needs fixing?
Please advise, thanks!
Mike
On Wed, Mar 30, 2011 at 10:01:24AM -0400, Mike Snitzer wrote:
> Why can we all of a sudden append the flush to the dispatch queue _but_
> not have any concern about queue draining? Seems that avoiding use of
> BACK, by using list_add_tail, is enabling that. Couldn't we have always
> done that?
Yes, we could have. I was just being lazy and looking for an easy
solution.
> The folding of elv_insert() into __elv_add_request() seems
> irrelevant.
Strictly, it is but they kinda go well together.
> Can we take a step back and be more explicit about the implications of
> having inserted the flush with REQUEUE/FRONT? Seems to me that having
> _not_ inserted the flush at the end of the dispatch queue is cause for
> potential corruption (preceding data hasn't been issued to the device
> yet).
No, the data ordering is enforced by the filesystem in the new
implementation meaning that by the time FLUSH is issued by the
filesystem, it should have made sure that all requests which must be
written before the FLUSH already had completed.
> And just to be clear: none of this is a concern for stable right? It is
> just the flush-merge code introduced for 2.6.39 that needs fixing?
I think 2.6.38 needs a -stable fix. It has the previous version of
the new flush implementation and is using front insertion.
Thanks.
--
tejun
On Wed, Mar 30, 2011 at 04:27:51PM +0200, Tejun Heo wrote:
[..]
> > And just to be clear: none of this is a concern for stable right? It is
> > just the flush-merge code introduced for 2.6.39 that needs fixing?
>
> I think 2.6.38 needs a -stable fix. It has the previous version of
> the new flush implementation and is using front insertion.
Tejun,
So wee need this as stable fix because FLUSH request can get ahead of
REQUEUED requests and it can break some drivers?
Thanks
Vivek
On Wed, Mar 30, 2011 at 11:22:48AM -0400, Vivek Goyal wrote:
> So wee need this as stable fix because FLUSH request can get ahead of
> REQUEUED requests and it can break some drivers?
Yes, I think so. All we need is just replacing elv_insert() calls in
blk-flush.c with list_add_tail(). Something like the following. I'll
test it and send a proper patch later. Thanks.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index b27d020..51a45a5 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -132,7 +132,7 @@ static struct request *queue_next_fseq(struct request_queue *q)
BUG();
}
- elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE);
+ list_add_tail(&rq->queuelist, &q->queue_head);
return rq;
}
--
tejun
On 2011-03-30 17:30, Tejun Heo wrote:
> On Wed, Mar 30, 2011 at 11:22:48AM -0400, Vivek Goyal wrote:
>> So wee need this as stable fix because FLUSH request can get ahead of
>> REQUEUED requests and it can break some drivers?
>
> Yes, I think so. All we need is just replacing elv_insert() calls in
> blk-flush.c with list_add_tail(). Something like the following. I'll
> test it and send a proper patch later. Thanks.
I'd suggest I just mark the queued patch for stable.
--
Jens Axboe
On Wed, Mar 30, 2011 at 07:13:20PM +0200, Jens Axboe wrote:
> On 2011-03-30 17:30, Tejun Heo wrote:
> > On Wed, Mar 30, 2011 at 11:22:48AM -0400, Vivek Goyal wrote:
> >> So wee need this as stable fix because FLUSH request can get ahead of
> >> REQUEUED requests and it can break some drivers?
> >
> > Yes, I think so. All we need is just replacing elv_insert() calls in
> > blk-flush.c with list_add_tail(). Something like the following. I'll
> > test it and send a proper patch later. Thanks.
>
> I'd suggest I just mark the queued patch for stable.
Ah, right. It might not apply without conflict but they're basically
the same changes. Thanks.
--
tejun
Jens Axboe <[email protected]> writes:
> On 2011-03-30 17:30, Tejun Heo wrote:
>> On Wed, Mar 30, 2011 at 11:22:48AM -0400, Vivek Goyal wrote:
>>> So wee need this as stable fix because FLUSH request can get ahead of
>>> REQUEUED requests and it can break some drivers?
>>
>> Yes, I think so. All we need is just replacing elv_insert() calls in
>> blk-flush.c with list_add_tail(). Something like the following. I'll
>> test it and send a proper patch later. Thanks.
>
> I'd suggest I just mark the queued patch for stable.
Could you document the nuance in that patch, please?
Thanks,
Jeff
On 2011-03-30 19:56, Jeff Moyer wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 2011-03-30 17:30, Tejun Heo wrote:
>>> On Wed, Mar 30, 2011 at 11:22:48AM -0400, Vivek Goyal wrote:
>>>> So wee need this as stable fix because FLUSH request can get ahead of
>>>> REQUEUED requests and it can break some drivers?
>>>
>>> Yes, I think so. All we need is just replacing elv_insert() calls in
>>> blk-flush.c with list_add_tail(). Something like the following. I'll
>>> test it and send a proper patch later. Thanks.
>>
>> I'd suggest I just mark the queued patch for stable.
>
> Could you document the nuance in that patch, please?
Yeah I will. I rebase for-linus, it's not a "stable" branch as such.
Today I mainly collected some items as not to drop them, I'll expand on
the explanations in there.
--
Jens Axboe