2020-09-07 07:53:27

by YangYang

Subject: [PATCH] kyber: Fix crash in kyber_finish_request()

The kernel crashes when a flush request is requeued.
The crash log is shown below:

[ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
...
[ 2.517468] pc : clear_bit+0x18/0x2c
[ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
[ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
...
[ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
[ 2.517602] Call trace:
[ 2.517606] clear_bit+0x18/0x2c
[ 2.517619] kyber_finish_request+0x74/0x80
[ 2.517627] blk_mq_requeue_request+0x3c/0xc0
[ 2.517637] __scsi_queue_insert+0x11c/0x148
[ 2.517640] scsi_softirq_done+0x114/0x130
[ 2.517643] blk_done_softirq+0x7c/0xb0
[ 2.517651] __do_softirq+0x208/0x3bc
[ 2.517657] run_ksoftirqd+0x34/0x60
[ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
[ 2.517667] kthread+0x110/0x120
[ 2.517669] ret_from_fork+0x10/0x18

Signed-off-by: Yang Yang <[email protected]>
---
 block/kyber-iosched.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index a38c5ab103d1..af73afe7a05c 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -611,6 +611,9 @@ static void kyber_finish_request(struct request *rq)
 {
 	struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
 
+	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
+		return;
+
 	rq_clear_domain_token(kqd, rq);
 }

--
2.17.1


2020-09-07 16:43:04

by Jens Axboe

Subject: Re: [PATCH] kyber: Fix crash in kyber_finish_request()

CC Omar

On 9/7/20 1:43 AM, Yang Yang wrote:
> The kernel crashes when a flush request is requeued.
> The crash log is shown below:
>
> [ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
> ...
> [ 2.517468] pc : clear_bit+0x18/0x2c
> [ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
> [ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
> ...
> [ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
> [ 2.517602] Call trace:
> [ 2.517606] clear_bit+0x18/0x2c
> [ 2.517619] kyber_finish_request+0x74/0x80
> [ 2.517627] blk_mq_requeue_request+0x3c/0xc0
> [ 2.517637] __scsi_queue_insert+0x11c/0x148
> [ 2.517640] scsi_softirq_done+0x114/0x130
> [ 2.517643] blk_done_softirq+0x7c/0xb0
> [ 2.517651] __do_softirq+0x208/0x3bc
> [ 2.517657] run_ksoftirqd+0x34/0x60
> [ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
> [ 2.517667] kthread+0x110/0x120
> [ 2.517669] ret_from_fork+0x10/0x18
>
> Signed-off-by: Yang Yang <[email protected]>
> ---
> block/kyber-iosched.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
> index a38c5ab103d1..af73afe7a05c 100644
> --- a/block/kyber-iosched.c
> +++ b/block/kyber-iosched.c
> @@ -611,6 +611,9 @@ static void kyber_finish_request(struct request *rq)
> {
> struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
>
> + if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
> + return;
> +
> rq_clear_domain_token(kqd, rq);
> }
>
>


--
Jens Axboe

2020-09-08 19:02:18

by Omar Sandoval

Subject: Re: [PATCH] kyber: Fix crash in kyber_finish_request()

On Mon, Sep 07, 2020 at 10:41:16AM -0600, Jens Axboe wrote:
> CC Omar
>
> On 9/7/20 1:43 AM, Yang Yang wrote:
> > The kernel crashes when a flush request is requeued.
> > The crash log is shown below:
> >
> > [ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
> > ...
> > [ 2.517468] pc : clear_bit+0x18/0x2c
> > [ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
> > [ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
> > ...
> > [ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
> > [ 2.517602] Call trace:
> > [ 2.517606] clear_bit+0x18/0x2c
> > [ 2.517619] kyber_finish_request+0x74/0x80
> > [ 2.517627] blk_mq_requeue_request+0x3c/0xc0
> > [ 2.517637] __scsi_queue_insert+0x11c/0x148
> > [ 2.517640] scsi_softirq_done+0x114/0x130
> > [ 2.517643] blk_done_softirq+0x7c/0xb0
> > [ 2.517651] __do_softirq+0x208/0x3bc
> > [ 2.517657] run_ksoftirqd+0x34/0x60
> > [ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
> > [ 2.517667] kthread+0x110/0x120
> > [ 2.517669] ret_from_fork+0x10/0x18
> >
> > Signed-off-by: Yang Yang <[email protected]>
> > ---
> > block/kyber-iosched.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
> > index a38c5ab103d1..af73afe7a05c 100644
> > --- a/block/kyber-iosched.c
> > +++ b/block/kyber-iosched.c
> > @@ -611,6 +611,9 @@ static void kyber_finish_request(struct request *rq)
> > {
> > struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
> >
> > + if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
> > + return;
> > +
> > rq_clear_domain_token(kqd, rq);
> > }
> >
> >

It looks like BFQ also has this check. Wouldn't it make more sense to
check it in blk-mq, like we do for .finish_request() in
blk_mq_free_request()? Something along these lines:

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c34b090178a9..fa98470df3f0 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5895,18 +5895,6 @@ static void bfq_finish_requeue_request(struct request *rq)
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
 
-	/*
-	 * Requeue and finish hooks are invoked in blk-mq without
-	 * checking whether the involved request is actually still
-	 * referenced in the scheduler. To handle this fact, the
-	 * following two checks make this function exit in case of
-	 * spurious invocations, for which there is nothing to do.
-	 *
-	 * First, check whether rq has nothing to do with an elevator.
-	 */
-	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
-		return;
-
 	/*
 	 * rq either is not associated with any icq, or is an already
 	 * requeued request that has not (yet) been re-inserted into
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 126021fc3a11..e81ca1bf6e10 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -66,7 +66,7 @@ static inline void blk_mq_sched_requeue_request(struct request *rq)
 	struct request_queue *q = rq->q;
 	struct elevator_queue *e = q->elevator;
 
-	if (e && e->type->ops.requeue_request)
+	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
 		e->type->ops.requeue_request(rq);
 }
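
To illustrate the idea outside the kernel, here is a rough user-space
sketch of doing the flag check once in the core dispatcher instead of in
every scheduler hook. It is not kernel code: every model_* name is made up
for this example, and the struct layouts are simplified assumptions rather
than the real struct request or elevator ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RQF_ELVPRIV; the value is arbitrary. */
#define MODEL_RQF_ELVPRIV (1u << 0)

/* Simplified model of a request (assumption, not the kernel layout). */
struct model_request {
	unsigned int rq_flags;
	int token;	/* scheduler-private; only valid with MODEL_RQF_ELVPRIV */
};

/* Simplified model of the scheduler hook table. */
struct model_elevator_ops {
	void (*requeue_request)(struct model_request *rq);
};

/* Counts how often the scheduler hook actually ran. */
static int model_tokens_released;

/*
 * Scheduler-side hook. It trusts that the private data was set up,
 * so it must never see a request without MODEL_RQF_ELVPRIV.
 */
static void model_sched_requeue(struct model_request *rq)
{
	assert(rq->rq_flags & MODEL_RQF_ELVPRIV);
	rq->token = -1;		/* release the token */
	model_tokens_released++;
}

/*
 * Core-side dispatcher. The flag check lives here, so each scheduler
 * does not need its own guard. Returns true if the hook was called.
 */
static bool model_core_requeue(const struct model_elevator_ops *ops,
			       struct model_request *rq)
{
	if ((rq->rq_flags & MODEL_RQF_ELVPRIV) && ops && ops->requeue_request) {
		ops->requeue_request(rq);
		return true;
	}
	return false;
}
```

With this shape, a request that never went through the scheduler (flag not
set, private data possibly garbage) is filtered out before the hook runs,
and a request with the flag set reaches the hook as before.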

2020-09-08 19:50:57

by Jens Axboe

Subject: Re: [PATCH] kyber: Fix crash in kyber_finish_request()

On 9/8/20 1:00 PM, Omar Sandoval wrote:
> On Mon, Sep 07, 2020 at 10:41:16AM -0600, Jens Axboe wrote:
>> CC Omar
>>
>> On 9/7/20 1:43 AM, Yang Yang wrote:
>>> The kernel crashes when a flush request is requeued.
>>> The crash log is shown below:
>>>
>>> [ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
>>> ...
>>> [ 2.517468] pc : clear_bit+0x18/0x2c
>>> [ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
>>> [ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
>>> ...
>>> [ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
>>> [ 2.517602] Call trace:
>>> [ 2.517606] clear_bit+0x18/0x2c
>>> [ 2.517619] kyber_finish_request+0x74/0x80
>>> [ 2.517627] blk_mq_requeue_request+0x3c/0xc0
>>> [ 2.517637] __scsi_queue_insert+0x11c/0x148
>>> [ 2.517640] scsi_softirq_done+0x114/0x130
>>> [ 2.517643] blk_done_softirq+0x7c/0xb0
>>> [ 2.517651] __do_softirq+0x208/0x3bc
>>> [ 2.517657] run_ksoftirqd+0x34/0x60
>>> [ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
>>> [ 2.517667] kthread+0x110/0x120
>>> [ 2.517669] ret_from_fork+0x10/0x18
>>>
>>> Signed-off-by: Yang Yang <[email protected]>
>>> ---
>>> block/kyber-iosched.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
>>> index a38c5ab103d1..af73afe7a05c 100644
>>> --- a/block/kyber-iosched.c
>>> +++ b/block/kyber-iosched.c
>>> @@ -611,6 +611,9 @@ static void kyber_finish_request(struct request *rq)
>>> {
>>> struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
>>>
>>> + if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
>>> + return;
>>> +
>>> rq_clear_domain_token(kqd, rq);
>>> }
>>>
>>>
>
> It looks like BFQ also has this check. Wouldn't it make more sense to
> check it in blk-mq, like we do for .finish_request() in
> blk_mq_free_request()? Something along these lines:

Yeah I think so, that's much better than working around it in the
consumer of it.

--
Jens Axboe