2012-08-02 08:41:43

by Chanho Min

[permalink] [raw]
Subject: [PATCH] fix NULL-pointer dereference on scsi_run_queue

This patch is to fix a oops from a torn down device. When
scsi_run_queue process starved queues, scsi_request_fn can race with
scsi_remove_device. In this case, rarely, scsi_request_fn release the
last reference and set sdev->request_queue to NULL. It result in
NULL-pointer dereference when spin_unlock is tried with (NULL)->
queue_lock. We need to add an extra reference to the device on both
sides of the __blk_run_queue to hold reference until scsi_request_fn
is finished.

[ 8.042972] Unable to handle kernel NULL pointer dereference at
virtual address 00000240
[ 8.051061] pgd = 80004000
[ 8.053762] [00000240] *pgd=00000000
[ 8.057342] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[ 8.062736] Modules linked in:
[ 8.065793] CPU: 0 Not tainted (3.4.2+ #313)
[ 8.070418] PC is at scsi_run_queue+0x19c/0x2b8
[ 8.074947] LR is at scsi_run_queue+0x198/0x2b8
[ 8.079476] pc : [<802569a0>] lr : [<8025699c>] psr: 20000193
[ 8.079481] sp : 9f915f10 ip : 00000f01 fp : 00000001
[ 8.090953] r10: 9f09d424 r9 : 8055294c r8 : 9f915f20
[ 8.096172] r7 : 9f914000 r6 : 00000000 r5 : 9f09e000 r4 : 9f09d400
[ 8.102694] r3 : 00000000 r2 : 80504a14 r1 : 60000193 r0 : 00000043
[ 8.109219] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM
Segment kernel
[ 8.116612] Control: 10c53c7d Table: 1f1f804a DAC: 00000015
[ 8.122355] Process kworker/0:1 (pid: 312, stack limit = 0x9f9142f0)
[ 8.128705] Stack: (0x9f915f10 to 0x9f916000)
[ 8.133059] 5f00: 00000000
00000000 9f1e8000 60000113
[ 8.141236] 5f20: 9f915f20 9f915f20 00000001 9f9be1c0 8095f480
80963100 9f914000 00000000
[ 8.149414] 5f40: 80963105 80258304 9f09da70 80034858 9f9fb6c0
00000000 00000001 9f9be1c0
[ 8.157591] 5f60: 8095f480 8095f488 9f9be1d0 9f914000 804edf80
804fff08 00000009 80037044
[ 8.165767] 5f80: 804edfc0 804edfc0 804edfc0 804edf80 804edf80
804edf80 00000001 9f859ee8
[ 8.173943] 5fa0: 9f915fcc 9f9be1c0 80036ed4 00000000 00000000
00000000 00000000 8003ac64
[ 8.182119] 5fc0: 9f859ee8 00000000 9f9be1c0 00000000 00000000
00000000 9f915fd8 9f915fd8
[ 8.190296] 5fe0: 00000000 9f859ee8 8003abd8 80014c0c 00000013
80014c0c ffffffff ffffffff
[ 8.198494] [<802569a0>] (scsi_run_queue+0x19c/0x2b8) from
[<80034858>] (process_one_work+0x118/0x39c)
[ 8.207812] [<80034858>] (process_one_work+0x118/0x39c) from
[<80037044>] (worker_thread+0x170/0x368)
[ 8.217047] [<80037044>] (worker_thread+0x170/0x368) from
[<8003ac64>] (kthread+0x8c/0x98)
[ 8.225337] [<8003ac64>] (kthread+0x8c/0x98) from [<80014c0c>]
(kernel_thread_exit+0x0/0x8)
[ 8.233692] Code: e59f011c e58dc000 eb05457e e5953004 (e5930240)
[ 8.240026] __scsi_remove_device:962 9f1e8000
[ 8.240140] ---[ end trace 1ec4a0217c9f24f3 ]---

Signed-off-by: Chanho Min <[email protected]>

---
drivers/scsi/scsi_lib.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b583277..1868c35 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -436,9 +436,13 @@ static void scsi_run_queue(struct request_queue *q)
}

spin_unlock(shost->host_lock);
+ /* hold a reference on the device so it doesn't release device */
+ get_device(&sdev->sdev_gendev);
spin_lock(sdev->request_queue->queue_lock);
__blk_run_queue(sdev->request_queue);
spin_unlock(sdev->request_queue->queue_lock);
+ /* ok to remove device now */
+ put_device(&sdev->sdev_gendev);
spin_lock(shost->host_lock);
}
/* put any unprocessed entries back */
--
1.7.0.4


2012-08-02 08:57:24

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On Thu, 2012-08-02 at 17:41 +0900, Chanho Min wrote:
> This patch is to fix a oops from a torn down device. When
> scsi_run_queue process starved queues, scsi_request_fn can race with
> scsi_remove_device. In this case, rarely, scsi_request_fn release the
> last reference and set sdev->request_queue to NULL. It result in
> NULL-pointer dereference when spin_unlock is tried with (NULL)->
> queue_lock. We need to add an extra reference to the device on both
> sides of the __blk_run_queue to hold reference until scsi_request_fn
> is finished.

You need a recent kernel with this patch:

commit 940f5d47e2f2e1fa00443921a0abf4822335b54d
Author: Bart Van Assche <[email protected]>
Date: Fri Jun 29 15:34:26 2012 +0000

[SCSI] Avoid dangling pointer in scsi_requeue_command()

James

2012-08-02 09:28:26

by Chanho Min

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On Thu, Aug 2, 2012 at 5:57 PM, James Bottomley
<[email protected]> wrote:
> On Thu, 2012-08-02 at 17:41 +0900, Chanho Min wrote:
>> This patch is to fix a oops from a torn down device. When
>> scsi_run_queue process starved queues, scsi_request_fn can race with
>> scsi_remove_device. In this case, rarely, scsi_request_fn release the
>> last reference and set sdev->request_queue to NULL. It result in
>> NULL-pointer dereference when spin_unlock is tried with (NULL)->
>> queue_lock. We need to add an extra reference to the device on both
>> sides of the __blk_run_queue to hold reference until scsi_request_fn
>> is finished.
>
> You need a recent kernel with this patch:
>
> commit 940f5d47e2f2e1fa00443921a0abf4822335b54d
> Author: Bart Van Assche <[email protected]>
> Date: Fri Jun 29 15:34:26 2012 +0000
>
> [SCSI] Avoid dangling pointer in scsi_requeue_command()
>
> James
It is different from my case. This is occured inside scsi_run_queue
and on processing starved_list.
Another sdev is obtained from starved_list.

2012-08-02 09:34:59

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On Thu, 2012-08-02 at 18:28 +0900, Chanho Min wrote:
> On Thu, Aug 2, 2012 at 5:57 PM, James Bottomley
> <[email protected]> wrote:
> > On Thu, 2012-08-02 at 17:41 +0900, Chanho Min wrote:
> >> This patch is to fix a oops from a torn down device. When
> >> scsi_run_queue process starved queues, scsi_request_fn can race with
> >> scsi_remove_device. In this case, rarely, scsi_request_fn release the
> >> last reference and set sdev->request_queue to NULL. It result in
> >> NULL-pointer dereference when spin_unlock is tried with (NULL)->
> >> queue_lock. We need to add an extra reference to the device on both
> >> sides of the __blk_run_queue to hold reference until scsi_request_fn
> >> is finished.
> >
> > You need a recent kernel with this patch:
> >
> > commit 940f5d47e2f2e1fa00443921a0abf4822335b54d
> > Author: Bart Van Assche <[email protected]>
> > Date: Fri Jun 29 15:34:26 2012 +0000
> >
> > [SCSI] Avoid dangling pointer in scsi_requeue_command()
> >
> > James
> It is different from my case. This is occured inside scsi_run_queue
> and on processing starved_list.
> Another sdev is obtained from starved_list.

Does it occur with that patch applied?

If it does, the likely fix would be to take a copy of the queue ... but
I'd like to understand why first. An active command has an automatic
reference to the sdev_gendev, so it shouldn't be the normal case. This
was broken by unprep because it releases the command from the queue and
drops the reference. We may have another case like unjprep, but in that
case, we need to find it ... trying to add extra get/put_device() calls
will paper over the problem.

James

2012-08-03 02:28:49

by Chanho Min

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

> Does it occur with that patch applied?
I'm trying to reproduce it with that patch. but, It is unlikely to be fixed.
because scsi_run_queue is invoked from scsi_requeue_run_queue,
not scsi_requeue_command.

> If it does, the likely fix would be to take a copy of the queue ... but
> I'd like to understand why first. An active command has an automatic
> reference to the sdev_gendev, so it shouldn't be the normal case. This
> was broken by unprep because it releases the command from the queue and
> drops the reference. We may have another case like unjprep, but in that
__scsi_remove_device drops the last reference under race condition.

> case, we need to find it ... trying to add extra get/put_device() calls
> will paper over the problem.
yes, extra reference is not good to fix.
But, As long as scsi_device_dev_release_usercontext set request_queue
to NULL, Isn't it necessary to ensure that __blk_run_queue don't release device?

Thanks a lot!
Chanho Min

2012-08-03 03:01:57

by Mike Christie

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/02/2012 04:34 AM, James Bottomley wrote:
> On Thu, 2012-08-02 at 18:28 +0900, Chanho Min wrote:
>> On Thu, Aug 2, 2012 at 5:57 PM, James Bottomley
>> <[email protected]> wrote:
>>> On Thu, 2012-08-02 at 17:41 +0900, Chanho Min wrote:
>>>> This patch is to fix a oops from a torn down device. When
>>>> scsi_run_queue process starved queues, scsi_request_fn can race with
>>>> scsi_remove_device. In this case, rarely, scsi_request_fn release the
>>>> last reference and set sdev->request_queue to NULL. It result in
>>>> NULL-pointer dereference when spin_unlock is tried with (NULL)->
>>>> queue_lock. We need to add an extra reference to the device on both
>>>> sides of the __blk_run_queue to hold reference until scsi_request_fn
>>>> is finished.
>>>
>>> You need a recent kernel with this patch:
>>>
>>> commit 940f5d47e2f2e1fa00443921a0abf4822335b54d
>>> Author: Bart Van Assche <[email protected]>
>>> Date: Fri Jun 29 15:34:26 2012 +0000
>>>
>>> [SCSI] Avoid dangling pointer in scsi_requeue_command()
>>>
>>> James
>> It is different from my case. This is occured inside scsi_run_queue
>> and on processing starved_list.
>> Another sdev is obtained from starved_list.
>
> Does it occur with that patch applied?
>
> If it does, the likely fix would be to take a copy of the queue ... but
> I'd like to understand why first. An active command has an automatic
> reference to the sdev_gendev, so it shouldn't be the normal case. This
> was broken by unprep because it releases the command from the queue and
> drops the reference. We may have another case like unjprep, but in that
> case, we need to find it ... trying to add extra get/put_device() calls
> will paper over the problem.
>

I think the problem is that __scsi_remove_device will now wait for
commands to get dequeued and run, before proceeding but we do not take a
device off the starved list until scsi_device_dev_release_usercontext is
run, or maybe thinking about it another way scsi_kill_request does not
remove sdevs from the starved list if the device is being removed.

So lets say we hit the not_ready path in scsi_request_fn and put the
sdev on the starved list. Then we remove the device. We could end up
putting the device in SDEV_DEL, and then calling scsi_request_fn via
blk_cleanup_queue's drain queue call. scsi_request_fn would hit the
scsi_device_online check and fail the IO, but we never took the sdev off
the starved list from what I can tell.

Now, there is no IO in the queue and so __scsi_remove_device continues.
It then calls scsi_device_dev_release_usercontext at the same time some
other thread is calling scsi_run_queue. We then race. scsi_run_queue
splices the starved list with the sdev we are trying to remove and
deletes the list entry from the list and drops the host lock. But then
scsi_device_dev_release_usercontext grabs the host lock and ends up
running the entire function and freeing the queue. Then scsi_run_queue
tries to access the sdev and queue so it can grab the queue lock that
was just freed and kablewy.

2012-08-04 09:01:36

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/02/12 08:41, Chanho Min wrote:
> This patch is to fix a oops from a torn down device. When
> scsi_run_queue process starved queues, scsi_request_fn can race with
> scsi_remove_device. In this case, rarely, scsi_request_fn release the
> last reference and set sdev->request_queue to NULL. It result in
> NULL-pointer dereference when spin_unlock is tried with (NULL)->
> queue_lock. We need to add an extra reference to the device on both
> sides of the __blk_run_queue to hold reference until scsi_request_fn
> is finished.

Good catch. So far I haven't been able to trigger this issue in my
tests. So it would be appreciated if you could verify whether the patch
below helps (patch is based on 3.6-rc1):

---
drivers/scsi/scsi_sysfs.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 093d4f6..59e523c 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
starget->reap_ref++;
list_del(&sdev->siblings);
list_del(&sdev->same_target_siblings);
- list_del(&sdev->starved_entry);
spin_unlock_irqrestore(sdev->host->host_lock, flags);

cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
void __scsi_remove_device(struct scsi_device *sdev)
{
struct device *dev = &sdev->sdev_gendev;
+ struct Scsi_Host *shost = sdev->host;
+ unsigned long flags;

if (sdev->is_visible) {
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -977,6 +978,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);

+ spin_lock_irqsave(shost->host_lock, flags);
+ if (!list_empty(&sdev->starved_entry))
+ list_del(&sdev->starved_entry);
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
--
1.7.7




2012-08-04 16:46:32

by Mike Christie

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/04/2012 04:01 AM, Bart Van Assche wrote:
> On 08/02/12 08:41, Chanho Min wrote:
>> This patch is to fix a oops from a torn down device. When
>> scsi_run_queue process starved queues, scsi_request_fn can race with
>> scsi_remove_device. In this case, rarely, scsi_request_fn release the
>> last reference and set sdev->request_queue to NULL. It result in
>> NULL-pointer dereference when spin_unlock is tried with (NULL)->
>> queue_lock. We need to add an extra reference to the device on both
>> sides of the __blk_run_queue to hold reference until scsi_request_fn
>> is finished.
>
> Good catch. So far I haven't been able to trigger this issue in my
> tests. So it would be appreciated if you could verify whether the patch
> below helps (patch is based on 3.6-rc1):
>
> ---
> drivers/scsi/scsi_sysfs.c | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 093d4f6..59e523c 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> starget->reap_ref++;
> list_del(&sdev->siblings);
> list_del(&sdev->same_target_siblings);
> - list_del(&sdev->starved_entry);
> spin_unlock_irqrestore(sdev->host->host_lock, flags);
>
> cancel_work_sync(&sdev->event_work);
> @@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> void __scsi_remove_device(struct scsi_device *sdev)
> {
> struct device *dev = &sdev->sdev_gendev;
> + struct Scsi_Host *shost = sdev->host;
> + unsigned long flags;
>
> if (sdev->is_visible) {
> if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> @@ -977,6 +978,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
> blk_cleanup_queue(sdev->request_queue);
> cancel_work_sync(&sdev->requeue_work);
>
> + spin_lock_irqsave(shost->host_lock, flags);
> + if (!list_empty(&sdev->starved_entry))
> + list_del(&sdev->starved_entry);
> + spin_unlock_irqrestore(shost->host_lock, flags);
> +

I do not think it's that simple. If scsi_run_queue is running right now
and that function has deleted the starved entry and is now about to
access the sdev or queue, then this code above does not help and
__scsi_remove_device could just continue on and end up calling
scsi_device_dev_release_usercontext and freeing the device from under
scsi_run_queue.

I think we have to have scsi-ml do a get_device when a sdev is added to
the starved entry and then do a put_device when it is removed (must do
these under the host lock for the starved entry case too). I am not sure
if that is just a hack/papering-over of the problem and there are more
issues like this.

2012-08-04 20:18:51

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/04/12 16:46, Mike Christie wrote:
> I think we have to have scsi-ml do a get_device when a sdev is added to
> the starved entry and then do a put_device when it is removed (must do
> these under the host lock for the starved entry case too). I am not sure
> if that is just a hack/papering-over of the problem and there are more
> issues like this.

That would result in a more complex patch than the patch at the start of
this thread, isn't it ? Also, IMHO it would help to document which
functions in the scsi-ml are called with an sdev reference and which
ones not. That would make the scsi-ml code easier to verify for issues
like the one reported at the start of this thread.

Bart.

2012-08-04 22:36:50

by Mike Christie

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/04/2012 03:18 PM, Bart Van Assche wrote:
> On 08/04/12 16:46, Mike Christie wrote:
>> I think we have to have scsi-ml do a get_device when a sdev is added to
>> the starved entry and then do a put_device when it is removed (must do
>> these under the host lock for the starved entry case too). I am not sure
>> if that is just a hack/papering-over of the problem and there are more
>> issues like this.
>
> That would result in a more complex patch than the patch at the start of
> this thread, isn't it ? Also, IMHO it would help to document which

Yaah, but the original patch in this thread is still racey isn't it?

spin_unlock(shost->host_lock);

The sdev/queue could get freed by some other thread when this function
is right here, so the get_device call is now going to try to access
freed memory.


+ /* hold a reference on the device so it doesn't release device */
+ get_device(&sdev->sdev_gendev);

2012-08-06 17:56:57

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/04/12 22:36, Mike Christie wrote:
> On 08/04/2012 03:18 PM, Bart Van Assche wrote:
>> On 08/04/12 16:46, Mike Christie wrote:
>>> I think we have to have scsi-ml do a get_device when a sdev is added to
>>> the starved entry and then do a put_device when it is removed (must do
>>> these under the host lock for the starved entry case too). I am not sure
>>> if that is just a hack/papering-over of the problem and there are more
>>> issues like this.
>>
>> That would result in a more complex patch than the patch at the start of
>> this thread, isn't it ?
>
> Yaah, but the original patch in this thread is still racey isn't it?

Indeed. How about the patch below ? Scsi devices are removed from
starved_list after blk_cleanup_queue() and before put_device(). That
guarantees that inside scsi_run_queue() get_device() under host lock
will succeed.

---
drivers/scsi/scsi_lib.c | 5 +++++
drivers/scsi/scsi_sysfs.c | 7 ++++++-
2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ffd7773..bd7daec 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
continue;
}

+ get_device(&sdev->sdev_gendev);
spin_unlock(shost->host_lock);
+
spin_lock(sdev->request_queue->queue_lock);
__blk_run_queue(sdev->request_queue);
spin_unlock(sdev->request_queue->queue_lock);
+
+ put_device(&sdev->sdev_gendev);
+
spin_lock(shost->host_lock);
}
/* put any unprocessed entries back */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 093d4f6..44f232e 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
starget->reap_ref++;
list_del(&sdev->siblings);
list_del(&sdev->same_target_siblings);
- list_del(&sdev->starved_entry);
spin_unlock_irqrestore(sdev->host->host_lock, flags);

cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
void __scsi_remove_device(struct scsi_device *sdev)
{
struct device *dev = &sdev->sdev_gendev;
+ struct Scsi_Host *shost = sdev->host;
+ unsigned long flags;

if (sdev->is_visible) {
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);

+ spin_lock_irqsave(shost->host_lock, flags);
+ list_del(&sdev->starved_entry);
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
--
1.7.7

2012-08-07 08:53:53

by Chanho Min

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On Tue, Aug 7, 2012 at 2:56 AM, Bart Van Assche <[email protected]> wrote:
> Indeed. How about the patch below ? Scsi devices are removed from
> starved_list after blk_cleanup_queue() and before put_device(). That
> guarantees that inside scsi_run_queue() get_device() under host lock
> will succeed.
Thanks, IMHO, it's harmless and the simple way to solve this issue.
But, I think the second half of your patches are not required, extra
referecne is might suffice?

In addition, Is it ironic that we are careful to use put_device at
scsi_request_fn?. If we trigger the ->remove(),
It occur a oops. What about the removal of unlock/lock as patch bellow?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4037fd5..8d9eccd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1608,11 +1608,7 @@ out_delay:
if (sdev->device_busy == 0)
blk_delay_queue(q, SCSI_QUEUE_DELAY);
out:
- /* must be careful here...if we trigger the ->remove() function
- * we cannot be holding the q lock */
- spin_unlock_irq(q->queue_lock);
put_device(&sdev->sdev_gendev);
- spin_lock_irq(q->queue_lock);
}

u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)

2012-08-07 09:30:34

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/07/12 08:53, Chanho Min wrote:
> On Tue, Aug 7, 2012 at 2:56 AM, Bart Van Assche <[email protected]> wrote:
>> Indeed. How about the patch below ? Scsi devices are removed from
>> starved_list after blk_cleanup_queue() and before put_device(). That
>> guarantees that inside scsi_run_queue() get_device() under host lock
>> will succeed.
>
> Thanks, IMHO, it's harmless and the simple way to solve this issue.
> But, I think the second half of your patches are not required, extra
> referecne is might suffice?

I'm afraid that without the second half of that patch the following race
is still possible:
- sdev reference count drops to zero while scsi_run_queue() is in
progress and while that sdev is on the starved_list of its SCSI host;
scsi_device_dev_release_usercontext() call is scheduled but not yet
executed.
- scsi_run_queue() takes that sdev off the local starved_list.
- scsi_run_queue() calls get_device() and that call fails since the
sdev reference count is zero.
- scsi_device_dev_release_usercontext() gets scheduled and frees the
sdev.
- scsi_run_queue() proceeds and calls __blk_run_queue() on a freed
queue, which is what we were trying to avoid.

Bart.

2012-08-07 09:43:24

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/07/12 08:53, Chanho Min wrote:
> In addition, Is it ironic that we are careful to use put_device at
> scsi_request_fn?. If we trigger the ->remove(),
> It occur a oops. What about the removal of unlock/lock as patch bellow?
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 4037fd5..8d9eccd 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1608,11 +1608,7 @@ out_delay:
> if (sdev->device_busy == 0)
> blk_delay_queue(q, SCSI_QUEUE_DELAY);
> out:
> - /* must be careful here...if we trigger the ->remove() function
> - * we cannot be holding the q lock */
> - spin_unlock_irq(q->queue_lock);
> put_device(&sdev->sdev_gendev);
> - spin_lock_irq(q->queue_lock);
> }

As far as I can see the comment in the above code was added before
scsi_device_dev_release() was moved to user context, so it might be
outdated. See also
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=65110b2168950a19cc78b5027ed18cb811fbdae8.

Bart.

2012-08-07 16:16:41

by Mike Christie

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/06/2012 12:56 PM, Bart Van Assche wrote:
> On 08/04/12 22:36, Mike Christie wrote:
>> On 08/04/2012 03:18 PM, Bart Van Assche wrote:
>>> On 08/04/12 16:46, Mike Christie wrote:
>>>> I think we have to have scsi-ml do a get_device when a sdev is added to
>>>> the starved entry and then do a put_device when it is removed (must do
>>>> these under the host lock for the starved entry case too). I am not sure
>>>> if that is just a hack/papering-over of the problem and there are more
>>>> issues like this.
>>>
>>> That would result in a more complex patch than the patch at the start of
>>> this thread, isn't it ?
>>
>> Yaah, but the original patch in this thread is still racey isn't it?
>
> Indeed. How about the patch below ? Scsi devices are removed from
> starved_list after blk_cleanup_queue() and before put_device(). That
> guarantees that inside scsi_run_queue() get_device() under host lock
> will succeed.
>
> ---
> drivers/scsi/scsi_lib.c | 5 +++++
> drivers/scsi/scsi_sysfs.c | 7 ++++++-
> 2 files changed, 11 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index ffd7773..bd7daec 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
> continue;
> }
>
> + get_device(&sdev->sdev_gendev);
> spin_unlock(shost->host_lock);
> +
> spin_lock(sdev->request_queue->queue_lock);
> __blk_run_queue(sdev->request_queue);
> spin_unlock(sdev->request_queue->queue_lock);
> +
> + put_device(&sdev->sdev_gendev);
> +
> spin_lock(shost->host_lock);
> }
> /* put any unprocessed entries back */
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 093d4f6..44f232e 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> starget->reap_ref++;
> list_del(&sdev->siblings);
> list_del(&sdev->same_target_siblings);
> - list_del(&sdev->starved_entry);
> spin_unlock_irqrestore(sdev->host->host_lock, flags);
>
> cancel_work_sync(&sdev->event_work);
> @@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> void __scsi_remove_device(struct scsi_device *sdev)
> {
> struct device *dev = &sdev->sdev_gendev;
> + struct Scsi_Host *shost = sdev->host;
> + unsigned long flags;
>
> if (sdev->is_visible) {
> if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> @@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
> blk_cleanup_queue(sdev->request_queue);
> cancel_work_sync(&sdev->requeue_work);
>
> + spin_lock_irqsave(shost->host_lock, flags);
> + list_del(&sdev->starved_entry);
> + spin_unlock_irqrestore(shost->host_lock, flags);
> +
> if (sdev->host->hostt->slave_destroy)
> sdev->host->hostt->slave_destroy(sdev);
> transport_destroy_device(dev);
>

I think the patch will work now.

Reviewed-by: Mike Christie <[email protected]>

2012-08-08 03:42:47

by Chanho Min

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

> I'm afraid that without the second half of that patch the following race
> is still possible:
> - sdev reference count drops to zero while scsi_run_queue() is in
> progress and while that sdev is on the starved_list of its SCSI host;
> scsi_device_dev_release_usercontext() call is scheduled but not yet
> executed.
> - scsi_run_queue() takes that sdev off the local starved_list.
> - scsi_run_queue() calls get_device() and that call fails since the
> sdev reference count is zero.
> - scsi_device_dev_release_usercontext() gets scheduled and frees the
> sdev.
> - scsi_run_queue() proceeds and calls __blk_run_queue() on a freed
> queue, which is what we were trying to avoid.
Thank you for the explanation. It look correct. Let's check one more thing.
What If __scsi_remove_device doesn't release device? : reference count
is more than 2.
So We lost starved_list but device is exist. Is there any issue about this?

Chanho,

2012-08-08 07:37:35

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue

On 08/08/12 03:42, Chanho Min wrote:
> Thank you for the explanation. It look correct. Let's check one more thing.
> What If __scsi_remove_device doesn't release device? : reference count
> is more than 2.
> So We lost starved_list but device is exist. Is there any issue about this?

As far as I can see that scenario will also be handled correctly by the
proposed patch.

Bart.