2006-08-09 15:35:27

by Andreas Herrmann

[permalink] [raw]
Subject: [PATCH] limit recursion when flushing shost->starved_list

Hi,

Attached is a patch that should limit a possible recursion that can
lead to a stack overflow like follows:

Kernel stack overflow.
CPU: 3 Not tainted
Process zfcperp0.0.d819
(pid: 13897, task: 000000003e0d8cc8, ksp: 000000003499dbb8)
Krnl PSW : 0404000180000000 000000000030f8b2 (get_device+0x12/0x48)
Krnl GPRS: 00000000135a1980 000000000030f758 000000003ed6c1e8 0000000000000005
0000000000000000 000000000044a780 000000003dbf7000 0000000034e15800
000000003621c048 070000003499c108 000000003499c1a0 000000003ed6c000
0000000040895000 00000000408ab630 000000003499c0a0 000000003499c0a0
Krnl Code: a7 fb ff e8 a7 19 00 00 b9 02 00 22 e3 e0 f0 98 00 24 a7 84
Call Trace:
([<000000004089edc2>] scsi_request_fn+0x13e/0x650 [scsi_mod])
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod]
[<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod]
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
...
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod]
[<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod]
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089fa9e>] scsi_run_host_queues+0x196/0x230 [scsi_mod]
[<00000000409eba28>] zfcp_erp_thread+0x2638/0x3080 [zfcp]
[<0000000000107462>] kernel_thread_starter+0x6/0xc
[<000000000010745c>] kernel_thread_starter+0x0/0xc
<0>Kernel panic - not syncing: Corrupt kernel stack, can't continue.

This stack overflow occurred during tests on s390 using zfcp.
Recursion depth for this panic was 19.

Usually recursion between blk_run_queue and a request_fn is avoided
using QUEUE_FLAG_REENTER. But this does not help if the scsi stack
tries to flush the starved_list of a scsi_host.


Regards,

Andreas


Limit recursion depth when flushing the starved_list
of a scsi_host.

Signed-off-by: Andreas Herrmann <[email protected]>

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 077c1c6..d6743b9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -551,7 +551,15 @@ static void scsi_run_queue(struct reques
list_del_init(&sdev->starved_entry);
spin_unlock_irqrestore(shost->host_lock, flags);

- blk_run_queue(sdev->request_queue);
+
+ if (test_bit(QUEUE_FLAG_REENTER, &q->queue_flags) &&
+ !test_and_set_bit(QUEUE_FLAG_REENTER,
+ &sdev->request_queue->queue_flags)) {
+ blk_run_queue(sdev->request_queue);
+ clear_bit(QUEUE_FLAG_REENTER,
+ &sdev->request_queue->queue_flags);
+ } else
+ blk_run_queue(sdev->request_queue);

spin_lock_irqsave(shost->host_lock, flags);
if (unlikely(!list_empty(&sdev->starved_entry)))


2006-08-19 22:08:45

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] limit recursion when flushing shost->starved_list

On Wed, 2006-08-09 at 17:31 +0200, Andreas Herrmann wrote:
> Attached is a patch that should limit a possible recursion that can
> lead to a stack overflow like follows:

Well, OK, initially I'll put this in scsi-misc, since I acknowledge its
a problem. However, it is one that block queue grouping should be able
to get us out of ... and at that time, this code (and most of the
starved list handling) should exit the scsi mid-layer.

James


2006-08-21 07:02:20

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] limit recursion when flushing shost->starved_list

On Sat, Aug 19 2006, James Bottomley wrote:
> On Wed, 2006-08-09 at 17:31 +0200, Andreas Herrmann wrote:
> > Attached is a patch that should limit a possible recursion that can
> > lead to a stack overflow like follows:
>
> Well, OK, initially I'll put this in scsi-misc, since I acknowledge its
> a problem. However, it is one that block queue grouping should be able
> to get us out of ... and at that time, this code (and most of the
> starved list handling) should exit the scsi mid-layer.

Yep, the patch is a little nasty but I don't see a better way currently.
So the above outline sounds good to me.

--
Jens Axboe