LinuxLists.cc - kernel BUG at drivers/scsi/scsi

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Wed, 2015-11-18 at 20:18 +1100, Michael Ellerman wrote:
> Hi folks,
>
> I'm intermittently seeing the following oops on at least one powerpc box.
>
> The BUG_ON() is from:
>
> static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
> {
> ...
> count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> BUG_ON(count > sdb->table.nents);
>
> Looking at the dump it looks like count was 2, I can't work out what nents was.
>
> The machine's just a fairly boring bare metal setup, with a single IPR adapter:
>
> 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
> Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
> Flags: bus master, fast devsel, latency 0
> Kernel driver in use: ipr
>
>
> Anyone seen it before or have any ideas?

I'm also seeing it on arm64 in 4.4-rc1

[    6.859003] Call trace:
[    6.861439] [<fffffe000074345c>] scsi_init_sgtable+0x84/0x88
[    6.867072] [<fffffe00007434ac>] scsi_init_io+0x4c/0x1ac
[    6.872358] [<fffffe000075004c>] sd_setup_read_write_cmnd+0x44/0x844
[    6.878682] [<fffffe0000750884>] sd_init_command+0x38/0xb0
[    6.884141] [<fffffe00007436e4>] scsi_setup_cmnd+0xd8/0x13c
[    6.889686] [<fffffe000074397c>] scsi_prep_fn+0xc0/0x140
[    6.894973] [<fffffe00003bd0cc>] blk_peek_request+0x148/0x24c
[    6.900692] [<fffffe0000744754>] scsi_request_fn+0x58/0x648
[    6.906237] [<fffffe00003b8e80>] __blk_run_queue+0x40/0x58
[    6.911696] [<fffffe00003b91a4>] blk_run_queue+0x30/0x48
[    6.916983] [<fffffe00007422e4>] scsi_run_queue+0x204/0x294
[    6.922528] [<fffffe000074321c>] scsi_end_request+0x13c/0x1a0
[    6.928247] [<fffffe0000744ef4>] scsi_io_completion+0xf0/0x564
[    6.934052] [<fffffe000073beb0>] scsi_finish_command+0xe4/0x144
[    6.939943] [<fffffe00007446cc>] scsi_softirq_done+0x148/0x178
[    6.945748] [<fffffe00003c39a4>] blk_done_softirq+0x7c/0x94
[    6.951295] [<fffffe00000c8f44>] __do_softirq+0x114/0x2a0
[    6.956667] [<fffffe00000c9340>] irq_exit+0x8c/0xe4
[    6.961522] [<fffffe000009cf00>] handle_IPI+0x170/0x228
[    6.966721] [<fffffe00000904e0>] gic_handle_irq+0xa0/0xb8
[    6.972093] Exception stack(0xfffffe03dc143de0 to 0xfffffe03dc143f00)

>
> cheers
>
>
> systemd[1]: Starting Uncomplicated firewall...
>          Starting Uncomplicated firewall...
> ------------[ cut here ]------------
> kernel BUG at drivers/scsi/scsi_lib.c:1096!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 132 PID: 2699 Comm: kworker/132:1H Not tainted 4.4.0-rc1-54939-ge22a248-dirty #77
> Workqueue: kblockd cfq_kick_queue
> task: c000000fef147400 ti: c000000feb384000 task.ti: c000000feb384000
> NIP: c0000000005ab4a8 LR: c0000000005ab490 CTR: 0000000000000000
> REGS: c000000feb387620 TRAP: 0700   Not tainted  (4.4.0-rc1-54939-ge22a248-dirty)
> MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002228  XER: 00000000
> CFAR: c000000000464950 SOFTE: 0
> GPR00: c0000000005ab490 c000000feb3878a0 c000000000d77d00 0000000000000002
> GPR04: c000000ff2030158 c000000ff47a0c00 0000000000000000 0000000000001000
> GPR08: 0000000000000000 0000000000000001 0000000000000000 fffffffffffffff7
> GPR12: 0000000000002200 c00000000fde5200 c0000000000c8098 c000000000b39858
> GPR16: c000000000ae12c8 c000000000b39948 0000000000000000 0000000000000000
> GPR20: c0000000005ab9c0 c0000017f5800144 0000000000000000 c000001e551a6850
> GPR24: c0000017f5800140 c0000007efb9c800 0000000000000000 c000001e551a6800
> GPR28: 0000000000000000 0000000000000000 c000000ff2030158 c000000feb420240
> NIP [c0000000005ab4a8] scsi_init_sgtable+0xa8/0x180
> LR [c0000000005ab490] scsi_init_sgtable+0x90/0x180
> Call Trace:
> [c000000feb3878a0] [c0000000005ab490] scsi_init_sgtable+0x90/0x180 (unreliable)
> [c000000feb3878e0] [c0000000005ab5d4] scsi_init_io+0x54/0x160
> [c000000feb387930] [c0000000005fb43c] sd_init_command+0x6c/0xb00
> [c000000feb3879f0] [c0000000005ac2f8] scsi_setup_cmnd+0x108/0x1a0
> [c000000feb387a30] [c0000000005ac594] scsi_prep_fn+0x104/0x1c0
> [c000000feb387a70] [c00000000045d85c] blk_peek_request+0x20c/0x390
> [c000000feb387af0] [c0000000005ad6a8] scsi_request_fn+0xb8/0x7f0
> [c000000feb387bf0] [c0000000004583b4] __blk_run_queue+0x54/0x80
> [c000000feb387c20] [c00000000047e178] cfq_kick_queue+0x38/0xc0
> [c000000feb387c50] [c0000000000c00cc] process_one_work+0x2ac/0x560
> [c000000feb387ce0] [c0000000000c0510] worker_thread+0x190/0x660
> [c000000feb387d80] [c0000000000c8198] kthread+0x108/0x130
> [c000000feb387e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
> Instruction dump:
> 2fa30000 409e00e4 e8bf0000 e87e0030 7fc4f378 4beb91ed 60000000 813f0008
> 78630020 7f891840 7d301026 5529effe <0b090000> 907f0008 38600000 813e005c
> ---[ end trace 3ea632a512ff48d5 ]---
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

2015-11-19 01:02:48

by Michael Ellerman

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Wed, 2015-11-18 at 09:03 -0500, Mark Salter wrote:
> On Wed, 2015-11-18 at 20:18 +1100, Michael Ellerman wrote:
> > Hi folks,
> >
> > I'm intermittently seeing the following oops on at least one powerpc box.
> >
> > The BUG_ON() is from:
> >
> > static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
> > {
> > ...
> > count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> > BUG_ON(count > sdb->table.nents);
> >
> > Looking at the dump it looks like count was 2, I can't work out what nents was.
> >
> > The machine's just a fairly boring bare metal setup, with a single IPR adapter:
> >
> > 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
> > Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
> > Flags: bus master, fast devsel, latency 0
> > Kernel driver in use: ipr
> >
> >
> > Anyone seen it before or have any ideas?
>
> I'm also seeing it on arm64 in 4.4-rc1

Ah thanks, that's a good data point. I was assuming it was a driver bug, but I
assume you're not using IPR :)

> [ 6.859003] Call trace:
> [ 6.861439] [<fffffe000074345c>] scsi_init_sgtable+0x84/0x88
> [ 6.867072] [<fffffe00007434ac>] scsi_init_io+0x4c/0x1ac
> [ 6.872358] [<fffffe000075004c>] sd_setup_read_write_cmnd+0x44/0x844
> [ 6.878682] [<fffffe0000750884>] sd_init_command+0x38/0xb0
> [ 6.884141] [<fffffe00007436e4>] scsi_setup_cmnd+0xd8/0x13c
> [ 6.889686] [<fffffe000074397c>] scsi_prep_fn+0xc0/0x140
> [ 6.894973] [<fffffe00003bd0cc>] blk_peek_request+0x148/0x24c
> [ 6.900692] [<fffffe0000744754>] scsi_request_fn+0x58/0x648
> [ 6.906237] [<fffffe00003b8e80>] __blk_run_queue+0x40/0x58
> [ 6.911696] [<fffffe00003b91a4>] blk_run_queue+0x30/0x48
> [ 6.916983] [<fffffe00007422e4>] scsi_run_queue+0x204/0x294
> [ 6.922528] [<fffffe000074321c>] scsi_end_request+0x13c/0x1a0
> [ 6.928247] [<fffffe0000744ef4>] scsi_io_completion+0xf0/0x564
> [ 6.934052] [<fffffe000073beb0>] scsi_finish_command+0xe4/0x144
> [ 6.939943] [<fffffe00007446cc>] scsi_softirq_done+0x148/0x178
> [ 6.945748] [<fffffe00003c39a4>] blk_done_softirq+0x7c/0x94
> [ 6.951295] [<fffffe00000c8f44>] __do_softirq+0x114/0x2a0
> [ 6.956667] [<fffffe00000c9340>] irq_exit+0x8c/0xe4
> [ 6.961522] [<fffffe000009cf00>] handle_IPI+0x170/0x228
> [ 6.966721] [<fffffe00000904e0>] gic_handle_irq+0xa0/0xb8
> [ 6.972093] Exception stack(0xfffffe03dc143de0 to 0xfffffe03dc143f00)

cheers

2015-11-19 08:23:30

by Christoph Hellwig

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

It's pretty much guaranteed a block layer bug, most likely in the
merge bios to request infrastucture where we don't obey the merging
limits properly.

Does either of you have a known good and first known bad kernel?

2015-11-19 15:35:36

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
>
> Does either of you have a known good and first known bad kernel?

Well, I have been fighting a similar issue for several months now,
albeit with multipath enabled. Haven't had much progress with this,
sadly.
Seeing that this is our distro kernel it might or might not be
related; however, as the symptoms are identical there still is a
chance that this is actually a generic block-layer problem.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2015-11-20 12:10:52

by Michael Ellerman

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:

> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
>
> Does either of you have a known good and first known bad kernel?

Not me, I've only hit it one or two times. All I can say is I have hit it in
4.4-rc1.

Laurent, can you narrow it down at all?

cheers

2015-11-20 12:57:01

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 13:37:06

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Fri, 2015-11-20 at 13:56 +0100, Laurent Dufour wrote:
> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >
> > > It's pretty much guaranteed a block layer bug, most likely in the
> > > merge bios to request infrastucture where we don't obey the merging
> > > limits properly.
> > >
> > > Does either of you have a known good and first known bad kernel?
> >
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> >
> > Laurent, can you narrow it down at all?
>
> I'm trying bisect it but it's very time consuming, and I think I missed
> some bad kernel because I didn't let enough time to the system to panic.
>
> But I'm still on...
>
I usually hit it at boot time. Maybe one out of five times. I'll take a stab
at bisecting today...

2015-11-20 14:38:40

by Ewan Milne

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> > It's pretty much guaranteed a block layer bug, most likely in the
> > merge bios to request infrastucture where we don't obey the merging
> > limits properly.
> >
> > Does either of you have a known good and first known bad kernel?
>
> Well, I have been fighting a similar issue for several months now,
> albeit with multipath enabled. Haven't had much progress with this,
> sadly.
> Seeing that this is our distro kernel it might or might not be
> related; however, as the symptoms are identical there still is a
> chance that this is actually a generic block-layer problem.
>
> Cheers,
>
> Hannes

We have seen this also. (e.g. req->nr_phys_segments was 3, but
blk_rq_map_sg() returned 4.) I was suspicious of the patch:

bio: modify __bio_add_page() to accept pages that don't start a new segment

But we put some debugging code in and didn't hit it. We haven't
found the problem yet, either, though. We're still looking.

As Christoph said, it would seem to be a problem with the block layer
merging.

The API for this seems defective, in that blk_rq_map_sg() should
never be returning a value indicating that it overwrote past the
end of the supplied SG array and depend on the caller to check it.
(We could get data corruption on another I/O if it used adjacent
memory for a different SG list, for example.)

-Ewan

2015-11-20 14:55:25

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 11/20/2015 03:38 PM, Ewan Milne wrote:
> On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
>> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
>>> It's pretty much guaranteed a block layer bug, most likely in the
>>> merge bios to request infrastucture where we don't obey the merging
>>> limits properly.
>>>
>>> Does either of you have a known good and first known bad kernel?
>>
>> Well, I have been fighting a similar issue for several months now,
>> albeit with multipath enabled. Haven't had much progress with this,
>> sadly.
>> Seeing that this is our distro kernel it might or might not be
>> related; however, as the symptoms are identical there still is a
>> chance that this is actually a generic block-layer problem.
>>
>> Cheers,
>>
>> Hannes
>
> We have seen this also. (e.g. req->nr_phys_segments was 3, but
> blk_rq_map_sg() returned 4.) I was suspicious of the patch:
>
> bio: modify __bio_add_page() to accept pages that don't start a new segment
>
> But we put some debugging code in and didn't hit it. We haven't
> found the problem yet, either, though. We're still looking.
>
Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.

I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...

> As Christoph said, it would seem to be a problem with the block layer
> merging.
>
> The API for this seems defective, in that blk_rq_map_sg() should
> never be returning a value indicating that it overwrote past the
> end of the supplied SG array and depend on the caller to check it.
> (We could get data corruption on another I/O if it used adjacent
> memory for a different SG list, for example.)
>
Yeah, the API is bloody useless.
By the time you hit the BUG_ON you've already caused a memory
corruption, so no way you can recover there.

At the very least we should be passing in the sg list count into
blk_map_rq_sg(), but that's a core blocklayer API and changes here
would require changes by quite a set of drivers. Plus it wouldn't
help me for a distribution kernel ...

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

2015-11-20 15:28:46

by Ewan Milne

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> Can't we have a joint effort here?
> I've been spending a _LOT_ of time trying to debug things here, but
> none of the ideas I've come up with have been able to fix anything.

Yes. I'm not the one primarily looking at it, and we don't have a
reproducer in-house. We just have the one dump right now.

>
> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> by one and be done with ...
>

That might not fix it if it is a problem with the merge code, though.

2015-11-21 11:30:26

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 20/11/2015 13:10, Michael Ellerman wrote:
> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>
>> It's pretty much guaranteed a block layer bug, most likely in the
>> merge bios to request infrastucture where we don't obey the merging
>> limits properly.
>>
>> Does either of you have a known good and first known bad kernel?
>
> Not me, I've only hit it one or two times. All I can say is I have hit it in
> 4.4-rc1.
>
> Laurent, can you narrow it down at all?

It seems that the panic is triggered by the commit bdced438acd8 ("block:
setup bi_phys_segments after splitting") which has been pulled by the
merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
git://git.kernel.dk/linux-block").

My system is panicing promptly when running a kernel built at
d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
without panicing.

This being said, I can't explain what's going wrong.

May Ming shed some light here ?

Cheers

2015-11-21 16:57:30

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Sat, 21 Nov 2015 12:30:14 +0100
Laurent Dufour <[email protected]> wrote:

> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >
> >> It's pretty much guaranteed a block layer bug, most likely in the
> >> merge bios to request infrastucture where we don't obey the merging
> >> limits properly.
> >>
> >> Does either of you have a known good and first known bad kernel?
> >
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> >
> > Laurent, can you narrow it down at all?
>
> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> setup bi_phys_segments after splitting") which has been pulled by the
> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> git://git.kernel.dk/linux-block").
>
> My system is panicing promptly when running a kernel built at
> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> without panicing.
>
> This being said, I can't explain what's going wrong.
>
> May Ming shed some light here ?

Laurent, looks there is one bug in blk_bio_segment_split(), would you
mind testing the following patch to see if it fixes your issue?

---
>From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
From: Ming Lei <[email protected]>
Date: Sun, 22 Nov 2015 00:47:13 +0800
Subject: [PATCH] block: fix segment split

Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable of 'bvprv'.

Signed-off-by: Ming Lei <[email protected]>
---
block/blk-merge.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..f2efe8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,

seg_size += bv.bv_len;
bvprv = bv;
- bvprvp = &bv;
+ bvprvp = &bvprv;
sectors += bv.bv_len >> 9;
continue;
}
@@ -108,7 +108,7 @@ new_segment:

nsegs++;
bvprv = bv;
- bvprvp = &bv;
+ bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
}
--
1.9.1

Thanks,
Ming

2015-11-22 23:20:15

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> On Sat, 21 Nov 2015 12:30:14 +0100
> Laurent Dufour <[email protected]> wrote:
>
> > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > >
> > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > merge bios to request infrastucture where we don't obey the merging
> > > > limits properly.
> > > >
> > > > Does either of you have a known good and first known bad kernel?
> > >
> > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > > 4.4-rc1.
> > >
> > > Laurent, can you narrow it down at all?
> >
> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > setup bi_phys_segments after splitting") which has been pulled by the
> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > git://git.kernel.dk/linux-block").
> >
> > My system is panicing promptly when running a kernel built at
> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > without panicing.
> >
> > This being said, I can't explain what's going wrong.
> >
> > May Ming shed some light here ?
>
> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> mind testing the following patch to see if it fixes your issue?
>
> ---
> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> From: Ming Lei <[email protected]>
> Date: Sun, 22 Nov 2015 00:47:13 +0800
> Subject: [PATCH] block: fix segment split
>
> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> always points to the iterator local variable, which is obviously
> wrong, so fix it by pointing to the local variable of 'bvprv'.
>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> block/blk-merge.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index de5716d8..f2efe8a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>
> seg_size += bv.bv_len;
> bvprv = bv;
> - bvprvp = &bv;
> + bvprvp = &bvprv;
> sectors += bv.bv_len >> 9;
> continue;
> }
> @@ -108,7 +108,7 @@ new_segment:
>
> nsegs++;
> bvprv = bv;
> - bvprvp = &bv;
> + bvprvp = &bvprv;
> seg_size = bv.bv_len;
> sectors += bv.bv_len >> 9;
> }

I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

2015-11-23 00:36:21

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <[email protected]> wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <[email protected]> wrote:
>>
>> > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > >
>> > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > merge bios to request infrastucture where we don't obey the merging
>> > > > limits properly.
>> > > >
>> > > > Does either of you have a known good and first known bad kernel?
>> > >
>> > > Not me, I've only hit it one or two times. All I can say is I have hit it in
>> > > 4.4-rc1.
>> > >
>> > > Laurent, can you narrow it down at all?
>> >
>> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > setup bi_phys_segments after splitting") which has been pulled by the
>> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > git://git.kernel.dk/linux-block").
>> >
>> > My system is panicing promptly when running a kernel built at
>> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > without panicing.
>> >
>> > This being said, I can't explain what's going wrong.
>> >
>> > May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <[email protected]>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <[email protected]>
>> ---
>> block/blk-merge.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>
>> seg_size += bv.bv_len;
>> bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>> sectors += bv.bv_len >> 9;
>> continue;
>> }
>> @@ -108,7 +108,7 @@ new_segment:
>>
>> nsegs++;
>> bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>> seg_size = bv.bv_len;
>> sectors += bv.bv_len >> 9;
>> }
>
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

OK, looks there are still other bugs, care to share us how to reproduce
it on arm64?

thanks,
Ming

2015-11-23 01:50:15

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <[email protected]> wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> > > On Sat, 21 Nov 2015 12:30:14 +0100
> > > Laurent Dufour <[email protected]> wrote:
> > >
> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > > >
> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > > > merge bios to request infrastucture where we don't obey the merging
> > > > > > limits properly.
> > > > > >
> > > > > > Does either of you have a known good and first known bad kernel?
> > > > >
> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > > > > 4.4-rc1.
> > > > >
> > > > > Laurent, can you narrow it down at all?
> > > >
> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > > > setup bi_phys_segments after splitting") which has been pulled by the
> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > > > git://git.kernel.dk/linux-block").
> > > >
> > > > My system is panicing promptly when running a kernel built at
> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > > > without panicing.
> > > >
> > > > This being said, I can't explain what's going wrong.
> > > >
> > > > May Ming shed some light here ?
> > >
> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> > > mind testing the following patch to see if it fixes your issue?
> > >
> > > ---
> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> > > From: Ming Lei <[email protected]>
> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> > > Subject: [PATCH] block: fix segment split
> > >
> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> > > always points to the iterator local variable, which is obviously
> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> > >
> > > Signed-off-by: Ming Lei <[email protected]>
> > > ---
> > > block/blk-merge.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > > index de5716d8..f2efe8a 100644
> > > --- a/block/blk-merge.c
> > > +++ b/block/blk-merge.c
> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> > >
> > >                       seg_size += bv.bv_len;
> > >                       bvprv = bv;
> > > -                     bvprvp = &bv;
> > > +                     bvprvp = &bvprv;
> > >                       sectors += bv.bv_len >> 9;
> > >                       continue;
> > >               }
> > > @@ -108,7 +108,7 @@ new_segment:
> > >
> > >               nsegs++;
> > >               bvprv = bv;
> > > -             bvprvp = &bv;
> > > +             bvprvp = &bvprv;
> > >               seg_size = bv.bv_len;
> > >               sectors += bv.bv_len >> 9;
> > >       }
> >
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>
> OK, looks there are still other bugs, care to share us how to reproduce
> it on arm64?
>
> thanks,
> Ming

Unfortunately, the best reproducer I have is to boot the platform. I have seen the
BUG a few times post-boot, but I don't have a consistant reproducer. I am using
upstream 4.4-rc1 with this config:

  http://people.redhat.com/msalter/fh_defconfig

With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
On an AMD Seattle platform, about every 9 boots.

I have a script that loops through an ssh command to reboot the platform under test.
I manually install test kernels and then run the script and wait for failure. While
debugging, I have tried more minimal configs with which I have been unable to
reproduce the problem even after several hours of reboots. With the above mentioned
fh_defconfig, I have been able to get a failure within 20 or so boots with most
kernel builds but at certain kernel commits, the failure has taken a longer time to
reproduce.

>From my POV, I can't say which commit causes the problem. So far, I have not been
able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).

2015-11-23 02:46:26

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

Hi Mark,

On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <[email protected]> wrote:
> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <[email protected]> wrote:
>> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> > > On Sat, 21 Nov 2015 12:30:14 +0100
>> > > Laurent Dufour <[email protected]> wrote:
>> > >
>> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > > > >
>> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > > > merge bios to request infrastucture where we don't obey the merging
>> > > > > > limits properly.
>> > > > > >
>> > > > > > Does either of you have a known good and first known bad kernel?
>> > > > >
>> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
>> > > > > 4.4-rc1.
>> > > > >
>> > > > > Laurent, can you narrow it down at all?
>> > > >
>> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > > > setup bi_phys_segments after splitting") which has been pulled by the
>> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > > > git://git.kernel.dk/linux-block").
>> > > >
>> > > > My system is panicing promptly when running a kernel built at
>> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > > > without panicing.
>> > > >
>> > > > This being said, I can't explain what's going wrong.
>> > > >
>> > > > May Ming shed some light here ?
>> > >
>> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> > > mind testing the following patch to see if it fixes your issue?
>> > >
>> > > ---
>> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> > > From: Ming Lei <[email protected]>
>> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
>> > > Subject: [PATCH] block: fix segment split
>> > >
>> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> > > always points to the iterator local variable, which is obviously
>> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
>> > >
>> > > Signed-off-by: Ming Lei <[email protected]>
>> > > ---
>> > > block/blk-merge.c | 4 ++--
>> > > 1 file changed, 2 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
>> > > index de5716d8..f2efe8a 100644
>> > > --- a/block/blk-merge.c
>> > > +++ b/block/blk-merge.c
>> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>> > >
>> > > seg_size += bv.bv_len;
>> > > bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > > sectors += bv.bv_len >> 9;
>> > > continue;
>> > > }
>> > > @@ -108,7 +108,7 @@ new_segment:
>> > >
>> > > nsegs++;
>> > > bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > > seg_size = bv.bv_len;
>> > > sectors += bv.bv_len >> 9;
>> > > }
>> >
>> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> OK, looks there are still other bugs, care to share us how to reproduce
>> it on arm64?
>>
>> thanks,
>> Ming
>
> Unfortunately, the best reproducer I have is to boot the platform. I have seen the
> BUG a few times post-boot, but I don't have a consistant reproducer. I am using
> upstream 4.4-rc1 with this config:
>
> http://people.redhat.com/msalter/fh_defconfig
>
> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
> On an AMD Seattle platform, about every 9 boots.

Thanks for the input, and I will try to reproduce the issue on mustang with
your kernel config.

>
> I have a script that loops through an ssh command to reboot the platform under test.
> I manually install test kernels and then run the script and wait for failure. While
> debugging, I have tried more minimal configs with which I have been unable to
> reproduce the problem even after several hours of reboots. With the above mentioned
> fh_defconfig, I have been able to get a failure within 20 or so boots with most
> kernel builds but at certain kernel commits, the failure has taken a longer time to
> reproduce.
>
> From my POV, I can't say which commit causes the problem. So far, I have not been
> able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
> reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).

The patch for fixing 'bvprvp' is better to be included for test,
because that issue
may have a big effect on computing physical seg count.

Also I appreciate if you, Laurent or anyone may provide debug log which
can be captured with the attached debug patch when this issue is trigered .

Thanks,

Attachments:

dbg_scsi_blk_merge.patch (1.40 kB)

2015-11-23 06:55:52

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 13:57:30

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 23/11/2015 00:20, Mark Salter wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <[email protected]> wrote:
>>
>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>
>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>> limits properly.
>>>>>
>>>>> Does either of you have a known good and first known bad kernel?
>>>>
>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>> 4.4-rc1.
>>>>
>>>> Laurent, can you narrow it down at all?
>>>
>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>> setup bi_phys_segments after splitting") which has been pulled by the
>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>> git://git.kernel.dk/linux-block").
>>>
>>> My system is panicing promptly when running a kernel built at
>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>> without panicing.
>>>
>>> This being said, I can't explain what's going wrong.
>>>
>>> May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <[email protected]>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <[email protected]>
>> ---
>> block/blk-merge.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>
>> seg_size += bv.bv_len;
>> bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>> sectors += bv.bv_len >> 9;
>> continue;
>> }
>> @@ -108,7 +108,7 @@ new_segment:
>>
>> nsegs++;
>> bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>> seg_size = bv.bv_len;
>> sectors += bv.bv_len >> 9;
>> }
>
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

On my side, with the patch applied on top of 4.4-rc1, I can't get the
panic anymore.

2015-11-23 15:13:35

by Pratyush Anand

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
> On 23/11/2015 00:20, Mark Salter wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> On Sat, 21 Nov 2015 12:30:14 +0100
> >> Laurent Dufour <[email protected]> wrote:
> >>
> >>> On 20/11/2015 13:10, Michael Ellerman wrote:
> >>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >>>>
> >>>>> It's pretty much guaranteed a block layer bug, most likely in the
> >>>>> merge bios to request infrastucture where we don't obey the merging
> >>>>> limits properly.
> >>>>>
> >>>>> Does either of you have a known good and first known bad kernel?
> >>>>
> >>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
> >>>> 4.4-rc1.
> >>>>
> >>>> Laurent, can you narrow it down at all?
> >>>
> >>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> >>> setup bi_phys_segments after splitting") which has been pulled by the
> >>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >>> git://git.kernel.dk/linux-block").
> >>>
> >>> My system is panicing promptly when running a kernel built at
> >>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> >>> without panicing.
> >>>
> >>> This being said, I can't explain what's going wrong.
> >>>
> >>> May Ming shed some light here ?
> >>
> >> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> mind testing the following patch to see if it fixes your issue?
> >>
> >> ---
> >> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> From: Ming Lei <[email protected]>
> >> Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> Subject: [PATCH] block: fix segment split
> >>
> >> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> always points to the iterator local variable, which is obviously
> >> wrong, so fix it by pointing to the local variable of 'bvprv'.
> >>
> >> Signed-off-by: Ming Lei <[email protected]>
> >> ---
> >> block/blk-merge.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> index de5716d8..f2efe8a 100644
> >> --- a/block/blk-merge.c
> >> +++ b/block/blk-merge.c
> >> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >>
> >> seg_size += bv.bv_len;
> >> bvprv = bv;
> >> - bvprvp = &bv;
> >> + bvprvp = &bvprv;
> >> sectors += bv.bv_len >> 9;
> >> continue;
> >> }
> >> @@ -108,7 +108,7 @@ new_segment:
> >>
> >> nsegs++;
> >> bvprv = bv;
> >> - bvprvp = &bv;
> >> + bvprvp = &bvprv;
> >> seg_size = bv.bv_len;
> >> sectors += bv.bv_len >> 9;
> >> }
> >
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>
> On my side, with the patch applied on top of 4.4-rc1, I can't get the
> panic anymore.

git bisect shows:

bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
commit bdced438acd83ad83a6c6fc7f50099b820245ddb
Author: Ming Lei <[email protected]>
Date: Tue Oct 20 23:13:52 2015 +0800

block: setup bi_phys_segments after splitting

Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.

~Pratyush

2015-11-23 15:20:47

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 23/11/2015 16:13, Pratyush Anand wrote:
> On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
>> On 23/11/2015 00:20, Mark Salter wrote:
>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>> Laurent Dufour <[email protected]> wrote:
>>>>
>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>
>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>> limits properly.
>>>>>>>
>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>
>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>> 4.4-rc1.
>>>>>>
>>>>>> Laurent, can you narrow it down at all?
>>>>>
>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>> git://git.kernel.dk/linux-block").
>>>>>
>>>>> My system is panicing promptly when running a kernel built at
>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>> without panicing.
>>>>>
>>>>> This being said, I can't explain what's going wrong.
>>>>>
>>>>> May Ming shed some light here ?
>>>>
>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>> mind testing the following patch to see if it fixes your issue?
>>>>
>>>> ---
>>>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>> From: Ming Lei <[email protected]>
>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>> Subject: [PATCH] block: fix segment split
>>>>
>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>> always points to the iterator local variable, which is obviously
>>>> wrong, so fix it by pointing to the local variable of 'b
> ~Pratyush
>
vprv'.
>>>>
>>>> Signed-off-by: Ming Lei <[email protected]>
>>>> ---
>>>> block/blk-merge.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>> index de5716d8..f2efe8a 100644
>>>> --- a/block/blk-merge.c
>>>> +++ b/block/blk-merge.c
>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>
>>>> seg_size += bv.bv_len;bdced438acd8
>>>> bvprv = bv;
>>>> - bvprvp = &bv;
>>>> + bvprvp = &bvprv;
>>>> sectors += bv.bv_len >> 9;
>>>> continue;
>>>> }
>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>
>>>> nsegs++;
>>>> bvprv = bv;
>>>> - bvprvp = &bv;
>>>> + bvprvp = &bvprv;
>>>> seg_size = bv.bv_len;
>>>> sectors += bv.bv_len >> 9;
>>>> }
>>>
>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> On my side, with the patch applied on top of 4.4-rc1, I can't get the
>> panic anymore.
>
> git bisect shows:
>
> bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
> commit bdced438acd83ad83a6c6fc7f50099b820245ddb
> Author: Ming Lei <[email protected]>
> Date: Tue Oct 20 23:13:52 2015 +0800
>
> block: setup bi_phys_segments after splitting
>
> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.

That's what I mentioned earlier ;)

Now Ming send an additional patch with seems to fix the bug introduced
through the commit bdced438acd8. When testing with this new patch I
can't get the panic anymore, but Mark reported he is still hitting it.

2015-11-23 15:21:53

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Mon, 23 Nov 2015 10:46:20 +0800
Ming Lei <[email protected]> wrote:

> Hi Mark,
>
> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <[email protected]> wrote:
> > On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> >> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <[email protected]> wrote:
> >> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> > > On Sat, 21 Nov 2015 12:30:14 +0100
> >> > > Laurent Dufour <[email protected]> wrote:
> >> > >
> >> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> >> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >> > > > >
> >> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> >> > > > > > merge bios to request infrastucture where we don't obey the merging
> >> > > > > > limits properly.
> >> > > > > >
> >> > > > > > Does either of you have a known good and first known bad kernel?
> >> > > > >
> >> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> >> > > > > 4.4-rc1.
> >> > > > >
> >> > > > > Laurent, can you narrow it down at all?
> >> > > >
> >> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> >> > > > setup bi_phys_segments after splitting") which has been pulled by the
> >> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >> > > > git://git.kernel.dk/linux-block").
> >> > > >
> >> > > > My system is panicing promptly when running a kernel built at
> >> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> >> > > > without panicing.
> >> > > >
> >> > > > This being said, I can't explain what's going wrong.
> >> > > >
> >> > > > May Ming shed some light here ?
> >> > >
> >> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> > > mind testing the following patch to see if it fixes your issue?
> >> > >
> >> > > ---
> >> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> > > From: Ming Lei <[email protected]>
> >> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> > > Subject: [PATCH] block: fix segment split
> >> > >
> >> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> > > always points to the iterator local variable, which is obviously
> >> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> >> > >
> >> > > Signed-off-by: Ming Lei <[email protected]>
> >> > > ---
> >> > > block/blk-merge.c | 4 ++--
> >> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> >> > >
> >> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> > > index de5716d8..f2efe8a 100644
> >> > > --- a/block/blk-merge.c
> >> > > +++ b/block/blk-merge.c
> >> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >> > >
> >> > > seg_size += bv.bv_len;
> >> > > bvprv = bv;
> >> > > - bvprvp = &bv;
> >> > > + bvprvp = &bvprv;
> >> > > sectors += bv.bv_len >> 9;
> >> > > continue;
> >> > > }
> >> > > @@ -108,7 +108,7 @@ new_segment:
> >> > >
> >> > > nsegs++;
> >> > > bvprv = bv;
> >> > > - bvprvp = &bv;
> >> > > + bvprvp = &bvprv;
> >> > > seg_size = bv.bv_len;
> >> > > sectors += bv.bv_len >> 9;
> >> > > }
> >> >
> >> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> >>
> >> OK, looks there are still other bugs, care to share us how to reproduce
> >> it on arm64?
> >>
> >> thanks,
> >> Ming
> >
> > Unfortunately, the best reproducer I have is to boot the platform. I have seen the
> > BUG a few times post-boot, but I don't have a consistant reproducer. I am using
> > upstream 4.4-rc1 with this config:
> >
> > http://people.redhat.com/msalter/fh_defconfig
> >
> > With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
> > On an AMD Seattle platform, about every 9 boots.
>
> Thanks for the input, and I will try to reproduce the issue on mustang with
> your kernel config.

I can reproduce the issue on mustang, and looks I may understand the story now.

When 64K page size is used on arm64, and the default segment size of block
is 65536, then one segment should only include one page at most.

Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
regression.

Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
and bio->bi_seg_back_size in blk_bio_segment_split().

Mark, thanks again for providing the reproduction steps, and could you run your
test to see if it can fix your issue?

---
>From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
From: Ming Lei <[email protected]>
Date: Mon, 23 Nov 2015 20:27:23 +0800
Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split

Commit bdced438acd83a(block: setup bi_phys_segments after
splitting) introduces function of computing bio->bi_phys_segments
during bio splitting.

Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
arn't computed, so too many physical segments may be obtained
for one request since both the two are used to check if one segment
across two bios can be possible.

This patch fixes the issue by computing the two variables in
blk_bio_segment_split().

Reported-by: Michael Ellerman <[email protected]>
Reported-by: Mark Salter <[email protected]>
Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
Signed-off-by: Ming Lei <[email protected]>
---
block/blk-merge.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f2efe8a..50793cd 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
struct bio_vec bv, bvprv, *bvprvp = NULL;
struct bvec_iter iter;
unsigned seg_size = 0, nsegs = 0, sectors = 0;
+ unsigned front_seg_size = bio->bi_seg_front_size;
+ bool do_split = true;
+ struct bio *new = NULL;

bio_for_each_segment(bv, bio, iter) {
if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
@@ -111,13 +114,26 @@ new_segment:
bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
+
+ if (nsegs == 1 && seg_size > front_seg_size)
+ front_seg_size = seg_size;
}

- *segs = nsegs;
- return NULL;
+ do_split = false;
split:
*segs = nsegs;
- return bio_split(bio, sectors, GFP_NOIO, bs);
+
+ if (do_split) {
+ new = bio_split(bio, sectors, GFP_NOIO, bs);
+ if (new)
+ bio = new;
+ }
+
+ bio->bi_seg_front_size = front_seg_size;
+ if (seg_size > bio->bi_seg_back_size)
+ bio->bi_seg_back_size = seg_size;
+
+ return do_split ? new : NULL;
}

void blk_queue_split(struct request_queue *q, struct bio **bio,
--
1.9.1

Thanks,

2015-11-23 15:27:23

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
<[email protected]> wrote:
>>
>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>
> That's what I mentioned earlier ;)
>
> Now Ming send an additional patch with seems to fix the bug introduced
> through the commit bdced438acd8. When testing with this new patch I
> can't get the panic anymore, but Mark reported he is still hitting it.

Laurent, thanks for your test on the 1st patch, and looks there are at
least two problems, and my 2nd patch sent just now should address
Mark's issue which is caused by bdced438acd83a.

Once the 2nd one is tested OK, I will send out the two together.

Thanks,
Ming

2015-11-23 16:24:42

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 23/11/2015 16:27, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
> <[email protected]> wrote:
>>>
>>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>>
>> That's what I mentioned earlier ;)
>>
>> Now Ming send an additional patch with seems to fix the bug introduced
>> through the commit bdced438acd8. When testing with this new patch I
>> can't get the panic anymore, but Mark reported he is still hitting it.
>
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
>
> Once the 2nd one is tested OK, I will send out the two together.

FWIW, I applied the 2nd patch, and my system is still running like a charm.

Cheers,
Laurent.

2015-11-24 01:30:18

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Mon, 2015-11-23 at 23:27 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
> <[email protected]> wrote:
> > >
> > > Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
> >
> > That's what I mentioned earlier ;)
> >
> > Now Ming send an additional patch with seems to fix the bug introduced
> > through the commit bdced438acd8. When testing with this new patch I
> > can't get the panic anymore, but Mark reported he is still hitting it.
>
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
>
> Once the 2nd one is tested OK, I will send out the two together.
>
> Thanks,
> Ming

Thanks Ming.
With both patches applied, I have been unable to reproduce the problem.

2015-11-24 19:35:47

by Alan Ott

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 11/23/2015 10:21 AM, Ming Lei wrote:
> On Mon, 23 Nov 2015 10:46:20 +0800
> Ming Lei <[email protected]> wrote:
>
>> Hi Mark,
>>
>> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <[email protected]> wrote:
>>> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>>>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <[email protected]> wrote:
>>>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>>>> Laurent Dufour <[email protected]> wrote:
>>>>>>
>>>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>>>> limits properly.
>>>>>>>>>
>>>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>>>
>>>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>>>> 4.4-rc1.
>>>>>>>>
>>>>>>>> Laurent, can you narrow it down at all?
>>>>>>>
>>>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>>>> git://git.kernel.dk/linux-block").
>>>>>>>
>>>>>>> My system is panicing promptly when running a kernel built at
>>>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>>>> without panicing.
>>>>>>>
>>>>>>> This being said, I can't explain what's going wrong.
>>>>>>>
>>>>>>> May Ming shed some light here ?
>>>>>>
>>>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>>>> mind testing the following patch to see if it fixes your issue?
>>>>>>
>>>>>> ---
>>>>>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>>>> From: Ming Lei <[email protected]>
>>>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>>>> Subject: [PATCH] block: fix segment split
>>>>>>
>>>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>>>> always points to the iterator local variable, which is obviously
>>>>>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>>>>>
>>>>>> Signed-off-by: Ming Lei <[email protected]>
>>>>>> ---
>>>>>> block/blk-merge.c | 4 ++--
>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>>>> index de5716d8..f2efe8a 100644
>>>>>> --- a/block/blk-merge.c
>>>>>> +++ b/block/blk-merge.c
>>>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>>>
>>>>>> seg_size += bv.bv_len;
>>>>>> bvprv = bv;
>>>>>> - bvprvp = &bv;
>>>>>> + bvprvp = &bvprv;
>>>>>> sectors += bv.bv_len >> 9;
>>>>>> continue;
>>>>>> }
>>>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>>>
>>>>>> nsegs++;
>>>>>> bvprv = bv;
>>>>>> - bvprvp = &bv;
>>>>>> + bvprvp = &bvprv;
>>>>>> seg_size = bv.bv_len;
>>>>>> sectors += bv.bv_len >> 9;
>>>>>> }
>>>>>
>>>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>>>
>>>> OK, looks there are still other bugs, care to share us how to reproduce
>>>> it on arm64?
>>>>
>>>> thanks,
>>>> Ming
>>>
>>> Unfortunately, the best reproducer I have is to boot the platform. I have seen the
>>> BUG a few times post-boot, but I don't have a consistant reproducer. I am using
>>> upstream 4.4-rc1 with this config:
>>>
>>> http://people.redhat.com/msalter/fh_defconfig
>>>
>>> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
>>> On an AMD Seattle platform, about every 9 boots.
>>
>> Thanks for the input, and I will try to reproduce the issue on mustang with
>> your kernel config.
>
> I can reproduce the issue on mustang, and looks I may understand the story now.
>
> When 64K page size is used on arm64, and the default segment size of block
> is 65536, then one segment should only include one page at most.
>
> Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
> compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
> may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
> bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
> regression.
>
> Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
> and bio->bi_seg_back_size in blk_bio_segment_split().
>
> Mark, thanks again for providing the reproduction steps, and could you run your
> test to see if it can fix your issue?
>
> ---
> From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
> From: Ming Lei <[email protected]>
> Date: Mon, 23 Nov 2015 20:27:23 +0800
> Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split
>
> Commit bdced438acd83a(block: setup bi_phys_segments after
> splitting) introduces function of computing bio->bi_phys_segments
> during bio splitting.
>
> Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
> arn't computed, so too many physical segments may be obtained
> for one request since both the two are used to check if one segment
> across two bios can be possible.
>
> This patch fixes the issue by computing the two variables in
> blk_bio_segment_split().
>
> Reported-by: Michael Ellerman <[email protected]>
> Reported-by: Mark Salter <[email protected]>
> Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
> Signed-off-by: Ming Lei <[email protected]>
> ---
> block/blk-merge.c | 22 +++++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index f2efe8a..50793cd 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> struct bio_vec bv, bvprv, *bvprvp = NULL;
> struct bvec_iter iter;
> unsigned seg_size = 0, nsegs = 0, sectors = 0;
> + unsigned front_seg_size = bio->bi_seg_front_size;
> + bool do_split = true;
> + struct bio *new = NULL;
>
> bio_for_each_segment(bv, bio, iter) {
> if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> @@ -111,13 +114,26 @@ new_segment:
> bvprvp = &bvprv;
> seg_size = bv.bv_len;
> sectors += bv.bv_len >> 9;
> +
> + if (nsegs == 1 && seg_size > front_seg_size)
> + front_seg_size = seg_size;
> }
>
> - *segs = nsegs;
> - return NULL;
> + do_split = false;
> split:
> *segs = nsegs;
> - return bio_split(bio, sectors, GFP_NOIO, bs);
> +
> + if (do_split) {
> + new = bio_split(bio, sectors, GFP_NOIO, bs);
> + if (new)
> + bio = new;
> + }
> +
> + bio->bi_seg_front_size = front_seg_size;
> + if (seg_size > bio->bi_seg_back_size)
> + bio->bi_seg_back_size = seg_size;
> +
> + return do_split ? new : NULL;
> }
>
> void blk_queue_split(struct request_queue *q, struct bio **bio,
>

I applied both your patches and tested on Overdrive 3000. This fixes the
issue for me.

Added linux-arm-kernel, since arm64 triggers this issue.

Thanks Ming and testers for your hard work on this.

Tested-by: Alan Ott <[email protected]>

2015-11-25 09:04:48

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 11/20/2015 04:28 PM, Ewan Milne wrote:
> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>> Can't we have a joint effort here?
>> I've been spending a _LOT_ of time trying to debug things here, but
>> none of the ideas I've come up with have been able to fix anything.
>
> Yes. I'm not the one primarily looking at it, and we don't have a
> reproducer in-house. We just have the one dump right now.
>
>>
>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>> by one and be done with ...
>>
>
> That might not fix it if it is a problem with the merge code, though.
>
And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1: seg_boundary_mask = 65535,
path-2: seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1: seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
->blk_insert_cloned_request()
->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
*/
int blk_rq_check_limits(struct request_queue *q, struct request *rq)
{
- if (!rq_mergeable(rq))
+ if (rq->cmd_type != REQ_TYPE_FS)
return 0;

if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {

Mike? Jens?
Can you comment on it?

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

2015-11-25 17:57:12

by Jens Axboe

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>> Can't we have a joint effort here?
>>> I've been spending a _LOT_ of time trying to debug things here, but
>>> none of the ideas I've come up with have been able to fix anything.
>>
>> Yes. I'm not the one primarily looking at it, and we don't have a
>> reproducer in-house. We just have the one dump right now.
>>
>>>
>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>> by one and be done with ...
>>>
>>
>> That might not fix it if it is a problem with the merge code, though.
>>
> And indeed, it doesn't.
> Seems I finally found the culprit.
>
> What happens is this:
> We have two paths, with these seg_boundary_masks:
>
> path-1: seg_boundary_mask = 65535,
> path-2: seg_boundary_mask = 4294967295,
>
> consequently the DM request queue has this:
>
> md-1: seg_boundary_mask = 65535,
>
> What happens now is that a request is being formatted, and sent
> to path 2. During submission req->nr_phys_segments is formatted
> with the limits of path 2, arriving at a count of 3.
> Now the request gets retried on path 1, but as the NOMERGE request
> flag is set req->nr_phys_segments is never updated.
> But blk_rq_map_sg() ignores all counters, and just uses the
> bi_vec directly, resulting in a count of 4 -> boom.
>
> So the culprit here is the NOMERGE flag, which is evaluated
> via
> ->dm_dispatch_request()
> ->blk_insert_cloned_request()
> ->blk_rq_check_limits()
>
> If the above assessment is correct, the following patch should
> fix it:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 801ced7..12cccd6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> */
> int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> {
> - if (!rq_mergeable(rq))
> + if (rq->cmd_type != REQ_TYPE_FS)
> return 0;
>
> if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> rq->cmd_flags)) {
>
>
> Mike? Jens?
> Can you comment on it?

We only support merging on REQ_TYPE_FS already, so how is the above
making it any different? In general, NOMERGE being set or not should not
make a difference. It's only a hint that we need not check further if we
should be merging on this request, since we already tried it once, found
we'd exceed various limits, then set NOMERGE to reflect that.

--
Jens Axboe

2015-11-25 18:01:50

by Mike Snitzer

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Wed, Nov 25 2015 at 4:04am -0500,
Hannes Reinecke <[email protected]> wrote:

> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> > On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >> Can't we have a joint effort here?
> >> I've been spending a _LOT_ of time trying to debug things here, but
> >> none of the ideas I've come up with have been able to fix anything.
> >
> > Yes. I'm not the one primarily looking at it, and we don't have a
> > reproducer in-house. We just have the one dump right now.
> >
> >>
> >> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >> by one and be done with ...
> >>
> >
> > That might not fix it if it is a problem with the merge code, though.
> >
> And indeed, it doesn't.

How did you arrive at that? Do you have a reproducer now?

> Seems I finally found the culprit.
>
> What happens is this:
> We have two paths, with these seg_boundary_masks:
>
> path-1: seg_boundary_mask = 65535,
> path-2: seg_boundary_mask = 4294967295,
>
> consequently the DM request queue has this:
>
> md-1: seg_boundary_mask = 65535,
>
> What happens now is that a request is being formatted, and sent
> to path 2. During submission req->nr_phys_segments is formatted
> with the limits of path 2, arriving at a count of 3.
> Now the request gets retried on path 1, but as the NOMERGE request
> flag is set req->nr_phys_segments is never updated.
> But blk_rq_map_sg() ignores all counters, and just uses the
> bi_vec directly, resulting in a count of 4 -> boom.
>
> So the culprit here is the NOMERGE flag,

NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.

> which is evaluated via
> ->dm_dispatch_request()
> ->blk_insert_cloned_request()
> ->blk_rq_check_limits()

blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
anyway after reading your mail I'm still left wondering if your proposed
patch is correct.

> If the above assessment is correct, the following patch should
> fix it:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 801ced7..12cccd6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> */
> int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> {
> - if (!rq_mergeable(rq))
> + if (rq->cmd_type != REQ_TYPE_FS)
> return 0;
>
> if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> rq->cmd_flags)) {
>
>
> Mike? Jens?
> Can you comment on it?

You're not explaining the actual change in the patch very well; I think
you're correct but you're leaving the justification as an exercise to
the reviewer:

blk_rq_check_limits() will call blk_recalc_rq_segments() after the
!rq_mergeable() check but you're saying for this case in question we
never get there -- due to the cloned request having NOMERGE set.

So in blk_rq_check_limits() you've unrolled rq_mergeable() and
open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)

I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
sense for cloned requests that always have NOMERGE set.

So you're saying that by having blk_rq_check_limits() go on to call
blk_recalc_rq_segments() this bug will be fixed?

BTW, I think blk_rq_check_limits()'s export should be removed and the
function made static and renamed to blk_clone_rq_check_limits(), again:
blk_insert_cloned_request() is the only caller of blk_rq_check_limits()

Seems prudent to make that change now to be clear that this code is only
used by cloned requests.

Thanks,
Mike

2015-11-25 19:02:04

[permalink] [raw]

Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> On Wed, Nov 25 2015 at 4:04am -0500,
> Hannes Reinecke <[email protected]> wrote:
>
>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>> Can't we have a joint effort here?
>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>> none of the ideas I've come up with have been able to fix anything.
>>>
>>> Yes. I'm not the one primarily looking at it, and we don't have a
>>> reproducer in-house. We just have the one dump right now.
>>>
>>>>
>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>> by one and be done with ...
>>>>
>>>
>>> That might not fix it if it is a problem with the merge code, though.
>>>
>> And indeed, it doesn't.
>
> How did you arrive at that? Do you have a reproducer now?
>
Not a reproducer, but several dumps for analysis.

>> Seems I finally found the culprit.
>>
>> What happens is this:
>> We have two paths, with these seg_boundary_masks:
>>
>> path-1: seg_boundary_mask = 65535,
>> path-2: seg_boundary_mask = 4294967295,
>>
>> consequently the DM request queue has this:
>>
>> md-1: seg_boundary_mask = 65535,
>>
>> What happens now is that a request is being formatted, and sent
>> to path 2. During submission req->nr_phys_segments is formatted
>> with the limits of path 2, arriving at a count of 3.
>> Now the request gets retried on path 1, but as the NOMERGE request
>> flag is set req->nr_phys_segments is never updated.
>> But blk_rq_map_sg() ignores all counters, and just uses the
>> bi_vec directly, resulting in a count of 4 -> boom.
>>
>> So the culprit here is the NOMERGE flag,
>
> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
>
Yes.

>> which is evaluated via
>> ->dm_dispatch_request()
>> ->blk_insert_cloned_request()
>> ->blk_rq_check_limits()
>
> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> anyway after reading your mail I'm still left wondering if your proposed
> patch is correct.
>
>> If the above assessment is correct, the following patch should
>> fix it:
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 801ced7..12cccd6 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>> */
>> int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>> {
>> - if (!rq_mergeable(rq))
>> + if (rq->cmd_type != REQ_TYPE_FS)
>> return 0;
>>
>> if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>> rq->cmd_flags)) {
>>
>>
>> Mike? Jens?
>> Can you comment on it?
>
> You're not explaining the actual change in the patch very well; I think
> you're correct but you're leaving the justification as an exercise to
> the reviewer:
>
> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> !rq_mergeable() check but you're saying for this case in question we
> never get there -- due to the cloned request having NOMERGE set.
>
> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
>
> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> sense for cloned requests that always have NOMERGE set.
>
> So you're saying that by having blk_rq_check_limits() go on to call
> blk_recalc_rq_segments() this bug will be fixed?
>
That is the idea.

I've already established that in all instances I have seen so far
req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.

As it turns out, req->nr_phys_segemnts _would_ have been updated in
blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
for the cloned request.
So each cloned request inherits the values from the original request,
despite the fact that req->nr_phys_segments _has_ to be evaluated in
the final request_queue context, as the queue limits _might_ be
different from the original (merged) queue limits of the multipath
request queue.

> BTW, I think blk_rq_check_limits()'s export should be removed and the
> function made static and renamed to blk_clone_rq_check_limits(), again:
> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
>
Actually, seeing Jens' last comment the check for REQ_TYPE_FS is
pointless, too, so we might as well remove the entire if-clause.

> Seems prudent to make that change now to be clear that this code is only
> used by cloned requests.
>
Yeah, that would make sense. I'll be preparing a patch.
With a more detailed description :-)

Cheers,

Hannes

--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2015-11-25 19:10:31