2022-03-30 12:35:03

by Andrea Righi

[permalink] [raw]
Subject: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

Hello,

after this commit I'm experiencing some filesystem corruptions at boot
on a power9 box with an aacraid controller.

At the moment I'm running a 5.15.30 kernel; when the filesystem is
mounted at boot I see the following errors in the console:

Begin: Will now check root file system ... fsck from util-linux 2.36.1
[/usr/sbin/fsck.ext4 (1) -- /dev/sda2] fsck.ext4 -a -C0 /dev/sda2
root: clean, 99646/122101760 files, 11187342/488376336 blocks
done.
[ 4.636613] sd 0:2:0:0: [sda] tag#257 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.636655] sd 0:2:0:0: [sda] tag#257 CDB: Read(10) 28 00 00 00 4c 10 00 00 08 00
[ 4.636689] blk_update_request: I/O error, dev sda, sector 19472 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.636734] sd 0:2:0:0: [sda] tag#258 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.636772] sd 0:2:0:0: [sda] tag#258 CDB: Read(10) 28 00 00 00 4c 18 00 00 08 00
[ 4.636796] blk_update_request: I/O error, dev sda, sector 19480 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.636840] sd 0:2:0:0: [sda] tag#260 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.636877] sd 0:2:0:0: [sda] tag#260 CDB: Read(10) 28 00 00 00 4c 28 00 00 08 00
[ 4.636901] blk_update_request: I/O error, dev sda, sector 19496 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.636944] sd 0:2:0:0: [sda] tag#259 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.636971] sd 0:2:0:0: [sda] tag#259 CDB: Read(10) 28 00 00 00 4c 20 00 00 08 00
[ 4.637005] blk_update_request: I/O error, dev sda, sector 19488 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.637049] sd 0:2:0:0: [sda] tag#262 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.637085] sd 0:2:0:0: [sda] tag#262 CDB: Read(10) 28 00 00 00 4c 38 00 00 08 00
[ 4.637118] blk_update_request: I/O error, dev sda, sector 19512 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.637161] sd 0:2:0:0: [sda] tag#264 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.637197] sd 0:2:0:0: [sda] tag#264 CDB: Read(10) 28 00 00 00 4c 48 00 00 08 00
[ 4.637221] blk_update_request: I/O error, dev sda, sector 19528 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.637270] sd 0:2:0:0: [sda] tag#284 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.637306] sd 0:2:0:0: [sda] tag#284 CDB: Read(10) 28 00 00 00 4c e8 00 00 08 00
[ 4.637332] blk_update_request: I/O error, dev sda, sector 19688 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.637375] sd 0:2:0:0: [sda] tag#286 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.637411] sd 0:2:0:0: [sda] tag#286 CDB: Read(10) 28 00 00 00 4c f8 00 00 08 00
[ 4.637444] blk_update_request: I/O error, dev sda, sector 19704 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.637481] blk_update_request: I/O error, dev sda, sector 19664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.637485] sd 0:2:0:0: [sda] tag#282 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.637487] sd 0:2:0:0: [sda] tag#287 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 4.637491] sd 0:2:0:0: [sda] tag#287 CDB: Read(10) 28 00 00 00 4d 00 00 00 08 00
[ 4.637491] sd 0:2:0:0: [sda] tag#282 CDB: Read(10) 28 00 00 00 4c d8 00 00 08 00
[ 4.637494] blk_update_request: I/O error, dev sda, sector 19672 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4.747771] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

If I reboot multiple times fsck requires a manual fix and I get dropped
to the initramfs shell. Some times the filesystem gets corrupted and I
need to redeploy the box.

If I use the same kernel with this commit reverted I can reboot as many
times as I want without any failure:

813c6871f76b ("scsi: core: Reallocate device's budget map on queue depth change")

For now I've just reverted the commit, but I'll try to add some
debugging and collect more info.

Let me know if there's any specific test that you want me to try.

Thanks,
-Andrea


2022-03-30 15:50:06

by Andrea Righi

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> On 30/03/2022 11:11, Andrea Righi wrote:
> > Hello,
> >
> > after this commit I'm experiencing some filesystem corruptions at boot
> > on a power9 box with an aacraid controller.
> >
> > At the moment I'm running a 5.15.30 kernel; when the filesystem is
> > mounted at boot I see the following errors in the console:
> >
> > Begin: Will now check root file system ... fsck from util-linux 2.36.1
> > [/usr/sbin/fsck.ext4 (1) -- /dev/sda2] fsck.ext4 -a -C0 /dev/sda2
> > root: clean, 99646/122101760 files, 11187342/488376336 blocks
> > done.
> > [ 4.636613] sd 0:2:0:0: [sda] tag#257 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.636655] sd 0:2:0:0: [sda] tag#257 CDB: Read(10) 28 00 00 00 4c 10 00 00 08 00
> > [ 4.636689] blk_update_request: I/O error, dev sda, sector 19472 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.636734] sd 0:2:0:0: [sda] tag#258 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.636772] sd 0:2:0:0: [sda] tag#258 CDB: Read(10) 28 00 00 00 4c 18 00 00 08 00
> > [ 4.636796] blk_update_request: I/O error, dev sda, sector 19480 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.636840] sd 0:2:0:0: [sda] tag#260 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.636877] sd 0:2:0:0: [sda] tag#260 CDB: Read(10) 28 00 00 00 4c 28 00 00 08 00
> > [ 4.636901] blk_update_request: I/O error, dev sda, sector 19496 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.636944] sd 0:2:0:0: [sda] tag#259 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.636971] sd 0:2:0:0: [sda] tag#259 CDB: Read(10) 28 00 00 00 4c 20 00 00 08 00
> > [ 4.637005] blk_update_request: I/O error, dev sda, sector 19488 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.637049] sd 0:2:0:0: [sda] tag#262 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.637085] sd 0:2:0:0: [sda] tag#262 CDB: Read(10) 28 00 00 00 4c 38 00 00 08 00
> > [ 4.637118] blk_update_request: I/O error, dev sda, sector 19512 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.637161] sd 0:2:0:0: [sda] tag#264 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.637197] sd 0:2:0:0: [sda] tag#264 CDB: Read(10) 28 00 00 00 4c 48 00 00 08 00
> > [ 4.637221] blk_update_request: I/O error, dev sda, sector 19528 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.637270] sd 0:2:0:0: [sda] tag#284 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.637306] sd 0:2:0:0: [sda] tag#284 CDB: Read(10) 28 00 00 00 4c e8 00 00 08 00
> > [ 4.637332] blk_update_request: I/O error, dev sda, sector 19688 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.637375] sd 0:2:0:0: [sda] tag#286 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.637411] sd 0:2:0:0: [sda] tag#286 CDB: Read(10) 28 00 00 00 4c f8 00 00 08 00
> > [ 4.637444] blk_update_request: I/O error, dev sda, sector 19704 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.637481] blk_update_request: I/O error, dev sda, sector 19664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.637485] sd 0:2:0:0: [sda] tag#282 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.637487] sd 0:2:0:0: [sda] tag#287 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > [ 4.637491] sd 0:2:0:0: [sda] tag#287 CDB: Read(10) 28 00 00 00 4d 00 00 00 08 00
> > [ 4.637491] sd 0:2:0:0: [sda] tag#282 CDB: Read(10) 28 00 00 00 4c d8 00 00 08 00
> > [ 4.637494] blk_update_request: I/O error, dev sda, sector 19672 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [ 4.747771] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
> >
> > If I reboot multiple times fsck requires a manual fix and I get dropped
> > to the initramfs shell. Some times the filesystem gets corrupted and I
> > need to redeploy the box.
> >
> > If I use the same kernel with this commit reverted I can reboot as many
> > times as I want without any failure:
> >
> > 813c6871f76b ("scsi: core: Reallocate device's budget map on queue depth change")
>
> I would not have thought that this causes possible corruption.
>
> >
> > For now I've just reverted the commit, but I'll try to add some
> > debugging and collect more info.
> >
> > Let me know if there's any specific test that you want me to try.
> >
>
> Please try this:
> https://lore.kernel.org/linux-scsi/[email protected]/T/#t
>
> It never made 5.17, which I would have hoped for.

Thanks John! It looks like this one is actually fixing the problem.
I rebooted multiple times and I didn't get any I/O error or corruption.

If you want you can add my:

Tested-by: Andrea Righi <[email protected]>

2022-03-31 02:42:10

by Ming Lei

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On Thu, Mar 31, 2022 at 07:30:35AM +0900, Damien Le Moal wrote:
> On 3/30/22 22:48, Ming Lei wrote:
> > On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
> >> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
> >>> On 30/03/2022 12:21, Andrea Righi wrote:
> >>>> On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> >>>>> On 30/03/2022 11:11, Andrea Righi wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> after this commit I'm experiencing some filesystem corruptions
> >>>>>> at boot on a power9 box with an aacraid controller.
> >>>>>>
> >>>>>> At the moment I'm running a 5.15.30 kernel; when the filesystem
> >>>>>> is mounted at boot I see the following errors in the console:
> >>>
> >>> About "scsi: core: Reallocate device's budget map on queue depth
> >>> change" being added to a stable kernel, I am not sure if this was
> >>> really a fix or just a memory optimisation.
> >>
> >> I can see how it becomes the problem: it frees and allocates a new
> >> bitmap across a queue freeze, but bits in the old one might still be in
> >> use. This isn't a problem except when they return and we now possibly
> >> see a tag greater than we think we can allocate coming back.
> >> Presumably we don't check this and we end up doing a write to
> >> unallocated memory.
> >>
> >> I think if you want to reallocate on queue depth reduction, you might
> >> have to drain the queue as well as freeze it.
> >
> > After queue is frozen, there can't be any in-flight request/scsi
> > command, so the sbitmap is zeroed at that time, and safe to reallocate.
> >
> > The problem is aacraid specific, since the driver has hard limit
> > of 256 queue depth, see aac_change_queue_depth().
>
> 256 is the scsi hard limit per device... Any SAS drive has the same limit
> by default since there is no way to know the max queue depth of a scsi
> disk.So what is special about aacraid ?
>

I meant aac_change_queue_depth() sets hard limit of 256.

Yeah, for any hba driver which implements its own .change_queue_depth(),
there may be one hard limit there.

So I still don't understand why you mention '256 is the scsi hard limit per
device', and where is the code? If both .cma_per_lun and .can_queue are > 256
and the driver uses default scsi_change_queue_depth() and sdev->tagged_supported
is true, then user is free to change queue depth via /sys/block/$SDN/device/queue_depth
to > 256. It is same for SAS, see sas_change_queue_depth().

Also I am pretty sure some type of scsi device is capable of supporting >256 queue
depth, include sas, and sas usually has big queue depth.


Thanks,
Ming

2022-03-31 02:59:36

by Ming Lei

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
> > On 30/03/2022 12:21, Andrea Righi wrote:
> > > On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> > > > On 30/03/2022 11:11, Andrea Righi wrote:
> > > > > Hello,
> > > > >
> > > > > after this commit I'm experiencing some filesystem corruptions
> > > > > at boot on a power9 box with an aacraid controller.
> > > > >
> > > > > At the moment I'm running a 5.15.30 kernel; when the filesystem
> > > > > is mounted at boot I see the following errors in the console:
> >
> > About "scsi: core: Reallocate device's budget map on queue depth
> > change" being added to a stable kernel, I am not sure if this was
> > really a fix or just a memory optimisation.
>
> I can see how it becomes the problem: it frees and allocates a new
> bitmap across a queue freeze, but bits in the old one might still be in
> use. This isn't a problem except when they return and we now possibly
> see a tag greater than we think we can allocate coming back.
> Presumably we don't check this and we end up doing a write to
> unallocated memory.
>
> I think if you want to reallocate on queue depth reduction, you might
> have to drain the queue as well as freeze it.

After queue is frozen, there can't be any in-flight request/scsi
command, so the sbitmap is zeroed at that time, and safe to reallocate.

The problem is aacraid specific, since the driver has hard limit
of 256 queue depth, see aac_change_queue_depth().


Thanks,
Ming

2022-03-31 03:05:56

by Ming Lei

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On Wed, Mar 30, 2022 at 01:59:43PM +0100, John Garry wrote:
> On 30/03/2022 12:21, Andrea Righi wrote:
> > On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> > > On 30/03/2022 11:11, Andrea Righi wrote:
> > > > Hello,
> > > >
> > > > after this commit I'm experiencing some filesystem corruptions at boot
> > > > on a power9 box with an aacraid controller.
> > > >
> > > > At the moment I'm running a 5.15.30 kernel; when the filesystem is
> > > > mounted at boot I see the following errors in the console:
>
> About "scsi: core: Reallocate device's budget map on queue depth change"
> being added to a stable kernel, I am not sure if this was really a fix or
> just a memory optimisation.
>
> > > >
> > > > Begin: Will now check root file system ... fsck from util-linux 2.36.1
> > > > [/usr/sbin/fsck.ext4 (1) -- /dev/sda2] fsck.ext4 -a -C0 /dev/sda2
> > > > root: clean, 99646/122101760 files, 11187342/488376336 blocks
> > > > done.
> > > > [ 4.636613] sd 0:2:0:0: [sda] tag#257 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.636655] sd 0:2:0:0: [sda] tag#257 CDB: Read(10) 28 00 00 00 4c 10 00 00 08 00
> > > > [ 4.636689] blk_update_request: I/O error, dev sda, sector 19472 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.636734] sd 0:2:0:0: [sda] tag#258 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.636772] sd 0:2:0:0: [sda] tag#258 CDB: Read(10) 28 00 00 00 4c 18 00 00 08 00
> > > > [ 4.636796] blk_update_request: I/O error, dev sda, sector 19480 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.636840] sd 0:2:0:0: [sda] tag#260 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.636877] sd 0:2:0:0: [sda] tag#260 CDB: Read(10) 28 00 00 00 4c 28 00 00 08 00
> > > > [ 4.636901] blk_update_request: I/O error, dev sda, sector 19496 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.636944] sd 0:2:0:0: [sda] tag#259 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.636971] sd 0:2:0:0: [sda] tag#259 CDB: Read(10) 28 00 00 00 4c 20 00 00 08 00
> > > > [ 4.637005] blk_update_request: I/O error, dev sda, sector 19488 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.637049] sd 0:2:0:0: [sda] tag#262 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.637085] sd 0:2:0:0: [sda] tag#262 CDB: Read(10) 28 00 00 00 4c 38 00 00 08 00
> > > > [ 4.637118] blk_update_request: I/O error, dev sda, sector 19512 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.637161] sd 0:2:0:0: [sda] tag#264 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.637197] sd 0:2:0:0: [sda] tag#264 CDB: Read(10) 28 00 00 00 4c 48 00 00 08 00
> > > > [ 4.637221] blk_update_request: I/O error, dev sda, sector 19528 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.637270] sd 0:2:0:0: [sda] tag#284 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.637306] sd 0:2:0:0: [sda] tag#284 CDB: Read(10) 28 00 00 00 4c e8 00 00 08 00
> > > > [ 4.637332] blk_update_request: I/O error, dev sda, sector 19688 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.637375] sd 0:2:0:0: [sda] tag#286 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.637411] sd 0:2:0:0: [sda] tag#286 CDB: Read(10) 28 00 00 00 4c f8 00 00 08 00
> > > > [ 4.637444] blk_update_request: I/O error, dev sda, sector 19704 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.637481] blk_update_request: I/O error, dev sda, sector 19664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.637485] sd 0:2:0:0: [sda] tag#282 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.637487] sd 0:2:0:0: [sda] tag#287 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> > > > [ 4.637491] sd 0:2:0:0: [sda] tag#287 CDB: Read(10) 28 00 00 00 4d 00 00 00 08 00
> > > > [ 4.637491] sd 0:2:0:0: [sda] tag#282 CDB: Read(10) 28 00 00 00 4c d8 00 00 08 00
> > > > [ 4.637494] blk_update_request: I/O error, dev sda, sector 19672 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > > > [ 4.747771] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
> > > >
>
> We need to find where that memory corruption is coming from. Maybe the root
> is that too many commands are being sent to the disk. Ming?

Yeah, looks aacraid limits max queue depth as 256, see aac_change_queue_depth().


Thanks,
Ming

2022-03-31 03:17:12

by John Garry

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On 30/03/2022 12:21, Andrea Righi wrote:
> On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
>> On 30/03/2022 11:11, Andrea Righi wrote:
>>> Hello,
>>>
>>> after this commit I'm experiencing some filesystem corruptions at boot
>>> on a power9 box with an aacraid controller.
>>>
>>> At the moment I'm running a 5.15.30 kernel; when the filesystem is
>>> mounted at boot I see the following errors in the console:

About "scsi: core: Reallocate device's budget map on queue depth change"
being added to a stable kernel, I am not sure if this was really a fix
or just a memory optimisation.

>>>
>>> Begin: Will now check root file system ... fsck from util-linux 2.36.1
>>> [/usr/sbin/fsck.ext4 (1) -- /dev/sda2] fsck.ext4 -a -C0 /dev/sda2
>>> root: clean, 99646/122101760 files, 11187342/488376336 blocks
>>> done.
>>> [ 4.636613] sd 0:2:0:0: [sda] tag#257 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.636655] sd 0:2:0:0: [sda] tag#257 CDB: Read(10) 28 00 00 00 4c 10 00 00 08 00
>>> [ 4.636689] blk_update_request: I/O error, dev sda, sector 19472 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.636734] sd 0:2:0:0: [sda] tag#258 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.636772] sd 0:2:0:0: [sda] tag#258 CDB: Read(10) 28 00 00 00 4c 18 00 00 08 00
>>> [ 4.636796] blk_update_request: I/O error, dev sda, sector 19480 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.636840] sd 0:2:0:0: [sda] tag#260 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.636877] sd 0:2:0:0: [sda] tag#260 CDB: Read(10) 28 00 00 00 4c 28 00 00 08 00
>>> [ 4.636901] blk_update_request: I/O error, dev sda, sector 19496 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.636944] sd 0:2:0:0: [sda] tag#259 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.636971] sd 0:2:0:0: [sda] tag#259 CDB: Read(10) 28 00 00 00 4c 20 00 00 08 00
>>> [ 4.637005] blk_update_request: I/O error, dev sda, sector 19488 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.637049] sd 0:2:0:0: [sda] tag#262 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.637085] sd 0:2:0:0: [sda] tag#262 CDB: Read(10) 28 00 00 00 4c 38 00 00 08 00
>>> [ 4.637118] blk_update_request: I/O error, dev sda, sector 19512 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.637161] sd 0:2:0:0: [sda] tag#264 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.637197] sd 0:2:0:0: [sda] tag#264 CDB: Read(10) 28 00 00 00 4c 48 00 00 08 00
>>> [ 4.637221] blk_update_request: I/O error, dev sda, sector 19528 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.637270] sd 0:2:0:0: [sda] tag#284 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.637306] sd 0:2:0:0: [sda] tag#284 CDB: Read(10) 28 00 00 00 4c e8 00 00 08 00
>>> [ 4.637332] blk_update_request: I/O error, dev sda, sector 19688 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.637375] sd 0:2:0:0: [sda] tag#286 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.637411] sd 0:2:0:0: [sda] tag#286 CDB: Read(10) 28 00 00 00 4c f8 00 00 08 00
>>> [ 4.637444] blk_update_request: I/O error, dev sda, sector 19704 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.637481] blk_update_request: I/O error, dev sda, sector 19664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.637485] sd 0:2:0:0: [sda] tag#282 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.637487] sd 0:2:0:0: [sda] tag#287 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
>>> [ 4.637491] sd 0:2:0:0: [sda] tag#287 CDB: Read(10) 28 00 00 00 4d 00 00 00 08 00
>>> [ 4.637491] sd 0:2:0:0: [sda] tag#282 CDB: Read(10) 28 00 00 00 4c d8 00 00 08 00
>>> [ 4.637494] blk_update_request: I/O error, dev sda, sector 19672 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>>> [ 4.747771] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
>>>

We need to find where that memory corruption is coming from. Maybe the
root is that too many commands are being sent to the disk. Ming?

>>> If I reboot multiple times fsck requires a manual fix and I get dropped
>>> to the initramfs shell. Some times the filesystem gets corrupted and I
>>> need to redeploy the box.
>>>
>>> If I use the same kernel with this commit reverted I can reboot as many
>>> times as I want without any failure:
>>>
>>> 813c6871f76b ("scsi: core: Reallocate device's budget map on queue depth change")
>>
>> I would not have thought that this causes possible corruption.
>>
>>>
>>> For now I've just reverted the commit, but I'll try to add some
>>> debugging and collect more info.
>>>
>>> Let me know if there's any specific test that you want me to try.
>>>
>>
>> Please try this:
>> https://lore.kernel.org/linux-scsi/[email protected]/T/#t
>>
>> It never made 5.17, which I would have hoped for.
>
> Thanks John! It looks like this one is actually fixing the problem.
> I rebooted multiple times and I didn't get any I/O error or corruption.
>
> If you want you can add my:
>
> Tested-by: Andrea Righi <[email protected]>
> .

We should prob add something like this also:

--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1708,7 +1708,9 @@ static blk_status_t scsi_queue_rq(struct
blk_mq_hw_ctx *hctx,
blk_status_t ret;
int reason;

- WARN_ON_ONCE(cmd->budget_token < 0);
+ if (WARN_ON_ONCE(cmd->budget_token < 0 ||
+ cmd->budget_token >= sdev->queue_depth))
+ return BLK_STS_IOERR;


Thanks,
John

2022-03-31 03:31:04

by John Garry

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On 30/03/2022 11:11, Andrea Righi wrote:
> Hello,
>
> after this commit I'm experiencing some filesystem corruptions at boot
> on a power9 box with an aacraid controller.
>
> At the moment I'm running a 5.15.30 kernel; when the filesystem is
> mounted at boot I see the following errors in the console:
>
> Begin: Will now check root file system ... fsck from util-linux 2.36.1
> [/usr/sbin/fsck.ext4 (1) -- /dev/sda2] fsck.ext4 -a -C0 /dev/sda2
> root: clean, 99646/122101760 files, 11187342/488376336 blocks
> done.
> [ 4.636613] sd 0:2:0:0: [sda] tag#257 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.636655] sd 0:2:0:0: [sda] tag#257 CDB: Read(10) 28 00 00 00 4c 10 00 00 08 00
> [ 4.636689] blk_update_request: I/O error, dev sda, sector 19472 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.636734] sd 0:2:0:0: [sda] tag#258 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.636772] sd 0:2:0:0: [sda] tag#258 CDB: Read(10) 28 00 00 00 4c 18 00 00 08 00
> [ 4.636796] blk_update_request: I/O error, dev sda, sector 19480 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.636840] sd 0:2:0:0: [sda] tag#260 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.636877] sd 0:2:0:0: [sda] tag#260 CDB: Read(10) 28 00 00 00 4c 28 00 00 08 00
> [ 4.636901] blk_update_request: I/O error, dev sda, sector 19496 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.636944] sd 0:2:0:0: [sda] tag#259 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.636971] sd 0:2:0:0: [sda] tag#259 CDB: Read(10) 28 00 00 00 4c 20 00 00 08 00
> [ 4.637005] blk_update_request: I/O error, dev sda, sector 19488 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.637049] sd 0:2:0:0: [sda] tag#262 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.637085] sd 0:2:0:0: [sda] tag#262 CDB: Read(10) 28 00 00 00 4c 38 00 00 08 00
> [ 4.637118] blk_update_request: I/O error, dev sda, sector 19512 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.637161] sd 0:2:0:0: [sda] tag#264 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.637197] sd 0:2:0:0: [sda] tag#264 CDB: Read(10) 28 00 00 00 4c 48 00 00 08 00
> [ 4.637221] blk_update_request: I/O error, dev sda, sector 19528 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.637270] sd 0:2:0:0: [sda] tag#284 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.637306] sd 0:2:0:0: [sda] tag#284 CDB: Read(10) 28 00 00 00 4c e8 00 00 08 00
> [ 4.637332] blk_update_request: I/O error, dev sda, sector 19688 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.637375] sd 0:2:0:0: [sda] tag#286 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.637411] sd 0:2:0:0: [sda] tag#286 CDB: Read(10) 28 00 00 00 4c f8 00 00 08 00
> [ 4.637444] blk_update_request: I/O error, dev sda, sector 19704 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.637481] blk_update_request: I/O error, dev sda, sector 19664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.637485] sd 0:2:0:0: [sda] tag#282 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.637487] sd 0:2:0:0: [sda] tag#287 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> [ 4.637491] sd 0:2:0:0: [sda] tag#287 CDB: Read(10) 28 00 00 00 4d 00 00 00 08 00
> [ 4.637491] sd 0:2:0:0: [sda] tag#282 CDB: Read(10) 28 00 00 00 4c d8 00 00 08 00
> [ 4.637494] blk_update_request: I/O error, dev sda, sector 19672 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 4.747771] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
>
> If I reboot multiple times fsck requires a manual fix and I get dropped
> to the initramfs shell. Some times the filesystem gets corrupted and I
> need to redeploy the box.
>
> If I use the same kernel with this commit reverted I can reboot as many
> times as I want without any failure:
>
> 813c6871f76b ("scsi: core: Reallocate device's budget map on queue depth change")

I would not have thought that this causes possible corruption.

>
> For now I've just reverted the commit, but I'll try to add some
> debugging and collect more info.
>
> Let me know if there's any specific test that you want me to try.
>

Please try this:
https://lore.kernel.org/linux-scsi/[email protected]/T/#t

It never made 5.17, which I would have hoped for.

Thanks,
John

2022-03-31 04:27:18

by Damien Le Moal

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On 3/30/22 22:48, Ming Lei wrote:
> On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
>> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
>>> On 30/03/2022 12:21, Andrea Righi wrote:
>>>> On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
>>>>> On 30/03/2022 11:11, Andrea Righi wrote:
>>>>>> Hello,
>>>>>>
>>>>>> after this commit I'm experiencing some filesystem corruptions
>>>>>> at boot on a power9 box with an aacraid controller.
>>>>>>
>>>>>> At the moment I'm running a 5.15.30 kernel; when the filesystem
>>>>>> is mounted at boot I see the following errors in the console:
>>>
>>> About "scsi: core: Reallocate device's budget map on queue depth
>>> change" being added to a stable kernel, I am not sure if this was
>>> really a fix or just a memory optimisation.
>>
>> I can see how it becomes the problem: it frees and allocates a new
>> bitmap across a queue freeze, but bits in the old one might still be in
>> use. This isn't a problem except when they return and we now possibly
>> see a tag greater than we think we can allocate coming back.
>> Presumably we don't check this and we end up doing a write to
>> unallocated memory.
>>
>> I think if you want to reallocate on queue depth reduction, you might
>> have to drain the queue as well as freeze it.
>
> After queue is frozen, there can't be any in-flight request/scsi
> command, so the sbitmap is zeroed at that time, and safe to reallocate.
>
> The problem is aacraid specific, since the driver has hard limit
> of 256 queue depth, see aac_change_queue_depth().

256 is the scsi hard limit per device... Any SAS drive has the same limit
by default since there is no way to know the max queue depth of a scsi
disk. So what is special about aacraid ?

>
>
> Thanks,
> Ming
>


--
Damien Le Moal
Western Digital Research

2022-03-31 04:31:16

by James Bottomley

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
> On 30/03/2022 12:21, Andrea Righi wrote:
> > On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> > > On 30/03/2022 11:11, Andrea Righi wrote:
> > > > Hello,
> > > >
> > > > after this commit I'm experiencing some filesystem corruptions
> > > > at boot on a power9 box with an aacraid controller.
> > > >
> > > > At the moment I'm running a 5.15.30 kernel; when the filesystem
> > > > is mounted at boot I see the following errors in the console:
>
> About "scsi: core: Reallocate device's budget map on queue depth
> change" being added to a stable kernel, I am not sure if this was
> really a fix or just a memory optimisation.

I can see how it becomes the problem: it frees and allocates a new
bitmap across a queue freeze, but bits in the old one might still be in
use. This isn't a problem except when they return and we now possibly
see a tag greater than we think we can allocate coming back.
Presumably we don't check this and we end up doing a write to
unallocated memory.

I think if you want to reallocate on queue depth reduction, you might
have to drain the queue as well as freeze it.

James


2022-03-31 06:59:02

by Damien Le Moal

[permalink] [raw]
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"

On 3/31/22 11:14, Ming Lei wrote:
> On Thu, Mar 31, 2022 at 07:30:35AM +0900, Damien Le Moal wrote:
>> On 3/30/22 22:48, Ming Lei wrote:
>>> On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
>>>> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
>>>>> On 30/03/2022 12:21, Andrea Righi wrote:
>>>>>> On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
>>>>>>> On 30/03/2022 11:11, Andrea Righi wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> after this commit I'm experiencing some filesystem corruptions
>>>>>>>> at boot on a power9 box with an aacraid controller.
>>>>>>>>
>>>>>>>> At the moment I'm running a 5.15.30 kernel; when the filesystem
>>>>>>>> is mounted at boot I see the following errors in the console:
>>>>>
>>>>> About "scsi: core: Reallocate device's budget map on queue depth
>>>>> change" being added to a stable kernel, I am not sure if this was
>>>>> really a fix or just a memory optimisation.
>>>>
>>>> I can see how it becomes the problem: it frees and allocates a new
>>>> bitmap across a queue freeze, but bits in the old one might still be in
>>>> use. This isn't a problem except when they return and we now possibly
>>>> see a tag greater than we think we can allocate coming back.
>>>> Presumably we don't check this and we end up doing a write to
>>>> unallocated memory.
>>>>
>>>> I think if you want to reallocate on queue depth reduction, you might
>>>> have to drain the queue as well as freeze it.
>>>
>>> After queue is frozen, there can't be any in-flight request/scsi
>>> command, so the sbitmap is zeroed at that time, and safe to reallocate.
>>>
>>> The problem is aacraid specific, since the driver has hard limit
>>> of 256 queue depth, see aac_change_queue_depth().
>>
>> 256 is the scsi hard limit per device... Any SAS drive has the same limit
>> by default since there is no way to know the max queue depth of a scsi
>> disk.So what is special about aacraid ?
>>
>
> I meant aac_change_queue_depth() sets hard limit of 256.
>
> Yeah, for any hba driver which implements its own .change_queue_depth(),
> there may be one hard limit there.
>
> So I still don't understand why you mention '256 is the scsi hard limit per
> device', and where is the code? If both .cma_per_lun and .can_queue are > 256
> and the driver uses default scsi_change_queue_depth() and sdev->tagged_supported
> is true, then user is free to change queue depth via /sys/block/$SDN/device/queue_depth
> to > 256. It is same for SAS, see sas_change_queue_depth().
>
> Also I am pretty sure some type of scsi device is capable of supporting >256 queue
> depth, include sas, and sas usually has big queue depth.

Correct. The 256 hard limit comes from the old parallel scsi transport
which had only 8 bits for the tag value. Other SCSI transports do have
more bits for tags, allowing higher maximum.

That said, the scsi stack limits the max queue depth to 1024 (see
scsi_device_max_queue_depth(), and most drivers set can_queue to 256 or
less by default.

>
>
> Thanks,
> Ming
>


--
Damien Le Moal
Western Digital Research