2021-10-31 23:21:27

by Justin Piszcz

Subject: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

Hello,

Issue: 5.15 hangs at boot while trying to initialize the 3ware card. I
have not had this issue with any prior 5.x kernel (5.0 through 5.14).

Arch: x86_64
Kernel: 5.15
Distribution: Debian testing
Note: Upgraded from 5.14.8 to 5.15, the diff between the two .config's is
posted below, thoughts?

dmesg snippet:
-------------------------------
[8.057419] loop: module loaded
[8.057525] the cryptoloop driver has been deprecated and will be removed in
in Linux 5.16
[8.057809] LSI 3ware SAS/SATA-RAID Controller device driver for Linux
v3.26.02.000.
[8.336983] tsc: Refined TSC clocksource calibration: 3699.999 MHz
[8.337191] clocksource: tsc: mask: 0xffffffffffffffff max_cycles:
0x6aaaa900000, max_idle_ns: 881590498719 ns
[8.337555] clocksource: Switched to clocksource tsc
( ... )
[9.097964] 3w-sas: scsi0: AEN: INFO (0x04:0x0053): Battery capacity test is
overdue:.
[9.201986] scsi host: 3w-sas
[9.305954] 3w-sas: scsi0: Found an LSI 3ware 9750-24i4e Controller at
0xfb760000, IRQ: 45.
[9.617970] 3w-sas: scsi0: Firmware FH9X 5.12.00.016, BIOS BE9X 5.11.00.007,
Phys: 28.
[30.498007] scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out,
resetting card
[71.441958] scsi 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out,
resetting card.

--

Full configs:
https://installkernel.tripod.com/5.14.txt
https://installkernel.tripod.com/5.15.txt

Diff between 5.14 and 5.15 .config files-- could it be something to do with
CONFIG_IOMMU_DEFAULT_DMA_LAZY=y?

-CONFIG_PRINTK_NMI=y
+CONFIG_ARCH_NR_GPIO=1024
-CONFIG_X86_SYSFB=y
-CONFIG_FIRMWARE_MEMMAP=y
-CONFIG_DMIID=y
-CONFIG_DMI_SYSFS=y
-CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
+CONFIG_TRACE_IRQFLAGS_SUPPORT=y
+CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
-CONFIG_BLK_SCSI_REQUEST=y
-CONFIG_BLK_DEV_BSG=y
+CONFIG_BLK_DEV_BSG_COMMON=y
+CONFIG_BLOCK_HOLDER_DEPRECATED=y
+CONFIG_AF_UNIX_OOB=y
+CONFIG_FIRMWARE_MEMMAP=y
+CONFIG_DMIID=y
+CONFIG_DMI_SYSFS=y
+CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
+CONFIG_SYSFB=y
+CONFIG_SCSI_COMMON=y
+CONFIG_BLK_DEV_BSG=y
+CONFIG_PTP_1588_CLOCK_OPTIONAL=y
+CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
-CONFIG_MANDATORY_FILE_LOCKING=y
+CONFIG_NETFS_STATS=y
+CONFIG_NTFS3_FS=y
+CONFIG_NTFS3_LZX_XPRESS=y
+CONFIG_SMB_SERVER=y
+CONFIG_SMB_SERVER_CHECK_CAP_NET_ADMIN=y
+CONFIG_SMBFS_COMMON=y
-CONFIG_TRACE_IRQFLAGS_SUPPORT=y
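
Several options in the diff above (for example CONFIG_DMIID and CONFIG_BLK_DEV_BSG) show up as both removed and added, which suggests they only moved within the file. A minimal sketch of an order-independent comparison follows; the helper names `parse_config` and `diff_configs` are made up for illustration, and the two sample strings are a tiny subset of the options listed above, not the full configs.

```python
def parse_config(text):
    """Return {option: value} for every set CONFIG_ line.

    Comment lines such as '# CONFIG_FOO is not set' are skipped.
    """
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_configs(old_text, new_text):
    """Compare two .config texts while ignoring the order of the lines."""
    old, new = parse_config(old_text), parse_config(new_text)
    removed = sorted(k for k in old if k not in new)
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return removed, added, changed


# Illustrative subset only; CONFIG_DMIID is present in both, just reordered.
OLD = "CONFIG_PRINTK_NMI=y\nCONFIG_DMIID=y\n"
NEW = "CONFIG_IOMMU_DEFAULT_DMA_LAZY=y\nCONFIG_DMIID=y\n"

removed, added, changed = diff_configs(OLD, NEW)
print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```

To compare the real files, the contents of the two posted configs could be read from disk and passed in place of `OLD` and `NEW`.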

Thanks,

Justin.


2021-10-31 23:57:09

by Bart Van Assche

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On 10/31/21 16:19, Justin Piszcz wrote:
> Diff between 5.14 and 5.15 .config files-- could it be something to do with
> CONFIG_IOMMU_DEFAULT_DMA_LAZY=y?

That's hard to say. Is CONFIG_MAGIC_SYSRQ enabled? If not, please enable
it and hit Alt-Printscreen-t (dump task list; see also
Documentation/admin-guide/sysrq.rst) and share the contents of the
kernel log. If that would not be convenient, please try to bisect this
issue.

Thanks,

Bart.

2021-11-01 08:54:39

by John Garry

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On 31/10/2021 23:52, Bart Van Assche wrote:
> On 10/31/21 16:19, Justin Piszcz wrote:
>> Diff between 5.14 and 5.15 .config files-- could it be something to do
>> with
>> CONFIG_IOMMU_DEFAULT_DMA_LAZY=y?

On x86 (Intel or AMD) IOMMU we were already using lazy mode previously;
there just was no config option for it, so this should not make a difference.


>
> That's hard to say. Is CONFIG_MAGIC_SYSRQ enabled? If not, please enable
> it and hit Alt-Printscreen-t (dump task list; see also
> Documentation/admin-guide/sysrq.rst) and share the contents of the
> kernel log. If that would not be convenient, please try to bisect this
> issue.

2021-11-01 10:38:54

by Justin Piszcz

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On Sun, Oct 31, 2021 at 7:52 PM Bart Van Assche <[email protected]> wrote:
>
> On 10/31/21 16:19, Justin Piszcz wrote:
> > Diff between 5.14 and 5.15 .config files-- could it be something to do with
> > CONFIG_IOMMU_DEFAULT_DMA_LAZY=y?
>
> That's hard to say. Is CONFIG_MAGIC_SYSRQ enabled? If not, please enable
> it and hit Alt-Printscreen-t (dump task list; see also
> Documentation/admin-guide/sysrq.rst) and share the contents of the
> kernel log. If that would not be convenient, please try to bisect this
> issue.

[ .. ]

It appears that at this point in the boot process the keyboards (USB and
PS/2) are not yet available and/or do not respond (I do have
CONFIG_MAGIC_SYSRQ enabled and have used it in the past). I'll
build the prior 5.15 release candidates (rc1 through rc7) to check where
it stopped working and reply to the list when I have that info.

Thanks!

Justin.

2021-11-01 19:50:16

by Justin Piszcz

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On Mon, Nov 1, 2021 at 6:36 AM Justin Piszcz <[email protected]> wrote:
>
> On Sun, Oct 31, 2021 at 7:52 PM Bart Van Assche <[email protected]> wrote:
> >
> > On 10/31/21 16:19, Justin Piszcz wrote:
> > > Diff between 5.14 and 5.15 .config files-- could it be something to do with
> > > CONFIG_IOMMU_DEFAULT_DMA_LAZY=y?
> >
> > That's hard to say. Is CONFIG_MAGIC_SYSRQ enabled? If not, please enable
> > it and hit Alt-Printscreen-t (dump task list; see also
> > Documentation/admin-guide/sysrq.rst) and share the contents of the
> > kernel log. If that would not be convenient, please try to bisect this
> > issue.
>
> [ .. ]
>
> It appears at this point in the boot process the keyboard (USB and
> PS2) are not yet available and/or do not respond in this scenario (I
> do have CONFIG_MAGIC_SYSRQ enabled+have used it in the past). I'll
> build the prior 5.15-rc(1-7) to check where it stopped working and
> reply back to the list when I have that info.

[..]

I have tried all of the -rc's and they all hang at boot; keyboard
input (USB/PS2) is not working at this stage in the boot process.
Are there any thoughts on how to debug this further?

[9.305954] 3w-sas: scsi0: Found an LSI 3ware 9750-24i4e Controller at
0xfb760000, IRQ: 45.
[9.617970] 3w-sas: scsi0: Firmware FH9X 5.12.00.016, BIOS BE9X
5.11.00.007, Phys: 28.
[30.498007] scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12)
timed out, resetting card
[71.441958] scsi 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed
out, resetting card.

# lilo
Added 5.14.8-1
Added 5.15.0-1 - hangs with the error above
Added 5.15.0-rc1-1 - hangs with the error above
Added 5.15.0-rc2-1 - hangs with the error above
Added 5.15.0-rc3-1 - hangs with the error above
Added 5.15.0-rc4-1 - hangs with the error above
Added 5.15.0-rc5-1 - hangs with the error above
Added 5.15.0-rc6-1 - hangs with the error above
Added 5.15.0-rc7-1 * - hangs with the error above

Regards,

Justin.

2021-11-01 20:05:51

by Douglas Miller

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

I have seen a problem, with a different adapter and arch but similar
symptoms, where 5.14 worked and 5.15 did not. That was tracked down to a
difference in IRQ domain handling between the two kernels, resulting in
an IRQ essentially not working anymore. The fix was arch-specific and
not x86_64, but might be of interest:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a4b0320783a

On 11/1/21 14:48, Justin Piszcz wrote:
> On Mon, Nov 1, 2021 at 6:36 AM Justin Piszcz <[email protected]> wrote:
>> On Sun, Oct 31, 2021 at 7:52 PM Bart Van Assche <[email protected]> wrote:
>>> On 10/31/21 16:19, Justin Piszcz wrote:
>>>> Diff between 5.14 and 5.15 .config files-- could it be something to do with
>>>> CONFIG_IOMMU_DEFAULT_DMA_LAZY=y?
>>> That's hard to say. Is CONFIG_MAGIC_SYSRQ enabled? If not, please enable
>>> it and hit Alt-Printscreen-t (dump task list; see also
>>> Documentation/admin-guide/sysrq.rst) and share the contents of the
>>> kernel log. If that would not be convenient, please try to bisect this
>>> issue.
>> [ .. ]
>>
>> It appears at this point in the boot process the keyboard (USB and
>> PS2) are not yet available and/or do not respond in this scenario (I
>> do have CONFIG_MAGIC_SYSRQ enabled+have used it in the past). I'll
>> build the prior 5.15-rc(1-7) to check where it stopped working and
>> reply back to the list when I have that info.
> [..]
>
> I have tried all of the -rc's and they all hang at boot, keyboard
> input (USB/PS2) is not working at this stage in the boot process.
> Are there any thoughts on how to debug this further?
>
> [9.305954] 3w-sas: scsi0: Found an LSI 3ware 9750-24i4e Controller at
> 0xfb760000, IRQ: 45.
> [9.617970] 3w-sas: scsi0: Firmware FH9X 5.12.00.016, BIOS BE9X
> 5.11.00.007, Phys: 28.
> [30.498007] scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12)
> timed out, resetting card
> [71.441958] scsi 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed
> out, resetting card.
>
> # lilo
> Added 5.14.8-1
> Added 5.15.0-1 - hangs with the error above
> Added 5.15.0-rc1-1 - hangs with the error above
> Added 5.15.0-rc2-1 - hangs with the error above
> Added 5.15.0-rc3-1 - hangs with the error above
> Added 5.15.0-rc4-1 - hangs with the error above
> Added 5.15.0-rc5-1 - hangs with the error above
> Added 5.15.0-rc6-1 - hangs with the error above
> Added 5.15.0-rc7-1 * - hangs with the error above
>
> Regards,
>
> Justin.

2021-11-03 16:20:08

by Justin Piszcz

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On Mon, Nov 1, 2021 at 4:03 PM Douglas Miller
<[email protected]> wrote:
>
> I have seen a problem, with a different adapter and arch but similar
> symptoms, where 5.14 worked and 5.15 did not. That was tracked down to a
> difference in IRQ domain handling between the two kernels, resulting in
> an IRQ essentially not working anymore. The fix was arch-specific and
> not x86_64, but might be of interest:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a4b0320783a
>

Thanks! Has anyone else reading this run into the issue, and/or are there
any suggestions on how I can troubleshoot this further (as all the -rc's
have the same issue)?

[ .. ]

2021-11-03 16:26:57

by Bart Van Assche

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On 11/3/21 9:18 AM, Justin Piszcz wrote:
> Thanks!-- Has anyone else reading run into this issue and/or are there
> any suggestions how I can troubleshoot this further (as all -rc's have
> the same issue)?

How about bisecting this issue
(https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html)?
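
Since every 5.15 release candidate hangs while 5.14 boots, the range to search would be the commits between v5.14 (good) and v5.15-rc1 (bad), which `git bisect start`, `git bisect good` and `git bisect bad` narrow down by halving. The sketch below only illustrates that halving on a simplified, strictly linear history; the commit count and the position of the first bad commit are hypothetical, and real kernel history contains merges that git handles differently.

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_builds_tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1                # one kernel build + boot attempt
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Hypothetical numbers for illustration: 10,000 commits after the good
# release, with the regression introduced by commit number 7321.
commits = list(range(10001))
first_bad, tests = bisect_first_bad(commits, lambda c: c >= 7321)
print(f"first bad commit: {first_bad}, kernels tested: {tests}")
```

The point of the sketch is the cost: the number of kernels to build and boot grows with the logarithm of the range, not with its size.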

Thanks,

Bart.

2021-11-06 18:50:57

by Justin Piszcz

Subject: RE: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card



-----Original Message-----
From: Bart Van Assche <[email protected]>
Sent: Wednesday, November 3, 2021 12:23 PM
To: Justin Piszcz <[email protected]>; Douglas Miller <[email protected]>
Cc: LKML <[email protected]>; [email protected]
Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On 11/3/21 9:18 AM, Justin Piszcz wrote:
> Thanks!-- Has anyone else reading run into this issue and/or are there
> any suggestions how I can troubleshoot this further (as all -rc's have
> the same issue)?

How about bisecting this issue
(https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html)?

[ .. ]

I was having some issues finding a list of changes with git bisect, so I started checking the kernel .config and boot parameters:

I found the option that was causing the system not to boot (tested with 5.15.0 and latest linux-git as of 6 NOV 2021)
append="3w-sas.use_msi=1"

3w-sas.use_msi defaults to 0, so without that option the card now uses IR-IO-APIC instead of MSI, and the machine boots using 5.15.
https://lwn.net/Articles/358679/

Something changed between 5.14 and 5.15 in x86_64's handling of Message Signaled Interrupts,
which causes the kernel to no longer boot when 3w-sas.use_msi=1 is specified, starting with 5.15.
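
On a kernel that does boot, the MSI versus IO-APIC distinction can be checked in /proc/interrupts, where the interrupt chip name appears after the per-CPU counters. The following is a rough sketch of such a check; the helper name `irq_chips_for` is made up, and the sample text is invented for illustration (it is not output from this machine, and IRQ 130 is an arbitrary number).

```python
def irq_chips_for(interrupts_text, driver):
    """Return [(irq_number, chip_name)] for lines whose action names `driver`.

    Expects the /proc/interrupts layout: a header row of CPU columns, then
    'IRQ: count-per-cpu... chip hwirq-trigger action'.
    """
    lines = interrupts_text.splitlines()
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    found = []
    for line in lines[1:]:
        fields = line.split()
        irq = fields[0].rstrip(":") if fields else ""
        if len(fields) < ncpu + 3 or not irq.isdigit():
            continue                        # skip NMI:, LOC:, and similar rows
        chip = fields[1 + ncpu]
        actions = " ".join(fields[2 + ncpu:])
        if driver in actions:
            found.append((int(irq), chip))
    return found


# Invented sample data, NOT taken from the affected system.
SAMPLE_IOAPIC = """\
           CPU0       CPU1
  45:       1234          0  IR-IO-APIC   45-fasteoi   3w-sas
 LOC:      99999      88888   Local timer interrupts
"""
SAMPLE_MSI = """\
           CPU0       CPU1
 130:         10         20  IR-PCI-MSI 1048576-edge      3w-sas
"""

for sample in (SAMPLE_IOAPIC, SAMPLE_MSI):
    for irq, chip in irq_chips_for(sample, "3w-sas"):
        mode = "MSI" if "MSI" in chip else "legacy/IO-APIC"
        print(f"IRQ {irq}: chip {chip} -> {mode}")
```

On a live system the text would come from reading /proc/interrupts instead of the sample strings.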

Regards,

Justin.




2021-11-07 19:53:16

by Justin Piszcz

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On Sat, Nov 6, 2021 at 7:54 AM Justin Piszcz <[email protected]> wrote:
>
>
>
> -----Original Message-----
> From: Bart Van Assche <[email protected]>
> Sent: Wednesday, November 3, 2021 12:23 PM
> To: Justin Piszcz <[email protected]>; Douglas Miller <[email protected]>
> Cc: LKML <[email protected]>; [email protected]
> Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card
>
> On 11/3/21 9:18 AM, Justin Piszcz wrote:
> > Thanks!-- Has anyone else reading run into this issue and/or are there
> > any suggestions how I can troubleshoot this further (as all -rc's have
> > the same issue)?
>
> How about bisecting this issue
> (https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html)?
>
> [ .. ]
>
> I was having some issues finding a list of changes with git bisect, so I started checking the kernel .config and boot parameters:
>
> I found the option that was causing the system not to boot (tested with 5.15.0 and latest linux-git as of 6 NOV 2021)
> append="3w-sas.use_msi=1"
>
> 3w-sas.use_msi defaults to 0 (so now it is using IR-IO-APIC instead of MSI but now the machine boots using 5.15)
> https://lwn.net/Articles/358679/
>
> Something between 5.14 and 5.15 changed regarding x86_64's handling of Message Signaled Interrupts.
> ... which causes the kernel to no longer boot when 3w-sas.use_msi=1 is specified starting with 5.15.

This only partially fixes the issue: trying to reboot also results in
a hard lockup on CPU 1 (this is semi-reproducible):
https://installkernel.tripod.com/5.15-reboot-lockup.jpg

Back to 5.14.x for now...



Justin.

2021-11-08 20:02:59

by Douglas Miller

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

The commit I referenced earlier does point back to the commit that
caused the problem (that I saw). There was a series of commits related
to IRQ domains; this one seems to have actually caused the problem I saw:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a5f3d2c17b07


On 11/7/21 07:46, Justin Piszcz wrote:
> On Sat, Nov 6, 2021 at 7:54 AM Justin Piszcz <[email protected]> wrote:
>>
>>
>> -----Original Message-----
>> From: Bart Van Assche <[email protected]>
>> Sent: Wednesday, November 3, 2021 12:23 PM
>> To: Justin Piszcz <[email protected]>; Douglas Miller <[email protected]>
>> Cc: LKML <[email protected]>; [email protected]
>> Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card
>>
>> On 11/3/21 9:18 AM, Justin Piszcz wrote:
>>> Thanks!-- Has anyone else reading run into this issue and/or are there
>>> any suggestions how I can troubleshoot this further (as all -rc's have
>>> the same issue)?
>> How about bisecting this issue
>> (https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html)?
>>
>> [ .. ]
>>
>> I was having some issues finding a list of changes with git bisect, so I started checking the kernel .config and boot parameters:
>>
>> I found the option that was causing the system not to boot (tested with 5.15.0 and latest linux-git as of 6 NOV 2021)
>> append="3w-sas.use_msi=1"
>>
>> 3w-sas.use_msi defaults to 0 (so now it is using IR-IO-APIC instead of MSI but now the machine boots using 5.15)
>> https://lwn.net/Articles/358679/
>>
>> Something between 5.14 and 5.15 changed regarding x86_64's handling of Message Signaled Interrupts.
>> ... which causes the kernel to no longer boot when 3w-sas.use_msi=1 is specified starting with 5.15.
> This only partially fixes the issues, trying to reboot also results in
> a hard lockup on cpu 1 (this is semi-reproducible)
> https://installkernel.tripod.com/5.15-reboot-lockup.jpg
>
> Back to 5.14.x for now...
>
>
>
> Justin.

2021-11-16 00:07:42

by Justin Piszcz

Subject: Re: kernel 5.15 does not boot with 3ware card (never had this issue <= 5.14) - scsi 0:0:0:0: WARNING: (0x06:0x002C) : Command (0x12) timed out, resetting card

On Mon, Nov 8, 2021 at 9:16 AM Douglas Miller
<[email protected]> wrote:
>
> The commit I referenced earlier does point back to the commit that
> caused the problem (that I saw). There was a series of commits related
> to IRQ domains, this one seems to have actually caused the problem I saw:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a5f3d2c17b07

In case anyone runs into this issue: it appears to be fixed in
5.16.0-rc1, with no more freezing at boot:

$ uname -a
Linux atom 5.16.0-rc1 #2 SMP Mon Nov 15 15:37:25 EST 2021 x86_64 GNU/Linux

Justin.