2023-03-21 13:52:09

by Thorsten Leemhuis

Subject: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

Hi, Thorsten here, the Linux kernel's regression tracker.

I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail (note, the reporter *is not* CCed to this mail, see [1]).

Christoph, apparently it's caused by a commit of yours.

Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=217218 :

> jason_a69 2023-03-20 11:05:54 UTC
>
> The machine will not boot: the controller appears to lock up and reset
> itself, and then only 2 of the 4 disks connected to the controller are
> detected.
>
> The main errors I am getting are
>
> dmar_fault 8 callbacks suppressed
> DMAR : DRHD: handling fault status req 2
> DMAR : [DMA Write NO_PASID] Request device [07.00.1] fault addr
> 0xfffe0000 [fault reason 0x82] Present bit in context entry is clear
>
> Kernel version 5.15.91 works fine; I also tried 6.0.0, which also failed.
>
> Looking in the change log for 6.0
> https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.0
>
> There are quite a few iommu changes. As a result I changed
> /etc/default/grub from
>
> GRUB_CMDLINE_LINUX="iommu=soft intel_iommu=on"
>
> to
>
> GRUB_CMDLINE_LINUX="iommu=soft intel_iommu=on iommu.forcedac=1"
>
> which did not help.
>
> If I do lspci on a kernel that boots I get
>
> 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> 00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
> 00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
> 00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
> 00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
> 00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
> 00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset LPC Controller (rev 05)
> 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
> 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
> 02:00.0 USB controller: Etron Technology, Inc. EJ188/EJ198 USB 3.0 Host Controller
> 03:00.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
> 04:01.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
> 04:02.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
> 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> 07:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 10)
>
> I have had a look at the kernel parameters which are here
>
> https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html?highlight=iommu
>
> I have tried a few different parameters, the only thing that did work was
> intel_iommu=off
>
> Using that option would mess up my VMs so I would rather not do that.
>
> I am sure I am just missing a kernel parameter.
>
> [...]
>
> git bisect bad 78013eaadf696d2105982abb4018fbae394ca08f
> # first bad commit: [78013eaadf696d2105982abb4018fbae394ca08f] x86: remove the IOMMU table infrastructure

See the ticket for more details.


[TLDR for the rest of this mail: I'm adding this report to the list of
tracked Linux kernel regressions; the text you find below is based on a
few template paragraphs you might have encountered already in similar
form.]

BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:

#regzbot introduced: 78013eaadf696d21
https://bugzilla.kernel.org/show_bug.cgi?id=217218
#regzbot title: dma/x86: machine with Marvell SATA controller 88SE9235
stopped booting
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (e.g. the bugzilla ticket and maybe this mail as well, if
this thread sees some discussion). See page linked in footer for details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"


2023-03-22 14:02:31

by Christoph Hellwig

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On Wed, Mar 22, 2023 at 05:46:09PM +0800, Jason Adriaanse wrote:
> Bonjuerno Christoph,
>
> I am the user who logged the original problem, is there anything I can do
> to help debug the issue?
>
> I do not mind testing patches or to help in any way I can.

Hi Jason,

I'm a little unresponsive right now as I'm dealing with the fallout
of a strike tomorrow that is disrupting my travel. So for now,
just a quick idea off the top of my head:

1) is CONFIG_GART_IOMMU enabled in your kernel?
2) if so, can you disable it and see if that makes the problem go away?

2023-03-28 01:34:14

by Christoph Hellwig

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235


I finally found some real time to look into this:

On Tue, Mar 21, 2023 at 02:52:00PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> > The main errors I am getting are
> >
> > dmar_fault 8 callbacks suppressed
> > DMAR : DRHD: handling fault status req 2
> > DMAR : [DMA Write NO_PASID] Request device [07.00.1] fault addr
> > 0xfffe0000 [fault reason 0x82] Present bit in context entry is clear

This clearly indicates that my original idea about the AMD gart was
completely bonkers, as we're obviously on an Intel platform.

And this indicates that the device is trying to do a DMA write to
something that isn't IOMMU mapped. Getting this from an initialization
change (commit 78013eaadf6 ("x86: remove the IOMMU table infrastructure"))
feels very strange to me.

Can you maybe post the full dmesg? I wonder if there is an interesting
initialization error in there.
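
For readers following the fault message, here is a small userspace sketch (not kernel code) of what "Present bit in context entry is clear" describes. It assumes the IOMMU keeps one context entry per device/function on a bus and only translates DMA for functions whose entry is marked present. The names `toy_devfn`, `toy_ctx_table`, `toy_ctx_set_present` and `toy_dma_allowed` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a PCI device number (5 bits) and function number (3 bits)
 * into the 8-bit "devfn" value that identifies a function on a bus. */
static inline uint8_t toy_devfn(unsigned dev, unsigned fn)
{
    return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Toy per-bus context table: one "present" flag per devfn. */
struct toy_ctx_table {
    bool present[256];
};

/* Mark a function as having a valid context entry. */
static inline void toy_ctx_set_present(struct toy_ctx_table *t, uint8_t devfn)
{
    t->present[devfn] = true;
}

/* A DMA request is only translated if the requester's entry is present;
 * otherwise the hardware would report a fault like the one quoted above. */
static inline bool toy_dma_allowed(const struct toy_ctx_table *t, uint8_t devfn)
{
    return t->present[devfn];
}
```

In this model, setting up only function 0 of the device on bus 07 leaves `toy_dma_allowed()` false for function 1, which corresponds to the fault reported for 07.00.1.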

2023-03-30 12:26:53

by Robin Murphy

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On 2023-03-28 02:22, Christoph Hellwig wrote:
>
> I finally found some real time to look into this:
>
> On Tue, Mar 21, 2023 at 02:52:00PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> The main errors I am getting are
>>>
>>> dmar_fault 8 callbacks suppressed
>>> DMAR : DRHD: handling fault status req 2
>>> DMAR : [DMA Write NO_PASID] Request device [07.00.1] fault addr
>>> 0xfffe0000 [fault reason 0x82] Present bit in context entry is clear
>
> This clearly indicates that my original idea about the AMD gart was
> completely bonkers, as we're obviously on an Intel platform.
>
> And this indicates that the device is trying to do a DMA write to
> something that isn't IOMMU mapped. Getting this from an initialization
> change (commit 78013eaadf6 ("x86: remove the IOMMU table infrastructure"))
> feels very strange to me.
>
> Can you maybe post the full dmesg? I wonder if there is an interesting
> initialization error in there.

FWIW "Marvell SATA" instantly makes me suspect the phantom function
quirk. What *should* happen is the IOMMU driver sees the PCI DMA aliases
correctly and sets up context entries for both 07.00.0 and 07.00.1, but
it looks like that may be what's gone awry.

Robin.
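
The phantom function idea above can be sketched in a few lines of userspace C. This is only a toy model, assuming that a "DMA alias" simply means an extra function number for which a context entry is also set up when the real device is attached. `toy_dev`, `toy_quirk_func1_alias` and `toy_attach` are hypothetical names; the first helper is loosely modelled on what `quirk_dma_func1_alias` in drivers/pci/quirks.c is understood to do (register function 1 as an alias unless the device already is function 1).

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_FN_PER_SLOT 8

/* Toy model of one PCI slot: the function that really exists, plus any
 * extra function numbers the hardware may use as requester ID for DMA. */
struct toy_dev {
    uint8_t fn;
    bool dma_alias[TOY_FN_PER_SLOT];
};

/* Register function 1 as a DMA alias, unless the device is function 1. */
static inline void toy_quirk_func1_alias(struct toy_dev *d)
{
    if (d->fn != 1)
        d->dma_alias[1] = true;
}

/* Enable context entries for the real function and every alias. */
static inline void toy_attach(const struct toy_dev *d,
                              bool ctx_present[TOY_FN_PER_SLOT])
{
    ctx_present[d->fn] = true;
    for (int fn = 0; fn < TOY_FN_PER_SLOT; fn++)
        if (d->dma_alias[fn])
            ctx_present[fn] = true;
}
```

Without the quirk only function 0 gets an entry, so DMA tagged as function 1 is rejected; with the quirk both 07.00.0 and 07.00.1 would be covered.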

2023-04-16 07:06:06

by Christoph Hellwig

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On Thu, Mar 30, 2023 at 01:18:45PM +0100, Robin Murphy wrote:
> FWIW "Marvell SATA" instantly makes me suspect the phantom function quirk.
> What *should* happen is the IOMMU driver sees the PCI DMA aliases correctly
> and sets up context entries for both 07.00.0 and 07.00.1, but it looks like
> that may be what's gone awry.

Looking at the bug report it seems this is device 9235, which doesn't
need the DMA alias quirks.

2023-04-16 07:08:36

by Christoph Hellwig

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

Hi Jason,

sorry for the late reply. I had some email issues and am still
recovering from the backlog.

On Fri, Mar 31, 2023 at 10:20:37AM +0800, Jason Adriaanse wrote:
> Hi Christoph and Robin,
>
> Christoph - I would like to send you more dmesg information but as my boot
> device cannot be detected that information is not being written to disk. Is
> there any way to specifically write boot debug information to say a USB
> device with some kernel parameters?

I don't know of any good way. pstore has some ways to save kernel
messages, but it doesn't work too well with normal block devices in
case of crashes.

I'm a bit lost at the moment.

Two ideas I have would be to

1) boot with the intel_iommu=off kernel command line
2) build a kernel with CONFIG_INTEL_IOMMU

and see if that works and report the dmesg.

2023-04-17 11:55:52

by Robin Murphy

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On 2023-04-16 07:41, Christoph Hellwig wrote:
> On Thu, Mar 30, 2023 at 01:18:45PM +0100, Robin Murphy wrote:
>> FWIW "Marvell SATA" instantly makes me suspect the phantom function quirk.
>> What *should* happen is the IOMMU driver sees the PCI DMA aliases correctly
>> and sets up context entries for both 07.00.0 and 07.00.1, but it looks like
>> that may be what's gone awry.
>
> Looking at the bug report it seems this is device 9235, which doesn't
> need the DMA alias quirks.

Indeed that one doesn't appear to be in the quirk list currently.
However the symptom of DMA traffic from function 1 which the IOMMU
clearly wasn't expecting firmly suggests that it *does* need the quirk.
Digging up the original report, the lspci output there suggests that
07:00.1 isn't a real function, which would further confirm it.

The other thing which catches my interest is the seemingly-conflicting
"iommu=soft" and "intel_iommu=on" arguments - I could well believe that
refactoring the x86 IOMMU detection stuff might have subtly changed the
interaction there, such that previously it ended up not actually using
the IOMMU for DMA ops, but now it is?

Robin.

2023-04-24 13:21:49

by Robin Murphy

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On 2023-04-22 07:25, Jason Adriaanse wrote:
> Hi Christoph,
>
> Sorry for my late reply, I have been on the road.
>
> So, if I boot with
> intel_iommu=off
> Then the server boots fine..although that is not a solution because I
> need Intel iommu for virtualisation.
>
> Also, I build all my kernels with CONFIG_INTEL_IOMMU=y
>

If you boot 5.15 *without* the "iommu=soft" argument, just
"intel_iommu=on", does that also break?

Robin.

2023-04-24 14:14:17

by Robin Murphy

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On 2023-04-24 14:44, Jason Adriaanse wrote:
> I took out "iommu=soft" and the server failed to boot, so yes it does
> break.
>
> The first error was
> ata7.00: Failed to IDENTIFY (INIT_DEV_PARAMS failed , err_mask=0x80)

OK, great, that confirms the underlying issue existed all along, so the
regression is only a change in who wins a fight between certain
conflicting command-line arguments, which is arguably not so critical.

The rest of the evidence points to 88SE9235 wanting the same phantom
function quirk as most other Marvell controllers, since although it's
apparently been half-fixed such that DMA for two of the ports is being
correctly emitted from function 0 - given that you say two of the disks
*are* detected OK - the other two are still claiming to be function 1
after all.

Thanks,
Robin.
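
The "who wins a fight between conflicting command-line arguments" point can be pictured with a toy model. This is not how the kernel parses its command line; it only assumes, following the description in this thread, that "iommu=soft" used to keep the hardware IOMMU out of the DMA path and no longer does. All function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `opt` appears as a whole space-separated word. */
static bool toy_has_opt(const char *cmdline, const char *opt)
{
    size_t n = strlen(opt);
    const char *p = cmdline;

    while ((p = strstr(p, opt)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return true;
        p += 1;
    }
    return false;
}

/* Assumed "before" behaviour: the software override wins, so the
 * hardware IOMMU is not used for DMA even with intel_iommu=on. */
static bool toy_hw_iommu_used_old(const char *cmdline)
{
    if (toy_has_opt(cmdline, "iommu=soft"))
        return false;
    return toy_has_opt(cmdline, "intel_iommu=on");
}

/* Assumed "after" behaviour: iommu=soft no longer stops the hardware
 * IOMMU from being used. */
static bool toy_hw_iommu_used_new(const char *cmdline)
{
    return toy_has_opt(cmdline, "intel_iommu=on");
}
```

Under these assumptions, "iommu=soft intel_iommu=on" hides the missing quirk in the old model and exposes it in the new one, while plain "intel_iommu=on" exposes it in both, which matches the 5.15 test result reported above.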

2023-04-25 11:45:50

by Robin Murphy

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On 2023-04-25 05:17, Jason Adriaanse wrote:
> Ok great,
>
> I take it a change needs to be made in
> drivers/pci/quirks.c
> ?
> I do not mind making the change locally here and letting you know if it
> works or not.

Indeed, something like this (make sure the IDs actually match what your
device reports, I'm just guessing):


diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44cab813bf95..a9166e886b75 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4161,6 +4161,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9220,
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
                          quirk_dma_func1_alias);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9235,
+                         quirk_dma_func1_alias);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
                          quirk_dma_func1_alias);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0645,


Marvell themselves seem to lump the 88SE92xx products together as a
closely-related family, so given that we do have quirks for 3 of the 4
already, this one does rather seem conspicuous by its absence...

Thanks,
Robin.
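
As a rough userspace illustration of how such a fixup list selects devices, the sketch below matches a vendor/device ID pair against a small table. It is not the kernel's fixup machinery; `toy_fixup` and `toy_needs_func1_alias` are hypothetical names, the table holds only the Marvell IDs visible in the diff plus the proposed 0x9235 entry, and 0x1b4b is assumed to be the value of PCI_VENDOR_ID_MARVELL_EXT.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of PCI_VENDOR_ID_MARVELL_EXT. */
#define TOY_VENDOR_MARVELL_EXT 0x1b4b

struct toy_fixup {
    uint16_t vendor;
    uint16_t device;
};

/* Only the Marvell IDs visible in the diff context above, plus the
 * proposed new entry. The real list in quirks.c is longer. */
static const struct toy_fixup toy_func1_alias_ids[] = {
    { TOY_VENDOR_MARVELL_EXT, 0x9220 },
    { TOY_VENDOR_MARVELL_EXT, 0x9230 },
    { TOY_VENDOR_MARVELL_EXT, 0x9235 }, /* proposed addition */
};

/* Return true if the given vendor/device pair is in the table. */
static inline bool toy_needs_func1_alias(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(toy_func1_alias_ids) / sizeof(toy_func1_alias_ids[0]);

    for (size_t i = 0; i < n; i++)
        if (toy_func1_alias_ids[i].vendor == vendor &&
            toy_func1_alias_ids[i].device == device)
            return true;
    return false;
}
```

The IDs a given controller actually reports can be checked with `lspci -nn`, which prints the vendor:device pair in brackets; that is the check suggested above before trusting the guessed 0x9235.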

2023-05-22 10:52:38

by Thorsten Leemhuis

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

I might be missing something, but it looks to me like this regression
was never fixed in mainline. Which is strange, as we apparently had a
patch from Robin that fixed the issue for the reporter.

Did it fall through the cracks or what am I missing?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 25.04.23 15:58, Jason Adriaanse wrote:
> I am happy to report that the change worked, this is what
> drivers/pci/quirks.c looks like
>
> /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
>                          quirk_dma_func1_alias);
> /* https://bugzilla.kernel.org/show_bug.cgi?id=217218 */
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9235,
>                          quirk_dma_func1_alias);
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
>                          quirk_dma_func1_alias);
>
> Relevant output of dmesg -T with the new kernel running
>
> [Tue Apr 25 21:45:13 2023] scsi host0: ahci
> [Tue Apr 25 21:45:13 2023] scsi host1: ahci
> [Tue Apr 25 21:45:13 2023] scsi host2: ahci
> [Tue Apr 25 21:45:13 2023] scsi host3: ahci
> [Tue Apr 25 21:45:13 2023] ata1: SATA max UDMA/133 abar m2048@0xf7d06000 port 0xf7d06100 irq 40
> [Tue Apr 25 21:45:13 2023] ata2: SATA max UDMA/133 abar m2048@0xf7d06000 port 0xf7d06180 irq 40
> [Tue Apr 25 21:45:13 2023] ata3: DUMMY
> [Tue Apr 25 21:45:13 2023] ata4: DUMMY
> [Tue Apr 25 21:45:13 2023] igb 0000:05:00.0 enp5s0: renamed from eth0
> [Tue Apr 25 21:45:13 2023] ahci 0000:07:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
> [Tue Apr 25 21:45:13 2023] ahci 0000:07:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs
> [Tue Apr 25 21:45:13 2023] scsi host4: ahci
> [Tue Apr 25 21:45:13 2023] scsi host5: ahci
> [Tue Apr 25 21:45:13 2023] scsi host6: ahci
> [Tue Apr 25 21:45:13 2023] scsi host7: ahci
> [Tue Apr 25 21:45:13 2023] ata5: SATA max UDMA/133 abar m2048@0xf7b10000 port 0xf7b10100 irq 41
> [Tue Apr 25 21:45:13 2023] ata6: SATA max UDMA/133 abar m2048@0xf7b10000 port 0xf7b10180 irq 41
> [Tue Apr 25 21:45:13 2023] ata7: SATA max UDMA/133 abar m2048@0xf7b10000 port 0xf7b10200 irq 41
> [Tue Apr 25 21:45:13 2023] ata8: SATA max UDMA/133 abar m2048@0xf7b10000 port 0xf7b10280 irq 41
> [Tue Apr 25 21:45:13 2023] usb 1-1: new high-speed USB device number 2 using ehci-pci
> [Tue Apr 25 21:45:14 2023] usb 3-1: new high-speed USB device number 2 using ehci-pci
> [Tue Apr 25 21:45:14 2023] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [Tue Apr 25 21:45:14 2023] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [Tue Apr 25 21:45:14 2023] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [Tue Apr 25 21:45:14 2023] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [Tue Apr 25 21:45:14 2023] ata7.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
> [Tue Apr 25 21:45:14 2023] ata6.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
> [Tue Apr 25 21:45:14 2023] ata8.00: ATA-9: WDC WD40EFRX-68WT0N0, 80.00A80, max UDMA/133
> [Tue Apr 25 21:45:14 2023] ata5.00: ATA-10: CT2000BX500SSD1, M6CR030, max UDMA/133
> [Tue Apr 25 21:45:14 2023] ata6.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> [Tue Apr 25 21:45:14 2023] ata7.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> [Tue Apr 25 21:45:14 2023] ata8.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> [Tue Apr 25 21:45:14 2023] ata5.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 32), AA
> [Tue Apr 25 21:45:14 2023] ata6.00: configured for UDMA/133
> [Tue Apr 25 21:45:14 2023] ata7.00: configured for UDMA/133
> [Tue Apr 25 21:45:14 2023] ata8.00: configured for UDMA/133
> [Tue Apr 25 21:45:14 2023] ata1: SATA link down (SStatus 0 SControl 300)
> [Tue Apr 25 21:45:14 2023] ata5.00: Features: Dev-Sleep
> [Tue Apr 25 21:45:14 2023] ata5.00: configured for UDMA/133
> [Tue Apr 25 21:45:14 2023] usb 1-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
> [Tue Apr 25 21:45:14 2023] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [Tue Apr 25 21:45:14 2023] hub 1-1:1.0: USB hub found
> [Tue Apr 25 21:45:14 2023] hub 1-1:1.0: 4 ports detected
> [Tue Apr 25 21:45:14 2023] usb 3-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
> [Tue Apr 25 21:45:14 2023] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [Tue Apr 25 21:45:14 2023] hub 3-1:1.0: USB hub found
> [Tue Apr 25 21:45:14 2023] hub 3-1:1.0: 6 ports detected
> [Tue Apr 25 21:45:14 2023] ata2: SATA link down (SStatus 0 SControl 300)
> [Tue Apr 25 21:45:14 2023] scsi 4:0:0:0: Direct-Access ATA      CT2000BX500SSD1  030  PQ: 0 ANSI: 5
> [Tue Apr 25 21:45:14 2023] scsi 5:0:0:0: Direct-Access ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
> [Tue Apr 25 21:45:14 2023] scsi 6:0:0:0: Direct-Access ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
> [Tue Apr 25 21:45:14 2023] scsi 7:0:0:0: Direct-Access ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
>
> Thanks everyone for all your help.
>
> Jason

2023-05-22 11:30:44

by Robin Murphy

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On 2023-05-22 11:26, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> I might be missing something, but it looks to me like this regression
> was never fixed in mainline. Which is strange, as we apparently had a
> patch from Robin that fixed the issue for the reporter.
>
> Did it fall through the cracks or what am I missing?

Strictly, the regression itself has not been fixed - I guess it does
just about qualify since the rather-out-of-date
Documentation/arch/x86/x86_64/boot-options.rst does still say that
iommu=soft "can be used to prevent the usage of an available hardware
IOMMU", and that seems to be what has stopped happening here.

What it exposed was a latent issue that this particular device has never
been properly supported for use with an IOMMU, and that's what I guessed
at a fix for.

Thanks,
Robin.

>>>>>>> On 2023-04-22 07:25, Jason Adriaanse wrote:
>>>>>>>> Hi Christoph,
>>>>>>>>
>>>>>>>> Sorry for my late reply, I have been on the road.
>>>>>>>>
>>>>>>>> So, if I boot with
>>>>>>>> intel_iommu=off
>>>>>>>> Then the server boots fine..although that is not a solution
>>>>>>>> because I need Intel iommu for virtualisation.
>>>>>>>>
>>>>>>>> Also, I build all my kernels with CONFIG_INTEL_IOMMU=y
>>>>>>>>
>>>>>>>
>>>>>>> If you boot 5.15 *without* the "iommu=soft" argument, just
>>>>>>> "intel_iommu=on", does that also break?
>>>>>>>
>>>>>>> Robin.
>>
>>

2023-05-22 11:56:13

by Thorsten Leemhuis

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On 22.05.23 13:01, Robin Murphy wrote:
> On 2023-05-22 11:26, Linux regression tracking (Thorsten Leemhuis) wrote:
>>
>> I might be missing something, but it looks to me like this regression
>> was never fixed in mainline. Which is strange, as we apparently had a
>> patch from Robin that fixed the issue for the reporter.
>>
>> Did it fall through the cracks or what am I missing?
>
> Strictly, the regression itself has not been fixed - I guess it does
> just about qualify since the rather-out-of-date
> Documentation/arch/x86/x86_64/boot-options.rst does still say that
> iommu=soft "can be used to prevent the usage of an available hardware
> IOMMU", and that seems to be what has stopped happening here.
>
> What it exposed was a latent issue that this particular device has never
> been properly supported for use with an IOMMU, and that's what I guessed
> at a fix for.

Thx for the summary. This sounds a lot like you have no interest in
submitting the quirk entry yourself (please correct me if I'm wrong).
Jason, judging from lore, doesn't seem to be regularly involved in kernel
development. And I try to stay out of such waters as well, as
I try to draw a line there. Which leads to the question:

Who will now submit the quirk entry?

From "git blame" it seems Bjorn and Alex added most of the other quirk
entries for the Marvell controllers (both CCed now). Could one of you add
this one that Robin suggested in [1] as well?

/me wonders if they'd need a "Signed-off-by" from Robin for a one-liner
that is mainly copy-n-paste
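
For anyone reading along without the PCI code at hand, here is a minimal
user-space sketch of what the suggested entry amounts to. It assumes that
quirk_dma_func1_alias registers function 1 of the same slot as an extra
DMA alias for the device, which is how I understand the existing Marvell
entries to work. The devfn macros are local copies written for this
sketch, the vendor value 0x1b4b for PCI_VENDOR_ID_MARVELL_EXT is an
assumption, the table holds only the Marvell IDs visible in the quoted
diff, and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the usual devfn helpers (slot in bits 7:3, function in 2:0). */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Assumed value of PCI_VENDOR_ID_MARVELL_EXT. */
#define VENDOR_MARVELL_EXT 0x1b4b

/* Hypothetical table: only the Marvell device IDs shown in the quoted diff. */
static const uint16_t func1_alias_ids[] = { 0x9220, 0x9230, 0x9235 };

/* Build a devfn byte from slot and function numbers. */
uint8_t make_devfn(unsigned int slot, unsigned int func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)PCI_DEVFN(slot, func);
}

/* True if the vendor/device pair is in the sketch's quirk table. */
bool needs_func1_alias(uint16_t vendor, uint16_t device)
{
	size_t i;

	if (vendor != VENDOR_MARVELL_EXT)
		return false;
	for (i = 0; i < sizeof(func1_alias_ids) / sizeof(func1_alias_ids[0]); i++)
		if (func1_alias_ids[i] == device)
			return true;
	return false;
}

/* The devfn the IOMMU should additionally accept DMA from: same slot, function 1. */
uint8_t func1_alias_devfn(uint8_t devfn)
{
	return (uint8_t)PCI_DEVFN(PCI_SLOT(devfn), 1);
}

/* 16-bit requester ID as it shows up in IOMMU faults: bus in 15:8, devfn in 7:0. */
uint16_t requester_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((unsigned int)bus << 8) | devfn);
}
```

With the controller at 07:00.0 (as in the quoted "ahci 0000:07:00.0" lines),
func1_alias_devfn(make_devfn(0, 0)) yields function 1 of slot 0, i.e.
07:00.1, which is the device the DMAR fault in the original report
complained about.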

Ciao, Thorsten

[1]
https://lore.kernel.org/all/[email protected]/

>> On 25.04.23 15:58, Jason Adriaanse wrote:
>>> I am happy to report that the change worked, this is what
>>> drivers/pci/quirks.c looks like
>>>
>>> /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
>>>                           quirk_dma_func1_alias);
>>> /* https://bugzilla.kernel.org/show_bug.cgi?id=217218 */
>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9235,
>>>                           quirk_dma_func1_alias);
>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
>>>                           quirk_dma_func1_alias);
>>>
>>> Relevant output of dmesg -T with the new kernel running
>>>
>>> [Tue Apr 25 21:45:13 2023] scsi host0: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host1: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host2: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host3: ahci
>>> [Tue Apr 25 21:45:13 2023] ata1: SATA max UDMA/133 abar m2048@0xf7d06000
>>> port 0xf7d06100 irq 40
>>> [Tue Apr 25 21:45:13 2023] ata2: SATA max UDMA/133 abar m2048@0xf7d06000
>>> port 0xf7d06180 irq 40
>>> [Tue Apr 25 21:45:13 2023] ata3: DUMMY
>>> [Tue Apr 25 21:45:13 2023] ata4: DUMMY
>>> [Tue Apr 25 21:45:13 2023] igb 0000:05:00.0 enp5s0: renamed from eth0
>>> [Tue Apr 25 21:45:13 2023] ahci 0000:07:00.0: AHCI 0001.0000 32 slots 4
>>> ports 6 Gbps 0xf impl SATA mode
>>> [Tue Apr 25 21:45:13 2023] ahci 0000:07:00.0: flags: 64bit ncq sntf led
>>> only pmp fbs pio slum part sxs
>>> [Tue Apr 25 21:45:13 2023] scsi host4: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host5: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host6: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host7: ahci
>>> [Tue Apr 25 21:45:13 2023] ata5: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10100 irq 41
>>> [Tue Apr 25 21:45:13 2023] ata6: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10180 irq 41
>>> [Tue Apr 25 21:45:13 2023] ata7: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10200 irq 41
>>> [Tue Apr 25 21:45:13 2023] ata8: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10280 irq 41
>>> [Tue Apr 25 21:45:13 2023] usb 1-1: new high-speed USB device number 2
>>> using ehci-pci
>>> [Tue Apr 25 21:45:14 2023] usb 3-1: new high-speed USB device number 2
>>> using ehci-pci
>>> [Tue Apr 25 21:45:14 2023] ata8: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata6: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata7: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata5: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata7.00: ATA-9: WDC WD40EFRX-68WT0N0,
>>> 80.00A80, max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata6.00: ATA-9: WDC WD40EFRX-68WT0N0,
>>> 80.00A80, max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata8.00: ATA-9: WDC WD40EFRX-68WT0N0,
>>> 80.00A80, max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata5.00: ATA-10: CT2000BX500SSD1, M6CR030,
>>> max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata6.00: 7814037168 sectors, multi 0: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata7.00: 7814037168 sectors, multi 0: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata8.00: 7814037168 sectors, multi 0: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata5.00: 3907029168 sectors, multi 1: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata6.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata7.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata8.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata1: SATA link down (SStatus 0 SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata5.00: Features: Dev-Sleep
>>> [Tue Apr 25 21:45:14 2023] ata5.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] usb 1-1: New USB device found, idVendor=8087,
>>> idProduct=0024, bcdDevice= 0.00
>>> [Tue Apr 25 21:45:14 2023] usb 1-1: New USB device strings: Mfr=0,
>>> Product=0, SerialNumber=0
>>> [Tue Apr 25 21:45:14 2023] hub 1-1:1.0: USB hub found
>>> [Tue Apr 25 21:45:14 2023] hub 1-1:1.0: 4 ports detected
>>> [Tue Apr 25 21:45:14 2023] usb 3-1: New USB device found, idVendor=8087,
>>> idProduct=0024, bcdDevice= 0.00
>>> [Tue Apr 25 21:45:14 2023] usb 3-1: New USB device strings: Mfr=0,
>>> Product=0, SerialNumber=0
>>> [Tue Apr 25 21:45:14 2023] hub 3-1:1.0: USB hub found
>>> [Tue Apr 25 21:45:14 2023] hub 3-1:1.0: 6 ports detected
>>> [Tue Apr 25 21:45:14 2023] ata2: SATA link down (SStatus 0 SControl 300)
>>> [Tue Apr 25 21:45:14 2023] scsi 4:0:0:0: Direct-Access ATA
>>> CT2000BX500SSD1  030  PQ: 0 ANSI: 5
>>> [Tue Apr 25 21:45:14 2023] scsi 5:0:0:0: Direct-Access ATA      WDC
>>> WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
>>> [Tue Apr 25 21:45:14 2023] scsi 6:0:0:0: Direct-Access ATA      WDC
>>> WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
>>> [Tue Apr 25 21:45:14 2023] scsi 7:0:0:0: Direct-Access ATA      WDC
>>> WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
>>>
>>> Thanks everyone for all your help.
>>>
>>> Jason
>>>
>>>
>>> On 25/04/2023 19:37, Robin Murphy wrote:
>>>> On 2023-04-25 05:17, Jason Adriaanse wrote:
>>>>> Ok great,
>>>>>
>>>>> I take it a change needs to be made in
>>>>> drivers/pci/quirks.c
>>>>> ?
>>>>> I do not mind making the change locally here and letting you know if
>>>>> it works or not.
>>>>
>>>> Indeed, something like this (make sure the IDs actually match what your
>>>> device reports, I'm just guessing):
>>>>
>>>>
>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>> index 44cab813bf95..a9166e886b75 100644
>>>> --- a/drivers/pci/quirks.c
>>>> +++ b/drivers/pci/quirks.c
>>>> @@ -4161,6 +4161,8 @@
>>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9220,
>>>>   /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
>>>>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
>>>>                quirk_dma_func1_alias);
>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9235,
>>>> +             quirk_dma_func1_alias);
>>>>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
>>>>                quirk_dma_func1_alias);
>>>>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0645,
>>>>
>>>>
>>>> Marvell themselves seem to lump the 88SE92xx products together as a
>>>> closely-related family, so given that we do have quirks for 3 of the 4
>>>> already, this one does rather seem conspicuous by its absence...
>>>>
>>>> Thanks,
>>>> Robin.
>>>>
>>>>> On 24/04/2023 22:07, Robin Murphy wrote:
>>>>>> On 2023-04-24 14:44, Jason Adriaanse wrote:
>>>>>>> I took out "iommu=soft" and the server failed to boot, so yes it
>>>>>>> does break.
>>>>>>>
>>>>>>> The first error was
>>>>>>> ata7.00: Failed to IDENTIFY (INIT_DEV_PARAMS failed , err_mask=0x80)
>>>>>>
>>>>>> OK, great, that confirms the underlying issue existed all along, so
>>>>>> the regression is only a change in who wins a fight between certain
>>>>>> conflicting command-line arguments, which is arguably not so
>>>>>> critical.
>>>>>>
>>>>>> The rest of the evidence points to 88SE9235 wanting the same phantom
>>>>>> function quirk as most other Marvell controllers, since although
>>>>>> it's apparently been half-fixed such that DMA for two of the ports
>>>>>> is being correctly emitted from function 0 - given that you say two
>>>>>> of the disks *are* detected OK - the other two are still claiming to
>>>>>> be function 1 after all.
>>>>>>
>>>>>> Thanks,
>>>>>> Robin.
>>>>>>
>>>>>>> On 24/04/2023 21:20, Robin Murphy wrote:
>>>>>>>> On 2023-04-22 07:25, Jason Adriaanse wrote:
>>>>>>>>> Hi Christoph,
>>>>>>>>>
>>>>>>>>> Sorry for my late reply, I have been on the road.
>>>>>>>>>
>>>>>>>>> So, if I boot with
>>>>>>>>> intel_iommu=off
>>>>>>>>> Then the server boots fine..although that is not a solution
>>>>>>>>> because I need Intel iommu for virtualisation.
>>>>>>>>>
>>>>>>>>> Also, I build all my kernels with CONFIG_INTEL_IOMMU=y
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you boot 5.15 *without* the "iommu=soft" argument, just
>>>>>>>> "intel_iommu=on", does that also break?
>>>>>>>>
>>>>>>>> Robin.
>>>
>>>
>
>
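
A side note on the quoted dmesg output: the "SStatus 133" and "SStatus 0"
values in the link up/down lines can be decoded by hand. The sketch below
assumes libata prints the register in hex and that the fields follow the
usual SATA layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8).
The struct and function names are made up for illustration and are not
kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the SATA SStatus register, per the assumed layout above. */
struct sstatus_fields {
	unsigned int det; /* device detection: 3 = device present, phy up */
	unsigned int spd; /* negotiated speed generation: 1, 2 or 3 */
	unsigned int ipm; /* interface power state: 1 = active */
};

/* Split a raw SStatus value into its three 4-bit fields. */
struct sstatus_fields decode_sstatus(uint32_t sstatus)
{
	struct sstatus_fields f;

	f.det = sstatus & 0xf;
	f.spd = (sstatus >> 4) & 0xf;
	f.ipm = (sstatus >> 8) & 0xf;
	return f;
}

/* Map the SPD field to the speed string seen in the log lines. */
const char *spd_to_string(unsigned int spd)
{
	assert(spd <= 0xf);
	switch (spd) {
	case 1:
		return "1.5 Gbps";
	case 2:
		return "3.0 Gbps";
	case 3:
		return "6.0 Gbps";
	default:
		return "<unknown>";
	}
}

/* A link counts as up when DET reports a device with established phy. */
int link_is_up(uint32_t sstatus)
{
	return decode_sstatus(sstatus).det == 3;
}
```

Under those assumptions, 0x133 decodes to DET=3, SPD=3, IPM=1, matching
"SATA link up 6.0 Gbps" for ata5 to ata8, while 0x0 has DET=0, matching
"SATA link down" for ata1 and ata2.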

2023-06-02 13:30:09

by Thorsten Leemhuis

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

Christoph, could you do me a favor and...

On 22.05.23 13:33, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 22.05.23 13:01, Robin Murphy wrote:
>> On 2023-05-22 11:26, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>
>>> I might be missing something, but it looks to me like this regression
>>> was never fixed in mainline. Which is strange, as we apparently had a
>>> patch from Robin that fixed the issue for the reporter.
>>>
>>> Did it fall through the cracks or what am I missing?
>>
>> Strictly, the regression itself has not been fixed - I guess it does
>> just about qualify since the rather-out-of-date
>> Documentation/arch/x86/x86_64/boot-options.rst does still say that
>> iommu=soft "can be used to prevent the usage of an available hardware
>> IOMMU", and that seems to be what has stopped happening here.
>>
>> What it exposed was a latent issue that this particular device has never
>> been properly supported for use with an IOMMU, and that's what I guessed
>> at a fix for.
>
> Thx for the summary. This sounds a lot like you have no interest in
> submitting the quirk entry yourself (please correct me if I'm wrong).
> Jason from looking at lore doesn't seem to be involved in kernel
> development regularly. And I try to stay out of such waters as well, as
> I try to draw a line there. Which leads to the question:
>
> Who will now submit the quirk entry?
>
> From "git blame" it seems Bjorn and Alex added most of the other quirk
> entries for the Marvell controllers (both CCed now). Could one of you add
> this one that Robin suggested in [1] as well?

...submit that quirk, as Bjorn and Alex apparently didn't pick this up?
I could do so myself, but prefer to leave that to people that actually
know what they are doing -- and thus can also handle problems later, in
case any show up. And strictly speaking it apparently was you who caused
this regression with 78013eaadf6 ("x86: remove the IOMMU table
infrastructure").

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> /me wonders if they'd need a "Signed-off-by" from Robin for a one-liner
> that is mainly copy-n-paste
>
> Ciao, Thorsten
>
> [1]
> https://lore.kernel.org/all/[email protected]/
>

2023-06-06 09:55:57

by Christoph Hellwig

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

On Fri, Jun 02, 2023 at 03:07:19PM +0200, Thorsten Leemhuis wrote:
> Christoph, could you do me a favor and...

> ...submit that quirk, as Bjorn and Alex apparently didn't pick this up?
> I could do so myself, but prefer to leave that to people that actually
> know what they are doing -- and thus can also handle problems later, in
> case any show up. And strictly speaking it apparently was you who caused
> this regression with 78013eaadf6 ("x86: remove the IOMMU table
> infrastructure").

Well, Robin posted it so I think he should also finish it up and get
the credit. Robin, can you send the quirk with a formal signoff?


2023-06-20 14:38:24

by Thorsten Leemhuis

Subject: Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

[TLDR: This mail is primarily relevant for Linux regression tracking. A
change or fix related to the regression discussed in this thread was
posted or applied, but it did not use a Closes: tag to point to the
report, as Linus and the documentation call for. Things happen, no
worries -- but now the regression tracking bot needs to be told manually
about the fix. See link in footer if these mails annoy you.]

On 21.03.23 14:52, Linux regression tracking (Thorsten Leemhuis) wrote:
>
> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> kernel developers don't keep an eye on it, I decided to forward it by
> mail (note, the reporter *is not* CCed to this mail, see[1]).
>
> #regzbot introduced: 78013eaadf696d21
> https://bugzilla.kernel.org/show_bug.cgi?id=217218
> #regzbot title: dma/x86: machine with Marvell SATA controller 88SE9235
> stopped booting
> #regzbot ignore-activity

#regzbot fix: ata: ahci: Skip 200 ms debounce delay for Marvell 88SE9235
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.