2024-02-26 10:54:56

by edmund.raile

[permalink] [raw]
Subject: [PATCH] PCI: Mark LSI FW643 to avoid bus reset

Using LSI / Agere FW643 with vfio-pci will issue an FLreset, causing
a broken link only recoverable by removing power (power-off /
suspend + rescan). Prevent this bus reset.
With this change, the device can be assigned to VMs with VFIO.

Signed-off-by: Edmund Raile <[email protected]>
---
Usefulness:
The LSI FW643 PCIe->FireWire 800 interface may be EOL but it is
the only one that does not use a PCIe->PCI bridge.
It was used in the following Apple machines:
MacBookPro10,1
MacBookPro9,2
MacBookPro6,2
MacBookPro5,1
Macmini6,1
Macmini3,1
iMac12,2
iMac9,1
iMac8,1
It is reliable and enables FireWire audio interfaces to be used
on modern machines.
Virtualization allows for flexible access to professional audio
software.

Implementation:
PCI_VENDOR_ID_ATT was reused as they are identical and I am
uncertain it is correct to add another ID for LSI to pci_ids.h.
drivers/pci/quirks.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d797df6e5f3e..a6747e1b86da 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3765,6 +3765,15 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
*/
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);

+/*
+ * Using LSI / Agere FW643 with vfio-pci will issue an FLreset, causing
+ * a broken link only recoverable by removing power (power-off /
+ * suspend + rescan). Prevent this bus reset.
+ * With this change, the device can be assigned to VMs with VFIO.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5900, quirk_no_bus_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset);
+
/*
* Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
* automatically disables LTSSM when Secondary Bus Reset is received and
--
2.43.0




2024-02-26 21:02:39

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: Mark LSI FW643 to avoid bus reset

[+cc Alex]

On Mon, Feb 26, 2024 at 10:25:12AM +0000, Edmund Raile wrote:
> Using LSI / Agere FW643 with vfio-pci will issue an FLreset, causing
> a broken link only recoverable by removing power (power-off /
> suspend + rescan). Prevent this bus reset.
> With this change, the device can be assigned to VMs with VFIO.

I don't quite understand the commit log. It says Function Level
Resets cause a problem, but the patch sets PCI_DEV_FLAGS_NO_BUS_RESET,
which prevents Secondary Bus Resets, not Function Level Resets.

By default, I think we try reset methods in order of
pci_reset_fn_methods[], so the fact that we get to the end and
try pci_reset_bus_function() (where PCI_DEV_FLAGS_NO_BUS_RESET is
tested) suggests that we weren't able to use FLR or PM resets.

IIUC, vfio-pci resets devices to prevent leaking state from one VM to
another. We might be willing to accept the downside of allowing that
leakage, but I think it's worth mentioning that in the commit log.

Add blank lines between paragraphs (also in the comment below), or
rewrap into a single paragraph.

> Signed-off-by: Edmund Raile <[email protected]>
> ---
> Usefulness:
> The LSI FW643 PCIe->FireWire 800 interface may be EOL but it is
> the only one that does not use a PCIe->PCI bridge.
> It was used in the following Apple machines:
> MacBookPro10,1
> MacBookPro9,2
> MacBookPro6,2
> MacBookPro5,1
> Macmini6,1
> Macmini3,1
> iMac12,2
> iMac9,1
> iMac8,1
> It is reliable and enables FireWire audio interfaces to be used
> on modern machines.
> Virtualization allows for flexible access to professional audio
> software.
>
> Implementation:
> PCI_VENDOR_ID_ATT was reused as they are identical and I am
> uncertain it is correct to add another ID for LSI to pci_ids.h.
> drivers/pci/quirks.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index d797df6e5f3e..a6747e1b86da 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3765,6 +3765,15 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
> */
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
>
> +/*
> + * Using LSI / Agere FW643 with vfio-pci will issue an FLreset, causing
> + * a broken link only recoverable by removing power (power-off /
> + * suspend + rescan). Prevent this bus reset.
> + * With this change, the device can be assigned to VMs with VFIO.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5900, quirk_no_bus_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset);
> +
> /*
> * Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
> * automatically disables LTSSM when Secondary Bus Reset is received and
> --
> 2.43.0
>
>

2024-02-27 13:48:05

by edmund.raile

[permalink] [raw]
Subject: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

Using LSI / Agere FW643 with vfio-pci will exhaust all
pci_reset_fn_methods, the bus reset at the end causes a broken link
only recoverable by removing power
(power-off / suspend + rescan).
Prevent this bus reset.
With this change, the device can be assigned to VMs with VFIO.
Note that it will not be reset, resulting in leaking state between VMs
and host.

Signed-off-by: Edmund Raile <[email protected]>

---

I sincerely thank you for your patience and explaining
the background of pci resets which I lacked.
The commit message and comment now describe it correctly.
The comment on leaking states was added.

Usefulness:

The LSI FW643 PCIe->FireWire 800 interface may be EOL but it is
the only one that does not use a PCIe->PCI bridge.
It is reliable and enables FireWire audio interfaces to be used
on modern machines.

Virtualization allows for flexible access to professional audio
software.

It has been used in at least the following Apple machines:
MacBookPro10,1
MacBookPro9,2
MacBookPro6,2
MacBookPro5,1
Macmini6,1
Macmini3,1
iMac12,2
iMac9,1
iMac8,1

Implementation:

PCI_VENDOR_ID_ATT was reused as they are identical.

drivers/pci/quirks.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d797df6e5f3e..e0e4ad9e6d50 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3765,6 +3765,19 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
*/
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);

+/*
+ * Using LSI / Agere FW643 with vfio-pci will exhaust all
+ * pci_reset_fn_methods, the bus reset at the end causes a broken link
+ * only recoverable by removing power
+ * (power-off / suspend + rescan).
+ * Prevent this bus reset.
+ * With this change, the device can be assigned to VMs with VFIO.
+ * Note that it will not be reset, resulting in leaking state between VMs
+ * and host.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5900, quirk_no_bus_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset);
+
/*
* Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
* automatically disables LTSSM when Secondary Bus Reset is received and
--
2.43.0



2024-02-29 23:00:21

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

On Tue, Feb 27, 2024 at 01:14:18PM +0000, Edmund Raile wrote:
> Using LSI / Agere FW643 with vfio-pci will exhaust all
> pci_reset_fn_methods, the bus reset at the end causes a broken link
> only recoverable by removing power
> (power-off / suspend + rescan).
> Prevent this bus reset.
> With this change, the device can be assigned to VMs with VFIO.
> Note that it will not be reset, resulting in leaking state between VMs
> and host.
>
> Signed-off-by: Edmund Raile <[email protected]>

Applied to pci/virtualization for v6.9, thank you!

> ---
>
> I sincerely thank you for your patience and explaining
> the background of pci resets which I lacked.
> The commit message and comment now describe it correctly.
> The comment on leaking states was added.
>
> Usefulness:
>
> The LSI FW643 PCIe->FireWire 800 interface may be EOL but it is
> the only one that does not use a PCIe->PCI bridge.
> It is reliable and enables FireWire audio interfaces to be used
> on modern machines.
>
> Virtualization allows for flexible access to professional audio
> software.
>
> It has been used in at least the following Apple machines:
> MacBookPro10,1
> MacBookPro9,2
> MacBookPro6,2
> MacBookPro5,1
> Macmini6,1
> Macmini3,1
> iMac12,2
> iMac9,1
> iMac8,1
>
> Implementation:
>
> PCI_VENDOR_ID_ATT was reused as they are identical.
>
> drivers/pci/quirks.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index d797df6e5f3e..e0e4ad9e6d50 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3765,6 +3765,19 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
> */
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
>
> +/*
> + * Using LSI / Agere FW643 with vfio-pci will exhaust all
> + * pci_reset_fn_methods, the bus reset at the end causes a broken link
> + * only recoverable by removing power
> + * (power-off / suspend + rescan).
> + * Prevent this bus reset.
> + * With this change, the device can be assigned to VMs with VFIO.
> + * Note that it will not be reset, resulting in leaking state between VMs
> + * and host.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5900, quirk_no_bus_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset);
> +
> /*
> * Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
> * automatically disables LTSSM when Secondary Bus Reset is received and
> --
> 2.43.0
>
>

2024-03-25 10:59:53

by Takashi Sakamoto

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

Hi Bjorn Helgaas,

(C.C.ed to [email protected])

I have an objection to applying the change.

I've been using the issued 1394 OHCI hardware in my development for recent
years, while I have never faced the reported trouble. I think there are
any misunderstanding or misjudge somwhow in the review process to apply it.

Would I ask your precise advice to regenerate the reported issue in my
local?

This is my 1394 OHCI hardware.

```
$ sudo lspci -vvvnns 06:00.0
06:00.0 FireWire (IEEE 1394) [0c00]: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller [11c1:5901] (rev 06) (prog-if 10 [OHCI])
Subsystem: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller [11c1:5900]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 255
IOMMU group: 17
Region 0: Memory at fc700000 (64-bit, non-prefetchable) [disabled] [size=4K]
Capabilities: [44] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [60] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable- ID=1 ArbSelect=Fixed TC/VC=00
Status: NegoPending- InProgress-
Capabilities: [170 v1] Device Serial Number 12-34-56-10-12-30-00-86
Kernel driver in use: vfio-pci
Kernel modules: firewire_ohci
```

I use it in the following environment at present:

* Host system
* AMD Ryzen 5 2400G
* TUF GAMING X570-PLUS with BIOS 5003 (AGESA ComboV2PI 1.2.0.B)
* SMT enabled
* SVM enabled
* IOMMU enabled
* Secure boot disabled
* Ubuntu 24.04 LTS amd64
* linux-image-6.8.0-11-generic (6.8.0-11.11)
* default kernel cmdline
* QEMU 8.2.1 (1:8.2.1+ds-1ubuntu1)
* Libvert 10.0.0 (10.0.0-2ubuntu1)
* Guest system
* UEFI using OVMF
* Seecure boot enabled
* Ubuntu 24.04 LTS amd64 (the same as above)
* default kernel cmdline

> Using LSI / Agere FW643 with vfio-pci will exhaust all
> pci_reset_fn_methods, the bus reset at the end causes a broken link
> only recoverable by removing power
> (power-off / suspend + rescan).
> Prevent this bus reset.
> With this change, the device can be assigned to VMs with VFIO.
> Note that it will not be reset, resulting in leaking state between VMs
> and host.
>
> Signed-off-by: Edmund Raile <[email protected]>
>
> I sincerely thank you for your patience and explaining
> the background of pci resets which I lacked.
> The commit message and comment now describe it correctly.
> The comment on leaking states was added.
>
> Usefulness:
>
> The LSI FW643 PCIe->FireWire 800 interface may be EOL but it is
> the only one that does not use a PCIe->PCI bridge.
> It is reliable and enables FireWire audio interfaces to be used
> on modern machines.
>
> Virtualization allows for flexible access to professional audio
> software.
>
> It has been used in at least the following Apple machines:
> MacBookPro10,1
> MacBookPro9,2
> MacBookPro6,2
> MacBookPro5,1
> Macmini6,1
> Macmini3,1
> iMac12,2
> iMac9,1
> iMac8,1
>
> Implementation:
>
> PCI_VENDOR_ID_ATT was reused as they are identical.
>
> drivers/pci/quirks.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index d797df6e5f3e..e0e4ad9e6d50 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3765,6 +3765,19 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
> */
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
>
> +/*
> + * Using LSI / Agere FW643 with vfio-pci will exhaust all
> + * pci_reset_fn_methods, the bus reset at the end causes a broken link
> + * only recoverable by removing power
> + * (power-off / suspend + rescan).
> + * Prevent this bus reset.
> + * With this change, the device can be assigned to VMs with VFIO.
> + * Note that it will not be reset, resulting in leaking state between VMs
> + * and host.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5900, quirk_no_bus_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset);
> +
> /*
> * Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
> * automatically disables LTSSM when Secondary Bus Reset is received and


Regards

Takashi Sakamoto

2024-03-25 18:08:17

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

On Mon, Mar 25, 2024 at 10:21:35AM +0900, Takashi Sakamoto wrote:
> Hi Bjorn Helgaas,
>
> (C.C.ed to [email protected])
>
> I have an objection to applying the change.

This patch appeared in v6.9-rc1 as 29a43dc130ce ("PCI: Mark LSI FW643
to avoid bus reset").

> I've been using the issued 1394 OHCI hardware in my development for recent
> years, while I have never faced the reported trouble. I think there are
> any misunderstanding or misjudge somwhow in the review process to apply it.
>
> Would I ask your precise advice to regenerate the reported issue in my
> local?

So even without this patch, you are able to pass the FW643 to a VM
with VFIO, and you don't see any issues caused by VFIO resetting the
device?

Can you collect the output of:

$ find /sys/devices -name reset_method | xargs grep .

You should be able to manually reset the device with something like
this (I don't know your topology, so you might have to replace "1d.6"
with the bridge leading to 06:00.0):

# sudo echo 1 > # /sys/devices/pci0000:00/0000:00:1d.6/0000:06:00.0/reset

I don't *know* of a reason why a Secondary Bus Reset would work
correctly on your hardware but not on a Mac, but there could be
something weird going on.

Does the patch cause a problem for you? (Other than the fact that the
device leaks state between VMs?)

> This is my 1394 OHCI hardware.
>
> ```
> $ sudo lspci -vvvnns 06:00.0
> 06:00.0 FireWire (IEEE 1394) [0c00]: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller [11c1:5901] (rev 06) (prog-if 10 [OHCI])
> Subsystem: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller [11c1:5900]
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Interrupt: pin A routed to IRQ 255
> IOMMU group: 17
> Region 0: Memory at fc700000 (64-bit, non-prefetchable) [disabled] [size=4K]
> Capabilities: [44] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
> Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
> Address: 0000000000000000 Data: 0000
> Capabilities: [60] Express (v1) Endpoint, MSI 00
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
> DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
> LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x1
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> Capabilities: [100 v1] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog: 00000000 00000000 00000000 00000000
> Capabilities: [140 v1] Virtual Channel
> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> Arb: Fixed- WRR32- WRR64- WRR128-
> Ctrl: ArbSelect=Fixed
> Status: InProgress-
> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> Status: NegoPending- InProgress-
> VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> Ctrl: Enable- ID=1 ArbSelect=Fixed TC/VC=00
> Status: NegoPending- InProgress-
> Capabilities: [170 v1] Device Serial Number 12-34-56-10-12-30-00-86
> Kernel driver in use: vfio-pci
> Kernel modules: firewire_ohci
> ```
>
> I use it in the following environment at present:
>
> * Host system
> * AMD Ryzen 5 2400G
> * TUF GAMING X570-PLUS with BIOS 5003 (AGESA ComboV2PI 1.2.0.B)
> * SMT enabled
> * SVM enabled
> * IOMMU enabled
> * Secure boot disabled
> * Ubuntu 24.04 LTS amd64
> * linux-image-6.8.0-11-generic (6.8.0-11.11)
> * default kernel cmdline
> * QEMU 8.2.1 (1:8.2.1+ds-1ubuntu1)
> * Libvert 10.0.0 (10.0.0-2ubuntu1)
> * Guest system
> * UEFI using OVMF
> * Seecure boot enabled
> * Ubuntu 24.04 LTS amd64 (the same as above)
> * default kernel cmdline
>
> > Using LSI / Agere FW643 with vfio-pci will exhaust all
> > pci_reset_fn_methods, the bus reset at the end causes a broken link
> > only recoverable by removing power
> > (power-off / suspend + rescan).
> > Prevent this bus reset.
> > With this change, the device can be assigned to VMs with VFIO.
> > Note that it will not be reset, resulting in leaking state between VMs
> > and host.
> >
> > Signed-off-by: Edmund Raile <[email protected]>
> >
> > I sincerely thank you for your patience and explaining
> > the background of pci resets which I lacked.
> > The commit message and comment now describe it correctly.
> > The comment on leaking states was added.
> >
> > Usefulness:
> >
> > The LSI FW643 PCIe->FireWire 800 interface may be EOL but it is
> > the only one that does not use a PCIe->PCI bridge.
> > It is reliable and enables FireWire audio interfaces to be used
> > on modern machines.
> >
> > Virtualization allows for flexible access to professional audio
> > software.
> >
> > It has been used in at least the following Apple machines:
> > MacBookPro10,1
> > MacBookPro9,2
> > MacBookPro6,2
> > MacBookPro5,1
> > Macmini6,1
> > Macmini3,1
> > iMac12,2
> > iMac9,1
> > iMac8,1
> >
> > Implementation:
> >
> > PCI_VENDOR_ID_ATT was reused as they are identical.
> >
> > drivers/pci/quirks.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index d797df6e5f3e..e0e4ad9e6d50 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3765,6 +3765,19 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
> > */
> > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
> >
> > +/*
> > + * Using LSI / Agere FW643 with vfio-pci will exhaust all
> > + * pci_reset_fn_methods, the bus reset at the end causes a broken link
> > + * only recoverable by removing power
> > + * (power-off / suspend + rescan).
> > + * Prevent this bus reset.
> > + * With this change, the device can be assigned to VMs with VFIO.
> > + * Note that it will not be reset, resulting in leaking state between VMs
> > + * and host.
> > + */
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5900, quirk_no_bus_reset);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset);
> > +
> > /*
> > * Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
> > * automatically disables LTSSM when Secondary Bus Reset is received and
>
>
> Regards
>
> Takashi Sakamoto

2024-03-26 13:26:06

by Takashi Sakamoto

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

Hi Bjorn,

Thanks for your reply.

On Mon, Mar 25, 2024 at 09:41:49AM -0500, Bjorn Helgaas wrote:
> So even without this patch, you are able to pass the FW643 to a VM
> with VFIO, and you don't see any issues caused by VFIO resetting the
> device?

Absolutely yes, at least in my VM, for recent years to maintain Linux
FireWire subsystem and ALSA firewire stack.

> Can you collect the output of:
>
> $ find /sys/devices -name reset_method | xargs grep .

```
$ sudo find /sys/devices -name reset_method | xargs grep .
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:09.0/0000:09:00.0/reset_method:bus
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:09.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:02.0/0000:06:00.0/reset_method:pm bus
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:02.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:05.0/0000:07:00.0/reset_method:bus
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:05.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:0a.0/0000:0a:00.0/reset_method:bus
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:0a.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:08.0/0000:08:00.0/reset_method:bus
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:08.0/0000:08:00.3/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:08.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:01.0/reset_method:pm bus
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:01.0/0000:05:00.0/reset_method:device_specific flr bus
/sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/reset_method:pm bus
/sys/devices/pci0000:00/0000:00:01.2/reset_method:pm
/sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.0/reset_method:flr bus
/sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.3/reset_method:pm
/sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.1/reset_method:flr pm
/sys/devices/pci0000:00/0000:00:08.1/reset_method:pm
/sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.4/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/reset_method:pm bus
/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/reset_method:bus
/sys/devices/pci0000:00/0000:00:01.1/reset_method:pm
/sys/devices/pci0000:00/0000:00:01.6/0000:0b:00.0/reset_method:flr bus
/sys/devices/pci0000:00/0000:00:01.6/reset_method:pm
```

If you need each PCI bus bridge information, I can provide it to you.
but I can promise they are typical hardware in AMD CPU or chipset for
Zen generation and nothing special.

> You should be able to manually reset the device with something like
> this (I don't know your topology, so you might have to replace "1d.6"
> with the bridge leading to 06:00.0):
>
> # sudo echo 1 > # /sys/devices/pci0000:00/0000:00:1d.6/0000:06:00.0/reset

```
$ echo 1 > sudo tee -a /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:02.0/0000:06:00.0/reset
(nothing happens)
$ journalctl -k -n10
(nothing specific)
```

Would I ask you any point to check after the reset operation?

> I don't *know* of a reason why a Secondary Bus Reset would work
> correctly on your hardware but not on a Mac, but there could be
> something weird going on.

Note that the hardware provided by Apple for the past decade has no
IEEE 1394 interface, thus the patch author seems to use any kind of
bus extension to connect the issued 1394 OHCI hardware. I guess:

* Apple Thunderbolt Display
* https://lore.kernel.org/linux-pci/[email protected]/
* Apple Thunderbolt-OHCI1394 adapter
* I know FW643 is used for the product.
* Some kind of eGPU box

> Does the patch cause a problem for you? (Other than the fact that the
> device leaks state between VMs?)

It takes a bit time for me to set up my system with self-compiled v6.9-rc1
kernel. However the leak between VMs is really inconvenient to me by itself.


Thanks

Takashi Sakamoto

2024-03-27 16:26:12

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

On Tue, Mar 26, 2024 at 10:18:58PM +0900, Takashi Sakamoto wrote:
> On Mon, Mar 25, 2024 at 09:41:49AM -0500, Bjorn Helgaas wrote:
> > So even without this patch, you are able to pass the FW643 to a VM
> > with VFIO, and you don't see any issues caused by VFIO resetting the
> > device?
>
> Absolutely yes, at least in my VM, for recent years to maintain Linux
> FireWire subsystem and ALSA firewire stack.

So there must be something different between your system and Edmund's.
Maybe we can refine the quirk so it avoids the SBR on Edmund's system
but not yours.

Can you both collect the output of "sudo lspci -vvv" so we can try to
figure out the difference? Also a complete dmesg log would be helpful
and would contain DMI information that we might need if this is
firmware dependent.

> > Can you collect the output of:
> >
> > $ find /sys/devices -name reset_method | xargs grep .

Edmund, could you run this command, too?

Thanks,
Bjorn

2024-03-28 18:35:56

by edmund.raile

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

So Bjorn Helgaas beat me to it,

I'm monitoring kernel messages with
```
sudo journalctl --system -f
```

Indeed all that is necessary to generate the trace from the patch is to unbind the FW643 from ohci on my system (unpatched kernel):
```
su -c 'echo -n "0000:03:00.0" > /sys/bus/pci/drivers/firewire_ohci/unbind'
```

In combination with the kernel parameters intel_iommu=on iommu=pt, I can then bind it to vfio-pci
```
su -c 'echo -n "0000:03:00.0" > /sys/bus/pci/drivers/vfio-pci/bind'
```

And then I'd attach it to a qemu Windows VM using `-device vfio-pci,host=03:00.0 \`.
The OS finds the firewire card straightaway and the RME FireFace driver connects on booting just like it would when booting the guest on bare metal.

> $ echo 1 > sudo tee -a /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:02.0/0000:06:00.0/reset
> (nothing happens)

Replicating the reset line Takashi sent on my system, there is no error in the kernel log but playback doesn't stop either, leading me to believe that permissions of sudo may be insufficient (root-only)?
```
echo 1 > sudo tee -a /sys/devices/pci0000\:00/0000\:00\:1c.1/0000\:03\:00.0/reset
```

So instead I ran this:
```
su -c 'echo 1 > /sys/devices/pci0000\:00/0000\:00\:1c.1/0000\:03\:00.0/reset'
```
Playback stopped immediately and could not be resumed.

Then I received this trace:

INFO: task alsa-sink-Firef:4110 blocked for more than 245 seconds.
Tainted: G W OE 6.6.10-1-MANJARO #1
task:alsa-sink-Firef state:D stack:0 pid:4110 ppid:2657 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x3e7/0x1410
? tlb_batch_pages_flush+0x3d/0x70
schedule+0x5e/0xd0
schedule_timeout+0x151/0x160
wait_for_completion+0x8a/0x160
fw_run_transaction+0xe5/0x120 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
? __pfx_split_transaction_timeout_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
? __pfx_transmit_complete_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
? __pfx_transaction_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
snd_fw_transaction+0x70/0x110 [snd_firewire_lib 30b43a591db389bbc6be51459cb243ba1fe1e662]
ff800_finish_session+0x43/0x70 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a]
snd_ff_stream_stop_duplex+0x39/0x70 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a]
pcm_hw_free+0x3c/0x50 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a]
snd_pcm_common_ioctl+0xe28/0x12b0 [snd_pcm 24933227879438b755ef98bc4844113025f38cdf]
? __seccomp_filter+0x32c/0x510
? __vm_munmap+0xbb/0x150
snd_pcm_ioctl+0x2e/0x50 [snd_pcm 24933227879438b755ef98bc4844113025f38cdf]
__x64_sys_ioctl+0x94/0xd0
do_syscall_64+0x5d/0x90
? syscall_exit_to_user_mode+0x2b/0x40
? do_syscall_64+0x6c/0x90
? do_syscall_64+0x6c/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Now of course yanking the device from underneath streaming snd_firewire is not an intended usecase and I get that this has good cause to produce a trace but it shows that root privileges are necessary here.

Doing this without running playback yields the same result.

In both instances, I had to REISUO the system as it would keep waiting for that thread to finish when shutting down.

When I instead turned the FireFace off, I got this kernel message:
snd_fireface fw1.0: transaction failed: bus reset
snd_fireface fw1.0: transaction failed: bus reset
snd_fireface fw1.0: transaction failed: bus reset
Then running the same command, this time no threads went into D state and start producing spinlock messages/traces but by that point, BUT I wasn't able to use the FireFace after powering it back on again.
At least shutdown worked normally this time.

Are there other steps necessary to get the FW643 back working again after this?

System:
MSI PRO Z690-A DDR4(MS-7D25)
StarTech FW800 PCIe card (LSI FW643)

A peculiarity about this system that may or may not be of relevance here:
When rebooting, the FW643 can not be seen by the system any more.
Its root port also won't show up, regardless of slot used.
I have to perform a full shutdown for it to be recognized again.
This behavior is OS-independent, but it never happened on my previous Z68 system with this card (or any), there reboot worked as you'd expect.
The original MSI BIOS was buggy in this regard (and many others) in that it would not even recognize the card half the time booting from power-off.
MSI support did not care, said it was an "old device". How they can claim PCIe compliance is beyond me.
This is why I'm running Dasharo (coreboot), it always picks up the card from full power-off and behaves very predictably in other regards.
It might even be an issue with modern Intel since I've read of very similar issues on other manufacturer's Z690 and even Z790 boards (missing PCIe devices after reboot).
I've learned to live with this as I don't know how to solve it and I'm
stuck on this platform. Should of bought AMD again this time around.

> Can you both collect the output of "sudo lspci -vvv" so we can try to figure out the difference?
> This is my 1394 OHCI hardware.
```
sudo lspci -vvv
03:00.0 FireWire (IEEE 1394): LSI Corporation FW643 [TrueFire] PCIe 1394b Controller (rev 08) (prog-if 10 [OHCI])
Subsystem: Device 5901:1101
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 136
IOMMU group: 13
Region 0: Memory at 50800000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME+
Capabilities: [4c] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00358 Data: 0000
Capabilities: [60] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 2048 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable- ID=0 ArbSelect=Fixed TC/VC=00
Status: NegoPending- InProgress-
Capabilities: [170 v1] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: firewire_ohci
Kernel modules: firewire_ohci
```
This does not change whether I boot patched or unpatched kernel.

> > > Can you collect the output of:
> > >
> > > $ find /sys/devices -name reset_method | xargs grep .
> Edmund, could you run this command, too?

with patch applied:
```
sudo find /sys/devices -name reset_method | xargs grep .
/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/reset_method:pm bus
/sys/devices/pci0000:00/0000:00:1c.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:1c.1/reset_method:pm
/sys/devices/pci0000:00/0000:00:1a.0/0000:01:00.0/reset_method:flr bus
/sys/devices/pci0000:00/0000:00:1a.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:1d.0/reset_method:pm
/sys/devices/pci0000:00/0000:00:1d.0/0000:05:00.0/reset_method:flr bus
/sys/devices/pci0000:00/0000:00:02.0/reset_method:flr pm
/sys/devices/pci0000:00/0000:00:1c.2/reset_method:pm
/sys/devices/pci0000:00/0000:00:1c.2/0000:04:00.0/reset_method:flr bus
```
without applied patch only bus reset method is added, everything else
stays the same:
```
/sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/reset_method:pm bus
```

This is the root port it is currently connected to:
```
sudo lspci -vvv
00:1c.1 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #2 (rev 11) (prog-if 00 [Normal decode])
Subsystem: Micro-Star International Co., Ltd. [MSI] Alder Lake-S PCH PCI Express Root Port
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 124
IOMMU group: 7
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: f000-0fff [disabled] [16-bit]
Memory behind bridge: 50800000-508fffff [size=1M] [32-bit]
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled] [64-bit]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #2, Speed 8GT/s, Width x1, ASPM not supported
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #1, PowerLimit 10W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCap: CRSVisible-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Via WAKE#, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 2
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled, ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00278 Data: 0000
Capabilities: [98] Subsystem: Micro-Star International Co., Ltd. [MSI] Alder Lake-S PCH PCI Express Root Port
Capabilities: [a0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn+ NFERptEn+ FERptEn+
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
Capabilities: [220 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
Capabilities: [150 v1] Precision Time Measurement
PTMCap: Requester:- Responder:+ Root:+
PTMClockGranularity: 4ns
PTMControl: Enabled:+ RootSelected:+
PTMEffectiveGranularity: Unknown
Capabilities: [a30 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [a90 v1] Data Link Feature <?>
Kernel driver in use: pcieport
```

`lspci -tv` reveals that it is connected to root port #2:
+-1c.1-[03]----00.0 LSI Corporation FW643 [TrueFire] PCIe 1394b Controller

I hope this information is of use to you.
Maybe there is a better solution?

Kind regards,
Edmund Raile


2024-03-28 21:06:57

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

On Thu, Mar 28, 2024 at 02:42:01PM -0600, Alex Williamson wrote:
> On Wed, 27 Mar 2024 10:01:19 -0500
> Bjorn Helgaas <[email protected]> wrote:
>
> > On Tue, Mar 26, 2024 at 10:18:58PM +0900, Takashi Sakamoto wrote:
> > > On Mon, Mar 25, 2024 at 09:41:49AM -0500, Bjorn Helgaas wrote:
> > > > So even without this patch, you are able to pass the FW643 to a VM
> > > > with VFIO, and you don't see any issues caused by VFIO resetting the
> > > > device?
> > >
> > > Absolutely yes, at least in my VM, for recent years to maintain Linux
> > > FireWire subsystem and ALSA firewire stack.
> >
> > So there must be something different between your system and Edmund's.
> > Maybe we can refine the quirk so it avoids the SBR on Edmund's system
> > but not yours.
> >
> > Can you both collect the output of "sudo lspci -vvv" so we can try to
> > figure out the difference? Also a complete dmesg log would be helpful
> > and would contain DMI information that we might need if this is
> > firmware dependent.
>
> The original patch proposed for this gave me the impression that this
> was a device used on various old Mac systems, not likely applicable to
> a general purpose plug-in card. Given the expanded use case, I'd
> suggest reverting the patch.

Makes sense, I'll queue up a revert for v6.9 so we can take some time
to figure this out.

> I think we need significantly more exhaustive testing on the afflicted
> system to understand whether this is an issue with the endpoint, the
> root port, the BIOS, etc.
>
> In the meantime, or maybe as a permanent solution, Edmund can make use
> of the reset_method interface in pci-syfs to restrict the available
> reset methods for the device rather than risk removing a reset
> mechanism identified as working by other users. My 2 cents. Thanks,
>
> Alex
>

2024-03-28 21:33:31

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

On Wed, 27 Mar 2024 10:01:19 -0500
Bjorn Helgaas <[email protected]> wrote:

> On Tue, Mar 26, 2024 at 10:18:58PM +0900, Takashi Sakamoto wrote:
> > On Mon, Mar 25, 2024 at 09:41:49AM -0500, Bjorn Helgaas wrote:
> > > So even without this patch, you are able to pass the FW643 to a VM
> > > with VFIO, and you don't see any issues caused by VFIO resetting the
> > > device?
> >
> > Absolutely yes, at least in my VM, for recent years to maintain Linux
> > FireWire subsystem and ALSA firewire stack.
>
> So there must be something different between your system and Edmund's.
> Maybe we can refine the quirk so it avoids the SBR on Edmund's system
> but not yours.
>
> Can you both collect the output of "sudo lspci -vvv" so we can try to
> figure out the difference? Also a complete dmesg log would be helpful
> and would contain DMI information that we might need if this is
> firmware dependent.

The original patch proposed for this gave me the impression that this
was a device used on various old Mac systems, not likely applicable to
a general purpose plug-in card. Given the expanded use case, I'd
suggest reverting the patch.

I think we need significantly more exhaustive testing on the afflicted
system to understand whether this is an issue with the endpoint, the
root port, the BIOS, etc.

In the meantime, or maybe as a permanent solution, Edmund can make use
of the reset_method interface in pci-syfs to restrict the available
reset methods for the device rather than risk removing a reset
mechanism identified as working by other users. My 2 cents. Thanks,

Alex


2024-03-29 04:41:36

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

On Thu, Mar 28, 2024 at 02:42:01PM -0600, Alex Williamson wrote:
> On Wed, 27 Mar 2024 10:01:19 -0500 Bjorn Helgaas <[email protected]> wrote:
> The original patch proposed for this gave me the impression that this
> was a device used on various old Mac systems, not likely applicable to
> a general purpose plug-in card.

I'm still using one of those "old Mac systems" as my daily driver.

Just checked the ACPI tables and there's an FPEN method below the
FRWR device which toggles GPIO 48 on the PCH. Checked the schematics
as well and GPIO 48 is marked FW_PWR_EN. The GPIO controls load
switches which cut power to the FW643 chip when nothing is connected.

Also, FW_PWR_EN feeds into an SLG4AP016V chip where it seems to
internally gate FW_CLKREQ_L.

I'm guessing the driver may need to call the FPEN ACPI method after
issuing a SBR to force the chip on (or perhaps first off, then on)
and thereby re-enable Clock Request.

It's a pity the ohci.c driver doesn't seem to support runtime PM.
That would allow cutting power to the chip when nothing is connected
and thus increase battery life. The ACPI tables indicate that the
platform sends a notification when something is plugged in, so all
the necessary ingredients are there but we're not taking advantage
of them.

Thanks,

Lukas

2024-03-29 08:12:35

by Takashi Sakamoto

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

Hi Lukas,

On Fri, Mar 29, 2024 at 05:41:16AM +0100, Lukas Wunner wrote:
> On Thu, Mar 28, 2024 at 02:42:01PM -0600, Alex Williamson wrote:
> > On Wed, 27 Mar 2024 10:01:19 -0500 Bjorn Helgaas <[email protected]> wrote:
> > The original patch proposed for this gave me the impression that this
> > was a device used on various old Mac systems, not likely applicable to
> > a general purpose plug-in card.
>
> I'm still using one of those "old Mac systems" as my daily driver.
>
> Just checked the ACPI tables and there's an FPEN method below the
> FRWR device which toggles GPIO 48 on the PCH. Checked the schematics
> as well and GPIO 48 is marked FW_PWR_EN. The GPIO controls load
> switches which cut power to the FW643 chip when nothing is connected.
>
> Also, FW_PWR_EN feeds into an SLG4AP016V chip where it seems to
> internally gate FW_CLKREQ_L.
>
> I'm guessing the driver may need to call the FPEN ACPI method after
> issuing a SBR to force the chip on (or perhaps first off, then on)
> and thereby re-enable Clock Request.
>
> It's a pity the ohci.c driver doesn't seem to support runtime PM.
> That would allow cutting power to the chip when nothing is connected
> and thus increase battery life. The ACPI tables indicate that the
> platform sends a notification when something is plugged in, so all
> the necessary ingredients are there but we're not taking advantage
> of them.
>
> Thanks,
>
> Lukas

Yup. In both PCI drivers and unit drivers belonging to Linux FireWire
subsystem, any type of runtime PM is not supported. If I integrate 1394
OHCI driver, I should implement all of the above in any callback of
runtime PM, or the part of the above is already supported by any driver
in parent PCI layer?


Thanks

Takashi Sakamoto

2024-03-29 08:13:46

by Takashi Sakamoto

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

Hi,

On Thu, Mar 28, 2024 at 06:35:29PM +0000, edmund.raile wrote:
> So instead I ran this:
> ```
> su -c 'echo 1 > /sys/devices/pci0000\:00/0000\:00\:1c.1/0000\:03\:00.0/reset'
> ```
> Playback stopped immediately and could not be resumed.
>
> Then I received this trace:
>
> INFO: task alsa-sink-Firef:4110 blocked for more than 245 seconds.
> Tainted: G W OE 6.6.10-1-MANJARO #1
> task:alsa-sink-Firef state:D stack:0 pid:4110 ppid:2657 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x3e7/0x1410
> ? tlb_batch_pages_flush+0x3d/0x70
> schedule+0x5e/0xd0
> schedule_timeout+0x151/0x160
> wait_for_completion+0x8a/0x160
> fw_run_transaction+0xe5/0x120 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
> ? __pfx_split_transaction_timeout_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
> ? __pfx_transmit_complete_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
> ? __pfx_transaction_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7]
> snd_fw_transaction+0x70/0x110 [snd_firewire_lib 30b43a591db389bbc6be51459cb243ba1fe1e662]
> ff800_finish_session+0x43/0x70 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a]
> snd_ff_stream_stop_duplex+0x39/0x70 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a]
> pcm_hw_free+0x3c/0x50 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a]
> snd_pcm_common_ioctl+0xe28/0x12b0 [snd_pcm 24933227879438b755ef98bc4844113025f38cdf]
> ? __seccomp_filter+0x32c/0x510
> ? __vm_munmap+0xbb/0x150
> snd_pcm_ioctl+0x2e/0x50 [snd_pcm 24933227879438b755ef98bc4844113025f38cdf]
> __x64_sys_ioctl+0x94/0xd0
> do_syscall_64+0x5d/0x90
> ? syscall_exit_to_user_mode+0x2b/0x40
> ? do_syscall_64+0x6c/0x90
> ? do_syscall_64+0x6c/0x90
> entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Please mind that current software stack to operate your device does not
support this kind of operation, as I've already sent to you several times.
Users should cancel any type of communication on IEEE 1394 bus, then
unplug devices from the bus (or power them off), finally operate
suspending.

By the way, it is apart from PCI subsystem. Your change is now going to
be reverted for v6.9.


Regards

Takashi Sakamoto

2024-03-29 15:17:52

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

On Fri, Mar 29, 2024 at 05:12:19PM +0900, Takashi Sakamoto wrote:
> On Fri, Mar 29, 2024 at 05:41:16AM +0100, Lukas Wunner wrote:
> > Just checked the ACPI tables and there's an FPEN method below the
> > FRWR device which toggles GPIO 48 on the PCH. Checked the schematics
> > as well and GPIO 48 is marked FW_PWR_EN. The GPIO controls load
> > switches which cut power to the FW643 chip when nothing is connected.
> >
> > Also, FW_PWR_EN feeds into an SLG4AP016V chip where it seems to
> > internally gate FW_CLKREQ_L.
> >
> > I'm guessing the driver may need to call the FPEN ACPI method after
> > issuing a SBR to force the chip on (or perhaps first off, then on)
> > and thereby re-enable Clock Request.
> >
> > It's a pity the ohci.c driver doesn't seem to support runtime PM.
> > That would allow cutting power to the chip when nothing is connected
> > and thus increase battery life. The ACPI tables indicate that the
> > platform sends a notification when something is plugged in, so all
> > the necessary ingredients are there but we're not taking advantage
> > of them.
>
> Yup. In both PCI drivers and unit drivers belonging to Linux FireWire
> subsystem, any type of runtime PM is not supported. If I integrate 1394
> OHCI driver, I should implement all of the above in any callback of
> runtime PM, or the part of the above is already supported by any driver
> in parent PCI layer?

The power management method Apple uses to cut power to the FireWire
controller, Thunderbolt controller and discrete GPU is nonstandard.
It's *implemented* in ACPI, but doesn't *conform* to ACPI: There are
no Power Resources described in the ACPI tables, just custom methods.

This can be made to work on Linux by assigning a dev_pm_domain to the
Root Port above the FireWire controller. The dev_pm_domain callbacks
cut power to the FireWire controller on ->runtime_suspend() and
reinstate it on ->runtime_resume(). The reason this needs to be done
at the Root Port level is that the PCI core assumes the FireWire
controller is powered on when it calls pci_pm_runtime_resume() for it.
Normally that function would reinstate power through ACPI via
pci_power_up(), but that doesn't work due to the nonstandard nature
of Apple's ACPI tables.

I've implemented this 8 years ago for Thunderbolt but unfortunately
got sidetracked and thus haven't been able to finish upstreaming it yet:

https://github.com/l1k/linux/commit/a53d44439d42

I'm not as familiar with ohci.c as I am (or was) with thunderbolt.ko.
If you could amend ohci.c to call pm_runtime_get() when something is
attached and call pm_runtime_put() when something is detached, I could
look into bringing up the ACPI stuff. You could acquire one runtime PM
ref for each attached device or just acquire a single ref if *anything*
is connected at all. Doesn't matter. But acquiring one ref per attached
device might be simpler.

I would also need a function to perform a bus scan upon runtime resume
which looks for new devices and acquires refs as necessary. Once that
infrastructure exists, adding the Apple-specific ACPI stuff wouldn't
be too hard for me to do as I could just adapt what I did for Thunderbolt.

Thanks,

Lukas

2024-03-30 10:14:32

by edmund.raile

[permalink] [raw]
Subject: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset

> Please mind that current software stack to operate your device does not
> support this kind of operation, as I've already sent to you several times.
> Users should cancel any type of communication on IEEE 1394 bus, then
> unplug devices from the bus (or power them off), finally operate
> suspending.

Yes I know, that's what I meant by it having "good cause to produce a trace".
I only meant ot demonstrate that sudo for tee is not potent enough here, which might also
be a reason why you get no kernel message.
I'm assuming the unbind indeed produces no kernel trace for you.
su -c 'echo -n "0000:03:00.0" > /sys/bus/pci/drivers/firewire_ohci/unbind'

> The power management method Apple uses to cut power to the FireWire
> controller, Thunderbolt controller and discrete GPU is nonstandard.
I sincerely hope any implementation for Apple PCs wouldn't hinder normal
operation of FW643 add-in-cards.

> In the meantime, or maybe as a permanent solution, Edmund can make use
> of the reset_method interface in pci-syfs to restrict the available
> reset methods for the device rather than risk removing a reset
> mechanism identified as working by other users.

> Revert 29a43dc130ce until we figure out a better solution. In the
> meantime, we can use the sysfs "reset_method" interface to restrict the
> available reset methods.

I tried your suggestion:
Instead of the patch:
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset);
To avoid bus reset, as root, I ran:
echo -n "pm"> /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/reset_method
Which reduced reset_methods from 'pm bus' to just 'pm', according to
cat /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/reset_method

Then to bind the FW643 to vfio-pci:
echo -n "fw1.0" > /sys/bus/firewire/drivers/snd_fireface/unbind
echo -n "0000:03:00.0" > /sys/bus/pci/drivers/firewire_ohci/unbind
At this point, no kernel message, so I assume unbind worked fine!

modprobe vfio_pci
modprobe vfio_iommu_type1
modprobe vfio_pci
then strangely binding to vfio-pci returned an error
echo -n "0000:03:00.0" > /sys/bus/pci/drivers/firewire_ohci/bind
echo: write error: no matching device found.
so I used
echo 11c1 5901 > /sys/bus/pci/drivers/vfio-pci/new_id

Finally running qemu with '-device vfio-pci,host=03:00.0' produces these kernel messages:
pcieport 0000:00:1c.1: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 0000:00:1c.1: retraining failed
pcieport 0000:00:1c.1: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 0000:00:1c.1: retraining failed
vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
twice, then:
vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

So the quirk_no_bus_reset does more than just setting reset_method manually.

So how can I use it then?

Zooming out, might this not be a general issue with bus reset not working on Z690?
Who do I turn to?