2020-11-02 18:53:12

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

+ linux-wireless, linux-pci, devin

Thomas Krause <[email protected]> writes:

>> I had the same problem as well back in the days, for me enabling
>> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> mention that in the ath11k warning above :)
>
> CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
> is behind a PCI bridge which is also disabled, could this be a
> problem?
>
> 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
> [Normal decode])
> Flags: bus master, fast devsel, latency 0, IRQ 123
> Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> I/O behind bridge: [disabled]
> Memory behind bridge: 8c300000-8c3fffff [size=1M]
> Prefetchable memory behind bridge: [disabled]
> Capabilities: [40] Express Root Port (Slot+), MSI 00
> Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [90] Subsystem: Dell Device 0991
> Capabilities: [a0] Power Management version 3
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [220] Access Control Services
> Capabilities: [150] Precision Time Measurement
> Capabilities: [200] L1 PM Substates
> Capabilities: [a00] Downstream Port Containment
> Kernel driver in use: pcieport

I don't know enough about PCI to say if the bridge is a problem or not.
I'm adding linux-wireless and linux-pci in case someone can help. Also Devin
seems to have a similar problem.

To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
PCI device where he is not getting enough MSI vectors. ath11k needs 32
vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
for ath11k and introduced in v5.10-rc1. The irq allocation code is in
drivers/net/wireless/ath/ath11k/pci.c. [2]

Can the PCI folks help: what could cause this, and how can we debug it further?

I would first try with a full distro kernel config, just in case there's
some other important kernel config option missing.

[1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c#n633

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


2020-11-02 20:59:56

by Bjorn Helgaas

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

[+cc Govind, author of 5697a564d369 ("ath11k: pci: add MSI config
initialisation")]

On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
> + linux-wireless, linux-pci, devin
>
> Thomas Krause <[email protected]> writes:
>
> >> I had the same problem as well back in the days, for me enabling
> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
> >> mention that in the ath11k warning above :)
> >
> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
> > is behind a PCI bridge which is also disabled, could this be a
> > problem?
> >
> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
> > [Normal decode])
> > Flags: bus master, fast devsel, latency 0, IRQ 123
> > Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> > I/O behind bridge: [disabled]
> > Memory behind bridge: 8c300000-8c3fffff [size=1M]
> > Prefetchable memory behind bridge: [disabled]
> > Capabilities: [40] Express Root Port (Slot+), MSI 00
> > Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> > Capabilities: [90] Subsystem: Dell Device 0991
> > Capabilities: [a0] Power Management version 3
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [220] Access Control Services
> > Capabilities: [150] Precision Time Measurement
> > Capabilities: [200] L1 PM Substates
> > Capabilities: [a00] Downstream Port Containment
> > Kernel driver in use: pcieport
>
> I don't know enough about PCI to say if the bridge is a problem or not.

I don't think the bridge is an issue here. AFAICT the bridge's I/O
and prefetchable memory windows are disabled, but the non-prefetchable
window *is* enabled and contains the space consumed by the ath11k
device:

  00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
    Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
    Memory behind bridge: 8c300000-8c3fffff [size=1M]
  56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
    Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]
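
The containment argument above can be checked mechanically. The following is a minimal Python sketch, not part of any tool mentioned in the thread; parse_window and bar_fits are hypothetical helper names, and the addresses are the ones from the lspci excerpt.

```python
def parse_window(text):
    """Parse an lspci range such as '8c300000-8c3fffff' into an inclusive (start, end) pair."""
    start, end = text.split("-")
    return int(start, 16), int(end, 16)


def bar_fits(window, bar_base, bar_size):
    """Return True if the BAR [bar_base, bar_base + bar_size) lies inside the inclusive window."""
    start, end = window
    return start <= bar_base and bar_base + bar_size - 1 <= end


# Bridge 00:1c.0 memory window and the 1M Region 0 of device 56:00.0.
window = parse_window("8c300000-8c3fffff")
print(bar_fits(window, 0x8C300000, 1 << 20))
```

A BAR that were larger than the window, or based outside it, would make bar_fits return False, which is the situation the question about the "disabled" bridge was worried about.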

> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
> PCI device where he is not having enough MSI vectors. ath11k needs 32
> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
> drivers/net/wireless/ath/ath11k/pci.c. [2]

This code is needlessly complicated. If you absolutely need
msi_config.total_vectors and can't settle for any less, you can do
this:

    num_vectors = pci_alloc_irq_vectors(ab_pci->pdev,
                                        msi_config.total_vectors,
                                        msi_config.total_vectors,
                                        PCI_IRQ_MSI);

    if (num_vectors < 0) {
            ath11k_err(ab, "failed to get %d MSI vectors (%d)\n",
                       msi_config.total_vectors, num_vectors);
            return num_vectors;
    }

But it seems a little greedy if the device can't operate at all unless
it gets 32 vectors. Are you sure that's a hard requirement? Most
devices can work with fewer vectors, even if it reduces performance.
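
The difference between a rigid request (min == max) and a flexible one can be illustrated with a toy model. This is a plain Python sketch, not kernel code: alloc_irq_vectors and its "available" parameter are hypothetical, and the only behaviour modelled is that the request fails with -ENOSPC when fewer than min_vecs vectors are available and otherwise returns a count no larger than max_vecs.

```python
import errno


def alloc_irq_vectors(available, min_vecs, max_vecs):
    """Toy model of a min/max vector request.

    'available' stands in for whatever the platform can actually provide.
    Returns the number of vectors granted, or -ENOSPC if min_vecs cannot be met.
    """
    if available < min_vecs:
        return -errno.ENOSPC
    return min(available, max_vecs)


# A driver that insists on exactly 32 vectors fails when only one is available...
rigid = alloc_irq_vectors(available=1, min_vecs=32, max_vecs=32)
# ...while a driver that accepts 1..32 still gets its single vector.
flexible = alloc_irq_vectors(available=1, min_vecs=1, max_vecs=32)
print(rigid == -errno.ENOSPC, flexible)
```

In this model the only way a min == max request succeeds is if the platform can supply the full count, which is why the question of a hard 32-vector requirement matters.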

> I would first try with a full distro kernel config, just in case there's
> some another important kernel config missing.
>
> [1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html

Tangent: have you considered getting this list archived on
https://lore.kernel.org/lists.html?

> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c#n633
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-03 11:21:57

by Devin Bayer

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On 02/11/2020 21.57, Bjorn Helgaas wrote:
>>>
>>> CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
>>> is behind a PCI bridge which is also disabled, could this be a
>>> problem?

Just to provide another case, I have the same issue with this driver.

CONFIG_IRQ_REMAP=y is set and doesn't have any effect.

I'm unsure if the issue could be my system (Atom / Intel J1900) or the
fact that I'm using a slightly different card. Is there any way to tell
from the lspci output? Here is what I guess is most relevant:

00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 3 (rev 0e) (prog-if 00 [Normal decode])
    Memory behind bridge: d0000000-d0ffffff [size=16M]
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee08004 Data: 4021

03:00.0 Unassigned class [ff00]: Qualcomm Device 1101
    Subsystem: Qualcomm Device 0108
    Region 0: Memory at d0000000 (64-bit, non-prefetchable) [size=16M]
    Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit-
        Address: fee01004 Data: 40ef
        Masking: ffffffff Pending: 00000000
    Capabilities: [70] Express (v2) Endpoint, MSI 00

Thanks,
Devin
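
The relevant detail in this lspci output is the "Count=1/32" field of the MSI capability: the device is capable of 32 messages but only one is currently enabled. A small Python sketch for pulling that out of "lspci -vv" text follows; parse_msi is a hypothetical helper, and the regular expression assumes the line layout shown above.

```python
import re

# Matches e.g. "MSI: Enable+ Count=1/32 Maskable+ 64bit-"
MSI_RE = re.compile(r"MSI: Enable([+-]) Count=(\d+)/(\d+)")


def parse_msi(line):
    """Return MSI state from one lspci capability line, or None if it is not an MSI line."""
    m = MSI_RE.search(line)
    if m is None:
        return None
    return {
        "enabled": m.group(1) == "+",   # MSI Enable bit
        "allocated": int(m.group(2)),   # messages currently enabled
        "capable": int(m.group(3)),     # messages the device can request
    }


device = parse_msi("Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit-")
bridge = parse_msi("Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-")
print(device["allocated"], device["capable"], bridge["capable"])
```

A device line reporting allocated < capable together with a driver that demands the full count is the pattern seen in both reports in this thread.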

2020-11-03 13:29:25

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Bjorn Helgaas <[email protected]> writes:

> [+cc Govind, author of 5697a564d369 ("ath11k: pci: add MSI config
> initialisation")]
>
> On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
>> + linux-wireless, linux-pci, devin
>>
>> Thomas Krause <[email protected]> writes:
>>
>> >> I had the same problem as well back in the days, for me enabling
>> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> >> mention that in the ath11k warning above :)
>> >
>> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
>> > is behind a PCI bridge which is also disabled, could this be a
>> > problem?
>> >
>> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
>> > [Normal decode])
>> > Flags: bus master, fast devsel, latency 0, IRQ 123
>> > Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
>> > I/O behind bridge: [disabled]
>> > Memory behind bridge: 8c300000-8c3fffff [size=1M]
>> > Prefetchable memory behind bridge: [disabled]
>> > Capabilities: [40] Express Root Port (Slot+), MSI 00
>> > Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> > Capabilities: [90] Subsystem: Dell Device 0991
>> > Capabilities: [a0] Power Management version 3
>> > Capabilities: [100] Advanced Error Reporting
>> > Capabilities: [220] Access Control Services
>> > Capabilities: [150] Precision Time Measurement
>> > Capabilities: [200] L1 PM Substates
>> > Capabilities: [a00] Downstream Port Containment
>> > Kernel driver in use: pcieport
>>
>> I don't know enough about PCI to say if the bridge is a problem or not.
>
> I don't think the bridge is an issue here. AFAICT the bridge's I/O
> and prefetchable memory windows are disabled, but the non-prefetchable
> window *is* enabled and contains the space consumed by the ath11k
> device:
>
> 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
> Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> Memory behind bridge: 8c300000-8c3fffff [size=1M]
> 56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
> Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]

Good to know that the bridge shouldn't be the problem. Do you have any
ideas how to make more vectors available to ath11k, besides
CONFIG_IRQ_REMAP? Because QCA6390 works in Windows I doubt this is a
hardware problem.

>> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
>> PCI device where he is not having enough MSI vectors. ath11k needs 32
>> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
>> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
>> drivers/net/wireless/ath/ath11k/pci.c. [2]
>
> This code is needlessly complicated. If you absolutely need
> msi_config.total_vectors and can't settle for any less, you can do
> this:
>
> num_vectors = pci_alloc_irq_vectors(ab_pci->pdev,
> msi_config.total_vectors,
> msi_config.total_vectors,
> PCI_IRQ_MSI);
>
> if (num_vectors < 0) {
> ath11k_err(ab, "failed to get %d MSI vectors (%d)\n",
> msi_config.total_vectors, num_vectors);
> return num_vectors;
> }

True, this should be cleaned up. But of course this won't solve the
actual problem.

> But it seems a little greedy if the device can't operate at all unless
> it gets 32 vectors. Are you sure that's a hard requirement? Most
> devices can work with fewer vectors, even if it reduces performance.

This was my first reaction as well when I saw the code for the first
time. And the reply I got is that the firmware needs all 32 vectors, it
won't work with less.

>> I would first try with a full distro kernel config, just in case there's
>> some another important kernel config missing.
>>
>> [1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html
>
> Tangent: have you considered getting this list archived on
> https://lore.kernel.org/lists.html?

Good point, actually I have not. I'll add both ath10k and ath11k lists
to lore. It's even more important now that lists.infradead.org had a
hard drive crash and lost years of archives.

Thanks for the help!


2020-11-03 13:32:42

by Carl Huang

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On 2020-11-03 04:57, Bjorn Helgaas wrote:
> [+cc Govind, author of 5697a564d369 ("ath11k: pci: add MSI config
> initialisation")]
>
> On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
>> + linux-wireless, linux-pci, devin
>>
>> Thomas Krause <[email protected]> writes:
>>
>> >> I had the same problem as well back in the days, for me enabling
>> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> >> mention that in the ath11k warning above :)
>> >
>> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
>> > is behind a PCI bridge which is also disabled, could this be a
>> > problem?
>> >
>> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
>> > [Normal decode])
>> > Flags: bus master, fast devsel, latency 0, IRQ 123
>> > Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
>> > I/O behind bridge: [disabled]
>> > Memory behind bridge: 8c300000-8c3fffff [size=1M]
>> > Prefetchable memory behind bridge: [disabled]
>> > Capabilities: [40] Express Root Port (Slot+), MSI 00
>> > Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> > Capabilities: [90] Subsystem: Dell Device 0991
>> > Capabilities: [a0] Power Management version 3
>> > Capabilities: [100] Advanced Error Reporting
>> > Capabilities: [220] Access Control Services
>> > Capabilities: [150] Precision Time Measurement
>> > Capabilities: [200] L1 PM Substates
>> > Capabilities: [a00] Downstream Port Containment
>> > Kernel driver in use: pcieport
>>
>> I don't know enough about PCI to say if the bridge is a problem or
>> not.
>
> I don't think the bridge is an issue here. AFAICT the bridge's I/O
> and prefetchable memory windows are disabled, but the non-prefetchable
> window *is* enabled and contains the space consumed by the ath11k
> device:
>
> 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
> Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> Memory behind bridge: 8c300000-8c3fffff [size=1M]
> 56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
> Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]
>

Have you enabled VT-d in the BIOS? This is required at least on some old
laptops.


>> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
>> PCI device where he is not having enough MSI vectors. ath11k needs 32
>> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is
>> new
>> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
>> drivers/net/wireless/ath/ath11k/pci.c. [2]
>
> This code is needlessly complicated. If you absolutely need
> msi_config.total_vectors and can't settle for any less, you can do
> this:
>
> num_vectors = pci_alloc_irq_vectors(ab_pci->pdev,
> msi_config.total_vectors,
> msi_config.total_vectors,
> PCI_IRQ_MSI);
>
> if (num_vectors < 0) {
> ath11k_err(ab, "failed to get %d MSI vectors (%d)\n",
> msi_config.total_vectors, num_vectors);
> return num_vectors;
> }
>
> But it seems a little greedy if the device can't operate at all unless
> it gets 32 vectors. Are you sure that's a hard requirement? Most
> devices can work with fewer vectors, even if it reduces performance.
>
>> I would first try with a full distro kernel config, just in case
>> there's
>> some another important kernel config missing.
>>
>> [1]
>> http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html
>
> Tangent: have you considered getting this list archived on
> https://lore.kernel.org/lists.html?
>
>> [2]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c#n633
>>
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>>
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-03 16:12:25

by Bjorn Helgaas

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

[+cc Thomas, Christoph for question about not enough MSI IRQ vectors]

On Tue, Nov 03, 2020 at 08:49:06AM +0200, Kalle Valo wrote:
> Bjorn Helgaas <[email protected]> writes:
> > On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
> >> + linux-wireless, linux-pci, devin
> >>
> >> Thomas Krause <[email protected]> writes:
> >>
> >> >> I had the same problem as well back in the days, for me enabling
> >> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
> >> >> mention that in the ath11k warning above :)
> >> >
> >> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
> >> > is behind a PCI bridge which is also disabled, could this be a
> >> > problem?
> >> >
> >> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
> >> > [Normal decode])
> >> > Flags: bus master, fast devsel, latency 0, IRQ 123
> >> > Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> >> > I/O behind bridge: [disabled]
> >> > Memory behind bridge: 8c300000-8c3fffff [size=1M]
> >> > Prefetchable memory behind bridge: [disabled]
> >> > Capabilities: [40] Express Root Port (Slot+), MSI 00
> >> > Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >> > Capabilities: [90] Subsystem: Dell Device 0991
> >> > Capabilities: [a0] Power Management version 3
> >> > Capabilities: [100] Advanced Error Reporting
> >> > Capabilities: [220] Access Control Services
> >> > Capabilities: [150] Precision Time Measurement
> >> > Capabilities: [200] L1 PM Substates
> >> > Capabilities: [a00] Downstream Port Containment
> >> > Kernel driver in use: pcieport
> >>
> >> I don't know enough about PCI to say if the bridge is a problem or not.
> >
> > I don't think the bridge is an issue here. AFAICT the bridge's I/O
> > and prefetchable memory windows are disabled, but the non-prefetchable
> > window *is* enabled and contains the space consumed by the ath11k
> > device:
> >
> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
> > Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> > Memory behind bridge: 8c300000-8c3fffff [size=1M]
> > 56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
> > Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]
>
> Good to know that the bridge shouldn't be the problem. Do you have any
> ideas how to make more vectors available to ath11k, besides
> CONFIG_IRQ_REMAP? Because QCA6390 works in Windows I doubt this is a
> hardware problem.
>
> >> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
> >> PCI device where he is not having enough MSI vectors. ath11k needs 32
> >> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
> >> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
> >> drivers/net/wireless/ath/ath11k/pci.c. [2]

> > But it seems a little greedy if the device can't operate at all unless
> > it gets 32 vectors. Are you sure that's a hard requirement? Most
> > devices can work with fewer vectors, even if it reduces performance.
>
> This was my first reaction as well when I saw the code for the first
> time. And the reply I got is that the firmware needs all 32 vectors, it
> won't work with less.

I do see a couple other drivers that are completely inflexible (they
request min==max). But I don't know the system constraint you're
hitting. CC'd Thomas & Christoph in case they have time to give us a
hint.

> >> I would first try with a full distro kernel config, just in case there's
> >> some another important kernel config missing.
> >>
> >> [1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html
> >
> > Tangent: have you considered getting this list archived on
> > https://lore.kernel.org/lists.html?
>
> Good point, actually I have not. I'll add both ath10k and ath11k lists
> to lore. It's even more important now that lists.infradead.org had a
> hard drive crash and lost years of archives.

Or you could just add linux-wireless, e.g.,

L: [email protected]
L: [email protected]

or even consider moving from ath10k and ath11k to
[email protected]. I think there's some value in
consolidating low-volume lists. It looks like ath11k had < 90
messages for all of October.

2020-11-03 21:23:53

by Thomas Gleixner

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
> On Tue, Nov 03, 2020 at 08:49:06AM +0200, Kalle Valo wrote:
>> Bjorn Helgaas <[email protected]> writes:
>> > On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
>> >> Thomas Krause <[email protected]> writes:
>> >>
>> >> >> I had the same problem as well back in the days, for me enabling
>> >> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> >> >> mention that in the ath11k warning above :)

Interrupt remapping only helps when the device supports only MSI (not
MSI-X) because x86 (kernel) does not support multiple MSI interrupts
without remapping.

So if only MSI is available then you get exactly _one_ MSI vector
without remapping.

>> >> > CONFIG_IRQ_REMAP did not do the trick.

The config alone does not help. The hardware has to support it and the
BIOS has to enable it.

Check the BIOS for a switch which is named 'VT-d' or such. It might
depend on 'Intel Virtualization Technology' or such.

>> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
>> > Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
>> > Memory behind bridge: 8c300000-8c3fffff [size=1M]
>> > 56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
>> > Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]

So I grabbed the PCI info from the link and it has:

Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit-

So no MSI-X, ergo only one MSI interrupt without remapping.
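
The rule stated here for an MSI-only device on x86 can be written down as a two-line decision. This is only a sketch of the explanation in this message, not of any kernel interface; max_msi_vectors and vectors_ok are hypothetical names, and the 32 is the ath11k requirement quoted in the thread.

```python
def max_msi_vectors(capable, irq_remapping):
    """Vectors an MSI-only device can get on x86, per the explanation above.

    Without interrupt remapping, multiple MSI is not supported, so the
    device gets exactly one vector regardless of what it is capable of.
    """
    return capable if irq_remapping else 1


def vectors_ok(required, capable, irq_remapping):
    """True if a driver needing 'required' vectors can be satisfied."""
    return max_msi_vectors(capable, irq_remapping) >= required


# QCA6390 advertises Count=1/32 and the driver asks for all 32.
print(vectors_ok(32, 32, irq_remapping=False))
print(vectors_ok(32, 32, irq_remapping=True))
```

With remapping off the check fails no matter how many messages the device advertises, which matches the -ENOSPC seen by the reporters.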

>> >> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
>> >> PCI device where he is not having enough MSI vectors. ath11k needs 32
>> >> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
>> >> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
>> >> drivers/net/wireless/ath/ath11k/pci.c. [2]
>
>> > But it seems a little greedy if the device can't operate at all unless
>> > it gets 32 vectors. Are you sure that's a hard requirement? Most
>> > devices can work with fewer vectors, even if it reduces performance.

Right, even most high end network cards work with one interrupt.

>> This was my first reaction as well when I saw the code for the first
>> time. And the reply I got is that the firmware needs all 32 vectors, it
>> won't work with less.

Great design.

> I do see a couple other drivers that are completely inflexible (they
> request min==max). But I don't know the system constraint you're
> hitting. CC'd Thomas & Christoph in case they have time to give us a
> hint.

Can I have a full dmesg please?

Please enable CONFIG_IRQ_REMAP and CONFIG_INTEL_IOMMU (not strictly
required, but it's a Dell BIOS after all). Also set
CONFIG_INTEL_IOMMU_DEFAULT_ON.

Or simply try a distro kernel.
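
Whether a given kernel config has the three options named above can be checked with a few lines of Python. This is a sketch under assumptions: missing_options is a hypothetical helper, it only recognises built-in ("=y") options, and where the config text comes from (a .config file in the build tree, for instance) is left to the reader.

```python
REQUIRED = (
    "CONFIG_IRQ_REMAP",
    "CONFIG_INTEL_IOMMU",
    "CONFIG_INTEL_IOMMU_DEFAULT_ON",
)


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in the config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines do not end in "=y" and are skipped.
        if line.endswith("=y"):
            enabled.add(line[:-2])
    return [opt for opt in required if opt not in enabled]


sample = """\
CONFIG_IRQ_REMAP=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
"""
print(missing_options(sample))
```

An empty list means all three options are built in; it says nothing about whether the BIOS actually enables the hardware, which is the other half of the requirement.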

Thanks,

tglx

2020-11-03 22:43:57

by Thomas Gleixner

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Tue, Nov 03 2020 at 22:08, Thomas Gleixner wrote:
> On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
>>> > But it seems a little greedy if the device can't operate at all unless
>>> > it gets 32 vectors. Are you sure that's a hard requirement? Most
>>> > devices can work with fewer vectors, even if it reduces performance.
>
> Right, even most high end network cards work with one interrupt.
>
>>> This was my first reaction as well when I saw the code for the first
>>> time. And the reply I got is that the firmware needs all 32 vectors, it
>>> won't work with less.
>
> Great design.

Just to add more information to this:

Enforcing 32 vectors with MSI is beyond silly. Due to the limitations of
MSI all of these vectors will be affine to a single CPU unless irq
remapping is available and enabled.

So if irq remapping is not enabled, then what are the 32 vectors buying?
Exactly nothing because they just compete to be handled on the very same
CPU. If the design requires more than one vector, then this should be
done with MSI-X (which allows individual affinities and individual
masking).

That's been known for 20 years and MSI-X exists for exactly that reason.
But hardware people still insist on implementing MSI (probably because it
saves $0.002 per chip).

But there is also the firmware side. Enforcing the availability of 32
vectors on MSI is silly to begin with as explained above, but it's also
silly given the constraints of the x86 vector space. It takes just 6
devices having the same 32 vector requirement to exhaust it. Oh well...
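
The arithmetic behind the "6 devices" remark can be sketched as follows. The figures are assumptions stated for illustration, not taken from this message: an x86 CPU has 256 interrupt vectors, of which the first 32 are reserved for exceptions; the vectors the kernel keeps for its own use are not modelled, so the result is an upper bound.

```python
IDT_VECTORS = 256        # assumption: size of the x86 vector table
EXCEPTION_VECTORS = 32   # assumption: vectors 0-31 are reserved for exceptions
BLOCK = 32               # vectors demanded per device, as with ath11k

external = IDT_VECTORS - EXCEPTION_VECTORS   # vectors left for everything else
upper_bound = external // BLOCK              # 32-vector blocks that could fit at best
left_after_six = external - 6 * BLOCK        # what remains once six devices are served

print(external, upper_bound, left_after_six)
```

Six such devices would leave a single block's worth of vectors for timers, IPIs and every other device, so the practical limit sits at or below the figure quoted above.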

Thanks,

tglx

2020-11-04 13:08:46

by Thomas Krause

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310


> Can I have a full dmesg please?
>
> Please enable CONFIG_IRQ_REMAP and CONFIG_INTEL_IOMMU (not strictly
> required, but it's a Dell BIOS after all). Also set
> CONFIG_INTEL_IOMMU_DEFAULT_ON.

I attached a full dmesg with the latest ath11k master and the
configuration enabled. VT-d was enabled in the BIOS. Most options had
already been present in the previous attempts (copied from the distro
config), but CONFIG_INTEL_IOMMU_DEFAULT_ON needed to be set manually. I
hope this helps; if there is more I can do to debug it on my side, I'm
happy to do so.


Attachments:
dmesg.log (82.29 kB)
lspci.txt (52.21 kB)

2020-11-04 15:27:16

by Thomas Gleixner

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Wed, Nov 04 2020 at 14:04, Thomas Krause wrote:
> config) but CONFIG_INTEL_IOMMU_DEFAULT_ON needed to be set manually. I
> hope this helps, if there is more I can do to debug it on my side I'm
> happy to do so.

> [ 0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
> BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:

> [ 0.103693] DMAR: Host address width 39
> [ 0.103693] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
> [ 0.103697] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 69e2ff0505e
> [ 0.103698] DMAR: DRHD base: 0x000000fed84000 flags: 0x0
> [ 0.103701] DMAR: dmar1: reg_base_addr fed84000 ver 1:0 cap d2008c40660462 ecap f050da
> [ 0.103702] DMAR: DRHD base: 0x000000fed86000 flags: 0x0
> [ 0.103706] DMAR: dmar2: reg_base_addr fed86000 ver 1:0 cap d2008c40660462 ecap f050da
> [ 0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
> [ 0.103707] DMAR: Parse DMAR table failure.

which disables interrupt remapping and therefore the driver gets only
one MSI which makes it unhappy.
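
The two log lines that give the failure away can be searched for in any saved dmesg. A minimal Python sketch, assuming the message wording quoted above; dmar_problems is a hypothetical helper and the marker strings are copied from this log, so other kernel versions may word them differently.

```python
MARKERS = (
    "DMAR: [Firmware Bug]",
    "DMAR: Parse DMAR table failure",
)


def dmar_problems(dmesg_text):
    """Return the dmesg lines that indicate a broken DMAR table."""
    return [
        line.strip()
        for line in dmesg_text.splitlines()
        if any(marker in line for marker in MARKERS)
    ]


sample = """\
[ 0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
[ 0.103693] DMAR: Host address width 39
[ 0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
[ 0.103707] DMAR: Parse DMAR table failure.
"""
for line in dmar_problems(sample):
    print(line)
```

If either line shows up, interrupt remapping is not in effect on that boot, whatever the kernel config says.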

Not that I'm surprised, it's Dell.... Can you check whether they have a
BIOS update for that box?

Thanks,

tglx

2020-11-05 13:24:20

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Thomas Gleixner <[email protected]> writes:

> On Wed, Nov 04 2020 at 14:04, Thomas Krause wrote:
>> config) but CONFIG_INTEL_IOMMU_DEFAULT_ON needed to be set manually. I
>> hope this helps, if there is more I can do to debug it on my side I'm
>> happy to do so.
>
>> [ 0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
>> BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:
>
>> [ 0.103693] DMAR: Host address width 39
>> [ 0.103693] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
>> [ 0.103697] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 69e2ff0505e
>> [ 0.103698] DMAR: DRHD base: 0x000000fed84000 flags: 0x0
>> [ 0.103701] DMAR: dmar1: reg_base_addr fed84000 ver 1:0 cap d2008c40660462 ecap f050da
>> [ 0.103702] DMAR: DRHD base: 0x000000fed86000 flags: 0x0
>> [ 0.103706] DMAR: dmar2: reg_base_addr fed86000 ver 1:0 cap d2008c40660462 ecap f050da
>> [ 0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
>> [ 0.103707] DMAR: Parse DMAR table failure.
>
> which disables interrupt remapping and therefore the driver gets only
> one MSI which makes it unhappy.
>
> Not that I'm surprised, it's Dell.... Can you check whether they have a
> BIOS update for that box?

I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
separate "Virtualisation" setting in BIOS. See if you have that and try
enabling it.


2020-11-06 11:47:55

by Devin Bayer

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On 03/11/2020 22.08, Thomas Gleixner wrote:
> On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
>
> Check the BIOS for a switch which is named 'VT-d' or such. It might
> depend on 'Intel Virtualization Technology' or such.
>

Thanks for this info. The platform I have, J1900, indeed does not support VT-d.

So I guess I'm not able to use this card. That's unfortunate.

It doesn't seem like the Windows driver works either. It doesn't give any errors
but it fails to find any wireless networks.

~ dev

2020-11-09 18:47:55

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Thomas Gleixner <[email protected]> writes:

> On Tue, Nov 03 2020 at 22:08, Thomas Gleixner wrote:
>> On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
>>>> > But it seems a little greedy if the device can't operate at all unless
>>>> > it gets 32 vectors. Are you sure that's a hard requirement? Most
>>>> > devices can work with fewer vectors, even if it reduces performance.
>>
>> Right, even most high end network cards work with one interrupt.
>>
>>>> This was my first reaction as well when I saw the code for the first
>>>> time. And the reply I got is that the firmware needs all 32 vectors, it
>>>> won't work with less.
>>
>> Great design.
>
> Just to put more information to this:
>
> Enforcing 32 vectors with MSI is beyond silly. Due to the limitations of
> MSI all of these vectors will be affine to a single CPU unless irq
> remapping is available and enabled.
>
> So if irq remapping is not enabled, then what are the 32 vectors buying?
> Exactly nothing because they just compete to be handled on the very same
> CPU. If the design requires more than one vector, then this should be
> done with MSI-X (which allows individual affinities and individual
> masking).
>
> That's known for 20 years and MSI-X exists for exactly that reason. But
> hardware people still insist on implementing MSI (probably because it
> saves 0.002$ per chip).
>
> But there is also the firmware side. Enforcing the availability of 32
> vectors on MSI is silly to begin with as explained above, but it's also
> silly given the constraints of the x86 vector space. It takes just 6
> devices having the same 32 vector requirement to exhaust it. Oh well...

Thanks Thomas, this is great info. I'm pushing this internally and we
will try to get ath11k working with just one MSI vector.


2020-11-09 18:49:47

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Bjorn Helgaas <[email protected]> writes:

>> > Tangent: have you considered getting this list archived on
>> > https://lore.kernel.org/lists.html?
>>
>> Good point, actually I have not. I'll add both ath10k and ath11k lists
>> to lore. It's even more important now that lists.infradead.org had a
>> hard drive crash and lost years of archives.
>
> Or you could just add linux-wireless, e.g.,
>
> L: [email protected]
> L: [email protected]
>
> or even consider moving from ath10k and ath11k to
> [email protected]. I think there's some value in
> consolidating low-volume lists. It looks like ath11k had < 90
> messages for all of October.

The background here is that linux-wireless is quite a high-volume list and
not everyone has time to follow it, so having specific ath10k and
ath11k lists makes it easier for those people. So I'm hesitant to
shut down the driver lists for that reason.


2020-11-10 08:36:25

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Kalle Valo <[email protected]> writes:

> Thomas Gleixner <[email protected]> writes:
>
>> On Wed, Nov 04 2020 at 14:04, Thomas Krause wrote:
>>> config) but CONFIG_INTEL_IOMMU_DEFAULT_ON needed to be set manually. I
>>> hope this helps, if there is more I can do to debug it on my side I'm
>>> happy to do so.
>>
>>> [ 0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR
>>> reported at address 0!
>>> BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:
>>
>>> [ 0.103693] DMAR: Host address width 39
>>> [ 0.103693] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
>>> [ 0.103697] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap
>>> 1c0000c40660462 ecap 69e2ff0505e
>>> [ 0.103698] DMAR: DRHD base: 0x000000fed84000 flags: 0x0
>>> [ 0.103701] DMAR: dmar1: reg_base_addr fed84000 ver 1:0 cap
>>> d2008c40660462 ecap f050da
>>> [ 0.103702] DMAR: DRHD base: 0x000000fed86000 flags: 0x0
>>> [ 0.103706] DMAR: dmar2: reg_base_addr fed86000 ver 1:0 cap
>>> d2008c40660462 ecap f050da
>>> [ 0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
>>> [ 0.103707] DMAR: Parse DMAR table failure.
>>
>> which disables interrupt remapping and therefore the driver gets only
>> one MSI which makes it unhappy.
>>
>> Not that I'm surprised, it's Dell.... Can you check whether they have a
>> BIOS update for that box?
>
> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
> separate "Virtualisation" setting in BIOS. See if you have that and try
> enabling it.

I was informed about another setting to test: try disabling "Enable
Secure Boot" in the BIOS. I don't know yet why it would help, but that's
what a few people have recommended.

Please let me know how it goes.
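
The DMAR lines quoted above can also be checked mechanically. The following is a minimal sketch, assuming the dmesg output uses the same wording as the excerpt in this thread; the function name `check_dmar` and the sample text are made up for the example.

```python
import re

# Matches lines such as "DMAR: DRHD base: 0x000000fed90000 flags: 0x0".
DRHD_RE = re.compile(r"DMAR: DRHD base: (0x[0-9a-fA-F]+) flags: (0x[0-9a-fA-F]+)")


def check_dmar(dmesg_text):
    """Return a list of findings about the DMAR table in a dmesg dump."""
    findings = []
    for match in DRHD_RE.finditer(dmesg_text):
        if int(match.group(1), 16) == 0:
            findings.append("DRHD unit reported at address 0")
    if "Parse DMAR table failure" in dmesg_text:
        findings.append("kernel failed to parse the DMAR table")
    return findings


# Sample lines copied from the log excerpt in this thread.
SAMPLE = """\
[ 0.103693] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
[ 0.103707] DMAR: Parse DMAR table failure.
"""

if __name__ == "__main__":
    for finding in check_dmar(SAMPLE):
        print(finding)
```

On a real system the input would be the saved output of `dmesg`. An empty result only means these two patterns were not found, not that interrupt remapping is working.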

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-11 08:55:08

by Thomas Krause

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310


Am 10.11.20 um 09:33 schrieb Kalle Valo:
>
>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>> separate "Virtualisation" setting in BIOS. See if you have that and try
>> enabling it.
> I was informed about another setting to test: try disabling "Enable
> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
> what few people have recommended.
>
> Please let me know how it goes.
>
I have two options under "Virtualization" in the BIOS: "Enable Intel
Virtualization Technology (VT)" and "VT for Direct I/O". Both were
enabled. Secure boot was also turned off. BIOS version is also at the
most current version 1.1.1. Because of the dmesg errors Thomas Gleixner
mentioned, I assume it would be best to contact Dell directly (even if
I'm not sure if and how fast they will respond). If the driver would
manage to work with only 1 vector, I assume this would also make it work
on my configuration, even with possible performance hits.

Best,

Thomas


2020-11-11 09:22:59

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Thomas Krause <[email protected]> writes:

> Am 10.11.20 um 09:33 schrieb Kalle Valo:
>>
>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>>> separate "Virtualisation" setting in BIOS. See if you have that and try
>>> enabling it.
>> I was informed about another setting to test: try disabling "Enable
>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
>> what few people have recommended.
>>
>> Please let me know how it goes.
>>
> I have two options under "Virtualization" in the BIOS: "Enable Intel
> Virtualization Technology (VT)" and "VT for Direct I/O". Both were
> enabled. Secure boot was also turned off. BIOS version is also at the
> most current version 1.1.1.

This is good to know, thanks for testing. Now we have explored all
the BIOS options I know of.

> Because of the dmesg errors Thomas Gleixner mentioned, I assume it
> would be best to contact Dell directly (even if I'm not sure if and
> how fast they will respond).

I have asked our people to report this to Dell, but no response yet.

> If the driver would manage to work with only 1 vector, I assume this
> would also make it work on my configuration, even with possible
> performance hits.

This is the workaround we are working on at the moment. There's now a
proof of concept patch but I'm not certain if it will work. I'll post it
as soon as I can and will provide the link in this thread.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-11 09:40:09

by Thomas Gleixner

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Wed, Nov 11 2020 at 09:53, Thomas Krause wrote:
> Am 10.11.20 um 09:33 schrieb Kalle Valo:
>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>>> separate "Virtualisation" setting in BIOS. See if you have that and try
>>> enabling it.
>> I was informed about another setting to test: try disabling "Enable
>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
>> what few people have recommended.
>>
>> Please let me know how it goes.
>>
> I have two options under "Virtualization" in the BIOS: "Enable Intel
> Virtualization Technology (VT)" and "VT for Direct I/O". Both were

VT for Direct I/O enables the IOMMU and the interrupt remapping unit,
but the kernel can't use it because the ACPI tables are busted.

> enabled. Secure boot was also turned off. BIOS version is also at the
> most current version 1.1.1. Because of the dmesg errors Thomas Gleixner
> mentioned, I assume it would be best to contact Dell directly (even if
> I'm not sure if and how fast they will respond). If the driver would

Good luck.

Thanks,

tglx

2020-11-11 19:11:37

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Kalle Valo <[email protected]> writes:

> Thomas Krause <[email protected]> writes:
>
>> Am 10.11.20 um 09:33 schrieb Kalle Valo:
>>>
>>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>>>> separate "Virtualisation" setting in BIOS. See if you have that and try
>>>> enabling it.
>>> I was informed about another setting to test: try disabling "Enable
>>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
>>> what few people have recommended.
>>>
>>> Please let me know how it goes.
>>>
>> I have two options under "Virtualization" in the BIOS: "Enable Intel
>> Virtualization Technology (VT)" and "VT for Direct I/O". Both were
>> enabled. Secure boot was also turned off. BIOS version is also at the
>> most current version 1.1.1.
>
> This is good to know, thanks for testing. Now we have explored all
> possible BIOS options as I know of.
>
>> Because of the dmesg errors Thomas Gleixner mentioned, I assume it
>> would be best to contact Dell directly (even if I'm not sure if and
>> how fast they will respond).
>
> I have asked our people to report this to Dell, but no response yet.
>
>> If the driver would manage to work with only 1 vector, I assume this
>> would also make it work on my configuration, even with possible
>> performance hits.
>
> This is the workaround we are working on at the moment. There's now a
> proof of concept patch but I'm not certain if it will work. I'll post it
> as soon as I can and will provide the link in this thread.

The proof of concept patch for v5.10-rc2 is here:

https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

Hopefully it makes it possible to boot the firmware now. But this is a
quick hack and most likely buggy, so keep your expectations low :)

In case there are these warnings during firmware initialisation:

ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110

Try reverting this commit:

7fef431be9c9 mm/page_alloc: place pages to tail in __free_pages_core()

That's another issue which is debugged here:

http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
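
The negative numbers in these messages are Linux errno values, so telling the two failure modes apart comes down to reading the trailing code. A minimal sketch, assuming the message shapes seen in this thread ("err = -110", "req:-110", ": -11"); the table and the `decode` helper are illustrative and not part of ath11k:

```python
import re

# Linux errno values that appear in this thread.
ERRNO_NAMES = {
    11: "EAGAIN",
    22: "EINVAL",
    28: "ENOSPC",
    110: "ETIMEDOUT",
}

# A negative code at the end of the line, after ':' or '='.
NEG_CODE_RE = re.compile(r"[:=]\s*-(\d+)\s*$")


def decode(line):
    """Return the errno name for a trailing negative code, or None."""
    match = NEG_CODE_RE.search(line)
    if match is None:
        return None
    return ERRNO_NAMES.get(int(match.group(1)))


if __name__ == "__main__":
    for line in (
        "ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110",
        "ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110",
    ):
        print(decode(line), "<-", line)
```

A line ending in -110 decodes to ETIMEDOUT, which is the signature described above for the issue that reverting 7fef431be9c9 is meant to address.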

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-11 19:26:55

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Kalle,

Thanks so much for your and your team's efforts. I've applied the
patch, and I'm receiving some errors similar to what you thought might
occur:

[ 7.802756] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is experimental!
[ 7.802797] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem 0x8e300000-0x8e3fffff 64bit]
[ 7.802815] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[ 7.803291] ath11k_pci 0000:55:00.0: MSI vectors: 1
[ 8.172623] ath11k_pci 0000:55:00.0: Respond mem req failed, result: 1, err: 48
[ 8.172624] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22

I've reverted the commit you mentioned and am rebuilding now. I'll
test in a few minutes.

Thanks!

On Wed, Nov 11, 2020 at 8:10 PM Kalle Valo <[email protected]> wrote:
>
> Kalle Valo <[email protected]> writes:
>
> > Thomas Krause <[email protected]> writes:
> >
> >> Am 10.11.20 um 09:33 schrieb Kalle Valo:
> >>>
> >>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
> >>>> separate "Virtualisation" setting in BIOS. See if you have that and try
> >>>> enabling it.
> >>> I was informed about another setting to test: try disabling "Enable
> >>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
> >>> what few people have recommended.
> >>>
> >>> Please let me know how it goes.
> >>>
> >> I have two options under "Virtualization" in the BIOS: "Enable Intel
> >> Virtualization Technology (VT)" and "VT for Direct I/O". Both were
> >> enabled. Secure boot was also turned off. BIOS version is also at the
> >> most current version 1.1.1.
> >
> > This is good to know, thanks for testing. Now we have explored all
> > possible BIOS options as I know of.
> >
> >> Because of the dmesg errors Thomas Gleixner mentioned, I assume it
> >> would be best to contact Dell directly (even if I'm not sure if and
> >> how fast they will respond).
> >
> > I have asked our people to report this to Dell, but no response yet.
> >
> >> If the driver would manage to work with only 1 vector, I assume this
> >> would also make it work on my configuration, even with possible
> >> performance hits.
> >
> > This is the workaround we are working on at the moment. There's now a
> > proof of concept patch but I'm not certain if it will work. I'll post it
> > as soon as I can and will provide the link in this thread.
>
> The proof of concept patch for v5.10-rc2 is here:
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
> Hopefully it makes it possible to boot the firmware now. But this is a
> quick hack and most likely buggy, so keep your expectations low :)
>
> In case there are these warnings during firmware initialisation:
>
> ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
>
> Try reverting this commit:
>
> 7fef431be9c9 mm/page_alloc: place pages to tail in __free_pages_core()
>
> That's another issue which is debugged here:
>
> http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-11 19:31:51

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

OK, with 7fef431be9c9 reverted, the initialization doesn't seem to change at all:

[ 7.961867] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is experimental!
[ 7.961913] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem 0x8e300000-0x8e3fffff 64bit]
[ 7.961930] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[ 7.962009] ath11k_pci 0000:55:00.0: MSI vectors: 1
[ 8.461553] ath11k_pci 0000:55:00.0: Respond mem req failed, result: 1, err: 48
[ 8.461556] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22

and just for thoroughness, here are my firmware file checksums (sha256):

9cc48d1dce819ead4112c6a8051c51e4d75e2b11f99ba9d8738cf8108967b70e amss.bin
5081930c3b207f8ed82ff250f9b90fb77e87b2a92c3cf80ad020a58dea0bc5b7 board.bin
596482f780d21645f72a48acd9aed6c6fc8cf2d039ac31552a19800674d253cc m3.bin


Thanks!


On Wed, Nov 11, 2020 at 8:24 PM wi nk <[email protected]> wrote:
>
> Kalle,
>
> Thanks so much for your and your teams efforts. I've applied the
> patch, and I'm receiving some errors similar to what you thought might
> occur:
>
> [ 7.802756] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
> experimental!
> [ 7.802797] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
> 0x8e300000-0x8e3fffff 64bit]
> [ 7.802815] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
> [ 7.803291] ath11k_pci 0000:55:00.0: MSI vectors: 1
> [ 8.172623] ath11k_pci 0000:55:00.0: Respond mem req failed,
> result: 1, err: 48
> [ 8.172624] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22
>
> I've reverted the commit you mentioned and am rebuilding now. I'll
> test in a few minutes.
>
> Thanks!
>
> On Wed, Nov 11, 2020 at 8:10 PM Kalle Valo <[email protected]> wrote:
> >
> > Kalle Valo <[email protected]> writes:
> >
> > > Thomas Krause <[email protected]> writes:
> > >
> > >> Am 10.11.20 um 09:33 schrieb Kalle Valo:
> > >>>
> > >>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
> > >>>> separate "Virtualisation" setting in BIOS. See if you have that and try
> > >>>> enabling it.
> > >>> I was informed about another setting to test: try disabling "Enable
> > >>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
> > >>> what few people have recommended.
> > >>>
> > >>> Please let me know how it goes.
> > >>>
> > >> I have two options under "Virtualization" in the BIOS: "Enable Intel
> > >> Virtualization Technology (VT)" and "VT for Direct I/O". Both were
> > >> enabled. Secure boot was also turned off. BIOS version is also at the
> > >> most current version 1.1.1.
> > >
> > > This is good to know, thanks for testing. Now we have explored all
> > > possible BIOS options as I know of.
> > >
> > >> Because of the dmesg errors Thomas Gleixner mentioned, I assume it
> > >> would be best to contact Dell directly (even if I'm not sure if and
> > >> how fast they will respond).
> > >
> > > I have asked our people to report this to Dell, but no response yet.
> > >
> > >> If the driver would manage to work with only 1 vector, I assume this
> > >> would also make it work on my configuration, even with possible
> > >> performance hits.
> > >
> > > This is the workaround we are working on at the moment. There's now a
> > > proof of concept patch but I'm not certain if it will work. I'll post it
> > > as soon as I can and will provide the link in this thread.
> >
> > The proof of concept patch for v5.10-rc2 is here:
> >
> > https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
> >
> > Hopefully it makes it possible to boot the firmware now. But this is a
> > quick hack and most likely buggy, so keep your expectations low :)
> >
> > In case there are these warnings during firmware initialisation:
> >
> > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> >
> > Try reverting this commit:
> >
> > 7fef431be9c9 mm/page_alloc: place pages to tail in __free_pages_core()
> >
> > That's another issue which is debugged here:
> >
> > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-11 19:46:26

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

(please don't top post, it makes emails harder to read)

wi nk <[email protected]> writes:

> Ok with 7fef431be9c9 reverted, it doesn't seem to change the initialization any:
>
> [ 7.961867] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
> experimental!
> [ 7.961913] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
> 0x8e300000-0x8e3fffff 64bit]
> [ 7.961930] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
> [ 7.962009] ath11k_pci 0000:55:00.0: MSI vectors: 1
> [ 8.461553] ath11k_pci 0000:55:00.0: Respond mem req failed,
> result: 1, err: 48
> [ 8.461556] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22

I also see this -22 error (see my logs in [1]), even when the firmware
reboots normally. Do you see anything after these messages?

The problem that reverting 7fef431be9c9 helps with shows these errors:

ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110

[1] http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html

> and just for thoroughness, here are my firmware file checksums (sha256):
>
> 9cc48d1dce819ead4112c6a8051c51e4d75e2b11f99ba9d8738cf8108967b70e amss.bin
> 5081930c3b207f8ed82ff250f9b90fb77e87b2a92c3cf80ad020a58dea0bc5b7 board.bin
> 596482f780d21645f72a48acd9aed6c6fc8cf2d039ac31552a19800674d253cc m3.bin

But these do not look the same. I have:

a101dc90f8e876f39383b60c9da64ec4 /lib/firmware/ath11k/QCA6390/hw2.0/amss.bin
4c0781f659d2b7d6bef10a2e3d457728 /lib/firmware/ath11k/QCA6390/hw2.0/board-2.bin
d4c912a3501a3694a3f460d13de06d28 /lib/firmware/ath11k/QCA6390/hw2.0/m3.bin

Download them like this:

wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/amss.bin

wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/m3.bin

wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/board-2.bin
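
One caveat when comparing the two checksum lists: the values quoted earlier are labelled sha256 and are 64 hex digits long, while the ones listed here are 32 hex digits long, which is the length of an MD5 sum (an inference from length only). Digests from different algorithms cannot be compared directly, so it is safest to compute both for each downloaded file. A minimal sketch using Python's standard `hashlib`; the helper names are made up for the example:

```python
import hashlib
from pathlib import Path


def digests(path):
    """Return the MD5 and SHA-256 hex digests of a file."""
    data = Path(path).read_bytes()
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def guess_algorithm(hex_digest):
    """Guess the hash algorithm from the digest length alone."""
    return {32: "md5", 40: "sha1", 64: "sha256"}.get(len(hex_digest), "unknown")


if __name__ == "__main__":
    import sys

    # Usage: python3 digests.py amss.bin m3.bin board-2.bin
    for name in sys.argv[1:]:
        print(name, digests(name))
```

Note also that the file names differ between the two lists (board.bin versus board-2.bin), so a name-by-name comparison needs care as well.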

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-11 20:16:59

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Wed, Nov 11, 2020 at 8:45 PM Kalle Valo <[email protected]> wrote:
>
> (please don't top post, makes it harder to read emails)
>
> wi nk <[email protected]> writes:
>
> > Ok with 7fef431be9c9 reverted, it doesn't seem to change the initialization any:
> >
> > [ 7.961867] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
> > experimental!
> > [ 7.961913] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
> > 0x8e300000-0x8e3fffff 64bit]
> > [ 7.961930] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
> > [ 7.962009] ath11k_pci 0000:55:00.0: MSI vectors: 1
> > [ 8.461553] ath11k_pci 0000:55:00.0: Respond mem req failed,
> > result: 1, err: 48
> > [ 8.461556] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22
>
> I also see this -22 error (see my logs in [1]), even when the firmware
> reboots normally. Do you see anything after these messages?
>
> The problem which reverting 7fef431be9c9 helps has these errors:
>
> ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110
>
> [1] http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
>
> > and just for thoroughness, here are my firmware file checksums (sha256):
> >
> > 9cc48d1dce819ead4112c6a8051c51e4d75e2b11f99ba9d8738cf8108967b70e amss.bin
> > 5081930c3b207f8ed82ff250f9b90fb77e87b2a92c3cf80ad020a58dea0bc5b7 board.bin
> > 596482f780d21645f72a48acd9aed6c6fc8cf2d039ac31552a19800674d253cc m3.bin
>
> But these do not look same. I have:
>
> a101dc90f8e876f39383b60c9da64ec4 /lib/firmware/ath11k/QCA6390/hw2.0/amss.bin
> 4c0781f659d2b7d6bef10a2e3d457728 /lib/firmware/ath11k/QCA6390/hw2.0/board-2.bin
> d4c912a3501a3694a3f460d13de06d28 /lib/firmware/ath11k/QCA6390/hw2.0/m3.bin
>
> Download them like this:
>
> wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/amss.bin
>
> wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/m3.bin
>
> wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/board-2.bin
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Sorry for the top posting, web email has ruined my mailing list
etiquette. It seems having the correct firmware in place has caused
some forward movement. I now see this:

[ 8.513210] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is experimental!
[ 8.513251] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem 0x8e300000-0x8e3fffff 64bit]
[ 8.513269] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[ 8.513348] ath11k_pci 0000:55:00.0: MSI vectors: 1
[ 8.789499] ath11k_pci 0000:55:00.0: Respond mem req failed, result: 1, err: 0
[ 8.789500] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22
[ 8.794236] ath11k_pci 0000:55:00.0: req mem_seg[0] 0x28100000 524288 1
[ 8.794237] ath11k_pci 0000:55:00.0: req mem_seg[1] 0x28180000 524288 1
[ 8.794238] ath11k_pci 0000:55:00.0: req mem_seg[2] 0x28200000 524288 1
[ 8.794238] ath11k_pci 0000:55:00.0: req mem_seg[3] 0x28280000 294912 1
[ 8.794239] ath11k_pci 0000:55:00.0: req mem_seg[4] 0x28300000 524288 1
[ 8.794239] ath11k_pci 0000:55:00.0: req mem_seg[5] 0x28380000 524288 1
[ 8.794240] ath11k_pci 0000:55:00.0: req mem_seg[6] 0x27c00000 458752 1
[ 8.794240] ath11k_pci 0000:55:00.0: req mem_seg[7] 0x27c80000 131072 1
[ 8.794240] ath11k_pci 0000:55:00.0: req mem_seg[8] 0x27d00000 524288 4
[ 8.794241] ath11k_pci 0000:55:00.0: req mem_seg[9] 0x27d80000 360448 4
[ 8.794241] ath11k_pci 0000:55:00.0: req mem_seg[10] 0x28578000 16384 1
[ 8.807053] ath11k_pci 0000:55:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
[ 8.807054] ath11k_pci 0000:55:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
[ 8.910984] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
[ 9.446566] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0
[ 11.296620] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
[ 22.088028] ath11k_pci 0000:55:00.0: wmi command 12290 timeout
[ 22.088030] ath11k_pci 0000:55:00.0: failed to send WMI_STOP_SCAN_CMDID
[ 22.088031] ath11k_pci 0000:55:00.0: failed to stop wmi scan: -11
[ 22.088032] ath11k_pci 0000:55:00.0: failed to stop scan: -11
[ 22.088033] ath11k_pci 0000:55:00.0: failed to start hw scan: -110
[ 28.232066] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 28.232069] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 28.232073] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 38.216054] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 38.216057] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 38.216061] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 51.783961] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 51.783965] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 51.783970] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 71.695627] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 71.695629] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 71.695630] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 100.864905] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 100.864909] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 100.864913] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 107.306896] mhi 0000:55:00.0: Device failed to exit MHI Reset state
[ 143.868561] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 143.868564] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 143.868566] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 199.464250] mhi 0000:55:00.0: Device failed to exit MHI Reset state
<snip>

Occasionally my kernel panics at random spots (this is probably
related to the other patch, I guess), but I do have an adapter of sorts
now; ifconfig shows it. I don't seem to be able to find any networks
with it, however.
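
To make a long log like the one above easier to read, the repeated WMI timeouts can be summarised per command. A minimal sketch, assuming the message wording shown in this thread and pairing each "wmi command N timeout" line with the "failed to send WMI_..." line that follows it; the `summarize` helper is made up for the example:

```python
import re
from collections import Counter

TIMEOUT_RE = re.compile(r"wmi command (\d+) timeout")
SEND_RE = re.compile(r"failed to send (WMI_\w+)")


def summarize(log_lines):
    """Count WMI timeouts per command id and pair each id with the
    command name printed on the line that follows the timeout."""
    counts = Counter()
    names = {}
    pending = None  # command id of the most recent timeout line
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            pending = int(match.group(1))
            counts[pending] += 1
            continue
        match = SEND_RE.search(line)
        if match and pending is not None:
            names.setdefault(pending, match.group(1))
            pending = None
    return counts, names


# Sample lines copied from the log in this thread.
SAMPLE = [
    "[ 22.088028] ath11k_pci 0000:55:00.0: wmi command 12290 timeout",
    "[ 22.088030] ath11k_pci 0000:55:00.0: failed to send WMI_STOP_SCAN_CMDID",
    "[ 28.232066] ath11k_pci 0000:55:00.0: wmi command 12289 timeout",
    "[ 28.232069] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID",
    "[ 38.216054] ath11k_pci 0000:55:00.0: wmi command 12289 timeout",
    "[ 38.216057] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID",
]

if __name__ == "__main__":
    counts, names = summarize(SAMPLE)
    for cmd_id, count in counts.items():
        print(f"{cmd_id} ({cmd_id:#x}) {names.get(cmd_id, '?')}: {count} timeout(s)")
```

In the log above, every scan command times out, which suggests that commands sent to the firmware are not being completed at all rather than one specific command failing.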

2020-11-11 21:59:50

by Stefani Seibold

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
>
>
> The proof of concept patch for v5.10-rc2 is here:
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
> Hopefully it makes it possible to boot the firmware now. But this is
> a
> quick hack and most likely buggy, so keep your expectations low :)
>
> In case there are these warnings during firmware initialisation:
>
> ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
>
> Try reverting this commit:
>
> 7fef431be9c9 mm/page_alloc: place pages to tail in
> __free_pages_core()
>
> That's another issue which is debugged here:
>
> http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
>

Success on Dell XPS 13 9310. Applying the patch and reverting commit
7fef431be9c9 worked for me.

Thanks!


2020-11-11 22:03:49

by Stefani Seibold

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
>
> The proof of concept patch for v5.10-rc2 is here:
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
> Hopefully it makes it possible to boot the firmware now. But this is
> a
> quick hack and most likely buggy, so keep your expectations low :)
>
> In case there are these warnings during firmware initialisation:
>
> ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
>
> Try reverting this commit:
>
> 7fef431be9c9 mm/page_alloc: place pages to tail in
> __free_pages_core()
>
> That's another issue which is debugged here:
>
> http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
>

Applying the patch and reverting commit 7fef431be9c9 worked at first
glance.

After a couple of minutes the connection breaks. The kernel log
shows the following errors:

ath11k_pci 0000:55:00.0: wmi command 16387 timeout
ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11

It is also not possible to unload the ath11k_pci, rmmod will hang.


2020-11-12 01:35:43

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Wed, Nov 11, 2020 at 11:02 PM Stefani Seibold <[email protected]> wrote:
>
> On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> >
> > The proof of concept patch for v5.10-rc2 is here:
> >
> > https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
> >
> > Hopefully it makes it possible to boot the firmware now. But this is
> > a
> > quick hack and most likely buggy, so keep your expectations low :)
> >
> > In case there are these warnings during firmware initialisation:
> >
> > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> >
> > Try reverting this commit:
> >
> > 7fef431be9c9 mm/page_alloc: place pages to tail in
> > __free_pages_core()
> >
> > That's another issue which is debugged here:
> >
> > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> >
>
> Applying the patch and revert patch 7fef431be9c9 worked on the first
> glance.
>
> After a couple of minutes the connection get broken. The kernel log
> shows the following error:
>
> ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> ath11k_pc
> i 0000:55:00.0: failed to enable PMF QOS: (-11
>
> It is also not possible to unload the ath11k_pci, rmmod will hang.
>
>

I can confirm the same behavior as Stefani so far. After applying the
patch, and reverting commit 7fef431be9c9, I am able to connect to a
network. It hasn't disconnected yet (I'm sending this email via that
connection). I'll report what I find next.

Thanks again for the help!

2020-11-12 01:36:15

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

I've yet to see any instability after 45 minutes of exercising it. I
do see a couple of messages that came out of the driver:

[ 8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
[ 11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a

then when it associates:

[ 16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
[ 16.722636] wlp85s0: authenticated
[ 16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
[ 16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=8)
[ 16.738443] wlp85s0: associated
[ 16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready

The adapter is achieving around 500 Mbps on my gigabit connection; my
2018 MacBook Pro sees around 650, so it's doing pretty well so far.

Stefani - when you applied the patch that Kalle shared, which branch
did you apply it to? I applied it to ath11k-qca6390-bringup, and when
I reverted 7fef431be9c9 there was a small merge conflict I needed to
resolve. I wonder if either the starting branch or your chosen
resolution is related to the instability you see (or I'm just lucky
so far! :)).

On Thu, Nov 12, 2020 at 1:24 AM wi nk <[email protected]> wrote:
>
> On Wed, Nov 11, 2020 at 11:02 PM Stefani Seibold <[email protected]> wrote:
> >
> > On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> > >
> > > The proof of concept patch for v5.10-rc2 is here:
> > >
> > > https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
> > >
> > > Hopefully it makes it possible to boot the firmware now. But this is
> > > a
> > > quick hack and most likely buggy, so keep your expectations low :)
> > >
> > > In case there are these warnings during firmware initialisation:
> > >
> > > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> > >
> > > Try reverting this commit:
> > >
> > > 7fef431be9c9 mm/page_alloc: place pages to tail in
> > > __free_pages_core()
> > >
> > > That's another issue which is debugged here:
> > >
> > > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> > >
> >
> > Applying the patch and revert patch 7fef431be9c9 worked on the first
> > glance.
> >
> > After a couple of minutes the connection get broken. The kernel log
> > shows the following error:
> >
> > ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> > ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> > ath11k_pc
> > i 0000:55:00.0: failed to enable PMF QOS: (-11
> >
> > It is also not possible to unload the ath11k_pci, rmmod will hang.
> >
> >
>
> I can confirm the same behavior as Stefani so far. After applying the
> patch, and reverting commit 7fef431be9c9, I am able to connect to a
> network. It hasn't disconnected yet (I'm sending this email via that
> connection). I'll report what I find next.
>
> Thanks again for the help!

2020-11-12 01:45:57

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Thu, Nov 12, 2020 at 2:10 AM wi nk <[email protected]> wrote:
>
> I've yet to see any instability after 45 minutes of exercising it, I
> do see a couple of messages that came out of the driver:
>
> [ 8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> [ 11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
>
> then when it associates:
>
> [ 16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> [ 16.722636] wlp85s0: authenticated
> [ 16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> [ 16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=8)
> [ 16.738443] wlp85s0: associated
> [ 16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
>
> The adapter is achieving around 500 Mbps on my gigabit connection; my
> 2018 MacBook Pro sees around 650, so it's doing pretty well so far.
>
> Stefani - when you applied the patch that Kalle shared, which branch
> did you apply it to? I applied it to ath11k-qca6390-bringup, and when
> I reverted 7fef431be9c9 there was a small merge conflict I needed to
> resolve. I wonder if either the starting branch or your chosen
> resolution is related to the instability you see (or I'm just lucky
> so far! :)).

Sigh.... sorry for the top post again. I'll now get a real email client.

2020-11-12 05:41:22

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

So the connection remained super stable for a while, so I decided to
tempt fate and suspend the laptop to see what would happen :).

[ 5994.143715] PM: suspend exit
[ 5997.260351] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 5997.260353] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 5997.260356] ath11k_pci 0000:55:00.0: failed to enable dynamic bw: -11
[ 6000.332299] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6000.332303] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6000.332308] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6003.404365] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6003.404368] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6003.404373] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6016.204347] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6016.204351] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6016.204357] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6019.276319] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6019.276323] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6019.276329] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6031.052272] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6031.052275] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6031.052279] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6034.128257] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6034.128261] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6034.128265] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6039.500241] ath11k_pci 0000:55:00.0: qmi failed set mode request, mode: 4, err = -110
[ 6039.500244] ath11k_pci 0000:55:00.0: qmi failed to send wlan mode off

I was able to remove the ath11k module using rmmod -f and then
modprobe ath11k and ath11k_pci, after which the device was able to
reassociate and bring the connection back up.
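
For anyone comparing logs like the one above, here is a minimal Python
sketch that counts the WMI command timeouts and names the error codes.
It is not part of ath11k or any kernel tooling: the function name
`summarise`, the `SAMPLE` excerpt (a few lines taken from the log above)
and the small errno table are illustrative assumptions. The errno values
are the usual Linux ones (11 = EAGAIN, 28 = ENOSPC, 110 = ETIMEDOUT).

```python
import re
from collections import Counter

# Linux errno values that appear in this thread (hypothetical helper table).
LINUX_ERRNO = {11: "EAGAIN", 28: "ENOSPC", 110: "ETIMEDOUT"}

# A dmesg line with a "[ seconds.micros]" prefix; group 1 is the message.
STAMPED = re.compile(r"^\[\s*\d+\.\d+\]\s+(.*)$")
TIMEOUT = re.compile(r"wmi command \d+ timeout")
# A negative error code at the very end of the message, e.g. "-11" or "(-11".
TRAILING_ERR = re.compile(r"-(\d+)\s*$")

SAMPLE = """\
[ 5994.143715] PM: suspend exit
[ 5997.260351] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 5997.260353] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 5997.260356] ath11k_pci 0000:55:00.0: failed to enable dynamic bw: -11
[ 6000.332299] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6000.332303] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6000.332308] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6039.500241] ath11k_pci 0000:55:00.0: qmi failed set mode request, mode: 4, err = -110
"""


def summarise(log_text):
    """Count WMI timeouts and trailing error codes in a dmesg excerpt."""
    timeouts = 0
    errors = Counter()
    for line in log_text.splitlines():
        m = STAMPED.match(line.strip())
        if not m:
            continue  # skip lines without a timestamp prefix
        msg = m.group(1)
        if TIMEOUT.search(msg):
            timeouts += 1
        err = TRAILING_ERR.search(msg)
        if err:
            code = int(err.group(1))
            errors[LINUX_ERRNO.get(code, f"errno {code}")] += 1
    return {"timeouts": timeouts, "errors": dict(errors)}


if __name__ == "__main__":
    print(summarise(SAMPLE))
    # {'timeouts': 2, 'errors': {'EAGAIN': 2, 'ETIMEDOUT': 1}}
```

Feeding it the full excerpt instead of `SAMPLE` should make it easier to
compare a 5.9-based run against a v5.10-rc2 run at a glance.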

2020-11-12 06:31:11

by Carl Huang

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On 2020-11-12 10:31, wi nk wrote:
> So the connection remained super stable for a while, so I decided to
> tempt fate and suspend the laptop to see what would happen :).
>
> [ 5994.143715] PM: suspend exit
> [ 5997.260351] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 5997.260353] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 5997.260356] ath11k_pci 0000:55:00.0: failed to enable dynamic bw: -11
> [ 6000.332299] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6000.332303] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 6000.332308] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6003.404365] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6003.404368] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 6003.404373] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6016.204347] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6016.204351] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 6016.204357] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6019.276319] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6019.276323] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 6019.276329] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6031.052272] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6031.052275] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 6031.052279] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6034.128257] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6034.128261] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 6034.128265] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6039.500241] ath11k_pci 0000:55:00.0: qmi failed set mode request, mode: 4, err = -110
> [ 6039.500244] ath11k_pci 0000:55:00.0: qmi failed to send wlan mode off
>
> I was able to remove the ath11k module using rmmod -f and then
> modprobe ath11k and ath11k_pci, after which the device was able to
> reassociate and bring the connection back up.

Please try applying the patch below:
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

2020-11-12 07:07:24

by Stefani Seibold

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Thursday, 12.11.2020, at 02:10 +0100, wi nk wrote:
> Stefani - when you applied the patch that Kalle shared, which branch
> did you apply it to? I applied it to ath11k-qca6390-bringup, and when
> I reverted 7fef431be9c9 there was a small merge conflict I needed to
> resolve. I wonder if either the starting branch or your chosen
> resolution is related to the instability you see (or I'm just lucky
> so far! :)).
>

I used the vanilla kernel tree
https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
I applied

RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch

and reverted commit 7fef431be9c9.


2020-11-12 07:16:15

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

Stefani Seibold <[email protected]> writes:

> I used the vanilla kernel tree
> https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
> I applied
>
> RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
>
> and reverted commit 7fef431be9c9.

I also did my testing on v5.10-rc2, and I recommend using that as the
baseline when debugging these ath11k problems. It helps to compare
results if everyone has the same baseline.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-12 07:41:57

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <[email protected]> wrote:
>
> I also did my testing on v5.10-rc2, and I recommend using that as the
> baseline when debugging these ath11k problems. It helps to compare
> results if everyone has the same baseline.

Absolutely, I'll rebuild to 5.10 later today and apply the same series
of patches and report back. I'll also test out the patch on both
versions from Carl to fix resuming. It stands to reason that we may
be seeing another regression between Stefani (5.10) and myself (5.9
bringup branch) as I don't see any disconnections or instability once
the interface is online.

2020-11-12 09:04:45

by Kalle Valo

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

wi nk <[email protected]> writes:

> Absolutely, I'll rebuild to 5.10 later today and apply the same series
> of patches and report back.

Great, thanks.

> I'll also test out the patch on both versions from Carl to fix
> resuming. It stands to reason that we may be seeing another regression
> between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
> any disconnections or instability once the interface is online.

Yeah, there is something strange happening between v5.9 and v5.10 we
have not yet figured out. Most likely it has something to do with memory
allocations and DMA transfers failing, but no clear understanding yet.

But to keep things simple, let's only discuss the MSI problem on this
thread, and discuss the timeouts in another thread:

http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html

I'll include you and the other reporters in that thread.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-11-12 15:48:10

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Thu, Nov 12, 2020 at 10:00 AM Kalle Valo <[email protected]> wrote:
>
> wi nk <[email protected]> writes:
>
> > Absolutely, I'll rebuild to 5.10 later today and apply the same series
> > of patches and report back.
>
> Great, thanks.
>
> > I'll also test out the patch on both versions from Carl to fix
> > resuming. It stands to reason that we may be seeing another regression
> > between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
> > any disconnections or instability once the interface is online.
>
> Yeah, there is something strange happening between v5.9 and v5.10 we
> have not yet figured out. Most likely it has something to do with memory
> allocations and DMA transfers failing, but no clear understanding yet.
>
> But to keep things simple, let's only discuss the MSI problem on this
> thread, and discuss the timeouts in another thread:
>
> http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
>
> I'll include you and the other reporters in that thread.

Ok, I've tried a clean checkout of 5.10-rc2 with the one-MSI patch
applied and 7fef431be9c9 reverted. I can't get my machine to boot
into anything usable with that configuration. I'm running Ubuntu, so
it starts right into X, and sometime between showing the available
users and me clicking the icon to log in, the machine freezes. I can
see in the system tray that the wifi adapter is being activated and
appears to have associated with an AP, but I can't do much beyond
that: the keyboard backlight wakes up, but the caps lock key doesn't
work. I see similar behavior with the 5.9 configuration, but after a
reboot or two I win whatever race is occurring. With 5.10, I tried
maybe 10-15 times with 0 success.

2020-11-13 09:55:09

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Thu, Nov 12, 2020 at 4:44 PM wi nk <[email protected]> wrote:
>
> Ok, I've tried a clean checkout of 5.10-rc2 with the one-MSI patch
> applied and 7fef431be9c9 reverted. I can't get my machine to boot
> into anything usable with that configuration. I'm running Ubuntu, so
> it starts right into X, and sometime between showing the available
> users and me clicking the icon to log in, the machine freezes. I can
> see in the system tray that the wifi adapter is being activated and
> appears to have associated with an AP, but I can't do much beyond
> that: the keyboard backlight wakes up, but the caps lock key doesn't
> work. I see similar behavior with the 5.9 configuration, but after a
> reboot or two I win whatever race is occurring. With 5.10, I tried
> maybe 10-15 times with 0 success.

Kalle, what would be a useful next move for hunting this down? It
seems I can't really test the single MSI patch on 5.10, since with the
patch (+ the reverted commit) the driver isn't stable enough for my
machine to stay running. It seems your hunch is that this is related
to the issues in the other thread
(http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html)?
As far as I can see, the state of the art for debugging these things is
to use the kdump tools and let the secondary kernel dump diagnostics for
me. Would such logs be useful to you for this?

Thanks!

2020-11-15 14:37:35

by Thomas Krause

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310


On 12.11.20 at 16:44, wi nk wrote:
> On Thu, Nov 12, 2020 at 10:00 AM Kalle Valo <[email protected]> wrote:
>> wi nk <[email protected]> writes:
>>
>>> On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <[email protected]> wrote:
>>>> Stefani Seibold <[email protected]> writes:
>>>>
>>>>> On Thursday, 12.11.2020, 02:10 +0100, wi nk wrote:
>>>>>> I've yet to see any instability after 45 minutes of exercising it, I
>>>>>> do see a couple of messages that came out of the driver:
>>>>>>
>>>>>> [ 8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
>>>>>> [ 11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
>>>>>>
>>>>>> then when it associates:
>>>>>>
>>>>>> [ 16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
>>>>>> [ 16.722636] wlp85s0: authenticated
>>>>>> [ 16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
>>>>>> [ 16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=8)
>>>>>> [ 16.738443] wlp85s0: associated
>>>>>> [ 16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
>>>>>>
>>>>>> The adapter is achieving around 500 mbps on my gigabit connection, my
>>>>>> 2018 mbp sees around 650, so it's doing pretty well so far.
>>>>>>
>>>>>> Stefani - when you applied the patch that Kalle shared, which branch
>>>>>> did you apply it to? I applied it to ath11k-qca6390-bringup and when
>>>>>> I revert 7fef431be9c9 there is a small merge conflict I needed to
>>>>>> resolve. I wonder if either the starting branch, or your chosen
>>>>>> resolution are related to the instability you see (or I'm just lucky
>>>>>> so far! :)).
>>>>>>
>>>>> I used the vanilla kernel tree
>>>>> https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
>>>>> I applied the
>>>>>
>>>>> RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
>>>>>
>>>>> and reverted the patch 7fef431be9c9
>>>> I also did my testing on v5.10-rc2 and I recommend using that as the
>>>> baseline when debugging these ath11k problems. It helps to compare the
>>>> results if everyone has the same baseline.
>>>>
>>>> --
>>>> https://patchwork.kernel.org/project/linux-wireless/list/
>>>>
>>>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>>> Absolutely, I'll rebuild to 5.10 later today and apply the same series
>>> of patches and report back.
>> Great, thanks.
>>
>>> I'll also test out the patch on both versions from Carl to fix
>>> resuming. It stands to reason that we may be seeing another regression
>>> between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
>>> any disconnections or instability once the interface is online.
>> Yeah, there is something strange happening between v5.9 and v5.10 we
>> have not yet figured out. Most likely it has something to do with memory
>> allocations and DMA transfers failing, but no clear understanding yet.
>>
>> But to keep things simple let's only discuss the MSI problem on this
>> thread, and discuss the timeouts in another thread:
>>
>> http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
>>
>> I'll include you and other reporters to that thread.
>>
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>>
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> Ok, I've tried a clean checkout of 5.10-rc2 with the one MSI patch
> applied and 7fef431be9c9 reverted. I can't get my machine to boot
> into anything usable with that configuration. I'm running Ubuntu so
> it's starting right into X and sometime between showing the available
> users and me clicking the icon to log in the machine freezes. I can
> see in the system tray that the wifi adapter is being activated and
> appears to have associated with an AP, I just can't do much beyond
> that as the keyboard backlight wakes up, but the caps lock key doesn't
> work. I see similar behavior with the 5.9 configuration, but after a
> reboot or two I win whatever race is occurring. With 5.10, I tried
> maybe 10-15 times with 0 success.

I can confirm this behavior on my configuration. I managed to log in once,
select the Wi-Fi network and connect to it. Curiously enough, it seemed to
be stable long enough to enter the Wi-Fi passphrase. After the connection
was established, the system hung, and on each attempt to reboot into the
graphical system it would freeze at some point (sometimes even before
showing the login screen).

The kernel was based on 5.10-rc2 and on 5.10-rc3 (I saw the same
behavior with both), with the patch applied, 7fef431be9c9 reverted and
the firmware downloaded and copied to /lib/firmware/ath11k/QCA6390/hw2.0/.


2020-11-15 20:13:00

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Sun, Nov 15, 2020 at 2:30 PM Thomas Krause <[email protected]> wrote:
>
>
> I can confirm this behavior on my configuration. I managed to log in once,
> select the Wi-Fi network and connect to it. Curiously enough, it seemed to
> be stable long enough to enter the Wi-Fi passphrase. After the connection
> was established, the system hung, and on each attempt to reboot into the
> graphical system it would freeze at some point (sometimes even before
> showing the login screen).
>
> The kernel was based on 5.10-rc2 and on 5.10-rc3 (I saw the same
> behavior with both), with the patch applied, 7fef431be9c9 reverted and
> the firmware downloaded and copied to /lib/firmware/ath11k/QCA6390/hw2.0/.
>
>

I did a bit more digging to see if I could find any new information.
I'm not sure I did, but here's what I did / found. I spent the time to
get a kdump kernel running and enabled, and I was able to issue SysRq-C
(both via keyboard and echo c > /proc/sysrq-trigger) and generate a crash
dump. Actually viewing the dumps at the moment will require reverting a
couple of patches to printk to fix the file for the crash utility
(https://github.com/crash-utility/crash/issues/67), but right now
that's not super important since the mechanism isn't being triggered.
As reported here and by Mitchell, the adapter will work occasionally,
but more often it will hang the machine (I too tried 5.10-rc3 with no
noticeable differences). Whatever is causing the system to hang isn't
triggering the kdump kernel to take over and dump the vmcore. I've
set watchdog=1, nmi_watchdog=1, hung_task_panic=1 and softlockup_panic=1,
trying to convince the kernel to dump its state during this. I've
not been able to make it write a crash dump; it just sits 'hung'. One
interesting observation that may be related is that if the lockup
occurs during my login, I can actually see the system grind to a halt
over the course of a number of frames (the rendering of the login
animations starts to stutter/get really slow, then after a few frames
everything is frozen). If something were spinning on a lock, I'd expect
the soft lockup panic to find it, but I don't know these mechanisms
well.
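
For anyone repeating this setup, here is a minimal sketch of a check that
the four knobs named above are actually set before trying to reproduce the
hang. It assumes each knob is exposed as a file of the same name under
/proc/sys/kernel/ (some depend on kernel config and may be absent); the
helper names read_knobs and not_enabled are made up for illustration.

```python
from pathlib import Path

# Knobs named in the message above. Assumption: each is exposed as
# /proc/sys/kernel/<name>; some depend on kernel config and may be absent.
KNOBS = ("watchdog", "nmi_watchdog", "hung_task_panic", "softlockup_panic")


def read_knobs(base="/proc/sys/kernel"):
    """Return {name: value or None} for each knob; None means not readable."""
    values = {}
    for name in KNOBS:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            values[name] = None
    return values


def not_enabled(values):
    """Names of knobs that are missing or not set to "1"."""
    return [name for name in KNOBS if values.get(name) != "1"]


if __name__ == "__main__":
    current = read_knobs()
    for name, value in current.items():
        print(f"{name} = {value if value is not None else '<missing>'}")
    print("not enabled:", not_enabled(current) or "none")
```

An empty "not enabled" list only shows the settings took effect; it says
nothing about whether the detectors can fire during this particular hang.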

The only consistent behavior that I managed to create is this: if the
wifi adapter / machine are in a 'working' state (i.e. I can browse the
internet, etc.) and I issue sysrq-c to crash the kernel, then let
the crash dump write and reboot the machine, once booted the adapter
is no longer seen by the kernel, and there are zero messages in dmesg
that match "ath11k". The driver shows up in lsmod, but it reports
zero messages and it's like the adapter is completely invisible. A
power off and back on of the machine will re-enter it into the
freezing/wifi working lottery.

2020-11-17 15:54:10

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Sun, Nov 15, 2020 at 8:55 PM wi nk <[email protected]> wrote:
>
> I did a bit more digging to see if I could find any new information.
> I'm not sure I did, but here's what I did / found. I spent the time to
> get a kdump kernel running and enabled, and I was able to issue SysRq-C
> (both via keyboard and echo c > /proc/sysrq-trigger) and generate a crash
> dump. Actually viewing the dumps at the moment will require reverting a
> couple of patches to printk to fix the file for the crash utility
> (https://github.com/crash-utility/crash/issues/67), but right now
> that's not super important since the mechanism isn't being triggered.
> As reported here and by Mitchell, the adapter will work occasionally,
> but more often it will hang the machine (I too tried 5.10-rc3 with no
> noticeable differences). Whatever is causing the system to hang isn't
> triggering the kdump kernel to take over and dump the vmcore. I've
> set watchdog=1, nmi_watchdog=1, hung_task_panic=1 and softlockup_panic=1,
> trying to convince the kernel to dump its state during this. I've
> not been able to make it write a crash dump; it just sits 'hung'. One
> interesting observation that may be related is that if the lockup
> occurs during my login, I can actually see the system grind to a halt
> over the course of a number of frames (the rendering of the login
> animations starts to stutter/get really slow, then after a few frames
> everything is frozen). If something were spinning on a lock, I'd expect
> the soft lockup panic to find it, but I don't know these mechanisms
> well.
>
> The only consistent behavior that I managed to create is this: if the
> wifi adapter / machine are in a 'working' state (i.e. I can browse the
> internet, etc.) and I issue sysrq-c to crash the kernel, then let
> the crash dump write and reboot the machine, once booted the adapter
> is no longer seen by the kernel, and there are zero messages in dmesg
> that match "ath11k". The driver shows up in lsmod, but it reports
> zero messages and it's like the adapter is completely invisible. A
> power off and back on of the machine will re-enter it into the
> freezing/wifi working lottery.

Good evening all! Just wanted to follow up as I think I've started to
uncover some of what's happening with the XPS and this driver. Since
I can't get the kdump kernel to dump anything for me, I took a
more naive approach. I blacklisted the modules (ath11k /
ath11k_pci) from modprobe so I could at least control when the driver was
loaded. I managed to capture a series of crashes (in phone pics, but
I'll transcribe the relevant bits here) that seem to indicate some
kind of runaway / spin locked behavior. In all but one case[*], in both
the crashing runs and the eventually working ones, the driver
initialized successfully with messages like this:

[ 23.209335] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is experimental!
[ 23.209404] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem 0x8e300000-0x8e3fffff 64bit]
[ 23.209421] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[ 23.209502] ath11k_pci 0000:55:00.0: MSI vectors: 1
[ 23.454227] ath11k_pci 0000:55:00.0: Respond mem req failed, result: 1, err: 0
[ 23.454233] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22
[ 23.455810] ath11k_pci 0000:55:00.0: req mem_seg[0] 0x27d00000 524288 1
[ 23.455814] ath11k_pci 0000:55:00.0: req mem_seg[1] 0x27d80000 524288 1
[ 23.455816] ath11k_pci 0000:55:00.0: req mem_seg[2] 0x27e00000 524288 1
[ 23.455817] ath11k_pci 0000:55:00.0: req mem_seg[3] 0x27e80000 294912 1
[ 23.455819] ath11k_pci 0000:55:00.0: req mem_seg[4] 0x27f00000 524288 1
[ 23.455820] ath11k_pci 0000:55:00.0: req mem_seg[5] 0x27f80000 524288 1
[ 23.455822] ath11k_pci 0000:55:00.0: req mem_seg[6] 0x27800000 458752 1
[ 23.455824] ath11k_pci 0000:55:00.0: req mem_seg[7] 0x27cc0000 131072 1
[ 23.455825] ath11k_pci 0000:55:00.0: req mem_seg[8] 0x27880000 524288 4
[ 23.455827] ath11k_pci 0000:55:00.0: req mem_seg[9] 0x27900000 360448 4
[ 23.455829] ath11k_pci 0000:55:00.0: req mem_seg[10] 0x27ca4000 16384 1
[ 23.466226] ath11k_pci 0000:55:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
[ 23.466230] ath11k_pci 0000:55:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
[ 23.677675] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0
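
As an aside, the mem_seg lines in a log like the one above can be totalled
with a few lines of Python, which makes it easier to compare runs. This is
only a sketch: it assumes each line has the shape
"req mem_seg[index] address size type" and that the third field is a size
in bytes, which the log itself does not state. The function names are made
up for illustration.

```python
import re

# Assumed line shape: "req mem_seg[<index>] 0x<address> <size> <type>".
# Treating the third field as a size in bytes is an assumption.
SEG_RE = re.compile(r"req mem_seg\[(\d+)\] (0x[0-9a-fA-F]+) (\d+) (\d+)")

# The eleven mem_seg lines from the log above, timestamps dropped.
SAMPLE = """\
ath11k_pci 0000:55:00.0: req mem_seg[0] 0x27d00000 524288 1
ath11k_pci 0000:55:00.0: req mem_seg[1] 0x27d80000 524288 1
ath11k_pci 0000:55:00.0: req mem_seg[2] 0x27e00000 524288 1
ath11k_pci 0000:55:00.0: req mem_seg[3] 0x27e80000 294912 1
ath11k_pci 0000:55:00.0: req mem_seg[4] 0x27f00000 524288 1
ath11k_pci 0000:55:00.0: req mem_seg[5] 0x27f80000 524288 1
ath11k_pci 0000:55:00.0: req mem_seg[6] 0x27800000 458752 1
ath11k_pci 0000:55:00.0: req mem_seg[7] 0x27cc0000 131072 1
ath11k_pci 0000:55:00.0: req mem_seg[8] 0x27880000 524288 4
ath11k_pci 0000:55:00.0: req mem_seg[9] 0x27900000 360448 4
ath11k_pci 0000:55:00.0: req mem_seg[10] 0x27ca4000 16384 1
"""


def parse_mem_segs(log_text):
    """Return a list of (index, address, size, type) tuples found in log_text."""
    return [
        (int(index), int(addr, 16), int(size), int(kind))
        for index, addr, size, kind in SEG_RE.findall(log_text)
    ]


def total_size(segments):
    """Sum the size field over all parsed segments."""
    return sum(size for _, _, size, _ in segments)


if __name__ == "__main__":
    segs = parse_mem_segs(SAMPLE)
    print(len(segs), "segments,", total_size(segs), "in total")
```

For the eleven lines above the sizes add up to 4407296, roughly 4.2 MiB
if the field really is in bytes.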

So up until this point, everything is working without issues.
Everything seems to spiral out of control a couple of seconds later
when my system attempts to actually bring up the adapter. In most of
the crash states I will see this:

[ 31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
[ 31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
[ 31.391928] wlp85s0: authenticated
[ 31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
[ 31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=6)
[ 31.407730] wlp85s0: associated
[ 31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready

And then either somewhere in that pile of messages, or a second or two
after this my machine will start to stutter as I mentioned before, and
then it either hangs, or I see this message (I'm truncating the
timestamp):

[ 35.xxxx ] sched: RT throttling activated

After that moment, the machine is unresponsive. Sorry, at the moment I
can't seem to extract this data other than as screenshots from my phone.
You can see the dmesg output from 6 different hangs here:
https://github.com/w1nk/ath11k-debug

[*] In the one case where the driver hung without fully initializing,
I started seeing these during the initialization, right after the
"MSI vectors: %d" printk:

[ 77.xxx ] alloc_contig_range: [88d8e0, 88d8e9) PFNs busy

2020-11-17 21:02:31

by Thomas Gleixner

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Tue, Nov 17 2020 at 16:49, wi nk wrote:
> On Sun, Nov 15, 2020 at 8:55 PM wi nk <[email protected]> wrote:
> So up until this point, everything is working without issues.
> Everything seems to spiral out of control a couple of seconds later
> when my system attempts to actually bring up the adapter. In most of
> the crash states I will see this:
>
> [ 31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> [ 31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
> [ 31.391928] wlp85s0: authenticated
> [ 31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> [ 31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=6)
> [ 31.407730] wlp85s0: associated
> [ 31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
>
> And then either somewhere in that pile of messages, or a second or two
> after this my machine will start to stutter as I mentioned before, and
> then it either hangs, or I see this message (I'm truncating the
> timestamp):
>
> [ 35.xxxx ] sched: RT throttling activated

As this driver uses threaded interrupts, this looks like an interrupt
storm in which the interrupt thread consumes the CPU fully. The RT throttler
limits the thread's RT runtime, which allows other tasks to make some
progress. That's what you observe as stutter.
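
To put a number on the throttling: the RT budget is controlled by the
sched_rt_runtime_us and sched_rt_period_us sysctls under /proc/sys/kernel/.
A small sketch of the arithmetic follows (the function names are made up
for illustration). The values 950000 and 1000000 are the usual defaults,
and the system in question may be configured differently.

```python
from pathlib import Path


def rt_budget(runtime_us, period_us):
    """Fraction of each period that realtime tasks may run, or None if unlimited.

    A runtime of -1 means RT throttling is disabled.
    """
    if runtime_us < 0:
        return None
    return runtime_us / period_us


def read_rt_budget(base="/proc/sys/kernel"):
    """Read both sysctls from `base` and return the RT fraction (None if unlimited)."""
    runtime = int((Path(base) / "sched_rt_runtime_us").read_text())
    period = int((Path(base) / "sched_rt_period_us").read_text())
    return rt_budget(runtime, period)


if __name__ == "__main__":
    # Usual defaults: RT tasks may use 0.95 s of every 1 s period, so a
    # runaway SCHED_FIFO thread leaves about 5% of the time to everything else.
    print(rt_budget(950000, 1000000))
```

With those defaults, a spinning SCHED_FIFO irq thread still leaves other
tasks only a small slice of each second, which would fit the stutter
described.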

You can apply the hack below so the irq thread(s) run in the SCHED_OTHER
class which prevents them from monopolizing the CPU. That might make the
problem simpler to debug.

Thanks,

tglx
---
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c460e0496006..8473ecacac7a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1320,7 +1320,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 	if (IS_ERR(t))
 		return PTR_ERR(t);
 
-	sched_set_fifo(t);
+	//sched_set_fifo(t);
 
 	/*
 	 * We keep the reference to the task struct even if
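
Separately from the patch, one userspace way to look for an interrupt
storm is to compare two snapshots of /proc/interrupts and rank the lines by
how fast their counters grow. The sketch below is an assumption-laden
helper and not something proposed in this thread: the function names are
invented, and the two sample snapshots are made-up data with placeholder
device names.

```python
def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Some rows (e.g. ERR) carry fewer counters than there are CPUs.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return table


def busiest(before, after, interval_s, top=5):
    """Return up to `top` (rate per second, label, description), highest first."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    rates = [
        ((new[label][0] - old.get(label, (0, ""))[0]) / interval_s, label, new[label][1])
        for label in new
    ]
    return sorted(rates, reverse=True)[:top]


# Made-up snapshots for illustration only; real ones come from reading
# /proc/interrupts twice, interval_s seconds apart.
SAMPLE_BEFORE = """\
           CPU0       CPU1
  16:       100         50  IR-IO-APIC  16-fasteoi  example-a
 171:      1000       2000  IR-PCI-MSI 44564480-edge  example-b
 NMI:         3          4  Non-maskable interrupts
 ERR:         0
"""
SAMPLE_AFTER = """\
           CPU0       CPU1
  16:       104         52  IR-IO-APIC  16-fasteoi  example-a
 171:      6000       7000  IR-PCI-MSI 44564480-edge  example-b
 NMI:         3          4  Non-maskable interrupts
 ERR:         0
"""

if __name__ == "__main__":
    for rate, label, desc in busiest(SAMPLE_BEFORE, SAMPLE_AFTER, 2.0):
        print(f"{rate:10.1f}/s  {label:>4}  {desc}")
```

On a machine that is already stuttering, the two reads would have to be
taken from a console that still responds.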

2020-11-18 10:24:53

by wi nk

Subject: Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310

On Tue, Nov 17, 2020 at 9:59 PM Thomas Gleixner <[email protected]> wrote:
>
> As this driver uses threaded interrupts, this looks like an interrupt
> storm in which the interrupt thread consumes the CPU fully. The RT throttler
> limits the thread's RT runtime, which allows other tasks to make some
> progress. That's what you observe as stutter.
>
> You can apply the hack below so the irq thread(s) run in the SCHED_OTHER
> class which prevents them from monopolizing the CPU. That might make the
> problem simpler to debug.
>
> Thanks,
>
> tglx
> ---
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index c460e0496006..8473ecacac7a 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1320,7 +1320,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>  	if (IS_ERR(t))
>  		return PTR_ERR(t);
>  
> -	sched_set_fifo(t);
> +	//sched_set_fifo(t);
>  
>  	/*
>  	 * We keep the reference to the task struct even if

I was able to apply this patch and play with it a little. Unfortunately,
the behavior is mostly the same. The patch seems to
extend the 'stuttering' I see a little bit, but the end result is
still an unresponsive machine. I haven't had much time to experiment yet;
the extra stuttering time may make it possible to finally get sysrq-c issued
and get a vmcore dump. I also tried to replicate a Google Android
patch I found that basically calls BUG() when RT throttling activates
(https://groups.google.com/a/chromium.org/g/chromium-os-reviews/c/NDyPucYrvRY),
but that path hasn't been hit for me since I booted with it. I'll
hopefully have a chance to try again this evening.