2020-11-05 09:05:38

by Kalle Valo

Subject: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

(changing the subject, adding more lists and people)

Pavel Procopiuc <[email protected]> writes:

> On 04.11.2020 at 10:12, Kalle Valo wrote:
>> Yeah, it is unfortunately time consuming but it is the best way to get
>> to the bottom of this.
>
> I have found the commit that breaks things for me, it's
> 7fef431be9c9ac255838a9578331567b9dba4477 mm/page_alloc: place pages to
> tail in __free_pages_core()
>
> I've reverted it on top of the 5.10-rc2 and ath11k driver loads fine
> and I have wifi working.

Oh, very interesting. Thanks a lot for the bisection, otherwise we would
never have found out what's causing this.

David & mm folks: Pavel noticed that his QCA6390 Wi-Fi 6 device (driver
ath11k) failed on v5.10-rc1. After bisecting he found that the commit
below causes the regression. I have not been able to reproduce this and
for me QCA6390 works fine. I don't know if this needs a specific kernel
configuration or what's the difference between our setups.

Any ideas what might cause this and how to fix it?

Full discussion: http://lists.infradead.org/pipermail/ath11k/2020-November/000501.html

commit 7fef431be9c9ac255838a9578331567b9dba4477
Author: David Hildenbrand <[email protected]>
AuthorDate: Thu Oct 15 20:09:35 2020 -0700
Commit: Linus Torvalds <[email protected]>
CommitDate: Fri Oct 16 11:11:18 2020 -0700

mm/page_alloc: place pages to tail in __free_pages_core()

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


2020-11-05 09:48:42

by Pavel Procopiuc

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 05.11.2020 at 10:04, Kalle Valo wrote:
> Oh, very interesting. Thanks a lot for the bisection, otherwise we would
> never have found out what's causing this.
>
> David & mm folks: Pavel noticed that his QCA6390 Wi-Fi 6 device (driver
> ath11k) failed on v5.10-rc1. After bisecting he found that the commit
> below causes the regression. I have not been able to reproduce this and
> for me QCA6390 works fine. I don't know if this needs a specific kernel
> configuration or what's the difference between our setups.
>
> Any ideas what might cause this and how to fix it?
>
> Full discussion: http://lists.infradead.org/pipermail/ath11k/2020-November/000501.html
>
> commit 7fef431be9c9ac255838a9578331567b9dba4477
> Author: David Hildenbrand <[email protected]>
> AuthorDate: Thu Oct 15 20:09:35 2020 -0700
> Commit: Linus Torvalds <[email protected]>
> CommitDate: Fri Oct 16 11:11:18 2020 -0700
>
> mm/page_alloc: place pages to tail in __free_pages_core()

This is my kernel config, for reference: https://gist.github.com/twistedfall/455885024c56587fc5a0f4b2784612e8

2020-11-05 10:49:25

by Pavel Procopiuc

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 05.11.2020 at 11:42, Vlastimil Babka wrote:
> Let me paste from the ath11k discussion:
>
>> * Relevant errors from the log:
>> # journalctl -b | grep -iP '05:00|ath11k'
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Requested to power ON
>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Power on setup success
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
>
> This seems to be ath11k_qmi_respond_fw_mem_request(). Why does it fail with error 0? No idea.
>
> What would happen if all the GFP_KERNEL in the file were changed to GFP_DMA32?
>
> I'm thinking the hardware perhaps doesn't like too high physical addresses or something. But if I think correctly,
> freeing to tail should actually move them towards head. So it's weird.

Well, in fact I still have this particular error, although the hardware works correctly. This is my current log:

# journalctl -b | grep -iP '05:00|ath11k|Linux version'
Nov 05 09:43:31 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #4 SMP Thu Nov 5 09:26:00 CET 2020
Nov 05 09:43:31 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 05 09:43:31 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 05 09:43:31 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 05 09:43:31 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 05 09:43:31 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 05 09:43:32 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 05 09:43:32 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 05 09:43:32 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 05 09:43:32 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 05 09:43:32 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 05 09:43:32 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
Nov 05 09:43:32 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
Nov 05 09:43:32 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
Nov 05 09:43:32 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
Nov 05 09:43:33 razor NetworkManager[777]: <info> [1604565813.8043] rfkill1: found Wi-Fi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
Nov 05 09:43:35 razor ModemManager[720]: <info> Couldn't check support for device '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin

2020-11-05 12:55:54

by Pavel Procopiuc

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 05.11.2020 at 12:13, David Hildenbrand wrote:
> It depends in which order memory is exposed to MM, which might depend on other factors in some configurations.
>
> This smells like it exposes an existing bug. Can you reproduce also with zone shuffling enabled?

So just to make sure I understand you correctly, you'd like to see if the problem with the ath11k driver on my hardware
persists when I boot a pristine 5.10-rc2 kernel (without reverting commit 7fef431be9c9ac255838a9578331567b9dba4477) and
with page_alloc.shuffle=1, right?
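
For anyone repeating this test, here is a minimal Python sketch for confirming that a kernel command line really carries the shuffle option. The helper name `shuffle_requested` and the example command line are made up for illustration, and only the spellings 1, y and Y are treated as "enabled". Shuffling also needs CONFIG_SHUFFLE_PAGE_ALLOCATOR in the kernel config, which this check does not look at.

```python
def shuffle_requested(cmdline: str) -> bool:
    """Return True if the command line asks for free-list shuffling."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # Assumption: only these spellings of "true" are checked here.
        if key == "page_alloc.shuffle" and value in {"1", "y", "Y"}:
            return True
    return False


# Example with a made-up command line; on a live system the string
# would come from reading /proc/cmdline.
example = "root=/dev/sda2 ro page_alloc.shuffle=1"
print(shuffle_requested(example))
```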

2020-11-06 20:42:54

by David Hildenbrand

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 06.11.20 18:32, Pavel Procopiuc wrote:
> On 05.11.2020 at 21:23, David Hildenbrand wrote:
>>> So just to make sure I understand you correctly, you'd like to see if the problem with ath11k driver on my hardware persists when I boot pristine 5.10-rc2 kernel (without reverting commit 7fef431be9c9ac255838a9578331567b9dba4477) and with page_alloc.shuffle=1, right?
>>>
>>
>> Right, but as the lists are randomized, it might take a couple of tries to reproduce. I'll have a look at the driver code / failing path on Monday, when back to work.
>
> I have done 5 boots of pristine 5.10-rc2 with page_alloc.shuffle=1. Out of those: 1st, 2nd, 4th and 5th resulted in
> working ath11k driver, logs were the same as with the commit 7fef431be9c9ac255838a9578331567b9dba4477 reverted. The 3rd
> one failed, but in a different way, I just had no output from the driver after initialization lines:
>
> Nov 06 18:19:41 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #8 SMP Fri Nov 6 18:14:36 CET 2020
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 06 18:19:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 06 18:19:42 razor kernel: mhi 0000:05:00.0: Power on setup success
>
> I had this before and usually it was fixed after rebooting into Windows and back. This time I just went and rebooted
> into Linux again and driver was working on that boot (4th).

I'm sorry, but "WARNING: ath11k PCI support is experimental!" and such
occasional issues don't give me the best feeling that everything is
operating as it should :)

>
> After that I removed page_alloc.shuffle=1 and did 2 additional boots, both of them resulted in a non-working driver with
> the error messages about not being able to talk to firmware like I had before on the clean 5.10-rc2:
>
> Nov 06 18:24:07 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #9 SMP Fri Nov 6 18:22:43 CET 2020
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 06 18:24:08 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 06 18:24:08 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> Nov 06 18:24:13 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> Nov 06 18:24:13 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> Nov 06 18:25:39 razor kernel: mhi 0000:05:00.0: Device failed to exit MHI Reset state
>

Okay, that means that you should be able to reproduce
pre-7fef431be9c9ac255838a9578331567b9dba4477 with page_alloc.shuffle=1
as well ... it just might take a lot of tries to get a problematic page.
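
To put a rough number on "a lot of tries", here is a small Python sketch. It assumes each boot independently hits a problematic page with the same probability p, which is a simplification. The value 0.2 is only the 1-in-5 rate Pavel saw with shuffling on 5.10-rc2 (and that one failure looked different); nobody has measured a rate for a pre-7fef431be9c9 kernel, so 0.01 is purely a hypothetical smaller value for comparison.

```python
import math


def boots_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


for p in (0.2, 0.01):
    print(f"p = {p}: about {boots_needed(p)} boots for a 95% chance of one failure")
```

Under these assumptions, p = 0.2 needs 14 boots and p = 0.01 needs 299, so a clean handful of boots on the older kernel says very little.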

I could also imagine that deferring the driver load until after quite some
system/mm activity could result in the same issue.

Looks like something either cannot handle a specific address we received
via dma_alloc_coherent(), or something is reading out of bounds, and the
content after our allocated page doesn't have the expected value anymore
(e.g., used to be zero, now no longer zero).

What puzzles me is that "err: 0". That should have been properly set by
HW, no?

--
Thanks,

David / dhildenb

2020-11-11 19:25:01

by Kalle Valo

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

David Hildenbrand <[email protected]> writes:

>> On 05.11.2020 at 11:42, Vlastimil Babka <[email protected]> wrote:
>>
>> On 11/5/20 10:04 AM, Kalle Valo wrote:
>>> (changing the subject, adding more lists and people)
>>> Pavel Procopiuc <[email protected]> writes:
>>>> On 04.11.2020 at 10:12, Kalle Valo wrote:
>>>>> Yeah, it is unfortunately time consuming but it is the best way to get
>>>>> to the bottom of this.
>>>>
>>>> I have found the commit that breaks things for me, it's
>>>> 7fef431be9c9ac255838a9578331567b9dba4477 mm/page_alloc: place pages to
>>>> tail in __free_pages_core()
>>>>
>>>> I've reverted it on top of the 5.10-rc2 and ath11k driver loads fine
>>>> and I have wifi working.
>>> Oh, very interesting. Thanks a lot for the bisection, otherwise we would
>>> have never found out whats causing this.
>>> David & mm folks: Pavel noticed that his QCA6390 Wi-Fi 6 device (driver
>>> ath11k) failed on v5.10-rc1. After bisecting he found that the commit
>>> below causes the regression. I have not been able to reproduce this and
>>> for me QCA6390 works fine. I don't know if this needs a specific kernel
>>> configuration or what's the difference between our setups.
>>> Any ideas what might cause this and how to fix it?
>>> Full discussion:
>>> http://lists.infradead.org/pipermail/ath11k/2020-November/000501.html
>>> commit 7fef431be9c9ac255838a9578331567b9dba4477
>>> Author: David Hildenbrand <[email protected]>
>>> AuthorDate: Thu Oct 15 20:09:35 2020 -0700
>>> Commit: Linus Torvalds <[email protected]>
>>> CommitDate: Fri Oct 16 11:11:18 2020 -0700
>>> mm/page_alloc: place pages to tail in __free_pages_core()
>>
>> Let me paste from the ath11k discussion:
>>
>>> * Relevant errors from the log:
>>> # journalctl -b | grep -iP '05:00|ath11k'
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
>>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Requested to power ON
>>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Power on setup success
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
>>
>> This seems to be ath11k_qmi_respond_fw_mem_request(). Why is it
>> failure with error 0? No idea.
>>
>> What would happen if all the GFP_KERNEL in the file were changed to GFP_DMA32?
>>
>> I'm thinking the hardware perhaps doesn't like too high physical
>> addresses or something. But if I think correctly, freeing to tail
>> should actually move them towards head. So it's weird.
>
> It depends in which order memory is exposed to MM, which might depend
> on other factors in some configurations.
>
> This smells like it exposes an existing bug. Can you reproduce also
> with zone shuffling enabled?

I think I can reproduce this bug now on my NUC box by disabling vt-d in
the BIOS, but I'm not sure yet if it really is the same problem or not.
I added some debug messages to ath11k in this patch:

https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

Pavel, can you test with that patch on v5.10-rc2 and provide the ath11k
log messages? Preferably both before and after reverting commit
7fef431be9c9. Do note that I'm not expecting the debug patch to fix
anything; in your case it's just for providing more debug info.

With vt-d disabled on v5.10-rc2 before the revert I see:

ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
ath11k_pci 0000:06:00.0: enabling device (0000 -> 0002)
ath11k_pci 0000:06:00.0: MSI vectors: 1
NET: Registered protocol family 42
mhi 0000:06:00.0: Requested to power ON
mhi 0000:06:00.0: Power on setup success
ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
ath11k_pci 0000:06:00.0: req mem_seg[0] 0x1580000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[1] 0x1600000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[2] 0x1680000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[3] 0x1700000 294912 1
ath11k_pci 0000:06:00.0: req mem_seg[4] 0x1780000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[5] 0x1800000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[6] 0x1880000 458752 1
ath11k_pci 0000:06:00.0: req mem_seg[7] 0x1520000 131072 1
ath11k_pci 0000:06:00.0: req mem_seg[8] 0x1900000 524288 4
ath11k_pci 0000:06:00.0: req mem_seg[9] 0x1980000 360448 4
ath11k_pci 0000:06:00.0: req mem_seg[10] 0x1540000 16384 1
ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110
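
The negative numbers in these messages are Linux errno values. A small Python sketch for decoding them follows. The table is hard-coded with the generic Linux numbering (EINVAL = 22, ETIMEDOUT = 110) rather than taken from Python's errno module, whose values follow the host OS. The regular expression and the helper name `decode_errors` are assumptions for this example, fitted to the two message spellings in the log above.

```python
import re
from typing import List

# Generic Linux errno numbering (asm-generic); hard-coded on purpose.
LINUX_ERRNO = {22: "EINVAL", 110: "ETIMEDOUT"}

# Fits the two spellings seen in the log: "req:-22" and "err = -110".
ERR_RE = re.compile(r"(?:req:|err = )-(\d+)")


def decode_errors(log: str) -> List[str]:
    """Return the symbolic errno name for each matching log line, in order."""
    names = []
    for line in log.splitlines():
        match = ERR_RE.search(line)
        if match:
            code = int(match.group(1))
            names.append(LINUX_ERRNO.get(code, f"errno {code}"))
    return names


sample = """\
ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110
"""
print(decode_errors(sample))
```

In the failing boots the sequence is therefore one EINVAL followed by two ETIMEDOUT; in Pavel's timestamped logs the timeouts arrive about five seconds after the first failure.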

With vt-d disabled on v5.10-rc2 and reverting commit 7fef431be9c9 I see:

ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
ath11k_pci 0000:06:00.0: MSI vectors: 1
mhi 0000:06:00.0: Requested to power ON
mhi 0000:06:00.0: Power on setup success
ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
ath11k_pci 0000:06:00.0: req mem_seg[0] 0x76300000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[1] 0x76380000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[2] 0x76a00000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[3] 0x76a80000 294912 1
ath11k_pci 0000:06:00.0: req mem_seg[4] 0x76b00000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[5] 0x76b80000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[6] 0x76400000 458752 1
ath11k_pci 0000:06:00.0: req mem_seg[7] 0x761a0000 131072 1
ath11k_pci 0000:06:00.0: req mem_seg[8] 0x76480000 524288 4
ath11k_pci 0000:06:00.0: req mem_seg[9] 0x76500000 360448 4
ath11k_pci 0000:06:00.0: req mem_seg[10] 0x76580000 16384 1
ath11k_pci 0000:06:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
ath11k_pci 0000:06:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
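
To compare the two runs without reading hex by eye, here is a Python sketch that parses the "req mem_seg" lines printed by the debug patch. It assumes the three fields after the index are address, size in bytes and type, which is how the output reads but is not confirmed in this thread; the function names and the three-line sample are made up for the example.

```python
import re
from typing import Dict, List, Tuple

# Assumed line layout: req mem_seg[<index>] <hex address> <size> <type>
SEG_RE = re.compile(r"req mem_seg\[(\d+)\] (0x[0-9a-fA-F]+) (\d+) (\d+)")


def parse_segments(log: str) -> List[Tuple[int, int, int]]:
    """Return (index, address, size) for every 'req mem_seg' line."""
    return [
        (int(m.group(1)), int(m.group(2), 16), int(m.group(3)))
        for m in SEG_RE.finditer(log)
    ]


def summarize(log: str) -> Dict[str, object]:
    """Summarize segment count, total size and the address range covered."""
    segs = parse_segments(log)
    if not segs:
        raise ValueError("no 'req mem_seg' lines found")
    highest_end = max(addr + size for _, addr, size in segs)
    return {
        "count": len(segs),
        "total_bytes": sum(size for _, _, size in segs),
        "lowest": min(addr for _, addr, _ in segs),
        "highest_end": highest_end,
        "below_4g": highest_end <= 1 << 32,
    }


# Three lines taken from the failing log, just to show the call.
sample = """\
ath11k_pci 0000:06:00.0: req mem_seg[7] 0x1520000 131072 1
ath11k_pci 0000:06:00.0: req mem_seg[9] 0x1980000 360448 4
ath11k_pci 0000:06:00.0: req mem_seg[10] 0x1540000 16384 1
"""
info = summarize(sample)
print(info["count"], hex(info["lowest"]), hex(info["highest_end"]))
```

Run over the two full logs above, this gives 11 segments totalling 4407296 bytes in both cases. The failing boot has them all between 0x1520000 and 0x19d8000 (under 32 MiB), the working one between 0x761a0000 and 0x76c00000. Both ranges are below 4 GiB, so as far as these two logs go, the difference is not simply "too high" addresses.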


2020-11-11 20:42:28

by Pavel Procopiuc

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 11.11.2020 at 20:23, Kalle Valo wrote:
> Pavel, can you test with that patch on v5.10-rc2 and provide the ath11k
> log messages? Preferably both before and after reverting commit
> 7fef431be9c9. Do note that I'm not expecting the debug patch to fix
> anything, in your case it's just for providing more debug info.
>
> With vt-d disabled on v5.10-rc2 before the revert I see:
>
> ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
> ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
> ath11k_pci 0000:06:00.0: enabling device (0000 -> 0002)
> ath11k_pci 0000:06:00.0: MSI vectors: 1
> NET: Registered protocol family 42
> mhi 0000:06:00.0: Requested to power ON
> mhi 0000:06:00.0: Power on setup success
> ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
> ath11k_pci 0000:06:00.0: req mem_seg[0] 0x1580000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[1] 0x1600000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[2] 0x1680000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[3] 0x1700000 294912 1
> ath11k_pci 0000:06:00.0: req mem_seg[4] 0x1780000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[5] 0x1800000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[6] 0x1880000 458752 1
> ath11k_pci 0000:06:00.0: req mem_seg[7] 0x1520000 131072 1
> ath11k_pci 0000:06:00.0: req mem_seg[8] 0x1900000 524288 4
> ath11k_pci 0000:06:00.0: req mem_seg[9] 0x1980000 360448 4
> ath11k_pci 0000:06:00.0: req mem_seg[10] 0x1540000 16384 1
> ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110
>
> With vt-d disabled on v5.10-rc2 and reverting commit 7fef431be9c9 I see:
>
> ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
> ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
> ath11k_pci 0000:06:00.0: MSI vectors: 1
> mhi 0000:06:00.0: Requested to power ON
> mhi 0000:06:00.0: Power on setup success
> ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
> ath11k_pci 0000:06:00.0: req mem_seg[0] 0x76300000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[1] 0x76380000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[2] 0x76a00000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[3] 0x76a80000 294912 1
> ath11k_pci 0000:06:00.0: req mem_seg[4] 0x76b00000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[5] 0x76b80000 524288 1
> ath11k_pci 0000:06:00.0: req mem_seg[6] 0x76400000 458752 1
> ath11k_pci 0000:06:00.0: req mem_seg[7] 0x761a0000 131072 1
> ath11k_pci 0000:06:00.0: req mem_seg[8] 0x76480000 524288 4
> ath11k_pci 0000:06:00.0: req mem_seg[9] 0x76500000 360448 4
> ath11k_pci 0000:06:00.0: req mem_seg[10] 0x76580000 16384 1
> ath11k_pci 0000:06:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
> ath11k_pci 0000:06:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id

I have had VT-d turned on the whole time in my previous tests. I have tried turning it off for some of these tests and it
doesn't seem to affect my main bug. Here are the results:

1. Without reverting 7fef431be9c9, VT-d on (wifi doesn't work):
Nov 11 21:19:20 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #1 SMP Wed Nov 11 21:12:24 CET 2020
Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
Nov 11 21:19:21 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 11 21:19:21 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x1500000 524288 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x1580000 524288 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x1600000 524288 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x1680000 294912 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x1700000 524288 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x1780000 524288 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x1800000 458752 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x11e0000 131072 1
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x1880000 524288 4
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x1900000 360448 4
Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x1980000 16384 1
Nov 11 21:19:26 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
Nov 11 21:19:26 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110

2. With 7fef431be9c9 reverted, VT-d on (wifi works):
Nov 11 21:21:50 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #2 SMP Wed Nov 11 21:20:51 CET 2020
Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
Nov 11 21:21:51 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 11 21:21:51 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x3f100000 524288 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x3f180000 524288 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x3f200000 524288 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x3f280000 294912 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x3f300000 524288 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x3f380000 524288 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x3fc00000 458752 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x3f0c0000 131072 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x3fc80000 524288 4
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x3fd00000 360448 4
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x3f0a4000 16384 1
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
Nov 11 21:21:53 razor NetworkManager[786]: <info> [1605126113.1294] rfkill1: found Wi-Fi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
Nov 11 21:21:55 razor ModemManager[724]: <info> Couldn't check support for device '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin

3. Without reverting 7fef431be9c9, VT-d off (wifi doesn't work):
Nov 11 21:32:41 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #3 SMP Wed Nov 11 21:31:35 CET 2020
Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 1
Nov 11 21:32:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 11 21:32:42 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x1480000 524288 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x1500000 524288 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x1580000 524288 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x1600000 294912 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x1680000 524288 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x1700000 524288 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x1780000 458752 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x1800000 131072 1
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x1880000 524288 4
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x1900000 360448 4
Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x10e4000 16384 1
Nov 11 21:32:47 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
Nov 11 21:32:47 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110

4. With 7fef431be9c9 reverted, VT-d off (not sure if wifi works, system hung shortly thereafter):
Nov 11 21:28:16 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #2 SMP Wed Nov 11 21:20:51 CET 2020
Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 1
Nov 11 21:28:17 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 11 21:28:17 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x3f900000 524288 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x3f980000 524288 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x3fa00000 524288 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x3fa80000 294912 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x3fb00000 524288 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x3fb80000 524288 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x40800000 458752 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x3f8c0000 131072 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x40880000 524288 4
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x40900000 360448 4
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x3f8a4000 16384 1
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50
fw_build_id
Nov 11 21:28:19 razor NetworkManager[782]: <info> [1605126499.2535] rfkill1: found Wi-Fi radio killswitch (at
/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
Nov 11 21:28:21 razor ModemManager[717]: <info> Couldn't check support for device
'/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
Nov 11 21:28:58 razor kernel: ath11k_pci 0000:05:00.0: failed to receive scan abort comple: timed out
Nov 11 21:28:58 razor kernel: ath11k_pci 0000:05:00.0: failed to abort scan: -110
Nov 11 21:29:01 razor kernel: ath11k_pci 0000:05:00.0: wmi command 12289 timeout
Nov 11 21:29:01 razor kernel: ath11k_pci 0000:05:00.0: failed to send WMI_START_SCAN_CMDID
Nov 11 21:29:01 razor kernel: ath11k_pci 0000:05:00.0: failed to start hw scan: -11

2020-11-12 10:49:46

by David Hildenbrand

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 11.11.20 21:41, Pavel Procopiuc wrote:
> Op 11.11.2020 om 20:23 schreef Kalle Valo:
>> Pavel, can you test with that patch on v5.10-rc2 and provide the ath11k
>> log messages? Preferably both before and after reverting commit
>> 7fef431be9c9. Do note that I'm not expecting the debug patch to fix
>> anything, in your case it's just for providing more debug info.
>>
>> With vt-d disabled on v5.10-rc2 before the revert I see:
>>
>> ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
>> ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
>> ath11k_pci 0000:06:00.0: enabling device (0000 -> 0002)
>> ath11k_pci 0000:06:00.0: MSI vectors: 1
>> NET: Registered protocol family 42
>> mhi 0000:06:00.0: Requested to power ON
>> mhi 0000:06:00.0: Power on setup success
>> ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
>> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
>> ath11k_pci 0000:06:00.0: req mem_seg[0] 0x1580000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[1] 0x1600000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[2] 0x1680000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[3] 0x1700000 294912 1
>> ath11k_pci 0000:06:00.0: req mem_seg[4] 0x1780000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[5] 0x1800000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[6] 0x1880000 458752 1
>> ath11k_pci 0000:06:00.0: req mem_seg[7] 0x1520000 131072 1
>> ath11k_pci 0000:06:00.0: req mem_seg[8] 0x1900000 524288 4
>> ath11k_pci 0000:06:00.0: req mem_seg[9] 0x1980000 360448 4
>> ath11k_pci 0000:06:00.0: req mem_seg[10] 0x1540000 16384 1
>> ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
>> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110
>>
>> With vt-d disabled on v5.10-rc2 and reverting commit 7fef431be9c9 I see:
>>
>> ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
>> ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
>> ath11k_pci 0000:06:00.0: MSI vectors: 1
>> mhi 0000:06:00.0: Requested to power ON
>> mhi 0000:06:00.0: Power on setup success
>> ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
>> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
>> ath11k_pci 0000:06:00.0: req mem_seg[0] 0x76300000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[1] 0x76380000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[2] 0x76a00000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[3] 0x76a80000 294912 1
>> ath11k_pci 0000:06:00.0: req mem_seg[4] 0x76b00000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[5] 0x76b80000 524288 1
>> ath11k_pci 0000:06:00.0: req mem_seg[6] 0x76400000 458752 1
>> ath11k_pci 0000:06:00.0: req mem_seg[7] 0x761a0000 131072 1
>> ath11k_pci 0000:06:00.0: req mem_seg[8] 0x76480000 524288 4
>> ath11k_pci 0000:06:00.0: req mem_seg[9] 0x76500000 360448 4
>> ath11k_pci 0000:06:00.0: req mem_seg[10] 0x76580000 16384 1
>> ath11k_pci 0000:06:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
>> ath11k_pci 0000:06:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
>
> I have had VT-d turned on the whole time in my previous tests. I have tried turning it off for some of this tests and it
> doesn't seem to affect my main bug. Here are the results:
>
> 1. Without reverting the 7fef431be9c9, VT-d on (wifi doesn't work):
> Nov 11 21:19:20 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #1 SMP Wed Nov 11 21:12:24 CET 2020
> Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 11 21:19:20 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> Nov 11 21:19:21 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 11 21:19:21 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x1500000 524288 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x1580000 524288 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x1600000 524288 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x1680000 294912 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x1700000 524288 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x1780000 524288 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x1800000 458752 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x11e0000 131072 1
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x1880000 524288 4
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x1900000 360448 4
> Nov 11 21:19:21 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x1980000 16384 1
> Nov 11 21:19:26 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> Nov 11 21:19:26 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
>
> 2. With reverting 7fef431be9c9, VT-d on (wifi does work):
> Nov 11 21:21:50 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #2 SMP Wed Nov 11 21:20:51 CET 2020
> Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 11 21:21:50 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> Nov 11 21:21:51 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 11 21:21:51 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x3f100000 524288 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x3f180000 524288 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x3f200000 524288 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x3f280000 294912 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x3f300000 524288 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x3f380000 524288 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x3fc00000 458752 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x3f0c0000 131072 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x3fc80000 524288 4
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x3fd00000 360448 4
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x3f0a4000 16384 1
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
> Nov 11 21:21:51 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50
> fw_build_id
> Nov 11 21:21:53 razor NetworkManager[786]: <info> [1605126113.1294] rfkill1: found Wi-Fi radio killswitch (at
> /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
> Nov 11 21:21:55 razor ModemManager[724]: <info> Couldn't check support for device
> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
>
> 3. Without reverting the 7fef431be9c9, VT-d off (wifi doesn't work):
> Nov 11 21:32:41 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #3 SMP Wed Nov 11 21:31:35 CET 2020
> Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 11 21:32:41 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 1
> Nov 11 21:32:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 11 21:32:42 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x1480000 524288 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x1500000 524288 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x1580000 524288 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x1600000 294912 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x1680000 524288 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x1700000 524288 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x1780000 458752 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x1800000 131072 1
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x1880000 524288 4
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x1900000 360448 4
> Nov 11 21:32:42 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x10e4000 16384 1
> Nov 11 21:32:47 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> Nov 11 21:32:47 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
>
> 4. With reverting 7fef431be9c9, VT-d off (not sure if wifi works, system hung shortly thereafter):
> Nov 11 21:28:16 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #2 SMP Wed Nov 11 21:20:51 CET 2020
> Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 11 21:28:16 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 1
> Nov 11 21:28:17 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 11 21:28:17 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x3f900000 524288 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x3f980000 524288 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x3fa00000 524288 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x3fa80000 294912 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x3fb00000 524288 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x3fb80000 524288 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x40800000 458752 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x3f8c0000 131072 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x40880000 524288 4
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x40900000 360448 4
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x3f8a4000 16384 1
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
> Nov 11 21:28:17 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50
> fw_build_id
> Nov 11 21:28:19 razor NetworkManager[782]: <info> [1605126499.2535] rfkill1: found Wi-Fi radio killswitch (at
> /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
> Nov 11 21:28:21 razor ModemManager[717]: <info> Couldn't check support for device
> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
> Nov 11 21:28:58 razor kernel: ath11k_pci 0000:05:00.0: failed to receive scan abort comple: timed out
> Nov 11 21:28:58 razor kernel: ath11k_pci 0000:05:00.0: failed to abort scan: -110
> Nov 11 21:29:01 razor kernel: ath11k_pci 0000:05:00.0: wmi command 12289 timeout
> Nov 11 21:29:01 razor kernel: ath11k_pci 0000:05:00.0: failed to send WMI_START_SCAN_CMDID
> Nov 11 21:29:01 razor kernel: ath11k_pci 0000:05:00.0: failed to start hw scan: -11
>

Trying to understand the code, it looks like there are always two rounds
of requests. The first one always fails ("requesting one big chunk of DMA
memory"), the second one (providing multiple chunks of DMA memory) is
supposed to work - and we do allocate memory.


In the *working* cases we have

Respond mem req failed, result: 1, err: 0
qmi failed to respond fw mem req:-22
...
chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff

We don't fail in qmi_txn_wait() - second request w


In the *non-working* cases we have

Respond mem req failed, result: 1, err: 0
qmi failed to respond fw mem req:-22
...
qmi failed memory request, err = -110
qmi failed to respond fw mem req:-110

We fail in qmi_txn_wait(). We run into a timeout (ETIMEDOUT).

Can we bump up the timeout limit and see if things change? Maybe FW
needs more time with other addresses.
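
The distinction above can be sketched as a small log check. This is only an illustration, assuming the message formats quoted in this thread and the Linux errno numbering (22 is EINVAL, 110 is ETIMEDOUT); the helper names qmi_errors and firmware_came_up are made up for this example and are not part of ath11k.

```python
import re

# Linux errno values that show up in the logs of this thread.
LINUX_ERRNO = {11: "EAGAIN", 22: "EINVAL", 28: "ENOSPC", 110: "ETIMEDOUT"}

# Matches the two QMI failure messages quoted in this thread.
_ERR_RE = re.compile(
    r"qmi failed (?:memory request, err = |to respond fw mem req:)(-\d+)"
)


def qmi_errors(log_text):
    """Return a symbolic errno name for each QMI failure line in log_text."""
    names = []
    for match in _ERR_RE.finditer(log_text):
        code = -int(match.group(1))  # the log prints negative errno values
        names.append(LINUX_ERRNO.get(code, str(code)))
    return names


def firmware_came_up(log_text):
    """Heuristic from this thread: a chip_id line and no QMI timeout."""
    return "chip_id" in log_text and "ETIMEDOUT" not in qmi_errors(log_text)
```

Both kinds of log contain the first -22 (EINVAL) failure, so only the later -110 and the missing chip_id line separate the non-working case.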

--
Thanks,

David / dhildenb

2020-11-13 08:21:10

by Pavel Procopiuc

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

Op 12.11.2020 om 11:48 schreef David Hildenbrand:
> Trying to understand the code, it looks like there are always two rounds of reqests. The first one always fails
> ("requesting one big chunk of DMA memory"), the second one (providing multiple chunks of DMA memory) is supposed to work
> - and we do allocate memory.
>
>
> In the *working* cases we have
>
> Respond mem req failed, result: 1, err: 0
> qmi failed to respond fw mem req:-22
> ...
> chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
>
> We don't fail in qmi_txn_wait() - second request w
>
>
> In the *non-working* cases we have
>
> Respond mem req failed, result: 1, err: 0
> qmi failed to respond fw mem req:-22
> ...
> qmi failed memory request, err = -110
> qmi failed to respond fw mem req:-110
>
> We fail in qmi_txn_wait(). We run into a timeout (ETIMEDOUT).
>
> Can we bump up the timeout limit and see if things change? Maybe FW needs more time with other addresses.

I tried increasing ATH11K_QMI_WLANFW_TIMEOUT_MS twentyfold to 100000 (i.e. 100 seconds) and it didn't have any positive
effect: the second error (-110) just came after 100 seconds instead of 5.

2020-11-13 11:11:53

by Carl Huang

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 2020-11-13 16:17, Pavel Procopiuc wrote:
> Op 12.11.2020 om 11:48 schreef David Hildenbrand:
>> Trying to understand the code, it looks like there are always two
>> rounds of reqests. The first one always fails ("requesting one big
>> chunk of DMA memory"), the second one (providing multiple chunks of
>> DMA memory) is supposed to work - and we do allocate memory.
>>
>>
>> In the *working* cases we have
>>
>> Respond mem req failed, result: 1, err: 0
>> qmi failed to respond fw mem req:-22
>> ...
>> chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
>>
>> We don't fail in qmi_txn_wait() - second request w
>>
>>
>> In the *non-working* cases we have
>>
>> Respond mem req failed, result: 1, err: 0
>> qmi failed to respond fw mem req:-22
>> ...
>> qmi failed memory request, err = -110
>> qmi failed to respond fw mem req:-110
>>
>> We fail in qmi_txn_wait(). We run into a timeout (ETIMEDOUT).
>>
>> Can we bump up the timeout limit and see if things change? Maybe FW
>> needs more time with other addresses.
>
> I tried increasing ATH11K_QMI_WLANFW_TIMEOUT_MS 20 times to 100000
> (i.e. 100 seconds) and it didn't have any positive effect, the second
> error (-110) just came 100 seconds later and not 5.
>
I checked some logs. It looks like when the error happens, the physical
addresses are very small: they are between 20M and 30M.

So could you try reserving the memory starting from 20M?
Add "memmap=10M\$20M" to your grub.cfg or edit the kernel parameters, so
that ath11k can't allocate from these addresses.

Or you can try reserving even more memory starting from 20M.
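
To compare the addresses across logs, something like the following could be used. It is a rough sketch that assumes the "req mem_seg[N] ADDR SIZE TYPE" line format seen in this thread; the function names are hypothetical, and the last field of each line is parsed but left uninterpreted.

```python
import re

MIB = 1024 * 1024

# index, hex start address, size in bytes, trailing field (not interpreted here)
_SEG_RE = re.compile(r"req mem_seg\[(\d+)\] (0x[0-9a-fA-F]+) (\d+) (\d+)")


def parse_segments(log_text):
    """Return (index, start_address, size_in_bytes) for each mem_seg line."""
    return [
        (int(index), int(addr, 16), int(size))
        for index, addr, size, _last in _SEG_RE.findall(log_text)
    ]


def segments_below(log_text, limit_mib=32):
    """List the indices of segments that start below limit_mib MiB."""
    limit = limit_mib * MIB
    return [index for index, addr, _size in parse_segments(log_text) if addr < limit]
```

Running segments_below() over a failing log and a working log shows at a glance whether any segment was placed in the low range.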

2020-11-13 11:50:39

by Carl Huang

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On 2020-11-13 19:08, Carl Huang wrote:
> On 2020-11-13 16:17, Pavel Procopiuc wrote:
>> Op 12.11.2020 om 11:48 schreef David Hildenbrand:
>>> Trying to understand the code, it looks like there are always two
>>> rounds of reqests. The first one always fails ("requesting one big
>>> chunk of DMA memory"), the second one (providing multiple chunks of
>>> DMA memory) is supposed to work - and we do allocate memory.
>>>
>>>
>>> In the *working* cases we have
>>>
>>> Respond mem req failed, result: 1, err: 0
>>> qmi failed to respond fw mem req:-22
>>> ...
>>> chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
>>>
>>> We don't fail in qmi_txn_wait() - second request w
>>>
>>>
>>> In the *non-working* cases we have
>>>
>>> Respond mem req failed, result: 1, err: 0
>>> qmi failed to respond fw mem req:-22
>>> ...
>>> qmi failed memory request, err = -110
>>> qmi failed to respond fw mem req:-110
>>>
>>> We fail in qmi_txn_wait(). We run into a timeout (ETIMEDOUT).
>>>
>>> Can we bump up the timeout limit and see if things change? Maybe FW
>>> needs more time with other addresses.
>>
>> I tried increasing ATH11K_QMI_WLANFW_TIMEOUT_MS 20 times to 100000
>> (i.e. 100 seconds) and it didn't have any positive effect, the second
>> error (-110) just came 100 seconds later and not 5.
>>
> Checked some logs. Looks when the error happens, the physical address
> are
> very small. Its' between 20M - 30M.
>
> So could you have a try to reserve the memory starting from 20M?
> Add "memmap=10M\$20M" to your grub.cfg or edit in kernel parameters. so
> ath11k
> can't allocate from these address.
>
> Or you can try to reserve even larger memory starting from 20M.
>
To guarantee that ath11k doesn't get physical addresses below 32M, reserve
some more, for example "memmap=12M\$20M".
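
For reference, the range covered by such a parameter can be worked out as below. This is a simplified sketch, assuming the memmap=nn$ss form (reserve nn bytes starting at ss) with only decimal numbers and upper-case K/M/G suffixes; reserved_range is a hypothetical helper, not a kernel interface.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Accepts both "memmap=12M$20M" and the grub-escaped "memmap=12M\$20M".
_MEMMAP_RE = re.compile(r"memmap=(\d+)([KMG]?)\\?\$(\d+)([KMG]?)")


def reserved_range(param):
    """Return (start, end) in bytes for memmap=size$start; end is exclusive."""
    match = _MEMMAP_RE.fullmatch(param)
    if match is None:
        raise ValueError(f"not a memmap=nn$ss parameter: {param!r}")
    size = int(match.group(1)) * _UNITS[match.group(2)]
    start = int(match.group(3)) * _UNITS[match.group(4)]
    return start, start + size
```

With 12M$20M the reserved range runs from 20M up to 32M, while 10M$20M stops at 30M and leaves 30M to 32M available.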

2020-11-13 12:54:03

by Pavel Procopiuc

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

Op 13.11.2020 om 12:08 schreef Carl Huang:
> Checked some logs. Looks when the error happens, the physical address are
> very small. Its' between 20M - 30M.
>
> So could you have a try to reserve the memory starting from 20M?
> Add "memmap=10M\$20M" to your grub.cfg or edit in kernel parameters. so ath11k
> can't allocate from these address.
>
> Or you can try to reserve even larger memory starting from 20M.

That worked: booting with memmap=12M$20M resulted in working wifi:

$ journalctl -b | grep -iP '05:00|ath11k|Linux version|memmap'
Nov 13 13:45:34 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
p6) 2.34.0) #1 SMP Fri Nov 13 13:29:48 CET 2020
Nov 13 13:45:34 razor kernel: Command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
memmap=12M$20M quiet
Nov 13 13:45:34 razor kernel: DMA zone: 64 pages used for memmap
Nov 13 13:45:34 razor kernel: DMA32 zone: 5165 pages used for memmap
Nov 13 13:45:34 razor kernel: Normal zone: 255840 pages used for memmap
Nov 13 13:45:34 razor kernel: Kernel command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
memmap=12M$20M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=12M$20M quiet
Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
Nov 13 13:45:35 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 13 13:45:35 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x2100000 524288 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x2180000 524288 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x2200000 524288 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x2280000 294912 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x2300000 524288 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x2380000 524288 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x2400000 458752 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x20c0000 131072 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x2480000 524288 4
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x2500000 360448 4
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x20a4000 16384 1
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50
fw_build_id
Nov 13 13:45:37 razor NetworkManager[782]: <info> [1605271537.1168] rfkill1: found Wi-Fi radio killswitch (at
/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
Nov 13 13:45:39 razor ModemManager[722]: <info> Couldn't check support for device
'/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
Nov 13 13:45:45 razor kernel: ath11k_pci 0000:05:00.0: failed to enqueue rx buf: -28

2020-11-13 13:37:48

by wi nk

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On Fri, Nov 13, 2020 at 1:52 PM Pavel Procopiuc
<[email protected]> wrote:
>
> Op 13.11.2020 om 12:08 schreef Carl Huang:
> > Checked some logs. Looks when the error happens, the physical address are
> > very small. Its' between 20M - 30M.
> >
> > So could you have a try to reserve the memory starting from 20M?
> > Add "memmap=10M\$20M" to your grub.cfg or edit in kernel parameters. so ath11k
> > can't allocate from these address.
> >
> > Or you can try to reserve even larger memory starting from 20M.
>
> That worked, booting with memmap=12M$20M resulted in the working wifi:
>
> $ journalctl -b | grep -iP '05:00|ath11k|Linux version|memmap'
> Nov 13 13:45:34 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #1 SMP Fri Nov 13 13:29:48 CET 2020
> Nov 13 13:45:34 razor kernel: Command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> memmap=12M$20M quiet
> Nov 13 13:45:34 razor kernel: DMA zone: 64 pages used for memmap
> Nov 13 13:45:34 razor kernel: DMA32 zone: 5165 pages used for memmap
> Nov 13 13:45:34 razor kernel: Normal zone: 255840 pages used for memmap
> Nov 13 13:45:34 razor kernel: Kernel command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> memmap=12M$20M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=12M$20M quiet
> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> Nov 13 13:45:35 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 13 13:45:35 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x2100000 524288 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x2180000 524288 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x2200000 524288 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x2280000 294912 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x2300000 524288 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x2380000 524288 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x2400000 458752 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x20c0000 131072 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x2480000 524288 4
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x2500000 360448 4
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x20a4000 16384 1
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50
> fw_build_id
> Nov 13 13:45:37 razor NetworkManager[782]: <info> [1605271537.1168] rfkill1: found Wi-Fi radio killswitch (at
> /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
> Nov 13 13:45:39 razor ModemManager[722]: <info> Couldn't check support for device
> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
> Nov 13 13:45:45 razor kernel: ath11k_pci 0000:05:00.0: failed to enqueue rx buf: -28
>
> --
> ath11k mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/ath11k

When I attempt to boot my 5.10rc2 kernel with that memmap option, my
machine immediately hangs. That said, it seems to have done something
bizarre, as immediately afterwards, if I remove that option and let
5.10 boot normally, it seems to boot and bring up the wifi adapter ok
(which didn't happen before). Now that I've managed to boot 5.10
twice, the first time after a couple of minutes my video started going
nuts and displaying all sorts of artifacts[1]. This time things seem
to be functioning nominally (wifi is online and the machine is
behaving properly). I may just never turn it off again :D.

[1] - Here is what dmesg reported as that was occurring:

[ 16.158464] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0
[ 16.266416] bridge: filtering via arp/ip/ip6tables is no longer
available by default. Update your scripts to load br_netfilter if you
need this.
[ 16.267385] Bridge firewalling registered
[ 16.415304] Initializing XFRM netlink socket
[ 16.587820] process 'docker/tmp/qemu-check645322819/check' started
with executable stack
[ 16.806205] Bluetooth: hci0: QCA Downloading qca/htnv20.bin
[ 17.000375] Bluetooth: hci0: QCA setup on UART is completed
[ 17.022058] NET: Registered protocol family 38
[ 18.149182] rfkill: input handler disabled
[ 30.403700] Bluetooth: RFCOMM TTY layer initialized
[ 30.403704] Bluetooth: RFCOMM socket layer initialized
[ 30.403707] Bluetooth: RFCOMM ver 1.11
[ 30.613483] rfkill: input handler enabled
[ 31.928415] rfkill: input handler disabled
[ 45.130209] ath11k_pci 0000:55:00.0: failed to receive scan abort
comple: timed out
[ 45.130219] ath11k_pci 0000:55:00.0: failed to abort scan: -110
[ 48.202259] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 48.202264] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 48.202269] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 48.220668] wlp85s0: authenticate with ec:08:6b:27:01:ea
[ 51.274151] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 51.274153] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 51.274155] ath11k_pci 0000:55:00.0: failed to recalc txpower limit
24 using pdev param 3: -11
[ 54.346271] ath11k_pci 0000:55:00.0: wmi command 20488 timeout
[ 54.346276] ath11k_pci 0000:55:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
[ 54.346283] ath11k_pci 0000:55:00.0: Failed to set beacon interval
for VDEV: 0
[ 57.418158] ath11k_pci 0000:55:00.0: wmi command 20488 timeout
[ 57.418161] ath11k_pci 0000:55:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
[ 57.418163] ath11k_pci 0000:55:00.0: failed to set mgmt tx rate -11
[ 60.490264] ath11k_pci 0000:55:00.0: wmi command 20488 timeout
[ 60.490268] ath11k_pci 0000:55:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
[ 60.490273] ath11k_pci 0000:55:00.0: failed to set beacon tx rate -11
[ 63.562154] ath11k_pci 0000:55:00.0: wmi command 24577 timeout
[ 63.562157] ath11k_pci 0000:55:00.0: failed to submit WMI_PEER_CREATE cmd
[ 63.562159] ath11k_pci 0000:55:00.0: failed to send peer create
vdev_id 0 ret -11
[ 63.562161] ath11k_pci 0000:55:00.0: Failed to add peer:
ec:08:6b:27:01:ea for VDEV: 0
[ 63.562163] ath11k_pci 0000:55:00.0: Failed to add station:
ec:08:6b:27:01:ea for VDEV: 0
[ 63.562196] wlp85s0: failed to insert STA entry for the AP (error -11)
[ 63.562226] ------------[ cut here ]------------
[ 63.562235] WARNING: CPU: 1 PID: 1036 at
drivers/net/wireless/ath/ath11k/mac.c:5287
ath11k_mac_op_unassign_vif_chanctx+0x1e3/0x2e0 [ath11k]
[ 63.562236] Modules linked in: rfcomm cmac algif_hash
algif_skcipher af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink
xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nft_chain_nat
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c
nfnetlink br_netfilter bridge stp llc snd_soc_skl_hda_dsp
snd_soc_hdac_hdmi qrtr_mhi bnep overlay snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci
snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common
snd_soc_hdac_hda snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof
snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core
snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hda_core snd_hwdep snd_pcm qrtr ns snd_seq_midi
ath11k_pci snd_seq_midi_event mhi snd_rawmidi ath11k snd_seq
nls_iso8859_1 qmi_helpers joydev mei_hdcp dell_laptop snd_seq_device
snd_timer ledtrig_audio uvcvideo intel_rapl_msr videobuf2_vmalloc
videobuf2_memops dell_wmi
[ 63.562269] x86_pkg_temp_thermal intel_powerclamp videobuf2_v4l2
dell_smm_hwmon mac80211 hci_uart dell_smbios videobuf2_common coretemp
btqca kvm_intel hid_sensor_als hid_sensor_trigger
industrialio_triggered_buffer kfifo_buf kvm hid_sensor_iio_common
mousedev cfg80211 snd dcdbas btrtl input_leds videodev serio_raw
efi_pstore wmi_bmof intel_cstate dell_wmi_descriptor libarc4 mc
hid_multitouch industrialio btbcm soundcore 8250_dw btintel bluetooth
mei_me mei cros_ec_ishtp cros_ec processor_thermal_device ucsi_acpi
ecdh_generic typec_ucsi intel_rapl_common typec intel_soc_dts_iosf ecc
int3403_thermal int340x_thermal_zone mac_hid acpi_pad intel_hid
int3400_thermal acpi_thermal_rel sparse_keymap acpi_tad sch_fq_codel
parport_pc ppdev lp parport ip_tables x_tables autofs4 dm_crypt
hid_sensor_hub intel_ishtp_loader intel_ishtp_hid hid_generic
i2c_designware_platform i2c_designware_core i915 nvme crct10dif_pclmul
crc32_pclmul i2c_algo_bit rtsx_pci_sdmmc nvme_core ghash_clmulni_intel
[ 63.562306] drm_kms_helper aesni_intel syscopyarea sysfillrect
crypto_simd sysimgblt fb_sys_fops cryptd glue_helper cec psmouse
i2c_i801 rc_core i2c_smbus rtsx_pci intel_lpss_pci drm vmd intel_lpss
thunderbolt intel_ish_ipc idma64 intel_ishtp virt_dma xhci_pci
xhci_pci_renesas wmi i2c_hid hid video backlight pinctrl_tigerlake
[ 63.562323] CPU: 1 PID: 1036 Comm: wpa_supplicant Tainted: G
W I 5.10.0-rc2+ #1
[ 63.562324] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
10/05/2020
[ 63.562328] RIP: 0010:ath11k_mac_op_unassign_vif_chanctx+0x1e3/0x2e0 [ath11k]
[ 63.562329] Code: 8b 83 e0 02 00 00 4c 89 e9 be 10 00 00 00 4c 89
e7 48 c7 c2 e8 c2 cb c0 e8 2a 5b 01 00 80 bb 98 03 00 00 00 0f 85 6d
fe ff ff <0f> 0b e9 66 fe ff ff f0 41 80 a6 d8 16 00 00 fe f6 05 16 67
04 00
[ 63.562330] RSP: 0018:ffffa63580bcf760 EFLAGS: 00010246
[ 63.562331] RAX: 0000000000000000 RBX: ffff8e2c1c891588 RCX: 0000000000000000
[ 63.562332] RDX: ffff8e2c2516dac0 RSI: ffff8e2c1c891588 RDI: ffff8e2c1e3c35f8
[ 63.562332] RBP: ffffa63580bcf798 R08: ffff8e2c1c890980 R09: ffffa63580bcf4f8
[ 63.562333] R10: ffffa63580bcf4f0 R11: ffffffff8d152ca8 R12: ffff8e2c27c60000
[ 63.562333] R13: ffff8e2c1a9214d8 R14: 0000000000000000 R15: ffff8e2c1e3c35f8
[ 63.562334] FS: 00007fa314f19800(0000) GS:ffff8e2c4f440000(0000)
knlGS:0000000000000000
[ 63.562335] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 63.562336] CR2: 00007f222eefcaf8 CR3: 0000000860cdc001 CR4: 0000000000770ee0
[ 63.562336] PKRU: 55555554
[ 63.562337] Call Trace:
[ 63.562360] ieee80211_assign_vif_chanctx+0x8f/0x410 [mac80211]
[ 63.562370] __ieee80211_vif_release_channel+0x54/0x140 [mac80211]
[ 63.562380] ieee80211_vif_release_channel+0x3e/0x60 [mac80211]
[ 63.562391] ieee80211_mgd_auth+0x213/0x3e0 [mac80211]
[ 63.562405] ? cfg80211_get_bss+0x1d9/0x2a0 [cfg80211]
[ 63.562414] ieee80211_auth+0x18/0x20 [mac80211]
[ 63.562423] cfg80211_mlme_auth+0x104/0x1e0 [cfg80211]
[ 63.562431] nl80211_authenticate+0x29d/0x2f0 [cfg80211]
[ 63.562435] genl_family_rcv_msg_doit+0xe7/0x150
[ 63.562437] genl_rcv_msg+0xe2/0x1e0
[ 63.562444] ? nl80211_parse_key+0x310/0x310 [cfg80211]
[ 63.562445] ? genl_get_cmd+0xd0/0xd0
[ 63.562446] netlink_rcv_skb+0x55/0x100
[ 63.562447] genl_rcv+0x29/0x40
[ 63.562448] netlink_unicast+0x221/0x330
[ 63.562449] netlink_sendmsg+0x233/0x460
[ 63.562451] sock_sendmsg+0x65/0x70
[ 63.562452] ____sys_sendmsg+0x257/0x2a0
[ 63.562454] ? import_iovec+0x31/0x40
[ 63.562455] ? sendmsg_copy_msghdr+0x7e/0xa0
[ 63.562456] ___sys_sendmsg+0x82/0xc0
[ 63.562458] ? __check_object_size+0x4d/0x150
[ 63.562459] ? _copy_to_user+0x31/0x50
[ 63.562460] ? sock_getsockopt+0x1a1/0xcd0
[ 63.562462] ? unix_ioctl+0x5f/0x70
[ 63.562463] ? sock_do_ioctl+0x40/0x140
[ 63.562465] ? __cgroup_bpf_run_filter_setsockopt+0xb8/0x2e0
[ 63.562467] __sys_sendmsg+0x62/0xb0
[ 63.562468] __x64_sys_sendmsg+0x1f/0x30
[ 63.562470] do_syscall_64+0x38/0x90
[ 63.562471] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 63.562472] RIP: 0033:0x7fa31537c777
[ 63.562473] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7
0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74
24 10
[ 63.562474] RSP: 002b:00007ffdcaf17988 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 63.562475] RAX: ffffffffffffffda RBX: 0000560dc75336e0 RCX: 00007fa31537c777
[ 63.562475] RDX: 0000000000000000 RSI: 00007ffdcaf179c0 RDI: 0000000000000006
[ 63.562476] RBP: 0000560dc75758c0 R08: 0000000000000004 R09: 0000560dc756e760
[ 63.562476] R10: 00007ffdcaf17a94 R11: 0000000000000246 R12: 0000560dc75335f0
[ 63.562477] R13: 00007ffdcaf179c0 R14: 00007ffdcaf17a94 R15: 0000560dc756e760
[ 63.562478] CPU: 1 PID: 1036 Comm: wpa_supplicant Tainted: G
W I 5.10.0-rc2+ #1
[ 63.562479] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
10/05/2020
[ 63.562479] Call Trace:
[ 63.562481] dump_stack+0x70/0x8b
[ 63.562485] ? ath11k_mac_op_unassign_vif_chanctx+0x1e3/0x2e0 [ath11k]
[ 63.562487] __warn.cold+0x24/0x77
[ 63.562489] ? ath11k_mac_op_unassign_vif_chanctx+0x1e3/0x2e0 [ath11k]
[ 63.562491] report_bug+0xa1/0xc0
[ 63.562493] handle_bug+0x3e/0xa0
[ 63.562494] exc_invalid_op+0x19/0x70
[ 63.562495] asm_exc_invalid_op+0x12/0x20
[ 63.562497] RIP: 0010:ath11k_mac_op_unassign_vif_chanctx+0x1e3/0x2e0 [ath11k]
[ 63.562498] Code: 8b 83 e0 02 00 00 4c 89 e9 be 10 00 00 00 4c 89
e7 48 c7 c2 e8 c2 cb c0 e8 2a 5b 01 00 80 bb 98 03 00 00 00 0f 85 6d
fe ff ff <0f> 0b e9 66 fe ff ff f0 41 80 a6 d8 16 00 00 fe f6 05 16 67
04 00
[ 63.562498] RSP: 0018:ffffa63580bcf760 EFLAGS: 00010246
[ 63.562499] RAX: 0000000000000000 RBX: ffff8e2c1c891588 RCX: 0000000000000000
[ 63.562499] RDX: ffff8e2c2516dac0 RSI: ffff8e2c1c891588 RDI: ffff8e2c1e3c35f8
[ 63.562500] RBP: ffffa63580bcf798 R08: ffff8e2c1c890980 R09: ffffa63580bcf4f8
[ 63.562500] R10: ffffa63580bcf4f0 R11: ffffffff8d152ca8 R12: ffff8e2c27c60000
[ 63.562501] R13: ffff8e2c1a9214d8 R14: 0000000000000000 R15: ffff8e2c1e3c35f8
[ 63.562511] ieee80211_assign_vif_chanctx+0x8f/0x410 [mac80211]
[ 63.562520] __ieee80211_vif_release_channel+0x54/0x140 [mac80211]
[ 63.562528] ieee80211_vif_release_channel+0x3e/0x60 [mac80211]
[ 63.562538] ieee80211_mgd_auth+0x213/0x3e0 [mac80211]
[ 63.562545] ? cfg80211_get_bss+0x1d9/0x2a0 [cfg80211]
[ 63.562554] ieee80211_auth+0x18/0x20 [mac80211]
[ 63.562562] cfg80211_mlme_auth+0x104/0x1e0 [cfg80211]
[ 63.562570] nl80211_authenticate+0x29d/0x2f0 [cfg80211]
[ 63.562571] genl_family_rcv_msg_doit+0xe7/0x150
[ 63.562573] genl_rcv_msg+0xe2/0x1e0
[ 63.562580] ? nl80211_parse_key+0x310/0x310 [cfg80211]
[ 63.562581] ? genl_get_cmd+0xd0/0xd0
[ 63.562582] netlink_rcv_skb+0x55/0x100
[ 63.562583] genl_rcv+0x29/0x40
[ 63.562583] netlink_unicast+0x221/0x330
[ 63.562584] netlink_sendmsg+0x233/0x460
[ 63.562585] sock_sendmsg+0x65/0x70
[ 63.562586] ____sys_sendmsg+0x257/0x2a0
[ 63.562587] ? import_iovec+0x31/0x40
[ 63.562588] ? sendmsg_copy_msghdr+0x7e/0xa0
[ 63.562589] ___sys_sendmsg+0x82/0xc0
[ 63.562590] ? __check_object_size+0x4d/0x150
[ 63.562591] ? _copy_to_user+0x31/0x50
[ 63.562592] ? sock_getsockopt+0x1a1/0xcd0
[ 63.562593] ? unix_ioctl+0x5f/0x70
[ 63.562594] ? sock_do_ioctl+0x40/0x140
[ 63.562595] ? __cgroup_bpf_run_filter_setsockopt+0xb8/0x2e0
[ 63.562596] __sys_sendmsg+0x62/0xb0
[ 63.562597] __x64_sys_sendmsg+0x1f/0x30
[ 63.562598] do_syscall_64+0x38/0x90
[ 63.562599] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 63.562600] RIP: 0033:0x7fa31537c777
[ 63.562601] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7
0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74
24 10
[ 63.562601] RSP: 002b:00007ffdcaf17988 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 63.562602] RAX: ffffffffffffffda RBX: 0000560dc75336e0 RCX: 00007fa31537c777
[ 63.562602] RDX: 0000000000000000 RSI: 00007ffdcaf179c0 RDI: 0000000000000006
[ 63.562603] RBP: 0000560dc75758c0 R08: 0000000000000004 R09: 0000560dc756e760
[ 63.562603] R10: 00007ffdcaf17a94 R11: 0000000000000246 R12: 0000560dc75335f0
[ 63.562604] R13: 00007ffdcaf179c0 R14: 00007ffdcaf17a94 R15: 0000560dc756e760
[ 63.562605] ---[ end trace fa93bfa591439000 ]---
[ 66.634150] ath11k_pci 0000:55:00.0: wmi command 20486 timeout
[ 66.634152] ath11k_pci 0000:55:00.0: failed to submit WMI_VDEV_STOP cmd
[ 66.634155] ath11k_pci 0000:55:00.0: failed to stop WMI vdev 0: -11
[ 66.634156] ath11k_pci 0000:55:00.0: failed to stop vdev 0: -11
[ 69.962250] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 69.962255] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 69.962259] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 74.058210] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 74.058212] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 74.058214] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 77.130191] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 77.130193] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 77.130196] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 80.202147] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 80.202148] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 80.202150] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 83.274164] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[ 83.274168] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[ 83.274173] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[ 86.346167] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
<snip>

2020-11-13 15:50:34

by wi nk

Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"

On Fri, Nov 13, 2020 at 2:56 PM David Hildenbrand <[email protected]> wrote:
>
> On 13.11.20 14:36, wi nk wrote:
> > On Fri, Nov 13, 2020 at 1:52 PM Pavel Procopiuc
> > <[email protected]> wrote:
> >>
> >> Op 13.11.2020 om 12:08 schreef Carl Huang:
> >>> Checked some logs. It looks like when the error happens, the physical addresses are
> >>> very small. They're between 20M - 30M.
> >>>
> >>> So could you try to reserve the memory starting from 20M?
> >>> Add "memmap=10M\$20M" to your grub.cfg or edit the kernel parameters, so ath11k
> >>> can't allocate from these addresses.
> >>>
> >>> Or you can try to reserve even larger memory starting from 20M.
> >>
> >> That worked, booting with memmap=12M$20M resulted in the working wifi:
> >>
> >> $ journalctl -b | grep -iP '05:00|ath11k|Linux version|memmap'
> >> Nov 13 13:45:34 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> >> p6) 2.34.0) #1 SMP Fri Nov 13 13:29:48 CET 2020
> >> Nov 13 13:45:34 razor kernel: Command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> >> memmap=12M$20M quiet
> >> Nov 13 13:45:34 razor kernel: DMA zone: 64 pages used for memmap
> >> Nov 13 13:45:34 razor kernel: DMA32 zone: 5165 pages used for memmap
> >> Nov 13 13:45:34 razor kernel: Normal zone: 255840 pages used for memmap
> >> Nov 13 13:45:34 razor kernel: Kernel command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> >> memmap=12M$20M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=12M$20M quiet
> >> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> >> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> >> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> >> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> >> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> >> Nov 13 13:45:34 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> >> Nov 13 13:45:35 razor kernel: mhi 0000:05:00.0: Requested to power ON
> >> Nov 13 13:45:35 razor kernel: mhi 0000:05:00.0: Power on setup success
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[0] 0x2100000 524288 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[1] 0x2180000 524288 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[2] 0x2200000 524288 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[3] 0x2280000 294912 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[4] 0x2300000 524288 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[5] 0x2380000 524288 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[6] 0x2400000 458752 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[7] 0x20c0000 131072 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[8] 0x2480000 524288 4
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[9] 0x2500000 360448 4
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: req mem_seg[10] 0x20a4000 16384 1
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
> >> Nov 13 13:45:35 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50
> >> fw_build_id
> >> Nov 13 13:45:37 razor NetworkManager[782]: <info> [1605271537.1168] rfkill1: found Wi-Fi radio killswitch (at
> >> /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
> >> Nov 13 13:45:39 razor ModemManager[722]: <info> Couldn't check support for device
> >> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
> >> Nov 13 13:45:45 razor kernel: ath11k_pci 0000:05:00.0: failed to enqueue rx buf: -28
> >>
> >> --
> >> ath11k mailing list
> >> [email protected]
> >> http://lists.infradead.org/mailman/listinfo/ath11k
> >
> > When I attempt to boot my 5.10rc2 kernel with that memmap option, my
> > machine immediately hangs. That said, it seems to have done something
> > bizarre, as immediately afterwards, if I remove that option and let
> > 5.10 boot normally, it seems to boot and bring up the wifi adapter ok
> > (which didn't happen before). Now that I've managed to boot 5.10
> > twice, the first time after a couple of minutes my video started going
> > nuts and displaying all sorts of artifacts[1]. This time things seem
> > to be functioning nominally (wifi is online and the machine is
> > behaving properly). I may just never turn it off again :D.
>
> Honestly, that FW sounds horribly flawed. :)
>
> Would be interesting what happens when you boot back to 5.9 now ...
>
> --
> Thanks,
>
> David / dhildenb
>

Well, nothing super interesting: rebooting to 5.9 hard locked the
machine once the adapter associated, before I could do much. I then
rebooted back to 5.10 and it booted fine (I'm sending this email with
it). There's definitely something non-deterministic causing the
driver to work occasionally and fail/panic a bit more often. Are
there other memory / device allocation settings I can tweak to see if
something settles it down?
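
For anyone comparing their own logs against a memmap reservation, here
is a minimal sketch (not part of ath11k or any kernel tooling) that
checks whether the "req mem_seg[...]" addresses land inside the range
reserved by memmap=<size>M$<start>M, i.e. from <start> up to
<start>+<size>. The helper names are hypothetical, the parser only
handles the M suffix, and it assumes the log line layout seen in this
thread (index, hex address, size in bytes, and a trailing field that
is ignored here).

```python
import re

MIB = 1024 * 1024

# Three sample lines copied from the journalctl output in this thread.
LOG = """\
ath11k_pci 0000:05:00.0: req mem_seg[0] 0x2100000 524288 1
ath11k_pci 0000:05:00.0: req mem_seg[9] 0x2500000 360448 4
ath11k_pci 0000:05:00.0: req mem_seg[10] 0x20a4000 16384 1
"""


def parse_memmap(option):
    """Turn 'memmap=<size>M$<start>M' into a (start, end) byte range."""
    m = re.fullmatch(r"memmap=(\d+)M\$(\d+)M", option)
    if m is None:
        raise ValueError(f"unsupported memmap option: {option!r}")
    size = int(m.group(1)) * MIB
    start = int(m.group(2)) * MIB
    return start, start + size


def parse_segments(log_text):
    """Extract (index, address, size) tuples from 'req mem_seg' lines."""
    pattern = re.compile(r"req mem_seg\[(\d+)\] (0x[0-9a-fA-F]+) (\d+) \d+")
    return [(int(i), int(addr, 16), int(size))
            for i, addr, size in pattern.findall(log_text)]


def overlapping(segments, reserved):
    """Return the segments that intersect the reserved [start, end) range."""
    start, end = reserved
    return [s for s in segments if s[1] < end and s[1] + s[2] > start]


if __name__ == "__main__":
    reserved = parse_memmap("memmap=12M$20M")
    print("reserved: %#x-%#x" % reserved)
    for idx, addr, size in parse_segments(LOG):
        inside = bool(overlapping([(idx, addr, size)], reserved))
        print("mem_seg[%d] %#x +%d inside_reserved=%s" % (idx, addr, size, inside))
```

With memmap=12M$20M the reserved range ends at 32M (0x2000000), and
the lowest segment address in the working boot quoted above is
0x20a4000, so the segments sit just above the reserved range.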