2024-04-08 16:44:03

by Kalle Valo

Subject: Re: [PATCH 0/3] wifi: Un-embed ath10k and ath11k dummy netdev

Breno Leitao <[email protected]> writes:

> Hello Kalle,
>
> On Fri, Apr 05, 2024 at 06:15:05PM +0300, Kalle Valo wrote:
>> Breno Leitao <[email protected]> writes:
>>
>> > struct net_device shouldn't be embedded into any structure, instead,
>> > the owner should use the private space to embed their state into
>> > net_device.
>> >
>> > This patch set fixes the problem above for ath10k and ath11k. This also
>> > fixes the conversion of qtnfmac driver to the new helper.
>> >
>> > This patch set depends on a series that is still under review:
>> > https://lore.kernel.org/all/[email protected]/#t
>> >
>> > If it helps, I've pushed the tree to
>> > https://github.com/leitao/linux/tree/wireless-dummy
>> >
>> > PS: Due to lack of hardware, unfortunately all these patches are
>> > compile-tested only.
>> >
>> > Breno Leitao (3):
>> > wifi: qtnfmac: Use netdev dummy allocator helper
>> > wifi: ath10k: allocate dummy net_device dynamically
>> > wifi: ath11k: allocate dummy net_device dynamically
>>
>> Thanks for setting up the branch, that makes the testing very easy. I
>> now tested the branch using the commit below with ath11k WCN6855 hw2.0
>> on an x86 box:
>>
>> 5be9a125d8e7 wifi: ath11k: allocate dummy net_device dynamically
>>
>> But unfortunately it crashes, the stack trace below. I can easily test
>> your branches, just let me know what to test. A direct 'git pull'
>> command is the best.
>
> Thanks for the test.
>
> Reading the issue, I am afraid that freeing netdev explicitly
> (free_netdev()) might not be the best approach at the exit path.
>
> I would like to try to leverage the ->needs_free_netdev netdev
> mechanism to do the clean-up, if that makes sense. I've updated the
> ath11k patch, and I am curious if that is what we want.
>
> Would you mind testing a net patch I have, please?
>
> https://github.com/leitao/linux/tree/wireless-dummy_v2

I tested this again with my WCN6855 hw2.0 x86 test box on this commit:

a87674ac820e wifi: ath11k: allocate dummy net_device dynamically

It passes my tests and doesn't crash, but I see this kmemleak warning a
lot:

unreferenced object 0xffff888127109400 (size 128):
  comm "insmod", pid 2813, jiffies 4294926528
  hex dump (first 32 bytes):
    d0 93 d5 0a 81 88 ff ff d0 93 d5 0a 81 88 ff ff ................
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  backtrace (crc 870e4f12):
    [<ffffffff99bcd375>] kmemleak_alloc+0x45/0x80
    [<ffffffff975707a8>] kmalloc_trace+0x278/0x2c0
    [<ffffffff992904c5>] __hw_addr_create+0x55/0x260
    [<ffffffff992909cb>] __hw_addr_add_ex+0x2fb/0x6d0
    [<ffffffff99294004>] dev_addr_init+0x144/0x230
    [<ffffffff992629ee>] alloc_netdev_mqs+0x12e/0xfe0
    [<ffffffff992638c5>] alloc_netdev_dummy+0x25/0x30
    [<ffffffffc0b6b0cd>] ath11k_pcic_ext_irq_config+0x1ad/0xc10 [ath11k]
    [<ffffffffc0b6c431>] ath11k_pcic_config_irq+0x2f1/0x4b0 [ath11k]
    [<ffffffffc0cb8314>] ath11k_pci_probe+0x874/0x1210 [ath11k_pci]
    [<ffffffff97febf06>] local_pci_probe+0xd6/0x180
    [<ffffffff97feefaa>] pci_call_probe+0x15a/0x400
    [<ffffffff97ff03d6>] pci_device_probe+0xa6/0x100
    [<ffffffff98abe315>] really_probe+0x1d5/0x920
    [<ffffffff98abed48>] __driver_probe_device+0x2e8/0x3f0
    [<ffffffff98abee9a>] driver_probe_device+0x4a/0x140


--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


2024-04-08 19:34:23

by Breno Leitao

Subject: Re: [PATCH 0/3] wifi: Un-embed ath10k and ath11k dummy netdev

On Mon, Apr 08, 2024 at 07:43:42PM +0300, Kalle Valo wrote:
> Breno Leitao <[email protected]> writes:
> > On Fri, Apr 05, 2024 at 06:15:05PM +0300, Kalle Valo wrote:
> >> Breno Leitao <[email protected]> writes:
> >>
> >> > struct net_device shouldn't be embedded into any structure, instead,
> >> > the owner should use the private space to embed their state into
> >> > net_device.
> >> >
> >> > This patch set fixes the problem above for ath10k and ath11k. This also
> >> > fixes the conversion of qtnfmac driver to the new helper.
> >> >
> >> > This patch set depends on a series that is still under review:
> >> > https://lore.kernel.org/all/[email protected]/#t
> >> >
> >> > If it helps, I've pushed the tree to
> >> > https://github.com/leitao/linux/tree/wireless-dummy
> >> >
> >> > PS: Due to lack of hardware, unfortunately all these patches are
> >> > compile-tested only.
> >> >
> >> > Breno Leitao (3):
> >> > wifi: qtnfmac: Use netdev dummy allocator helper
> >> > wifi: ath10k: allocate dummy net_device dynamically
> >> > wifi: ath11k: allocate dummy net_device dynamically
> >>
> >> Thanks for setting up the branch, that makes the testing very easy. I
> >> now tested the branch using the commit below with ath11k WCN6855 hw2.0
> >> on an x86 box:
> >>
> >> 5be9a125d8e7 wifi: ath11k: allocate dummy net_device dynamically
> >>
> >> But unfortunately it crashes, the stack trace below. I can easily test
> >> your branches, just let me know what to test. A direct 'git pull'
> >> command is the best.
> >
> > Thanks for the test.
> >
> > Reading the issue, I am afraid that freeing netdev explicitly
> > (free_netdev()) might not be the best approach at the exit path.
> >
> > I would like to try to leverage the ->needs_free_netdev netdev
> > mechanism to do the clean-up, if that makes sense. I've updated the
> > ath11k patch, and I am curious if that is what we want.
> >
> > Would you mind testing a net patch I have, please?
> >
> > https://github.com/leitao/linux/tree/wireless-dummy_v2
>
> I tested this again with my WCN6855 hw2.0 x86 test box on this commit:
>
> a87674ac820e wifi: ath11k: allocate dummy net_device dynamically
>
> It passes my tests and doesn't crash, but I see this kmemleak warning a
> lot:

Thanks Kalle, that was helpful. The device is not being cleaned up
automatically.

I chatted with Jakub, and he suggested going back to the original
approach, but with an additional patch to free_netdev().

Would you mind running another test, please?

https://github.com/leitao/linux/tree/wireless-dummy_v3

The branch above is basically the original branch (as in this patch
series), with this additional patch:

Author: Breno Leitao <[email protected]>
Date: Mon Apr 8 11:37:32 2024 -0700

net: free_netdev: exit earlier if dummy

For dummy devices, exit early from free_netdev() instead of executing
the whole function. This is necessary because dummy devices are
special, and the second part of the function should not be executed
for them.

Otherwise reg_state, which is NETREG_DUMMY, will be overwritten and
there will be no way to identify that this is a dummy device. Also, this
device does not need the final put_device(), since dummy devices are not
registered (through register_netdevice()), which is where the device
reference is taken (at netdev_register_kobject()/device_add()).

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>

diff --git a/net/core/dev.c b/net/core/dev.c
index 2b82bd1cd2f8..5d2cb97d0ae6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11058,7 +11058,8 @@ void free_netdev(struct net_device *dev)
 	dev->xdp_bulkq = NULL;

 	/* Compatibility with error handling in drivers */
-	if (dev->reg_state == NETREG_UNINITIALIZED) {
+	if (dev->reg_state == NETREG_UNINITIALIZED ||
+	    dev->reg_state == NETREG_DUMMY) {
 		netdev_freemem(dev);
 		return;
 	}
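
To make the intent of the hunk above easier to follow, here is a small
userspace model of the branch it changes. This is only a sketch, not
kernel code: every model_* name is hypothetical, the struct is a tiny
stand-in for struct net_device, and the tail of the function is assumed
from the commit message (reg_state gets overwritten, then the final
put_device() drops a reference).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's reg_state values. */
enum model_reg_state {
	MODEL_NETREG_UNINITIALIZED,
	MODEL_NETREG_UNREGISTERED,
	MODEL_NETREG_RELEASED,
	MODEL_NETREG_DUMMY,
};

/* Tiny model of the two fields that matter for this patch. */
struct model_netdev {
	enum model_reg_state reg_state;
	int dev_refs; /* models the reference taken at registration time */
};

enum model_free_path {
	MODEL_FREED_DIRECTLY,       /* early return, netdev_freemem() in the patch */
	MODEL_RELEASED_VIA_PUT_DEV, /* tail of the function, put_device() */
};

/*
 * Model of free_netdev() with the patch applied: devices that were never
 * registered (uninitialized or dummy) take the early return, so reg_state
 * is left alone and no reference is dropped.
 */
static enum model_free_path model_free_netdev(struct model_netdev *dev)
{
	if (dev->reg_state == MODEL_NETREG_UNINITIALIZED ||
	    dev->reg_state == MODEL_NETREG_DUMMY)
		return MODEL_FREED_DIRECTLY;

	/* Assumed tail: mark the device released and drop the last ref. */
	dev->reg_state = MODEL_NETREG_RELEASED;
	dev->dev_refs--;
	return MODEL_RELEASED_VIA_PUT_DEV;
}
```

In this model, a device in MODEL_NETREG_DUMMY state comes back as
MODEL_FREED_DIRECTLY with its reg_state and reference count untouched,
while a device in MODEL_NETREG_UNREGISTERED state goes through the tail
and ends up in MODEL_NETREG_RELEASED.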