2022-05-02 17:47:16

by H. Nikolaus Schaller

[permalink] [raw]
Subject: Re: [PATCH] wl1251: dynamically allocate memory used for DMA

Hi Arnd,

> Am 02.05.2022 um 16:06 schrieb Arnd Bergmann <[email protected]>:
>
> On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <[email protected]> wrote:
>> With introduction of vmap'ed stacks, stack parameters can no
>> longer be used for DMA and now leads to kernel panic.
>>
>> It happens at several places for the wl1251 (e.g. when
>> accessed through SDIO) making it unuseable on e.g. the
>> OpenPandora.
>>
>> We solve this by allocating temporary buffers or use wl1251_read32().
>
> This looks all correct to me. I had another look at the related wlcore
> driver now,
> and see that the same problem existed there but was fixed back in 2012
> in a different way, see 690142e98826 ("wl12xx: fix DMA-API-related warnings").

Interesting!

>
> The approach in the wlcore driver appears to be simpler because it
> avoids dynamic memory allocation and the associated error handling.

It looks as if it just avoids kmalloc/free sequences in event handling
by allocating a big enough buffer once.

wl1271_cmd_wait_for_event_or_timeout() allocates it like we do now.

> However, it probably makes another problem worse that also exists
> here:
>
> static inline u32 wl1251_read32(struct wl1251 *wl, int addr)
> {
> u32 response;
> wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32));
> return le32_to_cpu(wl->buffer_32);
> }
>
> I think the 'buffer_32' member of 'struct wl1251' needs an explicit
> '__cacheline_aligned' attribute to avoid potentially clobbering
> some of the structure during a DMA write.
>
> I don't know if anyone cares enough about the two drivers to
> have an opinion. I've added Luca to Cc, but he hasn't maintained
> the driver since 2013 and probably doesn't.

Well, there seems to be quite some common code but indeed devices
using these older chips are getting rare so it is probably not worth
combining code. And testing needs someone who owns boards
with both chips...

>
> It's probably ok to just apply your patch for the moment to fix
> the regression we saw on the machines that we know use this.
>
> One more detail:
>
>> diff --git a/drivers/net/wireless/ti/wl1251/event.c b/drivers/net/wireless/ti/wl1251/event.c
>> index e6d426edab56b..e945aafd88ee5 100644
>> --- a/drivers/net/wireless/ti/wl1251/event.c
>> +++ b/drivers/net/wireless/ti/wl1251/event.c
>> @@ -169,11 +169,9 @@ int wl1251_event_wait(struct wl1251 *wl, u32 mask, int timeout_ms)
>> msleep(1);
>>
>> /* read from both event fields */
>> - wl1251_mem_read(wl, wl->mbox_ptr[0], &events_vector,
>> - sizeof(events_vector));
>> + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[0]);
>> event = events_vector & mask;
>> - wl1251_mem_read(wl, wl->mbox_ptr[1], &events_vector,
>> - sizeof(events_vector));
>> + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[1]);
>> event |= events_vector & mask;
>
> This appears to change endianness of the data, on big-endian kernels.
> Is that intentional?

Hm. I didn't think about it. I just noticed that wl1251_mem_read32 uses the
internal buffer so we don't have to allocate its own buffer any more.

>
> My first guess would be that the driver never worked correctly on big-endian
> machines, and that the change is indeed correct, but on the other hand
> the conversion was added in commit ac9e2d9afa90 ("wl1251: convert
> 32-bit values to le32 before writing to the chip") in a way that suggests it
> was meant to work on both.

wl1251_event_wait() seems to work with the masks provided by code.
So I guess the conversion to le32 is harmless on the OpenPandora.
Most likely it should be done on big endian devices. I.e. we might have
done the right thing.

Let's see if someone compains or knows more. Otherwise we should
fix it just for the Pandora and N900 (both omap3 based) as the only
upstream users of this chip.

BR and thanks,
Nikolaus



2022-05-03 00:19:38

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] wl1251: dynamically allocate memory used for DMA

On Mon, May 2, 2022 at 4:47 PM H. Nikolaus Schaller <[email protected]> wrote:
> > Am 02.05.2022 um 16:06 schrieb Arnd Bergmann <[email protected]>:
> > On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <[email protected]> wrote:
> >
> > The approach in the wlcore driver appears to be simpler because it
> > avoids dynamic memory allocation and the associated error handling.
>
> It looks as if it just avoids kmalloc/free sequences in event handling
> by allocating a big enough buffer once.
>
> wl1271_cmd_wait_for_event_or_timeout() allocates it like we do now.

Ah right, I missed that one.

> > However, it probably makes another problem worse that also exists
> > here:
> >
> > static inline u32 wl1251_read32(struct wl1251 *wl, int addr)
> > {
> > u32 response;
> > wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32));
> > return le32_to_cpu(wl->buffer_32);
> > }
> >
> > I think the 'buffer_32' member of 'struct wl1251' needs an explicit
> > '__cacheline_aligned' attribute to avoid potentially clobbering
> > some of the structure during a DMA write.
> >
> > I don't know if anyone cares enough about the two drivers to
> > have an opinion. I've added Luca to Cc, but he hasn't maintained
> > the driver since 2013 and probably doesn't.
>
> Well, there seems to be quite some common code but indeed devices
> using these older chips are getting rare so it is probably not worth
> combining code. And testing needs someone who owns boards
> with both chips...

No, I wasn't even thinking of combining code, just whether there
is value in keeping the similar bits in sync to ensure we have the
same set of bugs on both ;-)

I think the best fix for both drivers would be to keep the DMA
members (partition and buffer_32) in the respective device
structures, but mark each one as aligned.

> > My first guess would be that the driver never worked correctly on big-endian
> > machines, and that the change is indeed correct, but on the other hand
> > the conversion was added in commit ac9e2d9afa90 ("wl1251: convert
> > 32-bit values to le32 before writing to the chip") in a way that suggests it
> > was meant to work on both.
>
> wl1251_event_wait() seems to work with the masks provided by code.
> So I guess the conversion to le32 is harmless on the OpenPandora.
> Most likely it should be done on big endian devices. I.e. we might have
> done the right thing.
>
> Let's see if someone compains or knows more. Otherwise we should
> fix it just for the Pandora and N900 (both omap3 based) as the only
> upstream users.

Ok. In general I like ensure we keep things working for big-endian
kernels, which are meant to work on any ARMv6+ or newer machine.
Most of the time it's just a couple of drivers (like this one) that get
in the way of actually doing it, but then again very few people ever
care about big-endian ARMv6/v7.

If we don't have a reason to believe this one is actually wrong, I think
fixing the endian issue silently is fine, as is ignoring the potential
other endian issues in the same driver that nobody complained about
in the past decade ;-)

Arnd