2021-09-22 11:19:02

by Andy Shevchenko

Subject: Re: [PATCH v2 1/2] mwifiex: Use non-posted PCI write when setting TX ring write pointer

On Tue, Sep 14, 2021 at 01:48:12PM +0200, Jonas Dreßler wrote:
> On the 88W8897 card it's very important the TX ring write pointer is
> updated correctly to its new value before setting the TX ready
> interrupt, otherwise the firmware appears to crash (probably because
> it's trying to DMA-read from the wrong place). The issue is present in
> the latest firmware version 15.68.19.p21 of the pcie+usb card.

Please, be consistent in the commit message(s) and the code (esp. if the term
comes from a specification).

Here it should be PCIe (likewise in the code, at least where I noticed it, but
this should be done everywhere).

> Since PCI uses "posted writes" when writing to a register, it's not
> guaranteed that a write will happen immediately. That means the pointer
> might be outdated when setting the TX ready interrupt, leading to
> firmware crashes especially when ASPM L1 and L1 substates are enabled
> (because of the higher link latency, the write will probably take
> longer).
>
> So fix those firmware crashes by always using a non-posted write for
> this specific register write. We do that by simply reading back the
> register after writing it, just as a few other PCI drivers do.
>
> This fixes a bug where during rx/tx traffic and with ASPM L1 substates

Ditto. TX/RX.

> enabled (the enabled substates are platform dependent), the firmware
> crashes and eventually a command timeout appears in the logs.

Should it have a Fixes tag?

> Cc: [email protected]
> Signed-off-by: Jonas Dreßler <[email protected]>

...

> - /* Write the TX ring write pointer in to reg->tx_wrptr */
> - if (mwifiex_write_reg(adapter, reg->tx_wrptr,
> - card->txbd_wrptr | rx_val)) {
> + /* Write the TX ring write pointer in to reg->tx_wrptr.
> + * The firmware (latest version 15.68.19.p21) of the 88W8897
> + * pcie+usb card seems to crash when getting the TX ready
> + * interrupt but the TX ring write pointer points to an outdated
> + * address, so it's important we do a non-posted write here to
> + * force the completion of the write.
> + */
> + if (mwifiex_write_reg_np(adapter, reg->tx_wrptr,
> + card->txbd_wrptr | rx_val)) {

> mwifiex_dbg(adapter, ERROR,
> "SEND DATA: failed to write reg->tx_wrptr\n");
> ret = -1;

I'm not sure how this is not dead code.

On top of that, I would rather call the old function and explicitly put the
dummy read after it:

	/* Write the TX ring write pointer in to reg->tx_wrptr */
	if (mwifiex_write_reg(adapter, reg->tx_wrptr,
			      card->txbd_wrptr | rx_val)) {
		...eliminate dead code in the following patch(es)...
	}

+	/* The firmware (latest version 15.68.19.p21) of the 88W8897
+	 * pcie+usb card seems to crash when getting the TX ready
+	 * interrupt but the TX ring write pointer points to an outdated
+	 * address, so it's important we do a non-posted write here to
+	 * force the completion of the write.
+	 */
	mwifiex_read_reg(...);

Now, since I found the dummy read function to be present, perhaps you need to
dive more into the code and understand why it exists.
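
For illustration only, here is a small user-space model of the write-then-read-back
pattern suggested above. Everything in it (struct fake_dev, fake_writel(),
fake_readl(), fake_dev_view(), write_ring_ptr_flushed()) is hypothetical and
stands in for the real MMIO accessors; it is not the mwifiex code. The model
assumes, in a heavily simplified way, that a write may sit in flight until a
read from the same device forces it to land, since a read completion cannot
pass an earlier posted write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a device behind a bus that posts writes. */
struct fake_dev {
	uint32_t reg;        /* value the device itself currently sees */
	uint32_t posted_val; /* value of a write that is still in flight */
	int posted;          /* non-zero while a write is pending */
};

/* Posted write: returns immediately, the device has not seen it yet. */
static void fake_writel(struct fake_dev *d, uint32_t val)
{
	d->posted_val = val;
	d->posted = 1;
}

/* Read: any pending write must land first, then the value is returned. */
static uint32_t fake_readl(struct fake_dev *d)
{
	if (d->posted) {
		d->reg = d->posted_val;
		d->posted = 0;
	}
	return d->reg;
}

/* What the "firmware" would observe right now, without touching the bus. */
static uint32_t fake_dev_view(const struct fake_dev *d)
{
	return d->reg;
}

/* Plain write followed by an explicit dummy read of the same device. */
static void write_ring_ptr_flushed(struct fake_dev *d, uint32_t val)
{
	fake_writel(d, val);
	(void)fake_readl(d); /* result is discarded, only the flush matters */
}

/* Small self-check showing the difference between the two variants. */
static void fake_flush_demo(void)
{
	struct fake_dev d = { 0, 0, 0 };

	fake_writel(&d, 0x10);
	assert(fake_dev_view(&d) == 0);    /* write still in flight */

	write_ring_ptr_flushed(&d, 0x20);
	assert(fake_dev_view(&d) == 0x20); /* write has reached the device */
}
```

Calling fake_flush_demo() from a main() exercises both cases. Keeping the read
as a separate, visible statement at the call site is the design choice being
argued for here: the flush is then obvious to a reader instead of being hidden
inside a special write helper.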

--
With Best Regards,
Andy Shevchenko



2021-09-22 12:10:07

by Jonas Dreßler

Subject: Re: [PATCH v2 1/2] mwifiex: Use non-posted PCI write when setting TX ring write pointer

On 9/22/21 1:17 PM, Andy Shevchenko wrote:
> On Tue, Sep 14, 2021 at 01:48:12PM +0200, Jonas Dreßler wrote:
>> On the 88W8897 card it's very important the TX ring write pointer is
>> updated correctly to its new value before setting the TX ready
>> interrupt, otherwise the firmware appears to crash (probably because
>> it's trying to DMA-read from the wrong place). The issue is present in
>> the latest firmware version 15.68.19.p21 of the pcie+usb card.
>
> Please, be consistent in the commit message(s) and the code (esp. if the term
> comes from a specification).
>
> Here it should be PCIe (likewise in the code, at least where I noticed it, but
> this should be done everywhere).
>
>> Since PCI uses "posted writes" when writing to a register, it's not
>> guaranteed that a write will happen immediately. That means the pointer
>> might be outdated when setting the TX ready interrupt, leading to
>> firmware crashes especially when ASPM L1 and L1 substates are enabled
>> (because of the higher link latency, the write will probably take
>> longer).
>>
>> So fix those firmware crashes by always using a non-posted write for
>> this specific register write. We do that by simply reading back the
>> register after writing it, just as a few other PCI drivers do.
>>
>> This fixes a bug where during rx/tx traffic and with ASPM L1 substates
>
> Ditto. TX/RX.
>
>> enabled (the enabled substates are platform dependent), the firmware
>> crashes and eventually a command timeout appears in the logs.
>
> Should it have a Fixes tag?
>

I don't think so. It does fix the infamous Bugzilla bug
(https://bugzilla.kernel.org/show_bug.cgi?id=109681) though; I'll mention that
in v3.

>> Cc: [email protected]
>> Signed-off-by: Jonas Dreßler <[email protected]>
>
> ...
>
>> - /* Write the TX ring write pointer in to reg->tx_wrptr */
>> - if (mwifiex_write_reg(adapter, reg->tx_wrptr,
>> - card->txbd_wrptr | rx_val)) {
>> + /* Write the TX ring write pointer in to reg->tx_wrptr.
>> + * The firmware (latest version 15.68.19.p21) of the 88W8897
>> + * pcie+usb card seems to crash when getting the TX ready
>> + * interrupt but the TX ring write pointer points to an outdated
>> + * address, so it's important we do a non-posted write here to
>> + * force the completion of the write.
>> + */
>> + if (mwifiex_write_reg_np(adapter, reg->tx_wrptr,
>> + card->txbd_wrptr | rx_val)) {
>
>> mwifiex_dbg(adapter, ERROR,
>> "SEND DATA: failed to write reg->tx_wrptr\n");
>> ret = -1;
>
> I'm not sure how this is not dead code.
>
> On top of that, I would rather call the old function and explicitly put the
> dummy read after it:
>
> /* Write the TX ring write pointer in to reg->tx_wrptr */
> if (mwifiex_write_reg(adapter, reg->tx_wrptr,
> card->txbd_wrptr | rx_val)) {
> ...eliminate dead code in the following patch(es)...
> }
>
> + /* The firmware (latest version 15.68.19.p21) of the 88W8897
> + * pcie+usb card seems to crash when getting the TX ready
> + * interrupt but the TX ring write pointer points to an outdated
> + * address, so it's important we do a non-posted write here to
> + * force the completion of the write.
> + */
> mwifiex_read_reg(...);
>
> Now, since I found the dummy read function to be present, perhaps you need to
> dive more into the code and understand why it exists.
>

Interesting, I hadn't noticed that mwifiex_write_reg() always returns
0. So are you suggesting removing that return value and getting rid of all
the "if (mwifiex_write_reg()) {}" checks in a separate commit?
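
A rough sketch of what such a cleanup could look like, assuming the write
helper really cannot fail. All names below (struct fake_adapter,
fake_write_reg_old(), fake_write_reg_new(), fake_send_old(), fake_send_new())
are hypothetical stand-ins used only to show the shape of the change; they are
not the driver's functions.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_NUM_REGS 4

/* Hypothetical adapter with a tiny in-memory register file. */
struct fake_adapter {
	uint32_t regs[FAKE_NUM_REGS];
};

/* Before: the helper returns int, but the value is always 0. */
static int fake_write_reg_old(struct fake_adapter *a, unsigned int reg,
			      uint32_t data)
{
	a->regs[reg] = data;
	return 0;
}

/* Before: the caller carries an error branch that can never be taken. */
static int fake_send_old(struct fake_adapter *a, uint32_t wrptr)
{
	if (fake_write_reg_old(a, 0, wrptr)) {
		/* dead code: fake_write_reg_old() never returns non-zero */
		return -1;
	}
	return 0;
}

/* After: the helper is void, so there is nothing left to check. */
static void fake_write_reg_new(struct fake_adapter *a, unsigned int reg,
			       uint32_t data)
{
	a->regs[reg] = data;
}

/* After: the caller shrinks to the write itself. */
static int fake_send_new(struct fake_adapter *a, uint32_t wrptr)
{
	fake_write_reg_new(a, 0, wrptr);
	return 0;
}

/* Self-check: both variants leave the register in the same state. */
static void fake_cleanup_demo(void)
{
	struct fake_adapter before = { { 0 } };
	struct fake_adapter after = { { 0 } };

	assert(fake_send_old(&before, 0x40) == 0);
	assert(fake_send_new(&after, 0x40) == 0);
	assert(before.regs[0] == after.regs[0]);
}
```

Doing this as its own commit would keep the behavioural fix (the read-back)
separate from the purely mechanical removal of the unreachable branches.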

As for why the dummy read/write functions exist, I have no idea. Looking
at git history it seems they were always there (only change is that
mwifiex_read_reg() started to handle read errors with commit
af05148392f50490c662dccee6c502d9fcba33e2). My bet would be that they
were created to be consistent with sdio.c which is the oldest supported
bus type in mwifiex.