2023-04-16 13:49:51

by Christian Lamparter

[permalink] [raw]
Subject: Re: [PATCH] ath9k: fix calibration data endianness

On 4/16/23 12:50, Toke Høiland-Jørgensen wrote:
> Christian Lamparter <[email protected]> writes:
>
>> On 4/15/23 18:02, Christian Lamparter wrote:
>>> Hi,
>>>
>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
>>>> Álvaro Fernández Rojas <[email protected]> writes:
>>>>
>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
>>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
>>>>> ath9k 0000:00:01.0: Failed to initialize device
>>>>> ath9k: probe of 0000:00:01.0 failed with error -22
>>>>
>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
>>>> in the first place? Christian, care to comment on this?
>>>
>>> I knew this would come up. I've written what I know and remember in the
>>> pull-request/buglink.
>>>
>>> Maybe this can be added to the commit?
>>> Link: https://github.com/openwrt/openwrt/pull/12365
>>>
>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
>>>
>>> Since the existing request_firmware eeprom fetcher code set the flag,
>>> the nvmem code had to do it too.
>>>
>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
>>> I don't know if there are devices out there, which have a swapped magic (which is
>>> used to detect the endianess), but the caldata is in the correct endiannes (or
>>> vice versa - Magic is correct, but data needs swapping).
>>>
>>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
>>> So I don't expect there to be a new issue there).
>>
>> Nope! This is a classic self-own!... Well at least, this now gets documented!
>>
>> Here are my findings. Please excuse the overlong lines.
>>
>> ## The good news / AVM FritzBox 7360v2 ##
>>
>> The good news: The AVM FritzBox 7360v2 worked the same as before.
>
> [...]
>
>> ## The not so good news / Netgear WNDR3700v2 ##
>>
>> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"
>
> [...]
>
> Alright, so IIUC, we have a situation where some devices only work
> *with* the flag, and some devices only work *without* the flag? So we'll
> need some kind of platform-specific setting? Could we put this in the
> device trees, or is there a better solution?

Depends. From what I gather, ath9k calls this "need_swap". Thing is,
the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the
official documentation about the AR9170 Base EEPROM (has the same base
structure as AR5008 up to AR92xx) this is specified as:

"Only bit 0 is defined as Big Endian. This bit should be written as 1
when the structure is interpreted in big Endian byte ordering. This bit
must be reviewed before any larger than byte parameters can be interpreted."

It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2),
the caldata should be in "Big-Endian" too... so no swapping is necessary.

Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals
with this eepmisc flag:

| if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
| *swap_needed = true;
| ath_dbg(common, EEPROM,
| "Big Endian EEPROM detected according to EEPMISC register.\n");
| } else {
| *swap_needed = false;
| }

This doesn't take into consideration that swapping is not needed if
the data is in big endian format on a big endian device. So, this
could be changed so that the *swap_needed is only true if the flag and
device endiannes disagrees?

That said, Martin and Felix have written their reasons in the cover letter
and patches for why the code is what it is:
<https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements>

Toke, What's your take on this? Having something similar like the
check_endian bool... but for OF? Or more logic that can somehow
figure out if it's big or little endian.

Cheers,
Christian


2023-04-16 15:05:09

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH] ath9k: fix calibration data endianness

> | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
> | *swap_needed = true;
> | ath_dbg(common, EEPROM,
> | "Big Endian EEPROM detected according to EEPMISC register.\n");
> | } else {
> | *swap_needed = false;
> | }
>
> This doesn't take into consideration that swapping is not needed if
> the data is in big endian format on a big endian device. So, this
> could be changed so that the *swap_needed is only true if the flag and
> device endiannes disagrees?

There are versions of the macro which performs the swap which
understands the CPU endianness and become a NOP when it is not
required. htons()/ntohs() are the classic examples. So you need to
consider:

Despite swap_needed being true, it is possible no swap it actually
happening, because such a macro is being used.

and

Maybe using these variant can make the code simpler, by just doing the
NOP swap when the CPU endianess does not require it.

Andrew

2023-04-16 22:03:11

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [PATCH] ath9k: fix calibration data endianness

Christian Lamparter <[email protected]> writes:

> On 4/16/23 12:50, Toke Høiland-Jørgensen wrote:
>> Christian Lamparter <[email protected]> writes:
>>
>>> On 4/15/23 18:02, Christian Lamparter wrote:
>>>> Hi,
>>>>
>>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
>>>>> Álvaro Fernández Rojas <[email protected]> writes:
>>>>>
>>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
>>>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
>>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
>>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
>>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
>>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
>>>>>> ath9k 0000:00:01.0: Failed to initialize device
>>>>>> ath9k: probe of 0000:00:01.0 failed with error -22
>>>>>
>>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
>>>>> in the first place? Christian, care to comment on this?
>>>>
>>>> I knew this would come up. I've written what I know and remember in the
>>>> pull-request/buglink.
>>>>
>>>> Maybe this can be added to the commit?
>>>> Link: https://github.com/openwrt/openwrt/pull/12365
>>>>
>>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
>>>>
>>>> Since the existing request_firmware eeprom fetcher code set the flag,
>>>> the nvmem code had to do it too.
>>>>
>>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
>>>> I don't know if there are devices out there, which have a swapped magic (which is
>>>> used to detect the endianess), but the caldata is in the correct endiannes (or
>>>> vice versa - Magic is correct, but data needs swapping).
>>>>
>>>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
>>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
>>>> So I don't expect there to be a new issue there).
>>>
>>> Nope! This is a classic self-own!... Well at least, this now gets documented!
>>>
>>> Here are my findings. Please excuse the overlong lines.
>>>
>>> ## The good news / AVM FritzBox 7360v2 ##
>>>
>>> The good news: The AVM FritzBox 7360v2 worked the same as before.
>>
>> [...]
>>
>>> ## The not so good news / Netgear WNDR3700v2 ##
>>>
>>> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
>>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
>>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"
>>
>> [...]
>>
>> Alright, so IIUC, we have a situation where some devices only work
>> *with* the flag, and some devices only work *without* the flag? So we'll
>> need some kind of platform-specific setting? Could we put this in the
>> device trees, or is there a better solution?
>
> Depends. From what I gather, ath9k calls this "need_swap". Thing is,
> the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the
> official documentation about the AR9170 Base EEPROM (has the same base
> structure as AR5008 up to AR92xx) this is specified as:
>
> "Only bit 0 is defined as Big Endian. This bit should be written as 1
> when the structure is interpreted in big Endian byte ordering. This bit
> must be reviewed before any larger than byte parameters can be interpreted."
>
> It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2),
> the caldata should be in "Big-Endian" too... so no swapping is necessary.
>
> Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals
> with this eepmisc flag:
>
> | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
> | *swap_needed = true;
> | ath_dbg(common, EEPROM,
> | "Big Endian EEPROM detected according to EEPMISC register.\n");
> | } else {
> | *swap_needed = false;
> | }
>
> This doesn't take into consideration that swapping is not needed if
> the data is in big endian format on a big endian device. So, this
> could be changed so that the *swap_needed is only true if the flag and
> device endiannes disagrees?
>
> That said, Martin and Felix have written their reasons in the cover letter
> and patches for why the code is what it is:
> <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements>
>
> Toke, What's your take on this? Having something similar like the
> check_endian bool... but for OF? Or more logic that can somehow
> figure out if it's big or little endian.

Digging into that old thread, it seems we are re-hashing a lot of the
old discussion when those patches went in. Basically, the code you
quoted above is correct because the commit that introduced it sets all
fields to be __le16 and __le32 types and reads them using the
leXX_to_cpu() macros.

The code *further up* in that function is what is enabled by Alvaro's
patch. Which is a different type of swapping (where the whole eeprom is
swab16()'ed, not just the actual multi-byte data fields in them).
However, in OpenWrt the in-driver code to do this is not used; instead,
a hotplug script applies the swapping before the device is seen by the
driver, as described in this commit[0]. Martin indeed mentions that this
is a device-specific thing, so the driver can't actually do the right
thing without some outside feature flag[1]. The commit[0] also indicates
that there used used to exist a device-tree binding in the out-of-tree
device trees used in OpenWrt to do the unconditional swab16().

The code in [0] still exists in OpenWrt today, albeit in a somewhat
modified form[2]. I guess the question then boils down to, Álvaro, can
your issue be resolved by a pre-processing step similar to that which is
done in [2]? Or do we need the device tree flag after all?

-Toke

[0] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=afa37092663d00aa0abf8c61943d9a1b5558b144
[1] https://narkive.com/2q5A6nu0.34
[2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/lantiq/xway/base-files/etc/hotplug.d/firmware/12-ath9k-eeprom;h=98bb9af6947a298775ff7fa26ac6501c57df8378;hb=HEAD