2022-02-09 10:03:24

by Christoph Niedermaier

[permalink] [raw]
Subject: Possible bug on wilc1000 [Klartext]

Hello,

I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the firmware v15.4.1
(https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
on an i.MX6 QUAD with iperf3:

# iperf3 -c IP_ADDR -P 16 -t 0

After a while the test gets stuck and I got the following kernel messages:
mmc0: Timeout waiting for hardware interrupt.
mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc0: sdhci: Sys addr: 0x138f0200 | Version: 0x00000002
mmc0: sdhci: Blk size: 0x00000158 | Blk cnt: 0x00000001
mmc0: sdhci: Argument: 0x14000158 | Trn mode: 0x00000013
mmc0: sdhci: Present: 0x01d88a0a | Host ctl: 0x00000013
mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000009f
mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
mmc0: sdhci: Int enab: 0x107f100b | Sig enab: 0x107f100b
mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000a000
mmc0: sdhci: Cmd: 0x0000353a | Max curr: 0x00ffffff
mmc0: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x00000000
mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
mmc0: sdhci: Host ctl2: 0x00000000
mmc0: sdhci: ADMA Err: 0x00000007 | ADMA Ptr: 0x4c041200
mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
mmc0: sdhci-esdhc-imx: data debug status: 0x2200
mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
mmc0: sdhci-esdhc-imx: dma debug status: 0x2402
mmc0: sdhci-esdhc-imx: adma debug status: 0x25b4
mmc0: sdhci-esdhc-imx: fifo debug status: 0x2610
mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2751
mmc0: sdhci: ============================================
wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...

I tried to reduce the clock speed to 20MHz in the devicetree with
max-frequency = <20000000>;
but the problem then also occurs.

Is this a possible bug?


Thanks in advance and best regards
Christoph


2022-02-10 10:17:00

by Christoph Niedermaier

[permalink] [raw]
Subject: RE: Possible bug on wilc1000 [Klartext]

From: [email protected] [mailto:[email protected]]
Sent: Wednesday, February 9, 2022 3:37 PM
> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>
>> Hello,
>>
>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the firmware v15.4.1
>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
>> on an i.MX6 QUAD with iperf3:
>>
>> # iperf3 -c IP_ADDR -P 16 -t 0
>>
>> After a while the test gets stuck and I got the following kernel messages:
>> mmc0: Timeout waiting for hardware interrupt.
>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>> mmc0: sdhci: Sys addr: 0x138f0200 | Version: 0x00000002
>> mmc0: sdhci: Blk size: 0x00000158 | Blk cnt: 0x00000001
>> mmc0: sdhci: Argument: 0x14000158 | Trn mode: 0x00000013
>> mmc0: sdhci: Present: 0x01d88a0a | Host ctl: 0x00000013
>> mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
>> mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000009f
>> mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
>> mmc0: sdhci: Int enab: 0x107f100b | Sig enab: 0x107f100b
>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>> mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000a000
>> mmc0: sdhci: Cmd: 0x0000353a | Max curr: 0x00ffffff
>> mmc0: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x00000000
>> mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
>> mmc0: sdhci: Host ctl2: 0x00000000
>> mmc0: sdhci: ADMA Err: 0x00000007 | ADMA Ptr: 0x4c041200
>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
>> mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
>> mmc0: sdhci-esdhc-imx: data debug status: 0x2200
>> mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
>> mmc0: sdhci-esdhc-imx: dma debug status: 0x2402
>> mmc0: sdhci-esdhc-imx: adma debug status: 0x25b4
>> mmc0: sdhci-esdhc-imx: fifo debug status: 0x2610
>> mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2751
>> mmc0: sdhci: ============================================
>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>
>> I tried to reduce the clock speed to 20MHz in the devicetree with
>> max-frequency = <20000000>;
>> but the problem then also occurs.
>>
>> Is this a possible bug?
>>
>>
>

Hi Ajay,
Thanks for the answer.

> The bus error seems to be specific to the host during the SDIO transfer.
> How long does it take to reproduce it? Does the issue also happen
> without "-P 16" iPerf3 option?

It takes about 10s (something a bit longer) till I got this kernel error
messages and it doesn't matter if I use it with "-P 16" or without.

> It might be worth trying with power-save mode disabled. The PS mode can
> be disabled either by setting WID_POWER_MANAGEMENT WID to
> 0(WILC_FW_NO_POWERSAVE) in "wilc_set_power_mgmt()" or compile the kernel
> with CFG80211_DEFAULT_PS disabled.

I built a the kernel with CFG80211_DEFAULT_PS disabled, but the kernel
error messages still appear.


Regards
Christoph

2022-02-10 16:49:09

by Marek Vasut

[permalink] [raw]
Subject: Re: Possible bug on wilc1000 [Klartext]

On 2/10/22 17:19, [email protected] wrote:

Hi,

> On 10/02/22 14:10, Christoph Niedermaier wrote:
>> From: [email protected] [mailto:[email protected]]
>> Sent: Wednesday, February 9, 2022 3:37 PM
>>> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>>> Hello,
>>>>
>>>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the firmware v15.4.1
>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
>>>> on an i.MX6 QUAD with iperf3:
>>>>
>>>> # iperf3 -c IP_ADDR -P 16 -t 0
>>>>
>>>> After a while the test gets stuck and I got the following kernel messages:
>>>> mmc0: Timeout waiting for hardware interrupt.
>>>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>> mmc0: sdhci: Sys addr: 0x138f0200 | Version: 0x00000002
>>>> mmc0: sdhci: Blk size: 0x00000158 | Blk cnt: 0x00000001
>>>> mmc0: sdhci: Argument: 0x14000158 | Trn mode: 0x00000013
>>>> mmc0: sdhci: Present: 0x01d88a0a | Host ctl: 0x00000013
>>>> mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
>>>> mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000009f
>>>> mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
>>>> mmc0: sdhci: Int enab: 0x107f100b | Sig enab: 0x107f100b
>>>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>>>> mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000a000
>>>> mmc0: sdhci: Cmd: 0x0000353a | Max curr: 0x00ffffff
>>>> mmc0: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x00000000
>>>> mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
>>>> mmc0: sdhci: Host ctl2: 0x00000000
>>>> mmc0: sdhci: ADMA Err: 0x00000007 | ADMA Ptr: 0x4c041200
>>>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
>>>> mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
>>>> mmc0: sdhci-esdhc-imx: data debug status: 0x2200
>>>> mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
>>>> mmc0: sdhci-esdhc-imx: dma debug status: 0x2402
>>>> mmc0: sdhci-esdhc-imx: adma debug status: 0x25b4
>>>> mmc0: sdhci-esdhc-imx: fifo debug status: 0x2610
>>>> mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2751
>>>> mmc0: sdhci: ============================================
>>>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>>>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>>>
>>>> I tried to reduce the clock speed to 20MHz in the devicetree with
>>>> max-frequency = <20000000>;
>>>> but the problem then also occurs.
>>>>
>>>> Is this a possible bug?
>>>>
>>>>
>> Hi Ajay,
>> Thanks for the answer.
>>
>>> The bus error seems to be specific to the host during the SDIO transfer.
>>> How long does it take to reproduce it? Does the issue also happen
>>> without "-P 16" iPerf3 option?
>> It takes about 10s (something a bit longer) till I got this kernel error
>> messages and it doesn't matter if I use it with "-P 16" or without.
>
>
> I did not observe the issue with my setup(SAMA5D4 XPLAINED + WILC1000
> SDIO) when tested iPerf for a longer duration(~1000sec). I suspect the
> issue could be related to the SDHCI host controller.
> Try to debug the host controller side for the possible cause of timeout.

It seems the timeout happens because the card fails to respond to SDIO
command 53, right ?

Is there some error logging/tracing functionality in the WILC1000
firmware which can provide further information why the card did not
respond ?

Could it be the card suffered some sort of FIFO overflow ? The MX6Q is a
bit more performant than the CA7 (I think?) SAMA5D4, so maybe that plays
some role ?

2022-02-10 18:36:26

by Ajay Singh

[permalink] [raw]
Subject: Re: Possible bug on wilc1000 [Klartext]

Hi Christoph,

On 10/02/22 14:10, Christoph Niedermaier wrote:
> From: [email protected] [mailto:[email protected]]
> Sent: Wednesday, February 9, 2022 3:37 PM
>> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>> Hello,
>>>
>>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the firmware v15.4.1
>>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
>>> on an i.MX6 QUAD with iperf3:
>>>
>>> # iperf3 -c IP_ADDR -P 16 -t 0
>>>
>>> After a while the test gets stuck and I got the following kernel messages:
>>> mmc0: Timeout waiting for hardware interrupt.
>>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>> mmc0: sdhci: Sys addr: 0x138f0200 | Version: 0x00000002
>>> mmc0: sdhci: Blk size: 0x00000158 | Blk cnt: 0x00000001
>>> mmc0: sdhci: Argument: 0x14000158 | Trn mode: 0x00000013
>>> mmc0: sdhci: Present: 0x01d88a0a | Host ctl: 0x00000013
>>> mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
>>> mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000009f
>>> mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
>>> mmc0: sdhci: Int enab: 0x107f100b | Sig enab: 0x107f100b
>>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>>> mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000a000
>>> mmc0: sdhci: Cmd: 0x0000353a | Max curr: 0x00ffffff
>>> mmc0: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x00000000
>>> mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
>>> mmc0: sdhci: Host ctl2: 0x00000000
>>> mmc0: sdhci: ADMA Err: 0x00000007 | ADMA Ptr: 0x4c041200
>>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
>>> mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
>>> mmc0: sdhci-esdhc-imx: data debug status: 0x2200
>>> mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
>>> mmc0: sdhci-esdhc-imx: dma debug status: 0x2402
>>> mmc0: sdhci-esdhc-imx: adma debug status: 0x25b4
>>> mmc0: sdhci-esdhc-imx: fifo debug status: 0x2610
>>> mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2751
>>> mmc0: sdhci: ============================================
>>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>>
>>> I tried to reduce the clock speed to 20MHz in the devicetree with
>>> max-frequency = <20000000>;
>>> but the problem then also occurs.
>>>
>>> Is this a possible bug?
>>>
>>>
> Hi Ajay,
> Thanks for the answer.
>
>> The bus error seems to be specific to the host during the SDIO transfer.
>> How long does it take to reproduce it? Does the issue also happen
>> without "-P 16" iPerf3 option?
> It takes about 10s (something a bit longer) till I got this kernel error
> messages and it doesn't matter if I use it with "-P 16" or without.


I did not observe the issue with my setup(SAMA5D4 XPLAINED + WILC1000
SDIO) when tested iPerf for a longer duration(~1000sec). I suspect the
issue could be related to the SDHCI host controller.
Try to debug the host controller side for the possible cause of timeout.


Regards,
Ajay

2022-02-11 12:51:10

by Marek Vasut

[permalink] [raw]
Subject: Re: Possible bug on wilc1000 [Klartext]

On 2/11/22 11:15, [email protected] wrote:
> On 10/02/22 21:55, Marek Vasut wrote:
>>
>> On 2/10/22 17:19, [email protected] wrote:
>>
>> Hi,
>>
>>> On 10/02/22 14:10, Christoph Niedermaier wrote:
>>>> From: [email protected] [mailto:[email protected]]
>>>> Sent: Wednesday, February 9, 2022 3:37 PM
>>>>> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the
>>>>>> firmware v15.4.1
>>>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
>>>>>>
>>>>>> on an i.MX6 QUAD with iperf3:
>>>>>>
>>>>>> # iperf3 -c IP_ADDR -P 16 -t 0
>>>>>>
>>>>>> After a while the test gets stuck and I got the following kernel
>>>>>> messages:
>>>>>> mmc0: Timeout waiting for hardware interrupt.
>>>>>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>>>> mmc0: sdhci: Sys addr:  0x138f0200 | Version:  0x00000002
>>>>>> mmc0: sdhci: Blk size:  0x00000158 | Blk cnt:  0x00000001
>>>>>> mmc0: sdhci: Argument:  0x14000158 | Trn mode: 0x00000013
>>>>>> mmc0: sdhci: Present:   0x01d88a0a | Host ctl: 0x00000013
>>>>>> mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
>>>>>> mmc0: sdhci: Wake-up:   0x00000008 | Clock:    0x0000009f
>>>>>> mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
>>>>>> mmc0: sdhci: Int enab:  0x107f100b | Sig enab: 0x107f100b
>>>>>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>>>>>> mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x0000a000
>>>>>> mmc0: sdhci: Cmd:       0x0000353a | Max curr: 0x00ffffff
>>>>>> mmc0: sdhci: Resp[0]:   0x00001000 | Resp[1]:  0x00000000
>>>>>> mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>>>>>> mmc0: sdhci: Host ctl2: 0x00000000
>>>>>> mmc0: sdhci: ADMA Err:  0x00000007 | ADMA Ptr: 0x4c041200
>>>>>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP
>>>>>> =========
>>>>>> mmc0: sdhci-esdhc-imx: cmd debug status:  0x2100
>>>>>> mmc0: sdhci-esdhc-imx: data debug status:  0x2200
>>>>>> mmc0: sdhci-esdhc-imx: trans debug status:  0x2300
>>>>>> mmc0: sdhci-esdhc-imx: dma debug status:  0x2402
>>>>>> mmc0: sdhci-esdhc-imx: adma debug status:  0x25b4
>>>>>> mmc0: sdhci-esdhc-imx: fifo debug status:  0x2610
>>>>>> mmc0: sdhci-esdhc-imx: async fifo debug status:  0x2751
>>>>>> mmc0: sdhci: ============================================
>>>>>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>>>>>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>>>>>
>>>>>> I tried to reduce the clock speed to 20MHz in the devicetree with
>>>>>> max-frequency = <20000000>;
>>>>>> but the problem then also occurs.
>>>>>>
>>>>>> Is this a possible bug?
>>>>>>
>>>>>>
>>>> Hi Ajay,
>>>> Thanks for the answer.
>>>>
>>>>> The bus error seems to be specific to the host during the SDIO
>>>>> transfer.
>>>>> How long does it take to reproduce it? Does the issue also happen
>>>>> without "-P 16" iPerf3 option?
>>>> It takes about 10s (something a bit longer) till I got this kernel
>>>> error
>>>> messages and it doesn't matter if I use it with "-P 16" or without.
>>>
>>>
>>> I did not observe the issue with my setup(SAMA5D4 XPLAINED + WILC1000
>>> SDIO) when tested iPerf for a longer duration(~1000sec). I suspect the
>>> issue could be related to the SDHCI host controller.
>>> Try to debug the host controller side for the possible cause of timeout.
>>
>> It seems the timeout happens because the card fails to respond to SDIO
>> command 53, right ?
>>
>
> Yes, the timeout could be for any reason like either the CMD53 has not
> reached to chip or response not received correctly at host end.

The problem happens seconds or tens of seconds into the test, so there
must've been CMD53 which reached the card before the problem occurred,
and there must have been a lot of those CMD53 before the problem
happened too, since CMD53 seems to be some data transfer CMD ?

>> Is there some error logging/tracing functionality in the WILC1000
>> firmware which can provide further information why the card did not
>> respond ?
>
>
> WILC1000 SD module has UART serial debug port for firmware logs but I
> don't think it would be useful here because it needs to be debug/probe
> at SDIO bus level.

Is there some other kind of logging which can tell us more details on
where to look for this problem ?

Maybe we can try monitoring the SDIO traffic with ftrace ?

Any other options, short of taking the hardware apart ?

>> Could it be the card suffered some sort of FIFO overflow ? The MX6Q is a
>> bit more performant than the CA7 (I think?) SAMA5D4, so maybe that plays
>> some role ?
>
> As I understand, the issue is observed with basic iPerf testing(less
> throughput) so not sure if the host performance will have such an
> impact. IIRC few of the customers are using the same host(i.MX6) though
> I am not sure if it's over SPI or SDIO bus. Till now, I have not come
> across such limitations with the specific host.

That iperf -P 16 hammers the chip with a lot of short packets, the
problem does not occur during iperf3 -P 1 run or UDP iperf3 run (that's
the one with low traffic). Here the interface is saturated, that's why I
speculate some sort of FIFO overrun is happening.

I have also noticed there are some wilc1000 downstream drivers with huge
stacks of patches, but I never really figured out whether those are
still relevant or whether the upstream wilc1000 driver is perfectly
fine. I would like to believe it is the later, is it ?

2022-02-11 14:56:07

by Marek Vasut

[permalink] [raw]
Subject: Re: Possible bug on wilc1000 [Klartext]

On 2/11/22 12:23, [email protected] wrote:
> On 11/02/22 15:59, Marek Vasut wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know
>> the content is safe
>>
>> On 2/11/22 11:15, [email protected] wrote:
>>> On 10/02/22 21:55, Marek Vasut wrote:
>>>>
>>>> On 2/10/22 17:19, [email protected] wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On 10/02/22 14:10, Christoph Niedermaier wrote:
>>>>>> From: [email protected] [mailto:[email protected]]
>>>>>> Sent: Wednesday, February 9, 2022 3:37 PM
>>>>>>> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the
>>>>>>>> firmware v15.4.1
>>>>>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
>>>>>>>>
>>>>>>>>
>>>>>>>> on an i.MX6 QUAD with iperf3:
>>>>>>>>
>>>>>>>> # iperf3 -c IP_ADDR -P 16 -t 0
>>>>>>>>
>>>>>>>> After a while the test gets stuck and I got the following kernel
>>>>>>>> messages:
>>>>>>>> mmc0: Timeout waiting for hardware interrupt.
>>>>>>>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>>>>>> mmc0: sdhci: Sys addr:  0x138f0200 | Version: 0x00000002
>>>>>>>> mmc0: sdhci: Blk size:  0x00000158 | Blk cnt: 0x00000001
>>>>>>>> mmc0: sdhci: Argument:  0x14000158 | Trn mode: 0x00000013
>>>>>>>> mmc0: sdhci: Present:   0x01d88a0a | Host ctl: 0x00000013
>>>>>>>> mmc0: sdhci: Power:     0x00000002 | Blk gap: 0x00000080
>>>>>>>> mmc0: sdhci: Wake-up:   0x00000008 | Clock: 0x0000009f
>>>>>>>> mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
>>>>>>>> mmc0: sdhci: Int enab:  0x107f100b | Sig enab: 0x107f100b
>>>>>>>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>>>>>>>> mmc0: sdhci: Caps:      0x07eb0000 | Caps_1: 0x0000a000
>>>>>>>> mmc0: sdhci: Cmd:       0x0000353a | Max curr: 0x00ffffff
>>>>>>>> mmc0: sdhci: Resp[0]:   0x00001000 | Resp[1]: 0x00000000
>>>>>>>> mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]: 0x00000000
>>>>>>>> mmc0: sdhci: Host ctl2: 0x00000000
>>>>>>>> mmc0: sdhci: ADMA Err:  0x00000007 | ADMA Ptr: 0x4c041200
>>>>>>>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP
>>>>>>>> =========
>>>>>>>> mmc0: sdhci-esdhc-imx: cmd debug status:  0x2100
>>>>>>>> mmc0: sdhci-esdhc-imx: data debug status:  0x2200
>>>>>>>> mmc0: sdhci-esdhc-imx: trans debug status:  0x2300
>>>>>>>> mmc0: sdhci-esdhc-imx: dma debug status:  0x2402
>>>>>>>> mmc0: sdhci-esdhc-imx: adma debug status:  0x25b4
>>>>>>>> mmc0: sdhci-esdhc-imx: fifo debug status:  0x2610
>>>>>>>> mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2751
>>>>>>>> mmc0: sdhci: ============================================
>>>>>>>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>>>>>>>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>>>>>>>
>>>>>>>> I tried to reduce the clock speed to 20MHz in the devicetree with
>>>>>>>> max-frequency = <20000000>;
>>>>>>>> but the problem then also occurs.
>>>>>>>>
>>>>>>>> Is this a possible bug?
>>>>>>>>
>>>>>>>>
>>>>>> Hi Ajay,
>>>>>> Thanks for the answer.
>>>>>>
>>>>>>> The bus error seems to be specific to the host during the SDIO
>>>>>>> transfer.
>>>>>>> How long does it take to reproduce it? Does the issue also happen
>>>>>>> without "-P 16" iPerf3 option?
>>>>>> It takes about 10s (something a bit longer) till I got this kernel
>>>>>> error
>>>>>> messages and it doesn't matter if I use it with "-P 16" or without.
>>>>>
>>>>>
>>>>> I did not observe the issue with my setup(SAMA5D4 XPLAINED + WILC1000
>>>>> SDIO) when tested iPerf for a longer duration(~1000sec). I suspect the
>>>>> issue could be related to the SDHCI host controller.
>>>>> Try to debug the host controller side for the possible cause of
>>>>> timeout.
>>>>
>>>> It seems the timeout happens because the card fails to respond to SDIO
>>>> command 53, right ?
>>>>
>>>
>>> Yes, the timeout could be for any reason like either the CMD53 has not
>>> reached to chip or response not received correctly at host end.
>>
>> The problem happens seconds or tens of seconds into the test, so there
>> must've been CMD53 which reached the card before the problem occurred,
>> and there must have been a lot of those CMD53 before the problem
>> happened too, since CMD53 seems to be some data transfer CMD ?
>>
>>>> Is there some error logging/tracing functionality in the WILC1000
>>>> firmware which can provide further information why the card did not
>>>> respond ?
>>>
>>>
>>> WILC1000 SD module has UART serial debug port for firmware logs but I
>>> don't think it would be useful here because it needs to be debug/probe
>>> at SDIO bus level.
>>
>> Is there some other kind of logging which can tell us more details on
>> where to look for this problem ?
>>
>> Maybe we can try monitoring the SDIO traffic with ftrace ?
>>
>> Any other options, short of taking the hardware apart ?
>>
>>>> Could it be the card suffered some sort of FIFO overflow ? The MX6Q
>>>> is a
>>>> bit more performant than the CA7 (I think?) SAMA5D4, so maybe that
>>>> plays
>>>> some role ?
>>>
>>> As I understand, the issue is observed with basic iPerf testing(less
>>> throughput) so not sure if the host performance will have such an
>>> impact. IIRC few of the customers are using the same host(i.MX6) though
>>> I am not sure if it's over SPI or SDIO bus. Till now, I have not come
>>> across such limitations with the specific host.
>>
>> That iperf -P 16 hammers the chip with a lot of short packets, the
>> problem does not occur during iperf3 -P 1 run or UDP iperf3 run (that's
>> the one with low traffic). Here the interface is saturated, that's why I
>> speculate some sort of FIFO overrun is happening.
>>
>
> But earlier it was mentioned that the problem doesn't matter with or
> without "-P" option. So it seems the issue happens during stress test.

I missed that one. So apparently multiple parallel streams only make the
issue happen quicker.

>> I have also noticed there are some wilc1000 downstream drivers with huge
>> stacks of patches, but I never really figured out whether those are
>> still relevant or whether the upstream wilc1000 driver is perfectly
>> fine. I would like to believe it is the later, is it ?
>
> Yes, wilc1000 mainline driver has all the required changes except few
> minor bug fixes and new features but none related to this scenario.

All right, so we stick with upstream, thanks.

2022-02-11 18:17:00

by Ajay Singh

[permalink] [raw]
Subject: Re: Possible bug on wilc1000 [Klartext]

On 10/02/22 21:55, Marek Vasut wrote:
>
> On 2/10/22 17:19, [email protected] wrote:
>
> Hi,
>
>> On 10/02/22 14:10, Christoph Niedermaier wrote:
>>> From: [email protected] [mailto:[email protected]]
>>> Sent: Wednesday, February 9, 2022 3:37 PM
>>>> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>>>> Hello,
>>>>>
>>>>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the
>>>>> firmware v15.4.1
>>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
>>>>>
>>>>> on an i.MX6 QUAD with iperf3:
>>>>>
>>>>> # iperf3 -c IP_ADDR -P 16 -t 0
>>>>>
>>>>> After a while the test gets stuck and I got the following kernel
>>>>> messages:
>>>>> mmc0: Timeout waiting for hardware interrupt.
>>>>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>>> mmc0: sdhci: Sys addr:  0x138f0200 | Version:  0x00000002
>>>>> mmc0: sdhci: Blk size:  0x00000158 | Blk cnt:  0x00000001
>>>>> mmc0: sdhci: Argument:  0x14000158 | Trn mode: 0x00000013
>>>>> mmc0: sdhci: Present:   0x01d88a0a | Host ctl: 0x00000013
>>>>> mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
>>>>> mmc0: sdhci: Wake-up:   0x00000008 | Clock:    0x0000009f
>>>>> mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
>>>>> mmc0: sdhci: Int enab:  0x107f100b | Sig enab: 0x107f100b
>>>>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>>>>> mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x0000a000
>>>>> mmc0: sdhci: Cmd:       0x0000353a | Max curr: 0x00ffffff
>>>>> mmc0: sdhci: Resp[0]:   0x00001000 | Resp[1]:  0x00000000
>>>>> mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>>>>> mmc0: sdhci: Host ctl2: 0x00000000
>>>>> mmc0: sdhci: ADMA Err:  0x00000007 | ADMA Ptr: 0x4c041200
>>>>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP
>>>>> =========
>>>>> mmc0: sdhci-esdhc-imx: cmd debug status:  0x2100
>>>>> mmc0: sdhci-esdhc-imx: data debug status:  0x2200
>>>>> mmc0: sdhci-esdhc-imx: trans debug status:  0x2300
>>>>> mmc0: sdhci-esdhc-imx: dma debug status:  0x2402
>>>>> mmc0: sdhci-esdhc-imx: adma debug status:  0x25b4
>>>>> mmc0: sdhci-esdhc-imx: fifo debug status:  0x2610
>>>>> mmc0: sdhci-esdhc-imx: async fifo debug status:  0x2751
>>>>> mmc0: sdhci: ============================================
>>>>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>>>>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>>>>
>>>>> I tried to reduce the clock speed to 20MHz in the devicetree with
>>>>> max-frequency = <20000000>;
>>>>> but the problem then also occurs.
>>>>>
>>>>> Is this a possible bug?
>>>>>
>>>>>
>>> Hi Ajay,
>>> Thanks for the answer.
>>>
>>>> The bus error seems to be specific to the host during the SDIO
>>>> transfer.
>>>> How long does it take to reproduce it? Does the issue also happen
>>>> without "-P 16" iPerf3 option?
>>> It takes about 10s (something a bit longer) till I got this kernel
>>> error
>>> messages and it doesn't matter if I use it with "-P 16" or without.
>>
>>
>> I did not observe the issue with my setup(SAMA5D4 XPLAINED + WILC1000
>> SDIO) when tested iPerf for a longer duration(~1000sec). I suspect the
>> issue could be related to the SDHCI host controller.
>> Try to debug the host controller side for the possible cause of timeout.
>
> It seems the timeout happens because the card fails to respond to SDIO
> command 53, right ?
>

Yes, the timeout could be for any reason like either the CMD53 has not
reached to chip or response not received correctly at host end.

> Is there some error logging/tracing functionality in the WILC1000
> firmware which can provide further information why the card did not
> respond ?


WILC1000 SD module has UART serial debug port for firmware logs but I
don't think it would be useful here because it needs to be debug/probe
at SDIO bus level.

> Could it be the card suffered some sort of FIFO overflow ? The MX6Q is a
> bit more performant than the CA7 (I think?) SAMA5D4, so maybe that plays
> some role ?

As I understand, the issue is observed with basic iPerf testing(less
throughput) so not sure if the host performance will have such an
impact. IIRC few of the customers are using the same host(i.MX6) though
I am not sure if it's over SPI or SDIO bus. Till now, I have not come
across such limitations with the specific host.

Regards,
Ajay

2022-02-14 05:38:39

by Ajay Singh

[permalink] [raw]
Subject: Re: Possible bug on wilc1000 [Klartext]

On 11/02/22 15:59, Marek Vasut wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know
> the content is safe
>
> On 2/11/22 11:15, [email protected] wrote:
>> On 10/02/22 21:55, Marek Vasut wrote:
>>>
>>> On 2/10/22 17:19, [email protected] wrote:
>>>
>>> Hi,
>>>
>>>> On 10/02/22 14:10, Christoph Niedermaier wrote:
>>>>> From: [email protected] [mailto:[email protected]]
>>>>> Sent: Wednesday, February 9, 2022 3:37 PM
>>>>>> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the
>>>>>>> firmware v15.4.1
>>>>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin)
>>>>>>>
>>>>>>>
>>>>>>> on an i.MX6 QUAD with iperf3:
>>>>>>>
>>>>>>> # iperf3 -c IP_ADDR -P 16 -t 0
>>>>>>>
>>>>>>> After a while the test gets stuck and I got the following kernel
>>>>>>> messages:
>>>>>>> mmc0: Timeout waiting for hardware interrupt.
>>>>>>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>>>>> mmc0: sdhci: Sys addr:  0x138f0200 | Version: 0x00000002
>>>>>>> mmc0: sdhci: Blk size:  0x00000158 | Blk cnt: 0x00000001
>>>>>>> mmc0: sdhci: Argument:  0x14000158 | Trn mode: 0x00000013
>>>>>>> mmc0: sdhci: Present:   0x01d88a0a | Host ctl: 0x00000013
>>>>>>> mmc0: sdhci: Power:     0x00000002 | Blk gap: 0x00000080
>>>>>>> mmc0: sdhci: Wake-up:   0x00000008 | Clock: 0x0000009f
>>>>>>> mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
>>>>>>> mmc0: sdhci: Int enab:  0x107f100b | Sig enab: 0x107f100b
>>>>>>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>>>>>>> mmc0: sdhci: Caps:      0x07eb0000 | Caps_1: 0x0000a000
>>>>>>> mmc0: sdhci: Cmd:       0x0000353a | Max curr: 0x00ffffff
>>>>>>> mmc0: sdhci: Resp[0]:   0x00001000 | Resp[1]: 0x00000000
>>>>>>> mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]: 0x00000000
>>>>>>> mmc0: sdhci: Host ctl2: 0x00000000
>>>>>>> mmc0: sdhci: ADMA Err:  0x00000007 | ADMA Ptr: 0x4c041200
>>>>>>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP
>>>>>>> =========
>>>>>>> mmc0: sdhci-esdhc-imx: cmd debug status:  0x2100
>>>>>>> mmc0: sdhci-esdhc-imx: data debug status:  0x2200
>>>>>>> mmc0: sdhci-esdhc-imx: trans debug status:  0x2300
>>>>>>> mmc0: sdhci-esdhc-imx: dma debug status:  0x2402
>>>>>>> mmc0: sdhci-esdhc-imx: adma debug status:  0x25b4
>>>>>>> mmc0: sdhci-esdhc-imx: fifo debug status:  0x2610
>>>>>>> mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2751
>>>>>>> mmc0: sdhci: ============================================
>>>>>>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>>>>>>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>>>>>>
>>>>>>> I tried to reduce the clock speed to 20MHz in the devicetree with
>>>>>>> max-frequency = <20000000>;
>>>>>>> but the problem then also occurs.
>>>>>>>
>>>>>>> Is this a possible bug?
>>>>>>>
>>>>>>>
>>>>> Hi Ajay,
>>>>> Thanks for the answer.
>>>>>
>>>>>> The bus error seems to be specific to the host during the SDIO
>>>>>> transfer.
>>>>>> How long does it take to reproduce it? Does the issue also happen
>>>>>> without "-P 16" iPerf3 option?
>>>>> It takes about 10s (something a bit longer) till I got this kernel
>>>>> error
>>>>> messages and it doesn't matter if I use it with "-P 16" or without.
>>>>
>>>>
>>>> I did not observe the issue with my setup(SAMA5D4 XPLAINED + WILC1000
>>>> SDIO) when tested iPerf for a longer duration(~1000sec). I suspect the
>>>> issue could be related to the SDHCI host controller.
>>>> Try to debug the host controller side for the possible cause of
>>>> timeout.
>>>
>>> It seems the timeout happens because the card fails to respond to SDIO
>>> command 53, right ?
>>>
>>
>> Yes, the timeout could be for any reason like either the CMD53 has not
>> reached to chip or response not received correctly at host end.
>
> The problem happens seconds or tens of seconds into the test, so there
> must've been CMD53 which reached the card before the problem occurred,
> and there must have been a lot of those CMD53 before the problem
> happened too, since CMD53 seems to be some data transfer CMD ?
>
>>> Is there some error logging/tracing functionality in the WILC1000
>>> firmware which can provide further information why the card did not
>>> respond ?
>>
>>
>> WILC1000 SD module has UART serial debug port for firmware logs but I
>> don't think it would be useful here because it needs to be debug/probe
>> at SDIO bus level.
>
> Is there some other kind of logging which can tell us more details on
> where to look for this problem ?
>
> Maybe we can try monitoring the SDIO traffic with ftrace ?
>
> Any other options, short of taking the hardware apart ?
>
>>> Could it be the card suffered some sort of FIFO overflow ? The MX6Q
>>> is a
>>> bit more performant than the CA7 (I think?) SAMA5D4, so maybe that
>>> plays
>>> some role ?
>>
>> As I understand, the issue is observed with basic iPerf testing(less
>> throughput) so not sure if the host performance will have such an
>> impact. IIRC few of the customers are using the same host(i.MX6) though
>> I am not sure if it's over SPI or SDIO bus. Till now, I have not come
>> across such limitations with the specific host.
>
> That iperf -P 16 hammers the chip with a lot of short packets, the
> problem does not occur during iperf3 -P 1 run or UDP iperf3 run (that's
> the one with low traffic). Here the interface is saturated, that's why I
> speculate some sort of FIFO overrun is happening.
>

But earlier it was mentioned that the problem doesn't matter with or
without "-P" option. So it seems the issue happens during stress test.

> I have also noticed there are some wilc1000 downstream drivers with huge
> stacks of patches, but I never really figured out whether those are
> still relevant or whether the upstream wilc1000 driver is perfectly
> fine. I would like to believe it is the later, is it ?

Yes, wilc1000 mainline driver has all the required changes except few
minor bug fixes and new features but none related to this scenario.


Regards,
Ajay