2020-01-03 08:18:53

by Aaro Koskinen

Subject: [BISECTED, REGRESSION] OMAP3 onenand/DMA broken

Hi,

When booting v5.4 (or v5.5-rc4) on N900, the console gets flooded with:

[ 8.335754] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.365753] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.395751] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.425750] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.455749] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.485748] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.515777] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.545776] omap2-onenand 1000000.onenand: timeout waiting for DMA
[ 8.575775] omap2-onenand 1000000.onenand: timeout waiting for DMA

making the system unusable.

Bisected to:

4689d35c765c696bdf0535486a990038b242a26b is the first bad commit
commit 4689d35c765c696bdf0535486a990038b242a26b
Author: Peter Ujfalusi <[email protected]>
Date: Tue Jul 16 11:24:59 2019 +0300

dmaengine: ti: omap-dma: Improved memcpy polling support

The commit does not revert cleanly anymore. Any ideas how to fix this?

A.


2020-01-03 08:52:27

by H. Nikolaus Schaller

Subject: Re: [BISECTED, REGRESSION] OMAP3 onenand/DMA broken

Hi,

> On 03.01.2020 at 09:17, Aaro Koskinen <[email protected]> wrote:
>
> Hi,
>
> When booting v5.4 (or v5.5-rc4) on N900, the console gets flooded with:
>
> [ 8.335754] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.365753] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.395751] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.425750] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.455749] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.485748] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.515777] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.545776] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.575775] omap2-onenand 1000000.onenand: timeout waiting for DMA
>
> making the system unusable.

I can confirm that this issue exists but so far we failed to bisect
and make a proper report.

Sometimes the system boots fine and sometimes it fails.

It happens on omap3-gta04a5one.dts only, but not with omap3-gta04a4.dts
(both dm3730 but different NAND).

>
> Bisected to:
>
> 4689d35c765c696bdf0535486a990038b242a26b is the first bad commit
> commit 4689d35c765c696bdf0535486a990038b242a26b
> Author: Peter Ujfalusi <[email protected]>
> Date: Tue Jul 16 11:24:59 2019 +0300
>
> dmaengine: ti: omap-dma: Improved memcpy polling support
>
> The commit does not revert cleanly anymore. Any ideas how to fix this?
>
> A.

BR, Nikolaus

2020-01-03 17:25:23

by Aaro Koskinen

Subject: Re: [BISECTED, REGRESSION] OMAP3 onenand/DMA broken

Hi,

On Fri, Jan 03, 2020 at 09:46:58AM +0100, H. Nikolaus Schaller wrote:
> > On 03.01.2020 at 09:17, Aaro Koskinen <[email protected]> wrote:
> > When booting v5.4 (or v5.5-rc4) on N900, the console gets flooded with:
> >
> > [ 8.335754] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.365753] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.395751] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.425750] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.455749] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.485748] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.515777] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.545776] omap2-onenand 1000000.onenand: timeout waiting for DMA
> > [ 8.575775] omap2-onenand 1000000.onenand: timeout waiting for DMA
> >
> > making the system unusable.
>
> I can confirm that this issue exists but so far we failed to bisect
> and make a proper report.
>
> Sometimes the system boots fine and sometimes it fails.
>
> It happens on omap3-gta04a5one.dts only, but not with omap3-gta04a4.dts
> (both dm3730 but different NAND).

I tried three different boards (N810, N900 and N950) and it always
fails reliably.

A.

2020-01-03 18:30:11

by H. Nikolaus Schaller

Subject: Re: [BISECTED, REGRESSION] OMAP3 onenand/DMA broken

Hi Aaro,

> On 03.01.2020 at 18:23, Aaro Koskinen <[email protected]> wrote:
>
> Hi,
>
> On Fri, Jan 03, 2020 at 09:46:58AM +0100, H. Nikolaus Schaller wrote:
>>> On 03.01.2020 at 09:17, Aaro Koskinen <[email protected]> wrote:
>>> When booting v5.4 (or v5.5-rc4) on N900, the console gets flooded with:
>>>
>>> [ 8.335754] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.365753] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.395751] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.425750] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.455749] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.485748] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.515777] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.545776] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>> [ 8.575775] omap2-onenand 1000000.onenand: timeout waiting for DMA
>>>
>>> making the system unusable.
>>
>> I can confirm that this issue exists but so far we failed to bisect
>> and make a proper report.
>>
>> Sometimes the system boots fine and sometimes it fails.

Well, we boot from µSD, and the number of timeouts varies from boot to boot.
So whether we reach a login: prompt may come down to a race or to the driver
load sequence. But this is not the real bug.

>>
>> It happens on omap3-gta04a5one.dts only, but not with omap3-gta04a4.dts
>> (both dm3730 but different NAND).
>
> I tried three different boards (N810, N900 and N950) and it always
> fails reliably.

The big question is why the patch is harmful.

I tried to understand what the patch is doing (without any knowledge
about the DMA hard- or software architecture).

Basically it reorders error handling and some corner cases.
Maybe it now handles one of them differently, one that occurs only with OneNAND.

What jumped out at me is that before the patch there is a call to
omap_dma_chan_read(c, CCR) that depends only on (!c->paused && c->running),
not on the cookie status.

After that, either DMA_COMPLETE is returned, or ret is returned if txstate == 0.

With the new code the check for DMA_COMPLETE comes first and
leads directly to a return, independently of txstate.

So if we have (!c->paused && c->running) and dma_cookie_status()
returns DMA_COMPLETE, there is no longer a call to omap_dma_chan_read().

Since I do not understand what omap_dma_chan_read() is doing,
or whether (!c->paused && c->running) is relevant here,
I cannot tell whether that is harmful.

But I can imagine that reading a register may have the side effect of
resetting some bit, as with interrupt status registers.
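
The ordering difference described above can be sketched as a small toy model
in C. This is not the kernel source: every toy_* name is hypothetical, the
txstate handling is left out, and the model does not show what the driver does
with the value it reads, since that is not stated here. Where the register
access sits in the new ordering is also an assumption. The sketch only counts
how often the stand-in for omap_dma_chan_read(c, CCR) is called.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins only; none of these names exist in the kernel. */
enum toy_status { TOY_IN_PROGRESS, TOY_COMPLETE };

struct toy_chan {
	bool paused;
	bool running;
	enum toy_status cookie_status; /* what the cookie bookkeeping reports */
	uint32_t ccr;                  /* stand-in for the CCR register value */
	unsigned int ccr_reads;        /* how often the register was read */
};

/* Stand-in for omap_dma_chan_read(c, CCR): counts each access. */
static uint32_t toy_read_ccr(struct toy_chan *c)
{
	c->ccr_reads++;
	return c->ccr;
}

/* Ordering before the patch, as described in the mail. */
static enum toy_status toy_tx_status_old(struct toy_chan *c)
{
	enum toy_status ret = c->cookie_status;

	/* The read depends only on the channel state, not on ret. */
	if (!c->paused && c->running)
		(void)toy_read_ccr(c); /* use of the value is not modelled */

	return ret;
}

/* Ordering after the patch, as described in the mail. */
static enum toy_status toy_tx_status_new(struct toy_chan *c)
{
	enum toy_status ret = c->cookie_status;

	if (ret == TOY_COMPLETE)
		return ret; /* early return: the register is never touched */

	/* Assumption: any register access now happens only past this point. */
	if (!c->paused && c->running)
		(void)toy_read_ccr(c);

	return ret;
}
```

For a channel that is running, not paused, and whose cookie status is already
complete, the old ordering performs one register read and the new ordering
performs none. Whether that missing read matters on real hardware is exactly
the open question.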

I hope that Peter or Tony can respond soon.

BR and thanks,
Nikolaus



2020-01-04 07:38:55

by Peter Ujfalusi

Subject: Re: [BISECTED, REGRESSION] OMAP3 onenand/DMA broken

Hi Aaro,

On 1/3/20 10:17 AM, Aaro Koskinen wrote:
> Hi,
>
> When booting v5.4 (or v5.5-rc4) on N900, the console gets flooded with:
>
> [ 8.335754] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.365753] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.395751] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.425750] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.455749] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.485748] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.515777] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.545776] omap2-onenand 1000000.onenand: timeout waiting for DMA
> [ 8.575775] omap2-onenand 1000000.onenand: timeout waiting for DMA
>
> making the system unusable.
>
> Bisected to:
>
> 4689d35c765c696bdf0535486a990038b242a26b is the first bad commit
> commit 4689d35c765c696bdf0535486a990038b242a26b
> Author: Peter Ujfalusi <[email protected]>
> Date: Tue Jul 16 11:24:59 2019 +0300
>
> dmaengine: ti: omap-dma: Improved memcpy polling support
>
> The commit does not revert cleanly anymore. Any ideas how to fix this?

I certainly tested the memcpy via dmatest in polled and non-polled mode.

I can take a look on Tuesday at the earliest, but I have sent an (untested)
patch which should fix the issue:
https://lore.kernel.org/lkml/[email protected]/


>
> A.
>

- Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki