2013-03-12 15:35:55

by Mark Jackson

[permalink] [raw]
Subject: Excessive ethernet interrupts on AM335x board

I'm just fighting an issue with ethernet on our custom AM335x board:-

# uname -a
Linux nanobone 3.9.0-rc2-00113-gd60f039 #139 Tue Mar 12 15:14:01 GMT 2013 armv7l GNU/Linux

Every now and then, the whole unit slows to a crawl. The only indication of any problem is:-

(a) the serial tty port becomes much less responsive
(b) normal ping times jump from 1ms to >10sec (sometimes >20sec !!)
(c) the ethernet interrupt count rockets (see below)

I've tried to force the problem by flood pinging from my PC.

# while true
> do grep "58:" /proc/interrupts; sleep 10
> done
58: 1291 INTC 4a100000.ethernet <<< normal pinging (about 100 irqs per 10sec)
58: 1333 INTC 4a100000.ethernet
58: 1372 INTC 4a100000.ethernet
58: 3979 INTC 4a100000.ethernet <<< start flood ping (about 4k irqs per 10sec)
58: 6540 INTC 4a100000.ethernet
58: 17519 INTC 4a100000.ethernet <<< big jump >>>
58: 20169 INTC 4a100000.ethernet
58: 22775 INTC 4a100000.ethernet
58: 25368 INTC 4a100000.ethernet
58: 34598 INTC 4a100000.ethernet <<< big jump >>>
58: 37182 INTC 4a100000.ethernet
58: 39730 INTC 4a100000.ethernet
58: 141220 INTC 4a100000.ethernet <<< whoa !!! >>>
58: 146080 INTC 4a100000.ethernet
58: 149351 INTC 4a100000.ethernet
58: 152922 INTC 4a100000.ethernet
58: 156420 INTC 4a100000.ethernet
58: 159538 INTC 4a100000.ethernet
58: 162711 INTC 4a100000.ethernet
58: 165746 INTC 4a100000.ethernet
58: 168973 INTC 4a100000.ethernet
58: 172128 INTC 4a100000.ethernet
58: 175030 INTC 4a100000.ethernet
58: 177957 INTC 4a100000.ethernet
58: 180782 INTC 4a100000.ethernet
58: 183618 INTC 4a100000.ethernet
58: 186450 INTC 4a100000.ethernet
58: 189242 INTC 4a100000.ethernet
58: 191909 INTC 4a100000.ethernet
58: 194565 INTC 4a100000.ethernet
58: 197153 INTC 4a100000.ethernet
58: 199730 INTC 4a100000.ethernet <<< another big jump >>>
58: 252629 INTC 4a100000.ethernet
58: 262955 INTC 4a100000.ethernet
58: 265557 INTC 4a100000.ethernet
58: 268131 INTC 4a100000.ethernet
58: 272586 INTC 4a100000.ethernet
58: 275623 INTC 4a100000.ethernet <<< here I stop flood pings >>>
[ 631.727758] nfs: server 10.0.0.100 not responding, still trying
[ 638.738864] nfs: server 10.0.0.100 OK
58: 277694 INTC 4a100000.ethernet
58: 277703 INTC 4a100000.ethernet
58: 277709 INTC 4a100000.ethernet
58: 277719 INTC 4a100000.ethernet
58: 277725 INTC 4a100000.ethernet
58: 277734 INTC 4a100000.ethernet
58: 277745 INTC 4a100000.ethernet

As you can see, when I stop the flood pings, the nfs link is now reported
as being lost.

Any ideas ?

Cheers
Mark J.


2013-03-12 15:54:57

by Mark Jackson

[permalink] [raw]
Subject: Re: Excessive ethernet interrupts on AM335x board

On 12/03/13 15:35, Mark Jackson wrote:
> I'm just fighting an issue with ethernet on our custom AM335x board:-
>
> # uname -a
> Linux nanobone 3.9.0-rc2-00113-gd60f039 #139 Tue Mar 12 15:14:01 GMT 2013 armv7l GNU/Linux
>
> Every now and then, the whole unit slows to a crawl. The only indication of any problem is:-
>
> (a) the serial tty port becomes much less responsive
> (b) normal ping times jump from 1ms to >10sec (sometimes >20sec !!)
> (c) the ethernet interrupt count rockets (see below)
>
> I've tried to force the problem by flood pinging from my PC.
>
> # while true
>> do grep "58:" /proc/interrupts; sleep 10
>> done
> 58: 1291 INTC 4a100000.ethernet <<< normal pinging (about 100 irqs per 10sec)
> 58: 1333 INTC 4a100000.ethernet
> 58: 1372 INTC 4a100000.ethernet
> 58: 3979 INTC 4a100000.ethernet <<< start flood ping (about 4k irqs per 10sec)
> 58: 6540 INTC 4a100000.ethernet
> 58: 17519 INTC 4a100000.ethernet <<< big jump >>>
> 58: 20169 INTC 4a100000.ethernet
> 58: 22775 INTC 4a100000.ethernet
> 58: 25368 INTC 4a100000.ethernet
> 58: 34598 INTC 4a100000.ethernet <<< big jump >>>
> 58: 37182 INTC 4a100000.ethernet
> 58: 39730 INTC 4a100000.ethernet
> 58: 141220 INTC 4a100000.ethernet <<< whoa !!! >>>
> 58: 146080 INTC 4a100000.ethernet

Doing the flood ping test on an old Beaglebone (running kernel 3.2.34 on an sdcard), I get:-

# while true
> do grep "94:" /proc/interrupts; sleep 10
ne
> done
94: 281353 INTC cpsw.0
94: 370782 INTC cpsw.0
94: 457537 INTC cpsw.0
94: 544876 INTC cpsw.0
94: 631795 INTC cpsw.0
94: 717747 INTC cpsw.0
94: 805974 INTC cpsw.0
94: 892961 INTC cpsw.0
94: 981490 INTC cpsw.0
94: 1070627 INTC cpsw.0
94: 1153086 INTC cpsw.0
94: 1242060 INTC cpsw.0
94: 1327734 INTC cpsw.0
94: 1413705 INTC cpsw.0
94: 1504494 INTC cpsw.0
94: 1591395 INTC cpsw.0
94: 1676769 INTC cpsw.0

So these are going up by 90k irqs per 10sec ... meaning that the AM335x
board seems to be *dropping* most of its ethernet irqs.

I'll try to get 3.9.0-rc2 on the BB and retest.

Mark J.

2013-03-13 08:44:24

by Koen Kooi

[permalink] [raw]
Subject: Re: Excessive ethernet interrupts on AM335x board


Op 12 mrt. 2013, om 16:35 heeft Mark Jackson <[email protected]> het volgende geschreven:

> I'm just fighting an issue with ethernet on our custom AM335x board:-
>
> # uname -a
> Linux nanobone 3.9.0-rc2-00113-gd60f039 #139 Tue Mar 12 15:14:01 GMT 2013 armv7l GNU/Linux
>
> Every now and then, the whole unit slows to a crawl. The only indication of any problem is:-
>
> (a) the serial tty port becomes much less responsive
> (b) normal ping times jump from 1ms to >10sec (sometimes >20sec !!)
> (c) the ethernet interrupt count rockets (see below)

You probably have PG2.x silicon, have a look at this patch: https://github.com/beagleboard/kernel/blob/3.8/patches/net/0003-cpsw-Fix-interrupt-storm-among-other-things.patch

I saw some patches going into net-next today that might address this in a different way, but I haven't tried 3.9rc on an am335x yet.-

2013-03-13 10:11:39

by Mark Jackson

[permalink] [raw]
Subject: Re: Excessive ethernet interrupts on AM335x board

On 13/03/13 08:44, Koen Kooi wrote:
>
> Op 12 mrt. 2013, om 16:35 heeft Mark Jackson <[email protected]> het volgende geschreven:
>
>> I'm just fighting an issue with ethernet on our custom AM335x board:-
>>
>> # uname -a
>> Linux nanobone 3.9.0-rc2-00113-gd60f039 #139 Tue Mar 12 15:14:01 GMT 2013 armv7l GNU/Linux
>>
>> Every now and then, the whole unit slows to a crawl. The only indication of any problem is:-
>>
>> (a) the serial tty port becomes much less responsive
>> (b) normal ping times jump from 1ms to >10sec (sometimes >20sec !!)
>> (c) the ethernet interrupt count rockets (see below)
>
> You probably have PG2.x silicon, have a look at this patch: https://github.com/beagleboard/kernel/blob/3.8/patches/net/0003-cpsw-Fix-interrupt-storm-among-other-things.patch

No, it's 1.0 ...

[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 3.9.0-rc2-00113-gd60f039-dirty (mpfj@mpfj-nanobone) (gcc version 4.5.4 (Buildroot 2012.11) ) #141 Wed Mar 13 09:14:03 GMT 2013
[ 0.000000] CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c53c7d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] Machine: Generic AM33XX (Flattened Device Tree), model: Newflow AM335x NanoBone
[ 0.000000] Memory policy: ECC disabled, Data cache writeback
[ 0.000000] CPU: All CPU(s) started in SVC mode.
[ 0.000000] AM335X ES1.0 (neon )

The patch certainly didn't fix things (and possibly made things worse i.e. my nfs root kept dropping off even more).

> I saw some patches going into net-next today that might address this in a different way, but I haven't tried 3.9rc on an am335x yet.

I might track those down and test them.

Cheers
Mark J.

2013-03-13 10:32:26

by Daniel Mack

[permalink] [raw]
Subject: Re: Excessive ethernet interrupts on AM335x board

On Tue, Mar 12, 2013 at 4:35 PM, Mark Jackson <[email protected]> wrote:
> I'm just fighting an issue with ethernet on our custom AM335x board:-
>
> # uname -a
> Linux nanobone 3.9.0-rc2-00113-gd60f039 #139 Tue Mar 12 15:14:01 GMT 2013 armv7l GNU/Linux
>
> Every now and then, the whole unit slows to a crawl. The only indication of any problem is:-
>
> (a) the serial tty port becomes much less responsive
> (b) normal ping times jump from 1ms to >10sec (sometimes >20sec !!)
> (c) the ethernet interrupt count rockets (see below)
>
> I've tried to force the problem by flood pinging from my PC.
>
> # while true
>> do grep "58:" /proc/interrupts; sleep 10
>> done
> 58: 1291 INTC 4a100000.ethernet <<< normal pinging (about 100 irqs per 10sec)
> 58: 1333 INTC 4a100000.ethernet
> 58: 1372 INTC 4a100000.ethernet
> 58: 3979 INTC 4a100000.ethernet <<< start flood ping (about 4k irqs per 10sec)
> 58: 6540 INTC 4a100000.ethernet
> 58: 17519 INTC 4a100000.ethernet <<< big jump >>>
> 58: 20169 INTC 4a100000.ethernet
> 58: 22775 INTC 4a100000.ethernet
> 58: 25368 INTC 4a100000.ethernet
> 58: 34598 INTC 4a100000.ethernet <<< big jump >>>
> 58: 37182 INTC 4a100000.ethernet
> 58: 39730 INTC 4a100000.ethernet
> 58: 141220 INTC 4a100000.ethernet <<< whoa !!! >>>
> 58: 146080 INTC 4a100000.ethernet
> 58: 149351 INTC 4a100000.ethernet
> 58: 152922 INTC 4a100000.ethernet
> 58: 156420 INTC 4a100000.ethernet
> 58: 159538 INTC 4a100000.ethernet
> 58: 162711 INTC 4a100000.ethernet
> 58: 165746 INTC 4a100000.ethernet
> 58: 168973 INTC 4a100000.ethernet
> 58: 172128 INTC 4a100000.ethernet
> 58: 175030 INTC 4a100000.ethernet
> 58: 177957 INTC 4a100000.ethernet
> 58: 180782 INTC 4a100000.ethernet
> 58: 183618 INTC 4a100000.ethernet
> 58: 186450 INTC 4a100000.ethernet
> 58: 189242 INTC 4a100000.ethernet
> 58: 191909 INTC 4a100000.ethernet
> 58: 194565 INTC 4a100000.ethernet
> 58: 197153 INTC 4a100000.ethernet
> 58: 199730 INTC 4a100000.ethernet <<< another big jump >>>
> 58: 252629 INTC 4a100000.ethernet
> 58: 262955 INTC 4a100000.ethernet
> 58: 265557 INTC 4a100000.ethernet
> 58: 268131 INTC 4a100000.ethernet
> 58: 272586 INTC 4a100000.ethernet
> 58: 275623 INTC 4a100000.ethernet <<< here I stop flood pings >>>
> [ 631.727758] nfs: server 10.0.0.100 not responding, still trying
> [ 638.738864] nfs: server 10.0.0.100 OK
> 58: 277694 INTC 4a100000.ethernet
> 58: 277703 INTC 4a100000.ethernet
> 58: 277709 INTC 4a100000.ethernet
> 58: 277719 INTC 4a100000.ethernet
> 58: 277725 INTC 4a100000.ethernet
> 58: 277734 INTC 4a100000.ethernet
> 58: 277745 INTC 4a100000.ethernet
>
> As you can see, when I stop the flood pings, the nfs link is now reported
> as being lost.

I had the same problem. Please check this patch, I'm sure it will fix you issue:

https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=d35162f89b8f00537d7b240b76d2d0e8b8d29aa0



Daniel

2013-03-13 10:37:08

by Mark Jackson

[permalink] [raw]
Subject: Re: Excessive ethernet interrupts on AM335x board

On 13/03/13 10:32, Daniel Mack wrote:
> On Tue, Mar 12, 2013 at 4:35 PM, Mark Jackson <[email protected]> wrote:
>> I'm just fighting an issue with ethernet on our custom AM335x board:-
>>
>> # uname -a
>> Linux nanobone 3.9.0-rc2-00113-gd60f039 #139 Tue Mar 12 15:14:01 GMT 2013 armv7l GNU/Linux
>>
>> Every now and then, the whole unit slows to a crawl. The only indication of any problem is:-
>>
>> (a) the serial tty port becomes much less responsive
>> (b) normal ping times jump from 1ms to >10sec (sometimes >20sec !!)
>> (c) the ethernet interrupt count rockets (see below)
>>
>> I've tried to force the problem by flood pinging from my PC.

<snip>

>> As you can see, when I stop the flood pings, the nfs link is now reported
>> as being lost.
>
> I had the same problem. Please check this patch, I'm sure it will fix you issue:
>
> https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=d35162f89b8f00537d7b240b76d2d0e8b8d29aa0

Brilliant ... that's the one !!

Cheers
Mark J.