2006-01-10 03:08:04

by Grant Coady

Subject: 2.4: e100 accounting bust for multiple adapters

Hi there,

While testing for a different issue on a box with two e100 NICs I noticed
that interrupt and other accounting are accumulated to the first e100 NIC.

Tested 2.4.(29|30|31|32)-hf32.1 plus 2.4.33-pre1 so the issue has been in
there for a while :(

e100 is compiled as a module; the problem is not there with 2.6.15.

The problem also goes away if I compile e100 in, tested with 2.4.32-hf32.1.

grant@deltree:~$ while true; do egrep 'eth0|eth1' /proc/interrupts; sleep 1; done
 11:       3407          XT-PIC  eth0
 12:        630          XT-PIC  eth1
 11:       3436          XT-PIC  eth0   [same test as below]
 12:        921          XT-PIC  eth1
 11:       3439          XT-PIC  eth0
 12:       1266          XT-PIC  eth1
 11:       3443          XT-PIC  eth0
 12:       1343          XT-PIC  eth1
 11:       3446          XT-PIC  eth0
 12:       1343          XT-PIC  eth1
 11:       3449          XT-PIC  eth0
 12:       1343          XT-PIC  eth1
 11:       3456          XT-PIC  eth0
 12:       1343          XT-PIC  eth1

dmesgs + configs on http://bugsplatter.mine.nu/test/boxen/deltree/

Is this a known issue?

Cheers,
Grant.

more info:
part of http://lkml.org/lkml/2006/1/9/27 -- watching interrupts on
eth1, but they're being accumulated to eth0:

grant@deltree:~$ while true; do egrep 'eth0|eth1' /proc/interrupts; sleep 1; done
 11:       9136          XT-PIC  eth0   \
 12:          4          XT-PIC  eth1   |
 11:       9516          XT-PIC  eth0    > time cat /var/log/apache/access_log
 12:          4          XT-PIC  eth1   | real    0m1.946s
 11:      10146          XT-PIC  eth0   | user    0m0.000s
 12:          4          XT-PIC  eth1   / sys     0m0.200s
 11:      10321          XT-PIC  eth0
 12:          4          XT-PIC  eth1
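
The once-per-second samples are easier to compare as per-interval deltas. Here is a minimal Python sketch, assuming each IRQ line names a single device as in the output above (a shared IRQ line would need extra handling). The function names `parse_interrupts` and `deltas` are made up for illustration, and the sample counts are copied from the eth0/eth1 lines above.

```python
import re


def parse_interrupts(text):
    """Return {device: interrupt count} from /proc/interrupts-style lines."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect at least: "11:", one count column, controller, device name.
        if len(fields) < 4 or not re.fullmatch(r"\d+:", fields[0]):
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break              # reached the controller name (e.g. XT-PIC)
            total += int(field)    # one count column per CPU
        counts[fields[-1]] = total  # assumption: last field is the only device
    return counts


def deltas(samples):
    """Return a list of {device: increase} between consecutive samples."""
    result = []
    previous = None
    for text in samples:
        current = parse_interrupts(text)
        if previous is not None:
            result.append(
                {dev: current[dev] - previous.get(dev, 0) for dev in current}
            )
        previous = current
    return result


# Counts taken from the four eth0/eth1 sample pairs shown above.
SAMPLES = [
    "11: 9136 XT-PIC eth0\n12: 4 XT-PIC eth1",
    "11: 9516 XT-PIC eth0\n12: 4 XT-PIC eth1",
    "11: 10146 XT-PIC eth0\n12: 4 XT-PIC eth1",
    "11: 10321 XT-PIC eth0\n12: 4 XT-PIC eth1",
]

if __name__ == "__main__":
    for step in deltas(SAMPLES):
        print(step)
```

With these samples, every interval shows eth0 gaining interrupts while eth1 stays flat, which is the symptom being reported.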

Odd, with 2.4, the two e100 NICs are not being accounted properly:

root@deltree:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:90:27:42:AA:77
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4840 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8825 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:341812 (333.8 Kb)  TX bytes:9931009 (9.4 Mb)
          Interrupt:11 Base address:0xdcc0 Memory:fd201000-fd201038

eth1      Link encap:Ethernet  HWaddr 00:90:27:58:32:D4
          inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:12 Base address:0xdc80 Memory:fd200000-fd200038
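
The same counters can be checked without ifconfig by reading /proc/net/dev. Here is a rough Python sketch, assuming the usual 16-column layout (eight receive fields, then eight transmit fields); that layout is assumed rather than verified for this 2.4 box. The helper names `parse_net_dev` and `silent_interfaces` are hypothetical. The sample text is not captured output: it is hand-built from the ifconfig numbers above.

```python
def parse_net_dev(text):
    """Return {iface: packet/byte counters} from /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue                      # the two header lines have no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16 or not all(f.isdigit() for f in fields[:16]):
            continue                      # not a counter line
        values = [int(f) for f in fields[:16]]
        stats[name.strip()] = {
            "rx_bytes": values[0],
            "rx_packets": values[1],
            "tx_bytes": values[8],
            "tx_packets": values[9],
        }
    return stats


def silent_interfaces(stats):
    """Return sorted names of interfaces with no packets counted at all."""
    return sorted(
        name
        for name, s in stats.items()
        if s["rx_packets"] == 0 and s["tx_packets"] == 0
    )


# Hand-built sample (an assumption), using the ifconfig counters above.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0:  341812    4840    0    0    0     0          0         0  9931009    8825    0    0    0     0       0          0
  eth1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
"""

if __name__ == "__main__":
    print(silent_interfaces(parse_net_dev(SAMPLE)))
```

On a live system the text would come from `open("/proc/net/dev").read()`. An interface that is up and carrying traffic but still appears in the silent list would match the accounting problem described here.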

dmesg says:
Intel(R) PRO/100 Network Driver - version 2.3.43-k1
Copyright (c) 2004 Intel Corporation

e100: selftest OK.
e100: eth0: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled

e100: selftest OK.
e100: eth1: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled

smc-ultra.c:v2.02 2/3/98 Donald Becker ([email protected])
eth2: SMC Ultra at 0x280, 00 00 C0 5D 46 B5,assigned IRQ 3 memory 0xd0000-0xd3fff.
e100: eth0 NIC Link is Up 100 Mbps Full duplex
e100: eth1 NIC Link is Up 100 Mbps Full duplex
- - -


2006-01-11 00:24:30

by Jesse Brandeburg

Subject: Re: 2.4: e100 accounting bust for multiple adapters

On 1/9/06, Grant Coady <[email protected]> wrote:
> Hi there,
>
> While testing for a different issue on a box with two e100 NICs I noticed
> that interrupt and other accounting are accumulated to the first e100 NIC.

are the two e100's on the same broadcast domain? if they are you
might actually be transferring all traffic on eth0

e100 doesn't track its own interrupt counts, the kernel does that for us.

Jesse

2006-01-11 01:35:29

by Grant Coady

Subject: Re: 2.4: e100 accounting bust for multiple adapters

On Tue, 10 Jan 2006 16:24:28 -0800, Jesse Brandeburg <[email protected]> wrote:

>On 1/9/06, Grant Coady <[email protected]> wrote:
>> Hi there,
>>
>> While testing for a different issue on a box with two e100 NICs I noticed
>> that interrupt and other accounting are accumulated to the first e100 NIC.
>
>are the two e100's on the same broadcast domain? if they are you
>might actually be transferring all traffic on eth0

You ignore the fact these two NICs work as expected on 2.6.15
and on 2.4.32 when e100 driver is compiled in, for the same
hardware and test.
>
>e100 doesn't track its own interrupt counts, the kernel does that for us.

What further testing would you like? Also, you ignore the all
zeroes ifconfig accounting for the second NIC, and that the
accounting was also accumulated to the first e100 along with
interrupts.

Anyway the solution is simple: modular e100 is borked on 2.4,
compiled in is okay.

Grant.

2006-01-12 04:59:57

by Jesse Brandeburg

Subject: Re: 2.4: e100 accounting bust for multiple adapters

On 1/10/06, Grant Coady <[email protected]> wrote:
> On Tue, 10 Jan 2006 16:24:28 -0800, Jesse Brandeburg <[email protected]> wrote:
>
> >On 1/9/06, Grant Coady <[email protected]> wrote:
> >> Hi there,
> >>
> >> While testing for a different issue on a box with two e100 NICs I noticed
> >> that interrupt and other accounting are accumulated to the first e100 NIC.
> >
> >are the two e100's on the same broadcast domain? if they are you
> >might actually be transferring all traffic on eth0
>
> You ignore the fact these two NICs work as expected on 2.6.15
> and on 2.4.32 when e100 driver is compiled in, for the same
> hardware and test.
> >
> >e100 doesn't track its own interrupt counts, the kernel does that for us.
>
> What further testing would you like? Also, you ignore the all
> zeroes ifconfig accounting for the second NIC, and that the
> accounting was also accumulated to the first e100 along with
> interrupts.

okay mea culpa, I guess I didn't see you say that.
It sounds like the netdev structs are somehow overlapped.

> Anyway the solution is simple: modular e100 is borked on 2.4,
> compiled in is okay.

modular e100 2.X is borked, right? if you have a moment could you try
the 3.X e100 driver from sourceforge?
(http://prdownloads.sf.net/e1000) it should work fine on 2.4 and I
haven't heard any reports of icky stats.

thanks for your testing and bug reports.

Jesse

2006-01-14 02:55:29

by Grant Coady

Subject: Re: 2.4: e100 accounting bust for multiple adapters

On Wed, 11 Jan 2006 20:59:50 -0800, Jesse Brandeburg <[email protected]> wrote:

>> Anyway the solution is simple: modular e100 is borked on 2.4,
>> compiled in is okay.
>
>modular e100 2.X is borked, right? if you have a moment could you try
>the 3.X e100 driver from sourceforge?
>(http://prdownloads.sf.net/e1000) it should work fine on 2.4 and I
>haven't heard any reports of icky stats.

Hi Jesse,

Couple of compile warnings:
grant@deltree:~/e100-3.5.10/src$ make clean; make CFLAGS_EXTRA=-DE100_NO_NAPI
rm -rf e100.o e100.o e100.o e100.o e100.o e100.7.gz .*cmd .tmp_versions
gcc -DLINUX -D__KERNEL__ -DMODULE -O2 -pipe -Wall -I/lib/modules/2.4.32-hf32.1x/build/include -I. -DMODVERSIONS -DEXPORT_SYMTAB -include /lib/modules/2.4.32-hf32.1x/build/include/linux/modversions.h -DE100_NO_NAPI -c -o e100.o e100.c
In file included from /lib/modules/2.4.32-hf32.1x/build/include/linux/spinlock.h:6,
from /lib/modules/2.4.32-hf32.1x/build/include/linux/module.h:12,
from e100.c:138:
/lib/modules/2.4.32-hf32.1x/build/include/asm/system.h: In function `__set_64bit_var':
/lib/modules/2.4.32-hf32.1x/build/include/asm/system.h:190: warning: dereferencing type-punned pointer will break strict-aliasing rules
/lib/modules/2.4.32-hf32.1x/build/include/asm/system.h:190: warning: dereferencing type-punned pointer will break strict-aliasing rules


**************************************************
** e100.o built for 2.4.32-hf32.1x
** SMP Disabled
**************************************************

I have e100-3.5.10 up now and the stats now look okay. Is this
driver update headed for 2.4 kernel inclusion?

Grant.

2006-01-14 07:50:39

by Jesse Brandeburg

Subject: Re: 2.4: e100 accounting bust for multiple adapters

On 1/13/06, Grant Coady <[email protected]> wrote:
> On Wed, 11 Jan 2006 20:59:50 -0800, Jesse Brandeburg <[email protected]> wrote:
>
> >> Anyway the solution is simple: modular e100 is borked on 2.4,
> >> compiled in is okay.
> >
> >modular e100 2.X is borked, right? if you have a moment could you try
> >the 3.X e100 driver from sourceforge?
> >(http://prdownloads.sf.net/e1000) it should work fine on 2.4 and I
> >haven't heard any reports of icky stats.
>
> Hi Jesse,
>
> Couple of compile warnings:
> grant@deltree:~/e100-3.5.10/src$ make clean; make CFLAGS_EXTRA=-DE100_NO_NAPI
> rm -rf e100.o e100.o e100.o e100.o e100.o e100.7.gz .*cmd .tmp_versions
> gcc -DLINUX -D__KERNEL__ -DMODULE -O2 -pipe -Wall -I/lib/modules/2.4.32-hf32.1x/build/include -I. -DMODVERSIONS -DEXPORT_SYMTAB -include /lib/modules/2.4.32-hf32.1x/build/include/linux/modversions.h -DE100_NO_NAPI -c -o e100.o e100.c
> In file included from /lib/modules/2.4.32-hf32.1x/build/include/linux/spinlock.h:6,
> from /lib/modules/2.4.32-hf32.1x/build/include/linux/module.h:12,
> from e100.c:138:
> /lib/modules/2.4.32-hf32.1x/build/include/asm/system.h: In function `__set_64bit_var':
> /lib/modules/2.4.32-hf32.1x/build/include/asm/system.h:190: warning: dereferencing type-punned pointer will break strict-aliasing rules
> /lib/modules/2.4.32-hf32.1x/build/include/asm/system.h:190: warning: dereferencing type-punned pointer will break strict-aliasing rules
>

I'm not sure but I don't think those warnings are from something e100
can control.

> **************************************************
> ** e100.o built for 2.4.32-hf32.1x
> ** SMP Disabled
> **************************************************
>
> I have e100-3.5.10 up now and the stats now look okay. Is this
> driver update headed for 2.4 kernel inclusion?

Most people running 2.4 seem to be okay with the old driver. We
haven't maintained it for a long time, and have moved all of our
efforts to the more maintainable (rewritten) 3.X driver. Basically
we've taken the position that 2.4 is legacy and if it ain't broke
don't fix it.

Well, you've found something that broke, and maybe someone on the list
here can help with a fix. But to answer your question, it's unlikely
that we'll attempt to rev the 2.4 driver unless something very serious
is brought up.

I'm hoping you can get along with the 3.X driver, and I'll be glad to
look into any issues that you come up with for that driver.

Jesse

2006-01-14 08:24:39

by Grant Coady

Subject: Re: 2.4: e100 accounting bust for multiple adapters

On Fri, 13 Jan 2006 23:50:38 -0800, Jesse Brandeburg <[email protected]> wrote:

>we've taken the position that 2.4 is legacy and if it ain't broke
>don't fix it.
Okay.

Had a look at the source but it is difficult to track the indirection,
I gave up ;) After all, performance-wise the old driver is okay, just
odd accounting. Annoying to know there's a missing (harmless) indirection
in the module case but I don't have a clue where to look for it.


>I'm hoping you can get along with the 3.X driver, and I'll be glad to
>look into any issues that you come up with for that driver.

Minor nit: lsmod shows a use count of zero with two e100-3.5.10 interfaces in use.

Grant.