2004-06-23 14:31:56

by Mikael Bouillot

Subject: Forcedeth driver bug

Hi all,

I'm having trouble with the forcedeth driver in kernel version 2.6.7.
From what I can see, it seems that incoming packets sometimes get stuck
on their way in.

What happens is this: some packet enters the NIC, and for some reason,
it doesn't come out of the driver. As soon as another incoming packet
gets in, both packets are handed down by the driver.

It is usually invisible during normal TCP operation, as there are
several packets in flight and the stuck packet is soon pushed down by the
one following it. But for lockstep protocols like SMB, it is very
annoying, as it means you get "blanks" of 2 to 5 seconds during the
transfer.
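
A toy model of the symptom described above may help show why pipelined
traffic hides it while lockstep traffic does not. It is only a sketch: it
assumes a stuck packet is released exactly when the next packet arrives,
which is the observed behaviour, not a confirmed description of the driver.
The function name and the arrival times are hypothetical.

```python
def delivery_delays(arrivals_ms, stuck):
    """Return the per-packet delay (ms) between arrival at the NIC and
    delivery to the stack.

    arrivals_ms: arrival times in milliseconds, in increasing order.
    stuck: set of indices of packets that get held back.
    Assumption: a held packet is released when the next packet arrives.
    A held packet with no successor is never delivered (delay is None).
    """
    delays = []
    for i, arrived in enumerate(arrivals_ms):
        if i not in stuck:
            delays.append(0)
        elif i + 1 < len(arrivals_ms):
            delays.append(arrivals_ms[i + 1] - arrived)
        else:
            delays.append(None)
    return delays


# Pipelined traffic: the next packet is already 1 ms behind the stuck one.
pipelined = delivery_delays([0, 1, 2, 3, 4], stuck={2})

# Lockstep traffic: the sender waits for a reply, so the next packet only
# arrives after its retry timer (10 s here) has expired.
lockstep = delivery_delays([0, 1, 2, 10002], stuck={2})
```

In the pipelined case the held packet waits one millisecond; in the
lockstep case it waits for the sender's whole retry interval.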

I can reproduce this very easily with a modified version of ping. I
do a flood ping from another machine to the one with the nvnet NIC, but
I modified ping so that when a packet gets "lost", it sends a new one only
after 10 seconds instead of after 10 ms. The result is that after a couple
hundred ping-pongs at full speed, one ping gets stuck. After 10 seconds,
another ping is sent and both pongs come back.
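
For readers who want to try a similar test without patching ping, here is a
minimal sketch of a lockstep prober. It is not the modified ping used above:
it swaps ICMP for UDP against an echo service so that it runs without root,
and the helper names (start_udp_echo, lockstep_probe) are made up for this
example. The echo helper is only for trying it out on loopback; in a real
test the echo side would run on the machine with the suspect NIC.

```python
import socket
import threading
import time


def start_udp_echo(host="127.0.0.1"):
    """Start a throwaway UDP echo server on an ephemeral port.

    Returns (server_socket, (host, port)). Hypothetical helper for testing.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))

    def serve():
        while True:
            try:
                data, peer = srv.recvfrom(2048)
            except OSError:  # socket was closed
                return
            srv.sendto(data, peer)

    threading.Thread(target=serve, daemon=True).start()
    return srv, srv.getsockname()


def lockstep_probe(target, count=200, retry_after=10.0):
    """Send probes one at a time; send the next one as soon as the reply
    arrives, or after `retry_after` seconds if it does not.

    Returns a list of round-trip times in seconds, indexed by sequence
    number. A late reply is still recorded with its (long) round-trip
    time; a reply that never arrives is recorded as None.
    """
    sent_at, rtts = {}, {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sent_at[seq] = time.monotonic()
            sock.sendto(seq.to_bytes(4, "big"), target)
            deadline = sent_at[seq] + retry_after
            while seq not in rtts:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                sock.settimeout(remaining)
                try:
                    data, _ = sock.recvfrom(2048)
                except socket.timeout:
                    break
                now = time.monotonic()
                reply_seq = int.from_bytes(data[:4], "big")
                # Also catches replies to earlier probes that show up late.
                if reply_seq in sent_at and reply_seq not in rtts:
                    rtts[reply_seq] = now - sent_at[reply_seq]
    return [rtts.get(seq) for seq in range(count)]
```

A round-trip time close to `retry_after`, arriving together with the reply
to the following probe, would match the "both pongs come back" pattern.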

This didn't happen with the proprietary nvnet driver on kernel 2.4.24.
My hardware is an nForce 2 mobo (in a Shuttle SN45G barebones).

Is this a known bug? Is someone working on it already, or should I
investigate the matter further? Please CC any reply to me as I'm not on
the list.


Mikael Bouillot

--
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!


2004-06-23 14:47:00

by Martin Zwickel

Subject: Re: Forcedeth driver bug

On Wed, 23 Jun 2004 14:29:36 +0000
Mikael Bouillot <[email protected]> bubbled:

> Hi all,
>
> I'm having trouble with the forcedeth driver in kernel version 2.6.7.
> From what I can see, it seems that incoming packets sometimes get stuck
> on their way in.
>
> What happens is this: some packet enters the NIC, and for some reason,
> it doesn't come out of the driver. As soon as another incoming packet
> gets in, both packets are handed down by the driver.

Do you know for certain that the driver doesn't get the stuck packet? Or is it
possible that the kernel's network stack is at fault?

I'm asking because I have a similar problem with UDP and kernel 2.6.7-rc2-mm2.
My sendto() sometimes gets stuck and only continues once the kernel handles
another network packet.

But maybe my problem is a totally different one.

Regards,
Martin

--
MyExcuse:
YOU HAVE AN I/O ERROR -> Incompetent Operator error

Martin Zwickel <[email protected]>
Research & Development

TechnoTrend AG <http://www.technotrend.de>

2004-06-23 14:57:32

by Kalin KOZHUHAROV

Subject: Re: Forcedeth driver bug

Mikael Bouillot wrote:
> Hi all,
>
> I'm having trouble with the forcedeth driver in kernel version 2.6.7.
> From what I can see, it seems that incoming packets sometimes get stuck
> on their way in.
>
> What happens is this: some packet enters the NIC, and for some reason,
> it doesn't come out of the driver. As soon as another incoming packet
> gets in, both packets are handed down by the driver.
>
> It is usually invisible during normal TCP operation, as there are
> several packets in flight and the stuck packet is soon pushed down by the
> one following it. But for lockstep protocols like SMB, it is very
> annoying, as it means you get "blanks" of 2 to 5 seconds during the
> transfer.
>
> I can reproduce this very easily with a modified version of ping. I
> do a flood ping from another machine to the one with the nvnet NIC, but
> I modified ping so that when a packet gets "lost", it sends a new one only
> after 10 seconds instead of after 10 ms. The result is that after a couple
> hundred ping-pongs at full speed, one ping gets stuck. After 10 seconds,
> another ping is sent and both pongs come back.
>
> This didn't happen with the proprietary nvnet driver on kernel 2.4.24.
> My hardware is an nForce 2 mobo (in a Shuttle SN45G barebones).
>
> Is this a known bug? Is someone working on it already, or should I
> investigate the matter further? Please CC any reply to me as I'm not on
> the list.

Search http://groups.google.com/ or another LKML archive for "new device support for forcedeth.c"

Try the latest patch (forcedeth_gigabit_try17.txt was the last one I tested) and report back.
The driver has undergone quite a lot of patching lately.
AFAIR, a similar effect was observed while testing it, but it was way broken anyway.

Kalin.

--
||///_ o *****************************
||//'_/> WWW: http://ThinRope.net/

2004-06-23 15:58:19

by Carl-Daniel Hailfinger

Subject: Re: Forcedeth driver bug

Kalin KOZHUHAROV wrote:
> Mikael Bouillot wrote:
>
>> Hi all,
>>
>> I'm having trouble with the forcedeth driver in kernel version 2.6.7.
>> From what I can see, it seems that incoming packets sometimes get stuck
>> on their way in.
>>
>> What happens is this: some packet enters the NIC, and for some reason,
>> it doesn't come out of the driver. As soon as another incoming packet
>> gets in, both packets are handed down by the driver.
>>
>> It is usually invisible during normal TCP operation, as there are
>> several packets in flight and the stuck packet is soon pushed down by the
>> one following it. But for lockstep protocols like SMB, it is very
>> annoying, as it means you get "blanks" of 2 to 5 seconds during the
>> transfer.
>>
>> I can reproduce this very easily with a modified version of ping. I
>> do a flood ping from another machine to the one with the nvnet NIC, but
>> I modified ping so that when a packet gets "lost", it sends a new one only
>> after 10 seconds instead of after 10 ms. The result is that after a couple
>> hundred ping-pongs at full speed, one ping gets stuck. After 10 seconds,
>> another ping is sent and both pongs come back.

Are you sure both come back? If so, what does dmesg say during this time?
Is the system in question under heavy load?

Can you confirm that the ping packet got stuck in the receive path or
could the associated pong reply have gotten stuck in the send path?


>> This didn't happen with the proprietary nvnet driver on kernel 2.4.24.
>> My hardware is an nForce 2 mobo (in a Shuttle SN45G barebones).
>>
>> Is this a known bug? Is someone working on it already, or should I
>> investigate the matter further? Please CC any reply to me as I'm not on
>> the list.

It could be a weird interaction with interrupt mitigation, but I doubt it.
Nobody has ever mailed me about such problems with the driver.


> Search http://groups.google.com/ or another LKML archive for "new
> device support for forcedeth.c"
>
> Try the latest patch (forcedeth_gigabit_try17.txt was the last one I
> tested) and report back.
> The driver has undergone quite a lot of patching lately.
> AFAIR, a similar effect was observed while testing it, but it was way
> broken anyway.

forcedeth_gigabit_try19.txt is the most recent one.
Changes against try17:
- fix compilation warnings and rename the Kconfig entry

Get it at
http://www.hailfinger.org/carldani/linux/patches/forcedeth/
and please report if it fixes your problem.

Regards,
Carl-Daniel

2004-06-23 16:06:04

by Mikael Bouillot

Subject: Re: Forcedeth driver bug

> Do you know for certain that the driver doesn't get the stuck packet? Or is it
> possible that the kernel's network stack is at fault?

No, I'm not sure it's the driver's fault. I have had this problem since I
switched from nvnet to forcedeth and from 2.4.24 to 2.6.7. But I suspect
this is the driver's fault because:

* I have tried to reproduce it on another 2.6.7 machine (with a
different driver) and failed.
* Such an important bug in the network stack would hardly go unnoticed.
* The forcedeth driver is still new and somewhat untested :-)


> I'm asking because I have a similar problem with UDP and kernel 2.6.7-rc2-mm2.
> My sendto() sometimes gets stuck and only continues once the kernel handles
> another network packet.
>
> But maybe my problem is a totally different one.

In my case, it's the incoming packets that get stuck. Outgoing packets
work just fine. But then again, I can't be sure without running further
tests. I sent my message to the list mainly to find out whether this is a
well-known bug.

--
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!

2004-06-23 16:21:19

by Mikael Bouillot

Subject: Re: Forcedeth driver bug

> Are you sure both come back? If so, what does dmesg say during this time?
> Is the system in question under heavy load?
>
> Can you confirm that the ping packet got stuck in the receive path or
> could the associated pong reply have gotten stuck in the send path?

A tcpdump at the remote end shows the packet leaving, but a tcpdump at
the local side doesn't show it until the next packet arrives. I tried
this on a system running nothing but tcpdump, but the network load is
high (ping sends packets as soon as the previous reply comes back).
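
The comparison of the two captures can be automated along these lines. This
is a rough sketch under stated assumptions: each capture has already been
reduced to a mapping from a sequence number (for example the ICMP sequence
number) to a timestamp in seconds, and the two clocks are synchronised or
differ by a known offset. The function name and the sample timestamps are
hypothetical and are not taken from the captures described above.

```python
def find_held_packets(sent, received, threshold=0.5, clock_offset=0.0):
    """Find packets that left the sender long before the receiver saw them.

    sent, received: dicts mapping sequence number -> capture timestamp (s).
    clock_offset: receiver clock minus sender clock, in seconds (assumed known).
    Returns a list of (seq, delay, gap_to_next), where gap_to_next is the
    time between this packet and the next one showing up in the receiver's
    capture (None if the next packet was not captured).
    """
    held = []
    for seq in sorted(sent):
        if seq not in received:
            continue  # never seen on the receiving side
        delay = received[seq] - sent[seq] - clock_offset
        if delay <= threshold:
            continue
        nxt = received.get(seq + 1)
        gap = None if nxt is None else nxt - received[seq]
        held.append((seq, delay, gap))
    return held


# Hypothetical timestamps: packet 2 leaves at 0.001 s but is only seen by
# the receiver together with packet 3, roughly ten seconds later.
sent = {1: 0.000, 2: 0.001, 3: 10.001}
received = {1: 0.0002, 2: 10.0012, 3: 10.0012}
held = find_held_packets(sent, received)
```

A large delay combined with a gap_to_next near zero is the pattern to look
for: the packet appears locally only when the following packet arrives.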


> It could be a weird interaction with interrupt mitigation, but I doubt it.
> Nobody has ever mailed me about such problems with the driver.

Another note: I run my 2.6.7 with Local APIC and IO-APIC. Maybe that
has something to do with interrupt problems. I will try reverting to the
older PIC during further testing to see if that has an effect on things.


> forcedeth_gigabit_try19.txt is the most recent one.
> Changes against try17:
> - fix compilation warnings and rename the Kconfig entry
>
> Get it at
> http://www.hailfinger.org/carldani/linux/patches/forcedeth/
> and please report if it fixes your problem.

OK, I'll do that.


--
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!

2004-06-23 17:03:41

by Mikael Bouillot

Subject: Re: Forcedeth driver bug

> forcedeth_gigabit_try19.txt is the most recent one.

OK, I've tried forcedeth_gigabit_try19 and I still get the same
problem. The only difference is a "bad: scheduling while atomic!" in the
syslog, but I still get stuck packets.

I've also tried reverting to the older XT-PIC and again, no
improvement.

I'll now try to work my way through debugging the problem myself. I've
got limited experience with kernel hacking, but I'll learn along the way
:-) If anyone has any new information or suggestions, I would like to
hear about them.

Mikael

--
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!