2008-02-07 14:17:21

by Pavel Machek

Subject: e1000 1sec latency problem

Hi!

I have the famous e1000 latency problems:

64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms

...and they are still there in 2.6.25-git0. I had ethernet EEPROM
checksum problems, which I fixed by the update, but problems are not
gone.

irqpoll helps.

nosmp (which implies XT-PIC is being used) does not help.

16: 1925 0 IO-APIC-fasteoi ahci, yenta, uhci_hcd:usb2, eth0

Booting kernel with nosmp/ no yenta, no usb does not help.

Hmm, as expected, interrupt load on ahci (find /) makes latencies go
away.

It should be easily reproducible on x60 with latest bios, it is 100%
reproducible for me...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
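
One rough way to quantify spikes in ping output like the above is to pull out the `time=` values and flag the ones over a threshold. The sketch below is only an illustration: the name `find_spikes` and the 100 ms default threshold are arbitrary choices made here, not anything taken from the thread.

```python
import re

# Matches the per-reply lines printed by ping, for example:
# "64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms"
REPLY_RE = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def find_spikes(ping_output, threshold_ms=100.0):
    """Return (icmp_seq, rtt_ms) pairs whose round-trip time exceeds threshold_ms.

    threshold_ms is a hypothetical cut-off; pick one that suits the link.
    """
    spikes = []
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match is None:
            continue  # not a reply line (header, statistics, blank, ...)
        seq, rtt = int(match.group(1)), float(match.group(2))
        if rtt > threshold_ms:
            spikes.append((seq, rtt))
    return spikes


# The replies quoted in the message above.
SAMPLE = """\
64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
"""

if __name__ == "__main__":
    for seq, rtt in find_spikes(SAMPLE):
        print(f"icmp_seq={seq}: {rtt} ms")
```

Run against the sample, this flags icmp_seq 68 through 72 and leaves the two fast replies alone.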


2008-02-07 16:58:54

by Max Krasnyansky

Subject: Re: e1000 1sec latency problem

Pavel Machek wrote:
> Hi!
>
> I have the famous e1000 latency problems:
>
> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>
> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> checksum problems, which I fixed by the update, but problems are not
> gone.
>
> irqpoll helps.
>
> nosmp (which implies XT-PIC is being used) does not help.
>
> 16: 1925 0 IO-APIC-fasteoi ahci, yenta, uhci_hcd:usb2, eth0
>
> Booting kernel with nosmp/ no yenta, no usb does not help.
>
> Hmm, as expected, interrupt load on ahci (find /) makes latencies go
> away.
>
> It should be easily reproducible on x60 with latest bios, it is 100%
> reproducible for me...

So you don't think it's related to the interrupt coalescing by any chance ?
I'd suggest to try and disable the coalescing and see if it makes any difference.
We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.

Add this to modprobe.conf and reload e1000 module

options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0

Max
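
As far as I understand the e1000 module parameters, each comma-separated value applies to one adapter, in the order the driver detects them, so `0,0` above covers a machine with two ports. The sketch below builds the same line for any port count. The parameter names are the ones from the message; the helper name `no_coalescing_options` is made up for illustration.

```python
# Parameter names copied from the options line in the message above.
MODERATION_PARAMS = (
    "RxIntDelay",
    "RxAbsIntDelay",
    "InterruptThrottleRate",
    "TxIntDelay",
    "TxAbsIntDelay",
)


def no_coalescing_options(num_ports):
    """Build a modprobe.conf line that zeroes every moderation parameter.

    num_ports is the number of e1000 adapters in the machine; one value
    per adapter is emitted for each parameter (an assumption about how
    the driver reads these arrays, see the lead-in).
    """
    if num_ports < 1:
        raise ValueError("num_ports must be at least 1")
    zeros = ",".join(["0"] * num_ports)
    return "options e1000 " + " ".join(
        f"{name}={zeros}" for name in MODERATION_PARAMS
    )


if __name__ == "__main__":
    print(no_coalescing_options(2))
```

With `num_ports=2` this reproduces the line quoted above. The module still has to be reloaded for the options to take effect.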

2008-02-07 17:28:26

by Kok, Auke

Subject: Re: [E1000-devel] e1000 1sec latency problem

Max Krasnyansky wrote:
> Pavel Machek wrote:
>> Hi!
>>
>> I have the famous e1000 latency problems:
>>
>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>
>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>> checksum problems, which I fixed by the update, but problems are not
>> gone.
>>
>> irqpoll helps.
>>
>> nosmp (which implies XT-PIC is being used) does not help.
>>
>> 16: 1925 0 IO-APIC-fasteoi ahci, yenta, uhci_hcd:usb2, eth0
>>
>> Booting kernel with nosmp/ no yenta, no usb does not help.
>>
>> Hmm, as expected, interrupt load on ahci (find /) makes latencies go
>> away.
>>
>> It should be easily reproducible on x60 with latest bios, it is 100%
>> reproducible for me...
>
> So you don't think it's related to the interrupt coalescing by any chance ?
> I'd suggest to try and disable the coalescing and see if it makes any difference.
> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>
> Add this to modprobe.conf and reload e1000 module
>
> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0

that can't be the problem. irq moderation would only account for 2-3ms variance
maximum.

Pavel, can you send me the `lspci -vvv` of your machine with the very latest git
tree and after it's showing the poor ping performance?

Auke

2008-02-07 18:07:09

by Max Krasnyansky

Subject: Re: [E1000-devel] e1000 1sec latency problem

Kok, Auke wrote:
> Max Krasnyansky wrote:
>> So you don't think it's related to the interrupt coalescing by any chance ?
>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>
>> Add this to modprobe.conf and reload e1000 module
>>
>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>
> that can't be the problem. irq moderation would only account for 2-3ms variance
> maximum.
Oh, I've definitely seen worse than that. Not as bad as 1 second, though. Plus you're talking
about the case when coalescing logic is working as designed ;-). What if there is some kind of
bug where timer did not expire or something.

Max

2008-02-07 18:14:24

by Kok, Auke

Subject: Re: [E1000-devel] e1000 1sec latency problem

Max Krasnyansky wrote:
> Kok, Auke wrote:
>> Max Krasnyansky wrote:
>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>
>>> Add this to modprobe.conf and reload e1000 module
>>>
>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>> that can't be the problem. irq moderation would only account for 2-3ms variance
>> maximum.
> Oh, I've definitely seen worse than that. Not as bad as 1 second, though. Plus you're talking
> about the case when coalescing logic is working as designed ;-). What if there is some kind of
> bug where timer did not expire or something.

we don't use a software timer in e1000 irq coalescing/moderation, it's all in
hardware, so we don't have that problem at all. And I certainly have never seen
anything you are referring to with e1000 hardware, and I do not know of any bug
related to this.

are you maybe confused with other hardware?

feel free to demonstrate an example...

Cheers,

Auke

2008-02-07 18:22:36

by Kok, Auke

Subject: Re: [E1000-devel] e1000 1sec latency problem

Pavel Machek wrote:
> Hi!
>
> I have the famous e1000 latency problems:
>
> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>
> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> checksum problems, which I fixed by the update, but problems are not
> gone.

pavel, start using "e1000e" instead - this driver replaces e1000 for all the
pci-express devices and has the infamous L1 ASPM disable patch to fix this issue.

make sure you have CONFIG_E1000E=m/y in your .config, otherwise the old e1000 code
will drive your card, and that driver does not have the fix.

BAH, this is a good example of how Linus' patch can wreak havoc - a lot of people
will now not see fixes, since they only go into e1000e, and they can go on using
e1000 unnoticed for too long...

Auke
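
A quick way to act on the CONFIG_E1000E advice above is to look the symbol up in the kernel `.config`. The sketch below assumes the usual `.config` layout (`CONFIG_FOO=y`, `CONFIG_FOO=m`, or `# CONFIG_FOO is not set`); the function names are made up for illustration.

```python
import re


def kconfig_value(config_text, symbol):
    """Return the value assigned to symbol ('y', 'm', ...), or None.

    None covers both "# CONFIG_FOO is not set" and a symbol that does
    not appear in the file at all.
    """
    # Anchor on the full symbol name followed by '=', so CONFIG_E1000
    # does not accidentally match the CONFIG_E1000E line.
    pattern = re.compile(r"^" + re.escape(symbol) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1).strip() if match else None


def e1000e_available(config_text):
    """True if e1000e is built in (y) or built as a module (m)."""
    return kconfig_value(config_text, "CONFIG_E1000E") in ("y", "m")


if __name__ == "__main__":
    # Path is an assumption: run from the top of the kernel build tree.
    with open(".config") as handle:
        text = handle.read()
    print("e1000e available:", e1000e_available(text))
```

This only says whether the driver was built, not which driver ended up bound to the card at runtime.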

2008-02-07 18:35:34

by Max Krasnyansky

Subject: Re: [E1000-devel] e1000 1sec latency problem



Kok, Auke wrote:
> Max Krasnyansky wrote:
>> Kok, Auke wrote:
>>> Max Krasnyansky wrote:
>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>
>>>> Add this to modprobe.conf and reload e1000 module
>>>>
>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>> maximum.
>> Oh, I've definitely seen worse than that. Not as bad as 1 second, though. Plus you're talking
>> about the case when coalescing logic is working as designed ;-). What if there is some kind of
>> bug where timer did not expire or something.
>
> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
> hardware, so we don't have that problem at all. And I certainly have never seen
> anything you are referring to with e1000 hardware, and I do not know of any bug
> related to this.
>
> are you maybe confused with other hardware ?
>
> feel free to demonstrate an example...

Just to give you a background. I wrote and maintain http://libe1000.sf.net
So I know E1000 HW and SW in and out. And no I'm not confused with other HW and I know that we're
not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I
know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation
while Pavel is investigating.

Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range
of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing
was disabled those problems have gone away.

Max

2008-02-07 18:47:29

by Kok, Auke

Subject: Re: [E1000-devel] e1000 1sec latency problem

Max Krasnyansky wrote:
>
> Kok, Auke wrote:
>> Max Krasnyansky wrote:
>>> Kok, Auke wrote:
>>>> Max Krasnyansky wrote:
>>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>>
>>>>> Add this to modprobe.conf and reload e1000 module
>>>>>
>>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>>> maximum.
>>> Oh, I've definitely seen worse than that. Not as bad as 1 second, though. Plus you're talking
>>> about the case when coalescing logic is working as designed ;-). What if there is some kind of
>>> bug where timer did not expire or something.
>> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
>> hardware, so we don't have that problem at all. And I certainly have never seen
>> anything you are referring to with e1000 hardware, and I do not know of any bug
>> related to this.
>>
>> are you maybe confused with other hardware ?
>>
>> feel free to demonstrate an example...
>
> Just to give you a background. I wrote and maintain http://libe1000.sf.net
> So I know E1000 HW and SW in and out.

wow, even I do not dare to say that!

> And no I'm not confused with other HW and I know that we're
> not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I
> know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation
> while Pavel is investigating.
>
> Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range
> of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing
> was disabled those problems have gone away.

this sounds like you have some sort of PCI POST-ing problem and those can indeed
be worse if you use any form of interrupt coalescing. In any case that is largely
irrelevant to the in-kernel drivers, and as I said we definitely have no open
issues on that right now, and I really do not recollect any either (other
than the issue of interference when both ends are irq coalescing)

Cheers,

Auke

2008-02-07 18:55:53

by Max Krasnyansky

Subject: Re: [E1000-devel] e1000 1sec latency problem

Kok, Auke wrote:
> Max Krasnyansky wrote:
>> Kok, Auke wrote:
>>> Max Krasnyansky wrote:
>>>> Kok, Auke wrote:
>>>>> Max Krasnyansky wrote:
>>>>>> So you don't think it's related to the interrupt coalescing by any chance ?
>>>>>> I'd suggest to try and disable the coalescing and see if it makes any difference.
>>>>>> We've had lots of issues with coalescing misbehavior. Not this bad (ie 1 second) though.
>>>>>>
>>>>>> Add this to modprobe.conf and reload e1000 module
>>>>>>
>>>>>> options e1000 RxIntDelay=0,0 RxAbsIntDelay=0,0 InterruptThrottleRate=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0
>>>>> that can't be the problem. irq moderation would only account for 2-3ms variance
>>>>> maximum.
>>>> Oh, I've definitely seen worse than that. Not as bad as 1 second, though. Plus you're talking
>>>> about the case when coalescing logic is working as designed ;-). What if there is some kind of
>>>> bug where timer did not expire or something.
>>> we don't use a software timer in e1000 irq coalescing/moderation, it's all in
>>> hardware, so we don't have that problem at all. And I certainly have never seen
>>> anything you are referring to with e1000 hardware, and I do not know of any bug
>>> related to this.
>>>
>>> are you maybe confused with other hardware ?
>>>
>>> feel free to demonstrate an example...
>> Just to give you a background. I wrote and maintain http://libe1000.sf.net
>> So I know E1000 HW and SW in and out.
>
> wow, even I do not dare to say that!
Ok maybe that was a bit of an overstatement :).

>> And no I'm not confused with other HW and I know that we're
>> not using SW timers for the coalescing. HW can be buggy as well. Note that I'm not saying that I
>> know for sure that the problem is coalescing, I'm just suggesting to take it out of the equation
>> while Pavel is investigating.
>>
>> Unfortunately I cannot demonstrate an example but I've seen unexplained packet delays in the range
>> of 1-20 milliseconds on E1000 HW (and boy ... I do have a lot of it in my labs). Once coalescing
>> was disabled those problems have gone away.
>
> this sounds like you have some sort of PCI POST-ing problem and those can indeed
> be worse if you use any form of interrupt coalescing. In any case that is largely
> irrelevant to the in-kernel drivers, and as I said we definitely have no open
> issues on that right now, and I really do not recollect any either (other
> than the issue of interference when both ends are irq coalescing)
I was actually talking about in-kernel drivers, i.e. we were seeing delays with TIPC running over
the in-kernel E1000 driver. And no, it was not a TIPC issue: everything worked fine over TG3, and
the issues went away when coalescing was disabled.
Anyway, I think we can drop this subject.

Max

2008-02-07 22:24:14

by Pavel Machek

Subject: Re: [E1000-devel] e1000 1sec latency problem

Hi!

> > I have the famous e1000 latency problems:
> >
> > 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> > 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> > 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> > 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> > 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> > 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> > 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
> >
> > ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> > checksum problems, which I fixed by the update, but problems are not
> > gone.
>
> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
> pci-express devices and has the infamous L1 ASPM disable patch to
> fix this issue.

Ok, e1000e seems to work for me.

In another email, you asked for lspci -vvvv of failing e1000
case. Should I still provide it?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-07 22:33:53

by Martin

Subject: Re: e1000 1sec latency problem

Pavel Machek wrote:

> Hi!
>
> I have the famous e1000 latency problems:

Hi, I have the same problem with my Thinkpad T60.

root@zorro:~# ping arnold
PING arnold (192.168.158.6) 56(84) bytes of data.
64 bytes from arnold (192.168.158.6): icmp_seq=1 ttl=64 time=49.7 ms
64 bytes from arnold (192.168.158.6): icmp_seq=2 ttl=64 time=0.438 ms
64 bytes from arnold (192.168.158.6): icmp_seq=3 ttl=64 time=1000 ms
64 bytes from arnold (192.168.158.6): icmp_seq=4 ttl=64 time=0.970 ms
64 bytes from arnold (192.168.158.6): icmp_seq=5 ttl=64 time=885 ms
64 bytes from arnold (192.168.158.6): icmp_seq=6 ttl=64 time=0.484 ms
64 bytes from arnold (192.168.158.6): icmp_seq=7 ttl=64 time=529 ms
64 bytes from arnold (192.168.158.6): icmp_seq=8 ttl=64 time=1.02 ms
64 bytes from arnold (192.168.158.6): icmp_seq=9 ttl=64 time=149 ms
64 bytes from arnold (192.168.158.6): icmp_seq=10 ttl=64 time=0.549 ms
64 bytes from arnold (192.168.158.6): icmp_seq=11 ttl=64 time=0.829 ms

--- arnold ping statistics ---
11 packets transmitted, 11 received, 0% packet loss, time 9999ms
rtt min/avg/max/mdev = 0.438/238.113/1000.967/365.279 ms, pipe 2
root@zorro:~# uname -a
Linux zorro 2.6.24 #6 SMP PREEMPT Sun Feb 3 18:27:48 CET 2008 i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux
root@zorro:~# lspci -vvv
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
        Subsystem: Lenovo Unknown device 2015
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 0
        Capabilities: [e0] Vendor Specific Information

[stuff deleted]

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Lenovo ThinkPad T60
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000 Data: 0000
        Capabilities: [e0] Express Endpoint IRQ 0
                Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <512ns, L1 <64us
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 0
                Link: Latency L0s <128ns, L1 <64us
                Link: ASPM L1 Enabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 71-3a-c3-ff-ff-58-15-00


Unfortunately the e1000e driver is not an option as it will not detect the
NIC:

----from dmesg with e1000 compiled in:
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:02:00.0 to 64
e1000: 0000:02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:15:58:c3:3a:71
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

----from dmesg with e1000e compiled in:
e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.

Any pointers?

Thanks,

Martin

2008-02-07 22:34:26

by Kok, Auke

Subject: Re: [E1000-devel] e1000 1sec latency problem

Pavel Machek wrote:
> Hi!
>
>>> I have the famous e1000 latency problems:
>>>
>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>
>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>> checksum problems, which I fixed by the update, but problems are not
>>> gone.
>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>> pci-express devices and has the infamous L1 ASPM disable patch to
>> fix this issue.
>
> Ok, e1000e seems to work for me.
>
> In another email, you asked for lspci -vvvv of failing e1000
> case. Should I still provide it?

well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
whereas with e1000 it is still enabled. That's the fix that you need...

Auke
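
To compare the two drivers as suggested above, the ASPM link state can be picked out of `lspci -vvv` output for the NIC. The sketch below is an illustration only: it assumes the older `Link: ASPM ...` wording seen elsewhere in this thread and the `LnkCtl: ASPM ...` wording that newer pciutils versions print, and the function names are made up here.

```python
import re

# Older pciutils: "Link: ASPM L1 Enabled RCB 64 bytes CommClk+ ExtSynch-"
# Newer pciutils: "LnkCtl: ASPM L1 Enabled; RCB 64 bytes ..."
#            or:  "LnkCtl: ASPM Disabled; RCB 64 bytes ..."
# The capability lines ("Link: Supported ...", "LnkCap: ...") are skipped
# on purpose: they list what the link supports, not what is switched on.
ASPM_RE = re.compile(
    r"^\s*(?:Link|LnkCtl):\s*ASPM\s+"
    r"(Disabled|(?:L0s\s+L1|L0s|L1)\s+Enabled)",
    re.MULTILINE,
)


def aspm_state(device_stanza):
    """Return e.g. 'L1 Enabled' or 'Disabled' for one device's lspci text.

    Returns None when no link-control ASPM line is found.
    """
    match = ASPM_RE.search(device_stanza)
    if match is None:
        return None
    return " ".join(match.group(1).split())  # normalise whitespace


def l1_enabled(device_stanza):
    """True if the link-control line reports ASPM L1 as enabled."""
    state = aspm_state(device_stanza)
    return state is not None and state != "Disabled" and "L1" in state.split()


# Two lines from the 82573L stanza posted earlier in this thread.
SAMPLE = """\
Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 0
Link: ASPM L1 Enabled RCB 64 bytes CommClk+ ExtSynch-
"""

if __name__ == "__main__":
    print(aspm_state(SAMPLE), l1_enabled(SAMPLE))
```

Feeding it the output of `lspci -vvv -s 02:00.0` (the slot from the T60 report earlier in the thread) once under each driver should show the difference described above, assuming the output matches one of the two formats.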

2008-02-07 22:37:06

by Pavel Machek

Subject: Re: [E1000-devel] e1000 1sec latency problem

On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
> Pavel Machek wrote:
> > Hi!
> >
> >>> I have the famous e1000 latency problems:
> >>>
> >>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
> >>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
> >>>
> >>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
> >>> checksum problems, which I fixed by the update, but problems are not
> >>> gone.
> >> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
> >> pci-express devices and has the infamous L1 ASPM disable patch to
> >> fix this issue.
> >
> > Ok, e1000e seems to work for me.
> >
> > In another email, you asked for lspci -vvvv of failing e1000
> > case. Should I still provide it?
>
> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
> whereas with e1000 it is still enabled. That's the fix that you need...

Is there an easy way to push that fix to e1000, too? Or print "use e1000e
instead" and refuse to load?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-07 23:17:14

by Kok, Auke

Subject: Re: [E1000-devel] e1000 1sec latency problem

Pavel Machek wrote:
> On Thu 2008-02-07 14:32:16, Kok, Auke wrote:
>> Pavel Machek wrote:
>>> Hi!
>>>
>>>>> I have the famous e1000 latency problems:
>>>>>
>>>>> 64 bytes from 195.113.31.123: icmp_seq=68 ttl=56 time=351.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=69 ttl=56 time=209.2 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=70 ttl=56 time=1004.1 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=71 ttl=56 time=308.9 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=72 ttl=56 time=305.4 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=73 ttl=56 time=9.8 ms
>>>>> 64 bytes from 195.113.31.123: icmp_seq=74 ttl=56 time=3.7 ms
>>>>>
>>>>> ...and they are still there in 2.6.25-git0. I had ethernet EEPROM
>>>>> checksum problems, which I fixed by the update, but problems are not
>>>>> gone.
>>>> pavel, start using "e1000e" instead - this driver replaces e1000 for all the
>>>> pci-express devices and has the infamous L1 ASPM disable patch to
>>>> fix this issue.
>>> Ok, e1000e seems to work for me.
>>>
>>> In another email, you asked for lspci -vvvv of failing e1000
>>> case. Should I still provide it?
>> well, if you do it you should see that L1 ASPM is now disabled (with e1000e)
>> whereas with e1000 it is still enabled. That's the fix that you need...
>
> Is there an easy way to push that fix to e1000, too? Or print "use e1000e
> instead" and refuse to load?

well we're going to delete all pci-e related code from this driver soon anyway,
but I am indeed writing a patch right now that prints out this warning...

Auke