Hi,
I was testing something on an old server (Dell T105, Opteron) and noticed
packet loss after updating the kernel from 3.10 to 4.5. The test was:
On Dell run: iperf -s
On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000
This sends a 20 Mbit/s UDP stream to the Dell. It works fine normally (0%
packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high
(35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss.)
The packet loss bisected back to:
commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
Author: Borislav Petkov <[email protected]>
Date: Fri Jul 19 12:28:25 2013 +0200
EDAC: Fix lockdep splat
I have confirmed that the commit preceding this (v3.11-rc1) is fine and
that 88d84a introduced the bug.
On Mon, Mar 21, 2016 at 09:42:09PM +0000, Chris Bainbridge wrote:
> commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
> Author: Borislav Petkov <[email protected]>
> Date: Fri Jul 19 12:28:25 2013 +0200
>
> EDAC: Fix lockdep splat
Hmm, how would that cause packet loss?!
> I have confirmed that the commit preceding this (v3.11-rc1) is fine and
> that 88d84a introduced the bug.
Did you revert this commit on top of 4.5 and reproduce again? Do you see
the same packet loss?
What kind of hw is that target system, can you send full dmesg and
.config?
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Tue, Mar 22, 2016 at 06:31:54AM +0100, Borislav Petkov wrote:
> On Mon, Mar 21, 2016 at 09:42:09PM +0000, Chris Bainbridge wrote:
> > commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
> > Author: Borislav Petkov <[email protected]>
> > Date: Fri Jul 19 12:28:25 2013 +0200
> >
> > EDAC: Fix lockdep splat
>
> Hmm, how would that cause packet loss?!
The bug that commit fixed would trigger a lockdep splat, and lockdep
turns itself off after its first report. That avoided much of the normal
overhead associated with lockdep. I suspect the packet loss is a result
of that overhead being back.
IOW, everything works as expected.
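A rough way to check this on the test box: when lockdep is built in, it reports its own state as a debug_locks line in /proc/lockdep_stats (normally readable only by root). The sketch below assumes that file layout; the helper name lockdep_state is made up for illustration.

```shell
# Hypothetical helper: report whether lockdep is still active.
# Reads the "debug_locks:" line from a lockdep_stats-style file
# (default: /proc/lockdep_stats) and prints "on", "off" or "unknown".
lockdep_state() {
    stats="${1:-/proc/lockdep_stats}"
    if [ ! -r "$stats" ]; then
        # File missing (lockdep not built in) or not readable (not root)
        echo "unknown"
        return 0
    fi
    value=$(awk '$1 == "debug_locks:" { print $2; exit }' "$stats")
    case "$value" in
        1) echo "on" ;;    # lockdep is still checking, full overhead
        0) echo "off" ;;   # lockdep disabled itself after a report
        *) echo "unknown" ;;
    esac
}

lockdep_state
```

If this prints "off" on the kernel without the fix and "on" with it, that would be consistent with the overhead explanation.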
On Tue, Mar 22, 2016 at 06:31:54AM +0100, Borislav Petkov wrote:
> On Mon, Mar 21, 2016 at 09:42:09PM +0000, Chris Bainbridge wrote:
> > commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
> > Author: Borislav Petkov <[email protected]>
> > Date: Fri Jul 19 12:28:25 2013 +0200
> >
> > EDAC: Fix lockdep splat
>
> Hmm, how would that cause packet loss?!
Good question. The patch looks pretty innocuous but it is for a lockdep
issue and the bug only appears when lockdep config options are enabled.
Could it somehow have broken a lock used to synchronise packet rx?
> > I have confirmed that the commit preceding this (v3.11-rc1) is fine and
> > that 88d84a introduced the bug.
>
> Did you revert this commit on top of 4.5 and reproduce again? Do you see
> the same packet loss?
Reverting over 4.5 does fix the packet loss issue.
> What kind of hw is that target system, can you send full dmesg and
> .config?
It is https://www.suse.com/yes/123682.htm with a slightly faster CPU
(Opteron 1212, 2 GHz) and 4 GB RAM.
dmesg and .config attached.
On Tue, Mar 22, 2016 at 09:16:56AM +0000, Chris Bainbridge wrote:
> Good question. The patch looks pretty innocuous but it is for a lockdep
> issue and the bug only appears when lockdep config options are enabled.
> Could it somehow have broken a lock used to synchronise packet rx?
How? EDAC and net don't have anything in common...
> Reverting over 4.5 does fix the packet loss issue.
Peter's explanation makes sense to me. If he's right, then building a
kernel with CONFIG_EDAC=n and PROVE_LOCKING=y and running your test again
should show that packet loss again... because no EDAC code will be there
to trigger a splat and disable lockdep.
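Before rebuilding, a quick sanity check of the .config could look like the sketch below. It only assumes the usual .config syntax (CONFIG_FOO=y, or "# CONFIG_FOO is not set"); the helper name config_ready_for_test is made up for illustration.

```shell
# Hypothetical helper: succeeds (exit 0) if the given kernel .config has
# PROVE_LOCKING built in and EDAC disabled or absent.
# Returns 2 if the file cannot be read, 1 if the options do not match.
config_ready_for_test() {
    cfg="$1"
    [ -r "$cfg" ] || return 2
    # PROVE_LOCKING must be built in
    grep -q '^CONFIG_PROVE_LOCKING=y$' "$cfg" || return 1
    # EDAC must not be built in or modular
    if grep -q '^CONFIG_EDAC=[ym]$' "$cfg"; then
        return 1
    fi
    return 0
}

# In a kernel tree the options could be flipped with the in-tree helper:
#   scripts/config --disable EDAC --enable PROVE_LOCKING && make olddefconfig
if [ -f .config ]; then
    if config_ready_for_test .config; then
        echo "config matches the suggested test"
    else
        echo "config does not match the suggested test"
    fi
fi
```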
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.