2016-03-21 21:42:16

by Chris Bainbridge

Subject: [BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix

Hi,

I was testing something on an old server (Dell T105 Opteron) and noticed
packet loss after updating the kernel from 3.10 to 4.5. The test was:

On Dell run: iperf -s
On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000

This sends a 20 Mbit/s UDP stream to the Dell. It works fine normally
(0% packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high
(35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss.)
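
For a sense of scale, the load in this test can be worked out with a few
lines of arithmetic. This is only a rough sketch: it assumes "-b 20M"
means 20,000,000 bit/s and "-l 1k" means 1024-byte datagram payloads, and
it ignores UDP/IP/Ethernet header overhead. The variable names are
illustrative and do not come from the report.

```python
# Back-of-the-envelope numbers for the UDP test described above.
# Assumptions (not stated in the report): 20,000,000 bit/s offered load,
# 1024-byte payloads, protocol header overhead ignored.

BITRATE_BPS = 20_000_000
PAYLOAD_BYTES = 1024
LOSS_FRACTION = 0.35  # the loss reported with CONFIG_PROVE_LOCKING enabled


def packets_per_second(bitrate_bps, payload_bytes):
    """Datagrams per second needed to carry the given payload bitrate."""
    return bitrate_bps / (payload_bytes * 8)


def per_packet_budget_us(pps):
    """Average time available to handle one datagram, in microseconds."""
    return 1_000_000 / pps


offered_pps = packets_per_second(BITRATE_BPS, PAYLOAD_BYTES)
delivered_pps = offered_pps * (1 - LOSS_FRACTION)

print(f"offered load:      {offered_pps:.0f} packets/s")
print(f"per-packet budget: {per_packet_budget_us(offered_pps):.1f} us")
print(f"delivered at 35% loss: {delivered_pps:.0f} packets/s")
```

Under these assumptions the receiver has roughly 0.4 ms per datagram on
average, so any per-packet cost that pushes handling time beyond that
would be expected to show up as loss.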

The packet loss bisected back to:

commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
Author: Borislav Petkov <[email protected]>
Date: Fri Jul 19 12:28:25 2013 +0200

EDAC: Fix lockdep splat

I have confirmed that the commit preceding this (v3.11-rc1) is fine and
that 88d84a introduced the bug.


2016-03-22 05:31:59

by Borislav Petkov

Subject: Re: [BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix

On Mon, Mar 21, 2016 at 09:42:09PM +0000, Chris Bainbridge wrote:
> Hi,
>
> I was testing something on an old server (Dell T105 Opteron) and noticed
> packet loss after updating the kernel from 3.10 to 4.5. The test was:
>
> On Dell run: iperf -s
> On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000
>
> This sends a 20 Mbit/s UDP stream to the Dell. It works fine normally
> (0% packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high
> (35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss.)
>
> The packet loss bisected back to:
>
> commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
> Author: Borislav Petkov <[email protected]>
> Date: Fri Jul 19 12:28:25 2013 +0200
>
> EDAC: Fix lockdep splat

Hmm, how would that cause a packet loss?!

> I have confirmed that the commit preceding this (v3.11-rc1) is fine and
> that 88d84a introduced the bug.

Did you revert this commit on top of 4.5 and reproduce again? Do you see
the same packet loss?

What kind of hw is that target system, can you send full dmesg and
.config?

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2016-03-22 08:18:30

by Peter Zijlstra

Subject: Re: [BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix

On Tue, Mar 22, 2016 at 06:31:54AM +0100, Borislav Petkov wrote:
> On Mon, Mar 21, 2016 at 09:42:09PM +0000, Chris Bainbridge wrote:
> > Hi,
> >
> > I was testing something on an old server (Dell T105 Opteron) and noticed
> > packet loss after updating the kernel from 3.10 to 4.5. The test was:
> >
> > On Dell run: iperf -s
> > On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000
> >
> > This sends a 20 Mbit/s UDP stream to the Dell. It works fine normally
> > (0% packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high
> > (35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss.)
> >
> > The packet loss bisected back to:
> >
> > commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
> > Author: Borislav Petkov <[email protected]>
> > Date: Fri Jul 19 12:28:25 2013 +0200
> >
> > EDAC: Fix lockdep splat
>
> Hmm, how would that cause a packet loss?!

The previous bug would disable lockdep and thereby avoid much of the
normal overhead associated with lockdep. I suspect the packet loss is a
result of increased overhead.

IOW, everything works as expected.
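
The overhead argument above can be illustrated with a toy queueing
model: a single receiver with a bounded buffer, evenly spaced arrivals,
and a fixed per-packet handling cost. This is a sketch only. The service
times below are hypothetical, not measurements from this machine, and the
model is not a description of the kernel's actual receive path.

```python
from collections import deque


def simulate_loss(arrival_interval_us, service_time_us, buffer_slots, n_packets):
    """Fraction of packets dropped by one FIFO server with a bounded buffer,
    fixed arrival spacing and a fixed per-packet service time."""
    in_system = deque()  # departure times of packets queued or in service
    dropped = 0
    for i in range(n_packets):
        now = i * arrival_interval_us
        # Retire packets whose processing finished by the time of this arrival.
        while in_system and in_system[0] <= now:
            in_system.popleft()
        if len(in_system) >= buffer_slots:
            dropped += 1  # buffer full: the datagram is lost
            continue
        # Service starts when the previous packet leaves, or now if idle.
        start = in_system[-1] if in_system else now
        in_system.append(start + service_time_us)
    return dropped / n_packets


ARRIVAL_US = 409.6   # assumed spacing for 20 Mbit/s of 1024-byte datagrams
CHEAP_US = 100.0     # hypothetical per-packet cost with lockdep inactive
COSTLY_US = 630.0    # hypothetical per-packet cost with lockdep active

loss_cheap = simulate_loss(ARRIVAL_US, CHEAP_US, 64, 100_000)
loss_costly = simulate_loss(ARRIVAL_US, COSTLY_US, 64, 100_000)
print(f"loss with cheap handling:  {loss_cheap:.1%}")
print(f"loss with costly handling: {loss_costly:.1%}")
```

In this model nothing is dropped while the handling cost stays below the
arrival spacing. Once it exceeds the spacing, the buffer fills and the
loss fraction approaches 1 - (arrival spacing / service time), regardless
of buffer size.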

2016-03-22 09:17:18

by Chris Bainbridge

Subject: Re: [BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix

On Tue, Mar 22, 2016 at 06:31:54AM +0100, Borislav Petkov wrote:
> On Mon, Mar 21, 2016 at 09:42:09PM +0000, Chris Bainbridge wrote:
> > Hi,
> >
> > I was testing something on an old server (Dell T105 Opteron) and noticed
> > packet loss after updating the kernel from 3.10 to 4.5. The test was:
> >
> > On Dell run: iperf -s
> > On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000
> >
> > This sends a 20 Mbit/s UDP stream to the Dell. It works fine normally
> > (0% packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high
> > (35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss.)
> >
> > The packet loss bisected back to:
> >
> > commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
> > Author: Borislav Petkov <[email protected]>
> > Date: Fri Jul 19 12:28:25 2013 +0200
> >
> > EDAC: Fix lockdep splat
>
> Hmm, how would that cause a packet loss?!

Good question. The patch looks pretty innocuous but it is for a lockdep
issue and the bug only appears when lockdep config options are enabled.
Could it somehow have broken a lock used to synchronise packet rx?

> > I have confirmed that the commit preceding this (v3.11-rc1) is fine and
> > that 88d84a introduced the bug.
>
> Did you revert this commit on top of 4.5 and reproduce again? Do you see
> the same packet loss?

Reverting over 4.5 does fix the packet loss issue.

> What kind of hw is that target system, can you send full dmesg and
> .config?

It is https://www.suse.com/yes/123682.htm with a slightly faster CPU
(Opteron 1212, 2 GHz) and 4 GB RAM.

dmesg and .config attached.


Attachments:
(No filename) (1.54 kB)
config-t105-4.5 (108.63 kB)
dmesg-dell (42.65 kB)

2016-03-22 11:12:51

by Borislav Petkov

Subject: Re: [BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix

On Tue, Mar 22, 2016 at 09:16:56AM +0000, Chris Bainbridge wrote:
> Good question. The patch looks pretty innocuous but it is for a lockdep
> issue and the bug only appears when lockdep config options are enabled.
> Could it somehow have broken a lock used to synchronise packet rx?

How? EDAC and net don't have anything in common...

> Reverting over 4.5 does fix the packet loss issue.

Peter's explanation makes sense to me. If he's right, if you build a
kernel with CONFIG_EDAC=n and PROVE_LOCKING=y and do your test again, you
should be seeing that packet loss again... because EDAC without the fix
won't be there to disable lockdep.
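
Before running such a test it may help to confirm that the kernel
.config really has the intended combination. The following is a minimal
sketch, assuming the usual .config conventions ("CONFIG_FOO=y" for
enabled options, "# CONFIG_FOO is not set" for disabled ones). The helper
names and the sample text are made up for illustration.

```python
def parse_kconfig(text):
    """Map option names to values; options marked 'is not set' become 'n'."""
    not_set = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(not_set):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO = n
            options[line[2:-len(not_set)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def matches_proposed_test(options):
    """True if EDAC is disabled and PROVE_LOCKING is built in."""
    edac_off = options.get("CONFIG_EDAC", "n") == "n"
    lockdep_on = options.get("CONFIG_PROVE_LOCKING") == "y"
    return edac_off and lockdep_on


# Hypothetical excerpt, not taken from the attached config-t105-4.5.
SAMPLE = """\
# CONFIG_EDAC is not set
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKDEP=y
"""

print(matches_proposed_test(parse_kconfig(SAMPLE)))
```

To check a real build, the same functions could be applied to the
contents of the .config file in the kernel build directory.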

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.