2015-12-21 20:34:21

by Oleksandr Natalenko

Subject: [REGRESSION] tcp/ipv4: kernel panic because of (possible) division by zero

Commit 3759824da87b30ce7a35b4873b62b0ba38905ef5 (tcp: PRR uses CRB mode by
default and SS mode conditionally) introduced changes to tcp_cwnd_reduction()
in net/ipv4/tcp_input.c that possibly cause a division by zero and, therefore,
a kernel panic in an interrupt handler [1].

Reverting 3759824da87b30ce7a35b4873b62b0ba38905ef5 seems to fix the issue.

I'm able to reproduce the issue on 4.3.0–4.3.3 about once every several days
(it occurs sporadically).

What could be done to help in debugging this issue?

Regards,
Oleksandr.

[1] http://i.piccy.info/i9/6f5cb187c4ff282d189f78c63f95af43/1450729403/283985/951663/panic.jpg


2015-12-22 02:11:18

by Yuchung Cheng

Subject: Re: [REGRESSION] tcp/ipv4: kernel panic because of (possible) division by zero

On Mon, Dec 21, 2015 at 12:25 PM, Oleksandr Natalenko
<[email protected]> wrote:
> Commit 3759824da87b30ce7a35b4873b62b0ba38905ef5 (tcp: PRR uses CRB mode by
> default and SS mode conditionally) introduced changes to tcp_cwnd_reduction()
> in net/ipv4/tcp_input.c that possibly cause a division by zero and, therefore,
> a kernel panic in an interrupt handler [1].
>
> Reverting 3759824da87b30ce7a35b4873b62b0ba38905ef5 seems to fix the issue.
>
> I'm able to reproduce the issue on 4.3.0–4.3.3 about once every several days
> (it occurs sporadically).
>
> What could be done to help in debugging this issue?
Do you have ECN enabled (i.e. sysctl net.ipv4.tcp_ecn > 0)?

If so, I suspect an ACK carrying ECE during CA_Loss causes the socket to
enter CWR state without calling tcp_init_cwnd_reduction() to set
tp->prior_cwnd. Can you try this debug / quick-fix patch and send me the
error message, if any?
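
For illustration only (this is not the attached patch): a minimal userspace
sketch of the proportional step of PRR as described in RFC 6937,
sndcnt = ceil(prr_delivered * ssthresh / prior_cwnd) - prr_out, assuming
prior_cwnd is the divisor as suspected above. The struct prr_state type and
the prr_proportional_sndcnt() helper are hypothetical names, not kernel
symbols. The guard shows why a reduction entered without initialising
prior_cwnd would divide by zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the PRR state discussed in this thread;
 * field names echo the thread but this is not the kernel's struct tcp_sock. */
struct prr_state {
	uint32_t prior_cwnd;    /* cwnd recorded when the reduction started */
	uint32_t snd_ssthresh;  /* target window after the reduction */
	uint32_t prr_delivered; /* packets delivered during the reduction */
	uint32_t prr_out;       /* packets sent during the reduction */
};

/*
 * Proportional part of PRR:
 *   sndcnt = ceil(prr_delivered * ssthresh / prior_cwnd) - prr_out
 * Returns false instead of dividing when prior_cwnd was never set.
 */
static bool prr_proportional_sndcnt(const struct prr_state *s, int64_t *sndcnt)
{
	if (s->prior_cwnd == 0)
		return false; /* the suspected division-by-zero case */

	/* Adding (prior_cwnd - 1) before dividing rounds the quotient up. */
	uint64_t dividend = (uint64_t)s->snd_ssthresh * s->prr_delivered +
			    s->prior_cwnd - 1;

	*sndcnt = (int64_t)(dividend / s->prior_cwnd) - (int64_t)s->prr_out;
	return true;
}
```

A caller would treat a false return as "reduction state was never
initialised" and log it rather than compute a send count.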


>
> Regards,
> Oleksandr.
>
> [1] http://i.piccy.info/i9/6f5cb187c4ff282d189f78c63f95af43/1450729403/283985/951663/panic.jpg


Attachments:
0001-tcp-debug-tcp_cwnd_reduction-div0.patch (1.32 kB)

2015-12-22 20:13:42

by Oleksandr Natalenko

Subject: Re: [REGRESSION] tcp/ipv4: kernel panic because of (possible) division by zero

That is correct, I have net.ipv4.tcp_ecn set to 1.

I've recompiled the kernel with the proposed patch and am now waiting for the
issue to be triggered.

Could I manually simulate the erroneous TCP ECN behavior to speed up the
debugging?

On Monday, 21 December 2015, 18:10:32 EET, Yuchung Cheng wrote:
> On Mon, Dec 21, 2015 at 12:25 PM, Oleksandr Natalenko
> <[email protected]> wrote:
> > Commit 3759824da87b30ce7a35b4873b62b0ba38905ef5 (tcp: PRR uses CRB mode by
> > default and SS mode conditionally) introduced changes to
> > tcp_cwnd_reduction() in net/ipv4/tcp_input.c that possibly cause a division
> > by zero and, therefore, a kernel panic in an interrupt handler [1].
> >
> > Reverting 3759824da87b30ce7a35b4873b62b0ba38905ef5 seems to fix the issue.
> >
> > I'm able to reproduce the issue on 4.3.0–4.3.3 about once every several days
> > (it occurs sporadically).
> >
> > What could be done to help in debugging this issue?
>
> Do you have ECN enabled (i.e. sysctl net.ipv4.tcp_ecn > 0)?
>
> If so, I suspect an ACK carrying ECE during CA_Loss causes the socket to
> enter CWR state without calling tcp_init_cwnd_reduction() to set
> tp->prior_cwnd. Can you try this debug / quick-fix patch and send me the
> error message, if any?
>
> > Regards,
> >
> > Oleksandr.
> >
> > [1] http://i.piccy.info/i9/6f5cb187c4ff282d189f78c63f95af43/1450729403/283985/951663/panic.jpg