2017-12-11 20:38:39

by Joseph Salisbury

[permalink] [raw]
Subject: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

Hi Eric,

A kernel bug report was opened against Ubuntu [0].  It was found that
reverting the following commit resolved this bug:

commit b2504a5dbef3305ef41988ad270b0e8ec289331c
Author: Eric Dumazet <[email protected]>
Date:   Tue Jan 31 10:20:32 2017 -0800

    net: reduce skb_warn_bad_offload() noise
   

The regression was introduced as of v4.11-rc1 and still exists in
current mainline.
   
I was hoping to get your feedback, since you are the patch author.  Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?
   
This commit did in fact resolve another bug[1], but in the process
introduced this regression.
  
Thanks,

Joe

[0] http://pad.lv/1715609
[1] http://pad.lv/1705447


2017-12-11 21:26:10

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

On Mon, Dec 11, 2017 at 3:35 PM, Joseph Salisbury
<[email protected]> wrote:
> Hi Eric,
>
> A kernel bug report was opened against Ubuntu [0]. It was found that
> reverting the following commit resolved this bug:

The recorded trace in that bug is against 4.10.0 with some backports.
Given that commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload()
noise") is implicated, I guess that that was backported from 4.11-rc1.

The WARN shows

e1000e: caps=(0x00000030002149a9, 0x0000000000000000)
len=1701 data_len=1659 gso_size=1480 gso_type=2 ip_summed=0

The numbering changed in 4.14, but for this kernel

SKB_GSO_UDP = 1 << 1,

so this is a UFO packet with CHECKSUM_NONE. The stack shows

kernel: [570943.494549] skb_warn_bad_offload+0xd1/0x120
kernel: [570943.494550] __skb_gso_segment+0x17d/0x190
kernel: [570943.494564] validate_xmit_skb+0x14f/0x2a0
kernel: [570943.494565] validate_xmit_skb_list+0x43/0x70

so if that patch has been backported, then this must trigger in
__skb_gso_segment on the return path from skb_mac_gso_segment.

Did you backport

commit 8d63bee643f1fb53e472f0e135cae4eb99d62d19
Author: Willem de Bruijn <[email protected]>
Date: Tue Aug 8 14:22:55 2017 -0400

net: avoid skb_warn_bad_offload false positives on UFO

skb_warn_bad_offload triggers a warning when an skb enters the GSO
stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
checksum offload set.

Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
observed that SKB_GSO_DODGY producers can trigger the check and
that passing those packets through the GSO handlers will fix it
up. But, the software UFO handler will set ip_summed to
CHECKSUM_NONE.

When __skb_gso_segment is called from the receive path, this
triggers the warning again.

Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
Tx these two are equivalent. On Rx, this better matches the
skb state (checksum computed), as CHECKSUM_NONE here means no
checksum computed.

See also this thread for context:
http://patchwork.ozlabs.org/patch/799015/

Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Note that UFO was removed in 4.14 and that skb_warn_bad_offload
can happen for various types of packets, so there may be multiple
independent bug reports. I'm investigating two other non-UFO reports
just now.

2017-12-11 21:28:59

by David Miller

[permalink] [raw]
Subject: Re: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

From: Joseph Salisbury <[email protected]>
Date: Mon, 11 Dec 2017 15:35:34 -0500

> A kernel bug report was opened against Ubuntu [0].? It was found that
> reverting the following commit resolved this bug:
>
> commit b2504a5dbef3305ef41988ad270b0e8ec289331c
> Author: Eric Dumazet <[email protected]>
> Date:?? Tue Jan 31 10:20:32 2017 -0800
>
> ??? net: reduce skb_warn_bad_offload() noise
> ???
>
> The regression was introduced as of v4.11-rc1 and still exists in
> current mainline.
> ???
> I was hoping to get your feedback, since you are the patch author.? Do
> you think gathering any additional data will help diagnose this issue,
> or would it be best to submit a revert request?
> ???
> This commit did in fact resolve another bug[1], but in the process
> introduced this regression.

It helps if you can consolidate the information obtained in your bug
tracking here in the email so that people on this list can get an idea
of what the problem scope might be without having to go to your
special bug tracking site.

This is really not about us being snobs about this mailing list, it's
about you wanting to get a result. And you'll get a better result
faster if you post the details here on the lsit because most
developers are not going to go to your bug tracking site to read the
bug comments.

Also, this isn't a functional regression, it is just that we are
generating warnings that we didn't before. It doesn't mean that
Eric's patch is wrong, it could just be that his new check is
triggering for a bug that has always been there.

Scanning the bug myself it seems that the critical required component
is IPSEC, and IPSEC has it's own way of doing segmentation offload.

Thanks.

2017-12-11 21:44:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
> Note that UFO was removed in 4.14 and that skb_warn_bad_offload
> can happen for various types of packets, so there may be multiple
> independent bug reports. I'm investigating two other non-UFO reports
> just now.

Meta-comment, now that UFO is gone from mainline, I'm wondering if I
should just delete it from 4.4 and 4.9 as well. Any objections for
that? I'd like to make it easy to maintain these kernels for a while,
and having them diverge like this, with all of the issues around UFO,
seems like it will just make life harder for myself if I leave it in.

Any opinions?

thanks,

greg k-h

2017-12-11 21:57:40

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

On Mon, Dec 11, 2017 at 4:44 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
>> Note that UFO was removed in 4.14 and that skb_warn_bad_offload
>> can happen for various types of packets, so there may be multiple
>> independent bug reports. I'm investigating two other non-UFO reports
>> just now.
>
> Meta-comment, now that UFO is gone from mainline, I'm wondering if I
> should just delete it from 4.4 and 4.9 as well. Any objections for
> that? I'd like to make it easy to maintain these kernels for a while,
> and having them diverge like this, with all of the issues around UFO,
> seems like it will just make life harder for myself if I leave it in.
>
> Any opinions?

Some of that removal had to be reverted with commit 0c19f846d582
("net: accept UFO datagrams from tuntap and packet") for VM live
migration between kernels.

Any backports probably should squash that in at the least. Just today
another thread discussed that that patch may not address all open
issues still, so it may be premature to backport at this point.
http://lkml.kernel.org/r/<[email protected]>

2017-12-12 14:10:21

by David Miller

[permalink] [raw]
Subject: Re: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

From: Willem de Bruijn <[email protected]>
Date: Mon, 11 Dec 2017 16:56:56 -0500

> On Mon, Dec 11, 2017 at 4:44 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
>> On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
>>> Note that UFO was removed in 4.14 and that skb_warn_bad_offload
>>> can happen for various types of packets, so there may be multiple
>>> independent bug reports. I'm investigating two other non-UFO reports
>>> just now.
>>
>> Meta-comment, now that UFO is gone from mainline, I'm wondering if I
>> should just delete it from 4.4 and 4.9 as well. Any objections for
>> that? I'd like to make it easy to maintain these kernels for a while,
>> and having them diverge like this, with all of the issues around UFO,
>> seems like it will just make life harder for myself if I leave it in.
>>
>> Any opinions?
>
> Some of that removal had to be reverted with commit 0c19f846d582
> ("net: accept UFO datagrams from tuntap and packet") for VM live
> migration between kernels.
>
> Any backports probably should squash that in at the least. Just today
> another thread discussed that that patch may not address all open
> issues still, so it may be premature to backport at this point.
> http://lkml.kernel.org/r/<[email protected]>

I would probably discourage backporting the UFO removal, at least for
now.

2017-12-12 21:17:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

On Tue, Dec 12, 2017 at 09:10:11AM -0500, David Miller wrote:
> From: Willem de Bruijn <[email protected]>
> Date: Mon, 11 Dec 2017 16:56:56 -0500
>
> > On Mon, Dec 11, 2017 at 4:44 PM, Greg Kroah-Hartman
> > <[email protected]> wrote:
> >> On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
> >>> Note that UFO was removed in 4.14 and that skb_warn_bad_offload
> >>> can happen for various types of packets, so there may be multiple
> >>> independent bug reports. I'm investigating two other non-UFO reports
> >>> just now.
> >>
> >> Meta-comment, now that UFO is gone from mainline, I'm wondering if I
> >> should just delete it from 4.4 and 4.9 as well. Any objections for
> >> that? I'd like to make it easy to maintain these kernels for a while,
> >> and having them diverge like this, with all of the issues around UFO,
> >> seems like it will just make life harder for myself if I leave it in.
> >>
> >> Any opinions?
> >
> > Some of that removal had to be reverted with commit 0c19f846d582
> > ("net: accept UFO datagrams from tuntap and packet") for VM live
> > migration between kernels.
> >
> > Any backports probably should squash that in at the least. Just today
> > another thread discussed that that patch may not address all open
> > issues still, so it may be premature to backport at this point.
> > http://lkml.kernel.org/r/<[email protected]>
>
> I would probably discourage backporting the UFO removal, at least for
> now.

Ok, thanks for letting me know, I'll ask again in 6 months or so :)

greg k-h