Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752332AbdFSO7g (ORCPT ); Mon, 19 Jun 2017 10:59:36 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43324 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751081AbdFSO7d (ORCPT ); Mon, 19 Jun 2017 10:59:33 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com ABC39C0587DA Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=vyasevic@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com ABC39C0587DA Reply-To: vyasevic@redhat.com Subject: Re: [PATCH net] net: account for current skb length when deciding about UFO To: Michal Kubecek , "David S. Miller" Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Li , Hideaki YOSHIFUJI References: <20170619110343.7CB5EA1377@unicorn.suse.cz> From: Vlad Yasevich Organization: Red Hat Message-ID: <9547a50c-2b2c-1772-2944-098b21eefa40@redhat.com> Date: Mon, 19 Jun 2017 10:59:16 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0 MIME-Version: 1.0 In-Reply-To: <20170619110343.7CB5EA1377@unicorn.suse.cz> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Mon, 19 Jun 2017 14:59:22 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3343 Lines: 77 On 06/19/2017 07:03 AM, Michal Kubecek wrote: > Our customer encountered stuck NFS writes for blocks starting at specific > offsets w.r.t. page boundary caused by networking stack sending packets via > UFO enabled device with wrong checksum. The problem can be reproduced by > composing a long UDP datagram from multiple parts using MSG_MORE flag: > > sendto(sd, buff, 1000, MSG_MORE, ...); > sendto(sd, buff, 1000, MSG_MORE, ...); > sendto(sd, buff, 3000, 0, ...); > > Assume this packet is to be routed via a device with MTU 1500 and > NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(), > this condition is tested (among others) to decide whether to call > ip_ufo_append_data(): > > ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb)) > > At the moment, we already have skb with 1028 bytes of data which is not > marked for GSO so that the test is false (fragheaderlen is usually 20). > Thus we append second 1000 bytes to this skb without invoking UFO. Third > sendto(), however, has sufficient length to trigger the UFO path so that we > end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb() > uses udp_csum() to calculate the checksum but that assumes all fragments > have correct checksum in skb->csum which is not true for UFO fragments. > > When checking against MTU, we need to add skb->len to length of new segment > if we already have a partially filled skb and fragheaderlen only if there > isn't one. > > In the IPv6 case, skb can only be null if this is the first segment so that > we have to use headersize (length of the first IPv6 header) rather than > fragheaderlen (length of IPv6 header of further fragments) for skb == NULL. > > Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") > Fixes: e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for > ip6 fragment between __ip6_append_data and ip6_finish_output") > Signed-off-by: Michal Kubecek LGTM. Acked-by: Vlad Yasevich -vlad > --- > net/ipv4/ip_output.c | 3 ++- > net/ipv6/ip6_output.c | 2 +- > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 7a3fd25e8913..532b36e9ce2a 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -964,7 +964,8 @@ static int __ip_append_data(struct sock *sk, > csummode = CHECKSUM_PARTIAL; > > cork->length += length; > - if ((((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))) && > + if ((((length + (skb ? skb->len : fragheaderlen)) > mtu) || > + (skb && skb_is_gso(skb))) && > (sk->sk_protocol == IPPROTO_UDP) && > (rt->dst.dev->features & NETIF_F_UFO) && !dst_xfrm(&rt->dst) && > (sk->sk_type == SOCK_DGRAM) && !sk->sk_no_check_tx) { > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c > index bf8a58a1c32d..1699acb2fa2c 100644 > --- a/net/ipv6/ip6_output.c > +++ b/net/ipv6/ip6_output.c > @@ -1390,7 +1390,7 @@ static int __ip6_append_data(struct sock *sk, > */ > > cork->length += length; > - if ((((length + fragheaderlen) > mtu) || > + if ((((length + (skb ? skb->len : headersize)) > mtu) || > (skb && skb_is_gso(skb))) && > (sk->sk_protocol == IPPROTO_UDP) && > (rt->dst.dev->features & NETIF_F_UFO) && !dst_xfrm(&rt->dst) && >