Message-ID: <1527148826.3058.16.camel@redhat.com>
Subject: Re: WARNING in ip_recv_error
From: Paolo Abeni
To: Willem de Bruijn, David Miller
Cc: Eric Dumazet, DaeLyong Jeong, Alexey Kuznetsov, Hideaki YOSHIFUJI, Network Development, LKML, Byoungyoung Lee, Kyungtae Kim, bammanag@purdue.edu, Willem de Bruijn
Date: Thu, 24 May 2018 10:00:26 +0200
References: <20180518120826.GA19515@dragonet.kaist.ac.kr> <293d029c-b14c-a625-3703-97a5754e99f1@gmail.com> <20180518.114433.390752642781753429.davem@davemloft.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2018-05-23 at 11:40 -0400, Willem de Bruijn wrote:
> On Sun, May 20, 2018 at 7:13 PM, Willem de Bruijn wrote:
> > On Fri, May 18, 2018 at 2:59 PM, Willem de Bruijn wrote:
> > > On Fri, May 18, 2018 at 2:46 PM, Willem de Bruijn wrote:
> > > > On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn wrote:
> > > > > On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn wrote:
> > > > > > On Fri, May 18, 2018 at 11:44 AM, David Miller wrote:
> > > > > > > From: Eric Dumazet
> > > > > > > Date: Fri, 18 May 2018 08:30:43 -0700
> > > > > > >
> > > > > > > > We probably need to revert Willem patch (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
> > > > > > >
> > > > > > > Is it really valid to reach ip_recv_err with an ipv6 socket?
> > > > > >
> > > > > > I guess the issue is that setsockopt IPV6_ADDRFORM is not an
> > > > > > atomic operation, so that the socket is neither fully ipv4 nor fully
> > > > > > ipv6 by the time it reaches ip_recv_error.
> > > > > >
> > > > > >   sk->sk_socket->ops = &inet_dgram_ops;
> > > > > >   < HERE >
> > > > > >   sk->sk_family = PF_INET;
> > > > > >
> > > > > > Even calling inet_recv_error to demux would not necessarily help.
> > > > > >
> > > > > > Safest would be to look up by skb->protocol, similar to what
> > > > > > ipv6_recv_error does to handle v4-mapped-v6.
> > > > > >
> > > > > > Or to make that function safe with PF_INET and swap the order
> > > > > > of the above two operations.
> > > > > >
> > > > > > All sound needlessly complicated for this rare socket option, but
> > > > > > I don't have a better idea yet. Dropping on the floor is not nice,
> > > > > > either.
> > > > >
> > > > > Ensuring that ip_recv_error correctly handles packets from either
> > > > > socket and removing the warning should indeed be good.
> > > > >
> > > > > It is robust against v4-mapped packets from an AF_INET6 socket,
> > > > > but see caveat on reconnect below.
> > > > >
> > > > > The code between ipv6_recv_error for v4-mapped addresses and
> > > > > ip_recv_error is essentially the same, the main difference being
> > > > > whether to return network headers as sockaddr_in with SOL_IP
> > > > > or sockaddr_in6 with SOL_IPV6.
> > > > >
> > > > > There are very few other locations in the stack that explicitly test
> > > > > sk_family in this way and thus would be vulnerable to races with
> > > > > IPV6_ADDRFORM.
> > > > >
> > > > > I'm not sure whether it is possible for a udpv6 socket to queue a
> > > > > real ipv6 packet on the error queue, disconnect, connect to an
> > > > > ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
> > > > > on a true ipv6 packet. That would return buggy data, e.g., in
> > > > > msg_name.
> > > >
> > > > In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
> > > > error queue is empty, and then take its lock for the duration of the
> > > > operation.
> > >
> > > Actually, no reason to hold the lock. This setsockopt holds the socket
> > > lock, which connect would need, too. So testing that the queue
> > > is empty after testing that it is connected to a v4 address is
> > > sufficient to ensure that no ipv6 packets are queued for reception.
> > >
> > > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > > index 4d780c7f0130..a975d6311341 100644
> > > --- a/net/ipv6/ipv6_sockglue.c
> > > +++ b/net/ipv6/ipv6_sockglue.c
> > > @@ -199,6 +199,11 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> > >
> > >                 if (ipv6_only_sock(sk) ||
> > >                     !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) {
> > >                         retv = -EADDRNOTAVAIL;
> > >                         break;
> > >                 }
> > >
> > > +               if (!skb_queue_empty(&sk->sk_error_queue)) {
> > > +                       retv = -EBUSY;
> > > +                       break;
> > > +               }
> > > +
> > >                 fl6_free_socklist(sk);
> > >                 __ipv6_sock_mc_close(sk);
> > >
> > > After this it should be safe to remove the warning in ip_recv_error.
> >
> > Hmm.. nope.
> >
> > This ensures that the socket cannot produce any new true v6 packets.
> > But it does not guarantee that they are not already in the system, e.g.
> > queued in tc, and will find their way to the error queue later.
> >
> > We'll have to just be able to handle ipv6 packets in ip_recv_error.
> > Since IPV6_ADDRFORM is used to pass to legacy v4-only
> > processes and those likely are only confused by SOL_IPV6
> > error messages, it is probably best to just drop them and perhaps
> > WARN_ONCE.
>
> Even more fun, this is not limited to the error queue.
>
> I can queue a v6 packet for reception on a socket, connect to a v4
> address, call IPV6_ADDRFORM and then a regular recvfrom will
> return a partial v6 address as AF_INET.
>
> We definitely do not want to have to add a check
>
>     if (skb->protocol == htons(ETH_P_IPV6)) {
>             kfree_skb(skb);
>             goto try_again;
>     }
>
> to the normal recvmsg path.
>
> An alternative may be to tighten the check on when to allow
> IPV6_ADDRFORM. Not only return EBUSY if a packet is pending,
> but also if any sk_{rmem, omem, wmem}_alloc is non-zero. Only,
> these tightened constraints could break a legacy application.

I fear that condition will be very restrictive: for UDP sockets,
sk_rmem_alloc can be zero only occasionally after the first packet has
been received, due to the peculiar memory accounting introduced by
commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()").

Cheers,

Paolo
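
For illustration only, here is a minimal userspace sketch of why the
tightened check would rarely pass on a UDP socket. Everything in it is
hypothetical: struct mock_sock, addrform_check(), mock_enqueue() and
mock_consume() are made-up names that only model the fields discussed
above, and the rcvbuf/4 batching threshold is an assumption of the
model, not a statement about the exact kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model of the socket fields named in the thread. */
struct mock_sock {
	int error_queue_len;	/* packets pending on the error queue */
	int rmem_alloc;		/* models sk_rmem_alloc */
	int wmem_alloc;		/* models sk_wmem_alloc */
	int omem_alloc;		/* models sk_omem_alloc */
	int rcvbuf;		/* models sk_rcvbuf */
	int forward_deficit;	/* bytes consumed but not yet uncharged */
};

/*
 * Tightened admission check sketched in the thread: refuse the
 * IPV6_ADDRFORM conversion while anything is queued or still charged.
 */
static int addrform_check(const struct mock_sock *sk)
{
	if (sk->error_queue_len > 0)
		return -EBUSY;
	if (sk->rmem_alloc || sk->wmem_alloc || sk->omem_alloc)
		return -EBUSY;
	return 0;
}

/* Charge a received packet of 'size' bytes to the socket. */
static void mock_enqueue(struct mock_sock *sk, int size)
{
	assert(size > 0);
	sk->rmem_alloc += size;
}

/*
 * Batched release (model assumption): the reader only uncharges
 * rmem_alloc once the accumulated deficit reaches rcvbuf / 4, so the
 * counter stays non-zero after a packet has already been consumed.
 */
static void mock_consume(struct mock_sock *sk, int size)
{
	assert(size > 0);
	sk->forward_deficit += size;
	if (sk->forward_deficit < (sk->rcvbuf >> 2))
		return;
	sk->rmem_alloc -= sk->forward_deficit;
	sk->forward_deficit = 0;
}
```

With rcvbuf = 4096 in this model, enqueueing and consuming a single
512-byte packet leaves rmem_alloc at 512, so addrform_check() returns
-EBUSY although nothing is left to read. Only once a second 512-byte
packet is consumed does the deficit reach 1024 and the charge drop back
to zero.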