From: "Ziyang Xuan (William)"
Subject: IPv4 saddr do not match with selected output device in double default gateways scene
To: David Miller, David Ahern, Jakub Kicinski, netdev
CC: Linux Kernel Mailing List
Message-ID: <58c15089-f1c7-675e-db4b-b6dfdad4b497@huawei.com>
Date: Tue, 8 Mar 2022 14:41:27 +0800
Content-Type:
 text/plain; charset="gbk"
X-Mailing-List: linux-kernel@vger.kernel.org

Create VLAN devices and add default gateways with the following commands:

    # ip link add link eth2 dev eth2.71 type vlan id 71
    # ip link add link eth2 dev eth2.72 type vlan id 72
    # ip addr add 192.168.71.41/24 dev eth2.71
    # ip addr add 192.168.72.41/24 dev eth2.72
    # ip link set eth2.71 up
    # ip link set eth2.72 up
    # route add -net default gw 192.168.71.1 dev eth2.71
    # route add -net default gw 192.168.72.1 dev eth2.72

Add a nameserver entry to /etc/resolv.conf:

    # cat /etc/resolv.conf
    nameserver 8.8.8.8

Use the following command to trigger DNS packets:

    # ping www.baidu.com

Assume the test machine above is the client. The VLAN devices must also be
created on the peer server, as follows:

    # ip link add link eth2 dev eth2.71 type vlan id 71
    # ip link add link eth2 dev eth2.72 type vlan id 72
    # ip addr add 192.168.71.1/24 dev eth2.71
    # ip addr add 192.168.72.1/24 dev eth2.72
    # ip link set eth2.71 up
    # ip link set eth2.72 up

Capture packets with tcpdump on the client while ping is running:

    # tcpdump -i eth2 -vne
    ...
    20:30:22.996044 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25407, offset 0, flags [DF], proto UDP (17), length 59)
        192.168.72.41.42666 > 8.8.8.8.domain: 58562+ A? www.baidu.com. (31)
    20:30:22.996125 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25408, offset 0, flags [DF], proto UDP (17), length 59)
        192.168.72.41.42666 > 8.8.8.8.domain: 25803+ AAAA? www.baidu.com. (31)
    ...

The packets are tagged with VLAN 71, so they leave through "eth2.71", but
the IPv4 saddr "192.168.72.41" belongs to "eth2.72". The saddr does not
match the selected output device.

I traced the relevant code paths and found that the user-space program
calls connect() first and then sends the UDP packet. The problem occurs
inside connect(). My analysis is annotated in the code below:

	static inline struct rtable *ip_route_connect(struct flowi4 *fl4, __be32 dst,
						      __be32 src, u32 tos, int oif,
						      u8 protocol, __be16 sport,
						      __be16 dport, struct sock *sk)
	{
		struct net *net = sock_net(sk);
		struct rtable *rt;

		ip_route_connect_init(fl4, dst, src, tos, oif, protocol,
				      sport, dport, sk);

		if (!dst || !src) {
			/* rtable and fl4 are matched after the first __ip_route_output_key().
			 * rtable->dst.dev->name == "eth2.72" && rtable->rt_gw4 == 0x148a8c0
			 * fl4->saddr == 0x2948a8c0
			 */
			rt = __ip_route_output_key(net, fl4);
			if (IS_ERR(rt))
				return rt;
			ip_rt_put(rt);
			flowi4_update_output(fl4, oif, tos, fl4->daddr, fl4->saddr);
		}
		security_sk_classify_flow(sk, flowi4_to_flowi_common(fl4));
		/* rtable and fl4 do not match after the second __ip_route_output_key().
		 * rtable->dst.dev->name == "eth2.71" && rtable->rt_gw4 == 0x147a8c0
		 * fl4->saddr == 0x2948a8c0
		 */
		return ip_route_output_flow(net, fl4, sk);
	}

The hex values are __be32 addresses printed on a little-endian host:
0x2948a8c0 is 192.168.72.41, 0x148a8c0 is 192.168.72.1, and 0x147a8c0 is
192.168.71.1.

Digging deeper, this happens because fa->fa_default is changed by
fib_select_default() during the first __ip_route_output_key() call. During
the second __ip_route_output_key() call, fib_select_default() therefore
selects a different fib_nh, but the flowi4 is not updated to match, so
fl4->saddr still holds the address chosen for the first nexthop. This
produces the mismatch described at the beginning.

Is this a kernel bug or a user configuration problem? If it is a kernel
bug, is there a good solution?