Subject: Re: IPv4 saddr do not match with selected output device in double default gateways scene
From: "Ziyang Xuan (William)"
To: David Miller, David Ahern, Jakub Kicinski, netdev
CC: Linux Kernel Mailing List
Date: Thu, 10 Mar 2022 09:51:19 +0800
Message-ID: <0de63268-a33b-d514-9457-1332c8aec58e@huawei.com>
In-Reply-To: <58c15089-f1c7-675e-db4b-b6dfdad4b497@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> Create VLAN devices and add default gateways with the following commands:
>
> # ip link add link eth2 dev eth2.71 type vlan id 71
> # ip link add link eth2 dev eth2.72 type vlan id 72
> # ip addr add 192.168.71.41/24 dev eth2.71
> # ip addr add 192.168.72.41/24 dev eth2.72
> # ip link set eth2.71 up
> # ip link set eth2.72 up
> # route add -net default gw 192.168.71.1 dev eth2.71
> # route add -net default gw 192.168.72.1 dev eth2.72
>
> Add a nameserver to the following file:
> # cat /etc/resolv.conf
> nameserver 8.8.8.8
>
> Use the following command to trigger DNS packets:
> # ping www.baidu.com
>
> Assume the above test machine is the client.
>
> Of course, we also need to create the VLAN devices on the peer server as follows:
>
> # ip link add link eth2 dev eth2.71 type vlan id 71
> # ip link add link eth2 dev eth2.72 type vlan id 72
> # ip addr add 192.168.71.1/24 dev eth2.71
> # ip addr add 192.168.72.1/24 dev eth2.72
> # ip link set eth2.71 up
> # ip link set eth2.72 up
>
> We capture packets with tcpdump on the client machine while pinging:
> # tcpdump -i eth2 -vne
> ...
> 20:30:22.996044 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25407, offset 0, flags [DF], proto UDP (17), length 59)
>     192.168.72.41.42666 > 8.8.8.8.domain: 58562+ A? www.baidu.com. (31)
> 20:30:22.996125 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25408, offset 0, flags [DF], proto UDP (17), length 59)
>     192.168.72.41.42666 > 8.8.8.8.domain: 25803+ AAAA? www.baidu.com. (31)
> ...
>
> We can see that the IPv4 saddr "192.168.72.41" does not match the selected VLAN device "eth2.71". Is anyone here familiar with the route/FIB implementation? Thank you in advance for your help!
>
> I traced the related code paths and found that the user-space program calls connect() first and then sends the UDP packets.
>
> The problem happens in the connect() path. Here is my analysis, annotated in the code:
>
> static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
>                                               __be32 dst, __be32 src, u32 tos,
>                                               int oif, u8 protocol,
>                                               __be16 sport, __be16 dport,
>                                               struct sock *sk)
> {
>         struct net *net = sock_net(sk);
>         struct rtable *rt;
>
>         ip_route_connect_init(fl4, dst, src, tos, oif, protocol,
>                               sport, dport, sk);
>
>         if (!dst || !src) {
>
>                 /* rtable and fl4 match after the first __ip_route_output_key():
>                  * rtable->dst.dev->name == "eth2.72" && rtable->rt_gw4 == 0x148a8c0
>                  * fl4->saddr == 0x2948a8c0
>                  */
>                 rt = __ip_route_output_key(net, fl4);
>                 if (IS_ERR(rt))
>                         return rt;
>                 ip_rt_put(rt);
>                 flowi4_update_output(fl4, oif, tos, fl4->daddr, fl4->saddr);
>         }
>         security_sk_classify_flow(sk, flowi4_to_flowi_common(fl4));
>
>         /* rtable and fl4 do not match after the second __ip_route_output_key():
>          * rtable->dst.dev->name == "eth2.71" && rtable->rt_gw4 == 0x147a8c0
>          * fl4->saddr == 0x2948a8c0
>          */
>         return ip_route_output_flow(net, fl4, sk);
> }
>
> Tracing further, this is because fa->fa_default has been changed in fib_select_default() after the first __ip_route_output_key() call, and a different fib_nh is then selected in fib_select_default() within the second __ip_route_output_key() call, but flowi4 (fl4->saddr) is not updated to match.
> That is why the behaviour described at the beginning happens.
>
> Is this a kernel bug or a user problem? If it is a kernel bug, is there a good solution?
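The annotations in the quoted ip_route_connect() excerpt print __be32 addresses as raw hex integers. Read as little-endian integers (an assumption about the test machine, consistent with how an x86 host would print them), the lowest byte is the first octet, so 0x2948a8c0 is 192.168.72.41, 0x148a8c0 is 192.168.72.1 and 0x147a8c0 is 192.168.71.1. The small helpers below, be32_hex_to_dotted() and be32_hex_is() (both hypothetical names, not kernel functions), do that decoding with explicit shifts so the result does not depend on the endianness of the machine running them:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode a __be32 address that was printed as a plain
 * integer on a little-endian host. The first octet of the dotted-quad form
 * sits in the lowest byte; explicit shifts make this host-independent. */
static void be32_hex_to_dotted(uint32_t raw, char *out, size_t outlen)
{
    assert(out != NULL && outlen >= sizeof("255.255.255.255"));
    snprintf(out, outlen, "%u.%u.%u.%u",
             (unsigned)(raw & 0xff),
             (unsigned)((raw >> 8) & 0xff),
             (unsigned)((raw >> 16) & 0xff),
             (unsigned)((raw >> 24) & 0xff));
}

/* Hypothetical convenience check: does raw decode to the given address? */
static bool be32_hex_is(uint32_t raw, const char *dotted)
{
    char buf[sizeof("255.255.255.255")];

    be32_hex_to_dotted(raw, buf, sizeof(buf));
    return strcmp(buf, dotted) == 0;
}
```

With these, be32_hex_is(0x2948a8c0, "192.168.72.41") holds, which shows that fl4->saddr belongs to the eth2.72 subnet in both annotated lookups, while the second lookup's gateway 0x147a8c0 (192.168.71.1) belongs to eth2.71.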
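The mismatch described in the analysis can be illustrated with a toy user-space model. Everything in it is hypothetical: struct toy_nh, toy_lookup(), toy_route_connect() and saddr_matches_dev() only mimic the two-step shape of ip_route_connect() (a first lookup that fills in the source address, then a second lookup that picks the output device). The flip of fa_default between the two lookups is forced by hand to stand in for the change observed in fib_select_default(); the real selection logic there is more involved and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IP4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                         ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Toy next hop, one per default route (hypothetical, not kernel types). */
struct toy_nh {
    const char *dev;    /* output device, e.g. "eth2.71" */
    uint32_t dev_addr;  /* address configured on dev (host byte order) */
    uint32_t mask;      /* netmask of dev's subnet (host byte order) */
};

/* Toy flow: only the fields the analysis talks about. */
struct toy_flow {
    uint32_t saddr;     /* 0 means "let the lookup choose" */
    const char *dev;    /* device chosen by the most recent lookup */
};

/* Two default routes plus a mutable index standing in for fa->fa_default. */
struct toy_table {
    struct toy_nh nh[2];
    int fa_default;
};

/* One lookup: use the current default next hop, and fill in saddr only if
 * it is still unset, as happens to fl4->saddr in the first lookup. */
static const struct toy_nh *toy_lookup(struct toy_table *t, struct toy_flow *fl)
{
    assert(t->fa_default == 0 || t->fa_default == 1);
    const struct toy_nh *nh = &t->nh[t->fa_default];

    if (fl->saddr == 0)
        fl->saddr = nh->dev_addr;
    fl->dev = nh->dev;
    return nh;
}

/* Mimics the two-lookup shape of ip_route_connect() for an unbound socket
 * (src == 0), with the default next hop forced to change in between. */
static const struct toy_nh *toy_route_connect(struct toy_table *t, struct toy_flow *fl)
{
    toy_lookup(t, fl);                  /* first lookup: saddr is chosen */
    t->fa_default = 1 - t->fa_default;  /* forced change between lookups */
    return toy_lookup(t, fl);           /* second lookup: saddr kept as is */
}

/* True if the flow's saddr lies in the subnet of the chosen device. */
static bool saddr_matches_dev(const struct toy_flow *fl, const struct toy_nh *nh)
{
    return (fl->saddr & nh->mask) == (nh->dev_addr & nh->mask);
}
```

With nh[0] = eth2.71 (192.168.71.41/24), nh[1] = eth2.72 (192.168.72.41/24) and fa_default starting at 1, toy_route_connect() leaves saddr at 192.168.72.41 while the chosen device is eth2.71, so saddr_matches_dev() returns false. That is the same pairing seen in the tcpdump capture.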
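To see which source address connect() settles on without relying on tcpdump, a UDP socket can be connected and then queried with getsockname(). connect() on an IPv4 UDP socket sends no datagram; it performs the route lookup through ip_route_connect() and records the chosen local address. The sketch below uses only standard socket calls; the helper name udp_connect_saddr() is hypothetical. On the client described above one would call it with "8.8.8.8" and port 53 and compare the result with the VLAN tag in the capture; the test uses the loopback address so it does not depend on external routes.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a UDP socket to dst_ip:port and write the
 * local (source) address chosen by the kernel into out.
 * Returns 0 on success, -1 on any failure. No datagram is sent. */
static int udp_connect_saddr(const char *dst_ip, uint16_t port,
                             char *out, socklen_t outlen)
{
    struct sockaddr_in dst, local;
    socklen_t len = sizeof(local);
    int fd, ret = -1;

    assert(out != NULL && outlen >= INET_ADDRSTRLEN);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* connect() fixes the local address; getsockname() reports it. */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, outlen) != NULL)
        ret = 0;

    close(fd);
    return ret;
}
```

Calling udp_connect_saddr("127.0.0.1", 53, buf, sizeof(buf)) should yield "127.0.0.1". Note that this only reveals the source address; the output device still has to be read from the capture.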