Received: by 2002:ad5:4acb:0:0:0:0:0 with SMTP id n11csp4264461imw; Tue, 19 Jul 2022 03:19:11 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sZZ9jArlM9I8VL2hcRRJJxh7J36+n7SqRP057Wb9iutJBc9rg2NbZ4KT+x1vy+YTOswze+ X-Received: by 2002:a17:907:3fa8:b0:72b:5895:f54f with SMTP id hr40-20020a1709073fa800b0072b5895f54fmr29832495ejc.464.1658225951243; Tue, 19 Jul 2022 03:19:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658225951; cv=none; d=google.com; s=arc-20160816; b=ZEk7L8Ms+DLYdNyKJ3ROiJQhWdccJIZToAN8r4FU7GuqhxjsEpY411St+wfEuwLcZ1 XgYF0Ld2Nuivf+qVqvsdArA+8RbDqD5znNKv0+SDPsPKfQxXVzg4yjcBpe8o9pKE8nND 0Q6WrLlAPfErAjib3TELRf5+WEfwv3zYHwSlG28jfOcm87ricVHL4p+9qUHgKdabxaUT OzVJ+IXRCQB8UA/zGYblbbGPOzlXjR11BJwKEtSLFI94z9egGNmZDCtQek4R2jDU94U3 L2hww1AwDqYbtUeA0z1CEdQwcWNa/qXs1zz2fJp5C9wWJVaxOYb5aTXozJJ1pnWqgtJa AG5g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=kcODNYrp+VKkOtbBUD23HLsfyp2gv6gQrx0DgI07j10=; b=vLGKSMtIq7fwFq/DcVhytb9HLiJksNqT+e0f7HVq8CpcFeCbmKmZ81Txq/GJoV6G10 gZ0Q1X9rIHcWYCx3hqy923cyag5wt613TKXMHK/LdSNk1H/szya8/OOaLaovXhaWqu94 z6TKdzoifklkHcEa4a+XaXd2HbyEeA23N6l/zW12RxVCweMnkmq+yRKBOpbh+FHvkLH5 UifIXTpLci14Lr04PAHpygfwlkJlelH1BnlGUjVvJhxkbOaf1tJGK3YCGvfNRHBi8mx9 FThcP/RgdNFA2EqatPJHj+ue+N8vQqSy1W2L89k1Irj+F8wt+Knk/RuKFuJEx3ip94N+ 3ygQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Zw7PrFKf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e40-20020a056402332800b0043b906fe19dsi1281277eda.537.2022.07.19.03.18.46; Tue, 19 Jul 2022 03:19:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Zw7PrFKf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237225AbiGSJgf (ORCPT + 99 others); Tue, 19 Jul 2022 05:36:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37924 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236843AbiGSJge (ORCPT ); Tue, 19 Jul 2022 05:36:34 -0400 Received: from mail-yb1-xb2b.google.com (mail-yb1-xb2b.google.com [IPv6:2607:f8b0:4864:20::b2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E38981116C for ; Tue, 19 Jul 2022 02:36:32 -0700 (PDT) Received: by mail-yb1-xb2b.google.com with SMTP id i206so25477927ybc.5 for ; Tue, 19 Jul 2022 02:36:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=kcODNYrp+VKkOtbBUD23HLsfyp2gv6gQrx0DgI07j10=; b=Zw7PrFKfP+/4qoL4plmuh+Ti44dlIeOdjEROXu/pO4UbBi70en+R4pYSONQpWhE03u Sc0Y2E2GEq1P7l8zJvfNyHS7ZJtjSnLByG43SbwQZ3pNzBF/ROWZvz0hrYlgdgD8Cl2Y Ez29bVQSZyegjrIAsyoqicMAHSsclmeETDayc1WJZvXFZAp1CQ9Zbg8X0FE35WgF9Xc2 qfGcmLC9kRCbBjNulWYzZkwDfTHvi61NkFLLR1pUlPEe+Lyw8I7J3BnQBj7iOUOtIK91 CZsLyScvAuoBurN7GSExxuRy77xhAvnRVusD1Byw3M3f0niKxQEHzdNTLEIqn02VrPI6 26wg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=kcODNYrp+VKkOtbBUD23HLsfyp2gv6gQrx0DgI07j10=; b=EEt3aDtQcAEgh3aYFCmIiMinPwqzuhBCCgVcX3QUnAYvg1McHpD10QBFKHBhOvZ+RY sryQAvW+kTmihi2oqG7IroRVmowuR+ink4IfvToTiWV5ytqmBdZMFdpcsLHarfcE50er KgJOFk2fzbYCLbLB1ir6Rl5rO+Bcp2AULutvtRYZe9ZaknOG/cJLFutsQPAXP6EpxkS/ hwiPT6SsSCa4Z6amJDfJs+n/PIhPJrL0xHmxN6khvn9+JR/Twg0PQV/WyOB2bSGXvi2A iQvndMWU0L7rDfMgS/2ZaaMeKHzliVukPHckIq2H7Tfe9q9d0YCy5GzPN2y9fbgcpLFg sliw== X-Gm-Message-State: AJIora/WUvtqYrNNnv/KOAM9cAU6fCndbjMxX1ZT2MVtQ+mGj3IrCVNy MK1EBGVikyYr6RIIxJunezJGY1nrt+uzoz55/+DU0g== X-Received: by 2002:a05:6902:120c:b0:66e:fbad:39de with SMTP id s12-20020a056902120c00b0066efbad39demr30443605ybu.388.1658223392022; Tue, 19 Jul 2022 02:36:32 -0700 (PDT) MIME-Version: 1.0 References: <0eb1cb5746e9ac938a7ba7848b33ccf680d30030.1657643355.git.asml.silence@gmail.com> <20220718185413.0f393c91@kernel.org> In-Reply-To: <20220718185413.0f393c91@kernel.org> From: Willem de Bruijn Date: Tue, 19 Jul 2022 11:35:54 +0200 Message-ID: Subject: Re: [PATCH net-next v5 01/27] ipv4: avoid partial copy for zc To: Jakub Kicinski Cc: Pavel Begunkov , io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "David S . Miller" , Jonathan Lemon , Jens Axboe , David Ahern , kernel-team@fb.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 19, 2022 at 3:54 AM Jakub Kicinski wrote: > > On Tue, 12 Jul 2022 21:52:25 +0100 Pavel Begunkov wrote: > > Even when zerocopy transmission is requested and possible, > > __ip_append_data() will still copy a small chunk of data just because it > > allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles > > on copy and iter manipulations and also misalignes potentially aligned > > data. Avoid such coies. And as a bonus we can allocate smaller skb. > > s/coies/copies/ can fix when applying > > > > > Signed-off-by: Pavel Begunkov > > --- > > net/ipv4/ip_output.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > index 00b4bf26fd93..581d1e233260 100644 > > --- a/net/ipv4/ip_output.c > > +++ b/net/ipv4/ip_output.c > > @@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk, > > struct inet_sock *inet = inet_sk(sk); > > struct ubuf_info *uarg = NULL; > > struct sk_buff *skb; > > - > > struct ip_options *opt = cork->opt; > > int hh_len; > > int exthdrlen; > > @@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk, > > int copy; > > int err; > > int offset = 0; > > + bool zc = false; > > unsigned int maxfraglen, fragheaderlen, maxnonfragsize; > > int csummode = CHECKSUM_NONE; > > struct rtable *rt = (struct rtable *)cork->dst; > > @@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk, > > if (rt->dst.dev->features & NETIF_F_SG && > > csummode == CHECKSUM_PARTIAL) { > > paged = true; > > + zc = true; > > } else { > > uarg->zerocopy = 0; > > skb_zcopy_set(skb, uarg, &extra_uref); > > @@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk, > > (fraglen + alloc_extra < SKB_MAX_ALLOC || > > !(rt->dst.dev->features & NETIF_F_SG))) > > alloclen = fraglen; > > - else { > > + else if (!zc) { > > alloclen = min_t(int, fraglen, MAX_HEADER); > > Willem, I think this came in with your GSO work, is there a reason we > use MAX_HEADER here? I thought MAX_HEADER is for headers (i.e. more or > less to be reserved) not for the min amount of data to be included. > > I wanna make sure we're not missing something about GSO here. > > Otherwise I don't think we need the extra branch but that can > be a follow up. The change was introduced for UDP GSO, to avoid copying most payload on software segmentation: " commit 15e36f5b8e982debe43e425d2e12d34e022d51e9 Author: Willem de Bruijn Date: Thu Apr 26 13:42:19 2018 -0400 udp: paged allocation with gso When sending large datagrams that are later segmented, store data in page frags to avoid copying from linear in skb_segment. " and in code - else - alloclen = datalen + fragheaderlen; + else if (!paged) + alloclen = fraglen; + else { + alloclen = min_t(int, fraglen, MAX_HEADER); + pagedlen = fraglen - alloclen; + } MAX_HEADER was a short-hand for the exact header length. "alloclen = fragheaderlen + transhdrlen;" is probably a better choice indeed. Whether with branch or without, the same change needs to be made to __ip6_append_data, just as in the referenced commit. Let's keep the stacks in sync. This is tricky code. If in doubt, run the msg_zerocopy and udp_gso tests from tools/testing/selftests/net, ideally with KASAN.