From: Eric Dumazet
Date: Mon, 29 Aug 2022 10:15:55 -0700
Subject: Re: [PATCH 4/4] net-next: frags: dynamic timeout under load
To: Richard Gobert
Cc: David Miller, Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Alexander Aring, Stefan Schmidt, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, Martin KaFai Lau, netdev, "open list:DOCUMENTATION", LKML, linux-wpan@vger.kernel.org, netfilter-devel@vger.kernel.org, coreteam@netfilter.org
References: <20220829114739.GA2436@debian>
In-Reply-To: <20220829114739.GA2436@debian>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 29, 2022 at 4:49 AM Richard Gobert wrote:
>
> Calculate a
> dynamic fragment reassembly timeout, taking into
> consideration the current fqdir load and the load introduced by
> the peer. Reintroduce low_thresh, which now acts as a knob for
> adjusting per-peer memory limits.
>
> Signed-off-by: Richard Gobert
> ---
>  Documentation/networking/ip-sysctl.rst |  3 +++
>  include/net/inet_frag.h                |  1 +
>  net/ipv4/inet_fragment.c               | 30 +++++++++++++++++++++++++-
>  net/ipv4/ip_fragment.c                 |  2 +-
>  4 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 56cd4ea059b2..fb25aa6e22a2 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -247,6 +247,9 @@ ipfrag_low_thresh - LONG INTEGER
>  	begins to remove incomplete fragment queues to free up resources.
>  	The kernel still accepts new fragments for defragmentation.
>
> +	(Since linux-6.1)
> +	Maximum memory used to reassemble IP fragments sent by a single peer.
> +
>  ipfrag_time - INTEGER
>  	Time in seconds to keep an IP fragment in memory.
>
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 077a0ec78a58..595a6db57a0e 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -99,6 +99,7 @@ struct inet_frag_queue {
>  	u16			max_size;
>  	struct fqdir		*fqdir;
>  	struct inet_peer	*peer;
> +	u64			timeout;

Why u64?
This is not what the timer interface uses (look at mod_timer(), it uses
"unsigned long").

>  	struct rcu_head		rcu;
>  };
>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 8b8d77d548d4..34c5ebba4951 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -314,6 +314,30 @@ void inet_frag_free(struct inet_frag_queue *q)
>  	call_rcu(&q->rcu, inet_frag_destroy_rcu);
>  }
>
> +static int inet_frag_update_timeout(struct inet_frag_queue *q)
> +{
> +	u64 peer_timeout, inet_timeout;
> +	long peer_mem, inet_mem;
> +	long high_thresh = READ_ONCE(q->fqdir->high_thresh);
> +	long low_thresh = READ_ONCE(q->fqdir->low_thresh);
> +	u64 base_timeout = READ_ONCE(q->fqdir->timeout);
> +
> +	peer_mem = low_thresh - peer_mem_limit(q);
> +	inet_mem = high_thresh - frag_mem_limit(q->fqdir);
> +
> +	if (peer_mem <= 0 || inet_mem <= 0)
> +		return -ENOMEM;
> +
> +	/* Timeout changes linearly with respect to the amount of free memory.
> +	 * Choose the more permissive of the two timeouts, to avoid limiting
> +	 * the system while there is still enough memory.
> +	 */
> +	peer_timeout = div64_long(base_timeout * peer_mem, low_thresh);
> +	inet_timeout = div64_long(base_timeout * inet_mem, high_thresh);
> +	q->timeout = max_t(u64, peer_timeout, inet_timeout);

If/when under load, timeout is close to zero, we would fire many timers
(increased system load) and make it impossible for datagrams to complete.

In contrast, a reasonable timer and probabilistic drops of new datagrams
when the queue is full let some datagrams complete.
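To put numbers on the concern, here is a minimal userspace sketch of the linear scaling in the quoted hunk. It is not the kernel code: scaled_timeout() and frag_timeout() are hypothetical helper names, plain 64-bit division stands in for div64_long(), and the units (milliseconds) and thresholds used in the note below are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the quoted formula:
 * timeout = base * free_mem / limit.
 * The caller must pass a positive limit and a non-negative amount
 * of free memory, mirroring the -ENOMEM check in the patch.
 */
static int64_t scaled_timeout(int64_t base, int64_t free_mem, int64_t limit)
{
	assert(limit > 0);
	assert(free_mem >= 0);
	return base * free_mem / limit;
}

/* Take the more permissive (larger) of the per-peer and global
 * timeouts, as the quoted comment describes.
 */
static int64_t frag_timeout(int64_t base,
			    int64_t peer_free, int64_t low_thresh,
			    int64_t inet_free, int64_t high_thresh)
{
	int64_t peer_timeout = scaled_timeout(base, peer_free, low_thresh);
	int64_t inet_timeout = scaled_timeout(base, inet_free, high_thresh);

	return peer_timeout > inet_timeout ? peer_timeout : inet_timeout;
}
```

Assuming a 30000 ms base timeout, a 3 MiB per-peer budget and a 4 MiB global budget, half-free budgets give 15000 ms, while 1 KiB left in each budget gives 9 ms. A new queue created at that point would expire almost immediately.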
Make sure to test your change under a real DDoS, not only non-malicious
netperf.

> +	return 0;
> +}
> +
>  void inet_frag_destroy(struct inet_frag_queue *q)
>  {
>  	struct fqdir *fqdir;
> @@ -346,6 +370,10 @@ static struct inet_frag_queue *inet_frag_alloc(struct fqdir *fqdir,
>
>  	q->fqdir = fqdir;
>  	f->constructor(q, arg);
> +	if (inet_frag_update_timeout(q)) {
> +		inet_frag_free(q);
> +		return NULL;
> +	}
>  	add_frag_mem_limit(q, f->qsize);
>
>  	timer_setup(&q->timer, f->frag_expire, 0);
> @@ -367,7 +395,7 @@ static struct inet_frag_queue *inet_frag_create(struct fqdir *fqdir,
>  		*prev = ERR_PTR(-ENOMEM);
>  		return NULL;
>  	}
> -	mod_timer(&q->timer, jiffies + fqdir->timeout);
> +	mod_timer(&q->timer, jiffies + q->timeout);
>
>  	*prev = rhashtable_lookup_get_insert_key(&fqdir->rhashtable, &q->key,
>  						 &q->node, f->rhash_params);
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index e35061f6aadb..88a99242d721 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -236,7 +236,7 @@ static int ip_frag_reinit(struct ipq *qp)
>  {
>  	unsigned int sum_truesize = 0;
>
> -	if (!mod_timer(&qp->q.timer, jiffies + qp->q.fqdir->timeout)) {
> +	if (!mod_timer(&qp->q.timer, jiffies + qp->q.timeout)) {
>  		refcount_inc(&qp->q.refcnt);
>  		return -ETIMEDOUT;
>  	}
> --
> 2.36.1
>
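As a rough illustration of the alternative mentioned above (keep a reasonable fixed timer and drop new datagrams probabilistically when memory is tight), a hypothetical admission check could look like the sketch below. Everything here is an assumption made for illustration: frag_admit() is not an existing kernel function, the linear ramp starting at half the limit is an arbitrary policy, and the random value is passed in by the caller so the function stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical admission check for a *new* reassembly queue.
 * used/limit: current and maximum fragment memory, in bytes.
 * rnd: caller-supplied random value in [0, 65535].
 *
 * Up to half the limit, always admit. Between half and full, the
 * drop probability ramps linearly from 0 to 1. At or above the
 * limit, always drop. Queues that are already being reassembled are
 * not touched, so they keep their full timeout and can complete.
 */
static bool frag_admit(uint64_t used, uint64_t limit, uint16_t rnd)
{
	uint64_t half, drop_threshold;

	assert(limit >= 2);
	half = limit / 2;

	if (used <= half)
		return true;
	if (used >= limit)
		return false;

	/* Scale (used - half) / (limit - half) to the 16-bit range. */
	drop_threshold = (used - half) * 65536 / (limit - half);
	return rnd >= drop_threshold;
}
```

With a limit of 1000 bytes and 750 bytes in use, the threshold is 32768, so roughly half of new datagrams are refused while existing queues are left alone.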