From: Eric Dumazet
Date: Sun, 9 Sep 2018 11:38:37 -0700
Subject: Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables
To: Olof Johansson
Cc: Herbert Xu, David Miller, Neil Horman, Marcelo Ricardo Leitner, Vladislav Yasevich, Alexey Kuznetsov, Hideaki YOSHIFUJI, linux-crypto@vger.kernel.org, LKML, linux-sctp@vger.kernel.org, netdev, linux-decnet-user@lists.sourceforge.net, kernel-team, Yuchung Cheng, Neal Cardwell
References: <20180906192034.8467-1-olof@lixom.net> <20180907033257.2nlgiqm2t4jiwhzc@gondor.apana.org.au>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Sep 8, 2018 at 10:02 AM Olof Johansson wrote:
>
> Hi,
>
> On Fri, Sep 7, 2018 at 12:21 AM, Eric Dumazet wrote:
> > On Fri, Sep 7, 2018 at 12:03 AM Eric Dumazet wrote:
> >
> >> Problem is : we have platforms with more than 100 cpus, and
> >> sk_memory_allocated() cost will be too expensive,
> >> especially if the host is under memory pressure, since all cpus will
> >> touch their private counter.
> >>
> >> per cpu variables do not really scale, they were ok 10 years ago when
> >> no more than 16 cpus were the norm.
> >>
> >> I would prefer change TCP to not aggressively call
> >> __sk_mem_reduce_allocated() from tcp_write_timer()
> >>
> >> Ideally only tcp_retransmit_timer() should attempt to reduce forward
> >> allocations, after recurring timeout.
> >>
> >> Note that after 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d ("net: avoid
> >> sk_forward_alloc overflows")
> >> we have better control over sockets having huge forward allocations.
> >>
> >> Something like :
> >
> > Or something less risky :
>
> I gave both of these patches a run, and neither do as well on the
> system that has slower atomics. :(
>
> The percpu version:
>
>      8.05%  workload  [kernel.vmlinux]  [k] __do_softirq
>      7.04%  swapper   [kernel.vmlinux]  [k] cpuidle_enter_state
>      5.54%  workload  [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
>      1.66%  swapper   [kernel.vmlinux]  [k] __do_softirq
>      1.55%  workload  [kernel.vmlinux]  [k] finish_task_switch
>      1.24%  swapper   [kernel.vmlinux]  [k] finish_task_switch
>      1.07%  workload  [kernel.vmlinux]  [k] net_rx_action
>
> The first patch from you still has significant amount of time spent in
> the atomics paths (non-inlined versions used):
>
>      7.87%  workload  [kernel.vmlinux]  [k] __ll_sc_atomic64_sub

The second patch I gave should not enter this path at all, please try it.

>      7.48%  workload  [kernel.vmlinux]  [k] __do_softirq
>      5.05%  workload  [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
>      2.42%  workload  [kernel.vmlinux]  [k] __ll_sc_atomic64_add_return
>      1.49%  swapper   [kernel.vmlinux]  [k] cpuidle_enter_state
>      1.31%  workload  [kernel.vmlinux]  [k] finish_task_switch
>      1.09%  workload  [kernel.vmlinux]  [k] tcp_sendmsg_locked
>      1.08%  workload  [kernel.vmlinux]  [k] __arch_copy_from_user
>      1.02%  workload  [kernel.vmlinux]  [k] net_rx_action
>
> I think a lot of the overhead from percpu approach can be alleviated
> if we can use percpu_counter_read() instead of _sum() (i.e. no need to
> iterate through the local per-cpu recent delta).
> I don't know the TCP
> stack well enough to tell where it's OK to use a bit of slack in the
> numbers though -- by default count will at most be off by 32*online
> cpus. Might not be a significant number in reality.
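
To make the read() versus _sum() trade-off quoted above concrete, here is a minimal userspace sketch of a batched per-cpu counter. It is not kernel code: the model_* names, NR_CPUS = 8 and BATCH = 32 are all assumptions picked for illustration, and there is no real concurrency, only one delta slot per modeled cpu. It shows why the cheap read can lag the exact sum by less than BATCH per cpu.

```c
/* Userspace model of a batched per-cpu counter. Illustrative only: the
 * model_* names and the constants are hypothetical, not the kernel API. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS 8   /* assumed number of online cpus in the model */
#define BATCH   32  /* assumed per-cpu batch size */

struct model_counter {
    int64_t count;           /* shared value, cheap to read */
    int32_t local[NR_CPUS];  /* per-cpu deltas not yet folded in */
};

/* Add 'amount' on behalf of 'cpu'. The local delta is folded into the
 * shared count once it reaches BATCH in either direction. */
void model_add(struct model_counter *c, int cpu, int32_t amount)
{
    int64_t delta = (int64_t)c->local[cpu] + amount;

    if (delta >= BATCH || delta <= -BATCH) {
        c->count += delta;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = (int32_t)delta;
    }
}

/* Approximate read: shared count only, O(1). */
int64_t model_read(const struct model_counter *c)
{
    return c->count;
}

/* Exact read: shared count plus every per-cpu delta, O(NR_CPUS). */
int64_t model_sum(const struct model_counter *c)
{
    int64_t sum = c->count;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}

/* Apply 'steps' pseudo-random small updates and return the worst
 * |sum - read| seen along the way. */
int64_t model_max_drift(unsigned int seed, int steps)
{
    struct model_counter c = { 0 };
    int64_t worst = 0;

    srand(seed);
    for (int i = 0; i < steps; i++) {
        int cpu = rand() % NR_CPUS;
        int32_t amount = (rand() % 9) - 4;  /* between -4 and 4 */

        model_add(&c, cpu, amount);

        int64_t drift = llabs((long long)(model_sum(&c) - model_read(&c)));

        /* Each local delta stays strictly below BATCH in magnitude. */
        assert(drift <= (int64_t)(BATCH - 1) * NR_CPUS);
        if (drift > worst)
            worst = drift;
    }
    return worst;
}
```

In this model the drift can never exceed (BATCH - 1) * NR_CPUS, which is 248 with the constants assumed here. Whether that much slack is acceptable for sk_memory_allocated() is exactly the open question in the quoted paragraph, and the sketch does not answer it.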