From: Christian Brauner
Date: Sat, 21 Apr 2018 17:49:12 +0200
To: "Eric W. Biederman"
Cc: davem@davemloft.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    avagin@virtuozzo.com, ktkhai@virtuozzo.com, serge@hallyn.com,
    gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
Message-ID: <20180421154910.GA31964@gmail.com>
References: <20180418152106.18519-1-christian.brauner@ubuntu.com>
    <20180418152106.18519-3-christian.brauner@ubuntu.com>
    <874lk8wj1j.fsf@xmission.com> <20180418215246.GA24000@gmail.com>
    <20180420135627.GA8350@gmail.com> <20180420161643.GA15182@gmail.com>
In-Reply-To: <20180420161643.GA15182@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 20, 2018 at 06:16:44PM +0200, Christian Brauner wrote:
> On Fri, Apr 20, 2018 at 03:56:28PM +0200, Christian Brauner wrote:
> > On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> > > On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > > > Christian Brauner writes:
> > > >
> > > > > Now that it's possible to have a different set of uevents in different
> > > > > network namespaces, per-network namespace uevent sequence numbers are
> > > > > introduced. This increases performance as locking is now restricted to the
> > > > > network namespace affected by the uevent rather than locking
> > > > > everything.
> > > >
> > > > Numbers please. I personally expect that the netlink mc_list issues
> > > > will swamp any benefit you get from this.
> > >
> > > I don't see how that would be the case. The gist of this is:
> > > Every time you send a uevent into a network namespace *not* owned by
> > > init_user_ns you currently *have* to take mutex_lock(uevent_sock_list),
> > > effectively blocking the host from processing uevents even though
> > > - the uevent you're receiving might be totally different from the
> > >   uevent that you're sending
> > > - the uevent socket of the non-init_user_ns owned network namespace
> > >   isn't even recorded in the list.
> > >
> > > The other argument is that we now have properly isolated network
> > > namespaces wrt uevents, such that each netns can have its own set of
> > > uevents. This can happen either because a sufficiently privileged
> > > userspace process sends it uevents dedicated to that specific netns,
> > > or - and this *has been true for a long time* - because network
> > > devices are *properly namespaced*, meaning a uevent for such a network
> > > device is *tied to a network namespace*. In both cases a global uevent
> > > sequence number is absolutely misleading. For example, whenever
> > > you create a new veth device in a new network namespace it
> > > shouldn't be accounted against the initial network namespace but *only*
> > > against the network namespace that has that device added to it.
> >
> > Eric, I did the testing.
> > Here's what I did:
> >
> > I compiled two 4.17-rc1 kernels:
> > - one with per-netns uevent seqnums with decoupled locking
> > - one without per-netns uevent seqnums with decoupled locking
> >
> > # Testcase 1:
> > Only injecting uevents into network namespaces not owned by the initial user
> > namespace.
> > - created 1000 new user namespace + network namespace pairs
> > - opened a uevent listener in each of those namespace pairs
> > - injected uevents into each of those network namespaces 10,000 times, meaning
> >   10,000,000 (10 million) uevents were injected. (The high number of
> >   uevent injections should get rid of a lot of jitter.)
> > - calculated the mean transaction time
> > - *without* uevent sequence number namespacing:
> >   67 μs
> > - *with* uevent sequence number namespacing:
> >   55 μs
> > - makes a difference of 12 μs
> >
> > # Testcase 2:
> > Injecting uevents into network namespaces not owned by the initial user
> > namespace and network namespaces owned by the initial user namespace.
> > - created 500 new user namespace + network namespace pairs
> > - created 500 new network namespaces
> > - opened a uevent listener in each of those namespaces
> > - injected uevents into each of those network namespaces 10,000 times, meaning
> >   10,000,000 (10 million) uevents were injected. (The high number of
> >   uevent injections should get rid of a lot of jitter.)
> > - calculated the mean transaction time
> > - *without* uevent sequence number namespacing:
> >   572 μs
> > - *with* uevent sequence number namespacing:
> >   514 μs
> > - makes a difference of 58 μs
> >
> > So there is a performance gain. The third case would be to create a bunch
> > of hanging processes that send SIGSTOP to themselves but do not actually
> > open a uevent socket in their respective namespaces, and then inject
> > uevents into them. I expect an even greater performance benefit there,
> > since rtnl_table_lock() isn't hit when there are no listeners.
>
> I did the third test case as well:
> - created 500 new user namespace + network namespace pairs *without
>   uevent listeners*
> - created 500 new network namespaces *without uevent listeners*
> - injected uevents into each of those network namespaces 10,000 times, meaning
>   10,000,000 (10 million) uevents were injected. (The high number of
>   uevent injections should get rid of a lot of jitter.)
> - calculated the mean transaction time
> - *without* uevent sequence number namespacing:
>   206 μs
> - *with* uevent sequence number namespacing:
>   163 μs
> - makes a difference of 43 μs
>
> So this test case shows a performance improvement as well.

Just for fun, I did a simple statistical analysis using t-tests, and all of
them show significant differences at alpha-level 0.001 (which I chose because
0.05 seemed a bit too lax).

Testcase 1:

            Welch Two Sample t-test

    data:  x1 and y1
    t = 405.16, df = 18883000, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     12.14949 12.26761
    sample estimates:
    mean of x mean of y
     68.48594  56.27739

Testcase 2:

            Welch Two Sample t-test

    data:  x2 and y2
    t = 38.685, df = 19682000, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     55.10630 60.98815
    sample estimates:
    mean of x mean of y
     572.9684  514.9211

Testcase 3:

            Welch Two Sample t-test

    data:  x3 and y3
    t = 58.37, df = 17711000, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     41.77860 44.68178
    sample estimates:
    mean of x mean of y
     207.2632  164.0330

Thanks!
Christian
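The locking argument quoted above (one lock and one sequence counter shared by every namespace, versus a lock and counter per network namespace) can be illustrated with a small userspace model. This is a minimal Python sketch, not kernel code: SeqCounter, GlobalScheme, PerNetnsScheme and run_concurrently are hypothetical names, and threading.Lock merely stands in for the kernel mutex. It does not model netlink, mc_list or rtnl_table_lock costs, so it says nothing about the timings reported above; it only shows who contends with whom and what SEQNUM values each namespace observes.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SeqCounter:
    """A lock plus the sequence number it protects (hypothetical model)."""
    lock: threading.Lock = field(default_factory=threading.Lock)
    seqnum: int = 0

    def next(self) -> int:
        with self.lock:
            self.seqnum += 1
            return self.seqnum


class GlobalScheme:
    """One counter for all namespaces: every send contends on one lock."""

    def __init__(self, netns_ids):
        # netns_ids is accepted only so both schemes share one constructor.
        self._counter = SeqCounter()

    def send(self, netns_id) -> int:
        return self._counter.next()


class PerNetnsScheme:
    """One counter per namespace: sends into different namespaces never
    contend, and each namespace sees a gap-free sequence."""

    def __init__(self, netns_ids):
        # Built once up front, so the lookup in send() needs no extra lock.
        self._counters = {ns: SeqCounter() for ns in netns_ids}

    def send(self, netns_id) -> int:
        return self._counters[netns_id].next()


def run_concurrently(scheme, netns_ids, sends_per_netns):
    """Start one thread per namespace and record the SEQNUMs each one sees."""
    seen = {ns: [] for ns in netns_ids}

    def worker(ns):
        # Each thread appends only to its own list.
        for _ in range(sends_per_netns):
            seen[ns].append(scheme.send(ns))

    threads = [threading.Thread(target=worker, args=(ns,)) for ns in netns_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

For example, sending into namespaces a, b, a, b yields SEQNUMs 1, 2, 3, 4 under GlobalScheme, so a listener in a sees 1 and then 3, a gap caused by an event it never received; PerNetnsScheme yields 1, 1, 2, 2.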
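The t and df values in the Welch Two Sample t-test output above come from Welch's unequal-variances test. The sketch below shows how those two numbers are computed from raw samples; it is not the script used for the analysis, the raw timing samples are not included in the message, and welch_t_test is a hypothetical helper. The p-value and confidence interval additionally need the t distribution's CDF and quantiles, which the Python standard library does not provide, so they are omitted here.

```python
import math
from statistics import mean, variance


def welch_t_test(x, y):
    """Return (t, df) for Welch's unequal-variances two-sample t-test.

    variance() is the unbiased (n - 1) sample variance; df is the
    Welch-Satterthwaite approximation.
    """
    if len(x) < 2 or len(y) < 2:
        raise ValueError("each sample needs at least two observations")
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df
```

With equal sample sizes and equal variances, df reduces to nx + ny - 2; unequal variances pull it below that bound.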