From: Christian Brauner
Date: Fri, 20 Apr 2018 15:56:28 +0200
To: "Eric W. Biederman"
Cc: davem@davemloft.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, avagin@virtuozzo.com, ktkhai@virtuozzo.com, serge@hallyn.com, gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
Message-ID: <20180420135627.GA8350@gmail.com>
References: <20180418152106.18519-1-christian.brauner@ubuntu.com> <20180418152106.18519-3-christian.brauner@ubuntu.com> <874lk8wj1j.fsf@xmission.com> <20180418215246.GA24000@gmail.com>
In-Reply-To: <20180418215246.GA24000@gmail.com>

On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > Christian Brauner writes:
> >
> > > Now that it's possible to have a different set of uevents in different
> > > network namespaces, per-network namespace uevent sequence numbers are
> > > introduced.
> > > This increases performance as locking is now restricted to the
> > > network namespace affected by the uevent rather than locking
> > > everything.
> >
> > Numbers please. I personally expect that the netlink mc_list issues
> > will swamp any benefit you get from this.
>
> I wouldn't see how this would be the case. The gist of this is:
> Everytime you send a uevent into a network namespace *not* owned by
> init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> effectively blocking the host from processing uevents even though
> - the uevent you're receiving might be totally different from the
>   uevent that you're sending
> - the uevent socket of the non-init_user_ns owned network namespace
>   isn't even recorded in the list.
>
> The other argument is that we now have properly isolated network
> namespaces wrt to uevents such that each netns can have its own set of
> uevents. This can either happen by a sufficiently privileged userspace
> process sending it uevents that are only dedicated to that specific
> netns. Or - and this *has been true for a long time* - because network
> devices are *properly namespaced*. Meaning a uevent for that network
> device is *tied to a network namespace*. For both cases the uevent
> sequence numbering will be absolutely misleading. For example, whenever
> you create e.g. a new veth device in a new network namespace it
> shouldn't be accounted against the initial network namespace but *only*
> against the network namespace that has that device added to it.

Eric, I did the testing. Here's what I did:

I compiled two 4.17-rc1 kernels:
- one with per-netns uevent seqnums with decoupled locking
- one without per-netns uevent seqnums with decoupled locking

# Testcase 1:
Only injecting uevents into network namespaces not owned by the initial
user namespace.
- created 1000 new user namespace + network namespace pairs
- opened a uevent listener in each of those namespace pairs
- injected uevents into each of those network namespaces 10,000 times,
  meaning 10,000,000 (10 million) uevents were injected. (The high number
  of uevent injections should get rid of a lot of jitter.)
- calculated the mean transaction time:
  - *without* uevent sequence number namespacing: 67 μs
  - *with* uevent sequence number namespacing:    55 μs
  - makes a difference of 12 μs

# Testcase 2:
Injecting uevents into network namespaces not owned by the initial user
namespace and into network namespaces owned by the initial user namespace.
- created 500 new user namespace + network namespace pairs
- created 500 new network namespaces (owned by the initial user namespace)
- opened a uevent listener in each of those namespaces
- injected uevents into each of those network namespaces 10,000 times,
  meaning 10,000,000 (10 million) uevents were injected. (The high number
  of uevent injections should get rid of a lot of jitter.)
- calculated the mean transaction time:
  - *without* uevent sequence number namespacing: 572 μs
  - *with* uevent sequence number namespacing:    514 μs
  - makes a difference of 58 μs

So there is a performance gain in both cases. The third case would be to
create a bunch of hanging processes that send SIGSTOP to themselves but do
not actually open a uevent socket in their respective namespaces, and then
inject uevents into those namespaces. I expect an even larger performance
benefit there, since rtnl_table_lock() isn't hit in this case because
there are no listeners.

Christian
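
For anyone who wants to double-check the figures from the two test cases, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the means are copied from the results above, while the names RESULTS, NAMESPACES, INJECTIONS and summarize() are made up here and are not part of the actual test setup.

```python
# Mean transaction times in microseconds, copied from the two test cases.
# The dictionary layout and all names below are hypothetical.
RESULTS = {
    "testcase 1": {"without": 67, "with": 55},
    "testcase 2": {"without": 572, "with": 514},
}

NAMESPACES = 1000    # 1000 pairs in test case 1; 500 + 500 in test case 2
INJECTIONS = 10_000  # uevents injected into each network namespace


def summarize(without_us, with_us):
    """Return (absolute saving in microseconds, relative saving in percent)."""
    saved = without_us - with_us
    return saved, 100.0 * saved / without_us


# Total number of injected uevents per test case.
total_uevents = NAMESPACES * INJECTIONS
print(f"uevents injected per test case: {total_uevents}")

for name, means in RESULTS.items():
    saved, percent = summarize(means["without"], means["with"])
    print(f"{name}: {saved} us saved per transaction ({percent:.1f}%)")
```

The percentages are derived purely from the reported means; they say nothing about variance, which the numbers above do not include.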