From: Alun Evans
To: ebiederm@xmission.com (Eric W. Biederman)
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 05/27] containers: Open a socket inside a container
References: <871rw1yey8.fsf@x220.int.ebiederm.org>
Date: Sat, 28 Sep 2019 15:29:44 -0700
In-Reply-To: <871rw1yey8.fsf@x220.int.ebiederm.org> (Eric W. Biederman's message of "Fri, 27 Sep 2019 09:46:39 -0500")
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 27 Sep '19 at 07:46 ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> Alun Evans writes:
>
>> Hi Eric,
>>
>>
>> On Tue, 19 Feb 2019, Eric W. Biederman wrote:
>>>
>>> David Howells writes:
>>>
>>> > Provide a system call to open a socket inside of a container, using that
>>> > container's network namespace.  This allows netlink to be used to manage
>>> > the container.
>>> >
>>> >     fd = container_socket(int container_fd,
>>> >                           int domain, int type, int protocol);
>>> >
>>>
>>> Nacked-by: "Eric W. Biederman"
>>>
>>> Use a namespace file descriptor if you need this.  So far we have not
>>> added this system call as it is just a performance optimization.  And it
>>> has been too niche to matter.
>>>
>>> If this that has changed we can add this separately from everything else
>>> you are doing here.
>>
>> I think I've found the niche.
>>
>>
>> I'm trying to use network namespaces from Go.
>
> Yes.  Go sucks for this.

Haha... Neither confirm nor deny.

>> Since setns is thread
>> specific, I'm forced to use this pattern:
>>
>>     runtime.LockOSThread()
>>     defer runtime.UnlockOSThread()
>>     …
>>     err = netns.Set(newns)
>>
>>
>> This is only safe recently:
>> https://github.com/vishvananda/netns/issues/17#issuecomment-367325770
>>
>> - but is still less than ideal performance wise, as it locks out other
>> socket operations.
>>
>> The socketat() / socketns() would be ideal:
>>
>> https://lwn.net/Articles/406684/
>> https://lwn.net/Articles/407495/
>> https://lkml.org/lkml/2011/10/3/220
>>
>>
>> One thing that is interesting, the LockOSThread works pretty well for
>> receiving, since I can wrap it around the socket()/bind()/listen() at
>> startup. Then accept() can run outside of the lock.
>>
>> It's creating new outbound tcp connections via socket()/connect() pairs
>> that is the issue.
>
> As I understand it you should be able to write socketat in go something like:
>
>     runtime.LockOSThread()
>     err = netns.Set(newns);
>     fd = socket(...);
>     err = netns.Set(defaultns);
>     runtime.UnlockOSThread()

Yeah, this is currently what I'm having to do. It's painful because due
to the Go runtime model of a single OS netpoller thread, locking the OS
thread to the current goroutine blocks out the other goroutines doing
network I/O.

> I have no real objections to a kernel system call doing that.
> It has just never risen to the level where it was necessary to optimize
> userspace yet.

Would you be able to accept the patch from this thread with the
container API?

    fd = container_socket(int container_fd,
                          int domain, int type, int protocol);

I think that seems more coherent with the rest of the container world
than a follow up of https://lkml.org/lkml/2011/10/3/220 :

    int socketns(int namespace, int domain, int type, int protocol)

I could also put some up if required.

A.

-- 
Alun Evans.