Date: Tue, 8 Jun 2021 14:30:50 +0200
From: Christian Brauner
To: "Enrico Weigelt, metux IT consult", Greg Kroah-Hartman
Cc: containers@lists.linux.dev, "linux-kernel@vger.kernel.org"
Subject: Re: device namespaces
Message-ID: <20210608123050.zde5lwmovjr4yhiy@wittgenstein>

On Tue, Jun 08, 2021 at 11:38:16AM +0200, Enrico Weigelt, metux IT consult wrote:
> Hello folks,
>
>
> I'm going to implement device namespaces, where containers can get an
> entirely different view of the devices in the machine (usually just a
> specific subset, but possibly additional virtual devices).
>
> For start I'd like to add a simple mapping of dev maj/min (leaving aside
> sysfs, udev, etc).
> An important requirement for me is that the parent ns
> can choose to delegate devices from those it full access too (child
> namespaces can do the same to their childs), and the assignment can
> change (for simplicity ignoring the case of removing devices that are
> already opened by some process - haven't decided yet whether they should
> be forcefully closed or whether keeping them open is a valid use case).
>
> The big question for me now is how exactly to do the table maintenance
> from userland. We already have entries in /proc//ns/*. I'm thinking
> about using them as command channel, like this:
>
> * new child namespaces are created with empty mapping
> * mapping manipulation is done by just writing commands to the ns file
> * access is only granted if the writing process itself is in the
>   parent's device ns and has CAP_SYS_ADMIN (or maybe their could be some
>   admin user for the ns ? or the 'root' of the corresponding user_ns ?)
> * if the caller has some restrictions on some particular device, these
>   are automatically added (eg. if you're restricted to readonly, you
>   can't give rw to the child ns).
>
> Is this a good way to go ? Or what would be a better one ?

Ccing Greg. Without addressing specific problems, I should warn you that
this idea is not new and the plan is unlikely to go anywhere, especially
not without support from Greg.

Also note that I have done work to make it possible to do sufficient
device management in containers. There's a longer series associated with
this, but the gist is

692ec06d7c92 ("netns: send uevent messages")

which lets you forward uevents to containers. I spoke about this at
Plumbers in 2018 or so too.

For example, LXD makes use of this. When you hotplug a device into a
container, LXD forwards the generated uevents to the container, making it
possible for the container to manage those devices. That's fully under
the control of userspace and means we don't need to burden the kernel
with this.
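
To make the receiving side a bit more concrete, here is a rough sketch of
how a process inside the container could pick up such uevents from a
NETLINK_KOBJECT_UEVENT socket. It is not taken from LXD or from the patch
series. The function names are made up for illustration, and it assumes
the forwarded events arrive in the usual kernel format
("action@devpath" followed by NUL-separated KEY=VALUE pairs) on multicast
group 1.

```python
import socket

# Netlink protocol number for kobject uevents, as in <linux/netlink.h>.
NETLINK_KOBJECT_UEVENT = 15


def parse_uevent(payload: bytes) -> dict:
    """Split a kernel-format uevent into its header and KEY=VALUE fields.

    Hypothetical helper: the first NUL-terminated string is kept under
    "HEADER", every following "KEY=VALUE" string becomes a dict entry.
    """
    parts = [p for p in payload.split(b"\0") if p]
    if not parts:
        return {}
    event = {"HEADER": parts[0].decode("utf-8", "replace")}
    for field in parts[1:]:
        key, sep, value = field.decode("utf-8", "replace").partition("=")
        if sep:
            event[key] = value
    return event


def listen_for_uevents():
    """Yield parsed uevents seen in the caller's network namespace.

    Linux only. Assumption: events show up on multicast group mask 1,
    the group the kernel uses for its own uevents.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    try:
        sock.bind((0, 1))  # port id 0: let the kernel pick; groups = 1
        while True:
            yield parse_uevent(sock.recv(65536))
    finally:
        sock.close()
```

A container-side device manager would loop over listen_for_uevents() and
act on fields such as ACTION, SUBSYSTEM and DEVNAME, for instance creating
or removing the matching device node.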