Date: Wed, 22 Sep 2021 14:25:23 +0200
From: Christian Brauner
To: Andy Lutomirski
Cc: Luis Chamberlain, Thomas Weißschuh, Linux API, Linux Kernel Mailing List, Jessica Yu
Subject: Re: [RFC] Expose request_module via syscall
Message-ID: <20210922122523.72ypzg4pm2x6nkod@wittgenstein>

On Mon, Sep 20, 2021 at 11:36:47AM -0700, Andy Lutomirski wrote:
> On Mon, Sep 20, 2021 at 11:16 AM Luis Chamberlain wrote:
> >
> > On Mon, Sep 20, 2021 at 04:51:19PM +0200, Thomas Weißschuh wrote:
> > > > Do you mean it literally invokes /sbin/modprobe?
> > > > If so, hooking this at /sbin/modprobe and calling out to the
> > > > container manager seems like a decent solution.
> > >
> > > Yes it does. Thanks for the idea, I'll see how this works out.
> >
> > Would documentation guiding you in that way have helped? If so
> > I welcome a patch that does just that.
>
> If someone wants to make this classy, we should probably have the
> container counterpart of a standardized paravirt interface. There
> should be a way for a container to, in a runtime-agnostic way, issue
> requests to its manager, and requesting a module by (name, Linux
> kernel version for which that name makes sense) seems like an
> excellent use of such an interface.

I always thought of this in terms of the two ways we currently do it:

1. Caller-transparent container manager requests.
   This is the seccomp notifier, where we transparently handle syscalls,
   including intercepting init_module(): we parse the module to be loaded
   out of the container's syscall arguments and, if it is allow-listed,
   load it for the container; otherwise we either continue the syscall
   and let it fail, or fail it directly through the seccomp return value.

2. A process in the container explicitly calling out to the container
   manager.
   One example of how this happens is systemd-nspawn, via D-Bus messages
   between systemd in the container and systemd outside the container, to
   e.g. allocate a new terminal in the container (kinda insecure, but
   that's another issue) or other stuff.

So what was your idea: would it be something like a device file that
could be exposed to the container, where it writes requests to the
container manager? What would be the advantage over just standardizing a
socket protocol, which is what we do, for example (it doesn't do module
loading of course, as we handle that differently):

## Container to host communication

LXD sets up a socket at `/dev/lxd/sock` which root in the container can
use to communicate with LXD on the host.
In LXD, this feature is implemented through a /dev/lxd/sock node which is
created and set up for all LXD instances. This file is a Unix socket which
processes inside the instance can connect to. It's multi-threaded, so
multiple clients can be connected at the same time.

### Implementation details

LXD on the host binds /var/lib/lxd/devlxd/sock and starts listening for
new connections on it.

This socket is then exposed into every single instance started by LXD at
/dev/lxd/sock.

The single socket is required so we can exceed 4096 instances; otherwise,
LXD would have to bind a different socket for every instance, quickly
reaching the FD limit.

### Authentication

Queries on /dev/lxd/sock will only return information related to the
requesting instance. To figure out where a request comes from, LXD will
extract the initial socket ucred and compare that to the list of
instances it manages.

### Protocol

The protocol on /dev/lxd/sock is plain-text HTTP with JSON messaging, so
very similar to the local version of the LXD protocol.

Unlike the main LXD API, there is no background operation and no
authentication support in the /dev/lxd/sock API.

Christian
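Approach 1 above (the seccomp notifier intercepting init_module()) can be sketched in C roughly as follows. This is an illustrative sketch under stated assumptions, not LXD's implementation: it assumes Linux 5.5+ kernel headers (for `SECCOMP_USER_NOTIF_FLAG_CONTINUE`), it omits the `seccomp_data.arch` check a production filter needs, and the names `install_module_filter()`, `decide_module_request()`, `serve_one()` and the `load_if_allowed` callback are hypothetical. Parsing the module out of the syscall arguments and loading it on the host is left to that callback.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Route init_module()/finit_module() to a user-space supervisor and return
 * the listener fd, or -1 on error. A real filter must also check
 * seccomp_data.arch; that is omitted here for brevity. */
int install_module_filter(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_init_module, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_finit_module, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required when the caller lacks CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                        SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}

/* Build the reply for one intercepted request. If the supervisor already
 * loaded an allow-listed module on the host, report success; otherwise
 * either fail directly with EPERM or let the syscall continue so the
 * kernel's own permission check rejects it. */
void decide_module_request(const struct seccomp_notif *req,
                           bool loaded_on_host, bool fail_directly,
                           struct seccomp_notif_resp *resp)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;
    if (loaded_on_host)
        return; /* val = 0, error = 0: the syscall appears to succeed */
    if (fail_directly)
        resp->error = -EPERM;
    else
        resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
}

/* Handle one notification. load_if_allowed is a hypothetical callback
 * that parses the module out of req->data.args and loads it if permitted.
 * Using sizeof() assumes the headers match the running kernel; robust code
 * queries SECCOMP_GET_NOTIF_SIZES instead. */
int serve_one(int listener,
              bool (*load_if_allowed)(const struct seccomp_notif *))
{
    struct seccomp_notif req;
    struct seccomp_notif_resp resp;

    memset(&req, 0, sizeof(req)); /* NOTIF_RECV expects a zeroed buffer */
    if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) < 0)
        return -1;
    decide_module_request(&req, load_if_allowed(&req), true, &resp);
    return ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp);
}
```

Replying with `SECCOMP_USER_NOTIF_FLAG_CONTINUE` is only reasonable here because, in an unprivileged container, the kernel's own CAP_SYS_MODULE check still runs when the syscall resumes; the flag should never be used to grant access.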
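The ucred-based authentication described in the LXD excerpt can be sketched like this. It is an assumption-laden illustration, not LXD's code: the `struct instance` table and the helpers `peer_credentials()`, `pidns_of()` and `instance_for_peer()` are hypothetical, and matching a peer to an instance by comparing PID-namespace identities is just one possible technique (LXD's own matching may differ, and processes in nested PID namespaces inside an instance would need extra handling).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of one instance the manager runs. */
struct instance {
    const char *name;
    dev_t pidns_dev; /* st_dev of /proc/<init-pid>/ns/pid */
    ino_t pidns_ino; /* st_ino of /proc/<init-pid>/ns/pid */
};

/* Credentials of the peer as captured at connect()/socketpair() time;
 * the pid is translated into the caller's PID namespace. */
int peer_credentials(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);
    return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len);
}

/* Identify a PID namespace by the device/inode of /proc/<pid>/ns/pid. */
int pidns_of(pid_t pid, dev_t *dev, ino_t *ino)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)pid);
    if (stat(path, &st) < 0)
        return -1;
    *dev = st.st_dev;
    *ino = st.st_ino;
    return 0;
}

/* Map a connection to the instance it came from, or NULL if unknown. */
const struct instance *instance_for_peer(int fd, const struct instance *list,
                                         size_t n)
{
    struct ucred cred;
    dev_t dev;
    ino_t ino;

    if (peer_credentials(fd, &cred) < 0 || pidns_of(cred.pid, &dev, &ino) < 0)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (list[i].pidns_dev == dev && list[i].pidns_ino == ino)
            return &list[i];
    return NULL;
}
```

Because `SO_PEERCRED` reports the credentials in effect when the peer connected, the check is tied to the initial ucred rather than to whoever later writes on the socket.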