Date: Sun, 21 Apr 2024 22:47:12 +0200
From: Solar Designer
To: oss-security@lists.openwall.com
Subject: Re: [oss-security] Linux: Disabling network namespaces

On Sun, Apr 21, 2024 at 01:30:49PM +0100, Simon McVittie wrote:
> On Sat, 20 Apr 2024 at 20:12:11 +0200, Solar Designer wrote:
> > So with my idea/proposal, someone using these tools on a
> > desktop system would need to set the max depth to 1. That would leave
> > the kernel's full attack surface exposed on the host system, but not to
> > sandboxed programs because those would run with capabilities already
> > relinquished (per what you write above) and would not be able to regain
> > them by creating a nested namespace.
>
> I believe that's all correct. If someone prototypes this, a way to verify
> it would be, minimally:
>
> $ ip addr ls
> (should show all your IP addresses)
> $ bwrap --dev-bind / / -- ip addr ls
> (same output)
> $ bwrap --dev-bind / / --unshare-net -- ip addr ls
> (should show only lo with 127.0.0.1 and ::1)
>
> or for a "whole stack" version with Flatpak, install any random Flatpak
> app such as org.gnome.Recipes and do:
>
> $ flatpak run --unshare=network org.gnome.Recipes
>
> # or to explore the sandbox environment interactively
> $ flatpak run --command=bash --unshare=network org.gnome.Recipes
>
> For simplicity, the use of bwrap shown above is not a security boundary:
> it doesn't make any attempt to restrict access to the host filesystem
> like e.g. Flatpak does. bwrap command-lines that implement a meaningful
> security boundary, while still providing useful functionality, are much
> longer than that!

Thank you!

> > Sounds like a worthwhile feature?
>
> I'm not sure. As with most security designs, it depends on your security
> model.

My priorities are:

1. Systems, and especially servers, that do not use containers. They may nevertheless use namespaces in some systemd services by default, which I'd like to keep working seamlessly. On such systems, it should be possible to reduce the kernel's exposure to the level we had prior to unprivileged user namespaces. Right now, upstream's only relevant settings, the max_* ones, break those systemd services (either just their sandboxing aspect or fully). I hope my proposed setting with a depth of 0 (capabilities ineffective in any namespace, or maybe in any namespace other than those created by host root) would work for this.
With luck, the same might even work for Firefox if it does not need capabilities (but my priority is server systems, so that would be a pleasant extra, not a requirement).

2. Server and development systems that use containers, such as Docker and Kubernetes. I guess for them a depth of 1 would commonly be needed, but we'd still protect the kernel from attacks by nested containers (intentional or attacker-created). I suppose a compromised task running as non-root in a container (I mean non-root even from the container's perspective) would no longer have capabilities and, due to the max depth, would not be able to usefully gain them by creating a nested namespace. With luck, some setups like this could even work with a max depth of 0, if we allow capabilities to remain effective when the container is started by host root.

3. Desktop systems, Flatpak, etc. If we can provide useful settings and hardening for these as well, that's a great bonus.

Overall, my thinking is that someone using containers may be more concerned about attacks from within containers than from the host. Similarly, someone using nested containers may be most concerned about attacks from the deepest level. Ideally, we'd protect against attacks from all levels, but since we can't do that easily, let's at least protect against some - hopefully, the most relevant ones.

> To protect a trusted user from their own sandboxed apps, it should be
> unnecessary/redundant for Flatpak users, because Flatpak already doesn't
> let apps inherit CAP_NET_ADMIN or create new user namespaces - but it
> could be useful for other sandboxed app frameworks, or as a second line
> of defence against Flatpak not providing the boundary that it aims to.
>
> To protect the OS and other users from a malicious or compromised
> user account using kernel vulnerabilities to elevate privileges, it's
> insufficient - if that's your security model then there isn't going to be
> any substitute for either trusting the kernel to make CAP_NET_ADMIN in a
> non-init user namespace be safe, or trusting a component like bwrap to
> impose restrictions that its caller is not allowed to bypass.

Yes, with depth >= 1 allowed, such as to use Flatpak, there would be no protection from host users.

> Of course, any time we say things like "trusting a component to impose
> restrictions that its caller is not allowed to bypass", we get into
> the same territory as setuid/setgid/setcap, in terms of needing to
> prevent LD_PRELOAD, LD_LIBRARY_PATH and similar ways to influence the
> trusted component's behaviour from the outside - which is likely to be
> impossible if the kernel isn't helping to defang those aspects of the
> execution environment by flagging the process as AT_SECURE, either in
> core kernel code or in an LSM like AppArmor.

To have a component impose restrictions, the feature would first need to be made unavailable for direct use. This basically means no _unprivileged_ user namespaces, and bwrap or similar installed SUID root - in which case it would run with AT_SECURE set. That's not a setup I was thinking of, but now that you bring it up, it shows that upstream Linux lacks support for it - this needs a separate knob to control _unprivileged_ user namespaces, like Debian has. My proposed knob could also satisfy this need, if we do include a bypass for namespaces created by host root.

> I believe the kernel maintainers' position is that CAP_NET_ADMIN in
> a non-init userns is meant to be safe for untrusted code to have, so
> auditing and if necessary hardening the kernel's use of CAP_NET_ADMIN
> might well be better-received upstream than trying to limit which parts
> of user-space can obtain it.

This seems to be the case, but those activities are orthogonal. We can try to have both.

Alexander
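For reference, the upstream max_* settings mentioned under priority 1 are the per-type namespace limits exposed as sysctls under /proc/sys/user (user.max_user_namespaces, user.max_net_namespaces, and so on). The following is a minimal read-only sketch, assuming a Linux host with /proc mounted, that prints their current values. The function name list_ns_limits is hypothetical, chosen for illustration. The sketch does not implement the proposed depth setting, which does not exist upstream.

```shell
#!/bin/sh
# Print the per-type namespace limits that upstream Linux exposes as
# sysctls under /proc/sys/user. Read-only: nothing is changed.
# list_ns_limits is a hypothetical helper name used only for this sketch.
list_ns_limits() {
    for f in /proc/sys/user/max_*_namespaces; do
        # If the glob matched nothing (non-Linux host, /proc not mounted),
        # it stays literal and the -r test skips it.
        [ -r "$f" ] || continue
        # Strip the directory part and print e.g. "max_net_namespaces=NNNN".
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}

list_ns_limits
```

Lowering one of these limits as root, e.g. `sysctl -w user.max_net_namespaces=0`, is the blunt upstream option that, as noted above, also breaks the namespace-based sandboxing of some systemd services.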