Date: Fri, 24 Nov 2023 19:08:50 +0100
From: Christian Brauner
To: Michael Weiß
Cc: Alexander Mikhalitsyn, Alexei Starovoitov, Paul Moore,
 Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
 Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
 Jiri Olsa, Quentin Monnet, Alexander Viro, Miklos Szeredi,
 Amir Goldstein, "Serge E. Hallyn", bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 gyroidos@aisec.fraunhofer.de
Subject: Re: [RESEND RFC PATCH v2 00/14] device_cgroup: guard mknod for
 non-initial user namespace
Message-ID: <20231124-tropfen-kautschukbaum-bee7c7dec096@brauner>
References: <20231025094224.72858-1-michael.weiss@aisec.fraunhofer.de>
 <20231124-filzen-bohrinsel-7ff9c7f44fe1@brauner>
In-Reply-To: <20231124-filzen-bohrinsel-7ff9c7f44fe1@brauner>

On Fri, Nov 24, 2023 at 05:47:32PM +0100, Christian Brauner wrote:
> > - Integrate this as LSM (Christian, Paul)
>
> Huh, my rant made you write an LSM. I'm not sure if that's a good or bad
> thing...
>
> So I dislike this less than the initial version that just worked around

Hm, I wonder if we're being too timid or too complex in how we want to
solve this problem.

The device cgroup management logic is hacked into multiple layers and is
frankly pretty appalling.

What I think device access management wants to look like is that you can
implement a policy in an LSM - be it bpf or regular selinux - and have
this guarded by the main hooks:

security_file_open()
security_inode_mknod()

So, look at:

vfs_get_tree()
-> security_sb_set_mnt_opts()
   -> bpf_sb_set_mnt_opts()

A bpf LSM program should be able to strip SB_I_NODEV from sb->s_iflags
today via bpf_sb_set_mnt_opts() without any kernel changes at all.

I assume that a bpf LSM can also keep state in sb->s_security just like
selinux et al? If so then a device access management program or whatever
can be stored in sb->s_security.

That device access management program would then be run on each call to:

security_file_open()
-> bpf_file_open()

and

security_inode_mknod()
-> bpf_inode_mknod()

and take access decisions.

This obviously makes device access management something that's tied
completely to a filesystem. So, you could have the same device node on
two tmpfs filesystems both mounted in the same userns.

The first tmpfs has SB_I_NODEV and doesn't allow you to open that
device. The second tmpfs has a bpf LSM program attached to it that has
stripped SB_I_NODEV and manages device access and allows callers to open
that device.

I guess it's even possible to restrict this on a caller basis by marking
them with a "container id" when the container is started. That can be
done with that task storage thing also via a bpf LSM hook. And then
you can further restrict device access to only those tasks that have a
specific container id in some range or some token or something.

I might just be fantasizing abilities into bpf that it doesn't have so
anyone with the knowledge please speak up.
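
To make the intended semantics above a bit more concrete, here is a
rough userspace model. It is only a sketch under assumptions: nothing in
it is kernel or bpf API. The struct names, the model_* functions, the
MODEL_SB_I_NODEV value and the shape of the policy (one allowed device
major plus a container id range) are all made up for illustration, and
it says nothing about whether a bpf LSM can really do this.

```c
/* Userspace model only. None of these names are kernel or bpf APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SB_I_NODEV 0x1u /* stand-in for SB_I_NODEV, value is arbitrary */

/* Hypothetical policy, i.e. what might be kept in sb->s_security. */
struct dev_policy {
    unsigned int allowed_major;    /* the one device major this policy permits */
    unsigned int min_container_id; /* inclusive lower bound of allowed ids */
    unsigned int max_container_id; /* inclusive upper bound of allowed ids */
};

/* Minimal stand-in for a superblock. */
struct model_sb {
    unsigned int s_iflags;
    const struct dev_policy *s_security; /* NULL means no policy attached */
};

/* Minimal stand-in for a task; the id would live in bpf task storage. */
struct model_task {
    unsigned int container_id;
};

/* Models the mount-time hook: attach a policy and strip the NODEV flag. */
static void model_sb_set_mnt_opts(struct model_sb *sb,
                                  const struct dev_policy *policy)
{
    assert(sb != NULL);
    sb->s_security = policy;
    if (policy)
        sb->s_iflags &= ~MODEL_SB_I_NODEV;
}

/* Shared decision: with no policy attached the model LSM does not object. */
static bool policy_allows(const struct dev_policy *p,
                          const struct model_task *task, unsigned int major)
{
    if (!p)
        return true;
    return major == p->allowed_major &&
           task->container_id >= p->min_container_id &&
           task->container_id <= p->max_container_id;
}

/* Models the mknod hook. Capability checks are deliberately not modelled. */
static bool model_inode_mknod(const struct model_sb *sb,
                              const struct model_task *task,
                              unsigned int major)
{
    return policy_allows(sb->s_security, task, major);
}

/* Models opening a device node: NODEV always blocks, then the policy runs. */
static bool model_file_open(const struct model_sb *sb,
                            const struct model_task *task,
                            unsigned int major)
{
    if (sb->s_iflags & MODEL_SB_I_NODEV)
        return false;
    return policy_allows(sb->s_security, task, major);
}
```

With two model superblocks that both start out with MODEL_SB_I_NODEV,
attaching a policy to only the second one gives the two-tmpfs picture
from above: the same device can be opened through the second filesystem
by a task whose container id is in range, and through the first one by
nobody.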

If this is feasible then the only thing we need to figure out is what to
do with the legacy cgroup access management and specifically the
capable(CAP_SYS_ADMIN) check that's more of a hack than anything else.

So, we could introduce a sysctl that makes it possible to turn this
check into ns_capable(sb->s_userns, CAP_SYS_ADMIN). Due to SB_I_NODEV
it is inherently safe to do that. It's just that a lot of container
runtimes need to have time to adapt to a world where you may be able to
create a device but not be able to then open it. This isn't rocket
science but it will take time.

But in the end this will mean we get away with minimal kernel changes
and using a lot of existing infrastructure.

Thoughts?
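
For reference, the sysctl idea can be modelled in userspace as well.
Again this is a sketch under assumptions and not kernel code: the
model_* names and the sysctl_relaxed switch are hypothetical, user
namespaces are reduced to a parent chain, and the real rule that the
owner of a child namespace holds all capabilities in it is left out.

```c
/* Userspace model only. None of these names are kernel APIs. */
#include <stdbool.h>
#include <stddef.h>

/* A user namespace reduced to its parent link; NULL parent is the initial one. */
struct model_userns {
    const struct model_userns *parent;
};

/* Credentials reduced to a namespace and a single capability bit. */
struct model_cred {
    const struct model_userns *user_ns;
    bool cap_sys_admin; /* CAP_SYS_ADMIN raised in user_ns */
};

/*
 * Simplified ns_capable(): the capability counts for the caller's own
 * namespace and for every namespace below it. The owner rule is omitted.
 */
static bool model_ns_capable(const struct model_cred *cred,
                             const struct model_userns *target)
{
    for (const struct model_userns *ns = target; ns; ns = ns->parent) {
        if (ns == cred->user_ns)
            return cred->cap_sys_admin;
    }
    return false;
}

/* Simplified capable(): the same check against the initial namespace. */
static bool model_capable(const struct model_cred *cred,
                          const struct model_userns *init_ns)
{
    return model_ns_capable(cred, init_ns);
}

/*
 * The check with the hypothetical sysctl applied: off keeps the legacy
 * capable() behaviour, on checks against the superblock's namespace.
 */
static bool model_admin_check(const struct model_cred *cred,
                              const struct model_userns *init_ns,
                              const struct model_userns *sb_userns,
                              bool sysctl_relaxed)
{
    if (sysctl_relaxed)
        return model_ns_capable(cred, sb_userns);
    return model_capable(cred, init_ns);
}
```

In this model a container root (capability raised in a child namespace)
fails the check while the switch is off and passes it once the switch is
on and the superblock belongs to its namespace. Host root passes in both
cases.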