Date: Sun, 19 Apr 2020 16:17:27 +0200
From: Alexey Gladkov
To: "Eric W. Biederman"
Cc: LKML, Kernel Hardening, Linux API, Linux FS Devel, Linux Security Module, Akinobu Mita, Alexander Viro, Alexey Dobriyan, Andrew Morton, Andy Lutomirski, Daniel Micay, Djalal Harouni, "Dmitry V . Levin", Greg Kroah-Hartman, Ingo Molnar, "J . Bruce Fields", Jeff Layton, Jonathan Corbet, Kees Cook, Linus Torvalds, Oleg Nesterov, David Howells
Subject: Re: [PATCH RESEND v11 2/8] proc: allow to mount many instances of proc in one pid namespace
Message-ID: <20200419141727.zjstym5kbp5efoz6@comp-core-i7-2640m-0182e6>
References: <20200409123752.1070597-1-gladkov.alexey@gmail.com> <20200409123752.1070597-3-gladkov.alexey@gmail.com> <87tv1iaqnq.fsf@x220.int.ebiederm.org>
In-Reply-To: <87tv1iaqnq.fsf@x220.int.ebiederm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 17, 2020 at 01:55:05PM -0500, Eric W. Biederman wrote:
> Alexey Gladkov writes:
>
> > This patch allows to have multiple procfs instances inside the
> > same pid namespace. The aim here is lightweight sandboxes, and to allow
> > that we have to modernize procfs internals.
> >
> > 1) The main aim of this work is to have on embedded systems one
> > supervisor for apps. Right now we have some lightweight sandbox support,
> > however if we create pid namespaces we have to manage all the
> > processes inside too, where our goal is to be able to run a bunch of
> > apps each one inside its own mount namespace without being able to
> > notice each other. We only want to use mount namespaces, and we want
> > procfs to behave more like a real mount point.
> >
> > 2) Linux Security Modules have multiple ptrace paths inside some
> > subsystems, however inside procfs, the implementation does not guarantee
> > that the ptrace() check which triggers the security_ptrace_check() hook
> > will always run.
> > We have the 'hidepid' mount option that can be used to
> > force the ptrace_may_access() check inside has_pid_permissions() to run.
> > The problem is that 'hidepid' is per pid namespace and not attached to
> > the mount point, any remount or modification of 'hidepid' will propagate
> > to all other procfs mounts.
> >
> > This also does not allow to support Yama LSM easily in desktop and user
> > sessions. Yama ptrace scope which restricts ptrace and some other
> > syscalls to be allowed only on inferiors, can be updated to have a
> > per-task context, where the context will be inherited during fork(),
> > clone() and preserved across execve(). If we support multiple private
> > procfs instances, then we may force the ptrace_may_access() on
> > /proc// to always run inside that new procfs instances. This will
> > allow to specify on user sessions if we should populate procfs with
> > pids that the user can ptrace or not.
> >
> > By using Yama ptrace scope, some restricted users will only be able to see
> > inferiors inside /proc, they won't even be able to see their other
> > processes. Some software like Chromium, Firefox's crash handler, Wine
> > and others are already using Yama to restrict which processes can be
> > ptraceable. With this change this will give the possibility to restrict
> > /proc// but more importantly this will give desktop users a
> > generic and usable way to specify which users should see all processes
> > and which users can not.
> >
> > Side notes:
> > * This covers the lack of seccomp where it is not able to parse
> > arguments, it is easy to install a seccomp filter on direct syscalls
> > that operate on pids, however /proc// is a Linux ABI using
> > filesystem syscalls. With this change LSMs should be able to analyze
> > open/read/write/close...
> >
> > In the new patchset version I removed the 'newinstance' option
> > as suggested by Eric W. Biederman.
>
> Some very small requests.
>
> 1) Can you please not place fs_info in fs_context, and instead allocate
>    fs_info in fill_super? Unless I have misread, this introduces a resource
>    leak if proc is not mounted or if proc is simply reconfigured.

Hm ... it seems you're right.

> 2) Can you please move hide_pid and pid_gid into fs_info in this patch?
>    As was shown by my recent bug fix

OK. I’ll do it in the next version.

> 3) Can you please rebase on v5.7-rc1 or v5.7-rc2 and repost these
>    patches please? I thought I could do it safely but between my bug
>    fixes, and Alexey Dobriyan's parallel changes to proc these patches
>    do not apply cleanly.
>
>    Plus there is a resource leak in this patch.

On my way.

> > struct proc_fs_context {
> > -	struct pid_namespace *pid_ns;
> > +	struct proc_fs_info *fs_info;
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Please don't do this. As best as I can tell that introduces a memory
> leak if proc is not mounted. Please allocate fs_info in

OK.

-- 
Rgrds, legion
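
The ownership concern behind request 1 can be illustrated with a small userspace mock. This is only a sketch under stated assumptions, not the kernel code: every name below (mock_fs_context, mock_fs_info, mock_super, mock_fill_super, mock_kill_sb, mock_reconfigure, live_fs_info) is hypothetical, and the real proc code may be structured differently. The idea shown is that the context carries only parsed options, while the per-mount object is created when a superblock is filled and released when that superblock is torn down, so a context that never leads to a mount has nothing to leak.

```c
/* Userspace mock, NOT kernel code: all names are hypothetical stand-ins. */
#include <assert.h>
#include <stdlib.h>

int live_fs_info;                 /* count of per-mount objects not yet freed */

struct mock_fs_info    { int hide_pid; };                /* per-mount state */
struct mock_super      { struct mock_fs_info *fs_info; };
struct mock_fs_context { int hide_pid; };                /* parsed options only */

/* Context setup records options; it owns no per-mount allocation. */
struct mock_fs_context *mock_init_fs_context(int hide_pid)
{
    struct mock_fs_context *ctx = calloc(1, sizeof(*ctx));

    if (ctx)
        ctx->hide_pid = hide_pid;
    return ctx;
}

/* Freeing the context is always safe, mounted or not. */
void mock_free_fs_context(struct mock_fs_context *ctx)
{
    free(ctx);
}

/* The per-mount object is allocated only when a superblock is filled. */
int mock_fill_super(struct mock_super *sb, const struct mock_fs_context *ctx)
{
    struct mock_fs_info *info = calloc(1, sizeof(*info));

    if (!info)
        return -1;
    info->hide_pid = ctx->hide_pid;
    sb->fs_info = info;
    live_fs_info++;
    return 0;
}

/* Reconfiguring updates the existing object instead of allocating a new one. */
void mock_reconfigure(struct mock_super *sb, const struct mock_fs_context *ctx)
{
    assert(sb->fs_info != NULL);
    sb->fs_info->hide_pid = ctx->hide_pid;
}

/* Superblock teardown is the single place the per-mount object is released. */
void mock_kill_sb(struct mock_super *sb)
{
    assert(sb->fs_info != NULL);
    free(sb->fs_info);
    sb->fs_info = NULL;
    live_fs_info--;
}
```

In this mock, creating and freeing a context without ever calling mock_fill_super() leaves live_fs_info at zero, and a reconfigure through a second context changes hide_pid on the existing mount without touching the allocation count.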