Received: by 2002:a05:6a10:c604:0:0:0:0 with SMTP id y4csp490776pxt; Thu, 12 Aug 2021 03:17:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzIQ71Nqo1g+xlnK3ZWsqJDGQmumNmLvp7oGqdGtYG4n00r5TiX4epotaQaP8bPI1rR6Zik X-Received: by 2002:a6b:670a:: with SMTP id b10mr2423624ioc.137.1628763442135; Thu, 12 Aug 2021 03:17:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1628763442; cv=none; d=google.com; s=arc-20160816; b=jefOnopkPYBO8r2vqZdaY0to3OSYKuudKlTFstewKhtUWR5b4aQe5OFMtZtGe/mhy+ WK3lBspdo/FkZyOKnHXLOGabC0AI2V2vyNT22JN/KrXqU4Ve6Yvju693uWUTiSeVKHEm 79ENdFHOUjin4X5yEDSDx4qEqK30lyDIYYJ24kxITZKBS10us1V3436+HPwHSHFpIcO7 26l7TSkrVhRHgCV6GwfiLYWrQU9MzfBjMmVTYwRnOiG4ThmpxWP0tWEPR5ihiLqgAUBx OkQEoSQGkIdonu85ZJvZmRTfsmdRCuPrY6K5oM0I5IBNn6m83eS3jIS5DsXC0IfHrELy LoTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=5wLmDyZPKX0MxU+lc2Z8CrU5rz6nEUyXs9YRMJm9O8U=; b=rMlPSwmynTbyBqNrmHvGPhZ1gOWoXAlSdKTL/EJiSyhxZr0MOrq+ZPX7UqCcFatfLI cJULQQ1iQQzOJ+3JFT7XhxS7GmikCmAnqL0SMI1eZy4uKJ+ILyCjXL9ga6U64XxboTiB Y0ztSa0+pb8KFBKuZo2DtCqdqz/Eu0gVGSLCJnVPWbuPedBi2MH3TmwEYZgZ3wvIj5Ai cR5Uv3uKv/oTbPsazM4ImhX+xu+GbrbsIX7SYro4r5eWyWoccEJvG9YyQGtEtpICcxbs v1t2ADB8UBbpHEcal158P7nlibCMFjP7M8dFD84Rp/H2dGJLpvAvpHHS2oPLrmEPCQTr SsJQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j9si2398293ilr.97.2021.08.12.03.17.10; Thu, 12 Aug 2021 03:17:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235321AbhHLKGZ (ORCPT + 99 others); Thu, 12 Aug 2021 06:06:25 -0400 Received: from mail.kernel.org ([198.145.29.99]:40962 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231470AbhHLKGY (ORCPT ); Thu, 12 Aug 2021 06:06:24 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 6346160FBF; Thu, 12 Aug 2021 10:05:47 +0000 (UTC) Date: Thu, 12 Aug 2021 12:05:44 +0200 From: Christian Brauner To: David Hildenbrand Cc: linux-kernel@vger.kernel.org, Linus Torvalds , Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Alexander Viro , Alexey Dobriyan , Steven Rostedt , Peter Zijlstra , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Petr Mladek , Sergey Senozhatsky , Andy Shevchenko , Rasmus Villemoes , Kees Cook , "Eric W. Biederman" , Greg Ungerer , Geert Uytterhoeven , Mike Rapoport , Vlastimil Babka , Vincenzo Frascino , Chinwen Chang , Michel Lespinasse , Catalin Marinas , "Matthew Wilcox (Oracle)" , Huang Ying , Jann Horn , Feng Tang , Kevin Brodsky , Michael Ellerman , Shawn Anastasio , Steven Price , Nicholas Piggin , Jens Axboe , Gabriel Krisman Bertazi , Peter Xu , Suren Baghdasaryan , Shakeel Butt , Marco Elver , Daniel Jordan , Nicolas Viennot , Thomas Cedeno , Collin Fijalkovich , Michal Hocko , Miklos Szeredi , Chengguang Xu , Christian =?utf-8?B?S8O2bmln?= , linux-unionfs@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Andrei Vagin Subject: Re: [PATCH v1 3/7] kernel/fork: always deny write access to current MM exe_file Message-ID: <20210812100544.uhsfp75b4jcrv3qx@wittgenstein> References: <20210812084348.6521-1-david@redhat.com> <20210812084348.6521-4-david@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20210812084348.6521-4-david@redhat.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [+Cc Andrei] On Thu, Aug 12, 2021 at 10:43:44AM +0200, David Hildenbrand wrote: > We want to remove VM_DENYWRITE only currently only used when mapping the > executable during exec. During exec, we already deny_write_access() the > executable, however, after exec completes the VMAs mapped > with VM_DENYWRITE effectively keeps write access denied via > deny_write_access(). > > Let's deny write access when setting the MM exe_file. With this change, we > can remove VM_DENYWRITE for mapping executables. > > This represents a minor user space visible change: > sys_prctl(PR_SET_MM_EXE_FILE) can now fail if the file is already > opened writable. Also, after sys_prctl(PR_SET_MM_EXE_FILE), the file Just for completeness, this also affects PR_SET_MM_MAP when exe_fd is set. > cannot be opened writable. Note that we can already fail with -EACCES if > the file doesn't have execute permissions. > > Signed-off-by: David Hildenbrand > --- The biggest user I know and that I'm involved in is CRIU which heavily uses PR_SET_MM_MAP (with a fallback to PR_SET_MM_EXE_FILE on older kernels) during restore. Afair, criu opens the exe fd as an O_PATH during dump and thus will use the same flag during restore when opening it. So that should be fine. However, if I understand the consequences of this change correctly, a problem could be restoring workloads that hold a writable fd open to their exe file at dump time which would mean that during restore that fd would be reopened writable causing CRIU to fail when setting the exe file for the task to be restored. Which honestly, no idea how many such workloads exist. (I know at least of runC and LXC need to sometimes reopen to rexec themselves (weird bug to protect against attacking the exe file) and thus re-open /proc/self/exe but read-only.) > kernel/fork.c | 39 ++++++++++++++++++++++++++++++++++----- > 1 file changed, 34 insertions(+), 5 deletions(-) > > diff --git a/kernel/fork.c b/kernel/fork.c > index 6bd2e52bcdfb..5d904878f19b 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -476,6 +476,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, > { > struct vm_area_struct *mpnt, *tmp, *prev, **pprev; > struct rb_node **rb_link, *rb_parent; > + struct file *exe_file; > int retval; > unsigned long charge; > LIST_HEAD(uf); > @@ -493,7 +494,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, > mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING); > > /* No ordering required: file already has been exposed. */ > - RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm)); > + exe_file = get_mm_exe_file(oldmm); > + RCU_INIT_POINTER(mm->exe_file, exe_file); > + if (exe_file) > + deny_write_access(exe_file); > > mm->total_vm = oldmm->total_vm; > mm->data_vm = oldmm->data_vm; > @@ -638,8 +642,13 @@ static inline void mm_free_pgd(struct mm_struct *mm) > #else > static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) > { > + struct file *exe_file; > + > mmap_write_lock(oldmm); > - RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm)); > + exe_file = get_mm_exe_file(oldmm); > + RCU_INIT_POINTER(mm->exe_file, exe_file); > + if (exe_file) > + deny_write_access(exe_file); > mmap_write_unlock(oldmm); > return 0; > } > @@ -1163,11 +1172,19 @@ void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file) > */ > old_exe_file = rcu_dereference_raw(mm->exe_file); > > - if (new_exe_file) > + if (new_exe_file) { > get_file(new_exe_file); > + /* > + * exec code is required to deny_write_access() successfully, > + * so this cannot fail > + */ > + deny_write_access(new_exe_file); > + } > rcu_assign_pointer(mm->exe_file, new_exe_file); > - if (old_exe_file) > + if (old_exe_file) { > + allow_write_access(old_exe_file); > fput(old_exe_file); > + } > } > > int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file) > @@ -1194,10 +1211,22 @@ int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file) > } > > /* set the new file, lockless */ > + ret = deny_write_access(new_exe_file); > + if (ret) > + return -EACCES; > get_file(new_exe_file); > + > old_exe_file = xchg(&mm->exe_file, new_exe_file); > - if (old_exe_file) > + if (old_exe_file) { > + /* > + * Don't race with dup_mmap() getting the file and disallowing > + * write access while someone might open the file writable. > + */ > + mmap_read_lock(mm); > + allow_write_access(old_exe_file); > fput(old_exe_file); > + mmap_read_unlock(mm); > + } > return 0; > } > > -- > 2.31.1 >