Date: Wed, 20 Apr 2022 14:01:02 +0100
From: Catalin Marinas
To: Kees Cook
Cc: Andrew Morton, Christoph Hellwig, Lennart Poettering,
    Zbigniew Jędrzejewski-Szmek, Will Deacon, Alexander Viro,
    Eric Biederman, Szabolcs Nagy, Mark Brown, Jeremy Linton,
    Topi Miettinen, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-abi-devel@lists.sourceforge.net, linux-hardening@vger.kernel.org,
    Jann Horn, Salvatore Mesoraca, Igor Zhbanov
Subject: Re: [PATCH RFC 0/4] mm, arm64: In-kernel support for
    memory-deny-write-execute (MDWE)
References: <20220413134946.2732468-1-catalin.marinas@arm.com>
    <202204141028.0482B08@keescook>
In-Reply-To: <202204141028.0482B08@keescook>

On Thu, Apr 14, 2022 at 11:52:17AM -0700, Kees Cook wrote:
> On Wed, Apr 13, 2022 at 02:49:42PM +0100, Catalin Marinas wrote:
> > The background to this is that systemd has a configuration option called
> > MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
> > is to prevent a user task from inadvertently creating an executable
> > mapping that is (or was) writeable. Since such BPF filter is stateless,
> > it cannot detect mappings that were previously writeable but
> > subsequently changed to read-only. Therefore the filter simply rejects
> > any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
> > support (Branch Target Identification), the dynamic loader cannot change
> > an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
> > For libraries, it can resort to unmapping and re-mapping but for the
> > main executable it does not have a file descriptor. The original bug
> > report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
> > for libraries - [3].
>
> Right, so, the systemd filter is a big hammer solution for the kernel
> not having a very easy way to provide W^X mapping protections to
> userspace. There's stuff in SELinux, and there have been several
> attempts[1] at other LSMs to do it too, but nothing stuck.
>
> Given the filter, and the implementation of how to enable BTI, I see two
> solutions:
>
> - provide a way to do W^X so systemd can implement the feature differently
> - provide a way to turn on BTI separate from mprotect to bypass the filter
>
> I would agree, the latter seems like the greater hack,

We discussed such hacks in the past but they just work around the
fundamental issue: systemd wants W^X but, with BPF, it can only achieve
it by preventing mprotect(PROT_EXEC) irrespective of whether the mapping
was already executable. If we find a better solution for W^X, we
wouldn't have to hack anything for mprotect(PROT_EXEC|PROT_BTI).

> so I welcome
> this RFC, though I think it might need to explore a bit of the feature
> space exposed by other solutions[1] (i.e. see SARA and NAX), otherwise
> it risks being too narrowly implemented. For example, playing well with
> JITs should be part of the design, and will likely need some kind of
> ELF flags and/or "sealing" mode, and to handle the vma alias case as
> Jann Horn pointed out[2].

I agree we should look at what we want to cover, while trying to avoid
re-inventing SELinux. With this patchset I went for the minimum that
systemd MDWE does with BPF.

I think JITs get around it using something like memfd with two separate
mappings to the same page. We could try to prevent such aliases but
allow them if an ELF note is detected (or get the JIT to issue a
prctl()).

Anyway, with a prctl() we can allow finer-grained control, starting with
anonymous and file mappings and later extending to vma aliases,
writeable files etc. On top we can add a seal mask so that a process
cannot disable a control once it was set. Something like (I'm not good
at names):

  prctl(PR_MDWX_SET, flags, seal_mask);
  prctl(PR_MDWX_GET);

with flags like:

  PR_MDWX_MMAP - basics, should cover mmap() and mprotect()
  PR_MDWX_ALIAS - vma aliases, allowed with an ELF note
  PR_MDWX_WRITEABLE_FILE (needs some more thinking)

-- 
Catalin
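
To make the flags/seal_mask idea above a bit more concrete, here is a
minimal userspace model of the bookkeeping such a prctl() might do. It
is only a sketch under stated assumptions: the PR_MDWX_* names come from
the proposal, but the bit values, the mdwx_state structure, the
mdwx_set()/mdwx_get() helpers and the error codes are all hypothetical,
and the sealing rule (an enabled, sealed control cannot be cleared) is
just one possible reading of "cannot disable a control once it was set".
Nothing here calls a real kernel interface.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit assignments; only the names appear in the proposal. */
#define PR_MDWX_MMAP            (1UL << 0)
#define PR_MDWX_ALIAS           (1UL << 1)
#define PR_MDWX_WRITEABLE_FILE  (1UL << 2)
#define PR_MDWX_ALL \
	(PR_MDWX_MMAP | PR_MDWX_ALIAS | PR_MDWX_WRITEABLE_FILE)

/* Hypothetical per-process state that the prctl() would maintain. */
struct mdwx_state {
	unsigned long flags;	/* controls currently enabled */
	unsigned long sealed;	/* controls that may no longer be cleared */
};

/*
 * Model of prctl(PR_MDWX_SET, flags, seal_mask).
 * Returns 0 on success or a negative errno value (assumed convention).
 */
static int mdwx_set(struct mdwx_state *st, unsigned long flags,
		    unsigned long seal_mask)
{
	/* Reject bits this model does not know about. */
	if ((flags | seal_mask) & ~PR_MDWX_ALL)
		return -EINVAL;

	/* A control that is both enabled and sealed must stay enabled. */
	if (st->flags & st->sealed & ~flags)
		return -EPERM;

	st->flags = flags;
	st->sealed |= seal_mask;	/* seals accumulate, never go away */
	return 0;
}

/* Model of prctl(PR_MDWX_GET): report the enabled controls. */
static unsigned long mdwx_get(const struct mdwx_state *st)
{
	return st->flags;
}

/* Usage: enable and seal the basic control, then try to drop it. */
static void mdwx_example(void)
{
	struct mdwx_state st = { 0, 0 };

	assert(mdwx_set(&st, PR_MDWX_MMAP, PR_MDWX_MMAP) == 0);
	assert(mdwx_set(&st, 0, 0) == -EPERM);	/* sealed, cannot clear */
	assert(mdwx_get(&st) == PR_MDWX_MMAP);
}
```

In this model an unsealed control such as PR_MDWX_ALIAS can still be
switched on and off freely, which would let a JIT opt out of the alias
check until something seals it. Whether sealing should also prevent
enabling further controls is left open here.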