Date: Wed, 4 Nov 2020 12:18:56 +0000
From: Dave Martin
To: Catalin Marinas
Cc: Mark Rutland, systemd-devel@lists.freedesktop.org, Kees Cook,
 Will Deacon, "linux-kernel@vger.kernel.org", Jeremy Linton,
 Mark Brown, toiwoton@gmail.com, libc-alpha@sourceware.org,
 "linux-arm-kernel@lists.infradead.org"
Subject: Re: BTI interaction between seccomp filters in systemd and glibc
 mprotect calls, causing service failures
Message-ID: <20201104121855.GQ6882@arm.com>
References: <8584c14f-5c28-9d70-c054-7c78127d84ea@arm.com>
 <20201026162410.GB27285@arm.com> <20201026165755.GV3819@arm.com>
 <20201026175230.GC27285@arm.com>
 <45c64b49-a38b-4b0c-d9cf-6c586dacbcc9@arm.com>
 <20201027141522.GD27285@arm.com> <20201029110220.GC10776@gaia>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20201029110220.GC10776@gaia>
User-Agent: Mutt/1.5.23 (2014-03-12)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 29, 2020 at 11:02:22AM +0000, Catalin Marinas via Libc-alpha wrote:
> On Tue, Oct 27, 2020 at 02:15:22PM +0000, Dave P Martin wrote:
> > I also wonder whether we actually care whether the pages are marked
> > executable or not here; probably the flags can just be independent. This
> > rather depends on whether the how the architecture treats the BTI (a.k.a
> > GP) pagetable bit for non-executable pages. I have a feeling we already
> > allow PROT_BTI && !PROT_EXEC through anyway.
> >
> >
> > What about a generic-ish set/clear interface that still works by just
> > adding a couple of PROT_ flags:
> >
> >     switch (flags & (PROT_SET | PROT_CLEAR)) {
> >     case PROT_SET:   prot |= flags;  break;
> >     case PROT_CLEAR: prot &= ~flags; break;
> >     case 0:          prot = flags;   break;
> >
> >     default:
> >         return -EINVAL;
> >     }
> >
> > This can't atomically set some flags while clearing some others, but for
> > simple stuff it seems sufficient and shouldn't be too invasive on the
> > kernel side.
> >
> > We will still have to take the mm lock when doing a SET or CLEAR, but
> > not for the non-set/clear case.
> >
> >
> > Anyway, libc could now do:
> >
> >     mprotect(addr, len, PROT_SET | PROT_BTI);
> >
> > with much the same effect as your PROT_BTI_IF_X.
> >
> >
> > JITting or breakpoint setting code that wants to change the permissions
> > temporarily, without needing to know whether PROT_BTI is set, say:
> >
> >     mprotect(addr, len, PROT_SET | PROT_WRITE);
> >     *addr = BKPT_INSN;
> >     mprotect(addr, len, PROT_CLEAR | PROT_WRITE);
>
> The problem with this approach is that you can't catch
> PROT_EXEC|PROT_WRITE mappings via seccomp. So you'd have to limit it to
> some harmless PROT_ flags only. I don't like this limitation, nor the
> PROT_BTI_IF_X approach.
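For illustration, the set/clear semantics quoted above can be modelled in
userspace. This is only a sketch under stated assumptions: PROT_SET and
PROT_CLEAR exist solely as a proposal in this thread, so DEMO_PROT_SET,
DEMO_PROT_CLEAR, their bit values and demo_apply_prot() are hypothetical
names made up here. Unlike the quoted pseudocode, the sketch masks the two
mode bits out of the request before combining it with the old protection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical values: no kernel header defines PROT_SET or PROT_CLEAR. */
#define DEMO_PROT_SET   0x10000000
#define DEMO_PROT_CLEAR 0x20000000

/*
 * Model of the proposed semantics: compute the protection a mapping would
 * end up with, given its old protection and the requested flags.
 * Returns 0 on success, -EINVAL if both mode bits are passed together.
 */
int demo_apply_prot(int old_prot, int flags, int *new_prot)
{
    /* The permission bits, with the two mode bits stripped off. */
    int bits = flags & ~(DEMO_PROT_SET | DEMO_PROT_CLEAR);

    assert(new_prot != NULL);

    switch (flags & (DEMO_PROT_SET | DEMO_PROT_CLEAR)) {
    case DEMO_PROT_SET:   *new_prot = old_prot | bits;  break;
    case DEMO_PROT_CLEAR: *new_prot = old_prot & ~bits; break;
    case 0:               *new_prot = bits;             break;
    default:
        return -EINVAL;
    }
    return 0;
}
```

Starting from PROT_READ | PROT_EXEC, a request of DEMO_PROT_SET | PROT_WRITE
yields a writable and executable result, yet the flags argument itself never
contains PROT_EXEC. That is the case a stateless syscall filter cannot see.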
Ack; this is just one flavour of interface, and every approach seems to
have some shortcomings.

> The only generic solutions I see are to either use a stateful filter in
> systemd or pass the old state to the kernel in a cmpxchg style so that
> seccomp can check it (I think you suggest this at some point).

The "cmpxchg" option has the disadvantage that the caller needs to know
the original permissions. It seems that glibc is prepared to work
around this, but it won't always be feasible in ancillary /
instrumentation code or libraries.

IMHO it would be preferable to apply a policy to mmap/mprotect in the
kernel proper rather than BPF being the only way to do it -- in any
case, the required checks seem to be out of the scope of what can be
done efficiently (or perhaps at all) in a syscall filter.

> The latter requires a new syscall which is not something we can address
> as a quick, back-portable fix here. If systemd cannot be changed to use
> a stateful filter for w^x detection, my suggestion is to go for the
> kernel setting PROT_BTI on the main executable with glibc changed to
> tolerate EPERM on mprotect(). I don't mind adding an AT_FLAGS bit if
> needed but I don't think it buys us much.

I agree, this seems the best short-term approach.

> Once the current problem is fixed, we can look at a better solution
> longer term as a new syscall.

Agreed, I think if we try to rush the addition of new syscalls, the
chance of coming up with a bad design is high...

Cheers
---Dave
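
For reference, the "tolerate EPERM on mprotect()" part of the short-term
approach might look roughly like the sketch below. It is not glibc's actual
code: demo_tolerate_eperm() and demo_enable_bti() are hypothetical helpers,
and the fallback PROT_BTI value is the arm64 definition from the Linux uapi
headers, supplied only so the sketch builds where the C library headers do
not provide it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10   /* arm64 value; assumed here when the libc lacks it */
#endif

/*
 * Classify an mprotect() outcome: success and EPERM both count as "carry
 * on", any other error is reported to the caller as a failure.
 */
int demo_tolerate_eperm(int ret, int err)
{
    if (ret == 0)
        return 0;
    /* EPERM: a policy (e.g. a seccomp filter) refused the change. */
    return err == EPERM ? 0 : -1;
}

/*
 * Try to add PROT_BTI to an already-mapped range. If the attempt is denied
 * with EPERM, the range simply keeps its current protection.
 */
int demo_enable_bti(void *addr, size_t len, int prot)
{
    int ret = mprotect(addr, len, prot | PROT_BTI);

    return demo_tolerate_eperm(ret, errno);
}
```

The classification is split out so it can be exercised without an arm64
machine; on other architectures the mprotect() call itself is expected to
fail for reasons unrelated to the filter.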