Date: Mon, 8 Jan 2018 21:35:50 +0100
From: Willy Tarreau
To: Linus Torvalds
Cc: Linux Kernel Mailing List, the arch/x86 maintainers, Thomas Gleixner,
    One Thousand Gnomes, Andy Lutomirski, Borislav Petkov, Dave Hansen,
    Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, "H. Peter Anvin"
Subject: Re: [PATCH RFC 2/4] x86/arch_prctl: add ARCH_GET_NOPTI and
    ARCH_SET_NOPTI to enable/disable PTI
Message-ID: <20180108203550.GA11238@1wt.eu>
References: <1515427939-10999-1-git-send-email-w@1wt.eu>
    <1515427939-10999-3-git-send-email-w@1wt.eu>
X-Mailing-List: linux-kernel@vger.kernel.org

[ increased the CC list this time ]

On Mon, Jan 08, 2018 at 09:54:05AM -0800, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 8:12 AM, Willy Tarreau wrote:
> > This allows to report the current state of the PTI protection and to
> > enable or disable it for the current task.
>
> So I really think that this needs to be done up-front to avoid a lot
> of complexity. And per mm.
>
> If the process is already threaded (so the mm has multiple users),
> it's too late to start playing games with PTI.
>
> In fact, maybe the whole thing needs to be controlled before "exec"
> happens, so that we have the knowledge as we build up the mm, rather
> than being "runtime" dynamic at all.
>
> But in no case should you even try to handle the multi-threaded case -
> just error out for trying to change the PTI setting.
>
> So make the thing per-mm, and then at task switch time as you switch
> mms, you set the bit in a percpu variable for testing at kernel entry.

So I did something like this (will have to remerge the awful patches and
remove the printks before resending). In short, here's what it does now:

  - added a new x86 flag, "mm->context.pti_disable", which depends on
    CONFIG_PAGE_TABLE_ISOLATION.

  - the new prctl() also depends on this config setting.

  - prctl() refuses any change if mm->mm_users > 1

  - prctl() refuses to set nopti if !CAP_SYS_RAWIO, but clearing it is
    fine without (Ingo's idea)

  - __switch_to() sets a new "pti_disable" per-cpu variable to the copy
    of mm->context.pti_disable

  - entry code in SWITCH_TO_USER_CR3_NOSTACK now checks
    PER_CPU_VAR(pti_disable)

First tests show that it still works. One main difference I immediately
observed is that it stops at execve(): the setting is not carried over
to the new mm. This means that it will not be possible to implement a
wrapper to enable the bypass, but on the other hand it guarantees that
any execve(), even from a so-called "trusted" process, doesn't
accidentally expose a victim program.

So there are pros and cons here. I'm personally fine with both the
wrapper and the code changes. But I'm in the easiest situation, working
with open source code that I can easily update to accommodate the
changes. Other users might have a different opinion here.

Another option could be to have a per-task (and really task here) flag
that is only passed to execve() to indicate that the per-mm pti_disable
has to be set in the new mm (execve() would then clear the task flag).
But this mechanism would always require a wrapper. Or we could have
both.

I'll clean up my patches tomorrow morning and will post an update. Ideas
and objections welcome in the meantime ;-)

Cheers,
Willy
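
For illustration only, the rules listed above (refuse any change when
mm_users > 1, require CAP_SYS_RAWIO to set but not to clear, copy the
flag to a per-cpu variable at switch time, start from a clean mm at
execve()) can be modeled in user space roughly as below. This is a
sketch, not the patch itself: the struct and function names, the order
of the checks and the errno values are all assumptions made up for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct mm_model {
	int  mm_users;     /* number of tasks sharing this mm */
	bool pti_disable;  /* models mm->context.pti_disable */
};

struct task_model {
	struct mm_model *mm;
	bool has_cap_sys_rawio;  /* models capable(CAP_SYS_RAWIO) */
};

/* Models the per-cpu variable tested by the entry code. */
bool percpu_pti_disable;

/* Models ARCH_SET_NOPTI. The errno values are assumed, not from the patch. */
int set_nopti(struct task_model *t, bool disable)
{
	if (t->mm->mm_users > 1)
		return -EINVAL;  /* mm already shared: refuse any change */
	if (disable && !t->has_cap_sys_rawio)
		return -EPERM;   /* setting needs the capability, clearing does not */
	t->mm->pti_disable = disable;
	return 0;
}

/* Models ARCH_GET_NOPTI: reports the current per-mm state. */
int get_nopti(const struct task_model *t)
{
	return t->mm->pti_disable ? 1 : 0;
}

/* Models the __switch_to() step: copy the per-mm flag to the per-cpu one. */
void switch_to_model(const struct task_model *next)
{
	percpu_pti_disable = next->mm->pti_disable;
}

/*
 * Models the observed execve() behavior: the task gets a fresh mm with
 * PTI enabled again. Accounting on the old mm is ignored here.
 */
void execve_model(struct task_model *t, struct mm_model *new_mm)
{
	new_mm->mm_users = 1;
	new_mm->pti_disable = false;
	t->mm = new_mm;
}

/* Walks through the scenario described in the mail; returns 0 on success. */
int nopti_model_demo(void)
{
	struct mm_model mm = { .mm_users = 1, .pti_disable = false };
	struct mm_model fresh;
	struct task_model t = { .mm = &mm, .has_cap_sys_rawio = true };

	assert(set_nopti(&t, true) == 0);
	switch_to_model(&t);
	assert(percpu_pti_disable);

	mm.mm_users = 2;  /* a second thread now shares the mm */
	assert(set_nopti(&t, false) == -EINVAL);

	execve_model(&t, &fresh);
	switch_to_model(&t);
	assert(!percpu_pti_disable);
	return 0;
}
```

The point of the model is only to make the policy easy to eyeball: the
flag lives in the mm, the per-cpu copy follows whichever mm is switched
in, and nothing survives execve() unless an extra per-task flag of the
kind discussed above is added.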