Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752846AbeAFUYi (ORCPT + 1 other);
        Sat, 6 Jan 2018 15:24:38 -0500
Received: from wtarreau.pck.nerim.net ([62.212.114.60]:38583 "EHLO 1wt.eu"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751718AbeAFUYg (ORCPT );
        Sat, 6 Jan 2018 15:24:36 -0500
Date: Sat, 6 Jan 2018 21:24:33 +0100
From: Willy Tarreau
To: Avi Kivity
Cc: "linux-kernel@vger.kernel.org"
Subject: Re: Proposal: CAP_PAYLOAD to reduce Meltdown and Spectre mitigation costs
Message-ID: <20180106202433.GB9075@1wt.eu>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.6.1 (2016-04-27)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

Hi Avi,

On Sat, Jan 06, 2018 at 09:33:28PM +0200, Avi Kivity wrote:
> Meltdown and Spectre mitigations focus on protecting the kernel from a
> hostile userspace. However, it's not a given that the kernel is the most
> important target in the system. It is common in server workloads that a
> single userspace application contains the valuable data on a system, and if
> it were hostile, the game would already be over, without the need to
> compromise the kernel.
>
> In these workloads, a single application performs most system calls, and so
> it pays the cost of protection, without benefiting from it directly (since
> it is the target, rather than the kernel).

Definitely :-)

> I propose to create a new capability, CAP_PAYLOAD, that allows the system
> administrator to designate an application as the main workload in that
> system. Other processes (like sshd or monitoring daemons) exist to support
> it, and so it makes sense to protect the rest of the system from their being
> compromised.

Initially I was thinking about letting applications disable PTI using
prctl() when running under a certain capability (I first thought about
CAP_SYS_ADMIN, though I changed my mind). One advantage of proceeding like
this is that it would have to be explicitly implemented in the application,
which limits the risk of running unprotected by default.

I later thought that we could use CAP_SYS_RAWIO for this, given that such
processes already have access to the hardware anyway. We could even imagine
not switching the page tables for any process holding that capability,
without requiring prctl(), though it would mean that processes running as
root (as is often found on a number of servers) would automatically present
a risk for the system. But maybe CAP_SYS_RAWIO + prctl() could be a good
solution.

I'm interested in helping work on such a solution, given that haproxy is
severely impacted by "pti=on" and that for now we'll have to run with
"pti=off" on the whole system until a more suitable solution is found.

I'd rather not rush anything and let things calm down for a while to avoid
adding disturbance to the current situation. But I'm willing to continue
this discussion and even test patches.

Cheers,
Willy
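
To make the difference between the two options above concrete, here is a
minimal userspace model of the decision, not kernel code. Everything in it
is an assumption made for illustration: struct task_sketch, the
pti_opt_out flag and sketch_prctl_disable_pti() are hypothetical names, and
no such prctl() option exists today. The only real value is bit 17, which
is the number of the raw I/O capability (CAP_SYS_RAWIO in
<linux/capability.h>).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit number of CAP_SYS_RAWIO in <linux/capability.h>, copied here so the
 * sketch builds without kernel headers. */
#define SKETCH_CAP_SYS_RAWIO 17

/* Hypothetical per-task state; these fields do not exist in the kernel. */
struct task_sketch {
    uint64_t cap_effective; /* effective capability bitmask */
    bool     pti_opt_out;   /* set only by the hypothetical prctl() below */
};

bool has_rawio(const struct task_sketch *t)
{
    assert(t != NULL);
    return (t->cap_effective >> SKETCH_CAP_SYS_RAWIO) & 1u;
}

/* Hypothetical prctl() handler: the opt-out is only granted to a task that
 * holds the capability. Returns 0 on success, -1 otherwise (a real handler
 * would presumably return -EPERM). */
int sketch_prctl_disable_pti(struct task_sketch *t)
{
    if (!has_rawio(t))
        return -1;
    t->pti_opt_out = true;
    return 0;
}

/* Option 1: the capability alone skips the page table switch. Every
 * process running as root qualifies automatically. */
bool skip_pti_cap_only(const struct task_sketch *t)
{
    return has_rawio(t);
}

/* Option 2: capability plus explicit opt-in. A root process stays
 * protected until the application itself asks for the opt-out. */
bool skip_pti_cap_and_prctl(const struct task_sketch *t)
{
    return has_rawio(t) && t->pti_opt_out;
}
```

With a task whose capability mask has all bits set (a typical root
process), skip_pti_cap_only() is true immediately, while
skip_pti_cap_and_prctl() stays false until sketch_prctl_disable_pti() has
been called. That is the "explicitly implemented in the application"
property mentioned above.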