Date: Sun, 7 Jan 2018 18:39:48 +0100
From: Willy Tarreau
To: Avi Kivity
Cc: linux-kernel@vger.kernel.org
Subject: Re: Proposal: CAP_PAYLOAD to reduce Meltdown and Spectre mitigation costs

On Sun, Jan 07, 2018 at 11:14:21AM +0200, Avi Kivity wrote:
> CAP_RAWIO is like CAP_PAYLOAD in that both allow you to read stuff you
> shouldn't have access to on a vulnerable CPU. But CAP_PAYLOAD won't give you
> that access on a non-vulnerable CPU, so it's safer.

But it's still a wider surface for something quite similar. With CAP_SYS_RAWIO
you already have /dev/mem, iopl(), etc. I don't think it's unreasonable to
require that applications needing such functionality call prctl(); that is not
really harder than dealing with an extra capability and managing its impacts.
And prctl() already does quite a lot of similar stuff, like enabling/disabling
access to the TSC for example.

> The advantage of not requiring prctl() is that it will work on unmodified
> applications, requiring only sysadmin intervention (and it's the sysadmin's
> role to designate an application as payload, not the application's).

It can just as well be seen as a configuration option.
And not opening this to any random application by default sounds reasonable
as well. I'm not saying it's perfect, I'm just trying to figure out a
reasonable path here.

> > I'm interested in participating to working on such a solution, given
> > that haproxy is severely impacted by "pti=on" and that for now we'll
> > have to run with "pti=off" on the whole system until a more suitable
> > solution is found.
> >
> > I'd rather not rush anything and let things calm down for a while to
> > avoid adding disturbance to the current situation. But I'm willing to
> > continue this discussion and even test patches.
>
> Then you might want to test
> https://www.spinics.net/lists/kernel/msg2689101.html and its companion
> patchset https://www.spinics.net/lists/kernel/msg2689134.html, which as a
> side effect significantly reduce KPTI impact on C10K applications (and as
> their main effect improve their performance).

I saw that two days ago but didn't read further. Now I've checked a bit more,
but it seems very focused on block I/O (which makes sense for a DB or for a
server, for example), so it will not help in my specific use case. In my case
I'm wasting a lot of time in accept(), setsockopt(), fcntl(), bind(),
connect(), recv(), send(), shutdown() or close(). The poller is almost
unnoticeable since I/O events are grouped.

Cheers,
Willy