Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753095AbeAFTdd (ORCPT + 1 other); Sat, 6 Jan 2018 14:33:33 -0500 Received: from mail-wr0-f173.google.com ([209.85.128.173]:39629 "EHLO mail-wr0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752108AbeAFTdc (ORCPT ); Sat, 6 Jan 2018 14:33:32 -0500 X-Google-Smtp-Source: ACJfBovWWZpKI/TWkSuY2kz/WG/04JGpe7IDWn9xur60tiIa/kquf2FQj+j3JX453K3kfREceRjPhw== From: Avi Kivity Subject: Proposal: CAP_PAYLOAD to reduce Meltdown and Spectre mitigation costs To: "linux-kernel@vger.kernel.org" Message-ID: Date: Sat, 6 Jan 2018 21:33:28 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: Meltdown and Spectre mitigations focus on protecting the kernel from a hostile userspace. However, it's not a given that the kernel is the most important target in the system. It is common in server workloads that a single userspace application contains the valuable data on a system, and if it were hostile, the game would already be over, without the need to compromise the kernel. In these workloads, a single application performs most system calls, and so it pays the cost of protection, without benefiting from it directly (since it is the target, rather than the kernel). I propose to create a new capability, CAP_PAYLOAD, that allows the system administrator to designate an application as the main workload in that system. Other processes (like sshd or monitoring daemons) exist to support it, and so it makes sense to protect the rest of the system from their being compromised. When the kernel switches to user mode of a CAP_PAYLOAD process, it doesn't switch page tables and instead leaves the kernel mapped into the adddress space (still with supervisor protection, of course). This reduces context switch cost, and will also reduce interrupt costs if the interrupt happens while that process executes. Since a CAP_PAYLOAD process is likely to consume the majority of CPU time, the costs associated with Meltdown mitigation are almost completely nullified. CAP_PAYLOAD has potential to be abused; every software vendor will be absolutely certain that their application is the reason the universe (let alone that server) exists and they will turn it on, so init systems will have to work to make it doesn't get turned on without administrator opt-in. It's also not perfect, since if there is a payload application compromise, in addition to stealing the application's data ssh keys can be stolen too. But I think it's better than having to choose between significantly reduced performance and security. You get performance for your important application, and protection against the possibility that a remote exploit against a supporting process turns into a remote exploit against that important application.