From: Andy Lutomirski
Date: Fri, 14 Aug 2020 10:46:55 -0700
Subject: Re: [RFC PATCH] mm: extend memfd with ability to create "secret" memory areas
To: Mike Rapoport
Cc: LKML, Alan Cox, Andrew Morton, Andy Lutomirski, Christopher Lameter,
    Dave Hansen, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox,
    Peter Zijlstra, "Reshetova, Elena", Thomas Gleixner, Tycho Andersen,
    Linux API, Linux-MM
In-Reply-To: <20200130162340.GA14232@rapoport-lnx>

On Thu, Jan 30, 2020 at 8:23 AM Mike Rapoport wrote:
>
> Hi,
>
> This is essentially a resend of my attempt to implement "secret" mappings
> using a file descriptor [1].
>
> I've done a couple of experiments with secret/exclusive/whatever
> memory backed by a file-descriptor using a chardev and memfd_create
> syscall. There is indeed no need for VM_ flag, but there are still places
> that would require special care, e.g vm_normal_page(), madvise(DO_FORK), so
> it won't be completely free of core mm modifications.
>
> Below is a POC that implements extension to memfd_create() that allows
> mapping of a "secret" memory. The "secrecy" mode should be explicitly set
> using ioctl(), for now I've implemented exclusive and uncached mappings.

Hi-

Sorry for the extremely delayed response.

I like the general concept, and I like the exclusive concept. While
it is certainly annoying for the kernel to manage non-direct-mapped
pages, I think it's the future. But I have serious concerns about the
uncached part. Here are some concerns.

If it's done at all, I think it should be MFD_SECRET_X86_UNCACHED. I
think that uncached memory is outside the scope of things that can
reasonably be considered to be architecture-neutral.
(For example, on x86, UC and WC have very different semantics, and UC
has quite different properties than WB for things like atomics. Also,
the performance of UC is interesting at best, and the ways to even
moderately efficiently read from UC memory or write to UC memory are
highly x86-specific.)

I'm a little unconvinced about the security benefits. As far as I
know, UC memory will not end up in cache by any means (unless
aliased), but it's going to be tough to do much with UC data with
anything resembling reasonable performance without derived values
getting cached. It's likely entirely impossible to do it reliably
without asm. But even with plain WB memory, getting it into L1 really
should not be that bad unless major new vulnerabilities are
discovered. And there are other approaches that could be more
arch-neutral and more performant. For example, there could be an
option to flush a few cache lines on schedule out. This way a task
could work on some (exclusive but WB) secret memory and have the cache
lines flushed if anything interrupts it. Combined with turning SMT
off, this could offer comparable protection with much less overhead.

UC also doesn't seem reliable on x86, sadly. From asking around,
there are at least a handful of scenarios under which the kernel can
ask the CPU for UC but get WB anyway. Apparently Xen hypervisors will
do this unless the domain has privileged MMIO access, and ESXi will do
it under some set of common circumstances. So unless we probe somehow
or have fancy enumeration or administrative configuration, I'm not
sure we can even get predictable behavior if we hand userspace a
supposedly UC mapping. Giving user code WB when it thinks it has UC
could end badly.

--Andy