From: Jann Horn
Date: Thu, 25 Apr 2019 14:42:52 +0200
Subject: Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
To: Michal Hocko
Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API
References: <20190424211038.204001-1-matthewgarrett@google.com> <20190425121410.GC1144@dhcp22.suse.cz>
In-Reply-To: <20190425121410.GC1144@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko wrote:
[...]
> On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > From: Matthew Garrett
> >
> > Applications that hold secrets and wish to avoid them leaking can use
> > mlock() to prevent the page from being pushed out to swap and
> > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > can also use atexit() handlers to overwrite secrets on application exit.
> > However, if an attacker can reboot the system into another OS, they can
> > dump the contents of RAM and extract secrets.
> > We can avoid this by setting
> > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > firmware wipe the contents of RAM before booting another OS, but this means
> > rebooting takes a *long* time - the expected behaviour is for a clean
> > shutdown to remove the request after scrubbing secrets from RAM in order to
> > avoid this.
> >
> > Unfortunately, if an application exits uncleanly, its secrets may still be
> > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > killer decides to kill a process holding secrets, we're not going to be able
> > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > to request that the kernel clear the covered pages whenever the page
> > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > will only work on 64-bit systems.
[...]
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 21a7881a2db4..989c2fde15cf 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> >  	case MADV_KEEPONFORK:
> >  		new_flags &= ~VM_WIPEONFORK;
> >  		break;
> > +	case MADV_WIPEONRELEASE:
> > +		/* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > +		if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > +		    vma->vm_flags & VM_SHARED) {
> > +			error = -EINVAL;
> > +			goto out;
> > +		}
> > +		new_flags |= VM_WIPEONRELEASE;
> > +		break;

An interesting effect of this is that it will be possible to set this
on a CoW anon VMA in a fork() child, and then the semantics in the
parent will be subtly different - e.g. if the parent vmsplice()d a
CoWed page into a pipe, then forked an unprivileged child, the child
set MADV_WIPEONRELEASE on its VMA, the parent died somehow, and then
the child died, the page in the pipe would be zeroed out. A child
should not be able to affect its parent like this, I think. If this
was an mmap() flag instead of a madvise() command, that issue could be
avoided.
Alternatively, if adding more mmap() flags doesn't work,
perhaps you could scan the VMA and ensure that it contains no pages
yet, or something like that?

> > diff --git a/mm/memory.c b/mm/memory.c
> > index ab650c21bccd..ff78b527660e 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >  			page_remove_rmap(page, false);
> >  			if (unlikely(page_mapcount(page) < 0))
> >  				print_bad_pte(vma, addr, ptent, page);
> > +			if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) &&
> > +			    page_mapcount(page) == 0)
> > +				clear_highpage(page);
> >  			if (unlikely(__tlb_remove_page(tlb, page))) {
> >  				force_flush = 1;
> >  				addr += PAGE_SIZE;

Should something like this perhaps be added in page_remove_rmap()
instead? That's where the mapcount is decremented; and looking at
other callers of page_remove_rmap(), in particular the following ones
look interesting:

 - do_huge_pmd_wp_page()/do_huge_pmd_wp_page_fallback() might be
   relevant in the case where a forking process contains transparent
   hugepages?
 - zap_huge_pmd() is relevant when transparent hugepages are used, I
   think (otherwise transparent hugepages might not be wiped?)
 - there are various callers related to migration; I think this is
   relevant on a NUMA system where memory is moved between nodes to
   improve locality (moving memory to a new page and freeing the old
   one, in which case you'd want to wipe the old page)

I think all the callers have a reference to the VMA, so perhaps you
could add a VMA parameter to page_remove_rmap() and then look at the
VMA in there?
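
To make the two points above a bit more concrete, here is a minimal
userspace sketch in C. It is a toy model, not kernel code: struct
toy_page, struct toy_vma, TOY_VM_WIPEONRELEASE, toy_page_remove_rmap()
and toy_fork_scenario() are all hypothetical names invented for
illustration, and mapcount is simplified to a plain count of mappings.
It assumes the wipe check lives in the one function that drops a
mapping and is handed the VMA, roughly in the spirit of the suggested
page_remove_rmap() change.

```c
/* Userspace toy model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_VM_WIPEONRELEASE 0x1u	/* stand-in for VM_WIPEONRELEASE */

struct toy_page {
	int mapcount;			/* simplified: number of mappings */
	unsigned char data[TOY_PAGE_SIZE];
};

struct toy_vma {
	unsigned int vm_flags;
};

/*
 * Drop one mapping of the page. If that was the last mapping and the
 * VMA doing the unmap carries the wipe flag, zero the page.
 * Returns true if the page was wiped.
 */
static bool toy_page_remove_rmap(struct toy_page *page,
				 const struct toy_vma *vma)
{
	assert(page->mapcount > 0);
	page->mapcount--;
	if ((vma->vm_flags & TOY_VM_WIPEONRELEASE) && page->mapcount == 0) {
		memset(page->data, 0, sizeof(page->data));
		return true;
	}
	return false;
}

/* True if every byte of the page is zero. */
static bool toy_page_is_zero(const struct toy_page *page)
{
	for (size_t i = 0; i < sizeof(page->data); i++)
		if (page->data[i] != 0)
			return false;
	return true;
}

/*
 * Replays the fork() case: parent and child both map the page, only
 * the child's VMA is flagged, the parent unmaps first. Any other
 * reference to the page (such as a pipe) is not modelled here.
 * Returns true if the page ended up wiped.
 */
static bool toy_fork_scenario(void)
{
	struct toy_page page = { .mapcount = 2 };
	struct toy_vma parent = { .vm_flags = 0 };
	struct toy_vma child = { .vm_flags = TOY_VM_WIPEONRELEASE };

	memset(page.data, 0xAA, sizeof(page.data));
	toy_page_remove_rmap(&page, &parent);	/* parent dies first */
	toy_page_remove_rmap(&page, &child);	/* child dies last */
	return toy_page_is_zero(&page);
}
```

In this model toy_fork_scenario() ends with the page zeroed even
though the parent never asked for wiping, because only the flag on the
last VMA to unmap is consulted. Keeping the check inside the single
unmap helper means every caller gets the same behaviour, but it does
not by itself address the child-affects-parent problem.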