References: <20190424211038.204001-1-matthewgarrett@google.com> <20190425121410.GC1144@dhcp22.suse.cz> <20190426053135.GC12337@dhcp22.suse.cz> <20190426134722.GH22245@dhcp22.suse.cz>
In-Reply-To: <20190426134722.GH22245@dhcp22.suse.cz>
From: Jann Horn
Date: Fri, 26 Apr 2019 16:03:26 +0200
Subject: Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
To: Michal Hocko
Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 26, 2019 at 3:47 PM Michal Hocko wrote:
> On Fri 26-04-19 15:33:25, Jann Horn wrote:
> > On Fri, Apr 26, 2019 at 7:31 AM Michal Hocko wrote:
> > > On Thu 25-04-19 14:42:52, Jann Horn wrote:
> > > > On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko wrote:
> > > > [...]
> > > > > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > > > > > From: Matthew Garrett
> > > > > >
> > > > > > Applications that hold secrets and wish to avoid them leaking can use
> > > > > > mlock() to prevent the page from being pushed out to swap and
> > > > > > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > > > > > can also use atexit() handlers to overwrite secrets on application exit.
> > > > > > However, if an attacker can reboot the system into another OS, they can
> > > > > > dump the contents of RAM and extract secrets. We can avoid this by setting
> > > > > > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > > > > > firmware wipe the contents of RAM before booting another OS, but this means
> > > > > > rebooting takes a *long* time - the expected behaviour is for a clean
> > > > > > shutdown to remove the request after scrubbing secrets from RAM in order to
> > > > > > avoid this.
> > > > > >
> > > > > > Unfortunately, if an application exits uncleanly, its secrets may still be
> > > > > > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > > > > > killer decides to kill a process holding secrets, we're not going to be able
> > > > > > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > > > > > to request that the kernel clear the covered pages whenever the page
> > > > > > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > > > > > will only work on 64-bit systems.
> > > > [...]
> > > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > > index 21a7881a2db4..989c2fde15cf 100644
> > > > > > --- a/mm/madvise.c
> > > > > > +++ b/mm/madvise.c
> > > > > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > > > >  	case MADV_KEEPONFORK:
> > > > > >  		new_flags &= ~VM_WIPEONFORK;
> > > > > >  		break;
> > > > > > +	case MADV_WIPEONRELEASE:
> > > > > > +		/* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > > > > > +		if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > > > > > +				vma->vm_flags & VM_SHARED) {
> > > > > > +			error = -EINVAL;
> > > > > > +			goto out;
> > > > > > +		}
> > > > > > +		new_flags |= VM_WIPEONRELEASE;
> > > > > > +		break;
> > > >
> > > > An interesting effect of this is that it will be possible to set this
> > > > on a CoW anon VMA in a fork() child, and then the semantics in the
> > > > parent will be subtly different - e.g. if the parent vmsplice()d a
> > > > CoWed page into a pipe, then forked an unprivileged child, the child
> > >
> > > Maybe a stupid question. How do you fork an unprivileged child (without
> > > exec)? Child would have to drop privileges on its own, no?
> >
> > Sorry, yes, that's what I meant.
>
> But then the VMA is gone along with the flag so why does it matter?

But in theory, the page might still be in use elsewhere, e.g. as data in
a pipe that the parent wrote it into. The sequence would be: the parent
vmsplice()s a page into a pipe, the parent exits, the child marks the VMA
as WIPEONRELEASE and exits, the page gets wiped, and then someone else
reads the wiped page from the pipe.

Yes, this is very theoretical, and you'd have to write some pretty weird
software for this to matter. But it doesn't seem clean to me to let a
child affect, in this way, data it isn't supposed to have access to,
such as the contents of a pipe.

Then again, a child can probably already do something like this today:
do_wp_page() decides whether to reuse a page based only on its mapcount,
without looking at its refcount, so once the parent has exited, the
child's writes could land in a page that the pipe still references.
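To make the mapcount-versus-refcount point concrete, here is a toy userspace model (not kernel code). The struct and function names are hypothetical, and the two counters are heavily simplified stand-ins for the real page accounting. It replays the vmsplice()/fork()/exit sequence from the mail, with the child writing to the page instead of wiping it, and compares a reuse check on the mapcount alone with a check on the refcount.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a page's counters. */
struct toy_page {
    int mapcount; /* page-table mappings of the page */
    int refcount; /* all references: mappings plus e.g. a pipe buffer */
};

/* Reuse-on-write check based on the mapcount alone, as the mail
 * describes for do_wp_page(). */
bool reuse_by_mapcount(const struct toy_page *p) { return p->mapcount == 1; }

/* Stricter alternative: reuse only when no other reference exists. */
bool reuse_by_refcount(const struct toy_page *p) { return p->refcount == 1; }

/* Replays the scenario and returns the page state at the moment the
 * child takes a write fault on the formerly shared CoW page. */
struct toy_page replay_scenario(void)
{
    struct toy_page page = { .mapcount = 1, .refcount = 1 }; /* parent maps it */

    page.refcount += 1;                     /* parent vmsplice()s it into a pipe */
    page.mapcount += 1; page.refcount += 1; /* fork(): child maps the CoW page */
    page.mapcount -= 1; page.refcount -= 1; /* parent exits and unmaps it */
    return page;                            /* child's mapping + pipe reference */
}
```

In this model, replay_scenario() ends with a mapcount of 1 and a refcount of 2, so reuse_by_mapcount() returns true (the child would write straight into the page the pipe holds) while reuse_by_refcount() returns false (the child would get a private copy).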
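The proposed MADV_WIPEONRELEASE case mirrors the existing MADV_WIPEONFORK case visible in the diff context, which applies the same "anonymous memory only" check (vm_file set or VM_SHARED means -EINVAL). The sketch below exercises that existing flag from userland, assuming Linux 4.14 or later; the helper names are hypothetical, and the fallback value 18 is the one used by asm-generic/mman-common.h (e.g. x86, arm64) in case the libc headers lack the constant.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18 /* asm-generic value; assumption for old headers */
#endif

/* Shared anonymous memory is shmem-backed, so the kernel should refuse. */
bool wipeonfork_rejects_shared(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;
    bool rejected = madvise(p, len, MADV_WIPEONFORK) != 0 && errno == EINVAL;
    munmap(p, len);
    return rejected;
}

/* Private anonymous memory is accepted: the child sees zeroes, while
 * the parent keeps its original contents. */
bool wipeonfork_zeroes_child_copy(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;
    memset(p, 0xAA, len);
    if (madvise(p, len, MADV_WIPEONFORK) != 0) {
        munmap(p, len);
        return false;
    }
    pid_t pid = fork();
    if (pid == 0)
        _exit(p[0] == 0 ? 0 : 1); /* child: expect a zero-filled page */
    int status = 0;
    bool ok = pid > 0 && waitpid(pid, &status, 0) == pid &&
              WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
              p[0] == 0xAA;       /* parent's data is untouched */
    munmap(p, len);
    return ok;
}
```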
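For comparison, the userland mitigations listed at the top of the quoted patch description (mlock(), MADV_DONTDUMP and an atexit() scrubber) might be combined roughly as in this sketch. It assumes Linux with a libc that provides explicit_bzero() (glibc 2.25+ or musl) and a one-page secret; the variable and function names are hypothetical. As the patch description points out, none of this helps when the process is killed uncleanly, e.g. by the OOM killer, because the atexit() handler never runs.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical secret buffer; one page is assumed to be enough. */
static unsigned char *secret;
static size_t secret_len;

/* atexit() handler: scrubs the secret, but only on a clean exit. */
static void wipe_secret(void)
{
    if (secret) {
        explicit_bzero(secret, secret_len); /* not optimized away */
        munlock(secret, secret_len);
        munmap(secret, secret_len);
        secret = NULL;
    }
}

/* Returns 0 on success, -1 if any step fails. */
int setup_secret_buffer(void)
{
    secret_len = (size_t)sysconf(_SC_PAGESIZE);
    secret = mmap(NULL, secret_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (secret == MAP_FAILED) {
        secret = NULL;
        return -1;
    }
    if (mlock(secret, secret_len) != 0)                  /* keep out of swap */
        goto fail;
    if (madvise(secret, secret_len, MADV_DONTDUMP) != 0) /* keep out of core dumps */
        goto fail;
    if (atexit(wipe_secret) != 0)
        goto fail;
    return 0;
fail:
    wipe_secret();
    return -1;
}
```

mlock() can fail if RLIMIT_MEMLOCK is too small, so a real program would need to decide whether running without it is acceptable.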