Date: Fri, 8 Jan 2021 14:19:45 -0400
From: Jason Gunthorpe
To: Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao,
    Andy Lutomirski, Peter Xu, Pavel Emelyanov, Mike Kravetz,
    Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra,
    Linus Torvalds, Hugh Dickins, "Kirill A. Shutemov", Matthew Wilcox,
    Oleg Nesterov, Jann Horn, Kees Cook, John Hubbard, Leon Romanovsky,
    Jan Kara, Kirill Tkhai
Subject: Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy
Message-ID: <20210108181945.GF504133@ziepe.ca>
References: <20210107200402.31095-1-aarcange@redhat.com> <20210107202525.GD504133@ziepe.ca> <20210108133649.GE504133@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 08, 2021 at 12:00:36PM -0500, Andrea Arcangeli wrote:
> > The majority cannot be converted to notifiers because they are DMA
> > based. Every one of those is an ABI for something, and does not expect
> > extra privilege to function. It would be a major breaking change to
> > have pin_user_pages require some cap.
>
> ... what makes them safe is to be transient GUP pin and not long
> term.
>
> Please note the "long term" in the underlined line.

Many of them are long term, though only 50 or so have been marked
specifically with FOLL_LONGTERM. I don't see how we can make such a
major ABI break.

Looking at it, vmsplice() is simply wrong. A long term page pin must
use pin_user_pages(), and either

   FOLL_LONGTERM|FOLL_WRITE (write mode)
   FOLL_LONGTERM|FOLL_FORCE|FOLL_WRITE (read mode)

ie it must COW and it must reject cases that are not longterm safe,
like DAX and CMA and so on.

These are the well established rules, vmsplice does not get a pass
simply because it is using the CPU to memory copy as its "DMA".

> speaking in practice. io_uring has similar concern but it can use mmu
> notifier, so it can totally fix it and be 100% safe from this.

IIRC io_uring does use FOLL_LONGTERM and FOLL_WRITE..

> The scheduler disclosure date was 2020-08-25 so I can freely explain
> the case that motivated all these changes.
>
> case A)
>
> if !fork() {
>    // in child
>    mmap one page
>    vmsplice takes gup pin long term on such page
>    munmap one page
>    // mapcount == 1 (parent mm)
>    // page_count == 2 (gup in child, and parent mm)
> } else {
>    parent writes to the page
>    // mapcount == 1, wp_page_reuse
> }
>
> parent did a COW with mapcount == 1 so the parent will take over a
> page that is still GUP pinned in the child.

Sorry, I missed something, how does mmaping a fresh new page in the
child impact the parent?

I guess the issue is not to mmap but to GUP a shared page in a way
that doesn't trigger COW during GUP and then munmap that page so a
future parent COW does re-use, leaking access.

It seems enforcing FOLL_WRITE to always COW on GUP closes this, right?

This is what all correct FOLL_LONGTERM users do today, it is required
for many other reasons beyond this interesting security issue.

> However, you know full well in the second case it is a feature and not
> a bug, that wp_page_reuse is called instead, and in fact it has to be
> called or it's a bug (and that's the bug page_count in do_wp_page
> introduces).

What I was trying to explain below, is I think we agreed that a page
under active FOLL_LONGTERM pin *can not* be write protected.

Establishing the FOLL_LONGTERM pin (for read or write) must *always*
break the write protection and the VM *cannot* later establish a new
write protection on that page while the pin is active.

Indeed, it is complete nonsense to try and write protect a page that
has active DMA write activity! Changing the CPU page protection bits
will not stop any DMA! Doing so will inevitably become a security
problem with an attack similar to what you described.

So this is what was done during fork() - fork will no longer write
protect pages under FOLL_LONGTERM to make them COWable, instead it
will copy them at fork time.

Any other place doing write protect must also follow these same
rules.
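
The fork-time rule above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct toy_page,
toy_fork_page() and the TOY_FORK_* values are hypothetical names made up
for illustration and are not kernel APIs, and the real kernel tracks
pins and mappings very differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not kernel data structures. */
struct toy_page {
	int mapcount;         /* address spaces currently mapping the page */
	int longterm_pins;    /* active long term (DMA-style) pins */
	bool write_protected; /* CPU write protection currently set */
};

enum toy_fork_action {
	TOY_FORK_SHARE_WRITE_PROTECTED, /* share the page, COW on later write */
	TOY_FORK_COPY_NOW,              /* give the child its own copy at fork */
};

/*
 * Rule modeled here: a page with an active long term pin is never
 * write protected; it is copied at fork time instead. Only unpinned
 * pages are write protected and shared with the child.
 */
static enum toy_fork_action toy_fork_page(struct toy_page *page)
{
	assert(page != NULL && page->mapcount > 0);

	if (page->longterm_pins > 0)
		return TOY_FORK_COPY_NOW; /* parent page is left untouched */

	page->write_protected = true;
	page->mapcount++; /* child now maps the same page */
	return TOY_FORK_SHARE_WRITE_PROTECTED;
}
```

In this model a pinned page keeps its mapcount and stays writable in the
parent, so a later parent write never has to decide between reuse and
copy for a page that some other context still has pinned.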

I wasn't aware this could be used to create a security problem, but it
does make sense. Write protect really must mean writes to the memory
must stop, and that is fundamentally incompatible with active DMA.

Thus write protect of pages under DMA must be forbidden, as a matter
of security.

Jason