Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp5268837pxu; Tue, 22 Dec 2020 12:21:01 -0800 (PST) X-Google-Smtp-Source: ABdhPJy1AdW8LTpY8bPQDgDHx0KT9LnPwKX0k/BS6z952h68KGXIMSTsFC/itATxcnV49//bu/eP X-Received: by 2002:a17:906:ae41:: with SMTP id lf1mr21736365ejb.369.1608668461758; Tue, 22 Dec 2020 12:21:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1608668461; cv=none; d=google.com; s=arc-20160816; b=HrObNhcXT67/ttMlKXkq7CQXaGiuxjO0l6AC6pJSsaQ/LE+iWsOnKJ+MMNAWXVBwJg gQMMOE1rgFknzep06EZJOrigRnxagpZ028lMSrCEDRVDpJq3UXUPh5d1yd6rOFXFLwbo 3Z6+z/aDcc3028qExYYq5VM8Q/8SFhYjq7l95ohDgzw7az8j/qmuvRMNIJw5CxFWJYO7 zZvJwKGRw6nyVAI7Vuhls7EAgcgVfHHsVcE//dnjRQa944EhmiWhlO0vVtD7p2sLRpZQ UleS2urENLLhhczdJD6ayfYcT6nIWfRzEQ/hhjC1FVH8704P3cYj8qnMXMnDuuvMBHXo lfuw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=uODjZhfzyMhjUStJXk2bkrXwEumDeIAUQ24xW8hVb/U=; b=0yPmy3WRoYMZI0kgpOMcZOiM1iHJ0Dco8hAaeNgYKmObueqYAQQZHY9Wl4Iuoo0Q8m mgWGaymFPicXzj44eIMjUQUs1xtFsn7qwhLIIvtkY1EAaZA37ECAmoANeOFnSBFH37Wg sTy0zrk12B9n05CZ61kWF7rc0KGeH3Uu2u9dAaUrGnpr4Eyg2om4JFS4USsCwtzb0IF1 vFr9fWDlHdhaIBHiYSLbSjDJjM7bY0iaOT4FJ60IQtHi4yQvO7BzuwRR5kO7p1RUlhV7 F2pi/Pbkx10qKRc01XNDUnckyMjZTkvOHf0O6LpWpoqMDC95Dsc/yM8rQ5L5qHlfZfRt Y9gA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="aEwxO/bF"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id rh14si7221719ejb.210.2020.12.22.12.20.38; Tue, 22 Dec 2020 12:21:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="aEwxO/bF"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726412AbgLVUUH (ORCPT + 99 others); Tue, 22 Dec 2020 15:20:07 -0500 Received: from mail.kernel.org ([198.145.29.99]:52586 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725913AbgLVUUG (ORCPT ); Tue, 22 Dec 2020 15:20:06 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 59BD422B2D for ; Tue, 22 Dec 2020 20:19:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1608668365; bh=mXrauKSE9SR6fvVV4uNsc75DoOCwPZAdIHnBX7if57M=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=aEwxO/bFWk3IH/MxTKqrjzFlAnchKQNCXcXVz5B529HTN4x2fmzTXjf8xJjN/uldl fc6EX/84JFTXxFGnzli6OAL5KavuHZl4veW5AfmtlXBezn+bKB3HHxXuXFkBULbvoo lx1pw4beDG9SeRmTML1dMDFWKYKFeiBSsYldq7oDaqgSP7zW3V9TlmsfwnizhkbF2f eNvQAc0HQ5U8YQ7xd7N7MmPa9FKBZ27KEWNYTc5vxqm/iJq+wyhTHjeSIkS8RE9JGo itH9HPZONETy9Ntd6nUgZFl3qeQIALwVfTFef6C4QzYKPwo2PnFqUKV7vvPlttK09t t9JSRb6eGVYOQ== Received: by mail-wr1-f44.google.com with SMTP id t16so15863582wra.3 for ; Tue, 22 Dec 2020 12:19:25 -0800 (PST) X-Gm-Message-State: AOAM531gJxEbdL5r5jW1IXqseLQKcL62ZIFRdXg/5ogtZrpm/oGchrgu iHsZZf+G1f5jvqS5k5MdfPebRn20bdeB/LmLCaFk4w== X-Received: by 2002:adf:e64b:: with SMTP id b11mr25822587wrn.257.1608668363950; Tue, 22 Dec 2020 12:19:23 -0800 (PST) MIME-Version: 1.0 References: <20201221172711.GE6640@xz-x1> <76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com> <9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com> <1FCC8F93-FF29-44D3-A73A-DF943D056680@gmail.com> <20201221223041.GL6640@xz-x1> In-Reply-To: From: Andy Lutomirski Date: Tue, 22 Dec 2020 12:19:11 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect To: Linus Torvalds Cc: Andy Lutomirski , Peter Xu , Nadav Amit , Yu Zhao , Andrea Arcangeli , linux-mm , lkml , Pavel Emelyanov , Mike Kravetz , Mike Rapoport , stable , Minchan Kim , Will Deacon , Peter Zijlstra Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 21, 2020 at 8:16 PM Linus Torvalds wrote: > > On Mon, Dec 21, 2020 at 7:19 PM Andy Lutomirski wrote: > > > > Ugh, this is unpleasantly complicated. > > What I *should* have said is that *because* userfaultfd is changing > the VM layout, it should always act as if it had to take the mmap_sem > for writing, and that the RW->RO change is an example of when > downgrading that write-lock to a read lock is simply not valid - > because it basically breaks the rules about what a lookup (ie a read) > can do. > > A lookup can never turn a writable page non-writable. A lookup - > through COW - _can_ turn a read-only page writable. So that's why > "RO->RW" can be ok under the read lock, but RW->RO is not. > > Does that clarify the situation when I phrase it that way instead? It's certainly more clear, but I'm still not thrilled with a bunch of functions in different files maintained by different people that cooperate using an undocumented lockless protocol. It makes me nervous from a maintainability standpoint even if the code is, in fact, entirely correct. But I didn't make my question clear either: when I asked about mark_screen_rdonly(), I wasn't asking about locking or races at all. mark_screen_rdonly() will happily mark *any* PTE readonly. I can easily believe that writable mappings of non-shared mappings all follow the same CoW rules in the kernel and that, assuming all the locking is correct, marking them readonly is conceptually okay. But shared mappings are a whole different story. Is it actually the case that all, or even most, drivers and other sources of shared mappings will function correctly if one of their PTEs becomes readonly behind their back? Or maybe there are less bizarre ways for this to happen without vm86 shenanigans and this is completely safe after all. P.S. This whole mess reminds me that I need to take a closer look at the upcoming SHSTK code. Shadow stack pages violate all common sense about how the various PTE bits work.