Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp788056rwb; Thu, 1 Dec 2022 08:21:54 -0800 (PST) X-Google-Smtp-Source: AA0mqf6IQ7MPeWOrQLLzl+j40hbFzzDtBNsnT60QYS7nYPH3JRE/IplrpzCQ3FIuddx5tKXbMJ8h X-Received: by 2002:a17:906:f281:b0:7ae:3b9e:1d8a with SMTP id gu1-20020a170906f28100b007ae3b9e1d8amr54160936ejb.581.1669911714625; Thu, 01 Dec 2022 08:21:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669911714; cv=none; d=google.com; s=arc-20160816; b=tHby8Sk8fhHM363eULeYZpuAq21O2TIgpzKSzJQC4jUmjNPvsoSbH7Zxd9cmKYWhNz FppzOH1VkjFXhomno+QExUgGv6lLvj//Scy5nHsfyWkisMCqbA9n+GgcBuKkZJVHLx2v 7zVyivYOqed4cgyDkrnh2V0KkGc0RtejisVOPyaUauHWFz0fvqIAjGNs/XLVKPtZ8y2e mgc2yxNHJR8cSkGckcW58O/OH/VqWijt+2s0XRm2alPYklD6vb92ZGfRJzbs+12xTJ5S EvZwBYv2ORCOkmLiT/D8hqqM8A3f3zWjvNCiPc3tbLdI7/dSrsyg1QCU5zwkXKj3X4rg Fzug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :organization:from:content-language:references:cc:to:subject :user-agent:mime-version:date:message-id:dkim-signature; bh=1PA8FyCR3/gunJx7DkyWDmKgZ7n6tougFbaW1vOf1yk=; b=l8/En0L077UGc0JxIfngPQZvRGJz6qOOlX8QylEXHGMNLVTiLmzZ5VpH/VtAaIkCB8 myZItqWbXAV/ZUvOY/ZPKWPnDB2XGdrNSmgmF/gvCpRf+JORCaihW2eZXEFJUFb9LQWQ +oEfbCMW46HQ+8SkG9e7e6tDGLbceucLBWq1GIiLlJQHsxAXPaG2assuPdSCQcX711Yp r+smoJwQ3Juq32VZ6XUQ2z/enEwnahi8YxUNfHM1D4qgckHmGSJjkU04hyyFphhG6UWo O2iwQVBRwcPnBe0wP5vVmG1eCRUMc5uoih9b/vSdfk1QrrozKsxJ7DANrhEgzcmlvpJV 7Blw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NWTd2Hwq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hs17-20020a1709073e9100b007addc76341csi4422666ejc.25.2022.12.01.08.21.35; Thu, 01 Dec 2022 08:21:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NWTd2Hwq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231737AbiLAPoH (ORCPT + 82 others); Thu, 1 Dec 2022 10:44:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231244AbiLAPoD (ORCPT ); Thu, 1 Dec 2022 10:44:03 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 530842C66C for ; Thu, 1 Dec 2022 07:43:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1669909386; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1PA8FyCR3/gunJx7DkyWDmKgZ7n6tougFbaW1vOf1yk=; b=NWTd2HwqGipQSsCWpNs7tN4cccx7vnVt65ET/nGK/9Kh2kVA3f8ZOS5JfhJJJa1jKnYhqc cZ6ccNGjL07hYex1T6jTTYq5G3IJBaOZD08/x2mJVNdKyEKofISzKbSJlq4FySfIydVrKh JjSYk+/ggcQx2QKKhgNHiJO08LJDvEA= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-357-MyKaNbwkNnOsjGvtS0NIvQ-1; Thu, 01 Dec 2022 10:43:05 -0500 X-MC-Unique: MyKaNbwkNnOsjGvtS0NIvQ-1 Received: by mail-wr1-f72.google.com with SMTP id d8-20020adf9b88000000b0024207f09827so534103wrc.20 for ; Thu, 01 Dec 2022 07:43:05 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:organization:from :content-language:references:cc:to:subject:user-agent:mime-version :date:message-id:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=1PA8FyCR3/gunJx7DkyWDmKgZ7n6tougFbaW1vOf1yk=; b=PB8B96Ia1xDk+Cev+yxiTY8TYvFDGon7j8MNm9gmluAJ92IgZXKh5/3rOL6qJYXYno EYBRkpKLUCOfHK9K1En+ztU1BI/fWZd0Tqt9cjEhZwAMmpQmCBJlNYs3yjAVGybEbN7v S1aIwrAkLG3SJk01i39R+HDRjaAY2GauyaJU01Jm2k0cDUWsQFAJ2Ia0uwLhJy38IBQt StubpffD8MaVbzFn+naDtysHeY1oSj3Lxu3ugTb7u3p3LasOWUrjdR4Bey2DM2GWlBN5 IJybj12o/OZviUk88c1I6k9ulY8VT+6e4npUWldQtn1dFMxBw9izsB0rrp5cnhvMhaVn Jjzw== X-Gm-Message-State: ANoB5pkZxp2w2auAoSkO9ZDNJOlFQb8fqLDovqszrMwZ/H21WaEGEOSq 7mSq7boHf2I5jZ5QhTYGVNZskxI8n9zXcpNIcWsHhpQgqHbLZfYIEn/4XTwEXyFkeWoyzGll6Wy o/tefzOiQqn9orL/U6+z3qtpn X-Received: by 2002:a05:6000:124d:b0:242:10a:6667 with SMTP id j13-20020a056000124d00b00242010a6667mr23538788wrx.39.1669909381684; Thu, 01 Dec 2022 07:43:01 -0800 (PST) X-Received: by 2002:a05:6000:124d:b0:242:10a:6667 with SMTP id j13-20020a056000124d00b00242010a6667mr23538465wrx.39.1669909374676; Thu, 01 Dec 2022 07:42:54 -0800 (PST) Received: from ?IPV6:2a09:80c0:192:0:5dac:bf3d:c41:c3e7? ([2a09:80c0:192:0:5dac:bf3d:c41:c3e7]) by smtp.gmail.com with ESMTPSA id u11-20020a5d6acb000000b00241c4bd6c09sm4777074wrw.33.2022.12.01.07.42.53 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 01 Dec 2022 07:42:53 -0800 (PST) Message-ID: Date: Thu, 1 Dec 2022 16:42:52 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0 Subject: Re: [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte To: Peter Xu , Andrew Morton Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mike Rapoport , Nadav Amit , Andrea Arcangeli , Ives van Hoorne , Axel Rasmussen , Alistair Popple , stable@vger.kernel.org References: <20221114000447.1681003-1-peterx@redhat.com> <20221114000447.1681003-2-peterx@redhat.com> <5ddf1310-b49f-6e66-a22a-6de361602558@redhat.com> <20221130142425.6a7fdfa3e5954f3c305a77ee@linux-foundation.org> Content-Language: en-US From: David Hildenbrand Organization: Red Hat In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01.12.22 16:28, Peter Xu wrote: > Hi, Andrew, > > On Wed, Nov 30, 2022 at 02:24:25PM -0800, Andrew Morton wrote: >> On Tue, 15 Nov 2022 19:17:43 +0100 David Hildenbrand wrote: >> >>> On 14.11.22 01:04, Peter Xu wrote: >>>> Ives van Hoorne from codesandbox.io reported an issue regarding possible >>>> data loss of uffd-wp when applied to memfds on heavily loaded systems. The >>>> symptom is some read page got data mismatch from the snapshot child VMs. >>>> >>>> Here I can also reproduce with a Rust reproducer that was provided by Ives >>>> that keeps taking snapshot of a 256MB VM, on a 32G system when I initiate >>>> 80 instances I can trigger the issues in ten minutes. >>>> >>>> It turns out that we got some pages write-through even if uffd-wp is >>>> applied to the pte. >>>> >>>> The problem is, when removing migration entries, we didn't really worry >>>> about write bit as long as we know it's not a write migration entry. That >>>> may not be true, for some memory types (e.g. writable shmem) mk_pte can >>>> return a pte with write bit set, then to recover the migration entry to its >>>> original state we need to explicit wr-protect the pte or it'll has the >>>> write bit set if it's a read migration entry. For uffd it can cause >>>> write-through. >>>> >>>> The relevant code on uffd was introduced in the anon support, which is >>>> commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration", >>>> 2020-04-07). However anon shouldn't suffer from this problem because anon >>>> should already have the write bit cleared always, so that may not be a >>>> proper Fixes target, while I'm adding the Fixes to be uffd shmem support. >>>> >>> >>> ... >>> >>>> --- a/mm/migrate.c >>>> +++ b/mm/migrate.c >>>> @@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio, >>>> pte = pte_mkdirty(pte); >>>> if (is_writable_migration_entry(entry)) >>>> pte = maybe_mkwrite(pte, vma); >>>> - else if (pte_swp_uffd_wp(*pvmw.pte)) >>>> + else >>>> + /* NOTE: mk_pte can have write bit set */ >>>> + pte = pte_wrprotect(pte); >>>> + >>>> + if (pte_swp_uffd_wp(*pvmw.pte)) { >>>> + WARN_ON_ONCE(pte_write(pte)); >> >> Will this warnnig trigger in the scenario you and Ives have discovered? > > If without the above newly added wr-protect, yes. This is the case where > we found we got write bit set even if uffd-wp bit is also set, hence allows > the write to go through even if marked protected. > >> >>>> pte = pte_mkuffd_wp(pte); >>>> + } >>>> >>>> if (folio_test_anon(folio) && !is_readable_migration_entry(entry)) >>>> rmap_flags |= RMAP_EXCLUSIVE; >>> >>> As raised, I don't agree to this generic non-uffd-wp change without >>> further, clear justification. >> >> Pater, can you please work this further? > > I didn't reply here because I have already replied with the question in > previous version with a few attempts. Quotting myself: > > https://lore.kernel.org/all/Y3KgYeMTdTM0FN5W@x1n/ > > The thing is recovering the pte into its original form is the > safest approach to me, so I think we need justification on why it's > always safe to set the write bit. > > I've also got another longer email trying to explain why I think it's the > other way round to be justfied, rather than justifying removal of the write > bit for a read migration entry, here: > And I disagree for this patch that is supposed to fix this hunk: @@ -243,11 +243,15 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, entry = pte_to_swp_entry(*pvmw.pte); if (is_write_migration_entry(entry)) pte = maybe_mkwrite(pte, vma); + else if (pte_swp_uffd_wp(*pvmw.pte)) + pte = pte_mkuffd_wp(pte); if (unlikely(is_zone_device_page(new))) { if (is_device_private_page(new)) { entry = make_device_private_entry(new, pte_write(pte)); pte = swp_entry_to_pte(entry); + if (pte_swp_uffd_wp(*pvmw.pte)) + pte = pte_mkuffd_wp(pte); } } There is really nothing to justify the other way around here. If it's broken fix it independently and properly backport it independenty. But we don't know about any such broken case. I have no energy to spare to argue further ;) -- Thanks, David / dhildenb