This comes from a report from Ives on using uffd-wp on shmem. More
information can be found in the commit message of patch 1.
Patch 2 adds a sanity check that runs when walking pgtables and when we
convert the ptes into other forms, e.g. for migration and swap. It makes
the error trigger before the user could notice it, and it also pins the
problem down as a wrong pgtable setup.
Ives, I only attached the reported-by tag for you but not tested-by because
the fix patch (patch 1) has a slight change compared to what I sent you
before, but hopefully it should also work for you. If you want, feel free
to reply directly here if the patch also works for you.
We probably need patch 1 for stable (5.19+). Please have a look, thanks.
Peter Xu (2):
mm/migrate: Fix read-only page got writable when recover pte
mm/uffd: Sanity check write bit for uffd-wp protected ptes
arch/x86/include/asm/pgtable.h | 16 +++++++++++++++-
mm/migrate.c | 8 +++++++-
2 files changed, 22 insertions(+), 2 deletions(-)
--
2.37.3
Let's add a sanity check under CONFIG_DEBUG_VM on the write bit, using
whatever chance we have when walking through the pgtables. It can surface
the error before the app notices that the data in the snapshot was corrupted.
It also helps us identify that this is a wrong pgtable setup, which is
hopefully useful information for debugging too.
Wrapping it with CONFIG_DEBUG_VM is not that useful considering many distros
already enable it, but still do that just in case some custom build
doesn't want anything like it.
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/x86/include/asm/pgtable.h | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5059799bebe3..27fff6b14929 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -291,7 +291,21 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
static inline int pte_uffd_wp(pte_t pte)
{
- return pte_flags(pte) & _PAGE_UFFD_WP;
+ bool wp = pte_flags(pte) & _PAGE_UFFD_WP;
+#ifdef CONFIG_DEBUG_VM
+ /*
+ * Having write bit for wr-protect-marked present ptes is fatal,
+ * because it means the uffd-wp bit will be ignored and write will
+ * just go through.
+ *
+ * Use any chance of pgtable walking to verify this (e.g., when
+ * page swapped out or being migrated for all purposes). It means
+ * something is already wrong. Tell the admin even before the
+ * process crashes. We also nail it with wrong pgtable setup.
+ */
+ WARN_ON_ONCE(wp && pte_write(pte));
+#endif
+ return wp;
}
static inline pte_t pte_mkuffd_wp(pte_t pte)
--
2.37.3
Ives van Hoorne from codesandbox.io reported an issue regarding possible
data loss of uffd-wp when applied to memfds on heavily loaded systems. The
symptom is that some pages read back with data that mismatches the snapshot
in the child VMs.
I can also reproduce it with a Rust reproducer provided by Ives that keeps
taking snapshots of a 256MB VM. On a 32G system, when I start 80 instances
I can trigger the issue in ten minutes.
It turns out that writes went through on some pages even though uffd-wp was
applied to the pte.
The problem is that when removing migration entries, we didn't worry about
the write bit as long as we knew it was not a write migration entry. That
assumption does not always hold: for some memory types (e.g. writable shmem)
mk_pte can return a pte with the write bit set. To recover the pte to its
original state we then need to explicitly wr-protect it, otherwise it will
have the write bit set even though it came from a read migration entry.
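To make the before/after behaviour concrete, here is a minimal user-space
sketch of the restore logic. It is only a model: the model_* names, the bit
layout and the enum are hypothetical and are not the kernel's definitions,
and the stand-in for maybe_mkwrite() ignores the VMA check the real helper
performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not the x86 encoding). */
typedef uint64_t model_pte_t;
#define MODEL_PTE_WRITE   (1ULL << 0)
#define MODEL_PTE_UFFD_WP (1ULL << 1)

/* Simplified migration entry type: read-only or writable. */
enum model_entry { MODEL_ENTRY_READ, MODEL_ENTRY_WRITE };

/* Stand-in for mk_pte() on writable shmem: the write bit is already set. */
static model_pte_t model_mk_pte_shared_writable(void)
{
	return MODEL_PTE_WRITE;
}

static bool model_pte_write(model_pte_t pte)
{
	return (pte & MODEL_PTE_WRITE) != 0;
}

static bool model_pte_uffd_wp(model_pte_t pte)
{
	return (pte & MODEL_PTE_UFFD_WP) != 0;
}

/* Old logic: a read entry never clears the write bit it got from mk_pte. */
static model_pte_t model_restore_old(model_pte_t pte, enum model_entry entry,
				     bool swp_uffd_wp)
{
	if (entry == MODEL_ENTRY_WRITE)
		pte |= MODEL_PTE_WRITE;	/* simplified maybe_mkwrite() */
	else if (swp_uffd_wp)
		pte |= MODEL_PTE_UFFD_WP;
	return pte;
}

/* Fixed logic: wr-protect explicitly, then carry over the uffd-wp bit. */
static model_pte_t model_restore_fixed(model_pte_t pte, enum model_entry entry,
				       bool swp_uffd_wp)
{
	if (entry == MODEL_ENTRY_WRITE)
		pte |= MODEL_PTE_WRITE;	/* simplified maybe_mkwrite() */
	else
		pte &= ~MODEL_PTE_WRITE;	/* mk_pte may have set the write bit */

	if (swp_uffd_wp)
		pte |= MODEL_PTE_UFFD_WP;
	return pte;
}
```

In this model, restoring a read entry with the uffd-wp marker through
model_restore_old() yields a pte that is both writable and uffd-wp, which is
the write-through case; model_restore_fixed() yields uffd-wp without the
write bit.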
For uffd it can cause write-through. I didn't verify, but I think it'll be
the same for mprotect()ed pages, where after migration we could miss the
sigbus instead.
The relevant code on uffd was introduced in the anon support, which is
commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
2020-04-07). However, anon shouldn't suffer from this problem because anon
should always have the write bit cleared already, so that may not be a
proper Fixes target. To meet the needs of the backport, I'm attaching
the Fixes tag to the uffd-wp shmem support. Since no one has reported an
issue with mprotect, I assume that's also the kernel version from which we
should start to backport for stable, and we shouldn't need to worry about
anything before that.
Cc: Andrea Arcangeli <[email protected]>
Cc: [email protected]
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Reported-by: Ives van Hoorne <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/migrate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..8b6351c08c78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
pte = pte_mkdirty(pte);
if (is_writable_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
- else if (pte_swp_uffd_wp(*pvmw.pte))
+ else
+ /* NOTE: mk_pte can have write bit set */
+ pte = pte_wrprotect(pte);
+
+ if (pte_swp_uffd_wp(*pvmw.pte)) {
+ WARN_ON_ONCE(pte_write(pte));
pte = pte_mkuffd_wp(pte);
+ }
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
--
2.37.3
On Nov 10, 2022, at 7:17 AM, Peter Xu <[email protected]> wrote:
> +#ifdef CONFIG_DEBUG_VM
> + /*
> + * Having write bit for wr-protect-marked present ptes is fatal,
> + * because it means the uffd-wp bit will be ignored and write will
> + * just go through.
> + *
> + * Use any chance of pgtable walking to verify this (e.g., when
> + * page swapped out or being migrated for all purposes). It means
> + * something is already wrong. Tell the admin even before the
> + * process crashes. We also nail it with wrong pgtable setup.
> + */
> + WARN_ON_ONCE(wp && pte_write(pte));
How about VM_WARN_ON_ONCE() and no ifdef?
On Thu, Nov 10, 2022 at 10:43:25AM -0800, Nadav Amit wrote:
> On Nov 10, 2022, at 7:17 AM, Peter Xu <[email protected]> wrote:
>
> > +#ifdef CONFIG_DEBUG_VM
> > + /*
> > + * Having write bit for wr-protect-marked present ptes is fatal,
> > + * because it means the uffd-wp bit will be ignored and write will
> > + * just go through.
> > + *
> > + * Use any chance of pgtable walking to verify this (e.g., when
> > + * page swapped out or being migrated for all purposes). It means
> > + * something is already wrong. Tell the admin even before the
> > + * process crashes. We also nail it with wrong pgtable setup.
> > + */
> > + WARN_ON_ONCE(wp && pte_write(pte));
>
> How about VM_WARN_ON_ONCE() and no ifdef?
Oops.. Will quickly respin, thanks.
--
Peter Xu
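For readers following the suggestion above, here is a rough user-space sketch
of why VM_WARN_ON_ONCE() makes the #ifdef unnecessary: the macro itself
compiles down to a type-check-only no-op when the debug option is off. This
is not the respun patch. All MODEL_* and model_* names are hypothetical
stand-ins, and MODEL_DEBUG_VM merely plays the role of CONFIG_DEBUG_VM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Plays the role of CONFIG_DEBUG_VM; remove to compile the check out. */
#define MODEL_DEBUG_VM 1

/* Counts warnings so the behaviour can be observed in user space. */
static int model_warn_count;

/* Simplified stand-in for WARN_ON_ONCE(): fires once per call site. */
#define MODEL_WARN_ON_ONCE(cond)					\
	do {								\
		static bool model_warned;				\
		if ((cond) && !model_warned) {				\
			model_warned = true;				\
			model_warn_count++;				\
			fprintf(stderr, "WARNING: %s\n", #cond);	\
		}							\
	} while (0)

#ifdef MODEL_DEBUG_VM
#define MODEL_VM_WARN_ON_ONCE(cond) MODEL_WARN_ON_ONCE(cond)
#else
/* Still type-checks cond, but never evaluates it at run time. */
#define MODEL_VM_WARN_ON_ONCE(cond) ((void)sizeof((long)(cond)))
#endif

/* Hypothetical bit layout, for illustration only (not the x86 encoding). */
typedef uint64_t model_pte_t;
#define MODEL_PTE_WRITE   (1ULL << 0)
#define MODEL_PTE_UFFD_WP (1ULL << 1)

/* The helper needs no #ifdef of its own; the macro hides the config check. */
static bool model_pte_uffd_wp(model_pte_t pte)
{
	bool wp = (pte & MODEL_PTE_UFFD_WP) != 0;

	/* A writable pte marked uffd-wp means the marker would be ignored. */
	MODEL_VM_WARN_ON_ONCE(wp && (pte & MODEL_PTE_WRITE));
	return wp;
}
```

Calling model_pte_uffd_wp() repeatedly on a pte with both bits set warns only
on the first call, and with MODEL_DEBUG_VM removed the check disappears while
the expression is still compiled for type errors.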