2022-04-23 14:17:23

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v4 0/3] Avoid live-lock in btrfs fault-in+uaccess loop

Hi,

A minor update from v3 here:

https://lore.kernel.org/r/[email protected]

In patch 3/3 I dropped the 'len' local variable, so the btrfs patch
simply replaces fault_in_writeable() with fault_in_subpage_writeable()
and adds a comment. I kept David's ack as there's no functional change
since v3.

Andrew, since there was no objection last time around, I'd like this
series to land in 5.19. As it touches arch, fs and mm, it should
probably go in via the mm tree but I'm also happy to merge the series
via arm64. Please let me know if you have any preference.

The btrfs search_ioctl() function can potentially live-lock on arm64
with MTE enabled due to a fault_in_writeable() + copy_to_user_nofault()
unbounded loop. The uaccess can fault in the middle of a page (MTE tag
check fault) even if a prior fault_in_writeable() successfully wrote to
the beginning of that page. The btrfs loop always restarts the fault-in
loop from the beginning of the user buffer, hence the live-lock.

The series introduces fault_in_subpage_writeable() together with the
arm64 probing counterpart and the btrfs fix.

Thanks.

Catalin Marinas (3):
mm: Add fault_in_subpage_writeable() to probe at sub-page granularity
arm64: Add support for user sub-page fault probing
btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page
faults

arch/Kconfig | 7 +++++++
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/mte.h | 1 +
arch/arm64/include/asm/uaccess.h | 15 +++++++++++++++
arch/arm64/kernel/mte.c | 30 ++++++++++++++++++++++++++++++
fs/btrfs/ioctl.c | 7 ++++++-
include/linux/pagemap.h | 1 +
include/linux/uaccess.h | 22 ++++++++++++++++++++++
mm/gup.c | 29 +++++++++++++++++++++++++++++
9 files changed, 112 insertions(+), 1 deletion(-)


base-commit: b2d229d4ddb17db541098b83524d901257e93845


2022-04-23 16:35:33

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v4 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults

Commit a48b73eca4ce ("btrfs: fix potential deadlock in the search
ioctl") addressed a lockdep warning by pre-faulting the user pages and
attempting the copy_to_user_nofault() in an infinite loop. On
architectures like arm64 with MTE, an access may fault within a page at
a location different from what fault_in_writeable() probed. Since the
sk_offset is rewound to the previous struct btrfs_ioctl_search_header
boundary, there is no guaranteed forward progress and search_ioctl() may
live-lock.

Use fault_in_subpage_writeable() instead of fault_in_writeable() to
ensure the permission is checked at the right granularity (smaller than
PAGE_SIZE).

Signed-off-by: Catalin Marinas <[email protected]>
Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
Reported-by: Al Viro <[email protected]>
Acked-by: David Sterba <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
---
fs/btrfs/ioctl.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index be6c24577dbe..9bf0616a3069 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2565,7 +2565,12 @@ static noinline int search_ioctl(struct inode *inode,

while (1) {
ret = -EFAULT;
- if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
+ /*
+ * Ensure that the whole user buffer is faulted in at sub-page
+ * granularity, otherwise the loop may live-lock.
+ */
+ if (fault_in_subpage_writeable(ubuf + sk_offset,
+ *buf_size - sk_offset))
break;

ret = btrfs_search_forward(root, &key, path, sk->min_transid);

2022-04-24 05:25:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Avoid live-lock in btrfs fault-in+uaccess loop

On Sat, Apr 23, 2022 at 3:07 AM Catalin Marinas <[email protected]> wrote:
>
> The series introduces fault_in_subpage_writeable() together with the
> arm64 probing counterpart and the btrfs fix.

Looks fine to me - and I think it can probably go through the arm64
tree since you'd be the only one really testing it anyway.

I assume you checked that btrfs is the only one that uses
fault_in_writeable() in this way? Everybody else updates to the right
byte boundary and retries (or returns immediately)?

Linus

2022-04-24 06:24:01

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v4 2/3] arm64: Add support for user sub-page fault probing

With MTE, even if the pte allows an access, a mismatched tag somewhere
within a page can still cause a fault. Select ARCH_HAS_SUBPAGE_FAULTS if
MTE is enabled and implement the probe_subpage_writeable() function.
Note that get_user() is sufficient for the writeable MTE check since the
same tag mismatch fault would be triggered by a read. The caller of
probe_subpage_writeable() will need to check the pte permissions
(put_user, GUP).

Signed-off-by: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/mte.h | 1 +
arch/arm64/include/asm/uaccess.h | 15 +++++++++++++++
arch/arm64/kernel/mte.c | 30 ++++++++++++++++++++++++++++++
4 files changed, 47 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 57c4c995965f..290b88238103 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1871,6 +1871,7 @@ config ARM64_MTE
depends on AS_HAS_LSE_ATOMICS
# Required for tag checking in the uaccess routines
depends on ARM64_PAN
+ select ARCH_HAS_SUBPAGE_FAULTS
select ARCH_USES_HIGH_VMA_FLAGS
help
Memory Tagging (part of the ARMv8.5 Extensions) provides
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index adcb937342f1..aa523591a44e 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -47,6 +47,7 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg);
long get_mte_ctrl(struct task_struct *task);
int mte_ptrace_copy_tags(struct task_struct *child, long request,
unsigned long addr, unsigned long data);
+size_t mte_probe_user_range(const char __user *uaddr, size_t size);

#else /* CONFIG_ARM64_MTE */

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e8dce0cc5eaa..6677aa7e9993 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -460,4 +460,19 @@ static inline int __copy_from_user_flushcache(void *dst, const void __user *src,
}
#endif

+#ifdef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+
+/*
+ * Return 0 on success, the number of bytes not probed otherwise.
+ */
+static inline size_t probe_subpage_writeable(const void __user *uaddr,
+ size_t size)
+{
+ if (!system_supports_mte())
+ return 0;
+ return mte_probe_user_range(uaddr, size);
+}
+
+#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+
#endif /* __ASM_UACCESS_H */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 78b3e0f8e997..35697a09926f 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -15,6 +15,7 @@
#include <linux/swapops.h>
#include <linux/thread_info.h>
#include <linux/types.h>
+#include <linux/uaccess.h>
#include <linux/uio.h>

#include <asm/barrier.h>
@@ -543,3 +544,32 @@ static int register_mte_tcf_preferred_sysctl(void)
return 0;
}
subsys_initcall(register_mte_tcf_preferred_sysctl);
+
+/*
+ * Return 0 on success, the number of bytes not probed otherwise.
+ */
+size_t mte_probe_user_range(const char __user *uaddr, size_t size)
+{
+ const char __user *end = uaddr + size;
+ int err = 0;
+ char val;
+
+ __raw_get_user(val, uaddr, err);
+ if (err)
+ return size;
+
+ uaddr = PTR_ALIGN(uaddr, MTE_GRANULE_SIZE);
+ while (uaddr < end) {
+ /*
+ * A read is sufficient for mte, the caller should have probed
+ * for the pte write permission if required.
+ */
+ __raw_get_user(val, uaddr, err);
+ if (err)
+ return end - uaddr;
+ uaddr += MTE_GRANULE_SIZE;
+ }
+ (void)val;
+
+ return 0;
+}

2022-04-24 12:11:17

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Avoid live-lock in btrfs fault-in+uaccess loop

On Sat, Apr 23, 2022 at 09:35:42AM -0700, Linus Torvalds wrote:
> On Sat, Apr 23, 2022 at 3:07 AM Catalin Marinas <[email protected]> wrote:
> >
> > The series introduces fault_in_subpage_writeable() together with the
> > arm64 probing counterpart and the btrfs fix.
>
> Looks fine to me - and I think it can probably go through the arm64
> tree since you'd be the only one really testing it anyway.

I'll queue it via arm64 then.

> I assume you checked that btrfs is the only one that uses
> fault_in_writeable() in this way? Everybody else updates to the right
> byte boundary and retries (or returns immediately)?

I couldn't find any other places (by inspection or testing). The
buffered file I/O can already make progress in current fault_in_*() +
copy_*_user() loops. O_DIRECT either goes via GUP (and memcpy() doesn't
fault) or, if the user buffer is not PAGE aligned, it may fall back to
buffered I/O. That's why I simplified the series, AFAICT it's only btrfs
search_ioctl() with this problem.

--
Catalin

2022-04-25 22:25:59

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Avoid live-lock in btrfs fault-in+uaccess loop

On Sat, 23 Apr 2022 11:07:48 +0100, Catalin Marinas wrote:
> A minor update from v3 here:
>
> https://lore.kernel.org/r/[email protected]
>
> In patch 3/3 I dropped the 'len' local variable, so the btrfs patch
> simply replaces fault_in_writeable() with fault_in_subpage_writeable()
> and adds a comment. I kept David's ack as there's no functional change
> since v3.
>
> [...]

Applied to arm64 (for-next/fault-in-subpage). Also changed the
probe_subpage_writeable() prototype to use char __user * instead of void
__user * (as per Andrew's suggestion).

[1/3] mm: Add fault_in_subpage_writeable() to probe at sub-page granularity
https://git.kernel.org/arm64/c/da32b5817253
[2/3] arm64: Add support for user sub-page fault probing
https://git.kernel.org/arm64/c/f3ba50a7a100
[3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
https://git.kernel.org/arm64/c/18788e34642e

--
Catalin

2022-04-25 22:29:42

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Avoid live-lock in btrfs fault-in+uaccess loop

Hi Catalin,

On Sat, Apr 23, 2022 at 8:40 PM Catalin Marinas <[email protected]> wrote:
> On Sat, Apr 23, 2022 at 09:35:42AM -0700, Linus Torvalds wrote:
> > On Sat, Apr 23, 2022 at 3:07 AM Catalin Marinas <[email protected]> wrote:
> > >
> > > The series introduces fault_in_subpage_writeable() together with the
> > > arm64 probing counterpart and the btrfs fix.
> >
> > Looks fine to me - and I think it can probably go through the arm64
> > tree since you'd be the only one really testing it anyway.
>
> I'll queue it via arm64 then.

sounds good to me, thank you.

> > I assume you checked that btrfs is the only one that uses
> > fault_in_writeable() in this way? Everybody else updates to the right
> > byte boundary and retries (or returns immediately)?
>
> I couldn't find any other places (by inspection or testing). The
> buffered file I/O can already make progress in current fault_in_*() +
> copy_*_user() loops.

This started working correctly with commit bc1bb416bbb9
("generic_perform_write()/iomap_write_actor(): saner logics for short
copy") by Al from last May.

> O_DIRECT either goes via GUP (and memcpy() doesn't
> fault) or, if the user buffer is not PAGE aligned, it may fall back to
> buffered I/O. That's why I simplified the series, AFAICT it's only btrfs
> search_ioctl() with this problem.

Thanks,
Andreas