2022-04-22 14:48:10

by Nico Pache

Subject: [RFC 0/3] Slight improvements for OOM/Futex

The following 3 patches were developed while working on the OOM/Futex
fix.

Patch 1: This is a nonfunctional change. The vma_is_anonymous function
originally led to some confusion about what the oom reaper is checking
for in __oom_reap_task_mm. This patch makes that function name more
descriptive.

Patch 2: Futex cleanup was designed with silent failures. Printing
these failures would have made the issue much quicker to find. This
patch introduces a pr_info if the exit path has any issues walking the
userspace robust list.

Patch 3: During the debug process I noticed that the oom_reap_task_mm
function was always running twice for the same mm_struct: once in the
exit path and once in the oom reaper. By checking MMF_OOM_SKIP we can
prevent these unnecessary double calls. I'd like some input from
David Rientjes here with regards to CVE-2018-1000200; I want to make
sure we are not reintroducing that CVE.
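
Roughly, patch 3 guards the reap in exit_mmap() along these lines (a
sketch of the idea, not the exact hunk; see the patch for the real
diff):

	if (unlikely(mm_is_oom_victim(mm))) {
		/*
		 * The oom reaper sets MMF_OOM_SKIP once it is done with an
		 * mm; if it is already set, do not reap the same mm again.
		 */
		if (!test_bit(MMF_OOM_SKIP, &mm->flags))
			(void)__oom_reap_task_mm(mm);
		/* ... */
	}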

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: William Kucharski <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "AndrĂ© Almeida" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Nico Pache <[email protected]>

Nico Pache (3):
mm: change vma_is_anonymous to vma_is_private_anon
futex: exit: Print a warning when futex_cleanup fails
exit: Check for MMF_OOM_SKIP in exit_mmap

arch/powerpc/mm/book3s64/pgtable.c | 2 +-
fs/userfaultfd.c | 2 +-
include/linux/huge_mm.h | 2 +-
include/linux/mm.h | 2 +-
kernel/futex/core.c | 44 ++++++++++++++++++------------
mm/gup.c | 4 +--
mm/huge_memory.c | 10 +++----
mm/madvise.c | 4 +--
mm/memory.c | 10 +++----
mm/migrate_device.c | 6 ++--
mm/mincore.c | 2 +-
mm/mmap.c | 7 +++--
mm/oom_kill.c | 2 +-
mm/userfaultfd.c | 8 +++---
14 files changed, 57 insertions(+), 48 deletions(-)

--
2.35.1


2022-04-22 18:13:04

by Nico Pache

Subject: [RFC 1/3] mm: change vma_is_anonymous to vma_is_private_anon

The name of the vma_is_anonymous function is not fully indicative of
what it checks.

Without full knowledge of the mmap process, one may incorrectly assume
it covers all types of anonymous memory, which is not the case.
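
For example, a MAP_ANONYMOUS|MAP_SHARED mapping is anonymous from
userspace's point of view, but shmem_zero_setup() overwrites its vm_ops
at mmap time, so the function returns false for it. As an illustration
(not part of the patch):

	/* Anonymous to the user, but backed by shmem in the kernel: */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_SHARED, -1, 0);

vma_is_anonymous() only tests !vma->vm_ops, so what it really means is
"private anonymous" -- hence the new name.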

No functional changes.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: William Kucharski <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Nico Pache <[email protected]>
---
arch/powerpc/mm/book3s64/pgtable.c | 2 +-
fs/userfaultfd.c | 2 +-
include/linux/huge_mm.h | 2 +-
include/linux/mm.h | 2 +-
mm/gup.c | 4 ++--
mm/huge_memory.c | 10 +++++-----
mm/madvise.c | 4 ++--
mm/memory.c | 10 +++++-----
mm/migrate_device.c | 6 +++---
mm/mincore.c | 2 +-
mm/mmap.c | 4 ++--
mm/oom_kill.c | 2 +-
mm/userfaultfd.c | 8 ++++----
13 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 052e6590f84f..c8fd68a7965d 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -479,7 +479,7 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
struct vm_area_struct *vma)
{
if (radix_enabled())
- return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma);
+ return (new_pmd_ptl != old_pmd_ptl) && vma_is_private_anon(vma);

return true;
}
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index aa0c47cb0d16..d3fe2c26874a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1269,7 +1269,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
return false;
}

- return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+ return vma_is_private_anon(vma) || is_vm_hugetlb_page(vma) ||
vma_is_shmem(vma);
}

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2999190adc22..40493f53562c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -119,7 +119,7 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
unsigned long haddr)
{
/* Don't have to check pgoff for anonymous vma */
- if (!vma_is_anonymous(vma)) {
+ if (!vma_is_private_anon(vma)) {
if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
HPAGE_PMD_NR))
return false;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e34edb775334..9a01d053853b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -623,7 +623,7 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma)
vma->vm_ops = NULL;
}

-static inline bool vma_is_anonymous(struct vm_area_struct *vma)
+static inline bool vma_is_private_anon(struct vm_area_struct *vma)
{
return !vma->vm_ops;
}
diff --git a/mm/gup.c b/mm/gup.c
index f598a037eb04..7cd3ef329ea2 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -398,7 +398,7 @@ static struct page *no_page_table(struct vm_area_struct *vma,
* be zero-filled if handle_mm_fault() actually did handle it.
*/
if ((flags & FOLL_DUMP) &&
- (vma_is_anonymous(vma) || !vma->vm_ops->fault))
+ (vma_is_private_anon(vma) || !vma->vm_ops->fault))
return ERR_PTR(-EFAULT);
return NULL;
}
@@ -913,7 +913,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
if (vm_flags & (VM_IO | VM_PFNMAP))
return -EFAULT;

- if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
+ if (gup_flags & FOLL_ANON && !vma_is_private_anon(vma))
return -EFAULT;

if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c468fee595ff..81c2123aeffe 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -82,7 +82,7 @@ bool transparent_hugepage_active(struct vm_area_struct *vma)

if (!transhuge_vma_suitable(vma, addr))
return false;
- if (vma_is_anonymous(vma))
+ if (vma_is_private_anon(vma))
return __transparent_hugepage_enabled(vma);
if (vma_is_shmem(vma))
return shmem_huge_enabled(vma);
@@ -1035,7 +1035,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
int ret = -ENOMEM;

/* Skip if can be re-fill on fault */
- if (!vma_is_anonymous(dst_vma))
+ if (!vma_is_private_anon(dst_vma))
return 0;

pgtable = pte_alloc_one(dst_mm);
@@ -1622,7 +1622,7 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
*
* We also don't deposit and withdraw tables for file pages.
*/
- return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma);
+ return (new_pmd_ptl != old_pmd_ptl) && vma_is_private_anon(vma);
}
#endif

@@ -1799,7 +1799,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
}
ret = HPAGE_PMD_NR;
set_pmd_at(mm, addr, pmd, entry);
- BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
+ BUG_ON(vma_is_private_anon(vma) && !preserve_write && pmd_write(entry));
unlock:
spin_unlock(ptl);
return ret;
@@ -1957,7 +1957,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,

count_vm_event(THP_SPLIT_PMD);

- if (!vma_is_anonymous(vma)) {
+ if (!vma_is_private_anon(vma)) {
old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
/*
* We are going to unmap this huge page. So
diff --git a/mm/madvise.c b/mm/madvise.c
index 1873616a37d2..23771875ae9d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -543,7 +543,7 @@ static void madvise_pageout_page_range(struct mmu_gather *tlb,

static inline bool can_do_pageout(struct vm_area_struct *vma)
{
- if (vma_is_anonymous(vma))
+ if (vma_is_private_anon(vma))
return true;
if (!vma->vm_file)
return false;
@@ -725,7 +725,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
struct mmu_gather tlb;

/* MADV_FREE works for only anon vma at the moment */
- if (!vma_is_anonymous(vma))
+ if (!vma_is_private_anon(vma))
return -EINVAL;

range.start = max(vma->vm_start, start_addr);
diff --git a/mm/memory.c b/mm/memory.c
index 76e3af9639d9..fcb599bc5c33 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4494,7 +4494,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)

static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
{
- if (vma_is_anonymous(vmf->vma))
+ if (vma_is_private_anon(vmf->vma))
return do_huge_pmd_anonymous_page(vmf);
if (vmf->vma->vm_ops->huge_fault)
return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
@@ -4504,7 +4504,7 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
/* `inline' is required to avoid gcc 4.1.2 build error */
static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
{
- if (vma_is_anonymous(vmf->vma)) {
+ if (vma_is_private_anon(vmf->vma)) {
if (userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
return handle_userfault(vmf, VM_UFFD_WP);
return do_huge_pmd_wp_page(vmf);
@@ -4527,7 +4527,7 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf)
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
/* No support for anonymous transparent PUD pages yet */
- if (vma_is_anonymous(vmf->vma))
+ if (vma_is_private_anon(vmf->vma))
goto split;
if (vmf->vma->vm_ops->huge_fault) {
vm_fault_t ret = vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
@@ -4546,7 +4546,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* No support for anonymous transparent PUD pages yet */
- if (vma_is_anonymous(vmf->vma))
+ if (vma_is_private_anon(vmf->vma))
return VM_FAULT_FALLBACK;
if (vmf->vma->vm_ops->huge_fault)
return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
@@ -4621,7 +4621,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
}

if (!vmf->pte) {
- if (vma_is_anonymous(vmf->vma))
+ if (vma_is_private_anon(vmf->vma))
return do_anonymous_page(vmf);
else
return do_fault(vmf);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 70c7dc05bbfc..4a4068c410f7 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -40,7 +40,7 @@ static int migrate_vma_collect_hole(unsigned long start,
unsigned long addr;

/* Only allow populating anonymous memory. */
- if (!vma_is_anonymous(walk->vma))
+ if (!vma_is_private_anon(walk->vma))
return migrate_vma_collect_skip(start, end, walk);

for (addr = start; addr < end; addr += PAGE_SIZE) {
@@ -120,7 +120,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
pte = *ptep;

if (pte_none(pte)) {
- if (vma_is_anonymous(vma)) {
+ if (vma_is_private_anon(vma)) {
mpfn = MIGRATE_PFN_MIGRATE;
migrate->cpages++;
}
@@ -518,7 +518,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
pte_t *ptep;

/* Only allow populating anonymous memory */
- if (!vma_is_anonymous(vma))
+ if (!vma_is_private_anon(vma))
goto abort;

pgdp = pgd_offset(mm, addr);
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..5aa4a1358ac1 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -156,7 +156,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,

static inline bool can_do_mincore(struct vm_area_struct *vma)
{
- if (vma_is_anonymous(vma))
+ if (vma_is_private_anon(vma))
return true;
if (!vma->vm_file)
return false;
diff --git a/mm/mmap.c b/mm/mmap.c
index 3aa839f81e63..a2968669fd4e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3189,7 +3189,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
* using the existing file pgoff checks and manipulations.
* Similarly in do_mmap and in do_brk_flags.
*/
- if (vma_is_anonymous(vma)) {
+ if (vma_is_private_anon(vma)) {
BUG_ON(vma->anon_vma);
vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
}
@@ -3217,7 +3217,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
* If anonymous vma has not yet been faulted, update new pgoff
* to match new location, to increase its chance of merging.
*/
- if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) {
+ if (unlikely(vma_is_private_anon(vma) && !vma->anon_vma)) {
pgoff = addr >> PAGE_SHIFT;
faulted_in_anon_vma = false;
}
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7ec38194f8e1..aa3e68934fcb 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -536,7 +536,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
* we do not want to block exit_mmap by keeping mm ref
* count elevated without a good reason.
*/
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ if (vma_is_private_anon(vma) || !(vma->vm_flags & VM_SHARED)) {
struct mmu_notifier_range range;
struct mmu_gather tlb;

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0cb8e5ef1713..c94f81bd109b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -540,9 +540,9 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
err = -EINVAL;
/*
* shmem_zero_setup is invoked in mmap for MAP_ANONYMOUS|MAP_SHARED but
- * it will overwrite vm_ops, so vma_is_anonymous must return false.
+ * it will overwrite vm_ops, so vma_is_private_anon must return false.
*/
- if (WARN_ON_ONCE(vma_is_anonymous(dst_vma) &&
+ if (WARN_ON_ONCE(vma_is_private_anon(dst_vma) &&
dst_vma->vm_flags & VM_SHARED))
goto out_unlock;

@@ -561,7 +561,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
src_start, len, mcopy_mode);

- if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
+ if (!vma_is_private_anon(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
if (!vma_is_shmem(dst_vma) && mcopy_mode == MCOPY_ATOMIC_CONTINUE)
goto out_unlock;
@@ -717,7 +717,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
goto out_unlock;
if (!userfaultfd_wp(dst_vma))
goto out_unlock;
- if (!vma_is_anonymous(dst_vma))
+ if (!vma_is_private_anon(dst_vma))
goto out_unlock;

if (enable_wp)
--
2.35.1

2022-04-22 18:16:50

by Nico Pache

Subject: [RFC 2/3] futex: exit: Print a warning when futex_cleanup fails

The futex_cleanup routines currently fail silently.

Allow the futex_cleanup routines to fail more verbosely so they can
leave a message behind for easier debugging and error detection.

Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "AndrĂ© Almeida" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Signed-off-by: Nico Pache <[email protected]>
---
kernel/futex/core.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index b22ef1efe751..8a62eb1b5d77 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -760,9 +760,9 @@ static inline int fetch_robust_entry(struct robust_list __user **entry,
* Walk curr->robust_list (very carefully, it's a userspace list!)
* and mark any locks found there dead, and notify any waiters.
*
- * We silently return on any sign of list-walking problem.
+ * We return false on any sign of list-walking problem.
*/
-static void exit_robust_list(struct task_struct *curr)
+static bool exit_robust_list(struct task_struct *curr)
{
struct robust_list_head __user *head = curr->robust_list;
struct robust_list __user *entry, *next_entry, *pending;
@@ -771,23 +771,25 @@ static void exit_robust_list(struct task_struct *curr)
unsigned long futex_offset;
int rc;

+ if (!futex_cmpxchg_enabled)
+ return false;
/*
* Fetch the list head (which was registered earlier, via
* sys_set_robust_list()):
*/
if (fetch_robust_entry(&entry, &head->list.next, &pi))
- return;
+ return false;
/*
* Fetch the relative futex offset:
*/
if (get_user(futex_offset, &head->futex_offset))
- return;
+ return false;
/*
* Fetch any possibly pending lock-add first, and handle it
* if it exists:
*/
if (fetch_robust_entry(&pending, &head->list_op_pending, &pip))
- return;
+ return false;

next_entry = NULL; /* avoid warning with gcc */
while (entry != &head->list) {
@@ -803,10 +805,10 @@ static void exit_robust_list(struct task_struct *curr)
if (entry != pending) {
if (handle_futex_death((void __user *)entry + futex_offset,
curr, pi, HANDLE_DEATH_LIST))
- return;
+ return false;
}
if (rc)
- return;
+ return false;
entry = next_entry;
pi = next_pi;
/*
@@ -819,9 +821,10 @@ static void exit_robust_list(struct task_struct *curr)
}

if (pending) {
- handle_futex_death((void __user *)pending + futex_offset,
+ return handle_futex_death((void __user *)pending + futex_offset,
curr, pip, HANDLE_DEATH_PENDING);
}
+ return true;
}

#ifdef CONFIG_COMPAT
@@ -854,9 +857,9 @@ compat_fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **ent
* Walk curr->robust_list (very carefully, it's a userspace list!)
* and mark any locks found there dead, and notify any waiters.
*
- * We silently return on any sign of list-walking problem.
+ * We return false on any sign of list-walking problem.
*/
-static void compat_exit_robust_list(struct task_struct *curr)
+static bool compat_exit_robust_list(struct task_struct *curr)
{
struct compat_robust_list_head __user *head = curr->compat_robust_list;
struct robust_list __user *entry, *next_entry, *pending;
@@ -866,24 +869,26 @@ static void compat_exit_robust_list(struct task_struct *curr)
compat_long_t futex_offset;
int rc;

+ if (!futex_cmpxchg_enabled)
+ return false;
/*
* Fetch the list head (which was registered earlier, via
* sys_set_robust_list()):
*/
if (compat_fetch_robust_entry(&uentry, &entry, &head->list.next, &pi))
- return;
+ return false;
/*
* Fetch the relative futex offset:
*/
if (get_user(futex_offset, &head->futex_offset))
- return;
+ return false;
/*
* Fetch any possibly pending lock-add first, and handle it
* if it exists:
*/
if (compat_fetch_robust_entry(&upending, &pending,
&head->list_op_pending, &pip))
- return;
+ return false;

next_entry = NULL; /* avoid warning with gcc */
while (entry != (struct robust_list __user *) &head->list) {
@@ -902,10 +907,10 @@ static void compat_exit_robust_list(struct task_struct *curr)

if (handle_futex_death(uaddr, curr, pi,
HANDLE_DEATH_LIST))
- return;
+ return false;
}
if (rc)
- return;
+ return false;
uentry = next_uentry;
entry = next_entry;
pi = next_pi;
@@ -920,8 +925,9 @@ static void compat_exit_robust_list(struct task_struct *curr)
if (pending) {
void __user *uaddr = futex_uaddr(pending, futex_offset);

- handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
+ return handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
}
+ return true;
}
#endif

@@ -1007,13 +1013,15 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
static void futex_cleanup(struct task_struct *tsk)
{
if (unlikely(tsk->robust_list)) {
- exit_robust_list(tsk);
+ if (!exit_robust_list(tsk))
+ pr_info("futex: exit_robust_list failed");
tsk->robust_list = NULL;
}

#ifdef CONFIG_COMPAT
if (unlikely(tsk->compat_robust_list)) {
- compat_exit_robust_list(tsk);
+ if (!compat_exit_robust_list(tsk))
+ pr_info("futex: compat_exit_robust_list failed");
tsk->compat_robust_list = NULL;
}
#endif
--
2.35.1

2022-04-22 18:50:35

by Nico Pache

Subject: Re: [RFC 2/3] futex: exit: Print a warning when futex_cleanup fails



On 4/21/22 15:30, Matthew Wilcox wrote:
> On Thu, Apr 21, 2022 at 03:05:32PM -0400, Nico Pache wrote:
>> @@ -1007,13 +1013,15 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
>> static void futex_cleanup(struct task_struct *tsk)
>> {
>> if (unlikely(tsk->robust_list)) {
>> - exit_robust_list(tsk);
>> + if (!exit_robust_list(tsk))
>> + pr_info("futex: exit_robust_list failed");
>
> Doesn't this allow a malicious user process to spam the kernel logs
> with messages? There needs to be a ratelimit on this, at least.
Fair point, we'd need a ratelimited print if we want to move forward with
this. Additionally we may want to limit this print to debug kernels, but that's
just a thought.
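
Something along these lines (a sketch, untested):

	if (!exit_robust_list(tsk))
		pr_info_ratelimited("futex: exit_robust_list failed\n");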

Thanks for the review :)
-- Nico

2022-04-22 18:51:38

by Nico Pache

Subject: Re: [RFC 2/3] futex: exit: Print a warning when futex_cleanup fails



On 4/21/22 16:53, Thomas Gleixner wrote:
> On Thu, Apr 21 2022 at 15:05, Nico Pache wrote:
>
>> The futex_cleanup routines currently fail silently.
>>
>> Allow the futex_cleanup routines to fail more verbosely so they can
>> leave a message behind for easier debugging and error detection.
>
> This is a free ticket to spam dmesg for any unprivileged user.
As Matthew pointed out, we'd want to rate limit this. I was also thinking we
could limit this to debug kernels so unprivileged users cannot cause dmesg
spamming in production configs.
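
For example (a sketch; CONFIG_FUTEX_CLEANUP_DEBUG is a made-up Kconfig
symbol, not an existing option):

	/* Always run the cleanup; only print on debug builds. */
	if (!exit_robust_list(tsk) && IS_ENABLED(CONFIG_FUTEX_CLEANUP_DEBUG))
		pr_info_ratelimited("futex: exit_robust_list failed\n");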
>
>> Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
>
> There is nothing to fix, really.
Fair enough, it's been part of the design since inception.
>
> Robust futexes are best effort as I explained to you before and spamming
> dmesg won't help anything.
It would have helped find the OOM/robust futex issue more quickly. It may also
help detect breakages in the pthread code (or any other user of robust futexes)
if someone is doing something incorrectly.

Just a thought though. If you think it's pointless to proceed down this path I
won't push forward.

Cheers,
-- Nico

2022-04-22 19:52:16

by Thomas Gleixner

Subject: Re: [RFC 2/3] futex: exit: Print a warning when futex_cleanup fails

On Thu, Apr 21 2022 at 15:05, Nico Pache wrote:

> The futex_cleanup routines currently fail silently.
>
> Allow the futex_cleanup routines to fail more verbosely so they can
> leave a message behind for easier debugging and error detection.

This is a free ticket to spam dmesg for any unprivileged user.

> Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")

There is nothing to fix, really.

Robust futexes are best effort as I explained to you before and spamming
dmesg won't help anything.

Thanks,

tglx

2022-04-22 21:41:36

by Matthew Wilcox

Subject: Re: [RFC 2/3] futex: exit: Print a warning when futex_cleanup fails

On Thu, Apr 21, 2022 at 03:05:32PM -0400, Nico Pache wrote:
> @@ -1007,13 +1013,15 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
> static void futex_cleanup(struct task_struct *tsk)
> {
> if (unlikely(tsk->robust_list)) {
> - exit_robust_list(tsk);
> + if (!exit_robust_list(tsk))
> + pr_info("futex: exit_robust_list failed");

Doesn't this allow a malicious user process to spam the kernel logs
with messages? There needs to be a ratelimit on this, at least.