2020-06-19 16:09:13

by Peter Xu

Subject: [PATCH 00/26] mm: Page fault accounting cleanups

(Forgot to cc mm list on v1; adding in)

This is v2 of the page fault accounting cleanup series. It originates from
Gerald Schaefer's report a week ago about incorrect page fault accounting for
retried page faults after commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for
multiple times"):

https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

This version uses a better approach suggested by Linus: do the accounting
directly in handle_mm_fault(). It also covers some special kinds of accounting,
like gup or IOMMU fault requests on process page tables. The outcome of this
series is to keep all the page fault accounting in handle_mm_fault() (besides
PERF_COUNT_SW_PAGE_FAULTS, which is still done in the per-arch #PF handlers).

Since v2 changed quite a lot from v1, the changelog is omitted, and I also
didn't have a chance to pick up any r-b from the previous version. I really
appreciate everyone who has looked at v1. V1 for reference:

https://lore.kernel.org/lkml/[email protected]/

What this series does:

- Correct page fault accounting: account a page fault (no matter whether it
comes from #PF handling, gup, or anything else) only once, at the attempt that
completes the fault. For example, page fault retries should not be counted in
the page fault counters. The same applies to the perf events.

- Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf event
is used in an ad-hoc way across different archs.

Case (1): for many archs it is done at the entry of the page fault handler, so
that it also covers e.g. erroneous faults.

Case (2): for some other archs, it is only accounted when the page fault is
resolved successfully.

Case (3): quite a few archs still have not enabled this perf event at all.

Since this series touches nearly all the archs anyway, we unify this perf
event to always follow case (1), which is the one that makes the most sense.
And since the accounting is moved into handle_mm_fault(), the other two
MAJ/MIN perf events are naturally taken care of as well (see the sketch after
this list).

- Unify definition of "major faults": the definition of "major fault" is
slightly changed when used in accounting (not VM_FAULT_MAJOR). More
information in patch 1.

- Always account the page fault to the task that triggered it. This does not
matter much for #PF handling, but it does for gup. More information on this
in patch 25.
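
As a concrete illustration, here is a rough sketch (simplified, not copied
from any particular arch; vma/access checks and error handling are elided) of
what a converted per-arch #PF handler looks like after the series:

    void do_page_fault(struct pt_regs *regs, unsigned long address)
    {
            struct mm_struct *mm = current->mm;
            struct vm_area_struct *vma;
            unsigned int flags = FAULT_FLAG_DEFAULT;
            vm_fault_t fault;

            /* case (1): count PERF_COUNT_SW_PAGE_FAULTS once at entry, so
               erroneous and retried faults are covered too */
            perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
    retry:
            down_read(&mm->mmap_sem);
            vma = find_vma(mm, address);
            /* ... vma and access checks ... */

            /* passing regs lets the common code bump maj_flt/min_flt and the
               MAJ/MIN perf events exactly once, when the fault completes */
            fault = handle_mm_fault(vma, address, flags, regs);

            if (fault_signal_pending(fault, regs))
                    return;
            if (fault & VM_FAULT_RETRY) {
                    flags |= FAULT_FLAG_TRIED;
                    goto retry;     /* accounted only once it completes */
            }
            up_read(&mm->mmap_sem);
            /* no per-arch maj_flt/min_flt or MAJ/MIN perf events needed here */
    }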

Patchset layout:

Patch 1: Introduce the accounting in handle_mm_fault(), not yet enabled.
Patches 2-24: Enable the new accounting in the arch #PF handlers one by one.
Patch 25: Enable the new accounting for the remaining outliers (gup, iommu, etc.)
Patch 26: Clean up the GUP task_struct pointer since it's not needed any more

For each patch that fixes a specific arch, I'm CCing the maintainers and the
arch list if there is one. Note that I have only lightly tested this series
on x86.
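
If anyone wants to double check the effect from userspace, the counters this
series touches are the per-task min_flt/maj_flt values, visible e.g. via
getrusage() or /proc/<pid>/stat. A trivial way to watch them (illustrative
only, not part of the series):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
            struct rusage ru;

            /* ... touch some memory here to trigger page faults ... */

            getrusage(RUSAGE_SELF, &ru);
            /* ru_minflt/ru_majflt mirror the task's min_flt/maj_flt, which
               after this series are bumped only once per completed fault */
            printf("minor faults: %ld, major faults: %ld\n",
                   ru.ru_minflt, ru.ru_majflt);
            return 0;
    }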

Please have a look, thanks.

Peter Xu (26):
mm: Do page fault accounting in handle_mm_fault
mm/alpha: Use general page fault accounting
mm/arc: Use general page fault accounting
mm/arm: Use general page fault accounting
mm/arm64: Use general page fault accounting
mm/csky: Use general page fault accounting
mm/hexagon: Use general page fault accounting
mm/ia64: Use general page fault accounting
mm/m68k: Use general page fault accounting
mm/microblaze: Use general page fault accounting
mm/mips: Use general page fault accounting
mm/nds32: Use general page fault accounting
mm/nios2: Use general page fault accounting
mm/openrisc: Use general page fault accounting
mm/parisc: Use general page fault accounting
mm/powerpc: Use general page fault accounting
mm/riscv: Use general page fault accounting
mm/s390: Use general page fault accounting
mm/sh: Use general page fault accounting
mm/sparc32: Use general page fault accounting
mm/sparc64: Use general page fault accounting
mm/unicore32: Use general page fault accounting
mm/x86: Use general page fault accounting
mm/xtensa: Use general page fault accounting
mm: Clean up the last pieces of page fault accountings
mm/gup: Remove task_struct pointer for all gup code

arch/alpha/mm/fault.c | 8 +-
arch/arc/kernel/process.c | 2 +-
arch/arc/mm/fault.c | 18 +---
arch/arm/mm/fault.c | 25 ++---
arch/arm64/mm/fault.c | 29 ++----
arch/csky/mm/fault.c | 13 +--
arch/hexagon/mm/vm_fault.c | 9 +-
arch/ia64/mm/fault.c | 9 +-
arch/m68k/mm/fault.c | 14 +--
arch/microblaze/mm/fault.c | 9 +-
arch/mips/mm/fault.c | 14 +--
arch/nds32/mm/fault.c | 19 +---
arch/nios2/mm/fault.c | 14 +--
arch/openrisc/mm/fault.c | 9 +-
arch/parisc/mm/fault.c | 8 +-
arch/powerpc/mm/copro_fault.c | 7 +-
arch/powerpc/mm/fault.c | 11 +-
arch/riscv/mm/fault.c | 16 +--
arch/s390/kvm/interrupt.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/s390/kvm/priv.c | 8 +-
arch/s390/mm/fault.c | 16 +--
arch/s390/mm/gmap.c | 4 +-
arch/sh/mm/fault.c | 11 +-
arch/sparc/mm/fault_32.c | 13 +--
arch/sparc/mm/fault_64.c | 11 +-
arch/um/kernel/trap.c | 6 +-
arch/unicore32/mm/fault.c | 14 +--
arch/x86/mm/fault.c | 17 +---
arch/xtensa/mm/fault.c | 15 +--
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +-
drivers/infiniband/core/umem_odp.c | 2 +-
drivers/iommu/amd_iommu_v2.c | 2 +-
drivers/iommu/intel-svm.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
fs/exec.c | 2 +-
include/linux/mm.h | 16 +--
kernel/events/uprobes.c | 6 +-
kernel/futex.c | 2 +-
mm/gup.c | 107 +++++++-------------
mm/hmm.c | 3 +-
mm/ksm.c | 3 +-
mm/memory.c | 72 ++++++++++++-
mm/process_vm_access.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 2 +-
virt/kvm/kvm_main.c | 2 +-
47 files changed, 222 insertions(+), 360 deletions(-)

--
2.26.2


2020-06-19 16:09:21

by Peter Xu

Subject: [PATCH 05/26] mm/arm64: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried. To do this, pass the pt_regs pointer into __do_page_fault().

CC: Catalin Marinas <[email protected]>
CC: Will Deacon <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/arm64/mm/fault.c | 29 ++++++-----------------------
1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 5f6607b951b8..09b206521559 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -398,7 +398,8 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
#define VM_FAULT_BADACCESS 0x020000

static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
- unsigned int mm_flags, unsigned long vm_flags)
+ unsigned int mm_flags, unsigned long vm_flags,
+ struct pt_regs *regs)
{
struct vm_area_struct *vma = find_vma(mm, addr);

@@ -422,7 +423,7 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
*/
if (!(vma->vm_flags & vm_flags))
return VM_FAULT_BADACCESS;
- return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags, NULL);
+ return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags, regs);
}

static bool is_el0_instruction_abort(unsigned int esr)
@@ -444,7 +445,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
{
const struct fault_info *inf;
struct mm_struct *mm = current->mm;
- vm_fault_t fault, major = 0;
+ vm_fault_t fault;
unsigned long vm_flags = VM_ACCESS_FLAGS;
unsigned int mm_flags = FAULT_FLAG_DEFAULT;

@@ -510,8 +511,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
#endif
}

- fault = __do_page_fault(mm, addr, mm_flags, vm_flags);
- major |= fault & VM_FAULT_MAJOR;
+ fault = __do_page_fault(mm, addr, mm_flags, vm_flags, regs);

/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -532,25 +532,8 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
* Handle the "normal" (no error) case first.
*/
if (likely(!(fault & (VM_FAULT_ERROR | VM_FAULT_BADMAP |
- VM_FAULT_BADACCESS)))) {
- /*
- * Major/minor page fault accounting is only done
- * once. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at
- * that point.
- */
- if (major) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
- addr);
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
- addr);
- }
-
+ VM_FAULT_BADACCESS))))
return 0;
- }

/*
* If we are in kernel mode at this point, we have no context to
--
2.26.2

2020-06-19 16:09:21

by Peter Xu

Subject: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

This is a preparation patch to move page fault accounting into the general
code in handle_mm_fault(). This includes both the per-task maj_flt/min_flt
counters and the major/minor page fault perf events. To do this, the pt_regs
pointer is passed into handle_mm_fault().

PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault handlers.

So far, all the pt_regs pointers passed into handle_mm_fault() are NULL, which
means this patch should have no intended functional change.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/alpha/mm/fault.c | 2 +-
arch/arc/mm/fault.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/csky/mm/fault.c | 3 +-
arch/hexagon/mm/vm_fault.c | 2 +-
arch/ia64/mm/fault.c | 2 +-
arch/m68k/mm/fault.c | 2 +-
arch/microblaze/mm/fault.c | 2 +-
arch/mips/mm/fault.c | 2 +-
arch/nds32/mm/fault.c | 2 +-
arch/nios2/mm/fault.c | 2 +-
arch/openrisc/mm/fault.c | 2 +-
arch/parisc/mm/fault.c | 2 +-
arch/powerpc/mm/copro_fault.c | 2 +-
arch/powerpc/mm/fault.c | 2 +-
arch/riscv/mm/fault.c | 2 +-
arch/s390/mm/fault.c | 2 +-
arch/sh/mm/fault.c | 2 +-
arch/sparc/mm/fault_32.c | 4 +--
arch/sparc/mm/fault_64.c | 2 +-
arch/um/kernel/trap.c | 2 +-
arch/unicore32/mm/fault.c | 2 +-
arch/x86/mm/fault.c | 2 +-
arch/xtensa/mm/fault.c | 2 +-
drivers/iommu/amd_iommu_v2.c | 2 +-
drivers/iommu/intel-svm.c | 2 +-
include/linux/mm.h | 7 ++--
mm/gup.c | 4 +--
mm/hmm.c | 3 +-
mm/ksm.c | 3 +-
mm/memory.c | 66 ++++++++++++++++++++++++++++++++++-
32 files changed, 105 insertions(+), 35 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index c2d7b6d7bac7..82e72f24486e 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -148,7 +148,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
/* If for any reason at all we couldn't handle the fault,
make sure we exit gracefully rather than endlessly redo
the fault. */
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 92b339c7adba..34380139e7a2 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -131,7 +131,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
goto bad_area;
}

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2dd5c41cbb8d..0d6be0f4f27c 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -223,7 +223,7 @@ __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
goto out;
}

- return handle_mm_fault(vma, addr & PAGE_MASK, flags);
+ return handle_mm_fault(vma, addr & PAGE_MASK, flags, NULL);

check_stack:
/* Don't allow expansion below FIRST_USER_ADDRESS */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c9cedc0432d2..5f6607b951b8 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -422,7 +422,7 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
*/
if (!(vma->vm_flags & vm_flags))
return VM_FAULT_BADACCESS;
- return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags);
+ return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags, NULL);
}

static bool is_el0_instruction_abort(unsigned int esr)
diff --git a/arch/csky/mm/fault.c b/arch/csky/mm/fault.c
index 4e6dc68f3258..b14f97d3cb15 100644
--- a/arch/csky/mm/fault.c
+++ b/arch/csky/mm/fault.c
@@ -150,7 +150,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0);
+ fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0,
+ NULL);
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index 72334b26317a..f04cd0a6d905 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -89,7 +89,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
break;
}

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 30d0c1fca99e..caa93e083c9d 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -139,7 +139,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
* sure we exit gracefully rather than endlessly redo the
* fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 3bfb5c8ac3c7..2db38dfbc00c 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -135,7 +135,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);
pr_debug("handle_mm_fault returns %x\n", fault);

if (fault_signal_pending(fault, regs))
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 3248141f8ed5..9abfa5224386 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -215,7 +215,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index f8d62cd83b36..31c2afb8f8a5 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -152,7 +152,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index f331e533edc2..22527129025c 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -207,7 +207,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
* the fault.
*/

- fault = handle_mm_fault(vma, addr, flags);
+ fault = handle_mm_fault(vma, addr, flags, NULL);

/*
* If we need to retry but a fatal signal is pending, handle the
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index ec9d8a9c426f..88abf297c759 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -131,7 +131,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 8af1cc78c4fb..45aedc572361 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -159,7 +159,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index 86e8c848f3d7..c10908ea8803 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -302,7 +302,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
* fault.
*/

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index beb060b96632..c0478bef1f14 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -64,7 +64,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
}

ret = 0;
- *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
+ *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0, NULL);
if (unlikely(*flt & VM_FAULT_ERROR)) {
if (*flt & VM_FAULT_OOM) {
ret = -ENOMEM;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 84af6c8eecf7..992b10c3761c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -563,7 +563,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

#ifdef CONFIG_PPC_MEM_KEYS
/*
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index be84e32adc4c..677ee1bb11ac 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -110,7 +110,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, addr, flags);
+ fault = handle_mm_fault(vma, addr, flags, NULL);

/*
* If we need to retry but a fatal signal is pending, handle the
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index dedc28be27ab..ab6d7eedcfab 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -479,7 +479,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);
if (fault_signal_pending(fault, regs)) {
fault = VM_FAULT_SIGNAL;
if (flags & FAULT_FLAG_RETRY_NOWAIT)
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 5f23d7907597..a4e670a9c9b3 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -464,7 +464,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
if (mm_fault_error(regs, error_code, address, fault))
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index f6e0e601f857..61524d284706 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -235,7 +235,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
@@ -411,7 +411,7 @@ static void force_user_fault(unsigned long address, int write)
if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
goto bad_area;
}
- switch (handle_mm_fault(vma, address, flags)) {
+ switch (handle_mm_fault(vma, address, flags, NULL)) {
case VM_FAULT_SIGBUS:
case VM_FAULT_OOM:
goto do_sigbus;
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index c0c0dd471b6b..6b702a0a8155 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -423,7 +423,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
goto bad_area;
}

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
goto exit_exception;
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 8f18cf56b3dd..32cc8f59322b 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -75,7 +75,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
do {
vm_fault_t fault;

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
goto out_nosemaphore;
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 3022104aa613..847ff24fcc2a 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -186,7 +186,7 @@ static vm_fault_t __do_pf(struct mm_struct *mm, unsigned long addr,
* If for any reason at all we couldn't handle the fault, make
* sure we exit gracefully rather than endlessly redo the fault.
*/
- fault = handle_mm_fault(vma, addr & PAGE_MASK, flags);
+ fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, NULL);
return fault;

check_stack:
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a51df516b87b..3e27ed85af06 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1461,7 +1461,7 @@ void do_user_addr_fault(struct pt_regs *regs,
* userland). The return to userland is identified whenever
* FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);
major |= fault & VM_FAULT_MAJOR;

/* Quick path to respond to signals */
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index e7172bd53ced..722ef3c98d60 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -108,7 +108,7 @@ void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, NULL);

if (fault_signal_pending(fault, regs))
return;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index d6d85debd01b..66042b816943 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -497,7 +497,7 @@ static void do_fault(struct work_struct *work)
if (access_error(vma, fault))
goto out;

- ret = handle_mm_fault(vma, address, flags);
+ ret = handle_mm_fault(vma, address, flags, NULL);
out:
up_read(&mm->mmap_sem);

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 2998418f0a38..c9cb5e5b6c34 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -629,7 +629,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
goto invalid;

ret = handle_mm_fault(vma, address,
- req->wr_req ? FAULT_FLAG_WRITE : 0);
+ req->wr_req ? FAULT_FLAG_WRITE : 0, NULL);
if (ret & VM_FAULT_ERROR)
goto invalid;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3fe7371855c..46bee4044ac1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@ struct file_ra_state;
struct user_struct;
struct writeback_control;
struct bdi_writeback;
+struct pt_regs;

void init_mm_internals(void);

@@ -1652,7 +1653,8 @@ int invalidate_inode_page(struct page *page);

#ifdef CONFIG_MMU
extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags);
+ unsigned long address, unsigned int flags,
+ struct pt_regs *regs);
extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked);
@@ -1662,7 +1664,8 @@ void unmap_mapping_range(struct address_space *mapping,
loff_t const holebegin, loff_t const holelen, int even_cows);
#else
static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags)
+ unsigned long address, unsigned int flags,
+ struct pt_regs *regs)
{
/* should never happen if there's no MMU */
BUG();
diff --git a/mm/gup.c b/mm/gup.c
index 87a6a59fe667..1a48c639ea49 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -876,7 +876,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
fault_flags |= FAULT_FLAG_TRIED;
}

- ret = handle_mm_fault(vma, address, fault_flags);
+ ret = handle_mm_fault(vma, address, fault_flags, NULL);
if (ret & VM_FAULT_ERROR) {
int err = vm_fault_to_errno(ret, *flags);

@@ -1222,7 +1222,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
fatal_signal_pending(current))
return -EINTR;

- ret = handle_mm_fault(vma, address, fault_flags);
+ ret = handle_mm_fault(vma, address, fault_flags, NULL);
major |= ret & VM_FAULT_MAJOR;
if (ret & VM_FAULT_ERROR) {
int err = vm_fault_to_errno(ret, 0);
diff --git a/mm/hmm.c b/mm/hmm.c
index 280585833adf..5fca59a1f6e9 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -90,7 +90,8 @@ static int hmm_vma_fault(unsigned long addr, unsigned long end,
}

for (; addr < end; addr += PAGE_SIZE)
- if (handle_mm_fault(vma, addr, fault_flags) & VM_FAULT_ERROR)
+ if (handle_mm_fault(vma, addr, fault_flags, NULL) &
+ VM_FAULT_ERROR)
return -EFAULT;
return -EBUSY;
}
diff --git a/mm/ksm.c b/mm/ksm.c
index 281c00129a2e..2e2b02abcc0f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -480,7 +480,8 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
break;
if (PageKsm(page))
ret = handle_mm_fault(vma, addr,
- FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE);
+ FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+ NULL);
else
ret = VM_FAULT_WRITE;
put_page(page);
diff --git a/mm/memory.c b/mm/memory.c
index f703fe8c8346..23c738b3756e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -71,6 +71,8 @@
#include <linux/dax.h>
#include <linux/oom.h>
#include <linux/numa.h>
+#include <linux/perf_event.h>
+#include <linux/ptrace.h>

#include <trace/events/kmem.h>

@@ -4345,6 +4347,36 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
return handle_pte_fault(&vmf);
}

+/**
+ * mm_account_fault - Do page fault accounting
+ * @regs: the pt_regs struct pointer. When set to NULL, will skip accounting
+ * @address: faulted address.
+ * @major: whether this is a major fault.
+ *
+ * This will take care of most of the page fault accountings. It should only
+ * be called when a page fault is completed. For example, VM_FAULT_RETRY means
+ * the fault needs to be retried again later, so it should not contribute to
+ * the accounting.
+ *
+ * The accounting will also include the PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]
+ * perf counter updates. Note: the handling of PERF_COUNT_SW_PAGE_FAULTS
+ * should still be in per-arch page fault handlers at the entry of page fault.
+ */
+static inline void mm_account_fault(struct pt_regs *regs,
+ unsigned long address, bool major)
+{
+ if (!regs)
+ return;
+
+ if (major) {
+ current->maj_flt++;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+ } else {
+ current->min_flt++;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
+ }
+}
+
/*
* By the time we get here, we already hold the mm semaphore
*
@@ -4352,7 +4384,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
* return value. See filemap_fault() and __lock_page_or_retry().
*/
vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
- unsigned int flags)
+ unsigned int flags, struct pt_regs *regs)
{
vm_fault_t ret;

@@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
mem_cgroup_oom_synchronize(false);
}

+ if (ret & VM_FAULT_RETRY)
+ return ret;
+
+ /*
+ * Do accounting in the common code, to avoid unnecessary
+ * architecture differences or duplicated code.
+ *
+ * We arbitrarily make the rules be:
+ *
+ * - faults that never even got here (because the address
+ * wasn't valid). That includes arch_vma_access_permitted()
+ * failing above.
+ *
+ * So this is expressly not a "this many hardware page
+ * faults" counter. Use the hw profiling for that.
+ *
+ * - incomplete faults (ie RETRY) do not count (see above).
+ * They will only count once completed.
+ *
+ * - the fault counts as a "major" fault when the final
+ * successful fault is VM_FAULT_MAJOR, or if it was a
+ * retry (which implies that we couldn't handle it
+ * immediately previously).
+ *
* - if the fault is done for GUP, regs will be NULL and
+ * no accounting will be done (but you _could_ pass in
+ * your own regs and it would be accounted to the thread
+ * doing the fault, not to the target!)
+ */
+ mm_account_fault(regs, address, (ret & VM_FAULT_MAJOR) ||
+ (flags & FAULT_FLAG_TRIED));
+
return ret;
}
EXPORT_SYMBOL_GPL(handle_mm_fault);
--
2.26.2

2020-06-19 16:09:39

by Peter Xu

Subject: [PATCH 08/26] mm/ia64: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

Signed-off-by: Peter Xu <[email protected]>
---
arch/ia64/mm/fault.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index caa93e083c9d..613255e947a8 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -14,6 +14,7 @@
#include <linux/kdebug.h>
#include <linux/prefetch.h>
#include <linux/uaccess.h>
+#include <linux/perf_event.h>

#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -101,6 +102,8 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
flags |= FAULT_FLAG_USER;
if (mask & VM_WRITE)
flags |= FAULT_FLAG_WRITE;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
down_read(&mm->mmap_sem);

@@ -139,7 +142,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
* sure we exit gracefully rather than endlessly redo the
* fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -162,10 +165,6 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-19 16:09:43

by Peter Xu

Subject: [PATCH 09/26] mm/m68k: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Geert Uytterhoeven <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/m68k/mm/fault.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 2db38dfbc00c..983054d209bc 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -12,6 +12,7 @@
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/uaccess.h>
+#include <linux/perf_event.h>

#include <asm/setup.h>
#include <asm/traps.h>
@@ -85,6 +86,8 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,

if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
down_read(&mm->mmap_sem);

@@ -135,7 +138,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);
pr_debug("handle_mm_fault returns %x\n", fault);

if (fault_signal_pending(fault, regs))
@@ -151,16 +154,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
BUG();
}

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-19 16:10:09

by Peter Xu

Subject: [PATCH 06/26] mm/csky: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

CC: Guo Ren <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/csky/mm/fault.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/csky/mm/fault.c b/arch/csky/mm/fault.c
index b14f97d3cb15..a3e0aa3ebb79 100644
--- a/arch/csky/mm/fault.c
+++ b/arch/csky/mm/fault.c
@@ -151,7 +151,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
* the fault.
*/
fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0,
- NULL);
+ regs);
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;
@@ -161,16 +161,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
goto bad_area;
BUG();
}
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
- address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
- address);
- }
-
up_read(&mm->mmap_sem);
return;

--
2.26.2

2020-06-19 16:10:12

by Peter Xu

Subject: [PATCH 04/26] mm/arm: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried. To do this, we need to pass the pt_regs pointer into
__do_page_fault().

Manually fix the PERF_COUNT_SW_PAGE_FAULTS perf event for page fault retries
by moving it to before taking mmap_sem, so it is counted only once per fault.

CC: Russell King <[email protected]>
CC: Will Deacon <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/arm/mm/fault.c | 25 ++++++-------------------
1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 0d6be0f4f27c..8530befee012 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -201,7 +201,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)

static vm_fault_t __kprobes
__do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
- unsigned int flags, struct task_struct *tsk)
+ unsigned int flags, struct task_struct *tsk,
+ struct pt_regs *regs)
{
struct vm_area_struct *vma;
vm_fault_t fault;
@@ -223,7 +224,7 @@ __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
goto out;
}

- return handle_mm_fault(vma, addr & PAGE_MASK, flags, NULL);
+ return handle_mm_fault(vma, addr & PAGE_MASK, flags, regs);

check_stack:
/* Don't allow expansion below FIRST_USER_ADDRESS */
@@ -265,6 +266,8 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
if ((fsr & FSR_WRITE) && !(fsr & FSR_CM))
flags |= FAULT_FLAG_WRITE;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
+
/*
* As per x86, we may deadlock here. However, since the kernel only
* validly references user space from well defined areas of the code,
@@ -289,7 +292,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
#endif
}

- fault = __do_page_fault(mm, addr, fsr, flags, tsk);
+ fault = __do_page_fault(mm, addr, fsr, flags, tsk, regs);

/* If we need to retry but a fatal signal is pending, handle the
* signal first. We do not need to release the mmap_sem because
@@ -301,23 +304,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
return 0;
}

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
-
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
if (!(fault & VM_FAULT_ERROR) && flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
- regs, addr);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
- regs, addr);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;
goto retry;
--
2.26.2

2020-06-19 16:10:35

by Peter Xu

Subject: [PATCH 10/26] mm/microblaze: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Michal Simek <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/microblaze/mm/fault.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index 9abfa5224386..3d58dbd227cd 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -28,6 +28,7 @@
#include <linux/mman.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
+#include <linux/perf_event.h>

#include <asm/page.h>
#include <asm/pgtable.h>
@@ -122,6 +123,8 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
/* When running in the kernel we expect faults to occur only to
* addresses in user space. All other faults represent errors in the
* kernel and should generate an OOPS. Unfortunately, in the case of an
@@ -215,7 +218,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -231,10 +234,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (unlikely(fault & VM_FAULT_MAJOR))
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-19 16:11:30

by Peter Xu

Subject: [PATCH 11/26] mm/mips: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

Manually fix the PERF_COUNT_SW_PAGE_FAULTS perf event for page fault retries
by moving it to before taking mmap_sem, so it is counted only once per fault.

CC: Thomas Bogendoerfer <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/mips/mm/fault.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 31c2afb8f8a5..750a4978a12b 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -96,6 +96,8 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,

if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -152,12 +154,11 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;

- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;
@@ -168,15 +169,6 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
BUG();
}
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
- regs, address);
- tsk->maj_flt++;
- } else {
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
- regs, address);
- tsk->min_flt++;
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-19 16:16:38

by Peter Xu

Subject: [PATCH 15/26] mm/parisc: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: James E.J. Bottomley <[email protected]>
CC: Helge Deller <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/parisc/mm/fault.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index c10908ea8803..65661e22678e 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -18,6 +18,7 @@
#include <linux/extable.h>
#include <linux/uaccess.h>
#include <linux/hugetlb.h>
+#include <linux/perf_event.h>

#include <asm/traps.h>

@@ -281,6 +282,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
acc_type = parisc_acctyp(code, regs->iir);
if (acc_type & VM_WRITE)
flags |= FAULT_FLAG_WRITE;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
down_read(&mm->mmap_sem);
vma = find_vma_prev(mm, address, &prev_vma);
@@ -302,7 +304,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
* fault.
*/

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -323,10 +325,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
BUG();
}
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
/*
* No need to up_read(&mm->mmap_sem) as we would
--
2.26.2

2020-06-19 16:16:50

by Peter Xu

Subject: [PATCH 18/26] mm/s390: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

CC: Heiko Carstens <[email protected]>
CC: Vasily Gorbik <[email protected]>
CC: Christian Borntraeger <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/s390/mm/fault.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index ab6d7eedcfab..4d62ca7d3e09 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -479,7 +479,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);
if (fault_signal_pending(fault, regs)) {
fault = VM_FAULT_SIGNAL;
if (flags & FAULT_FLAG_RETRY_NOWAIT)
@@ -489,21 +489,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
if (unlikely(fault & VM_FAULT_ERROR))
goto out_up;

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
- regs, address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
- regs, address);
- }
if (fault & VM_FAULT_RETRY) {
if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
(flags & FAULT_FLAG_RETRY_NOWAIT)) {
--
2.26.2

2020-06-19 16:17:11

by Peter Xu

Subject: [PATCH 24/26] mm/xtensa: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

Remove the PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN] perf events because they are
now done in handle_mm_fault().

Move the PERF_COUNT_SW_PAGE_FAULTS event up to before taking mmap_sem for the
fault, so that it matches the rest of the archs.

CC: Chris Zankel <[email protected]>
CC: Max Filippov <[email protected]>
CC: [email protected]
Acked-by: Max Filippov <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/xtensa/mm/fault.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 722ef3c98d60..9ef7331e37f8 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -73,6 +73,9 @@ void do_page_fault(struct pt_regs *regs)

if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -108,7 +111,7 @@ void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -123,10 +126,6 @@ void do_page_fault(struct pt_regs *regs)
BUG();
}
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

@@ -140,12 +139,6 @@ void do_page_fault(struct pt_regs *regs)
}

up_read(&mm->mmap_sem);
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
- if (flags & VM_FAULT_MAJOR)
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
- else
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-
return;

/* Something tried to access memory that isn't in our memory map..
--
2.26.2

2020-06-19 16:17:18

by Peter Xu

Subject: [PATCH 20/26] mm/sparc32: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

CC: David S. Miller <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/sparc/mm/fault_32.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index 61524d284706..542bf034962f 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -235,7 +235,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -251,15 +251,6 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
- 1, regs, address);
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
- 1, regs, address);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-19 16:17:56

by Peter Xu

Subject: [PATCH 16/26] mm/powerpc: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().

CC: Michael Ellerman <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/powerpc/mm/fault.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 992b10c3761c..e325d13efaf5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -563,7 +563,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

#ifdef CONFIG_PPC_MEM_KEYS
/*
@@ -604,14 +604,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
/*
* Major/minor page fault accounting.
*/
- if (major) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+ if (major)
cmo_account_page_fault();
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
- }
+
return 0;
}
NOKPROBE_SYMBOL(__do_page_fault);
--
2.26.2

2020-06-19 16:17:58

by Peter Xu

Subject: [PATCH 13/26] mm/nios2: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Ley Foon Tan <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/nios2/mm/fault.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 88abf297c759..823e7d0a9e97 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -24,6 +24,7 @@
#include <linux/mm.h>
#include <linux/extable.h>
#include <linux/uaccess.h>
+#include <linux/perf_event.h>

#include <asm/mmu_context.h>
#include <asm/traps.h>
@@ -83,6 +84,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
if (!down_read_trylock(&mm->mmap_sem)) {
if (!user_mode(regs) && !search_exception_tables(regs->ea))
goto bad_area_nosemaphore;
@@ -131,7 +134,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -146,16 +149,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
BUG();
}

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-19 16:19:10

by Peter Xu

Subject: [PATCH 17/26] mm/riscv: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
This naturally solves the issue of multiple page fault accounting when a page
fault is retried.

CC: Paul Walmsley <[email protected]>
CC: Palmer Dabbelt <[email protected]>
CC: Albert Ou <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/riscv/mm/fault.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 677ee1bb11ac..e796ba02b572 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -110,7 +110,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, addr, flags, NULL);
+ fault = handle_mm_fault(vma, addr, flags, regs);

/*
* If we need to retry but a fatal signal is pending, handle the
@@ -128,21 +128,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
BUG();
}

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
- 1, regs, addr);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
- 1, regs, addr);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-19 16:19:29

by Peter Xu

Subject: [PATCH 26/26] mm/gup: Remove task_struct pointer for all gup code

After the cleanup of page fault accounting, gup does not need to pass the
task_struct around any more. Remove that parameter from the whole gup stack.

Signed-off-by: Peter Xu <[email protected]>
---
arch/arc/kernel/process.c | 2 +-
arch/s390/kvm/interrupt.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/s390/kvm/priv.c | 8 +-
arch/s390/mm/gmap.c | 4 +-
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +-
drivers/infiniband/core/umem_odp.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
fs/exec.c | 2 +-
include/linux/mm.h | 9 +--
kernel/events/uprobes.c | 6 +-
kernel/futex.c | 2 +-
mm/gup.c | 90 +++++++++------------
mm/memory.c | 2 +-
mm/process_vm_access.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 2 +-
virt/kvm/kvm_main.c | 2 +-
18 files changed, 63 insertions(+), 80 deletions(-)

diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 315528f04bc1..2aad79ffc7f8 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -91,7 +91,7 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
goto fail;

down_read(&current->mm->mmap_sem);
- ret = fixup_user_fault(current, current->mm, (unsigned long) uaddr,
+ ret = fixup_user_fault(current->mm, (unsigned long) uaddr,
FAULT_FLAG_WRITE, NULL);
up_read(&current->mm->mmap_sem);

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index bfb481134994..7f4c5895aabd 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2768,7 +2768,7 @@ static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
struct page *page = NULL;

down_read(&kvm->mm->mmap_sem);
- get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
+ get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
&page, NULL, NULL);
up_read(&kvm->mm->mmap_sem);
return page;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d05bb040fd42..12fa299986f8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1892,7 +1892,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)

r = set_guest_storage_key(current->mm, hva, keys[i], 0);
if (r) {
- r = fixup_user_fault(current, current->mm, hva,
+ r = fixup_user_fault(current->mm, hva,
FAULT_FLAG_WRITE, &unlocked);
if (r)
break;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 893893642415..45b7d5df72d7 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -274,7 +274,7 @@ static int handle_iske(struct kvm_vcpu *vcpu)
rc = get_guest_storage_key(current->mm, vmaddr, &key);

if (rc) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
if (!rc) {
up_read(&current->mm->mmap_sem);
@@ -320,7 +320,7 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
down_read(&current->mm->mmap_sem);
rc = reset_guest_reference_bit(current->mm, vmaddr);
if (rc < 0) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
if (!rc) {
up_read(&current->mm->mmap_sem);
@@ -391,7 +391,7 @@ static int handle_sske(struct kvm_vcpu *vcpu)
m3 & SSKE_MC);

if (rc < 0) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
rc = !rc ? -EAGAIN : rc;
}
@@ -1095,7 +1095,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
rc = cond_set_guest_storage_key(current->mm, vmaddr,
key, NULL, nq, mr, mc);
if (rc < 0) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
rc = !rc ? -EAGAIN : rc;
}
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 1a95d8809cc3..0faf4f5f3fd4 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -649,7 +649,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
rc = vmaddr;
goto out_up;
}
- if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags,
+ if (fixup_user_fault(gmap->mm, vmaddr, fault_flags,
&unlocked)) {
rc = -EFAULT;
goto out_up;
@@ -879,7 +879,7 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,

BUG_ON(gmap_is_shadow(gmap));
fault_flags = (prot == PROT_WRITE) ? FAULT_FLAG_WRITE : 0;
- if (fixup_user_fault(current, mm, vmaddr, fault_flags, &unlocked))
+ if (fixup_user_fault(mm, vmaddr, fault_flags, &unlocked))
return -EFAULT;
if (unlocked)
/* lost mmap_sem, caller has to retry __gmap_translate */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 7ffd7afeb7a5..e87fa79c18d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -472,7 +472,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
locked = 1;
}
ret = get_user_pages_remote
- (work->task, mm,
+ (mm,
obj->userptr.ptr + pinned * PAGE_SIZE,
npages - pinned,
flags,
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 3b1e627d9a8d..73b1a01b7339 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -437,7 +437,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
* complex (and doesn't gain us much performance in most use
* cases).
*/
- npages = get_user_pages_remote(owning_process, owning_mm,
+ npages = get_user_pages_remote(owning_mm,
user_virt, gup_num_pages,
flags, local_page_list, NULL, NULL);
up_read(&owning_mm->mmap_sem);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cc1d64765ce7..d77b34d6ee19 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -329,7 +329,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
flags |= FOLL_WRITE;

down_read(&mm->mmap_sem);
- ret = pin_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+ ret = pin_user_pages_remote(mm, vaddr, 1, flags | FOLL_LONGTERM,
page, NULL, NULL);
if (ret == 1) {
*pfn = page_to_pfn(page[0]);
diff --git a/fs/exec.c b/fs/exec.c
index 2c465119affc..f3f87911f3d0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -213,7 +213,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
* We are doing an exec(). 'current' is the process
* doing the exec and bprm->mm is the new process's mm.
*/
- ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
+ ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
&page, NULL, NULL);
if (ret <= 0)
return NULL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46bee4044ac1..5e347ffb049f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1655,7 +1655,7 @@ int invalidate_inode_page(struct page *page);
extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
struct pt_regs *regs);
-extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
+extern int fixup_user_fault(struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked);
void unmap_mapping_pages(struct address_space *mapping,
@@ -1671,8 +1671,7 @@ static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
BUG();
return VM_FAULT_SIGBUS;
}
-static inline int fixup_user_fault(struct task_struct *tsk,
- struct mm_struct *mm, unsigned long address,
+static inline int fixup_user_fault(struct mm_struct *mm, unsigned long address,
unsigned int fault_flags, bool *unlocked)
{
/* should never happen if there's no MMU */
@@ -1698,11 +1697,11 @@ extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
extern int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
unsigned long addr, void *buf, int len, unsigned int gup_flags);

-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
-long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ece7e13f6e4a..b7c9ad7e7d54 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -382,7 +382,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
if (!vaddr || !d)
return -EINVAL;

- ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+ ret = get_user_pages_remote(mm, vaddr, 1,
FOLL_WRITE, &page, &vma, NULL);
if (unlikely(ret <= 0)) {
/*
@@ -483,7 +483,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (is_register)
gup_flags |= FOLL_SPLIT_PMD;
/* Read the page with vaddr into memory */
- ret = get_user_pages_remote(NULL, mm, vaddr, 1, gup_flags,
+ ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
&old_page, &vma, NULL);
if (ret <= 0)
return ret;
@@ -2027,7 +2027,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
* but we treat this as a 'remote' access since it is
* essentially a kernel access to the memory.
*/
- result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page,
+ result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page,
NULL, NULL);
if (result < 0)
return result;
diff --git a/kernel/futex.c b/kernel/futex.c
index b59532862bc0..1466b4322491 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -696,7 +696,7 @@ static int fault_in_user_writeable(u32 __user *uaddr)
int ret;

down_read(&mm->mmap_sem);
- ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
+ ret = fixup_user_fault(mm, (unsigned long)uaddr,
FAULT_FLAG_WRITE, NULL);
up_read(&mm->mmap_sem);

diff --git a/mm/gup.c b/mm/gup.c
index 17b4d0c45a6b..b8eb02673c10 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -851,7 +851,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
* does not include FOLL_NOWAIT, the mmap_sem may be released. If it
* is, *@locked will be set to 0 and -EBUSY returned.
*/
-static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
+static int faultin_page(struct vm_area_struct *vma,
unsigned long address, unsigned int *flags, int *locked)
{
unsigned int fault_flags = 0;
@@ -954,7 +954,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)

/**
* __get_user_pages() - pin user pages in memory
- * @tsk: task_struct of target task
* @mm: mm_struct of target mm
* @start: starting user address
* @nr_pages: number of pages from start to pin
@@ -1012,7 +1011,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* instead of __get_user_pages. __get_user_pages should be used only if
* you need some special @gup_flags.
*/
-static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+static long __get_user_pages(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1088,8 +1087,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,

page = follow_page_mask(vma, start, foll_flags, &ctx);
if (!page) {
- ret = faultin_page(tsk, vma, start, &foll_flags,
- locked);
+ ret = faultin_page(vma, start, &foll_flags, locked);
switch (ret) {
case 0:
goto retry;
@@ -1163,8 +1161,6 @@ static bool vma_permits_fault(struct vm_area_struct *vma,

/*
* fixup_user_fault() - manually resolve a user page fault
- * @tsk: the task_struct to use for page fault accounting, or
- * NULL if faults are not to be recorded.
* @mm: mm_struct of target mm
* @address: user address
* @fault_flags:flags to pass down to handle_mm_fault()
@@ -1191,7 +1187,7 @@ static bool vma_permits_fault(struct vm_area_struct *vma,
* This function will not return with an unlocked mmap_sem. So it has not the
* same semantics wrt the @mm->mmap_sem as does filemap_fault().
*/
-int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
+int fixup_user_fault(struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked)
{
@@ -1236,8 +1232,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
}
EXPORT_SYMBOL_GPL(fixup_user_fault);

-static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
- struct mm_struct *mm,
+static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1270,7 +1265,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
pages_done = 0;
lock_dropped = false;
for (;;) {
- ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
+ ret = __get_user_pages(mm, start, nr_pages, flags, pages,
vmas, locked);
if (!locked)
/* VM_FAULT_RETRY couldn't trigger, bypass */
@@ -1330,7 +1325,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
}

*locked = 1;
- ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
+ ret = __get_user_pages(mm, start, 1, flags | FOLL_TRIED,
pages, NULL, locked);
if (!*locked) {
/* Continue to retry until we succeeded */
@@ -1416,7 +1411,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
* We made sure addr is within a VMA, so the following will
* not result in a stack expansion that recurses back here.
*/
- return __get_user_pages(current, mm, start, nr_pages, gup_flags,
+ return __get_user_pages(mm, start, nr_pages, gup_flags,
NULL, NULL, locked);
}

@@ -1500,7 +1495,7 @@ struct page *get_dump_page(unsigned long addr)
struct vm_area_struct *vma;
struct page *page;

- if (__get_user_pages(current, current->mm, addr, 1,
+ if (__get_user_pages(current->mm, addr, 1,
FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma,
NULL) < 1)
return NULL;
@@ -1509,8 +1504,7 @@ struct page *get_dump_page(unsigned long addr)
}
#endif /* CONFIG_ELF_CORE */
#else /* CONFIG_MMU */
-static long __get_user_pages_locked(struct task_struct *tsk,
- struct mm_struct *mm, unsigned long start,
+static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
unsigned long nr_pages, struct page **pages,
struct vm_area_struct **vmas, int *locked,
unsigned int foll_flags)
@@ -1626,8 +1620,7 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
return __alloc_pages_node(nid, gfp_mask, 0);
}

-static long check_and_migrate_cma_pages(struct task_struct *tsk,
- struct mm_struct *mm,
+static long check_and_migrate_cma_pages(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1701,7 +1694,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
* again migrating any new CMA pages which we failed to isolate
* earlier.
*/
- ret = __get_user_pages_locked(tsk, mm, start, nr_pages,
+ ret = __get_user_pages_locked(mm, start, nr_pages,
pages, vmas, NULL,
gup_flags);

@@ -1715,8 +1708,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
return ret;
}
#else
-static long check_and_migrate_cma_pages(struct task_struct *tsk,
- struct mm_struct *mm,
+static long check_and_migrate_cma_pages(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1731,8 +1723,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
* __gup_longterm_locked() is a wrapper for __get_user_pages_locked which
* allows us to process the FOLL_LONGTERM flag.
*/
-static long __gup_longterm_locked(struct task_struct *tsk,
- struct mm_struct *mm,
+static long __gup_longterm_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1757,7 +1748,7 @@ static long __gup_longterm_locked(struct task_struct *tsk,
flags = memalloc_nocma_save();
}

- rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages,
+ rc = __get_user_pages_locked(mm, start, nr_pages, pages,
vmas_tmp, NULL, gup_flags);

if (gup_flags & FOLL_LONGTERM) {
@@ -1772,7 +1763,7 @@ static long __gup_longterm_locked(struct task_struct *tsk,
goto out;
}

- rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
+ rc = check_and_migrate_cma_pages(mm, start, rc, pages,
vmas_tmp, gup_flags);
}

@@ -1782,22 +1773,20 @@ static long __gup_longterm_locked(struct task_struct *tsk,
return rc;
}
#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
-static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
- struct mm_struct *mm,
+static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
struct vm_area_struct **vmas,
unsigned int flags)
{
- return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
NULL, flags);
}
#endif /* CONFIG_FS_DAX || CONFIG_CMA */

#ifdef CONFIG_MMU
-static long __get_user_pages_remote(struct task_struct *tsk,
- struct mm_struct *mm,
+static long __get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1816,20 +1805,18 @@ static long __get_user_pages_remote(struct task_struct *tsk,
* This will check the vmas (even if our vmas arg is NULL)
* and return -ENOTSUPP if DAX isn't allowed in this case:
*/
- return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
+ return __gup_longterm_locked(mm, start, nr_pages, pages,
vmas, gup_flags | FOLL_TOUCH |
FOLL_REMOTE);
}

- return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
locked,
gup_flags | FOLL_TOUCH | FOLL_REMOTE);
}

/*
* get_user_pages_remote() - pin user pages in memory
- * @tsk: the task_struct to use for page fault accounting, or
- * NULL if faults are not to be recorded.
* @mm: mm_struct of target mm
* @start: starting user address
* @nr_pages: number of pages from start to pin
@@ -1888,7 +1875,7 @@ static long __get_user_pages_remote(struct task_struct *tsk,
* should use get_user_pages because it cannot pass
* FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
*/
-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1900,13 +1887,13 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
return -EINVAL;

- return __get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags,
+ return __get_user_pages_remote(mm, start, nr_pages, gup_flags,
pages, vmas, locked);
}
EXPORT_SYMBOL(get_user_pages_remote);

#else /* CONFIG_MMU */
-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1914,8 +1901,7 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
return 0;
}

-static long __get_user_pages_remote(struct task_struct *tsk,
- struct mm_struct *mm,
+static long __get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1942,7 +1928,7 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
return -EINVAL;

- return __gup_longterm_locked(current, current->mm, start, nr_pages,
+ return __gup_longterm_locked(current->mm, start, nr_pages,
pages, vmas, gup_flags | FOLL_TOUCH);
}
EXPORT_SYMBOL(get_user_pages);
@@ -1956,7 +1942,7 @@ EXPORT_SYMBOL(get_user_pages);
*
* down_read(&mm->mmap_sem);
* do_something()
- * get_user_pages(tsk, mm, ..., pages, NULL);
+ * get_user_pages(mm, ..., pages, NULL);
* up_read(&mm->mmap_sem);
*
* to:
@@ -1964,7 +1950,7 @@ EXPORT_SYMBOL(get_user_pages);
* int locked = 1;
* down_read(&mm->mmap_sem);
* do_something()
- * get_user_pages_locked(tsk, mm, ..., pages, &locked);
+ * get_user_pages_locked(mm, ..., pages, &locked);
* if (locked)
* up_read(&mm->mmap_sem);
*/
@@ -1981,7 +1967,7 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
return -EINVAL;

- return __get_user_pages_locked(current, current->mm, start, nr_pages,
+ return __get_user_pages_locked(current->mm, start, nr_pages,
pages, NULL, locked,
gup_flags | FOLL_TOUCH);
}
@@ -1991,12 +1977,12 @@ EXPORT_SYMBOL(get_user_pages_locked);
* get_user_pages_unlocked() is suitable to replace the form:
*
* down_read(&mm->mmap_sem);
- * get_user_pages(tsk, mm, ..., pages, NULL);
+ * get_user_pages(mm, ..., pages, NULL);
* up_read(&mm->mmap_sem);
*
* with:
*
- * get_user_pages_unlocked(tsk, mm, ..., pages);
+ * get_user_pages_unlocked(mm, ..., pages);
*
* It is functionally equivalent to get_user_pages_fast so
* get_user_pages_fast should be used instead if specific gup_flags
@@ -2019,7 +2005,7 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
return -EINVAL;

down_read(&mm->mmap_sem);
- ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL,
+ ret = __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
&locked, gup_flags | FOLL_TOUCH);
if (locked)
up_read(&mm->mmap_sem);
@@ -2720,7 +2706,7 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
*/
if (gup_flags & FOLL_LONGTERM) {
down_read(&current->mm->mmap_sem);
- ret = __gup_longterm_locked(current, current->mm,
+ ret = __gup_longterm_locked(current->mm,
start, nr_pages,
pages, NULL, gup_flags);
up_read(&current->mm->mmap_sem);
@@ -2850,10 +2836,8 @@ int pin_user_pages_fast(unsigned long start, int nr_pages,
EXPORT_SYMBOL_GPL(pin_user_pages_fast);

/**
- * pin_user_pages_remote() - pin pages of a remote process (task != current)
+ * pin_user_pages_remote() - pin pages of a remote process
*
- * @tsk: the task_struct to use for page fault accounting, or
- * NULL if faults are not to be recorded.
* @mm: mm_struct of target mm
* @start: starting user address
* @nr_pages: number of pages from start to pin
@@ -2877,7 +2861,7 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
* This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It
* is NOT intended for Case 2 (RDMA: long-term pins).
*/
-long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -2887,7 +2871,7 @@ long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
return -EINVAL;

gup_flags |= FOLL_PIN;
- return __get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags,
+ return __get_user_pages_remote(mm, start, nr_pages, gup_flags,
pages, vmas, locked);
}
EXPORT_SYMBOL(pin_user_pages_remote);
@@ -2922,7 +2906,7 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
return -EINVAL;

gup_flags |= FOLL_PIN;
- return __gup_longterm_locked(current, current->mm, start, nr_pages,
+ return __gup_longterm_locked(current->mm, start, nr_pages,
pages, vmas, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);
diff --git a/mm/memory.c b/mm/memory.c
index 59a2989231fa..5af912cabe9a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4742,7 +4742,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
void *maddr;
struct page *page = NULL;

- ret = get_user_pages_remote(tsk, mm, addr, 1,
+ ret = get_user_pages_remote(mm, addr, 1,
gup_flags, &page, &vma, NULL);
if (ret <= 0) {
#ifndef CONFIG_HAVE_IOREMAP_PROT
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 74e957e302fe..5523464d0ab5 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -105,7 +105,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
* current/current->mm
*/
down_read(&mm->mmap_sem);
- pinned_pages = pin_user_pages_remote(task, mm, pa, pinned_pages,
+ pinned_pages = pin_user_pages_remote(mm, pa, pinned_pages,
flags, process_pages,
NULL, &locked);
if (locked)
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 7869d6a9980b..afe5e68ede77 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -914,7 +914,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
* (represented by bprm). 'current' is the process doing
* the execve().
*/
- if (get_user_pages_remote(current, bprm->mm, pos, 1,
+ if (get_user_pages_remote(bprm->mm, pos, 1,
FOLL_FORCE, &page, NULL, NULL) <= 0)
return false;
#else
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 15e5b037f92d..73098e18baaf 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -60,7 +60,7 @@ static void async_pf_execute(struct work_struct *work)
* access remotely.
*/
down_read(&mm->mmap_sem);
- get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+ get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, NULL,
&locked);
if (locked)
up_read(&mm->mmap_sem);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 731c1e517716..3e1b2ec4ec96 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1829,7 +1829,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* not call the fault handler, so do it here.
*/
bool unlocked = false;
- r = fixup_user_fault(current, current->mm, addr,
+ r = fixup_user_fault(current->mm, addr,
(write_fault ? FAULT_FLAG_WRITE : 0),
&unlocked);
if (unlocked)
--
2.26.2

2020-06-20 03:32:37

by Peter Xu

[permalink] [raw]
Subject: [PATCH 07/26] mm/hexagon: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when a page
fault retry happens.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().
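
For context (not part of the patch): the counter being unified across
architectures in this series is the same one userspace reads through
perf_event_open() with PERF_TYPE_SOFTWARE / PERF_COUNT_SW_PAGE_FAULTS. A
minimal, self-contained sketch of such a consumer, measuring its own faults,
might look like the following (the buffer size and error handling are
illustrative only):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr = { 0 };
	unsigned long long count;
	char *buf;
	size_t i;
	int fd;

	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_PAGE_FAULTS;
	attr.disabled = 1;

	fd = perf_event_open(&attr, 0 /* self */, -1 /* any cpu */, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* Touch 1MB of fresh anonymous memory to trigger minor faults. */
	buf = malloc(1 << 20);
	if (buf)
		for (i = 0; i < (1 << 20); i += 4096)
			buf[i] = 1;

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("page faults: %llu\n", count);

	free(buf);
	close(fd);
	return 0;
}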

CC: Brian Cain <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/hexagon/mm/vm_fault.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index f04cd0a6d905..1b1802f30862 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -19,6 +19,7 @@
#include <linux/signal.h>
#include <linux/extable.h>
#include <linux/hardirq.h>
+#include <linux/perf_event.h>

/*
* Decode of hardware exception sends us to one of several
@@ -54,6 +55,8 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)

if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -89,7 +92,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
break;
}

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -97,10 +100,6 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
/* The most common case -- we are done. */
if (likely(!(fault & VM_FAULT_ERROR))) {
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;
goto retry;
--
2.26.2

2020-06-20 03:33:29

by Peter Xu

[permalink] [raw]
Subject: [PATCH 21/26] mm/sparc64: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when a page
fault retry happens.

CC: David S. Miller <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/sparc/mm/fault_64.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 6b702a0a8155..fe8854d447ed 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -423,7 +423,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
goto bad_area;
}

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
goto exit_exception;
@@ -439,15 +439,6 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
- 1, regs, address);
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
- 1, regs, address);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-20 03:33:30

by Peter Xu

[permalink] [raw]
Subject: [PATCH 19/26] mm/sh: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when a page
fault retry happens.

CC: Yoshinori Sato <[email protected]>
CC: Rich Felker <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/sh/mm/fault.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index a4e670a9c9b3..ba6f7ed570e5 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -464,22 +464,13 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
if (mm_fault_error(regs, error_code, address, fault))
return;

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
- regs, address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
- regs, address);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-20 03:48:31

by Peter Xu

[permalink] [raw]
Subject: [PATCH 02/26] mm/alpha: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Richard Henderson <[email protected]>
CC: Ivan Kokshaysky <[email protected]>
CC: Matt Turner <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/alpha/mm/fault.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 82e72f24486e..2e325af081bc 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -25,6 +25,7 @@
#include <linux/interrupt.h>
#include <linux/extable.h>
#include <linux/uaccess.h>
+#include <linux/perf_event.h>

extern void die_if_kernel(char *,struct pt_regs *,long, unsigned long *);

@@ -116,6 +117,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
#endif
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -148,7 +150,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
/* If for any reason at all we couldn't handle the fault,
make sure we exit gracefully rather than endlessly redo
the fault. */
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -164,10 +166,6 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-20 03:48:31

by Peter Xu

[permalink] [raw]
Subject: [PATCH 12/26] mm/nds32: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when a page
fault retry happens.

Manually fix the PERF_COUNT_SW_PAGE_FAULTS perf event for page fault retries by
moving it to before taking mmap_sem, so that it is only counted once per fault.

CC: Nick Hu <[email protected]>
CC: Greentime Hu <[email protected]>
CC: Vincent Chen <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/nds32/mm/fault.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/nds32/mm/fault.c b/arch/nds32/mm/fault.c
index 22527129025c..e7344440623c 100644
--- a/arch/nds32/mm/fault.c
+++ b/arch/nds32/mm/fault.c
@@ -122,6 +122,8 @@ void do_page_fault(unsigned long entry, unsigned long addr,
if (unlikely(faulthandler_disabled() || !mm))
goto no_context;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
+
/*
* As per x86, we may deadlock here. However, since the kernel only
* validly references user space from well defined areas of the code,
@@ -207,7 +209,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
* the fault.
*/

- fault = handle_mm_fault(vma, addr, flags, NULL);
+ fault = handle_mm_fault(vma, addr, flags, regs);

/*
* If we need to retry but a fatal signal is pending, handle the
@@ -229,22 +231,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
goto bad_area;
}

- /*
- * Major/minor page fault accounting is only done on the initial
- * attempt. If we go through a retry, it is extremely likely that the
- * page will be found in page cache at that point.
- */
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
- 1, regs, addr);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
- 1, regs, addr);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-20 03:50:09

by Peter Xu

[permalink] [raw]
Subject: [PATCH 23/26] mm/x86: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
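
As a side note, the local "major |= fault & VM_FAULT_MAJOR" tracking that this
patch removes is replaced by the generic rule from patch 1: a fault counts as
major when the final successful attempt returns VM_FAULT_MAJOR or when it
needed a retry. Roughly, as in the patch 25 hunk:

/* inside handle_mm_fault(), once the fault has completed: */
mm_account_fault(regs, address,
		 (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED));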

CC: Dave Hansen <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: [email protected]
CC: H. Peter Anvin <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/x86/mm/fault.c | 17 ++---------------
1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 3e27ed85af06..4604755a303d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1309,7 +1309,7 @@ void do_user_addr_fault(struct pt_regs *regs,
struct vm_area_struct *vma;
struct task_struct *tsk;
struct mm_struct *mm;
- vm_fault_t fault, major = 0;
+ vm_fault_t fault;
unsigned int flags = FAULT_FLAG_DEFAULT;

tsk = current;
@@ -1461,8 +1461,7 @@ void do_user_addr_fault(struct pt_regs *regs,
* userland). The return to userland is identified whenever
* FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
- major |= fault & VM_FAULT_MAJOR;
+ fault = handle_mm_fault(vma, address, flags, regs);

/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -1489,18 +1488,6 @@ void do_user_addr_fault(struct pt_regs *regs,
return;
}

- /*
- * Major/minor page fault accounting. If any of the events
- * returned VM_FAULT_MAJOR, we account it as a major fault.
- */
- if (major) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
- }
-
check_v8086_mode(regs, address, tsk);
}
NOKPROBE_SYMBOL(do_user_addr_fault);
--
2.26.2

2020-06-20 03:50:31

by Peter Xu

[permalink] [raw]
Subject: [PATCH 25/26] mm: Clean up the last pieces of page fault accountings

These are the last pieces of page fault accounting that were still done outside
handle_mm_fault(), in places where we still pass regs==NULL when calling
handle_mm_fault():

arch/powerpc/mm/copro_fault.c: copro_handle_mm_fault
arch/sparc/mm/fault_32.c: force_user_fault
arch/um/kernel/trap.c: handle_page_fault
mm/gup.c: faultin_page
fixup_user_fault
mm/hmm.c: hmm_vma_fault
mm/ksm.c: break_ksm

Some of them have the issue of duplicated accounting for page fault retries.
Some of them did not do the accounting at all.

This patch cleans all of these up by letting handle_mm_fault() do the per-task
page fault accounting even if regs==NULL (though we still skip the perf event
accounting). With that, we can safely remove all the outliers.

There's another functional change: we now account the page faults to the caller
of gup, rather than to the task_struct that was passed into the gup code. More
information on this can be found at [1].

After this patch, the items below should never be touched again outside
handle_mm_fault():

- task_struct.[maj|min]_flt
- PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]

[1] https://lore.kernel.org/lkml/CAHk-=wj_V2Tps2QrMn20_W0OJF9xqNh52XSGA42s-ZJ8Y+GyKw@mail.gmail.com/
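
For illustration (not part of the patch): the per-task counters that
mm_account_fault() now always bumps on 'current' are the ones userspace sees as
ru_minflt/ru_majflt from getrusage(), so faults taken on behalf of a remote mm
via gup now show up in the caller's own numbers. A minimal check:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) == 0)
		printf("minor faults: %ld, major faults: %ld\n",
		       ru.ru_minflt, ru.ru_majflt);
	return 0;
}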

Signed-off-by: Peter Xu <[email protected]>
---
arch/powerpc/mm/copro_fault.c | 5 -----
arch/um/kernel/trap.c | 4 ----
mm/gup.c | 13 -------------
mm/memory.c | 20 ++++++++++++--------
4 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index c0478bef1f14..2e59be1a9359 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -76,11 +76,6 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
BUG();
}

- if (*flt & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
-
out_unlock:
up_read(&mm->mmap_sem);
return ret;
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 32cc8f59322b..c881831de357 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -92,10 +92,6 @@ int handle_page_fault(unsigned long address, unsigned long ip,
BUG();
}
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

diff --git a/mm/gup.c b/mm/gup.c
index 1a48c639ea49..17b4d0c45a6b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -885,13 +885,6 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
BUG();
}

- if (tsk) {
- if (ret & VM_FAULT_MAJOR)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
- }
-
if (ret & VM_FAULT_RETRY) {
if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
*locked = 0;
@@ -1239,12 +1232,6 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
goto retry;
}

- if (tsk) {
- if (major)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
- }
return 0;
}
EXPORT_SYMBOL_GPL(fixup_user_fault);
diff --git a/mm/memory.c b/mm/memory.c
index 23c738b3756e..59a2989231fa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4350,6 +4350,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
/**
* mm_account_fault - Do page fault accountings
* @regs: the pt_regs struct pointer. When set to NULL, will skip accounting
+ * of perf event counters, but we'll still do the per-task accounting to
+ * the task who triggered this page fault.
* @address: faulted address.
* @major: whether this is a major fault.
*
@@ -4365,16 +4367,18 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
static inline void mm_account_fault(struct pt_regs *regs,
unsigned long address, bool major)
{
+ if (major)
+ current->maj_flt++;
+ else
+ current->min_flt++;
+
if (!regs)
return;

- if (major) {
- current->maj_flt++;
+ if (major)
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
- } else {
- current->min_flt++;
+ else
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
- }
}

/*
@@ -4450,9 +4454,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
* immediately previously).
*
* - if the fault is done for GUP, regs wil be NULL and
- * no accounting will be done (but you _could_ pass in
- * your own regs and it would be accounted to the thread
- * doing the fault, not to the target!)
+ * we only do the accounting for the per thread fault
+ * counters who triggered the fault, and we skip the
+ * perf event updates.
*/
mm_account_fault(regs, address, (ret & VM_FAULT_MAJOR) ||
(flags & FAULT_FLAG_TRIED));
--
2.26.2

2020-06-20 03:51:25

by Peter Xu

[permalink] [raw]
Subject: [PATCH 14/26] mm/openrisc: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when a page
fault retry happens.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Jonas Bonn <[email protected]>
CC: Stefan Kristiansson <[email protected]>
CC: Stafford Horne <[email protected]>
CC: [email protected]
Acked-by: Stafford Horne <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/openrisc/mm/fault.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 45aedc572361..5255d73ce180 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -15,6 +15,7 @@
#include <linux/interrupt.h>
#include <linux/extable.h>
#include <linux/sched/signal.h>
+#include <linux/perf_event.h>

#include <linux/uaccess.h>
#include <asm/siginfo.h>
@@ -103,6 +104,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
if (in_interrupt() || !mm)
goto no_context;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -159,7 +162,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -176,10 +179,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,

if (flags & FAULT_FLAG_ALLOW_RETRY) {
/*RGD modeled on Cris */
- if (fault & VM_FAULT_MAJOR)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-20 03:51:40

by Peter Xu

[permalink] [raw]
Subject: [PATCH 22/26] mm/unicore32: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when a page
fault retry happens.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Guan Xuetao <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/unicore32/mm/fault.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 847ff24fcc2a..b272a389d977 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -16,6 +16,7 @@
#include <linux/page-flags.h>
#include <linux/sched/signal.h>
#include <linux/io.h>
+#include <linux/perf_event.h>

#include <asm/pgtable.h>
#include <asm/tlbflush.h>
@@ -160,7 +161,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
}

static vm_fault_t __do_pf(struct mm_struct *mm, unsigned long addr,
- unsigned int fsr, unsigned int flags, struct task_struct *tsk)
+ unsigned int fsr, unsigned int flags,
+ struct task_struct *tsk, struct pt_regs *regs)
{
struct vm_area_struct *vma;
vm_fault_t fault;
@@ -186,7 +188,7 @@ static vm_fault_t __do_pf(struct mm_struct *mm, unsigned long addr,
* If for any reason at all we couldn't handle the fault, make
* sure we exit gracefully rather than endlessly redo the fault.
*/
- fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, NULL);
+ fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, regs);
return fault;

check_stack:
@@ -219,6 +221,8 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
if (!(fsr ^ 0x12))
flags |= FAULT_FLAG_WRITE;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
+
/*
* As per x86, we may deadlock here. However, since the kernel only
* validly references user space from well defined areas of the code,
@@ -244,7 +248,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
#endif
}

- fault = __do_pf(mm, addr, fsr, flags, tsk);
+ fault = __do_pf(mm, addr, fsr, flags, tsk, regs);

/* If we need to retry but a fatal signal is pending, handle the
* signal first. We do not need to release the mmap_sem because
@@ -254,10 +258,6 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
return 0;

if (!(fault & VM_FAULT_ERROR) && (flags & FAULT_FLAG_ALLOW_RETRY)) {
- if (fault & VM_FAULT_MAJOR)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;
goto retry;
--
2.26.2

2020-06-20 05:03:19

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH 06/26] mm/csky: Use general page fault accounting

On Sat, Jun 20, 2020 at 12:05 AM Peter Xu <[email protected]> wrote:
>
> Use the general page fault accounting by passing regs into handle_mm_fault().
> It naturally solve the issue of multiple page fault accounting when page fault
> retry happened.
>
> CC: Guo Ren <[email protected]>
> CC: [email protected]
> Signed-off-by: Peter Xu <[email protected]>
> ---
> arch/csky/mm/fault.c | 12 +-----------
> 1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/arch/csky/mm/fault.c b/arch/csky/mm/fault.c
> index b14f97d3cb15..a3e0aa3ebb79 100644
> --- a/arch/csky/mm/fault.c
> +++ b/arch/csky/mm/fault.c
> @@ -151,7 +151,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
> * the fault.
> */
> fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0,
> - NULL);
> + regs);
What's your kernel version? (Does the 4th arg exist?)
/*
* If for any reason at all we couldn't handle the fault,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0);
if (unlikely(fault & VM_FAULT_ERROR)) {



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2020-06-20 16:10:55

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH 06/26] mm/csky: Use general page fault accounting

On Sat, Jun 20, 2020 at 09:44:31AM +0800, Guo Ren wrote:
> On Sat, Jun 20, 2020 at 12:05 AM Peter Xu <[email protected]> wrote:
> >
> > Use the general page fault accounting by passing regs into handle_mm_fault().
> > It naturally solve the issue of multiple page fault accounting when page fault
> > retry happened.
> >
> > CC: Guo Ren <[email protected]>
> > CC: [email protected]
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > arch/csky/mm/fault.c | 12 +-----------
> > 1 file changed, 1 insertion(+), 11 deletions(-)
> >
> > diff --git a/arch/csky/mm/fault.c b/arch/csky/mm/fault.c
> > index b14f97d3cb15..a3e0aa3ebb79 100644
> > --- a/arch/csky/mm/fault.c
> > +++ b/arch/csky/mm/fault.c
> > @@ -151,7 +151,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
> > * the fault.
> > */
> > fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0,
> > - NULL);
> > + regs);
> what's your kernel version ? (4th arg exsist ?)
> /*
> * If for any reason at all we couldn't handle the fault,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0);
> if (unlikely(fault & VM_FAULT_ERROR)) {

Hi, Guo,

Sorry to be unclear. This patch is based on patch 1 in the same series:

https://lore.kernel.org/lkml/[email protected]/

Thanks,

--
Peter Xu

2020-06-24 18:50:13

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

On Fri, 19 Jun 2020 12:05:13 -0400
Peter Xu <[email protected]> wrote:

[...]

> @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> mem_cgroup_oom_synchronize(false);
> }
>
> + if (ret & VM_FAULT_RETRY)
> + return ret;

I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
In arch code (s390 and all others I briefly checked), the accounting
was skipped in the VM_FAULT_ERROR case.

> +
> + /*
> + * Do accounting in the common code, to avoid unnecessary
> + * architecture differences or duplicated code.
> + *
> + * We arbitrarily make the rules be:
> + *
> + * - faults that never even got here (because the address
> + * wasn't valid). That includes arch_vma_access_permitted()

Missing "do not count" at the end of the first sentence?

> + * failing above.
> + *
> + * So this is expressly not a "this many hardware page
> + * faults" counter. Use the hw profiling for that.
> + *
> + * - incomplete faults (ie RETRY) do not count (see above).
> + * They will only count once completed.
> + *
> + * - the fault counts as a "major" fault when the final
> + * successful fault is VM_FAULT_MAJOR, or if it was a
> + * retry (which implies that we couldn't handle it
> + * immediately previously).
> + *
> + * - if the fault is done for GUP, regs wil be NULL and

wil -> will

Regards,
Gerald

2020-06-24 18:52:11

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [PATCH 18/26] mm/s390: Use general page fault accounting

On Fri, 19 Jun 2020 12:13:35 -0400
Peter Xu <[email protected]> wrote:

> Use the general page fault accounting by passing regs into handle_mm_fault().
> It naturally solve the issue of multiple page fault accounting when page fault
> retry happened.
>
> CC: Heiko Carstens <[email protected]>
> CC: Vasily Gorbik <[email protected]>
> CC: Christian Borntraeger <[email protected]>
> CC: [email protected]
> Signed-off-by: Peter Xu <[email protected]>
> ---
> arch/s390/mm/fault.c | 16 +---------------
> 1 file changed, 1 insertion(+), 15 deletions(-)
>
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index ab6d7eedcfab..4d62ca7d3e09 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -479,7 +479,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags, NULL);
> + fault = handle_mm_fault(vma, address, flags, regs);
> if (fault_signal_pending(fault, regs)) {
> fault = VM_FAULT_SIGNAL;
> if (flags & FAULT_FLAG_RETRY_NOWAIT)
> @@ -489,21 +489,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> if (unlikely(fault & VM_FAULT_ERROR))
> goto out_up;

There are two cases here where we skipped the accounting,
fault_signal_pending() and VM_FAULT_ERROR, similar to other archs.

fault_signal_pending() should be ok, because that only seems to be true
for fault & VM_FAULT_RETRY, in which case the new approach also skips
the accounting.

But for VM_FAULT_ERROR, the new approach would do accounting, IIUC. Is
that changed on purpose? See also my reply on [PATCH 01/26].

Regards,
Gerald

2020-06-24 20:37:34

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

On Wed, Jun 24, 2020 at 08:49:03PM +0200, Gerald Schaefer wrote:
> On Fri, 19 Jun 2020 12:05:13 -0400
> Peter Xu <[email protected]> wrote:
>
> [...]
>
> > @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > mem_cgroup_oom_synchronize(false);
> > }
> >
> > + if (ret & VM_FAULT_RETRY)
> > + return ret;
>
> I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
> In arch code (s390 and all others I briefly checked), the accounting
> was skipped for VM_FAULT_ERROR case.

Yes. I didn't explicitly add the check because I thought it was still OK to
count the error cases, especially after we discussed PERF_COUNT_SW_PAGE_FAULTS
in v1. So far, the major reason (iiuc) to keep PERF_COUNT_SW_PAGE_FAULTS in the
per-arch handlers is to also cover corner cases like VM_FAULT_ERROR. So to me
it makes sense to also count them here. But I agree it changes the old counting
on most archs.

Again, I don't have a strong opinion on this either, just as with
PERF_COUNT_SW_PAGE_FAULTS... But if no one disagrees, I will change this to:

if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
return ret;

So we try our best to follow the past.
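
For reference, that check would sit just ahead of the accounting call, roughly:

if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
	return ret;
/* ...otherwise fall through to mm_account_fault(regs, address, major) */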

Btw, note that there will still be some even more special corner cases. E.g.,
ARM64 also does not account some ARM64-specific fault errors (VM_FAULT_BADMAP,
VM_FAULT_BADACCESS). So even if we don't count VM_FAULT_ERROR, we might still
count these for ARM64. We can try to redefine VM_FAULT_ERROR on ARM64 to cover
all the arch-specific errors, however that seems like overkill to me solely for
fault accounting, so hopefully I can ignore that difference.

>
> > +
> > + /*
> > + * Do accounting in the common code, to avoid unnecessary
> > + * architecture differences or duplicated code.
> > + *
> > + * We arbitrarily make the rules be:
> > + *
> > + * - faults that never even got here (because the address
> > + * wasn't valid). That includes arch_vma_access_permitted()
>
> Missing "do not count" at the end of the first sentence?
>
> > + * failing above.
> > + *
> > + * So this is expressly not a "this many hardware page
> > + * faults" counter. Use the hw profiling for that.
> > + *
> > + * - incomplete faults (ie RETRY) do not count (see above).
> > + * They will only count once completed.
> > + *
> > + * - the fault counts as a "major" fault when the final
> > + * successful fault is VM_FAULT_MAJOR, or if it was a
> > + * retry (which implies that we couldn't handle it
> > + * immediately previously).
> > + *
> > + * - if the fault is done for GUP, regs wil be NULL and
>
> wil -> will

Will fix both places. Thanks,

--
Peter Xu

2020-06-24 20:44:29

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH 18/26] mm/s390: Use general page fault accounting

On Wed, Jun 24, 2020 at 08:49:30PM +0200, Gerald Schaefer wrote:
> On Fri, 19 Jun 2020 12:13:35 -0400
> Peter Xu <[email protected]> wrote:
>
> > Use the general page fault accounting by passing regs into handle_mm_fault().
> > It naturally solve the issue of multiple page fault accounting when page fault
> > retry happened.
> >
> > CC: Heiko Carstens <[email protected]>
> > CC: Vasily Gorbik <[email protected]>
> > CC: Christian Borntraeger <[email protected]>
> > CC: [email protected]
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > arch/s390/mm/fault.c | 16 +---------------
> > 1 file changed, 1 insertion(+), 15 deletions(-)
> >
> > diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> > index ab6d7eedcfab..4d62ca7d3e09 100644
> > --- a/arch/s390/mm/fault.c
> > +++ b/arch/s390/mm/fault.c
> > @@ -479,7 +479,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> > * make sure we exit gracefully rather than endlessly redo
> > * the fault.
> > */
> > - fault = handle_mm_fault(vma, address, flags, NULL);
> > + fault = handle_mm_fault(vma, address, flags, regs);
> > if (fault_signal_pending(fault, regs)) {
> > fault = VM_FAULT_SIGNAL;
> > if (flags & FAULT_FLAG_RETRY_NOWAIT)
> > @@ -489,21 +489,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> > if (unlikely(fault & VM_FAULT_ERROR))
> > goto out_up;
>
> There are two cases here where we skipped the accounting,
> fault_signal_pending() and VM_FAULT_ERROR, similar to other archs.
>
> fault_signal_pending() should be ok, because that only seems to be true
> for fault & VM_FAULT_RETRY, in which case the new approach also skips
> the accounting.

IMHO it's still possible for fault_signal_pending() to return true even if the
fault did not return VM_FAULT_RETRY, e.g., when the signal is delivered right
after the fault is correctly handled for the thread. However, I hope we can
avoid having to consider that case too, even so...

>
> But for VM_FAULT_ERROR, the new approach would do accounting, IIUC. Is
> that changed on purpose? See also my reply on [PATCH 01/26].

(replied in the other thread)

Thanks,

--
Peter Xu

2020-06-25 09:43:50

by Thomas Bogendoerfer

[permalink] [raw]
Subject: Re: [PATCH 11/26] mm/mips: Use general page fault accounting

On Fri, Jun 19, 2020 at 12:05:23PM -0400, Peter Xu wrote:
> Use the general page fault accounting by passing regs into handle_mm_fault().
> It naturally solve the issue of multiple page fault accounting when page fault
> retry happened.
>
> Fix PERF_COUNT_SW_PAGE_FAULTS perf event manually for page fault retries, by
> moving it before taking mmap_sem.
>
> CC: Thomas Bogendoerfer <[email protected]>
> CC: [email protected]
> Signed-off-by: Peter Xu <[email protected]>
> ---
> arch/mips/mm/fault.c | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)

Acked-by: Thomas Bogendoerfer <[email protected]>

Thomas.

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-06-26 19:55:56

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

On Wed, 24 Jun 2020 16:34:12 -0400
Peter Xu <[email protected]> wrote:

> On Wed, Jun 24, 2020 at 08:49:03PM +0200, Gerald Schaefer wrote:
> > On Fri, 19 Jun 2020 12:05:13 -0400
> > Peter Xu <[email protected]> wrote:
> >
> > [...]
> >
> > > @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > > mem_cgroup_oom_synchronize(false);
> > > }
> > >
> > > + if (ret & VM_FAULT_RETRY)
> > > + return ret;
> >
> > I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
> > In arch code (s390 and all others I briefly checked), the accounting
> > was skipped for VM_FAULT_ERROR case.
>
> Yes. I didn't explicitly add the check because I thought it's still OK to count
> the error cases, especially after we've discussed about
> PERF_COUNT_SW_PAGE_FAULTS in v1. So far, the major reason (iiuc) to have
> PERF_COUNT_SW_PAGE_FAULTS still in per-arch handlers is to also cover these
> corner cases like VM_FAULT_ERROR. So to me it makes sense too to also count
> them in here. But I agree it changes the old counting on most archs.

Having PERF_COUNT_SW_PAGE_FAULTS count everything including VM_FAULT_ERROR
is OK. It's just that the major/minor accounting should only be about
successes, IIRC from the v1 discussion.

The "new rules" also say

+ * - faults that never even got here (because the address
+ * wasn't valid). That includes arch_vma_access_permitted()
+ * failing above.

VM_FAULT_ERROR, and also the arch-specific VM_FAULT_BADxxx, qualify
as "address wasn't valid" I think, so they should not be counted as
major/minor.

IIRC from v1 we want to only count successes as major/minor, so maybe
the rule could also be made clearer about that, e.g. like

+ * - unsuccessful faults (because the address wasn't valid)
+ * do not count. That includes arch_vma_access_permitted()
+ * failing above.

>
> Again, I don't have strong opinion either on this, just like the same to
> PERF_COUNT_SW_PAGE_FAULTS... But if no one disagree, I will change this to:
>
> if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
> return ret;
>
> So we try our best to follow the past.

Sounds good to me, and VM_FAULT_BADxxx should never show up here.

>
> Btw, note that there will still be some even more special corner cases. E.g.,
> arm64 also does not account some arm64-specific fault errors (VM_FAULT_BADMAP,
> VM_FAULT_BADACCESS). So even if we don't count VM_FAULT_ERROR, we might still
> count these for arm64. We could try to redefine VM_FAULT_ERROR on arm64 to
> cover all the arch-specific errors, however that seems like overkill to me
> solely for fault accounting, so hopefully I can ignore that difference.

Hmm, arm64 already does not count the VM_FAULT_BADxxx cases, but it also does
not call handle_mm_fault() for those, so there is no change with this patch.
arm (and also unicore32) do count those, but also do not call
handle_mm_fault() for them, so the change there would be that they lose that
accounting, IIUC.

I agree that this probably can be ignored. The code in arm64 also looks
more recent, so it's probably just a left-over in arm/unicore32 code.
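
For context, arm64's __do_page_fault() returns its private error codes before
it ever reaches handle_mm_fault(), roughly like this (paraphrased and
simplified; stack expansion handling is omitted):

	static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
					  unsigned int mm_flags, unsigned long vm_flags)
	{
		struct vm_area_struct *vma = find_vma(mm, addr);

		/* No mapping at all: bail out before handle_mm_fault(),
		 * so the generic accounting never sees this fault. */
		if (unlikely(!vma))
			return VM_FAULT_BADMAP;

		/* A mapping exists, but the access permissions don't match. */
		if (!(vma->vm_flags & vm_flags))
			return VM_FAULT_BADACCESS;

		return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags);
	}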

Regards,
Gerald

2020-06-26 21:54:56

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

On Fri, Jun 26, 2020 at 09:54:24PM +0200, Gerald Schaefer wrote:
> On Wed, 24 Jun 2020 16:34:12 -0400
> Peter Xu <[email protected]> wrote:
>
> > On Wed, Jun 24, 2020 at 08:49:03PM +0200, Gerald Schaefer wrote:
> > > On Fri, 19 Jun 2020 12:05:13 -0400
> > > Peter Xu <[email protected]> wrote:
> > >
> > > [...]
> > >
> > > > @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > > > 		mem_cgroup_oom_synchronize(false);
> > > > 	}
> > > >
> > > > +	if (ret & VM_FAULT_RETRY)
> > > > +		return ret;
> > >
> > > I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
> > > In arch code (s390 and all others I briefly checked), the accounting
> > > was skipped in the VM_FAULT_ERROR case.
> >
> > Yes. I didn't explicitly add the check because I thought it's still OK to count
> > the error cases, especially after we discussed PERF_COUNT_SW_PAGE_FAULTS in v1.
> > So far, the major reason (iiuc) to keep PERF_COUNT_SW_PAGE_FAULTS in the
> > per-arch handlers is to also cover corner cases like VM_FAULT_ERROR. So to me
> > it also makes sense to count them here. But I agree it changes the old
> > counting on most archs.
>
> Having PERF_COUNT_SW_PAGE_FAULTS count everything, including VM_FAULT_ERROR,
> is OK. It is just the major/minor accounting that should only count successful
> faults, IIRC from the v1 discussion.
>
> The "new rules" also say
>
> + * - faults that never even got here (because the address
> + * wasn't valid). That includes arch_vma_access_permitted()
> + * failing above.
>
> VM_FAULT_ERROR, and also the arch-specific VM_FAULT_BADxxx, qualify
> as "address wasn't valid" I think, so they should not be counted as
> major/minor.
>
> IIRC from v1, we want to count only successful faults as major/minor, so maybe
> the rule could also be made clearer about that, e.g. like
>
> + * - unsuccessful faults (because the address wasn't valid)
> + * do not count. That includes arch_vma_access_permitted()
> + * failing above.

Sure.

>
> >
> > Again, I don't have a strong opinion on this either, just like with
> > PERF_COUNT_SW_PAGE_FAULTS... But if no one disagrees, I will change this to:
> >
> > 	if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
> > 		return ret;
> >
> > So we try our best to match the old behaviour.
>
> Sounds good to me, and VM_FAULT_BADxxx should never show up here.
>
> >
> > Btw, note that there will still be some even more special corner cases. E.g.,
> > arm64 also does not account some arm64-specific fault errors (VM_FAULT_BADMAP,
> > VM_FAULT_BADACCESS). So even if we don't count VM_FAULT_ERROR, we might still
> > count these for arm64. We could try to redefine VM_FAULT_ERROR on arm64 to
> > cover all the arch-specific errors, however that seems like overkill to me
> > solely for fault accounting, so hopefully I can ignore that difference.
>
> Hmm, arm64 already does not count the VM_FAULT_BADxxx cases, but it also does
> not call handle_mm_fault() for those, so there is no change with this patch.
> arm (and also unicore32) do count those, but also do not call
> handle_mm_fault() for them, so the change there would be that they lose that
> accounting, IIUC.

Oh, you are right... I just noticed that VM_FAULT_BADMAP and VM_FAULT_BADACCESS
can never be returned by handle_mm_fault() itself.
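
They are arch-private bits defined locally in arch/arm64/mm/fault.c, roughly
like the following (exact definitions may differ slightly), so the generic mm
code can never produce them:

	#define VM_FAULT_BADMAP		0x010000
	#define VM_FAULT_BADACCESS	0x020000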

>
> I agree that this probably can be ignored. The code in arm64 also looks
> more recent, so it's probably just a left-over in arm/unicore32 code.

Anyway, glad to know that we've reached a consensus on accepting these
differences.

Since this patch seems to be the only one that needs a new post so far, I'll
repost only this patch as v2.1 by replying to itself. Hopefully that can
avoid some unnecessary mail bombs.

Thanks for the very detailed review!

--
Peter Xu

2020-06-26 22:30:14

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

On Fri, Jun 26, 2020 at 05:53:46PM -0400, Peter Xu wrote:
> Since this patch seems to be the only one that needs a new post so far, I'll
> repost only this patch as v2.1 by replying to itself. Hopefully that can
> avoid some unnecessary mail bombs.

Unluckily, patch 25 will also need a trivial touch-up to its comment... I'll
just resend the whole series for simplicity. Thanks,

--
Peter Xu