2020-06-26 22:36:20

by Peter Xu

Subject: [PATCH 00/26] mm: Page fault accounting cleanups

v3:
- comment fixes [Gerald]
- check VM_FAULT_ERROR too [Gerald]
- collect r-b

==================== cover letter for v2 =============================

This is v2 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report a week ago on incorrect page fault accounting for retried
page faults after commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for
multiple times"):

https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

This version uses a better approach suggested by Linus: do the accounting
directly in handle_mm_fault(). It also covers the special accounting paths,
like gup or IOMMU fault requests on process page tables. The outcome of this
series is that all page fault accounting is kept in handle_mm_fault()
(besides PERF_COUNT_SW_PAGE_FAULTS, which is still done in the per-arch #PF
handlers).
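
For reference, here is a minimal sketch of the per-arch #PF handler pattern
this series converges on (simplified by me from the handlers touched below;
error paths and arch details are omitted):

	/* arch #PF handler, simplified sketch */
	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
	down_read(&mm->mmap_sem);
	vma = find_vma(mm, address);
	/* ... vma and access checks ... */

	/* Pass regs so the common code does all maj/min accounting */
	fault = handle_mm_fault(vma, address, flags, regs);

	if (fault_signal_pending(fault, regs))
		return;
	if ((flags & FAULT_FLAG_ALLOW_RETRY) && (fault & VM_FAULT_RETRY)) {
		/* mmap_sem was already released by the retry path */
		flags |= FAULT_FLAG_TRIED;
		goto retry;
	}
	up_read(&mm->mmap_sem);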

Since v2 changed quite a lot from v1, the changelog is omitted, and I also
didn't have a chance to pick up any r-b tags from the previous version. I
really appreciate everyone who has looked at v1. V1 for reference:

https://lore.kernel.org/lkml/[email protected]/

What this series does:

- Correct page fault accounting: we account a page fault (no matter whether
it's from #PF handling, or gup, or anything else) only on the attempt that
completed the fault. For example, page fault retries should not be counted
in the page fault counters. The same applies to the perf events.

- Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an ad hoc way across the different archs.

Case (1): for many archs it's done at the entry of the page fault handler,
so that it also covers e.g. erroneous faults.

Case (2): for some other archs, it is only accounted when the page fault is
resolved successfully.

Case (3): there are still quite a few archs that have not enabled this perf event.

Since this series touches nearly all the archs, we unify this perf event
to always follow case (1), which is the one that makes the most sense. And
since we moved the accounting into handle_mm_fault(), the other two MAJ/MIN
perf events are naturally taken care of as well.

- Unify definition of "major faults": the definition of "major fault" is
slightly changed when used in accounting (not VM_FAULT_MAJOR). More
information in patch 1.

- Always account the page fault to the task that triggered it. This does
not matter much for #PF handling, but it does for gup. More information on
this in patch 25.
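
As a sketch of the core accounting helper, here is roughly how it ends up
looking after patch 25 (simplified; the patches below are authoritative):

	static inline void mm_account_fault(struct pt_regs *regs,
					    unsigned long address, bool major)
	{
		/* Per-task counters are always updated, even when regs==NULL (gup) */
		if (major)
			current->maj_flt++;
		else
			current->min_flt++;

		/* Perf events are only emitted when we have the registers */
		if (!regs)
			return;

		if (major)
			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
		else
			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
	}

The call site in handle_mm_fault() treats a retried fault as major:

	mm_account_fault(regs, address, (ret & VM_FAULT_MAJOR) ||
			 (flags & FAULT_FLAG_TRIED));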

Patchset layout:

Patch 1: Introduce the accounting in handle_mm_fault(); not yet enabled.
Patch 2-24: Enable the new accounting in the arch #PF handlers one by one.
Patch 25: Enable the new accounting for the remaining outliers (gup, iommu, etc.)
Patch 26: Clean up the GUP task_struct pointer since it's not needed any more

For each patch that fixes a specific arch, I'm CCing the maintainers and the
arch list if there is one. Note that I have only lightly tested this series,
on x86.

Please have a look, thanks.

Peter Xu (26):
mm: Do page fault accounting in handle_mm_fault
mm/alpha: Use general page fault accounting
mm/arc: Use general page fault accounting
mm/arm: Use general page fault accounting
mm/arm64: Use general page fault accounting
mm/csky: Use general page fault accounting
mm/hexagon: Use general page fault accounting
mm/ia64: Use general page fault accounting
mm/m68k: Use general page fault accounting
mm/microblaze: Use general page fault accounting
mm/mips: Use general page fault accounting
mm/nds32: Use general page fault accounting
mm/nios2: Use general page fault accounting
mm/openrisc: Use general page fault accounting
mm/parisc: Use general page fault accounting
mm/powerpc: Use general page fault accounting
mm/riscv: Use general page fault accounting
mm/s390: Use general page fault accounting
mm/sh: Use general page fault accounting
mm/sparc32: Use general page fault accounting
mm/sparc64: Use general page fault accounting
mm/unicore32: Use general page fault accounting
mm/x86: Use general page fault accounting
mm/xtensa: Use general page fault accounting
mm: Clean up the last pieces of page fault accountings
mm/gup: Remove task_struct pointer for all gup code

arch/alpha/mm/fault.c | 8 +-
arch/arc/kernel/process.c | 2 +-
arch/arc/mm/fault.c | 18 +---
arch/arm/mm/fault.c | 25 ++---
arch/arm64/mm/fault.c | 29 ++----
arch/csky/mm/fault.c | 13 +--
arch/hexagon/mm/vm_fault.c | 9 +-
arch/ia64/mm/fault.c | 9 +-
arch/m68k/mm/fault.c | 14 +--
arch/microblaze/mm/fault.c | 9 +-
arch/mips/mm/fault.c | 14 +--
arch/nds32/mm/fault.c | 19 +---
arch/nios2/mm/fault.c | 14 +--
arch/openrisc/mm/fault.c | 9 +-
arch/parisc/mm/fault.c | 8 +-
arch/powerpc/mm/copro_fault.c | 7 +-
arch/powerpc/mm/fault.c | 11 +-
arch/riscv/mm/fault.c | 16 +--
arch/s390/kvm/interrupt.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/s390/kvm/priv.c | 8 +-
arch/s390/mm/fault.c | 16 +--
arch/s390/mm/gmap.c | 4 +-
arch/sh/mm/fault.c | 11 +-
arch/sparc/mm/fault_32.c | 13 +--
arch/sparc/mm/fault_64.c | 11 +-
arch/um/kernel/trap.c | 6 +-
arch/unicore32/mm/fault.c | 14 +--
arch/x86/mm/fault.c | 17 +---
arch/xtensa/mm/fault.c | 15 +--
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +-
drivers/infiniband/core/umem_odp.c | 2 +-
drivers/iommu/amd_iommu_v2.c | 2 +-
drivers/iommu/intel-svm.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
fs/exec.c | 2 +-
include/linux/mm.h | 16 +--
kernel/events/uprobes.c | 6 +-
kernel/futex.c | 2 +-
mm/gup.c | 107 +++++++-------------
mm/hmm.c | 3 +-
mm/ksm.c | 3 +-
mm/memory.c | 69 ++++++++++++-
mm/process_vm_access.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 2 +-
virt/kvm/kvm_main.c | 2 +-
47 files changed, 219 insertions(+), 360 deletions(-)

--
2.26.2


2020-06-26 22:37:10

by Peter Xu

Subject: [PATCH 15/26] mm/parisc: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: James E.J. Bottomley <[email protected]>
CC: Helge Deller <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/parisc/mm/fault.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index c10908ea8803..65661e22678e 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -18,6 +18,7 @@
#include <linux/extable.h>
#include <linux/uaccess.h>
#include <linux/hugetlb.h>
+#include <linux/perf_event.h>

#include <asm/traps.h>

@@ -281,6 +282,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
acc_type = parisc_acctyp(code, regs->iir);
if (acc_type & VM_WRITE)
flags |= FAULT_FLAG_WRITE;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
down_read(&mm->mmap_sem);
vma = find_vma_prev(mm, address, &prev_vma);
@@ -302,7 +304,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
* fault.
*/

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -323,10 +325,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
BUG();
}
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
/*
* No need to up_read(&mm->mmap_sem) as we would
--
2.26.2

2020-06-26 22:37:13

by Peter Xu

Subject: [PATCH 13/26] mm/nios2: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Ley Foon Tan <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/nios2/mm/fault.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 88abf297c759..823e7d0a9e97 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -24,6 +24,7 @@
#include <linux/mm.h>
#include <linux/extable.h>
#include <linux/uaccess.h>
+#include <linux/perf_event.h>

#include <asm/mmu_context.h>
#include <asm/traps.h>
@@ -83,6 +84,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
if (!down_read_trylock(&mm->mmap_sem)) {
if (!user_mode(regs) && !search_exception_tables(regs->ea))
goto bad_area_nosemaphore;
@@ -131,7 +134,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -146,16 +149,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
BUG();
}

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-26 22:37:14

by Peter Xu

Subject: [PATCH 16/26] mm/powerpc: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().

CC: Michael Ellerman <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/powerpc/mm/fault.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 992b10c3761c..e325d13efaf5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -563,7 +563,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

#ifdef CONFIG_PPC_MEM_KEYS
/*
@@ -604,14 +604,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
/*
* Major/minor page fault accounting.
*/
- if (major) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+ if (major)
cmo_account_page_fault();
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
- }
+
return 0;
}
NOKPROBE_SYMBOL(__do_page_fault);
--
2.26.2

2020-06-26 22:37:27

by Peter Xu

Subject: [PATCH 20/26] mm/sparc32: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

CC: David S. Miller <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/sparc/mm/fault_32.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index 61524d284706..542bf034962f 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -235,7 +235,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -251,15 +251,6 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
- 1, regs, address);
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
- 1, regs, address);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-26 22:37:34

by Peter Xu

Subject: [PATCH 21/26] mm/sparc64: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

CC: David S. Miller <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/sparc/mm/fault_64.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 6b702a0a8155..fe8854d447ed 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -423,7 +423,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
goto bad_area;
}

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
goto exit_exception;
@@ -439,15 +439,6 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- current->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
- 1, regs, address);
- } else {
- current->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
- 1, regs, address);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-26 22:37:43

by Peter Xu

Subject: [PATCH 22/26] mm/unicore32: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Guan Xuetao <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/unicore32/mm/fault.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 847ff24fcc2a..b272a389d977 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -16,6 +16,7 @@
#include <linux/page-flags.h>
#include <linux/sched/signal.h>
#include <linux/io.h>
+#include <linux/perf_event.h>

#include <asm/pgtable.h>
#include <asm/tlbflush.h>
@@ -160,7 +161,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
}

static vm_fault_t __do_pf(struct mm_struct *mm, unsigned long addr,
- unsigned int fsr, unsigned int flags, struct task_struct *tsk)
+ unsigned int fsr, unsigned int flags,
+ struct task_struct *tsk, struct pt_regs *regs)
{
struct vm_area_struct *vma;
vm_fault_t fault;
@@ -186,7 +188,7 @@ static vm_fault_t __do_pf(struct mm_struct *mm, unsigned long addr,
* If for any reason at all we couldn't handle the fault, make
* sure we exit gracefully rather than endlessly redo the fault.
*/
- fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, NULL);
+ fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, regs);
return fault;

check_stack:
@@ -219,6 +221,8 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
if (!(fsr ^ 0x12))
flags |= FAULT_FLAG_WRITE;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
+
/*
* As per x86, we may deadlock here. However, since the kernel only
* validly references user space from well defined areas of the code,
@@ -244,7 +248,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
#endif
}

- fault = __do_pf(mm, addr, fsr, flags, tsk);
+ fault = __do_pf(mm, addr, fsr, flags, tsk, regs);

/* If we need to retry but a fatal signal is pending, handle the
* signal first. We do not need to release the mmap_sem because
@@ -254,10 +258,6 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
return 0;

if (!(fault & VM_FAULT_ERROR) && (flags & FAULT_FLAG_ALLOW_RETRY)) {
- if (fault & VM_FAULT_MAJOR)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;
goto retry;
--
2.26.2

2020-06-26 22:37:46

by Peter Xu

Subject: [PATCH 23/26] mm/x86: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().

CC: Dave Hansen <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: [email protected]
CC: H. Peter Anvin <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/x86/mm/fault.c | 17 ++---------------
1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 3e27ed85af06..4604755a303d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1309,7 +1309,7 @@ void do_user_addr_fault(struct pt_regs *regs,
struct vm_area_struct *vma;
struct task_struct *tsk;
struct mm_struct *mm;
- vm_fault_t fault, major = 0;
+ vm_fault_t fault;
unsigned int flags = FAULT_FLAG_DEFAULT;

tsk = current;
@@ -1461,8 +1461,7 @@ void do_user_addr_fault(struct pt_regs *regs,
* userland). The return to userland is identified whenever
* FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
- major |= fault & VM_FAULT_MAJOR;
+ fault = handle_mm_fault(vma, address, flags, regs);

/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -1489,18 +1488,6 @@ void do_user_addr_fault(struct pt_regs *regs,
return;
}

- /*
- * Major/minor page fault accounting. If any of the events
- * returned VM_FAULT_MAJOR, we account it as a major fault.
- */
- if (major) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
- }
-
check_v8086_mode(regs, address, tsk);
}
NOKPROBE_SYMBOL(do_user_addr_fault);
--
2.26.2

2020-06-26 22:38:27

by Peter Xu

Subject: [PATCH 18/26] mm/s390: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

CC: Heiko Carstens <[email protected]>
CC: Vasily Gorbik <[email protected]>
CC: Christian Borntraeger <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/s390/mm/fault.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index ab6d7eedcfab..4d62ca7d3e09 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -479,7 +479,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);
if (fault_signal_pending(fault, regs)) {
fault = VM_FAULT_SIGNAL;
if (flags & FAULT_FLAG_RETRY_NOWAIT)
@@ -489,21 +489,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
if (unlikely(fault & VM_FAULT_ERROR))
goto out_up;

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
- regs, address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
- regs, address);
- }
if (fault & VM_FAULT_RETRY) {
if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
(flags & FAULT_FLAG_RETRY_NOWAIT)) {
--
2.26.2

2020-06-26 22:39:09

by Peter Xu

Subject: [PATCH 14/26] mm/openrisc: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf event too. Note that the other
two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) are now done in
handle_mm_fault().

CC: Jonas Bonn <[email protected]>
CC: Stefan Kristiansson <[email protected]>
CC: Stafford Horne <[email protected]>
CC: [email protected]
Acked-by: Stafford Horne <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/openrisc/mm/fault.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 45aedc572361..5255d73ce180 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -15,6 +15,7 @@
#include <linux/interrupt.h>
#include <linux/extable.h>
#include <linux/sched/signal.h>
+#include <linux/perf_event.h>

#include <linux/uaccess.h>
#include <asm/siginfo.h>
@@ -103,6 +104,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
if (in_interrupt() || !mm)
goto no_context;

+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -159,7 +162,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -176,10 +179,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,

if (flags & FAULT_FLAG_ALLOW_RETRY) {
/*RGD modeled on Cris */
- if (fault & VM_FAULT_MAJOR)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-26 22:39:13

by Peter Xu

Subject: [PATCH 19/26] mm/sh: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

CC: Yoshinori Sato <[email protected]>
CC: Rich Felker <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/sh/mm/fault.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index a4e670a9c9b3..ba6f7ed570e5 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -464,22 +464,13 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
if (mm_fault_error(regs, error_code, address, fault))
return;

if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
- regs, address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
- regs, address);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-26 22:40:14

by Peter Xu

Subject: [PATCH 25/26] mm: Clean up the last pieces of page fault accountings

Here are the last pieces of page fault accounting that were still done
outside handle_mm_fault(), where we still have regs==NULL when calling
handle_mm_fault():

arch/powerpc/mm/copro_fault.c: copro_handle_mm_fault
arch/sparc/mm/fault_32.c: force_user_fault
arch/um/kernel/trap.c: handle_page_fault
mm/gup.c: faultin_page
fixup_user_fault
mm/hmm.c: hmm_vma_fault
mm/ksm.c: break_ksm

Some of them have the issue of duplicated accounting for page fault retries.
Some of them didn't do the accounting at all.

This patch cleans all of these up by letting handle_mm_fault() do the
per-task page fault accounting even if regs==NULL (though we'll still skip
the perf event accounting). With that, we can safely remove all the outliers
now.

There's another functional change: we now account the page faults to the
caller of gup, rather than to the task_struct that was passed into the gup
code. More information on this can be found at [1].

After this patch, the items below should never be touched again outside
handle_mm_fault():

- task_struct.[maj|min]_flt
- PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]

[1] https://lore.kernel.org/lkml/CAHk-=wj_V2Tps2QrMn20_W0OJF9xqNh52XSGA42s-ZJ8Y+GyKw@mail.gmail.com/
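
To illustrate the gup side (a simplified sketch of mine, not part of the
diff below): gup keeps passing regs==NULL, so only the per-task counters of
the calling task are updated, and the perf events are skipped:

	/* e.g. inside faultin_page(), simplified */
	ret = handle_mm_fault(vma, address, fault_flags, NULL);
	/* current->maj_flt/min_flt are bumped inside mm_account_fault() */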

Signed-off-by: Peter Xu <[email protected]>
---
arch/powerpc/mm/copro_fault.c | 5 -----
arch/um/kernel/trap.c | 4 ----
mm/gup.c | 13 -------------
mm/memory.c | 19 ++++++++++++-------
4 files changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index c0478bef1f14..2e59be1a9359 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -76,11 +76,6 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
BUG();
}

- if (*flt & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
-
out_unlock:
up_read(&mm->mmap_sem);
return ret;
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 32cc8f59322b..c881831de357 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -92,10 +92,6 @@ int handle_page_fault(unsigned long address, unsigned long ip,
BUG();
}
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

diff --git a/mm/gup.c b/mm/gup.c
index 1a48c639ea49..17b4d0c45a6b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -885,13 +885,6 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
BUG();
}

- if (tsk) {
- if (ret & VM_FAULT_MAJOR)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
- }
-
if (ret & VM_FAULT_RETRY) {
if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
*locked = 0;
@@ -1239,12 +1232,6 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
goto retry;
}

- if (tsk) {
- if (major)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
- }
return 0;
}
EXPORT_SYMBOL_GPL(fixup_user_fault);
diff --git a/mm/memory.c b/mm/memory.c
index 4a9b333b079e..0b3c747cd2b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4350,6 +4350,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
/**
* mm_account_fault - Do page fault accountings
* @regs: the pt_regs struct pointer. When set to NULL, will skip accounting
+ * of perf event counters, but we'll still do the per-task accounting to
+ * the task who triggered this page fault.
* @address: faulted address.
* @major: whether this is a major fault.
*
@@ -4365,16 +4367,18 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
static inline void mm_account_fault(struct pt_regs *regs,
unsigned long address, bool major)
{
+ if (major)
+ current->maj_flt++;
+ else
+ current->min_flt++;
+
if (!regs)
return;

- if (major) {
- current->maj_flt++;
+ if (major)
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
- } else {
- current->min_flt++;
+ else
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
- }
}

/*
@@ -4447,8 +4451,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
* fault is VM_FAULT_MAJOR, or if it was a retry (which implies that
* we couldn't handle it immediately previously).
*
- * - If the fault is done for GUP, regs will be NULL and no accounting
- * will be done.
+ * - If the fault is done for GUP, regs will be NULL and we only do
+ * the accounting for the per thread fault counters who triggered
+ * the fault, and we skip the perf event updates.
*/
mm_account_fault(regs, address, (ret & VM_FAULT_MAJOR) ||
(flags & FAULT_FLAG_TRIED));
--
2.26.2

2020-06-26 22:40:22

by Peter Xu

Subject: [PATCH 26/26] mm/gup: Remove task_struct pointer for all gup code

After the cleanup of page fault accounting, gup does not need to pass the
task_struct around any more. Remove that parameter from the whole gup stack.
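
For example, the remote gup prototype changes as follows (taken from the
include/linux/mm.h hunk below):

	long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
				   unsigned long start, unsigned long nr_pages,
				   unsigned int gup_flags, struct page **pages,
				   struct vm_area_struct **vmas, int *locked);

becomes:

	long get_user_pages_remote(struct mm_struct *mm,
				   unsigned long start, unsigned long nr_pages,
				   unsigned int gup_flags, struct page **pages,
				   struct vm_area_struct **vmas, int *locked);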

Signed-off-by: Peter Xu <[email protected]>
---
arch/arc/kernel/process.c | 2 +-
arch/s390/kvm/interrupt.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/s390/kvm/priv.c | 8 +-
arch/s390/mm/gmap.c | 4 +-
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +-
drivers/infiniband/core/umem_odp.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
fs/exec.c | 2 +-
include/linux/mm.h | 9 +--
kernel/events/uprobes.c | 6 +-
kernel/futex.c | 2 +-
mm/gup.c | 90 +++++++++------------
mm/memory.c | 2 +-
mm/process_vm_access.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 2 +-
virt/kvm/kvm_main.c | 2 +-
18 files changed, 63 insertions(+), 80 deletions(-)

diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 315528f04bc1..2aad79ffc7f8 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -91,7 +91,7 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
goto fail;

down_read(&current->mm->mmap_sem);
- ret = fixup_user_fault(current, current->mm, (unsigned long) uaddr,
+ ret = fixup_user_fault(current->mm, (unsigned long) uaddr,
FAULT_FLAG_WRITE, NULL);
up_read(&current->mm->mmap_sem);

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index bfb481134994..7f4c5895aabd 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2768,7 +2768,7 @@ static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
struct page *page = NULL;

down_read(&kvm->mm->mmap_sem);
- get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
+ get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
&page, NULL, NULL);
up_read(&kvm->mm->mmap_sem);
return page;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d05bb040fd42..12fa299986f8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1892,7 +1892,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)

r = set_guest_storage_key(current->mm, hva, keys[i], 0);
if (r) {
- r = fixup_user_fault(current, current->mm, hva,
+ r = fixup_user_fault(current->mm, hva,
FAULT_FLAG_WRITE, &unlocked);
if (r)
break;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 893893642415..45b7d5df72d7 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -274,7 +274,7 @@ static int handle_iske(struct kvm_vcpu *vcpu)
rc = get_guest_storage_key(current->mm, vmaddr, &key);

if (rc) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
if (!rc) {
up_read(&current->mm->mmap_sem);
@@ -320,7 +320,7 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
down_read(&current->mm->mmap_sem);
rc = reset_guest_reference_bit(current->mm, vmaddr);
if (rc < 0) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
if (!rc) {
up_read(&current->mm->mmap_sem);
@@ -391,7 +391,7 @@ static int handle_sske(struct kvm_vcpu *vcpu)
m3 & SSKE_MC);

if (rc < 0) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
rc = !rc ? -EAGAIN : rc;
}
@@ -1095,7 +1095,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
rc = cond_set_guest_storage_key(current->mm, vmaddr,
key, NULL, nq, mr, mc);
if (rc < 0) {
- rc = fixup_user_fault(current, current->mm, vmaddr,
+ rc = fixup_user_fault(current->mm, vmaddr,
FAULT_FLAG_WRITE, &unlocked);
rc = !rc ? -EAGAIN : rc;
}
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 1a95d8809cc3..0faf4f5f3fd4 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -649,7 +649,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
rc = vmaddr;
goto out_up;
}
- if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags,
+ if (fixup_user_fault(gmap->mm, vmaddr, fault_flags,
&unlocked)) {
rc = -EFAULT;
goto out_up;
@@ -879,7 +879,7 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,

BUG_ON(gmap_is_shadow(gmap));
fault_flags = (prot == PROT_WRITE) ? FAULT_FLAG_WRITE : 0;
- if (fixup_user_fault(current, mm, vmaddr, fault_flags, &unlocked))
+ if (fixup_user_fault(mm, vmaddr, fault_flags, &unlocked))
return -EFAULT;
if (unlocked)
/* lost mmap_sem, caller has to retry __gmap_translate */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 7ffd7afeb7a5..e87fa79c18d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -472,7 +472,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
locked = 1;
}
ret = get_user_pages_remote
- (work->task, mm,
+ (mm,
obj->userptr.ptr + pinned * PAGE_SIZE,
npages - pinned,
flags,
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 3b1e627d9a8d..73b1a01b7339 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -437,7 +437,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
* complex (and doesn't gain us much performance in most use
* cases).
*/
- npages = get_user_pages_remote(owning_process, owning_mm,
+ npages = get_user_pages_remote(owning_mm,
user_virt, gup_num_pages,
flags, local_page_list, NULL, NULL);
up_read(&owning_mm->mmap_sem);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cc1d64765ce7..d77b34d6ee19 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -329,7 +329,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
flags |= FOLL_WRITE;

down_read(&mm->mmap_sem);
- ret = pin_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+ ret = pin_user_pages_remote(mm, vaddr, 1, flags | FOLL_LONGTERM,
page, NULL, NULL);
if (ret == 1) {
*pfn = page_to_pfn(page[0]);
diff --git a/fs/exec.c b/fs/exec.c
index 2c465119affc..f3f87911f3d0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -213,7 +213,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
* We are doing an exec(). 'current' is the process
* doing the exec and bprm->mm is the new process's mm.
*/
- ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
+ ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
&page, NULL, NULL);
if (ret <= 0)
return NULL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46bee4044ac1..5e347ffb049f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1655,7 +1655,7 @@ int invalidate_inode_page(struct page *page);
extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
struct pt_regs *regs);
-extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
+extern int fixup_user_fault(struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked);
void unmap_mapping_pages(struct address_space *mapping,
@@ -1671,8 +1671,7 @@ static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
BUG();
return VM_FAULT_SIGBUS;
}
-static inline int fixup_user_fault(struct task_struct *tsk,
- struct mm_struct *mm, unsigned long address,
+static inline int fixup_user_fault(struct mm_struct *mm, unsigned long address,
unsigned int fault_flags, bool *unlocked)
{
/* should never happen if there's no MMU */
@@ -1698,11 +1697,11 @@ extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
extern int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
unsigned long addr, void *buf, int len, unsigned int gup_flags);

-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
-long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ece7e13f6e4a..b7c9ad7e7d54 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -382,7 +382,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
if (!vaddr || !d)
return -EINVAL;

- ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+ ret = get_user_pages_remote(mm, vaddr, 1,
FOLL_WRITE, &page, &vma, NULL);
if (unlikely(ret <= 0)) {
/*
@@ -483,7 +483,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (is_register)
gup_flags |= FOLL_SPLIT_PMD;
/* Read the page with vaddr into memory */
- ret = get_user_pages_remote(NULL, mm, vaddr, 1, gup_flags,
+ ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
&old_page, &vma, NULL);
if (ret <= 0)
return ret;
@@ -2027,7 +2027,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
* but we treat this as a 'remote' access since it is
* essentially a kernel access to the memory.
*/
- result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page,
+ result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page,
NULL, NULL);
if (result < 0)
return result;
diff --git a/kernel/futex.c b/kernel/futex.c
index b59532862bc0..1466b4322491 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -696,7 +696,7 @@ static int fault_in_user_writeable(u32 __user *uaddr)
int ret;

down_read(&mm->mmap_sem);
- ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
+ ret = fixup_user_fault(mm, (unsigned long)uaddr,
FAULT_FLAG_WRITE, NULL);
up_read(&mm->mmap_sem);

diff --git a/mm/gup.c b/mm/gup.c
index 17b4d0c45a6b..b8eb02673c10 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -851,7 +851,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
* does not include FOLL_NOWAIT, the mmap_sem may be released. If it
* is, *@locked will be set to 0 and -EBUSY returned.
*/
-static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
+static int faultin_page(struct vm_area_struct *vma,
unsigned long address, unsigned int *flags, int *locked)
{
unsigned int fault_flags = 0;
@@ -954,7 +954,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)

/**
* __get_user_pages() - pin user pages in memory
- * @tsk: task_struct of target task
* @mm: mm_struct of target mm
* @start: starting user address
* @nr_pages: number of pages from start to pin
@@ -1012,7 +1011,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* instead of __get_user_pages. __get_user_pages should be used only if
* you need some special @gup_flags.
*/
-static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+static long __get_user_pages(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1088,8 +1087,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,

page = follow_page_mask(vma, start, foll_flags, &ctx);
if (!page) {
- ret = faultin_page(tsk, vma, start, &foll_flags,
- locked);
+ ret = faultin_page(vma, start, &foll_flags, locked);
switch (ret) {
case 0:
goto retry;
@@ -1163,8 +1161,6 @@ static bool vma_permits_fault(struct vm_area_struct *vma,

/*
* fixup_user_fault() - manually resolve a user page fault
- * @tsk: the task_struct to use for page fault accounting, or
- * NULL if faults are not to be recorded.
* @mm: mm_struct of target mm
* @address: user address
* @fault_flags:flags to pass down to handle_mm_fault()
@@ -1191,7 +1187,7 @@ static bool vma_permits_fault(struct vm_area_struct *vma,
* This function will not return with an unlocked mmap_sem. So it has not the
* same semantics wrt the @mm->mmap_sem as does filemap_fault().
*/
-int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
+int fixup_user_fault(struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked)
{
@@ -1236,8 +1232,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
}
EXPORT_SYMBOL_GPL(fixup_user_fault);

-static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
- struct mm_struct *mm,
+static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1270,7 +1265,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
pages_done = 0;
lock_dropped = false;
for (;;) {
- ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
+ ret = __get_user_pages(mm, start, nr_pages, flags, pages,
vmas, locked);
if (!locked)
/* VM_FAULT_RETRY couldn't trigger, bypass */
@@ -1330,7 +1325,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
}

*locked = 1;
- ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
+ ret = __get_user_pages(mm, start, 1, flags | FOLL_TRIED,
pages, NULL, locked);
if (!*locked) {
/* Continue to retry until we succeeded */
@@ -1416,7 +1411,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
* We made sure addr is within a VMA, so the following will
* not result in a stack expansion that recurses back here.
*/
- return __get_user_pages(current, mm, start, nr_pages, gup_flags,
+ return __get_user_pages(mm, start, nr_pages, gup_flags,
NULL, NULL, locked);
}

@@ -1500,7 +1495,7 @@ struct page *get_dump_page(unsigned long addr)
struct vm_area_struct *vma;
struct page *page;

- if (__get_user_pages(current, current->mm, addr, 1,
+ if (__get_user_pages(current->mm, addr, 1,
FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma,
NULL) < 1)
return NULL;
@@ -1509,8 +1504,7 @@ struct page *get_dump_page(unsigned long addr)
}
#endif /* CONFIG_ELF_CORE */
#else /* CONFIG_MMU */
-static long __get_user_pages_locked(struct task_struct *tsk,
- struct mm_struct *mm, unsigned long start,
+static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
unsigned long nr_pages, struct page **pages,
struct vm_area_struct **vmas, int *locked,
unsigned int foll_flags)
@@ -1626,8 +1620,7 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
return __alloc_pages_node(nid, gfp_mask, 0);
}

-static long check_and_migrate_cma_pages(struct task_struct *tsk,
- struct mm_struct *mm,
+static long check_and_migrate_cma_pages(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1701,7 +1694,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
* again migrating any new CMA pages which we failed to isolate
* earlier.
*/
- ret = __get_user_pages_locked(tsk, mm, start, nr_pages,
+ ret = __get_user_pages_locked(mm, start, nr_pages,
pages, vmas, NULL,
gup_flags);

@@ -1715,8 +1708,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
return ret;
}
#else
-static long check_and_migrate_cma_pages(struct task_struct *tsk,
- struct mm_struct *mm,
+static long check_and_migrate_cma_pages(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1731,8 +1723,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
* __gup_longterm_locked() is a wrapper for __get_user_pages_locked which
* allows us to process the FOLL_LONGTERM flag.
*/
-static long __gup_longterm_locked(struct task_struct *tsk,
- struct mm_struct *mm,
+static long __gup_longterm_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
@@ -1757,7 +1748,7 @@ static long __gup_longterm_locked(struct task_struct *tsk,
flags = memalloc_nocma_save();
}

- rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages,
+ rc = __get_user_pages_locked(mm, start, nr_pages, pages,
vmas_tmp, NULL, gup_flags);

if (gup_flags & FOLL_LONGTERM) {
@@ -1772,7 +1763,7 @@ static long __gup_longterm_locked(struct task_struct *tsk,
goto out;
}

- rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
+ rc = check_and_migrate_cma_pages(mm, start, rc, pages,
vmas_tmp, gup_flags);
}

@@ -1782,22 +1773,20 @@ static long __gup_longterm_locked(struct task_struct *tsk,
return rc;
}
#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
-static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
- struct mm_struct *mm,
+static __always_inline long __gup_longterm_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
struct vm_area_struct **vmas,
unsigned int flags)
{
- return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
NULL, flags);
}
#endif /* CONFIG_FS_DAX || CONFIG_CMA */

#ifdef CONFIG_MMU
-static long __get_user_pages_remote(struct task_struct *tsk,
- struct mm_struct *mm,
+static long __get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1816,20 +1805,18 @@ static long __get_user_pages_remote(struct task_struct *tsk,
* This will check the vmas (even if our vmas arg is NULL)
* and return -ENOTSUPP if DAX isn't allowed in this case:
*/
- return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
+ return __gup_longterm_locked(mm, start, nr_pages, pages,
vmas, gup_flags | FOLL_TOUCH |
FOLL_REMOTE);
}

- return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
locked,
gup_flags | FOLL_TOUCH | FOLL_REMOTE);
}

/*
* get_user_pages_remote() - pin user pages in memory
- * @tsk: the task_struct to use for page fault accounting, or
- * NULL if faults are not to be recorded.
* @mm: mm_struct of target mm
* @start: starting user address
* @nr_pages: number of pages from start to pin
@@ -1888,7 +1875,7 @@ static long __get_user_pages_remote(struct task_struct *tsk,
* should use get_user_pages because it cannot pass
* FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
*/
-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1900,13 +1887,13 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
return -EINVAL;

- return __get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags,
+ return __get_user_pages_remote(mm, start, nr_pages, gup_flags,
pages, vmas, locked);
}
EXPORT_SYMBOL(get_user_pages_remote);

#else /* CONFIG_MMU */
-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1914,8 +1901,7 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
return 0;
}

-static long __get_user_pages_remote(struct task_struct *tsk,
- struct mm_struct *mm,
+static long __get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -1942,7 +1928,7 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
return -EINVAL;

- return __gup_longterm_locked(current, current->mm, start, nr_pages,
+ return __gup_longterm_locked(current->mm, start, nr_pages,
pages, vmas, gup_flags | FOLL_TOUCH);
}
EXPORT_SYMBOL(get_user_pages);
@@ -1956,7 +1942,7 @@ EXPORT_SYMBOL(get_user_pages);
*
* down_read(&mm->mmap_sem);
* do_something()
- * get_user_pages(tsk, mm, ..., pages, NULL);
+ * get_user_pages(mm, ..., pages, NULL);
* up_read(&mm->mmap_sem);
*
* to:
@@ -1964,7 +1950,7 @@ EXPORT_SYMBOL(get_user_pages);
* int locked = 1;
* down_read(&mm->mmap_sem);
* do_something()
- * get_user_pages_locked(tsk, mm, ..., pages, &locked);
+ * get_user_pages_locked(mm, ..., pages, &locked);
* if (locked)
* up_read(&mm->mmap_sem);
*/
@@ -1981,7 +1967,7 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
return -EINVAL;

- return __get_user_pages_locked(current, current->mm, start, nr_pages,
+ return __get_user_pages_locked(current->mm, start, nr_pages,
pages, NULL, locked,
gup_flags | FOLL_TOUCH);
}
@@ -1991,12 +1977,12 @@ EXPORT_SYMBOL(get_user_pages_locked);
* get_user_pages_unlocked() is suitable to replace the form:
*
* down_read(&mm->mmap_sem);
- * get_user_pages(tsk, mm, ..., pages, NULL);
+ * get_user_pages(mm, ..., pages, NULL);
* up_read(&mm->mmap_sem);
*
* with:
*
- * get_user_pages_unlocked(tsk, mm, ..., pages);
+ * get_user_pages_unlocked(mm, ..., pages);
*
* It is functionally equivalent to get_user_pages_fast so
* get_user_pages_fast should be used instead if specific gup_flags
@@ -2019,7 +2005,7 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
return -EINVAL;

down_read(&mm->mmap_sem);
- ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL,
+ ret = __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
&locked, gup_flags | FOLL_TOUCH);
if (locked)
up_read(&mm->mmap_sem);
@@ -2720,7 +2706,7 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
*/
if (gup_flags & FOLL_LONGTERM) {
down_read(&current->mm->mmap_sem);
- ret = __gup_longterm_locked(current, current->mm,
+ ret = __gup_longterm_locked(current->mm,
start, nr_pages,
pages, NULL, gup_flags);
up_read(&current->mm->mmap_sem);
@@ -2850,10 +2836,8 @@ int pin_user_pages_fast(unsigned long start, int nr_pages,
EXPORT_SYMBOL_GPL(pin_user_pages_fast);

/**
- * pin_user_pages_remote() - pin pages of a remote process (task != current)
+ * pin_user_pages_remote() - pin pages of a remote process
*
- * @tsk: the task_struct to use for page fault accounting, or
- * NULL if faults are not to be recorded.
* @mm: mm_struct of target mm
* @start: starting user address
* @nr_pages: number of pages from start to pin
@@ -2877,7 +2861,7 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
* This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It
* is NOT intended for Case 2 (RDMA: long-term pins).
*/
-long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked)
@@ -2887,7 +2871,7 @@ long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
return -EINVAL;

gup_flags |= FOLL_PIN;
- return __get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags,
+ return __get_user_pages_remote(mm, start, nr_pages, gup_flags,
pages, vmas, locked);
}
EXPORT_SYMBOL(pin_user_pages_remote);
@@ -2922,7 +2906,7 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
return -EINVAL;

gup_flags |= FOLL_PIN;
- return __gup_longterm_locked(current, current->mm, start, nr_pages,
+ return __gup_longterm_locked(current->mm, start, nr_pages,
pages, vmas, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);
diff --git a/mm/memory.c b/mm/memory.c
index 0b3c747cd2b3..65576e3b382f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4739,7 +4739,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
void *maddr;
struct page *page = NULL;

- ret = get_user_pages_remote(tsk, mm, addr, 1,
+ ret = get_user_pages_remote(mm, addr, 1,
gup_flags, &page, &vma, NULL);
if (ret <= 0) {
#ifndef CONFIG_HAVE_IOREMAP_PROT
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 74e957e302fe..5523464d0ab5 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -105,7 +105,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
* current/current->mm
*/
down_read(&mm->mmap_sem);
- pinned_pages = pin_user_pages_remote(task, mm, pa, pinned_pages,
+ pinned_pages = pin_user_pages_remote(mm, pa, pinned_pages,
flags, process_pages,
NULL, &locked);
if (locked)
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 7869d6a9980b..afe5e68ede77 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -914,7 +914,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
* (represented by bprm). 'current' is the process doing
* the execve().
*/
- if (get_user_pages_remote(current, bprm->mm, pos, 1,
+ if (get_user_pages_remote(bprm->mm, pos, 1,
FOLL_FORCE, &page, NULL, NULL) <= 0)
return false;
#else
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 15e5b037f92d..73098e18baaf 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -60,7 +60,7 @@ static void async_pf_execute(struct work_struct *work)
* access remotely.
*/
down_read(&mm->mmap_sem);
- get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+ get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, NULL,
&locked);
if (locked)
up_read(&mm->mmap_sem);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 731c1e517716..3e1b2ec4ec96 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1829,7 +1829,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* not call the fault handler, so do it here.
*/
bool unlocked = false;
- r = fixup_user_fault(current, current->mm, addr,
+ r = fixup_user_fault(current->mm, addr,
(write_fault ? FAULT_FLAG_WRITE : 0),
&unlocked);
if (unlocked)
--
2.26.2

2020-06-26 22:40:27

by Peter Xu

Subject: [PATCH 17/26] mm/riscv: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

CC: Paul Walmsley <[email protected]>
CC: Palmer Dabbelt <[email protected]>
CC: Albert Ou <[email protected]>
CC: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/riscv/mm/fault.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 677ee1bb11ac..e796ba02b572 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -110,7 +110,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, addr, flags, NULL);
+ fault = handle_mm_fault(vma, addr, flags, regs);

/*
* If we need to retry but a fatal signal is pending, handle the
@@ -128,21 +128,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
BUG();
}

- /*
- * Major/minor page fault accounting is only done on the
- * initial attempt. If we go through a retry, it is extremely
- * likely that the page will be found in page cache at that point.
- */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
- 1, regs, addr);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
- 1, regs, addr);
- }
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

--
2.26.2

2020-06-26 22:40:38

by Peter Xu

Subject: [PATCH 24/26] mm/xtensa: Use general page fault accounting

Use the general page fault accounting by passing regs into handle_mm_fault().
It naturally solves the issue of multiple page fault accounting when page
fault retries happen.

Remove the PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN] perf events because they are
now done in handle_mm_fault().

Move the PERF_COUNT_SW_PAGE_FAULTS event up, to before taking mmap_sem for
the fault, so that it matches the rest of the archs.

CC: Chris Zankel <[email protected]>
CC: Max Filippov <[email protected]>
CC: [email protected]
Acked-by: Max Filippov <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/xtensa/mm/fault.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 722ef3c98d60..9ef7331e37f8 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -73,6 +73,9 @@ void do_page_fault(struct pt_regs *regs)

if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -108,7 +111,7 @@ void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags, NULL);
+ fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -123,10 +126,6 @@ void do_page_fault(struct pt_regs *regs)
BUG();
}
if (flags & FAULT_FLAG_ALLOW_RETRY) {
- if (fault & VM_FAULT_MAJOR)
- current->maj_flt++;
- else
- current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

@@ -140,12 +139,6 @@ void do_page_fault(struct pt_regs *regs)
}

up_read(&mm->mmap_sem);
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
- if (flags & VM_FAULT_MAJOR)
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
- else
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-
return;

/* Something tried to access memory that isn't in our memory map..
--
2.26.2

2020-07-01 11:49:16

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 17/26] mm/riscv: Use general page fault accounting

Hi Peter,

On Sat, Jun 27, 2020 at 1:36 AM Peter Xu <[email protected]> wrote:
> Use the general page fault accounting by passing regs into handle_mm_fault().
> It naturally solves the issue of multiple page fault accounting when a page
> fault is retried.

I sent a patch to fix up riscv page fault accounting some days ago:

http://lists.infradead.org/pipermail/linux-riscv/2020-June/000775.html

However, your fix is obviously even better. For the generic and riscv parts:

Reviewed-by: Pekka Enberg <[email protected]>

- Pekka

2020-07-02 15:52:29

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH 17/26] mm/riscv: Use general page fault accounting

On Wed, Jul 01, 2020 at 02:46:24PM +0300, Pekka Enberg wrote:
> Hi Peter,

Hi, Pekka,

>
> On Sat, Jun 27, 2020 at 1:36 AM Peter Xu <[email protected]> wrote:
> > Use the general page fault accounting by passing regs into handle_mm_fault().
> > It naturally solves the issue of multiple page fault accounting when a page
> > fault is retried.
>
> I sent a patch to fix up riscv page fault accounting some days ago:
>
> http://lists.infradead.org/pipermail/linux-riscv/2020-June/000775.html

Yes, this is a valid fix too.

>
> However, your fix is obviously even better. For the generic and riscv parts:
>
> Reviewed-by: Pekka Enberg <[email protected]>

Thanks!

--
Peter Xu

2020-07-03 11:08:38

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 16/26] mm/powerpc: Use general page fault accounting

Peter Xu <[email protected]> writes:
> Use the general page fault accounting by passing regs into handle_mm_fault().
>
> CC: Michael Ellerman <[email protected]>
> CC: Benjamin Herrenschmidt <[email protected]>
> CC: Paul Mackerras <[email protected]>
> CC: [email protected]
> Signed-off-by: Peter Xu <[email protected]>
> ---
> arch/powerpc/mm/fault.c | 11 +++--------
> 1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 992b10c3761c..e325d13efaf5 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -563,7 +563,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags, NULL);
> + fault = handle_mm_fault(vma, address, flags, regs);
>
> #ifdef CONFIG_PPC_MEM_KEYS
> /*
> @@ -604,14 +604,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> /*
> * Major/minor page fault accounting.
> */
> - if (major) {
> - current->maj_flt++;
> - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
> + if (major)
> cmo_account_page_fault();
> - } else {
> - current->min_flt++;
> - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
> - }
> +
> return 0;
> }
> NOKPROBE_SYMBOL(__do_page_fault);


You do change the logic a bit if regs is NULL (in mm_account_fault()),
but regs can never be NULL in this path, so it looks OK to me.
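
For reference, the regs == NULL special case in mm_account_fault() is roughly
(sketch, not the exact code):

	/*
	 * gup-style callers pass regs == NULL: update only the per-task
	 * fault counters and skip the perf events entirely.
	 */
	if (!regs)
		return;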

Acked-by: Michael Ellerman <[email protected]>

cheers

2020-07-11 19:44:50

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH 17/26] mm/riscv: Use general page fault accounting

On Fri, 26 Jun 2020 15:36:25 PDT (-0700), [email protected] wrote:
> Use the general page fault accounting by passing regs into handle_mm_fault().
> It naturally solves the issue of multiple page fault accounting when a page
> fault is retried.
>
> CC: Paul Walmsley <[email protected]>
> CC: Palmer Dabbelt <[email protected]>
> CC: Albert Ou <[email protected]>
> CC: [email protected]
> Signed-off-by: Peter Xu <[email protected]>
> ---
> arch/riscv/mm/fault.c | 16 +---------------
> 1 file changed, 1 insertion(+), 15 deletions(-)
>
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index 677ee1bb11ac..e796ba02b572 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -110,7 +110,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, addr, flags, NULL);
> + fault = handle_mm_fault(vma, addr, flags, regs);
>
> /*
> * If we need to retry but a fatal signal is pending, handle the
> @@ -128,21 +128,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
> BUG();
> }
>
> - /*
> - * Major/minor page fault accounting is only done on the
> - * initial attempt. If we go through a retry, it is extremely
> - * likely that the page will be found in page cache at that point.
> - */
> if (flags & FAULT_FLAG_ALLOW_RETRY) {
> - if (fault & VM_FAULT_MAJOR) {
> - tsk->maj_flt++;
> - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
> - 1, regs, addr);
> - } else {
> - tsk->min_flt++;
> - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
> - 1, regs, addr);
> - }
> if (fault & VM_FAULT_RETRY) {
> flags |= FAULT_FLAG_TRIED;

This still slightly changes the accounting numbers, but I don't think it does
so in a way that's meaningful enough to care about. SIGBUS is the only one
that might happen frequently enough to notice; I doubt anyone cares about
whether faults are accounted for during OOM.
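
(The change for errored faults comes from the early return in
mm_account_fault(); roughly, as a sketch rather than the exact code:

	/* unsuccessful (e.g. SIGBUS/OOM) and retried faults skip accounting */
	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
		return;
)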

Acked-by: Palmer Dabbelt <[email protected]>

Thanks!

2020-07-14 15:13:31

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 17/26] mm/riscv: Use general page fault accounting

Hi Palmer,

On Sat, Jul 11, 2020 at 10:43 PM Palmer Dabbelt <[email protected]> wrote:
> This still slightly changes the accounting numbers, but I don't think it does
> so in a way that's meaningful enough to care about. SIGBUS is the only one
> that might happen frequently enough to notice, I doubt anyone cares about
> whether faults are accounted for during OOM.

What do you mean?

AFAICT, this patch _fixes_ the accounting to be correct on page fault
restarts. I sent a patch to fix this up some time ago, but of course
this generic solution is better:

http://lists.infradead.org/pipermail/linux-riscv/2020-June/000775.html

- Pekka