2019-05-15 00:31:24

by Vineet Gupta

Subject: [PATCH 0/9] ARC do_page_fault rework

This was on my todo list, and Eugeniy's patch in this area finally prodded me
to finish it.

The idea is to clean up and tidy the ancient do_page_fault() code, making it
more readable and hopefully yielding better generated code. There are only
a few subtle functional changes:
- Eugeniy's fix to prevent user space looping
- reduction in mmap_sem hold times

Also, this could have been a single patch, but this is a tricky part of mm
handling, so it is better done as bite-sized pieces to make any regressions
easier to track down.

Eugeniy Paltsev (1):
ARC: mm: SIGSEGV userspace trying to access kernel virtual memory

Vineet Gupta (8):
ARC: mm: do_page_fault refactor #1: remove label @good_area
ARC: mm: do_page_fault refactor #2: remove short lived variable
ARC: mm: do_page_fault refactor #3: tidyup vma access permission code
ARC: mm: do_page_fault refactor #4: consolidate retry related logic
ARC: mm: do_page_fault refactor #5: scoot no_context to end
ARC: mm: do_page_fault refactor #6: error handlers to use same pattern
ARC: mm: do_page_fault refactor #7: fold the various error handling
ARC: mm: do_page_fault refactor #8: release mmap_sem sooner

arch/arc/mm/fault.c | 192 +++++++++++++++++++++-------------------------------
1 file changed, 79 insertions(+), 113 deletions(-)

--
2.7.4


2019-05-15 00:31:48

by Vineet Gupta

Subject: [PATCH 5/9] ARC: mm: do_page_fault refactor #4: consolidate retry related logic

The stats update code can now elide the "retry" check and an additional level
of indentation, since all retry handling is already done ahead of it.
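
The net flow is roughly the following (a condensed sketch of the code below,
not the verbatim diff):

------------------------>8-----------------
fault = handle_mm_fault(vma, address, flags);

if (unlikely(fault & VM_FAULT_RETRY)) {
	/* mmap_sem already relinquished by core mm for the RETRY case */
	if (fatal_signal_pending(current)) {
		if (!user_mode(regs))
			goto no_context;
		return;
	}
	if (flags & FAULT_FLAG_ALLOW_RETRY) {
		flags &= ~FAULT_FLAG_ALLOW_RETRY;
		flags |= FAULT_FLAG_TRIED;
		goto retry;
	}
}

/* stats accounting below now runs at most once, with no retry guard */
------------------------>8-----------------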

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 60 +++++++++++++++++++++++++++--------------------------
1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index ae890a8d5ebf..7f211b493170 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -68,8 +68,8 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
struct mm_struct *mm = tsk->mm;
int si_code = SEGV_MAPERR;
unsigned int write = 0, exec = 0, mask;
- vm_fault_t fault;
- unsigned int flags;
+ vm_fault_t fault; /* handle_mm_fault() output */
+ unsigned int flags; /* handle_mm_fault() input */

/*
* NOTE! We MUST NOT take any locks for this case. We may
@@ -128,49 +128,51 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
goto bad_area;
}

- /*
- * If for any reason at all we couldn't handle the fault,
- * make sure we exit gracefully rather than endlessly redo
- * the fault.
- */
fault = handle_mm_fault(vma, address, flags);

- if (fatal_signal_pending(current)) {
+ /*
+ * Fault retry nuances
+ */
+ if (unlikely(fault & VM_FAULT_RETRY)) {

/*
- * if fault retry, mmap_sem already relinquished by core mm
- * so OK to return to user mode (with signal handled first)
+ * If fault needs to be retried, handle any pending signals
+ * first (by returning to user mode).
+ * mmap_sem already relinquished by core mm for RETRY case
*/
- if (fault & VM_FAULT_RETRY) {
+ if (fatal_signal_pending(current)) {
if (!user_mode(regs))
goto no_context;
return;
}
+ /*
+ * retry state machine
+ */
+ if (flags & FAULT_FLAG_ALLOW_RETRY) {
+ flags &= ~FAULT_FLAG_ALLOW_RETRY;
+ flags |= FAULT_FLAG_TRIED;
+ goto retry;
+ }
}

+ /*
+ * Major/minor page fault accounting
+ * (in case of retry we only land here once)
+ */
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

if (likely(!(fault & VM_FAULT_ERROR))) {
- if (flags & FAULT_FLAG_ALLOW_RETRY) {
- /* To avoid updating stats twice for retry case */
- if (fault & VM_FAULT_MAJOR) {
- tsk->maj_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
- regs, address);
- } else {
- tsk->min_flt++;
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
- regs, address);
- }
-
- if (fault & VM_FAULT_RETRY) {
- flags &= ~FAULT_FLAG_ALLOW_RETRY;
- flags |= FAULT_FLAG_TRIED;
- goto retry;
- }
+ if (fault & VM_FAULT_MAJOR) {
+ tsk->maj_flt++;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
+ regs, address);
+ } else {
+ tsk->min_flt++;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
+ regs, address);
}

- /* Fault Handled Gracefully */
+ /* Normal return path: fault Handled Gracefully */
up_read(&mm->mmap_sem);
return;
}
--
2.7.4

2019-05-15 00:32:05

by Vineet Gupta

Subject: [PATCH 2/9] ARC: mm: do_page_fault refactor #1: remove label @good_area

Invert the condition for stack expansion.
No functional change

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 6836095251ed..94d242740ac5 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -100,21 +100,19 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
flags |= FAULT_FLAG_USER;
retry:
down_read(&mm->mmap_sem);
+
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
- if (vma->vm_start <= address)
- goto good_area;
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
- if (expand_stack(vma, address))
- goto bad_area;
+ if (unlikely(address < vma->vm_start)) {
+ if (!(vma->vm_flags & VM_GROWSDOWN) || expand_stack(vma, address))
+ goto bad_area;
+ }

/*
* Ok, we have a good vm_area for this memory access, so
* we can handle it..
*/
-good_area:
si_code = SEGV_ACCERR;

/* Handle protection violation, execute on heap or stack */
--
2.7.4

2019-05-15 00:32:05

by Vineet Gupta

Subject: [PATCH 6/9] ARC: mm: do_page_fault refactor #5: scoot no_context to end

This case is different from the rest of the signal handling code.

No functional change

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 7f211b493170..c0a60aeb4abd 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -201,20 +201,6 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
return;
}

-no_context:
- /* Are we prepared to handle this kernel fault?
- *
- * (The kernel has valid exception-points in the source
- * when it accesses user-memory. When it fails in one
- * of those points, we find it in a table and do a jump
- * to some fixup code that loads an appropriate error
- * code)
- */
- if (fixup_exception(regs))
- return;
-
- die("Oops", regs, address);
-
out_of_memory:
up_read(&mm->mmap_sem);

@@ -233,4 +219,11 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)

tsk->thread.fault_address = address;
force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address, tsk);
+ return;
+
+no_context:
+ if (fixup_exception(regs))
+ return;
+
+ die("Oops", regs, address);
}
--
2.7.4

2019-05-15 00:32:10

by Vineet Gupta

Subject: [PATCH 8/9] ARC: mm: do_page_fault refactor #7: fold the various error handling

- a single up_read() call vs. 4
- much easier on the eyes

Technically it seems like the @bad_area label moved up, but even in the old
regime it was a special case of delivering SIGSEGV unconditionally, which we
still do, albeit now with checks.

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 46 +++++++++++++---------------------------------
1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 0e1a222a97ad..20402217d9da 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -66,7 +66,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
struct vm_area_struct *vma = NULL;
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
- int si_code = SEGV_MAPERR;
+ int sig, si_code = SEGV_MAPERR;
unsigned int write = 0, exec = 0, mask;
vm_fault_t fault; /* handle_mm_fault() output */
unsigned int flags; /* handle_mm_fault() input */
@@ -177,47 +177,27 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
return;
}

- if (fault & VM_FAULT_OOM)
- goto out_of_memory;
- else if (fault & VM_FAULT_SIGSEGV)
- goto bad_area;
- else if (fault & VM_FAULT_SIGBUS)
- goto do_sigbus;
-
- /* no man's land */
- BUG();
-
- /*
- * Something tried to access memory that isn't in our memory map..
- * Fix it, but check if it's kernel or user first..
- */
bad_area:
up_read(&mm->mmap_sem);

if (!user_mode(regs))
goto no_context;

- tsk->thread.fault_address = address;
- force_sig_fault(SIGSEGV, si_code, (void __user *)address, tsk);
- return;
-
-out_of_memory:
- up_read(&mm->mmap_sem);
-
- if (!user_mode(regs))
- goto no_context;
-
- pagefault_out_of_memory();
- return;
-
-do_sigbus:
- up_read(&mm->mmap_sem);
+ if (fault & VM_FAULT_OOM) {
+ pagefault_out_of_memory();
+ return;
+ }

- if (!user_mode(regs))
- goto no_context;
+ if (fault & VM_FAULT_SIGBUS) {
+ sig = SIGBUS;
+ si_code = BUS_ADRERR;
+ }
+ else {
+ sig = SIGSEGV;
+ }

tsk->thread.fault_address = address;
- force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address, tsk);
+ force_sig_fault(sig, si_code, (void __user *)address, tsk);
return;

no_context:
--
2.7.4

2019-05-15 00:32:23

by Vineet Gupta

Subject: [PATCH 3/9] ARC: mm: do_page_fault refactor #2: remove short lived variable

The compiler will do this anyway; still...

No functional change.

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 94d242740ac5..f1175685d914 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -67,23 +67,18 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
int si_code = SEGV_MAPERR;
- int ret;
vm_fault_t fault;
int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;

/*
- * We fault-in kernel-space virtual memory on-demand. The
- * 'reference' page table is init_mm.pgd.
- *
* NOTE! We MUST NOT take any locks for this case. We may
* be in an interrupt or a critical region, and should
* only copy the information from the master page table,
* nothing more.
*/
if (address >= VMALLOC_START && !user_mode(regs)) {
- ret = handle_kernel_vaddr_fault(address);
- if (unlikely(ret))
+ if (unlikely(handle_kernel_vaddr_fault(address)))
goto no_context;
else
return;
--
2.7.4

2019-05-15 00:32:51

by Vineet Gupta

Subject: [PATCH 7/9] ARC: mm: do_page_fault refactor #6: error handlers to use same pattern

- up_read()
- if (!user_mode) goto no_context
- error-specific handling
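
i.e. each handler now reads roughly as follows (an illustrative sketch; the
label name is generic):

------------------------>8-----------------
some_error:
	up_read(&mm->mmap_sem);

	if (!user_mode(regs))
		goto no_context;

	/* error-specific handling: signal delivery, OOM, ... */
	return;
------------------------>8-----------------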

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index c0a60aeb4abd..0e1a222a97ad 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -194,22 +194,21 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
bad_area:
up_read(&mm->mmap_sem);

- /* User mode accesses just cause a SIGSEGV */
- if (user_mode(regs)) {
- tsk->thread.fault_address = address;
- force_sig_fault(SIGSEGV, si_code, (void __user *)address, tsk);
- return;
- }
+ if (!user_mode(regs))
+ goto no_context;
+
+ tsk->thread.fault_address = address;
+ force_sig_fault(SIGSEGV, si_code, (void __user *)address, tsk);
+ return;

out_of_memory:
up_read(&mm->mmap_sem);

- if (user_mode(regs)) {
- pagefault_out_of_memory();
- return;
- }
+ if (!user_mode(regs))
+ goto no_context;

- goto no_context;
+ pagefault_out_of_memory();
+ return;

do_sigbus:
up_read(&mm->mmap_sem);
--
2.7.4

2019-05-15 00:33:27

by Vineet Gupta

Subject: [PATCH 1/9] ARC: mm: SIGSEGV userspace trying to access kernel virtual memory

From: Eugeniy Paltsev <[email protected]>

As of today, if a userspace process tries to access a kernel virtual address
(0x7000_0000 to 0x7fff_ffff) for which a legit kernel mapping already exists,
that process hangs instead of being killed with SIGSEGV.

Fix that by ensuring that do_page_fault() handles a kernel vaddr only when
running in kernel mode.

And given this, we can also simplify the code a bit. A vmalloc fault now
implies kernel mode, so its failure (for whatever reason) can reuse the
@no_context label, and we can remove @bad_area_nosemaphore.

Userspace test to reproduce the original problem:

------------------------>8-----------------
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
	volatile uint32_t temp;

	/* read from a kernel virtual address: expect SIGSEGV, not a hang */
	temp = *(uint32_t *)(0x70000000);

	return 0;
}
------------------------>8-----------------
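
On an affected kernel this program hangs on the invalid load; with the fix it
is killed with SIGSEGV as intended.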

Cc: <[email protected]>
Signed-off-by: Eugeniy Paltsev <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 8df1638259f3..6836095251ed 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -66,7 +66,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
struct vm_area_struct *vma = NULL;
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
- int si_code = 0;
+ int si_code = SEGV_MAPERR;
int ret;
vm_fault_t fault;
int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
@@ -81,16 +81,14 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
* only copy the information from the master page table,
* nothing more.
*/
- if (address >= VMALLOC_START) {
+ if (address >= VMALLOC_START && !user_mode(regs)) {
ret = handle_kernel_vaddr_fault(address);
if (unlikely(ret))
- goto bad_area_nosemaphore;
+ goto no_context;
else
return;
}

- si_code = SEGV_MAPERR;
-
/*
* If we're in an interrupt or have no user
* context, we must not take the fault..
@@ -198,7 +196,6 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
bad_area:
up_read(&mm->mmap_sem);

-bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (user_mode(regs)) {
tsk->thread.fault_address = address;
--
2.7.4

2019-05-15 00:34:00

by Vineet Gupta

Subject: [PATCH 9/9] ARC: mm: do_page_fault refactor #8: release mmap_sem sooner

In case of successful page fault handling, this patch releases mmap_sem
before updating the perf stat events for major/minor faults. So even
though the contention reduction is NOT super high, it is still an
improvement.
improvement.

There's an additional code size improvement as we only have 2 up_read()
calls now.

Note to myself:
--------------

1. Given the way it is done, we are forced to move the @bad_area label earlier,
causing the various "goto bad_area" cases to hit the perf stat code.

- PERF_COUNT_SW_PAGE_FAULTS is NOW updated for access errors, which is what
arm/arm64 seem to be doing as well (with slightly different code)
- PERF_COUNT_SW_PAGE_FAULTS_{MAJ,MIN} must NOT be updated for the error
case; this is guarded by setting the initial value of @fault to
VM_FAULT_ERROR, which covers both the case where handle_mm_fault()
returns an error and the case where it is not called at all.

2. arm/arm64 use two homebrew fault flags, VM_FAULT_BAD{MAP,MAPACCESS},
which I was inclined to add too, but they seem not needed for ARC:

- given that everything is in 1 function, we can still use goto
- we set up si_code at the right place (arm* do that at the end)
- we already init @fault to an error value, which guards entry into the
perf stats event update

Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 20402217d9da..e93ea06c214c 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -68,7 +68,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
struct mm_struct *mm = tsk->mm;
int sig, si_code = SEGV_MAPERR;
unsigned int write = 0, exec = 0, mask;
- vm_fault_t fault; /* handle_mm_fault() output */
+ vm_fault_t fault = VM_FAULT_ERROR; /* handle_mm_fault() output */
unsigned int flags; /* handle_mm_fault() input */

/*
@@ -155,6 +155,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
}
}

+bad_area:
+ up_read(&mm->mmap_sem);
+
/*
* Major/minor page fault accounting
* (in case of retry we only land here once)
@@ -173,13 +176,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
}

/* Normal return path: fault Handled Gracefully */
- up_read(&mm->mmap_sem);
return;
}

-bad_area:
- up_read(&mm->mmap_sem);
-
if (!user_mode(regs))
goto no_context;

--
2.7.4

2019-05-15 00:35:09

by Vineet Gupta

Subject: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code

The coding pattern of NOT initializing variables at declaration time, but
rather near the code which makes use of them, makes it much easier to grok
the overall logic, especially when the init is not simply 0 or 1.
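
For instance, with this patch @mask is computed right before the vma
permission check that consumes it (excerpted from the diff below):

------------------------>8-----------------
mask = VM_READ;
if (write)
	mask = VM_WRITE;
if (exec)
	mask = VM_EXEC;

if (!(vma->vm_flags & mask)) {
	si_code = SEGV_ACCERR;
	goto bad_area;
}
------------------------>8-----------------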

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/mm/fault.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index f1175685d914..ae890a8d5ebf 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -67,9 +67,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
int si_code = SEGV_MAPERR;
+ unsigned int write = 0, exec = 0, mask;
vm_fault_t fault;
- int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
- unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ unsigned int flags;

/*
* NOTE! We MUST NOT take any locks for this case. We may
@@ -91,8 +91,18 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
if (faulthandler_disabled() || !mm)
goto no_context;

+ if (regs->ecr_cause & ECR_C_PROTV_STORE) /* ST/EX */
+ write = 1;
+ else if ((regs->ecr_vec == ECR_V_PROTV) &&
+ (regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
+ exec = 1;
+
+ flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
+ if (write)
+ flags |= FAULT_FLAG_WRITE;
+
retry:
down_read(&mm->mmap_sem);

@@ -105,24 +115,17 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
}

/*
- * Ok, we have a good vm_area for this memory access, so
- * we can handle it..
+ * vm_area is good, now check permissions for this memory access
*/
- si_code = SEGV_ACCERR;
-
- /* Handle protection violation, execute on heap or stack */
-
- if ((regs->ecr_vec == ECR_V_PROTV) &&
- (regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
+ mask = VM_READ;
+ if (write)
+ mask = VM_WRITE;
+ if (exec)
+ mask = VM_EXEC;
+
+ if (!(vma->vm_flags & mask)) {
+ si_code = SEGV_ACCERR;
goto bad_area;
-
- if (write) {
- if (!(vma->vm_flags & VM_WRITE))
- goto bad_area;
- flags |= FAULT_FLAG_WRITE;
- } else {
- if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
- goto bad_area;
}

/*
--
2.7.4

2019-05-16 18:36:04

by Eugeniy Paltsev

Subject: Re: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code

On Tue, 2019-05-14 at 17:29 -0700, Vineet Gupta wrote:
> The coding pattern of NOT initializing variables at declaration time, but
> rather near the code which makes use of them, makes it much easier to grok
> the overall logic, especially when the init is not simply 0 or 1.
>
> Signed-off-by: Vineet Gupta <[email protected]>
> ---
> arch/arc/mm/fault.c | 39 +++++++++++++++++++++------------------
> 1 file changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
> index f1175685d914..ae890a8d5ebf 100644
> --- a/arch/arc/mm/fault.c
> +++ b/arch/arc/mm/fault.c
> @@ -67,9 +67,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> struct task_struct *tsk = current;
> struct mm_struct *mm = tsk->mm;
> int si_code = SEGV_MAPERR;
> + unsigned int write = 0, exec = 0, mask;

Probably it's better to use the 'bool' type for 'write' and 'exec', as we really use them as boolean variables.


> vm_fault_t fault;
> - int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
> - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + unsigned int flags;
>
> /*
> * NOTE! We MUST NOT take any locks for this case. We may
> @@ -91,8 +91,18 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> if (faulthandler_disabled() || !mm)
> goto no_context;
>
> + if (regs->ecr_cause & ECR_C_PROTV_STORE) /* ST/EX */
> + write = 1;
> + else if ((regs->ecr_vec == ECR_V_PROTV) &&
> + (regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
> + exec = 1;
> +
> + flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> if (user_mode(regs))
> flags |= FAULT_FLAG_USER;
> + if (write)
> + flags |= FAULT_FLAG_WRITE;
> +
> retry:
> down_read(&mm->mmap_sem);
>
> @@ -105,24 +115,17 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> }
>
> /*
> - * Ok, we have a good vm_area for this memory access, so
> - * we can handle it..
> + * vm_area is good, now check permissions for this memory access
> */
> - si_code = SEGV_ACCERR;
> -
> - /* Handle protection violation, execute on heap or stack */
> -
> - if ((regs->ecr_vec == ECR_V_PROTV) &&
> - (regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
> + mask = VM_READ;
> + if (write)
> + mask = VM_WRITE;
> + if (exec)
> + mask = VM_EXEC;
> +
> + if (!(vma->vm_flags & mask)) {
> + si_code = SEGV_ACCERR;
> goto bad_area;
> -
> - if (write) {
> - if (!(vma->vm_flags & VM_WRITE))
> - goto bad_area;
> - flags |= FAULT_FLAG_WRITE;
> - } else {
> - if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
> - goto bad_area;
> }
>
> /*
--
Eugeniy Paltsev

2019-05-16 19:26:16

by Vineet Gupta

Subject: Re: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code

On 5/16/19 10:44 AM, Alexey Brodkin wrote:
>> On 5/16/19 10:24 AM, Eugeniy Paltsev wrote:
>>>> + unsigned int write = 0, exec = 0, mask;
>>> Probably it's better to use 'bool' type for 'write' and 'exec' as we really use them as boolean variables.
>>
>> Right those are semantics, but the generated code for "bool" is not ideal - given
>> it is inherently a "char" it is promoted first to an int with an additional EXTB
>> which I really dislike.
>> Guess it is more of a style thing.
>
> In that sense maybe think about re-definition of "bool" type to 32-bit one
> for entire architecture and get that benefit across the entire source tree?

What bool maps to is driven by the ABI, and while it is not explicit in the
ARCv2 ABI doc, I guess it is a byte and hence can't be changed just like that.

-Vineet

2019-05-16 20:37:01

by Vineet Gupta

Subject: Re: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code

On 5/16/19 10:24 AM, Eugeniy Paltsev wrote:
>> + unsigned int write = 0, exec = 0, mask;
> Probably it's better to use 'bool' type for 'write' and 'exec' as we really use them as a boolean variables.

Right, those are the semantics, but the generated code for "bool" is not
ideal: given that it is inherently a "char", it is first promoted to an int
with an additional EXTB, which I really dislike.
Guess it is more of a style thing.
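
A minimal example of what I mean (hypothetical snippet; the exact codegen
depends on the compiler version):

------------------------>8-----------------
#include <stdbool.h>

bool flag_b;		/* byte-sized per the ABI */
unsigned int flag_u;	/* natural register width */

int use_b(void) { return flag_b; }	/* seen here: ldb + extra extb_s */
int use_u(void) { return flag_u; }	/* plain ld expected, no widening */
------------------------>8-----------------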

-Vineet

2019-05-16 20:49:08

by Alexey Brodkin

Subject: RE: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code

Hi Vineet,

> -----Original Message-----
> From: Vineet Gupta <[email protected]>
> Sent: Thursday, May 16, 2019 8:38 PM
> To: Eugeniy Paltsev <[email protected]>
> Cc: [email protected]; [email protected]; Alexey Brodkin <[email protected]>; linux-
> [email protected]
> Subject: Re: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code
>
> On 5/16/19 10:24 AM, Eugeniy Paltsev wrote:
> >> + unsigned int write = 0, exec = 0, mask;
> > Probably it's better to use 'bool' type for 'write' and 'exec' as we really use them as boolean variables.
>
> Right those are semantics, but the generated code for "bool" is not ideal - given
> it is inherently a "char" it is promoted first to an int with an additional EXTB
> which I really dislike.
> Guess it is more of a style thing.

In that sense, maybe think about redefining the "bool" type as a 32-bit one
for the entire architecture and get that benefit across the entire source tree?

-Alexey

2019-05-17 23:16:07

by Eugeniy Paltsev

Subject: Re: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code

Hmmm,

so a load of the bool variable from memory is converted to the following asm code:

----------------->8-------------------
ldb r2,[some_bool_address]
extb_s r2,r2
----------------->8-------------------

Could you please explain what magic is going on there?

This extb_s instruction looks completely useless here, according to the LDB description from the PRM:
----------------->8-------------------
LD LDH LDW LDB LDD:
The size of the requested data is specified by the data size field <.zz> and by default, data is zero
extended from the most-significant bit of the data to the most-significant bit of the destination
register.
----------------->8-------------------

Am I missing something?

On Thu, 2019-05-16 at 17:37 +0000, Vineet Gupta wrote:
> On 5/16/19 10:24 AM, Eugeniy Paltsev wrote:
> > > + unsigned int write = 0, exec = 0, mask;
> >
> > Probably it's better to use 'bool' type for 'write' and 'exec' as we really use them as boolean variables.
>
> Right those are semantics, but the generated code for "bool" is not ideal - given
> it is inherently a "char" it is promoted first to an int with an additional EXTB
> which I really dislike.
> Guess it is more of a style thing.
>
> -Vineet
--
Eugeniy Paltsev

2019-05-30 16:50:17

by Vineet Gupta

Subject: Re: [PATCH 9/9] ARC: mm: do_page_fault refactor #8: release mmap_sem sooner

On 5/14/19 5:29 PM, Vineet Gupta wrote:
> In case of successful page fault handling, this patch releases mmap_sem
> before updating the perf stat event for major/minor faults. So even
> though the contention reduction is NOT super high, it is still an
> improvement.

This patch causes a regression: LMBench lat_sig triggers a bogus OOM.

The issue is not due to the code movement of up_read(), but is an artifact of
@fault being initialized to VM_FAULT_ERROR. The lat_sig invalid access doesn't
go through handle_mm_fault(); instead it hits the !VM_WRITE case for a write
access, leading to use of the "initialized" value of @fault, VM_FAULT_ERROR,
which includes VM_FAULT_OOM, hence the bogus OOM handling.
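
In code terms, the failing path is roughly (an annotated sketch of the code
as of this series, not a new change):

------------------------>8-----------------
vm_fault_t fault = VM_FAULT_ERROR;	/* VM_FAULT_ERROR includes VM_FAULT_OOM */
...
if (!(vma->vm_flags & mask)) {		/* lat_sig's bad write lands here */
	si_code = SEGV_ACCERR;
	goto bad_area;			/* handle_mm_fault() never runs */
}
...
bad_area:
	up_read(&mm->mmap_sem);
	...
	if (fault & VM_FAULT_OOM) {	/* true from the init value: bogus OOM */
		pagefault_out_of_memory();
		return;
	}
------------------------>8-----------------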


> There's an additional code size improvement as we only have 2 up_read()
> calls now.
>
> Note to myself:
> --------------
>
> 1. Given the way it is done, we are forced to move @bad_area label earlier
> causing the various "goto bad_area" cases to hit perf stat code.
>
> - PERF_COUNT_SW_PAGE_FAULTS is NOW updated for access errors which is what
> arm/arm64 seem to be doing as well (with slightly different code)
> - PERF_COUNT_SW_PAGE_FAULTS_{MAJ,MIN} must NOT be updated for the
> error case which is guarded by now setting @fault initial value
> to VM_FAULT_ERROR which serves both cases when handle_mm_fault()
> returns error or is not called at all.
>
> 2. arm/arm64 use two homebrew fault flags VM_FAULT_BAD{MAP,MAPACCESS}
> which I was inclined to add too but seems not needed for ARC
>
> - given that everything is in 1 function we can still use goto
> - we setup si_code at the right place (arm* do that in the end)
> - we init fault already to error value which guards entry into perf
> stats event update
>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Vineet Gupta <[email protected]>
> ---
> arch/arc/mm/fault.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
> index 20402217d9da..e93ea06c214c 100644
> --- a/arch/arc/mm/fault.c
> +++ b/arch/arc/mm/fault.c
> @@ -68,7 +68,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> struct mm_struct *mm = tsk->mm;
> int sig, si_code = SEGV_MAPERR;
> unsigned int write = 0, exec = 0, mask;
> - vm_fault_t fault; /* handle_mm_fault() output */
> + vm_fault_t fault = VM_FAULT_ERROR; /* handle_mm_fault() output */
> unsigned int flags; /* handle_mm_fault() input */
>
> /*
> @@ -155,6 +155,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> }
> }
>
> +bad_area:
> + up_read(&mm->mmap_sem);
> +
> /*
> * Major/minor page fault accounting
> * (in case of retry we only land here once)
> @@ -173,13 +176,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> }
>
> /* Normal return path: fault Handled Gracefully */
> - up_read(&mm->mmap_sem);
> return;
> }
>
> -bad_area:
> - up_read(&mm->mmap_sem);
> -
> if (!user_mode(regs))
> goto no_context;
>
>

2019-05-30 18:00:37

by Vineet Gupta

Subject: extraneous generated EXTB (was Re: [PATCH 4/9] ARC: mm: do_page_fault refactor #3: tidyup vma access permission code)

On 5/17/19 3:23 PM, Eugeniy Paltsev wrote:
> Hmmm,
>
> so load the bool variable from memory is converted to such asm code:
>
> ----------------->8-------------------
> ldb r2,[some_bool_address]
> extb_s r2,r2
> ----------------->8-------------------
>
> Could you please describe that the magic is going on there?
>
> This extb_s instruction looks completely useless here, according on the LDB description from PRM:
> ----------------->8-------------------
> LD LDH LDW LDB LDD:
> The size of the requested data is specified by the data size field <.zz> and by default, data is zero
> extended from the most-significant bit of the data to the most-significant bit of the destination
> register.
> ----------------->8-------------------
>
> Am I missing something?


@Claudiu: is that a target-specific optimization/tuning in the ARC backend?


>
> On Thu, 2019-05-16 at 17:37 +0000, Vineet Gupta wrote:
>> On 5/16/19 10:24 AM, Eugeniy Paltsev wrote:
>>>> + unsigned int write = 0, exec = 0, mask;
>>>
>>> Probably it's better to use 'bool' type for 'write' and 'exec' as we really use them as boolean variables.
>>
>> Right those are semantics, but the generated code for "bool" is not ideal - given
>> it is inherently a "char" it is promoted first to an int with an additional EXTB
>> which I really dislike.
>> Guess it is more of a style thing.
>>
>> -Vineet