This patchset splits a previous patch that had the same title as this
cover letter. Comments from Michael have been taken into account.
Christophe Leroy (5):
powerpc/mm: only call store_updates_sp() on stores in do_page_fault()
powerpc/mm: split store_updates_sp() in two parts in do_page_fault()
powerpc/mm: remove a redundant test in do_page_fault()
powerpc/mm: Evaluate user_mode(regs) only once in do_page_fault()
powerpc/mm: The 8xx doesn't call do_page_fault() for breakpoints
arch/powerpc/mm/fault.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
--
2.12.0
Only the get_user() in store_updates_sp() has to be done outside
the mm semaphore. All the comparisons can be done within the semaphore,
and therefore only when really needed.
As we got a DSI exception, the address pointed to by regs->nip is
obviously valid, otherwise we would have had an instruction exception.
So __get_user() can be used instead of get_user().
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/fault.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 67fefb59d40e..9d21e5fd383d 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -73,12 +73,8 @@ static inline int notify_page_fault(struct pt_regs *regs)
* Check whether the instruction at regs->nip is a store using
* an update addressing form which will update r1.
*/
-static int store_updates_sp(struct pt_regs *regs)
+static int store_updates_sp(unsigned int inst)
{
- unsigned int inst;
-
- if (get_user(inst, (unsigned int __user *)regs->nip))
- return 0;
/* check for 1 in the rA field */
if (((inst >> 16) & 0x1f) != 1)
return 0;
@@ -207,7 +203,8 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
int trap = TRAP(regs);
int is_exec = trap == 0x400;
int fault;
- int rc = 0, store_update_sp = 0;
+ int rc = 0;
+ unsigned int inst = 0;
#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
/*
@@ -288,7 +285,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* mmap_sem held
*/
if (is_write && user_mode(regs))
- store_update_sp = store_updates_sp(regs);
+ __get_user(inst, (unsigned int __user *)regs->nip);
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
@@ -358,7 +355,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* between the last mapped region and the stack will
* expand the stack rather than segfaulting.
*/
- if (address + 2048 < uregs->gpr[1] && !store_update_sp)
+ if (address + 2048 < uregs->gpr[1] && !store_updates_sp(inst))
goto bad_area;
}
if (expand_stack(vma, address))
--
2.12.0
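For readers following the split, here is a minimal userspace sketch of the pure decoding half of store_updates_sp() once the instruction word is passed in directly. The rA test is taken from the diff above; the opcode table is written from memory of the PowerPC encodings and should be checked against the kernel tree before being relied on. uint32_t stands in for the kernel's unsigned int.

```c
#include <stdint.h>

/*
 * Sketch only: returns 1 if 'inst' is a store with update whose
 * rA field names r1 (the stack pointer), 0 otherwise.
 */
static int store_updates_sp(uint32_t inst)
{
	/* check for 1 in the rA field (from the diff above) */
	if (((inst >> 16) & 0x1f) != 1)
		return 0;

	/* major opcode is the top 6 bits (table recalled, not quoted) */
	switch (inst >> 26) {
	case 37:	/* stwu */
	case 39:	/* stbu */
	case 45:	/* sthu */
	case 53:	/* stfsu */
	case 55:	/* stfdu */
		return 1;
	case 62:	/* std or stdu: low two bits equal to 1 select stdu */
		return (inst & 3) == 1;
	case 31:	/* X-form: check the 10-bit extended opcode */
		switch ((inst >> 1) & 0x3ff) {
		case 181:	/* stdux */
		case 183:	/* stwux */
		case 247:	/* stbux */
		case 439:	/* sthux */
		case 695:	/* stfsux */
		case 759:	/* stfdux */
			return 1;
		}
	}
	return 0;
}
```

With this shape the instruction fetch can stay outside mmap_sem while the decoding runs only on the stack-expansion path, which is what the commit message argues for.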
Analysis of the assembly code shows that when using user_mode(regs),
at least the 'andi.' is redone all the time, and also
the 'lwz ,132(r31)' most of the time. With the new form, the 'is_user'
is mapped to cr4, then all further use of is_user results in just
things like 'beq cr4,218 <do_page_fault+0x218>'
Without the patch:
50: 81 1e 00 84 lwz r8,132(r30)
54: 71 09 40 00 andi. r9,r8,16384
58: 40 82 00 0c bne 64 <do_page_fault+0x64>
84: 81 3e 00 84 lwz r9,132(r30)
8c: 71 2a 40 00 andi. r10,r9,16384
90: 41 a2 01 64 beq 1f4 <do_page_fault+0x1f4>
d4: 81 3e 00 84 lwz r9,132(r30)
dc: 71 28 40 00 andi. r8,r9,16384
e0: 41 82 02 08 beq 2e8 <do_page_fault+0x2e8>
108: 81 3e 00 84 lwz r9,132(r30)
110: 71 28 40 00 andi. r8,r9,16384
118: 41 82 02 28 beq 340 <do_page_fault+0x340>
1e4: 81 3e 00 84 lwz r9,132(r30)
1e8: 71 2a 40 00 andi. r10,r9,16384
1ec: 40 82 01 68 bne 354 <do_page_fault+0x354>
228: 81 3e 00 84 lwz r9,132(r30)
22c: 71 28 40 00 andi. r8,r9,16384
230: 41 82 ff c4 beq 1f4 <do_page_fault+0x1f4>
288: 71 2a 40 00 andi. r10,r9,16384
294: 41 a2 fe 60 beq f4 <do_page_fault+0xf4>
50c: 81 3e 00 84 lwz r9,132(r30)
514: 71 2a 40 00 andi. r10,r9,16384
518: 40 a2 fc e0 bne 1f8 <do_page_fault+0x1f8>
534: 81 3e 00 84 lwz r9,132(r30)
53c: 71 2a 40 00 andi. r10,r9,16384
540: 41 82 fc b8 beq 1f8 <do_page_fault+0x1f8>
This patch creates a local var called 'is_user' which contains the
result of user_mode(regs)
With the patch:
20: 81 03 00 84 lwz r8,132(r3)
48: 55 09 97 fe rlwinm r9,r8,18,31,31
58: 2e 09 00 00 cmpwi cr4,r9,0
5c: 40 92 00 0c bne cr4,68 <do_page_fault+0x68>
88: 41 b2 01 90 beq cr4,218 <do_page_fault+0x218>
d4: 40 92 01 d0 bne cr4,2a4 <do_page_fault+0x2a4>
120: 41 b2 00 f8 beq cr4,218 <do_page_fault+0x218>
138: 41 b2 ff a0 beq cr4,d8 <do_page_fault+0xd8>
1d4: 40 92 00 e0 bne cr4,2b4 <do_page_fault+0x2b4>
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/fault.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b56bf472db6d..8d1639eee3af 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -202,6 +202,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
int is_write = 0;
int trap = TRAP(regs);
int is_exec = trap == 0x400;
+ int is_user = user_mode(regs);
int fault;
int rc = 0;
unsigned int inst = 0;
@@ -244,7 +245,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* The kernel should never take an execute fault nor should it
* take a page fault to a kernel address.
*/
- if (!user_mode(regs) && (is_exec || (address >= TASK_SIZE))) {
+ if (!is_user && (is_exec || (address >= TASK_SIZE))) {
rc = SIGSEGV;
goto bail;
}
@@ -263,7 +264,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
local_irq_enable();
if (faulthandler_disabled() || mm == NULL) {
- if (!user_mode(regs)) {
+ if (!is_user) {
rc = SIGSEGV;
goto bail;
}
@@ -284,10 +285,10 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* can result in fault, which will cause a deadlock when called with
* mmap_sem held
*/
- if (is_write && user_mode(regs))
+ if (is_write && is_user)
__get_user(inst, (unsigned int __user *)regs->nip);
- if (user_mode(regs))
+ if (is_user)
flags |= FAULT_FLAG_USER;
/* When running in the kernel we expect faults to occur only to
@@ -306,7 +307,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* thus avoiding the deadlock.
*/
if (!down_read_trylock(&mm->mmap_sem)) {
- if (!user_mode(regs) && !search_exception_tables(regs->nip))
+ if (!is_user && !search_exception_tables(regs->nip))
goto bad_area_nosemaphore;
retry:
@@ -506,7 +507,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
bad_area_nosemaphore:
/* User mode accesses cause a SIGSEGV */
- if (user_mode(regs)) {
+ if (is_user) {
_exception(SIGSEGV, regs, code, address);
goto bail;
}
--
2.12.0
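As a reading aid for the two listings in this patch, the sketch below models what the compiler is testing. The struct is a hypothetical stand-in that only mirrors the 32-bit layout implied by 'lwz rX,132(rY)'; MSR_PR is written as 0x4000 to match the 16384 in the 'andi.' lines. Neither definition is copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_PR 0x4000u	/* problem-state bit, 16384 in the listings */

/* Hypothetical stand-in for the 32-bit register frame */
struct mock_pt_regs {
	uint32_t gpr[32];	/* bytes 0..127 */
	uint32_t nip;		/* byte 128 */
	uint32_t msr;		/* byte 132, the 'lwz rX,132(rY)' operand */
};

/* The test repeated at each call site in the unpatched listing */
static int user_mode(const struct mock_pt_regs *regs)
{
	return (regs->msr & MSR_PR) != 0;
}

/* What 'rlwinm r9,r8,18,31,31' computes: bit 14 of the MSR as 0 or 1 */
static int is_user_bit(uint32_t msr)
{
	return (msr >> 14) & 1;
}
```

Storing the result once in a local lets the compiler keep the comparison in a condition register field, which is why the patched listing shows only branches on cr4 after the initial test.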
The 8xx has a dedicated exception for breakpoints, that directly
calls do_break()
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/fault.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 8d1639eee3af..400f2d0d42f8 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -251,7 +251,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
}
#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE) || \
- defined(CONFIG_PPC_BOOK3S_64))
+ defined(CONFIG_PPC_BOOK3S_64) || defined(CONFIG_PPC_8xx))
if (error_code & DSISR_DABRMATCH) {
/* breakpoint match */
do_break(regs, address, error_code);
--
2.12.0
Function store_updates_sp() checks whether the faulting
instruction is a store updating r1. Therefore we can limit its calls
to store exceptions.
This patch is an improvement of commit a7a9dcd882a67 ("powerpc: Avoid
taking a data miss on every userspace instruction miss")
With the same microbenchmark app, run with 500 as argument, on an
MPC885 we get:
Before this patch: 152000 DTLB misses
After this patch: 147000 DTLB misses
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/fault.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 3a7d580fdc59..67fefb59d40e 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* can result in fault, which will cause a deadlock when called with
* mmap_sem held
*/
- if (!is_exec && user_mode(regs))
+ if (is_write && user_mode(regs))
store_update_sp = store_updates_sp(regs);
if (user_mode(regs))
--
2.12.0
The result of (trap == 0x400) is already in is_exec.
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/fault.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 9d21e5fd383d..b56bf472db6d 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -213,7 +213,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* bits we are interested in. But there are some bits which
* indicate errors in DSISR but can validly be set in SRR1.
*/
- if (trap == 0x400)
+ if (is_exec)
error_code &= 0x48200000;
else
is_write = error_code & DSISR_ISSTORE;
--
2.12.0
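The classification that this hunk touches can be sketched in isolation as follows. The helper name and the struct are hypothetical, and the value of DSISR_ISSTORE is an assumption written from memory rather than taken from this thread. The kernel code keeps is_write as the raw masked value; the sketch normalises it to 0 or 1 for readability.

```c
#include <stdint.h>

#define DSISR_ISSTORE 0x02000000u	/* assumed value: access was a store */

/* Hypothetical container for the three values derived at function entry */
struct fault_kind {
	int is_exec;
	int is_write;
	uint32_t error_code;
};

static struct fault_kind classify_fault(unsigned int trap, uint32_t error_code)
{
	struct fault_kind k = {
		.is_exec = (trap == 0x400),	/* computed once, as in the patch */
		.is_write = 0,
		.error_code = error_code,
	};

	if (k.is_exec)
		k.error_code &= 0x48200000u;	/* keep only the bits of interest */
	else
		k.is_write = (error_code & DSISR_ISSTORE) != 0;
	return k;
}
```

An instruction fault therefore never sets is_write, which is also why testing is_write instead of !is_exec narrows the store_updates_sp() path to store faults.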
Christophe Leroy <[email protected]> writes:
> Function store_updates_sp() checks whether the faulting
> instruction is a store updating r1. Therefore we can limit its calls
> to stores exceptions.
>
> This patch is an improvement of commit a7a9dcd882a67 ("powerpc: Avoid
> taking a data miss on every userspace instruction miss")
>
> With the same microbenchmark app, run with 500 as argument, on an
> MPC885 we get:
>
> Before this patch: 152000 DTLB misses
> After this patch: 147000 DTLB misses
>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Signed-off-by: Christophe Leroy <[email protected]>
> ---
> arch/powerpc/mm/fault.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 3a7d580fdc59..67fefb59d40e 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * can result in fault, which will cause a deadlock when called with
> * mmap_sem held
> */
> - if (!is_exec && user_mode(regs))
> + if (is_write && user_mode(regs))
> store_update_sp = store_updates_sp(regs);
>
> if (user_mode(regs))
> --
> 2.12.0
Christophe Leroy <[email protected]> writes:
> Only the get_user() in store_updates_sp() has to be done outside
> the mm semaphore. All the comparison can be done within the semaphore,
> so only when really needed.
>
> As we got a DSI exception, the address pointed by regs->nip is
> obviously valid, otherwise we would have had a instruction exception.
> So __get_user() can be used instead of get_user()
>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Signed-off-by: Christophe Leroy <[email protected]>
> ---
> arch/powerpc/mm/fault.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 67fefb59d40e..9d21e5fd383d 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -73,12 +73,8 @@ static inline int notify_page_fault(struct pt_regs *regs)
> * Check whether the instruction at regs->nip is a store using
> * an update addressing form which will update r1.
> */
> -static int store_updates_sp(struct pt_regs *regs)
> +static int store_updates_sp(unsigned int inst)
> {
> - unsigned int inst;
> -
> - if (get_user(inst, (unsigned int __user *)regs->nip))
> - return 0;
> /* check for 1 in the rA field */
> if (((inst >> 16) & 0x1f) != 1)
> return 0;
> @@ -207,7 +203,8 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> int trap = TRAP(regs);
> int is_exec = trap == 0x400;
> int fault;
> - int rc = 0, store_update_sp = 0;
> + int rc = 0;
> + unsigned int inst = 0;
>
> #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> /*
> @@ -288,7 +285,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * mmap_sem held
> */
> if (is_write && user_mode(regs))
> - store_update_sp = store_updates_sp(regs);
> + __get_user(inst, (unsigned int __user *)regs->nip);
>
> if (user_mode(regs))
> flags |= FAULT_FLAG_USER;
> @@ -358,7 +355,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * between the last mapped region and the stack will
> * expand the stack rather than segfaulting.
> */
> - if (address + 2048 < uregs->gpr[1] && !store_update_sp)
> + if (address + 2048 < uregs->gpr[1] && !store_updates_sp(inst))
> goto bad_area;
> }
> if (expand_stack(vma, address))
> --
> 2.12.0
Christophe Leroy <[email protected]> writes:
> Analysis of the assembly code shows that when using user_mode(regs),
> at least the 'andi.' is redone all the time, and also
> the 'lwz ,132(r31)' most of the time. With the new form, the 'is_user'
> is mapped to cr4, then all further use of is_user results in just
> things like 'beq cr4,218 <do_page_fault+0x218>'
>
> Without the patch:
>
> 50: 81 1e 00 84 lwz r8,132(r30)
> 54: 71 09 40 00 andi. r9,r8,16384
> 58: 40 82 00 0c bne 64 <do_page_fault+0x64>
>
> 84: 81 3e 00 84 lwz r9,132(r30)
> 8c: 71 2a 40 00 andi. r10,r9,16384
> 90: 41 a2 01 64 beq 1f4 <do_page_fault+0x1f4>
>
> d4: 81 3e 00 84 lwz r9,132(r30)
> dc: 71 28 40 00 andi. r8,r9,16384
> e0: 41 82 02 08 beq 2e8 <do_page_fault+0x2e8>
>
> 108: 81 3e 00 84 lwz r9,132(r30)
> 110: 71 28 40 00 andi. r8,r9,16384
> 118: 41 82 02 28 beq 340 <do_page_fault+0x340>
>
> 1e4: 81 3e 00 84 lwz r9,132(r30)
> 1e8: 71 2a 40 00 andi. r10,r9,16384
> 1ec: 40 82 01 68 bne 354 <do_page_fault+0x354>
>
> 228: 81 3e 00 84 lwz r9,132(r30)
> 22c: 71 28 40 00 andi. r8,r9,16384
> 230: 41 82 ff c4 beq 1f4 <do_page_fault+0x1f4>
>
> 288: 71 2a 40 00 andi. r10,r9,16384
> 294: 41 a2 fe 60 beq f4 <do_page_fault+0xf4>
>
> 50c: 81 3e 00 84 lwz r9,132(r30)
> 514: 71 2a 40 00 andi. r10,r9,16384
> 518: 40 a2 fc e0 bne 1f8 <do_page_fault+0x1f8>
>
> 534: 81 3e 00 84 lwz r9,132(r30)
> 53c: 71 2a 40 00 andi. r10,r9,16384
> 540: 41 82 fc b8 beq 1f8 <do_page_fault+0x1f8>
>
> This patch creates a local var called 'is_user' which contains the
> result of user_mode(regs)
>
> With the patch:
>
> 20: 81 03 00 84 lwz r8,132(r3)
> 48: 55 09 97 fe rlwinm r9,r8,18,31,31
> 58: 2e 09 00 00 cmpwi cr4,r9,0
> 5c: 40 92 00 0c bne cr4,68 <do_page_fault+0x68>
>
> 88: 41 b2 01 90 beq cr4,218 <do_page_fault+0x218>
>
> d4: 40 92 01 d0 bne cr4,2a4 <do_page_fault+0x2a4>
>
> 120: 41 b2 00 f8 beq cr4,218 <do_page_fault+0x218>
>
> 138: 41 b2 ff a0 beq cr4,d8 <do_page_fault+0xd8>
>
> 1d4: 40 92 00 e0 bne cr4,2b4 <do_page_fault+0x2b4>
>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Signed-off-by: Christophe Leroy <[email protected]>
> ---
> arch/powerpc/mm/fault.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index b56bf472db6d..8d1639eee3af 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -202,6 +202,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> int is_write = 0;
> int trap = TRAP(regs);
> int is_exec = trap == 0x400;
> + int is_user = user_mode(regs);
> int fault;
> int rc = 0;
> unsigned int inst = 0;
> @@ -244,7 +245,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * The kernel should never take an execute fault nor should it
> * take a page fault to a kernel address.
> */
> - if (!user_mode(regs) && (is_exec || (address >= TASK_SIZE))) {
> + if (!is_user && (is_exec || (address >= TASK_SIZE))) {
> rc = SIGSEGV;
> goto bail;
> }
> @@ -263,7 +264,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> local_irq_enable();
>
> if (faulthandler_disabled() || mm == NULL) {
> - if (!user_mode(regs)) {
> + if (!is_user) {
> rc = SIGSEGV;
> goto bail;
> }
> @@ -284,10 +285,10 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * can result in fault, which will cause a deadlock when called with
> * mmap_sem held
> */
> - if (is_write && user_mode(regs))
> + if (is_write && is_user)
> __get_user(inst, (unsigned int __user *)regs->nip);
>
> - if (user_mode(regs))
> + if (is_user)
> flags |= FAULT_FLAG_USER;
>
> /* When running in the kernel we expect faults to occur only to
> @@ -306,7 +307,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * thus avoiding the deadlock.
> */
> if (!down_read_trylock(&mm->mmap_sem)) {
> - if (!user_mode(regs) && !search_exception_tables(regs->nip))
> + if (!is_user && !search_exception_tables(regs->nip))
> goto bad_area_nosemaphore;
>
> retry:
> @@ -506,7 +507,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>
> bad_area_nosemaphore:
> /* User mode accesses cause a SIGSEGV */
> - if (user_mode(regs)) {
> + if (is_user) {
> _exception(SIGSEGV, regs, code, address);
> goto bail;
> }
> --
> 2.12.0
Christophe Leroy <[email protected]> writes:
> Only the get_user() in store_updates_sp() has to be done outside
> the mm semaphore. All the comparison can be done within the semaphore,
> so only when really needed.
>
> As we got a DSI exception, the address pointed by regs->nip is
> obviously valid, otherwise we would have had a instruction exception.
> So __get_user() can be used instead of get_user()
I don't think that part is true.
You took a DSI so there *was* an instruction at NIP, but since then it
may have been unmapped by another thread.
So I don't think you can assume the get_user() will succeed.
cheers
On 02/06/2017 at 11:26, Michael Ellerman wrote:
> Christophe Leroy <[email protected]> writes:
>
>> Only the get_user() in store_updates_sp() has to be done outside
>> the mm semaphore. All the comparison can be done within the semaphore,
>> so only when really needed.
>>
>> As we got a DSI exception, the address pointed by regs->nip is
>> obviously valid, otherwise we would have had a instruction exception.
>> So __get_user() can be used instead of get_user()
>
> I don't think that part is true.
>
> You took a DSI so there *was* an instruction at NIP, but since then it
> may have been unmapped by another thread.
>
> So I don't think you can assume the get_user() will succeed.
>
The difference between get_user() and __get_user() is that get_user()
performs an access_ok() in addition.
Doesn't access_ok() only check whether addr is below TASK_SIZE to
ensure it is a valid user address?
Christophe
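To make the question concrete, here is a deliberately simplified model of the kind of range check being discussed. MOCK_TASK_SIZE and mock_access_ok() are hypothetical names with an arbitrary boundary; the real macro is more involved and depends on the platform configuration.

```c
#include <stdint.h>

#define MOCK_TASK_SIZE 0x80000000u	/* hypothetical user/kernel boundary */

/*
 * Range check only: it answers "does [addr, addr + size) lie below the
 * boundary?" and says nothing about whether the page is still mapped.
 */
static int mock_access_ok(uint32_t addr, uint32_t size)
{
	return addr < MOCK_TASK_SIZE && size <= MOCK_TASK_SIZE - addr;
}
```

Under this model, an address that passes the check can still fault when it is read, so the check and the handling of a failed read are separate concerns.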
On Fri, 2017-06-02 at 11:39 +0200, Christophe LEROY wrote:
> The difference between get_user() and __get_user() is that get_user()
> performs an access_ok() in addition.
>
> Doesn't access_ok() only checks whether addr is below TASK_SIZE to
> ensure it is a valid user address ?
Do you have a measurable improvement by skipping that check ? I agree
with your reasoning but I'm also paranoid and so I wouldn't change it
unless it's really worth it.
Cheers,
Ben.
On 02/06/2017 at 14:11, Benjamin Herrenschmidt wrote:
> On Fri, 2017-06-02 at 11:39 +0200, Christophe LEROY wrote:
>> The difference between get_user() and __get_user() is that get_user()
>> performs an access_ok() in addition.
>>
>> Doesn't access_ok() only checks whether addr is below TASK_SIZE to
>> ensure it is a valid user address ?
>
> Do you have a measurable improvement by skipping that check ? I agree
> with your reasoning but I'm also paranoid and so I wouldn't change it
> unless it's really worth it.
>
No, I don't. Taking into account the patch following this series,
which limits the calls to get_user() even more, it is probably not worth
it anymore (see https://patchwork.ozlabs.org/patch/757564/)
I will then have to resubmit the entire series (including that additional
one), but there is no get_user_inatomic(), so I will have to either:
- do the access_ok() verification inside the function
- get back to v2 (https://patchwork.ozlabs.org/patch/756234/)
- implement a get_user_inatomic() function
Which would be best?
Christophe
On Wed, 2017-04-19 at 12:56:24 UTC, Christophe Leroy wrote:
> Function store_updates_sp() checks whether the faulting
> instruction is a store updating r1. Therefore we can limit its calls
> to stores exceptions.
>
> This patch is an improvement of commit a7a9dcd882a67 ("powerpc: Avoid
> taking a data miss on every userspace instruction miss")
>
> With the same microbenchmark app, run with 500 as argument, on an
> MPC885 we get:
>
> Before this patch: 152000 DTLB misses
> After this patch: 147000 DTLB misses
>
> Signed-off-by: Christophe Leroy <[email protected]>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
Patches 1, 3, 4 and 5 applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/e8de85ca32f572f5dee00733022d8a
cheers
Christophe LEROY <[email protected]> writes:
> On 02/06/2017 at 11:26, Michael Ellerman wrote:
>> Christophe Leroy <[email protected]> writes:
>>
>>> Only the get_user() in store_updates_sp() has to be done outside
>>> the mm semaphore. All the comparison can be done within the semaphore,
>>> so only when really needed.
>>>
>>> As we got a DSI exception, the address pointed by regs->nip is
>>> obviously valid, otherwise we would have had a instruction exception.
>>> So __get_user() can be used instead of get_user()
>>
>> I don't think that part is true.
>>
>> You took a DSI so there *was* an instruction at NIP, but since then it
>> may have been unmapped by another thread.
>>
>> So I don't think you can assume the get_user() will succeed.
>
> The difference between get_user() and __get_user() is that get_user()
> performs an access_ok() in addition.
>
> Doesn't access_ok() only checks whether addr is below TASK_SIZE to
> ensure it is a valid user address ?
Yeah more or less, via some gross macros.
I was actually not that worried about the switch from get_user() to
__get_user(), but rather that you removed the check of the return value.
ie.
- if (get_user(inst, (unsigned int __user *)regs->nip))
- return 0;
Became:
if (is_write && user_mode(regs))
- store_update_sp = store_updates_sp(regs);
+ __get_user(inst, (unsigned int __user *)regs->nip);
I think dropping the access_ok() probably is alright, because the NIP
must (should!) have been in userspace, though as Ben says it's always
good to be paranoid.
But ignoring that the address can fault at all is wrong AFAICS.
cheers
Christophe LEROY <[email protected]> writes:
> On 02/06/2017 at 14:11, Benjamin Herrenschmidt wrote:
>> On Fri, 2017-06-02 at 11:39 +0200, Christophe LEROY wrote:
>>> The difference between get_user() and __get_user() is that get_user()
>>> performs an access_ok() in addition.
>>>
>>> Doesn't access_ok() only checks whether addr is below TASK_SIZE to
>>> ensure it is a valid user address ?
>>
>> Do you have a measurable improvement by skipping that check ? I agree
>> with your reasoning but I'm also paranoid and so I wouldn't change it
>> unless it's really worth it.
>>
>
> No I don't have. Taking into account the patch following this serie
> which limits even more the calls to get_user(), it is probably not worth
> it anymore (see https://patchwork.ozlabs.org/patch/757564/)
>
> I will then have to resubmit the entire serie (including that additional
> one), but there is no get_user_inatomic() so will have to either:
> - do the access_ok() verification inside the function
I think open coding the access_ok() check is probably the best option.
cheers
On 05/06/2017 at 12:45, Michael Ellerman wrote:
> Christophe LEROY <[email protected]> writes:
>
>> On 02/06/2017 at 11:26, Michael Ellerman wrote:
>>> Christophe Leroy <[email protected]> writes:
>>>
>>>> Only the get_user() in store_updates_sp() has to be done outside
>>>> the mm semaphore. All the comparison can be done within the semaphore,
>>>> so only when really needed.
>>>>
>>>> As we got a DSI exception, the address pointed by regs->nip is
>>>> obviously valid, otherwise we would have had a instruction exception.
>>>> So __get_user() can be used instead of get_user()
>>>
>>> I don't think that part is true.
>>>
>>> You took a DSI so there *was* an instruction at NIP, but since then it
>>> may have been unmapped by another thread.
>>>
>>> So I don't think you can assume the get_user() will succeed.
>>
>> The difference between get_user() and __get_user() is that get_user()
>> performs an access_ok() in addition.
>>
>> Doesn't access_ok() only checks whether addr is below TASK_SIZE to
>> ensure it is a valid user address ?
>
> Yeah more or less, via some gross macros.
>
> I was actually not that worried about the switch from get_user() to
> __get_user(), but rather that you removed the check of the return value.
> ie.
>
> - if (get_user(inst, (unsigned int __user *)regs->nip))
> - return 0;
>
> Became:
>
> if (is_write && user_mode(regs))
> - store_update_sp = store_updates_sp(regs);
> + __get_user(inst, (unsigned int __user *)regs->nip);
>
>
> I think dropping the access_ok() probably is alright, because the NIP
> must (should!) have been in userspace, though as Ben says it's always
> good to be paranoid.
>
> But ignoring that the address can fault at all is wrong AFAICS.
I see what you mean now.
Indeed,
- unsigned int inst;
Became
+ unsigned int inst = 0;
Since __get_user() doesn't modify 'inst' in case of error, 'inst'
remains 0, and store_updates_sp(0) returns false. That was the idea behind it.
Christophe
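The idea described in this message can be modelled in userspace as below. mock_get_user(), ra_is_sp() and read_inst_or_zero() are hypothetical helpers: the first only imitates the behaviour stated above (the destination is left untouched on failure), and the second reproduces just the first test of store_updates_sp(), which is enough to reject an all-zero instruction word.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __get_user(): on failure *dst is not written */
static int mock_get_user(uint32_t *dst, const uint32_t *src)
{
	if (src == NULL)	/* models a fault on the user address */
		return -EFAULT;
	*dst = *src;
	return 0;
}

/* First test of store_updates_sp(): the rA field must name r1 */
static int ra_is_sp(uint32_t inst)
{
	return ((inst >> 16) & 0x1f) == 1;
}

/* Read the instruction and deliberately ignore the return value */
static uint32_t read_inst_or_zero(const uint32_t *nip)
{
	uint32_t inst = 0;	/* this value survives a failed read */

	(void)mock_get_user(&inst, nip);
	return inst;
}
```

A zero word has 0 in the rA field, so the failed-read case falls out at the first comparison without any explicit error check.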
christophe leroy <[email protected]> writes:
> On 05/06/2017 at 12:45, Michael Ellerman wrote:
>> Christophe LEROY <[email protected]> writes:
>>
>>> On 02/06/2017 at 11:26, Michael Ellerman wrote:
>>>> Christophe Leroy <[email protected]> writes:
>>>>
>>>>> Only the get_user() in store_updates_sp() has to be done outside
>>>>> the mm semaphore. All the comparison can be done within the semaphore,
>>>>> so only when really needed.
>>>>>
>>>>> As we got a DSI exception, the address pointed by regs->nip is
>>>>> obviously valid, otherwise we would have had a instruction exception.
>>>>> So __get_user() can be used instead of get_user()
>>>>
>>>> I don't think that part is true.
>>>>
>>>> You took a DSI so there *was* an instruction at NIP, but since then it
>>>> may have been unmapped by another thread.
>>>>
>>>> So I don't think you can assume the get_user() will succeed.
>>>
>>> The difference between get_user() and __get_user() is that get_user()
>>> performs an access_ok() in addition.
>>>
>>> Doesn't access_ok() only checks whether addr is below TASK_SIZE to
>>> ensure it is a valid user address ?
>>
>> Yeah more or less, via some gross macros.
>>
>> I was actually not that worried about the switch from get_user() to
>> __get_user(), but rather that you removed the check of the return value.
>> ie.
>>
>> - if (get_user(inst, (unsigned int __user *)regs->nip))
>> - return 0;
>>
>> Became:
>>
>> if (is_write && user_mode(regs))
>> - store_update_sp = store_updates_sp(regs);
>> + __get_user(inst, (unsigned int __user *)regs->nip);
>>
>>
>> I think dropping the access_ok() probably is alright, because the NIP
>> must (should!) have been in userspace, though as Ben says it's always
>> good to be paranoid.
>>
>> But ignoring that the address can fault at all is wrong AFAICS.
>
> I see what you mean now.
>
> Indeed,
>
> - unsigned int inst;
>
> Became
>
> + unsigned int inst = 0;
>
> Since __get_user() doesn't modify 'inst' in case of error, 'inst'
> remains 0, and store_updates_sp(0) return false. That was the idea behind.
Ugh. OK, my bad. Though it is a little subtle.
How about:
@@ -286,10 +290,13 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
/*
* We want to do this outside mmap_sem, because reading code around nip
* can result in fault, which will cause a deadlock when called with
- * mmap_sem held
+ * mmap_sem held. We don't need to check if get_user() fails, if it does
+ * it won't modify inst, and an inst of 0 will return false from
+ * store_updates_sp().
*/
+ inst = 0;
if (is_write && is_user)
- store_update_sp = store_updates_sp(regs);
+ get_user(inst, (unsigned int __user *)regs->nip);
if (is_user)
flags |= FAULT_FLAG_USER;
Then this one can go in.
cheers
On 06/06/2017 at 13:00, Michael Ellerman wrote:
> christophe leroy <[email protected]> writes:
>
>> On 05/06/2017 at 12:45, Michael Ellerman wrote:
>>> Christophe LEROY <[email protected]> writes:
>>>
>>>> On 02/06/2017 at 11:26, Michael Ellerman wrote:
>>>>> Christophe Leroy <[email protected]> writes:
>>>>>
>>>>>> Only the get_user() in store_updates_sp() has to be done outside
>>>>>> the mm semaphore. All the comparison can be done within the semaphore,
>>>>>> so only when really needed.
>>>>>>
>>>>>> As we got a DSI exception, the address pointed by regs->nip is
>>>>>> obviously valid, otherwise we would have had a instruction exception.
>>>>>> So __get_user() can be used instead of get_user()
>>>>>
>>>>> I don't think that part is true.
>>>>>
>>>>> You took a DSI so there *was* an instruction at NIP, but since then it
>>>>> may have been unmapped by another thread.
>>>>>
>>>>> So I don't think you can assume the get_user() will succeed.
>>>>
>>>> The difference between get_user() and __get_user() is that get_user()
>>>> performs an access_ok() in addition.
>>>>
>>>> Doesn't access_ok() only checks whether addr is below TASK_SIZE to
>>>> ensure it is a valid user address ?
>>>
>>> Yeah more or less, via some gross macros.
>>>
>>> I was actually not that worried about the switch from get_user() to
>>> __get_user(), but rather that you removed the check of the return value.
>>> ie.
>>>
>>> - if (get_user(inst, (unsigned int __user *)regs->nip))
>>> - return 0;
>>>
>>> Became:
>>>
>>> if (is_write && user_mode(regs))
>>> - store_update_sp = store_updates_sp(regs);
>>> + __get_user(inst, (unsigned int __user *)regs->nip);
>>>
>>>
>>> I think dropping the access_ok() probably is alright, because the NIP
>>> must (should!) have been in userspace, though as Ben says it's always
>>> good to be paranoid.
>>>
>>> But ignoring that the address can fault at all is wrong AFAICS.
>>
>> I see what you mean now.
>>
>> Indeed,
>>
>> - unsigned int inst;
>>
>> Became
>>
>> + unsigned int inst = 0;
>>
>> Since __get_user() doesn't modify 'inst' in case of error, 'inst'
>> remains 0, and store_updates_sp(0) return false. That was the idea behind.
>
> Ugh. OK, my bad. Though it is a little subtle.
>
> How about:
>
> @@ -286,10 +290,13 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> /*
> * We want to do this outside mmap_sem, because reading code around nip
> * can result in fault, which will cause a deadlock when called with
> - * mmap_sem held
> + * mmap_sem held. We don't need to check if get_user() fails, if it does
> + * it won't modify inst, and an inst of 0 will return false from
> + * store_updates_sp().
> */
> + inst = 0;
> if (is_write && is_user)
> - store_update_sp = store_updates_sp(regs);
> + get_user(inst, (unsigned int __user *)regs->nip);
>
> if (is_user)
> flags |= FAULT_FLAG_USER;
>
>
> Then this one can go in.
>
I just submitted v4 of the patch "powerpc/mm: Only read faulting
instruction when necessary in do_page_fault()", skipping this step and
going directly to the final solution.
The new approach keeps everything inside the store_updates_sp()
function and just moves the call.
Christophe