2012-06-19 19:48:53

by Oleg Nesterov

Subject: [PATCH 0/5] uprobes: write_opcode() cleanups

Hello,

write_opcode() cleanups I promised before.

Srikar, please review ;)

Oleg.


2012-06-19 19:49:08

by Oleg Nesterov

Subject: [PATCH 1/5] uprobes: don't recheck vma/f_mapping in write_opcode()

write_opcode() rechecks valid_vma() and ->f_mapping, this is pointless.
The caller, register_for_each_vma() or uprobe_mmap(), has already done
these checks under mmap_sem.

To clarify, uprobe_mmap() checks valid_vma() only, but we can rely on
build_probe_list(vm_file->f_mapping->host).

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 19 +------------------
1 files changed, 1 insertions(+), 18 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f935327..a2b32a5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -206,33 +206,16 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t opcode)
{
struct page *old_page, *new_page;
- struct address_space *mapping;
void *vaddr_old, *vaddr_new;
struct vm_area_struct *vma;
- struct uprobe *uprobe;
int ret;
+
retry:
/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
if (ret <= 0)
return ret;

- ret = -EINVAL;
-
- /*
- * We are interested in text pages only. Our pages of interest
- * should be mapped for read and execute only. We desist from
- * adding probes in write mapped pages since the breakpoints
- * might end up in the file copy.
- */
- if (!valid_vma(vma, is_swbp_insn(&opcode)))
- goto put_out;
-
- uprobe = container_of(auprobe, struct uprobe, arch);
- mapping = uprobe->inode->i_mapping;
- if (mapping != vma->vm_file->f_mapping)
- goto put_out;
-
ret = -ENOMEM;
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
if (!new_page)
--
1.5.5.1

2012-06-19 19:49:25

by Oleg Nesterov

Subject: [PATCH 2/5] uprobes: __replace_page() should not use page_address_in_vma()

page_address_in_vma(old_page) in __replace_page() is ugly and wrong.
The caller already knows the correct virtual address, this page was
found by get_user_pages(vaddr).

However, page_address_in_vma() can actually fail if page->mapping was
cleared by __delete_from_page_cache() after get_user_pages() returns.
But this means a race with page reclaim, and in that case write_opcode()
should not fail; it should retry and read the page again. I am not sure
this race is really possible though, the page_freeze_refs() logic should
prevent it.

We could change __replace_page() to return -EAGAIN in this case, but
it would be better to simply use the caller's vaddr and rely on
page_check_address().

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a2b32a5..5b10705 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -132,17 +132,13 @@ static loff_t vma_address(struct vm_area_struct *vma, loff_t offset)
*
* Returns 0 on success, -EFAULT on failure.
*/
-static int __replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage)
+static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
+ struct page *page, struct page *kpage)
{
struct mm_struct *mm = vma->vm_mm;
- unsigned long addr;
spinlock_t *ptl;
pte_t *ptep;

- addr = page_address_in_vma(page, vma);
- if (addr == -EFAULT)
- return -EFAULT;
-
ptep = page_check_address(page, mm, addr, &ptl, 0);
if (!ptep)
return -EAGAIN;
@@ -243,7 +239,7 @@ retry:
goto unlock_out;

lock_page(new_page);
- ret = __replace_page(vma, old_page, new_page);
+ ret = __replace_page(vma, vaddr, old_page, new_page);
unlock_page(new_page);

unlock_out:
--
1.5.5.1

2012-06-19 19:49:42

by Oleg Nesterov

Subject: [PATCH 3/5] uprobes: kill write_opcode()->lock_page(new_page)

write_opcode() does lock_page(new_page) for no reason. Nobody can
see this page until __replace_page() exposes it under ptl lock, and
we do nothing with this page after pte_unmap_unlock().

If nothing else, the similar code in do_wp_page() doesn't lock the
new page for page_add_new_anon_rmap/set_pte_at_notify.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5b10705..16a3b69 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -238,9 +238,7 @@ retry:
if (ret)
goto unlock_out;

- lock_page(new_page);
ret = __replace_page(vma, vaddr, old_page, new_page);
- unlock_page(new_page);

unlock_out:
unlock_page(old_page);
--
1.5.5.1

2012-06-19 19:49:58

by Oleg Nesterov

Subject: [PATCH 4/5] uprobes: cleanup and document write_opcode()->lock_page(old_page)

The comment above write_opcode()->lock_page(old_page) mentions
a race with do_wp_page(). I don't really understand exactly which
race it means, but afaics this lock_page() was not enough to close
all races with do_wp_page().

Anyway, since 77fc4af1 this code is always called with ->mmap_sem
held for writing, so we can forget about do_wp_page().

However, we can't simply remove this lock_page(), and the only
(afaics) reason is __replace_page()->try_to_free_swap().

Nothing in write_opcode() needs it, move it into __replace_page()
and fix the comment.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 27 ++++++++++++++-------------
1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 16a3b69..442064d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -138,10 +138,15 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
pte_t *ptep;
+ int err;

+ /* freeze PageSwapCache() for try_to_free_swap() below */
+ lock_page(page);
+
+ err = -EAGAIN;
ptep = page_check_address(page, mm, addr, &ptl, 0);
if (!ptep)
- return -EAGAIN;
+ goto unlock;

get_page(kpage);
page_add_new_anon_rmap(kpage, vma, addr);
@@ -161,7 +166,10 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
put_page(page);
pte_unmap_unlock(ptep, ptl);

- return 0;
+ err = 0;
+ unlock:
+ unlock_page(page);
+ return err;
}

/**
@@ -215,15 +223,10 @@ retry:
ret = -ENOMEM;
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
if (!new_page)
- goto put_out;
+ goto put_old;

__SetPageUptodate(new_page);

- /*
- * lock page will serialize against do_wp_page()'s
- * PageAnon() handling
- */
- lock_page(old_page);
/* copy the page now that we've got it stable */
vaddr_old = kmap_atomic(old_page);
vaddr_new = kmap_atomic(new_page);
@@ -236,15 +239,13 @@ retry:

ret = anon_vma_prepare(vma);
if (ret)
- goto unlock_out;
+ goto put_new;

ret = __replace_page(vma, vaddr, old_page, new_page);

-unlock_out:
- unlock_page(old_page);
+put_new:
page_cache_release(new_page);
-
-put_out:
+put_old:
put_page(old_page);

if (unlikely(ret == -EAGAIN))
--
1.5.5.1
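
The control-flow change to __replace_page() in patch 4 follows the common
pattern of taking a lock at the top of a function and dropping it on a
single exit label. A minimal user-space sketch of that pattern, assuming a
pthread mutex in place of the page lock; `struct slot`, `slot_replace()` and
the `present` flag are hypothetical names for illustration and do not come
from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object guarded by its own lock, standing in for a locked page. */
struct slot {
	pthread_mutex_t lock;
	bool present;	/* stands in for "page_check_address() found the pte" */
	int value;
};

/*
 * Replace the value under the slot's lock.  The lock is taken and dropped
 * here, so callers need not know about it; every exit after the lock is
 * taken goes through the "unlock" label.
 */
static int slot_replace(struct slot *s, int new_value)
{
	int err;

	assert(s != NULL);

	pthread_mutex_lock(&s->lock);

	err = -EAGAIN;
	if (!s->present)
		goto unlock;

	s->value = new_value;
	err = 0;
unlock:
	pthread_mutex_unlock(&s->lock);
	return err;
}
```

With the lock owned by the callee, the caller's error labels only have to
release the references it took itself, which is roughly what the renamed
put_new/put_old labels in the diff above reflect.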

2012-06-19 19:50:58

by Oleg Nesterov

Subject: [PATCH 5/5] uprobes: write_opcode: alloc the new page outside of "retry" loop

It is ugly to free and re-alloc new_page if write_opcode() needs to
retry. Move alloc_page_vma/page_cache_release outside of "retry" loop.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 26 +++++++++++---------------
1 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 442064d..3fd7bdf 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -214,19 +214,18 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
struct vm_area_struct *vma;
int ret;

-retry:
- /* Read the page with vaddr into memory */
- ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
- if (ret <= 0)
- return ret;
-
- ret = -ENOMEM;
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
if (!new_page)
- goto put_old;
+ return -ENOMEM;

__SetPageUptodate(new_page);

+ retry:
+ /* Read the page with vaddr into memory */
+ ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
+ if (ret <= 0)
+ goto out;
+
/* copy the page now that we've got it stable */
vaddr_old = kmap_atomic(old_page);
vaddr_new = kmap_atomic(new_page);
@@ -238,18 +237,15 @@ retry:
kunmap_atomic(vaddr_old);

ret = anon_vma_prepare(vma);
- if (ret)
- goto put_new;
+ if (!ret)
+ ret = __replace_page(vma, vaddr, old_page, new_page);

- ret = __replace_page(vma, vaddr, old_page, new_page);
-
-put_new:
- page_cache_release(new_page);
-put_old:
put_page(old_page);

if (unlikely(ret == -EAGAIN))
goto retry;
+ out:
+ page_cache_release(new_page);
return ret;
}

--
1.5.5.1

2012-06-20 12:13:09

by Srikar Dronamraju

Subject: Re: [PATCH 2/5] uprobes: __replace_page() should not use page_address_in_vma()

* Oleg Nesterov <[email protected]> [2012-06-19 21:47:12]:

> page_address_in_vma(old_page) in __replace_page() is ugly and wrong.
> The caller already knows the correct virtual address, this page was
> found by get_user_pages(vaddr).
>
> However, page_address_in_vma() can actually fail if page->mapping was
> cleared by __delete_from_page_cache() after get_user_pages() returns.
> But this means a race with page reclaim, and in that case write_opcode()
> should not fail; it should retry and read the page again. I am not sure
> this race is really possible though, the page_freeze_refs() logic should
> prevent it.
>
> We could change __replace_page() to return -EAGAIN in this case, but
> it would be better to simply use the caller's vaddr and rely on
> page_check_address().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/events/uprobes.c | 10 +++-------
> 1 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index a2b32a5..5b10705 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -132,17 +132,13 @@ static loff_t vma_address(struct vm_area_struct *vma, loff_t offset)
> *
> * Returns 0 on success, -EFAULT on failure.
> */
> -static int __replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage)
> +static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> + struct page *page, struct page *kpage)

Could you please update the comment above __replace_page() to mention that
it now takes addr as a parameter?

2012-06-20 12:31:08

by Anton Arapov

Subject: Re: [PATCH 5/5] uprobes: write_opcode: alloc the new page outside of "retry" loop

On Tue, Jun 19, 2012 at 09:47:59PM +0200, Oleg Nesterov wrote:
> It is ugly to free and re-alloc new_page if write_opcode() needs to
> retry. Move alloc_page_vma/page_cache_release outside of "retry" loop.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/events/uprobes.c | 26 +++++++++++---------------
> 1 files changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 442064d..3fd7bdf 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -214,19 +214,18 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> struct vm_area_struct *vma;
> int ret;
>
> -retry:
> - /* Read the page with vaddr into memory */
> - ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
> - if (ret <= 0)
> - return ret;
> -
> - ret = -ENOMEM;
> new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
Shouldn't we pass NULL instead of vma to alloc_page_vma() here? Or
initialize vma to NULL...

> if (!new_page)
> - goto put_old;
> + return -ENOMEM;
>
> __SetPageUptodate(new_page);
>
> + retry:
> + /* Read the page with vaddr into memory */
> + ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
> + if (ret <= 0)
> + goto out;
> +
[cut]

Anton.

2012-06-20 13:50:14

by Oleg Nesterov

Subject: Re: [PATCH 2/5] uprobes: __replace_page() should not use page_address_in_vma()

On 06/20, Srikar Dronamraju wrote:
>
> > -static int __replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage)
> > +static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> > + struct page *page, struct page *kpage)
>
> Could you please update the comment above __replace_page() to mention that
> it now takes addr as a parameter?

Will do, thanks.

Oleg.

2012-06-20 13:51:49

by Oleg Nesterov

Subject: Re: [PATCH 5/5] uprobes: write_opcode: alloc the new page outside of "retry" loop

On 06/20, Anton Arapov wrote:
>
> On Tue, Jun 19, 2012 at 09:47:59PM +0200, Oleg Nesterov wrote:
> > @@ -214,19 +214,18 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > struct vm_area_struct *vma;
> > int ret;
> >
> > -retry:
> > - /* Read the page with vaddr into memory */
> > - ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
> > - if (ret <= 0)
> > - return ret;
> > -
> > - ret = -ENOMEM;
> > new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
> Shouldn't we pass NULL instead of vma to alloc_page_vma() here? Or
> initialize vma to NULL...

OOPS. Thanks Anton, this should be fixed.

Oleg.
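
The problem Anton points out (vma is only set by get_user_pages(), which now
runs after the allocation) can be illustrated outside the kernel. What follows
is a minimal user-space sketch, not the kernel code and not the eventual fix:
`fake_lookup()` and `fake_replace()` are hypothetical stand-ins for
get_user_pages() and __replace_page(), and malloc() stands in for
alloc_page_vma(). It shows the shape patch 5 is after: the scratch buffer is
allocated once before the retry label, using nothing that the loop itself
produces, and is released on a single exit path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical stand-in for get_user_pages(): hands back the source buffer. */
static int fake_lookup(const char *backing, const char **src_out)
{
	if (!backing)
		return -EFAULT;
	*src_out = backing;
	return 1;	/* one "page" found */
}

/*
 * Hypothetical stand-in for __replace_page(): fails with -EAGAIN until
 * *races_left reaches zero, then copies the prepared buffer into dst.
 */
static int fake_replace(char *dst, const char *prepared, int *races_left)
{
	if (*races_left > 0) {
		(*races_left)--;
		return -EAGAIN;
	}
	memcpy(dst, prepared, BUF_SIZE);
	return 0;
}

/*
 * Patch one byte at offset off, retrying on -EAGAIN.
 * *attempts counts how many times the loop body ran.
 */
static int patch_byte(char *dst, const char *backing, size_t off,
		      char opcode, int races, int *attempts)
{
	const char *src = NULL;	/* only valid after fake_lookup() succeeds */
	char *scratch;
	int ret;

	assert(off < BUF_SIZE);

	/* Allocate once, using nothing that the loop below produces. */
	scratch = malloc(BUF_SIZE);
	if (!scratch)
		return -ENOMEM;

	*attempts = 0;
retry:
	(*attempts)++;
	ret = fake_lookup(backing, &src);
	if (ret <= 0)
		goto out;

	/* Rebuild the patched copy from the freshly looked-up source. */
	memcpy(scratch, src, BUF_SIZE);
	scratch[off] = opcode;

	ret = fake_replace(dst, scratch, &races);
	if (ret == -EAGAIN)
		goto retry;
out:
	free(scratch);	/* single release point for every exit */
	return ret;
}
```

In this sketch the allocation depends only on a constant size, so moving it
above the label is safe. In the patch as posted, the allocation still takes
vma, which is why it has to be either passed as NULL or set before use, as
discussed above.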