2013-08-06 08:43:43

by Joonsoo Kim

Subject: [PATCH 1/4] mm, rmap: do easy-job first in anon_vma_fork

If we fail due to an error condition, it is better to bail out before
doing the heavy work, so change the order of execution.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/rmap.c b/mm/rmap.c
index a149e3a..c2f51cb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -278,19 +278,19 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
if (!pvma->anon_vma)
return 0;

+ /* First, allocate required objects */
+ avc = anon_vma_chain_alloc(GFP_KERNEL);
+ if (!avc)
+ goto out_error;
+ anon_vma = anon_vma_alloc();
+ if (!anon_vma)
+ goto out_error_free_avc;
+
/*
- * First, attach the new VMA to the parent VMA's anon_vmas,
+ * Then attach the new VMA to the parent VMA's anon_vmas,
* so rmap can find non-COWed pages in child processes.
*/
if (anon_vma_clone(vma, pvma))
- return -ENOMEM;
-
- /* Then add our own anon_vma. */
- anon_vma = anon_vma_alloc();
- if (!anon_vma)
- goto out_error;
- avc = anon_vma_chain_alloc(GFP_KERNEL);
- if (!avc)
goto out_error_free_anon_vma;

/*
@@ -312,10 +312,11 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)

return 0;

- out_error_free_anon_vma:
+out_error_free_anon_vma:
put_anon_vma(anon_vma);
- out_error:
- unlink_anon_vmas(vma);
+out_error_free_avc:
+ anon_vma_chain_free(avc);
+out_error:
return -ENOMEM;
}

--
1.7.9.5


2013-08-06 08:43:46

by Joonsoo Kim

Subject: [PATCH 3/4] mm, rmap: minimize lock hold when unlink_anon_vmas

Currently, we free the avc objects while holding the lock. To minimize
the lock hold time, move the avc objects to a separate list while the
lock is held, then iterate that list and free the objects after the
lock has been dropped.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/rmap.c b/mm/rmap.c
index 1603f64..9cfb282 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -330,6 +330,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
{
struct anon_vma_chain *avc, *next;
struct anon_vma *root = NULL;
+ LIST_HEAD(avc_list);

/*
* Unlink each anon_vma chained to the VMA. This list is ordered
@@ -348,10 +349,14 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
if (RB_EMPTY_ROOT(&anon_vma->rb_root))
continue;

+ list_move(&avc->same_vma, &avc_list);
+ }
+ unlock_anon_vma_root(root);
+
+ list_for_each_entry_safe(avc, next, &avc_list, same_vma) {
list_del(&avc->same_vma);
anon_vma_chain_free(avc);
}
- unlock_anon_vma_root(root);

/*
* Iterate the list once more, it now only contains empty and unlinked
@@ -363,7 +368,6 @@ void unlink_anon_vmas(struct vm_area_struct *vma)

put_anon_vma(anon_vma);

- list_del(&avc->same_vma);
anon_vma_chain_free(avc);
}
}
--
1.7.9.5
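
As an illustration of the pattern in this patch (detach the entries from the shared list while the lock is held, then free them after the lock is dropped), here is a minimal user-space sketch. It assumes pthreads and malloc/free standing in for the kernel lock and slab calls; it is not the rmap code itself.

#include <pthread.h>
#include <stdlib.h>

struct node {
        struct node *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *shared_head;

static void free_all_nodes(void)
{
        struct node *batch, *next;

        /* Hold the lock only long enough to steal the whole chain. */
        pthread_mutex_lock(&list_lock);
        batch = shared_head;
        shared_head = NULL;
        pthread_mutex_unlock(&list_lock);

        /* The potentially long run of frees happens unlocked. */
        for (; batch; batch = next) {
                next = batch->next;
                free(batch);
        }
}

int main(void)
{
        for (int i = 0; i < 3; i++) {
                struct node *n = malloc(sizeof(*n));

                if (!n)
                        return 1;
                n->next = shared_head;
                shared_head = n;
        }
        free_all_nodes();
        return 0;
}

The patch itself does the equivalent with list_move() onto a local LIST_HEAD under the anon_vma root lock and a list_for_each_entry_safe() pass after unlock_anon_vma_root().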

2013-08-06 08:43:57

by Joonsoo Kim

Subject: [PATCH 4/4] mm, page_alloc: optimize batch count in free_pcppages_bulk()

If we use a division operation, we can compute a batch count closer to
the ideal value. With this value, we can finish the job within
MIGRATE_PCPTYPES iterations. In addition, freeing more pages per batch
may help cache usage.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 26ab229..7f145cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -634,53 +634,76 @@ static inline int free_pages_check(struct page *page)
static void free_pcppages_bulk(struct zone *zone, int count,
struct per_cpu_pages *pcp)
{
- int migratetype = 0;
- int batch_free = 0;
- int to_free = count;
+ struct list_head *list;
+ int batch_free;
+ int mt;
+ int nr_list;
+ bool all = false;
+
+ if (pcp->count == count)
+ all = true;

spin_lock(&zone->lock);
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;

- while (to_free) {
- struct page *page;
- struct list_head *list;
+redo:
+ /* Count non-empty list */
+ nr_list = 0;
+ for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
+ list = &pcp->lists[mt];
+ if (!list_empty(list))
+ nr_list++;
+ }

- /*
- * Remove pages from lists in a round-robin fashion. A
- * batch_free count is maintained that is incremented when an
- * empty list is encountered. This is so more pages are freed
- * off fuller lists instead of spinning excessively around empty
- * lists
- */
- do {
- batch_free++;
- if (++migratetype == MIGRATE_PCPTYPES)
- migratetype = 0;
- list = &pcp->lists[migratetype];
- } while (list_empty(list));
+ /*
+ * If there is only one non-empty list, free them all.
+ * Otherwise, remove pages from lists in a round-robin fashion.
+ * batch_free is set to remove at least one list.
+ */
+ if (all || nr_list == 1)
+ batch_free = count;
+ else if (count <= nr_list)
+ batch_free = 1;
+ else
+ batch_free = count / nr_list;

- /* This is the only non-empty list. Free them all. */
- if (batch_free == MIGRATE_PCPTYPES)
- batch_free = to_free;
+ for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
+ struct page *page;
+ int i, page_mt;

- do {
- int mt; /* migratetype of the to-be-freed page */
+ list = &pcp->lists[mt];

+ for (i = 0; i < batch_free; i++) {
+ if (list_empty(list))
+ break;
+
+ count--;
page = list_entry(list->prev, struct page, lru);
+
/* must delete as __free_one_page list manipulates */
list_del(&page->lru);
- mt = get_freepage_migratetype(page);
+ page_mt = get_freepage_migratetype(page);
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
- __free_one_page(page, zone, 0, mt);
- trace_mm_page_pcpu_drain(page, 0, mt);
- if (likely(!is_migrate_isolate_page(page))) {
- __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
- if (is_migrate_cma(mt))
- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
- }
- } while (--to_free && --batch_free && !list_empty(list));
+ __free_one_page(page, zone, 0, page_mt);
+ trace_mm_page_pcpu_drain(page, 0, page_mt);
+
+ if (unlikely(is_migrate_isolate_page(page)))
+ continue;
+
+ __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
+ if (is_migrate_cma(page_mt))
+ __mod_zone_page_state(zone,
+ NR_FREE_CMA_PAGES, 1);
+ }
+
+ if (!count)
+ break;
}
+
+ if (count)
+ goto redo;
+
spin_unlock(&zone->lock);
}

--
1.7.9.5
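
To get a feel for the batch sizing introduced here, the if/else chain can be pulled out into a small stand-alone sketch. The helper name and the example numbers are illustrative only; MIGRATE_PCPTYPES is 3 in this kernel generation, and only the branching mirrors the patch.

#include <stdio.h>

#define MIGRATE_PCPTYPES 3

/*
 * At least one page per non-empty pcp list, roughly count / nr_list so
 * the request completes within MIGRATE_PCPTYPES passes, and everything
 * at once when draining the whole pcp or when only one list is non-empty.
 */
static int compute_batch_free(int count, int nr_list, int drain_all)
{
        if (drain_all || nr_list == 1)
                return count;
        if (count <= nr_list)
                return 1;
        return count / nr_list;
}

int main(void)
{
        /* e.g. 31 pages to free, spread over 3 non-empty lists */
        printf("%d\n", compute_batch_free(31, 3, 0));   /* prints 10 */
        return 0;
}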

2013-08-06 08:45:21

by Joonsoo Kim

Subject: [PATCH 2/4] mm, rmap: allocate anon_vma_chain before starting to link anon_vma_chain

If we allocate the anon_vma_chain objects before starting to link them,
we can reduce the lock hold time. This patch implements that.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/rmap.c b/mm/rmap.c
index c2f51cb..1603f64 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -240,18 +240,21 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
{
struct anon_vma_chain *avc, *pavc;
struct anon_vma *root = NULL;
+ LIST_HEAD(avc_list);
+
+ list_for_each_entry(pavc, &src->anon_vma_chain, same_vma) {
+ avc = anon_vma_chain_alloc(GFP_KERNEL);
+ if (unlikely(!avc))
+ goto enomem_failure;
+
+ list_add_tail(&avc->same_vma, &avc_list);
+ }

list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma;

- avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
- if (unlikely(!avc)) {
- unlock_anon_vma_root(root);
- root = NULL;
- avc = anon_vma_chain_alloc(GFP_KERNEL);
- if (!avc)
- goto enomem_failure;
- }
+ avc = list_entry((&avc_list)->next, typeof(*avc), same_vma);
+ list_del(&avc->same_vma);
anon_vma = pavc->anon_vma;
root = lock_anon_vma_root(root, anon_vma);
anon_vma_chain_link(dst, avc, anon_vma);
@@ -259,8 +262,11 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
unlock_anon_vma_root(root);
return 0;

- enomem_failure:
- unlink_anon_vmas(dst);
+enomem_failure:
+ list_for_each_entry_safe(avc, pavc, &avc_list, same_vma) {
+ list_del(&avc->same_vma);
+ anon_vma_chain_free(avc);
+ }
return -ENOMEM;
}

--
1.7.9.5
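
The shape of the change (do every allocation that can fail up front, so the locked linking loop can neither fail nor sleep in the allocator) can be sketched in user-space terms. The names add_links, chain_lock and so on are hypothetical, with a pthread mutex and malloc standing in for the anon_vma root lock and the avc slab cache:

#include <pthread.h>
#include <stdlib.h>

struct link {
        struct link *next;
        int payload;
};

static pthread_mutex_t chain_lock = PTHREAD_MUTEX_INITIALIZER;
static struct link *chain_head;

static int add_links(int n)
{
        struct link *batch = NULL, *l;
        int i;

        /* Phase 1: allocate everything, no lock held yet. */
        for (i = 0; i < n; i++) {
                l = malloc(sizeof(*l));
                if (!l)
                        goto enomem;
                l->payload = i;
                l->next = batch;
                batch = l;
        }

        /* Phase 2: link under the lock; nothing here can fail. */
        pthread_mutex_lock(&chain_lock);
        while (batch) {
                l = batch;
                batch = batch->next;
                l->next = chain_head;
                chain_head = l;
        }
        pthread_mutex_unlock(&chain_lock);
        return 0;

enomem:
        /* Nothing was linked, so just free what was preallocated. */
        while (batch) {
                l = batch;
                batch = batch->next;
                free(l);
        }
        return -1;
}

int main(void)
{
        return add_links(4) ? 1 : 0;
}

Because nothing has touched the shared chain when an allocation fails, the error path only has to free the preallocated objects, which is why the patch can replace unlink_anon_vmas(dst) in enomem_failure with a plain walk over avc_list.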

2013-08-06 08:48:55

by Joonsoo Kim

Subject: Re: [PATCH 4/4] mm, page_alloc: optimize batch count in free_pcppages_bulk()

On Tue, Aug 06, 2013 at 05:43:40PM +0900, Joonsoo Kim wrote:
> If we use a division operation, we can compute a batch count closer to
> the ideal value. With this value, we can finish the job within
> MIGRATE_PCPTYPES iterations. In addition, freeing more pages per batch
> may help cache usage.
>
> Signed-off-by: Joonsoo Kim <[email protected]>

Oops... Sorry.
Please ignore this one.
This patch was already submitted a few seconds ago :)

Thanks.

2013-08-06 12:59:05

by Johannes Weiner

Subject: Re: [PATCH 1/4] mm, rmap: do easy-job first in anon_vma_fork

On Tue, Aug 06, 2013 at 05:43:37PM +0900, Joonsoo Kim wrote:
> If we fail due to an error condition, it is better to bail out before
> doing the heavy work, so change the order of execution.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a149e3a..c2f51cb 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -278,19 +278,19 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> if (!pvma->anon_vma)
> return 0;
>
> + /* First, allocate required objects */
> + avc = anon_vma_chain_alloc(GFP_KERNEL);
> + if (!avc)
> + goto out_error;
> + anon_vma = anon_vma_alloc();
> + if (!anon_vma)
> + goto out_error_free_avc;
> +
> /*
> - * First, attach the new VMA to the parent VMA's anon_vmas,
> + * Then attach the new VMA to the parent VMA's anon_vmas,
> * so rmap can find non-COWed pages in child processes.
> */
> if (anon_vma_clone(vma, pvma))
> - return -ENOMEM;
> -
> - /* Then add our own anon_vma. */
> - anon_vma = anon_vma_alloc();
> - if (!anon_vma)
> - goto out_error;
> - avc = anon_vma_chain_alloc(GFP_KERNEL);
> - if (!avc)
> goto out_error_free_anon_vma;

Which heavy work? anon_vma_clone() is anon_vma_chain_alloc() in a
loop.

Optimizing error paths only makes sense if they are common and you
could actually save something by reordering. This case matches neither.

2013-08-07 06:08:14

by Johannes Weiner

Subject: Re: [PATCH 2/4] mm, rmap: allocate anon_vma_chain before starting to link anon_vma_chain

On Tue, Aug 06, 2013 at 05:43:38PM +0900, Joonsoo Kim wrote:
> If we allocate the anon_vma_chain objects before starting to link them,
> we can reduce the lock hold time. This patch implements that.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c2f51cb..1603f64 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -240,18 +240,21 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> {
> struct anon_vma_chain *avc, *pavc;
> struct anon_vma *root = NULL;
> + LIST_HEAD(avc_list);
> +
> + list_for_each_entry(pavc, &src->anon_vma_chain, same_vma) {
> + avc = anon_vma_chain_alloc(GFP_KERNEL);
> + if (unlikely(!avc))
> + goto enomem_failure;
> +
> + list_add_tail(&avc->same_vma, &avc_list);
> + }
>
> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> struct anon_vma *anon_vma;
>
> - avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
> - if (unlikely(!avc)) {
> - unlock_anon_vma_root(root);
> - root = NULL;
> - avc = anon_vma_chain_alloc(GFP_KERNEL);
> - if (!avc)
> - goto enomem_failure;
> - }
> + avc = list_entry((&avc_list)->next, typeof(*avc), same_vma);

list_first_entry() please

> + list_del(&avc->same_vma);
> anon_vma = pavc->anon_vma;
> root = lock_anon_vma_root(root, anon_vma);
> anon_vma_chain_link(dst, avc, anon_vma);
> @@ -259,8 +262,11 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> unlock_anon_vma_root(root);
> return 0;
>
> - enomem_failure:
> - unlink_anon_vmas(dst);
> +enomem_failure:
> + list_for_each_entry_safe(avc, pavc, &avc_list, same_vma) {
> + list_del(&avc->same_vma);
> + anon_vma_chain_free(avc);
> + }
> return -ENOMEM;
> }

Otherwise, looks good.

Thanks!
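
For reference, the list_first_entry() form being asked for would presumably be the following one-liner against the hunk above (a sketch, not a tested replacement):

        /* same as list_entry((&avc_list)->next, typeof(*avc), same_vma) */
        avc = list_first_entry(&avc_list, typeof(*avc), same_vma);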

2013-08-07 06:11:45

by Johannes Weiner

Subject: Re: [PATCH 3/4] mm, rmap: minimize lock hold when unlink_anon_vmas

On Tue, Aug 06, 2013 at 05:43:39PM +0900, Joonsoo Kim wrote:
> Currently, we free the avc objects while holding the lock. To minimize
> the lock hold time, move the avc objects to a separate list while the
> lock is held, then iterate that list and free the objects after the
> lock has been dropped.
>
> Signed-off-by: Joonsoo Kim <[email protected]>

I expect kfree() to be fast enough that we don't need to bother
batching this outside the lock.

2013-08-07 07:16:23

by Joonsoo Kim

Subject: Re: [PATCH 1/4] mm, rmap: do easy-job first in anon_vma_fork

Hello, Johannes.

On Tue, Aug 06, 2013 at 08:58:54AM -0400, Johannes Weiner wrote:
> > if (anon_vma_clone(vma, pvma))
> > - return -ENOMEM;
> > -
> > - /* Then add our own anon_vma. */
> > - anon_vma = anon_vma_alloc();
> > - if (!anon_vma)
> > - goto out_error;
> > - avc = anon_vma_chain_alloc(GFP_KERNEL);
> > - if (!avc)
> > goto out_error_free_anon_vma;
>
> Which heavy work? anon_vma_clone() is anon_vma_chain_alloc() in a
> loop.
>
> Optimizing error paths only makes sense if they are common and you
> actually could save something by reordering. This matches neither.

Yes, you are right. I will drop this one.

Thanks.

2013-08-07 07:17:24

by Joonsoo Kim

Subject: Re: [PATCH 2/4] mm, rmap: allocate anon_vma_chain before starting to link anon_vma_chain

On Wed, Aug 07, 2013 at 02:08:03AM -0400, Johannes Weiner wrote:
> >
> > list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > struct anon_vma *anon_vma;
> >
> > - avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
> > - if (unlikely(!avc)) {
> > - unlock_anon_vma_root(root);
> > - root = NULL;
> > - avc = anon_vma_chain_alloc(GFP_KERNEL);
> > - if (!avc)
> > - goto enomem_failure;
> > - }
> > + avc = list_entry((&avc_list)->next, typeof(*avc), same_vma);
>
> list_first_entry() please

Okay. I will send next version soon.

>
> > + list_del(&avc->same_vma);
> > anon_vma = pavc->anon_vma;
> > root = lock_anon_vma_root(root, anon_vma);
> > anon_vma_chain_link(dst, avc, anon_vma);
> > @@ -259,8 +262,11 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > unlock_anon_vma_root(root);
> > return 0;
> >
> > - enomem_failure:
> > - unlink_anon_vmas(dst);
> > +enomem_failure:
> > + list_for_each_entry_safe(avc, pavc, &avc_list, same_vma) {
> > + list_del(&avc->same_vma);
> > + anon_vma_chain_free(avc);
> > + }
> > return -ENOMEM;
> > }
>
> Otherwise, looks good.

Thank you!

2013-08-07 07:18:32

by Joonsoo Kim

Subject: Re: [PATCH 3/4] mm, rmap: minimize lock hold when unlink_anon_vmas

On Wed, Aug 07, 2013 at 02:11:38AM -0400, Johannes Weiner wrote:
> On Tue, Aug 06, 2013 at 05:43:39PM +0900, Joonsoo Kim wrote:
> > Currently, we free the avc objects while holding the lock. To minimize
> > the lock hold time, move the avc objects to a separate list while the
> > lock is held, then iterate that list and free the objects after the
> > lock has been dropped.
> >
> > Signed-off-by: Joonsoo Kim <[email protected]>
>
> I expect kfree() to be fast enough that we don't need to bother
> batching this outside the lock.

Okay!

Thanks.