alloc_pages calls alloc_pages_node with numa_node_id().
alloc_pages_node can't see nid < 0.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.
Cc: Mel Gorman <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/gfp.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6d413..b65f003 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -308,7 +308,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr);
#else
#define alloc_pages(gfp_mask, order) \
- alloc_pages_node(numa_node_id(), gfp_mask, order)
+ alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
--
1.7.0.5
alloc_pages_node is called with cpu_to_node(cpu).
I think cpu_to_node(cpu) never returns -1.
(But I am not sure we need double check.)
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/percpu.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/percpu.c b/mm/percpu.c
index 768419d..ec3e671 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -720,7 +720,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
for (i = page_start; i < page_end; i++) {
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
- *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+ *pagep = alloc_pages_exact_node(cpu_to_node(cpu), gfp, 0);
if (!*pagep) {
pcpu_free_pages(chunk, pages, populated,
page_start, page_end);
--
1.7.0.5
alloc_slab_page never calls alloc_pages_node with -1.
It means node's validity check is unnecessary.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/slub.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index b364844..9984165 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1084,7 +1084,7 @@ static inline struct page *alloc_slab_page(gfp_t flags, int node,
if (node == -1)
return alloc_pages(flags, order);
else
- return alloc_pages_node(node, flags, order);
+ return alloc_pages_exact_node(node, flags, order);
}
static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
--
1.7.0.5
if node_state is N_HIGH_MEMORY, node doesn't have -1.
It means node's validity check is unnecessary.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/sparse-vmemmap.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 392b9bb..7710ebc 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -53,7 +53,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
struct page *page;
if (node_state(node, N_HIGH_MEMORY))
- page = alloc_pages_node(node,
+ page = alloc_pages_exact_node(node,
GFP_KERNEL | __GFP_ZERO, get_order(size));
else
page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
--
1.7.0.5
__vmalloc_area_node never pass -1 to alloc_pages_node.
It means node's validity check is unnecessary.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.
Cc: Nick Piggin <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/vmalloc.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae00746..7abf423 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1499,7 +1499,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
if (node < 0)
page = alloc_page(gfp_mask);
else
- page = alloc_pages_node(node, gfp_mask, 0);
+ page = alloc_pages_exact_node(node, gfp_mask, 0);
if (unlikely(!page)) {
/* Successfully allocated i pages, free them in __vunmap() */
--
1.7.0.5
alloc_pages_exact_node naming makes some people misleading.
They considered it following as.
"This function will allocate pages from node which I wanted
exactly".
But it can allocate pages from fallback list if page allocator
can't find free page from node user wanted.
So let's comment this NOTE.
Actually I wanted to change naming with better.
ex) alloc_pages_explict_node.
But I changed my mind since the comment would be enough.
If anybody suggests better name, I will do with pleasure.
Cc: Mel Gorman <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Bob Liu <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/gfp.h | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b65f003..7539c17 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -288,6 +288,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}
+/*
+ * NOTE : Allow page from fallback if page allocator can't find free page
+ * in your nid. Only if you want to allocate page from your nid, use
+ * __GFP_THISNODE flags with gfp_mask.
+ */
static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
unsigned int order)
{
--
1.7.0.5
On Wed, Apr 14, 2010 at 12:24:58AM +0900, Minchan Kim wrote:
> alloc_pages calls alloc_pages_node with numa_node_id().
> alloc_pages_node can't see nid < 0.
>
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Mel Gorman <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Makes sense.
Signed-off-by: Mel Gorman <[email protected]>
> ---
> include/linux/gfp.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 4c6d413..b65f003 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -308,7 +308,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
> struct vm_area_struct *vma, unsigned long addr);
> #else
> #define alloc_pages(gfp_mask, order) \
> - alloc_pages_node(numa_node_id(), gfp_mask, order)
> + alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
> #define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
> #endif
> #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
> --
> 1.7.0.5
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Wed, Apr 14, 2010 at 12:24:59AM +0900, Minchan Kim wrote:
> alloc_pages_node is called with cpu_to_node(cpu).
> I think cpu_to_node(cpu) never returns -1.
> (But I am not sure we need double check.)
>
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
Well, numa_node_id() is implemented as
#ifndef numa_node_id
#define numa_node_id() (cpu_to_node(raw_smp_processor_id()))
#endif
and the mapping table on x86 at least is based on possible CPUs in
init_cpu_to_node() leaves the mapping as 0 if the APIC is bad or the numa
node is reported in apicid_to_node as -1. It would appear on power that
the node will be 0 for possible CPUs as well.
Hence, I believe this to be safe but a confirmation from Tejun would be
nice. I would continue digging but this looks like an initialisation path
so I'll move on to the next patch rather than spending more time.
> Cc: Tejun Heo <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/percpu.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 768419d..ec3e671 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -720,7 +720,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
> for (i = page_start; i < page_end; i++) {
> struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
>
> - *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
> + *pagep = alloc_pages_exact_node(cpu_to_node(cpu), gfp, 0);
> if (!*pagep) {
> pcpu_free_pages(chunk, pages, populated,
> page_start, page_end);
> --
> 1.7.0.5
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Wed, Apr 14, 2010 at 12:25:00AM +0900, Minchan Kim wrote:
> alloc_slab_page never calls alloc_pages_node with -1.
Are you certain? What about
__kmalloc
-> slab_alloc (passed -1 as a node from __kmalloc)
-> __slab_alloc
-> new_slab
-> allocate_slab
-> alloc_slab_page
> It means node's validity check is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Christoph Lameter <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/slub.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b364844..9984165 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1084,7 +1084,7 @@ static inline struct page *alloc_slab_page(gfp_t flags, int node,
> if (node == -1)
> return alloc_pages(flags, order);
> else
> - return alloc_pages_node(node, flags, order);
> + return alloc_pages_exact_node(node, flags, order);
> }
>
> static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> --
> 1.7.0.5
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Wed, Apr 14, 2010 at 12:25:01AM +0900, Minchan Kim wrote:
> if node_state is N_HIGH_MEMORY, node doesn't have -1.
Also, if node_state is called with -1, a negative index is being checked in
a bitmap and that would be pretty broken in itself. I can't see a problem
with this patch.
Reviewed-by: Mel Gorman <[email protected]>
> It means node's validity check is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Christoph Lameter <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/sparse-vmemmap.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 392b9bb..7710ebc 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -53,7 +53,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
> struct page *page;
>
> if (node_state(node, N_HIGH_MEMORY))
> - page = alloc_pages_node(node,
> + page = alloc_pages_exact_node(node,
> GFP_KERNEL | __GFP_ZERO, get_order(size));
> else
> page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
> --
> 1.7.0.5
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Wed, Apr 14, 2010 at 12:52 AM, Mel Gorman <[email protected]> wrote:
> On Wed, Apr 14, 2010 at 12:25:00AM +0900, Minchan Kim wrote:
>> alloc_slab_page never calls alloc_pages_node with -1.
>
> Are you certain? What about
>
> __kmalloc
> -> slab_alloc (passed -1 as a node from __kmalloc)
> -> __slab_alloc
> -> new_slab
> -> allocate_slab
> -> alloc_slab_page
>
Sorry for writing confusing changelog.
I means if node == -1 alloc_slab_page always calls alloc_pages.
So we don't need redundant check.
--
Kind regards,
Minchan Kim
On Wed, Apr 14, 2010 at 12:25:02AM +0900, Minchan Kim wrote:
> __vmalloc_area_node never pass -1 to alloc_pages_node.
> It means node's validity check is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
Seems fine, the check for node id being negative has already been made.
Reviewed-by: Mel Gorman <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/vmalloc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ae00746..7abf423 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1499,7 +1499,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> if (node < 0)
> page = alloc_page(gfp_mask);
> else
> - page = alloc_pages_node(node, gfp_mask, 0);
> + page = alloc_pages_exact_node(node, gfp_mask, 0);
>
> if (unlikely(!page)) {
> /* Successfully allocated i pages, free them in __vunmap() */
> --
> 1.7.0.5
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Wed, Apr 14, 2010 at 12:25:03AM +0900, Minchan Kim wrote:
> alloc_pages_exact_node naming makes some people misleading.
> They considered it following as.
> "This function will allocate pages from node which I wanted
> exactly".
> But it can allocate pages from fallback list if page allocator
> can't find free page from node user wanted.
>
> So let's comment this NOTE.
>
It's a little tough to read. How about
/*
* Use this instead of alloc_pages_node when the caller knows
* exactly which node they need (as opposed to passing in -1
* for current). Fallback to other nodes will still occur
* unless __GFP_THISNODE is specified.
*/
That at least will tie in why "exact" is in the name?
> Actually I wanted to change naming with better.
> ex) alloc_pages_explict_node.
"Explicit" can also be taken to mean "this and only this node".
> But I changed my mind since the comment would be enough.
>
> If anybody suggests better name, I will do with pleasure.
>
> Cc: Mel Gorman <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Cc: Bob Liu <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/gfp.h | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index b65f003..7539c17 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -288,6 +288,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> }
>
> +/*
> + * NOTE : Allow page from fallback if page allocator can't find free page
> + * in your nid. Only if you want to allocate page from your nid, use
> + * __GFP_THISNODE flags with gfp_mask.
> + */
> static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
> unsigned int order)
> {
> --
> 1.7.0.5
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Wed, Apr 14, 2010 at 01:01:31AM +0900, Minchan Kim wrote:
> On Wed, Apr 14, 2010 at 12:52 AM, Mel Gorman <[email protected]> wrote:
> > On Wed, Apr 14, 2010 at 12:25:00AM +0900, Minchan Kim wrote:
> >> alloc_slab_page never calls alloc_pages_node with -1.
> >
> > Are you certain? What about
> >
> > __kmalloc
> > ?-> slab_alloc (passed -1 as a node from __kmalloc)
> > ? ?-> __slab_alloc
> > ? ? ?-> new_slab
> > ? ? ? ?-> allocate_slab
> > ? ? ? ? ?-> alloc_slab_page
> >
>
> Sorry for writing confusing changelog.
>
> I means if node == -1 alloc_slab_page always calls alloc_pages.
> So we don't need redundant check.
>
When the changelog is fixed up, feel free to add;
Reviewed-by: Mel Gorman <[email protected]>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Wed, Apr 14, 2010 at 1:13 AM, Mel Gorman <[email protected]> wrote:
> On Wed, Apr 14, 2010 at 12:25:03AM +0900, Minchan Kim wrote:
>> alloc_pages_exact_node naming makes some people misleading.
>> They considered it following as.
>> "This function will allocate pages from node which I wanted
>> exactly".
>> But it can allocate pages from fallback list if page allocator
>> can't find free page from node user wanted.
>>
>> So let's comment this NOTE.
>>
>
> It's a little tough to read. How about
>
> /*
> * Use this instead of alloc_pages_node when the caller knows
> * exactly which node they need (as opposed to passing in -1
> * for current). Fallback to other nodes will still occur
> * unless __GFP_THISNODE is specified.
> */
It is better than mine.
>
> That at least will tie in why "exact" is in the name?
>
>> Actually I wanted to change naming with better.
>> ex) alloc_pages_explict_node.
>
> "Explicit" can also be taken to mean "this and only this node".
I agree.
I will repost modified comment after Tejun comment [2/6].
Thanks for quick review, Mel. :)
--
Kind regards,
Minchan Kim
On Wed, 14 Apr 2010, Minchan Kim wrote:
> alloc_slab_page never calls alloc_pages_node with -1.
> It means node's validity check is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Christoph Lameter <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/slub.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b364844..9984165 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1084,7 +1084,7 @@ static inline struct page *alloc_slab_page(gfp_t flags, int node,
> if (node == -1)
> return alloc_pages(flags, order);
> else
> - return alloc_pages_node(node, flags, order);
> + return alloc_pages_exact_node(node, flags, order);
> }
>
> static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
Slub changes need to go through its maintainer, Pekka Enberg
<[email protected]>.
On Wed, Apr 14, 2010 at 6:37 AM, David Rientjes <[email protected]> wrote:
> On Wed, 14 Apr 2010, Minchan Kim wrote:
>
>> alloc_slab_page never calls alloc_pages_node with -1.
>> It means node's validity check is unnecessary.
>> So we can use alloc_pages_exact_node instead of alloc_pages_node.
>> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>>
>> Cc: Christoph Lameter <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> mm/slub.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index b364844..9984165 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1084,7 +1084,7 @@ static inline struct page *alloc_slab_page(gfp_t flags, int node,
>> if (node == -1)
>> return alloc_pages(flags, order);
>> else
>> - return alloc_pages_node(node, flags, order);
>> + return alloc_pages_exact_node(node, flags, order);
>> }
>>
>> static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>
> Slub changes need to go through its maintainer, Pekka Enberg
> <[email protected]>.
Thanks, David. It was by mistake.
Pekka.
This changlog is bad.
I will change it with following as when I send v2.
"alloc_slab_page always checks nid == -1, so alloc_page_node can't be
called with -1.
It means node's validity check in alloc_pages_node is unnecessary.
So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried."
Thanks.
--
Kind regards,
Minchan Kim
On Wed, 14 Apr 2010, Minchan Kim wrote:
> This changlog is bad.
> I will change it with following as when I send v2.
>
> "alloc_slab_page always checks nid == -1, so alloc_page_node can't be
> called with -1.
> It means node's validity check in alloc_pages_node is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried."
>
Each patch in this series seems to be independent and can be applied
seperately, so it's probably not necessary to make them incremental.
On Wed, Apr 14, 2010 at 8:55 AM, David Rientjes <[email protected]> wrote:
> On Wed, 14 Apr 2010, Minchan Kim wrote:
>
>> This changlog is bad.
>> I will change it with following as when I send v2.
>>
>> "alloc_slab_page always checks nid == -1, so alloc_page_node can't be
>> called with -1.
>> It means node's validity check in alloc_pages_node is unnecessary.
>> So we can use alloc_pages_exact_node instead of alloc_pages_node.
>> It could avoid comparison and branch as 6484eb3e2a81807722 tried."
>>
>
> Each patch in this series seems to be independent and can be applied
> seperately, so it's probably not necessary to make them incremental.
Surely.
Without considering, I used git-formant-patch -n --thread.
I will keep it in mind.
Thanks, David.
--
Kind regards,
Minchan Kim
On Wed, 14 Apr 2010 00:24:58 +0900
Minchan Kim <[email protected]> wrote:
> alloc_pages calls alloc_pages_node with numa_node_id().
> alloc_pages_node can't see nid < 0.
>
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Mel Gorman <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
On Wed, 14 Apr 2010 00:25:00 +0900
Minchan Kim <[email protected]> wrote:
> alloc_slab_page never calls alloc_pages_node with -1.
> It means node's validity check is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Christoph Lameter <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
On Wed, 14 Apr 2010 00:25:01 +0900
Minchan Kim <[email protected]> wrote:
> if node_state is N_HIGH_MEMORY, node doesn't have -1.
> It means node's validity check is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Christoph Lameter <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
On Wed, 14 Apr 2010 00:25:02 +0900
Minchan Kim <[email protected]> wrote:
> __vmalloc_area_node never pass -1 to alloc_pages_node.
> It means node's validity check is unnecessary.
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Nick Piggin <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
But, in another thinking,
- if (node < 0)
- page = alloc_page(gfp_mask);
may be better ;)
Thanks,
-Kame
> ---
> mm/vmalloc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ae00746..7abf423 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1499,7 +1499,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> if (node < 0)
> page = alloc_page(gfp_mask);
> else
> - page = alloc_pages_node(node, gfp_mask, 0);
> + page = alloc_pages_exact_node(node, gfp_mask, 0);
>
> if (unlikely(!page)) {
> /* Successfully allocated i pages, free them in __vunmap() */
> --
> 1.7.0.5
>
>
On Wed, Apr 14, 2010 at 9:22 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 14 Apr 2010 00:25:02 +0900
> Minchan Kim <[email protected]> wrote:
>
>> __vmalloc_area_node never pass -1 to alloc_pages_node.
>> It means node's validity check is unnecessary.
>> So we can use alloc_pages_exact_node instead of alloc_pages_node.
>> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>>
>> Cc: Nick Piggin <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
>
> But, in another thinking,
>
> - if (node < 0)
> - page = alloc_page(gfp_mask);
>
> may be better ;)
I thought it.
but alloc_page is different function with alloc_pages_node in NUMA.
It calls alloc_pages_current.
--
Kind regards,
Minchan Kim
On Wed, Apr 14, 2010 at 3:18 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 14 Apr 2010 00:25:00 +0900
> Minchan Kim <[email protected]> wrote:
>
>> alloc_slab_page never calls alloc_pages_node with -1.
>> It means node's validity check is unnecessary.
>> So we can use alloc_pages_exact_node instead of alloc_pages_node.
>> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>>
>> Cc: Christoph Lameter <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Minchan, care to send a v2 with proper changelog and reviewed-by attributions?
Hello,
On 04/14/2010 12:48 AM, Mel Gorman wrote:
> and the mapping table on x86 at least is based on possible CPUs in
> init_cpu_to_node() leaves the mapping as 0 if the APIC is bad or the numa
> node is reported in apicid_to_node as -1. It would appear on power that
> the node will be 0 for possible CPUs as well.
>
> Hence, I believe this to be safe but a confirmation from Tejun would be
> nice. I would continue digging but this looks like an initialisation path
> so I'll move on to the next patch rather than spending more time.
This being a pretty cold path, I don't really see much benefit in
converting it to alloc_pages_node_exact(). It ain't gonna make any
difference. I'd rather stay with the safer / boring one unless
there's a pressing reason to convert.
Thanks.
--
tejun
Hi, Tejun.
On Thu, Apr 15, 2010 at 8:39 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 04/14/2010 12:48 AM, Mel Gorman wrote:
>> and the mapping table on x86 at least is based on possible CPUs in
>> init_cpu_to_node() leaves the mapping as 0 if the APIC is bad or the numa
>> node is reported in apicid_to_node as -1. It would appear on power that
>> the node will be 0 for possible CPUs as well.
>>
>> Hence, I believe this to be safe but a confirmation from Tejun would be
>> nice. I would continue digging but this looks like an initialisation path
>> so I'll move on to the next patch rather than spending more time.
>
> This being a pretty cold path, I don't really see much benefit in
> converting it to alloc_pages_node_exact(). It ain't gonna make any
> difference. I'd rather stay with the safer / boring one unless
> there's a pressing reason to convert.
Actually, It's to weed out not-good API usage as well as some performance gain.
But I don't think to need it strongly.
Okay. Please keep in mind about this and correct it if you confirms it
in future. :)
>
> Thanks.
>
> --
> tejun
>
--
Kind regards,
Minchan Kim
Hello,
On 04/15/2010 10:31 AM, Minchan Kim wrote:
> Hi, Tejun.
>> This being a pretty cold path, I don't really see much benefit in
>> converting it to alloc_pages_node_exact(). It ain't gonna make any
>> difference. I'd rather stay with the safer / boring one unless
>> there's a pressing reason to convert.
>
> Actually, It's to weed out not-good API usage as well as some
> performance gain. But I don't think to need it strongly.
> Okay. Please keep in mind about this and correct it if you confirms
> it in future. :)
Hmm... if most users are converting over to alloc_pages_node_exact(),
I think it would be better to convert percpu too. I thought it was a
performance optimization (of rather silly kind too). So, this is to
weed out -1 node id usage? Wouldn't it be better to update
alloc_pages_node() such that it whines once per each caller if it's
called with -1 node id and after updating most users convert the
warning into WARN_ON_ONCE()? Having two variants for this seems
rather extreme to me.
Thanks.
--
tejun
On Thu, Apr 15, 2010 at 4:21 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 04/15/2010 10:31 AM, Minchan Kim wrote:
>> Hi, Tejun.
>>> This being a pretty cold path, I don't really see much benefit in
>>> converting it to alloc_pages_node_exact(). It ain't gonna make any
>>> difference. I'd rather stay with the safer / boring one unless
>>> there's a pressing reason to convert.
>>
>> Actually, It's to weed out not-good API usage as well as some
>> performance gain. But I don't think to need it strongly.
>> Okay. Please keep in mind about this and correct it if you confirms
>> it in future. :)
>
> Hmm... if most users are converting over to alloc_pages_node_exact(),
> I think it would be better to convert percpu too. I thought it was a
> performance optimization (of rather silly kind too). So, this is to
> weed out -1 node id usage? Wouldn't it be better to update
> alloc_pages_node() such that it whines once per each caller if it's
> called with -1 node id and after updating most users convert the
> warning into WARN_ON_ONCE()? Having two variants for this seems
> rather extreme to me.
Yes. I don't like it.
With it, someone who does care about API usage uses alloc_pages_exact_node but
someone who don't have a time or careless uses alloc_pages_node.
It would make API fragmentation and not good.
Maybe we can weed out -1 and make new API which is more clear.
* struct page *alloc_pages_any_node(gfp_t gfp_mask, unsigned int order);
* struct page *alloc_pages_exact_node(int nid, gfp_mask, unsigned int order);
So firstly we have to make sure users who use alloc_pages_node can
change alloc_pages_node with alloc_pages_exact_node.
After all of it was weed out, I will change alloc_pages_node with
alloc_pages_any_node.
--
Kind regards,
Minchan Kim
Hello,
On 04/15/2010 05:00 PM, Minchan Kim wrote:
> Yes. I don't like it.
> With it, someone who does care about API usage uses alloc_pages_exact_node but
> someone who don't have a time or careless uses alloc_pages_node.
> It would make API fragmentation and not good.
> Maybe we can weed out -1 and make new API which is more clear.
>
> * struct page *alloc_pages_any_node(gfp_t gfp_mask, unsigned int order);
> * struct page *alloc_pages_exact_node(int nid, gfp_mask, unsigned int order);
I'm not an expert on that part of the kernel but isn't
alloc_pages_any_node() identical to alloc_pages_exact_node()? All
that's necessary to do now is to weed out callers which pass in
negative nid to alloc_pages_node(), right? If so, why not just do a
clean sweep of alloc_pages_node() users and update them so that they
don't call in w/ -1 nid and add WARN_ON_ONCE() in alloc_pages_node()?
Is there any reason to keep both variants going forward? If not,
introducing new API just to weed out invalid usages seems like an
overkill.
Thanks.
--
tejun
On Thu, Apr 15, 2010 at 5:15 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 04/15/2010 05:00 PM, Minchan Kim wrote:
>> Yes. I don't like it.
>> With it, someone who does care about API usage uses alloc_pages_exact_node but
>> someone who don't have a time or careless uses alloc_pages_node.
>> It would make API fragmentation and not good.
>> Maybe we can weed out -1 and make new API which is more clear.
>>
>> * struct page *alloc_pages_any_node(gfp_t gfp_mask, unsigned int order);
>> * struct page *alloc_pages_exact_node(int nid, gfp_mask, unsigned int order);
>
> I'm not an expert on that part of the kernel but isn't
> alloc_pages_any_node() identical to alloc_pages_exact_node()? All
alloc_pages_any_node means user allows allocated pages in any
node(most likely current node)
alloc_pages_exact_node means user allows allocated pages in nid node
if he doesn't use __GFP_THISNODE.
> that's necessary to do now is to weed out callers which pass in
> negative nid to alloc_pages_node(), right? If so, why not just do a
It might be my final goal. I hope user uses alloc_pages_any_node
instead of nid == -1.
> clean sweep of alloc_pages_node() users and update them so that they
> don't call in w/ -1 nid and add WARN_ON_ONCE() in alloc_pages_node()?
> Is there any reason to keep both variants going forward? If not,
I am not sure someone still need alloc_pages_node. That's because
sometime he want to allocate page from specific node but sometime not.
I hope it doesn't happen. Anyway I have to identify the situation.
> introducing new API just to weed out invalid usages seems like an
> overkill.
It might be.
It think it's almost same add_to_page_cache and add_to_page_cache_locked.
If user knows the page is already locked, he can use
add_to_page_cache_locked for performance gain and code readability
which we need to lock the page before calling it.
The important point is that user uses it as he is conscious of locked page.
I think if user already know to want page where from specific node, it
would be better to use alloc_pages_exact_node instead of
alloc_pages_node.
If he want to allocate page from any node[current..fallback list], he
have to use alloc_pages_any_node without nid argument. It would make a
little performance gain(reduce passing argument) and good readbility.
:)
Now, most of user uses alloc_pages_exact_node. So I think we can do it.
But someone else also might think of overkill. :)
It's a not urgent issue. So I will take it easy.
Thanks.
--
Kind regards,
Minchan Kim
Hello,
On 04/15/2010 06:40 PM, Minchan Kim wrote:
>> I'm not an expert on that part of the kernel but isn't
>> alloc_pages_any_node() identical to alloc_pages_exact_node()? All
>
> alloc_pages_any_node means user allows allocated pages in any
> node(most likely current node) alloc_pages_exact_node means user
> allows allocated pages in nid node if he doesn't use __GFP_THISNODE.
Ooh, sorry, I meant alloc_pages(). What would be the difference
between alloc_pages_any_node() and alloc_pages()?
>> introducing new API just to weed out invalid usages seems like an
>> overkill.
>
> It might be.
>
> It think it's almost same add_to_page_cache and add_to_page_cache_locked.
> If user knows the page is already locked, he can use
> add_to_page_cache_locked for performance gain and code readability
> which we need to lock the page before calling it.
Yeah, if both APIs are necessary at the end of the conversion, sure.
I was just saying that introducing new APIs just to weed out invalid
usages and then later killing the old API would be rather excessive.
I was just wondering whether we could just clean up alloc_pages_node()
users and kill alloc_pages_exact_node().
Thanks.
--
tejun
On Thu, Apr 15, 2010 at 7:08 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 04/15/2010 06:40 PM, Minchan Kim wrote:
>>> I'm not an expert on that part of the kernel but isn't
>>> alloc_pages_any_node() identical to alloc_pages_exact_node()? All
>>
>> alloc_pages_any_node means user allows allocated pages in any
>> node(most likely current node) alloc_pages_exact_node means user
>> allows allocated pages in nid node if he doesn't use __GFP_THISNODE.
>
> Ooh, sorry, I meant alloc_pages(). What would be the difference
> between alloc_pages_any_node() and alloc_pages()?
It's no different. It's same. Just naming is more explicit. :)
I think it could be following as.
#define alloc_pages alloc_pages_any_node.
strucdt page * alloc_pages_node() {
int nid = numa_node_id();
...
return page;
}
>
>>> introducing new API just to weed out invalid usages seems like an
>>> overkill.
>>
>> It might be.
>>
>> It think it's almost same add_to_page_cache and add_to_page_cache_locked.
>> If user knows the page is already locked, he can use
>> add_to_page_cache_locked for performance gain and code readability
>> which we need to lock the page before calling it.
>
> Yeah, if both APIs are necessary at the end of the conversion, sure.
> I was just saying that introducing new APIs just to weed out invalid
> usages and then later killing the old API would be rather excessive.
>
> I was just wondering whether we could just clean up alloc_pages_node()
> users and kill alloc_pages_exact_node().
kill alloc_pages_exact_node?
Sorry but I can't understand your point.
I don't want to kill user of alloc_pages_exact_node.
That's opposite.
I want to kill user of alloc_pages_node and change it with
alloc_pages_any_node or alloc_pages_exact_node. :)
I think we can do it. That's because all of caller already can check nid == -1
before calling allocation function explicitly if he cares node locality.
--
Kind regards,
Minchan Kim
On Thu, Apr 15, 2010 at 7:21 PM, Minchan Kim <[email protected]> wrote:
> On Thu, Apr 15, 2010 at 7:08 PM, Tejun Heo <[email protected]> wrote:
>> Hello,
>>
>> On 04/15/2010 06:40 PM, Minchan Kim wrote:
>>>> I'm not an expert on that part of the kernel but isn't
>>>> alloc_pages_any_node() identical to alloc_pages_exact_node()? All
>>>
>>> alloc_pages_any_node means user allows allocated pages in any
>>> node(most likely current node) alloc_pages_exact_node means user
>>> allows allocated pages in nid node if he doesn't use __GFP_THISNODE.
>>
>> Ooh, sorry, I meant alloc_pages(). What would be the difference
>> between alloc_pages_any_node() and alloc_pages()?
>
> It's no different. It's same. Just naming is more explicit. :)
> I think it could be following as.
>
> #define alloc_pages alloc_pages_any_node.
> strucdt page * alloc_pages_node() {
typo. Sorry.
struct page * alloc_pages_any_node {
> int nid = numa_node_id();
> ...
> return page;
> }
>
--
Kind regards,
Minchan Kim
Hello,
On 04/15/2010 07:21 PM, Minchan Kim wrote:
> kill alloc_pages_exact_node?
> Sorry but I can't understand your point.
> I don't want to kill user of alloc_pages_exact_node.
> That's opposite.
> I want to kill user of alloc_pages_node and change it with
> alloc_pages_any_node or alloc_pages_exact_node. :)
I see, so...
alloc_pages() -> alloc_pages_any_node()
alloc_pages_node() -> alloc_pages_exact_node()
right? It just seems strange to me and different from usual naming
convention - ie. something which doesn't care about nodes usually
doesn't carry _node postfix. Anyways, no big deal, those names just
felt a bit strange to me.
Thanks.
--
tejun
On Thu, Apr 15, 2010 at 8:43 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 04/15/2010 07:21 PM, Minchan Kim wrote:
>> kill alloc_pages_exact_node?
>> Sorry but I can't understand your point.
>> I don't want to kill user of alloc_pages_exact_node.
>> That's opposite.
>> I want to kill user of alloc_pages_node and change it with
>> alloc_pages_any_node or alloc_pages_exact_node. :)
>
> I see, so...
>
> alloc_pages() -> alloc_pages_any_node()
> alloc_pages_node() -> alloc_pages_exact_node()
>
> right? It just seems strange to me and different from usual naming
> convention - ie. something which doesn't care about nodes usually
> doesn't carry _node postfix. Anyways, no big deal, those names just
> felt a bit strange to me.
I don't want to remove alloc_pages for UMA system.
#define alloc_pages alloc_page_sexact_node
What I want to remove is just alloc_pages_node. :)
Sorry for confusing you.
--
Kind regards,
Minchan Kim
On Thu, 15 Apr 2010, Minchan Kim wrote:
> I don't want to remove alloc_pages for UMA system.
alloc_pages is the same as alloc_pages_any_node so why have it?
> #define alloc_pages alloc_page_sexact_node
>
> What I want to remove is just alloc_pages_node. :)
Why remove it? If you want to get rid of -1 handling then check all the
callsites and make sure that they are not using -1.
Also could you define a constant for -1? -1 may have various meanings. One
is the local node and the other is any node. The difference is if memory
policies are obeyed or not. Note that alloc_pages follows memory policies
whereas alloc_pages_node does not.
Therefore
alloc_pages() != alloc_pages_node( , -1)
On Wed, 14 Apr 2010, Pekka Enberg wrote:
> Minchan, care to send a v2 with proper changelog and reviewed-by attributions?
Still wondering what the big deal about alloc_pages_node_exact is. Its not
exact since we can fall back to another node. It is better to clarify the
API for alloc_pages_node and forbid / clarify the use of -1.
On Fri, 2010-04-16 at 11:07 -0500, Christoph Lameter wrote:
> On Thu, 15 Apr 2010, Minchan Kim wrote:
>
> > I don't want to remove alloc_pages for UMA system.
>
> alloc_pages is the same as alloc_pages_any_node so why have it?
>
> > #define alloc_pages alloc_page_sexact_node
> >
> > What I want to remove is just alloc_pages_node. :)
>
> Why remove it? If you want to get rid of -1 handling then check all the
> callsites and make sure that they are not using -1.
>
> Also could you define a constant for -1? -1 may have various meanings. One
> is the local node and the other is any node.
NUMA_NO_NODE is #defined as (-1) and can be used for this purpose. '-1'
has been replaced by this in many cases. It can be interpreted as "No
node specified" == "any node is acceptable". But, it also has multiple
meanings. E.g., in the hugetlb sysfs attribute and sysctl functions it
indicates the global hstates [all nodes] vs a per node hstate. So, I
suppose one could define a NUMA_ANY_NODE, to make the intention clear at
the call site.
I believe that all usage of -1 to mean the local node has been removed,
unless I missed one. Local allocation is now indicated by a mempolicy
mode flag--MPOL_F_LOCAL. It's treated as a special case of
MPOL_PREFERRED.
> The difference is if memory
> policies are obeyed or not. Note that alloc_pages follows memory policies
> whereas alloc_pages_node does not.
>
> Therefore
>
> alloc_pages() != alloc_pages_node( , -1)
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
Hi, Christoph.
On Fri, 2010-04-16 at 11:07 -0500, Christoph Lameter wrote:
> On Thu, 15 Apr 2010, Minchan Kim wrote:
>
> > I don't want to remove alloc_pages for UMA system.
>
> alloc_pages is the same as alloc_pages_any_node so why have it?
I don't want to force using '_node' postfix on UMA users.
Maybe they don't care getting page from any node and event don't need to
know about _NODE_.
>
> > #define alloc_pages alloc_page_sexact_node
> >
> > What I want to remove is just alloc_pages_node. :)
>
> Why remove it? If you want to get rid of -1 handling then check all the
alloc_pages_node have multiple meaning as you said. So some of users
misuses that API. I want to clear intention of user.
> callsites and make sure that they are not using -1.
Sure. I must do it before any progressing.
>
> Also could you define a constant for -1? -1 may have various meanings. One
> is the local node and the other is any node. The difference is if memory
> policies are obeyed or not. Note that alloc_pages follows memory policies
> whereas alloc_pages_node does not.
>
> Therefore
>
> alloc_pages() != alloc_pages_node( , -1)
>
Yes, now it's totally different.
On UMA, It's any node but on NUMA, local node.
My concern is following as.
alloc_pages_node means any node but it has nid argument.
Why should user of alloc_pages who want to get page from any node pass
nid argument? It's rather awkward.
Some of user misunderstood it and used alloc_pages_node instead of
alloc_pages_exact_node although he already know exact _NID_.
Of course, it's not a BUG since if nid >= 0 it works well.
But I want to remove such multiple meaning to clear intention of user.
--
Kind regards,
Minchan Kim
Hi, Lee.
On Fri, 2010-04-16 at 15:13 -0400, Lee Schermerhorn wrote:
> On Fri, 2010-04-16 at 11:07 -0500, Christoph Lameter wrote:
> > On Thu, 15 Apr 2010, Minchan Kim wrote:
> >
> > > I don't want to remove alloc_pages for UMA system.
> >
> > alloc_pages is the same as alloc_pages_any_node so why have it?
> >
> > > #define alloc_pages alloc_page_sexact_node
> > >
> > > What I want to remove is just alloc_pages_node. :)
> >
> > Why remove it? If you want to get rid of -1 handling then check all the
> > callsites and make sure that they are not using -1.
> >
> > Also could you define a constant for -1? -1 may have various meanings. One
> > is the local node and the other is any node.
>
> NUMA_NO_NODE is #defined as (-1) and can be used for this purpose. '-1'
> has been replaced by this in many cases. It can be interpreted as "No
> node specified" == "any node is acceptable". But, it also has multiple
> meanings. E.g., in the hugetlb sysfs attribute and sysctl functions it
> indicates the global hstates [all nodes] vs a per node hstate. So, I
> suppose one could define a NUMA_ANY_NODE, to make the intention clear at
> the call site.
>
> I believe that all usage of -1 to mean the local node has been removed,
> unless I missed one. Local allocation is now indicated by a mempolicy
> mode flag--MPOL_F_LOCAL. It's treated as a special case of
> MPOL_PREFERRED.
Thanks for good information. :)
--
Kind regards,
Minchan Kim
Christoph Lameter wrote:
> On Wed, 14 Apr 2010, Pekka Enberg wrote:
>
>> Minchan, care to send a v2 with proper changelog and reviewed-by attributions?
>
> Still wondering what the big deal about alloc_pages_node_exact is. Its not
> exact since we can fall back to another node. It is better to clarify the
> API for alloc_pages_node and forbid / clarify the use of -1.
Minchan's patch is a continuation of this patch:
http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commit;h=6484eb3e2a81807722
On 04/19/2010 12:54 AM, Minchan Kim wrote:
>> alloc_pages is the same as alloc_pages_any_node so why have it?
>
> I don't want to force using '_node' postfix on UMA users.
> Maybe they don't care getting page from any node and event don't need to
> know about _NODE_.
Yeah, then, remove alloc_pages_any_node(). I can't really see the
point of any_/exact_node. alloc_pages() and alloc_pages_node() are
fine and in line with other functions. Why change it?
>> Why remove it? If you want to get rid of -1 handling then check all the
>
> alloc_pages_node have multiple meaning as you said. So some of users
> misuses that API. I want to clear intention of user.
The name is fine. Just clean up the users and make the intended usage
clear in documentation and implementation (ie. trigger a big fat
warning) and make all the callers use named constants instead of -1
for special meanings.
Thanks.
--
tejun
Hi, Tejun.
On Mon, Apr 19, 2010 at 6:22 AM, Tejun Heo <[email protected]> wrote:
> On 04/19/2010 12:54 AM, Minchan Kim wrote:
>>> alloc_pages is the same as alloc_pages_any_node so why have it?
>>
>> I don't want to force using '_node' postfix on UMA users.
>> Maybe they don't care getting page from any node and event don't need to
>> know about _NODE_.
>
> Yeah, then, remove alloc_pages_any_node(). I can't really see the
> point of any_/exact_node. alloc_pages() and alloc_pages_node() are
> fine and in line with other functions. Why change it?
>
>>> Why remove it? If you want to get rid of -1 handling then check all the
>>
>> alloc_pages_node have multiple meaning as you said. So some of users
>> misuses that API. I want to clear intention of user.
>
> The name is fine. Just clean up the users and make the intended usage
> clear in documentation and implementation (ie. trigger a big fat
> warning) and make all the callers use named constants instead of -1
> for special meanings.
>
> Thanks.
Let's tidy my table.
I made quick patch to show the concept with one example of pci-dma.
(Sorry but I attach patch since web gmail's mangling.)
On UMA, we can change alloc_pages with
alloc_pages_exact_node(numa_node_id(),....)
(Actually, the patch is already merged mmotm)
on NUMA, alloc_pages is some different meaning, so I don't want to change it.
on NUMA, alloc_pages_node means _ANY_NODE_.
So let's remove nid argument and change naming with alloc_pages_any_node.
Then, whole users of alloc_pages_node can be changed between
alloc_pages_exact_node and alloc_pages_any_node.
It was my intention. What's your concern?
Thanks for your interest, Tejun. :)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a4ac764..dc511cb 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -152,12 +152,21 @@ void *dma_generic_alloc_coherent(struct device
*dev, size_t size,
unsigned long dma_mask;
struct page *page;
dma_addr_t addr;
+ int nid;
dma_mask = dma_alloc_coherent_mask(dev, flag);
flag |= __GFP_ZERO;
again:
- page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
+ nid = dev_to_node(dev);
+ /*
+ * If pci-dma maintainer makes sure nid never has NUMA_NO_NODE
+ * we can remove this ugly checking.
+ */
+ if (nid == NUMA_NO_NODE)
+ page = alloc_pages_any_node(flag, get_order(size));
+ else
+ page = alloc_pages_exact_node(nid, flag, get_order(size));
if (!page)
return NULL;
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6d413..47fba21 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -278,13 +278,10 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
}
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
+static inline struct page *alloc_pagse_any_node(gfp_t gfp_mask,
unsigned int order)
{
- /* Unknown node is current node */
- if (nid < 0)
- nid = numa_node_id();
-
+ int nid = numa_node_id();
return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}
@@ -308,7 +305,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr);
#else
#define alloc_pages(gfp_mask, order) \
- alloc_pages_node(numa_node_id(), gfp_mask, order)
+ alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
~
--
Kind regards,
Minchan Kim
On Fri, Apr 16, 2010 at 11:10:01AM -0500, Christoph Lameter wrote:
> On Wed, 14 Apr 2010, Pekka Enberg wrote:
>
> > Minchan, care to send a v2 with proper changelog and reviewed-by attributions?
>
> Still wondering what the big deal about alloc_pages_node_exact is. Its not
> exact since we can fall back to another node. It is better to clarify the
> API for alloc_pages_node and forbid / clarify the use of -1.
>
There should be a comment clarifing it now. I admit the naming fault is
mine. At the time, the intended meaning was "allocate pages from any node in
the fallback list and the caller knows exactly which node to start from". I
did not take into account that the meaning of "exact" depends on context.
With a comment clarifying the meaning, I do not think a rename is necessary.
However, I'd rather not see a mass renaming of functions like alloc_pages()
that have existed a long times. If nothing else, they are documented in books
like "Linux Device Drivers" so why make life harder on device authors than
it already is?
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Mon, 19 Apr 2010, Minchan Kim wrote:
> My concern is following as.
>
> alloc_pages_node means any node but it has nid argument.
> Why should user of alloc_pages who want to get page from any node pass
> nid argument? It's rather awkward.
Its not awkward but an optimization. The page can be placed on any node
but the user would prefer a certain node. Most of the NUMA things are
there for optimization purposes and not for correctness. If you must have
an allocation on certain nodes for correctness (like SLAB) then
GFP_THISNODE is used.
> Some of user misunderstood it and used alloc_pages_node instead of
> alloc_pages_exact_node although he already know exact _NID_.
> Of course, it's not a BUG since if nid >= 0 it works well.
>
> But I want to remove such multiple meaning to clear intention of user.
Its not clear to me that this renaming etc helps. You
must use GFP_THISNODE if allocation must occur from a certain node.
alloc_pages_exact_node results in more confusion because it does suggest
that fallback to other nodes is not allowed.
On Mon, 19 Apr 2010, Minchan Kim wrote:
> Let's tidy my table.
>
> I made quick patch to show the concept with one example of pci-dma.
> (Sorry but I attach patch since web gmail's mangling.)
>
> On UMA, we can change alloc_pages with
> alloc_pages_exact_node(numa_node_id(),....)
> (Actually, the patch is already merged mmotm)
UMA does not have the concept of nodes. Whatever node you specify is
irrelevant. Please remove the patch from mmotm.
> on NUMA, alloc_pages is some different meaning, so I don't want to change it.
No it has the same meaning. It means allocate a page.
> on NUMA, alloc_pages_node means _ANY_NODE_.
It means allocate on the indicated node if possible. Memory could come
from any node due to fallback (in order of node preference).
> So let's remove nid argument and change naming with alloc_pages_any_node.
??? What in the world are you doing?
> Then, whole users of alloc_pages_node can be changed between
> alloc_pages_exact_node and alloc_pages_any_node.
>
> It was my intention. What's your concern?
I dont see the point.
> again:
> - page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
> + nid = dev_to_node(dev);
> + /*
> + * If pci-dma maintainer makes sure nid never has NUMA_NO_NODE
> + * we can remove this ugly checking.
> + */
> + if (nid == NUMA_NO_NODE)
> + page = alloc_pages_any_node(flag, get_order(size));
s/alloc_pages_any_node/alloc_pages/
> + else
> + page = alloc_pages_exact_node(nid, flag, get_order(size));
s/alloc_pages_exact_node/alloc_pages_node/
> -static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> +static inline struct page *alloc_pagse_any_node(gfp_t gfp_mask,
> unsigned int order)
> {
> - /* Unknown node is current node */
> - if (nid < 0)
> - nid = numa_node_id();
> -
> + int nid = numa_node_id();
> return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> }
>
This is very confusing. Because it is
alloc_pages_numa_node_id()
alloca_pages_any_node suggests that the kernel randomly picks a node?
Hello, Christoph.
On 04/20/2010 02:38 AM, Christoph Lameter wrote:
> alloc_pages_exact_node results in more confusion because it does suggest
> that fallback to other nodes is not allowed.
I can't see why alloc_pages_exact_node() exists at all. It's in the
mainline and if you look at the difference between alloc_pages_node()
and alloc_pages_exact_node(), it's almost silly. :-(
--
tejun
On Tue, Apr 20, 2010 at 2:45 AM, Christoph Lameter
<[email protected]> wrote:
> On Mon, 19 Apr 2010, Minchan Kim wrote:
>
>> Let's tidy my table.
>>
>> I made quick patch to show the concept with one example of pci-dma.
>> (Sorry but I attach patch since web gmail's mangling.)
>>
>> On UMA, we can change alloc_pages with
>> alloc_pages_exact_node(numa_node_id(),....)
>> (Actually, the patch is already merged mmotm)
>
> UMA does not have the concept of nodes. Whatever node you specify is
> irrelevant. Please remove the patch from mmotm.
I didn't change API name. The patch is just for optimization.
http://lkml.org/lkml/2010/4/14/225
I think it's reasonable in UMA.
Why do you want to remove it?
Do you dislike alloc_pages_exact_node naming?
I added comment.
http://lkml.org/lkml/2010/4/14/230
Do you think it isn't enough?
This patch results from misunderstanding of alloc_pages_exact_node.
(http://marc.info/?l=linux-mm&m=127109064101184&w=2)
At that time, I thought naming changing is worth.
But many people don't like it.
Okay. It was just trial and if everyone dislike, I don't have any strong cause.
But this patch series don't relate to it. Again said, It's just for
optimization patch.
Let's clarify other's opinion.
1. "I dislike alloc_pages_exact_node naming. Let's change it with more
clear name."
2. "I hate alloc_pages_exact_node. It's trivial optimization. Let's
remove it and replace it with alloc_pages_node."
3. "alloc_pages_exact_node naming is not bad. Let's add the comment to
clear name"
4. "Let's cleanup alloc_pages_xxx in this change as well as 3.
5. "Please, don't touch. Remain whole of thing like as-is."
I think Chrsitop selects 5 or 1, Tejun selects 2, Mel selects 3, me
want to 4 but is satisfied with 3. Right?
If we selects 5, In future, there are confusing between
alloc_pages_node and alloc_pages_exact_node.So I don't want it.
If we select 2, We already have many place of alloc_pages_exact_node.
And I add this patch series. So most of caller uses alloc_pages_exact_node now.
Isn't it trivial?
So I want 3 at lest although you guys don't like 4.
Please, suggest better idea to me. :)
--
Kind regards,
Minchan Kim
On Tue, Apr 20, 2010 at 07:27:09AM +0900, Tejun Heo wrote:
> Hello, Christoph.
>
> On 04/20/2010 02:38 AM, Christoph Lameter wrote:
> > alloc_pages_exact_node results in more confusion because it does suggest
> > that fallback to other nodes is not allowed.
>
> I can't see why alloc_pages_exact_node() exists at all. It's in the
> mainline and if you look at the difference between alloc_pages_node()
> and alloc_pages_exact_node(), it's almost silly. :-(
>
alloc_pages_exact_node() avoids a branch in a hot path that is checking for
something the caller already knows. That's the reason it exists.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Tue, 20 Apr 2010, Mel Gorman wrote:
> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
> something the caller already knows. That's the reason it exists.
We can avoid alloc_pages_exact_node() by making all callers of
alloc_pages_node() never use -1. -1 is ambiguous and only rarely will a
caller pass that to alloc_pages_node().
On Wed, Apr 21, 2010 at 11:15 PM, Christoph Lameter
<[email protected]> wrote:
> On Tue, 20 Apr 2010, Mel Gorman wrote:
>
>> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
>> something the caller already knows. That's the reason it exists.
>
> We can avoid alloc_pages_exact_node() by making all callers of
> alloc_pages_node() never use -1. -1 is ambiguous and only rarely will a
> caller pass that to alloc_pages_node().
That's very reasonable to me.
Then, we can remove alloc_pages_exact_node and nid < 0 check in
alloc_pages_node at the same time.
Mel. Could you agree?
Firstly Tejun suggested this but I didn't got the point.
Sorry for bothering you.
Okay. I will dive into this approach.
Thanks for careful review, All.
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
--
Kind regards,
Minchan Kim
Hello,
On 04/20/2010 05:05 PM, Mel Gorman wrote:
> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
> something the caller already knows. That's the reason it exists.
Yeah sure but Minchan is trying to tidy up the API by converting
alloc_pages_node() users to use alloc_pages_exact_node(), at which
point, the distinction becomes pretty useless. Wouldn't just making
alloc_pages_node() do what alloc_pages_exact_node() does now and
converting all its users be simpler? IIRC, the currently planned
transformation looks like the following.
alloc_pages() -> alloc_pages_any_node()
alloc_pages_node() -> basically gonna be obsoleted by _exact_node
alloc_pages_exact_node() -> gonna be used by most NUMA aware allocs
So, let's just make sure no one calls alloc_pages_node() w/ -1 nid,
kill alloc_pages_node() and rename alloc_pages_exact_node() to
alloc_pages_node().
Thanks.
--
tejun
On Wed, Apr 21, 2010 at 7:48 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 04/20/2010 05:05 PM, Mel Gorman wrote:
>> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
>> something the caller already knows. That's the reason it exists.
>
> Yeah sure but Minchan is trying to tidy up the API by converting
> alloc_pages_node() users to use alloc_pages_exact_node(), at which
> point, the distinction becomes pretty useless. Wouldn't just making
> alloc_pages_node() do what alloc_pages_exact_node() does now and
> converting all its users be simpler? IIRC, the currently planned
> transformation looks like the following.
>
> alloc_pages() -> alloc_pages_any_node()
> alloc_pages_node() -> basically gonna be obsoleted by _exact_node
> alloc_pages_exact_node() -> gonna be used by most NUMA aware allocs
>
> So, let's just make sure no one calls alloc_pages_node() w/ -1 nid,
> kill alloc_pages_node() and rename alloc_pages_exact_node() to
> alloc_pages_node().
Yes. It was a stupid idea. I hope Mel agree this suggestion.
Thanks for careful review, Tejun.
--
Kind regards,
Minchan Kim