2014-12-20 13:55:10

by Rafael Aquini

[permalink] [raw]
Subject: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

This patch introduces 'pagesize' line element to /proc/<pid>/numa_maps
report file in order to help disambiguating the size of pages that are
backing memory areas mapped by a task. When the VMA backing page size
is observed different from kernel's default PAGE_SIZE, the new element
is printed out to complement report output. This is specially useful to
help differentiating between HUGE and GIGANTIC page VMAs.

This patch is based on Dave Hansen's proposal and reviewer's follow ups
taken from this dicussion: https://lkml.org/lkml/2011/9/21/454

Signed-off-by: Rafael Aquini <[email protected]>
---
fs/proc/task_mmu.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 246eae8..9f2e2c8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1479,6 +1479,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
struct mm_struct *mm = vma->vm_mm;
struct mm_walk walk = {};
struct mempolicy *pol;
+ unsigned long page_size;
char buffer[64];
int nid;

@@ -1533,6 +1534,10 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
if (!md->pages)
goto out;

+ page_size = vma_kernel_pagesize(vma);
+ if (page_size != PAGE_SIZE)
+ seq_printf(m, " pagesize=%lu", page_size);
+
if (md->anon)
seq_printf(m, " anon=%lu", md->anon);

--
1.9.3


2014-12-20 16:41:11

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On 12/20/2014 05:54 AM, Rafael Aquini wrote:
> This patch introduces 'pagesize' line element to /proc/<pid>/numa_maps
> report file in order to help disambiguating the size of pages that are
> backing memory areas mapped by a task. When the VMA backing page size
> is observed different from kernel's default PAGE_SIZE, the new element
> is printed out to complement report output. This is specially useful to
> help differentiating between HUGE and GIGANTIC page VMAs.

Heh, I completely forgot about this. Thanks for picking it back up.

I sometimes wonder what 'numa_maps' purpose is any if we should have
_some_ kind of policy about what goes in there vs. smaps. numa_maps
seems to be turning in to smaps, minus the \n. :)

But that isn't the case for this patch. The "anon=50 dirty=50 N0=50"
output of numa_maps is wholly *useless* without either this patch or
some other mechanism to find out of hugetbfs memory is present. I think
that needs to make it in to the description.

I'm fine with the code, though. Feel free to add my acked-by.

2014-12-20 18:36:21

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On Sat, Dec 20, 2014 at 08:54:45AM -0500, Rafael Aquini wrote:
> This patch introduces 'pagesize' line element to /proc/<pid>/numa_maps
> report file in order to help disambiguating the size of pages that are
> backing memory areas mapped by a task. When the VMA backing page size
> is observed different from kernel's default PAGE_SIZE, the new element
> is printed out to complement report output. This is specially useful to
> help differentiating between HUGE and GIGANTIC page VMAs.
>
> This patch is based on Dave Hansen's proposal and reviewer's follow ups
> taken from this dicussion: https://lkml.org/lkml/2011/9/21/454
>
> Signed-off-by: Rafael Aquini <[email protected]>
> ---
> fs/proc/task_mmu.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 246eae8..9f2e2c8 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1479,6 +1479,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
> struct mm_struct *mm = vma->vm_mm;
> struct mm_walk walk = {};
> struct mempolicy *pol;
> + unsigned long page_size;
> char buffer[64];
> int nid;
>
> @@ -1533,6 +1534,10 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
> if (!md->pages)
> goto out;
>
> + page_size = vma_kernel_pagesize(vma);
> + if (page_size != PAGE_SIZE)
> + seq_printf(m, " pagesize=%lu", page_size);

It would be simpler to include this unconditionally. Otherwise you
are forcing everybody parsing the file and trying to run calculations
of it to check for its presence, and then have them fall back and get
the value from somewhere else if not.

2014-12-20 19:45:24

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On Sat, Dec 20, 2014 at 01:36:13PM -0500, Johannes Weiner wrote:
> On Sat, Dec 20, 2014 at 08:54:45AM -0500, Rafael Aquini wrote:
> > This patch introduces 'pagesize' line element to /proc/<pid>/numa_maps
> > report file in order to help disambiguating the size of pages that are
> > backing memory areas mapped by a task. When the VMA backing page size
> > is observed different from kernel's default PAGE_SIZE, the new element
> > is printed out to complement report output. This is specially useful to
> > help differentiating between HUGE and GIGANTIC page VMAs.
> >
> > This patch is based on Dave Hansen's proposal and reviewer's follow ups
> > taken from this dicussion: https://lkml.org/lkml/2011/9/21/454
> >
> > Signed-off-by: Rafael Aquini <[email protected]>
> > ---
> > fs/proc/task_mmu.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 246eae8..9f2e2c8 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -1479,6 +1479,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
> > struct mm_struct *mm = vma->vm_mm;
> > struct mm_walk walk = {};
> > struct mempolicy *pol;
> > + unsigned long page_size;
> > char buffer[64];
> > int nid;
> >
> > @@ -1533,6 +1534,10 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
> > if (!md->pages)
> > goto out;
> >
> > + page_size = vma_kernel_pagesize(vma);
> > + if (page_size != PAGE_SIZE)
> > + seq_printf(m, " pagesize=%lu", page_size);
>
> It would be simpler to include this unconditionally. Otherwise you
> are forcing everybody parsing the file and trying to run calculations
> of it to check for its presence, and then have them fall back and get
> the value from somewhere else if not.

I'm fine either way, it makes the change even simpler. Also, if we
decide to get rid of page_size != PAGE_SIZE condition I believe we can
also get rid of that "huge" hint being conditionally printed out too.

-- Rafael

2014-12-21 18:02:52

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On 12/20/2014 11:44 AM, Rafael Aquini wrote:
>> >
>> > It would be simpler to include this unconditionally. Otherwise you
>> > are forcing everybody parsing the file and trying to run calculations
>> > of it to check for its presence, and then have them fall back and get
>> > the value from somewhere else if not.
> I'm fine either way, it makes the change even simpler. Also, if we
> decide to get rid of page_size != PAGE_SIZE condition I believe we can
> also get rid of that "huge" hint being conditionally printed out too.

That would break existing users of the "huge" flag. That makes it out
of the question, right?

2014-12-21 22:29:09

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On Sun, Dec 21, 2014 at 10:02:49AM -0800, Dave Hansen wrote:
> On 12/20/2014 11:44 AM, Rafael Aquini wrote:
> >> >
> >> > It would be simpler to include this unconditionally. Otherwise you
> >> > are forcing everybody parsing the file and trying to run calculations
> >> > of it to check for its presence, and then have them fall back and get
> >> > the value from somewhere else if not.
> > I'm fine either way, it makes the change even simpler. Also, if we
> > decide to get rid of page_size != PAGE_SIZE condition I believe we can
> > also get rid of that "huge" hint being conditionally printed out too.
>
> That would break existing users of the "huge" flag. That makes it out
> of the question, right?
>
Yeah, but it sort of follows the same complaint Johannes did for the
conditional page size printouts. If we start to print out page size
deliberately for each map regardless their backing pages being PAGE_SIZE
long or bigger, I don't see much point on keep conditionally printing out
the 'huge' hint out. As I said before, I'm fine either way though I think
we can keep the current behaviour, and just disambiguate page sizes !=
PAGE_SIZE as in the current proposal.

Looking forward more of your thoughts!

Cheers,
-- Rafael

2014-12-22 17:10:38

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On 12/21/2014 02:28 PM, Rafael Aquini wrote:
>>> > > I'm fine either way, it makes the change even simpler. Also, if we
>>> > > decide to get rid of page_size != PAGE_SIZE condition I believe we can
>>> > > also get rid of that "huge" hint being conditionally printed out too.
>> >
>> > That would break existing users of the "huge" flag. That makes it out
>> > of the question, right?
>> >
> Yeah, but it sort of follows the same complaint Johannes did for the
> conditional page size printouts. If we start to print out page size
> deliberately for each map regardless their backing pages being PAGE_SIZE
> long or bigger, I don't see much point on keep conditionally printing out
> the 'huge' hint out.

Because existing userspace might be relying on it. If we take the
'huge' hint out, userspace will break.

> As I said before, I'm fine either way though I think
> we can keep the current behaviour, and just disambiguate page sizes !=
> PAGE_SIZE as in the current proposal.

Unless we somehow have a (really good) handle on how many apps out there
are reading and using 'huge', I think we have to keep the existing behavior.

2014-12-22 17:25:19

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On Mon, Dec 22, 2014 at 09:10:34AM -0800, Dave Hansen wrote:
> On 12/21/2014 02:28 PM, Rafael Aquini wrote:
> >>> > > I'm fine either way, it makes the change even simpler. Also, if we
> >>> > > decide to get rid of page_size != PAGE_SIZE condition I believe we can
> >>> > > also get rid of that "huge" hint being conditionally printed out too.
> >> >
> >> > That would break existing users of the "huge" flag. That makes it out
> >> > of the question, right?
> >> >
> > Yeah, but it sort of follows the same complaint Johannes did for the
> > conditional page size printouts. If we start to print out page size
> > deliberately for each map regardless their backing pages being PAGE_SIZE
> > long or bigger, I don't see much point on keep conditionally printing out
> > the 'huge' hint out.
>
> Because existing userspace might be relying on it. If we take the
> 'huge' hint out, userspace will break.
>
> > As I said before, I'm fine either way though I think
> > we can keep the current behaviour, and just disambiguate page sizes !=
> > PAGE_SIZE as in the current proposal.
>
> Unless we somehow have a (really good) handle on how many apps out there
> are reading and using 'huge', I think we have to keep the existing behavior.
>
Right. I definitely don't have anything better than what I already
presented which seems beaten by your argument, already.
Remaining question here is: should we print out 'pagesize' deliberately
or conditionally, only to disambiguate cases where page_size != PAGE_SIZE?

Have a very nice holidays folks!
-- Rafael

2014-12-22 17:59:40

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On 12/22/2014 09:25 AM, Rafael Aquini wrote:
> Remaining question here is: should we print out 'pagesize' deliberately
> or conditionally, only to disambiguate cases where page_size != PAGE_SIZE?

I say print it unconditionally. Not to completely overdesign this, but
I do think we should try to at least mirror the terminology that smaps uses:

KernelPageSize: 4 kB
MMUPageSize: 4 kB

So definitely call this kernelpagesize.

It appears that powerpc is the only architecture where there is a
difference, and I'm not sure that this is very common at all these days.
Do we need mmupagesize in numa_maps too?

2014-12-22 22:21:39

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On Mon, 22 Dec 2014, Dave Hansen wrote:

> > Remaining question here is: should we print out 'pagesize' deliberately
> > or conditionally, only to disambiguate cases where page_size != PAGE_SIZE?
>
> I say print it unconditionally. Not to completely overdesign this, but
> I do think we should try to at least mirror the terminology that smaps uses:
>
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
>
> So definitely call this kernelpagesize.
>

Feel free to add my acked-by if this patch prints it unconditionally and
renames this to kernelpagesize per Dave. I agree we need to leave "huge"
for existing dependencies even though we have multiple possible hugepage
sizes.

2014-12-22 22:27:42

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] proc: task_mmu: show page size in /proc/<pid>/numa_maps

On Sat, 20 Dec 2014, Dave Hansen wrote:

> I sometimes wonder what 'numa_maps' purpose is any if we should have
> _some_ kind of policy about what goes in there vs. smaps. numa_maps
> seems to be turning in to smaps, minus the \n. :)
>

It seems like an interface, similar to the one proposed by Ulrich, that
described the NUMA topology would have obsoleted numa_maps and you could
only rely on smaps to determine locality. There's existing userspace
dependencies on numa_maps already, though, so owell.

I had to fix numa_maps output when autonuma was merged for a much more
subtle difference at
http://marc.info/?l=git-commits-head&m=139113691614467 so I know some
people actually parse this and care quite a bit about its accuracy, it's a
shame it can't be deprecated in favor of adding the necessary information
to smaps.