2012-02-02 06:24:18

by Siddhesh Poyarekar

[permalink] [raw]
Subject: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

Hi,

Resending since I did not get any feedback on the second take of the patch.

Thanks,
Siddhesh


---------- Forwarded message ----------
From: Siddhesh Poyarekar <[email protected]>
Date: Tue, Jan 17, 2012 at 10:24 AM
Subject: [PATCH] Mark thread stack correctly in proc/<pid>/maps
To: Jamie Lokier <[email protected]>
Cc: [email protected], [email protected], Alexander Viro
<[email protected]>, [email protected], Michael
Kerrisk <[email protected]>, [email protected], Siddhesh
Poyarekar <[email protected]>


[Take 2]

Memory mmaped by glibc for a thread stack currently shows up as a simple
anonymous map, which makes it difficult to differentiate between memory
usage of the thread on stack and other dynamic allocation. Since glibc
already uses MAP_STACK to request this mapping, the attached patch
uses this flag to add additional VM_STACK_FLAGS to the resulting vma
so that the mapping is treated as a stack and not any regular
anonymous mapping. Also, one may use vm_flags to decide if a vma is a
stack.

This patch also changes the maps output to annotate stack guards for
both the process stack as well as the thread stacks. Thus is born the
[stack guard] annotation, which should be exactly a page long for the
process stack and can be longer than a page (configurable in
userspace) for POSIX compliant thread stacks. A thread stack guard is
simply page(s) with PROT_NONE.

If accepted, this should also reflect in the man page for mmap since
MAP_STACK will no longer be a noop.

Signed-off-by: Siddhesh Poyarekar <[email protected]>
---
?fs/proc/task_mmu.c | ? 41 ++++++++++++++++++++++++++++++++++++-----
?include/linux/mm.h | ? 19 +++++++++++++++++--
?mm/mmap.c ? ? ? ? ?| ? ?3 +++
?3 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e418c5a..650330c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -227,13 +227,42 @@ static void show_map_vma(struct seq_file *m,
struct vm_area_struct *vma)
? ? ? ? ? ? ? ?pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
? ? ? ?}

- ? ? ? /* We don't show the stack guard page in /proc/maps */
+ ? ? ? /*
+ ? ? ? ?* Mark the process stack guard, which is just one page at the
+ ? ? ? ?* beginning of the stack within the stack vma.
+ ? ? ? ?*/
? ? ? ?start = vma->vm_start;
- ? ? ? if (stack_guard_page_start(vma, start))
+ ? ? ? if (stack_guard_page_start(vma, start)) {
+ ? ? ? ? ? ? ? seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? start,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? start + PAGE_SIZE,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_READ ? 'r' : '-',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_WRITE ? 'w' : '-',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_EXEC ? 'x' : '-',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_MAYSHARE ? 's' : 'p',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? pgoff,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MAJOR(dev), MINOR(dev), ino, &len);
+
+ ? ? ? ? ? ? ? pad_len_spaces(m, len);
+ ? ? ? ? ? ? ? seq_puts(m, "[stack guard]\n");
? ? ? ? ? ? ? ?start += PAGE_SIZE;
+ ? ? ? }
? ? ? ?end = vma->vm_end;
- ? ? ? if (stack_guard_page_end(vma, end))
+ ? ? ? if (stack_guard_page_end(vma, end)) {
+ ? ? ? ? ? ? ? seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? end - PAGE_SIZE,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? end,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_READ ? 'r' : '-',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_WRITE ? 'w' : '-',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_EXEC ? 'x' : '-',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? flags & VM_MAYSHARE ? 's' : 'p',
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? pgoff,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? MAJOR(dev), MINOR(dev), ino, &len);
+
+ ? ? ? ? ? ? ? pad_len_spaces(m, len);
+ ? ? ? ? ? ? ? seq_puts(m, "[stack guard]\n");
? ? ? ? ? ? ? ?end -= PAGE_SIZE;
+ ? ? ? }

? ? ? ?seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
? ? ? ? ? ? ? ? ? ? ? ?start,
@@ -259,8 +288,10 @@ static void show_map_vma(struct seq_file *m,
struct vm_area_struct *vma)
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?if (vma->vm_start <= mm->brk &&
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?vma->vm_end >= mm->start_brk) {
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?name = "[heap]";
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? } else if (vma->vm_start <= mm->start_stack &&
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?vma->vm_end >= mm->start_stack) {
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? } else if (vma_is_stack(vma) &&
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?vma_is_guard(vma)) {
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? name = "[stack guard]";
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? } else if (vma_is_stack(vma)) {
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?name = "[stack]";
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?}
? ? ? ? ? ? ? ? ? ? ? ?} else {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17b27cd..4e57753 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1018,12 +1018,26 @@ static inline int vma_growsdown(struct
vm_area_struct *vma, unsigned long addr)
? ? ? ?return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
?}

+static inline int vma_is_stack(struct vm_area_struct *vma)
+{
+ ? ? ? return vma && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN));
+}
+
+/*
+ * Check guard set by userspace (PROT_NONE)
+ */
+static inline int vma_is_guard(struct vm_area_struct *vma)
+{
+ ? ? ? return (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC |
VM_SHARED)) == 0;
+}
+
?static inline int stack_guard_page_start(struct vm_area_struct *vma,
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? unsigned long addr)
?{
? ? ? ?return (vma->vm_flags & VM_GROWSDOWN) &&
? ? ? ? ? ? ? ?(vma->vm_start == addr) &&
- ? ? ? ? ? ? ? !vma_growsdown(vma->vm_prev, addr);
+ ? ? ? ? ? ? ? !vma_growsdown(vma->vm_prev, addr) &&
+ ? ? ? ? ? ? ? !vma_is_guard(vma);
?}

?/* Is the vma a continuation of the stack vma below it? */
@@ -1037,7 +1051,8 @@ static inline int stack_guard_page_end(struct
vm_area_struct *vma,
?{
? ? ? ?return (vma->vm_flags & VM_GROWSUP) &&
? ? ? ? ? ? ? ?(vma->vm_end == addr) &&
- ? ? ? ? ? ? ? !vma_growsup(vma->vm_next, addr);
+ ? ? ? ? ? ? ? !vma_growsup(vma->vm_next, addr) &&
+ ? ? ? ? ? ? ? !vma_is_guard(vma);
?}

?extern unsigned long move_page_tables(struct vm_area_struct *vma,
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f758c7..2f9f540 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file,
unsigned long addr,
? ? ? ?vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
? ? ? ? ? ? ? ? ? ? ? ?mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

+ ? ? ? if (flags & MAP_STACK)
+ ? ? ? ? ? ? ? vm_flags |= VM_STACK_FLAGS;
+
? ? ? ?if (flags & MAP_LOCKED)
? ? ? ? ? ? ? ?if (!can_do_mlock())
? ? ? ? ? ? ? ? ? ? ? ?return -EPERM;
--
1.7.7.4



--
Siddhesh Poyarekar
http://siddhesh.in


2012-02-02 21:40:21

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

> extern unsigned long move_page_tables(struct vm_area_struct *vma,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3f758c7..2f9f540 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file,
> unsigned long addr,
> vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
> mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
>
> + if (flags& MAP_STACK)
> + vm_flags |= VM_STACK_FLAGS;

??
MAP_STACK doesn't mean auto stack expansion. Why do you turn on VM_GROWSDOWN?
Seems incorrect.

2012-02-03 07:09:21

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 3, 2012 at 3:10 AM, KOSAKI Motohiro
<[email protected]> wrote:
>> ?extern unsigned long move_page_tables(struct vm_area_struct *vma,
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 3f758c7..2f9f540 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -992,6 +992,9 @@ unsigned long do_mmap_pgoff(struct file *file,
>> unsigned long addr,
>> ? ? ? ?vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
>> ? ? ? ? ? ? ? ? ? ? ? ?mm->def_flags | VM_MAYREAD | VM_MAYWRITE |
>> VM_MAYEXEC;
>>
>> + ? ? ? if (flags& ?MAP_STACK)
>> + ? ? ? ? ? ? ? vm_flags |= VM_STACK_FLAGS;
>
>
> ??
> MAP_STACK doesn't mean auto stack expansion. Why do you turn on
> VM_GROWSDOWN?
> Seems incorrect.
>

Right now MAP_STACK does not mean anything since it is ignored. The
intention of this behaviour change is to make MAP_STACK mean that the
map is going to be used as a stack and hence, set it up like a stack
ought to be. I could not really think of a valid case for fixed size
stacks; it looks like a limitation in the pthread implementation in
glibc rather than a feature. So this patch will actually result in
uniform behaviour across threads when it comes to stacks.

This does change vm accounting since thread stacks were earlier
accounted as anon memory.

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-03 08:01:57

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

> Right now MAP_STACK does not mean anything since it is ignored. The
> intention of this behaviour change is to make MAP_STACK mean that the
> map is going to be used as a stack and hence, set it up like a stack
> ought to be. I could not really think of a valid case for fixed size
> stacks; it looks like a limitation in the pthread implementation in
> glibc rather than a feature. So this patch will actually result in
> uniform behaviour across threads when it comes to stacks.
>
> This does change vm accounting since thread stacks were earlier
> accounted as anon memory.

The fact is, now process stack and pthread stack clearly behave
different dance. libc don't expect pthread stack grow automatically.
So, your patch will break userland. Just only change display thing.

2012-02-03 09:49:46

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro
<[email protected]> wrote:
> The fact is, now process stack and pthread stack clearly behave
> different dance. libc don't expect pthread stack grow automatically.
> So, your patch will break userland. Just only change display thing.

Thanks for your feedback. This attempt was to unify this behaviours,
but I guess you're right; I need to check if glibc really has a
problem with this than assuming that it should not. I will check with
glibc maintainers on this and update here. Since this flag is
specifically for glibc, it should not affect other applications or
libraries.

The proc changes won't make sense without the change to mark thread
stacks unless we create yet another vm flag to reflect MAP_STACK in
the vma and then use that for both process and its threads. I'll
submit a patch with this (if acceptable of course) if glibc strictly
requires fixed sized stacks.

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-03 10:29:11

by Mike Frysinger

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

On Friday 03 February 2012 03:01:35 KOSAKI Motohiro wrote:
> > Right now MAP_STACK does not mean anything since it is ignored. The
> > intention of this behaviour change is to make MAP_STACK mean that the
> > map is going to be used as a stack and hence, set it up like a stack
> > ought to be. I could not really think of a valid case for fixed size
> > stacks; it looks like a limitation in the pthread implementation in
> > glibc rather than a feature. So this patch will actually result in
> > uniform behaviour across threads when it comes to stacks.
> >
> > This does change vm accounting since thread stacks were earlier
> > accounted as anon memory.
>
> The fact is, now process stack and pthread stack clearly behave
> different dance. libc don't expect pthread stack grow automatically.
> So, your patch will break userland. Just only change display thing.

does it though ? glibc doesn't keep track of the unused address space ...
that's what the kernel is for. pthread_attr_setstacksize explicitly operates
on the *minimum* stack size, not the *exact* size.

where exactly do you think userland would break ?

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_attr_setstacksize.html
-mike


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part.

2012-02-03 18:34:14

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro
<[email protected]> wrote:
> The fact is, now process stack and pthread stack clearly behave
> different dance. libc don't expect pthread stack grow automatically.
> So, your patch will break userland. Just only change display thing.

The change should not make thread stacks (as implemented by glibc)
grow automatically in the general case since it has to implement guard
page(s) at the beginning of the mapped memory for stack using
mprotect(top, size, PROT_NONE). Due to this, the program will crash
before it ever has a chance to cause a page fault to make the stack
grow. This is of course assuming that the program doesn't purposely
jump over the guard page(s) by setting %sp beyond them and then
causing a page fault. So the only case in which a thread stack will
grow automatically is if the stackguard is set to 0:

http://pubs.opengroup.org/onlinepubs/007904975/functions/pthread_attr_setguardsize.html

I have also dropped an email on the libc-alpha list here to solicit
comments from libc maintainers on this:

http://sourceware.org/ml/libc-alpha/2012-02/msg00036.html


--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-08 04:00:40

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

On Sat, Feb 4, 2012 at 12:04 AM, Siddhesh Poyarekar
<[email protected]> wrote:
> On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro
> <[email protected]> wrote:
>> The fact is, now process stack and pthread stack clearly behave
>> different dance. libc don't expect pthread stack grow automatically.
>> So, your patch will break userland. Just only change display thing.
<snip>
> I have also dropped an email on the libc-alpha list here to solicit
> comments from libc maintainers on this:
>
> http://sourceware.org/ml/libc-alpha/2012-02/msg00036.html
>

Kosaki-san, your suggestion of adding an extra flag seems like the
right way to go about this based on the discussion on libc-alpha,
specifically, your point about pthread_getattr_np() -- it may not be a
standard, but it's a breakage anyway. However, looking at the vm_flags
options in mm.h, it looks like the entire 32-bit space has been
exhausted for the flag value. The vm_flags is an unsigned long, so it
ought to take 8 bytes on a 64-bit system, but 32-bit systems will be
left behind.

So there are two options for this:

1) make vm_flags 64-bit for all arches. This will cause ABI breakage
on 32-bit systems, so any external drivers will have to be rebuilt
2) Implement this patch for 64-bit only by defining the new flag only
for 64-bit. 32-bit systems behave as is

Which of these would be better? I prefer the latter because it looks
like the path of least breakage.

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-08 17:57:01

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

(2/7/12 11:00 PM), Siddhesh Poyarekar wrote:
> On Sat, Feb 4, 2012 at 12:04 AM, Siddhesh Poyarekar
> <[email protected]> wrote:
>> On Fri, Feb 3, 2012 at 1:31 PM, KOSAKI Motohiro
>> <[email protected]> wrote:
>>> The fact is, now process stack and pthread stack clearly behave
>>> different dance. libc don't expect pthread stack grow automatically.
>>> So, your patch will break userland. Just only change display thing.
> <snip>
>> I have also dropped an email on the libc-alpha list here to solicit
>> comments from libc maintainers on this:
>>
>> http://sourceware.org/ml/libc-alpha/2012-02/msg00036.html
>>
>
> Kosaki-san, your suggestion of adding an extra flag seems like the
> right way to go about this based on the discussion on libc-alpha,
> specifically, your point about pthread_getattr_np() -- it may not be a
> standard, but it's a breakage anyway. However, looking at the vm_flags
> options in mm.h, it looks like the entire 32-bit space has been
> exhausted for the flag value. The vm_flags is an unsigned long, so it
> ought to take 8 bytes on a 64-bit system, but 32-bit systems will be
> left behind.
>
> So there are two options for this:
>
> 1) make vm_flags 64-bit for all arches. This will cause ABI breakage
> on 32-bit systems, so any external drivers will have to be rebuilt

Several month ago, Linus NAKed this way.


> 2) Implement this patch for 64-bit only by defining the new flag only
> for 64-bit. 32-bit systems behave as is

No. That's bad than status quo. Enduser may get inconsistent and bad user
experience.

Now, we are using some bit saving hack. example,

1) use ifdef

#ifndef CONFIG_TRANSPARENT_HUGEPAGE
#define VM_MAPPED_COPY 0x01000000 /* T if mapped copy of data (nommu mmap) */
#else
#define VM_HUGEPAGE 0x01000000 /* MADV_HUGEPAGE marked this vma */
#endif

2) use bit combination

#define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)


Maybe you can take a similar way. And of course, you can ban some useless flag
bits.

thanks.

> Which of these would be better? I prefer the latter because it looks
> like the path of least breakage.
>

2012-02-11 10:19:05

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

On Wed, Feb 8, 2012 at 11:27 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Now, we are using some bit saving hack. example,
>
> 1) use ifdef
>
> #ifndef CONFIG_TRANSPARENT_HUGEPAGE
> #define VM_MAPPED_COPY 0x01000000 ? ? ?/* T if mapped copy of data (nommu
> mmap) */
> #else
> #define VM_HUGEPAGE ? ?0x01000000 ? ? ?/* MADV_HUGEPAGE marked this vma */
> #endif
>
> 2) use bit combination
>
> #define VM_STACK_INCOMPLETE_SETUP ? ? ?(VM_RAND_READ | VM_SEQ_READ)
>
>
> Maybe you can take a similar way. And of course, you can ban some useless
> flag
> bits.

I found the thread in which Linus rejected the idea of expanding vm_flags:

https://lkml.org/lkml/2011/11/10/522

and based on that, I don't think I can justify the need for a new flag
for this patch since it is purely for display purposes and has nothing
to do with the actual treatment of the vma. So I figured out another
way to identify a thread stack without changing the way the vma
properties (I should have done this in the first place I think) which
is by checking if the vma contains the stack pointer of the task.

With this change:

/proc/pid/task/tid/maps: will only mark the stack that the task uses

/proc/pid/maps: will mark all stacks. This path will be slower since
it will iterate through the entire thread group for each vma.

I'll test this and attach a new version of the patch.

Regards,
Siddhesh

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-11 15:02:45

by Siddhesh Poyarekar

[permalink] [raw]
Subject: [PATCH] Mark thread stack correctly in proc/<pid>/maps

Stack for a new thread is mapped by userspace code and passed via
sys_clone. This memory is currently seen as anonymous in
/proc/<pid>/maps, which makes it difficult to ascertain which mappings
are being used for thread stacks. This patch uses the individual task
stack pointers to determine which vmas are actually thread stacks.

The display for maps, smaps and numa_maps is now different at the
thread group (/proc/PID/maps) and thread (/proc/PID/task/TID/maps)
levels. The idea is to give the mapping as the individual tasks see it
in /proc/PID/task/TID/maps and then give an overview of the entire mm
as it were, in /proc/PID/maps.

At the thread group level, all vmas that are used as stacks are marked
as such. At the thread level however, only the stack that the task in
question uses is marked as such and all others (including the main
stack) are marked as anonymous memory.

Signed-off-by: Siddhesh Poyarekar <[email protected]>
---
Documentation/filesystems/proc.txt | 10 ++-
fs/proc/base.c | 12 ++--
fs/proc/internal.h | 9 ++-
fs/proc/task_mmu.c | 139 ++++++++++++++++++++++++++++++------
fs/proc/task_nommu.c | 57 ++++++++++++---
include/linux/mm.h | 9 +++
mm/memory.c | 22 ++++++
7 files changed, 214 insertions(+), 44 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a76a26a..e0f9de3 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
rsslim current limit in bytes on the rss
start_code address above which program text can run
end_code address below which program text can run
- start_stack address of the start of the stack
+ start_stack address of the start of the main process stack
esp current value of ESP
eip current value of EIP
pending bitmap of pending signals
@@ -356,12 +356,18 @@ The "pathname" shows the name associated file for this mapping. If the mapping
is not associated with a file:

[heap] = the heap of the program
- [stack] = the stack of the main process
+ [stack] = the mapping is used as a stack by one
+ of the threads of the process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler

or if empty, the mapping is anonymous.

+The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint
+of the individual tasks of a process. In this file you will see a mapping marked
+as [stack] only if that task sees it as a stack. This is a key difference from
+the content of /proc/PID/maps, where you will see all mappings that are being
+used as stack by all of those tasks.

The /proc/PID/smaps is an extension based on maps, showing the memory
consumption for each of the process's mappings. For each of mappings there
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d4548dd..558660a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tgid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG("maps", S_IRUGO, proc_pid_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("mountstats", S_IRUSR, proc_mountstats_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG("smaps", S_IRUGO, proc_pid_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
@@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG("maps", S_IRUGO, proc_tid_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("mountinfo", S_IRUGO, proc_mountinfo_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG("smaps", S_IRUGO, proc_tid_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2925775..c44efe1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -53,9 +53,12 @@ extern int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task);
extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);

-extern const struct file_operations proc_maps_operations;
-extern const struct file_operations proc_numa_maps_operations;
-extern const struct file_operations proc_smaps_operations;
+extern const struct file_operations proc_pid_maps_operations;
+extern const struct file_operations proc_tid_maps_operations;
+extern const struct file_operations proc_pid_numa_maps_operations;
+extern const struct file_operations proc_tid_numa_maps_operations;
+extern const struct file_operations proc_pid_smaps_operations;
+extern const struct file_operations proc_tid_smaps_operations;
extern const struct file_operations proc_clear_refs_operations;
extern const struct file_operations proc_pagemap_operations;
extern const struct file_operations proc_net_operations;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7dcd2a2..3e166f5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -209,10 +209,12 @@ static int do_maps_open(struct inode *inode, struct file *file,
return ret;
}

-static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
+static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
{
struct mm_struct *mm = vma->vm_mm;
struct file *file = vma->vm_file;
+ struct proc_maps_private *priv = m->private;
+ struct task_struct *task = priv->task;
vm_flags_t flags = vma->vm_flags;
unsigned long ino = 0;
unsigned long long pgoff = 0;
@@ -259,8 +261,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
if (vma->vm_start <= mm->brk &&
vma->vm_end >= mm->start_brk) {
name = "[heap]";
- } else if (vma->vm_start <= mm->start_stack &&
- vma->vm_end >= mm->start_stack) {
+ } else if (vm_is_stack(task, vma, is_pid)) {
name = "[stack]";
}
} else {
@@ -275,13 +276,13 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
seq_putc(m, '\n');
}

-static int show_map(struct seq_file *m, void *v)
+static int show_map(struct seq_file *m, void *v, int is_pid)
{
struct vm_area_struct *vma = v;
struct proc_maps_private *priv = m->private;
struct task_struct *task = priv->task;

- show_map_vma(m, vma);
+ show_map_vma(m, vma, is_pid);

if (m->count < m->size) /* vma is copied successfully */
m->version = (vma != get_gate_vma(task->mm))
@@ -289,20 +290,49 @@ static int show_map(struct seq_file *m, void *v)
return 0;
}

+static int show_pid_map(struct seq_file *m, void *v)
+{
+ return show_map(m, v, 1);
+}
+
+static int show_tid_map(struct seq_file *m, void *v)
+{
+ return show_map(m, v, 0);
+}
+
static const struct seq_operations proc_pid_maps_op = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_map
+ .show = show_pid_map
};

-static int maps_open(struct inode *inode, struct file *file)
+static const struct seq_operations proc_tid_maps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_map
+};
+
+static int pid_maps_open(struct inode *inode, struct file *file)
{
return do_maps_open(inode, file, &proc_pid_maps_op);
}

-const struct file_operations proc_maps_operations = {
- .open = maps_open,
+static int tid_maps_open(struct inode *inode, struct file *file)
+{
+ return do_maps_open(inode, file, &proc_tid_maps_op);
+}
+
+const struct file_operations proc_pid_maps_operations = {
+ .open = pid_maps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_maps_operations = {
+ .open = tid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
@@ -422,7 +452,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
return 0;
}

-static int show_smap(struct seq_file *m, void *v)
+static int show_smap(struct seq_file *m, void *v, int is_pid)
{
struct proc_maps_private *priv = m->private;
struct task_struct *task = priv->task;
@@ -440,7 +470,7 @@ static int show_smap(struct seq_file *m, void *v)
if (vma->vm_mm && !is_vm_hugetlb_page(vma))
walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);

- show_map_vma(m, vma);
+ show_map_vma(m, vma, is_pid);

seq_printf(m,
"Size: %8lu kB\n"
@@ -479,20 +509,49 @@ static int show_smap(struct seq_file *m, void *v)
return 0;
}

+static int show_pid_smap(struct seq_file *m, void *v)
+{
+ return show_smap(m, v, 1);
+}
+
+static int show_tid_smap(struct seq_file *m, void *v)
+{
+ return show_smap(m, v, 0);
+}
+
static const struct seq_operations proc_pid_smaps_op = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_smap
+ .show = show_pid_smap
+};
+
+static const struct seq_operations proc_tid_smaps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_smap
};

-static int smaps_open(struct inode *inode, struct file *file)
+static int pid_smaps_open(struct inode *inode, struct file *file)
{
return do_maps_open(inode, file, &proc_pid_smaps_op);
}

-const struct file_operations proc_smaps_operations = {
- .open = smaps_open,
+static int tid_smaps_open(struct inode *inode, struct file *file)
+{
+ return do_maps_open(inode, file, &proc_tid_smaps_op);
+}
+
+const struct file_operations proc_pid_smaps_operations = {
+ .open = pid_smaps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_smaps_operations = {
+ .open = tid_smaps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
@@ -1002,7 +1061,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
/*
* Display pages allocated per node and memory policy via /proc.
*/
-static int show_numa_map(struct seq_file *m, void *v)
+static int show_numa_map(struct seq_file *m, void *v, int is_pid)
{
struct numa_maps_private *numa_priv = m->private;
struct proc_maps_private *proc_priv = &numa_priv->proc_maps;
@@ -1039,8 +1098,7 @@ static int show_numa_map(struct seq_file *m, void *v)
seq_path(m, &file->f_path, "\n\t= ");
} else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) {
seq_printf(m, " heap");
- } else if (vma->vm_start <= mm->start_stack &&
- vma->vm_end >= mm->start_stack) {
+ } else if (vm_is_stack(proc_priv->task, vma, is_pid)) {
seq_printf(m, " stack");
}

@@ -1084,21 +1142,39 @@ out:
return 0;
}

+static int show_pid_numa_map(struct seq_file *m, void *v)
+{
+ return show_numa_map(m, v, 1);
+}
+
+static int show_tid_numa_map(struct seq_file *m, void *v)
+{
+ return show_numa_map(m, v, 0);
+}
+
static const struct seq_operations proc_pid_numa_maps_op = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_numa_map,
+ .show = show_pid_numa_map,
};

-static int numa_maps_open(struct inode *inode, struct file *file)
+static const struct seq_operations proc_tid_numa_maps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_numa_map,
+};
+
+static int numa_maps_open(struct inode *inode, struct file *file,
+ const struct seq_operations *ops)
{
struct numa_maps_private *priv;
int ret = -ENOMEM;
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (priv) {
priv->proc_maps.pid = proc_pid(inode);
- ret = seq_open(file, &proc_pid_numa_maps_op);
+ ret = seq_open(file, ops);
if (!ret) {
struct seq_file *m = file->private_data;
m->private = priv;
@@ -1109,8 +1185,25 @@ static int numa_maps_open(struct inode *inode, struct file *file)
return ret;
}

-const struct file_operations proc_numa_maps_operations = {
- .open = numa_maps_open,
+static int pid_numa_maps_open(struct inode *inode, struct file *file)
+{
+ return numa_maps_open(inode, file, &proc_pid_numa_maps_op);
+}
+
+static int tid_numa_maps_open(struct inode *inode, struct file *file)
+{
+ return numa_maps_open(inode, file, &proc_tid_numa_maps_op);
+}
+
+const struct file_operations proc_pid_numa_maps_operations = {
+ .open = pid_numa_maps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_numa_maps_operations = {
+ .open = tid_numa_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 980de54..bdfff69 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_file *m, int len)
/*
* display a single VMA to a sequenced file
*/
-static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
+static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
+ int is_pid)
{
struct mm_struct *mm = vma->vm_mm;
+ struct proc_maps_private *priv = m->private;
unsigned long ino = 0;
struct file *file;
dev_t dev = 0;
@@ -168,8 +170,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
pad_len_spaces(m, len);
seq_path(m, &file->f_path, "");
} else if (mm) {
- if (vma->vm_start <= mm->start_stack &&
- vma->vm_end >= mm->start_stack) {
+ if (vm_is_stack(priv->task, vma, is_pid))
pad_len_spaces(m, len);
seq_puts(m, "[stack]");
}
@@ -182,11 +183,22 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
/*
* display mapping lines for a particular process's /proc/pid/maps
*/
-static int show_map(struct seq_file *m, void *_p)
+static int show_map(struct seq_file *m, void *_p, int is_pid)
{
struct rb_node *p = _p;

- return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb));
+ return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb),
+ is_pid);
+}
+
+static int show_pid_map(struct seq_file *m, void *_p)
+{
+ return show_map(m, _p, 1);
+}
+
+static int show_tid_map(struct seq_file *m, void *_p)
+{
+ return show_map(m, _p, 0);
}

static void *m_start(struct seq_file *m, loff_t *pos)
@@ -240,10 +252,18 @@ static const struct seq_operations proc_pid_maps_ops = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_map
+ .show = show_pid_map
+};
+
+static const struct seq_operations proc_tid_maps_ops = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_map
};

-static int maps_open(struct inode *inode, struct file *file)
+static int maps_open(struct inode *inode, struct file *file,
+ const struct seq_operations *ops)
{
struct proc_maps_private *priv;
int ret = -ENOMEM;
@@ -251,7 +271,7 @@ static int maps_open(struct inode *inode, struct file *file)
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (priv) {
priv->pid = proc_pid(inode);
- ret = seq_open(file, &proc_pid_maps_ops);
+ ret = seq_open(file, ops);
if (!ret) {
struct seq_file *m = file->private_data;
m->private = priv;
@@ -262,8 +282,25 @@ static int maps_open(struct inode *inode, struct file *file)
return ret;
}

-const struct file_operations proc_maps_operations = {
- .open = maps_open,
+static int pid_maps_open(struct inode *inode, struct file *file)
+{
+ return maps_open(inode, file, &proc_pid_maps_ops);
+}
+
+static int tid_maps_open(struct inode *inode, struct file *file)
+{
+ return maps_open(inode, file, &proc_tid_maps_ops);
+}
+
+const struct file_operations proc_pid_maps_operations = {
+ .open = pid_maps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_maps_operations = {
+ .open = tid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17b27cd..b0fc583 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1040,6 +1040,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma,
!vma_growsup(vma->vm_next, addr);
}

+/* Check if the vma is being used as a stack by this task */
+static inline int vm_is_stack_for_task(struct task_struct *t,
+ struct vm_area_struct *vma)
+{
+ return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t));
+}
+
+extern int vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group);
+
extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index fa2f04e..601a920 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3909,6 +3909,28 @@ void print_vma_addr(char *prefix, unsigned long ip)
up_read(&current->mm->mmap_sem);
}

+/*
+ * Check if the vma is being used as a stack.
+ * If is_group is non-zero, check in the entire thread group or else
+ * just check in the current task.
+ */
+int vm_is_stack(struct task_struct *task,
+ struct vm_area_struct *vma, int in_group)
+{
+ if (vm_is_stack_for_task(task, vma))
+ return 1;
+
+ if (in_group) {
+ struct task_struct *t = task;
+ while_each_thread(task, t) {
+ if (vm_is_stack_for_task(t, vma))
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_PROVE_LOCKING
void might_fault(void)
{
--
1.7.7.4

2012-02-21 04:24:08

by Siddhesh Poyarekar

[permalink] [raw]
Subject: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

Hi,

Resending patch.

Regards,
Siddhesh


---------- Forwarded message ----------
From: Siddhesh Poyarekar <[email protected]>
Date: Sat, Feb 11, 2012 at 8:33 PM
Subject: [PATCH] Mark thread stack correctly in proc/<pid>/maps
To: KOSAKI Motohiro <[email protected]>
Cc: [email protected], [email protected], Alexander Viro
<[email protected]>, [email protected], Jamie Lokier
<[email protected]>, [email protected], Siddhesh Poyarekar
<[email protected]>


Stack for a new thread is mapped by userspace code and passed via
sys_clone. This memory is currently seen as anonymous in
/proc/<pid>/maps, which makes it difficult to ascertain which mappings
are being used for thread stacks. This patch uses the individual task
stack pointers to determine which vmas are actually thread stacks.

The display for maps, smaps and numa_maps is now different at the
thread group (/proc/PID/maps) and thread (/proc/PID/task/TID/maps)
levels. The idea is to give the mapping as the individual tasks see it
in /proc/PID/task/TID/maps and then give an overview of the entire mm
as it were, in /proc/PID/maps.

At the thread group level, all vmas that are used as stacks are marked
as such. At the thread level however, only the stack that the task in
question uses is marked as such and all others (including the main
stack) are marked as anonymous memory.

Signed-off-by: Siddhesh Poyarekar <[email protected]>
---
?Documentation/filesystems/proc.txt | ? 10 ++-
?fs/proc/base.c ? ? ? ? ? ? ? ? ? ? | ? 12 ++--
?fs/proc/internal.h ? ? ? ? ? ? ? ? | ? ?9 ++-
?fs/proc/task_mmu.c ? ? ? ? ? ? ? ? | ?139 ++++++++++++++++++++++++++++++------
?fs/proc/task_nommu.c ? ? ? ? ? ? ? | ? 57 ++++++++++++---
?include/linux/mm.h ? ? ? ? ? ? ? ? | ? ?9 +++
?mm/memory.c ? ? ? ? ? ? ? ? ? ? ? ?| ? 22 ++++++
?7 files changed, 214 insertions(+), 44 deletions(-)

diff --git a/Documentation/filesystems/proc.txt
b/Documentation/filesystems/proc.txt
index a76a26a..e0f9de3 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
? rsslim ? ? ? ?current limit in bytes on the rss
? start_code ? ?address above which program text can run
? end_code ? ? ?address below which program text can run
- ?start_stack ? address of the start of the stack
+ ?start_stack ? address of the start of the main process stack
? esp ? ? ? ? ? current value of ESP
? eip ? ? ? ? ? current value of EIP
? pending ? ? ? bitmap of pending signals
@@ -356,12 +356,18 @@ The "pathname" shows the name associated file
for this mapping. ?If the mapping
?is not associated with a file:

?[heap] ? ? ? ? ? ? ? ? ? = the heap of the program
- [stack] ? ? ? ? ? ? ? ? ?= the stack of the main process
+ [stack] ? ? ? ? ? ? ? ? ?= the mapping is used as a stack by one
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ?of the threads of the process
?[vdso] ? ? ? ? ? ? ? ? ? = the "virtual dynamic shared object",
? ? ? ? ? ? ? ? ? ? ? ? ? ? the kernel system call handler

?or if empty, the mapping is anonymous.

+The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint
+of the individual tasks of a process. In this file you will see a
mapping marked
+as [stack] only if that task sees it as a stack. This is a key difference from
+the content of /proc/PID/maps, where you will see all mappings that are being
+used as stack by all of those tasks.

?The /proc/PID/smaps is an extension based on maps, showing the memory
?consumption for each of the process's mappings. For each of mappings there
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d4548dd..558660a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_stuff[] = {
? ? ? ?INF("cmdline", ? ?S_IRUGO, proc_pid_cmdline),
? ? ? ?ONE("stat", ? ? ? S_IRUGO, proc_tgid_stat),
? ? ? ?ONE("statm", ? ? ?S_IRUGO, proc_pid_statm),
- ? ? ? REG("maps", ? ? ? S_IRUGO, proc_maps_operations),
+ ? ? ? REG("maps", ? ? ? S_IRUGO, proc_pid_maps_operations),
?#ifdef CONFIG_NUMA
- ? ? ? REG("numa_maps", ?S_IRUGO, proc_numa_maps_operations),
+ ? ? ? REG("numa_maps", ?S_IRUGO, proc_pid_numa_maps_operations),
?#endif
? ? ? ?REG("mem", ? ? ? ?S_IRUSR|S_IWUSR, proc_mem_operations),
? ? ? ?LNK("cwd", ? ? ? ?proc_cwd_link),
@@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_stuff[] = {
? ? ? ?REG("mountstats", S_IRUSR, proc_mountstats_operations),
?#ifdef CONFIG_PROC_PAGE_MONITOR
? ? ? ?REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- ? ? ? REG("smaps", ? ? ?S_IRUGO, proc_smaps_operations),
+ ? ? ? REG("smaps", ? ? ?S_IRUGO, proc_pid_smaps_operations),
? ? ? ?REG("pagemap", ? ?S_IRUGO, proc_pagemap_operations),
?#endif
?#ifdef CONFIG_SECURITY
@@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_stuff[] = {
? ? ? ?INF("cmdline", ? S_IRUGO, proc_pid_cmdline),
? ? ? ?ONE("stat", ? ? ?S_IRUGO, proc_tid_stat),
? ? ? ?ONE("statm", ? ? S_IRUGO, proc_pid_statm),
- ? ? ? REG("maps", ? ? ?S_IRUGO, proc_maps_operations),
+ ? ? ? REG("maps", ? ? ?S_IRUGO, proc_tid_maps_operations),
?#ifdef CONFIG_NUMA
- ? ? ? REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ ? ? ? REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations),
?#endif
? ? ? ?REG("mem", ? ? ? S_IRUSR|S_IWUSR, proc_mem_operations),
? ? ? ?LNK("cwd", ? ? ? proc_cwd_link),
@@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_stuff[] = {
? ? ? ?REG("mountinfo", ?S_IRUGO, proc_mountinfo_operations),
?#ifdef CONFIG_PROC_PAGE_MONITOR
? ? ? ?REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- ? ? ? REG("smaps", ? ? S_IRUGO, proc_smaps_operations),
+ ? ? ? REG("smaps", ? ? S_IRUGO, proc_tid_smaps_operations),
? ? ? ?REG("pagemap", ? ?S_IRUGO, proc_pagemap_operations),
?#endif
?#ifdef CONFIG_SECURITY
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2925775..c44efe1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -53,9 +53,12 @@ extern int proc_pid_statm(struct seq_file *m,
struct pid_namespace *ns,
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct pid *pid, struct task_struct *task);
?extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);

-extern const struct file_operations proc_maps_operations;
-extern const struct file_operations proc_numa_maps_operations;
-extern const struct file_operations proc_smaps_operations;
+extern const struct file_operations proc_pid_maps_operations;
+extern const struct file_operations proc_tid_maps_operations;
+extern const struct file_operations proc_pid_numa_maps_operations;
+extern const struct file_operations proc_tid_numa_maps_operations;
+extern const struct file_operations proc_pid_smaps_operations;
+extern const struct file_operations proc_tid_smaps_operations;
?extern const struct file_operations proc_clear_refs_operations;
?extern const struct file_operations proc_pagemap_operations;
?extern const struct file_operations proc_net_operations;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7dcd2a2..3e166f5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -209,10 +209,12 @@ static int do_maps_open(struct inode *inode,
struct file *file,
? ? ? ?return ret;
?}

-static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
+static void show_map_vma(struct seq_file *m, struct vm_area_struct
*vma, int is_pid)
?{
? ? ? ?struct mm_struct *mm = vma->vm_mm;
? ? ? ?struct file *file = vma->vm_file;
+ ? ? ? struct proc_maps_private *priv = m->private;
+ ? ? ? struct task_struct *task = priv->task;
? ? ? ?vm_flags_t flags = vma->vm_flags;
? ? ? ?unsigned long ino = 0;
? ? ? ?unsigned long long pgoff = 0;
@@ -259,8 +261,7 @@ static void show_map_vma(struct seq_file *m,
struct vm_area_struct *vma)
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?if (vma->vm_start <= mm->brk &&
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?vma->vm_end >= mm->start_brk) {
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?name = "[heap]";
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? } else if (vma->vm_start <= mm->start_stack &&
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?vma->vm_end >= mm->start_stack) {
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? } else if (vm_is_stack(task, vma, is_pid)) {
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?name = "[stack]";
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?}
? ? ? ? ? ? ? ? ? ? ? ?} else {
@@ -275,13 +276,13 @@ static void show_map_vma(struct seq_file *m,
struct vm_area_struct *vma)
? ? ? ?seq_putc(m, '\n');
?}

-static int show_map(struct seq_file *m, void *v)
+static int show_map(struct seq_file *m, void *v, int is_pid)
?{
? ? ? ?struct vm_area_struct *vma = v;
? ? ? ?struct proc_maps_private *priv = m->private;
? ? ? ?struct task_struct *task = priv->task;

- ? ? ? show_map_vma(m, vma);
+ ? ? ? show_map_vma(m, vma, is_pid);

? ? ? ?if (m->count < m->size) ?/* vma is copied successfully */
? ? ? ? ? ? ? ?m->version = (vma != get_gate_vma(task->mm))
@@ -289,20 +290,49 @@ static int show_map(struct seq_file *m, void *v)
? ? ? ?return 0;
?}

+static int show_pid_map(struct seq_file *m, void *v)
+{
+ ? ? ? return show_map(m, v, 1);
+}
+
+static int show_tid_map(struct seq_file *m, void *v)
+{
+ ? ? ? return show_map(m, v, 0);
+}
+
?static const struct seq_operations proc_pid_maps_op = {
? ? ? ?.start ?= m_start,
? ? ? ?.next ? = m_next,
? ? ? ?.stop ? = m_stop,
- ? ? ? .show ? = show_map
+ ? ? ? .show ? = show_pid_map
?};

-static int maps_open(struct inode *inode, struct file *file)
+static const struct seq_operations proc_tid_maps_op = {
+ ? ? ? .start ?= m_start,
+ ? ? ? .next ? = m_next,
+ ? ? ? .stop ? = m_stop,
+ ? ? ? .show ? = show_tid_map
+};
+
+static int pid_maps_open(struct inode *inode, struct file *file)
?{
? ? ? ?return do_maps_open(inode, file, &proc_pid_maps_op);
?}

-const struct file_operations proc_maps_operations = {
- ? ? ? .open ? ? ? ? ? = maps_open,
+static int tid_maps_open(struct inode *inode, struct file *file)
+{
+ ? ? ? return do_maps_open(inode, file, &proc_tid_maps_op);
+}
+
+const struct file_operations proc_pid_maps_operations = {
+ ? ? ? .open ? ? ? ? ? = pid_maps_open,
+ ? ? ? .read ? ? ? ? ? = seq_read,
+ ? ? ? .llseek ? ? ? ? = seq_lseek,
+ ? ? ? .release ? ? ? ?= seq_release_private,
+};
+
+const struct file_operations proc_tid_maps_operations = {
+ ? ? ? .open ? ? ? ? ? = tid_maps_open,
? ? ? ?.read ? ? ? ? ? = seq_read,
? ? ? ?.llseek ? ? ? ? = seq_lseek,
? ? ? ?.release ? ? ? ?= seq_release_private,
@@ -422,7 +452,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned
long addr, unsigned long end,
? ? ? ?return 0;
?}

-static int show_smap(struct seq_file *m, void *v)
+static int show_smap(struct seq_file *m, void *v, int is_pid)
?{
? ? ? ?struct proc_maps_private *priv = m->private;
? ? ? ?struct task_struct *task = priv->task;
@@ -440,7 +470,7 @@ static int show_smap(struct seq_file *m, void *v)
? ? ? ?if (vma->vm_mm && !is_vm_hugetlb_page(vma))
? ? ? ? ? ? ? ?walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);

- ? ? ? show_map_vma(m, vma);
+ ? ? ? show_map_vma(m, vma, is_pid);

? ? ? ?seq_printf(m,
? ? ? ? ? ? ? ? ? "Size: ? ? ? ? ? %8lu kB\n"
@@ -479,20 +509,49 @@ static int show_smap(struct seq_file *m, void *v)
? ? ? ?return 0;
?}

+static int show_pid_smap(struct seq_file *m, void *v)
+{
+ ? ? ? return show_smap(m, v, 1);
+}
+
+static int show_tid_smap(struct seq_file *m, void *v)
+{
+ ? ? ? return show_smap(m, v, 0);
+}
+
?static const struct seq_operations proc_pid_smaps_op = {
? ? ? ?.start ?= m_start,
? ? ? ?.next ? = m_next,
? ? ? ?.stop ? = m_stop,
- ? ? ? .show ? = show_smap
+ ? ? ? .show ? = show_pid_smap
+};
+
+static const struct seq_operations proc_tid_smaps_op = {
+ ? ? ? .start ?= m_start,
+ ? ? ? .next ? = m_next,
+ ? ? ? .stop ? = m_stop,
+ ? ? ? .show ? = show_tid_smap
?};

-static int smaps_open(struct inode *inode, struct file *file)
+static int pid_smaps_open(struct inode *inode, struct file *file)
?{
? ? ? ?return do_maps_open(inode, file, &proc_pid_smaps_op);
?}

-const struct file_operations proc_smaps_operations = {
- ? ? ? .open ? ? ? ? ? = smaps_open,
+static int tid_smaps_open(struct inode *inode, struct file *file)
+{
+ ? ? ? return do_maps_open(inode, file, &proc_tid_smaps_op);
+}
+
+const struct file_operations proc_pid_smaps_operations = {
+ ? ? ? .open ? ? ? ? ? = pid_smaps_open,
+ ? ? ? .read ? ? ? ? ? = seq_read,
+ ? ? ? .llseek ? ? ? ? = seq_lseek,
+ ? ? ? .release ? ? ? ?= seq_release_private,
+};
+
+const struct file_operations proc_tid_smaps_operations = {
+ ? ? ? .open ? ? ? ? ? = tid_smaps_open,
? ? ? ?.read ? ? ? ? ? = seq_read,
? ? ? ?.llseek ? ? ? ? = seq_lseek,
? ? ? ?.release ? ? ? ?= seq_release_private,
@@ -1002,7 +1061,7 @@ static int gather_hugetbl_stats(pte_t *pte,
unsigned long hmask,
?/*
?* Display pages allocated per node and memory policy via /proc.
?*/
-static int show_numa_map(struct seq_file *m, void *v)
+static int show_numa_map(struct seq_file *m, void *v, int is_pid)
?{
? ? ? ?struct numa_maps_private *numa_priv = m->private;
? ? ? ?struct proc_maps_private *proc_priv = &numa_priv->proc_maps;
@@ -1039,8 +1098,7 @@ static int show_numa_map(struct seq_file *m, void *v)
? ? ? ? ? ? ? ?seq_path(m, &file->f_path, "\n\t= ");
? ? ? ?} else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) {
? ? ? ? ? ? ? ?seq_printf(m, " heap");
- ? ? ? } else if (vma->vm_start <= mm->start_stack &&
- ? ? ? ? ? ? ? ? ? ? ? vma->vm_end >= mm->start_stack) {
+ ? ? ? } else if (vm_is_stack(proc_priv->task, vma, is_pid)) {
? ? ? ? ? ? ? ?seq_printf(m, " stack");
? ? ? ?}

@@ -1084,21 +1142,39 @@ out:
? ? ? ?return 0;
?}

+static int show_pid_numa_map(struct seq_file *m, void *v)
+{
+ ? ? ? return show_numa_map(m, v, 1);
+}
+
+static int show_tid_numa_map(struct seq_file *m, void *v)
+{
+ ? ? ? return show_numa_map(m, v, 0);
+}
+
?static const struct seq_operations proc_pid_numa_maps_op = {
? ? ? ? .start ?= m_start,
? ? ? ? .next ? = m_next,
? ? ? ? .stop ? = m_stop,
- ? ? ? ?.show ? = show_numa_map,
+ ? ? ? ?.show ? = show_pid_numa_map,
?};

-static int numa_maps_open(struct inode *inode, struct file *file)
+static const struct seq_operations proc_tid_numa_maps_op = {
+ ? ? ? ?.start ?= m_start,
+ ? ? ? ?.next ? = m_next,
+ ? ? ? ?.stop ? = m_stop,
+ ? ? ? ?.show ? = show_tid_numa_map,
+};
+
+static int numa_maps_open(struct inode *inode, struct file *file,
+ ? ? ? ? ? ? ? ? ? ? ? ? const struct seq_operations *ops)
?{
? ? ? ?struct numa_maps_private *priv;
? ? ? ?int ret = -ENOMEM;
? ? ? ?priv = kzalloc(sizeof(*priv), GFP_KERNEL);
? ? ? ?if (priv) {
? ? ? ? ? ? ? ?priv->proc_maps.pid = proc_pid(inode);
- ? ? ? ? ? ? ? ret = seq_open(file, &proc_pid_numa_maps_op);
+ ? ? ? ? ? ? ? ret = seq_open(file, ops);
? ? ? ? ? ? ? ?if (!ret) {
? ? ? ? ? ? ? ? ? ? ? ?struct seq_file *m = file->private_data;
? ? ? ? ? ? ? ? ? ? ? ?m->private = priv;
@@ -1109,8 +1185,25 @@ static int numa_maps_open(struct inode *inode,
struct file *file)
? ? ? ?return ret;
?}

-const struct file_operations proc_numa_maps_operations = {
- ? ? ? .open ? ? ? ? ? = numa_maps_open,
+static int pid_numa_maps_open(struct inode *inode, struct file *file)
+{
+ ? ? ? return numa_maps_open(inode, file, &proc_pid_numa_maps_op);
+}
+
+static int tid_numa_maps_open(struct inode *inode, struct file *file)
+{
+ ? ? ? return numa_maps_open(inode, file, &proc_tid_numa_maps_op);
+}
+
+const struct file_operations proc_pid_numa_maps_operations = {
+ ? ? ? .open ? ? ? ? ? = pid_numa_maps_open,
+ ? ? ? .read ? ? ? ? ? = seq_read,
+ ? ? ? .llseek ? ? ? ? = seq_lseek,
+ ? ? ? .release ? ? ? ?= seq_release_private,
+};
+
+const struct file_operations proc_tid_numa_maps_operations = {
+ ? ? ? .open ? ? ? ? ? = tid_numa_maps_open,
? ? ? ?.read ? ? ? ? ? = seq_read,
? ? ? ?.llseek ? ? ? ? = seq_lseek,
? ? ? ?.release ? ? ? ?= seq_release_private,
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 980de54..bdfff69 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_file *m, int len)
?/*
?* display a single VMA to a sequenced file
?*/
-static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
+static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
+ ? ? ? ? ? ? ? ? ? ? ? ? int is_pid)
?{
? ? ? ?struct mm_struct *mm = vma->vm_mm;
+ ? ? ? struct proc_maps_private *priv = m->private;
? ? ? ?unsigned long ino = 0;
? ? ? ?struct file *file;
? ? ? ?dev_t dev = 0;
@@ -168,8 +170,7 @@ static int nommu_vma_show(struct seq_file *m,
struct vm_area_struct *vma)
? ? ? ? ? ? ? ?pad_len_spaces(m, len);
? ? ? ? ? ? ? ?seq_path(m, &file->f_path, "");
? ? ? ?} else if (mm) {
- ? ? ? ? ? ? ? if (vma->vm_start <= mm->start_stack &&
- ? ? ? ? ? ? ? ? ? ? ? vma->vm_end >= mm->start_stack) {
+ ? ? ? ? ? ? ? if (vm_is_stack(priv->task, vma, is_pid))
? ? ? ? ? ? ? ? ? ? ? ?pad_len_spaces(m, len);
? ? ? ? ? ? ? ? ? ? ? ?seq_puts(m, "[stack]");
? ? ? ? ? ? ? ?}
@@ -182,11 +183,22 @@ static int nommu_vma_show(struct seq_file *m,
struct vm_area_struct *vma)
?/*
?* display mapping lines for a particular process's /proc/pid/maps
?*/
-static int show_map(struct seq_file *m, void *_p)
+static int show_map(struct seq_file *m, void *_p, int is_pid)
?{
? ? ? ?struct rb_node *p = _p;

- ? ? ? return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb));
+ ? ? ? return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb),
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? is_pid);
+}
+
+static int show_pid_map(struct seq_file *m, void *_p)
+{
+ ? ? ? return show_map(m, _p, 1);
+}
+
+static int show_tid_map(struct seq_file *m, void *_p)
+{
+ ? ? ? return show_map(m, _p, 0);
?}

?static void *m_start(struct seq_file *m, loff_t *pos)
@@ -240,10 +252,18 @@ static const struct seq_operations proc_pid_maps_ops = {
? ? ? ?.start ?= m_start,
? ? ? ?.next ? = m_next,
? ? ? ?.stop ? = m_stop,
- ? ? ? .show ? = show_map
+ ? ? ? .show ? = show_pid_map
+};
+
+static const struct seq_operations proc_tid_maps_ops = {
+ ? ? ? .start ?= m_start,
+ ? ? ? .next ? = m_next,
+ ? ? ? .stop ? = m_stop,
+ ? ? ? .show ? = show_tid_map
?};

-static int maps_open(struct inode *inode, struct file *file)
+static int maps_open(struct inode *inode, struct file *file,
+ ? ? ? ? ? ? ? ? ? ?const struct seq_operations *ops)
?{
? ? ? ?struct proc_maps_private *priv;
? ? ? ?int ret = -ENOMEM;
@@ -251,7 +271,7 @@ static int maps_open(struct inode *inode, struct file *file)
? ? ? ?priv = kzalloc(sizeof(*priv), GFP_KERNEL);
? ? ? ?if (priv) {
? ? ? ? ? ? ? ?priv->pid = proc_pid(inode);
- ? ? ? ? ? ? ? ret = seq_open(file, &proc_pid_maps_ops);
+ ? ? ? ? ? ? ? ret = seq_open(file, ops);
? ? ? ? ? ? ? ?if (!ret) {
? ? ? ? ? ? ? ? ? ? ? ?struct seq_file *m = file->private_data;
? ? ? ? ? ? ? ? ? ? ? ?m->private = priv;
@@ -262,8 +282,25 @@ static int maps_open(struct inode *inode, struct
file *file)
? ? ? ?return ret;
?}

-const struct file_operations proc_maps_operations = {
- ? ? ? .open ? ? ? ? ? = maps_open,
+static int pid_maps_open(struct inode *inode, struct file *file)
+{
+ ? ? ? return maps_open(inode, file, &proc_pid_maps_ops);
+}
+
+static int tid_maps_open(struct inode *inode, struct file *file)
+{
+ ? ? ? return maps_open(inode, file, &proc_tid_maps_ops);
+}
+
+const struct file_operations proc_pid_maps_operations = {
+ ? ? ? .open ? ? ? ? ? = pid_maps_open,
+ ? ? ? .read ? ? ? ? ? = seq_read,
+ ? ? ? .llseek ? ? ? ? = seq_lseek,
+ ? ? ? .release ? ? ? ?= seq_release_private,
+};
+
+const struct file_operations proc_tid_maps_operations = {
+ ? ? ? .open ? ? ? ? ? = tid_maps_open,
? ? ? ?.read ? ? ? ? ? = seq_read,
? ? ? ?.llseek ? ? ? ? = seq_lseek,
? ? ? ?.release ? ? ? ?= seq_release_private,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17b27cd..b0fc583 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1040,6 +1040,15 @@ static inline int stack_guard_page_end(struct
vm_area_struct *vma,
? ? ? ? ? ? ? ?!vma_growsup(vma->vm_next, addr);
?}

+/* Check if the vma is being used as a stack by this task */
+static inline int vm_is_stack_for_task(struct task_struct *t,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct vm_area_struct *vma)
+{
+ ? ? ? return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t));
+}
+
+extern int vm_is_stack(struct task_struct *task, struct
vm_area_struct *vma, int in_group);
+
?extern unsigned long move_page_tables(struct vm_area_struct *vma,
? ? ? ? ? ? ? ?unsigned long old_addr, struct vm_area_struct *new_vma,
? ? ? ? ? ? ? ?unsigned long new_addr, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index fa2f04e..601a920 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3909,6 +3909,28 @@ void print_vma_addr(char *prefix, unsigned long ip)
? ? ? ?up_read(&current->mm->mmap_sem);
?}

+/*
+ * Check if the vma is being used as a stack.
+ * If is_group is non-zero, check in the entire thread group or else
+ * just check in the current task.
+ */
+int vm_is_stack(struct task_struct *task,
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct vm_area_struct *vma, int in_group)
+{
+ ? ? ? if (vm_is_stack_for_task(task, vma))
+ ? ? ? ? ? ? ? return 1;
+
+ ? ? ? if (in_group) {
+ ? ? ? ? ? ? ? struct task_struct *t = task;
+ ? ? ? ? ? ? ? while_each_thread(task, t) {
+ ? ? ? ? ? ? ? ? ? ? ? if (vm_is_stack_for_task(t, vma))
+ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return 1;
+ ? ? ? ? ? ? ? }
+ ? ? ? }
+
+ ? ? ? return 0;
+}
+
?#ifdef CONFIG_PROVE_LOCKING
?void might_fault(void)
?{
--
1.7.7.4



--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-22 23:00:14

by Andrew Morton

[permalink] [raw]
Subject: Re: [RESEND][PATCH] Mark thread stack correctly in proc/<pid>/maps

On Tue, 21 Feb 2012 09:54:04 +0530
Siddhesh Poyarekar <[email protected]> wrote:

> Stack for a new thread is mapped by userspace code and passed via
> sys_clone. This memory is currently seen as anonymous in
> /proc/<pid>/maps, which makes it difficult to ascertain which mappings
> are being used for thread stacks. This patch uses the individual task
> stack pointers to determine which vmas are actually thread stacks.
>
> The display for maps, smaps and numa_maps is now different at the
> thread group (/proc/PID/maps) and thread (/proc/PID/task/TID/maps)
> levels. The idea is to give the mapping as the individual tasks see it
> in /proc/PID/task/TID/maps and then give an overview of the entire mm
> as it were, in /proc/PID/maps.
>
> At the thread group level, all vmas that are used as stacks are marked
> as such. At the thread level however, only the stack that the task in
> question uses is marked as such and all others (including the main
> stack) are marked as anonymous memory.

Please flesh this description out with specific examples of the
before-and-after contents of all the applicable procfs files. This way
we can clearly see the proposed interface changes, which is the thing
we care most about with such a patch.

The patch itself has been utterly and hopelessly mangled by gmail.
Please fix that up when resending (as a last resort: use a text/plain
attachment).

2012-02-23 04:03:18

by Siddhesh Poyarekar

[permalink] [raw]
Subject: [PATCH] Mark thread stack correctly in proc/<pid>/maps

Stack for a new thread is mapped by userspace code and passed via
sys_clone. This memory is currently seen as anonymous in
/proc/<pid>/maps, which makes it difficult to ascertain which mappings
are being used for thread stacks. This patch uses the individual task
stack pointers to determine which vmas are actually thread stacks.

For a multithreaded program like the following:

\#include <pthread.h>
void *thread_main(void *foo)
{
while(1);
}

int main()
{
pthread_t t;
pthread_create(&t, NULL, thread_main, NULL);
pthread_join(t, NULL);
}

proc/PID/maps looks like the following:

00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out
00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out
019ef000-01a10000 rw-p 00000000 00:00 0 [heap]
7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0
7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since
the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000)
but that is not always a reliable way to find out which vma is a
thread stack. Also, /proc/PID/maps and /proc/PID/task/TID/maps
has the same content.

With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
as the task would see it' and hence, only the vma that that task uses
as stack is marked as [stack]. All other 'stack' vmas are marked as
anonymous memory. /proc/PID/maps acts as a thread group level view,
where all stack vmas are marked.

So /proc/PID/maps will look like this:

00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out
00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out
019ef000-01a10000 rw-p 00000000 00:00 0 [heap]
7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack]
7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Thus marking all vmas that are used as stacks in the thread group. The
task level maps will however look like this:

00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out
00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out
019ef000-01a10000 rw-p 00000000 00:00 0 [heap]
7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack]
7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so
7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so
7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0
7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

where only the vma that is being used as a stack by *that* task is
marked as [stack].

Analogous changes have been made to /proc/PID/smaps,
/proc/PID/numa_maps, /proc/PID/task/TID/smaps and
/proc/PID/task/TID/numa_maps. Relevant snippets from smaps and
numa_maps:

[siddhesh@localhost ~ ]$ pgrep a.out
1441
[siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack\]"
7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack]
7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
[siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack\]"
7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack]
[siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack\]"
7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
[siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack"
7f8a44492000 default stack anon=2 dirty=2 N0=2
7fff6273a000 default stack anon=3 dirty=3 N0=3
[siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack"
7f8a44492000 default stack anon=2 dirty=2 N0=2
[siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack"
7fff6273a000 default stack anon=3 dirty=3 N0=3

Signed-off-by: Siddhesh Poyarekar <[email protected]>
---
Documentation/filesystems/proc.txt | 10 ++-
fs/proc/base.c | 12 ++--
fs/proc/internal.h | 9 ++-
fs/proc/task_mmu.c | 139 ++++++++++++++++++++++++++++++------
fs/proc/task_nommu.c | 57 ++++++++++++---
include/linux/mm.h | 9 +++
mm/memory.c | 22 ++++++
7 files changed, 214 insertions(+), 44 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a76a26a..e0f9de3 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
rsslim current limit in bytes on the rss
start_code address above which program text can run
end_code address below which program text can run
- start_stack address of the start of the stack
+ start_stack address of the start of the main process stack
esp current value of ESP
eip current value of EIP
pending bitmap of pending signals
@@ -356,12 +356,18 @@ The "pathname" shows the name associated file for this mapping. If the mapping
is not associated with a file:

[heap] = the heap of the program
- [stack] = the stack of the main process
+ [stack] = the mapping is used as a stack by one
+ of the threads of the process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler

or if empty, the mapping is anonymous.

+The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint
+of the individual tasks of a process. In this file you will see a mapping marked
+as [stack] only if that task sees it as a stack. This is a key difference from
+the content of /proc/PID/maps, where you will see all mappings that are being
+used as stack by all of those tasks.

The /proc/PID/smaps is an extension based on maps, showing the memory
consumption for each of the process's mappings. For each of mappings there
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d4548dd..558660a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2990,9 +2990,9 @@ static const struct pid_entry tgid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tgid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG("maps", S_IRUGO, proc_pid_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -3003,7 +3003,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("mountstats", S_IRUSR, proc_mountstats_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG("smaps", S_IRUGO, proc_pid_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
@@ -3349,9 +3349,9 @@ static const struct pid_entry tid_base_stuff[] = {
INF("cmdline", S_IRUGO, proc_pid_cmdline),
ONE("stat", S_IRUGO, proc_tid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
- REG("maps", S_IRUGO, proc_maps_operations),
+ REG("maps", S_IRUGO, proc_tid_maps_operations),
#ifdef CONFIG_NUMA
- REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+ REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations),
#endif
REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd", proc_cwd_link),
@@ -3361,7 +3361,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("mountinfo", S_IRUGO, proc_mountinfo_operations),
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
- REG("smaps", S_IRUGO, proc_smaps_operations),
+ REG("smaps", S_IRUGO, proc_tid_smaps_operations),
REG("pagemap", S_IRUGO, proc_pagemap_operations),
#endif
#ifdef CONFIG_SECURITY
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2925775..c44efe1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -53,9 +53,12 @@ extern int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task);
extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);

-extern const struct file_operations proc_maps_operations;
-extern const struct file_operations proc_numa_maps_operations;
-extern const struct file_operations proc_smaps_operations;
+extern const struct file_operations proc_pid_maps_operations;
+extern const struct file_operations proc_tid_maps_operations;
+extern const struct file_operations proc_pid_numa_maps_operations;
+extern const struct file_operations proc_tid_numa_maps_operations;
+extern const struct file_operations proc_pid_smaps_operations;
+extern const struct file_operations proc_tid_smaps_operations;
extern const struct file_operations proc_clear_refs_operations;
extern const struct file_operations proc_pagemap_operations;
extern const struct file_operations proc_net_operations;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7dcd2a2..3e166f5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -209,10 +209,12 @@ static int do_maps_open(struct inode *inode, struct file *file,
return ret;
}

-static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
+static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
{
struct mm_struct *mm = vma->vm_mm;
struct file *file = vma->vm_file;
+ struct proc_maps_private *priv = m->private;
+ struct task_struct *task = priv->task;
vm_flags_t flags = vma->vm_flags;
unsigned long ino = 0;
unsigned long long pgoff = 0;
@@ -259,8 +261,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
if (vma->vm_start <= mm->brk &&
vma->vm_end >= mm->start_brk) {
name = "[heap]";
- } else if (vma->vm_start <= mm->start_stack &&
- vma->vm_end >= mm->start_stack) {
+ } else if (vm_is_stack(task, vma, is_pid)) {
name = "[stack]";
}
} else {
@@ -275,13 +276,13 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
seq_putc(m, '\n');
}

-static int show_map(struct seq_file *m, void *v)
+static int show_map(struct seq_file *m, void *v, int is_pid)
{
struct vm_area_struct *vma = v;
struct proc_maps_private *priv = m->private;
struct task_struct *task = priv->task;

- show_map_vma(m, vma);
+ show_map_vma(m, vma, is_pid);

if (m->count < m->size) /* vma is copied successfully */
m->version = (vma != get_gate_vma(task->mm))
@@ -289,20 +290,49 @@ static int show_map(struct seq_file *m, void *v)
return 0;
}

+static int show_pid_map(struct seq_file *m, void *v)
+{
+ return show_map(m, v, 1);
+}
+
+static int show_tid_map(struct seq_file *m, void *v)
+{
+ return show_map(m, v, 0);
+}
+
static const struct seq_operations proc_pid_maps_op = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_map
+ .show = show_pid_map
};

-static int maps_open(struct inode *inode, struct file *file)
+static const struct seq_operations proc_tid_maps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_map
+};
+
+static int pid_maps_open(struct inode *inode, struct file *file)
{
return do_maps_open(inode, file, &proc_pid_maps_op);
}

-const struct file_operations proc_maps_operations = {
- .open = maps_open,
+static int tid_maps_open(struct inode *inode, struct file *file)
+{
+ return do_maps_open(inode, file, &proc_tid_maps_op);
+}
+
+const struct file_operations proc_pid_maps_operations = {
+ .open = pid_maps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_maps_operations = {
+ .open = tid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
@@ -422,7 +452,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
return 0;
}

-static int show_smap(struct seq_file *m, void *v)
+static int show_smap(struct seq_file *m, void *v, int is_pid)
{
struct proc_maps_private *priv = m->private;
struct task_struct *task = priv->task;
@@ -440,7 +470,7 @@ static int show_smap(struct seq_file *m, void *v)
if (vma->vm_mm && !is_vm_hugetlb_page(vma))
walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);

- show_map_vma(m, vma);
+ show_map_vma(m, vma, is_pid);

seq_printf(m,
"Size: %8lu kB\n"
@@ -479,20 +509,49 @@ static int show_smap(struct seq_file *m, void *v)
return 0;
}

+static int show_pid_smap(struct seq_file *m, void *v)
+{
+ return show_smap(m, v, 1);
+}
+
+static int show_tid_smap(struct seq_file *m, void *v)
+{
+ return show_smap(m, v, 0);
+}
+
static const struct seq_operations proc_pid_smaps_op = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_smap
+ .show = show_pid_smap
+};
+
+static const struct seq_operations proc_tid_smaps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_smap
};

-static int smaps_open(struct inode *inode, struct file *file)
+static int pid_smaps_open(struct inode *inode, struct file *file)
{
return do_maps_open(inode, file, &proc_pid_smaps_op);
}

-const struct file_operations proc_smaps_operations = {
- .open = smaps_open,
+static int tid_smaps_open(struct inode *inode, struct file *file)
+{
+ return do_maps_open(inode, file, &proc_tid_smaps_op);
+}
+
+const struct file_operations proc_pid_smaps_operations = {
+ .open = pid_smaps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_smaps_operations = {
+ .open = tid_smaps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
@@ -1002,7 +1061,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
/*
* Display pages allocated per node and memory policy via /proc.
*/
-static int show_numa_map(struct seq_file *m, void *v)
+static int show_numa_map(struct seq_file *m, void *v, int is_pid)
{
struct numa_maps_private *numa_priv = m->private;
struct proc_maps_private *proc_priv = &numa_priv->proc_maps;
@@ -1039,8 +1098,7 @@ static int show_numa_map(struct seq_file *m, void *v)
seq_path(m, &file->f_path, "\n\t= ");
} else if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) {
seq_printf(m, " heap");
- } else if (vma->vm_start <= mm->start_stack &&
- vma->vm_end >= mm->start_stack) {
+ } else if (vm_is_stack(proc_priv->task, vma, is_pid)) {
seq_printf(m, " stack");
}

@@ -1084,21 +1142,39 @@ out:
return 0;
}

+static int show_pid_numa_map(struct seq_file *m, void *v)
+{
+ return show_numa_map(m, v, 1);
+}
+
+static int show_tid_numa_map(struct seq_file *m, void *v)
+{
+ return show_numa_map(m, v, 0);
+}
+
static const struct seq_operations proc_pid_numa_maps_op = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_numa_map,
+ .show = show_pid_numa_map,
};

-static int numa_maps_open(struct inode *inode, struct file *file)
+static const struct seq_operations proc_tid_numa_maps_op = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_numa_map,
+};
+
+static int numa_maps_open(struct inode *inode, struct file *file,
+ const struct seq_operations *ops)
{
struct numa_maps_private *priv;
int ret = -ENOMEM;
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (priv) {
priv->proc_maps.pid = proc_pid(inode);
- ret = seq_open(file, &proc_pid_numa_maps_op);
+ ret = seq_open(file, ops);
if (!ret) {
struct seq_file *m = file->private_data;
m->private = priv;
@@ -1109,8 +1185,25 @@ static int numa_maps_open(struct inode *inode, struct file *file)
return ret;
}

-const struct file_operations proc_numa_maps_operations = {
- .open = numa_maps_open,
+static int pid_numa_maps_open(struct inode *inode, struct file *file)
+{
+ return numa_maps_open(inode, file, &proc_pid_numa_maps_op);
+}
+
+static int tid_numa_maps_open(struct inode *inode, struct file *file)
+{
+ return numa_maps_open(inode, file, &proc_tid_numa_maps_op);
+}
+
+const struct file_operations proc_pid_numa_maps_operations = {
+ .open = pid_numa_maps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_numa_maps_operations = {
+ .open = tid_numa_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 980de54..bdfff69 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -134,9 +134,11 @@ static void pad_len_spaces(struct seq_file *m, int len)
/*
* display a single VMA to a sequenced file
*/
-static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
+static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
+ int is_pid)
{
struct mm_struct *mm = vma->vm_mm;
+ struct proc_maps_private *priv = m->private;
unsigned long ino = 0;
struct file *file;
dev_t dev = 0;
@@ -168,8 +170,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
pad_len_spaces(m, len);
seq_path(m, &file->f_path, "");
} else if (mm) {
- if (vma->vm_start <= mm->start_stack &&
- vma->vm_end >= mm->start_stack) {
+ if (vm_is_stack(priv->task, vma, is_pid))
pad_len_spaces(m, len);
seq_puts(m, "[stack]");
}
@@ -182,11 +183,22 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
/*
* display mapping lines for a particular process's /proc/pid/maps
*/
-static int show_map(struct seq_file *m, void *_p)
+static int show_map(struct seq_file *m, void *_p, int is_pid)
{
struct rb_node *p = _p;

- return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb));
+ return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb),
+ is_pid);
+}
+
+static int show_pid_map(struct seq_file *m, void *_p)
+{
+ return show_map(m, _p, 1);
+}
+
+static int show_tid_map(struct seq_file *m, void *_p)
+{
+ return show_map(m, _p, 0);
}

static void *m_start(struct seq_file *m, loff_t *pos)
@@ -240,10 +252,18 @@ static const struct seq_operations proc_pid_maps_ops = {
.start = m_start,
.next = m_next,
.stop = m_stop,
- .show = show_map
+ .show = show_pid_map
+};
+
+static const struct seq_operations proc_tid_maps_ops = {
+ .start = m_start,
+ .next = m_next,
+ .stop = m_stop,
+ .show = show_tid_map
};

-static int maps_open(struct inode *inode, struct file *file)
+static int maps_open(struct inode *inode, struct file *file,
+ const struct seq_operations *ops)
{
struct proc_maps_private *priv;
int ret = -ENOMEM;
@@ -251,7 +271,7 @@ static int maps_open(struct inode *inode, struct file *file)
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (priv) {
priv->pid = proc_pid(inode);
- ret = seq_open(file, &proc_pid_maps_ops);
+ ret = seq_open(file, ops);
if (!ret) {
struct seq_file *m = file->private_data;
m->private = priv;
@@ -262,8 +282,25 @@ static int maps_open(struct inode *inode, struct file *file)
return ret;
}

-const struct file_operations proc_maps_operations = {
- .open = maps_open,
+static int pid_maps_open(struct inode *inode, struct file *file)
+{
+ return maps_open(inode, file, &proc_pid_maps_ops);
+}
+
+static int tid_maps_open(struct inode *inode, struct file *file)
+{
+ return maps_open(inode, file, &proc_tid_maps_ops);
+}
+
+const struct file_operations proc_pid_maps_operations = {
+ .open = pid_maps_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+const struct file_operations proc_tid_maps_operations = {
+ .open = tid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17b27cd..b0fc583 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1040,6 +1040,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma,
!vma_growsup(vma->vm_next, addr);
}

+/* Check if the vma is being used as a stack by this task */
+static inline int vm_is_stack_for_task(struct task_struct *t,
+ struct vm_area_struct *vma)
+{
+ return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t));
+}
+
+extern int vm_is_stack(struct task_struct *task, struct vm_area_struct *vma, int in_group);
+
extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index fa2f04e..601a920 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3909,6 +3909,28 @@ void print_vma_addr(char *prefix, unsigned long ip)
up_read(&current->mm->mmap_sem);
}

+/*
+ * Check if the vma is being used as a stack.
+ * If is_group is non-zero, check in the entire thread group or else
+ * just check in the current task.
+ */
+int vm_is_stack(struct task_struct *task,
+ struct vm_area_struct *vma, int in_group)
+{
+ if (vm_is_stack_for_task(task, vma))
+ return 1;
+
+ if (in_group) {
+ struct task_struct *t = task;
+ while_each_thread(task, t) {
+ if (vm_is_stack_for_task(t, vma))
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_PROVE_LOCKING
void might_fault(void)
{
--
1.7.7.4

2012-02-23 20:22:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Thu, 23 Feb 2012 09:33:31 +0530
Siddhesh Poyarekar <[email protected]> wrote:

> With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
> as the task would see it' and hence, only the vma that that task uses
> as stack is marked as [stack]. All other 'stack' vmas are marked as
> anonymous memory. /proc/PID/maps acts as a thread group level view,
> where all stack vmas are marked.

Looks OK to me, thanks. I doubt if those interface changes will cause
significant disruption.

2012-02-23 23:18:14

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

Hi

This version makes sense to me. and I verified this change don't break
procps tools.

But,

> +int vm_is_stack(struct task_struct *task,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct vm_area_struct *vma, int in_group)
> +{
> + ? ? ? if (vm_is_stack_for_task(task, vma))
> + ? ? ? ? ? ? ? return 1;
> +
> + ? ? ? if (in_group) {
> + ? ? ? ? ? ? ? struct task_struct *t = task;
> + ? ? ? ? ? ? ? while_each_thread(task, t) {

How protect this loop from task exiting? AFAIK, while_each_thread
require rcu_read_lock or task_list_lock.


> + ? ? ? ? ? ? ? ? ? ? ? if (vm_is_stack_for_task(t, vma))
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return 1;
> + ? ? ? ? ? ? ? }
> + ? ? ? }

2012-02-23 23:47:55

by Mike Frysinger

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Wednesday 22 February 2012 23:03:31 Siddhesh Poyarekar wrote:
> With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
> as the task would see it' and hence, only the vma that that task uses
> as stack is marked as [stack]. All other 'stack' vmas are marked as
> anonymous memory. /proc/PID/maps acts as a thread group level view,
> where all stack vmas are marked.

i don't suppose we could have it say "[tid stack]" rather than "[stack]" ? or
perhaps even "[stack tid:%u]" with replacing %u with the tid ?
-mike


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part.

2012-02-24 00:50:07

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

2012/2/23 KOSAKI Motohiro <[email protected]>:
> Hi
>
> This version makes sense to me. and I verified this change don't break
> procps tools.

Sigh. No, I missed one thing. If application use
makecontext()/swapcontext() pair,
ESP is not reliable way to detect pthread stack. At that time the
stack is still marked
as anonymous memory.

2012-02-24 05:29:07

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 24, 2012 at 4:47 AM, KOSAKI Motohiro
<[email protected]> wrote:
> How protect this loop from task exiting? AFAIK, while_each_thread
> require rcu_read_lock or task_list_lock.

I missed this, thanks. I'll send a patch for this on top of my earlier
patch since Andrew has already included the earlier patch.

> Sigh. No, I missed one thing. If application use
> makecontext()/swapcontext() pair,
> ESP is not reliable way to detect pthread stack. At that time the
> stack is still marked
> as anonymous memory.

This is not wrong, because it essentially gives the correct picture of
the state of that task -- the task is using another vma as a stack
during that point and not the one it was allotted by pthreads during
thread creation.

I don't think we can successfully stick to the idea of trying to mark
stack space allocated by pthreads but not used by any task *currently*
as stack as long as the allocation happens outside the kernel space.
The only way to mark this is either by marking the stack as
VM_GROWSDOWN (which will make the stack grow and break some pthreads
functions) or create a new flag, which a simple display such as this
does not deserve. So it's best that this sticks to what the kernel
*knows* is being used as stack.

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-24 05:47:51

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 24, 2012 at 5:17 AM, Mike Frysinger <[email protected]> wrote:
> i don't suppose we could have it say "[tid stack]" rather than "[stack]" ? ?or
> perhaps even "[stack tid:%u]" with replacing %u with the tid ?

Why do we need to differentiate a thread stack from a process stack?
If someone really wants to know, the main stack is the last one since
it doesn't look like mmap allocates anything above the stack right
now.

I like the idea of marking all stack vmas with their task ids but it
will most likely break procps. Besides, I think it could be done
within procps with this change rather than having the kernel do it.

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-24 13:05:43

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 24, 2012 at 1:52 AM, Andrew Morton
<[email protected]> wrote:
> Looks OK to me, thanks. ?I doubt if those interface changes will cause
> significant disruption.
>

I just found one breakage due to this patch: `cat /proc/self/maps`
does not always get the stack marked right. I think this is because it
gets the $esp a little to early, even before the vma is sent to its
randomized space. That is why /proc/self/smaps works just ok as it
always wins the race due to the sheer volume of data it prints.
Similarly numa_maps always fails since its write volume is lower than
maps. I'll try to fix this.

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-24 16:12:48

by Mike Frysinger

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Friday 24 February 2012 00:47:48 Siddhesh Poyarekar wrote:
> On Fri, Feb 24, 2012 at 5:17 AM, Mike Frysinger wrote:
> > i don't suppose we could have it say "[tid stack]" rather than "[stack]"
> > ? or perhaps even "[stack tid:%u]" with replacing %u with the tid ?
>
> Why do we need to differentiate a thread stack from a process stack?

if it's trivial to display, it'd be nice to coordinate things when
investigating issues

> If someone really wants to know, the main stack is the last one since
> it doesn't look like mmap allocates anything above the stack right
> now.

you can't rely on that. you're describing arch-specific details that happen to
work.

> I like the idea of marking all stack vmas with their task ids but it
> will most likely break procps.

how ?

> Besides, I think it could be done within procps with this change rather than
> having the kernel do it.

how exactly is procps supposed to figure this out ? /proc/<pid>/maps shows the
pid's main stack, as does /proc/<pid>/tid/*/maps.
-mike


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part.

2012-02-24 16:15:00

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

>> Sigh. No, I missed one thing. If application use
>> makecontext()/swapcontext() pair,
>> ESP is not reliable way to detect pthread stack. At that time the
>> stack is still marked
>> as anonymous memory.
>
> This is not wrong, because it essentially gives the correct picture of
> the state of that task -- the task is using another vma as a stack
> during that point and not the one it was allotted by pthreads during
> thread creation.
>
> I don't think we can successfully stick to the idea of trying to mark
> stack space allocated by pthreads but not used by any task *currently*
> as stack as long as the allocation happens outside the kernel space.
> The only way to mark this is either by marking the stack as
> VM_GROWSDOWN (which will make the stack grow and break some pthreads
> functions) or create a new flag, which a simple display such as this
> does not deserve. So it's best that this sticks to what the kernel
> *knows* is being used as stack.

Oh, maybe generically you are right. but you missed one thing. Before
your patch, stack or not stack are address space property. thus, using
/proc/pid/maps makes sense. but after your patch, it's no longer memory
property. applications can use heap or mapped file as a stack. then, at
least, current your code is wrong. the code assume each memory property
are exclusive.

Moreover, if pthread stack is unimportant, I wonder why we need this patch
at all. Which application does need it? and When?

2012-02-24 18:23:49

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 24, 2012 at 9:42 PM, Mike Frysinger <[email protected]> wrote:
>> I like the idea of marking all stack vmas with their task ids but it
>> will most likely break procps.
>
> how ?

I don't know yet, since I haven't looked at the procps code. I intend
to do that once the patch is stable. But I imagine it would look for
"[stack]" or something similar in the output. It ought to be easy
enough to change I guess.

>> Besides, I think it could be done within procps with this change rather than
>> having the kernel do it.
>
> how exactly is procps supposed to figure this out ? ?/proc/<pid>/maps shows the
> pid's main stack, as does /proc/<pid>/tid/*/maps.

Since the maps are essentially the same, it would require pmap for
example, to read through the PID/maps as well as TID/maps and
associate them. I understand now that this may be a little racy.

I'll include thread ids and see how procps copes with it.


--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-24 18:58:42

by Siddhesh Poyarekar

[permalink] [raw]
Subject: Re: [PATCH] Mark thread stack correctly in proc/<pid>/maps

On Fri, Feb 24, 2012 at 9:44 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Oh, maybe generically you are right. but you missed one thing. Before
> your patch, stack or not stack are address space property. thus, using
> /proc/pid/maps makes sense. but after your patch, it's no longer memory
> property. applications can use heap or mapped file as a stack. then, at
> least, current your code is wrong. the code assume each memory property
> are exclusive.

Right, but I cannot think of any other alternative that does not
involve touching some sensitive code.

The setcontext family of functions where any heap, stack or even data
area portion could be used as stack, break the very concept of an
entire vma being used as a stack. In such a scenario the kernel can
only show what it knows, which is that a specific vma is being used as
a stack. I agree that it is not correct to show the entire vma as
stack, but there doesn't seem to be a better way right now in that
implementation. FWIW, if the stack space is allocated in heap, it will
show up as heap and not stack since the former gets preference.

> Moreover, if pthread stack is unimportant, I wonder why we need this patch
> at all. Which application does need it? and When?

Right, my original intent was to mark stack vmas allocated by
pthreads, which included those vmas that are in the pthreads cache.
However, this means that the kernel does not have any real control
over what it marks as stack. Also, the concept is very specific to the
glibc pthreads implementation and we're essentially just making the
kernel spit out some data blindly for glibc.

The other solution I can think of is to have stack_start as a task
level property so that each task knows their stack vma start (obtained
from its sys_clone call and not from mmap). This however means
increasing the size of task_struct by sizeof(unsigned long). Is that
overhead acceptable?

--
Siddhesh Poyarekar
http://siddhesh.in

2012-02-26 16:17:28

by Siddhesh Poyarekar

[permalink] [raw]
Subject: [PATCH] x86_64: Record stack pointer before task execution begins

task->thread.usersp is unusable immediately after a binary is exec()'d
until it undergoes a context switch cycle. The start_thread() function
called during execve() saves the stack pointer into pt_regs and into
old_rsp, but fails to record it into task->thread.usersp.

Because of this, KSTK_ESP(task) returns an incorrect value for a
64-bit program until the task is switched out and back in since
switch_to swaps %rsp values in and out into task->thread.usersp.

Signed-off-by: Siddhesh Poyarekar <[email protected]>
---
arch/x86/kernel/process_64.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9b9fe4a..702a3b9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -341,6 +341,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
loadsegment(es, _ds);
loadsegment(ds, _ds);
load_gs_index(0);
+ current->thread.usersp = new_sp;
regs->ip = new_ip;
regs->sp = new_sp;
percpu_write(old_rsp, new_sp);
--
1.7.7.4

2012-02-27 06:18:10

by Siddhesh Poyarekar

[permalink] [raw]
Subject: [tip:x86/process] x86_64: Record stack pointer before task execution begins

Commit-ID: 42dfc43ee5999ac64284476ea0ac6c937587cf2b
Gitweb: http://git.kernel.org/tip/42dfc43ee5999ac64284476ea0ac6c937587cf2b
Author: Siddhesh Poyarekar <[email protected]>
AuthorDate: Sun, 26 Feb 2012 21:47:55 +0530
Committer: H. Peter Anvin <[email protected]>
CommitDate: Sun, 26 Feb 2012 12:59:04 -0800

x86_64: Record stack pointer before task execution begins

task->thread.usersp is unusable immediately after a binary is exec()'d
until it undergoes a context switch cycle. The start_thread() function
called during execve() saves the stack pointer into pt_regs and into
old_rsp, but fails to record it into task->thread.usersp.

Because of this, KSTK_ESP(task) returns an incorrect value for a
64-bit program until the task is switched out and back in since
switch_to swaps %rsp values in and out into task->thread.usersp.

Signed-off-by: Siddhesh Poyarekar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/kernel/process_64.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1fd94bc..eb54dd0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -341,6 +341,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
loadsegment(es, _ds);
loadsegment(ds, _ds);
load_gs_index(0);
+ current->thread.usersp = new_sp;
regs->ip = new_ip;
regs->sp = new_sp;
percpu_write(old_rsp, new_sp);