2009-12-31 14:13:12

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Commit d899bf7b (procfs: provide stack information for threads) introduced
to show stack information in /proc/{pid}/status. But it cause large performance
regression. Unfortunately /proc/{pid}/status is used ps command too and ps is
one of most important component. Because both to take mmap_sem and page table walk
are heavily operation.

if many process run, the ps performance is,

[before d899bf7b]

% perf stat ps >/dev/null

Performance counter stats for 'ps':

4090.435806 task-clock-msecs # 0.032 CPUs
229 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
234 page-faults # 0.000 M/sec
8587565207 cycles # 2099.425 M/sec
9866662403 instructions # 1.149 IPC
3789415411 cache-references # 926.409 M/sec
30419509 cache-misses # 7.437 M/sec

128.859521955 seconds time elapsed

[after d899bf7b]

% perf stat ps > /dev/null

Performance counter stats for 'ps':

4305.081146 task-clock-msecs # 0.028 CPUs
480 context-switches # 0.000 M/sec
2 CPU-migrations # 0.000 M/sec
237 page-faults # 0.000 M/sec
9021211334 cycles # 2095.480 M/sec
10605887536 instructions # 1.176 IPC
3612650999 cache-references # 839.160 M/sec
23917502 cache-misses # 5.556 M/sec

152.277819582 seconds time elapsed

Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps
provide almost same information. we can use it.

Cc: Stefani Seibold <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
Documentation/filesystems/proc.txt | 2 -
fs/proc/array.c | 89 ------------------------------------
2 files changed, 0 insertions(+), 91 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 220cc63..0d07513 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -177,7 +177,6 @@ read the file /proc/PID/status:
CapBnd: ffffffffffffffff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
- Stack usage: 12 kB

This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
@@ -231,7 +230,6 @@ Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
- Stack usage: stack usage high water mark (round up to page size)
..............................................................................

Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 2e5c2a3..b8df659 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -322,94 +322,6 @@ static inline void task_context_switch_counts(struct seq_file *m,
p->nivcsw);
}

-#ifdef CONFIG_MMU
-
-struct stack_stats {
- struct vm_area_struct *vma;
- unsigned long startpage;
- unsigned long usage;
-};
-
-static int stack_usage_pte_range(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
-{
- struct stack_stats *ss = walk->private;
- struct vm_area_struct *vma = ss->vma;
- pte_t *pte, ptent;
- spinlock_t *ptl;
- int ret = 0;
-
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
- ptent = *pte;
-
-#ifdef CONFIG_STACK_GROWSUP
- if (pte_present(ptent) || is_swap_pte(ptent))
- ss->usage = addr - ss->startpage + PAGE_SIZE;
-#else
- if (pte_present(ptent) || is_swap_pte(ptent)) {
- ss->usage = ss->startpage - addr + PAGE_SIZE;
- pte++;
- ret = 1;
- break;
- }
-#endif
- }
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
- return ret;
-}
-
-static inline unsigned long get_stack_usage_in_bytes(struct vm_area_struct *vma,
- struct task_struct *task)
-{
- struct stack_stats ss;
- struct mm_walk stack_walk = {
- .pmd_entry = stack_usage_pte_range,
- .mm = vma->vm_mm,
- .private = &ss,
- };
-
- if (!vma->vm_mm || is_vm_hugetlb_page(vma))
- return 0;
-
- ss.vma = vma;
- ss.startpage = task->stack_start & PAGE_MASK;
- ss.usage = 0;
-
-#ifdef CONFIG_STACK_GROWSUP
- walk_page_range(KSTK_ESP(task) & PAGE_MASK, vma->vm_end,
- &stack_walk);
-#else
- walk_page_range(vma->vm_start, (KSTK_ESP(task) & PAGE_MASK) + PAGE_SIZE,
- &stack_walk);
-#endif
- return ss.usage;
-}
-
-static inline void task_show_stack_usage(struct seq_file *m,
- struct task_struct *task)
-{
- struct vm_area_struct *vma;
- struct mm_struct *mm = get_task_mm(task);
-
- if (mm) {
- down_read(&mm->mmap_sem);
- vma = find_vma(mm, task->stack_start);
- if (vma)
- seq_printf(m, "Stack usage:\t%lu kB\n",
- get_stack_usage_in_bytes(vma, task) >> 10);
-
- up_read(&mm->mmap_sem);
- mmput(mm);
- }
-}
-#else
-static void task_show_stack_usage(struct seq_file *m, struct task_struct *task)
-{
-}
-#endif /* CONFIG_MMU */
-
static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
{
seq_printf(m, "Cpus_allowed:\t");
@@ -440,7 +352,6 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
task_show_regs(m, task);
#endif
task_context_switch_counts(m, task);
- task_show_stack_usage(m, task);
return 0;
}

--
1.6.6



2009-12-31 15:48:39

by Stefani Seibold

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Am Donnerstag, den 31.12.2009, 23:12 +0900 schrieb KOSAKI Motohiro:
> Commit d899bf7b (procfs: provide stack information for threads) introduced
> to show stack information in /proc/{pid}/status. But it cause large performance
> regression. Unfortunately /proc/{pid}/status is used ps command too and ps is
> one of most important component. Because both to take mmap_sem and page table walk
> are heavily operation.
>

/proc/<pid>/status is IMHO not a performance relevant thing. The main
reason is to provide exact information about a running process.

> Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps
> provide almost same information. we can use it.
>

Completely wrong. Where does smaps provides this kind of information?
Where is there the high water mark of the stack usage?

> Cc: Stefani Seibold <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Alexey Dobriyan <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---

Nak: Stefani Seibold <[email protected]>

2010-01-01 14:14:57

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

> Am Donnerstag, den 31.12.2009, 23:12 +0900 schrieb KOSAKI Motohiro:
> > Commit d899bf7b (procfs: provide stack information for threads) introduced
> > to show stack information in /proc/{pid}/status. But it cause large performance
> > regression. Unfortunately /proc/{pid}/status is used ps command too and ps is
> > one of most important component. Because both to take mmap_sem and page table walk
> > are heavily operation.
> >
>
> /proc/<pid>/status is IMHO not a performance relevant thing. The main
> reason is to provide exact information about a running process.

No. You have to learn real world use case. if you think so, you should
change ps before submit this change. This patch obviously make harm than worth.
Nobody (except you) use it but everybody get regression.


> > Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps
> > provide almost same information. we can use it.
>
> Completely wrong. Where does smaps provides this kind of information?
> Where is there the high water mark of the stack usage?

You have to see you patch itself. show_map_vma() isn't only used by /proc/pid/maps,
but also be used by /proc/pid/smaps.

Now, /proc/{pid}/task/{tid}/smaps show following column.

7fb97c181000-7fb97d1d1000 rw-p 00000000 00:00 0 [threadstack:0084eff0]
Size: 16704 kB
Rss: 8 kB
Pss: 8 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 8 kB
Referenced: 8 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB

It descibe
- This vma is thread stack
- vma size is 16704kB
- stack vsz size in vma is 0x0084eff0 (~ 8508kB)
- and, physical memory is used 8kB.


Anyway, I revert the regresstion patch as other regression patches. if you really want
this feature, you have three options.

1. create new /proc file instead to use /proc/pid/status.
2. improve performance until typical use-case don't notice regression.
3. change ps and other /proc related userland implementation and resubmit this patch.

But even if you choose anything, You have to test both its functional and performance
_before_ submitting kernel patch.



2010-01-01 15:10:38

by Stefani Seibold

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Am Freitag, den 01.01.2010, 23:14 +0900 schrieb KOSAKI Motohiro:
> > Am Donnerstag, den 31.12.2009, 23:12 +0900 schrieb KOSAKI Motohiro:
> > > Commit d899bf7b (procfs: provide stack information for threads) introduced
> > > to show stack information in /proc/{pid}/status. But it cause large performance
> > > regression. Unfortunately /proc/{pid}/status is used ps command too and ps is
> > > one of most important component. Because both to take mmap_sem and page table walk
> > > are heavily operation.
> > >
> >
> > /proc/<pid>/status is IMHO not a performance relevant thing. The main
> > reason is to provide exact information about a running process.
>
> No. You have to learn real world use case. if you think so, you should
> change ps before submit this change. This patch obviously make harm than worth.
> Nobody (except you) use it but everybody get regression.
>

It is fascinating that every developer means that only his personal view
and requirements are the wisdom of the world.

>
> > > Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps
> > > provide almost same information. we can use it.
> >
> > Completely wrong. Where does smaps provides this kind of information?
> > Where is there the high water mark of the stack usage?
>
> You have to see you patch itself. show_map_vma() isn't only used by /proc/pid/maps,
> but also be used by /proc/pid/smaps.
>
> Now, /proc/{pid}/task/{tid}/smaps show following column.
>
> 7fb97c181000-7fb97d1d1000 rw-p 00000000 00:00 0 [threadstack:0084eff0]
> Size: 16704 kB
> Rss: 8 kB
> Pss: 8 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 0 kB
> Private_Dirty: 8 kB
> Referenced: 8 kB
> Swap: 0 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
>
> It descibe
> - This vma is thread stack
> - vma size is 16704kB
> - stack vsz size in vma is 0x0084eff0 (~ 8508kB)
> - and, physical memory is used 8kB.
>

But it don't describe the usage high water mark. With the information
provided by proc/*/smaps this is not possible. It is very funny to get
complains without checking. Your assertion is completely WRONG.

>
> Anyway, I revert the regresstion patch as other regression patches. if you really want
> this feature, you have three options.
>
> 1. create new /proc file instead to use /proc/pid/status.

This was discarded by Andrew. He prefered the inclusion
in /proc/pid/status.

> 2. improve performance until typical use-case don't notice regression.

Not possible.

> 3. change ps and other /proc related userland implementation and resubmit this patch.

ps works quiet well.

>
> But even if you choose anything, You have to test both its functional and performance
> _before_ submitting kernel patch.

Teach me master! Do you think i don't know that is coast something?.
Walking through the pages coast some runtime. But ps is not a
performance critical application nor a cat /proc/*/status!

Stefani

2010-01-01 15:49:32

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

KOSAKI Motohiro <[email protected]> writes:
>
> Anyway, I revert the regresstion patch as other regression patches. if you really want
> this feature, you have three options.
>
> 1. create new /proc file instead to use /proc/pid/status.
> 2. improve performance until typical use-case don't notice regression.
> 3. change ps and other /proc related userland implementation and resubmit this patch.
>
> But even if you choose anything, You have to test both its functional and performance
> _before_ submitting kernel patch.


4. Leave everything alone (revert all the commits) and use a ptrace
based tool to get this information when you need it.

That is what I suggested during the original code review and I still think
it's the best solution for such a obscure problem.

-Andi

--
[email protected] -- Speaking for myself only.

2010-01-01 16:10:14

by Stefani Seibold

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Am Freitag, den 01.01.2010, 16:49 +0100 schrieb Andi Kleen:
> KOSAKI Motohiro <[email protected]> writes:
> >
> > Anyway, I revert the regresstion patch as other regression patches. if you really want
> > this feature, you have three options.
> >
> > 1. create new /proc file instead to use /proc/pid/status.
> > 2. improve performance until typical use-case don't notice regression.
> > 3. change ps and other /proc related userland implementation and resubmit this patch.
> >
> > But even if you choose anything, You have to test both its functional and performance
> > _before_ submitting kernel patch.
>
>
> 4. Leave everything alone (revert all the commits) and use a ptrace
> based tool to get this information when you need it.
>

Without the stack_start and the fix for the bogus x86_64 KSTP_ESP it
wouldn't be possible.

2010-01-02 01:42:53

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Samuel Thibault <[email protected]> writes:

> Stefani Seibold, le Fri 01 Jan 2010 16:10:16 +0100, a ?crit :
>> But ps is not a performance critical application nor a cat
>> /proc/*/status!
>
> Errr, maybe not so much as some other operations, but a lot of tools use
> them, so it's really worth considering it.

As an historical anecdote one of my first patches to Linux ever made
exactly the same mistake as Stefani's (adding mmap_sem to a proc file
read by proc). It was quickly reverted again by Linus when the
reports of ps not working under swapping poured in.

Same should be done with this patch (commit 9ebd4eba761b624a6a6c9189335adeddcb1fa0e0)

-Andi

--
[email protected] -- Speaking for myself only.

2010-01-01 22:05:34

by Samuel Thibault

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Stefani Seibold, le Fri 01 Jan 2010 16:10:16 +0100, a ?crit :
> But ps is not a performance critical application nor a cat
> /proc/*/status!

Errr, maybe not so much as some other operations, but a lot of tools use
them, so it's really worth considering it.

Samuel

2010-01-01 22:21:51

by Stefani Seibold

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Am Freitag, den 01.01.2010, 23:05 +0100 schrieb Samuel Thibault:
> Stefani Seibold, le Fri 01 Jan 2010 16:10:16 +0100, a ?crit :
> > But ps is not a performance critical application nor a cat
> > /proc/*/status!
>
> Errr, maybe not so much as some other operations, but a lot of tools use
> them, so it's really worth considering it.
>

Right. And i am still trying to find some optimization. Stay tuned.

Stefani

2010-01-02 05:53:55

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

2010/1/2 Stefani Seibold <[email protected]>:
> Am Freitag, den 01.01.2010, 23:05 +0100 schrieb Samuel Thibault:
>> Stefani Seibold, le Fri 01 Jan 2010 16:10:16 +0100, a ?crit :
>> > But ps is not a performance critical application nor a cat
>> > /proc/*/status!
>>
>> Errr, maybe not so much as some other operations, but a lot of tools use
>> them, so it's really worth considering it.
>
> Right. And i am still trying to find some optimization. Stay tuned.

I can understand you feel sad. but don't worry. reverting is very usual event
on kernel development. You can retry anytime if you make optimized code.

2010-01-02 08:26:35

by Stefani Seibold

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

Am Samstag, den 02.01.2010, 14:53 +0900 schrieb KOSAKI Motohiro:
> 2010/1/2 Stefani Seibold <[email protected]>:
> > Am Freitag, den 01.01.2010, 23:05 +0100 schrieb Samuel Thibault:
> >> Stefani Seibold, le Fri 01 Jan 2010 16:10:16 +0100, a ?crit :
> >> > But ps is not a performance critical application nor a cat
> >> > /proc/*/status!
> >>
> >> Errr, maybe not so much as some other operations, but a lot of tools use
> >> them, so it's really worth considering it.
> >
> > Right. And i am still trying to find some optimization. Stay tuned.
>
> I can understand you feel sad. but don't worry. reverting is very usual event
> on kernel development. You can retry anytime if you make optimized code.

I had analyzed the problem and there is only one solution. Create a new
proc/<pid>/stack_usage entry, which was my first idea. This has absolut
now side effects.

I will write a patch and post it in the next days.

Stefani

2010-01-02 14:05:34

by Stefani Seibold

[permalink] [raw]
Subject: [PATCH] partial revert to show stack information in /proc/<pid>/status

As announce here is the patch to partial revert the show stack
information patch due a not accepted performance regression. It will be
now show only the current stack usage, not the high water mark.

The path is only partial reverted because i need the other parts to do
it in an other way.

There are now two possibilities solutions:

- create a new /proc/<pid>/stackinfo entry, which provides the reverted
information and maybe others like the sigaltstack.
- create a user space tool which use /proc/<pid>/pagemap

In both cases the information of task->stack_start and the KSTK_ESP is
needed.

It will be also needed for an enhancement of the oom handler, where i
free unused stack pages (the pages before the stack pointer) under high
memory pressure. This is currently under work.

Andrew please apply this patch to 2.6.34-rc* tree.

Greetings,
Stefani

Signed-off-by: Stefani Seibold <[email protected]>
---
Documentation/filesystems/proc.txt | 2
fs/proc/array.c | 87 ++-----------------------------------
2 files changed, 8 insertions(+), 81 deletions(-)

diff -u -N -r -p linux-2.6.33-rc2.orig/fs/proc/array.c linux-2.6.33-rc2.new/fs/proc/array.c
--- linux-2.6.33-rc2.orig/fs/proc/array.c 2009-12-27 23:37:04.817427024 +0100
+++ linux-2.6.33-rc2.new/fs/proc/array.c 2010-01-02 14:36:53.794188418 +0100
@@ -327,93 +327,20 @@ static inline void task_context_switch_c
p->nivcsw);
}

-#ifdef CONFIG_MMU
-
-struct stack_stats {
- struct vm_area_struct *vma;
- unsigned long startpage;
- unsigned long usage;
-};
-
-static int stack_usage_pte_range(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
+static inline void task_show_stack_usage(struct seq_file *m,
+ struct task_struct *task)
{
- struct stack_stats *ss = walk->private;
- struct vm_area_struct *vma = ss->vma;
- pte_t *pte, ptent;
- spinlock_t *ptl;
- int ret = 0;
-
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
- ptent = *pte;
+ unsigned long usage;

+ if (task->mm) {
#ifdef CONFIG_STACK_GROWSUP
- if (pte_present(ptent) || is_swap_pte(ptent))
- ss->usage = addr - ss->startpage + PAGE_SIZE;
+ usage = KSTK_ESP(task) - task->stack_start;
#else
- if (pte_present(ptent) || is_swap_pte(ptent)) {
- ss->usage = ss->startpage - addr + PAGE_SIZE;
- pte++;
- ret = 1;
- break;
- }
+ usage = task->stack_start - KSTK_ESP(task);
#endif
+ seq_printf(m, "Stack usage:\t%lu kB\n", (usage + 1023) >> 10);
}
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
- return ret;
-}
-
-static inline unsigned long get_stack_usage_in_bytes(struct vm_area_struct *vma,
- struct task_struct *task)
-{
- struct stack_stats ss;
- struct mm_walk stack_walk = {
- .pmd_entry = stack_usage_pte_range,
- .mm = vma->vm_mm,
- .private = &ss,
- };
-
- if (!vma->vm_mm || is_vm_hugetlb_page(vma))
- return 0;
-
- ss.vma = vma;
- ss.startpage = task->stack_start & PAGE_MASK;
- ss.usage = 0;
-
-#ifdef CONFIG_STACK_GROWSUP
- walk_page_range(KSTK_ESP(task) & PAGE_MASK, vma->vm_end,
- &stack_walk);
-#else
- walk_page_range(vma->vm_start, (KSTK_ESP(task) & PAGE_MASK) + PAGE_SIZE,
- &stack_walk);
-#endif
- return ss.usage;
-}
-
-static inline void task_show_stack_usage(struct seq_file *m,
- struct task_struct *task)
-{
- struct vm_area_struct *vma;
- struct mm_struct *mm = get_task_mm(task);
-
- if (mm) {
- down_read(&mm->mmap_sem);
- vma = find_vma(mm, task->stack_start);
- if (vma)
- seq_printf(m, "Stack usage:\t%lu kB\n",
- get_stack_usage_in_bytes(vma, task) >> 10);
-
- up_read(&mm->mmap_sem);
- mmput(mm);
- }
-}
-#else
-static void task_show_stack_usage(struct seq_file *m, struct task_struct *task)
-{
}
-#endif /* CONFIG_MMU */

static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
{
diff -u -N -r -p linux-2.6.33-rc2.orig/Documentation/filesystems/proc.txt linux-2.6.33-rc2.new/Documentation/filesystems/proc.txt
--- linux-2.6.33-rc2.orig/Documentation/filesystems/proc.txt 2009-12-27 23:37:01.098310709 +0100
+++ linux-2.6.33-rc2.new/Documentation/filesystems/proc.txt 2010-01-02 14:30:39.059150340 +0100
@@ -231,7 +231,7 @@ Table 1-2: Contents of the statm files (
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
- Stack usage: stack usage high water mark (round up to page size)
+ Stack usage: stack usage in kB
..............................................................................

Table 1-3: Contents of the statm files (as of 2.6.8-rc3)

2010-01-05 05:24:13

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] partial revert to show stack information in /proc/<pid>/status

> As announce here is the patch to partial revert the show stack
> information patch due a not accepted performance regression. It will be
> now show only the current stack usage, not the high water mark.
>
> The path is only partial reverted because i need the other parts to do
> it in an other way.
>
> There are now two possibilities solutions:
>
> - create a new /proc/<pid>/stackinfo entry, which provides the reverted
> information and maybe others like the sigaltstack.
> - create a user space tool which use /proc/<pid>/pagemap
>
> In both cases the information of task->stack_start and the KSTK_ESP is
> needed.
>
> It will be also needed for an enhancement of the oom handler, where i
> free unused stack pages (the pages before the stack pointer) under high
> memory pressure. This is currently under work.

This explanation seems still strange. both task->stack_start and the KSTK_ESP are already exported
by /proc/{pid}/task/{tid}/stat. and This patch assume KSTK_ESP(task) - task->stack_start mean
StackUsage but it isn't the same as many people's assumption.

However, useless feature better than regression. probably I can ack this.


>
> Andrew please apply this patch to 2.6.34-rc* tree.
>
> Greetings,
> Stefani
>
> Signed-off-by: Stefani Seibold <[email protected]>
> ---
> Documentation/filesystems/proc.txt | 2
> fs/proc/array.c | 87 ++-----------------------------------
> 2 files changed, 8 insertions(+), 81 deletions(-)
>
> diff -u -N -r -p linux-2.6.33-rc2.orig/fs/proc/array.c linux-2.6.33-rc2.new/fs/proc/array.c
> --- linux-2.6.33-rc2.orig/fs/proc/array.c 2009-12-27 23:37:04.817427024 +0100
> +++ linux-2.6.33-rc2.new/fs/proc/array.c 2010-01-02 14:36:53.794188418 +0100
> @@ -327,93 +327,20 @@ static inline void task_context_switch_c
> p->nivcsw);
> }
>
> -#ifdef CONFIG_MMU
> -
> -struct stack_stats {
> - struct vm_area_struct *vma;
> - unsigned long startpage;
> - unsigned long usage;
> -};
> -
> -static int stack_usage_pte_range(pmd_t *pmd, unsigned long addr,
> - unsigned long end, struct mm_walk *walk)
> +static inline void task_show_stack_usage(struct seq_file *m,
> + struct task_struct *task)
> {
> - struct stack_stats *ss = walk->private;
> - struct vm_area_struct *vma = ss->vma;
> - pte_t *pte, ptent;
> - spinlock_t *ptl;
> - int ret = 0;
> -
> - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> - for (; addr != end; pte++, addr += PAGE_SIZE) {
> - ptent = *pte;
> + unsigned long usage;
>
> + if (task->mm) {
> #ifdef CONFIG_STACK_GROWSUP
> - if (pte_present(ptent) || is_swap_pte(ptent))
> - ss->usage = addr - ss->startpage + PAGE_SIZE;
> + usage = KSTK_ESP(task) - task->stack_start;
> #else
> - if (pte_present(ptent) || is_swap_pte(ptent)) {
> - ss->usage = ss->startpage - addr + PAGE_SIZE;
> - pte++;
> - ret = 1;
> - break;
> - }
> + usage = task->stack_start - KSTK_ESP(task);
> #endif
> + seq_printf(m, "Stack usage:\t%lu kB\n", (usage + 1023) >> 10);
> }
> - pte_unmap_unlock(pte - 1, ptl);
> - cond_resched();
> - return ret;
> -}
> -
> -static inline unsigned long get_stack_usage_in_bytes(struct vm_area_struct *vma,
> - struct task_struct *task)
> -{
> - struct stack_stats ss;
> - struct mm_walk stack_walk = {
> - .pmd_entry = stack_usage_pte_range,
> - .mm = vma->vm_mm,
> - .private = &ss,
> - };
> -
> - if (!vma->vm_mm || is_vm_hugetlb_page(vma))
> - return 0;
> -
> - ss.vma = vma;
> - ss.startpage = task->stack_start & PAGE_MASK;
> - ss.usage = 0;
> -
> -#ifdef CONFIG_STACK_GROWSUP
> - walk_page_range(KSTK_ESP(task) & PAGE_MASK, vma->vm_end,
> - &stack_walk);
> -#else
> - walk_page_range(vma->vm_start, (KSTK_ESP(task) & PAGE_MASK) + PAGE_SIZE,
> - &stack_walk);
> -#endif
> - return ss.usage;
> -}
> -
> -static inline void task_show_stack_usage(struct seq_file *m,
> - struct task_struct *task)
> -{
> - struct vm_area_struct *vma;
> - struct mm_struct *mm = get_task_mm(task);
> -
> - if (mm) {
> - down_read(&mm->mmap_sem);
> - vma = find_vma(mm, task->stack_start);
> - if (vma)
> - seq_printf(m, "Stack usage:\t%lu kB\n",
> - get_stack_usage_in_bytes(vma, task) >> 10);
> -
> - up_read(&mm->mmap_sem);
> - mmput(mm);
> - }
> -}
> -#else
> -static void task_show_stack_usage(struct seq_file *m, struct task_struct *task)
> -{
> }
> -#endif /* CONFIG_MMU */
>
> static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
> {
> diff -u -N -r -p linux-2.6.33-rc2.orig/Documentation/filesystems/proc.txt linux-2.6.33-rc2.new/Documentation/filesystems/proc.txt
> --- linux-2.6.33-rc2.orig/Documentation/filesystems/proc.txt 2009-12-27 23:37:01.098310709 +0100
> +++ linux-2.6.33-rc2.new/Documentation/filesystems/proc.txt 2010-01-02 14:30:39.059150340 +0100
> @@ -231,7 +231,7 @@ Table 1-2: Contents of the statm files (
> Mems_allowed_list Same as previous, but in "list format"
> voluntary_ctxt_switches number of voluntary context switches
> nonvoluntary_ctxt_switches number of non voluntary context switches
> - Stack usage: stack usage high water mark (round up to page size)
> + Stack usage: stack usage in kB
> ..............................................................................
>
> Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
>
>


2010-01-07 21:53:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

On Thu, 31 Dec 2009 23:12:22 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Thus, this patch revert it.

OK, thanks, I applied the full revert.

We can look at a new and less costly implementation of this feature for
2.6.34 if desired. In which case, all we have done is to delay the feature
by one release, which is not a big deal.

2010-01-08 00:21:37

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

> On Thu, 31 Dec 2009 23:12:22 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > Thus, this patch revert it.
>
> OK, thanks, I applied the full revert.
>
> We can look at a new and less costly implementation of this feature for
> 2.6.34 if desired. In which case, all we have done is to delay the feature
> by one release, which is not a big deal.

Thanks.

note: my patch isn't full revert too. Commit d899bf7b introduced two
feature (btw, I don't like one patch have two feature).

1) Add the annotattion of [thread stack: xxxx] mark to
/proc/{pid}/task/{tid}/maps.
2) Add StackUsage field to /proc/{pid}/status.

I only revert (2), because I haven't seen (1) cause regression.


2010-01-08 00:35:41

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] proc: revert to show stack information in /proc/{pid}/status

On Fri, 8 Jan 2010 09:21:30 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > On Thu, 31 Dec 2009 23:12:22 +0900 (JST)
> > KOSAKI Motohiro <[email protected]> wrote:
> >
> > > Thus, this patch revert it.
> >
> > OK, thanks, I applied the full revert.
> >
> > We can look at a new and less costly implementation of this feature for
> > 2.6.34 if desired. In which case, all we have done is to delay the feature
> > by one release, which is not a big deal.
>
> Thanks.
>
> note: my patch isn't full revert too. Commit d899bf7b introduced two
> feature (btw, I don't like one patch have two feature).
>
> 1) Add the annotattion of [thread stack: xxxx] mark to
> /proc/{pid}/task/{tid}/maps.
> 2) Add StackUsage field to /proc/{pid}/status.
>
> I only revert (2), because I haven't seen (1) cause regression.
>
>

Ah, thanks. I've updated the patch title and changelog to reflect that.