This patch adds a new counter, slowpath_entered, to /proc/vmstat to
track how many times the system entered the slowpath after the first
allocation attempt failed.
This is useful for knowing the rate of allocation success within the
slowpath.
This patch was tested on ARM with 512MB RAM.
A sample output is shown below after successful boot-up:
shell> cat /proc/vmstat
nr_free_pages 4712
pgalloc_normal 1319432
pgalloc_movable 0
pageoutrun 379
allocstall 0
slowpath_entered 585
compact_stall 0
compact_fail 0
compact_success 0
From the above output we can see that the system entered the slowpath
585 times.
But the existing counters for kswapd (pageoutrun), direct reclaim
(allocstall), and direct compaction (compact_stall) do not tell us this
value.
Out of those 585 times, 379 times the allocation passed through kswapd
without performing direct reclaim/compaction.
That means the remaining 206 times the allocation would have succeeded
using alloc_pages_high_priority.
Signed-off-by: Pintu Kumar <[email protected]>
---
include/linux/vm_event_item.h | 2 +-
mm/page_alloc.c | 2 ++
mm/vmstat.c | 2 +-
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 2b1cef8..9825f294 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -37,7 +37,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#endif
PGINODESTEAL, SLABS_SCANNED, KSWAPD_INODESTEAL,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
- PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+ PAGEOUTRUN, ALLOCSTALL, SLOWPATH_ENTERED, PGROTATED,
DROP_PAGECACHE, DROP_SLAB,
#ifdef CONFIG_NUMA_BALANCING
NUMA_PTE_UPDATES,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2024d2e..4a5d487 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3029,6 +3029,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (IS_ENABLED(CONFIG_NUMA) && (gfp_mask & __GFP_THISNODE) && !wait)
goto nopage;
+ count_vm_event(SLOWPATH_ENTERED);
+
retry:
if (!(gfp_mask & __GFP_NO_KSWAPD))
wake_all_kswapds(order, ac);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1fd0886..1c54fdf 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -778,7 +778,7 @@ const char * const vmstat_text[] = {
"kswapd_high_wmark_hit_quickly",
"pageoutrun",
"allocstall",
-
+ "slowpath_entered",
"pgrotated",
"drop_pagecache",
--
1.7.9.5
On Fri 07-08-15 12:38:54, Pintu Kumar wrote:
> This patch adds a new counter, slowpath_entered, to /proc/vmstat to
> track how many times the system entered the slowpath after the first
> allocation attempt failed.
This is too low-level to be exported in the regular user-visible
interface IMO.
> This is useful for knowing the rate of allocation success within the
> slowpath.
What would that information be good for? Is a regular administrator
expected to consume this value, or is this aimed more at kernel
developers? If the latter, then I think a tracepoint sounds like a
better interface.
> This patch was tested on ARM with 512MB RAM.
> A sample output is shown below after successful boot-up:
> shell> cat /proc/vmstat
> nr_free_pages 4712
> pgalloc_normal 1319432
> pgalloc_movable 0
> pageoutrun 379
> allocstall 0
> slowpath_entered 585
> compact_stall 0
> compact_fail 0
> compact_success 0
>
> From the above output we can see that the system entered the slowpath
> 585 times.
> But the existing counters for kswapd (pageoutrun), direct reclaim
> (allocstall), and direct compaction (compact_stall) do not tell us this
> value.
> Out of those 585 times, 379 times the allocation passed through kswapd
> without performing direct reclaim/compaction.
> That means the remaining 206 times the allocation would have succeeded
> using alloc_pages_high_priority.
>
> Signed-off-by: Pintu Kumar <[email protected]>
--
Michal Hocko
SUSE Labs
On (08/07/15 12:38), Pintu Kumar wrote:
> This patch adds a new counter, slowpath_entered, to /proc/vmstat to
> track how many times the system entered the slowpath after the first
> allocation attempt failed.
> This is useful for knowing the rate of allocation success within the
> slowpath.
> This patch was tested on ARM with 512MB RAM.
> A sample output is shown below after successful boot-up:
> shell> cat /proc/vmstat
> nr_free_pages 4712
> pgalloc_normal 1319432
> pgalloc_movable 0
> pageoutrun 379
> allocstall 0
> slowpath_entered 585
> compact_stall 0
> compact_fail 0
> compact_success 0
>
> From the above output we can see that the system entered the slowpath
> 585 times.
so what can you do with this number?
-ss
> But the existing counters for kswapd (pageoutrun), direct reclaim
> (allocstall), and direct compaction (compact_stall) do not tell us this
> value.
> Out of those 585 times, 379 times the allocation passed through kswapd
> without performing direct reclaim/compaction.
> That means the remaining 206 times the allocation would have succeeded
> using alloc_pages_high_priority.
>
> Signed-off-by: Pintu Kumar <[email protected]>
> ---
> include/linux/vm_event_item.h | 2 +-
> mm/page_alloc.c | 2 ++
> mm/vmstat.c | 2 +-
> 3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 2b1cef8..9825f294 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -37,7 +37,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> #endif
> PGINODESTEAL, SLABS_SCANNED, KSWAPD_INODESTEAL,
> KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
> - PAGEOUTRUN, ALLOCSTALL, PGROTATED,
> + PAGEOUTRUN, ALLOCSTALL, SLOWPATH_ENTERED, PGROTATED,
> DROP_PAGECACHE, DROP_SLAB,
> #ifdef CONFIG_NUMA_BALANCING
> NUMA_PTE_UPDATES,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2024d2e..4a5d487 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3029,6 +3029,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> if (IS_ENABLED(CONFIG_NUMA) && (gfp_mask & __GFP_THISNODE) && !wait)
> goto nopage;
>
> + count_vm_event(SLOWPATH_ENTERED);
> +
> retry:
> if (!(gfp_mask & __GFP_NO_KSWAPD))
> wake_all_kswapds(order, ac);
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 1fd0886..1c54fdf 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -778,7 +778,7 @@ const char * const vmstat_text[] = {
> "kswapd_high_wmark_hit_quickly",
> "pageoutrun",
> "allocstall",
> -
> + "slowpath_entered",
> "pgrotated",
>
> "drop_pagecache",
> --
> 1.7.9.5
>
Hi,
> -----Original Message-----
> From: Michal Hocko [mailto:[email protected]]
> Sent: Friday, August 07, 2015 1:14 PM
> To: Pintu Kumar
> Subject: Re: [PATCH 1/1] mm: vmstat: introducing vm counter for slowpath
>
> On Fri 07-08-15 12:38:54, Pintu Kumar wrote:
> > This patch adds a new counter, slowpath_entered, to /proc/vmstat to
> > track how many times the system entered the slowpath after the first
> > allocation attempt failed.
>
> This is too low-level to be exported in the regular user-visible interface IMO.
>
I think it's OK because this interface is for low-level debugging itself.
> > This is useful for knowing the rate of allocation success within the
> > slowpath.
>
> What would that information be good for? Is a regular administrator
> expected to consume this value, or is this aimed more at kernel
> developers? If the latter, then I think a tracepoint sounds like a
> better interface.
>
This information is good for kernel developers.
I found this information useful while debugging low-memory situations and
sluggishness behavior.
I wanted to know how many times the first allocation fails and how many
times the system enters the slowpath.
As I said, the existing counters do not give this information clearly.
The pageoutrun and allocstall counters are too confusing.
Also, if kswapd and compaction are disabled, we have no other counter for
the slowpath (except allocstall).
Another problem is that allocstall can also be incremented from hibernation
when shrink_all_memory is called, which may create more confusion.
Thus I found this interface useful for understanding low-memory behavior,
and for telling whether device sluggishness is happening because of too many
slowpath entries or due to some other problem.
Then we can decide what the best memory configuration for the device would
be to reduce the slowpath.
Regarding tracepoints, I am not sure if we can attach a counter to them.
Also, tracing may have more overhead and requires additional configs to be
enabled for debugging.
Mostly these configs will not be enabled by default (at least on embedded,
low-memory devices).
I found the vmstat interface easier and more useful.
Comments and suggestions are welcome.
> > This patch was tested on ARM with 512MB RAM.
> > A sample output is shown below after successful boot-up:
> > shell> cat /proc/vmstat
> > nr_free_pages 4712
> > pgalloc_normal 1319432
> > pgalloc_movable 0
> > pageoutrun 379
> > allocstall 0
> > slowpath_entered 585
> > compact_stall 0
> > compact_fail 0
> > compact_success 0
> >
> > From the above output we can see that the system entered the slowpath
> > 585 times.
> > But the existing counters for kswapd (pageoutrun), direct reclaim
> > (allocstall), and direct compaction (compact_stall) do not tell us this
> > value.
> > Out of those 585 times, 379 times the allocation passed through kswapd
> > without performing direct reclaim/compaction.
> > That means the remaining 206 times the allocation would have succeeded
> > using alloc_pages_high_priority.
> >
> > Signed-off-by: Pintu Kumar <[email protected]>
> --
> Michal Hocko
> SUSE Labs
On Fri 07-08-15 18:16:47, PINTU KUMAR wrote:
[...]
> > On Fri 07-08-15 12:38:54, Pintu Kumar wrote:
> > > This patch adds a new counter, slowpath_entered, to /proc/vmstat to
> > > track how many times the system entered the slowpath after the first
> > > allocation attempt failed.
> >
> > This is too low-level to be exported in the regular user-visible interface IMO.
> >
> I think it's OK because this interface is for low-level debugging itself.
Yes, but this might change in future implementations where the counter
might be misleading or even lacking any meaning. This is a user-visible
interface which has to be maintained practically forever. We have made
those mistakes in the past...
[...]
> This information is good for kernel developers.
Then make it a tracepoint and you can dump even more information, e.g.
timestamps, gfp_mask, order...
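For illustration, a minimal sketch of such a static tracepoint (the
event name mm_page_alloc_slowpath is hypothetical, not something that
exists in the tree; gfp_mask and order would come straight from
__alloc_pages_slowpath(), and the trace core records timestamps
automatically):

TRACE_EVENT(mm_page_alloc_slowpath,

	TP_PROTO(gfp_t gfp_mask, unsigned int order),

	TP_ARGS(gfp_mask, order),

	TP_STRUCT__entry(
		__field(gfp_t,		gfp_mask)
		__field(unsigned int,	order)
	),

	TP_fast_assign(
		__entry->gfp_mask	= gfp_mask;
		__entry->order		= order;
	),

	/* human-readable output for the tracing interface */
	TP_printk("gfp_mask=%s order=%u",
		  show_gfp_flags(__entry->gfp_mask),
		  __entry->order)
);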
[...]
> Regarding tracepoints, I am not sure if we can attach a counter to them.
You do not need to have a counter. You just watch for the tracepoint
while debugging your particular problem.
> Also, tracing may have more overhead
Tracepoints should be close to 0 overhead when disabled and certainly
not a performance killer during the debugging session.
> and requires additional configs to be enabled for debugging.
This is to be expected for the debugging sessions. And I am pretty
sure that the static event tracepoints do not require anything really
excessive.
> Mostly these configs will not be enabled by default (at least on embedded,
> low-memory devices).
Are you sure? I thought that CONFIG_TRACING should be sufficient for
EVENT_TRACING but I am not familiar with this too deeply...
--
Michal Hocko
SUSE Labs
On Fri, 07 Aug 2015 18:16:47 +0530 PINTU KUMAR <[email protected]> wrote:
> > > This is useful for knowing the rate of allocation success within the
> > > slowpath.
> >
> > What would that information be good for? Is a regular administrator
> > expected to consume this value, or is this aimed more at kernel
> > developers? If the latter, then I think a tracepoint sounds like a
> > better interface.
> >
> This information is good for kernel developers.
> I found this information useful while debugging low-memory situations and
> sluggishness behavior.
> I wanted to know how many times the first allocation fails and how many
> times the system enters the slowpath.
> As I said, the existing counters do not give this information clearly.
> The pageoutrun and allocstall counters are too confusing.
> Also, if kswapd and compaction are disabled, we have no other counter for
> the slowpath (except allocstall).
> Another problem is that allocstall can also be incremented from hibernation
> when shrink_all_memory is called, which may create more confusion.
> Thus I found this interface useful for understanding low-memory behavior,
> and for telling whether device sluggishness is happening because of too many
> slowpath entries or due to some other problem.
> Then we can decide what the best memory configuration for the device would
> be to reduce the slowpath.
>
> Regarding tracepoints, I am not sure if we can attach a counter to them.
> Also, tracing may have more overhead and requires additional configs to be
> enabled for debugging.
> Mostly these configs will not be enabled by default (at least on embedded,
> low-memory devices).
> I found the vmstat interface easier and more useful.
This does seem like a pretty basic and sensible thing to expose in
vmstat. It probably makes more sense than some of the other things we
have in there.
Yes, it could be a tracepoint but practically speaking, a tracepoint
makes it developer-only. You can ask a bug reporter or a customer
what /proc/vmstat:slowpath_entered is doing, but it's harder to ask
them to set up tracing.
And I don't think this will lock us into anything - vmstat is a big
dumping ground and I don't see a big problem with removing or changing
things later on. IMO, debugfs rules apply here and vmstat would be in
debugfs, had debugfs existed at the time.
Two things:
- we appear to have forgotten to document /proc/vmstat
- How does one actually use slowpath_entered? Obviously we'd like to
know "what proportion of allocations entered the slowpath", so we
calculate
slowpath_entered/X
how do we obtain "X"? Is it by adding up all the pgalloc_*? If
so, perhaps we should really have slowpath_entered_dma,
slowpath_entered_dma32, ...?
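For illustration, a rough sketch of that calculation done from
userspace, assuming the sum of the pgalloc_* counters is used as the
base "X" (a caveat raised later in the thread: pgalloc_* counts pages,
not allocation requests, so this base is only an approximation):

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char name[64];
	unsigned long long val, slowpath = 0, pgalloc = 0;

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "slowpath_entered"))
			slowpath = val;
		else if (!strncmp(name, "pgalloc_", 8))
			pgalloc += val;	/* pgalloc_normal, pgalloc_movable, ... */
	}
	fclose(f);
	if (pgalloc)
		printf("slowpath_entered/X = %.4f%%\n",
		       100.0 * slowpath / pgalloc);
	return 0;
}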
Hi,
> -----Original Message-----
> From: Andrew Morton [mailto:[email protected]]
> Sent: Saturday, August 08, 2015 4:06 AM
> To: PINTU KUMAR
> Subject: Re: [PATCH 1/1] mm: vmstat: introducing vm counter for slowpath
>
> On Fri, 07 Aug 2015 18:16:47 +0530 PINTU KUMAR <[email protected]>
> wrote:
>
> > > > This is useful for knowing the rate of allocation success within the
> > > > slowpath.
> > >
> > > What would that information be good for? Is a regular administrator
> > > expected to consume this value, or is this aimed more at kernel
> > > developers? If the latter, then I think a tracepoint sounds like a
> > > better interface.
> > >
> > This information is good for kernel developers.
> > I found this information useful while debugging low-memory situations
> > and sluggishness behavior.
> > I wanted to know how many times the first allocation fails and how many
> > times the system enters the slowpath.
> > As I said, the existing counters do not give this information clearly.
> > The pageoutrun and allocstall counters are too confusing.
> > Also, if kswapd and compaction are disabled, we have no other counter
> > for the slowpath (except allocstall).
> > Another problem is that allocstall can also be incremented from
> > hibernation when shrink_all_memory is called, which may create more
> > confusion.
> > Thus I found this interface useful for understanding low-memory
> > behavior, and for telling whether device sluggishness is happening
> > because of too many slowpath entries or due to some other problem.
> > Then we can decide what the best memory configuration for the device
> > would be to reduce the slowpath.
> >
> > Regarding tracepoints, I am not sure if we can attach a counter to them.
> > Also, tracing may have more overhead and requires additional configs
> > to be enabled for debugging.
> > Mostly these configs will not be enabled by default (at least on
> > embedded, low-memory devices).
> > I found the vmstat interface easier and more useful.
>
> This does seem like a pretty basic and sensible thing to expose in vmstat. It
> probably makes more sense than some of the other things we have in there.
>
Thanks Andrew.
Yes, as per my analysis, I feel that this is a useful and important
interface.
I added it to one of our internal products and found it to be very useful.
Especially during shrink_memory and compact_nodes analysis I found it really
useful.
It helped me prove that if higher-order pages are present, the slowpath can
be reduced drastically.
Also, during my ELC presentation people asked me how to monitor the slowpath
counts.
> Yes, it could be a tracepoint but practically speaking, a tracepoint makes
> it developer-only. You can ask a bug reporter or a customer what
> /proc/vmstat:slowpath_entered is doing, but it's harder to ask them to set
> up tracing.
>
Yes, at times traces are painful to analyze.
Also, in commercial user binaries, most tracing support is disabled (with no
root privileges).
However, /proc/vmstat works with normal user binaries.
When memory issues are reported, we just get log dumps and a few interfaces
like this.
Most of the time these memory issues are hard to reproduce because they may
happen only after long usage.
> And I don't think this will lock us into anything - vmstat is a big
> dumping ground and I don't see a big problem with removing or changing
> things later on. IMO, debugfs rules apply here and vmstat would be in
> debugfs, had debugfs existed at the time.
>
>
> Two things:
>
> - we appear to have forgotten to document /proc/vmstat
>
Yes, I could not find any documentation on vmstat under kernel/Documentation.
I think it's a nice thing to have.
Maybe I can start the initiative to create one :)
If the respective owners can update it, that will be great.
> - How does one actually use slowpath_entered? Obviously we'd like to
> know "what proportion of allocations entered the slowpath", so we
> calculate
>
> slowpath_entered/X
>
> how do we obtain "X"? Is it by adding up all the pgalloc_*? If
> so, perhaps we should really have slowpath_entered_dma,
> slowpath_entered_dma32, ...?
I think the slowpath for other zones may not be required.
We just need to know how many times we entered the slowpath and possibly do
something to reduce it.
But, I think, the pgalloc_* counts may also include successes from the
fastpath.
How I use the slowpath count for analysis is:
VMSTAT BEFORE AFTER %DIFF
---------- ---------- ---------- ------------
nr_free_pages 6726 12494 46.17%
pgalloc_normal 985836 1549333 36.37%
pageoutrun 2699 529 80.40%
allocstall 298 98 67.11%
slowpath_entered 16659 739 95.56%
compact_stall 244 21 91.39%
compact_fail 178 11 93.82%
compact_success 52 7 86.54%
The above values are from a 512MB system with only the NORMAL zone.
Before, the slowpath count was 16659.
After (memory shrinker + compaction), the slowpath count was reduced by 95%
for the same scenario.
This is just an example.
If we are interested in knowing even the allocation success/fail ratio in
the slowpath, then I think we need more counters,
such as direct_reclaim_success/fail and kswapd_success/fail (just like
compaction success/fail).
Or, we can have a pgalloc_success_fastpath counter.
Then we can do:
pgalloc_success_in_slowpath = (pgalloc_normal - pgalloc_success_fastpath)
Therefore, the success ratio for the slowpath could be:
(pgalloc_success_in_slowpath / slowpath_entered) * 100
More comments welcome.
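For illustration, the proposed derivation as a small userspace-style
helper (pgalloc_success_fastpath is hypothetical - no such counter
exists in the tree; slowpath_entered is the counter this patch adds):

static double slowpath_success_ratio(unsigned long pgalloc_normal,
				     unsigned long pgalloc_success_fastpath,
				     unsigned long slowpath_entered)
{
	/* allocations that succeeded only after entering the slowpath */
	unsigned long success_in_slowpath =
		pgalloc_normal - pgalloc_success_fastpath;

	if (!slowpath_entered)
		return 0.0;
	return (100.0 * success_in_slowpath) / slowpath_entered;
}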
On Mon 10-08-15 15:15:06, PINTU KUMAR wrote:
[...]
> > > Regarding tracepoints, I am not sure if we can attach a counter to them.
> > > Also, tracing may have more overhead and requires additional configs
> > > to be enabled for debugging.
> > > Mostly these configs will not be enabled by default (at least on
> > > embedded, low-memory devices).
> > > I found the vmstat interface easier and more useful.
> >
> > This does seem like a pretty basic and sensible thing to expose in vmstat. It
> > probably makes more sense than some of the other things we have in there.
I still fail to see what exactly this number says. The allocator
slowpath (aka __alloc_pages_slowpath) is more an organizational split
up of the code than anything that would tell us about how costly the
allocation is - e.g. zone_reclaim might happen before we enter the
slowpath.
> Thanks Andrew.
> Yes, as per my analysis, I feel that this is a useful and important
> interface.
> I added it to one of our internal products and found it to be very useful.
> Especially during shrink_memory and compact_nodes analysis I found it
> really useful.
> It helped me prove that if higher-order pages are present, the slowpath
> can be reduced drastically.
I am not sure I understand but this is kind of obvious, no?
> Also, during my ELC presentation people asked me how to monitor the
> slowpath counts.
Isn't the allocation latency a much better defined metric? What does the
slowpath without compaction/reclaim tell the user?
> > Yes, it could be a tracepoint but practically speaking, a tracepoint
> > makes it developer-only. You can ask a bug reporter or a customer what
> > /proc/vmstat:slowpath_entered is doing, but it's harder to ask them to
> > set up tracing.
> >
> Yes, at times traces are painful to analyze.
> Also, in commercial user binaries, most tracing support is disabled (with
> no root privileges).
> However, /proc/vmstat works with normal user binaries.
> When memory issues are reported, we just get log dumps and a few interfaces
> like this.
> Most of the time these memory issues are hard to reproduce because they may
> happen only after long usage.
Yes, I do understand that vmstat is much more convenient. No question
about that. But the counter should be generally usable.
When I see COMPACTSTALL increasing I know that the direct compaction had
to be invoked and that tells me that the system is getting fragmented
and COMPACTFAIL/COMPACTSUCCESS will tell me how successful the
compaction is.
Similarly when I see ALLOCSTALL I know that kswapd doesn't catch up and
scan/reclaim will tell me how effective it is. Snapshotting
ALLOCSTALL/time helped me narrow down memory pressure peaks to
further investigate other counters in more detail.
What will entering the slowpath without triggering either compaction or
direct reclaim tell me?
[...]
> > Two things:
> >
> > - we appear to have forgotten to document /proc/vmstat
> >
> Yes, I could not find any documentation on vmstat under
> kernel/Documentation.
> I think it's a nice thing to have.
> Maybe I can start the initiative to create one :)
That would be more than appreciated.
> If the respective owners can update it, that will be great.
>
> > - How does one actually use slowpath_entered? Obviously we'd like to
> > know "what proportion of allocations entered the slowpath", so we
> > calculate
> >
> > slowpath_entered/X
> >
> > how do we obtain "X"? Is it by adding up all the pgalloc_*?
It's not, because pgalloc_* counts the number of pages while
slowpath_entered counts allocation requests.
> > If
> > so, perhaps we should really have slowpath_entered_dma,
> > slowpath_entered_dma32, ...?
>
> I think the slowpath for other zones may not be required.
> We just need to know how many times we entered the slowpath and possibly do
> something to reduce it.
> But, I think, the pgalloc_* counts may also include successes from the
> fastpath.
>
> How I use the slowpath count for analysis is:
> VMSTAT BEFORE AFTER %DIFF
> ---------- ---------- ---------- ------------
> nr_free_pages 6726 12494 46.17%
> pgalloc_normal 985836 1549333 36.37%
> pageoutrun 2699 529 80.40%
> allocstall 298 98 67.11%
> slowpath_entered 16659 739 95.56%
> compact_stall 244 21 91.39%
> compact_fail 178 11 93.82%
> compact_success 52 7 86.54%
>
> The above values are from a 512MB system with only the NORMAL zone.
> Before, the slowpath count was 16659.
> After (memory shrinker + compaction), the slowpath count was reduced by 95%
> for the same scenario.
> This is just an example.
But what additional information does it give us? We can see that the
direct reclaim has been reduced as well as the compaction, which was even
more effective, so the overall memory pressure was lighter and memory
less fragmented. I assume that your test requested the same amount
of high-order allocations, and the much higher pgalloc_normal in the second
case suggests they were more effective, but we can see that clearly even
without slowpath_entered.
So I would argue that we do not need slowpath_entered. We already have
it, even specialized depending on which _slow_ path has been executed.
What we are missing is the number of all requests to have a reasonable
base. Whether adding such a counter in the hot path is justified is a
question. I haven't really needed it so far and I am looking into vmstat
and meminfo to debug memory reclaim related issues quite often.
> If we are interested in knowing even the allocation success/fail ratio in
> the slowpath, then I think we need more counters,
> such as direct_reclaim_success/fail and kswapd_success/fail (just like
> compaction success/fail).
> Or, we can have a pgalloc_success_fastpath counter.
This all sounds like exposing more and more details about the internal
implementation. It all fits into the tracepoints world IMO.
--
Michal Hocko
SUSE Labs
Hi,
> -----Original Message-----
> From: Michal Hocko [mailto:[email protected]]
> Sent: Tuesday, August 11, 2015 4:25 PM
> To: PINTU KUMAR
> Subject: Re: [PATCH 1/1] mm: vmstat: introducing vm counter for slowpath
>
> On Mon 10-08-15 15:15:06, PINTU KUMAR wrote:
> [...]
> > > > Regarding tracepoints, I am not sure if we can attach a counter to
> > > > them.
> > > > Also, tracing may have more overhead and requires additional configs
> > > > to be enabled for debugging.
> > > > Mostly these configs will not be enabled by default (at least on
> > > > embedded, low-memory devices).
> > > > I found the vmstat interface easier and more useful.
> > >
> > > This does seem like a pretty basic and sensible thing to expose in
> > > vmstat. It probably makes more sense than some of the other things we
> > > have in there.
>
> I still fail to see what exactly this number says. The allocator slowpath
> (aka __alloc_pages_slowpath) is more an organizational split up of the code
> than anything that would tell us about how costly the allocation is - e.g.
> zone_reclaim might happen before we enter the slowpath.
>
> > Thanks Andrew.
> > Yes, as per my analysis, I feel that this is a useful and important
> > interface.
> > I added it to one of our internal products and found it to be very useful.
> > Especially during shrink_memory and compact_nodes analysis I found it
> > really useful.
> > It helped me prove that if higher-order pages are present, the slowpath
> > can be reduced drastically.
>
> I am not sure I understand but this is kind of obvious, no?
>
Yes, but it's hard to prove to management that the slowpath count is reduced.
As we have seen, most of the time this kind of performance issue is hard to
reproduce.
> > Also, during my ELC presentation people asked me how to monitor the
> > slowpath counts.
>
> Isn't the allocation latency a much better defined metric? What does the
> slowpath without compaction/reclaim tell the user?
>
The current metrics in the slowpath tell only half the story.
> > > Yes, it could be a tracepoint but practically speaking, a tracepoint
> > > makes it developer-only. You can ask a bug reporter or a customer
> > > what /proc/vmstat:slowpath_entered is doing, but it's harder to
> > > ask them to set up tracing.
> > >
> > Yes, at times traces are painful to analyze.
> > Also, in commercial user binaries, most tracing support is
> > disabled (with no root privileges).
> > However, /proc/vmstat works with normal user binaries.
> > When memory issues are reported, we just get log dumps and a few
> > interfaces like this.
> > Most of the time these memory issues are hard to reproduce because they
> > may happen only after long usage.
>
> Yes, I do understand that vmstat is much more convenient. No question about
> that. But the counter should be generally usable.
>
> When I see COMPACTSTALL increasing I know that the direct compaction had to
> be invoked and that tells me that the system is getting fragmented and
> COMPACTFAIL/COMPACTSUCCESS will tell me how successful the compaction is.
>
> Similarly when I see ALLOCSTALL I know that kswapd doesn't catch up and
> scan/reclaim will tell me how effective it is. Snapshotting ALLOCSTALL/time
> helped me narrow down memory pressure peaks to further investigate other
> counters in more detail.
>
> What will entering the slowpath without triggering either compaction or
> direct reclaim tell me?
>
The slowpath count will give the actual number, irrespective of
compaction/reclaim/kswapd.
There are other things that happen in the slowpath for which we don't have
counters.
Thus having one _slowpath_ counter is enough for all situations,
even when kswapd/compaction is disabled or not used.
> [...]
>
> > > Two things:
> > >
> > > - we appear to have forgotten to document /proc/vmstat
> > >
> > Yes, I could not find any documentation on vmstat under
> > kernel/Documentation.
> > I think it's a nice thing to have.
> > Maybe I can start the initiative to create one :)
>
> That would be more than appreciated.
>
OK, I will start a basic vmstat.txt in Documentation and release a first
version.
Thanks.
> > If the respective owners can update it, that will be great.
> >
> > > - How does one actually use slowpath_entered? Obviously we'd like to
> > > know "what proportion of allocations entered the slowpath", so we
> > > calculate
> > >
> > > slowpath_entered/X
> > >
> > > how do we obtain "X"? Is it by adding up all the pgalloc_*?
>
> It's not, because pgalloc_* counts the number of pages while
> slowpath_entered counts allocation requests.
>
> > > If
> > > so, perhaps we should really have slowpath_entered_dma,
> > > slowpath_entered_dma32, ...?
> >
> > I think the slowpath for other zones may not be required.
> > We just need to know how many times we entered the slowpath and possibly
> > do something to reduce it.
> > But, I think, the pgalloc_* counts may also include successes from the
> > fastpath.
> >
> > How I use the slowpath count for analysis is:
> > VMSTAT BEFORE AFTER %DIFF
> > ---------- ---------- ---------- ------------
> > nr_free_pages 6726 12494 46.17%
> > pgalloc_normal 985836 1549333 36.37%
> > pageoutrun 2699 529 80.40%
> > allocstall 298 98 67.11%
> > slowpath_entered 16659 739 95.56%
> > compact_stall 244 21 91.39%
> > compact_fail 178 11 93.82%
> > compact_success 52 7 86.54%
> >
> > The above values are from a 512MB system with only the NORMAL zone.
> > Before, the slowpath count was 16659.
> > After (memory shrinker + compaction), the slowpath count was reduced by
> > 95% for the same scenario.
> > This is just an example.
>
> But what additional information does it give us? We can see that the
> direct reclaim has been reduced as well as the compaction, which was even
> more effective, so the overall memory pressure was lighter and memory less
> fragmented. I assume that your test requested the same amount of high-order
> allocations, and the much higher pgalloc_normal in the second case suggests
> they were more effective, but we can see that clearly even without
> slowpath_entered.
>
The thing to note here is that the slowpath count is 16659 (which is 100%
actual, with no confusion).
However, if you look at the other counters for the slowpath (pageoutrun:
2699, allocstall: 298, compact_stall: 244)
and add all of them up, (2699+298+244) = 3241 is much less than the actual
slowpath count.
So, these counters don't really tell what actually happened in the slowpath.
There are other factors that affect the slowpath (like allocation without
watermarks).
Moreover, with the _retry_ and _rebalance_ mechanisms, the
allocstall/compact_stall counters will keep increasing,
but the slowpath count will remain the same.
Also, on some systems kswapd can be disabled, so pageoutrun will always be 0.
Similarly, compaction can be disabled, so compact_stall will not be present.
In this scenario, we are left with only allocstall.
Also, as I said earlier, allocstall can also be incremented from other
places, such as shrink_all_memory.
Consider another situation, like the one below:
VMSTAT
-------------------------------------
nr_free_pages 59982
pgalloc_normal 364163
pgalloc_high 2046
pageoutrun 1
allocstall 0
compact_stall 0
compact_fail 0
compact_success 0
------------------------------------
From the above, is it possible to tell how many times the system entered the
slowpath?
Now I will add the slowpath counter here and check again. I don't have that
data right now.
Thus, the point is, just one counter is enough to quickly analyze the
behavior in the slowpath.
More suggestions are welcome!
> So I would argue that we do not need slowpath_entered. We already have it,
> even specialized depending on which _slow_ path has been executed.
> What we are missing is the number of all requests to have a reasonable base.
> Whether adding such a counter in the hot path is justified is a question. I
> haven't really needed it so far and I am looking into vmstat and meminfo to
> debug memory reclaim related issues quite often.
>
> > If we are interested in knowing even the allocation success/fail ratio
> > in the slowpath, then I think we need more counters,
> > such as direct_reclaim_success/fail and kswapd_success/fail (just like
> > compaction success/fail).
> > Or, we can have a pgalloc_success_fastpath counter.
>
> This all sounds like exposing more and more details about the internal
> implementation. It all fits into the tracepoints world IMO.
>
> --
> Michal Hocko
> SUSE Labs
On Wed 12-08-15 20:22:10, PINTU KUMAR wrote:
> > On Mon 10-08-15 15:15:06, PINTU KUMAR wrote:
[...]
> > > Yes, as per my analysis, I feel that this is a useful and important
> > > interface.
> > > I added it to one of our internal products and found it to be very
> > > useful.
> > > Especially during shrink_memory and compact_nodes analysis I found it
> > > really useful.
> > > It helped me prove that if higher-order pages are present, the slowpath
> > > can be reduced drastically.
> >
> > I am not sure I understand but this is kind of obvious, no?
> >
> Yes, but it's hard to prove to management that the slowpath count is
> reduced.
> As we have seen, most of the time this kind of performance issue is hard to
> reproduce.
But the counter doesn't tell you much as I've tried to explain in my
previous email. You simply do not have the base to compare it to. The
fact is that slow path in this context is quite ambiguous. As I've
mentioned the fast path (as per the code organization) can already do
expensive operations (e.g. zone_reclaim). So what you are exporting is
more a slow path from the code organization POV.
Management might be happy about comparing two arbitrary numbers but that
doesn't mean it is relevant...
[...]
> > When I see COMPACTSTALL increasing I know that the direct compaction had to
> > be invoked and that tells me that the system is getting fragmented and
> > COMPACTFAIL/COMPACTSUCCESS will tell me how successful the compaction is.
> >
> > Similarly when I see ALLOCSTALL I know that kswapd doesn't catch up and
> > scan/reclaim will tell me how effective it is. Snapshotting ALLOCSTALL/time
> > helped me narrow down memory pressure peaks to further investigate other
> > counters in more detail.
> >
> > What will entering the slowpath without triggering either compaction or
> > direct reclaim tell me?
> >
> The slowpath count will give the actual number, irrespective of
> compaction/reclaim/kswapd.
If we are missing them and they are significant to make a picture of
what is causing allocation delays then let's focus on those.
> There are other things that happen in the slowpath for which we don't have
> counters.
Which would be interesting enough to account for?
[...]
> > > How I use the slowpath count for analysis is:
> > > VMSTAT BEFORE AFTER %DIFF
> > > ---------- ---------- ---------- ------------
> > > nr_free_pages 6726 12494 46.17%
> > > pgalloc_normal 985836 1549333 36.37%
> > > pageoutrun 2699 529 80.40%
> > > allocstall 298 98 67.11%
> > > slowpath_entered 16659 739 95.56%
> > > compact_stall 244 21 91.39%
> > > compact_fail 178 11 93.82%
> > > compact_success 52 7 86.54%
> > >
> > > The above values are from a 512MB system with only the NORMAL zone.
> > > Before, the slowpath count was 16659.
> > > After (memory shrinker + compaction), the slowpath count was reduced by
> > > 95% for the same scenario.
> > > This is just an example.
> >
> > But what additional information does it give us? We can see that the
> > direct reclaim has been reduced as well as the compaction, which was even
> > more effective, so the overall memory pressure was lighter and memory less
> > fragmented. I assume that your test requested the same amount of high-order
> > allocations, and the much higher pgalloc_normal in the second case suggests
> > they were more effective, but we can see that clearly even without
> > slowpath_entered.
> >
> The thing to note here is that the slowpath count is 16659 (which is 100%
> actual, with no confusion).
100% against what? It certainly is not 100% of all costly allocations
because of what has been said already. Moreover, this number is really
meaningless without knowing how many allocation requests were done
in total.
> However, if you look at the other counters for the slowpath (pageoutrun:
> 2699, allocstall: 298, compact_stall: 244)
> and add all of them up, (2699+298+244) = 3241 is much less than the actual
> slowpath count.
Yes, because the allocation might have succeeded before the compaction
and/or direct reclaim. Such an allocation could be marginally slower
than what is accounted as the fastpath.
> So, these counters don't really tell what actually happened in the slowpath.
No, they are not, and that is not their purpose. They aim at telling you
about costly allocation paths and they give you quite a good view into
how they operate. At least they've been serving me well so far. If
there are gaps then let's fill them.
> There are other factors that affect the slowpath (like allocation without
> watermarks).
> Moreover, with the _retry_ and _rebalance_ mechanisms, the
> allocstall/compact_stall counters will keep increasing,
> but the slowpath count will remain the same.
I am not sure direct reclaims per one slow path is super important
information. It's been quite sufficient for me to see that there have
been many direct reclaims per time unit to debug what is causing the
memory peak.
> Also, on some systems kswapd can be disabled, so pageoutrun will always
> be 0.
Such a system would be really unhealthy but that is really irrelevant to
the discussion.
> Similarly, compaction can be disabled, so compact_stall will not be present.
> In this scenario, we are left with only allocstall.
Yes and so what?
> Also, as I said earlier, allocstall can also be incremented from other
> places, such as shrink_all_memory.
But shrink_all_memory is really uninteresting because this is a
hibernation path. You can save the file before and after the hibernation
to exclude it.
> Consider another situation, like the one below:
> VMSTAT
> -------------------------------------
> nr_free_pages 59982
> pgalloc_normal 364163
> pgalloc_high 2046
> pageoutrun 1
> allocstall 0
> compact_stall 0
> compact_fail 0
> compact_success 0
> ------------------------------------
> From the above, is it possible to tell how many times the system entered
> the slowpath?
No, and I would argue this is not really that interesting, because we
know that neither the direct reclaim nor compaction had to be triggered.
So from my point of view those allocations were still in good shape.
entered_slowpath would tell me marginally more: merely the fact that I had
to go via get_page_from_freelist one more time, and as this doesn't have
a constant cost I would have to go for tracing to have a better picture.
That being said, this counter alone is IMHO useless for any reasonable
analysis. I would even argue it is actively misleading because it
doesn't mark all the slow paths during the allocation. So NAK to this
patch.
Nevertheless, I can imagine some additional counters could help for
debugging.
ALLOC_REQUESTS - to count all requests
ALLOC_FAILS - to count number of failed requests
ALLOC_OOM - to count OOM events
COMPACTBACKOFF - compaction backed off because it wouldn't be worth it
I could find a way without them until now, so I am not so sure they are
really necessary, but if somebody has a use case and the additional
overhead (especially for ALLOC_REQUESTS, which is in the hot path) is
worth it I wouldn't mind.
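For illustration, those counters wired up the same way this patch wires
SLOWPATH_ENTERED (all four names are hypothetical, taken from the list
above; none of them exist in the tree):

/* additions to enum vm_event_item in include/linux/vm_event_item.h */
	ALLOC_REQUESTS,		/* every allocation request (hot path) */
	ALLOC_FAILS,		/* requests that returned no page */
	ALLOC_OOM,		/* requests that ended up in the OOM killer */
	COMPACTBACKOFF,		/* compaction backed off as not worthwhile */

/* accounting site, mirroring this patch's count_vm_event() call */
	count_vm_event(ALLOC_REQUESTS);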
--
Michal Hocko
SUSE Labs