2013-07-16 23:44:54

by Dave Hansen

Subject: [RESEND][PATCH] mm: vmstats: tlb flush counters


I was investigating some TLB flush scaling issues and realized
that we do not have any good methods for figuring out how many
TLB flushes we are doing.

It would be nice to be able to do these in generic code, but the
arch-independent calls don't explicitly specify whether we
actually need to do remote flushes or not. In the end, we really
need to know if we actually _did_ global vs. local invalidations,
so that leaves us with few options other than to muck with the
counters from arch-specific code.

Signed-off-by: Dave Hansen <[email protected]>
---

linux.git-davehans/arch/x86/mm/tlb.c | 18 ++++++++++++++----
linux.git-davehans/include/linux/vm_event_item.h | 5 +++++
linux.git-davehans/mm/vmstat.c | 5 +++++
3 files changed, 24 insertions(+), 4 deletions(-)

diff -puN arch/x86/mm/tlb.c~tlb-vmstats arch/x86/mm/tlb.c
--- linux.git/arch/x86/mm/tlb.c~tlb-vmstats 2013-07-16 16:41:56.476280350 -0700
+++ linux.git-davehans/arch/x86/mm/tlb.c 2013-07-16 16:41:56.483280658 -0700
@@ -103,6 +103,7 @@ static void flush_tlb_func(void *info)
if (f->flush_mm != this_cpu_read(cpu_tlbstate.active_mm))
return;

+ count_vm_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
if (f->flush_end == TLB_FLUSH_ALL)
local_flush_tlb();
@@ -130,6 +131,7 @@ void native_flush_tlb_others(const struc
info.flush_start = start;
info.flush_end = end;

+ count_vm_event(NR_TLB_REMOTE_FLUSH);
if (is_uv_system()) {
unsigned int cpu;

@@ -149,6 +151,7 @@ void flush_tlb_current_task(void)

preempt_disable();

+ count_vm_event(NR_TLB_LOCAL_FLUSH_ALL);
local_flush_tlb();
if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
@@ -211,16 +214,19 @@ void flush_tlb_mm_range(struct mm_struct
act_entries = mm->total_vm > tlb_entries ? tlb_entries : mm->total_vm;

/* tlb_flushall_shift is on balance point, details in commit log */
- if ((end - start) >> PAGE_SHIFT > act_entries >> tlb_flushall_shift)
+ if ((end - start) >> PAGE_SHIFT > act_entries >> tlb_flushall_shift) {
+ count_vm_event(NR_TLB_LOCAL_FLUSH_ALL);
local_flush_tlb();
- else {
+ } else {
if (has_large_page(mm, start, end)) {
local_flush_tlb();
goto flush_all;
}
/* flush range by one by one 'invlpg' */
- for (addr = start; addr < end; addr += PAGE_SIZE)
+ for (addr = start; addr < end; addr += PAGE_SIZE) {
+ count_vm_event(NR_TLB_LOCAL_FLUSH_ONE);
__flush_tlb_single(addr);
+ }

if (cpumask_any_but(mm_cpumask(mm),
smp_processor_id()) < nr_cpu_ids)
@@ -256,6 +262,7 @@ void flush_tlb_page(struct vm_area_struc

static void do_flush_tlb_all(void *info)
{
+ count_vm_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
__flush_tlb_all();
if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
leave_mm(smp_processor_id());
@@ -263,6 +270,7 @@ static void do_flush_tlb_all(void *info)

void flush_tlb_all(void)
{
+ count_vm_event(NR_TLB_REMOTE_FLUSH);
on_each_cpu(do_flush_tlb_all, NULL, 1);
}

@@ -272,8 +280,10 @@ static void do_kernel_range_flush(void *
unsigned long addr;

/* flush range by one by one 'invlpg' */
- for (addr = f->flush_start; addr < f->flush_end; addr += PAGE_SIZE)
+ for (addr = f->flush_start; addr < f->flush_end; addr += PAGE_SIZE) {
+ count_vm_event(NR_TLB_LOCAL_FLUSH_ONE_KERNEL);
__flush_tlb_single(addr);
+ }
}

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
diff -puN include/linux/vm_event_item.h~tlb-vmstats include/linux/vm_event_item.h
--- linux.git/include/linux/vm_event_item.h~tlb-vmstats 2013-07-16 16:41:56.478280438 -0700
+++ linux.git-davehans/include/linux/vm_event_item.h 2013-07-16 16:41:56.483280658 -0700
@@ -70,6 +70,11 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
THP_ZERO_PAGE_ALLOC,
THP_ZERO_PAGE_ALLOC_FAILED,
#endif
+ NR_TLB_REMOTE_FLUSH, /* cpu tried to flush others' tlbs */
+ NR_TLB_REMOTE_FLUSH_RECEIVED,/* cpu received ipi for flush */
+ NR_TLB_LOCAL_FLUSH_ALL,
+ NR_TLB_LOCAL_FLUSH_ONE,
+ NR_TLB_LOCAL_FLUSH_ONE_KERNEL,
NR_VM_EVENT_ITEMS
};

diff -puN mm/vmstat.c~tlb-vmstats mm/vmstat.c
--- linux.git/mm/vmstat.c~tlb-vmstats 2013-07-16 16:41:56.480280525 -0700
+++ linux.git-davehans/mm/vmstat.c 2013-07-16 16:41:56.484280703 -0700
@@ -817,6 +817,11 @@ const char * const vmstat_text[] = {
"thp_zero_page_alloc",
"thp_zero_page_alloc_failed",
#endif
+ "nr_tlb_remote_flush",
+ "nr_tlb_remote_flush_received",
+ "nr_tlb_local_flush_all",
+ "nr_tlb_local_flush_one",
+ "nr_tlb_local_flush_one_kernel",

#endif /* CONFIG_VM_EVENTS_COUNTERS */
};
_
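[Note: count_vm_event() itself is just a per-CPU increment that /proc/vmstat
later sums across CPUs, so the counters added above are cheap on the flush
paths. A minimal sketch of the pattern, simplified from the
include/linux/vmstat.h of this era:

	/* per-CPU array of event counts, indexed by enum vm_event_item */
	DECLARE_PER_CPU(struct vm_event_state, vm_event_states);

	static inline void count_vm_event(enum vm_event_item item)
	{
		/* preemption-safe per-CPU increment; no atomics, no locks */
		this_cpu_inc(vm_event_states.event[item]);
	}

Reading /proc/vmstat walks the online CPUs and sums event[item] for each
name in vmstat_text[].]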


2013-07-17 07:21:09

by Ingo Molnar

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters


* Dave Hansen <[email protected]> wrote:

> I was investigating some TLB flush scaling issues and realized
> that we do not have any good methods for figuring out how many
> TLB flushes we are doing.
>
> It would be nice to be able to do these in generic code, but the
> arch-independent calls don't explicitly specify whether we
> actually need to do remote flushes or not. In the end, we really
> need to know if we actually _did_ global vs. local invalidations,
> so that leaves us with few options other than to muck with the
> counters from arch-specific code.
>
> Signed-off-by: Dave Hansen <[email protected]>
> ---
>
> [...]
>
> diff -puN include/linux/vm_event_item.h~tlb-vmstats include/linux/vm_event_item.h
> --- linux.git/include/linux/vm_event_item.h~tlb-vmstats 2013-07-16 16:41:56.478280438 -0700
> +++ linux.git-davehans/include/linux/vm_event_item.h 2013-07-16 16:41:56.483280658 -0700
> @@ -70,6 +70,11 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
> THP_ZERO_PAGE_ALLOC,
> THP_ZERO_PAGE_ALLOC_FAILED,
> #endif
>
> + NR_TLB_REMOTE_FLUSH, /* cpu tried to flush others' tlbs */
> + NR_TLB_REMOTE_FLUSH_RECEIVED,/* cpu received ipi for flush */
> + NR_TLB_LOCAL_FLUSH_ALL,
> + NR_TLB_LOCAL_FLUSH_ONE,
> + NR_TLB_LOCAL_FLUSH_ONE_KERNEL,

Please fix the vertical alignment of comments.

> NR_VM_EVENT_ITEMS
> };
>
> diff -puN mm/vmstat.c~tlb-vmstats mm/vmstat.c
> --- linux.git/mm/vmstat.c~tlb-vmstats 2013-07-16 16:41:56.480280525 -0700
> +++ linux.git-davehans/mm/vmstat.c 2013-07-16 16:41:56.484280703 -0700
> @@ -817,6 +817,11 @@ const char * const vmstat_text[] = {
> "thp_zero_page_alloc",
> "thp_zero_page_alloc_failed",
> #endif
> + "nr_tlb_remote_flush",
> + "nr_tlb_remote_flush_received",
> + "nr_tlb_local_flush_all",
> + "nr_tlb_local_flush_one",
> + "nr_tlb_local_flush_one_kernel",

At first sight this seems pretty x86 specific. No range flush events, etc.

But no strong objections from me, if Andrew likes it.

Thanks,

Ingo
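[If generic range-flush events were ever added along the lines Ingo hints at,
the enum could grow entries such as these - hypothetical names, not part of
this patch:

	NR_TLB_LOCAL_FLUSH_RANGE,	/* bounded invlpg loop over a va range */
	NR_TLB_REMOTE_FLUSH_RANGE,	/* range-based remote shootdown */
]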

2013-07-18 20:52:00

by Andrew Morton

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

On Wed, 17 Jul 2013 09:21:00 +0200 Ingo Molnar <[email protected]> wrote:

>
> * Dave Hansen <[email protected]> wrote:
>
> > I was investigating some TLB flush scaling issues and realized
> > that we do not have any good methods for figuring out how many
> > TLB flushes we are doing.
> >
> > It would be nice to be able to do these in generic code, but the
> > arch-independent calls don't explicitly specify whether we
> > actually need to do remote flushes or not. In the end, we really
> > need to know if we actually _did_ global vs. local invalidations,
> > so that leaves us with few options other than to muck with the
> > counters from arch-specific code.

Spose so, if you really think it's worth it. It's all downside for
uniprocessor machines. And for architectures which don't implement the
counters, of course.

> > --- linux.git/include/linux/vm_event_item.h~tlb-vmstats 2013-07-16 16:41:56.478280438 -0700
> > +++ linux.git-davehans/include/linux/vm_event_item.h 2013-07-16 16:41:56.483280658 -0700
> > @@ -70,6 +70,11 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
> > THP_ZERO_PAGE_ALLOC,
> > THP_ZERO_PAGE_ALLOC_FAILED,
> > #endif
> >
> > + NR_TLB_REMOTE_FLUSH, /* cpu tried to flush others' tlbs */
> > + NR_TLB_REMOTE_FLUSH_RECEIVED,/* cpu received ipi for flush */
> > + NR_TLB_LOCAL_FLUSH_ALL,
> > + NR_TLB_LOCAL_FLUSH_ONE,
> > + NR_TLB_LOCAL_FLUSH_ONE_KERNEL,
>
> Please fix the vertical alignment of comments.

I looked - this isn't practical.

It would be nice to actually document these things though. We don't
*have* to squeeze the comment into the RHS.
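[E.g. rather than squeezing them onto the right-hand side, the entries could
carry block comments above them - a sketch:

	/* A CPU initiated a TLB shootdown aimed at other CPUs: */
	NR_TLB_REMOTE_FLUSH,

	/* A CPU received and serviced a TLB-flush IPI: */
	NR_TLB_REMOTE_FLUSH_RECEIVED,
]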

2013-07-19 08:28:55

by Ingo Molnar

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters


* Andrew Morton <[email protected]> wrote:

> On Wed, 17 Jul 2013 09:21:00 +0200 Ingo Molnar <[email protected]> wrote:
>
> >
> > * Dave Hansen <[email protected]> wrote:
> >
> > > [...]
>
> Spose so, if you really think it's worth it. It's all downside for
> uniprocessor machines. [...]

UP is slowly going extinct, but in any case these counters ought to inform
us about TLB flushes even on UP systems:

> > > + NR_TLB_LOCAL_FLUSH_ALL,
> > > + NR_TLB_LOCAL_FLUSH_ONE,
> > > + NR_TLB_LOCAL_FLUSH_ONE_KERNEL,

While these ought to be compiled out on UP kernels:

> > > + NR_TLB_REMOTE_FLUSH, /* cpu tried to flush others' tlbs */
> > > + NR_TLB_REMOTE_FLUSH_RECEIVED,/* cpu received ipi for flush */

Right?
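[One way that split could look in vm_event_item.h - a sketch, assuming the
remote events are only meaningful under CONFIG_SMP:

	NR_TLB_LOCAL_FLUSH_ALL,
	NR_TLB_LOCAL_FLUSH_ONE,
	NR_TLB_LOCAL_FLUSH_ONE_KERNEL,
	#ifdef CONFIG_SMP
	NR_TLB_REMOTE_FLUSH,		/* compiled out on UP kernels */
	NR_TLB_REMOTE_FLUSH_RECEIVED,
	#endif
	NR_VM_EVENT_ITEMS

The vmstat_text[] strings in mm/vmstat.c would need the matching #ifdef so
the names stay in sync with the enum.]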

> > Please fix the vertical alignment of comments.
>
> I looked - this isn't practical.
>
> It would be nice to actually document these things though. We don't
> *have* to squeeze the comment into the RHS.

Agreed.

Thanks,

Ingo

2013-07-19 11:38:07

by Raghavendra KT

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

On Wed, Jul 17, 2013 at 5:14 AM, Dave Hansen <[email protected]> wrote:
>
> [...]

Hi Dave,
While measuring non-PLE performance, one of the bottlenecks I am seeing is
TLB flushes. perf had helped in analysing it a bit there, but this patch
would help with precise measurement. It will also help in tuning the PLE
window experiments (a larger PLE window would affect remote TLB flushes).

Thanks for this patch. Tested the patch on my Sandy Bridge box.

Tested-by: Raghavendra K T <[email protected]>

2013-07-19 15:20:30

by Dave Hansen

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

On 07/19/2013 04:38 AM, Raghavendra KT wrote:
> While measuring non-PLE performance, one of the bottlenecks I am seeing is
> TLB flushes. perf had helped in analysing it a bit there, but this patch
> would help with precise measurement. It will also help in tuning the PLE
> window experiments (a larger PLE window would affect remote TLB flushes).

Interesting. What workload is that? I've been having problems finding
workloads that are heavily bottlenecked on TLB flushes.

2013-07-19 15:52:09

by Dave Hansen

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

On 07/19/2013 01:28 AM, Ingo Molnar wrote:
> UP is slowly going extinct, but in any case these counters ought to inform
> us about TLB flushes even on UP systems:
>
> > > > + NR_TLB_LOCAL_FLUSH_ALL,
> > > > + NR_TLB_LOCAL_FLUSH_ONE,
> > > > + NR_TLB_LOCAL_FLUSH_ONE_KERNEL,
> While these ought to be compiled out on UP kernels:
>
> > > > + NR_TLB_REMOTE_FLUSH, /* cpu tried to flush others' tlbs */
> > > > + NR_TLB_REMOTE_FLUSH_RECEIVED,/* cpu received ipi for flush */
> Right?

Yeah, it's useful on UP too. But I realized that my changes were
confined to the SMP code. The UP code is almost all in one of the
headers, and I didn't touch it. So I've got some work there to fix it up.
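[The UP paths in question are the static inlines and macro wrappers in
arch/x86/include/asm/tlbflush.h under !CONFIG_SMP - roughly:

	#define flush_tlb() __flush_tlb()
	#define flush_tlb_all() __flush_tlb_all()
	#define local_flush_tlb() __flush_tlb()

	static inline void flush_tlb_mm(struct mm_struct *mm)
	{
		if (mm == current->active_mm)
			__flush_tlb();
	}

so the local-flush count_vm_event() calls would have to be added there too
before UP kernels report anything.]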

2013-07-20 13:02:32

by Raghavendra K T

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

On 07/19/2013 08:50 PM, Dave Hansen wrote:
> On 07/19/2013 04:38 AM, Raghavendra KT wrote:
>> While measuring non-PLE performance, one of the bottlenecks I am seeing is
>> TLB flushes. perf had helped in analysing it a bit there, but this patch
>> would help with precise measurement. It will also help in tuning the PLE
>> window experiments (a larger PLE window would affect remote TLB flushes).
>
> Interesting. What workload is that? I've been having problems finding
> workloads that are heavily bottlenecked on TLB flushes.
>

Dave,
ebizzy is the one, and dbench to some small extent.

[root@codeblue ~]# cat /proc/vmstat |grep nr_tlb ;
/root/data/script/do_ebizzy.sh; cat /proc/vmstat |grep nr_tlb
nr_tlb_remote_flush 721
nr_tlb_remote_flush_received 923
nr_tlb_local_flush_all 13992
nr_tlb_local_flush_one 0
nr_tlb_local_flush_one_kernel 0
7482 records/s
real 120.00 s
user 86.69 s
sys 3746.57 s
nr_tlb_remote_flush 912896
nr_tlb_remote_flush_received 28261974
nr_tlb_local_flush_all 926272
nr_tlb_local_flush_one 0
nr_tlb_local_flush_one_kernel 0
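[For the record, the deltas over the 120 s run come to roughly 912,175
remote flushes initiated, 28,261,051 flush IPIs received (about 31 receivers
per initiated flush on average), and 912,280 local full flushes.]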

2013-07-22 10:06:12

by Ingo Molnar

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters


* Dave Hansen <[email protected]> wrote:

> On 07/19/2013 04:38 AM, Raghavendra KT wrote:
> > While measuring non-PLE performance, one of the bottlenecks I am seeing is
> > TLB flushes. perf had helped in analysing it a bit there, but this patch
> > would help with precise measurement. It will also help in tuning the PLE
> > window experiments (a larger PLE window would affect remote TLB flushes).
>
> Interesting. What workload is that? I've been having problems finding
> workloads that are heavily bottlenecked on TLB flushes.

Btw., would be nice to also integrate these VM counters into perf as well,
as an instrumentation variant/option.

It could be done in an almost zero overhead fashion using jump-labels I
think.

[ Just in case someone is bored to death and is looking for an interesting
side project ;-) ]

Thanks,

Ingo
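[A minimal sketch of the jump-label idea using the static_key API of this
era - the key and helper names here are hypothetical:

	#include <linux/jump_label.h>

	static struct static_key tlb_trace_enabled = STATIC_KEY_INIT_FALSE;

	static inline void count_tlb_event(enum vm_event_item item)
	{
		count_vm_event(item);		/* vmstat count always maintained */
		if (static_key_false(&tlb_trace_enabled))
			trace_tlb_event(item);	/* hypothetical perf/trace hook */
	}

static_key_false() compiles to a patched nop while the key is off, so the
extra instrumentation costs nothing until someone flips the key with
static_key_slow_inc().]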

2013-07-22 16:59:52

by Dave Hansen

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

On 07/22/2013 03:06 AM, Ingo Molnar wrote:
> Btw., would be nice to also integrate these VM counters into perf as well,
> as an instrumentation variant/option.
>
> It could be done in an almost zero overhead fashion using jump-labels I
> think.
>
> [ Just in case someone is bored to death and is looking for an interesting
> side project ;-) ]

I'd actually been thinking about making them into tracepoints, but the
tracepoint macros seem to create #include messes if you try to use them
in very common headers.

Agree it would be an interesting side project, though. :)

2013-07-23 08:17:33

by Ingo Molnar

Subject: Re: [RESEND][PATCH] mm: vmstats: tlb flush counters


* Dave Hansen <[email protected]> wrote:

> On 07/22/2013 03:06 AM, Ingo Molnar wrote:
> > Btw., would be nice to also integrate these VM counters into perf as well,
> > as an instrumentation variant/option.
> >
> > It could be done in an almost zero overhead fashion using jump-labels I
> > think.
> >
> > [ Just in case someone is bored to death and is looking for an interesting
> > side project ;-) ]
>
> I'd actually been thinking about making them into tracepoints, but the
> tracepoint macros seem to create #include messes if you try to use them
> in very common headers.
>
> Agree it would be an interesting side project, though. :)

Yes, tracepoints are what I was thinking about; they would allow easy
integration into perf [and they're useful even without any userspace side] -
as long as:

- the tracepoints trace the counts/sums, not just the events themselves
- when the tracepoints are not active the VM counts are still maintained
separately

I.e. the existing VM counts and their extraction facilities are not impacted
in any way; just a new channel of instrumentation is provided -
jump-label/static-key optimized by virtue of being tracepoints.

Thanks,

Ingo
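[A tracepoint along those lines might look like the usual TRACE_EVENT
boilerplate - a sketch with hypothetical event and field names:

	/* hypothetical include/trace/events/tlb.h */
	#undef TRACE_SYSTEM
	#define TRACE_SYSTEM tlb

	#include <linux/tracepoint.h>

	TRACE_EVENT(tlb_flush,

		TP_PROTO(int reason, unsigned long pages),
		TP_ARGS(reason, pages),

		TP_STRUCT__entry(
			__field(int, reason)
			__field(unsigned long, pages)
		),

		TP_fast_assign(
			__entry->reason = reason;
			__entry->pages  = pages;
		),

		TP_printk("reason:%d pages:%lu", __entry->reason, __entry->pages)
	);

The trace_tlb_flush() calls would sit next to the count_vm_event() calls, so
the vmstat counters keep working whether or not the tracepoint is enabled.]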