2014-10-20 22:41:58

by Peter Zijlstra

Subject: [RFC][PATCH 0/6] Another go at speculative page faults

Hi,

I figured I'd give my 2010 speculative fault series another spin:

https://lkml.org/lkml/2010/1/4/257

Since then I think many of the outstanding issues have changed sufficiently to
warrant another go. In particular Al Viro's delayed fput seems to have made it
entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
under the PTL.
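
The fast path then has roughly the following shape (a simplified sketch,
not the actual patches; find_vma_srcu() is the lookup helper from the
series, while 'vma_srcu' and the 'mmap_seq' seqcount are illustrative
names):

int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
                             unsigned int flags)
{
        struct vm_area_struct *vma;
        unsigned int seq;
        int ret = VM_FAULT_RETRY, idx;

        idx = srcu_read_lock(&vma_srcu);          /* pin VMAs instead of taking mmap_sem */

        seq = read_seqcount_begin(&mm->mmap_seq); /* snapshot the address space */
        vma = find_vma_srcu(mm, address);         /* lookup without mmap_sem */
        if (!vma || vma->vm_start > address)
                goto out;

        /*
         * ... allocate the page and construct the new PTE as usual ...
         *
         * Before installing it (under the PTL), re-check that nothing
         * changed the VMA layout in the meantime; if it did, give up
         * and retry the fault the traditional way, under mmap_sem.
         */
        if (read_seqcount_retry(&mm->mmap_seq, seq))
                goto out;

        ret = 0;        /* speculative fault succeeded */
out:
        srcu_read_unlock(&vma_srcu, idx);
        return ret;
}

The common case never touches mmap_sem; the cost is the retry path and
keeping VMA freeing safe, which is where call_srcu() and the delayed
fput() come in.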

The code needs way more attention but builds a kernel and runs the
micro-benchmark so I figured I'd post it before sinking more time into it.

I realize the micro-bench is about as good as it gets for this series and not
very realistic otherwise, but I think it does show the potential benefit the
approach has.

(patches go against 3.18-rc1+)

---

Using Kamezawa's multi-fault micro-bench from: https://lkml.org/lkml/2010/1/6/28
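
For reference, the benchmark is essentially the following shape (a
from-memory sketch, see the link above for the real thing; region sizes
and thread placement differ): each thread repeatedly touches a page and
then drops it with MADV_DONTNEED, so every iteration takes an anonymous
page fault, and perf counts those faults over the 60 second run.

#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION  (4UL << 20)             /* per-thread area, illustrative */

static void *worker(void *arg)
{
        char *area = arg;

        for (;;) {
                area[0] = 1;                            /* fault the page in */
                madvise(area, 4096, MADV_DONTNEED);     /* and throw it away */
        }
        return NULL;
}

int main(int argc, char **argv)
{
        int i, nthreads = argc > 1 ? atoi(argv[1]) : 1;
        char *map = mmap(NULL, nthreads * REGION, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        pthread_t tid;

        for (i = 0; i < nthreads; i++)
                pthread_create(&tid, NULL, worker, map + i * REGION);

        sleep(60);      /* perf stat measures the whole run */
        return 0;
}

With 20 or 60 such threads all faulting against the same mm, mmap_sem is
what they end up contending on.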

My Ivy Bridge EP (2*10*2: sockets * cores * SMT threads) has a ~58% improvement in page-fault throughput:

PRE:

root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20

Performance counter stats for './multi-fault 20' (5 runs):

149,441,555 page-faults ( +- 1.25% )
2,153,651,828 cache-misses ( +- 1.09% )

60.003082014 seconds time elapsed ( +- 0.00% )

POST:

root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20

Performance counter stats for './multi-fault 20' (5 runs):

236,442,626 page-faults ( +- 0.08% )
2,796,353,939 cache-misses ( +- 1.01% )

60.002792431 seconds time elapsed ( +- 0.00% )


My Ivy Bridge EX (4*15*2: sockets * cores * SMT threads) has a ~78% improvement in page-fault throughput:

PRE:

root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60

Performance counter stats for './multi-fault 60' (5 runs):

105,789,078 page-faults ( +- 2.24% )
1,314,072,090 cache-misses ( +- 1.17% )

60.009243533 seconds time elapsed ( +- 0.00% )

POST:

root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60

Performance counter stats for './multi-fault 60' (5 runs):

187,751,767 page-faults ( +- 2.24% )
1,792,758,664 cache-misses ( +- 2.30% )

60.011611579 seconds time elapsed ( +- 0.00% )

(I've not yet looked at why the EX sucks chunks compared to the EP box, I
suspect we contend on other locks, but it could be anything.)

---

arch/x86/mm/fault.c | 35 ++-
include/linux/mm.h | 19 +-
include/linux/mm_types.h | 5 +
kernel/fork.c | 1 +
mm/init-mm.c | 1 +
mm/internal.h | 18 ++
mm/memory.c | 672 ++++++++++++++++++++++++++++-------------------
mm/mmap.c | 101 +++++--
8 files changed, 544 insertions(+), 308 deletions(-)


2014-10-21 00:07:08

by Andy Lutomirski

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On 10/20/2014 02:56 PM, Peter Zijlstra wrote:
> Hi,
>
> I figured I'd give my 2010 speculative fault series another spin:
>
> https://lkml.org/lkml/2010/1/4/257
>
> Since then I think many of the outstanding issues have changed sufficiently to
> warrant another go. In particular Al Viro's delayed fput seems to have made it
> entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> under the PTL.
>
> The code needs way more attention but builds a kernel and runs the
> micro-benchmark so I figured I'd post it before sinking more time into it.
>
> I realize the micro-bench is about as good as it gets for this series and not
> very realistic otherwise, but I think it does show the potential benefit the
> approach has.

Does this mean that an entire fault can complete without ever taking
mmap_sem at all? If so, that's a *huge* win.

I'm a bit concerned about drivers that assume that the vma is unchanged
during .fault processing. In particular, is there a race between .close
and .fault? Would it make sense to add a per-vma rw lock and hold it
during vma modification and .fault calls?

--Andy

>
> (patches go against .18-rc1+)
>
> ---
>
> Using Kamezawa's multi-fault micro-bench from: https://lkml.org/lkml/2010/1/6/28
>
> My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
>
> PRE:
>
> root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20
>
> Performance counter stats for './multi-fault 20' (5 runs):
>
> 149,441,555 page-faults ( +- 1.25% )
> 2,153,651,828 cache-misses ( +- 1.09% )
>
> 60.003082014 seconds time elapsed ( +- 0.00% )
>
> POST:
>
> root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20
>
> Performance counter stats for './multi-fault 20' (5 runs):
>
> 236,442,626 page-faults ( +- 0.08% )
> 2,796,353,939 cache-misses ( +- 1.01% )
>
> 60.002792431 seconds time elapsed ( +- 0.00% )
>
>
> My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
>
> PRE:
>
> root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60
>
> Performance counter stats for './multi-fault 60' (5 runs):
>
> 105,789,078 page-faults ( +- 2.24% )
> 1,314,072,090 cache-misses ( +- 1.17% )
>
> 60.009243533 seconds time elapsed ( +- 0.00% )
>
> POST:
>
> root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60
>
> Performance counter stats for './multi-fault 60' (5 runs):
>
> 187,751,767 page-faults ( +- 2.24% )
> 1,792,758,664 cache-misses ( +- 2.30% )
>
> 60.011611579 seconds time elapsed ( +- 0.00% )
>
> (I've not yet looked at why the EX sucks chunks compared to the EP box, I
> suspect we contend on other locks, but it could be anything.)
>
> ---
>
> arch/x86/mm/fault.c | 35 ++-
> include/linux/mm.h | 19 +-
> include/linux/mm_types.h | 5 +
> kernel/fork.c | 1 +
> mm/init-mm.c | 1 +
> mm/internal.h | 18 ++
> mm/memory.c | 672 ++++++++++++++++++++++++++++-------------------
> mm/mmap.c | 101 +++++--
> 8 files changed, 544 insertions(+), 308 deletions(-)
>

2014-10-21 08:12:14

by Peter Zijlstra

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Mon, Oct 20, 2014 at 05:07:02PM -0700, Andy Lutomirski wrote:
> On 10/20/2014 02:56 PM, Peter Zijlstra wrote:
> > Hi,
> >
> > I figured I'd give my 2010 speculative fault series another spin:
> >
> > https://lkml.org/lkml/2010/1/4/257
> >
> > Since then I think many of the outstanding issues have changed sufficiently to
> > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > under the PTL.
> >
> > The code needs way more attention but builds a kernel and runs the
> > micro-benchmark so I figured I'd post it before sinking more time into it.
> >
> > I realize the micro-bench is about as good as it gets for this series and not
> > very realistic otherwise, but I think it does show the potential benefit the
> > approach has.
>
> Does this mean that an entire fault can complete without ever taking
> mmap_sem at all? If so, that's a *huge* win.

Yep.

> I'm a bit concerned about drivers that assume that the vma is unchanged
> during .fault processing. In particular, is there a race between .close
> and .fault? Would it make sense to add a per-vma rw lock and hold it
> during vma modification and .fault calls?

VMA-granularity contention would be about as bad as mmap_sem for many
workloads. But yes, that is one of the things we need to look at; I was
_hoping_ that holding the file open would sort out most of these problems,
but I'm sure there's plenty of 'interesting' cruft left.
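
To make that concrete, the suggestion amounts to something like the
following (purely hypothetical; no such vm_lock field exists in this
series), with the munmap/mprotect/->close paths taking the same lock for
writing:

/* assume vm_area_struct grew a 'struct rw_semaphore vm_lock' */
static int vma_locked_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        int ret;

        down_read(&vma->vm_lock);               /* excludes concurrent ->close() etc. */
        ret = vma->vm_ops->fault(vma, vmf);
        up_read(&vma->vm_lock);
        return ret;
}

For a benchmark like multi-fault, where all threads hammer the same VMA,
every faulter then bounces that rwsem's cache line, which is the "about
as bad as mmap_sem" point above.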

2014-10-21 16:23:50

by Ingo Molnar

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults


* Peter Zijlstra <[email protected]> wrote:

> My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
>
> PRE:
> 149,441,555 page-faults ( +- 1.25% )
>
> POST:
> 236,442,626 page-faults ( +- 0.08% )

> My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
>
> PRE:
> 105,789,078 page-faults ( +- 2.24% )
>
> POST:
> 187,751,767 page-faults ( +- 2.24% )

I guess the 'PRE' and 'POST' numbers should be flipped around?

Thanks,

Ingo

2014-10-21 17:15:19

by Kirill A. Shutemov

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Tue, Oct 21, 2014 at 06:23:40PM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> >
> > PRE:
> > 149,441,555 page-faults ( +- 1.25% )
> >
> > POST:
> > 236,442,626 page-faults ( +- 0.08% )
>
> > My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> >
> > PRE:
> > 105,789,078 page-faults ( +- 2.24% )
> >
> > POST:
> > 187,751,767 page-faults ( +- 2.24% )
>
> I guess the 'PRE' and 'POST' numbers should be flipped around?

I think it's faults per second.

It would be interesting to see if the patchset affects the non-contended
case, like a single-threaded workload.

--
Kirill A. Shutemov

2014-10-21 17:25:36

by Peter Zijlstra

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Tue, Oct 21, 2014 at 06:23:40PM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> >
> > PRE:
> > 149,441,555 page-faults ( +- 1.25% )
> >
> > POST:
> > 236,442,626 page-faults ( +- 0.08% )
>
> > My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> >
> > PRE:
> > 105,789,078 page-faults ( +- 2.24% )
> >
> > POST:
> > 187,751,767 page-faults ( +- 2.24% )
>
> I guess the 'PRE' and 'POST' numbers should be flipped around?

Nope, it's the number of page faults serviced in a fixed amount of time
(60 seconds), therefore higher is better (e.g. 236,442,626 / 149,441,555
~= 1.58, hence the ~58% on the EP box).

2014-10-21 17:56:13

by Peter Zijlstra

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> It would be interesting to see if the patchset affects non-condended case.
> Like a one-threaded workload.

It does, and not in a good way, I'll have to look at that... :/

Performance counter stats for './multi-fault 1' (5 runs):

73,860,251 page-faults ( +- 0.28% )
40,914 cache-misses ( +- 41.26% )

60.001484913 seconds time elapsed ( +- 0.00% )


Performance counter stats for './multi-fault 1' (5 runs):

70,700,838 page-faults ( +- 0.03% )
31,466 cache-misses ( +- 8.62% )

60.001753906 seconds time elapsed ( +- 0.00% )

2014-10-22 07:35:15

by Davidlohr Bueso

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> Hi,
>
> I figured I'd give my 2010 speculative fault series another spin:
>
> https://lkml.org/lkml/2010/1/4/257
>
> Since then I think many of the outstanding issues have changed sufficiently to
> warrant another go. In particular Al Viro's delayed fput seems to have made it
> entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> under the PTL.
>
> The code needs way more attention but builds a kernel and runs the
> micro-benchmark so I figured I'd post it before sinking more time into it.
>
> I realize the micro-bench is about as good as it gets for this series and not
> very realistic otherwise, but I think it does show the potential benefit the
> approach has.
>
> (patches go against .18-rc1+)

I think patch 2/6 is borken:

error: patch failed: mm/memory.c:2025
error: mm/memory.c: patch does not apply

and related, as you mention, I would very much welcome having the
introduction of 'struct fault_env' as a separate cleanup patch. May I
suggest renaming it to fault_cxt?

Thanks,
Davidlohr

2014-10-22 11:30:04

by Kirill A. Shutemov

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Wed, Oct 22, 2014 at 12:34:49AM -0700, Davidlohr Bueso wrote:
> On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> > Hi,
> >
> > I figured I'd give my 2010 speculative fault series another spin:
> >
> > https://lkml.org/lkml/2010/1/4/257
> >
> > Since then I think many of the outstanding issues have changed sufficiently to
> > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > under the PTL.
> >
> > The code needs way more attention but builds a kernel and runs the
> > micro-benchmark so I figured I'd post it before sinking more time into it.
> >
> > I realize the micro-bench is about as good as it gets for this series and not
> > very realistic otherwise, but I think it does show the potential benefit the
> > approach has.
> >
> > (patches go against .18-rc1+)
>
> I think patch 2/6 is borken:
>
> error: patch failed: mm/memory.c:2025
> error: mm/memory.c: patch does not apply
>
> and related, as you mention, I would very much welcome having the
> introduction of 'struct faut_env' as a separate cleanup patch. May I
> suggest renaming it to fault_cxt?

What about instead extending 'struct vm_fault' and starting to use it earlier, allocated on the stack?

--
Kirill A. Shutemov

2014-10-22 11:46:14

by Peter Zijlstra

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Wed, Oct 22, 2014 at 02:29:25PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 22, 2014 at 12:34:49AM -0700, Davidlohr Bueso wrote:
> > On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> > > Hi,
> > >
> > > I figured I'd give my 2010 speculative fault series another spin:
> > >
> > > https://lkml.org/lkml/2010/1/4/257
> > >
> > > Since then I think many of the outstanding issues have changed sufficiently to
> > > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > > under the PTL.
> > >
> > > The code needs way more attention but builds a kernel and runs the
> > > micro-benchmark so I figured I'd post it before sinking more time into it.
> > >
> > > I realize the micro-bench is about as good as it gets for this series and not
> > > very realistic otherwise, but I think it does show the potential benefit the
> > > approach has.
> > >
> > > (patches go against .18-rc1+)
> >
> > I think patch 2/6 is borken:
> >
> > error: patch failed: mm/memory.c:2025
> > error: mm/memory.c: patch does not apply
> >
> > and related, as you mention, I would very much welcome having the
> > introduction of 'struct faut_env' as a separate cleanup patch. May I
> > suggest renaming it to fault_cxt?
>
> What about extend start using 'struct vm_fault' earlier by stack?

I'm not sure we should mix the environment for vm_ops::fault, which
acquires the page, and the fault path, which deals with changing the
PTE. Ideally we should not expose the page-table information to file
ops; it's a layering violation if nothing else, and drivers should not
have access to the page tables.
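
The split being argued for here looks roughly like this (hypothetical
sketch of the intended layering, not code from the series; field lists
are abbreviated):

struct vm_fault {                       /* what vm_ops::fault gets to see */
        unsigned int flags;
        pgoff_t pgoff;
        void __user *virtual_address;
        struct page *page;              /* filled in by the driver/fs */
};

struct fault_env {                      /* MM-internal, never handed to vm_ops */
        struct vm_area_struct *vma;
        unsigned long address;
        pmd_t *pmd;
        pte_t *pte;
        spinlock_t *ptl;
        unsigned int sequence;          /* for the speculative re-check */
};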

2014-10-22 11:56:21

by Kirill A. Shutemov

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Wed, Oct 22, 2014 at 01:45:58PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 22, 2014 at 02:29:25PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 22, 2014 at 12:34:49AM -0700, Davidlohr Bueso wrote:
> > > On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> > > > Hi,
> > > >
> > > > I figured I'd give my 2010 speculative fault series another spin:
> > > >
> > > > https://lkml.org/lkml/2010/1/4/257
> > > >
> > > > Since then I think many of the outstanding issues have changed sufficiently to
> > > > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > > > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > > > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > > > under the PTL.
> > > >
> > > > The code needs way more attention but builds a kernel and runs the
> > > > micro-benchmark so I figured I'd post it before sinking more time into it.
> > > >
> > > > I realize the micro-bench is about as good as it gets for this series and not
> > > > very realistic otherwise, but I think it does show the potential benefit the
> > > > approach has.
> > > >
> > > > (patches go against .18-rc1+)
> > >
> > > I think patch 2/6 is borken:
> > >
> > > error: patch failed: mm/memory.c:2025
> > > error: mm/memory.c: patch does not apply
> > >
> > > and related, as you mention, I would very much welcome having the
> > > introduction of 'struct faut_env' as a separate cleanup patch. May I
> > > suggest renaming it to fault_cxt?
> >
> > What about extend start using 'struct vm_fault' earlier by stack?
>
> I'm not sure we should mix the environment for vm_ops::fault, which
> acquires the page, and the fault path, which deals with changing the
> PTE. Ideally we should not expose the page-table information to file
> ops, its a layering violating if nothing else, drivers should not have
> access to the page tables.

We already have this for ->map_pages() :-P
I have asked whether it's considered a layering violation and it seems
nobody cares...
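
For reference, the ->map_pages() interface being referred to looks
roughly like this as of 3.15+ (paraphrased from memory, worth checking
against the tree):

struct vm_fault {
        /* ... */
        pgoff_t max_pgoff;      /* map up to this offset (for map_pages) */
        pte_t *pte;             /* page-table entry, visible to the fs */
};

struct vm_operations_struct {
        /* ... */
        void (*map_pages)(struct vm_area_struct *vma, struct vm_fault *vmf);
};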

--
Kirill A. Shutemov

2014-10-22 12:35:30

by Ingo Molnar

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults


* Peter Zijlstra <[email protected]> wrote:

> On Tue, Oct 21, 2014 at 06:23:40PM +0200, Ingo Molnar wrote:
> >
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> > >
> > > PRE:
> > > 149,441,555 page-faults ( +- 1.25% )
> > >
> > > POST:
> > > 236,442,626 page-faults ( +- 0.08% )
> >
> > > My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> > >
> > > PRE:
> > > 105,789,078 page-faults ( +- 2.24% )
> > >
> > > POST:
> > > 187,751,767 page-faults ( +- 2.24% )
> >
> > I guess the 'PRE' and 'POST' numbers should be flipped around?
>
> Nope, its the number of page-faults serviced in a fixed amount of time
> (60 seconds), therefore higher is better.

Ah, okay!

Thanks,

Ingo

2014-10-23 10:37:01

by Lai Jiangshan

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
>> It would be interesting to see if the patchset affects non-condended case.
>> Like a one-threaded workload.
>
> It does, and not in a good way, I'll have to look at that... :/

Maybe find_vma_srcu() is to blame: it doesn't take advantage of
vmacache_find() and so causes more cache misses.

Is it hard to use the vmacache in find_vma_srcu()?
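
The obvious shape would be a vmacache fast path in front of the
SRCU-protected walk, roughly as below (illustrative only;
__find_vma_srcu() is a made-up name for the slow path, and whether the
per-thread cache's validity rules still hold without mmap_sem, now that
VMAs are freed via SRCU, is exactly what would need auditing):

struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
{
        struct vm_area_struct *vma;

        vma = vmacache_find(mm, addr);          /* per-thread cache, no locks */
        if (vma)
                return vma;

        vma = __find_vma_srcu(mm, addr);        /* the slow RB-tree walk */
        if (vma)
                vmacache_update(addr, vma);
        return vma;
}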

>
> Performance counter stats for './multi-fault 1' (5 runs):
>
> 73,860,251 page-faults ( +- 0.28% )
> 40,914 cache-misses ( +- 41.26% )
>
> 60.001484913 seconds time elapsed ( +- 0.00% )
>
>
> Performance counter stats for './multi-fault 1' (5 runs):
>
> 70,700,838 page-faults ( +- 0.03% )
> 31,466 cache-misses ( +- 8.62% )
>
> 60.001753906 seconds time elapsed ( +- 0.00% )
> .
>

2014-10-23 11:04:52

by Peter Zijlstra

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Thu, Oct 23, 2014 at 06:40:05PM +0800, Lai Jiangshan wrote:
> On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> > On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> >> It would be interesting to see if the patchset affects non-condended case.
> >> Like a one-threaded workload.
> >
> > It does, and not in a good way, I'll have to look at that... :/
>
> Maybe it is blamed to find_vma_srcu() that it doesn't take the advantage of
> the vmacache_find() and cause more cache-misses.

It's what I thought initially; I tried doing perf record with and
without, but then I ran into perf diff not quite working for me, and I've
yet to find time to kick that thing into shape.

> Is it hard to use the vmacache in the find_vma_srcu()?

I've not had time to look at it.

2014-10-24 07:54:29

by Ingo Molnar

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults


* Peter Zijlstra <[email protected]> wrote:

> On Thu, Oct 23, 2014 at 06:40:05PM +0800, Lai Jiangshan wrote:
> > On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> > > On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> > >> It would be interesting to see if the patchset affects non-condended case.
> > >> Like a one-threaded workload.
> > >
> > > It does, and not in a good way, I'll have to look at that... :/
> >
> > Maybe it is blamed to find_vma_srcu() that it doesn't take the advantage of
> > the vmacache_find() and cause more cache-misses.
>
> Its what I thought initially, I tried doing perf record with and
> without, but then I ran into perf diff not quite working for me and I've
> yet to find time to kick that thing into shape.

Might be the 'perf diff' regression fixed by this:

9ab1f50876db perf diff: Add missing hists__init() call at tool start

I just pushed it out into tip:master.

Thanks,

Ingo

2014-10-24 13:14:58

by Peter Zijlstra

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

On Fri, Oct 24, 2014 at 09:54:23AM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Thu, Oct 23, 2014 at 06:40:05PM +0800, Lai Jiangshan wrote:
> > > On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> > > > On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> > > >> It would be interesting to see if the patchset affects non-condended case.
> > > >> Like a one-threaded workload.
> > > >
> > > > It does, and not in a good way, I'll have to look at that... :/
> > >
> > > Maybe it is blamed to find_vma_srcu() that it doesn't take the advantage of
> > > the vmacache_find() and cause more cache-misses.
> >
> > Its what I thought initially, I tried doing perf record with and
> > without, but then I ran into perf diff not quite working for me and I've
> > yet to find time to kick that thing into shape.
>
> Might be the 'perf diff' regression fixed by this:
>
> 9ab1f50876db perf diff: Add missing hists__init() call at tool start
>
> I just pushed it out into tip:master.

I was on tip/master, so it's unlikely to be that, as I likely already
had it.

perf-report was affected too, for some reason my CONFIG_DEBUG_INFO=y
vmlinux wasn't showing symbols (and I double checked that KASLR crap was
disabled, so that wasn't confusing stuff either).

When I forced perf-report to use kallsyms it works, however perf-diff
doesn't have that option.

So there are two issues there: 1) perf-report failing to generate useful
output, and 2) perf-diff lacking options to force it to behave.

2014-10-28 05:32:23

by Namhyung Kim

Subject: Re: [RFC][PATCH 0/6] Another go at speculative page faults

Hi Peter,

On Fri, 24 Oct 2014 15:14:40 +0200, Peter Zijlstra wrote:
> On Fri, Oct 24, 2014 at 09:54:23AM +0200, Ingo Molnar wrote:
>>
>> * Peter Zijlstra <[email protected]> wrote:
>> > Its what I thought initially, I tried doing perf record with and
>> > without, but then I ran into perf diff not quite working for me and I've
>> > yet to find time to kick that thing into shape.
>>
>> Might be the 'perf diff' regression fixed by this:
>>
>> 9ab1f50876db perf diff: Add missing hists__init() call at tool start
>>
>> I just pushed it out into tip:master.
>
> I was on tip/master, so unlikely to be that as I was likely already
> having it.
>
> perf-report was affected too, for some reason my CONFIG_DEBUG_INFO=y
> vmlinux wasn't showing symbols (and I double checked that KASLR crap was
> disabled, so that wasn't confusing stuff either).
>
> When I forced perf-report to use kallsyms it works, however perf-diff
> doesn't have that option.
>
> So there's two issues there, 1) perf-report failing to generate useful
> output and 2) per-diff lacking options to force it to behave.

Did perf-report fail to show any (kernel) symbols at all, or did it show
the wrong symbols? Maybe it's related to this:

https://lkml.org/lkml/2014/9/22/78

Thanks,
Namhyung