2019-10-16 12:53:20

by Jiri Olsa

Subject: [PATCHv2 0/2] perf tools: Share struct map after clone

hi,
Andi reported that cloning maps eats a lot of memory, and that it's
probably unnecessary, because the clones keep the same data.

This 'maps sharing' seems to save a lot of heap for reports with
many forks/cloned mmaps (over 60% in the example below).
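The general idea is to give a clone a reference to the existing data instead of a fresh copy. It can be sketched in a few lines of C. This is a minimal, hypothetical illustration: the struct and the map_clone_copy()/map_get()/map_put() helpers are made up for this sketch and are not the code from either patch. It also ignores the per-instance parts, such as the rb_tree linkage, that the real patches have to split out first.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for 'struct map': only the
 * data that stays identical between a parent's map and its clone. */
struct shared_map {
	unsigned long long start, end, pgoff;
	int refcnt;	/* plain int: this sketch is single-threaded */
};

/* Copying clone: one new allocation per cloned map. */
struct shared_map *map_clone_copy(const struct shared_map *m)
{
	struct shared_map *c = malloc(sizeof(*c));

	if (c) {
		memcpy(c, m, sizeof(*c));
		c->refcnt = 1;
	}
	return c;
}

/* Sharing clone: no allocation, just another reference. */
struct shared_map *map_get(struct shared_map *m)
{
	if (m)
		m->refcnt++;
	return m;
}

/* Drop a reference; the last holder frees the object. */
void map_put(struct shared_map *m)
{
	if (m && --m->refcnt == 0)
		free(m);
}
```

A copying clone costs one allocation per cloned map, while a sharing clone only bumps a counter. The object is freed when the last holder drops it. Real code needs an atomic counter (the pahole output later in this thread shows 'struct map' using refcount_t) if maps can be reached from several threads.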

Profile a kernel build:

$ perf record make -j 40

Get a heap profile (from the tools/perf directory):

$ <install gperftools>
$ make TCMALLOC=1
$ HEAPPROFILE=/tmp/heapprof ./perf report -i perf.data --stdio > out
$ pprof ./perf /tmp/heapprof.000*

Before:

(pprof) top
Total: 2335.5 MB
  1735.1  74.3%  74.3%   1735.1  74.3% memdup
   402.0  17.2%  91.5%    402.0  17.2% zalloc
   140.2   6.0%  97.5%    145.8   6.2% map__new
    33.6   1.4%  98.9%     33.6   1.4% symbol__new
    12.4   0.5%  99.5%     12.4   0.5% alloc_event
     6.2   0.3%  99.7%      6.2   0.3% nsinfo__new
     5.5   0.2% 100.0%      5.5   0.2% nsinfo__copy
     0.3   0.0% 100.0%      0.3   0.0% dso__new
     0.1   0.0% 100.0%      0.1   0.0% do_read_string
     0.0   0.0% 100.0%      0.0   0.0% __GI__IO_file_doallocate

After:

(pprof) top
Total: 784.5 MB
   385.8  49.2%  49.2%    385.8  49.2% memdup
   285.8  36.4%  85.6%    285.8  36.4% zalloc
    80.4  10.3%  95.9%     83.7  10.7% map__new
    19.1   2.4%  98.3%     19.1   2.4% symbol__new
     6.2   0.8%  99.1%      6.2   0.8% alloc_event
     3.6   0.5%  99.6%      3.6   0.5% nsinfo__new
     3.2   0.4% 100.0%      3.2   0.4% nsinfo__copy
     0.2   0.0% 100.0%      0.2   0.0% dso__new
     0.0   0.0% 100.0%      0.0   0.0% do_read_string
     0.0   0.0% 100.0%      0.0   0.0% elf_fill
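The "over 60%" figure can be checked against the two pprof totals. The helper below is a hypothetical convenience and is not part of perf. Its numbers are copied from the first (flat MB) column of the "Before" and "After" output above.

```c
#include <stdio.h>

/* Flat MB per allocator, copied from the "Before"/"After" pprof output. */
struct heap_row {
	const char *func;
	double before_mb, after_mb;
};

static const struct heap_row rows[] = {
	{ "total",    2335.5, 784.5 },
	{ "memdup",   1735.1, 385.8 },
	{ "zalloc",    402.0, 285.8 },
	{ "map__new",  140.2,  80.4 },
};

/* Share of the "before" allocation that is gone in the "after" run. */
double saved_pct(double before_mb, double after_mb)
{
	return 100.0 * (before_mb - after_mb) / before_mb;
}

/* Print one line per row: before -> after and the relative drop. */
void print_savings(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-9s %7.1f -> %6.1f MB  (-%.1f%%)\n", rows[i].func,
		       rows[i].before_mb, rows[i].after_mb,
		       saved_pct(rows[i].before_mb, rows[i].after_mb));
}
```

For the totals this gives about 66.4% (2335.5 MB down to 784.5 MB). memdup alone drops by about 77.8%, zalloc by about 28.9% and map__new by about 42.7%.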

v2 changes:
- rebased to Arnaldo's perf/core
- patch 1 already taken

Also available in here:
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
perf/map_shared

thanks,
jirka


---
Jiri Olsa (2):
perf tools: Separate shareable part of 'struct map' into 'struct map_shared'
perf tools: Make 'struct map_shared' truly shared

tools/perf/arch/arm/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/arm64/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/powerpc/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/powerpc/util/skip-callchain-idx.c | 4 +--
tools/perf/arch/powerpc/util/sym-handling.c | 4 +--
tools/perf/arch/s390/annotate/instructions.c | 2 +-
tools/perf/arch/x86/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/x86/util/event.c | 6 ++--
tools/perf/builtin-annotate.c | 8 +++---
tools/perf/builtin-inject.c | 10 ++++---
tools/perf/builtin-kallsyms.c | 7 +++--
tools/perf/builtin-kmem.c | 2 +-
tools/perf/builtin-mem.c | 6 ++--
tools/perf/builtin-report.c | 19 +++++++------
tools/perf/builtin-script.c | 16 +++++------
tools/perf/builtin-top.c | 15 +++++-----
tools/perf/builtin-trace.c | 2 +-
tools/perf/tests/code-reading.c | 34 ++++++++++++-----------
tools/perf/tests/hists_common.c | 4 +--
tools/perf/tests/hists_cumulate.c | 4 +--
tools/perf/tests/hists_filter.c | 4 +--
tools/perf/tests/hists_output.c | 2 +-
tools/perf/tests/map_groups.c | 22 +++++++--------
tools/perf/tests/mmap-thread-lookup.c | 2 +-
tools/perf/tests/vmlinux-kallsyms.c | 36 ++++++++++++------------
tools/perf/ui/browsers/annotate.c | 4 +--
tools/perf/ui/browsers/hists.c | 10 +++----
tools/perf/ui/browsers/map.c | 4 +--
tools/perf/ui/gtk/annotate.c | 2 +-
tools/perf/util/annotate.c | 34 +++++++++++------------
tools/perf/util/auxtrace.c | 2 +-
tools/perf/util/bpf-event.c | 8 +++---
tools/perf/util/build-id.c | 2 +-
tools/perf/util/callchain.c | 6 ++--
tools/perf/util/db-export.c | 4 +--
tools/perf/util/dso.c | 2 +-
tools/perf/util/event.c | 6 ++--
tools/perf/util/evsel_fprintf.c | 2 +-
tools/perf/util/hist.c | 10 +++----
tools/perf/util/intel-pt.c | 42 ++++++++++++++++------------
tools/perf/util/machine.c | 76 +++++++++++++++++++++++++-------------------------
tools/perf/util/map.c | 226 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------
tools/perf/util/map.h | 32 +++++++++++++--------
tools/perf/util/probe-event.c | 32 ++++++++++-----------
tools/perf/util/scripting-engines/trace-event-perl.c | 8 +++---
tools/perf/util/scripting-engines/trace-event-python.c | 12 ++++----
tools/perf/util/sort.c | 58 ++++++++++++++++++++------------------
tools/perf/util/symbol-elf.c | 28 +++++++++----------
tools/perf/util/symbol.c | 80 ++++++++++++++++++++++++++---------------------------
tools/perf/util/symbol_fprintf.c | 2 +-
tools/perf/util/synthetic-events.c | 14 +++++-----
tools/perf/util/thread.c | 14 +++++-----
tools/perf/util/unwind-libdw.c | 16 +++++++----
tools/perf/util/unwind-libunwind-local.c | 37 +++++++++++++------------
tools/perf/util/unwind-libunwind.c | 4 +--
tools/perf/util/vdso.c | 2 +-
56 files changed, 540 insertions(+), 456 deletions(-)


2019-10-23 12:33:06

by Jiri Olsa

Subject: Re: [PATCHv2 0/2] perf tools: Share struct map after clone

On Wed, Oct 16, 2019 at 10:22:24AM +0200, Jiri Olsa wrote:
> Also available in here:
> git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> perf/map_shared

I rebased to latest perf/core and pushed the branch out

jirka

2019-10-29 21:00:29

by Jiri Olsa

Subject: Re: [PATCHv2 0/2] perf tools: Share struct map after clone

On Wed, Oct 23, 2019 at 09:55:17AM +0200, Jiri Olsa wrote:
> On Wed, Oct 16, 2019 at 10:22:24AM +0200, Jiri Olsa wrote:
> > Also available in here:
> > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > perf/map_shared
>
> I rebased to latest perf/core and pushed the branch out

rebased and pushed out

jirka

2019-11-18 12:16:05

by Jiri Olsa

Subject: Re: [PATCHv2 0/2] perf tools: Share struct map after clone

On Tue, Oct 29, 2019 at 09:58:55PM +0100, Jiri Olsa wrote:
> On Wed, Oct 23, 2019 at 09:55:17AM +0200, Jiri Olsa wrote:
> > On Wed, Oct 16, 2019 at 10:22:24AM +0200, Jiri Olsa wrote:
> > > Also available in here:
> > > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > > perf/map_shared
> >
> > I rebased to latest perf/core and pushed the branch out
>
> rebased and pushed out

heya,
I lost track of this.. what's the status, are you going with your
version, or is this one still in? I don't see any of them in latest
code..

thanks,
jirka

2019-11-18 21:50:33

by Arnaldo Carvalho de Melo

Subject: Re: [PATCHv2 0/2] perf tools: Share struct map after clone

Em Mon, Nov 18, 2019 at 01:14:00PM +0100, Jiri Olsa escreveu:
> On Tue, Oct 29, 2019 at 09:58:55PM +0100, Jiri Olsa wrote:
> > > >
> > > > Also available in here:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > > > perf/map_shared

> > > I rebased to latest perf/core and pushed the branch out

> > rebased and pushed out

> heya,
> I lost track of this.. what's the status, are you going with your
> version, or is this one still in? I don't see any of them in latest
> code..

So, I'm still working on this on and off; the current status is at:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/map_share

It's just one patch on top of perf/core, the one that does the sharing.

The thing is, as I go over all the fields in 'struct map', it seems
that we'll end up with just one cacheline per instance, as there are
things there that are not strictly related to a map but to a map_group
(unmap_ip/map_ip) or to a dso (maj, min, ino, ino_generation), and some
fields need less space than is allocated to them.
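The map_ip/unmap_ip point can be made concrete. In 'struct map' they are two function pointers, 8 bytes each on a 64-bit build, but the translation they perform depends only on what kind of map it is. The sketch below is hypothetical: slim_map, enum map_kind and the helper names are made up for illustration. It stores a one-byte kind instead and applies the usual start/pgoff arithmetic, or no translation at all for an identity map.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical: instead of two 8-byte function pointers per map,
 * keep a 1-byte kind that selects a translation shared by all maps. */
enum map_kind { MAP_KIND_REGULAR, MAP_KIND_IDENTITY };

struct slim_map {
	u64 start, end, pgoff;
	unsigned char kind;	/* holds an enum map_kind value */
};

/* Address in the process -> address relative to the mapped object. */
u64 slim_map__map_ip(const struct slim_map *m, u64 ip)
{
	if (m->kind == MAP_KIND_IDENTITY)
		return ip;
	return ip - m->start + m->pgoff;
}

/* Inverse translation: object-relative address -> process address. */
u64 slim_map__unmap_ip(const struct slim_map *m, u64 rip)
{
	if (m->kind == MAP_KIND_IDENTITY)
		return rip;
	return rip + m->start - m->pgoff;
}
```

Two pointers (16 bytes) shrink to one byte. Depending on where the field lands, alignment padding may absorb part of that saving.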

Current status is:

[root@quaco ~]# pahole -C map ~acme/bin/perf
struct map {
	union {
		struct rb_node     rb_node __attribute__((__aligned__(8))); /*     0    24 */
		struct list_head   node;                /*     0    16 */
	} __attribute__((__aligned__(8)));              /*     0    24 */
	u64                        start;               /*    24     8 */
	u64                        end;                 /*    32     8 */
	_Bool                      erange_warned:1;     /*    40: 0  1 */
	_Bool                      priv:1;              /*    40: 1  1 */

	/* XXX 6 bits hole, try to pack */
	/* XXX 3 bytes hole, try to pack */

	u32                        prot;                /*    44     4 */
	u64                        pgoff;               /*    48     8 */
	u64                        reloc;               /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	u64                        (*map_ip)(struct map *, u64); /*    64     8 */
	u64                        (*unmap_ip)(struct map *, u64); /*    72     8 */
	struct dso *               dso;                 /*    80     8 */
	refcount_t                 refcnt;              /*    88     4 */
	u32                        flags;               /*    92     4 */

	/* size: 96, cachelines: 2, members: 13 */
	/* sum members: 92, holes: 1, sum holes: 3 */
	/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
	/* forced alignments: 1 */
	/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
[root@quaco ~]#

This is with the tentative move of maj/min/ino/ino_generation to 'struct
dso', but that needs more work to match the "dcacheline" sort order, which
touches those fields, i.e. a map that comes with the same backing DSO but
different values for those fields is not the same DSO, right?
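Wherever those fields end up living, a sort key that cares about file identity has to compare all of them. Below is a minimal sketch of such a three-way comparison. It assumes a hypothetical struct dso_ident holding the four fields, and it is not perf's sort code.

```c
#include <stdint.h>

/* Hypothetical grouping of the identity fields that the tentative
 * change moves from 'struct map' into 'struct dso'. */
struct dso_ident {
	uint32_t maj, min;
	uint64_t ino, ino_generation;
};

/* -1, 0 or 1, like a qsort()/rb_tree comparator. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return a < b ? -1 : a > b;
}

/* Two mappings of the "same" file are only the same DSO if every
 * identity field matches; otherwise order them field by field. */
int dso_ident__cmp(const struct dso_ident *a, const struct dso_ident *b)
{
	int ret = cmp_u64(a->maj, b->maj);

	if (!ret)
		ret = cmp_u64(a->min, b->min);
	if (!ret)
		ret = cmp_u64(a->ino, b->ino);
	if (!ret)
		ret = cmp_u64(a->ino_generation, b->ino_generation);
	return ret;
}
```

Take two mappings backed by the same path but with, say, a different ino_generation. They compare as different, which is the behaviour the question above is about.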

Right now, with maj/min/etc. moved to the dso, the map_share patch gets
the structure used to keep shared entries in the rb_tree down to 40
bytes, under one cacheline, while the full 'struct map' is 96 bytes, 32
bytes more than one cacheline, so sharing still pays off:

[acme@quaco perf]$ pahole -C map_node ~/bin/perf
struct map_node {
	union {
		struct rb_node     rb_node __attribute__((__aligned__(8))); /*     0    24 */
		struct list_head   node;                /*     0    16 */
	} __attribute__((__aligned__(8)));              /*     0    24 */
	refcount_t                 refcnt;              /*    24     4 */
	_Bool                      is_node:1;           /*    28: 0  1 */

	/* XXX 7 bits hole, try to pack */
	/* XXX 3 bytes hole, try to pack */

	struct map *               map;                 /*    32     8 */

	/* size: 40, cachelines: 1, members: 4 */
	/* sum members: 36, holes: 1, sum holes: 3 */
	/* sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 7 bits */
	/* forced alignments: 1 */
	/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
[acme@quaco perf]$ pahole -C map ~/bin/perf
struct map {
	union {
		struct rb_node     rb_node __attribute__((__aligned__(8))); /*     0    24 */
		struct list_head   node;                /*     0    16 */
	} __attribute__((__aligned__(8)));              /*     0    24 */
	refcount_t                 refcnt;              /*    24     4 */
	_Bool                      is_node:1;           /*    28: 0  1 */
	_Bool                      erange_warned:1;     /*    28: 1  1 */
	_Bool                      priv:1;              /*    28: 2  1 */

	/* XXX 5 bits hole, try to pack */
	/* XXX 3 bytes hole, try to pack */

	u64                        start;               /*    32     8 */
	u64                        end;                 /*    40     8 */
	u64                        pgoff;               /*    48     8 */
	u64                        reloc;               /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	u64                        (*map_ip)(struct map *, u64); /*    64     8 */
	u64                        (*unmap_ip)(struct map *, u64); /*    72     8 */
	struct dso *               dso;                 /*    80     8 */
	u32                        flags;               /*    88     4 */
	u32                        prot;                /*    92     4 */

	/* size: 96, cachelines: 2, members: 14 */
	/* sum members: 92, holes: 1, sum holes: 3 */
	/* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
	/* forced alignments: 1 */
	/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
[acme@quaco perf]$
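The offsets pahole reports can be reproduced with a standalone layout check. This sketch assumes an LP64 target such as x86-64. It uses stand-in definitions with the same sizes as the perf types: rb_node as three words, list_head as two pointers and refcount_t as a 4-byte counter. It mirrors the field order shown above but is not perf's header.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Stand-ins sized like the types in the pahole output (64-bit build). */
struct rb_node { unsigned long parent_color; struct rb_node *right, *left; };
struct list_head { struct list_head *next, *prev; };
typedef struct { int refs; } refcount_t;

struct map;
struct dso;

/* Entry kept in the rb_tree; points at a (possibly shared) 'struct map'. */
struct map_node {
	union {
		struct rb_node rb_node;
		struct list_head node;
	};
	refcount_t refcnt;
	_Bool is_node:1;
	struct map *map;
};

/* Full map, field order as in the second pahole listing above. */
struct map {
	union {
		struct rb_node rb_node;
		struct list_head node;
	};
	refcount_t refcnt;
	_Bool is_node:1;
	_Bool erange_warned:1;
	_Bool priv:1;
	u64 start, end, pgoff, reloc;
	u64 (*map_ip)(struct map *, u64);
	u64 (*unmap_ip)(struct map *, u64);
	struct dso *dso;
	uint32_t flags, prot;
};
```

On such a target, sizeof(struct map_node) is 40 and sizeof(struct map) is 96. start sits at offset 32 and map_ip starts the second cacheline at 64, matching the pahole output. Each shared entry in the rb_tree then costs a 40-byte map_node rather than a 96-byte copy.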

So give me some more time, please :-)

The map sharing patch is this one, without the maj/min/etc. move to 'struct dso':

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/map_share&id=451df1d4ad3c636f6be57b8e69b4f94c1bbf4a65

- Arnaldo

2019-11-19 11:08:22

by Jiri Olsa

Subject: Re: [PATCHv2 0/2] perf tools: Share struct map after clone

On Mon, Nov 18, 2019 at 06:48:51PM -0300, Arnaldo Carvalho de Melo wrote:
> So give me some more time, please :-)

sure ;-) I just did not want to lose track of this

thanks,
jirka