2007-05-04 03:08:43

by Christoph Lameter

[permalink] [raw]
Subject: Remove constructor from buffer_head

Performance tests show a slight improvements in netperf (not a
strong case for a performance improvement but removing the
constructor has definitely no negative impact so why keep
this around?).

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec

Before:
87380 16384 16384 10.01 6026.04
87380 16384 16384 10.01 5992.17
87380 16384 16384 10.01 6071.23

After:
87380 16384 16384 10.01 6090.20
87380 16384 16384 10.01 6078.3
87380 16384 16384 10.00 6013.52


Signed-off-by: Christoph Lameter <[email protected]>

---
fs/buffer.c | 22 ++++------------------
1 file changed, 4 insertions(+), 18 deletions(-)

Index: slub/fs/buffer.c
===================================================================
--- slub.orig/fs/buffer.c 2007-05-03 19:17:09.000000000 -0700
+++ slub/fs/buffer.c 2007-05-03 19:57:30.000000000 -0700
@@ -2907,9 +2907,10 @@ static void recalc_bh_state(void)

struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
{
- struct buffer_head *ret = kmem_cache_alloc(bh_cachep,
+ struct buffer_head *ret = kmem_cache_zalloc(bh_cachep,
set_migrateflags(gfp_flags, __GFP_RECLAIMABLE));
if (ret) {
+ INIT_LIST_HEAD(&ret->b_assoc_buffers);
get_cpu_var(bh_accounting).nr++;
recalc_bh_state();
put_cpu_var(bh_accounting);
@@ -2928,17 +2929,6 @@ void free_buffer_head(struct buffer_head
}
EXPORT_SYMBOL(free_buffer_head);

-static void
-init_buffer_head(void *data, struct kmem_cache *cachep, unsigned long flags)
-{
- if (flags & SLAB_CTOR_CONSTRUCTOR) {
- struct buffer_head * bh = (struct buffer_head *)data;
-
- memset(bh, 0, sizeof(*bh));
- INIT_LIST_HEAD(&bh->b_assoc_buffers);
- }
-}
-
static void buffer_exit_cpu(int cpu)
{
int i;
@@ -2965,12 +2955,8 @@ void __init buffer_init(void)
{
int nrpages;

- bh_cachep = kmem_cache_create("buffer_head",
- sizeof(struct buffer_head), 0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
- SLAB_MEM_SPREAD),
- init_buffer_head,
- NULL);
+ bh_cachep = KMEM_CACHE(buffer_head,
+ SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);

/*
* Limit the bh occupancy to 10% of ZONE_NORMAL


2007-05-04 03:21:15

by Andrew Morton

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Thu, 3 May 2007 20:08:41 -0700 (PDT) Christoph Lameter <[email protected]> wrote:

> Performance tests show a slight improvements in netperf (not a
> strong case for a performance improvement but removing the
> constructor has definitely no negative impact so why keep
> this around?).
>
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> Before:
> 87380 16384 16384 10.01 6026.04
> 87380 16384 16384 10.01 5992.17
> 87380 16384 16384 10.01 6071.23
>
> After:
> 87380 16384 16384 10.01 6090.20
> 87380 16384 16384 10.01 6078.3
> 87380 16384 16384 10.00 6013.52

How could a filesystem change affect networking performance?

The change looks nice, but I'd microbenchmark it with a write-to-ext2-on-ramdisk
or something like that.

2007-05-04 03:34:52

by Christoph Lameter

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Thu, 3 May 2007, Andrew Morton wrote:

> On Thu, 3 May 2007 20:08:41 -0700 (PDT) Christoph Lameter <[email protected]> wrote:
>
> > Performance tests show a slight improvements in netperf (not a
> > strong case for a performance improvement but removing the
> > constructor has definitely no negative impact so why keep
> > this around?).
> >
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > Before:
> > 87380 16384 16384 10.01 6026.04
> > 87380 16384 16384 10.01 5992.17
> > 87380 16384 16384 10.01 6071.23
> >
> > After:
> > 87380 16384 16384 10.01 6090.20
> > 87380 16384 16384 10.01 6078.3
> > 87380 16384 16384 10.00 6013.52
>
> How could a filesystem change affect networking performance?
>
> The change looks nice, but I'd microbenchmark it with a write-to-ext2-on-ramdisk
> or something like that.

Hmmmm.. I was told in another thread that this is the most frequently used
slab for this benchmark ...... Just accepted that as true.

2007-05-04 04:34:17

by Christoph Lameter

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Thu, 3 May 2007, Andrew Morton wrote:

> The change looks nice, but I'd microbenchmark it with a write-to-ext2-on-ramdisk
> or something like that.

Hmmm... How does one benchmark buffer head performance? Guess just by
copying files? Not sure if the following will cut it.

Two tests. First copying 8M of small files into a 16M ramdisk:

for i in 1 2 3 4 5 6 7 8 9; do

mke2fs /dev/ram0 >/dev/null
mount /dev/ram0 /media >/dev/null
time cp -a /etc /media
umount /dev/ram0

done;


No constructor

real 0m0.104s
user 0m0.016s
sys 0m0.056s

real 0m0.090s
user 0m0.008s
sys 0m0.056s

real 0m0.089s
user 0m0.016s
sys 0m0.048s

real 0m0.097s
user 0m0.004s
sys 0m0.064s

real 0m0.091s
user 0m0.008s
sys 0m0.052s

real 0m0.091s
user 0m0.004s
sys 0m0.060s

real 0m0.098s
user 0m0.008s
sys 0m0.060s

real 0m0.091s
user 0m0.000s
sys 0m0.064s

real 0m0.090s
user 0m0.012s
sys 0m0.052s

W/constructor

real 0m0.099s
user 0m0.004s
sys 0m0.100s

real 0m0.098s
user 0m0.008s
sys 0m0.096s

real 0m0.091s
user 0m0.016s
sys 0m0.080s

real 0m0.091s
user 0m0.012s
sys 0m0.084s

real 0m0.090s
user 0m0.012s
sys 0m0.080s

real 0m0.090s
user 0m0.020s
sys 0m0.076s

real 0m1.269s
user 0m0.012s
sys 0m0.084s

real 0m0.095s
user 0m0.016s
sys 0m0.084s

real 0m0.096s
user 0m0.020s
sys 0m0.084s

The no constructor numbers are generally lower.
Lowest is no constructor with 0.089.

Second. Copy vmlinux (52M) to 128M ramdisk:

for i in 1 2 3 4 5 6 7 8 9; do

mke2fs /dev/ram0 >/dev/null
mount /dev/ram0 /media >/dev/null
time cp slub/vmlinux /media
umount /dev/ram0

done;


No constructor:

real 0m2.095s
user 0m0.000s
sys 0m0.168s

real 0m0.187s
user 0m0.008s
sys 0m0.124s

real 0m0.186s
user 0m0.008s
sys 0m0.120s

real 0m0.195s
user 0m0.008s
sys 0m0.128s

real 0m0.177s
user 0m0.004s
sys 0m0.120s

real 0m0.182s
user 0m0.004s
sys 0m0.120s

real 0m0.186s
user 0m0.008s
sys 0m0.120s

real 0m0.190s
user 0m0.004s
sys 0m0.128s

real 0m0.174s
user 0m0.004s
sys 0m0.116s


Constructor

real 0m0.183s
user 0m0.004s
sys 0m0.188s

real 0m0.183s
user 0m0.004s
sys 0m0.192s

real 0m0.177s
user 0m0.012s
sys 0m0.176s

real 0m0.186s
user 0m0.004s
sys 0m0.192s

real 0m0.187s
user 0m0.008s
sys 0m0.188s

real 0m0.184s
user 0m0.004s
sys 0m0.192s

real 0m0.177s
user 0m0.012s
sys 0m0.176s

real 0m0.183s
user 0m0.004s
sys 0m0.192s

real 0m0.182s
user 0m0.004s
sys 0m0.188s

Same here. Low is 0.174 no constructor.


2007-05-04 04:37:22

by Andrew Morton

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Thu, 3 May 2007 20:34:48 -0700 (PDT) Christoph Lameter <[email protected]> wrote:

> On Thu, 3 May 2007, Andrew Morton wrote:
>
> > On Thu, 3 May 2007 20:08:41 -0700 (PDT) Christoph Lameter <[email protected]> wrote:
> >
> > > Performance tests show a slight improvements in netperf (not a
> > > strong case for a performance improvement but removing the
> > > constructor has definitely no negative impact so why keep
> > > this around?).
> > >
> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
> > > Recv Send Send
> > > Socket Socket Message Elapsed
> > > Size Size Size Time Throughput
> > > bytes bytes bytes secs. 10^6bits/sec
> > >
> > > Before:
> > > 87380 16384 16384 10.01 6026.04
> > > 87380 16384 16384 10.01 5992.17
> > > 87380 16384 16384 10.01 6071.23
> > >
> > > After:
> > > 87380 16384 16384 10.01 6090.20
> > > 87380 16384 16384 10.01 6078.3
> > > 87380 16384 16384 10.00 6013.52
> >
> > How could a filesystem change affect networking performance?
> >
> > The change looks nice, but I'd microbenchmark it with a write-to-ext2-on-ramdisk
> > or something like that.
>
> Hmmmm.. I was told in another thread that this is the most frequently used
> slab for this benchmark

That would be hair-raising ;) I suspect confusion with sk_buff.

buffer_heads do get used quite a bit though. A good microbenchmark would
be to sit in a tight loop extending and truncating an ext2 file

2007-05-04 06:34:58

by William Lee Irwin III

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Thu, May 03, 2007 at 08:08:41PM -0700, Christoph Lameter wrote:
> Performance tests show a slight improvements in netperf (not a
> strong case for a performance improvement but removing the
> constructor has definitely no negative impact so why keep
> this around?).

Cache effects are not so easily visible. Cache profile results from
more realistic workloads (e.g. major macrobenchmarks) are more
appropriate for gauging this.


-- wli

2007-05-04 16:05:36

by Christoph Lameter

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Thu, 3 May 2007, William Lee Irwin III wrote:

> On Thu, May 03, 2007 at 08:08:41PM -0700, Christoph Lameter wrote:
> > Performance tests show a slight improvements in netperf (not a
> > strong case for a performance improvement but removing the
> > constructor has definitely no negative impact so why keep
> > this around?).
>
> Cache effects are not so easily visible. Cache profile results from
> more realistic workloads (e.g. major macrobenchmarks) are more
> appropriate for gauging this.

Yeah I really out to stick a performance counter in this but that would
require some effort. Defer for now I guess.

2007-05-04 20:42:17

by Andrew Morton

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Thu, 3 May 2007 20:08:41 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> Performance tests show a slight improvements in netperf (not a
> strong case for a performance improvement but removing the
> constructor has definitely no negative impact so why keep
> this around?).
>
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> Before:
> 87380 16384 16384 10.01 6026.04
> 87380 16384 16384 10.01 5992.17
> 87380 16384 16384 10.01 6071.23
>
> After:
> 87380 16384 16384 10.01 6090.20
> 87380 16384 16384 10.01 6078.3
> 87380 16384 16384 10.00 6013.52
>
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> ---
> fs/buffer.c | 22 ++++------------------


So I benchmarked this by repeatedly extending (via write()) and truncating
a 10MB file, on ext2. Using create-delete.c from
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz

Machine is a fast 2x4 core Woodcrest. CONFIG_SLAB=y

The command used was

time create-delete -s $((16 * 1024 * 1024)) -n 300 foo

which will allocate and free 300*4096 buffer_heads.

With patch:

akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.56s system 99% cpu 4.565 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.60s system 99% cpu 4.612 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.60s system 99% cpu 4.602 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.56s system 99% cpu 4.567 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.59s system 95% cpu 4.824 total

Without patch:

akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.42s system 99% cpu 4.419 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.42s system 99% cpu 4.421 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.42s system 99% cpu 4.427 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.42s system 99% cpu 4.417 total
akpm2:/mnt/sda2> time create-delete -s $((16 * 1024 * 1024)) -n 300 foo
create-delete -s $((16 * 1024 * 1024)) -n 300 foo 0.00s user 4.42s system 99% cpu 4.435 total

So the patch took the average system time from 4.42 seconds up to 4.582
seconds. Nice slowdown!

It could just be the usual inter-kernel-build noise, dunno.

I'd investigate further, but someone has gone and broken oprofile.

2007-05-04 21:33:54

by Andrew Morton

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Fri, 4 May 2007 13:42:12 -0700
Andrew Morton <[email protected]> wrote:

> I'd investigate further, but someone has gone and broken oprofile.

Damn, we went and merged that bustage?


2.6.20:

akpm2:/home/akpm> opcontrol --start-daemon
/usr/bin/opcontrol: line 1098: /dev/oprofile/0/enabled: No such file or directory
/usr/bin/opcontrol: line 1098: /dev/oprofile/0/event: No such file or directory
/usr/bin/opcontrol: line 1098: /dev/oprofile/0/count: No such file or directory
/usr/bin/opcontrol: line 1098: /dev/oprofile/0/kernel: No such file or directory
/usr/bin/opcontrol: line 1098: /dev/oprofile/0/user: No such file or directory
/usr/bin/opcontrol: line 1098: /dev/oprofile/0/unit_mask: No such file or directory

2.6.21:

akpm2:/home/akpm# opreport -l /boot/vmlinux-$(uname -r) | head -50
opreport error: No sample file found: try running opcontrol --dump
or specify a session containing sample files



This is an FC6 machine. `yum update oprofile' says

Could not find update match for oprofile
No Packages marked for Update/Obsoletion

akpm2:/home/akpm> rpm -q oprofile
oprofile-0.9.2-3.fc6


I'm quite stunned that we did this.

Now what?

2007-05-04 21:42:05

by Christoph Lameter

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Fri, 4 May 2007, Andrew Morton wrote:

> So the patch took the average system time from 4.42 seconds up to 4.582
> seconds. Nice slowdown!

All of that from a memset and a list head init on a cacheline we already
use?

2007-05-04 21:48:05

by Andrew Morton

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Fri, 4 May 2007 14:42:02 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Fri, 4 May 2007, Andrew Morton wrote:
>
> > So the patch took the average system time from 4.42 seconds up to 4.582
> > seconds. Nice slowdown!
>
> All of that from a memset and a list head init on a cacheline we already
> use?

Seems unlikely, especially when you consider all the other stuff which a write()
has to do.

2007-05-04 21:52:21

by Chuck Ebbert

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

Andrew Morton wrote:
>
> I'd investigate further, but someone has gone and broken oprofile.
>

Did you just notice that? Apparently it's been broken since 2.6.21-final.

2007-05-04 23:22:13

by Andi Kleen

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Friday 04 May 2007 23:33:47 Andrew Morton wrote:
> On Fri, 4 May 2007 13:42:12 -0700

>
> 2.6.20:
>
> akpm2:/home/akpm> opcontrol --start-daemon
> /usr/bin/opcontrol: line 1098: /dev/oprofile/0/enabled: No such file or directory
> /usr/bin/opcontrol: line 1098: /dev/oprofile/0/event: No such file or directory
> /usr/bin/opcontrol: line 1098: /dev/oprofile/0/count: No such file or directory
> /usr/bin/opcontrol: line 1098: /dev/oprofile/0/kernel: No such file or directory
> /usr/bin/opcontrol: line 1098: /dev/oprofile/0/user: No such file or directory
> /usr/bin/opcontrol: line 1098: /dev/oprofile/0/unit_mask: No such file or directory

This isn't a problem anymore since the nmi watchdog is off by default now.

> 2.6.21:
>
> akpm2:/home/akpm# opreport -l /boot/vmlinux-$(uname -r) | head -50
> opreport error: No sample file found: try running opcontrol --dump
> or specify a session containing sample files

For me it works on a slightly post 2.6.21 kernel with suse oprofile-0.9.2-21

Did you try opcontrol --dump?

-Andi

2007-05-04 23:46:36

by Andrew Morton

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Sat, 5 May 2007 01:22:05 +0200
Andi Kleen <[email protected]> wrote:

> > 2.6.21:
> >
> > akpm2:/home/akpm# opreport -l /boot/vmlinux-$(uname -r) | head -50
> > opreport error: No sample file found: try running opcontrol --dump
> > or specify a session containing sample files
>
> For me it works on a slightly post 2.6.21 kernel with suse oprofile-0.9.2-21
>
> Did you try opcontrol --dump?

Yes, tried various things. There's just nothing turning up in /var/lib/oprofile.

Chuck appears to be claiming that 2.6.21 oprofile is known to be broken,
but I never heard anything about that.

2007-05-05 09:31:30

by Andi Kleen

[permalink] [raw]
Subject: Re: Remove constructor from buffer_head

On Fri, May 04, 2007 at 04:45:29PM -0700, Andrew Morton wrote:
> On Sat, 5 May 2007 01:22:05 +0200
> Andi Kleen <[email protected]> wrote:
>
> > > 2.6.21:
> > >
> > > akpm2:/home/akpm# opreport -l /boot/vmlinux-$(uname -r) | head -50
> > > opreport error: No sample file found: try running opcontrol --dump
> > > or specify a session containing sample files
> >
> > For me it works on a slightly post 2.6.21 kernel with suse oprofile-0.9.2-21
> >
> > Did you try opcontrol --dump?
>
> Yes, tried various things. There's just nothing turning up in /var/lib/oprofile.
>
> Chuck appears to be claiming that 2.6.21 oprofile is known to be broken,
> but I never heard anything about that.

Hmm, after a opcontrol --reset i see the same issue now. Don't know what's
wrong, but it must be something different from the .20 perfctr allocation
problem.

It looks like the daemon doesn't get any data from the kernel


-Andi

2007-05-13 20:38:33

by Benjamin LaHaise

[permalink] [raw]
Subject: oprofile broken in 2.6.21 SMP (was Re: Remove constructor from buffer_head)

On Sat, May 05, 2007 at 11:31:20AM +0200, Andi Kleen wrote:
> Hmm, after a opcontrol --reset i see the same issue now. Don't know what's
> wrong, but it must be something different from the .20 perfctr allocation
> problem.
>
> It looks like the daemon doesn't get any data from the kernel

I finally had time to track this down. The breakage is caused by "[PATCH]
x86-64: Let oprofile reserve MSR on all CPUs". Oprofile is already calling
the reserve functions on each CPU in the system when it sets up the MSRs.
This results in oprofile getting a reservation failure on CPUs above 0. The
following makes oprofile adapt to the API change for now -- oprofile
still needs to be modified to perform the reservations earlier during its
initialization, but that's a little bit more involved than the immediate
bug fix. This only affects systems with more than 1 CPU. This patch has
been through limited testing (Athlon 64 X2 and Core 2, but not on the P4) on
x86 and x86-64 (Core 2 only).

-ben

Signed-off-by: Benjamin LaHaise <[email protected]>
diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c
index 84c3497..21fc74d 100644
--- a/arch/i386/kernel/nmi.c
+++ b/arch/i386/kernel/nmi.c
@@ -148,7 +148,7 @@ int avail_to_resrv_perfctr_nmi(unsigned int msr)
return 1;
}

-static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
+int __reserve_perfctr_nmi(int cpu, unsigned int msr)
{
unsigned int counter;
if (cpu < 0)
@@ -162,7 +162,7 @@ static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
return 0;
}

-static void __release_perfctr_nmi(int cpu, unsigned int msr)
+void __release_perfctr_nmi(int cpu, unsigned int msr)
{
unsigned int counter;
if (cpu < 0)
@@ -212,7 +212,7 @@ int __reserve_evntsel_nmi(int cpu, unsigned int msr)
return 0;
}

-static void __release_evntsel_nmi(int cpu, unsigned int msr)
+void __release_evntsel_nmi(int cpu, unsigned int msr)
{
unsigned int counter;
if (cpu < 0)
@@ -1188,5 +1188,9 @@ EXPORT_SYMBOL(reserve_perfctr_nmi);
EXPORT_SYMBOL(release_perfctr_nmi);
EXPORT_SYMBOL(reserve_evntsel_nmi);
EXPORT_SYMBOL(release_evntsel_nmi);
+EXPORT_SYMBOL(__reserve_perfctr_nmi);
+EXPORT_SYMBOL(__release_perfctr_nmi);
+EXPORT_SYMBOL(__reserve_evntsel_nmi);
+EXPORT_SYMBOL(__release_evntsel_nmi);
EXPORT_SYMBOL(disable_timer_nmi_watchdog);
EXPORT_SYMBOL(enable_timer_nmi_watchdog);
diff --git a/arch/i386/oprofile/op_model_athlon.c b/arch/i386/oprofile/op_model_athlon.c
index 3057a19..738a579 100644
--- a/arch/i386/oprofile/op_model_athlon.c
+++ b/arch/i386/oprofile/op_model_athlon.c
@@ -45,14 +45,14 @@ static void athlon_fill_in_addresses(struct op_msrs * const msrs)
int i;

for (i=0; i < NUM_COUNTERS; i++) {
- if (reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i))
+ if (__reserve_perfctr_nmi(-1, MSR_K7_PERFCTR0 + i))
msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
else
msrs->counters[i].addr = 0;
}

for (i=0; i < NUM_CONTROLS; i++) {
- if (reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i))
+ if (__reserve_evntsel_nmi(-1, MSR_K7_EVNTSEL0 + i))
msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
else
msrs->controls[i].addr = 0;
@@ -160,11 +160,11 @@ static void athlon_shutdown(struct op_msrs const * const msrs)

for (i = 0 ; i < NUM_COUNTERS ; ++i) {
if (CTR_IS_RESERVED(msrs,i))
- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ __release_perfctr_nmi(-1, MSR_K7_PERFCTR0 + i);
}
for (i = 0 ; i < NUM_CONTROLS ; ++i) {
if (CTRL_IS_RESERVED(msrs,i))
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
+ __release_evntsel_nmi(-1, MSR_K7_EVNTSEL0 + i);
}
}

diff --git a/arch/i386/oprofile/op_model_p4.c b/arch/i386/oprofile/op_model_p4.c
index 4792592..ce096dc 100644
--- a/arch/i386/oprofile/op_model_p4.c
+++ b/arch/i386/oprofile/op_model_p4.c
@@ -413,7 +413,7 @@ static void p4_fill_in_addresses(struct op_msrs * const msrs)
for (i = 0; i < num_counters; ++i) {
addr = p4_counters[VIRT_CTR(stag, i)].counter_address;
cccraddr = p4_counters[VIRT_CTR(stag, i)].cccr_address;
- if (reserve_perfctr_nmi(addr)){
+ if (__reserve_perfctr_nmi(-1, addr)){
msrs->counters[i].addr = addr;
msrs->controls[i].addr = cccraddr;
}
@@ -422,7 +422,7 @@ static void p4_fill_in_addresses(struct op_msrs * const msrs)
/* 43 ESCR registers in three or four discontiguous group */
for (addr = MSR_P4_BSU_ESCR0 + stag;
addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
- if (reserve_evntsel_nmi(addr))
+ if (__reserve_evntsel_nmi(-1, addr))
msrs->controls[i].addr = addr;
}

@@ -431,32 +431,32 @@ static void p4_fill_in_addresses(struct op_msrs * const msrs)
if (boot_cpu_data.x86_model >= 0x3) {
for (addr = MSR_P4_BSU_ESCR0 + stag;
addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
- if (reserve_evntsel_nmi(addr))
+ if (__reserve_evntsel_nmi(-1, addr))
msrs->controls[i].addr = addr;
}
} else {
for (addr = MSR_P4_IQ_ESCR0 + stag;
addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
- if (reserve_evntsel_nmi(addr))
+ if (__reserve_evntsel_nmi(-1, addr))
msrs->controls[i].addr = addr;
}
}

for (addr = MSR_P4_RAT_ESCR0 + stag;
addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
- if (reserve_evntsel_nmi(addr))
+ if (__reserve_evntsel_nmi(-1, addr))
msrs->controls[i].addr = addr;
}

for (addr = MSR_P4_MS_ESCR0 + stag;
addr <= MSR_P4_TC_ESCR1; ++i, addr += addr_increment()) {
- if (reserve_evntsel_nmi(addr))
+ if (__reserve_evntsel_nmi(-1, addr))
msrs->controls[i].addr = addr;
}

for (addr = MSR_P4_IX_ESCR0 + stag;
addr <= MSR_P4_CRU_ESCR3; ++i, addr += addr_increment()) {
- if (reserve_evntsel_nmi(addr))
+ if (__reserve_evntsel_nmi(-1, addr))
msrs->controls[i].addr = addr;
}

@@ -464,21 +464,21 @@ static void p4_fill_in_addresses(struct op_msrs * const msrs)

if (num_counters == NUM_COUNTERS_NON_HT) {
/* standard non-HT CPUs handle both remaining ESCRs*/
- if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5))
+ if (__reserve_evntsel_nmi(-1, MSR_P4_CRU_ESCR5))
msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
- if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4))
+ if (__reserve_evntsel_nmi(-1, MSR_P4_CRU_ESCR4))
msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;

} else if (stag == 0) {
/* HT CPUs give the first remainder to the even thread, as
the 32nd control register */
- if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4))
+ if (__reserve_evntsel_nmi(-1, MSR_P4_CRU_ESCR4))
msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;

} else {
/* and two copies of the second to the odd thread,
for the 22st and 23nd control registers */
- if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5)) {
+ if (__reserve_evntsel_nmi(-1, MSR_P4_CRU_ESCR5)) {
msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
}
@@ -684,7 +684,7 @@ static void p4_shutdown(struct op_msrs const * const msrs)

for (i = 0 ; i < num_counters ; ++i) {
if (CTR_IS_RESERVED(msrs,i))
- release_perfctr_nmi(msrs->counters[i].addr);
+ __release_perfctr_nmi(-1, msrs->counters[i].addr);
}
/* some of the control registers are specially reserved in
* conjunction with the counter registers (hence the starting offset).
@@ -692,7 +692,7 @@ static void p4_shutdown(struct op_msrs const * const msrs)
*/
for (i = num_counters ; i < num_controls ; ++i) {
if (CTRL_IS_RESERVED(msrs,i))
- release_evntsel_nmi(msrs->controls[i].addr);
+ __release_evntsel_nmi(-1, msrs->controls[i].addr);
}
}

diff --git a/arch/i386/oprofile/op_model_ppro.c b/arch/i386/oprofile/op_model_ppro.c
index c554f52..10d2c5d 100644
--- a/arch/i386/oprofile/op_model_ppro.c
+++ b/arch/i386/oprofile/op_model_ppro.c
@@ -47,14 +47,14 @@ static void ppro_fill_in_addresses(struct op_msrs * const msrs)
int i;

for (i=0; i < NUM_COUNTERS; i++) {
- if (reserve_perfctr_nmi(MSR_P6_PERFCTR0 + i))
+ if (__reserve_perfctr_nmi(-1, MSR_P6_PERFCTR0 + i))
msrs->counters[i].addr = MSR_P6_PERFCTR0 + i;
else
msrs->counters[i].addr = 0;
}

for (i=0; i < NUM_CONTROLS; i++) {
- if (reserve_evntsel_nmi(MSR_P6_EVNTSEL0 + i))
+ if (__reserve_evntsel_nmi(-1, MSR_P6_EVNTSEL0 + i))
msrs->controls[i].addr = MSR_P6_EVNTSEL0 + i;
else
msrs->controls[i].addr = 0;
@@ -171,11 +171,11 @@ static void ppro_shutdown(struct op_msrs const * const msrs)

for (i = 0 ; i < NUM_COUNTERS ; ++i) {
if (CTR_IS_RESERVED(msrs,i))
- release_perfctr_nmi(MSR_P6_PERFCTR0 + i);
+ __release_perfctr_nmi(-1, MSR_P6_PERFCTR0 + i);
}
for (i = 0 ; i < NUM_CONTROLS ; ++i) {
if (CTRL_IS_RESERVED(msrs,i))
- release_evntsel_nmi(MSR_P6_EVNTSEL0 + i);
+ __release_evntsel_nmi(-1, MSR_P6_EVNTSEL0 + i);
}
}

diff --git a/arch/x86_64/kernel/nmi.c b/arch/x86_64/kernel/nmi.c
index dfab9f1..5011a3b 100644
--- a/arch/x86_64/kernel/nmi.c
+++ b/arch/x86_64/kernel/nmi.c
@@ -135,7 +135,7 @@ int avail_to_resrv_perfctr_nmi(unsigned int msr)
return 1;
}

-static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
+int __reserve_perfctr_nmi(int cpu, unsigned int msr)
{
unsigned int counter;
if (cpu < 0)
@@ -149,7 +149,7 @@ static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
return 0;
}

-static void __release_perfctr_nmi(int cpu, unsigned int msr)
+void __release_perfctr_nmi(int cpu, unsigned int msr)
{
unsigned int counter;
if (cpu < 0)
@@ -198,7 +198,7 @@ int __reserve_evntsel_nmi(int cpu, unsigned int msr)
return 0;
}

-static void __release_evntsel_nmi(int cpu, unsigned int msr)
+void __release_evntsel_nmi(int cpu, unsigned int msr)
{
unsigned int counter;
if (cpu < 0)
@@ -1073,6 +1073,10 @@ EXPORT_SYMBOL(reserve_perfctr_nmi);
EXPORT_SYMBOL(release_perfctr_nmi);
EXPORT_SYMBOL(reserve_evntsel_nmi);
EXPORT_SYMBOL(release_evntsel_nmi);
+EXPORT_SYMBOL(__reserve_perfctr_nmi);
+EXPORT_SYMBOL(__release_perfctr_nmi);
+EXPORT_SYMBOL(__reserve_evntsel_nmi);
+EXPORT_SYMBOL(__release_evntsel_nmi);
EXPORT_SYMBOL(disable_timer_nmi_watchdog);
EXPORT_SYMBOL(enable_timer_nmi_watchdog);
EXPORT_SYMBOL(touch_nmi_watchdog);
diff --git a/include/asm-i386/nmi.h b/include/asm-i386/nmi.h
index b04333e..062db4d 100644
--- a/include/asm-i386/nmi.h
+++ b/include/asm-i386/nmi.h
@@ -25,6 +25,11 @@ extern void release_perfctr_nmi(unsigned int);
extern int reserve_evntsel_nmi(unsigned int);
extern void release_evntsel_nmi(unsigned int);

+extern int __reserve_perfctr_nmi(int cpu, unsigned int msr);
+extern void __release_perfctr_nmi(int cpu, unsigned int msr);
+extern int __reserve_evntsel_nmi(int cpu, unsigned int msr);
+extern void __release_evntsel_nmi(int cpu, unsigned int msr);
+
extern void setup_apic_nmi_watchdog (void *);
extern void stop_apic_nmi_watchdog (void *);
extern void disable_timer_nmi_watchdog(void);
diff --git a/include/asm-x86_64/nmi.h b/include/asm-x86_64/nmi.h
index 72375e7..5d6a0b3 100644
--- a/include/asm-x86_64/nmi.h
+++ b/include/asm-x86_64/nmi.h
@@ -53,6 +53,11 @@ extern void release_perfctr_nmi(unsigned int);
extern int reserve_evntsel_nmi(unsigned int);
extern void release_evntsel_nmi(unsigned int);

+extern int __reserve_perfctr_nmi(int cpu, unsigned int msr);
+extern void __release_perfctr_nmi(int cpu, unsigned int msr);
+extern int __reserve_evntsel_nmi(int cpu, unsigned int msr);
+extern void __release_evntsel_nmi(int cpu, unsigned int msr);
+
extern void setup_apic_nmi_watchdog (void *);
extern void stop_apic_nmi_watchdog (void *);
extern void disable_timer_nmi_watchdog(void);

2007-05-15 03:13:31

by Andrew Morton

[permalink] [raw]
Subject: Re: oprofile broken in 2.6.21 SMP (was Re: Remove constructor from buffer_head)

On Sun, 13 May 2007 16:38:16 -0400 Benjamin LaHaise <[email protected]> wrote:

> On Sat, May 05, 2007 at 11:31:20AM +0200, Andi Kleen wrote:
> > Hmm, after a opcontrol --reset i see the same issue now. Don't know what's
> > wrong, but it must be something different from the .20 perfctr allocation
> > problem.
> >
> > It looks like the daemon doesn't get any data from the kernel
>
> I finally had time to track this down. The breakage is caused by "[PATCH]
> x86-64: Let oprofile reserve MSR on all CPUs". Oprofile is already calling
> the reserve functions on each CPU in the system when it sets up the MSRs.
> This results in oprofile getting a reservation failure on CPUs above 0. The
> following makes oprofile adapt to the API change for now -- oprofile
> still needs to be modified to perform the reservations earlier during its
> initialization, but that's a little bit more involved than the immediate
> bug fix. This only affects systems with more than 1 CPU. This patch has
> been through limited testing (Athlon 64 X2 and Core 2, but not on the P4) on
> x86 and x86-64 (Core 2 only).

Unfortunately we've left this a bit too late - your patch is patching code which
isn't there any more in mainline and we also need a 2.6.21.x fix.

So perhaps we could merge your "immediate bugfix" into -stable and implement the
"more involved" fix for 2.6.22.

Andi, any preferences?