Hi all,
I would propose merging the following patches...
The first set is mostly from Jason and tweaks the mutex adaptive
spinning, AIM7 throughput numbers:
PRE: 100 2000.04 21564.90 2721.29 311.99 3.12 0.01 0.00 99
POST: 100 2000.04 42603.85 5142.80 311.99 3.12 0.00 0.00 99
The second set is the qrwlock, although mostly rewritten by me. I didn't do
much with it other than boot and build a kernel. But I like them because
of the much better worst-case performance.
On Mon, 10 Feb 2014 20:58:20 +0100 Peter Zijlstra wrote:
> Hi all,
>
> I would propose merging the following patches...
>
> The first set is mostly from Jason and tweaks the mutex adaptive
> spinning, AIM7 throughput numbers:
>
> PRE: 100 2000.04 21564.90 2721.29 311.99 3.12 0.01 0.00 99
> POST: 100 2000.04 42603.85 5142.80 311.99 3.12 0.00 0.00 99
What do these columns represent? I'm guessing the large improvement
was in context switches?
> The second set is the qrwlock, although mostly rewritten by me. I didn't do
> much with it other than boot and build a kernel. But I like them because
> of the much better worst-case performance.
On Mon, Feb 10, 2014 at 03:02:30PM -0800, Andrew Morton wrote:
> On Mon, 10 Feb 2014 20:58:20 +0100 Peter Zijlstra wrote:
>
> > Hi all,
> >
> > I would propose merging the following patches...
> >
> > The first set is mostly from Jason and tweaks the mutex adaptive
> > spinning, AIM7 throughput numbers:
> >
Jobs/min/ Jobs/sec/ Time: Time: Time: Time: Running child time
Forks Jobs/min child child parent childU childS std_dev JTI :max :min
> > PRE: 100 2000.04 21564.90 2721.29 311.99 3.12 0.01 0.00 99
> > POST: 100 2000.04 42603.85 5142.80 311.99 3.12 0.00 0.00 99
>
> What do these columns represent? I'm guessing the large improvement
> was in context switches?
I pasted the header from reaim above; I'm not entirely sure what the
bloody thing does and I hate that it takes hours to get these numbers :/
Bloody stupid benchmark if you ask me.
On Tue, 11 Feb 2014 08:17:00 +0100 Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 03:02:30PM -0800, Andrew Morton wrote:
> > On Mon, 10 Feb 2014 20:58:20 +0100 Peter Zijlstra wrote:
> >
> > > Hi all,
> > >
> > > I would propose merging the following patches...
> > >
> > > The first set is mostly from Jason and tweaks the mutex adaptive
> > > spinning, AIM7 throughput numbers:
> > >
>
> Jobs/min/ Jobs/sec/ Time: Time: Time: Time: Running child time
> Forks Jobs/min child child parent childU childS std_dev JTI :max :min
>
> > > PRE: 100 2000.04 21564.90 2721.29 311.99 3.12 0.01 0.00 99
> > > POST: 100 2000.04 42603.85 5142.80 311.99 3.12 0.00 0.00 99
> >
> > What do these columns represent? I'm guessing the large improvement
> > was in context switches?
>
> I pasted the header from reaim above;
hmpf. I wonder what's the difference between Jobs/min, Jobs/min(child)
and Jobs/sec(child), which is not Jobs/min(child) / 60.
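A quick check of that arithmetic, assuming (as the question does) that 21564.90/42603.85 are the Jobs/min(child) values and 2721.29/5142.80 the Jobs/sec(child) values. The pasted header has more fields than the data rows, so this column mapping is a guess, not something reaim documents here:

```python
# Column mapping follows the reading in the question above; it is an
# assumption, since the pasted reaim header has more fields than the rows.
ROWS = {
    "PRE":  {"jobs_min_child": 21564.90, "jobs_sec_child": 2721.29},
    "POST": {"jobs_min_child": 42603.85, "jobs_sec_child": 5142.80},
}


def per_second(jobs_per_min):
    """Convert a per-minute rate into a per-second rate."""
    return jobs_per_min / 60.0


for name, row in ROWS.items():
    derived = per_second(row["jobs_min_child"])
    print(f"{name}: Jobs/min(child)/60 = {derived:.1f}, "
          f"reported Jobs/sec(child) = {row['jobs_sec_child']:.2f}")
```

Dividing by 60 gives roughly 359.4 (PRE) and 710.1 (POST), about 7.5x below the reported 2721.29 and 5142.80, so under this mapping Jobs/sec(child) is not a plain unit conversion of Jobs/min(child).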
> I'm not entirely sure what the
> bloody thing does and I hate that it takes hours to get these numbers :/
>
> Bloody stupid benchmark if you ask me.
heh, yes, it's stupid how long many benchmarks take. Ditch it. A
change like this should be testable with a 30-line microbenchmark which
runs in 5 seconds tops.
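A minimal sketch of the shape such a microbenchmark could take: N threads hammer one shared lock for a fixed wall-clock budget and report acquisitions per second. This is a userspace Python illustration with threading.Lock standing in for the lock under test; it does not touch the kernel mutex or its adaptive-spinning path, and the function name and defaults are hypothetical. Under CPython the GIL also serializes the threads, so absolute numbers mean little; a kernel-side version would need a workload that actually contends on a kernel mutex.

```python
import threading
import time


def lock_microbench(nthreads=4, duration=1.0):
    """Hypothetical helper: hammer one shared lock from `nthreads` threads
    for roughly `duration` seconds and return total acquisitions/second."""
    lock = threading.Lock()          # stand-in for the lock under test
    stop = threading.Event()
    counts = [0] * nthreads          # per-thread acquisition counts
    shared = [0]                     # tiny critical section guarded by `lock`

    def worker(idx):
        n = 0
        while not stop.is_set():
            with lock:
                shared[0] += 1
            n += 1
        counts[idx] = n

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    time.sleep(duration)             # fixed time budget, not a fixed work count
    stop.set()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    total = sum(counts)
    # Sanity check: every counted iteration did exactly one locked increment.
    assert total == shared[0]
    return total / elapsed


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} threads: {lock_microbench(n, 0.5):,.0f} acquisitions/sec")
```

Run as a script, it sweeps 1, 2, 4 and 8 threads at 0.5 s each, so the whole run stays well inside a 5-second budget.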
* Andrew Morton wrote:
> On Tue, 11 Feb 2014 08:17:00 +0100 Peter Zijlstra wrote:
>
> > On Mon, Feb 10, 2014 at 03:02:30PM -0800, Andrew Morton wrote:
> > > On Mon, 10 Feb 2014 20:58:20 +0100 Peter Zijlstra wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would propose merging the following patches...
> > > >
> > > > The first set is mostly from Jason and tweaks the mutex adaptive
> > > > spinning, AIM7 throughput numbers:
> > > >
> >
> > Jobs/min/ Jobs/sec/ Time: Time: Time: Time: Running child time
> > Forks Jobs/min child child parent childU childS std_dev JTI :max :min
> >
> > > > PRE: 100 2000.04 21564.90 2721.29 311.99 3.12 0.01 0.00 99
> > > > POST: 100 2000.04 42603.85 5142.80 311.99 3.12 0.00 0.00 99
> > >
> > > What do these columns represent? I'm guessing the large improvement
> > > was in context switches?
> >
> > I pasted the header from reaim above;
>
> hmpf. I wonder what's the difference between Jobs/min, Jobs/min(child)
> and Jobs/sec(child), which is not Jobs/min(child) / 60.
>
> > I'm not entirely sure what the bloody thing does and I hate that
> > it takes hours to get these numbers :/
> >
> > Bloody stupid benchmark if you ask me.
>
> heh, yes, it's stupid how long many benchmarks take. Ditch it. A
> change like this should be testable with a 30-line microbenchmark
> which runs in 5 seconds tops.
Another very nice option would be to stick the relevant workload
patterns into 'perf bench', calibrate it to emit similar figures (and
double check the speedup is similar as well) and thus make it an AIM7
work-alike microbenchmark.
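One way to do the "double check the speedup is similar" step, sketched under assumptions: take the ratio of the 21564.90 and 42603.85 figures from the PRE/POST rows (Jobs/min(child), if the reaim header quoted above is read that way), about 1.98x, as the target, and accept a microbenchmark whose own before/after ratio lands within a relative tolerance. The 15% tolerance and the helper names are hypothetical, not an agreed threshold.

```python
# PRE/POST figures taken from the AIM7 rows in this thread; the column
# label (Jobs/min(child)) is an assumption based on the pasted header.
AIM7_PRE = 21564.90
AIM7_POST = 42603.85


def speedup(pre, post):
    """Throughput ratio of the patched kernel over the baseline."""
    return post / pre


def speedup_matches(micro_pre, micro_post, rel_tol=0.15):
    """True if the microbenchmark speedup is within rel_tol of AIM7's.

    rel_tol is a hypothetical tolerance chosen for illustration only.
    """
    target = speedup(AIM7_PRE, AIM7_POST)
    observed = speedup(micro_pre, micro_post)
    return abs(observed - target) <= rel_tol * target
```

For example, speedup_matches(1000.0, 1900.0) is True (1.90x against roughly 1.98x), while speedup_matches(1000.0, 1200.0) is False.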
Thanks,
Ingo
On Tue, Feb 11, 2014 at 09:45:02AM +0100, Ingo Molnar wrote:
> > heh, yes, it's stupid how long many benchmarks take. Ditch it. A
> > change like this should be testable with a 30-line microbenchmark
> > which runs in 5 seconds tops.
>
> Another very nice option would be to stick the relevant workload
> patterns into 'perf bench', calibrate it to emit similar figures (and
> double check the speedup is similar as well) and thus make it an AIM7
> work-alike microbenchmark.
/me stares at Jason and Waiman.. :-)
On 02/11/2014 03:57 AM, Peter Zijlstra wrote:
> On Tue, Feb 11, 2014 at 09:45:02AM +0100, Ingo Molnar wrote:
>>> heh, yes, it's stupid how long many benchmarks take. Ditch it. A
>>> change like this should be testable with a 30-line microbenchmark
>>> which runs in 5 seconds tops.
>> Another very nice option would be to stick the relevant workload
>> patterns into 'perf bench', calibrate it to emit similar figures (and
>> double check the speedup is similar as well) and thus make it an AIM7
>> work-alike microbenchmark.
> /me stares at Jason and Waiman.. :-)
It shouldn't be too hard to get the mutex-exercising portion of AIM7
into perf. I will discuss this with Jason once we finish some of
the high-priority tasks we currently have.
-Longman
On Mon, 2014-02-10 at 15:02 -0800, Andrew Morton wrote:
> On Mon, 10 Feb 2014 20:58:20 +0100 Peter Zijlstra wrote:
>
> > Hi all,
> >
> > I would propose merging the following patches...
> >
> > The first set is mostly from Jason and tweaks the mutex adaptive
> > spinning, AIM7 throughput numbers:
> >
> > PRE: 100 2000.04 21564.90 2721.29 311.99 3.12 0.01 0.00 99
> > POST: 100 2000.04 42603.85 5142.80 311.99 3.12 0.00 0.00 99
>
> What do these columns represent? I'm guessing the large improvement
> was in context switches?
Hello,
I also re-tested the mutex patches 1-6 on my 2-socket and 8-socket machines
with the high_systime and fserver AIM7 workloads (run on disk). These
workloads generate contention on the
&EXT4_SB(inode->i_sb)->s_orphan_lock mutex. Below are the % improvements
in throughput with the patches on a recent tip kernel. The main benefits
were on the larger box and at higher user counts.
Note: the -0.7% drop in performance for fserver at 10-90 users on the 2-socket
machine was mainly due to "[PATCH 6/8] mutex: Extra reschedule
point". Without patch 6, there was almost no difference in throughput
between the baseline kernel and the kernel with patches 1-5.
8 socket machine:
--------------------------
fserver
--------------------------
users | % improvement
| in throughput
| with patches
--------------------------
1000-2000 | +29.2%
--------------------------
100-900 | +10.0%
--------------------------
10-90 | +0.4%
--------------------------
high_systime
--------------------------
users | % improvement
| in throughput
| with patches
--------------------------
1000-2000 | +34.9%
--------------------------
100-900 | +49.2%
--------------------------
10-90 | +3.1%
--------------------------
2 socket machine:
--------------------------
fserver
--------------------------
users | % improvement
| in throughput
| with patches
--------------------------
1000-2000 | +1.8%
--------------------------
100-900 | +0.0%
--------------------------
10-90 | -0.7%
--------------------------
high_systime
--------------------------
users | % improvement
| in throughput
| with patches
--------------------------
1000-2000 | +0.8%
--------------------------
100-900 | +0.4%
--------------------------
10-90 | +0.0%
On Mon, Feb 10, 2014 at 08:58:20PM +0100, Peter Zijlstra wrote:
> Hi all,
>
> I would propose merging the following patches...
>
> The first set is mostly from Jason and tweaks the mutex adaptive
> spinning, AIM7 throughput numbers:
>
> PRE: 100 2000.04 21564.90 2721.29 311.99 3.12 0.01 0.00 99
> POST: 100 2000.04 42603.85 5142.80 311.99 3.12 0.00 0.00 99
>
> The second set is the qrwlock, although mostly rewritten by me. I didn't do
> much with it other than boot and build a kernel. But I like them because
> of the much better worst-case performance.
This series passes a short locktorture test when based on top of current
tip/core/locking.
But don't read too much into this... This was in an 8-CPU KVM guest on
x86, and locktorture is still a bit on the lame side. But you have to
start somewhere!
Thanx, Paul