2007-02-25 17:51:20

by Lorenzo Allegrucci

Subject: SMP performance degradation with sysbench

Hi lkml,

according to the test below (sysbench) Linux seems to have scalability
problems beyond 8 client threads:
http://jeffr-tech.livejournal.com/6268.html#cutid1
http://jeffr-tech.livejournal.com/5705.html
Hardware is an 8-core amd64 system and jeffr seems willing to try more
Linux versions on that machine.
Anyway, is there anyone who can reproduce this?




2007-02-25 23:47:44

by Rik van Riel

Subject: Re: SMP performance degradation with sysbench

Lorenzo Allegrucci wrote:
> Hi lkml,
>
> according to the test below (sysbench) Linux seems to have scalability
> problems beyond 8 client threads:
> http://jeffr-tech.livejournal.com/6268.html#cutid1
> http://jeffr-tech.livejournal.com/5705.html
> Hardware is an 8-core amd64 system and jeffr seems willing to try more
> Linux versions on that machine.
> Anyway, is there anyone who can reproduce this?

I have reproduced it on a quad core test system.

With 4 threads (on 4 cores) I get a high throughput, with
approximately 58% user time and 42% system time.

With 8 threads (on 4 cores) I get way lower throughput,
with 37% user time, 29% system time and 35% idle time!

The maximum time taken per query also increases from
0.0096s to 0.5273s. Ouch!

I don't know if this is MySQL, glibc or Linux kernel,
but something strange is going on...
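
For anyone who wants to collect the same user/system/idle split on their own box, here is a minimal sketch, assuming the numbers come from the aggregate "cpu" line of /proc/stat sampled before and after a run. The struct and function names are made up for illustration, and only the first four fields are used (iowait, irq and the rest are ignored).

```c
#include <stdio.h>

/* Cumulative CPU time counters, in clock ticks, taken from the
 * aggregate "cpu" line of /proc/stat (first four fields only). */
struct cpu_ticks {
    unsigned long long user, nice, system, idle;
};

/* Percentages of the interval between two samples. */
struct cpu_share {
    double user, system, idle;
};

/* Read the aggregate "cpu" line. Returns 0 on success, -1 on failure. */
int read_cpu_ticks(struct cpu_ticks *t)
{
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "cpu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle);
    fclose(f);
    return n == 4 ? 0 : -1;
}

/* Share of user (including nice), system and idle time between two
 * samples; `a` is the earlier sample and `b` the later one. */
struct cpu_share cpu_share_between(const struct cpu_ticks *a,
                                   const struct cpu_ticks *b)
{
    double user = (double)((b->user - a->user) + (b->nice - a->nice));
    double sys = (double)(b->system - a->system);
    double idle = (double)(b->idle - a->idle);
    double total = user + sys + idle;
    struct cpu_share s = {0.0, 0.0, 0.0};

    if (total > 0.0) {
        s.user = 100.0 * user / total;
        s.system = 100.0 * sys / total;
        s.idle = 100.0 * idle / total;
    }
    return s;
}
```

Call read_cpu_ticks() once before starting sysbench and once after it finishes, then pass both samples to cpu_share_between(). A large idle share with more threads than cores is the symptom described above.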

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.

2007-02-26 13:36:17

by Nick Piggin

Subject: Re: SMP performance degradation with sysbench

--- kernel/sched.c.orig 2007-02-26 11:46:46.849841000 +0100
+++ kernel/sched.c 2007-02-26 12:04:09.283056000 +0100
@@ -4227,8 +4227,6 @@ recheck:
 	    (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
 	    (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
 		return -EINVAL;
-	if (is_rt_policy(policy) != (param->sched_priority != 0))
-		return -EINVAL;

 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
@@ -4302,6 +4300,13 @@ recheck:

 	rt_mutex_adjust_pi(p);

+	if (!is_rt_policy(policy)) {
+		if (param->sched_priority == 8)
+			set_user_nice(p, -20);
+		else
+			set_user_nice(p, param->sched_priority-6);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sched_setscheduler);
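
The message text that went with this patch is not part of this archive excerpt, so the following is only a reading of the diff itself. The first hunk stops rejecting a non-zero sched_priority for non-RT policies, and the second hunk turns that priority into a nice value. A small userspace sketch of the arithmetic, with a hypothetical function name that does not exist in the kernel:

```c
/* Mirrors the arithmetic of the hunk added after rt_mutex_adjust_pi():
 * for a non-RT policy, sched_priority is translated into a nice value.
 * The function name is made up for this illustration. */
int hack_priority_to_nice(int sched_priority)
{
    if (sched_priority == 8)
        return -20;             /* special case: strongest nice level */
    return sched_priority - 6;  /* e.g. 6 -> 0, 0 -> -6 */
}
```

So a non-RT caller asking for priority 6 ends up at nice 0, lower priorities map to negative nice values, and priority 8 is special-cased to nice -20.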


Attachments:
graph.png (6.81 kB)
mysql-hack.patch (766.00 B)

2007-02-26 13:41:38

by Nick Piggin

Subject: Re: SMP performance degradation with sysbench

Nick Piggin wrote:
> Rik van Riel wrote:
>
>> Lorenzo Allegrucci wrote:
>>
>>> Hi lkml,
>>>
>>> according to the test below (sysbench) Linux seems to have scalability
>>> problems beyond 8 client threads:
>>> http://jeffr-tech.livejournal.com/6268.html#cutid1
>>> http://jeffr-tech.livejournal.com/5705.html
>>> Hardware is an 8-core amd64 system and jeffr seems willing to try more
>>> Linux versions on that machine.
>>> Anyway, is there anyone who can reproduce this?
>>
>>
>>
>> I have reproduced it on a quad core test system.
>>
>> With 4 threads (on 4 cores) I get a high throughput, with
>> approximately 58% user time and 42% system time.
>>
>> With 8 threads (on 4 cores) I get way lower throughput,
>> with 37% user time, 29% system time 35% idle time!
>>
>> The maximum time taken per query also increases from
>> 0.0096s to 0.5273s. Ouch!
>>
>> I don't know if this is MySQL, glibc or Linux kernel,
>> but something strange is going on...
>
>
> Like you, I'm also seeing idle time start going up as threads increase.
>
> I initially thought this was a problem with the multiprocessor scheduler,
> because the pattern is exactly like some artificat in the load balancing.

"artificat"

Wow. I must need some sleep :) Please excuse any other typos!

--
SUSE Labs, Novell Inc.

2007-02-26 22:04:11

by Pete Harlan

Subject: Re: SMP performance degradation with sysbench

On Tue, Feb 27, 2007 at 12:36:04AM +1100, Nick Piggin wrote:
> I found a couple of interesting issues so far. Firstly, the MySQL
> version that I'm using (5.0.26-Max) is making lots of calls to

FYI, MySQL fixed some scalability problems in version 5.0.30, as
mentioned here:

http://www.mysqlperformanceblog.com/2007/01/03/innodb-benchmarks/

It may be worth using more recent sources than 5.0.26 if tracking down
scaling problems in MySQL.

--Pete

----------------------------------
Pete Harlan
ArtSelect, Inc.
[email protected]
http://www.artselect.com
ArtSelect is a subsidiary of a21, Inc.

2007-02-26 22:37:46

by Dave Jones

Subject: Re: SMP performance degradation with sysbench

On Mon, Feb 26, 2007 at 04:04:01PM -0600, Pete Harlan wrote:
> On Tue, Feb 27, 2007 at 12:36:04AM +1100, Nick Piggin wrote:
> > I found a couple of interesting issues so far. Firstly, the MySQL
> > version that I'm using (5.0.26-Max) is making lots of calls to
>
> FYI, MySQL fixed some scalability problems in version 5.0.30, as
> mentioned here:
>
> http://www.mysqlperformanceblog.com/2007/01/03/innodb-benchmarks/
>
> It may be worth using more recent sources than 5.0.26 if tracking down
> scaling problems in MySQL.

The blog post that originated this discussion ran tests on 5.0.33.
Not that the mysql version should really matter. The key point here
is that FreeBSD and Linux were running the *same* version, and
FreeBSD was able to handle the situation better somehow.

Dave

--
http://www.codemonkey.org.uk

2007-02-27 00:32:13

by Hiro Yoshioka

Subject: Re: SMP performance degradation with sysbench

Howdy,

MySQL 5.0.26 had some scalability issues, which have been solved as of 5.0.32:
http://ossipedia.ipa.go.jp/capacity/EV0612260303/
(It is written in Japanese, but you can still read the graph. We compared
5.0.24 vs 5.0.32.)

The following is oprofile data
==> cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt <==
CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples   %        app name             symbol name
47097502  16.8391  libpthread-2.3.4.so  pthread_mutex_trylock
19636300   7.0207  libpthread-2.3.4.so  pthread_mutex_unlock
18600010   6.6502  mysqld               rec_get_offsets_func
18121328   6.4790  mysqld               btr_search_guess_on_hash
11453095   4.0949  mysqld               row_search_for_mysql

MySQL spends about 16.8% of the CPU time on the 8-core machine just
trying to acquire mutexes (pthread_mutex_trylock).

I think there is a lot of room for improvement in the MySQL implementation.

On 2/27/07, Dave Jones <[email protected]> wrote:
> On Mon, Feb 26, 2007 at 04:04:01PM -0600, Pete Harlan wrote:
> > On Tue, Feb 27, 2007 at 12:36:04AM +1100, Nick Piggin wrote:
> > > I found a couple of interesting issues so far. Firstly, the MySQL
> > > version that I'm using (5.0.26-Max) is making lots of calls to
> >
> > FYI, MySQL fixed some scalability problems in version 5.0.30, as
> > mentioned here:
> >
> > http://www.mysqlperformanceblog.com/2007/01/03/innodb-benchmarks/
> >
> > It may be worth using more recent sources than 5.0.26 if tracking down
> > scaling problems in MySQL.
>
> The blog post that originated this discussion ran tests on 5.0.33
> Not that the mysql version should really matter. The key point here
> is that FreeBSD and Linux were running the *same* version, and
> FreeBSD was able to handle the situation better somehow.
>
> Dave
>
> --
> http://www.codemonkey.org.uk

Regards,
Hiro
--
Hiro Yoshioka
mailto:hyoshiok at miraclelinux.com

2007-02-27 00:44:23

by Rik van Riel

Subject: Re: SMP performance degradation with sysbench

Hiro Yoshioka wrote:
> Howdy,
>
> MySQL 5.0.26 had some scalability issues and it solved since 5.0.32
> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> (written in Japanese but you may read the graph. We compared
> 5.0.24 vs 5.0.32)
>
> The following is oprofile data
> ==>
> cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt
> <==
> CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit
> mask of 0x00 (Unhalted core cycles) count 100000
> samples % app name symbol name
> 47097502 16.8391 libpthread-2.3.4.so pthread_mutex_trylock
> 19636300 7.0207 libpthread-2.3.4.so pthread_mutex_unlock
> 18600010 6.6502 mysqld rec_get_offsets_func
> 18121328 6.4790 mysqld btr_search_guess_on_hash
> 11453095 4.0949 mysqld row_search_for_mysql
>
> MySQL tries to get a mutex but it spends about 16.8% of CPU on 8 core
> machine.
>
> I think there are a lot of room to be inproved in MySQL implementation.

That's one aspect.

The other aspect of the problem is that when the number of
threads exceeds the number of CPU cores, Linux no longer
manages to keep the CPUs busy and we get a lot of idle time.

On the other hand, with the number of threads being equal to
the number of CPU cores, we are 100% CPU bound...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.

2007-02-27 04:27:57

by Hiro Yoshioka

Subject: Re: SMP performance degradation with sysbench

Hi,

From: Rik van Riel <[email protected]>
> Hiro Yoshioka wrote:
> > Howdy,
> >
> > MySQL 5.0.26 had some scalability issues and it solved since 5.0.32
> > http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> > (written in Japanese but you may read the graph. We compared
> > 5.0.24 vs 5.0.32)
snip
> > MySQL tries to get a mutex but it spends about 16.8% of CPU on 8 core
> > machine.
> >
> > I think there are a lot of room to be inproved in MySQL implementation.
>
> That's one aspect.
>
> The other aspect of the problem is that when the number of
> threads exceeds the number of CPU cores, Linux no longer
> manages to keep the CPUs busy and we get a lot of idle time.
>
> On the other hand, with the number of threads being equal to
> the number of CPU cores, we are 100% CPU bound...

I have a question. If so, what is the difference, from the kernel's
point of view, between SMP and CPU cores?

Another question. When the number of threads exceeds the number of
CPU cores, we may get a lot of idle time. Would a workaround for
MySQL then be to not create more threads than there are CPU cores?
Is that right?

Regards,
Hiro
--
Hiro Yoshioka
CTO/Miracle Linux Corporation
http://blog.miraclelinux.com/yume/

2007-02-27 04:32:28

by Rik van Riel

Subject: Re: SMP performance degradation with sysbench

Hiro Yoshioka wrote:
> Hi,
>
> From: Rik van Riel <[email protected]>
>> Hiro Yoshioka wrote:
>>> Howdy,
>>>
>>> MySQL 5.0.26 had some scalability issues and it solved since 5.0.32
>>> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
>>> (written in Japanese but you may read the graph. We compared
>>> 5.0.24 vs 5.0.32)
> snip
>>> MySQL tries to get a mutex but it spends about 16.8% of CPU on 8 core
>>> machine.
>>>
>>> I think there are a lot of room to be inproved in MySQL implementation.
>> That's one aspect.
>>
>> The other aspect of the problem is that when the number of
>> threads exceeds the number of CPU cores, Linux no longer
>> manages to keep the CPUs busy and we get a lot of idle time.
>>
>> On the other hand, with the number of threads being equal to
>> the number of CPU cores, we are 100% CPU bound...
>
> I have a question. If so, what is the difference of kernel's
> view between SMP and CPU cores?

None. Each schedulable entity (whether a fully fledged
CPU core or an SMT/HT thread) is treated the same.

> Another question. When the number of threads exceeds the number of
> CPU cores, we may get a lot of idle time. Then a workaround of
> MySQL is that do not creat threads which exceeds the number
> of CPU cores. Is it right?

Not really, that would make it impossible for MySQL to
handle more simultaneous database queries than the system
has CPUs.

Besides, it looks like this is not a problem in MySQL
per se (it works on FreeBSD) but some bug in Linux.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.

2007-02-27 08:14:38

by J.A. Magallón

Subject: Re: SMP performance degradation with sysbench

On Mon, 26 Feb 2007 23:31:29 -0500, Rik van Riel <[email protected]> wrote:

> Hiro Yoshioka wrote:
> > Hi,
> >
> > From: Rik van Riel <[email protected]>
> >> Hiro Yoshioka wrote:
> >>> Howdy,
> >>>
> >>> MySQL 5.0.26 had some scalability issues and it solved since 5.0.32
> >>> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> >>> (written in Japanese but you may read the graph. We compared
> >>> 5.0.24 vs 5.0.32)
> > snip
> >>> MySQL tries to get a mutex but it spends about 16.8% of CPU on 8 core
> >>> machine.
> >>>
> >>> I think there are a lot of room to be inproved in MySQL implementation.
> >> That's one aspect.
> >>
> >> The other aspect of the problem is that when the number of
> >> threads exceeds the number of CPU cores, Linux no longer
> >> manages to keep the CPUs busy and we get a lot of idle time.
> >>
> >> On the other hand, with the number of threads being equal to
> >> the number of CPU cores, we are 100% CPU bound...
> >
> > I have a question. If so, what is the difference of kernel's
> > view between SMP and CPU cores?
>
> None. Each schedulable entity (whether a fully fledged
> CPU core or an SMT/HT thread) is treated the same.
>

And what are the SMT and Multi-Core scheduling options in the kernel
config for? Because of this thread I re-read the help text, and
it looks like one could de-select the SMT scheduler option, still get a
working SMP system, and see what difference? I suppose it's related
to migration and cache flushing and so on, but where could I get
more details?
And stranger still, what is the difference between multi-core and
normal SMP configs?

> > Another question. When the number of threads exceeds the number of
> > CPU cores, we may get a lot of idle time. Then a workaround of
> > MySQL is that do not creat threads which exceeds the number
> > of CPU cores. Is it right?
>
> Not really, that would make it impossible for MySQL to
> handle more simultaneous database queries than the system
> has CPUs.
>

I don't know MySQL internals, but you assume one thread per query.
What if it's more like Apache, with one long-lived thread for several connections?
Isn't answering 4+4 queries the same as answering 8 at half the speed?

> Besides, it looks like this is not a problem in MySQL
> per se (it works on FreeBSD) but some bug in Linux.
>


--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.19-jam07 (gcc 4.1.2 20070115 (prerelease) (4.1.2-0.20070115.1mdv2007.1)) #2 SMP PREEMPT

2007-02-27 14:02:57

by Rik van Riel

Subject: Re: SMP performance degradation with sysbench

J.A. Magallón wrote:
> On Mon, 26 Feb 2007 23:31:29 -0500, Rik van Riel <[email protected]> wrote:
>
>> Hiro Yoshioka wrote:

>>> Another question. When the number of threads exceeds the number of
>>> CPU cores, we may get a lot of idle time. Then a workaround of
>>> MySQL is that do not creat threads which exceeds the number
>>> of CPU cores. Is it right?
>> Not really, that would make it impossible for MySQL to
>> handle more simultaneous database queries than the system
>> has CPUs.
>>
>
> I don't know myqsl internals, but you assume one thread per query.
> If its more like Apache, one long living thread for several connections ?

Yes, they are longer lived client connections. One thread
per connection, just like Apache.

> Its the same to answer 4+4 queries than 8 at half the speed, isn't it ?

That still doesn't fix the potential Linux problem that this
benchmark identified.

To clarify: I don't care as much about MySQL performance as
I care about identifying and fixing this potential bug in
Linux.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.

2007-02-27 14:56:34

by Paulo Marques

Subject: Re: SMP performance degradation with sysbench

Rik van Riel wrote:
> J.A. Magallón wrote:
>>[...]
>> Its the same to answer 4+4 queries than 8 at half the speed, isn't it ?
>
> That still doesn't fix the potential Linux problem that this
> benchmark identified.
>
> To clarify: I don't care as much about MySQL performance as
> I care about identifying and fixing this potential bug in
> Linux.

IIRC a long time ago there was a change in the scheduler to prevent a
low prio task running on one sibling of a hyperthreaded processor from
slowing down a higher prio task on another sibling of the same processor.

Basically the scheduler would put the low prio task to sleep for an
adequate time slice to allow the other sibling to run at full speed for
a while.

I don't know the scheduler code well enough, but comments like this one
make me think that the change is still in place:

> 	/*
> 	 * If an SMT sibling task has been put to sleep for priority
> 	 * reasons reschedule the idle task to see if it can now run.
> 	 */
> 	if (rq->nr_running) {
> 		resched_task(rq->idle);
> 		ret = 1;
> 	}

If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.

--
Paulo Marques - http://www.grupopie.com

"The face of a child can say it all, especially the
mouth part of the face."

2007-02-27 19:05:51

by Lorenzo Allegrucci

Subject: Re: SMP performance degradation with sysbench

On Tue, 2007-02-27 at 09:02 -0500, Rik van Riel wrote:
> J.A. Magallón wrote:
> > On Mon, 26 Feb 2007 23:31:29 -0500, Rik van Riel <[email protected]> wrote:
> >
> >> Hiro Yoshioka wrote:
>
> >>> Another question. When the number of threads exceeds the number of
> >>> CPU cores, we may get a lot of idle time. Then a workaround of
> >>> MySQL is that do not creat threads which exceeds the number
> >>> of CPU cores. Is it right?
> >> Not really, that would make it impossible for MySQL to
> >> handle more simultaneous database queries than the system
> >> has CPUs.
> >>
> >
> > I don't know myqsl internals, but you assume one thread per query.
> > If its more like Apache, one long living thread for several connections ?
>
> Yes, they are longer lived client connections. One thread
> per connection, just like Apache.
>
> > Its the same to answer 4+4 queries than 8 at half the speed, isn't it ?
>
> That still doesn't fix the potential Linux problem that this
> benchmark identified.
>
> To clarify: I don't care as much about MySQL performance as
> I care about identifying and fixing this potential bug in
> Linux.

At http://people.freebsd.org/~kris/scaling/mysql.html Kris Kennaway
describes a patch for FreeBSD 7 that addresses poor scalability
of file descriptor locking, and says it is responsible for almost all
of the performance and scaling improvements.



2007-02-27 20:40:48

by Nish Aravamudan

Subject: Re: SMP performance degradation with sysbench

On 2/27/07, Paulo Marques <[email protected]> wrote:
> Rik van Riel wrote:
> > J.A. Magallón wrote:
> >>[...]
> >> Its the same to answer 4+4 queries than 8 at half the speed, isn't it ?
> >
> > That still doesn't fix the potential Linux problem that this
> > benchmark identified.
> >
> > To clarify: I don't care as much about MySQL performance as
> > I care about identifying and fixing this potential bug in
> > Linux.
>
> IIRC a long time ago there was a change in the scheduler to prevent a
> low prio task running on a sibling of a hyperthreaded processor to slow
> down a higher prio task on another sibling of the same processor.
>
> Basically the scheduler would put the low prio task to sleep during an
> adequate task slice to allow the other sibling to run at full speed for
> a while.
>
> I don't know the scheduler code well enough, but comments like this one
> make me think that the change is still in place:

<snip>

> If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.

To chime in here, I was attempting to reproduce this on an 8-way Xeon
box (4 dual-core). With SCHED_SMT and SCHED_MC on, I saw scaling issues
above 4 threads (4 threads was the peak), to the point where I
couldn't break 1000 transactions per second. Turning both off (with
2.6.20.1) gives much better performance through 16 threads. I am now
running the cases from 17 to 32 threads to see if I can reproduce the
problem at hand. I'll regenerate my data and post numbers soon.

I don't know if anyone else has those on in their kernel .config, but
I'd suggest turning them off, as Paulo said.
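
For anyone unsure whether their kernel was built with these options, here is a rough sketch of a check over the text of a kernel .config file. The helper name is made up for illustration, and reading the file itself (wherever your distribution keeps it) is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `option` (e.g. "CONFIG_SCHED_SMT") is set to "y" in the
 * given .config text, 0 otherwise. A line such as
 * "# CONFIG_SCHED_SMT is not set" does not count as enabled. */
int config_option_enabled(const char *config_text, const char *option)
{
    char needle[128];
    int n = snprintf(needle, sizeof(needle), "%s=y", option);
    const char *line = config_text;

    if (n < 0 || (size_t)n >= sizeof(needle))
        return 0;

    while (line && *line) {
        /* Match only at the start of a line, and only the whole line. */
        if (strncmp(line, needle, (size_t)n) == 0 &&
            (line[n] == '\n' || line[n] == '\0'))
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return 0;
}
```

Checking both "CONFIG_SCHED_SMT" and "CONFIG_SCHED_MC" this way tells you which of the two configurations discussed here you are actually measuring.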

Thanks,
Nish

2007-02-28 00:24:31

by Robert Hancock

Subject: Re: SMP performance degradation with sysbench

Hiro Yoshioka wrote:
> Howdy,
>
> MySQL 5.0.26 had some scalability issues and it solved since 5.0.32
> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> (written in Japanese but you may read the graph. We compared
> 5.0.24 vs 5.0.32)
>
> The following is oprofile data
> ==>
> cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt
> <==
> CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit
> mask of 0x00 (Unhalted core cycles) count 100000
> samples % app name symbol name
> 47097502 16.8391 libpthread-2.3.4.so pthread_mutex_trylock
> 19636300 7.0207 libpthread-2.3.4.so pthread_mutex_unlock
> 18600010 6.6502 mysqld rec_get_offsets_func
> 18121328 6.4790 mysqld btr_search_guess_on_hash
> 11453095 4.0949 mysqld row_search_for_mysql
>
> MySQL tries to get a mutex but it spends about 16.8% of CPU on 8 core
> machine.

Curious that it calls pthread_mutex_trylock (as opposed to
pthread_mutex_lock) so often. Maybe they're doing some kind of mutex
lock busy-looping?

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-02-28 01:27:11

by Nish Aravamudan

Subject: Re: SMP performance degradation with sysbench

On 2/26/07, Nick Piggin <[email protected]> wrote:
> Rik van Riel wrote:
> > Lorenzo Allegrucci wrote:
> >
> >> Hi lkml,
> >>
> >> according to the test below (sysbench) Linux seems to have scalability
> >> problems beyond 8 client threads:
> >> http://jeffr-tech.livejournal.com/6268.html#cutid1
> >> http://jeffr-tech.livejournal.com/5705.html
> >> Hardware is an 8-core amd64 system and jeffr seems willing to try more
> >> Linux versions on that machine.
> >> Anyway, is there anyone who can reproduce this?
> >
> >
> > I have reproduced it on a quad core test system.
> >
> > With 4 threads (on 4 cores) I get a high throughput, with
> > approximately 58% user time and 42% system time.
> >
> > With 8 threads (on 4 cores) I get way lower throughput,
> > with 37% user time, 29% system time 35% idle time!
> >
> > The maximum time taken per query also increases from
> > 0.0096s to 0.5273s. Ouch!
> >
> > I don't know if this is MySQL, glibc or Linux kernel,
> > but something strange is going on...
>
> Like you, I'm also seeing idle time start going up as threads increase.
>
> I initially thought this was a problem with the multiprocessor scheduler,
> because the pattern is exactly like some artificat in the load balancing.
>
> However, after looking at the stats, and testing a couple of things, I
> think it may not be after all.
>
> I've reproduced this on a 8-socket/16-way dual core Opteron. So far what
> I am seeing is that MySQL is having trouble putting enough load into the
> scheduler.

Here are some graphs from the 4-socket/8-way Xeon box (no SMT, no MC
in .config) I posted about earlier.

transactions.png resembles Nick's results pretty closely, in that a
drop-off occurs, at the same # of threads, too. That seems weird to
me, but I haven't thought about it too closely. Shouldn't Nick's be
dropping off closer to 16 threads (that would be 1 per core, then,
right?)

idle.png is the average % idle according to sar over each run from 1
to 32 threads. This appears to confirm what Rik was seeing.

Not sure if my data is hurting or helping, but this box remains
available for further tests.

Thanks,
Nish


Attachments:
(No filename) (2.09 kB)
transactions.png (3.75 kB)
idle.png (3.27 kB)

2007-02-28 01:30:28

by Hiro Yoshioka

Subject: Re: SMP performance degradation with sysbench

From: Robert Hancock <[email protected]>
Subject: Re: SMP performance degradation with sysbench
Date: Tue, 27 Feb 2007 18:20:25 -0600
Message-ID: <[email protected]>

> Hiro Yoshioka wrote:
> > Howdy,
> >
> > MySQL 5.0.26 had some scalability issues and it solved since 5.0.32
> > http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> > (written in Japanese but you may read the graph. We compared
> > 5.0.24 vs 5.0.32)
> >
> > The following is oprofile data
> > ==>
> > cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt
> > <==
> > CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit
> > mask of 0x00 (Unhalted core cycles) count 100000
> > samples % app name symbol name
> > 47097502 16.8391 libpthread-2.3.4.so pthread_mutex_trylock
> > 19636300 7.0207 libpthread-2.3.4.so pthread_mutex_unlock
> > 18600010 6.6502 mysqld rec_get_offsets_func
> > 18121328 6.4790 mysqld btr_search_guess_on_hash
> > 11453095 4.0949 mysqld row_search_for_mysql
> >
> > MySQL tries to get a mutex but it spends about 16.8% of CPU on 8 core
> > machine.
>
> Curious that it calls pthread_mutex_trylock (as opposed to
> pthread_mutex_lock) so often. Maybe they're doing some kind of mutex
> lock busy-looping?

Yes, it is.
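
To illustrate the general pattern, here is a minimal sketch of a spin-then-block acquire built on pthread_mutex_trylock. This is not the actual MySQL/InnoDB code; the function name and the spin limit are assumptions made for the example.

```c
#include <pthread.h>
#include <sched.h>

/* Try the lock up to max_spins times without blocking, then fall back
 * to a blocking pthread_mutex_lock(). Returns the number of trylock
 * calls made, or -1 if the blocking lock failed. */
int spin_then_lock(pthread_mutex_t *m, int max_spins)
{
    int attempts = 0;

    while (attempts < max_spins) {
        attempts++;
        if (pthread_mutex_trylock(m) == 0)
            return attempts;    /* acquired while spinning */
        sched_yield();          /* give the holder a chance to run */
    }
    if (pthread_mutex_lock(m) != 0)
        return -1;
    return attempts;
}
```

Under contention every failed pass through the loop is one more pthread_mutex_trylock call, which is how that symbol can end up at the top of a profile like the one above.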

Regards,
Hiro

2007-02-28 02:21:43

by Bill Davidsen

Subject: Re: SMP performance degradation with sysbench

Paulo Marques wrote:
> Rik van Riel wrote:
>> J.A. Magallón wrote:
>>> [...]
>>> Its the same to answer 4+4 queries than 8 at half the speed, isn't it ?
>>
>> That still doesn't fix the potential Linux problem that this
>> benchmark identified.
>>
>> To clarify: I don't care as much about MySQL performance as
>> I care about identifying and fixing this potential bug in
>> Linux.
>
> IIRC a long time ago there was a change in the scheduler to prevent a
> low prio task running on a sibling of a hyperthreaded processor to slow
> down a higher prio task on another sibling of the same processor.
>
> Basically the scheduler would put the low prio task to sleep during an
> adequate task slice to allow the other sibling to run at full speed for
> a while.
>
> I don't know the scheduler code well enough, but comments like this one
> make me think that the change is still in place:
>
>> /*
>> * If an SMT sibling task has been put to sleep for priority
>> * reasons reschedule the idle task to see if it can now run.
>> */
>> if (rq->nr_running) {
>> resched_task(rq->idle);
>> ret = 1;
>> }
>
> If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.
>
That may be the case, but in my opinion if this helps it doesn't "solve"
the problem, because the real problem is that a process which is not on
a HT is being treated as if it were.

Note that Intel does make multicore HT processors, and hopefully when
this code works as intended it will result in more total throughput. My
supposition is that it currently is NOT working as intended, since
disabling SMT scheduling is reported to help.

A test with MC on and SMT off would be informative for where to look next.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-02-28 02:23:50

by Nick Piggin

Subject: Re: SMP performance degradation with sysbench

Nish Aravamudan wrote:
> On 2/26/07, Nick Piggin <[email protected]> wrote:
>
>> Rik van Riel wrote:
>> > Lorenzo Allegrucci wrote:
>> >
>> >> Hi lkml,
>> >>
>> >> according to the test below (sysbench) Linux seems to have scalability
>> >> problems beyond 8 client threads:
>> >> http://jeffr-tech.livejournal.com/6268.html#cutid1
>> >> http://jeffr-tech.livejournal.com/5705.html
>> >> Hardware is an 8-core amd64 system and jeffr seems willing to try more
>> >> Linux versions on that machine.
>> >> Anyway, is there anyone who can reproduce this?
>> >
>> >
>> > I have reproduced it on a quad core test system.
>> >
>> > With 4 threads (on 4 cores) I get a high throughput, with
>> > approximately 58% user time and 42% system time.
>> >
>> > With 8 threads (on 4 cores) I get way lower throughput,
>> > with 37% user time, 29% system time 35% idle time!
>> >
>> > The maximum time taken per query also increases from
>> > 0.0096s to 0.5273s. Ouch!
>> >
>> > I don't know if this is MySQL, glibc or Linux kernel,
>> > but something strange is going on...
>>
>> Like you, I'm also seeing idle time start going up as threads increase.
>>
>> I initially thought this was a problem with the multiprocessor scheduler,
>> because the pattern is exactly like some artificat in the load balancing.
>>
>> However, after looking at the stats, and testing a couple of things, I
>> think it may not be after all.
>>
>> I've reproduced this on a 8-socket/16-way dual core Opteron. So far what
>> I am seeing is that MySQL is having trouble putting enough load into the
>> scheduler.
>
>
> Here are some graphs from the 4-socket/8-way Xeon box (no SMT, no MC
> in .config) I posted about earlier.
>
> transactions.png resembles Nick's results pretty closely, in that a
> drop-off occurs, at the same # of threads, too. That seems weird to
> me, but I haven't thought about it too closely. Shouldn't Nick's be
> dropping off closer to 16 threads (that would be 1 per core, then,
> right?)

I don't think it is exactly a matter of processes >= cores, but rather
just a general problem at higher concurrency.

--
SUSE Labs, Novell Inc.

2007-02-28 02:51:10

by Nish Aravamudan

Subject: Re: SMP performance degradation with sysbench

On 2/27/07, Nick Piggin <[email protected]> wrote:
> Nish Aravamudan wrote:
> > On 2/26/07, Nick Piggin <[email protected]> wrote:
> >
> >> Rik van Riel wrote:
> >> > Lorenzo Allegrucci wrote:
> >> >
> >> >> Hi lkml,
> >> >>
> >> >> according to the test below (sysbench) Linux seems to have scalability
> >> >> problems beyond 8 client threads:
> >> >> http://jeffr-tech.livejournal.com/6268.html#cutid1
> >> >> http://jeffr-tech.livejournal.com/5705.html
> >> >> Hardware is an 8-core amd64 system and jeffr seems willing to try more
> >> >> Linux versions on that machine.
> >> >> Anyway, is there anyone who can reproduce this?
> >> >
> >> >
> >> > I have reproduced it on a quad core test system.
> >> >
> >> > With 4 threads (on 4 cores) I get a high throughput, with
> >> > approximately 58% user time and 42% system time.
> >> >
> >> > With 8 threads (on 4 cores) I get way lower throughput,
> >> > with 37% user time, 29% system time 35% idle time!
> >> >
> >> > The maximum time taken per query also increases from
> >> > 0.0096s to 0.5273s. Ouch!
> >> >
> >> > I don't know if this is MySQL, glibc or Linux kernel,
> >> > but something strange is going on...
> >>
> >> Like you, I'm also seeing idle time start going up as threads increase.
> >>
> >> I initially thought this was a problem with the multiprocessor scheduler,
> >> because the pattern is exactly like some artifact in the load balancing.
> >>
> >> However, after looking at the stats, and testing a couple of things, I
> >> think it may not be after all.
> >>
> >> I've reproduced this on a 8-socket/16-way dual core Opteron. So far what
> >> I am seeing is that MySQL is having trouble putting enough load into the
> >> scheduler.
> >
> >
> > Here are some graphs from the 4-socket/8-way Xeon box (no SMT, no MC
> > in .config) I posted about earlier.
> >
> > transactions.png resembles Nick's results pretty closely, in that a
> > drop-off occurs, at the same # of threads, too. That seems weird to
> > me, but I haven't thought about it too closely. Shouldn't Nick's be
> > dropping off closer to 16 threads (that would be 1 per core, then,
> > right?)
>
> I don't think it is exactly a matter of processes >= cores, but rather
> just a general problem at higher concurrency.

Ok, thanks for the clarification.

-Nish

2007-02-28 02:52:33

by Nish Aravamudan

Subject: Re: SMP performance degradation with sysbench

On 2/27/07, Bill Davidsen <[email protected]> wrote:
> Paulo Marques wrote:
> > Rik van Riel wrote:
> >> J.A. Magallón wrote:
> >>> [...]
> >>> It's the same to answer 4+4 queries as 8 at half the speed, isn't it?
> >>
> >> That still doesn't fix the potential Linux problem that this
> >> benchmark identified.
> >>
> >> To clarify: I don't care as much about MySQL performance as
> >> I care about identifying and fixing this potential bug in
> >> Linux.
> >
> > IIRC a long time ago there was a change in the scheduler to prevent a
> > low prio task running on a sibling of a hyperthreaded processor to slow
> > down a higher prio task on another sibling of the same processor.
> >
> > Basically the scheduler would put the low prio task to sleep during an
> > adequate task slice to allow the other sibling to run at full speed for
> > a while.
> >
> > I don't know the scheduler code well enough, but comments like this one
> > make me think that the change is still in place:
> >
> >> /*
> >>  * If an SMT sibling task has been put to sleep for priority
> >>  * reasons reschedule the idle task to see if it can now run.
> >>  */
> >> if (rq->nr_running) {
> >>         resched_task(rq->idle);
> >>         ret = 1;
> >> }
> >
> > If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.
> >
> That may be the case, but in my opinion if this helps it doesn't "solve"
> the problem, because the real problem is that a process which is not on
> a HT is being treated as if it were.
>
> Note that Intel does make multicore HT processors, and hopefully when
> this code works as intended it will result in more total throughput. My
> supposition is that it currently is NOT working as intended, since
> disabling SMT scheduling is reported to help.

It does help, but we still drop off, clearly. Also, that's my
baseline, so I'm not able to reproduce the *sharp* dropoff from the
blog post yet.

> A test with MC on and SMT off would be informative for where to look next.

I'm rebooting my box with 2.6.20.1 and exactly this setup now.

Thanks,
Nish

2007-03-01 00:20:58

by Nish Aravamudan

Subject: Re: SMP performance degradation with sysbench

On 2/27/07, Nish Aravamudan <[email protected]> wrote:
> On 2/27/07, Bill Davidsen <[email protected]> wrote:
> > Paulo Marques wrote:
> > > Rik van Riel wrote:
> > >> J.A. Magallón wrote:
> > >>> [...]
> > >>> It's the same to answer 4+4 queries as 8 at half the speed, isn't it?
> > >>
> > >> That still doesn't fix the potential Linux problem that this
> > >> benchmark identified.
> > >>
> > >> To clarify: I don't care as much about MySQL performance as
> > >> I care about identifying and fixing this potential bug in
> > >> Linux.
> > >
> > > IIRC a long time ago there was a change in the scheduler to prevent a
> > > low prio task running on a sibling of a hyperthreaded processor to slow
> > > down a higher prio task on another sibling of the same processor.
> > >
> > > Basically the scheduler would put the low prio task to sleep during an
> > > adequate task slice to allow the other sibling to run at full speed for
> > > a while.
<snip>
> > > If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.
<snip>
> > Note that Intel does make multicore HT processors, and hopefully when
> > this code works as intended it will result in more total throughput. My
> > supposition is that it currently is NOT working as intended, since
> > disabling SMT scheduling is reported to help.
>
> It does help, but we still drop off, clearly. Also, that's my
> baseline, so I'm not able to reproduce the *sharp* dropoff from the
> blog post yet.
>
> > A test with MC on and SMT off would be informative for where to look next.
>
> I'm rebooting my box with 2.6.20.1 and exactly this setup now.

Here are the results:

idle.png: average % idle over 120s runs from 1 to 32 threads
transactions.png: TPS over 120s runs from 1 to 32 threads

Hope the data is useful. All I can conclude right now is that SMT
appears to help (contradicting what I said earlier), but that MC seems
to have no effect (or no substantial effect).

Thanks,
Nish
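
A minimal sketch of how an average idle percentage like the one plotted in idle.png could be sampled. The thread does not say which tool produced the graphs, so this assumes the aggregate "cpu" line of /proc/stat as the data source; the function names are hypothetical, and only the "idle" column is counted as idle (iowait is treated as busy time here, which is a choice, not something the thread specifies).

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(x) for x in fields[1:]]
    if len(ticks) < 4:
        raise ValueError("too few columns in 'cpu' line")
    # Column order: user nice system idle iowait irq softirq steal [guest guest_nice]
    idle = ticks[3]
    # Sum at most the first 8 columns: on newer kernels guest time is
    # already accounted inside user/nice, so adding it would double count.
    total = sum(ticks[:8])
    return idle, total


def idle_percent(before, after):
    """Average % idle between two samples of the aggregate 'cpu' line."""
    idle_before, total_before = parse_cpu_line(before)
    idle_after, total_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0
    return 100.0 * (idle_after - idle_before) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as f:
        return f.readline()
```

To cover one 120s run, call read_cpu_line() just before starting the benchmark and again right after it finishes, then pass both samples to idle_percent(). Repeating that for each thread count from 1 to 32 gives one point per run.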


Attachments:
(No filename) (1.92 kB)
idle.png (5.35 kB)
transactions.png (6.50 kB)