2005-11-20 09:25:06

by Marcel Zalmanovici

Subject: Inconsistent timing results of multithreaded program on an SMP machine.

Hi,

I am trying, as part of my thesis, to make some improvements to the Linux
scheduler.
To that end I've written a small multithreaded application and tested how
long, on average, it takes to complete.

The results were very surprising. I expected the completion time to
vary by 1 to 2 seconds at most.
Instead, what I got was an oscillation where the maximum time was twice
the minimum or more!! For a short test, results ranged from ~7 sec to ~16
sec, and the same happened in longer tests, where the min was ~1 min and the
max ~2:30 min.

Does anyone have any clue as to what might be happening ?
Is there anything I can do to get stable results ?

Here is a small test case program:

(See attached file: sched_test.c)

The test was always done on a pretty much empty machine. I've tried both
kernels 2.6.4-52 and 2.6.13.4, but the results were the same.

I'm working on a Xeon Intel machine, dual processor, hyperthreaded.
Below is the info from /proc/cpuinfo

(See attached file: cpuinfo.log)

The machine's configuration is as follows: (filtered with grep "=[y|m]" )

(See attached file: conf_filtered)

Please respond to either:
[email protected]
or
[email protected]
as I am not subscribed to the mailing list.

Thank you for your help,
Marcel


Attachments:
sched_test.c (1.44 kB)
cpuinfo.log (1.91 kB)
conf_filtered (37.42 kB)

2005-11-20 09:55:01

by Con Kolivas

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, 20 Nov 2005 20:27, Marcel Zalmanovici wrote:
> Hi,
>
> I am trying, as part of my thesis, to make some improvement to the linux
> scheduler.
> Therefore I've written a small multithreaded application and tested how
> long on average it takes for it to complete.
>
> The results were very surprising. I expected to see the completion time
> vary 1 to 2 seconds at most.
> Instead what I've got was an oscillation where the maximum time was twice
> and more than the minimum!! For a short test results ranged ~7sec to ~16
> sec, and the same happened to longer tests where the min was ~1min and the
> max ~2:30min.
>
> Does anyone have any clue as to what might happen ?
> Is there anything I can do to get stable results ?
>
> Here is a small test case program:
>
> (See attached file: sched_test.c)
>
> The test was always done on a pretty much empty machine. I've tried both
> kernel 2.6.4-52 and 2.6.13.4 but the results were the same.
>
> I'm working on a Xeon Intel machine, dual processor, hyperthreaded.
^^^^^^^^^^^^^^

I suspect what you're seeing is the random nature of how threads bind either to
a hyperthread sibling or a real core. If all your threads land on only one
physical cpu, across both hyperthread siblings, it will run much slower than if
half land on one physical cpu and half on the other physical cpu. To confirm
this, try setting cpu affinity to just one logical core of each physical cpu
(affinity for cpus 1 and 3 only, for example) or disabling hyperthreading in
the BIOS.
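For reference, pinning a thread to one logical CPU can be done with
sched_setaffinity(2); a minimal sketch (the helper name is hypothetical, and
which CPU numbers correspond to distinct physical packages is an assumption
about how siblings are enumerated on a given box):

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling task to a single logical CPU.
 * pid 0 means "the calling task"; returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling pin_to_cpu(1) in half the threads and pin_to_cpu(3) in the other half
(assuming 1 and 3 sit on different physical packages) takes the sibling
lottery out of the picture.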

Cheers,
Con

2005-11-20 10:16:19

by Marcel Zalmanovici

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.


Hi Con,

Thanks for the quick reply.

I suspected hyperthreading might cause some of these results; therefore, I
have also tried disabling hyperthreading altogether.
The results haven't changed.

I've also tried entering in single-user mode, no effect there either.

Marcel




Con Kolivas <[email protected]>
To: Marcel Zalmanovici/Haifa/IBM@IBMIL
cc: [email protected]
Date: 20/11/2005 11:54
Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, 20 Nov 2005 20:27, Marcel Zalmanovici wrote:
> Hi,
>
> I am trying, as part of my thesis, to make some improvement to the linux
> scheduler.
> Therefore I've written a small multithreaded application and tested how
> long on average it takes for it to complete.
>
> The results were very surprising. I expected to see the completion time
> vary 1 to 2 seconds at most.
> Instead what I've got was an oscillation where the maximum time was twice
> and more than the minimum!! For a short test results ranged ~7sec to ~16
> sec, and the same happened to longer tests where the min was ~1min and the
> max ~2:30min.
>
> Does anyone have any clue as to what might happen ?
> Is there anything I can do to get stable results ?
>
> Here is a small test case program:
>
> (See attached file: sched_test.c)
>
> The test was always done on a pretty much empty machine. I've tried both
> kernel 2.6.4-52 and 2.6.13.4 but the results were the same.
>
> I'm working on a Xeon Intel machine, dual processor, hyperthreaded.
^^^^^^^^^^^^^^

I suspect what you're seeing is the random nature of threads to bind either to
a hyperthread sibling or a real core. If all your threads land on only one
physical cpu and both hyperthread siblings it will run much slower than if
half land on one physical cpu and half on the other physical cpu. To confirm
this, try setting cpu affinity just for one logical core of each physical cpu
(like affinity for cpu 1 and 3 only for example) or disabling hyperthreading
in the bios.

Cheers,
Con


2005-11-20 10:28:25

by Con Kolivas

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, 20 Nov 2005 21:18, Marcel Zalmanovici wrote:
> Hi Con,
>
> Thanks for the quick reply.

No problems.

Please reply below what you are replying to so we can keep track of email
threads.

> Con Kolivas <[email protected]>
> To: Marcel Zalmanovici/Haifa/IBM@IBMIL
> cc: [email protected]
> Date: 20/11/2005 11:54
> Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.
>
> On Sun, 20 Nov 2005 20:27, Marcel Zalmanovici wrote:
> > Hi,
> >
> > I am trying, as part of my thesis, to make some improvement to the linux
> > scheduler.
> > Therefore I've written a small multithreaded application and tested how
> > long on average it takes for it to complete.
> >
> > The results were very surprising. I expected to see the completion time
> > vary 1 to 2 seconds at most.
> > Instead what I've got was an oscillation where the maximum time was twice
> > and more than the minimum!! For a short test results ranged ~7sec to ~16
> > sec, and the same happened to longer tests where the min was ~1min and the
> > max ~2:30min.
> >
> > Does anyone have any clue as to what might happen ?
> > Is there anything I can do to get stable results ?
> >
> > Here is a small test case program:
> >
> > (See attached file: sched_test.c)
> >
> > The test was always done on a pretty much empty machine. I've tried both
> > kernel 2.6.4-52 and 2.6.13.4 but the results were the same.
> >
> > I'm working on a Xeon Intel machine, dual processor, hyperthreaded.
>
> ^^^^^^^^^^^^^^
>
> I suspect what you're seeing is the random nature of threads to bind either to
> a hyperthread sibling or a real core. If all your threads land on only one
> physical cpu and both hyperthread siblings it will run much slower than if
> half land on one physical cpu and half on the other physical cpu. To confirm
> this, try setting cpu affinity just for one logical core of each physical cpu
> (like affinity for cpu 1 and 3 only for example) or disabling hyperthreading
> in the bios.

Ok I've had a look at the actual program now ;) Are you timing the time it
takes for everything to complete?

This part of your program:

    for (i = 0; i < 8; i++)
            pthread_join(tid[i], NULL);

cares about the order in which the threads finish. Do you think this might
be affecting your results?

Cheers,
Con

2005-11-20 10:39:19

by Con Kolivas

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, 20 Nov 2005 21:35, Muli Ben-Yehuda wrote:
> On Sun, Nov 20, 2005 at 09:28:13PM +1100, Con Kolivas wrote:
> > Ok I've had a look at the actual program now ;) Are you timing the time
> > it takes to completion of everything?
> >
> > This part of your program:
> > for (i= 0; i<8; i++)
> > pthread_join(tid[i], NULL);
> >
> > Cares about the order the threads finish. Do you think this might be
> > affecting your results?
>
> I don't see why it should matter. Depending on the order the threads
> finish, we will always wait in pthread_join until the last one
> finishes, and then do between 0 and 7 more pthread_joins that should
> return immediately (since the last one has already finished).

If it was instant it shouldn't matter. I'm aware of that in theory, but there
have certainly been reports of pthread_join taking quite a while, completing in
a sort of lazy/sloppy way. I don't know why this is the case, but I wondered
if it was showing up here.
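Whether the joins are in fact instantaneous is easy to measure directly; a
minimal sketch of timing pthread_join itself (the helper name is
hypothetical):

```c
#include <pthread.h>
#include <sys/time.h>

/* Return how many microseconds pthread_join() itself blocked for. */
long timed_join_us(pthread_t tid)
{
	struct timeval before, after;

	gettimeofday(&before, NULL);
	pthread_join(tid, NULL);
	gettimeofday(&after, NULL);
	return (after.tv_sec - before.tv_sec) * 1000000L
	     + (after.tv_usec - before.tv_usec);
}
```

If a thread has already exited, the join should come back in well under a
millisecond; a lazy join would show up here as a large value.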

Cheers,
Con

2005-11-20 10:36:50

by Muli Ben-Yehuda

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, Nov 20, 2005 at 09:28:13PM +1100, Con Kolivas wrote:

> Ok I've had a look at the actual program now ;) Are you timing the time it
> takes to completion of everything?
>
> This part of your program:
> for (i= 0; i<8; i++)
> pthread_join(tid[i], NULL);
>
> Cares about the order the threads finish. Do you think this might be affecting
> your results?

I don't see why it should matter. Depending on the order the threads
finish, we will always wait in pthread_join until the last one
finishes, and then do between 0 and 7 more pthread_joins that should
return immediately (since the last one has already finished).

Cheers,
Muli
--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/

2005-11-20 10:41:23

by Marcel Zalmanovici

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.


Con Kolivas <[email protected]>
To: Marcel Zalmanovici/Haifa/IBM@IBMIL
cc: [email protected]
Date: 20/11/2005 12:28
Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, 20 Nov 2005 21:18, Marcel Zalmanovici wrote:
> Hi Con,
>
> Thanks for the quick reply.

No problems.

Please reply below what you are replying to so we can keep track of email
threads.

> Con Kolivas <[email protected]>
> To: Marcel Zalmanovici/Haifa/IBM@IBMIL
> cc: [email protected]
> Date: 20/11/2005 11:54
> Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.
>
> On Sun, 20 Nov 2005 20:27, Marcel Zalmanovici wrote:
> > Hi,
> >
> > I am trying, as part of my thesis, to make some improvement to the linux
> > scheduler.
> > Therefore I've written a small multithreaded application and tested how
> > long on average it takes for it to complete.
> >
> > The results were very surprising. I expected to see the completion time
> > vary 1 to 2 seconds at most.
> > Instead what I've got was an oscillation where the maximum time was twice
> > and more than the minimum!! For a short test results ranged ~7sec to ~16
> > sec, and the same happened to longer tests where the min was ~1min and the
> > max ~2:30min.
> >
> > Does anyone have any clue as to what might happen ?
> > Is there anything I can do to get stable results ?
> >
> > Here is a small test case program:
> >
> > (See attached file: sched_test.c)
> >
> > The test was always done on a pretty much empty machine. I've tried both
> > kernel 2.6.4-52 and 2.6.13.4 but the results were the same.
> >
> > I'm working on a Xeon Intel machine, dual processor, hyperthreaded.
>
> ^^^^^^^^^^^^^^
>
> I suspect what you're seeing is the random nature of threads to bind either to
> a hyperthread sibling or a real core. If all your threads land on only one
> physical cpu and both hyperthread siblings it will run much slower than if
> half land on one physical cpu and half on the other physical cpu. To confirm
> this, try setting cpu affinity just for one logical core of each physical cpu
> (like affinity for cpu 1 and 3 only for example) or disabling hyperthreading
> in the bios.

> Ok I've had a look at the actual program now ;) Are you timing the time it
> takes to completion of everything?

Yes. I'm timing the whole program.

> This part of your program:
> for (i= 0; i<8; i++)
> pthread_join(tid[i], NULL);
>
> Cares about the order the threads finish. Do you think this might be
> affecting your results?

I highly doubt that. The worst case scenario would have thread 0 finish
last, right? But then all the other joins would simply return without
waiting.

Marcel


2005-11-20 10:44:04

by Muli Ben-Yehuda

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, Nov 20, 2005 at 09:39:00PM +1100, Con Kolivas wrote:
> On Sun, 20 Nov 2005 21:35, Muli Ben-Yehuda wrote:

> If it was instant it shouldn't matter. I'm aware of that in theory, but there
> have certainly been reports of pthread_join taking quite a while happening in
> a sort of lazy/sloppy way. I don't know why this is the case but I wondered
> if it was showing up here.

I see, I didn't realize pthread_join might not return immediately even
if the thread already finished (I would actually consider that a
bug...). Marcel, could you please check?

Cheers,
Muli
--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/

2005-11-20 10:48:26

by Marcel Zalmanovici

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.


Con Kolivas <[email protected]>
To: Muli Ben-Yehuda <[email protected]>
cc: Marcel Zalmanovici/Haifa/IBM@IBMIL, [email protected]
Date: 20/11/2005 12:39
Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

> On Sun, 20 Nov 2005 21:35, Muli Ben-Yehuda wrote:
> > On Sun, Nov 20, 2005 at 09:28:13PM +1100, Con Kolivas wrote:
> > > Ok I've had a look at the actual program now ;) Are you timing the time
> > > it takes to completion of everything?
> > >
> > > This part of your program:
> > > for (i= 0; i<8; i++)
> > > pthread_join(tid[i], NULL);
> > >
> > > Cares about the order the threads finish. Do you think this might be
> > > affecting your results?
> >
> > I don't see why it should matter. Depending on the order the threads
> > finish, we will always wait in pthread_join until the last one
> > finishes, and then do between 0 and 7 more pthread_joins that should
> > return immediately (since the last one has already finished).
>
> If it was instant it shouldn't matter. I'm aware of that in theory, but there
> have certainly been reports of pthread_join taking quite a while happening in
> a sort of lazy/sloppy way. I don't know why this is the case but I wondered
> if it was showing up here.

I've looked through the detailed run results and found this:

Thread DS 0, TID = 7646
Thread DS 0, TID = 7645
Thread DS 1, TID = 7648
Thread DS 1, TID = 7650
Thread DS 2, TID = 7651
Thread DS 2, TID = 7652
Thread DS 3, TID = 7653
Thread DS 3, TID = 7654
Main exit ...
real 23.25

As you can see, except for the first 2, all threads finished in the order
they were created. With the average at about 16.5 sec this is a high result,
and yet the thread order was almost ideal.

Marcel


2005-11-20 10:51:07

by Con Kolivas

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Sun, 20 Nov 2005 21:50, Marcel Zalmanovici wrote:
> On Sun, 20 Nov 2005 21:35, Muli Ben-Yehuda wrote:
> > On Sun, Nov 20, 2005 at 09:28:13PM +1100, Con Kolivas wrote:
> > > Ok I've had a look at the actual program now ;) Are you timing the time
> > > it takes to completion of everything?
> > >
> > > This part of your program:
> > > for (i= 0; i<8; i++)
> > > pthread_join(tid[i], NULL);
> > >
> > > Cares about the order the threads finish. Do you think this might be
> > > affecting your results?
> >
> > I don't see why it should matter. Depending on the order the threads
> > finish, we will always wait in pthread_join until the last one
> > finishes, and then do between 0 and 7 more pthread_joins that should
> > return immediately (since the last one has already finished).
>
> If it was instant it shouldn't matter. I'm aware of that in theory, but there
> have certainly been reports of pthread_join taking quite a while happening in
> a sort of lazy/sloppy way. I don't know why this is the case but I wondered
> if it was showing up here.
>
> I've looked through the detailed run results and found this:
>
> Thread DS 0, TID = 7646
> Thread DS 0, TID = 7645
> Thread DS 1, TID = 7648
> Thread DS 1, TID = 7650
> Thread DS 2, TID = 7651
> Thread DS 2, TID = 7652
> Thread DS 3, TID = 7653
> Thread DS 3, TID = 7654
> Main exit ...
> real 23.25
>
> As you can see except the first 2 threads all finished in order they were
> created. With the average being at about 16.5 this is a high result and
> thread order was almost ideal.

Ok, but how long does pthread_join actually take to complete?

Cheers,
Con

P.S. Lotus Notes is terrible for a mailer, isn't it?

2005-11-20 14:02:22

by Paul Jackson

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

Marcel wrote:
> Instead what I've got was an oscillation where the maximum time was twice
> and more than the minimum!! For a short test results ranged ~7sec to ~16 ...

Just for grins, try displaying which cpu each thread runs on. Display the
return from "latestcpu(getpid())" in the two per-thread printf's, to display
the thread's cpu at the beginning and end of each compute_thread(). Perhaps
you will notice that the per-thread cpu correlates with the test times.

/*
* int latestcpu(pid_t pid)
*
* Copyright (C) 2005 Silicon Graphics, Inc.
* This code is subject to the terms and conditions of the
* GNU General Public License.
*
* Return most recent CPU on which task pid executed.
*
* The last used CPU is visible for a given pid as field #39
* (starting with #1) in the file /proc/<pid>/stat. Currently
* this file has 41 fields, in which case this is the 3rd to
* the last field.
*/

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <limits.h>
#include <linux/limits.h>

int latestcpu(pid_t pid)
{
	char buf[PATH_MAX];
	int fd;
	ssize_t n = 0;
	int cpu = -1;

	snprintf(buf, sizeof(buf), "/proc/%d/stat", pid);
	fd = open(buf, O_RDONLY);
	if (fd >= 0) {
		n = read(fd, buf, sizeof(buf) - 1);
		close(fd);
	}
	if (n < 0)
		n = 0;
	buf[n] = 0;	/* NUL-terminate what we read before parsing */
	sscanf(buf, "%*u %*s %*s %*u %*u %*u %*u %*u %*u %*u "
		"%*u %*u %*u %*u %*u %*u %*u %*u %*u %*u "
		"%*u %*u %*u %*u %*u %*u %*u %*u %*u %*u "
		"%*u %*u %*u %*u %*u %*u %*u %*u %u", /* 39th field */
		&cpu);
	return cpu;
}
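The same field can be checked from the shell; a hedged one-liner equivalent
(it assumes the task's comm field contains no spaces, which would throw off
awk's field count):

```shell
# Field 39 of /proc/<pid>/stat is the CPU the task last ran on;
# here we ask about the current shell itself.
awk '{ print $39 }' /proc/$$/stat
```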

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-11-24 09:41:18

by Con Kolivas

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Thu, 24 Nov 2005 19:33, Marcel Zalmanovici wrote:
> Hi Con,
>
> I've tried the pthread_join theory.
> pthread_join completes very fast; no evidence of it being the perpetrator
> here.
>
> I've run your ps -... idea in a different term window every ms while the
> test was running. It created a large file so I won't mail it to you, but
> both Muli and I observed that once a thread ends it quickly disappears
> from the ps list.
>
> I've also run Muli's idea and added gettimeofday calls.
> Here's the altered code and output:
>
> (See attached file: idle.log)(See attached file: sched_test.c)
>
> As you can see, if a thread already finished pthread_join returns in a
> split second.
>
> Any other ideas are welcome :-)

Profile the actual application?

> P.S. - I agree that Lotus isn't ideal for this kind of conversation, but
> that's what IBM is using.

It was tongue in cheek ;) Well that should still not stop you from replying
below the original thread.

Cheers,
Con
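The ps sampling described in the quoted message can be sketched as a shell
loop; the binary name sched_test, the column set, and the 10 ms interval are
all assumptions:

```shell
# Sample per-thread CPU placement while the test runs.  'psr' is the
# processor each thread last ran on; '-L' lists threads individually.
while pgrep -x sched_test > /dev/null; do
	ps -C sched_test -Lo pid,tid,psr,etime,comm
	sleep 0.01
done
```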

2005-11-24 09:48:16

by Marcel Zalmanovici

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

Hi Paul,

I've tried your suggestion.
I first tried adding latestcpu( getpid() ), as you said, but as you
can see from the log this did not yield anything interesting.
I then tried latestcpu( gettid() ). The results there surprised me
a bit:

I would've expected to see low times whenever the 2 threads using a DS are
on the same CPU, and high times when they are placed on different CPUs.
Lower times would also be expected when the threads are divided 4 to CPU 0
and 4 to CPU 1 (I've disabled hyperthreading for now).
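A caveat for anyone repeating this: glibc of this era ships no gettid()
wrapper, so the thread id has to come from a raw syscall; a minimal sketch
(the wrapper name is hypothetical):

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* No glibc wrapper for gettid() here, so go through syscall(2) directly. */
pid_t my_gettid(void)
{
	return (pid_t) syscall(SYS_gettid);
}
```

In a single-threaded process this returns the same value as getpid(); in
each worker thread it yields the tid needed for a /proc/<tid>/stat lookup.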

If you look at the run that took 14.78 sec you'll see that the threads of
only 1 out of 4 DSs are on the same CPU, but the result is low. Conversely,
look at the 19.55 sec results; the same thing happens there.
Pretty much the same can be seen looking at the 14.62 and 19.57 results,
except that there the threads of 2 out of 4 DSs match.

As for the 2nd presumption, look at times 14.64 and 18.90 - the threads are
divided 4 and 4 on both occasions.

Here are the log files:

(See attached file: latest.tid.log)(See attached file: latest.pid.log)

Any other ideas are highly welcomed.

Marcel




Paul Jackson <[email protected]>
To: Marcel Zalmanovici/Haifa/IBM@IBMIL
cc: [email protected]
Date: 20/11/2005 16:02
Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

Marcel wrote:
> Instead what I've got was an oscillation where the maximum time was twice
> and more than the minimum!! For a short test results ranged ~7sec to ~16 ...

Just for grins, try displaying which cpu each thread runs on. Display the
return from "latestcpu(getpid())" in the two per-thread printf's, to display
the thread's cpu at the beginning and end of each compute_thread(). Perhaps
you will notice that the per-thread cpu correlates with the test times.

/*
* int latestcpu(pid_t pid)
*
* Copyright (C) 2005 Silicon Graphics, Inc.
* This code is subject to the terms and conditions of the
* GNU General Public License.
*
* Return most recent CPU on which task pid executed.
*
* The last used CPU is visible for a given pid as field #39
* (starting with #1) in the file /proc/<pid>/stat. Currently
* this file has 41 fields, in which case this is the 3rd to
* the last field.
*/

#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <limits.h>
#include <linux/limits.h>

int latestcpu(pid_t pid)
{
char buf[PATH_MAX];
int fd;
int cpu = -1;

snprintf(buf, sizeof(buf), "/proc/%d/stat", pid);
fd = open(buf, O_RDONLY);
buf[0] = 0; /* in case fd < 0 and read() is no-op */
read(fd, buf, sizeof(buf));
close(fd);
sscanf(buf, "%*u %*s %*s %*u %*u %*u %*u %*u %*u %*u "
"%*u %*u %*u %*u %*u %*u %*u %*u %*u %*u "
"%*u %*u %*u %*u %*u %*u %*u %*u %*u %*u "
"%*u %*u %*u %*u %*u %*u %*u %*u %u", /* 39th field */
&cpu);
return cpu;
}

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401


Attachments:
latest.tid.log (17.35 kB)
latest.pid.log (17.35 kB)

2005-11-24 09:58:07

by Marcel Zalmanovici

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.


Con Kolivas <[email protected]>
To: Marcel Zalmanovici/Haifa/IBM@IBMIL
cc: [email protected], Muli Ben-Yehuda <[email protected]>
Date: 24/11/2005 11:40
Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

On Thu, 24 Nov 2005 19:33, Marcel Zalmanovici wrote:
> Hi Con,
>
> I've tried the pthread_join theory.
> pthread_join completes very fast; no evidence of it being the perpetrator
> here.
>
> I've run your ps -... idea in a different term window every ms while the
> test was running. It created a large file so I won't mail it to you, but
> both Muli and I observed that once a thread ends it quickly disappears
> from the ps list.
>
> I've also run Muli's idea and added gettimeofday calls.
> Here's the altered code and output:
>
> (See attached file: idle.log)(See attached file: sched_test.c)
>
> As you can see, if a thread already finished pthread_join returns in a
> split second.
>
> Any other ideas are welcome :-)

> Profile the actual application?

Well, technically, this IS the actual application. Ultimately, I would like
to make some changes to the kernel in order to reduce cache misses and
maybe improve run time as a consequence.
With results spread out as much as mine are, it would be very difficult, if
not impossible, for me to estimate whether my algorithm produced any real
improvement.

I have profiled (time ticks and L2 cache misses) the example I've posted
and a few other variations on it, but haven't seen anything out of
the ordinary.
Around 98-99% of the time and of the misses are in the inner loop of the
example.

Is there anything else I can check with the profiler that may help me
understand the phenomenon ?

> P.S. - I agree that Lotus isn't ideal for this kind of conversation, but
> that's what IBM is using.

> It was tongue in cheek ;) Well that should still not stop you from replying
> below the original thread.

Marcel


2005-11-24 12:43:40

by Paul Jackson

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

Marcel,

I condensed the results you attached to another reply earlier
today on this query into a little table, showing for each of
8 threads, in order, which of the two CPUs 0 or 1 they finished
on, and the real (elapsed) time:

0 0 0 1 0 1 0 0 14.49
1 1 0 0 1 1 1 1 15.18
0 1 0 1 0 0 1 1 14.64
1 1 0 0 0 1 0 0 14.65
0 0 0 1 1 1 1 1 14.61
1 1 0 0 1 1 1 1 18.89
0 1 0 0 1 1 0 1 14.62
1 1 0 1 0 0 1 1 14.51
0 0 1 1 0 0 0 1 14.54
0 1 0 1 1 1 1 1 14.73
0 1 0 1 0 1 1 1 14.78
0 1 1 1 0 0 0 1 18.90
0 0 0 1 1 1 0 1 14.57
1 1 0 1 0 1 0 0 17.07
0 1 1 1 0 1 1 1 14.56
0 1 0 0 1 1 0 0 14.49
0 1 1 1 0 1 0 0 17.80
0 1 0 1 1 1 1 1 14.62
0 1 0 1 0 1 0 0 19.55
0 1 0 1 0 0 1 1 19.57

I notice that the deviation of the faster runs is very low,
with the best 13 of 20 runs all finishing in times between
14.49 and 14.78 seconds, but the slower runs have a higher
deviation, with the worst 7 of 20 runs finishing in times
between 15.18 and 19.57 seconds, a wider range of times.

That is, the best 13 runs differ in time by only 0.29 secs,
but the worst 7 runs differ in time by 4.39 seconds, a much
wider range.

I agree with you that I don't see a pattern in which CPU
the threads finished on or any relation between that and
the real times.

However this skewed distribution of times does suggest that
something is intruding - something that only runs perhaps
a third of the time and makes things worse.

What I might suggest next is that you look for nice ways
that fit on a page or less and condense out all nonessential
detail to present these timing results. People have a
tendency to dump a big attachment of raw output data onto
email lists, which almost no one ever reads closely,
especially if it is in some homebrew format peculiar to
that particular inquiry. Time spent cleaning up the presentation
of data is often well spent, because then others might
actually look at the data and recognize a pattern that
they have something useful to say about.

Then after you have it collapsed to a nice table format,
go even the next step and describe in words any patterns
that are apparent, which will catch another chunk of
potential readers, who will skim over a data table without
really thinking. Then fine tune that wording, until it
would be understood by someone, the first time they heard
it, standing at the coffee urn, discussing sports. Oftentimes,
by the time you get that far, you will have an
'aha' moment, leading to further experiments and insights,
before you even get a chance to ask someone else.

Question - how do you know it's the multithreading that
is causing the variance? Try this with single threads,
just one per cpu, with or without hyperthreading perhaps.
Never assume that you know the cause of something until
you have bracketed the tests with demonstrations that,
all else being equal (or as close as can be), just adding
or removing the one suspect element causes the observed
effect to come and go.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-12-04 15:26:10

by Marcel Zalmanovici

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.


Hi Paul,

I've tried (again) running on a single CPU just to reassure myself that
everything is stable there. The differences were negligible.
Also, I've run the test on a Regatta machine with 2 hyperthreaded CPUs
(just like my Intel Xeon). The results surprised me somewhat, since they
were VERY stable; less than a second between the shortest and longest runs.
In addition, other users were connected to the machine at the
time.

I've converted the *writes* to the array into *reads* from it and ran this
test on my machine.
The results were even more stable than I could possibly hope for; maybe 0.2
sec between extremes.
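One pattern consistent with writes oscillating while reads stay stable is
cache-line contention (false sharing) between the two threads that write the
same DS. A hypothetical experiment is to pad each per-thread slot out to a
full line and see whether the variance disappears; the struct layout below is
a sketch, and 128 bytes is an assumed line size for this class of Xeon:

```c
#include <stdint.h>

/* Give each writer its own cache line so two threads never invalidate
 * each other's line on every store.  128 bytes is an assumption
 * (Pentium 4-era Xeons fetch 128-byte sectors); adjust to the machine. */
#define LINE_SIZE 128

struct padded_slot {
	long value;
	char pad[LINE_SIZE - sizeof(long)];
} __attribute__((aligned(LINE_SIZE)));

struct padded_slot slots[8];
```

If the padded version runs with low variance and the unpadded one oscillates,
that points at line bouncing rather than the scheduler.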

I think the conclusion from this test is that the write-back algorithm
might be responsible for the oscillating results.
Unfortunately, after looking at the hardware configuration I'm more
puzzled; it supports both write-back and write-through, with the former
set as the default.
Write-back is supposed to yield better results, since data is written back
only when needed. Not seeing a correlation between run times and the CPUs on
which the threads run puzzles me even more now.

Do any ideas come to mind ? Because I'm running out of them.

Marcel




Paul Jackson <[email protected]>
To: Marcel Zalmanovici/Haifa/IBM@IBMIL
cc: [email protected], [email protected], [email protected]
Date: 24/11/2005 14:43
Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

Marcel,

I condensed the results you attached to another reply earlier
today on this query into a little table, showing for each of
8 threads, in order, which of the two CPUs 0 or 1 they finished
on, and the real (elapsed) time:

0 0 0 1 0 1 0 0 14.49
1 1 0 0 1 1 1 1 15.18
0 1 0 1 0 0 1 1 14.64
1 1 0 0 0 1 0 0 14.65
0 0 0 1 1 1 1 1 14.61
1 1 0 0 1 1 1 1 18.89
0 1 0 0 1 1 0 1 14.62
1 1 0 1 0 0 1 1 14.51
0 0 1 1 0 0 0 1 14.54
0 1 0 1 1 1 1 1 14.73
0 1 0 1 0 1 1 1 14.78
0 1 1 1 0 0 0 1 18.90
0 0 0 1 1 1 0 1 14.57
1 1 0 1 0 1 0 0 17.07
0 1 1 1 0 1 1 1 14.56
0 1 0 0 1 1 0 0 14.49
0 1 1 1 0 1 0 0 17.80
0 1 0 1 1 1 1 1 14.62
0 1 0 1 0 1 0 0 19.55
0 1 0 1 0 0 1 1 19.57

I notice that the deviation of the faster runs is very low,
with the best 13 of 20 runs all finishing in times between
14.49 and 14.78 seconds, but the slower runs have a higher
deviation, with the worst 7 of 20 runs finishing in times
between 15.18 and 19.57 seconds, a wider range of times.

That is, the best 13 runs differ in time by only 0.29 secs,
but the worst 7 runs differ in time by 4.39 seconds, a much
wider range.

I agree with you that I don't see a pattern in which CPU
the threads finished on or any relation between that and
the real times.

However this skewed distribution of times does suggest that
something is intruding - something that only runs perhaps
a third of the time and makes things worse.

What I might suggest next is that you look for nice ways
that fit on a page or less and condense out all nonessential
detail to present these timing results. People have a
tendency to dump a big attachment of raw output data onto
email lists, which almost no one ever reads closely,
especially if it is in some homebrew format peculiar to
that particular inquiry. Time spent cleaning up the presentation
of data is often well spent, because then others might
actually look at the data and recognize a pattern that
they have something useful to say about.

Then after you have it collapsed to a nice table format,
go even the next step and describe in words any patterns
that are apparent, which will catch another chunk of
potential readers, who will skim over a data table without
really thinking. Then fine tune that wording, until it
would be understood by someone, the first time they heard
it, standing at the coffee urn, discussing sports. Oftentimes,
by the time you get that far, you will have an
'aha' moment, leading to further experiments and insights,
before you even get a chance to ask someone else.

Question - how do you know it's the multithreading that
is causing the variance? Try this with single threads,
just one per cpu, with or without hyperthreading perhaps.
Never assume that you know the cause of something until
you have bracketed the tests with demonstrations that,
all else being equal (or as close as can be), just adding
or removing the one suspect element causes the observed
effect to come and go.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401


2005-12-04 19:55:04

by Paul Jackson

Subject: Re: Inconsistent timing results of multithreaded program on an SMP machine.

Marcel wrote:
> I think that the conclusion from this test is that the write-back algorithm
> might be responsible for the oscillating results.

Well ... if you suspect your storage/hardware, then try a different
file system, such as network files or /dev/ramdisk.

It's a few seconds' worth of difference you're dealing with, so try
playing around with printf's in the user code and printk's in the
kernel at key points in the sequence of events, so you can see what
happens when.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401