Hi all,
I've run into a weird phenomenon I was hoping someone could help me pin down.
I've got a multithreaded program which reads data off of 8 input
disks, and does some processing on it. I have 8 Reader threads, each
of which reads data off of one of the eight input disks. That data
gets passed to other threads, which do some processing (this is an 8
core machine). If I do no or minimal processing in those other
threads, the read() calls go at the speed of the disks (~100 MBps). I
measure that speed by taking a timestamp before and after each read()
syscall, and dividing the read size (5 MB) by the elapsed time. All
seems good.
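
For anyone who wants to reproduce the measurement, here is a minimal
sketch of the timing described above. It is not the original program:
the function name measure_read_mbps and the looping on short reads are
assumptions made for illustration. Note that a read served from the page
cache will report far more than disk speed.

```python
import os
import time

def measure_read_mbps(fd, size=5 * 1024 * 1024):
    """Read up to `size` bytes from fd and return (bytes_read, MB/s)."""
    remaining = size
    start = time.perf_counter()            # timestamp before the read
    while remaining > 0:
        chunk = os.read(fd, remaining)     # may return fewer bytes than asked
        if not chunk:                      # EOF
            break
        remaining -= len(chunk)
    elapsed = time.perf_counter() - start  # timestamp after the read
    nbytes = size - remaining
    # Throughput is bytes read divided by elapsed seconds.
    mbps = (nbytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return nbytes, mbps
```

The elapsed time covers everything between the two timestamps, including
any time the thread spends blocked or descheduled, so a drop in this
number does not by itself show that the disk got slower.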
However, if I start doing more computation on those other threads, the
read() syscalls take longer to read the same amount of data,
eventually slowing down to 50 MBps (50% slower). I've used
setaffinity() to isolate the Reader threads to one set of cores, and
the compute threads to a different set of cores, and so I don't think
it is CPU/scheduling interference.
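
A rough sketch of that kind of pinning, assuming Linux and using
Python's os.sched_setaffinity as a stand-in for whatever affinity call
the real program uses. The helper names pin_current_thread and
run_pinned are hypothetical.

```python
import os
import threading

def pin_current_thread(cpus):
    """Restrict the calling thread to `cpus`; return the resulting mask."""
    # On Linux, pid 0 means "the calling thread" for this syscall.
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)

def run_pinned(cpus, fn, *args):
    """Run fn(*args) in a new thread pinned to `cpus`; return its results."""
    result = {}

    def body():
        result["affinity"] = pin_current_thread(cpus)
        result["value"] = fn(*args)

    t = threading.Thread(target=body)
    t.start()
    t.join()
    return result
```

Reading the mask back with os.sched_getaffinity is a cheap way to confirm
the pinning actually took effect for each thread.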
Thoughts? Has anyone run into this before?
Also, if you could CC me directly on any responses I would appreciate it.
Thanks, George
On 1 May 2012 16:03, George Porter <[email protected]> wrote:
> Hi all,
>
> I've run into a weird phenomenon I was hoping someone could help me pin down.
>
> I've got a multithreaded program which reads data off of 8 input
> disks, and does some processing on it. I have 8 Reader threads, each
> of which reads data off of one of the eight input disks. That data
> gets passed to other threads, which do some processing (this is an 8
> core machine). If I do no or minimal processing in those other
> threads, the read() calls go at the speed of the disks (~100 MBps). I
> measure that speed by taking a timestamp before the read syscall, then
> afterwards, and dividing that into the read size (5MB). All seems
> good.
>
> However, if I start doing more computation on those other threads, the
> read() syscalls take longer to read the same amount of data,
> eventually slowing down to 50 MBps (50% slower). I've used
> setaffinity() to isolate the Reader threads to one set of cores, and
> the compute threads to a different set of cores, and so I don't think
> it is CPU/scheduling interference.
>
> Thoughts? Has anyone run into this before?
It could be memory or coherency traffic that's causing a slowdown across
the system?
Could a dynamic core frequency / thermal throttling explain any of the
slowdown?
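
One way to check the frequency side from userspace, sketched under the
assumption that the kernel exposes the cpufreq sysfs interface (it may
not if the BIOS manages frequency on its own). The function name
cpufreq_snapshot is made up for this example, and it does not detect
thermal throttling.

```python
import glob
import os

def cpufreq_snapshot(root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_kHz)} for CPUs exposing cpufreq."""
    snapshot = {}
    for cpu_dir in glob.glob(os.path.join(root, "cpu[0-9]*")):
        freq_dir = os.path.join(cpu_dir, "cpufreq")
        try:
            with open(os.path.join(freq_dir, "scaling_governor")) as f:
                governor = f.read().strip()
            with open(os.path.join(freq_dir, "scaling_cur_freq")) as f:
                khz = int(f.read().strip())
        except (OSError, ValueError):
            continue  # no cpufreq driver, or an unreadable entry
        snapshot[os.path.basename(cpu_dir)] = (governor, khz)
    return snapshot
```

Taking a snapshot while the workload is idle and another while the
compute threads are busy would show whether the reader cores change
frequency between the two cases.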
On Mon, Apr 30, 2012 at 11:40 PM, Nick Piggin <[email protected]> wrote:
> It could be memory or coherency traffic that's causing a slowdown across
> the system?
>
> Could a dynamic core frequency / thermal throttling explain any of the
> slowdown?
Thanks for these suggestions. I checked the BIOS, and indeed it was
set to "Hybrid power/performance." I just reset the server to "Static
high performance" which I believe turns off processor frequency power
optimizations. I'll re-run our workload and see if that has an
effect.
Thanks, George
On 05/01/2012 12:03 AM, George Porter wrote:
> However, if I start doing more computation on those other threads, the
> read() syscalls take longer to read the same amount of data,
> eventually slowing down to 50 MBps (50% slower). I've used
> setaffinity() to isolate the Reader threads to one set of cores, and
> the compute threads to a different set of cores, and so I don't think
> it is CPU/scheduling interference.
>
> Thoughts? Has anyone run into this before?
If you're using hyperthreading you may want to try it with either
putting the computation threads on the siblings of the cpus for the
reader threads (to share cache) or else not on the siblings of the cpus
for the reader threads (to minimize contention of cpu resources).
Similarly, you may want to play with whether or not the threads are on
the same or different sockets.
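
To find out which logical cpus are hyperthread siblings and which socket
each one sits on, something like the following sketch should work on
Linux. It reads the sysfs topology files; the helper names
parse_cpu_list and cpu_topology are only illustrative.

```python
import os

def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-3,8-11" into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def cpu_topology(cpu, root="/sys/devices/system/cpu"):
    """Return (socket_id, hyperthread_siblings) for one logical CPU."""
    topo = os.path.join(root, f"cpu{cpu}", "topology")
    with open(os.path.join(topo, "physical_package_id")) as f:
        socket_id = int(f.read().strip())
    with open(os.path.join(topo, "thread_siblings_list")) as f:
        siblings = parse_cpu_list(f.read())
    return socket_id, siblings
```

With that mapping in hand, the reader and compute affinity masks can be
built so that they deliberately share, or deliberately avoid, sibling
cpus and sockets.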
Chris
--
Chris Friesen
Software Developer
GENBAND
[email protected]
http://www.genband.com
Thanks for the response--we did play around with core affinity, and it
does make a difference for sure. The major thing was turning off HP's
power management stuff, and putting the BIOS into high-performance
mode. That helped a lot.
Thanks, George
On Wed, May 2, 2012 at 11:39 AM, Chris Friesen
<[email protected]> wrote:
> On 05/01/2012 12:03 AM, George Porter wrote:
>
>> However, if I start doing more computation on those other threads, the
>> read() syscalls take longer to read the same amount of data,
>> eventually slowing down to 50 MBps (50% slower). I've used
>> setaffinity() to isolate the Reader threads to one set of cores, and
>> the compute threads to a different set of cores, and so I don't think
>> it is CPU/scheduling interference.
>>
>> Thoughts? Has anyone run into this before?
>
>
> If you're using hyperthreading you may want to try it with either putting
> the computation threads on the siblings of the cpus for the reader threads
> (to share cache) or else not on the siblings of the cpus for the reader
> threads (to minimize contention of cpu resources).
>
> Similarly, you may want to play with whether or not the threads are on the
> same or different sockets.
>
> Chris
>
> --
> Chris Friesen
> Software Developer
> GENBAND
> [email protected]
> http://www.genband.com