2013-04-15 23:19:37

by Dave Hansen

Subject: Re: [RESEND] IOZone with transparent huge page cache

On 04/15/2013 11:17 AM, Kirill A. Shutemov wrote:
> I run iozone using mmap files (-B) with different number of threads.
> The test machine is 4s Westmere - 4x10 cores + HT.

How did you run this, exactly? Which iozone arguments? It was run on
ramfs, since that's the only thing that transparent huge page cache
supports right now?

> ** Initial writers **
> threads: 1 2 4 8 16 32 64 128 256
> baseline: 1103360 912585 500065 260503 128918 62039 34799 18718 9376
> patched: 2127476 2155029 2345079 1942158 1127109 571899 127090 52939 25950
> speed-up(times): 1.93 2.36 4.69 7.46 8.74 9.22 3.65 2.83 2.77

I'm a _bit_ surprised that iozone scales _that_ badly especially while
threads<nr_cpus. Is this normal for iozone? What are the units and
metric there, btw?

> Minimal speed up is in 1-thread reverse readers - 23%.
> Maximal is 9.2 times in 32-thread initial writers. It's probably due
> batched radix tree insert - we insert 512 pages a time. It reduces
> mapping->tree_lock contention.

It might actually be interesting to see this at 10, 20, 40, 80, etc...
since that'll actually match iozone threads to CPU cores on your
particular system.


2013-04-16 05:55:38

by Kirill A. Shutemov

Subject: Re: [RESEND] IOZone with transparent huge page cache

Dave Hansen wrote:
> On 04/15/2013 11:17 AM, Kirill A. Shutemov wrote:
> > I run iozone using mmap files (-B) with different number of threads.
> > The test machine is 4s Westmere - 4x10 cores + HT.
>
> How did you run this, exactly? Which iozone arguments?

iozone -B -s 21822226/$threads -t $threads -r 4 -i 0 -i 1 -i 2 -i 3

It's a slightly modified iozone test from mmtests.

> It was run on ramfs, since that's the only thing that transparent huge page
> cache supports right now?

Correct.

> > ** Initial writers **
> > threads: 1 2 4 8 16 32 64 128 256
> > baseline: 1103360 912585 500065 260503 128918 62039 34799 18718 9376
> > patched: 2127476 2155029 2345079 1942158 1127109 571899 127090 52939 25950
> > speed-up(times): 1.93 2.36 4.69 7.46 8.74 9.22 3.65 2.83 2.77
>
> I'm a _bit_ surprised that iozone scales _that_ badly especially while
> threads<nr_cpus. Is this normal for iozone? What are the units and
> metric there, btw?

The units are KB/sec per process (I used 'Avg throughput per process' from
the iozone report), so it does not scale that badly.
I will use total children throughput next time to avoid confusion.
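
To make the per-process versus total distinction concrete, here is a minimal
Python sketch that multiplies the per-process figures quoted above by the
thread count. It assumes 'Avg throughput per process' is a plain mean across
processes, so the product only approximates the total that iozone itself would
report; the function name is made up for illustration and is not part of
iozone.

```python
# Per-process KB/sec figures copied from the "Initial writers" table above.
THREADS = [1, 2, 4, 8, 16, 32, 64, 128, 256]
BASELINE = [1103360, 912585, 500065, 260503, 128918, 62039, 34799, 18718, 9376]
PATCHED = [2127476, 2155029, 2345079, 1942158, 1127109, 571899, 127090, 52939, 25950]


def aggregate_throughput(per_process, threads):
    """Approximate total KB/sec as (average per process) * (number of processes).

    Assumption: the per-process figure is an arithmetic mean.
    """
    return [kb * n for kb, n in zip(per_process, threads)]


if __name__ == "__main__":
    base_total = aggregate_throughput(BASELINE, THREADS)
    patched_total = aggregate_throughput(PATCHED, THREADS)
    for n, b, p in zip(THREADS, base_total, patched_total):
        print(f"{n:>4} threads: baseline {b:>9} KB/sec, patched {p:>9} KB/sec")
```

Read this way, the baseline total stays between roughly 1.1 and 2.4 million
KB/sec across all thread counts rather than collapsing, while the patched
total peaks at about 18.3 million KB/sec at 32 threads.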

> > Minimal speed up is in 1-thread reverse readers - 23%.
> > Maximal is 9.2 times in 32-thread initial writers. It's probably due
> > batched radix tree insert - we insert 512 pages a time. It reduces
> > mapping->tree_lock contention.
>
> It might actually be interesting to see this at 10, 20, 40, 80, etc...
> since that'll actually match iozone threads to CPU cores on your
> particular system.

Okay.
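
A rough sketch of what such a rerun could look like, reusing only the flags
from the command above. It builds the argument lists for 10, 20, 40 and 80
threads and prints them rather than running anything. Treating
"21822226/$threads" as integer division is an assumption, and the helper name
is hypothetical.

```python
TOTAL_SIZE = 21822226            # total size taken from the command above
THREAD_COUNTS = [10, 20, 40, 80]  # multiples of the 10 cores per socket


def iozone_command(threads, total_size=TOTAL_SIZE):
    """Build an iozone argument list with the same flags as the original run.

    Assumption: the per-thread size is the total divided by the thread
    count, rounded down.
    """
    per_thread = total_size // threads
    return [
        "iozone", "-B",
        "-s", str(per_thread),
        "-t", str(threads),
        "-r", "4",
        "-i", "0", "-i", "1", "-i", "2", "-i", "3",
    ]


if __name__ == "__main__":
    for n in THREAD_COUNTS:
        print(" ".join(iozone_command(n)))
```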

--
Kirill A. Shutemov

2013-04-16 06:10:13

by Dave Hansen

Subject: Re: [RESEND] IOZone with transparent huge page cache

On 04/15/2013 10:57 PM, Kirill A. Shutemov wrote:
> > > ** Initial writers **
> > > threads: 1 2 4 8 16 32 64 128 256
> > > baseline: 1103360 912585 500065 260503 128918 62039 34799 18718 9376
> > > patched: 2127476 2155029 2345079 1942158 1127109 571899 127090 52939 25950
> > > speed-up(times): 1.93 2.36 4.69 7.46 8.74 9.22 3.65 2.83 2.77
> >
> > I'm a _bit_ surprised that iozone scales _that_ badly especially while
> > threads<nr_cpus. Is this normal for iozone? What are the units and
> > metric there, btw?
> The units are KB/sec per process (I used 'Avg throughput per process' from
> the iozone report), so it does not scale that badly.
> I will use total children throughput next time to avoid confusion.

Wow. Well, it's cool that your patches just fix it up inherently. I'd
still really like to see some analysis of exactly where the benefit is
coming from, though.
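
As a starting point for that analysis, the batching explanation quoted earlier
in the thread (512 pages inserted at a time, fewer mapping->tree_lock
acquisitions) can be illustrated with a toy Python model. This is not kernel
code and measures nothing real: the class, its dict-backed "tree", and the
acquisition counter are all invented for illustration. It only shows how the
number of lock acquisitions changes when inserts are batched.

```python
import threading

PAGES_PER_BATCH = 512  # from the thread: "we insert 512 pages a time"


class ToyMapping:
    """Hypothetical stand-in for a page-cache mapping guarded by one lock."""

    def __init__(self):
        self.tree_lock = threading.Lock()
        self.pages = {}              # a dict stands in for the radix tree
        self.lock_acquisitions = 0   # how many times tree_lock was taken

    def insert_one(self, index):
        # One lock round trip per page.
        with self.tree_lock:
            self.lock_acquisitions += 1
            self.pages[index] = True

    def insert_batch(self, first_index, count):
        # One lock round trip for the whole run of pages.
        with self.tree_lock:
            self.lock_acquisitions += 1
            for index in range(first_index, first_index + count):
                self.pages[index] = True


def fill(mapping, nr_pages, batched):
    """Insert nr_pages pages and return how often the lock was acquired."""
    if batched:
        for first in range(0, nr_pages, PAGES_PER_BATCH):
            count = min(PAGES_PER_BATCH, nr_pages - first)
            mapping.insert_batch(first, count)
    else:
        for index in range(nr_pages):
            mapping.insert_one(index)
    return mapping.lock_acquisitions


if __name__ == "__main__":
    nr_pages = 8 * PAGES_PER_BATCH
    print("per-page acquisitions:", fill(ToyMapping(), nr_pages, batched=False))
    print("batched acquisitions: ", fill(ToyMapping(), nr_pages, batched=True))
```

Fewer acquisitions in the model does not prove the lock is where the real
speed-up comes from; confirming that would need profiling on the test machine.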