2008-11-17 22:48:01

by James Bottomley

Subject: Enterprise workload testing for storage and filesystems

Hi all,

High on our list at the recent Linux Foundation end user summit was
finding a way of obtaining enterprise workloads (or simulators) that we
can run in our own testing environments. The main problem is that the
data sets used by these systems are usually secret or under regulatory
embargo, and thus unobtainable. However, several participants noted that
the regulatory prohibitions also extended to their own in-house IT teams,
so they had had to develop workload simulators which, since they contain
no customer data, might be more widely distributable.

Fidelity National Information Services were the first to try this.
They've kicked off a SourceForge site for their stress testing tool
(which is the same tool they use in their own qualification labs). The
source for the tool is available here:

http://sourceforge.net/project/showfiles.php?group_id=11026&package_id=298597

And it comes with a fairly detailed readme explaining what it's trying
to simulate and why. Hopefully this will give us all a much better
insight into both enterprise workloads and the way enterprise IT
departments conduct testing.

Let's see how our storage and filesystem tuning measures up to this.

James


2008-11-20 21:37:29

by Jeff Moyer

Subject: Re: Enterprise workload testing for storage and filesystems

James Bottomley <[email protected]> writes:

> <...snip...>
>
> Let's see how our storage and filesystem tuning measures up to this.

This is indeed great news! The tool is very flexible, so I'd like to
know if we can get some sane configuration options to start testing.
I'm sure I can cook something up, but I'd like to be confident that what
I'm testing does indeed reflect a real-world workload.

Cheers,
Jeff

2008-11-21 04:42:50

by Grant Grundler

Subject: Re: Enterprise workload testing for storage and filesystems

On Mon, Nov 17, 2008 at 2:47 PM, James Bottomley
<[email protected]> wrote:
> Hi all,
>
> High on our list at the recent Linux Foundation end user summit was
> finding a way of obtaining enterprise workloads (or simulators) that we
> can run in our own testing environments. The main problem is that the
> data sets used by these systems are usually secret or under regulatory
> embargo, and thus unobtainable. However, several participants noted that
> the regulatory prohibitions also extended to their own in-house IT teams,
> so they had had to develop workload simulators which, since they contain
> no customer data, might be more widely distributable.

Google has the same concerns. Hoping to work through those, last year
I arranged funding for Joshua Root at UNSW to develop a GPL Linux
block-layer replay tool:
http://www.gelato.unsw.edu.au/IA64wiki/JoshuaRoot/MarkovChains

Unfortunately, I've not been able to address all of those concerns, so
I can't offer any Google-specific Markov chains. :( Still, I hope this
tool can be of use to others.


> Fidelity National Information Services were the first to try this.
> They've kicked off a SourceForge site for their stress testing tool
> (which is the same tool they use in their own qualification labs). The
> source for the tool is available here:
>
> http://sourceforge.net/project/showfiles.php?group_id=11026&package_id=298597

Awesome! Kudos!

> And it comes with a fairly detailed readme explaining what it's trying
> to simulate and why. Hopefully this will give us all a much better
> insight into both enterprise workloads and the way enterprise IT
> departments conduct testing.
>
> Let's see how our storage and filesystem tuning measures up to this.

*nod*

thanks,
grant

2008-11-21 16:08:01

by K.S. Bhaskar

Subject: Re: Enterprise workload testing for storage and filesystems

On 11/20/2008 04:37 PM, Jeff Moyer wrote:
> James Bottomley <[email protected]> writes:

[KSB] <...snip...>

> > Let's see how our storage and filesystem tuning measures up to this.
>
> This is indeed great news! The tool is very flexible, so I'd like to
> know if we can get some sane configuration options to start testing.
> I'm sure I can cook something up, but I'd like to be confident that what
> I'm testing does indeed reflect a real-world workload.

[KSB] Here are the command lines for some tests that we ran recently:

io_thrash -o 4 4 testdb 4000000 100000 12 8192 512 1000 90 90 10 512
io_thrash -o 4 4 testdb 4000000 100000 12 8192 512 10000 90 90 10 512
io_thrash -o 4 4 testdb 4000000 100000 12 8192 512 100000 90 90 10 512
io_thrash -o 4 4 testdb 4000000 100000 12 8192 512 200000 90 90 10 512

Note that these are relatively modest tests (four 32GB database files,
all on one file system, 12 processes). To simulate bigger loads, allow
the journal files to grow to 4GB, use a configuration file to spread the
database and journal files across different file systems, and increase
the number of processes into the hundreds and the database sizes into
the hundreds of GB. To keep test times reasonable, use the smallest
numbers that give insightful results (beyond a certain point, making
things bigger adds run time but yields no additional insight into system
behavior, which is what we are after).
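For anyone wanting to reproduce the four runs above back to back, here is a minimal sketch of a sweep script. It copies the command lines verbatim and varies only the one argument that differs between them (1000, 10000, 100000, 200000); the thread does not say what that argument controls, so the script makes no assumption about it. It assumes io_thrash has been built and is on $PATH, and it only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Print, and optionally run, the four io_thrash invocations listed above.
# They differ only in one positional argument (1000, 10000, 100000, 200000);
# its meaning is not stated in the thread, so it is simply swept verbatim.
# Dry run by default; set RUN=1 to execute io_thrash (assumed on $PATH).
sweep_io_thrash() {
    for n in 1000 10000 100000 200000; do
        set -- io_thrash -o 4 4 testdb 4000000 100000 12 8192 512 "$n" 90 90 10 512
        echo "$*"
        if [ "${RUN:-0}" = 1 ]; then
            "$@" || return 1   # stop the sweep at the first failing run
        fi
    done
}

sweep_io_thrash
```

Saved as, say, sweep.sh (a hypothetical file name), `sh sweep.sh` shows what would run and `RUN=1 sh sweep.sh` runs it.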

Regards
-- Bhaskar

_____________

The information contained in this message is proprietary and/or confidential. If you are not the
intended recipient, please: (i) delete the message and all copies; (ii) do not disclose,
distribute or use the message in any manner; and (iii) notify the sender immediately. In addition,
please be aware that any message addressed to our domain is subject to archiving and review by
persons other than the intended recipient. Thank you.
_____________

2008-11-21 16:19:34

by Alan D. Brunelle

Subject: Re: Enterprise workload testing for storage and filesystems

K.S. Bhaskar wrote:
> <...snip...>

Thanks for the additional feedback, Bhaskar. I've been playing with this
on and off for the last couple of days, trying to stress one testbed
(16-way AMD, 128GB RAM, two P800 Smart Arrays with 48 disks in total
combined into a single LVM2/DM volume). I've been able to get the I/O
subsystem 100% utilized, but in doing so I didn't really stress the rest
of the system (it was something like 80-90% idle).

To stress the whole system, it sounds like it _may_ be better to use 48
separate file systems on 48 separate platters (each with its own DB)?
Or are there other knobs, besides I/O, that would get more of the system
involved? Is it a good idea to separate the journals from the DB
(separate FS/platter)?

Regards,
Alan

2008-11-21 16:43:04

by K.S. Bhaskar

Subject: Re: Enterprise workload testing for storage and filesystems

On 11/21/2008 11:18 AM, Alan D. Brunelle wrote:
> K.S. Bhaskar wrote:

[KSB2] <...snip...>

> Thanks for the additional feedback, Bhaskar. I've been playing with this
> on and off for the last couple of days, trying to stress one testbed
> (16-way AMD, 128GB RAM, two P800 Smart Arrays with 48 disks in total
> combined into a single LVM2/DM volume). I've been able to get the I/O
> subsystem 100% utilized, but in doing so I didn't really stress the rest
> of the system (it was something like 80-90% idle).
>
> To stress the whole system, it sounds like it _may_ be better to use 48
> separate file systems on 48 separate platters (each with its own DB)?
> Or are there other knobs, besides I/O, that would get more of the system
> involved? Is it a good idea to separate the journals from the DB
> (separate FS/platter)?

[KSB2] io_thrash is intended to stress the IO subsystem, so I am not at
all surprised that CPU and memory were not stressed.

With the 48 platters on your system, consider creating 4 logical
volumes, each striped across 12 physical volumes. Try 8 databases, with
each logical volume holding two databases plus the journal files of two
other databases, so that each journal resides on a different file system
from its database.
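A minimal sketch of that 4 × 12 striped layout using the standard LVM2 tools (pvcreate, vgcreate, lvcreate), assuming each of the 48 disks is visible to Linux as its own block device (on the P800 that depends on how the array controller is configured). Device paths are passed in as arguments; the volume group and LV names (iovg0..iovg3, lv0), the mount points (/mnt/io0../mnt/io3), ext3 and the 64 KiB stripe size are all hypothetical choices, not anything stated in the thread. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: split 48 disks into 4 volume groups of 12, each holding one
# logical volume striped across all 12 disks, then make and mount a file
# system on it. Device paths are site-specific and must not contain spaces.
# VG/LV names, mount points, ext3 and the stripe size are hypothetical.
# Commands are only printed unless RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = 1 ]; then
        "$@" || exit 1
    fi
}

plan_lvm() {
    if [ "$#" -ne 48 ]; then
        echo "expected 48 devices, got $#" >&2
        return 1
    fi
    g=0
    while [ "$g" -lt 4 ]; do
        pvs=""
        i=0
        while [ "$i" -lt 12 ]; do      # take the next 12 devices
            pvs="$pvs $1"
            shift
            i=$((i + 1))
        done
        run pvcreate $pvs               # unquoted on purpose: one word per device
        run vgcreate "iovg$g" $pvs
        # -i 12: stripe across all 12 PVs; -I 64: 64 KiB stripe size
        run lvcreate -i 12 -I 64 -l 100%FREE -n lv0 "iovg$g"
        run mkfs.ext3 "/dev/iovg$g/lv0"
        run mkdir -p "/mnt/io$g"
        run mount "/dev/iovg$g/lv0" "/mnt/io$g"
        g=$((g + 1))
    done
}
```

Call it as `plan_lvm` followed by the 48 device paths and check the printed plan first: with RUN=1 it really does create physical volumes and file systems on those devices, destroying whatever is on them.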

In the real world, yes, one would separate each journal file from its
database file, at a minimum putting them on separate platters: if the
journal's platters, disk controller, or file system croak, you still
have the database, and if the database's underpinnings die, the database
is recoverable from a backup plus the journal file. One aims for maximum
separation between a database and its journal file.
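One possible placement for the 8-database layout that respects this separation rule is to rotate journals by one volume: databases 2k and 2k+1 live on logical volume k, and their journals go on volume (k+1) mod 4. The sketch below just prints such a mapping. The rotation scheme is one choice among several, and the mount points and file names (db*.dat, db*.jnl) are hypothetical placeholders, not io_thrash's own naming.

```shell
#!/bin/sh
# One possible placement for 8 databases on 4 logical volumes:
# databases 2k and 2k+1 live on volume k; their journals go on (k+1) mod 4.
# Every volume then holds two databases plus the journals of two others,
# and no journal shares a file system with its own database.
# Mount points and file names are hypothetical placeholders.
place_db_journals() {
    db=0
    while [ "$db" -lt 8 ]; do
        lv=$((db / 2))
        jlv=$(( (lv + 1) % 4 ))
        echo "db$db: data=/mnt/io$lv/db$db.dat journal=/mnt/io$jlv/db$db.jnl"
        db=$((db + 1))
    done
}

place_db_journals
```

This only prints the mapping; how the paths are actually handed to io_thrash (Bhaskar mentions a configuration file) isn't described in the thread.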

If you want to simulate an application that produces a more balanced
load, perhaps you can set %ioUnderLock to 0 and modify io_thrash to do
some compute-intensive task (such as filling a large block of memory
with pseudo-random numbers) before each IO operation. You would probably
want to increase the number of processes so that the IO subsystem
continues to be driven hard.

Regards
-- Bhaskar
