Project page: http://www.csn.ul.ie/~mel/projects/vmregress/
Download: http://www.csn.ul.ie/~mel/projects/vmregress/vmregress-0.4.tar.gz
This is the first public release of VM Regress v0.4 (BumbleBee). It is the
beginnings of a regression, benchmarking and test tool for the Linux VM.
The web page has an introduction and the project itself has quite
comprehensive documentation and commentary, so I am not going to go into
heavy detail here.
There appears to be frequent trouble reliably testing the VM and comparing
the impact (beneficial or otherwise) of VM features. As best I can
tell, there is heavy reliance on stress testing or intuitive decisions
made by individual kernel developers to prove a VM is working or that it
is better than another implementation. This tool will eventually be able
to provide empirical data on VM performance as well as acting as a
regression tool to make sure changes don't break anything.
It works by using kernel modules to get a definite view of what state the
kernel is in and to provide reliable, reproducible tests. Modules are
divided up into 4 categories. Core modules provide infrastructure for the
tool. Sense modules tell what is going on in the VM. Test modules test
particular features and bench modules (none yet) will benchmark different
sections of the VM.
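To give a flavour of what a module looks like, here is a minimal sketch of
a sense-style module against the 2.4 proc API. The entry name and output
are illustrative only, not the actual VM Regress code:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/errno.h>
#include <linux/mm.h>

/* Report something simple when the entry is read.  A real sense
 * module would walk zones or page tables here. */
static int sense_read(char *page, char **start, off_t off,
		      int count, int *eof, void *data)
{
	int len = sprintf(page, "sizeof(struct page) = %d bytes\n",
			  (int)sizeof(struct page));
	*eof = 1;
	return len;
}

static struct proc_dir_entry *entry;

static int __init sense_init(void)
{
	/* "vmregress_example" is a placeholder name, not a real entry */
	entry = create_proc_entry("vmregress_example", 0444, NULL);
	if (!entry)
		return -ENOMEM;
	entry->read_proc = sense_read;
	return 0;
}

static void __exit sense_exit(void)
{
	remove_proc_entry("vmregress_example", NULL);
}

module_init(sense_init);
module_exit(sense_exit);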
The aim is to eventually eliminate guesswork in development. The tool will
be able to tell for definite if a feature works. If it does work, it will
be able to tell how well or how poorly the feature performed. This will
hopefully replace ad-hoc shell script tests and provide concrete
performance data any developer can reliably reproduce and use as proof
that "Feature X is better".
The interface to the tests is via proc at /proc/vmregress. Help is provided
for most of them by cat'ing the entries after module load. The README and
manual are very comprehensive and each C file has a detailed description at
the top, so I'm not going to go into heavy detail in this mail. The README
includes a sample set of tests to illustrate how the tool can be used to
provide useful information about performance.
This was developed against 2.4.18 and 2.4.19 but will compile with 2.5.30
and takes into account the existence of rmap (it will compile and work with
or without rmap). Bear in mind the tool is far from complete and I'm just
looking for feedback on the viability and usefulness (or the lack thereof)
of this tool. Consequently, it doesn't do much yet. Currently it
o Provides infrastructure such as proc helper functions, page table walk
functions and so on
o Provides tests for the /proc interface to ensure it works
o Prints out the sizeof() of VM-related structs and prints out their memory usage
o Prints out information on all zones in the system
o Tests physical page allocation/free functions with either GFP_ATOMIC or
  GFP_KERNEL flags (the idea is sketched after this list)
o Tests page faulting routines
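To give a flavour, the core of the allocation test is in spirit like the
following sketch; the function name and output format are illustrative
here, not the module's actual code:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched.h>	/* jiffies */

/* Allocate and free pages of the given order repeatedly, reporting
 * elapsed jiffies and how often the allocation failed.  Called with
 * GFP_ATOMIC or GFP_KERNEL to compare the two paths. */
static void time_alloc_free(unsigned int gfp_mask, unsigned int order,
			    int attempts)
{
	unsigned long start = jiffies;
	int i, failures = 0;

	for (i = 0; i < attempts; i++) {
		struct page *page = alloc_pages(gfp_mask, order);

		if (!page) {
			failures++;
			continue;
		}
		__free_pages(page, order);
	}

	printk(KERN_INFO "order %u: %d attempts, %d failures, %lu jiffies\n",
	       order, attempts, failures, jiffies - start);
}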
This has been tested heavily with UML 2.4.18 and with a dual PII-350 running
2.4.19. It is known to compile with 2.5.30 but I haven't done any 2.5 testing
yet due to the lack of a crash box. It will work with or without rmap as
the tool was written with it (as well as every other VM feature) in mind.
Any feedback is appreciated.
--
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel
In article <Pine.LNX.4.44.0208112109110.16360-100000@skynet> you wrote:
> It works by using kernel modules to get a definite view of what state the
> kernel is in and to provide reliable, reproducible tests. Modules are
> divided up into 4 categories. Core modules provide infrastructure for the
> tool. Sense modules tell what is going on in the VM. Test modules test
> particular features and bench modules (none yet) will benchmark different
> sections of the VM.
This sounds more like a micro benchmark tool, which is a good start, but
the real problem with VM optimizations is that they have to take
real-world load, and especially user experience, into account.
A simple example is the fact that an idle desktop box will feel very
sluggish if a user comes back after a few hours' break, because all
visible programs have been paged out. To improve this, one could think
about adding a flag to applications like "connected to gui". This feature
would then need a test, which is not a usual micro benchmark.
So I think it is a good idea to avoid introducing slow operations in
hot code paths, but that does not help developers much with the problem of
simulating workloads and measuring interactive and real throughput.
But perhaps you can take this into account?
Greetings
Bernd
On Monday 12 August 2002 03:40, Bernd Eckenfels wrote:
> In article <Pine.LNX.4.44.0208112109110.16360-100000@skynet> you wrote:
> > It works by using kernel modules to get a definite view of what state
> > the kernel is in and to provide reliable, reproducible tests. Modules
> > are divided up into 4 categories. Core modules provide infrastructure
> > for the tool. Sense modules tell what is going on in the VM. Test
> > modules test particular features and bench modules (none yet) will
> > benchmark different sections of the VM.
>
> This sounds more like a micro benchmark tool, which is a good start,
> but the real problem with VM optimizations is that they have to take
> real-world load, and especially user experience, into account.
We get too hung up on 'real world' loads; that is not a productive way for
VM developers to spend their time. Developers need to use tests that focus
on very specific aspects of VM performance. Yes, this testing should be
backed up by 'real world' tests to confirm what the VM developer thinks, that
improved performance in a subsystem translates into improved overall
performance, and to keep a watch out for unexpected or undesirable
interactions. That's called a 'reality test'.
If you want to help with 'interactive performance', i.e., user experience,
then *quantify what contributes to that* and write a micro-measurement tool
that measures such things, e.g., latency of response to keyboard events
under load. It's not rocket science; it just takes time and effort to set
this kind of thing up so it's accurate and predictive.
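For instance (a crude sketch I'm making up here, not an existing tool),
one proxy for interactive latency is to ask for a short sleep and measure
how late the wakeup actually is while the box is under load:

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

static long long usecs(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(void)
{
	struct timespec ts = { 0, 10 * 1000 * 1000 };	/* ask for 10ms */
	long long before, late, worst = 0, total = 0;
	int i, samples = 1000;

	/* Run this while a VM-intensive load is going; the lateness of
	 * each wakeup approximates what an interactive process feels. */
	for (i = 0; i < samples; i++) {
		before = usecs();
		nanosleep(&ts, NULL);
		late = usecs() - before - 10000;	/* lateness in us */
		if (late > worst)
			worst = late;
		total += late;
	}
	printf("average lateness %lld us, worst %lld us\n",
	       total / samples, worst);
	return 0;
}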
It's an incredible waste of developers' time to be running 'reality tests'
all the time and never using more precise measurement methods. Anyone who
wants to run reality tests and post the results is more than welcome to,
and that is valuable. It's not valuable to throw mud at a
testing/measurement tool because you think it's not 'realistic'.
--
Daniel
On Mon, 12 Aug 2002, Bernd Eckenfels wrote:
> This sounds more like a micro benchmark tool, which is a good start,
> but the real problem with VM optimizations is that they have to take
> real-world load, and especially user experience, into account.
>
The tool is a micro test and benchmark tool, true. It is known and noted
in the documentation that this won't take overall system performance into
account. Fortunately there are a number of existing userland tools out
there, like lmbench, that provide that type of information, and there are
a number of subjective reports available from users regarding interactivity.
> applications like "connected to gui". This feature would then need a test,
> which is not a usual micro benchmark.
>
That type of information is different from what VM Regress aims to provide.
VM Regress is aimed at providing performance and test data on individual
parts of the VM.
> hot code paths, but that does not help developers much with the problem of
> simulating workloads and measuring interactive and real throughput.
>
> But perhaps you can take this into account?
>
I have taken it into account and decided after some thought that overall
performance and throughput is not the place for a micro tool like VM
Regress; it is more the domain of a userland test suite.
I am more interested in answering questions like
o Does subsystem X still work after changes made to it?
o How well does subsystem X perform?
o How long does it take to find pages to swap out?
o How much overhead is introduced by feature Y?
o What does my process space look like after vmscan does its work?
For example, in time, it'll be able to tell exactly how well rmap is
performing and compare it to a VM without rmap in terms of "how long it
took to find a page to replace" and "what did the address space look like
after kswapd worked". I should be able to show, for instance, that rmap
kept the correct pages in memory, whereas an overall benchmarking tool is
going to tell me nothing new. Used in combination with a profiling tool
like oprofile, I should be able to get very specific performance data that
I suspect will be useful to developers and, to a much lesser extent, users.
I am making the presumption that if it can be shown that each individual
component is working and performs well, then overall performance should
improve.
--
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel
On Mon, 12 Aug 2002, Daniel Phillips wrote:
> If you want to help with 'interactive performance', i.e., user
> experience, then *quantify what contributes to that* and write a
> micro-measurement tool that measures such things, e.g., latency of
> response to keyboard events under load. It's not rocket science; it
> just takes time and effort to set this kind of thing up so it's
> accurate and predictive.
http://people.redhat.com/bmatthews/irman/
I've already asked Randy Hwron (sp?) to include this in
his regular benchmarking.
> It's an incredible waste of developers' time to be running 'reality
> tests' all the time and never using more precise measurement methods.
> Anyone who wants to run reality tests and post the results is more than
> welcome to, and that is valuable. It's not valuable to throw mud at a
> testing/measurement tool because you think it's not 'realistic'.
The thing is that developers need some benchmarking thing
they can script to run overnight. Watching vmstat for
hours on end is not a useful way of spending development
time.
On the other hand, if somebody could code up some scriptable
benchmarks that approximate real workloads better than the
current benchmarks do, I'd certainly appreciate it.
For web serving, for example, I wouldn't mind a benchmark (the measurement
core of which is sketched below) that:
1) simulates a number of users, that:
1a) load a page with 10 to 20 associated images
1b) sleep for a random time between 3 and 60 seconds,
"reading the page"
1c) follow a link and grab another page with N images
2) varies the number of users from 1 to N
3) measures
3a) the server's response time until it starts
answering the request
3b) the time it takes to download each full page
Then we can plot both kinds of response time against the number
of users and we have an idea of the web serving performance of
a particular system ... without focussing on, or even measuring,
the unrealistic "serves N pages per minute" number.
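Roughly, the measurement core might look like this sketch (the per-user
fork loop, random think times and varying user counts are left out, and
the server address and path are placeholders):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/time.h>

static long long usecs(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* Fetch one URL, recording 3a (time until the first byte of the
 * response) and 3b (time for the full download).  Returns 0 on
 * success. */
static int fetch(const char *ip, const char *path,
		 long long *first_byte, long long *total)
{
	char buf[4096];
	struct sockaddr_in addr;
	long long start;
	int fd, n, got_first = 0;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(80);
	addr.sin_addr.s_addr = inet_addr(ip);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	sprintf(buf, "GET %s HTTP/1.0\r\n\r\n", path);
	start = usecs();
	write(fd, buf, strlen(buf));

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		if (!got_first) {
			*first_byte = usecs() - start;	/* 3a */
			got_first = 1;
		}
	}
	*total = usecs() - start;			/* 3b */
	close(fd);
	return got_first ? 0 : -1;
}

int main(void)
{
	long long tfb, total;

	/* Placeholder server and page */
	if (fetch("127.0.0.1", "/index.html", &tfb, &total) == 0)
		printf("first byte %lld us, full page %lld us\n",
		       tfb, total);
	return 0;
}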
Volunteers ? ;)
kind regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Mon, 12 Aug 2002, Rik van Riel wrote:
> On the other hand, if somebody could code up some scriptable
> benchmarks that approximate real workloads better than the
> current benchmarks do, I'd certainly appreciate it.
>
This looks like an overall system benchmark again and while it would be
great to have, it is not what I aim to provide here with VM Regress.
> For web serving, for example, I wouldn't mind a benchmark that:
>
> <Benchmark snipped>
A benchmark like that would be more likely to test network throughput
than VM performance, although I could be misunderstanding your benchmark;
apologies if I am. The 20 images would either get mmaped by the server or
else be read from the buffer cache. Once it's in memory, it's just a
case of retransmitting each time, which doesn't appear particularly
interesting to me from a VM perspective.
In VM Regress land, I would be much more likely to provide a benchmark
that did something like the following; a rough sketch of the core appears
after the list. (Remember that VM Regress aims to be more than a pure
benchmarking tool. Benchmarking is just one aspect.)
1) Memory map with MAP_SHARED a number of regions
   1a) Each region is 512 pages large (2MB on x86)
   1b) Create regions until a percentage of memory is used that would
       hit the various watermarks of the zones
2) Over 1 hour, reference the regions with a Gaussian pattern to simulate
   popular pages and images
3) At the end, give the best, worst and average time to read a region.
   Print out what regions are still resident in memory and compare that
   to the references. Regions referenced often should still be in memory
   and dead regions should be in swap
4) Repeat the test altering the following parameters
- The percentage of physical memory consumed to see what gets swapped out
- Simulate disk buffer usage instead of mmap'ing regions
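As an illustration of steps 1 and 2 for a single region (the real test
would create many regions and size them from the zone watermarks; the
constants here are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define PAGES	512	/* one region: 512 pages, 2MB on x86 */
#define REFS	100000	/* number of references to make */

/* Crude normal deviate: sum of 12 uniform variables, mean 0, stddev 1 */
static double gaussian(void)
{
	double sum = 0.0;
	int i;

	for (i = 0; i < 12; i++)
		sum += (double)rand() / RAND_MAX;
	return sum - 6.0;
}

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	char *region;
	volatile char c;
	int i, page;

	region = mmap(NULL, PAGES * pagesize, PROT_READ | PROT_WRITE,
		      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED)
		return 1;

	for (i = 0; i < REFS; i++) {
		/* Centre references on the middle of the region so pages
		 * near the centre are "popular" and the tails are cold */
		page = PAGES / 2 + (int)(gaussian() * PAGES / 8);
		if (page < 0 || page >= PAGES)
			continue;
		c = region[page * pagesize];
	}

	munmap(region, PAGES * pagesize);
	return 0;
}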
With a low percentage of physical memory used, there shouldn't be anything
too interesting happening because cache should be doing most of the work.
With more regions, it should be noted how the VM holds up, how well it
selects regions to swap out and how long it takes to find the proper pages
and so on.
This type of benchmark is far away, but I already do most of this work with
the fault.o module. I memory map a region whose size is related to the
amount of physical memory (more accurately, it's related to the watermarks
of the zone known to be affected by the test) and touch every page in the
region. For n passes, I check if each page is present; if it's swapped
out, I touch it to swap it back in. I then print out how many pages were
swapped in, how many pages are physically present in the region and how
long that pass took in milliseconds.
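fault.o does this in-kernel, but one pass has the same shape as this
userland sketch using mincore(2); it illustrates the idea rather than
being the module's code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/time.h>

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	size_t pages = 8192;	/* the module sizes this from the zone
				   watermarks; this figure is arbitrary */
	unsigned char *vec = malloc(pages);
	char *region = mmap(NULL, pages * pagesize,
			    PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct timeval start, end;
	long resident, swapped_in, ms;
	size_t i;
	int pass;

	if (!vec || region == MAP_FAILED)
		return 1;

	/* Touch every page once so the whole region has been faulted in */
	for (i = 0; i < pages; i++)
		region[i * pagesize] = 1;

	for (pass = 0; pass < 5; pass++) {
		resident = swapped_in = 0;
		gettimeofday(&start, NULL);
		mincore(region, pages * pagesize, vec);
		for (i = 0; i < pages; i++) {
			if (vec[i] & 1) {
				resident++;	/* physically present */
			} else {
				region[i * pagesize] = 1; /* swap back in */
				swapped_in++;
			}
		}
		gettimeofday(&end, NULL);
		ms = (end.tv_sec - start.tv_sec) * 1000 +
		     (end.tv_usec - start.tv_usec) / 1000;
		printf("pass %d: %ld resident, %ld swapped in, %ld ms\n",
		       pass, resident, swapped_in, ms);
	}
	return 0;
}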
That is most of the work there, so this isn't quite vapourware, more a
really dense fog. I just need a few more bits and pieces, such as printing
graphs of present pages vs references, and then meaningful data will be
easily accessible.
> Then we can plot both kinds of response time against the number
> of users and we have an idea of the web serving performance of
> a particular system ... without focussing on, or even measuring,
> the unrealistic "serves N pages per minute" number.
>
> Volunteers ? ;)
>
Not for that particular benchmark, but how useful would the VM Regress
equivalent be?
--
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel
On Mon, 12 Aug 2002, Mel wrote:
> On Mon, 12 Aug 2002, Rik van Riel wrote:
>
> > On the other hand, if somebody could code up some scriptable
> > benchmarks that approximate real workloads better than the
> > current benchmarks do, I'd certainly appreciate it.
>
> This looks like an overall system benchmark again and while it would be
> great to have, it is not what I aim to provide here with VM Regress.
>
> > For web serving, for example, I wouldn't mind a benchmark that:
> >
> > <Benchmark snipped>
>
> A benchmark like that would be more likely to test network throughput
> than VM performance, although I could be misunderstanding your benchmark,
The thing is that the individual 'users' will be downloading
files at modem and ADSL speeds, meaning a LOT of apache
daemons could be sitting around on the server.
You are right though that this is more of an overall system
benchmark than a pure VM test. On the other hand, the VM
doesn't function on its own; it really needs to be part of
a larger system ;)
> In VM Regress land, I would be much more likely to provide a benchmark
> that did something like the following. (Remember that VM Regress aims
> to be more than a pure benchmarking tool. Benchmarking is just
> one aspect.)
That might be a useful test. How useful it would be we can't
really know until we've tried, but it definitely does sound like
it's worth a try...
> > Volunteers ? ;)
>
> Not for that particular benchmark, but how useful would the VM Regress
> equivalent be?
I can't say in advance how useful it would be, but my gut
feeling is that it might help getting things right.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Mon, 12 Aug 2002, Rik van Riel wrote:
> The thing is that the individual 'users' will be downloading
> files at modem and ADSL speeds, meaning a LOT of apache
> daemons could be sitting around on the server.
At the moment, VM Regress cannot run multiple instances of the same test
(although it should be SMP safe and is written with SMP in mind), but I
plan to address that. The problem is not running the code, it's printing
out the test results, but I know how to address it; I just haven't
implemented it yet.
Because this is a benchmark and not a straightforward test, another
parameter could be added called page reference delay. It would be the time
a page is locked while it is being "transmitted", and then multiple
instances of the test would be run for different transmission speeds. I
digress, because benchmarks like this are vapourware in VM Regress land at
the moment and I'm not prepared to discuss individual benchmarks just yet.
> You are right though that this is more of an overall system
> benchmark than a pure VM test. On the other hand, the VM
> doesn't function on its own; it really needs to be part of
> a larger system ;)
>
I understand that, but I believe there are a number of benchmarks that
already demonstrate overall performance. A normal benchmark will tell you
X bytes were transmitted, but it won't tell you where time in the kernel
was spent and won't tell you anything about the end state of the system.
Using VM Regress, you could tell whether delays were in page allocation,
disk reads, bad swap decisions etc. by running individual micro benchmarks
(you can already test __alloc_pages and mmap related routines). A normal
benchmark won't tell you that, and I'm not aware of any tool that can do
the equivalent outside of stress testing. That is one of my "selling
points".
> > Not for that particular benchmark, but how useful would the VM Regress
> > equivalent be?
>
> I can't say in advance how useful it would be, but my gut
> feeling is that it might help getting things right.
>
OK, that is more or less what I was looking for. If I can get some sort of
indication from experienced VM developers on whether such a tool is
useful, I'll keep developing it. I know it doesn't do enough to be
truly useful yet. This first release was to find out if there would be any
"What sort of uselessness is that?", "haha, we already have all this
information" or "we don't need such data" reactions.
--
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel
Rik van Riel wrote:
> The thing is that developers need some benchmarking thing
> they can script to run overnight. Watching vmstat for
> hours on end is not a useful way of spending development
> time.
>
> On the other hand, if somebody could code up some scriptable
> benchmarks that approximate real workloads better than the
> current benchmarks do, I'd certainly appreciate it.
>
> For web serving, for example, I wouldn't mind a benchmark that:
>
> 1) simulates a number of users, that:
> 1a) load a page with 10 to 20 associated images
> 1b) sleep for a random time between 3 and 60 seconds,
> "reading the page"
> 1c) follow a link and grab another page with N images
> 2) varies the number of users from 1 to N
> 3) measures
> 3a) the server's response time until it starts
> answering the request
> 3b) the time it takes to download each full page
>
> Then we can plot both kinds of response time against the number
> of users and we have an idea of the web serving performance of
> a particular system ... without focussing on, or even measuring,
> the unrealistic "serves N pages per minute" number.
>
Don't forget to count the total amount of
swap & block io. (i.e. vmstat 1 > logfile & sum it up)
Good strategies for page replacement may result in
less io for the same job, which means a lot for
performance whenever you get disk-bound. Many
a web server serves more than fits in cache, and of
course there are file servers too...
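In the spirit of that suggestion, a small helper could accumulate the
swap and block io columns as vmstat emits them. This is only a sketch:
it assumes si/so/bi/bo are the 8th-11th fields, which varies between
vmstat versions, so the column positions may need adjusting:

#include <stdio.h>

int main(void)
{
	FILE *f = popen("vmstat 1", "r");
	char line[256];
	long si, so, bi, bo;
	long tsi = 0, tso = 0, tbi = 0, tbo = 0;

	if (!f)
		return 1;

	while (fgets(line, sizeof(line), f)) {
		/* Header lines fail the scan and are skipped */
		if (sscanf(line,
			   "%*d %*d %*d %*d %*d %*d %*d %ld %ld %ld %ld",
			   &si, &so, &bi, &bo) != 4)
			continue;
		tsi += si; tso += so; tbi += bi; tbo += bo;
		printf("totals: si %ld so %ld bi %ld bo %ld\n",
		       tsi, tso, tbi, tbo);
	}
	pclose(f);
	return 0;
}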
Helge Hafting