2008-10-02 13:19:40

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 00/32] Swap over NFS - v19

Patches are against: v2.6.27-rc5-mm1

This release features more comments and (hopefully) better Changelogs.
Also the netns stuff got sorted and ipv6 will now build and not oops
on boot ;-)

The first 4 patches are cleanups and can go in if the respective maintainers
agree.

The code is lightly tested but seems to work on my default config.

Let's get this ball rolling...


2008-10-02 19:49:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

On Thu, 02 Oct 2008 15:05:04 +0200 Peter Zijlstra <[email protected]> wrote:

> Let's get this ball rolling...

I don't think we're really able to get any MM balls rolling until we
get all the split-LRU stuff landed. Is anyone testing it? Is it good?

2008-10-02 20:59:26

by Lee Schermerhorn

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

On Thu, 2008-10-02 at 12:47 -0700, Andrew Morton wrote:
> On Thu, 02 Oct 2008 15:05:04 +0200 Peter Zijlstra <[email protected]> wrote:
>
> > Let's get this ball rolling...
>
> I don't think we're really able to get any MM balls rolling until we
> get all the split-LRU stuff landed. Is anyone testing it? Is it good?

Andrew:

Up until the mailing list traffic and patches slowed down, I was testing
it continuously with a heavy stress load that would bring the system to
its knees before the split-LRU and unevictable changes. Once it would
run for days without error [96 hours was my max run] and no further
patches came, I concentrated on other things.

Rik and Kosaki-san have run some performance oriented tests, reported
here a while back. Maybe they have more info.

Lee

2008-10-03 06:50:08

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

On Thursday 02 October 2008 23:05, Peter Zijlstra wrote:
> Patches are against: v2.6.27-rc5-mm1
>
> This release features more comments and (hopefully) better Changelogs.
> Also the netns stuff got sorted and ipv6 will now build and not oops
> on boot ;-)
>
> The first 4 patches are cleanups and can go in if the respective
> maintainers agree.
>
> The code is lightly tested but seems to work on my default config.
>
> Let's get this ball rolling...

I know it's not too helpful for me to say this, but I am spending
time looking at this stuff. I have commented on it in the past,
but I want to get a good handle on the code before I chime in again.

2008-10-03 06:53:59

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

On Friday 03 October 2008 05:47, Andrew Morton wrote:
> On Thu, 02 Oct 2008 15:05:04 +0200 Peter Zijlstra <[email protected]> wrote:
> > Let's get this ball rolling...
>
> I don't think we're really able to get any MM balls rolling until we
> get all the split-LRU stuff landed. Is anyone testing it? Is it good?

Peter's patches are very orthogonal to that work and shouldn't
actually change those kinds of reclaim heuristics at all.

by Luiz Fernando N. Capitulino

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

On Thu, 02 Oct 2008 15:05:04 +0200
Peter Zijlstra <[email protected]> wrote:

| Patches are against: v2.6.27-rc5-mm1
|
| This release features more comments and (hopefully) better Changelogs.
| Also the netns stuff got sorted and ipv6 will now build and not oops
| on boot ;-)
|
| The first 4 patches are cleanups and can go in if the respective maintainers
| agree.
|
| The code is lightly tested but seems to work on my default config.
|
| Let's get this ball rolling...

What's the best way to test this? Create a swap file on an NFS
mount and stress it?

--
Luiz Fernando N. Capitulino

2008-10-03 19:40:19

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

On Thu, 2 Oct 2008 12:47:48 -0700
Andrew Morton <[email protected]> wrote:
> On Thu, 02 Oct 2008 15:05:04 +0200 Peter Zijlstra <[email protected]> wrote:
>
> > Let's get this ball rolling...
>
> I don't think we're really able to get any MM balls rolling until we
> get all the split-LRU stuff landed. Is anyone testing it? Is it good?

I've done some testing on it on my two test systems and have not
found performance regressions against the mainline VM.

As for stability, I think we have done enough testing to conclude
that it is stable by now.

--
All rights reversed.

2008-10-04 10:13:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

On Fri, 2008-10-03 at 14:17 -0300, Luiz Fernando N. Capitulino wrote:
> On Thu, 02 Oct 2008 15:05:04 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> | Patches are against: v2.6.27-rc5-mm1
> |
> | This release features more comments and (hopefully) better Changelogs.
> | Also the netns stuff got sorted and ipv6 will now build and not oops
> | on boot ;-)
> |
> | The first 4 patches are cleanups and can go in if the respective maintainers
> | agree.
> |
> | The code is lightly tested but seems to work on my default config.
> |
> | Let's get this ball rolling...
>
> What's the best way to test this? Create a swap file on an NFS
> mount and stress it?

What I do is boot with mem=256M, then swapoff -a;
swapon /net/host/$path/file.swp;

the file.swp I created using dd and mkswap on the remote host.

I then run 2 cyclic loops over anonymous memory sized 96MB, and 2 cyclic
loops over file-backed memory on the same NFS mount
(eg /net/host/$path/file[12]), also sized 96MB.

That gives a memory footprint of 4*96 = 384MB and will thus rely on paging
quite heavily.
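
A rough sketch of what one such cyclic loop could look like (illustrative
only, not the actual test program; the 4K page stride, the MAP_SHARED file
mapping and the file names are assumptions):

/* cycle.c - touch a ~96MB region over and over to force paging.
 * With no argument it uses anonymous memory; with a file argument
 * (e.g. /net/host/$path/file1) it uses file-backed memory instead. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (96UL * 1024 * 1024)	/* 96MB working set */

int main(int argc, char **argv)
{
	char *mem;
	size_t i;

	if (argc > 1) {				/* file-backed variant */
		int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
		if (fd < 0 || ftruncate(fd, SIZE) < 0) {
			perror("open/ftruncate");
			return 1;
		}
		mem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	} else {				/* anonymous variant */
		mem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	}
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	for (;;)				/* walk the area forever */
		for (i = 0; i < SIZE; i += 4096)
			mem[i]++;		/* dirty one byte per page */
}

Run two instances without arguments (anonymous) and two with file1 and
file2 on the NFS mount as arguments (file-backed) to get the four loops.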

While this is on-going you can have a little daemon that listens,
accepts connections and reads from them.

On a 3rd machine, start, say, 1000 connections to this daemon that
continuously write stuff to it.
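
The daemon can be as simple as a socket accept/read loop; a minimal sketch
(the port number 5555 is arbitrary, and with ~1000 clients select() is only
just within FD_SETSIZE, so epoll would be the sturdier choice):

/* drain.c - accept TCP connections and discard whatever the clients
 * write, so the receive path keeps allocating skbs while the box pages. */
#include <netinet/in.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr = { .sin_family = AF_INET,
				    .sin_port = htons(5555) };
	int lsock = socket(AF_INET, SOCK_STREAM, 0);
	fd_set fds;
	int maxfd, i;
	char buf[4096];

	if (lsock < 0 || bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lsock, 128) < 0) {
		perror("listen socket");
		return 1;
	}

	FD_ZERO(&fds);
	FD_SET(lsock, &fds);
	maxfd = lsock;

	for (;;) {
		fd_set rfds = fds;

		if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
			continue;

		for (i = 0; i <= maxfd; i++) {
			if (!FD_ISSET(i, &rfds))
				continue;
			if (i == lsock) {		/* new connection */
				int c = accept(lsock, NULL, NULL);
				if (c >= 0) {
					FD_SET(c, &fds);
					if (c > maxfd)
						maxfd = c;
				}
			} else if (read(i, buf, sizeof(buf)) <= 0) {
				close(i);		/* client gone */
				FD_CLR(i, &fds);
			}
		}
	}
}

The writers on the third machine can then be anything that connects to
that port and keeps sending, for instance a loop around nc.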

Then on your NFS host do something like: /etc/init.d/nfs stop

go for lunch

and when you're back do: /etc/init.d/nfs start

and see if it all comes back up again ;-)

2008-10-04 15:05:28

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

Hi

> Andrew Morton <[email protected]> wrote:
> > On Thu, 02 Oct 2008 15:05:04 +0200 Peter Zijlstra <[email protected]> wrote:
> >
> > > Let's get this ball rolling...
> >
> > I don't think we're really able to get any MM balls rolling until we
> > get all the split-LRU stuff landed. Is anyone testing it? Is it good?
>
> I've done some testing on it on my two test systems and have not
> found performance regressions against the mainline VM.
>
> As for stability, I think we have done enough testing to conclude
> that it is stable by now.

My experience hasn't found any regression either,
and in my experience the split-lru patches increase performance stability.

What is performance stability?
For example, an HPC parallel computation uses many processes that
communicate with each other, so overall system performance is decided
by the slowest process.

So peak and average performance aren't the only things that matter;
worst-case performance is important too.

In particular, split-lru outperforms mainline on mixed anon and file workloads.


For example, I ran the himeno benchmark.
(This is one of the most famous HPC benchmarks in Japan; it does matrix
calculations on a large amount of memory, i.e. it uses anon memory only.)

machine
-------------
CPU: IA64 x8
MEM: 8GB

benchmark setting
-----------------
# of parallel: 4
memory used:  1.7GB x4 (nearly all of total memory)


First:
results with no other load running (Unit: MFLOPS)

                     each process              result
                   1     2     3     4     worst  average
---------------------------------------------------------
2.6.27-rc8:       217   213   217   154    154     200
mmotm 02 Oct:     217   214   217   217    214     216

OK, these are almost the same.


Next:
results when another I/O process is running (Unit: MFLOPS)
(*) an infinite loop of the dd command was used

                     each process              result
                   1     2     3     4     worst  average
---------------------------------------------------------
2.6.27-rc8:        34   205    69   196     34     126
mmotm 02 Oct:     162   179   146   178    146     166


Wow, the worst case shows a significant difference.
(This result is reproducible.)

This is because reclaim processing in the mainline VM is too slow, so
whichever process ends up calling direct reclaim suffers a large
performance hit.


This characteristic matters not only for HPC but also for the desktop,
because if the X server (or another critical process) calls direct
reclaim, it can easily hurt the end-user experience.


Yup,
I know many people want other benchmark results too.
I'll try to measure other benchmarks next week.


2008-10-06 06:03:23

by Suresh Jayaraman

[permalink] [raw]
Subject: Re: [PATCH 00/32] Swap over NFS - v19

Peter Zijlstra wrote:
> Patches are against: v2.6.27-rc5-mm1
>
> This release features more comments and (hopefully) better Changelogs.
> Also the netns stuff got sorted and ipv6 will now build

Except for this one I think ;-)

net/netfilter/core.c: In function ‘nf_hook_slow’:
net/netfilter/core.c:191: error: ‘pskb’ undeclared (first use in this function)

> and not oops on boot ;-)

The culprit is emergency-nf_queue.patch. The following change fixes the
build error for me.

Index: linux-2.6.26/net/netfilter/core.c
===================================================================
--- linux-2.6.26.orig/net/netfilter/core.c
+++ linux-2.6.26/net/netfilter/core.c
@@ -184,9 +184,12 @@ next_hook:
 		ret = 1;
 		goto unlock;
 	} else if (verdict == NF_DROP) {
+drop:
 		kfree_skb(skb);
 		ret = -EPERM;
 	} else if ((verdict & NF_VERDICT_MASK) == NF_QUEUE) {
+		if (skb_emergency(skb))
+			goto drop;
 		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn,
 			      verdict >> NF_VERDICT_BITS))
 			goto next_hook;


Thanks,

--
Suresh Jayaraman

2008-10-07 14:27:18

by KOSAKI Motohiro

[permalink] [raw]
Subject: split-lru performance measurement part2

Hi

> Yup,
> I know many people want other benchmark results too.
> I'll try to measure other benchmarks next week.

I ran another benchmark today.
I chose dbench because it is one of the most famous I/O benchmarks with a realistic workload.


% dbench client.txt 4000

mainline: Throughput 13.4231 MB/sec 4000 clients 4000 procs max_latency=1421988.159 ms
mmotm(*): Throughput 7.0354 MB/sec 4000 clients 4000 procs max_latency=2369213.380 ms

(*) mmotm 2/Oct + Hugh's recent slub fix


Wow!
mmotm is much slower than mainline (about half the throughput).

Therefore, I measured a "mainline + split-lru (only)" build.


mainline + split-lru(only): Throughput 14.4062 MB/sec 4000 clients 4000 procs max_latency=1152231.896 ms


OK!
split-lru outperforms mainline in terms of both throughput and latency :)



However, I don't understand why this regression happened.
Do you have any suggestion?



2008-10-07 20:19:18

by Andrew Morton

[permalink] [raw]
Subject: Re: split-lru performance measurement part2

On Tue, 7 Oct 2008 23:26:54 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Hi
>
> > Yup,
> > I know many people want other benchmark results too.
> > I'll try to measure other benchmarks next week.
>
> I ran another benchmark today.
> I chose dbench because it is one of the most famous I/O benchmarks with a realistic workload.
>
>
> % dbench client.txt 4000
>
> mainline: Throughput 13.4231 MB/sec 4000 clients 4000 procs max_latency=1421988.159 ms
> mmotm(*): Throughput 7.0354 MB/sec 4000 clients 4000 procs max_latency=2369213.380 ms
>
> (*) mmotm 2/Oct + Hugh's recent slub fix
>
>
> Wow!
> mmotm is much slower than mainline (about half the throughput).
>
> Therefore, I measured a "mainline + split-lru (only)" build.
>
>
> mainline + split-lru(only): Throughput 14.4062 MB/sec 4000 clients 4000 procs max_latency=1152231.896 ms
>
>
> OK!
> split-lru outperforms mainline in terms of both throughput and latency :)
>
>
>
> However, I don't understand why this regression happened.

erk.

dbench is pretty chaotic and it could be that a good change causes
dbench to get worse. That's happened plenty of times in the past.


> Do you have any suggestion?


One of these:

vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch
vm-dont-run-touch_buffer-during-buffercache-lookups.patch

perhaps?

2008-10-07 21:29:59

by Rik van Riel

[permalink] [raw]
Subject: Re: split-lru performance measurement part2

Andrew Morton wrote:

> dbench is pretty chaotic and it could be that a good change causes
> dbench to get worse. That's happened plenty of times in the past.
>
>
>> Do you have any suggestion?
>
>
> One of these:
>
> vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch
> vm-dont-run-touch_buffer-during-buffercache-lookups.patch
>
> perhaps?

Worth a try, but it could just as well be a CPU scheduler change
that happens to indirectly impact locking :)

--
All rights reversed.