LinuxLists.cc - partially uptodate page reads

2008-07-24 15:17:11

by Nick Piggin

Subject: partially uptodate page reads

Hi, I have some questions about your patch in -mm

vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch

I have no particular problem with something like this, but leaving the
implementation details aside for the moment, can we discuss the
justification for this?

Are there significant numbers of people using block size < page size in
situations where performance is important and significantly improved by
this patch? Can you give any performance numbers to illustrate perhaps?

Thanks,
Nick

2008-07-24 17:59:18

by Christoph Hellwig

Subject: Re: partially uptodate page reads

On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> Hi, I have some questions about your patch in -mm
>
> vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
>
> I have no particular problem with something like this, but leaving the
> implementation details aside for the moment, can we discuss the
> justification for this?
>
> Are there significant numbers of people using block size < page size in
> situations where performance is important and significantly improved by
> this patch? Can you give any performance numbers to illustrate perhaps?

With XFS lots of people use 4k blocksize filesystems on ia64 systems
with 16k pages, so an optimization like this would be useful.

But as mentioned in one of your previous comments, I'd prefer
a readpage interface change to deal with this.
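A minimal sketch of what Christoph's example means for file offsets, assuming 16k pages and 4k filesystem blocks. With that geometry each page-cache page covers four blocks, so a 4k read or write touches only one of them. `PAGE_SZ`, `BLOCK_SZ` and the helper names are hypothetical, not kernel symbols.

```c
#include <assert.h>

/* Assumed geometry from the XFS/ia64 example: 16k pages, 4k blocks.
 * These are illustrative constants, not kernel definitions. */
#define PAGE_SZ  16384UL
#define BLOCK_SZ  4096UL

/* Number of filesystem blocks that back one page-cache page. */
unsigned long blocks_per_page(void)
{
    assert(PAGE_SZ % BLOCK_SZ == 0);   /* block size must divide page size */
    return PAGE_SZ / BLOCK_SZ;
}

/* Index of the page-cache page that holds byte offset off of a file. */
unsigned long page_index_of(unsigned long off)
{
    return off / PAGE_SZ;
}

/* Which block inside that page holds byte offset off. */
unsigned long block_in_page(unsigned long off)
{
    return (off % PAGE_SZ) / BLOCK_SZ;
}
```

For example, byte offset 20480 falls in page 1, block 1, so a 4k I/O there involves only one of the four blocks backing that page.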

2008-07-24 19:09:20

by Andrew Morton

Subject: Re: partially uptodate page reads

On Thu, 24 Jul 2008 13:59:13 -0400 Christoph Hellwig <[email protected]> wrote:

> On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> > Are there significant numbers of people using block size < page size in
> > situations where performance is important and significantly improved by
> > this patch? Can you give any performance numbers to illustrate perhaps?
>
> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> with 16k pages, so an optimization like this would be useful.

As Nick says, we really should have some measurement results which
confirm this theory. Maybe we did do some but they didn't find their
way into the changelog.

I've put the patch on hold until this confirmation data is available.

> But as mentioned in one of your previous comments I'd rather prefer
> a readpage interface change to deal with this.

2008-07-25 09:22:49

by Nick Piggin

Subject: Re: partially uptodate page reads

On Friday 25 July 2008 03:59, Christoph Hellwig wrote:
> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> with 16k pages, so an optimization like this would be useful.
>
> But as mentioned in one of your previous comments I'd rather prefer
> a readpage interface change to deal with this.

Yeah... actually if it is a nice win, I don't mind too much going
with this API to start with and consolidating with readpage later.
I am thinking about making a few other changes to readpage as well,
so I am happy to look at folding this partially-uptodate API into it
at the same time.

If we just get some numbers (maybe SGI can help out?), I'm happy
enough with this approach.

2008-07-28 04:34:12

by Hisashi Hifumi

Subject: Re: partially uptodate page reads

Hi

>> >
>> > Are there significant numbers of people using block size < page size in
>> > situations where performance is important and significantly improved by
>> > this patch? Can you give any performance numbers to illustrate perhaps?
>>
>> With XFS lots of people use 4k blocksize filesystems on ia64 systems
>> with 16k pages, so an optimization like this would be useful.
>
>As Nick says, we really should have some measurement results which
>confirm this theory. Maybe we did do some but they didn't find their
>way into the changelog.
>
>I've put the patch on hold until this confirmation data is available.
>

I've got some performance numbers.
I wrote a benchmark program and measured with it.
The benchmark does the following:
1. Mount the filesystem and open a test file.
2. Create a 512MB file.
3. Close the file and unmount.
4. Mount again and reopen the test file.
5. pwrite 300000 times at random offsets in the test file. Each offset is
   aligned to the I/O size (1024 bytes).
6. Measure the time taken to pread 100000 times at random offsets in the
   test file (same size and alignment).

The result was:
2.6.26:          330 sec
2.6.26-patched:  226 sec

Arch:       i386
Filesystem: ext3
Block size: 1024 bytes
Memory:     1GB
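For scale, the two timings above work out as follows (plain arithmetic on the reported numbers, not an additional measurement):

```latex
\frac{330 - 226}{330} = \frac{104}{330} \approx 0.315
\quad \text{(about 31.5\% less time)}, \qquad
\frac{330}{226} \approx 1.46 \quad \text{(speedup)}
```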

On ext3/ext4, file data is written through buffers, one block at a time,
so a page can have some of its blocks uptodate and others not. Random
read/write mixed workloads, or random reads after random writes, are
therefore optimized by this patch when pagesize != blocksize, as this
test result shows.
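A minimal user-space model of the idea, assuming one page is tracked as a per-block "uptodate" bitmap (the kernel keeps this state in buffer heads). A read can be served from a page that is not fully uptodate as long as every block it touches is uptodate. The geometry matches the benchmark (4096-byte pages, 1024-byte blocks). `struct model_page`, `range_uptodate` and the other names are hypothetical and are not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed geometry from the benchmark: 4096-byte pages (i386) and a
 * 1024-byte ext3 block size. */
#define PAGE_SZ         4096u
#define BLOCK_SZ        1024u
#define BLOCKS_PER_PAGE (PAGE_SZ / BLOCK_SZ)

/* Model of one page-cache page: bit i set means block i holds valid data. */
struct model_page {
    unsigned int uptodate_mask;
};

/* True if every block of the page is uptodate. */
bool page_fully_uptodate(const struct model_page *pg)
{
    return pg->uptodate_mask == (1u << BLOCKS_PER_PAGE) - 1;
}

/* True if bytes [from, from + count) of the page lie entirely in uptodate
 * blocks, i.e. the read could be copied out without issuing read I/O. */
bool range_uptodate(const struct model_page *pg, size_t from, size_t count)
{
    size_t first, last, b;

    assert(count > 0 && from + count <= PAGE_SZ);
    first = from / BLOCK_SZ;
    last = (from + count - 1) / BLOCK_SZ;
    for (b = first; b <= last; b++)
        if (!(pg->uptodate_mask & (1u << b)))
            return false;
    return true;
}

/* Model of a block-aligned, one-block write: only the written block is
 * marked uptodate; the page's other blocks are left as they were. */
void write_block(struct model_page *pg, size_t from)
{
    assert(from % BLOCK_SZ == 0 && from < PAGE_SZ);
    pg->uptodate_mask |= 1u << (from / BLOCK_SZ);
}
```

After `write_block(&pg, 1024)` on an empty page, `range_uptodate(&pg, 1024, 1024)` is true while `page_fully_uptodate(&pg)` is false. In this model, that is the case the patch targets: the page as a whole is not uptodate, but the requested range is.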

The benchmark program is as follows:

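#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

#define LEN 1024              /* I/O size in bytes, equal to the fs block size */
#define LOOP (1024 * 512)     /* number of LEN-byte writes: 512MB in total */
#define DEV "/dev/sda1"
#define MNT "/root/test1/"
#define TESTFILE MNT "testfile"

static void mount_test_fs(void)
{
    if (mount(DEV, MNT, "ext3", 0, NULL) < 0) {
        perror("cannot mount");
        exit(1);
    }
}

static void umount_test_fs(void)
{
    if (umount(MNT) < 0) {
        perror("cannot umount");
        exit(1);
    }
}

int main(void)
{
    unsigned long i, offset, filesize;
    int fd;
    char buf[LEN];
    time_t t1, t2;

    /* Steps 1-3: create a 512MB file, then unmount to drop it from the cache. */
    mount_test_fs();
    memset(buf, 0, LEN);
    /* O_CREAT needs an explicit mode argument. */
    fd = open(TESTFILE, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) {
        perror("cannot open file");
        exit(1);
    }
    for (i = 0; i < LOOP; i++) {
        if (write(fd, buf, LEN) != LEN) {
            perror("write");
            exit(1);
        }
    }
    close(fd);
    umount_test_fs();

    /* Step 4: mount again so the test starts with a cold page cache. */
    mount_test_fs();
    fd = open(TESTFILE, O_RDWR);
    if (fd < 0) {
        perror("cannot open file");
        exit(1);
    }

    /* random() is never seeded, so every run sees the same offsets. */
    filesize = (unsigned long)LEN * LOOP;

    /* Step 5: 300000 random writes, each aligned to LEN. */
    for (i = 0; i < 300000; i++) {
        offset = (random() % filesize) & ~((unsigned long)LEN - 1);
        if (pwrite(fd, buf, LEN, offset) != LEN) {
            perror("pwrite");
            exit(1);
        }
    }

    /* Step 6: time 100000 random reads, each aligned to LEN. */
    printf("start test\n");
    time(&t1);
    for (i = 0; i < 100000; i++) {
        offset = (random() % filesize) & ~((unsigned long)LEN - 1);
        if (pread(fd, buf, LEN, offset) != LEN) {
            perror("pread");
            exit(1);
        }
    }
    time(&t2);
    printf("%ld sec\n", (long)(t2 - t1));
    close(fd);
    umount_test_fs();
    return 0;
}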
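The offset computation in the benchmark relies on LEN being a power of two: masking with `~(LEN - 1)` rounds down to a multiple of LEN, and because `random() % filesize` is always below `filesize` (itself a multiple of LEN), an aligned LEN-byte I/O never runs past the end of the file. A small sketch of that check; `align_down` and `io_in_bounds` are hypothetical helper names, not part of the original program.

```c
#include <assert.h>
#include <stdbool.h>

/* Same rounding as the benchmark: clear the low bits so the result is a
 * multiple of len. Only valid when len is a power of two. */
unsigned long align_down(unsigned long x, unsigned long len)
{
    assert(len != 0 && (len & (len - 1)) == 0);   /* power of two */
    return x & ~(len - 1);
}

/* True if a len-byte I/O at the aligned offset derived from r stays inside
 * the file; assumes filesize is a multiple of len, as in the benchmark. */
bool io_in_bounds(unsigned long r, unsigned long filesize, unsigned long len)
{
    unsigned long off = align_down(r % filesize, len);

    return off % len == 0 && off + len <= filesize;
}
```

With the benchmark's sizes, the largest possible offset is 536870912 - 1024 = 536869888, so the final 1024-byte I/O ends exactly at the end of the 512MB file.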

2008-07-28 06:51:41

by Andrew Morton

Subject: Re: partially uptodate page reads

On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi <[email protected]> wrote:

> The result was:
> 2.6.26
> 330 sec
>
> 2.6.26-patched
> 226 sec
>
> Arch:i386
> Filesystem:ext3
> Blocksize:1024 bytes
> Memory: 1GB
>
> On ext3/4, a file is written through buffer/block. So random read/write mixed workloads
> or random read after random write workloads are optimized with this patch under
> pagesize != blocksize environment. This test result showed this.

OK, thanks. Those are pretty nice numbers for what is probably a
fairly common workload.

2008-07-28 06:56:47

by Nick Piggin

Subject: Re: partially uptodate page reads

On Monday 28 July 2008 16:51, Andrew Morton wrote:
> On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi <[email protected]> wrote:
> > The result was:
> > 2.6.26
> > 330 sec
> >
> > 2.6.26-patched
> > 226 sec
> >
> > Arch:i386
> > Filesystem:ext3
> > Blocksize:1024 bytes
> > Memory: 1GB
> >
> > On ext3/4, a file is written through buffer/block. So random read/write
> > mixed workloads or random read after random write workloads are optimized
> > with this patch under pagesize != blocksize environment. This test result
> > showed this.

Yeah, thanks for the numbers.

> OK, thanks. Those are pretty nice numbers for what is probably a
> fairly common workload.

What kind of workloads do this kind of thing?

2008-07-28 07:09:27

by Andrew Morton

Subject: Re: partially uptodate page reads

On Mon, 28 Jul 2008 16:56:37 +1000 Nick Piggin <[email protected]> wrote:

> On Monday 28 July 2008 16:51, Andrew Morton wrote:
> Yeah, thanks for the numbers.
>
> > OK, thanks. Those are pretty nice numbers for what is probably a
> > fairly common workload.
>
> What kind of workloads do this kind of thing?

Various databases? (confused).

More likely pattern is 8k IOs with 16k pagesize or thereabouts.

2008-07-28 07:22:39

by Nick Piggin

Subject: Re: partially uptodate page reads

On Monday 28 July 2008 17:09, Andrew Morton wrote:
> On Mon, 28 Jul 2008 16:56:37 +1000 Nick Piggin <[email protected]> wrote:
> > On Monday 28 July 2008 16:51, Andrew Morton wrote:
> > Yeah, thanks for the numbers.
> >
> > > OK, thanks. Those are pretty nice numbers for what is probably a
> > > fairly common workload.
> >
> > What kind of workloads do this kind of thing?
>
> Various databases? (confused).

I guess so. I was thinking of direct IO, but I guess there are
good open source ones which go through the pagecache.

> More likely pattern is 8k IOs with 16k pagesize or thereabouts.

Right, but it won't be a completely random workload. Also, it would
be interesting to know whether there are any databases using an 8k block
size on filesystems with a 4k block size, running on machines with 16k
pages, which are very performance critical ;)

But I guess it is only a small amount of code for a pretty good
speedup. And while there are probably very few such installations, that
may be as much because we do a bad job of it as because it just isn't a
good idea in general ;)

The improvement is quite significant, even if it is the artificial best
possible case... I suppose let's just merge it then?