2010-02-02 15:34:33

by Fengguang Wu

[permalink] [raw]
Subject: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()

Some applications (eg. blkid, id3tool etc.) seek around the file
to get information. For example, blkid does
seek to 0
read 1024
seek to 1536
read 16384

The start-of-file readahead heuristic is wrong for them, whose
access pattern can be identified by lseek() calls.

So test-and-set a READAHEAD_LSEEK flag on lseek() and don't
do start-of-file readahead on seeing it. Proposed by Linus.

CC: Linus Torvalds <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/read_write.c | 3 +++
include/linux/fs.h | 1 +
mm/readahead.c | 5 +++++
3 files changed, 9 insertions(+)

--- linux.orig/mm/readahead.c 2010-02-02 21:52:19.000000000 +0800
+++ linux/mm/readahead.c 2010-02-02 21:52:32.000000000 +0800
@@ -625,6 +625,11 @@ ondemand_readahead(struct address_space
if (!offset) {
ra_set_pattern(ra, RA_PATTERN_INITIAL);
ra->start = offset;
+ if ((ra->ra_flags & READAHEAD_LSEEK) && req_size <= max) {
+ ra->size = req_size;
+ ra->async_size = 0;
+ goto readit;
+ }
ra->size = get_init_ra_size(req_size, max);
ra->async_size = ra->size > req_size ?
ra->size - req_size : ra->size;
--- linux.orig/fs/read_write.c 2010-02-02 21:50:51.000000000 +0800
+++ linux/fs/read_write.c 2010-02-02 21:53:04.000000000 +0800
@@ -71,6 +71,9 @@ generic_file_llseek_unlocked(struct file
file->f_version = 0;
}

+ if (!(file->f_ra.ra_flags & READAHEAD_LSEEK))
+ file->f_ra.ra_flags |= READAHEAD_LSEEK;
+
return offset;
}
EXPORT_SYMBOL(generic_file_llseek_unlocked);
--- linux.orig/include/linux/fs.h 2010-02-02 21:52:19.000000000 +0800
+++ linux/include/linux/fs.h 2010-02-02 21:52:19.000000000 +0800
@@ -899,6 +899,7 @@ struct file_ra_state {
#define READAHEAD_MMAP_MISS 0x0000ffff /* cache misses for mmap access */
#define READAHEAD_THRASHED 0x10000000
#define READAHEAD_MMAP 0x20000000
+#define READAHEAD_LSEEK 0x40000000 /* be conservative after lseek() */

/*
* Which policy makes decision to do the current read-ahead IO?


2010-02-02 17:41:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()



On Tue, 2 Feb 2010, Wu Fengguang wrote:
>
> Some applications (eg. blkid, id3tool etc.) seek around the file
> to get information. For example, blkid does
> seek to 0
> read 1024
> seek to 1536
> read 16384
>
> The start-of-file readahead heuristic is wrong for them, whose
> access pattern can be identified by lseek() calls.
>
> So test-and-set a READAHEAD_LSEEK flag on lseek() and don't
> do start-of-file readahead on seeing it. Proposed by Linus.
>
> CC: Linus Torvalds <[email protected]>
> Signed-off-by: Wu Fengguang <[email protected]>

Acked-by: Linus Torvalds <[email protected]>

Linus

2010-02-02 18:13:23

by Olivier Galibert

[permalink] [raw]
Subject: Re: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()

On Tue, Feb 02, 2010 at 11:28:45PM +0800, Wu Fengguang wrote:
> Some applications (eg. blkid, id3tool etc.) seek around the file
> to get information. For example, blkid does
> seek to 0
> read 1024
> seek to 1536
> read 16384
>
> The start-of-file readahead heuristic is wrong for them, whose
> access pattern can be identified by lseek() calls.
>
> So test-and-set a READAHEAD_LSEEK flag on lseek() and don't
> do start-of-file readahead on seeing it. Proposed by Linus.

Wouldn't that trigger on lseeks to end of file to get the size?

OG.

2010-02-02 18:42:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()



On Tue, 2 Feb 2010, Olivier Galibert wrote:
>
> Wouldn't that trigger on lseeks to end of file to get the size?

Well, you'd only ever do that with a raw block device, no (if even that:
more "raw block device" tools just use the BLKSIZE64 ioctl etc)? Any sane
regular file accessor will do 'fstat()' instead.

And do we care about startup speed of ramping up read-ahead from the
beginning? In fact, the problem case that caused this was literally
'blkid' on a block device - and the fact that the kernel tried to
read-ahead TOO MUCh rather than too little.

If somebody is really doing lots of serial reading, the read-ahead code
will figure it out very quickly. The case this worries about is just the
_first_ read, where the question is one of "do we think it might be
seeking around, or does it look like the user is going to just read the
whole thing"?

IOW, if you start off with a SEEK_END, I think it's reasonable to expect
it to _not_ read the whole thing.

Linus

2010-02-02 18:48:36

by Olivier Galibert

[permalink] [raw]
Subject: Re: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()

On Tue, Feb 02, 2010 at 10:40:41AM -0800, Linus Torvalds wrote:
> IOW, if you start off with a SEEK_END, I think it's reasonable to expect
> it to _not_ read the whole thing.

I've seen a lot of:
int fd = open(...);
size = lseek(fd, 0, SEEK_END);
lseek(fd, 0, SEEK_SET);

data = malloc(size);
read(fd, data, size);
close(fd);

Why not fstat? I don't know. Perhaps a case of cargo culting,
perhaps a case of "other unixes suck for portability"[1]. But it's
probably still there a lot in real code.

OG.

[1] In the hpux, dgux, sunos, etc sense. Not to be taken as a comment
on modern BSDs.

2010-02-02 19:19:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()



On Tue, 2 Feb 2010, Olivier Galibert wrote:
>
> On Tue, Feb 02, 2010 at 10:40:41AM -0800, Linus Torvalds wrote:
> > IOW, if you start off with a SEEK_END, I think it's reasonable to expect
> > it to _not_ read the whole thing.
>
> I've seen a lot of:
> int fd = open(...);
> size = lseek(fd, 0, SEEK_END);
> lseek(fd, 0, SEEK_SET);
>
> data = malloc(size);
> read(fd, data, size);
> close(fd);
>
> Why not fstat? I don't know.

Well, the above will work perfectly with or without the patch, since it
does the read of the full size. There is no read-ahead hint necessary for
that kind of single read behavior.

Rememebr: read-ahead is about filling the empty IO spaces _between_ reads,
and turning many smaller reads into one bigger one. If you only have a
single big read, read-ahead cannot help.

Also, keep in mind that read-ahead is not always a win. It can be a huge
loss too. Which is why we have _heuristics_. They fundamentally cannot
catch every case, but what they aim for is to do a good job on average.

Linus

2010-02-02 20:28:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()



On Tue, 2 Feb 2010, [email protected] wrote:

> On Tue, 2 Feb 2010, Linus Torvalds wrote:
> >
> > Also, keep in mind that read-ahead is not always a win. It can be a huge
> > loss too. Which is why we have _heuristics_. They fundamentally cannot
> > catch every case, but what they aim for is to do a good job on average.
>
> as a note from the field, I just had an application that needed to be changed
> because it did excessive read-ahead. it turned a 2 min reporting run into a 20
> min reporting run because for this report the access was really random and the
> app forced large read-ahead.

Yeah. And the reason Wu did this patch is similar: something that _should_
have taken just quarter of a second took about 7 seconds, because
read-ahead triggered on this really slow device that only feeds about
15kB/s (yes, _kilo_byte, not megabyte).

You can always use POSIX_FADVISE_RANDOM to disable it, but it's seldom
something that people do. And there are real loads that have random
components to them without being _entirely_ random, so in an optimal world
we should just have heuristics that work well.

The problem is, it's often easier to test/debug the "good" cases, ie the
cases where we _want_ read-ahead to trigger. So that probably means that
we have a tendency to read-ahead too aggressively, because those cases are
the ones where people can most easily look at it and say "yeah, this
improves throughput of a 'dd bs=8192'".

So then when we find loads where read-ahead hurts, I think we need to take
_that_ case very seriously. Because otherwise our selection bias for
testing read-ahead will fail.

Linus

2010-02-02 20:33:31

by David Lang

[permalink] [raw]
Subject: Re: [PATCH 10/11] readahead: dont do start-of-file readahead after lseek()

On Tue, 2 Feb 2010, Linus Torvalds wrote:

> Rememebr: read-ahead is about filling the empty IO spaces _between_ reads,
> and turning many smaller reads into one bigger one. If you only have a
> single big read, read-ahead cannot help.
>
> Also, keep in mind that read-ahead is not always a win. It can be a huge
> loss too. Which is why we have _heuristics_. They fundamentally cannot
> catch every case, but what they aim for is to do a good job on average.

as a note from the field, I just had an application that needed to be
changed because it did excessive read-ahead. it turned a 2 min reporting
run into a 20 min reporting run because for this report the access was
really random and the app forced large read-ahead.

David Lang