LinuxLists.cc - [PATCH 00/33] Adaptive read-ahead V12

2006-05-24 11:19:03

Subject: [PATCH 00/33] Adaptive read-ahead V12

Andrew,

This is the 12th release of the adaptive readahead patchset.

It has been tested in a wide range of applications over the past
six months, and has been polished up considerably.

Please consider it for inclusion in the -mm tree.

Performance benefits
====================

Besides file servers and desktops, it has recently been found to benefit
postgresql databases a lot.

I explained to pgsql users how the patch may help their db performance:
http://archives.postgresql.org/pgsql-performance/2006-04/msg00491.php
[QUOTE]
HOW IT WORKS

In adaptive readahead, the context based method may be of particular
interest to postgresql users. It works by peeking into the file cache
and checking whether any history pages are present or accessed. In this
way it can detect almost all forms of sequential / semi-sequential read
patterns, e.g.
- parallel / interleaved sequential scans on one file
- sequential reads across file open/close
- mixed sequential / random accesses
- sparse / skimming sequential read

It also has methods to detect some less common cases:
- reading backward
- seeking all over the file, reading N pages at a time

WAYS TO BENEFIT FROM IT

As we know, postgresql relies on the kernel to do proper readahead.
The adaptive readahead might help performance in the following cases:
- concurrent sequential scans
- sequential scan on a fragmented table
(some DBs suffer from this problem, not sure for pgsql)
- index scan with clustered matches
- index scan on majority rows (in case the planner goes wrong)

And received positive responses:
[QUOTE from Michael Stone]
I've got one DB where the VACUUM ANALYZE generally takes 11M-12M ms;
with the patch the job took 1.7M ms. Another VACUUM that normally takes
between 300k-500k ms took 150k. Definitely a promising addition.

[QUOTE from Michael Stone]
>I'm thinking about it, we're already using a fixed read-ahead of 16MB
>using blockdev on the stock Redhat 2.6.9 kernel, it would be nice to
>not have to set this so we may try it.

FWIW, I never saw much performance difference from doing that. Wu's
patch, OTOH, gave a big boost.

[QUOTE: odbc-bench with Postgresql 7.4.11 on dual Opteron]
Base kernel:
Transactions per second: 92.384758
Transactions per second: 99.800896

After read-ahead patch, vm.readahead_ratio = 100:
Transactions per second: 105.461952
Transactions per second: 105.458664

vm.readahead_ratio = 100 ; vm.readahead_hit_rate = 1:
Transactions per second: 113.055367
Transactions per second: 124.815910
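
As a rough worked calculation of what the odbc-bench figures above amount
to, the sketch below averages the two runs of each configuration and
compares them with the base kernel. The dictionary labels and the helper
name gain_percent are illustrative only, and with just two runs per
configuration the result is indicative rather than statistically solid.

```python
from statistics import mean

# Transactions per second, copied from the odbc-bench runs quoted above.
RUNS = {
    "base kernel": [92.384758, 99.800896],
    "vm.readahead_ratio=100": [105.461952, 105.458664],
    "vm.readahead_ratio=100, vm.readahead_hit_rate=1": [113.055367, 124.815910],
}


def gain_percent(runs, baseline="base kernel"):
    """Return the percentage change of each configuration's mean TPS
    relative to the mean TPS of the baseline configuration."""
    base = mean(runs[baseline])
    return {
        name: (mean(values) / base - 1.0) * 100.0
        for name, values in runs.items()
        if name != baseline
    }


for name, gain in gain_percent(RUNS).items():
    print(f"{name}: {gain:+.1f}% transactions per second")
```

That works out to a gain of roughly 10% with vm.readahead_ratio = 100 and
roughly 24% once vm.readahead_hit_rate = 1 is added.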

Patches
=======
All 33 patches are bisect friendly:
special care has been taken to make them compile cleanly at each step.

The following 29 patches are only logically separated -
one should not remove one of them and expect the others to compile cleanly:

[patch 01/33] readahead: kconfig options
[patch 02/33] radixtree: look-aside cache
[patch 03/33] radixtree: hole scanning functions
[patch 04/33] readahead: page flag PG_readahead
[patch 05/33] readahead: refactor do_generic_mapping_read()
[patch 06/33] readahead: refactor __do_page_cache_readahead()
[patch 07/33] readahead: insert cond_resched() calls
[patch 08/33] readahead: common macros
[patch 09/33] readahead: events accounting
[patch 10/33] readahead: support functions
[patch 11/33] readahead: sysctl parameters
[patch 12/33] readahead: min/max sizes
[patch 13/33] readahead: state based method - aging accounting
[patch 14/33] readahead: state based method - data structure
[patch 15/33] readahead: state based method - routines
[patch 16/33] readahead: state based method
[patch 17/33] readahead: context based method
[patch 18/33] readahead: initial method - guiding sizes
[patch 19/33] readahead: initial method - thrashing guard size
[patch 20/33] readahead: initial method - expected read size
[patch 21/33] readahead: initial method - user recommended size
[patch 22/33] readahead: initial method
[patch 23/33] readahead: backward prefetching method
[patch 24/33] readahead: seeking reads method
[patch 25/33] readahead: thrashing recovery method
[patch 26/33] readahead: call scheme
[patch 27/33] readahead: laptop mode
[patch 28/33] readahead: loop case
[patch 29/33] readahead: nfsd case

The following 4 patches are for debugging purposes, and for -mm only:

[patch 30/33] readahead: turn on by default
[patch 31/33] readahead: debug radix tree new functions
[patch 32/33] readahead: debug traces showing accessed file names
[patch 33/33] readahead: debug traces showing read patterns

Diffstat
========
 Documentation/sysctl/vm.txt |   37
 block/ll_rw_blk.c           |   34
 drivers/block/loop.c        |    6
 fs/file_table.c             |    7
 fs/mpage.c                  |    4
 fs/nfsd/vfs.c               |    5
 include/linux/backing-dev.h |    3
 include/linux/fs.h          |   57 +
 include/linux/mm.h          |   31
 include/linux/mmzone.h      |    5
 include/linux/page-flags.h  |    5
 include/linux/radix-tree.h  |   87 ++
 include/linux/sysctl.h      |    2
 include/linux/writeback.h   |    6
 kernel/sysctl.c             |   28
 lib/radix-tree.c            |  202 ++++-
 mm/Kconfig                  |   62 +
 mm/filemap.c                |   90 ++
 mm/page-writeback.c         |    2
 mm/page_alloc.c             |    2
 mm/readahead.c              | 1641 +++++++++++++++++++++++++++++++++++++++++++-
 mm/swap.c                   |    2
 mm/vmscan.c                 |    4
 23 files changed, 2262 insertions(+), 60 deletions(-)

Changelog
=========

V12 2006-05-24
- improve small files case
- allow pausing of events accounting
- disable sparse read-ahead by default
- a bug fix in radix_tree_cache_lookup_parent()
- more cleanups

V11 2006-03-19
- patchset rework
- add kconfig option to make the feature compile-time selectable
- improve radix tree scan functions
- fix bug of using smp_processor_id() in preemptible code
- avoid overflow in compute_thrashing_threshold()
- disable sparse read prefetching if (readahead_hit_rate == 1)
- make thrashing recovery a standalone function
- random cleanups

V10 2005-12-16
- remove delayed page activation
- remove live page protection
- revert mmap readaround to old behavior
- default to original readahead logic
- default to original readahead size
- merge comment fixes from Andreas Mohr
- merge radixtree cleanups from Christoph Lameter
- reduce sizeof(struct file_ra_state) by unnamed union
- stateful method cleanups
- account other read-ahead paths

V9 2005-12-03
- standalone mmap read-around code, a little smarter and more tunable
- make stateful method sensitive to request size
- decouple readahead_ratio from live pages protection
- let readahead_ratio contribute to ra_size grow speed in stateful method
- account variance of ra_size

V8 2005-11-25

- balance zone aging only in page reclaim paths and do it right
- do the aging of slabs in the same way as zones
- add debug code to dump the detailed page reclaim steps
- undo exposing of struct radix_tree_node and uninline related functions
- work better with nfsd
- generalize accelerated context based read-ahead
- account smooth read-ahead aging based on page referenced/activate bits
- avoid divide error in compute_thrashing_threshold()
- more low latency efforts
- update some comments
- rebase debug actions on debugfs entries instead of magic readahead_ratio values

V7 2005-11-09

- new tunable parameters: readahead_hit_rate/readahead_live_chunk
- support sparse sequential accesses
- delay look-ahead if drive is spun down in laptop mode
- disable look-ahead for loopback file
- make mandatory thrashing protection more simple and robust
- attempt to improve responsiveness on large read-ahead size

V6 2005-11-01

- cancel look-ahead in laptop mode
- increase read-ahead limit to 0xFFFF pages

V5 2005-10-28

- rewrite context based method to make it clean and robust
- improved accuracy of stateful thrashing threshold estimation
- make page aging equal to the number of cold pages scanned
- sort out the thrashing protection logic
- enhanced debug/accounting facilities

V4 2005-10-15

- detect and save live chunks on page reclaim
- support database workload
- support reading backward
- radix tree lookup look-aside cache

V3 2005-10-06

- major code reorganization and documentation
- stateful estimation of thrashing-threshold
- context method with accelerated grow up phase
- adaptive look-ahead
- early detection and rescue of pages in danger
- statistics data collection
- synchronized page aging between zones

V2 2005-09-15

- delayed page activation
- look-ahead: towards pipelined read-ahead

V1 2005-09-13

Initial release which features:
o stateless (for now)
o adapts to available memory / read speed
o free of thrashing (in theory)

And handles:
o large number of slow streams (FTP server)
o open/read/close access patterns (NFS server)
o multiple interleaved, sequential streams in one file
(multithread / multimedia / database)

Cheers,
Wu Fengguang
--
Dept. Automation University of Science and Technology of China

2006-05-25 15:45:07

by Andrew Morton

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

Wu Fengguang <[email protected]> wrote:
>
> Andrew,
>
> This is the 12th release of the adaptive readahead patchset.
>
> It has received tests in a wide range of applications in the past
> six months, and polished up considerably.
>
> Please consider it for inclusion in -mm tree.
>
>
> Performance benefits
> ====================
>
> Besides file servers and desktops, it is recently found to benefit
> postgresql databases a lot.
>
> I explained to pgsql users how the patch may help their db performance:
> http://archives.postgresql.org/pgsql-performance/2006-04/msg00491.php
> [QUOTE]
> HOW IT WORKS
>
> In adaptive readahead, the context based method may be of particular
> interest to postgresql users. It works by peeking into the file cache
> and check if there are any history pages present or accessed. In this
> way it can detect almost all forms of sequential / semi-sequential read
> patterns, e.g.
> - parallel / interleaved sequential scans on one file
> - sequential reads across file open/close
> - mixed sequential / random accesses
> - sparse / skimming sequential read
>
> It also have methods to detect some less common cases:
> - reading backward
> - seeking all over reading N pages
>
> WAYS TO BENEFIT FROM IT
>
> As we know, postgresql relies on the kernel to do proper readahead.
> The adaptive readahead might help performance in the following cases:
> - concurrent sequential scans
> - sequential scan on a fragmented table
> (some DBs suffer from this problem, not sure for pgsql)
> - index scan with clustered matches
> - index scan on majority rows (in case the planner goes wrong)
>
> And received positive responses:
> [QUOTE from Michael Stone]
> I've got one DB where the VACUUM ANALYZE generally takes 11M-12M ms;
> with the patch the job took 1.7M ms. Another VACUUM that normally takes
> between 300k-500k ms took 150k. Definately a promising addition.
>
> [QUOTE from Michael Stone]
> >I'm thinking about it, we're already using a fixed read-ahead of 16MB
> >using blockdev on the stock Redhat 2.6.9 kernel, it would be nice to
> >not have to set this so we may try it.
>
> FWIW, I never saw much performance difference from doing that. Wu's
> patch, OTOH, gave a big boost.
>
> [QUOTE: odbc-bench with Postgresql 7.4.11 on dual Opteron]
> Base kernel:
> Transactions per second: 92.384758
> Transactions per second: 99.800896
>
> After read-ahead patch, vm.readahead_ratio = 100:
> Transactions per second: 105.461952
> Transactions per second: 105.458664
>
> vm.readahead_ratio = 100 ; vm.readahead_hit_rate = 1:
> Transactions per second: 113.055367
> Transactions per second: 124.815910

These are nice-looking numbers, but one wonders. If optimising readahead
makes this much difference to postgresql performance then postgresql should
be doing the readahead itself, rather than relying upon the kernel's
ability to guess what the application will be doing in the future. Because
surely the database can do a better job of that than the kernel.

That would involve using posix_fadvise(POSIX_FADV_RANDOM) to disable kernel
readahead and then using posix_fadvise(POSIX_FADV_WILLNEED) to launch
application-level readahead.

Has this been considered or attempted?
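
A minimal sketch of the approach described above, assuming a Linux system
and using Python's os.posix_fadvise, which wraps the same posix_fadvise()
call. The function names, the 64 KiB chunk size and the window of four
chunks are illustrative assumptions; this is not code from PostgreSQL or
from the patchset.

```python
import os


def open_without_kernel_readahead(path):
    """Open a file read-only and hint that access will be random,
    which asks the kernel not to do its own readahead on it."""
    fd = os.open(path, os.O_RDONLY)
    # offset=0, length=0 applies the advice to the whole file.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd


def prefetch(fd, offset, length):
    """Ask the kernel to start reading the given byte range into the
    page cache. The call is only a hint and returns without waiting."""
    os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)


def read_with_app_readahead(path, chunk=64 * 1024, ahead=4):
    """Read a file front to back while the application itself keeps
    up to 'ahead' chunks prefetched. Returns the number of bytes read."""
    fd = open_without_kernel_readahead(path)
    try:
        size = os.fstat(fd).st_size
        # Prime the window before the first read.
        for i in range(ahead):
            if i * chunk < size:
                prefetch(fd, i * chunk, chunk)
        total = 0
        offset = 0
        while offset < size:
            # Keep the window 'ahead' chunks in front of the reader.
            upcoming = offset + ahead * chunk
            if upcoming < size:
                prefetch(fd, upcoming, chunk)
            data = os.pread(fd, chunk, offset)
            if not data:
                break
            total += len(data)
            offset += len(data)
        return total
    finally:
        os.close(fd)
```

Both calls are advisory, so the kernel is free to ignore them, and the
application has to choose its own window size.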

2006-05-25 19:26:28

by Michael Stone

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Thu, May 25, 2006 at 08:44:15AM -0700, Andrew Morton wrote:
>These are nice-looking numbers, but one wonders. If optimising readahead
>makes this much difference to postgresql performance then postgresql should
>be doing the readahead itself, rather than relying upon the kernel's
>ability to guess what the application will be doing in the future. Because
>surely the database can do a better job of that than the kernel.

In this particular case Wu had asked about postgres numbers, so I
reported some postgres numbers. You could probably get similar speedups
out of postgres by implementing readahead in postgres. OTOH, the kernel
patch also gives substantial speedups to things like cp; the question
comes down to whether it's better for every application to implement
readahead or for the kernel to do it. (There are, of course, other
concerns like maintainability or whether performance degrades in other
cases, but I didn't test that. :)

Mike Stone

2006-05-25 21:50:23

by David Lang

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Thu, 25 May 2006, Andrew Morton wrote:

> Wu Fengguang <[email protected]> wrote:
>>
>
> These are nice-looking numbers, but one wonders. If optimising readahead
> makes this much difference to postgresql performance then postgresql should
> be doing the readahead itself, rather than relying upon the kernel's
> ability to guess what the application will be doing in the future. Because
> surely the database can do a better job of that than the kernel.
>
> That would involve using posix_fadvise(POSIX_FADV_RANDOM) to disable kernel
> readahead and then using posix_fadvise(POSIX_FADV_WILLNEED) to launch
> application-level readahead.
>
> Has this been considered or attempted?

Postgres chooses not to try and duplicate OS functionality in its I/O
routines.

it doesn't try to determine where on disk the data is (other than
splitting the data into multiple files and possibly spreading things
between directories)

it doesn't try to do its own readahead.

it _does_ maintain its own journal, but depends on the OS to do the right
thing when an fsync is issued on the files.

yes it could be re-written to do all this itself, but the project has
decided not to try and figure out the best options for all the different
filesystems and OS's that it runs on, and to trust the OS developers
to do reasonable things instead.

besides, do you really want to have every program doing its own
readahead?

David Lang

2006-05-25 22:05:10

by Andrew Morton

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

David Lang <[email protected]> wrote:
>
> On Thu, 25 May 2006, Andrew Morton wrote:
>
> > Wu Fengguang <[email protected]> wrote:
> >>
> >
> > These are nice-looking numbers, but one wonders. If optimising readahead
> > makes this much difference to postgresql performance then postgresql should
> > be doing the readahead itself, rather than relying upon the kernel's
> > ability to guess what the application will be doing in the future. Because
> > surely the database can do a better job of that than the kernel.
> >
> > That would involve using posix_fadvise(POSIX_FADV_RANDOM) to disable kernel
> > readahead and then using posix_fadvise(POSIX_FADV_WILLNEED) to launch
> > application-level readahead.
> >
> > Has this been considered or attempted?
>
> Postgres chooses not to try and duplicate OS functionality in it's I/O
> routines.
>
> it doesn't try to determine where on disk the data is (other then
> splitting the data into multiple files and possibly spreading things
> between directories)
>
> it doesn't try to do it's own readahead.
>
> it _does_ maintain it's own journal, but depends on the OS to do the right
> thing when a fsync is issued on the files.
>
> yes it could be re-written to do all this itself, but the project has
> decided not to try and figure out the best options for all the different
> filesystems and OS's that it runs on and instead trust the OS developers
> to do reasonable things instead.
>
> besides, do you really want to have every program doing it's own
> readahead?
>

If the developers of that program want to squeeze the last 5% out of it
then sure, I'd expect them to use such OS-provided I/O scheduling
facilities. Database developers do that sort of thing all the time.

We have an application which knows what it's doing sending IO requests to
the kernel which must then try to reverse engineer what the application is
doing via this rather inappropriate communication channel.

Is that dumb, or what?

Given that the application already knows what it's doing, it's in a much
better position to issue the anticipatory IO requests than is the kernel.

2006-05-25 22:37:32

by David Lang

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

> If the developers of that program want to squeeze the last 5% out of it
> then sure, I'd expect them to use such OS-provided I/O scheduling
> facilities. Database developers do that sort of thing all the time.
>
> We have an application which knows what it's doing sending IO requests to
> the kernel which must then try to reverse engineer what the application is
> doing via this rather inappropriate communication channel.
>
> Is that dumb, or what?
>
> Given that the application already knows what it's doing, it's in a much
> better position to issue the anticipatory IO requests than is the kernel.

if a program is trying to squeeze every last bit of performance out of a
system then you are right, it should run on the bare hardware. however
in reality many people are willing to sacrifice a little performance for
maintainability and portability.

if Adaptive read-ahead was only useful for Postgres (and had a negative
effect on everything else, even if it's just the added complication in the
kernel) then I would agree that it should be in Postgres, not in the
kernel. but I don't believe that this is the case, this patch series helps
in a large number of workloads (including 'cp' according to some other
posters), postgres was just used as the example in this subthread.

gnome startup has some serious read-ahead issues from what I've heard,
should it include an I/O scheduler as well (after all it knows what it's
going to be doing, why should the kernel have to reverse-engineer it)?

David Lang

2006-05-26 00:48:24

by Michael Stone

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Thu, May 25, 2006 at 03:01:49PM -0700, Andrew Morton wrote:
>If the developers of that program want to squeeze the last 5% out of it
>then sure, I'd expect them to use such OS-provided I/O scheduling
>facilities.

Maybe, if we were talking about squeezing the last 5%. But all
applications should be required to greatly complicate their IO routines
for the last 30%? To reimplement something the kernel already does (at
least to some degree), as opposed to making the kernel implementation
better? "Is that dumb, or what?" :-)

>Database developers do that sort of thing all the time.

Even the oracle people seem to have figured out they were doing too much
that's properly the responsibility of the OS and creating a maintenance
and portability nightmare.

Mike Stone

2006-05-26 01:19:43

by Wu Fengguang

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Thu, May 25, 2006 at 08:44:15AM -0700, Andrew Morton wrote:
> These are nice-looking numbers, but one wonders. If optimising readahead
> makes this much difference to postgresql performance then postgresql should
> be doing the readahead itself, rather than relying upon the kernel's
> ability to guess what the application will be doing in the future. Because
> surely the database can do a better job of that than the kernel.
>
> That would involve using posix_fadvise(POSIX_FADV_RANDOM) to disable kernel
> readahead and then using posix_fadvise(POSIX_FADV_WILLNEED) to launch
> application-level readahead.
>
> Has this been considered or attempted?

There have been many lengthy debates on the postgresql mailing list,
and there seems to be _strong_ resistance to it.

IMHO, the best scheme would be
- leave _obvious_ patterns to the kernel
i.e. all kinds of (semi-)sequential reads
- do fadvise() for _non-obvious_ patterns on _critical_ points
i.e. the index scans

Wu
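
One way the second half of that scheme could look is sketched below,
assuming Linux and Python's os.posix_fadvise wrapper. An index scan that
already knows which blocks it will visit merges adjacent blocks into runs
and issues one WILLNEED hint per run. The 8192-byte block size and the
names coalesce_blocks and prefetch_blocks are assumptions for
illustration, not PostgreSQL or kernel code.

```python
import os

BLOCK_SIZE = 8192  # assumed database block size, for illustration only


def coalesce_blocks(block_numbers):
    """Merge block numbers into sorted (first_block, block_count) runs
    so that adjacent blocks are covered by a single hint."""
    runs = []
    for block in sorted(set(block_numbers)):
        if runs and block == runs[-1][0] + runs[-1][1]:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            runs.append((block, 1))
    return runs


def prefetch_blocks(fd, block_numbers, block_size=BLOCK_SIZE):
    """Hint the kernel about every run of blocks an index scan is about
    to fetch. Returns the runs that were advised."""
    runs = coalesce_blocks(block_numbers)
    for first, count in runs:
        os.posix_fadvise(fd, first * block_size, count * block_size,
                         os.POSIX_FADV_WILLNEED)
    return runs
```

Plain sequential scans issue no hints at all in this scheme and are left
to the kernel's own detection.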

2006-05-26 02:10:37

by Jon Smirl

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On 5/25/06, Andrew Morton <[email protected]> wrote:
> These are nice-looking numbers, but one wonders. If optimising readahead
> makes this much difference to postgresql performance then postgresql should
> be doing the readahead itself, rather than relying upon the kernel's
> ability to guess what the application will be doing in the future. Because
> surely the database can do a better job of that than the kernel.
>
> That would involve using posix_fadvise(POSIX_FADV_RANDOM) to disable kernel
> readahead and then using posix_fadvise(POSIX_FADV_WILLNEED) to launch
> application-level readahead.

Users have also reported that this patch fixes performance problems
from web servers using sendfile(). In the case of lighttpd they
actually stopped using sendfile() for large transfers and wrote a user
space replacement where they could control readahead manually. With
this patch in place sendfile() went back to being faster than the user
space implementation.

--
Jon Smirl
[email protected]

2006-05-26 03:14:12

by Nick Piggin

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

Jon Smirl wrote:

> On 5/25/06, Andrew Morton <[email protected]> wrote:
>
>> These are nice-looking numbers, but one wonders. If optimising readahead
>> makes this much difference to postgresql performance then postgresql should
>> be doing the readahead itself, rather than relying upon the kernel's
>> ability to guess what the application will be doing in the future. Because
>> surely the database can do a better job of that than the kernel.
>>
>> That would involve using posix_fadvise(POSIX_FADV_RANDOM) to disable kernel
>> readahead and then using posix_fadvise(POSIX_FADV_WILLNEED) to launch
>> application-level readahead.
>
>
> Users have also reported that this patch fixes performance problems
> from web servers using sendfile(). In the case of lighttpd they
> actually stopped using sendfile() for large transfers and wrote a user
> space replacement where they could control readahead manually. With
> this patch in place sendfile() went back to being faster than the user
> space implementation.

Of course, that is something one would expect should be made to work
properly with the current readahead implementation.

I don't see Wu's patches getting in for a little while yet.

Reproducible test cases (preferably without a whole lot of network clients)
should get this problem fixed.

2006-05-26 08:55:24

by Just Marc

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

Hi,

>If the developers of that program want to squeeze the last 5% out of it
>then sure, I'd expect them to use such OS-provided I/O scheduling
>facilities. Database developers do that sort of thing all the time.
>
>We have an application which knows what it's doing sending IO requests to
>the kernel which must then try to reverse engineer what the application is
>doing via this rather inappropriate communication channel.
>
>Is that dumb, or what?
>
> Given that the application already knows what it's doing, it's in a much
>better position to issue the anticipatory IO requests than is the kernel.

What about a performance-driven application (a web server) that's using, say,
sendfile() in order to reduce the overhead of context switching? How would
this application do its own read-ahead "management" effectively?

Thanks

2006-05-26 14:00:22

by Andi Kleen

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

Andrew Morton <[email protected]> writes:
>
> These are nice-looking numbers, but one wonders. If optimising readahead
> makes this much difference to postgresql performance then postgresql should
> be doing the readahead itself, rather than relying upon the kernel's
> ability to guess what the application will be doing in the future. Because
> surely the database can do a better job of that than the kernel.

With that argument we should remove all readahead from the kernel?
Because it's already trying to guess what the application will do.

I suspect it's better to have good readahead code in the kernel
than in a zillion applications.

-Andi

2006-05-26 16:26:33

by Andrew Morton

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

Andi Kleen <[email protected]> wrote:
>
> Andrew Morton <[email protected]> writes:
> >
> > These are nice-looking numbers, but one wonders. If optimising readahead
> > makes this much difference to postgresql performance then postgresql should
> > be doing the readahead itself, rather than relying upon the kernel's
> > ability to guess what the application will be doing in the future. Because
> > surely the database can do a better job of that than the kernel.
>
> With that argument we should remove all readahead from the kernel?
> Because it's already trying to guess what the application will do.
>
> I suspect it's better to have good readahead code in the kernel
> than in a zillion application.
>

Wu: "this readahead patch speeds up postgres"

Me: "but postgres could be sped up even more via X"

everyone: "ah, you're saying that's a reason for not altering readahead!".

Would everyone *please* stop being so completely and utterly thick?

Thank you.

2006-05-26 23:54:48

by folkert

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

> > These are nice-looking numbers, but one wonders. If optimising readahead
> > makes this much difference to postgresql performance then postgresql should
> > be doing the readahead itself, rather than relying upon the kernel's
> > ability to guess what the application will be doing in the future. Because
> > surely the database can do a better job of that than the kernel.
> With that argument we should remove all readahead from the kernel?
> Because it's already trying to guess what the application will do.
> I suspect it's better to have good readahead code in the kernel
> than in a zillion application.

Maybe a pluggable read-ahead system could be implemented.

Folkert van Heusden

--
Ever wonder what is out there? Any alien races? Then please support
the seti@home project: setiathome.ssl.berkeley.edu
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2006-05-27 00:01:04

by Con Kolivas

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Saturday 27 May 2006 09:54, Folkert van Heusden wrote:
> > > These are nice-looking numbers, but one wonders. If optimising
> > > readahead makes this much difference to postgresql performance then
> > > postgresql should be doing the readahead itself, rather than relying
> > > upon the kernel's ability to guess what the application will be doing
> > > in the future. Because surely the database can do a better job of that
> > > than the kernel.
> >
> > With that argument we should remove all readahead from the kernel?
> > Because it's already trying to guess what the application will do.
> > I suspect it's better to have good readahead code in the kernel
> > than in a zillion application.
>
> Maybe a pluggable read-ahead system could be implemented.

Pluggable anything is unpopular with Linus and other maintainers. See
pluggable cpu scheduler and pluggable page replacement policy (vm) patchsets.

--
-ck

2006-05-27 00:09:36

by Con Kolivas

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Saturday 27 May 2006 10:00, Con Kolivas wrote:
> On Saturday 27 May 2006 09:54, Folkert van Heusden wrote:
> > > > These are nice-looking numbers, but one wonders. If optimising
> > > > readahead makes this much difference to postgresql performance then
> > > > postgresql should be doing the readahead itself, rather than relying
> > > > upon the kernel's ability to guess what the application will be doing
> > > > in the future. Because surely the database can do a better job of
> > > > that than the kernel.
> > >
> > > With that argument we should remove all readahead from the kernel?
> > > Because it's already trying to guess what the application will do.
> > > I suspect it's better to have good readahead code in the kernel
> > > than in a zillion application.
> >
> > Maybe a pluggable read-ahead system could be implemented.
>
> Pluggable anything is unpopular with Linus and other maintainers. See
> pluggable cpu scheduler and pluggable page replacement policy (vm)
> patchsets.

Sorry, I should have been clearer. The belief is that certain infrastructure
components do not benefit from a pluggable framework, and readahead probably
comes under that description. It's not like Linus was implying we should only
have one filesystem, for example, since filesystems are after all pluggable
features.

--
-ck

2006-05-28 22:21:55

by Diego Calleja

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Sat, 27 May 2006 10:08:41 +1000,
Con Kolivas <[email protected]> wrote:
> On Saturday 27 May 2006 10:00, Con Kolivas wrote:

> Sorry I should have been clearer. The belief is that certain infrastructure
> components do not benefit from a pluggable framework, and readeahead probably
> comes under that description. It's not like Linus was implying we should only
> have one filesystem for example, since filesystems are afterall pluggable
> features.

That leaves another question that I (a poor user) may have missed: Why is
adaptive read-ahead compile-time configurable instead of completely replacing
the old system?

2006-05-29 00:32:41

by Con Kolivas

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

Quoting Diego Calleja <[email protected]>:

> That leaves another question that I (a poor user) may have missed: Why is
> adaptive read-ahead compile-time configurable instead of completely replacing
> the old system?

That was done to appease the users out there that had worse performance with it.
In the early stages of development of this code it was rather detrimental on an
ordinary desktop. Fortunately that seems to have gotten a lot better. I don't
think the final version should be a compile time option. It's either "adaptive"
and better everywhere or it's not.

--
-ck

2006-05-29 03:04:48

by Wu Fengguang

Subject: Re: [PATCH 00/33] Adaptive read-ahead V12

On Mon, May 29, 2006 at 08:31:43AM +1000, Con Kolivas wrote:
> Quoting Diego Calleja <[email protected]>:
>
> > That leaves another question that I (a poor user) may have missed: Why is
> > adaptive read-ahead compile-time configurable instead of completely replacing
> > the old system?
>
> That was done to appease the users out there that had worse performance with it.
> In the early stages of development of this code it was rather detrimental on an
> ordinary desktop. Fortunately that seems to have gotten a lot better. I don't
> think the final version should be a compile time option. It's either "adaptive"
> and better everywhere or it's not.

Hehe, I have a dream - that it helps *everywhere* ;-)

Wu