2006-03-19 02:36:54

by Wu Fengguang

[permalink] [raw]
Subject: [PATCH 00/23] Adaptive read-ahead V11

Mornings,

A fresh patch for a fresh new day, and wish you a good appetite ;)

Highlights in this release:
- The patch series has been heavily reworked.
- The code has been re-audited and made cleaner.
- The old stock read-ahead logic is untouched and will always be
available; the new adaptive read-ahead logic is presented as a
compile-time selectable feature.

Why do we need this?
In short, the stock read-ahead logic does not cover many important I/O
applications. This patch series presents Linux users with a new option. Please
refer to the first patch in the series for the new features.

Patches in the series:

[PATCH 01/23] readahead: kconfig options
[PATCH 02/23] radixtree: look-aside cache
[PATCH 03/23] radixtree: hole scanning functions
[PATCH 04/23] readahead: page flag PG_readahead
[PATCH 05/23] readahead: refactor do_generic_mapping_read()
[PATCH 06/23] readahead: refactor __do_page_cache_readahead()
[PATCH 07/23] readahead: insert cond_resched() calls
[PATCH 08/23] readahead: common macros
[PATCH 09/23] readahead: events accounting
[PATCH 10/23] readahead: support functions
[PATCH 11/23] readahead: sysctl parameters
[PATCH 12/23] readahead: min/max sizes
[PATCH 13/23] readahead: page cache aging accounting
[PATCH 14/23] readahead: state based method
[PATCH 15/23] readahead: context based method
[PATCH 16/23] readahead: other methods
[PATCH 17/23] readahead: call scheme
[PATCH 18/23] readahead: laptop mode
[PATCH 19/23] readahead: loop case
[PATCH 20/23] readahead: nfsd case
[PATCH 21/23] readahead: debug radix tree new functions
[PATCH 22/23] readahead: debug traces showing accessed file names
[PATCH 23/23] readahead: debug traces showing read patterns

Note that the last three patches are optional and serve only to aid
early-stage development.

Patches for stable kernels will soon be available at:
http://www.vanheusden.com/ara/
Thanks to Folkert van Heusden for providing the free hosting!

Changelog
=========

V11 2006-03-19
- patchset rework
- add kconfig option to make the feature compile-time selectable
- improve radix tree scan functions
- fix bug of using smp_processor_id() in preemptible code
- avoid overflow in compute_thrashing_threshold()
- disable sparse read prefetching if (readahead_hit_rate == 1)
- make thrashing recovery a standalone function
- random cleanups

V10 2005-12-16
- remove delayed page activation
- remove live page protection
- revert mmap readaround to old behavior
- default to original readahead logic
- default to original readahead size
- merge comment fixes from Andreas Mohr
- merge radixtree cleanups from Christoph Lameter
- reduce sizeof(struct file_ra_state) by unnamed union
- stateful method cleanups
- account other read-ahead paths

V9 2005-12-3
- standalone mmap read-around code, a little smarter and more tunable
- make the stateful method aware of request size
- decouple readahead_ratio from live pages protection
- let readahead_ratio contribute to ra_size grow speed in stateful method
- account variance of ra_size

V8 2005-11-25

- balance zone aging only in page reclaim paths and do it right
- do the aging of slabs in the same way as zones
- add debug code to dump the detailed page reclaim steps
- undo exposing of struct radix_tree_node and uninline related functions
- work better with nfsd
- generalize accelerated context based read-ahead
- account smooth read-ahead aging based on page referenced/activate bits
- avoid divide error in compute_thrashing_threshold()
- more low-latency efforts
- update some comments
- rebase debug actions on debugfs entries instead of magic readahead_ratio values

V7 2005-11-09

- new tunable parameters: readahead_hit_rate/readahead_live_chunk
- support sparse sequential accesses
- delay look-ahead if the drive is spun down in laptop mode
- disable look-ahead for loopback files
- make mandatory thrashing protection simpler and more robust
- attempt to improve responsiveness on large read-ahead size

V6 2005-11-01

- cancel look-ahead in laptop mode
- increase read-ahead limit to 0xFFFF pages

V5 2005-10-28

- rewrite context based method to make it clean and robust
- improved accuracy of stateful thrashing threshold estimation
- make page aging equal to the number of code pages scanned
- sort out the thrashing protection logic
- enhanced debug/accounting facilities

V4 2005-10-15

- detect and save live chunks on page reclaim
- support database workload
- support reading backward
- radix tree lookup look-aside cache

V3 2005-10-06

- major code reorganization and documentation
- stateful estimation of thrashing-threshold
- context method with accelerated ramp-up phase
- adaptive look-ahead
- early detection and rescue of pages in danger
- statistics data collection
- synchronized page aging between zones

V2 2005-09-15

- delayed page activation
- look-ahead: towards pipelined read-ahead

V1 2005-09-13

Initial release which features:
o stateless (for now)
o adapts to available memory / read speed
o free of thrashing (in theory)

And handles:
o large number of slow streams (FTP server)
o open/read/close access patterns (NFS server)
o multiple interleaved, sequential streams in one file
(multithread / multimedia / database)

Cheers,
Wu Fengguang
--
Dept. of Automation, University of Science and Technology of China


2006-03-19 03:10:54

by [email protected]

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

This is probably a readahead problem. The lighttpd people that are
encountering this problem are not regular lkml readers.

http://bugzilla.kernel.org/show_bug.cgi?id=5949

--
Jon Smirl
[email protected]

2006-03-19 03:47:30

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

On Sat, Mar 18, 2006 at 10:10:43PM -0500, Jon Smirl wrote:
> This is probably a readahead problem. The lighttpd people that are
> encountering this problem are not regular lkml readers.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=5949

[QUOTE]
My general conclusion is that since they were able to write a user
space implementation that avoids the problem something must be broken
in the kernel readahead logic for sendfile().

Maybe the user space solution does the trick by using a larger window size?

IMHO, the stock read-ahead is not designed with extremely high concurrency in
mind. However, 100 streams should not be a problem at all.

Wu

2006-03-19 04:10:34

by [email protected]

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

On 3/18/06, Wu Fengguang <[email protected]> wrote:
> On Sat, Mar 18, 2006 at 10:10:43PM -0500, Jon Smirl wrote:
> > This is probably a readahead problem. The lighttpd people that are
> > encountering this problem are not regular lkml readers.
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=5949
>
> [QUOTE]
> My general conclusion is that since they were able to write a user
> space implementation that avoids the problem something must be broken
> in the kernel readahead logic for sendfile().
>
> Maybe the user space solution does the trick by using a larger window size?
>
> IMHO, the stock read-ahead is not designed with extremely high concurrency in
> mind. However, 100 streams should not be a problem at all.

Has anyone checked to see if the readahead logic is working as
expected from sendfile? IO from sendfile is a different type of
context than IO from user space, there could be sendfile specific
problems. If window size is the trick, shouldn't sendfile
automatically adapt its window size? I don't think you can control
the sendfile window size from user space.

The goal of sendfile is to be the most optimal way to send a file over
the network. This is a case where user space code is easily beating
it.

--
Jon Smirl
[email protected]

2006-03-19 05:09:43

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

On Sat, Mar 18, 2006 at 11:10:33PM -0500, Jon Smirl wrote:
> > Maybe the user space solution does the trick by using a larger window size?
> >
> > IMHO, the stock read-ahead is not designed with extremely high concurrency in
> > mind. However, 100 streams should not be a problem at all.
>
> Has anyone checked to see if the readahead logic is working as
> expected from sendfile? IO from sendfile is a different type of
> context than IO from user space, there could be sendfile specific

AFAIK, sendfile() and read() use the same readahead logic, which
handles them equally well. And there is another readaround logic
which handles unhinted mmapped reads.

> problems. If window size is the trick, shouldn't sendfile
> automatically adapt its window size? I don't think you can control
> the sendfile window size from user space.

For whole-file reads, the stock readahead logic by default uses a fixed
window size of VM_MAX_READAHEAD=128KB; the adaptive readahead logic
uses an adaptive window size with an upper limit of VM_MAX_READAHEAD=1024KB.

The VM_MAX_READAHEAD in the kernel is used to initialize the .ra_pages
attribute of block devices, which can later be altered at _runtime_.
To set a 512KB window size limit for hda, one can do it in two ways:
1) blockdev --setra 1024 /dev/hda
2) echo 512 > /sys/block/hda/queue/read_ahead_kb

Cheers,
Wu

2006-03-19 15:53:48

by [email protected]

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

On 3/19/06, Wu Fengguang <[email protected]> wrote:
> On Sat, Mar 18, 2006 at 11:10:33PM -0500, Jon Smirl wrote:
> > > Maybe the user space solution does the trick by using a larger window size?
> > >
> > > IMHO, the stock read-ahead is not designed with extremely high concurrency in
> > > mind. However, 100 streams should not be a problem at all.
> >
> > Has anyone checked to see if the readahead logic is working as
> > expected from sendfile? IO from sendfile is a different type of
> > context than IO from user space, there could be sendfile specific
>
> AFAIK, sendfile() and read() use the same readahead logic, which
> handles them equally well. And there is another readaround logic
> which handles unhinted mmapped reads.

In another thread someone made a mention that this problem may have
something to do with the pools of memory being used for sendfile. The
readahead from sendfile is going into a moderately sized pool. When
you get 100 of them going at once the other threads flush the
readahead data out of the pool before it can be used and thus trigger
the thrashing seek storm. Is this true, that sendfile data is read
ahead into a fixed sized pool? If so, the readahead algorithms would
need to reduce the sendfile window sizes to stop the pool from
thrashing.

--
Jon Smirl
[email protected]

2006-03-20 13:49:32

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

On Sun, Mar 19, 2006 at 10:53:46AM -0500, Jon Smirl wrote:
> In another thread someone made a mention that this problem may have
> something to do with the pools of memory being used for sendfile. The
> readahead from sendfile is going into a moderately sized pool. When
> you get 100 of them going at once the other threads flush the
> readahead data out of the pool before it can be used and thus trigger
> the thrashing seek storm. Is this true, that sendfile data is read
> ahead into a fixed sized pool? If so, the readahead algorithms would

The pages are kept in a cache pool made up of all the free memory.
E.g. the following command shows a system with a 331MB cache pool:

% free -m
             total       used       free     shared    buffers     cached
Mem:           488        482          5          0          7        331
-/+ buffers/cache:        142        345
Swap:          127          0        127

That would be more than enough for the stock read-ahead to handle 100
concurrent readers.
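As a back-of-envelope check of that claim (assuming the stock 128KB maximum
window per stream):

```python
streams = 100
window_kb = 128                        # stock VM_MAX_READAHEAD
in_flight_mb = streams * window_kb / 1024
print(in_flight_mb)                    # total readahead in flight, in MB
```

At 12.5 MB, even 100 full-size windows occupy only a small fraction of a
331 MB cache pool.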

> need to reduce the sendfile window sizes to stop the pool from
> thrashing.

Sure, that is the desired behavior, and this patch provides exactly that feature :)

Cheers,
Wu

2006-03-27 21:38:34

by Matt Heler

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

We use lighttpd on our servers, and I can say with 100% certainty that this
problem happens a lot. Because of this, we were forced to use the userspace
mechanism that the lighttpd author made to circumvent this issue. However,
with this patch, I'm unable to reproduce any of the problems we had
experienced before. IO-wait has dropped significantly, from 80% to 20%.
I'd be happy to send over some benchmarks if need be.

Matt Heler

On Saturday 18 March 2006 10:10 pm, Jon Smirl wrote:
> This is probably a readahead problem. The lighttpd people that are
> encountering this problem are not regular lkml readers.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=5949
>
> --
> Jon Smirl
> [email protected]

2006-03-28 03:15:44

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH 00/23] Adaptive read-ahead V11

On Mon, Mar 27, 2006 at 04:38:33PM -0500, Matt Heler wrote:
> We use lighttpd on our servers, and I can say with 100% certainty that this
> problem happens a lot. Because of this, we were forced to use the userspace
> mechanism that the lighttpd author made to circumvent this issue. However,
> with this patch, I'm unable to reproduce any of the problems we had
> experienced before. IO-wait has dropped significantly, from 80% to 20%.
> I'd be happy to send over some benchmarks if need be.

Thanks, your production service would be the best benchmark ;)

Would you send some performance numbers and the basic server configuration?

Wu