backing_dev_info.ra_expect_bytes is dynamically updated to track the
expected number of pages to be read at start-of-file. It allows the
initial readahead to be more aggressive, and hence more efficient.
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/file_table.c | 7 ++++++
include/linux/mm.h | 1 +
mm/readahead.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+)
--- linux-2.6.17-rc4-mm3.orig/include/linux/mm.h
+++ linux-2.6.17-rc4-mm3/include/linux/mm.h
@@ -1032,6 +1032,7 @@ unsigned long page_cache_readahead(struc
void handle_ra_miss(struct address_space *mapping,
struct file_ra_state *ra, pgoff_t offset);
unsigned long max_sane_readahead(unsigned long nr);
+void fastcall readahead_close(struct file *file);
#ifdef CONFIG_ADAPTIVE_READAHEAD
extern int readahead_ratio;
--- linux-2.6.17-rc4-mm3.orig/fs/file_table.c
+++ linux-2.6.17-rc4-mm3/fs/file_table.c
@@ -12,6 +12,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/smp_lock.h>
+#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/security.h>
#include <linux/eventpoll.h>
@@ -160,6 +161,12 @@ void fastcall __fput(struct file *file)
might_sleep();
fsnotify_close(file);
+
+#ifdef CONFIG_ADAPTIVE_READAHEAD
+ if (file->f_ra.flags & RA_FLAG_EOF)
+ readahead_close(file);
+#endif
+
/*
* The function eventpoll_release() should be the first called
* in the file cleanup chain.
--- linux-2.6.17-rc4-mm3.orig/mm/readahead.c
+++ linux-2.6.17-rc4-mm3/mm/readahead.c
@@ -1555,6 +1555,61 @@ static inline void get_readahead_bounds(
PAGES_KB(128)), *ra_max / 2);
}
+/*
+ * When closing a normal readonly file,
+ * - on cache hit: increase `backing_dev_info.ra_expect_bytes' slowly;
+ * - on cache miss: decrease it rapidly.
+ *
+ * The resulting `ra_expect_bytes' answers the question:
+ * How many pages are expected to be read on start-of-file?
+ */
+void fastcall readahead_close(struct file *file)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct address_space *mapping = inode->i_mapping;
+ struct backing_dev_info *bdi = mapping->backing_dev_info;
+ unsigned long pos = file->f_pos;
+ unsigned long pgrahit = file->f_ra.cache_hits;
+ unsigned long pgaccess = 1 + pos / PAGE_CACHE_SIZE;
+ unsigned long pgcached = mapping->nrpages;
+
+ if (!pos) /* pread */
+ return;
+
+ if (pgcached > bdi->ra_pages0) /* excessive reads */
+ return;
+
+ if (pgaccess >= pgcached) {
+ if (bdi->ra_expect_bytes < bdi->ra_pages0 * PAGE_CACHE_SIZE)
+ bdi->ra_expect_bytes += pgcached * PAGE_CACHE_SIZE / 8;
+
+ debug_inc(initial_ra_hit);
+ dprintk("initial_ra_hit on file %s size %lluK "
+ "pos %lu by %s(%d)\n",
+ file->f_dentry->d_name.name,
+ i_size_read(inode) / 1024,
+ pos,
+ current->comm, current->pid);
+ } else {
+ unsigned long missed;
+
+ missed = (pgcached - pgaccess) * PAGE_CACHE_SIZE;
+ if (bdi->ra_expect_bytes >= missed / 2)
+ bdi->ra_expect_bytes -= missed / 2;
+
+ debug_inc(initial_ra_miss);
+ dprintk("initial_ra_miss on file %s "
+ "size %lluK cached %luK hit %luK "
+ "pos %lu by %s(%d)\n",
+ file->f_dentry->d_name.name,
+ i_size_read(inode) / 1024,
+ pgcached << (PAGE_CACHE_SHIFT - 10),
+ pgrahit << (PAGE_CACHE_SHIFT - 10),
+ pos,
+ current->comm, current->pid);
+ }
+}
+
#endif /* CONFIG_ADAPTIVE_READAHEAD */
/*
--
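For reference, the close-time adaptation in the hunk above boils down
to the following stand-alone sketch. It is not kernel code: bdi_sketch,
PAGE_SIZE_SK and readahead_close_sketch() are illustrative stand-ins
for backing_dev_info, PAGE_CACHE_SIZE and readahead_close(); only the
arithmetic mirrors the patch.

#include <stdio.h>

#define PAGE_SIZE_SK 4096UL

struct bdi_sketch {
	unsigned long ra_pages0;	/* initial readahead size, in pages */
	unsigned long ra_expect_bytes;	/* adaptive start-of-file estimate */
};

static void readahead_close_sketch(struct bdi_sketch *bdi,
				   unsigned long pgaccess, /* pages actually read */
				   unsigned long pgcached) /* pages found cached */
{
	if (pgcached > bdi->ra_pages0)	/* excessive reads: no signal */
		return;

	if (pgaccess >= pgcached) {
		/* cache hit: grow slowly, capped at ra_pages0 worth of bytes */
		if (bdi->ra_expect_bytes < bdi->ra_pages0 * PAGE_SIZE_SK)
			bdi->ra_expect_bytes += pgcached * PAGE_SIZE_SK / 8;
	} else {
		/* cache miss: shrink rapidly, by half of the overshoot */
		unsigned long missed = (pgcached - pgaccess) * PAGE_SIZE_SK;

		if (bdi->ra_expect_bytes >= missed / 2)
			bdi->ra_expect_bytes -= missed / 2;
	}
}

int main(void)
{
	struct bdi_sketch bdi = {
		.ra_pages0 = 256,
		.ra_expect_bytes = 64 * PAGE_SIZE_SK,
	};

	readahead_close_sketch(&bdi, 64, 64);	/* whole file read: +8 pages */
	readahead_close_sketch(&bdi, 1, 64);	/* head-checker: shrink fast */
	printf("ra_expect_bytes = %lu\n", bdi.ra_expect_bytes);
	return 0;
}

The asymmetry matters: growth is a slow additive step (an eighth of the
cached pages), while a miss cuts by half the overshoot, so one
head-checking process cannot permanently inflate the estimate.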
BTW. while your patchset might be nicely broken down, I think your
naming and descriptions are letting it down a little bit.
Wu Fengguang wrote:
>Aggressive readahead policy for read on start-of-file.
>
>Instead of selecting a conservative readahead size,
>it tries to do large readahead in the first place.
>
>However we have to watch on two cases:
> - do not ruin the hit rate for file-head-checkers
> - do not lead to thrashing for memory tight systems
>
>
How does it handle
- don't needlessly readahead too much if the file is in cache
Would the current readahead mechanism benefit from more aggressive
start-of-file readahead?
On Thu, May 25, 2006 at 03:34:30PM +1000, Nick Piggin wrote:
> BTW. while your patchset might be nicely broken down, I think your
> naming and descriptions are letting it down a little bit.
:) Maybe more practice will help.
> Wu Fengguang wrote:
>
> >Aggressive readahead policy for read on start-of-file.
> >
> >Instead of selecting a conservative readahead size,
> >it tries to do large readahead in the first place.
> >
> >However we have to watch on two cases:
> > - do not ruin the hit rate for file-head-checkers
> > - do not lead to thrashing for memory tight systems
> >
> >
>
> How does it handle
> - don't needlessly readahead too much if the file is in cache
It is prevented by the calling scheme. The adaptive readahead logic
will only be called on:
- reading a non-cached page, so readahead is started/stopped on demand;
- reading a PG_readahead marked page. Since the PG_readahead mark is
  only set on fresh new pages in __do_page_cache_readahead(), readahead
  automatically ceases on cache hit.
A rough sketch of this dispatch follows.
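Here page_in_cache(), page_has_ra_mark() and adaptive_readahead() are
hypothetical stand-ins, not the real page-cache interfaces; the sketch
only shows the shape of the dispatch:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the page-cache lookup and PG_readahead test. */
static bool page_in_cache(unsigned long index)    { return index < 8; }
static bool page_has_ra_mark(unsigned long index) { return index == 4; }

static void adaptive_readahead(unsigned long index)
{
	printf("readahead invoked at page %lu\n", index);
}

/* Dispatch sketch: readahead runs only on a miss or on a marked page. */
static void read_page_sketch(unsigned long index)
{
	if (!page_in_cache(index))
		adaptive_readahead(index);	/* cache miss: start on demand */
	else if (page_has_ra_mark(index))
		adaptive_readahead(index);	/* fresh readahead page: continue */
	/* cached and unmarked: no readahead work at all */
}

int main(void)
{
	unsigned long i;

	for (i = 0; i < 10; i++)
		read_page_sketch(i);
	return 0;
}

Since PG_readahead is only ever set on pages freshly brought in by
readahead itself, a fully cached file trips neither branch, which is
exactly the "needless readahead" guard asked about above.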
>
> Would the current readahead mechanism benefit from more aggressive
> start-of-file readahead?
It would have the same benefits (and drawbacks).
[QUOTE FROM ANOTHER MAIL]
> can we try to incrementally improve the current logic as well as work
> towards merging your readahead rewrite?
The current readahead code is left untouched on purpose.
If I understand it right, its simplicity is a great virtue, and it is
hard to improve it without losing that virtue or disturbing old users.
The new framework then provides an ideal testbed for fancy new things.
We can do experimental things without inviting complaints (before it
has stabilized, after a year or so), and then we might port some proven
features back to the current logic.
Wu
Wu Fengguang <[email protected]> wrote:
>
> backing_dev_info.ra_expect_bytes is dynamically updated to track the
> expected number of pages to be read at start-of-file. It allows the
> initial readahead to be more aggressive, and hence more efficient.
>
>
> +void fastcall readahead_close(struct file *file)
eww, fastcall.
> +{
> + struct inode *inode = file->f_dentry->d_inode;
> + struct address_space *mapping = inode->i_mapping;
> + struct backing_dev_info *bdi = mapping->backing_dev_info;
> + unsigned long pos = file->f_pos;
f_pos is loff_t.
On Fri, May 26, 2006 at 10:29:34AM -0700, Andrew Morton wrote:
> Wu Fengguang <[email protected]> wrote:
> >
> > backing_dev_info.ra_expect_bytes is dynamically updated to track the
> > expected number of pages to be read at start-of-file. It allows the
> > initial readahead to be more aggressive, and hence more efficient.
> >
> >
> > +void fastcall readahead_close(struct file *file)
>
> eww, fastcall.
Hehe, it's a tiny function, and calls no further sub-routines
except debugging ones. Still not necessary?
> > +{
> > + struct inode *inode = file->f_dentry->d_inode;
> > + struct address_space *mapping = inode->i_mapping;
> > + struct backing_dev_info *bdi = mapping->backing_dev_info;
> > + unsigned long pos = file->f_pos;
>
> f_pos is loff_t.
Just meant to be a little more compact ;)
+ unsigned long pos = file->f_pos;
+ unsigned long pgrahit = file->f_ra.cache_hits;
+ unsigned long pgaccess = 1 + pos / PAGE_CACHE_SIZE;
+ unsigned long pgcached = mapping->nrpages;
+
+ if (!pos) /* pread */
+ return;
+
+ if (pgcached > bdi->ra_pages0) /* excessive reads */
+ return;
Here f_pos will almost definitely have small values.
+
+ if (pgaccess >= pgcached) {
Fixed by adding a comment to clarify it:
+ unsigned long pos = file->f_pos; /* supposed to be small */
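The concern behind the loff_t remark can be shown in a few lines. The
types below are user-space stand-ins (loff_t_sk is not the kernel's
loff_t), and the early returns in the patch are what keep the
truncation harmless:

#include <stdint.h>
#include <stdio.h>

typedef int64_t loff_t_sk;	/* stand-in for the kernel's 64-bit loff_t */

int main(void)
{
	loff_t_sk f_pos = ((loff_t_sk)5 << 32) | 12345; /* offset beyond 4GB */
	uint32_t pos = (uint32_t)f_pos; /* what a 32-bit unsigned long keeps */

	printf("f_pos = %lld, truncated pos = %u\n",
	       (long long)f_pos, (unsigned)pos);
	/*
	 * In readahead_close() this is tolerable: the pgcached > ra_pages0
	 * test bails out first, so any pos that is actually used belongs
	 * to a file whose cached pages fit in a small start-of-file window.
	 */
	return 0;
}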