2007-12-16 12:05:15

by Wu Fengguang

Subject: [PATCH 0/9] mmap read-around and readahead

Andrew,

Here are the mmap read-around related patches initiated by Linus.
They are for linux-2.6.24-rc4-mm1. The one major new feature -
auto detection and early readahead for mmap sequential reads - runs
as expected on my desktop :-)


[PATCH 1/9] readahead: simplify readahead call scheme
[PATCH 2/9] readahead: clean up and simplify the code for filemap page fault readahead
[PATCH 3/9] readahead: auto detection of sequential mmap reads
[PATCH 4/9] readahead: quick startup on sequential mmap readahead
[PATCH 5/9] readahead: make ra_submit() non-static
[PATCH 6/9] readahead: save mmap read-around states in file_ra_state
[PATCH 7/9] readahead: remove unused do_page_cache_readahead()
[PATCH 8/9] readahead: move max_sane_readahead() calls into force_page_cache_readahead()
[PATCH 9/9] readahead: call max_sane_readahead() in ondemand_readahead()

Thank you,
Fengguang
--


2007-12-16 23:36:46

by Linus Torvalds

Subject: Re: [PATCH 0/9] mmap read-around and readahead



On Sun, 16 Dec 2007, Fengguang Wu wrote:
>
> Here are the mmap read-around related patches initiated by Linus.
> They are for linux-2.6.24-rc4-mm1. The one major new feature -
> auto detection and early readahead for mmap sequential reads - runs
> as expected on my desktop :-)

Just out of interest - did you check to see if it makes any difference to
any IO patterns (or even timings)?

Linus

2007-12-18 11:46:38

by Wu Fengguang

Subject: Re: [PATCH 0/9] mmap read-around and readahead

On Sun, Dec 16, 2007 at 03:35:58PM -0800, Linus Torvalds wrote:
>
>
> On Sun, 16 Dec 2007, Fengguang Wu wrote:
> >
> > Here are the mmap read-around related patches initiated by Linus.
> > They are for linux-2.6.24-rc4-mm1. The one major new feature -
> > auto detection and early readahead for mmap sequential reads - runs
> > as expected on my desktop :-)
>
> Just out of interest - did you check to see if it makes any difference to
> any IO patterns (or even timings)?

No timings for now... but I wrote a debug patch (attached) and watched
it running for about a week. Here are some interesting numbers:

% grep .so, /var/log/kern.log|grep init0|wc
4085 60806 583895

% grep .so, /var/log/kern.log|grep around|wc
14438 215265 2107308
% grep .so, /var/log/kern.log|grep around|grep '= 32' | wc
3133 46757 462446

% grep .so, /var/log/kern.log|grep interleaved|wc
997 14866 148921
% grep .so, /var/log/kern.log|grep interleaved|grep '= 0'|wc
544 8089 79661
% grep .so, /var/log/kern.log|grep interleaved|grep '= 32'|wc
179 2683 28233

% grep .so, /var/log/kern.log|grep sequential|wc
3499 52275 541319
% grep .so, /var/log/kern.log|grep sequential|grep '= 0' | wc
915 13598 131953
% grep .so, /var/log/kern.log|grep sequential|grep '= 32' | wc
1327 19880 212896

That is, there are
     4085 page faults on start-of-lib-file,
    14438 mmap read-arounds, 22% at full ra size
     3499 mmap async readaheads, 38% at full ra size, or 51% excluding pure cache hits
      997 mmap sync readaheads, 18% at full ra size, or 40% excluding pure cache hits
Those are good numbers: I/O sizes get larger, and there are possibly fewer I/O waits :-)

Sure, it's a rather coarse estimate, but there are some sequential mmap accesses.
E.g.

[11736.998347] readahead-init0(process: sh/23926, file: sda1/w3m, offset=0:-1, ra=0+4-3) = 4
[11737.014985] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=290+32-0) = 17
[11737.019488] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=118+32-0) = 32
[11737.024921] readahead-interleaved(process: w3m/23926, file: sda1/w3m, offset=0:2, ra=4+6-6) = 6
[11737.025726] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:3, ra=10+12-12) = 12
[11737.025794] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:4, ra=90+32-0) = 28
--- sequential begin ---
[11737.037893] readahead-init(process: w3m/23926, file: sda1/w3m, offset=0:149, ra=150+64-32) = 64
[11737.043928] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:181, ra=214+32-32) = 32
[11737.044086] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:213, ra=246+32-32) = 32
[11737.045633] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:245, ra=278+32-32) = 12
[11737.047321] readahead-sequential(process: w3m/23926, file: sda1/w3m, offset=0:277, ra=310+32-32) = 0
--- sequential end ---
[11737.048296] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:119, ra=48+32-0) = 32
[11737.066908] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:63, ra=73+32-0) = 10
[11737.136880] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:116, ra=30+32-0) = 18
[11737.166005] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:37, ra=6+32-0) = 8


But there is also one minor problem.

[16416.600720] readahead-init0(process: zsh/30490, file: sda1/bc, offset=0:-1, ra=0+4-3) = 4
[16416.607967] readahead-around(process: bc/30490, file: sda1/bc, offset=0:0, ra=1+32-0) = 14

The 4-page readahead-init0() hurts performance. It occurs before every initial mmap read.
A longer example:

wfg ~% dmesg|grep mplayer
[ 1221.454230] readahead-init0(process: mutt/7131, file: md0/mplayer-devel, offset=0:-1, ra=0+4-3) = 4
[ 1378.667305] readahead-init0(process: strace/7352, file: sda1/mplayer, offset=0:-1, ra=0+4-3) = 4
[ 1378.692389] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:0, ra=2212+32-0) = 17
[ 1378.703656] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:0, ra=2061+32-0) = 32
[ 1378.715537] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:2077, ra=0+32-0) = 28
[ 1378.716261] readahead-around(process: mplayer/7352, file: sda1/mplayer, offset=0:10, ra=44+32-0) = 32
[ 1378.727570] readahead-init0(process: mplayer/7352, file: sda1/libdirectfb-0.9.so.25.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.740579] readahead-around(process: mplayer/7352, file: sda1/libdirectfb-0.9.so.25.0.0, offset=0:0, ra=79+32-0) = 17
[ 1378.744826] readahead-around(process: mplayer/7352, file: sda1/libdirectfb-0.9.so.25.0.0, offset=0:1, ra=0+32-0) = 28
[ 1378.749882] readahead-init0(process: mplayer/7352, file: sda1/libXv.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.754546] readahead-around(process: mplayer/7352, file: sda1/libXv.so.1.0.0, offset=0:0, ra=0+32-0) = 1
[ 1378.758057] readahead-init0(process: mplayer/7352, file: sda1/libXvMC.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.759566] readahead-init0(process: mplayer/7352, file: sda1/libXvMCW.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.764991] readahead-init0(process: mplayer/7352, file: sda1/libXxf86dga.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.766036] readahead-around(process: mplayer/7352, file: sda1/libXxf86dga.so.1.0.0, offset=0:0, ra=0+32-0) = 2
[ 1378.766887] readahead-init0(process: mplayer/7352, file: sda1/libGL.so.1.2, offset=0:-1, ra=0+4-3) = 4
[ 1378.778437] readahead-around(process: mplayer/7352, file: sda1/libGL.so.1.2, offset=0:0, ra=109+32-0) = 17
[ 1378.782107] readahead-around(process: mplayer/7352, file: sda1/libGL.so.1.2, offset=0:2, ra=1+32-0) = 29
[ 1378.792935] readahead-init0(process: mplayer/7352, file: sda1/libggi.so.2.0.2, offset=0:-1, ra=0+4-3) = 4
[ 1378.799236] readahead-around(process: mplayer/7352, file: sda1/libggi.so.2.0.2, offset=0:0, ra=132+32-0) = 18
[ 1378.808167] readahead-around(process: mplayer/7352, file: sda1/libggi.so.2.0.2, offset=0:0, ra=0+32-0) = 28
[ 1378.808759] readahead-init0(process: mplayer/7352, file: sda1/libaa.so.1.0.4, offset=0:-1, ra=0+4-3) = 4
[ 1378.818428] readahead-around(process: mplayer/7352, file: sda1/libaa.so.1.0.4, offset=0:0, ra=12+32-0) = 18
[ 1378.830829] readahead-init0(process: mplayer/7352, file: sda1/libcaca.so.0.99.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.832195] readahead-around(process: mplayer/7352, file: sda1/libcaca.so.0.99.0, offset=0:0, ra=0+32-0) = 6
[ 1378.832945] readahead-init0(process: mplayer/7352, file: sda1/libcucul.so.0.99.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.837474] readahead-around(process: mplayer/7352, file: sda1/libcucul.so.0.99.0, offset=0:0, ra=135+32-0) = 18
[ 1378.844951] readahead-around(process: mplayer/7352, file: sda1/libcucul.so.0.99.0, offset=0:151, ra=1+32-0) = 29
[ 1378.845851] readahead-init0(process: mplayer/7352, file: sda1/libSDL-1.2.so.0.11.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.867151] readahead-around(process: mplayer/7352, file: sda1/libSDL-1.2.so.0.11.0, offset=0:0, ra=88+32-0) = 18
[ 1378.871796] readahead-around(process: mplayer/7352, file: sda1/libSDL-1.2.so.0.11.0, offset=0:0, ra=0+32-0) = 28
[ 1378.873248] readahead-init0(process: mplayer/7352, file: sda1/libartsc.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.885419] readahead-around(process: mplayer/7352, file: sda1/libartsc.so.0.0.0, offset=0:0, ra=0+32-0) = 2
[ 1378.892469] readahead-init0(process: mplayer/7352, file: sda1/libpulse.so.0.2.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.903642] readahead-around(process: mplayer/7352, file: sda1/libpulse.so.0.2.0, offset=0:0, ra=43+32-0) = 17
[ 1378.907206] readahead-around(process: mplayer/7352, file: sda1/libpulse.so.0.2.0, offset=0:1, ra=0+32-0) = 28
[ 1378.918549] readahead-init0(process: mplayer/7352, file: sda1/libjack.so.0.0.23, offset=0:-1, ra=0+4-3) = 4
[ 1378.928575] readahead-around(process: mplayer/7352, file: sda1/libjack.so.0.0.23, offset=0:0, ra=2+32-0) = 16
[ 1378.940046] readahead-init0(process: mplayer/7352, file: sda1/libopenal.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.963093] readahead-around(process: mplayer/7352, file: sda1/libopenal.so.0.0.0, offset=0:0, ra=42+32-0) = 17
[ 1378.981748] readahead-init0(process: mplayer/7352, file: sda1/libfaac.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1378.993281] readahead-around(process: mplayer/7352, file: sda1/libfaac.so.0.0.0, offset=0:0, ra=0+32-0) = 14
[ 1378.994296] readahead-init0(process: mplayer/7352, file: sda1/libx264.so.55, offset=0:-1, ra=0+4-3) = 4
[ 1379.004907] readahead-around(process: mplayer/7352, file: sda1/libx264.so.55, offset=0:0, ra=112+32-0) = 18
[ 1379.010374] readahead-around(process: mplayer/7352, file: sda1/libx264.so.55, offset=0:0, ra=0+32-0) = 28
[ 1379.025175] readahead-init0(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:-1, ra=0+4-3) = 4
[ 1379.040139] readahead-around(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:0, ra=530+32-0) = 17
[ 1379.043905] readahead-around(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:535, ra=0+32-0) = 28
[ 1379.044276] readahead-around(process: mplayer/7352, file: sda1/libsmbclient.so.0.1, offset=0:8, ra=49+32-0) = 32
[ 1379.083560] readahead-init0(process: mplayer/7352, file: sda1/libungif.so.4.1.4, offset=0:-1, ra=0+4-3) = 4
[ 1379.088050] readahead-around(process: mplayer/7352, file: sda1/libungif.so.4.1.4, offset=0:0, ra=0+32-0) = 4
[ 1379.095605] readahead-init0(process: mplayer/7352, file: sda1/libcdda_interface.so.0.10.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.100462] readahead-around(process: mplayer/7352, file: sda1/libcdda_interface.so.0.10.0, offset=0:0, ra=0+32-0) = 12
[ 1379.100889] readahead-init0(process: mplayer/7352, file: sda1/libcdda_paranoia.so.0.10.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.108911] readahead-around(process: mplayer/7352, file: sda1/libcdda_paranoia.so.0.10.0, offset=0:0, ra=0+32-0) = 4
[ 1379.110094] readahead-init0(process: mplayer/7352, file: sda1/libfribidi.so.0.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.111707] readahead-around(process: mplayer/7352, file: sda1/libfribidi.so.0.0.0, offset=0:0, ra=0+32-0) = 11
[ 1379.116159] readahead-init0(process: mplayer/7352, file: sda1/libspeex.so.1.2.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.134065] readahead-around(process: mplayer/7352, file: sda1/libspeex.so.1.2.0, offset=0:0, ra=18+32-0) = 17
[ 1379.137322] readahead-init0(process: mplayer/7352, file: sda1/libtheora.so.0.2.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.137976] readahead-around(process: mplayer/7352, file: sda1/libtheora.so.0.2.0, offset=0:0, ra=33+32-0) = 18
[ 1379.141476] readahead-init0(process: mplayer/7352, file: sda1/libmpcdec.so.3.1.1, offset=0:-1, ra=0+4-3) = 4
[ 1379.150304] readahead-around(process: mplayer/7352, file: sda1/libmpcdec.so.3.1.1, offset=0:0, ra=0+32-0) = 10
[ 1379.151400] readahead-init0(process: mplayer/7352, file: sda1/libamrnb.so.2.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.169518] readahead-around(process: mplayer/7352, file: sda1/libamrnb.so.2.0.0, offset=0:0, ra=44+32-0) = 17
[ 1379.171870] readahead-init0(process: mplayer/7352, file: sda1/libamrwb.so.2.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.172558] readahead-around(process: mplayer/7352, file: sda1/libamrwb.so.2.0.0, offset=0:0, ra=28+32-0) = 17
[ 1379.179794] readahead-init0(process: mplayer/7352, file: sda1/libdv.so.4.0.3, offset=0:-1, ra=0+4-3) = 4
[ 1379.196072] readahead-around(process: mplayer/7352, file: sda1/libdv.so.4.0.3, offset=0:0, ra=13+32-0) = 17
[ 1379.209467] readahead-init0(process: mplayer/7352, file: sda1/libxvidcore.so.4.1, offset=0:-1, ra=0+4-3) = 4
[ 1379.210581] readahead-around(process: mplayer/7352, file: sda1/libxvidcore.so.4.1, offset=0:0, ra=115+32-0) = 18
[ 1379.225045] readahead-init0(process: mplayer/7352, file: sda1/liblirc_client.so.0.1.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.229523] readahead-around(process: mplayer/7352, file: sda1/liblirc_client.so.0.1.0, offset=0:0, ra=0+32-0) = 2
[ 1379.230907] readahead-init0(process: mplayer/7352, file: sda1/libdirect-0.9.so.25.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.237679] readahead-around(process: mplayer/7352, file: sda1/libdirect-0.9.so.25.0.0, offset=0:0, ra=0+32-0) = 12
[ 1379.238163] readahead-init0(process: mplayer/7352, file: sda1/libfusion-0.9.so.25.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.245010] readahead-around(process: mplayer/7352, file: sda1/libfusion-0.9.so.25.0.0, offset=0:0, ra=0+32-0) = 3
[ 1379.246950] readahead-init0(process: mplayer/7352, file: sda1/libXxf86vm.so.1.0.0, offset=0:-1, ra=0+4-3) = 4
[ 1379.255703] readahead-around(process: mplayer/7352, file: sda1/libXxf86vm.so.1.0.0, offset=0:0, ra=0+32-0) = 1

There are so many readahead-init0() calls because ld-linux.so does
a read(0+832) before doing the mmap (see L1):

L0: open("/lib/libc.so.6", O_RDONLY) = 3
L1: read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\342"..., 832) = 832
L2: fstat(3, {st_mode=S_IFREG|0755, st_size=1420624, ...}) = 0
L3: mmap(NULL, 3527256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac6e51d000
L4: mprotect(0x7fac6e671000, 2097152, PROT_NONE) = 0
L5: mmap(0x7fac6e871000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x154000) = 0x7fac6e871000
L6: mmap(0x7fac6e876000, 16984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac6e876000
L7: close(3) = 0


I cannot think of a good solution to it. Teaching ld-linux.so to blindly
do a fadvise(128KB) looks bad. And the kernel can do little about it.

This is also the major reason I disabled the interleaved readahead
support for mmap reads. Otherwise the PG_readahead flag left by
ld-linux.so will trigger _small_ interleaved readahead like this:

readahead-interleaved(process: firefox-bin/4596, file: sda1/libmozjs.so, offset=0, ra=4+6-6) = 6

It would be a much larger read-around if we didn't do that readahead ;-)

Fengguang

2007-12-18 12:13:39

by Wu Fengguang

Subject: Re: [PATCH 0/9] mmap read-around and readahead

On Tue, Dec 18, 2007 at 07:46:09PM +0800, Fengguang Wu wrote:
> No timings for now... but I wrote a debug patch (attached) and watched
> it running for about a week. Here are some interesting numbers:

Here is the (forgotten) readahead-debug.patch:

---
include/linux/fs.h | 43 ++++++++++++++++++++++++++++++++++
mm/Kconfig | 19 +++++++++++++++
mm/filemap.c | 1
mm/readahead.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 116 insertions(+), 1 deletion(-)

--- linux-2.6.24-rc4-mm1.orig/include/linux/fs.h
+++ linux-2.6.24-rc4-mm1/include/linux/fs.h
@@ -760,11 +760,54 @@ struct file_ra_state {
unsigned int async_size; /* do asynchronous readahead when
there are only # of pages ahead */

+ unsigned int flags;
unsigned int ra_pages; /* Maximum readahead window */
int mmap_miss; /* Cache miss stat for mmap accesses */
loff_t prev_pos; /* Cache last read() position */
};

+#define RA_CLASS_SHIFT 4
+#define RA_CLASS_MASK ((1 << RA_CLASS_SHIFT) - 1)
+/*
+ * Detailed classification of read-ahead behaviors.
+ */
+enum ra_class {
+ RA_CLASS_INIT0,
+ RA_CLASS_INIT,
+ RA_CLASS_SEQUENTIAL,
+ RA_CLASS_INTERLEAVED,
+ RA_CLASS_CONTEXT,
+ RA_CLASS_AROUND,
+ RA_CLASS_COUNT
+};
+
+static inline enum ra_class ra_class_new(struct file_ra_state *ra)
+{
+ return ra->flags & RA_CLASS_MASK;
+}
+
+static inline enum ra_class ra_class_old(struct file_ra_state *ra)
+{
+ return (ra->flags >> RA_CLASS_SHIFT) & RA_CLASS_MASK;
+}
+
+/*
+ * Which method is issuing this read-ahead?
+ */
+static inline void ra_set_class(struct file_ra_state *ra, enum ra_class ra_class)
+{
+ unsigned long flags_mask;
+ unsigned long flags;
+ unsigned long old_ra_class;
+
+ flags_mask = ~(RA_CLASS_MASK | (RA_CLASS_MASK << RA_CLASS_SHIFT));
+ flags = ra->flags & flags_mask;
+
+ old_ra_class = ra_class_new(ra) << RA_CLASS_SHIFT;
+
+ ra->flags = flags | old_ra_class | ra_class;
+}
+
/*
* Check if @index falls in the readahead windows.
*/
--- linux-2.6.24-rc4-mm1.orig/mm/Kconfig
+++ linux-2.6.24-rc4-mm1/mm/Kconfig
@@ -194,3 +194,22 @@ config NR_QUICK
config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+
+config DEBUG_READAHEAD
+ bool "Readahead debug and accounting"
+ default y
+ select DEBUG_FS
+ help
+ This option injects extra code to dump detailed debug traces and do
+ readahead events accounting.
+
+ To actually get the data:
+
+ mkdir /debug
+ mount -t debugfs none /debug
+
+ After that you can do the following:
+
+ echo > /debug/readahead/events # reset the counters
+ cat /debug/readahead/events # check the counters
+
--- linux-2.6.24-rc4-mm1.orig/mm/readahead.c
+++ linux-2.6.24-rc4-mm1/mm/readahead.c
@@ -16,6 +16,29 @@
#include <linux/task_io_accounting_ops.h>
#include <linux/pagevec.h>
#include <linux/pagemap.h>
+#include <linux/debugfs.h>
+
+static const char * const ra_class_name[] = {
+ [RA_CLASS_INIT0] = "init0",
+ [RA_CLASS_INIT] = "init",
+ [RA_CLASS_SEQUENTIAL] = "sequential",
+ [RA_CLASS_INTERLEAVED] = "interleaved",
+ [RA_CLASS_CONTEXT] = "context",
+ [RA_CLASS_AROUND] = "around",
+};
+
+#ifdef CONFIG_DEBUG_READAHEAD
+static u32 readahead_debug_level = 1;
+# define debug_option(o) (o)
+#else
+# define debug_option(o) (0)
+# define readahead_debug_level (0)
+#endif /* CONFIG_DEBUG_READAHEAD */
+
+#define dprintk(args...) \
+ do { if (readahead_debug_level >= 2) printk(KERN_DEBUG args); } while(0)
+#define ddprintk(args...) \
+ do { if (readahead_debug_level >= 3) printk(KERN_DEBUG args); } while(0)

void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
{
@@ -220,6 +243,13 @@ unsigned long max_sane_readahead(unsigne

static int __init readahead_init(void)
{
+#ifdef CONFIG_DEBUG_READAHEAD
+ struct dentry *root;
+
+ root = debugfs_create_dir("readahead", NULL);
+
+ debugfs_create_u32("debug_level", 0644, root, &readahead_debug_level);
+#endif
return bdi_init(&default_backing_dev_info);
}
subsys_initcall(readahead_init);
@@ -235,6 +265,15 @@ unsigned long ra_submit(struct file_ra_s
actual = __do_page_cache_readahead(mapping, filp,
ra->start, ra->size, ra->async_size);

+ dprintk("readahead-%s(process: %s/%d, file: %s/%s, "
+ "offset=%ld:%ld, ra=%ld+%d-%d) = %d\n",
+ ra_class_name[ra_class_new(ra)],
+ current->comm, current->pid,
+ mapping->host->i_sb->s_id,
+ filp->f_path.dentry->d_iname,
+ (long)(filp->f_pos >> PAGE_CACHE_SHIFT),
+ (long)(ra->prev_pos >> PAGE_CACHE_SHIFT),
+ ra->start, ra->size, ra->async_size, actual);
return actual;
}

@@ -337,6 +376,7 @@ ondemand_readahead(struct address_space
ra->start += ra->size;
ra->size = get_next_ra_size(ra, max);
ra->async_size = ra->size;
+ ra_set_class(ra, RA_CLASS_SEQUENTIAL);
goto readit;
}

@@ -348,8 +388,15 @@ ondemand_readahead(struct address_space
* Read as is, and do not pollute the readahead state.
*/
if (!hit_readahead_marker && !sequential) {
- return __do_page_cache_readahead(mapping, filp,
+ int actual = __do_page_cache_readahead(mapping, filp,
offset, req_size, 0);
+ dprintk("read-random(process: %s/%d, file: %s/%s, "
+ "req=%ld+%ld) = %d\n",
+ current->comm, current->pid,
+ mapping->host->i_sb->s_id,
+ filp->f_path.dentry->d_iname,
+ offset, req_size, actual);
+ return actual;
}

/*
@@ -372,6 +419,7 @@ ondemand_readahead(struct address_space
ra->size = start - offset; /* old async_size */
ra->size = get_next_ra_size(ra, max);
ra->async_size = ra->size;
+ ra_set_class(ra, RA_CLASS_INTERLEAVED);
goto readit;
}

@@ -385,6 +433,10 @@ ondemand_readahead(struct address_space
ra->start = offset;
ra->size = get_init_ra_size(req_size, max);
ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
+ if (offset)
+ ra_set_class(ra, RA_CLASS_INIT);
+ else
+ ra_set_class(ra, RA_CLASS_INIT0);

readit:
/*
--- linux-2.6.24-rc4-mm1.orig/mm/filemap.c
+++ linux-2.6.24-rc4-mm1/mm/filemap.c
@@ -1340,6 +1340,7 @@ static void do_sync_mmap_readahead(struc
ra->start = max_t(long, 0, offset - ra_pages / 2);
ra->size = ra_pages;
ra->async_size = 0;
+ ra_set_class(ra, RA_CLASS_AROUND);
ra_submit(ra, mapping, file);
}
}

2007-12-19 07:37:31

by Wu Fengguang

Subject: Re: [PATCH 0/9] mmap read-around and readahead

On Sun, Dec 16, 2007 at 03:35:58PM -0800, Linus Torvalds wrote:
>
>
> On Sun, 16 Dec 2007, Fengguang Wu wrote:
> >
> > Here are the mmap read-around related patches initiated by Linus.
> > They are for linux-2.6.24-rc4-mm1. The one major new feature -
> > auto detection and early readahead for mmap sequential reads - runs
> > as expected on my desktop :-)
>
> Just out of interest - did you check to see if it makes any difference to
> any IO patterns (or even timings)?

Now I have some numbers on 100,000 sequential mmap reads:

                                      user    system  cpu      total
(1-1) plain -mm, 128KB readaround:    3.224   2.554   48.40%   11.838
(1-2) plain -mm, 256KB readaround:    3.170   2.392   46.20%   11.976
(2)   patched -mm, 128KB readahead:   3.117   2.448   47.33%   11.607

The patched kernel (2) has the smallest total time. It has no cache hit
overhead and less I/O block time (thanks to async readahead). Here the I/O
size does not make much difference, since there is only a single stream.

Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is
128KB, since half of the read-around pages will be cache hits.

Fengguang
---

PS. raw time numbers:

1) linux-2.6.24-rc5-mm1, 128KB read_ahead_kb:

3.27s user 2.62s system 50% cpu 11.730 total
3.25s user 2.65s system 49% cpu 11.816 total
3.07s user 2.62s system 47% cpu 11.911 total
3.32s user 2.42s system 48% cpu 11.948 total
3.21s user 2.46s system 48% cpu 11.787 total

2) linux-2.6.24-rc5-mm1, 256KB read_ahead_kb:

3.00s user 2.46s system 45% cpu 12.077 total
3.41s user 2.51s system 49% cpu 12.038 total
3.25s user 2.34s system 47% cpu 11.889 total
3.13s user 2.33s system 45% cpu 11.922 total
3.06s user 2.32s system 45% cpu 11.952 total

3) linux-2.6.24-rc5-mm1 + this patchset, 128KB read_ahead_kb:

2.79s user 2.26s system 43% cpu 11.515 total
3.19s user 2.21s system 46% cpu 11.563 total
3.28s user 2.51s system 49% cpu 11.596 total
3.22s user 2.75s system 51% cpu 11.687 total
3.08s user 2.58s system 48% cpu 11.643 total
3.14s user 2.38s system 47% cpu 11.637 total