2004-01-24 18:08:52

by Felix von Leitner

Subject: Request: I/O request recording

I would like to have a user space program that I could run while I cold
start KDE. The program would then record which I/O pages were read in
which order. The output of that program could then be used to pre-cache
all those pages, but in an order that reduces disk head movement.
Demand Loading unfortunately produces lots of random page I/O scattered
all over the disk.
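To make "an order that reduces disk head movement" concrete, here is a toy model in C. It assumes, as a simplification, that seek cost is proportional to the distance in blocks and that the head starts at block 0; `head_travel` and `sort_for_sweep` are hypothetical helpers, not part of any existing tool.

```c
#include <stdlib.h>

/* Total head travel (in blocks) for servicing requests in the given
 * order, with the head starting at block 0. Toy model only: real disks
 * also pay rotational latency and settle time. */
unsigned long long head_travel(const unsigned long *blocks, size_t n)
{
    unsigned long long total = 0;
    unsigned long pos = 0;
    for (size_t i = 0; i < n; i++) {
        total += blocks[i] > pos ? blocks[i] - pos : pos - blocks[i];
        pos = blocks[i];
    }
    return total;
}

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Sort the recorded blocks in place: one ascending sweep of the disk. */
void sort_for_sweep(unsigned long *blocks, size_t n)
{
    qsort(blocks, n, sizeof *blocks, cmp_ulong);
}
```

With a demand order of {100, 5, 90, 10} the head travels 360 blocks; sorted into one ascending sweep it travels 100. Real savings depend on the drive.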

Having a way to know which pages are accessed in which order at a
typical cold start would be very beneficial, not only for the purpose
described above but it could also be used as input for a linker code
reordering optimization.

What do you think?

Felix


2004-01-24 18:25:25

by Valdis Klētnieks

Subject: Re: Request: I/O request recording

On Sat, 24 Jan 2004 19:10:27 +0100, Felix von Leitner <[email protected]> said:
> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

The Fedora version of the kernel-utils RPM includes /usr/sbin/readahead, which
gets launched like this:

start() {
echo -n $"Starting background readahead: "
/usr/sbin/readahead /usr/share/icons/Bluecurve/48x48/mimetypes/* &
/usr/sbin/readahead /usr/share/icons/Bluecurve/24x24/stock/* &
/usr/sbin/readahead /usr/share/applications/* &
/usr/sbin/readahead `cat /etc/readahead.files` &
}

So given that program, you could simply strace your KDE stuff, grep out all the
open calls and the filenames, stick them in /etc/readahead.files, and be done.
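A sketch of the "grep out all the open calls" step, assuming strace was run as `strace -f -e trace=open -o kde.trace startkde` and prints lines in its usual `open("path", flags) = fd` form. `strace_open_path` is a hypothetical helper; it keeps only successful opens, and its output would still need de-duplicating before going into /etc/readahead.files. Newer C libraries call openat() instead of open(), which this sketch does not match.

```c
#include <ctype.h>
#include <string.h>

/* Given one line of strace output, such as
 *   open("/usr/lib/libkdecore.so.4", O_RDONLY) = 5
 * copy the path into out and return 1 if it was a successful open().
 * Failed calls ("= -1 ENOENT ...") and other syscalls return 0.
 * Limitation: paths containing escaped quotes are not handled. */
int strace_open_path(const char *line, char *out, size_t outlen)
{
    const char *p = strstr(line, "open(\"");
    if (p == NULL || outlen == 0)
        return 0;
    /* Reject matches inside a longer name such as "mq_open(". */
    if (p != line && (isalnum((unsigned char)p[-1]) || p[-1] == '_'))
        return 0;
    const char *start = p + strlen("open(\"");
    const char *end = strchr(start, '"');
    if (end == NULL)
        return 0;
    /* The return value follows ") = "; a leading '-' means failure. */
    const char *ret = strstr(end, ") = ");
    if (ret == NULL || ret[4] == '-')
        return 0;
    size_t len = (size_t)(end - start);
    if (len >= outlen)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```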



2004-01-24 18:26:34

by Arjan van de Ven

Subject: Re: Request: I/O request recording

On Sat, 2004-01-24 at 19:10, Felix von Leitner wrote:
> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

I recently did something like this (and it scared me; a typical Fedora boot
into GNOME seems to open around 11,000 files ;) but via a printk in the
kernel...

I experimented with readahead'ing all that stuff while the initscripts
ran, in the hope it would save time... but somehow it doesn't.

Some other things kinda help; if you feel adventurous you could play
with the kernel-utils RPM in rawhide, which does a readahead of the files
the desktop opens while the GDM login window is displayed; if the user isn't
typing his name really fast, that decreases the desktop startup time...
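One way to prime the page cache for a list of files without waiting on each read is posix_fadvise() with POSIX_FADV_WILLNEED. This is a minimal sketch, not a description of how Fedora's /usr/sbin/readahead is implemented; `prefetch_file` and `prefetch_list` are hypothetical names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Ask the kernel to start reading the whole file into the page cache.
 * With POSIX_FADV_WILLNEED, Linux queues the reads and typically returns
 * without waiting for them to complete. A length of 0 means "to the end
 * of the file". Returns 0 on success, -1 if the file cannot be opened,
 * or the error number returned by posix_fadvise(). */
int prefetch_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    close(fd);                      /* cached pages outlive the descriptor */
    return err;
}

/* Prefetch each file in turn; returns how many succeeded. */
size_t prefetch_list(const char *const *paths, size_t n)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        if (prefetch_file(paths[i]) == 0)
            ok++;
    return ok;
}
```

Calling it for every file in a list queues many reads at once, which, like the parallel submission Arjan mentions in this thread, leaves the kernel free to reorder them; it does not by itself sort the I/O by disk position.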



2004-01-24 19:25:57

by Ville Herva

Subject: Re: Request: I/O request recording

On Sat, Jan 24, 2004 at 07:26:17PM +0100, you [Arjan van de Ven] wrote:
>
> I recently did something like this (and it scared me, it seems a typical
> Fedora boot into gnome opens like 11.000 files ;) but via a printk in
> the kernel....
>
> I experimented with readahead'ing all that stuff while the initscripts
> ran in the hope it would save time... but it doesn't somehow.

Did you sort the sectors to be read, or just read the files into page cache
in random-ish order?

Or do you mean that even after all the files were read into cache, the X
startup time didn't get any better (not counting the cache priming)?


-- v --

[email protected]

2004-01-24 20:13:25

by grundig

Subject: Re: Request: I/O request recording

On Sat, 24 Jan 2004 19:10:27 +0100, Felix von Leitner <[email protected]> wrote:

> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.
>
> Having a way to know which pages are accessed in which order at a
> typical cold start would be very beneficial, not only for the purpose
> described above but it could also be used as input for a linker code
> reordering optimization.
>
> What do you think?

That's exactly what XP does (and Mac OS X, for that matter).
And it really works (i.e. you can notice it).

XP records what the OS does in the first 2 minutes (or so). The next
time it boots, it tries to load the files it knows are going
to be used. The same goes for a frequently used app: it records
what the app does, and it optimizes the startup of that app.
Take a look at (search for "prefetch"):
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/appendix/hh/appendix/enhancements5_0qhx.asp
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx

Andrew Morton wrote a patch some time ago for 2.5.64-mm6 which achieves a
similar effect, I think:

" To test the nonlinear mapping code more thoroughly I have arranged for all
executable file-backed mmaps to be treated as nonlinear.

This means that when an executable is first mapped in, the kernel will
slurp the whole thing off disk in one hit. Some IO changes were made to
speed this up.

This means that large cache-cold executables start significantly faster.
Launching X11+KDE+mozilla goes from 23 seconds to 16. Starting OpenOffice
seems to be 2x to 3x faster, and starting Konqueror maybe 3x faster too.
Interesting."
(see: http://www.ussg.iu.edu/hypermail/linux/kernel/0303.1/1296.html)

The patches are still available. IIRC, they were dropped "because it should be
done in userspace". It'd be very interesting to write a userspace program
that does what XP does (it looks like a good idea for desktops).

2004-01-24 21:09:49

by Ville Herva

Subject: Re: Request: I/O request recording

On Sat, Jan 24, 2004 at 09:11:56PM +0100, you [Diego Calleja] wrote:
>
> That's exactly what XP does (and Mac OS X, for that matter).
> And it really works (ie: you can notice it)
>
> XP records what the OS does in the first 2 minutes (or so). The next
> time it boots, it tries to load the files that he knows that are going
> to be used. The same for an app that is frequently used: it records
> what the app does, and it optimizes the startup of that app.
> Take a look at: (search prefetch)
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/appendix/hh/appendix/enhancements5_0qhx.asp
> http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx

It's perhaps worth pointing out that XP uses the boot (or
application launch) traces not only to prefetch the data on the next boot
(application launch) but also to reorder the data on disk optimally via
XP's defragmenter.

And XP is noticeably faster to boot than (say) W2000.


-- v --

[email protected]

2004-01-24 22:44:20

by Arjan van de Ven

Subject: Re: Request: I/O request recording

On Sat, Jan 24, 2004 at 09:25:45PM +0200, Ville Herva wrote:
> On Sat, Jan 24, 2004 at 07:26:17PM +0100, you [Arjan van de Ven] wrote:
> >
> > I recently did something like this (and it scared me, it seems a typical
> > Fedora boot into gnome opens like 11.000 files ;) but via a printk in
> > the kernel....
> >
> > I experimented with readahead'ing all that stuff while the initscripts
> > ran in the hope it would save time... but it doesn't somehow.
>
> Did you sort the sectors to be read, or just read the files into page cache
> in randomish order ?

Semi-random order, but mostly submitted in parallel so the kernel has lots of
freedom to reorder.

> Or do you mean that even after all the files were read into cache, the X
> startup time didn't get any better (not counting the cache priming)?

I mean that the time it takes to prime is just about exactly the time you
then win... i.e. a net gain of about zero.



2004-01-24 23:36:24

by Andrew Morton

Subject: Re: Request: I/O request recording

Felix von Leitner <[email protected]> wrote:
>
> I would like to have a user space program that I could run while I cold
> start KDE. The program would then record which I/O pages were read in
> which order. The output of that program could then be used to pre-cache
> all those pages, but in an order that reduces disk head movement.
> Demand Loading unfortunately produces lots of random page I/O scattered
> all over the disk.

I wrote a similar thing in September of 2001. What you do is:

- Reboot the system, wait until everything is steady-state (eg: X has
started, applications are loaded).

- Load a kernel module which dumps the current contents of the pagecache
(filename/offset-into-file) into a file.

(The kernel module writes to modprobe's stdout, so you just do

modprobe fboot-dump > /tmp/fboot-dump.out

I'm very proud of this.)

- Post-process the resulting output into a database which is used on the
next reboot.

- reboot

- This time a userspace application cuts in real early and reads the
database and preloads all the pagecache using "optimal" I/O patterns so
that everything which you will need in the subsequent boot is already in
memory.
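As an illustration of the post-processing step, assume the dump yields, per file, a sorted list of cached page indices (the filename/offset pairs above, divided by the page size). Collapsing consecutive pages into runs lets the preloader issue a few large reads instead of many single-page ones. The record layout and `pages_to_extents` are hypothetical; this is not the fboot code.

```c
#include <stddef.h>

struct extent {
    unsigned long start;  /* first page index in the file */
    unsigned long count;  /* number of consecutive pages */
};

/* Collapse a sorted, duplicate-free list of page indices into runs of
 * consecutive pages. Returns the number of extents written to out,
 * which must have room for n entries (the worst case). */
size_t pages_to_extents(const unsigned long *pages, size_t n,
                        struct extent *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (m > 0 && pages[i] == out[m - 1].start + out[m - 1].count) {
            out[m - 1].count++;         /* extends the current run */
        } else {
            out[m].start = pages[i];    /* starts a new run */
            out[m].count = 1;
            m++;
        }
    }
    return m;
}
```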


So it's all an attempt to optimise the boot-time I/O patterns. It was
pretty much a waste of time, gaining only 10% or so, from memory. You
could get just as much or more speedup from simply launching all the
initscripts in parallel, although this did tend to break stuff.

Anyway, the code's ancient but might provide some ideas:

http://www.zip.com.au/~akpm/linux/fboot.tar.gz


2004-01-24 23:53:46

by Davide Libenzi

Subject: Re: Request: I/O request recording

On Sat, 24 Jan 2004, Andrew Morton wrote:

> Felix von Leitner <[email protected]> wrote:
> >
> > I would like to have a user space program that I could run while I cold
> > start KDE. The program would then record which I/O pages were read in
> > which order. The output of that program could then be used to pre-cache
> > all those pages, but in an order that reduces disk head movement.
> > Demand Loading unfortunately produces lots of random page I/O scattered
> > all over the disk.
>
> I wrote a similar thing in September of 2001. What you do is:
>
> - Reboot the system, wait until everything is steady-state (eg: X has
> started, applications are loaded).
>
> - Load a kernel module which dumps the current contents of the pagecache
> (filename/offset-into-file) into a file.
>
> (The kernel module writes to modprobe's stdout, so you just do
>
> modprobe fboot-dump > /tmp/fboot-dump.out
>
> I'm very proud of this.)
>
> - Post-process the resulting output into a database which is used on the
> next reboot.
>
> - reboot
>
> - This time a userspace application cuts in real early and reads the
> database and preloads all the pagecache using "optimal" I/O patterns so
> that everything which you will need in the subsequent boot is already in
> memory.
>
>
> So it's all an attempt to optimise the boot-time I/O patterns. It was
> pretty much a waste of time, gaining only 10% or so, from memory. You
> could get just as much or more speedup from simply launching all the
> initscripts in parallel, although this did tend to break stuff.
>
> Anyway, the code's ancient but might provide some ideas:
>
> http://www.zip.com.au/~akpm/linux/fboot.tar.gz

Warning: I don't know whether they have a patent on this, but MS does this
starting with XP (look inside %WINDIR%\PreFetch). It is both boot- and
app-based.




- Davide


2004-01-25 00:04:19

by Andrew Morton

Subject: Re: Request: I/O request recording

Davide Libenzi <[email protected]> wrote:
>
> > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
>
> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

Did they do it in August 2001?

2004-01-25 00:07:22

by Valdis Klētnieks

Subject: Re: Request: I/O request recording

On Sat, 24 Jan 2004 15:53:44 PST, Davide Libenzi said:

> > Anyway, the code's ancient but might provide some ideas:
> >
> > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
>
> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

Hmm.. prior art time. ;)

IBM's OS/VS1 and MVS operating systems had the 'link pack area', where
frequently loaded modules were loaded at system startup. And there were
numerous 3rd party optimizers that would analyze the LOAD SVC patterns on your
system and produce a list of which modules should be pre-loaded in order to get
the most bang for the buck (even a *large* 370/168 or 303x processor might be
able to spare a megabyte tops, so optimizing it was important, and sites would
spend $5K on software that would optimize the memory usage and save them a
memory upgrade at $40K a meg...)

This was mid-70s, so definitely pre-XP.



2004-01-25 00:09:16

by Davide Libenzi

Subject: Re: Request: I/O request recording

On Sat, 24 Jan 2004, Andrew Morton wrote:

> Davide Libenzi <[email protected]> wrote:
> >
> > > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
> >
> > Warning. I don't know if they do have a patent for this, but MS does this
> > starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> > based.
>
> Did they do it in August 2001?

Ouch, I don't know. I know for sure that it came with XP, but I'm not
really into MS things ;) This is one of the links that talk about that:

http://msdn.microsoft.com/msdnmag/issues/01/12/XPKernel/default.aspx



- Davide


2004-01-25 00:10:59

by Davide Libenzi

Subject: Re: Request: I/O request recording

On Sat, 24 Jan 2004 [email protected] wrote:

> On Sat, 24 Jan 2004 15:53:44 PST, Davide Libenzi said:
>
> > > Anyway, the code's ancient but might provide some ideas:
> > >
> > > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
> >
> > Warning. I don't know if they do have a patent for this, but MS does this
> > starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> > based.
>
> Hmm.. prior art time. ;)
>
> IBM's OS/VS1 and MVS operating systems had the 'link pack area', where
> frequently loaded modules were loaded at system startup. And there were
> numerous 3rd party optimizers that would analyze the LOAD SVC patterns on your
> system and produce a list of which modules should be pre-loaded in order to get
> the most bang for the buck (even a *large* 370/168 or 303x processor might be
> able to spare a megabyte tops, so optimizing it was important, and sites would
> spend $5K on software that would optimize the memory usage and save them a
> memory upgrade at $40K a meg...
>
> This was mid-70s, so definitely pre-XP.

They (MS) work on a page-fault basis, though. It is quite different.



- Davide


2004-01-25 12:26:46

by Felipe Alfaro Solana

Subject: Re: Request: I/O request recording

On Sun, 2004-01-25 at 00:53, Davide Libenzi wrote:
> > So it's all an attempt to optimise the boot-time I/O patterns. It was
> > pretty much a waste of time, gaining only 10% or so, from memory. You
> > could get just as much or more speedup from simply launching all the
> > initscripts in parallel, although this did tend to break stuff.
> >
> > Anyway, the code's ancient but might provide some ideas:
> >
> > http://www.zip.com.au/~akpm/linux/fboot.tar.gz
>
> Warning. I don't know if they do have a patent for this, but MS does this
> starting from XP (look inside %WINDIR%\PreFetch). It is both boot and app
> based.

And tomorrow they'll say they have patented the hamburger recipe, or the
Euclidean triangle, or... who knows. C'mon... The world is going crazy!

2004-01-25 23:38:28

by Andrew Morton

Subject: Re: Request: I/O request recording

Bart Samwel <[email protected]> wrote:
>
> >
> > Linux caches disk data on a per-file basis. So if you preload pagecache
> > via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
> > Each one has its own unique pagecache. When reading pages for /etc/passwd
> > we don't go looking for the same disk blocks in the cache of /dev/hda1.
> >
> > Which is why the userspace cache preloading needs to know the pathnames of
> > all the relevant files - it needs to open and read each one, applying
> > knowledge of disk layout while doing it.
>
> Hmmm, that explains why this didn't work. :( So if I wanted to do this
> completely from user space using only block_dump data I'd probably have
> to go through all files and find out if they had any blocks in common
> with my preload set -- presuming there is a way to find that out, which
> there probably isn't. That makes this idea pretty much useless, I'm
> sorry to have bothered you with it.
>

You could certainly do that. Given disk block #N you need to search all
files on the disk asking "who owns this block". The FIBMAP ioctl can be
used on most filesystems (ext2, ext3, others..) to find out which blocks a
file is using. See bmap.c in

http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz

Unfortunately you cannot determine a directory's blocks in this way.
Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
directories each have their own pagecache.
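A minimal sketch of the FIBMAP lookup described above: FIGETBSZ gives the filesystem block size, then FIBMAP maps each logical block to a physical one (0 means a hole). It is not bmap.c from ext3-tools; `file_block_map` is a hypothetical name. FIBMAP generally requires root (CAP_SYS_RAWIO), and it tells you nothing about directory blocks.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>       /* FIBMAP, FIGETBSZ */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fill phys[i] with the physical block number backing logical block i
 * of the file, for at most max blocks. Returns the number of blocks
 * filled in, or -1 on error. A physical block of 0 means a hole. */
long file_block_map(const char *path, unsigned long *phys, long max)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int bsz = 0;
    struct stat st;
    if (fstat(fd, &st) < 0 || ioctl(fd, FIGETBSZ, &bsz) < 0 || bsz <= 0) {
        close(fd);
        return -1;
    }
    long nblocks = (long)((st.st_size + bsz - 1) / bsz);
    if (nblocks > max)
        nblocks = max;
    for (long i = 0; i < nblocks; i++) {
        int blk = (int)i;           /* in: logical block; out: physical */
        if (ioctl(fd, FIBMAP, &blk) < 0) {
            close(fd);              /* typically EPERM without root */
            return -1;
        }
        phys[i] = (unsigned long)blk;
    }
    close(fd);
    return nblocks;
}
```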

2004-01-25 23:31:43

by Bart Samwel

Subject: Re: Request: I/O request recording

Andrew Morton wrote:
> Bart Samwel <[email protected]> wrote:
>
>>When I saw this thread I've fiddled for a bit with the block_dump
>> functionality that's in the laptop_mode patch. I wanted to see if it
>> could support a similar thing completely from user space (except for the
>> block_dump code, of course). I've written a small tool to generate a
>> complete file that lists tuples (sector, size, device) from the kernel
>> output in syslog; it parses all "READ block xxx" messages since the
>> last reboot. Putting this through sort -n -u delivers a nicely sorted
>> file, ready for optimized reading.
>>
>> Unfortunately I'm now stuck within the other part, which is reading the
>> pages back in memory at the next boot. It's not working, and I was
>> hoping someone here could take a look and tell me what I'm doing wrong.
>
> Linux caches disk data on a per-file basis. So if you preload pagecache
> via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
> Each one has its own unique pagecache. When reading pages for /etc/passwd
> we don't go looking for the same disk blocks in the cache of /dev/hda1.
>
> Which is why the userspace cache preloading needs to know the pathnames of
> all the relevant files - it needs to open and read each one, applying
> knowledge of disk layout while doing it.

Hmmm, that explains why this didn't work. :( So if I wanted to do this
completely from user space using only block_dump data I'd probably have
to go through all files and find out if they had any blocks in common
with my preload set -- presuming there is a way to find that out, which
there probably isn't. That makes this idea pretty much useless; I'm
sorry to have bothered you with it.

-- Bart

2004-01-25 23:09:45

by Andrew Morton

Subject: Re: Request: I/O request recording

Bart Samwel <[email protected]> wrote:
>
> When I saw this thread I've fiddled for a bit with the block_dump
> functionality that's in the laptop_mode patch. I wanted to see if it
> could support a similar thing completely from user space (except for the
> block_dump code, of course). I've written a small tool to generate a
> complete file that lists tuples (sector, size, device) from the kernel
> output in syslog; it parses all "READ block xxx" messages since the
> last reboot. Putting this through sort -n -u delivers a nicely sorted
> file, ready for optimized reading.
>
> Unfortunately I'm now stuck within the other part, which is reading the
> pages back in memory at the next boot. It's not working, and I was
> hoping someone here could take a look and tell me what I'm doing wrong.

Linux caches disk data on a per-file basis. So if you preload pagecache
via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
Each one has its own unique pagecache. When reading pages for /etc/passwd
we don't go looking for the same disk blocks in the cache of /dev/hda1.

Which is why the userspace cache preloading needs to know the pathnames of
all the relevant files - it needs to open and read each one, applying
knowledge of disk layout while doing it.
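The "applying knowledge of disk layout" part could be as simple as sorting the file list by where each file starts on disk and then reading each file through its own pathname. This sketch assumes the first physical block is already known (for example from FIBMAP of logical block 0) and ignores fragmentation; `struct preload_entry` and `order_for_preload` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct preload_entry {
    const char *path;
    unsigned long first_block;  /* e.g. FIBMAP of logical block 0 */
};

static int by_first_block(const void *a, const void *b)
{
    const struct preload_entry *x = a, *y = b;
    if (x->first_block != y->first_block)
        return (x->first_block > y->first_block) -
               (x->first_block < y->first_block);
    return strcmp(x->path, y->path);    /* deterministic tie-break */
}

/* Order files so they are opened and read in one ascending pass over
 * the disk. Reading each through its own path (not via /dev/hda1) is
 * what puts the data into that file's pagecache. */
void order_for_preload(struct preload_entry *e, size_t n)
{
    qsort(e, n, sizeof *e, by_first_block);
}
```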


2004-01-25 22:59:27

by Bart Samwel

Subject: Re: Request: I/O request recording

Andrew Morton wrote:
> Felix von Leitner <[email protected]> wrote:
>
>>I would like to have a user space program that I could run while I cold
>>start KDE. The program would then record which I/O pages were read in
>>which order. The output of that program could then be used to pre-cache
>>all those pages, but in an order that reduces disk head movement.
>>Demand Loading unfortunately produces lots of random page I/O scattered
>>all over the disk.
>
> I wrote a similar thing in September of 2001. What you do is:
[...]

When I saw this thread, I fiddled a bit with the block_dump
functionality that's in the laptop_mode patch. I wanted to see if it
could support a similar thing completely from user space (except for the
block_dump code, of course). I've written a small tool to generate a
complete file that lists tuples (sector, size, device) from the kernel
output in syslog; it parses all "READ block xxx" messages since the
last reboot. Putting this through sort -n -u delivers a nicely sorted
file, ready for optimized reading.
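A sketch of the syslog-parsing step, assuming the stock 2.6 block_dump message format `comm(pid): READ block <sector> on <dev>`. The patch described below changes that output (the stock message carries no request size), so slbrp necessarily parses something slightly different; `struct block_read` and `parse_block_dump` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

struct block_read {
    unsigned long long sector;  /* 512-byte sector number */
    char dev[32];               /* e.g. "hdb1" */
};

/* Parse one syslog line containing stock block_dump output, e.g.
 *   Jan 25 22:59:27 host kernel: kdeinit(812): READ block 160000 on hdb1
 * Returns 1 for READ lines, 0 for WRITE lines and anything else. */
int parse_block_dump(const char *line, struct block_read *out)
{
    const char *p = strstr(line, "): READ block ");
    if (p == NULL)
        return 0;
    p += strlen("): READ block ");
    return sscanf(p, "%llu on %31s", &out->sector, out->dev) == 2;
}
```

Sorting the parsed records by device and sector and dropping duplicates then plays the role of `sort -n -u`.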

Unfortunately I'm now stuck with the other part, which is reading the
pages back in memory at the next boot. It's not working, and I was
hoping someone here could take a look and tell me what I'm doing wrong.

Here's what I've tried so far. I've written a program that simply reads
the ranges by opening the device and reading from sector*512 to
sector*512+size. It uses async I/O for efficiency, and to allow the
kernel to merge read requests. It seems to read all the data, but after
that the other programs seem to read most of it *again*! I only go from
8500 down to 7000 reads or so, while most of the 7000 reads that remain
are also in the range that is being prefetched. :(
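Roughly what such a replay program does, minus the async I/O: merge sorted (sector, size) ranges that touch or overlap, then pread() each merged range at byte offset sector*512. This is a synchronous sketch with hypothetical names (`struct range`, `merge_ranges`, `replay_ranges`), not brexec itself. As Andrew Morton explains in his reply in this thread, data read through the block device lands in the device's own pagecache, so it does not help later reads through individual files.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

struct range {
    unsigned long long sector;  /* start, in 512-byte sectors */
    unsigned long size;         /* length in bytes */
};

/* Merge ranges that overlap or touch. The input must already be sorted
 * by sector (e.g. by sort -n -u). Works in place; returns the new count. */
size_t merge_ranges(struct range *r, size_t n)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (m > 0) {
            unsigned long long base = r[m - 1].sector * 512;
            unsigned long long prev_end = base + r[m - 1].size;
            unsigned long long start = r[i].sector * 512;
            if (start <= prev_end) {
                unsigned long long end = start + r[i].size;
                if (end > prev_end)
                    r[m - 1].size = (unsigned long)(end - base);
                continue;
            }
        }
        r[m++] = r[i];
    }
    return m;
}

/* Read every range from fd and discard the data; only the side effect
 * of filling the cache matters. Returns bytes read, or -1 on error. */
long long replay_ranges(int fd, const struct range *r, size_t n)
{
    static char buf[65536];
    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        off_t off = (off_t)(r[i].sector * 512);
        unsigned long left = r[i].size;
        while (left > 0) {
            size_t chunk = left < sizeof buf ? left : sizeof buf;
            ssize_t got = pread(fd, buf, chunk, off);
            if (got < 0)
                return -1;
            if (got == 0)
                break;                  /* end of file or device */
            total += got;
            off += got;
            left -= (unsigned long)got;
        }
    }
    return total;
}
```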

I was wondering if the pages could have been removed so soon, so, to
make sure, I mmapped the whole shebang with MAP_LOCKED and PROT_READ, and
kept the mapping process in memory during the whole boot process. This
had exactly the same effect. So, I thought that I might be reading the
wrong blocks. However, when I feed it something like (160000, 4096,
hdb1) I get a block_dump log that says exactly that (plus some extra,
because mmap seems to read in a bit more than needed). So, that's not it.

I'm out of clues. If someone would be so kind as to take a look at what I'm
doing wrong, I'd very much appreciate it. I've put the code up at
http://www.xs4all.nl/~bsamwel/block_read_replay.tar.gz. How to use it:

1. Patch your kernel with the patch that's included in the tarball. This
patch modifies the block_dump output slightly, and enables a block_dump
value of 2 which only reports READ actions. It's against 2.6.1-mm2, but
it should apply fine to any kernel that has laptop_mode in it.

2. Record the bootup info. Somewhere at the very beginning, include
"echo 2 > /proc/sys/vm/block_dump" in an init script. Reboot, and after
the bootup sequence is complete, do "echo 0 > /proc/sys/vm/block_dump".

3. "make" and put brexec (one of the two versions) somewhere your init
scripts can access it.

4. Run slbrp (SysLog Block Read Parser) to generate a block list file:
slbrp /var/log/syslog | sort -n -u > /etc/bootup_blocks.

5. Precede the echo 2 > /proc/sys/vm/block_dump at startup with a brexec
("block read executor") call, e.g. "brexec /etc/bootup_blocks". The mmap
version takes an extra parameter <N> = the number of seconds to keep the
pages mapped and must be put in the background because it will simply
wait for N seconds before exiting. So, it should be something like
"brexec /etc/bootup_blocks 60" and then "sleep 30" to give it time to
read everything before bootup continues. Yes, it's not pretty. It's just
used for experimenting, so it doesn't matter.

6. Reboot, and disable block_dump after booting, like in step (2). Now
the logging of reads only starts _after_ brexec has attempted to load
all pages, and this gives info on what is still loaded. You'll probably
see that it loads many things that are also listed in the bootup_blocks
file. Now my question is: what am I doing wrong that it needs to read
those again?

-- Bart

2004-01-26 00:23:56

by d.c

Subject: Re: Request: I/O request recording

On Sun, 25 Jan 2004 15:38:03 -0800, Andrew Morton <[email protected]> wrote:

> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
> directories each have their own pagecache.

Would it be possible to "hijack" the syscalls at the libc level and look at what
the program is doing?

2004-01-26 00:32:29

by Andrew Morton

Subject: Re: Request: I/O request recording

Diego Calleja García <[email protected]> wrote:
>
> On Sun, 25 Jan 2004 15:38:03 -0800, Andrew Morton <[email protected]> wrote:
>
> > Unfortunately you cannot determine a directory's blocks in this way.
> > Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
> > directories each have their own pagecache.
>
> It would be possible to "hijack" the syscalls at libc level and look at what
> the program is doing?

That would work. It misses out on pagefaults, which are kind of syscalls
in disguise. So for any files which were mmapped you'd have to either
assume that all of the file's pages are required, or use mincore() to poke
around and find out which pages were really faulted in.
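A sketch of the mincore() idea: map the file without touching it, then ask which of its pages are resident. `resident_pages` is a hypothetical name; it counts pages rather than listing them, and newer kernels may restrict what mincore() reports for files the caller cannot write.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how many pages of a file are resident in the page cache.
 * mmap() alone faults nothing in, and mincore() only reports state,
 * so the check does not disturb the cache. Returns -1 on error. */
long resident_pages(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + pagesize - 1) / pagesize;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping keeps the file referenced */
    if (map == MAP_FAILED)
        return -1;
    unsigned char *vec = malloc(npages);
    long count = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;    /* low bit set = page is resident */
    }
    free(vec);
    munmap(map, len);
    return count;
}
```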

2004-01-26 11:50:46

by Bart Samwel

Subject: Re: Request: I/O request recording

Andrew Morton wrote:
> Bart Samwel <[email protected]> wrote:
>
>>>Linux caches disk data on a per-file basis. So if you preload pagecache
>>>via the /dev/hda1 "file", that is of no benefit to the /etc/passwd file.
>>>Each one has its own unique pagecache. When reading pages for /etc/passwd
>>>we don't go looking for the same disk blocks in the cache of /dev/hda1.
>>>
>>>Which is why the userspace cache preloading needs to know the pathnames of
>>>all the relevant files - it needs to open and read each one, applying
>>>knowledge of disk layout while doing it.
>>
>>Hmmm, that explains why this didn't work. :( So if I wanted to do this
>>completely from user space using only block_dump data I'd probably have
>>to go through all files and find out if they had any blocks in common
>>with my preload set -- presuming there is a way to find that out, which
>>there probably isn't. That makes this idea pretty much useless, I'm
>>sorry to have bothered you with it.
>
> You could certainly do that. Given disk block #N you need to search all
> files on the disk asking "who owns this block". The FIBMAP ioctl can be
> used on most filesystems (ext2, ext3, others..) to find out which blocks a
> file is using. See bmap.c in
>
> http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
>
> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
> directories each have their own pagecache.

I found out two things while trying to do this:

1. Many filesystems in Linux set f_fsid to zero in statfs(). I was trying
to use this to skip over mount points, but that doesn't work. Had to use
the st_dev field from stat instead. :(

2. Swapfiles apparently don't like to be touched. I did an
ioctl(FIGETBSZ) on a swapfile, and it would simply block until I did a
swapoff on the file. I didn't even get to the FIBMAP part. :( Is this
correct behaviour? And is there any way to detect this so that I can
work around it?
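Point 1 suggests skipping other filesystems by comparing st_dev rather than statfs()'s f_fsid. A sketch of such a walk, using lstat() so symlinks are not followed; `count_files_on_fs` is a hypothetical name and just counts regular files where a real tool would run FIBMAP on each one.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Recursively count regular files under dir, skipping entries whose
 * st_dev differs from root_dev (i.e. other mounted filesystems). */
static long walk(const char *dir, dev_t root_dev)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    long count = 0;
    struct dirent *de;
    char path[4096];
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        int len = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                   /* path too long: skip it */
        struct stat st;
        if (lstat(path, &st) < 0 || st.st_dev != root_dev)
            continue;                   /* vanished, or another filesystem */
        if (S_ISREG(st.st_mode)) {
            count++;                    /* a real tool would FIBMAP it here */
        } else if (S_ISDIR(st.st_mode)) {
            long sub = walk(path, root_dev);
            if (sub > 0)
                count += sub;
        }
    }
    closedir(d);
    return count;
}

/* Returns the number of regular files on root's filesystem below root,
 * or -1 if root is not a readable directory. */
long count_files_on_fs(const char *root)
{
    struct stat st;
    if (lstat(root, &st) < 0 || !S_ISDIR(st.st_mode))
        return -1;
    return walk(root, st.st_dev);
}
```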

-- Bart

2004-01-26 11:58:40

by Andrew Morton

Subject: Re: Request: I/O request recording

Bart Samwel <[email protected]> wrote:
>
> 2. Swapfiles apparently don't like to be touched. I did an
> ioctl(FIGETBSZ) on a swapfile, and it would simply block until I did a
> swapoff on the file. I didn't even get to the FIBMAP part. :( Is this
> correct behaviour?

yup.

> And is there any way to detect this so that I can work around it?

swapoff -a beforehand, I guess.
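If swapping off is not an option, one possible alternative is to skip active swap files by checking /proc/swaps, whose first line is a header and whose following lines start with the swap file or device name. `is_active_swap` is a hypothetical helper that works on the file's contents already read into memory; names containing spaces (which the kernel escapes) are not handled.

```c
#include <string.h>

/* Return 1 if path is listed as an active swap area in buf, which holds
 * the contents of /proc/swaps, else 0. The first line is a header
 * ("Filename Type Size Used Priority"); each later line starts with the
 * swap file or device name followed by whitespace. */
int is_active_swap(const char *buf, const char *path)
{
    size_t plen = strlen(path);
    const char *line = strchr(buf, '\n');   /* skip the header line */
    while (line != NULL) {
        line++;                             /* start of the next line */
        if (plen > 0 && strncmp(line, path, plen) == 0 &&
            (line[plen] == ' ' || line[plen] == '\t'))
            return 1;
        line = strchr(line, '\n');
    }
    return 0;
}
```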

2004-01-27 19:16:24

by Bart Samwel

Subject: Re: Request: I/O request recording

Andrew Morton wrote:
> You could certainly do that. Given disk block #N you need to search all
> files on the disk asking "who owns this block". The FIBMAP ioctl can be
> used on most filesystems (ext2, ext3, others..) to find out which blocks a
> file is using. See bmap.c in
>
> http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
>
> Unfortunately you cannot determine a directory's blocks in this way.
> Ext3's directories live in the /dev/hda1 pagecache anyway. ext2's
> directories each have their own pagecache.

OK, I've written something that does this (but only correctly for ext3).
I've put it here:

http://www.xs4all.nl/~bsamwel/bootup_prefetch.tar.gz

I haven't had the opportunity to do good measurements, so I don't really
know if it even increases performance. If anyone feels like benchmarking
this, I'd be very happy to hear from you. I don't really expect
performance increases, as the bootup scripts seem to have enough
processing to do to keep the system busy even without disk I/O. I wonder
if it might make a difference on a faster processor, though; my system's
kind of sluggish by today's standards.

-- Bart