2005-03-30 04:32:13

by John Richard Moser

Subject: Aligning file system data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

How likely is it that I can actually align stuff to 31.5KiB on the
physical disk, i.e. have each block be a track?

Rather than leveraging the track cache, would it be less expensive for
me to simply read in blocks totaling about 16 or 32KiB all at once?


Let's say I have two situations...

A)
My blocks are all 31.5KiB (512 bytes/sector * 63 sectors) and aligned
to tracks. The track cache on the disk stores the entire block, so
repeated reads from the disk take 0 ms of seek. I leverage this to read a
couple sectors at a time and seek as I care within the block while it's
cached, making several requests to the ATA device.

B)
My blocks are all 32KiB and cross track boundaries. All of them exist
in part in two separate tracks. Upon reading a block, I request the
entire block and work with it in main memory.

Which situation has less overhead?

C)
My blocks are all 31.5KiB and perfectly aligned within tracks. I read
the entire block as in (B) and work with it in main memory.

How much more latency is involved in (B) than in (C)? Does crossing a
track boundary incur anything expensive?
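
The arithmetic behind cases (A) to (C) can be sketched as follows. This assumes the idealized geometry from the question (512-byte sectors, 63 sectors per track, identical tracks laid out back to back), which a real drive may not have; the function name `tracks_touched` is made up for illustration.

```python
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 63                            # assumed, idealized geometry
TRACK_BYTES = SECTOR_BYTES * SECTORS_PER_TRACK    # 32256 bytes = 31.5 KiB


def tracks_touched(block_index, block_bytes, track_bytes=TRACK_BYTES):
    """Count how many tracks the given block overlaps.

    Blocks are assumed to be packed back to back starting at byte 0.
    """
    first_byte = block_index * block_bytes
    last_byte = first_byte + block_bytes - 1
    return last_byte // track_bytes - first_byte // track_bytes + 1


if __name__ == "__main__":
    # Case (C): 31.5 KiB blocks stay inside one track each.
    print(tracks_touched(5, TRACK_BYTES))
    # Case (B): 32 KiB blocks are 512 bytes longer than a track,
    # so under this model each one overlaps two tracks.
    print(tracks_touched(5, 32 * 1024))
```

Under these assumptions a 32 KiB block always overlaps exactly two tracks, because it is one sector longer than a track and both sizes are multiples of 512 bytes.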


- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

Creative brains are a valuable, limited resource. They shouldn't be
wasted on re-inventing the wheel when there are so many fascinating
new problems waiting out there.
-- Eric Steven Raymond
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCSivPhDd4aOud5P8RAszeAJ4wPonhpXas8IprMBUq8/NdM57aegCdEBva
24LXB3O+7GEE0XKxPBFr1L0=
=iTEm
-----END PGP SIGNATURE-----


2005-03-30 04:51:43

by Robert Hancock

Subject: Re: Aligning file system data

John Richard Moser wrote:
> How likely is it that I can actually align stuff to 31.5KiB on the
> physical disk, i.e. have each block be a track?

I don't think this is very likely. Even being able to find out what the
physical disk arrangement is, or whether it is consistent in terms of
track size, etc. seems unlikely.

>
> Rather than leveraging the track cache, would it be less expensive for
> me to simply read in blocks totaling about 16 or 32KiB all at once?

For block sizes that small, I think the kernel should be smart
enough to do this itself; there is no need to concern yourself with such
low-level details in the application.

> How much more latency is involved in (B) than in (C)? Does crossing a
> track boundary incur anything expensive?

Given that both the disk and the kernel will likely read far more than
32KB ahead I can't see much difference other than the overhead inside
your application.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2005-03-30 05:31:44

by John Richard Moser

Subject: Re: Aligning file system data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Well then, the verdict is reached.

My original design is based around storing related data in the same
block so that the track cache allows me to evade doing reads while I
poke around.

The design will stay the same; but the dependency on the track cache
will disappear. I'll simply consider 32KiB or 64KiB to be a nice block
size, 64KiB being the biggest, and leverage the design on the kernel
reading whole blocks into main memory to play with at a time.
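
A minimal user-space sketch of that access pattern, not the actual file system design: one request pulls in a whole block, and every later lookup is a slice of memory. The names `read_block` and `field` are hypothetical, and the 32 KiB size is just one of the two values mentioned above.

```python
import io

BLOCK_BYTES = 32 * 1024  # 32 KiB; 64 KiB is the other size under consideration


def read_block(f, block_index, block_bytes=BLOCK_BYTES):
    """Read one whole block from a seekable binary file with a single read."""
    f.seek(block_index * block_bytes)
    return f.read(block_bytes)


def field(block, offset, length):
    """Pick a record out of an already-loaded block; no further I/O happens."""
    return block[offset:offset + length]


if __name__ == "__main__":
    # An in-memory stand-in for a device or image file, four blocks long.
    image = io.BytesIO(bytes(range(256)) * 512)
    block = read_block(image, 1)
    print(len(block), field(block, 5, 3))
```

Whether the data then comes from the drive's cache or the kernel's page cache is invisible at this level, which is the point of dropping the track-cache dependency.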

Back to designing my file system. . . .


The only lasting regret I have is that I don't have a good, fast way to
do on-disk locking for a cluster file system. This would make my FS a
complete solution. . . .

It doesn't matter; finishing the design is a while off anyway. I still
have to define several extended journal transaction types to support
fault tolerant dynamic resizing (grow, shrink) while running. I don't
see how to grow left; shrinking from the left is easy enough. Wait,
suddenly I see how to grow left: Superblock at the end, and a bit of
magic. . . .


Robert Hancock wrote:
> John Richard Moser wrote:
>
>> How likely is it that I can actually align stuff to 31.5KiB on the
>> physical disk, i.e. have each block be a track?
>
>
> I don't think this is very likely. Even being able to find out what the
> physical disk arrangement is, or whether it is consistent in terms of
> track size, etc. seems unlikely.
>
>>
>> Rather than leveraging the track cache, would it be less expensive for
>> me to simply read in blocks totaling about 16 or 32KiB all at once?
>
>
> For block sizes that small, I think the kernel should be smart
> enough to do this itself; there is no need to concern yourself with such
> low-level details in the application.
>
>> How much more latency is involved in (B) than in (C)? Does crossing a
>> track boundary incur anything expensive?
>
>
> Given that both the disk and the kernel will likely read far more than
> 32KB ahead I can't see much difference other than the overhead inside
> your application.
>

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

Creative brains are a valuable, limited resource. They shouldn't be
wasted on re-inventing the wheel when there are so many fascinating
new problems waiting out there.
-- Eric Steven Raymond
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD4DBQFCSjmPhDd4aOud5P8RAgB7AJiWq4Qiyfk1G0SJa+5ZCtJ//WH8AJ9ysogo
3z6+FLvkNgyU/k0o9HBf1w==
=OPXo
-----END PGP SIGNATURE-----

2005-03-30 05:37:14

by Bernd Eckenfels

Subject: Re: Aligning file system data

In article <[email protected]> you wrote:
> How likely is it that I can actually align stuff to 31.5KiB on the
> physical disk, i.e. have each block be a track?

It is not that easy to align on tracks, even on a raw partition. Some disks
have tracks of different lengths (of course, because the inner cylinders are
shorter), some report a totally different geometry than they have internally,
and the disks happily remap sectors.
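
To illustrate why a fixed 63-sector track cannot be assumed, here is a rough sketch of mapping a logical block address to a track when track length varies by zone. The zone table is invented for the example; a real drive does not normally publish one, and the sketch ignores multiple heads and remapped sectors entirely.

```python
from bisect import bisect_right

# (first_lba, sectors_per_track) for each zone, outermost zone first.
# These numbers are made up purely for illustration.
ZONES = [(0, 900), (900_000, 750), (1_650_000, 600)]


def track_of(lba, zones=ZONES):
    """Return (zone_index, track_within_zone, sector_within_track) for an LBA."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    starts = [first_lba for first_lba, _ in zones]
    zone_index = bisect_right(starts, lba) - 1
    first_lba, sectors_per_track = zones[zone_index]
    track, sector = divmod(lba - first_lba, sectors_per_track)
    return zone_index, track, sector


if __name__ == "__main__":
    for lba in (0, 900, 900_750, 1_650_601):
        print(lba, track_of(lba))
```

Even this simplified model needs per-drive data that the host has no standard way to obtain, which is the practical obstacle to track alignment.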

With RAID and LVM the situation gets worse.

Why do you want to do those micro-optimizations?

With a filesystem in between you have virtually no way to align larger
files for streaming.

Let the buffer cache and prefetch do what they are intended for and feel
happy.

Greetings
Bernd

2005-03-30 05:41:37

by Barry K. Nathan

Subject: Re: Aligning file system data

On Tue, Mar 29, 2005 at 11:32:16PM -0500, John Richard Moser wrote:
> Does crossing a
> track boundary incur anything expensive?

AFAIK, yes. It's going to involve some kind of seeking (even a head
switch needs microjogging on modern drives), and it will certainly add
latency (although I don't remember how much, off the top of my head).

However, trying to control this from the kernel may be vastly harder
than you're expecting (assuming a modern hard drive). You may want to
look at these pages for more info:

http://www.storagereview.com/guide2000/ref/hdd/geom/tracksZBR.html
http://www.storagereview.com/guide2000/ref/hdd/geom/geomLogical.html

Also look at the last paragraph on this page -- not the paragraph with
the "Stop" sign, but the one after it:
http://www.storagereview.com/guide2000/ref/hdd/geom/formatDefect.html


I think this could in fact be done, but it would be a lot of effort,
and the kernel would need knowledge on a per-drive-model basis (or
at least it would need a way to obtain such knowledge from user space,
and the per-model knowledge would need to be stored there somehow).
For all I know, vendor-specific commands might also be needed in order
to find out which blocks are remapped, in order to use that knowledge to
avoid changing tracks spuriously. (And one other note: Since your device
almost certainly has many tracks with well over 256 sectors in reality,
your device is actually incapable of reading or writing a single track
with a single ATA command unless it supports LBA48.)
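
The parenthetical about LBA48 comes down to the width of the sector-count field: 8 bits in 28-bit commands (with 0 meaning 256 sectors) and 16 bits in 48-bit commands (with 0 meaning 65536). A small sketch of the resulting command count follows; the function name `commands_needed` and the 900-sector track are assumptions for illustration only.

```python
MAX_SECTORS_LBA28 = 256     # 8-bit sector count; a count of 0 encodes 256
MAX_SECTORS_LBA48 = 65536   # 16-bit sector count; a count of 0 encodes 65536


def commands_needed(track_sectors, lba48):
    """Minimum number of read or write commands needed to cover one track."""
    if track_sectors <= 0:
        raise ValueError("track_sectors must be positive")
    limit = MAX_SECTORS_LBA48 if lba48 else MAX_SECTORS_LBA28
    return -(-track_sectors // limit)  # ceiling division


if __name__ == "__main__":
    # A hypothetical outer-zone track of 900 sectors.
    print(commands_needed(900, lba48=False))
    print(commands_needed(900, lba48=True))
```

With 512-byte sectors the 28-bit limit works out to 128 KiB per command, so any track longer than 256 sectors has to be split across several commands without LBA48.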

-Barry K. Nathan <[email protected]>