2001-04-08 13:51:15

by Russell Coker

[permalink] [raw]
Subject: struct stat{st_blksize} for /dev entries in 2.4.3

When you stat() the files under /dev the st_blksize is returned as 1024
bytes. Currently cat will look at the input block size and the output block
size and use the maximum of them as it's buffer size. I believe that
programs such as cat should never use a buffer size smaller than a page of
memory and reported this as a bug in cat.
Herbert Xu (the maintainer of the Debian package textutils which contains
cat) considers that the device should return a larger number in the
st_blksize.

Here are some test results, first a P3-650 with 256M of RAM using a 300M
partition of a 10G disk:
root@lyta:/#time cat /dev/ide/host0/bus0/target0/lun0/part1 > /dev/null

real 0m37.959s
user 0m0.220s
sys 0m4.910s
My patched cat uses 25% of the user CPU time and 90% the system CPU time of
the unpatched cat:
root@lyta:/#time /usr/src/textutils-new/src/cat
/dev/ide/host0/bus0/target0/lun0/part1 > /dev/null

real 0m35.502s
user 0m0.060s
sys 0m4.440s

Now here's an AMD K6-350 with 64M of RAM using a 2G RAID-1 partition across
two 46G disks:
root@ivanova:~#time cat /dev/md/1 > /dev/null

real 2m25.906s
user 0m2.200s
sys 1m16.290s

My patched cat uses 30% the user CPU time and 95% the system CPU time:
root@ivanova:~#time /tmp/cat /dev/md/1 > /dev/null

real 2m14.845s
user 0m0.720s
sys 1m12.030s


On an AMD Athlon 800 machine I noticed an even more significant difference
(the command "cat /dev/zero > /dev/hdc1" was using 100% CPU time but the disk
was not at maximum speed before I patched cat). But I don't have suitable
test results with me at this time so I can't give you the details. Another
issue is that an Athlon 800 is a reasonably fast CPU, and it should probably
be able to handle 33000 reads and 33000 writes per second easily without
using any significant amount of CPU time.


Now I would like some comments on the following issues:
Is this a bug in cat regardless of whether the behaviour of the kernel is
right or wrong? I have attached a patch for cat in case it is determined
that cat is buggy.


Regardless of whether cat is doing the right thing or not, does it make sense
for the st_blksize of /dev/* to be 1024 bytes? Or should it be 4096?

My understanding is that the st_blksize is the recommended IO size (not
mandatory). So it shouldn't break anything if this is set to a minimum of
the page size. But setting it to the page size will hint that applications
should use a page as the minimum IO block size and cause some applications to
deliver better performance.



diff -ru textutils-2.0/src/cat.c textutils-new/src/cat.c
--- textutils-2.0/src/cat.c Sun Apr 8 22:55:10 2001
+++ textutils-new/src/cat.c Sun Apr 8 23:23:54 2001
@@ -790,6 +790,9 @@
if (options == 0)
{
insize = max (insize, outsize);
+#ifdef _SC_PHYS_PAGES
+ insize = max (insize, sysconf(_SC_PAGESIZE));
+#endif
inbuf = (unsigned char *) xmalloc (insize);

simple_cat (inbuf, insize);

--
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page


2001-04-08 18:46:59

by Ulrich Drepper

[permalink] [raw]
Subject: Re: struct stat{st_blksize} for /dev entries in 2.4.3

Russell Coker <[email protected]> writes:

> diff -ru textutils-2.0/src/cat.c textutils-new/src/cat.c
> --- textutils-2.0/src/cat.c Sun Apr 8 22:55:10 2001
> +++ textutils-new/src/cat.c Sun Apr 8 23:23:54 2001
> @@ -790,6 +790,9 @@
> if (options == 0)
> {
> insize = max (insize, outsize);
> +#ifdef _SC_PHYS_PAGES
> + insize = max (insize, sysconf(_SC_PAGESIZE));
> +#endif
> inbuf = (unsigned char *) xmalloc (insize);
>
> simple_cat (inbuf, insize);

The #ifdef is certainly wrong. And there is no guarantee that any of
the _SC_* constants are defined as macros. You'll have to add a
configure test.

--
---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------

2001-04-09 07:33:37

by Eric W. Biederman

[permalink] [raw]
Subject: Re: struct stat{st_blksize} for /dev entries in 2.4.3

Russell Coker <[email protected]> writes:

> When you stat() the files under /dev the st_blksize is returned as 1024
> bytes. Currently cat will look at the input block size and the output block
> size and use the maximum of them as it's buffer size. I believe that
> programs such as cat should never use a buffer size smaller than a page of
> memory and reported this as a bug in cat.

Hmm. I can't agree with that, at least not all of the way. I suspect
there are some character devices that do I/O in small chunks for which
a small block size is preferable. The old latency version bandwidth issue.
Plus this assumes an linux like architecture. With a different
structure to the code who knows where the tradeoffs fall.

In particular consider where the tradeoffs go if we increase the
internal PAGE_SIZE but not the external one. So I think we really
need to fix stat!

However it looks like you aren't using ext2 for root.

Currently looking at 2.4.2 source I see the following things.
1) cp_new_stat is broken, if you don't set st_blksize in your
filesystem. It appears to be counting 1K blocks base on filesize.
st_blocks is always in multiples of 512 bytes.

2) The st_blksize comes out of the inode structure, and it looks
currently the device layer doesn't set it so it gets left at
whatever the filesystem sets this to. This is PAGE_SIZE for ext2.
Which filesystem are you runnning?

>
> Now I would like some comments on the following issues:
> Is this a bug in cat regardless of whether the behaviour of the kernel is
> right or wrong? I have attached a patch for cat in case it is determined
> that cat is buggy.

Unless you can prove that PAGE_SIZE is a determining factor of I/O
efficiency on all possible architectures I don't see a problem with
cat.

> Regardless of whether cat is doing the right thing or not, does it make sense
> for the st_blksize of /dev/* to be 1024 bytes? Or should it be
> 4096?

Hmm. I think we might want to be even smarter than that, but as a
first approximation yes it should be PAGE_SIZE.

> My understanding is that the st_blksize is the recommended IO size (not
> mandatory). So it shouldn't break anything if this is set to a minimum of
> the page size. But setting it to the page size will hint that applications
> should use a page as the minimum IO block size and cause some applications to
> deliver better performance.

Correct. Changing st_blksize should not break any correct program.

Eric