2001-02-22 15:34:52

by Andries E. Brouwer

Subject: filesystem statistics

Now that people are discussing the right hash function to use,
and the amount of space taken by filenames in various schemes,
I wondered how these things are on a random machine.
Here are some statistics.

Andries

--------------------------------------------------------------

Statistics on a filesystem with 63 GB worth of files.

2797212 files

average file size: 22600 bytes
average depth: 8
average pathname length: 59 bytes
average filename length: 10 bytes

max file size: 678035456 bytes
max depth: 17
max pathname length: 159 bytes
max filename length: 99 bytes

longest path name (also with largest depth):
159 bytes: /b2/g1a/linux/nist/NIST-PCTS/STD/DIF/data/dif.d/ln_gt_100_test/ln_gt_100_test/ln_gt_100_test/ln_gt_100_test/ln_gt_100_test/ln_gt_100_test/ln_gt_100_test/tar_19

longest file name:
99 bytes: CL_Streamed_RawSample_Session_CL_Streamed_RawSample_Session_CL_InputSource_SoundFormatintbool_.html
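A minimal sketch of how figures like these could be gathered, assuming the conventions the numbers above suggest: the root counts as "/" (depth 0, path length 1, empty file name), depth is the number of path components, and lengths are measured in bytes. The function name `collect_stats` is hypothetical, and this is not the script that produced the statistics in this message.

```python
import os
from collections import Counter

def collect_stats(root):
    """Return (depths, path_lengths, name_lengths) histograms for a tree."""
    depths, path_lens, name_lens = Counter(), Counter(), Counter()

    def record(components):
        # Assumed convention: the root is "/", everything else is "/a/b/c".
        path = "/" + "/".join(components)
        name = components[-1] if components else ""
        depths[len(components)] += 1
        path_lens[len(os.fsencode(path))] += 1
        name_lens[len(os.fsencode(name))] += 1

    stack = [(root, [])]
    while stack:
        directory, components = stack.pop()
        record(components)  # the directory itself
        with os.scandir(directory) as entries:
            for entry in entries:
                child = components + [entry.name]
                if entry.is_dir(follow_symlinks=False):
                    stack.append((entry.path, child))
                else:
                    record(child)  # regular files, symlinks, devices, ...
    return depths, path_lens, name_lens
```

Symbolic links are counted but not followed, and unreadable directories raise an exception rather than being skipped; a real run over a whole machine would probably want to handle both more carefully.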

distribution of depths:
0: 1
1: 50
2: 3212
3: 19951
4: 57534
5: 159917
6: 347124
7: 705958
8: 569661
9: 657689
10: 176777
11: 63221
12: 24646
13: 9765
14: 1364
15: 259
16: 80
17: 3

distribution of pathname lengths:
          0      1      2      3      4      5      6      7      8      9
  0:      0      1      0     14     14     10     13     66    203    646
 10:   1133    649   1521   3367   2664   2398   2969   3657   4822   3010
 20:   3360   3951   3824   4182   5702   5043   6352   7660   9027  11948
 30:  11877  25050  17943  26597  24599  23174  25292  31789  31897  31319
 40:  33225  36892  36911  42668  42106  46898  46666  49673  54825  61980
 50:  64753  62859  72410  75021  79526  73447  75175  72326  70532  70574
 60:  71446  71227  68235  67907  62067  56403  54213  49642  44474  39715
 70:  35215  31877  31270  22486  20232  16859  14218  12767  13826  12855
 80:   8269 139474  13866  13182   5858   5158 268605   4077   3633   3152
 90:   2627   2201   1691   1680   1486   1538   1327   1163   1025   1390
100:    818    891    775    935   1503   1450    438    368    296   1198
110:   1102    169    174    139    101    963    925     74     51     44
120:     31     22     20     20     20      6     12     17     12     18
130:     13      7     10      8      9      7      3      6      1      0
140:      2      0      0      0      0      0      0      0      1      0
150:      0      1      2      0      0      1      0      0      1      2
160:      0      0      0      0      0      0      0      0      0      0

distribution of filename lengths:
          0      1      2      3      4      5      6      7      8      9
  0:      1   2341  16946  36630  66883 115118 189020 224596 289985 682943
 10: 237134 213677 199863 114987  81873  60902  52485  38200  29354  24779
 20:  21153  15792  13279  14638  11973   9366   7586   4674   3832   2975
 30:   2645   1938   1643   1244   1076    909    660    642    544    508
 40:    379    222    186    217    229    126    124    107    103     71
 50:     52     52     48     54     56     41     38     40     11     18
 60:     19     11      8     20     21      8      9      8     21     13
 70:     10      9     10      8      9     11      4      9      5      4
 80:      3      3      3      1      1      0      1      3      2      2
 90:      1      0      1      1      1      1      2      0      0      1
100:      0      0      0      0      0      0      0      0      0      0
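The distribution tables in this message are read by adding the row label to the column index: the entry in row "10:" under column 3 is the count for value 13. A small sketch of turning such a table into a mapping and taking its weighted mean, assuming the "label: ten counts" layout used here; `parse_histogram` and `mean` are hypothetical helper names.

```python
def parse_histogram(text):
    """Map each value (row label + column index) to its count."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip the column-header line and blank lines
        base = int(label)
        for offset, field in enumerate(rest.split()):
            counts[base + offset] = int(field)
    return counts

def mean(counts):
    """Weighted mean of the values, using the counts as weights."""
    total = sum(counts.values())
    return sum(value * n for value, n in counts.items()) / total
```

Feeding the filename-length table through `parse_histogram` and `mean` would give a figure to compare against the reported average filename length.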


2001-02-23 18:55:33

by Andreas Dilger

Subject: Re: filesystem statistics

Andries Brouwer writes:
> Now that people are discussing the right hash function to use,
> and the amount of space taken by filenames in various schemes,
> I wondered how these things are on a random machine.
> Here are some statistics.

Can you generate statistics on the number of files in each directory,
and the total size of each directory? For total directory size, it
may be useful to have not only the size in kB and/or disk blocks, but
also the sum of the raw dentry sizes, because directories always show
full block sizes.

This would also be helpful to determine how often indexing will be used
in an "average" system.
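As a rough sketch of the two measures asked for above, the following counts the entries in one directory and sums their raw entry sizes, then rounds that sum up to whole blocks for comparison. It assumes the ext2 on-disk entry layout (an 8-byte header followed by the name, padded to a multiple of 4 bytes) and a 4096-byte block size; the names `ext2_rec_len`, `directory_report` and `BLOCK_SIZE` are hypothetical, and other filesystems lay out directories differently.

```python
import os

BLOCK_SIZE = 4096  # assumption: 4 kB filesystem blocks

def ext2_rec_len(name_len):
    """Bytes for one ext2-style entry: 8-byte header plus name, padded to 4."""
    return (8 + name_len + 3) & ~3

def directory_report(directory):
    """Return (entry_count, raw_entry_bytes, bytes_rounded_to_blocks)."""
    with os.scandir(directory) as entries:
        names = [os.fsencode(entry.name) for entry in entries]
    raw = sum(ext2_rec_len(len(name)) for name in names)
    raw += ext2_rec_len(1) + ext2_rec_len(2)  # the "." and ".." entries
    rounded = -(-raw // BLOCK_SIZE) * BLOCK_SIZE  # round up to whole blocks
    return len(names), raw, rounded
```

The rounded figure is only a lower bound on what `stat` would report: entries cannot span blocks, and a directory that once held many files normally keeps its blocks after they are deleted.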

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

2001-02-23 22:13:53

by Guest section DW

Subject: Re: filesystem statistics

On Fri, Feb 23, 2001 at 11:54:28AM -0700, Andreas Dilger wrote:
> Andries Brouwer writes:
> > Here are some statistics.
>
> Can you generate statistics on the number of files in each directory,
> and the total size of each directory?
>
> This would also be helpful to determine how often indexing will be used
> in an "average" system.

Hmm - there is no way this is an average system. It is just a random system.
For doing statistics on file names it is a valid example, I think.
For doing statistics on file sizes or directory sizes it is worthless.
Some people have a few very large files, some have news spools or other
things with lots of small files in a directory.
(This particular system does not have a news spool.)

Anyway, I can give you the stats.

127533 directories
2555633 regular files
946 other files

Largest file: 678035456 bytes
Largest directory: 1283 links

Distribution of nlinks:

          0     1     2     3     4     5     6     7     8     9
   0:     0     0 98330 13901  4510  2238  1624  1318   877   662
  10:   668   490   519   307   308   226   157   140   101    62
  20:   130   116    73    78    54    26    59    41    36    33
  30:    13    24    15    23    12    11    14    40    25    21
  40:    21     9    10    20    10     4     6    10     2     1
  50:     3     3     1     2     1     4    11     3     1     1
  60:     2     3     0     2     6     6     6     1     1     1
  70:     7     2     0     2     0     4     3     0     1     3
  80:     1     5     0     1     0     0     1     0     0     0
  90:     1     0     1     1     0     1     2     3     1     0
 100:     1     0     1     0     5     4     0     3     3     1
 110:     0     1     0     0     0     0     4     0     1     0
 120:     0     0     0     1     0     0     0     0     0     2
 130:     0     0     1     0     1     0     0     0     1     0
 140:     0     0     0     0     0     0     0     0     1     0
 160:     0     0     0     1     0     0     0     0     0     1
 170:     0     0     0     0     0     0     0     0     0     1
 180:     0     0     1     0     0     0     0     1     0     0
 200:     0     0     0     0     0     0     0     0     0     1
 210:     1     0     0     0     0     0     0     0     0     0
 220:     0     0     0     0     0     1     0     1     0     0
 230:     0     0     0     0     0     0     0     0     0     1
 250:     0     0     0     0     0     0     0     0    16     0
 790:     1     0     0     0     0     0     0     0     0     0
1150:     0     0     0     1     0     0     0     0     0     0
1280:     0     0     0     1     0     0     0     0     0     0

(Interesting - I never thought about that, but it looks as if most directories have no subdirectories: a link count of 2 covers only the entry in the parent and the directory's own ".".)
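For context on reading the nlinks table: on traditional Unix filesystems such as ext2, a directory's link count is 2 plus its number of subdirectories, so the count says how many subdirectories a directory has rather than how many files it holds. A hedged sketch for tallying both figures side by side; `directory_link_stats` is a hypothetical name, and some filesystems report a fixed link count for directories, in which case the two histograms will not line up.

```python
import os
from collections import Counter

def directory_link_stats(root):
    """Return (nlink_histogram, subdirectory_count_histogram) for a tree."""
    nlinks, subdirs = Counter(), Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        nlinks[os.lstat(dirpath).st_nlink] += 1
        # os.walk lists symlinks to directories in dirnames; leave them out.
        real = sum(
            1 for name in dirnames
            if not os.path.islink(os.path.join(dirpath, name))
        )
        subdirs[real] += 1
    return nlinks, subdirs
```

Where the traditional convention holds, `nlinks[n + 2]` should equal `subdirs[n]` for every n.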

Distribution of directory sizes (in 4kB blocks):

         0      1      2      3      4      5      6      7      8      9
 0:      3 126133    763    247     35     38     21     18     26      5
10:     10     16    102      3      8      9      4      2     10      1
20:      4      1      5      4     20      9     15      6      0      3
30:      0      3      3      0      0      0      0      3      1      1
40:      0      0      0      0      0      0      0      0      0      0
50:      0      0      0      0      1      0      0      0      0      0