2012-05-08 15:44:17

by Andreas Dilger

Subject: Filesystem pathname distributions

There is an interesting analysis of filesystem pathname distributions
available in a master's thesis at:

http://www.pdl.cmu.edu/PDL-FTP/HECStorage/Yifan_Final.pdf

This shows (at least for the filesystems analyzed) that median
directory sizes continue to be quite small. Looking at figure
10, 60-90% of all directories have 8 or fewer entries in them,
and in many cases 50% of directories have only 1 or 2 entries.

It would be useful for you to run the fsstats tool (available
from http://www.pdsi-scidac.org/fsstats/) against some filesystems
that you have access to (e.g. a typical distro desktop, home
directories, file servers, etc.) and compute what fraction of the
files and directories could be stored inside the inode.

Similarly, and perhaps more importantly, for any filesystems you
have using the bigalloc feature, compare the actual size of
files and directories (for a directory, only the filename + 10
bytes per entry) to see how many you could fit into an inode of
a given size (a default 256-byte inode has about 100 bytes of
space, a 512-byte inode has about 350 bytes of space, etc.).
It would be useful to see
if there is a clear win for having larger inodes to hold small
files/directories in bigalloc filesystems, or whether this would
waste more space in total than is wasted by allocating a full
bigalloc block for each inode.
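
As a rough illustration of the measurement described above, here is a
minimal Python sketch. It is not fsstats and does not reproduce its
output; it simply walks a directory tree and applies the approximations
from this message (filename + 10 bytes per directory entry, about 100
usable bytes in a 256-byte inode and about 350 in a 512-byte inode).
The names survey(), fraction_fitting() and report() are made up for
this example, and the real in-inode layout may differ from this
estimate.

```python
import os
import stat

# Approximate usable in-inode space taken from the message above:
# inode size in bytes -> bytes assumed available for inline data.
INODE_SPACE = {256: 100, 512: 350}

# Assumed per-entry overhead beyond the filename, as stated in the message.
DIRENT_OVERHEAD = 10


def dir_bytes(names):
    """Estimate a directory's size as (filename length + overhead) per entry."""
    return sum(len(os.fsencode(n)) + DIRENT_OVERHEAD for n in names)


def survey(root):
    """Walk root and return (regular file sizes, estimated directory sizes)."""
    file_sizes, dir_sizes = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Every entry in the directory counts, subdirectories included.
        dir_sizes.append(dir_bytes(dirnames + filenames))
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                file_sizes.append(st.st_size)
    return file_sizes, dir_sizes


def fraction_fitting(sizes, space):
    """Fraction of the given sizes that are <= space (0.0 for an empty list)."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s <= space) / len(sizes)


def report(root):
    """Map inode size -> (fraction of files, fraction of dirs) that would fit."""
    files, dirs = survey(root)
    return {
        inode_size: (fraction_fitting(files, space),
                     fraction_fitting(dirs, space))
        for inode_size, space in INODE_SPACE.items()
    }
```

Calling report("/home"), for example, returns one pair of fractions
per inode size. This only counts how many objects would fit; weighing
that against the space lost to larger inodes versus a full bigalloc
block per small file would need the total inode count and the cluster
size as well.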

Cheers, Andreas
--
Andreas Dilger Whamcloud, Inc.
Principal Lustre Engineer http://www.whamcloud.com/