2009-02-09 11:06:51

by Kalpak Shah

Subject: [RFC] Design for large EAs

Hi Ted,

Earlier I posted patches that added large EA support by storing every
EA larger than half the blocksize in an external inode. Ted had
expressed reservations that this may use up too many inodes. Further
discussion followed and a few different designs were proposed. Here's
the relevant thread:
http://kerneltrap.org/index.php?q=mailarchive/linux-ext4/2008/12/3/4296884/thread

There was agreement on an ext4 concall that we can increase extended
attribute space by allocating contiguous EA blocks. Only those EAs
with size > blocksize will then be stored in an external inode. This way
fewer inodes will be wasted and we will have space for more entries as
well. I have included a few design details, and it would be nice to have
agreement on these before I move on to coding.

----

Problem Definition:
Increase Extended Attribute space by allocating contiguous EA blocks.

Design:

1) On-disk structure changes

- The h_blocks field in ext4_xattr_header can be used to indicate the
number of contiguous EA blocks for the inode. We can have a maximum of
64KB of EA space, since we are limited by the 16-bit value offset field
in ext4_xattr_entry.
- The EXT4_FEATURE_INCOMPAT_EA_INODE flag will cover both the
external inode EA feature and the multiple EA blocks feature.
- A new magic number for h_magic (EXT4_XATTR_MAGIC_V2) in
ext4_xattr_header will indicate that this inode has multiple EA blocks.
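
To make the 64KB limit concrete, here is a small userspace sketch of the
arithmetic. The struct is a simplified illustration that borrows field
names from ext4_xattr_entry; it is not the kernel definition, and
ea_max_blocks() is a hypothetical helper, not an existing function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace mirror of an EA entry descriptor. Field names
 * follow ext4_xattr_entry, but several fields are omitted on purpose.
 */
struct ea_entry_sketch {
    uint8_t  e_name_len;
    uint8_t  e_name_index;
    uint16_t e_value_offs;   /* 16-bit offset: the source of the 64KB cap */
    uint32_t e_value_size;
};

/* A 16-bit offset can address bytes 0..65535, i.e. 64KB of EA space. */
#define EA_SPACE_MAX ((uint32_t)UINT16_MAX + 1)

/* Hypothetical helper: largest useful h_blocks value for a block size. */
static uint32_t ea_max_blocks(uint32_t blocksize)
{
    assert(blocksize > 0 && blocksize <= EA_SPACE_MAX);
    return EA_SPACE_MAX / blocksize;
}
```

With 4KB blocks this gives 16 contiguous EA blocks, and with 1KB blocks
it gives 64.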

2) We will initially allocate only a single EA block if the xattrs fit
in one block. If the xattrs do not fit in a single EA block then we will
try to allocate 64KB of contiguous blocks. If allocation of 64KB fails we
will try to allocate 32KB and then 16KB of contiguous blocks, ...
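
The fallback in 2) could be sketched roughly as below. This is a
userspace illustration only: try_alloc_fn, ea_alloc_with_fallback() and
the stub allocator are hypothetical names, and stopping the halving at a
single block is my assumption, since the lower bound is left open above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator hook: true if nblocks contiguous blocks were found. */
typedef bool (*try_alloc_fn)(uint32_t nblocks, void *ctx);

/*
 * Returns the number of contiguous EA blocks allocated, or 0 on failure.
 * One block is used when the xattrs fit; otherwise try 64KB and keep
 * halving (32KB, 16KB, ...) until an allocation succeeds.
 */
static uint32_t ea_alloc_with_fallback(uint32_t blocksize,
                                       uint32_t needed_bytes,
                                       try_alloc_fn try_alloc, void *ctx)
{
    assert(blocksize > 0 && blocksize <= 65536 && try_alloc != NULL);

    if (needed_bytes <= blocksize)
        return try_alloc(1, ctx) ? 1 : 0;

    for (uint32_t bytes = 65536; bytes >= blocksize; bytes /= 2) {
        uint32_t nblocks = bytes / blocksize;

        if (try_alloc(nblocks, ctx))
            return nblocks;
    }
    return 0;
}

/* Stub for experiments: succeeds up to the largest free extent in *ctx. */
static bool alloc_up_to(uint32_t nblocks, void *ctx)
{
    return nblocks <= *(uint32_t *)ctx;
}
```

The caller still has to check that the space it got back is large
enough for the xattrs it wants to store.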

3) An EA value will be stored in an external inode if value_size >
blocksize. No EA value stored in the EA blocks will cross a 4KB boundary.
This way we can always read in an EA value by reading only a single
block, which makes the code simpler. I don't think we want to store
medium sized EAs (5-32KB) in EA blocks anyway.
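
The two rules in 3) can be illustrated with the following userspace
sketch. Both helper names are hypothetical, and placing values at
increasing offsets is a simplification of mine; it ignores padding and
the way the existing code packs values from the end of the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EA_BOUNDARY 4096u   /* values must not straddle a 4KB boundary */

/* Rule 1: values larger than one block go to an external inode. */
static bool ea_value_goes_external(uint32_t value_size, uint32_t blocksize)
{
    return value_size > blocksize;
}

/*
 * Rule 2: return the first offset >= offs at which value_size bytes fit
 * without crossing a 4KB boundary. If the value would straddle one, it
 * is pushed to the start of the next 4KB chunk.
 */
static uint32_t ea_place_value(uint32_t offs, uint32_t value_size)
{
    assert(value_size > 0 && value_size <= EA_BOUNDARY);

    uint32_t first = offs / EA_BOUNDARY;
    uint32_t last = (offs + value_size - 1) / EA_BOUNDARY;

    if (first == last)
        return offs;
    return (first + 1) * EA_BOUNDARY;
}
```

The bytes skipped when a value is pushed to the next chunk are wasted
unless a smaller value is placed there later.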

Questions:

1) Since we will have 64KB of EA space, we will need to allocate 64KB
buffers to read in the EA space. This way a lot of the code will remain
as-is, since most of the macros/functions assume that the EA space is
contiguous (as it was with a single EA block). Will allocation of such
large buffers (even with vmalloc) lead to any problems?
2) We won't be using the refcount for EA blocks any more. Any problems
with this? Should we think about removing the entire mb_cache code, or at
least disabling mb_cache when it is not needed?

Thanks,
Kalpak