From: Andreas Dilger <adilger@sun.com>
Subject: Re: [RFC] dynamic inodes
Date: Thu, 25 Sep 2008 16:09:36 -0600
Message-ID: <20080925220936.GL10950@webber.adilger.int>
References: <48DA28B0.2020207@sun.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7BIT
Cc: ext4 development <linux-ext4@vger.kernel.org>
To: Alex Tomas <bzzz@sun.com>
In-reply-to: <48DA28B0.2020207@sun.com>
Content-disposition: inline
Sender: linux-ext4-owner@vger.kernel.org

Sadly this was sitting in my outbox overnight, and might be obsolete
already (explanation in a follow-up email), but I'm sending it as food
for thought...

On Sep 24, 2008  15:46 +0400, Alex Tomas wrote:
> another idea how to achieve more (dynamic) inodes:
>   * new dir_entry format with 64bit inum

Yes, that is a requirement in all cases.

I've always thought that we should also implement inode-in-dirent when
we need to change the dirent format and make dynamic inodes, but that
may be too much to chew on at one time.

>   * ino space is 64bit:
>     * 2^48 phys. 4K blocks
>     * 2^5  inodes in 4K block

The 2^5 inodes/4kB block would actually depend on the blocksize/inodesize,
so let's just call the exponent inodes-per-block-bits (IPBB).  It will be a
value between 0 and 8 (i.e. between 1 and 256 inodes per block), which is
fine.  For common ext4 filesystems this would be 2^4 = 16 inodes/block,
because the default is 256-byte inodes today.
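
To make the IPBB definition concrete, here is a minimal userspace sketch.
The helper name ipbb() is hypothetical (it is not an existing ext4
function), and it assumes both sizes are powers of two with
inodesize <= blocksize:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: IPBB = log2(blocksize / inodesize). */
static unsigned int ipbb(uint32_t blocksize, uint32_t inodesize)
{
	unsigned int bits = 0;

	/* Assumption: both sizes are powers of two, inodesize <= blocksize. */
	assert(inodesize != 0 && blocksize >= inodesize);
	assert((blocksize & (blocksize - 1)) == 0);
	assert((inodesize & (inodesize - 1)) == 0);

	/* Count how many doublings of inodesize fit into one block. */
	while ((inodesize << bits) < blocksize)
		bits++;
	return bits;
}
```

With 4kB blocks, 128-byte inodes give the 2^5 from the original proposal,
while 256-byte inodes give 2^4.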

>     * highest bit is used to choose addressing schema: static or dynamic

Alternately, any inode >= 2^32 would be dynamic?  One clear benefit of
putting the dynamic inodes at the end of the number space is that they
will only be used if the static inodes are full, which reduces risk due
to corruption and overhead due to dynamic allocations.
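
As a sketch of that alternative, the static/dynamic test becomes a plain
comparison instead of a flag bit.  The names below are hypothetical, and
treating "inode number minus 2^32" as an index into the dynamic space is
my own assumption for illustration, not something settled in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the first dynamic inode number is 2^32. */
#define DYNAMIC_INO_BASE (UINT64_C(1) << 32)

/* Static inodes keep their existing 32-bit numbers. */
static bool ino_is_dynamic(uint64_t ino)
{
	return ino >= DYNAMIC_INO_BASE;
}

/* Hypothetical: zero-based position of a dynamic inode in the dynamic space. */
static uint64_t dynamic_ino_index(uint64_t ino)
{
	assert(ino_is_dynamic(ino));
	return ino - DYNAMIC_INO_BASE;
}
```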

>   * each block is covered by two bits: in inode (I) and block (B) bitmaps:
>     I: 0, B: 0 - block is just free
>     I: 0, B: 1 - block is used, but not contains inodes
>     I: 1, B: 0 - block is full of inodes
>     I: 1, B: 1 - block contains few inodes, has free space

Storing B:0 for an in-use block seems very dangerous to me.  This also
doesn't really address the need to be able to quickly locate free inodes,
because "I:1" only says that the block holds inodes, not which of them are
free, so EVERY inode in such a block would need to be checked to see if
it is free.


We need to start with a "dynamic inode bitmap" (DIB) that is mapped from
an "inode table file" (possibly only for the dynamic inode table blocks).
Free inodes can be scanned using the normal ext4_find_next_zero_bit()
in each of the bitmaps.
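
For illustration, the scan amounts to something like the following.  This
is a simplified userspace stand-in, not the kernel helper; it assumes the
little-endian bit order within each byte that ext2/3/4 bitmaps use, and
the name next_zero_bit() is made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first clear bit at or after 'start',
 * or 'nbits' if every remaining bit is set (no free inode).
 */
static size_t next_zero_bit(const uint8_t *map, size_t nbits, size_t start)
{
	assert(start <= nbits);

	for (size_t i = start; i < nbits; i++)
		if (!(map[i / 8] & (1u << (i % 8))))
			return i;
	return nbits;
}
```

A return value equal to nbits means this bitmap is full and the search
moves on to the next one.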

Each such DIB block (DIBB) holds an array of bits indicating dynamic inode
use, as well as an array of block numbers which map each run of 2^IPBB
inode bits to a dynamic inode table block.  The DIBB should also have a
header which contains space for a magic, a checksum, and the count of free
and total inodes, like a group descriptor has, as well as a count of in-use
itable blocks.

The dynamic inode table blocks (DITB) should also hold a header with a
magic, a checksum, and a back-pointer to the DIBB.  The back-pointer
allows efficient clearing of the in-use bit, locating the DIBB if the
dynamic inode itself is corrupted, and possibly freeing the DITB when
the last in-use inode is freed.
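
A rough sketch of what the two headers could look like follows.  All field
names, field widths, and magic values here are placeholders of mine; only
the list of fields and the 64-byte DIBB header size (used in the
arithmetic further down) come from this mail.  On disk these would be
little-endian fields; plain integer types are used to keep the sketch
self-contained:

```c
#include <assert.h>
#include <stdint.h>

#define DIBB_MAGIC 0xD1BB0001u	/* placeholder value */
#define DITB_MAGIC 0xD17B0001u	/* placeholder value */

/* Header at the start of each dynamic inode bitmap block (DIBB). */
struct dibb_header {
	uint32_t magic;
	uint32_t checksum;
	uint32_t free_inodes;		/* free inodes covered by this DIBB */
	uint32_t total_inodes;		/* total inodes covered by this DIBB */
	uint32_t used_itable_blocks;	/* DITBs currently allocated */
	uint8_t  reserved[44];		/* pad to 64 bytes */
};

/* Header at the start of each dynamic inode table block (DITB). */
struct ditb_header {
	uint32_t magic;
	uint32_t checksum;
	uint64_t dibb_block;		/* back-pointer: block number of the DIBB */
};

_Static_assert(sizeof(struct dibb_header) == 64, "DIBB header is 64 bytes");

/* Hypothetical accounting step when one inode is handed out. */
static void dibb_account_alloc(struct dibb_header *h)
{
	assert(h->magic == DIBB_MAGIC);
	assert(h->free_inodes > 0 && h->free_inodes <= h->total_inodes);
	h->free_inodes--;
}
```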

For common 256-byte inodes and 4kB blocks we need 8 bytes/block for the
block addresses, and 1 bit/inode, so

4096 bytes/block / 256 bytes/inode = 16 inodes(bits)/block = 2 byte bitmap

(4096 bytes - 64-byte header) / (8 byte address + 2 byte bitmap) =
	403 itable blocks per DIBB = 403 * 16 = 6448 inodes/DIBB

65536 bytes/block / 256 bytes/inode = 256 inodes(bits)/block = 32 byte bitmap

(65536 bytes - 64-byte header) / (8 byte address + 32 byte bitmap) =
	1636 itable blocks per DIBB = 1636 * 256 = 418816 inodes/DIBB
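
The same arithmetic as a small sketch, in case someone wants to try other
block and inode sizes.  The function names are hypothetical; the 64-byte
header and 8-byte block address are the figures assumed above, and the
per-block bitmap is rounded up to whole bytes:

```c
#include <assert.h>
#include <stdint.h>

#define DIBB_HEADER_BYTES 64u	/* header size assumed above */
#define BLOCK_ADDR_BYTES   8u	/* one 64-bit block number per itable block */

/* Number of itable blocks one DIBB can describe. */
static uint32_t itable_blocks_per_dibb(uint32_t blocksize, uint32_t inodesize)
{
	assert(inodesize != 0 && blocksize >= inodesize);
	assert(blocksize > DIBB_HEADER_BYTES);

	uint32_t inodes_per_block = blocksize / inodesize;
	uint32_t bitmap_bytes = (inodes_per_block + 7) / 8;	/* 1 bit/inode */

	return (blocksize - DIBB_HEADER_BYTES) /
	       (BLOCK_ADDR_BYTES + bitmap_bytes);
}

/* Number of dynamic inodes one DIBB can describe. */
static uint32_t inodes_per_dibb(uint32_t blocksize, uint32_t inodesize)
{
	return itable_blocks_per_dibb(blocksize, inodesize) *
	       (blocksize / inodesize);
}
```

Calling these with (4096, 256) and (65536, 256) evaluates the two cases
worked through above.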


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.