From: Mingming Cao
Subject: Re: [PATCH] Ext4 Documentation updates.
Date: Wed, 02 Jul 2008 14:45:55 -0700
Message-ID: <1215035155.6788.43.camel@mingming-laptop>
In-Reply-To: <20080702172200.14990.37737.stgit@rx8>
References: <20080702172200.14990.37737.stgit@rx8>
To: "Jose R. Santos"
Cc: linux-ext4@vger.kernel.org
Sender: linux-ext4-owner@vger.kernel.org

On Wed, 2008-07-02 at 12:22 -0500, Jose R. Santos wrote:
> From: Jose R. Santos
>
> Ext4 Documentation updates.
>
> Some of the information in Documentation/filesystems/ext4.txt is out
> of date and in need of an update.
>
> Signed-off-by: Jose R. Santos
> --

Thanks, I added it to the ext4 patch queue before the new ordered mode
patches.

Here is another documentation update patch that adds documentation for the
new ordered mode and delayed allocation; it should go after the delayed
allocation patches.
Also added a few updates.

---
 Documentation/filesystems/ext4.txt |   37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

Index: linux-2.6.26-rc8/Documentation/filesystems/ext4.txt
===================================================================
--- linux-2.6.26-rc8.orig/Documentation/filesystems/ext4.txt	2008-07-02 13:49:28.000000000 -0700
+++ linux-2.6.26-rc8/Documentation/filesystems/ext4.txt	2008-07-02 14:06:34.000000000 -0700
@@ -57,7 +57,7 @@ Mailing list: linux-ext4@vger.kernel.org
 * extent format reduces metadata overhead (RAM, IO for access, transactions)
 * extent format more robust in face of on-disk corruption due to magics,
 * internal redunancy in tree
-* improved file allocation (multi-block alloc, delayed alloc)
+* improved file allocation (multi-block alloc)
 * fix 32000 subdirectory limit
 * nsec timestamps for mtime, atime, ctime, create time
 * inode version field on disk (NFSv4, Lustre)
@@ -67,8 +67,15 @@ Mailing list: linux-ext4@vger.kernel.org
 * ability to pack bitmaps and inode tables into larger virtual groups via the
   flex_bg feature
 * large file support
+* delayed allocation
+* large block (up to pagesize) support
+* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
+  the ordering)
+* Inode allocation using large virtual block groups via flex_bg (patch
+  available; fragmentation issues due to prolonged fs use still unknown)
+

-2.2 Previously available, soon to be enabled by default by "mkefs.ext4":
+2.2 Previously available, now enabled by default by "mkfs.ext4":

 * dir_index and resize inode will be on by default
 * large inodes will be used by default for fast EAs, nsec timestamps, etc
@@ -76,8 +83,6 @@ Mailing list: linux-ext4@vger.kernel.org
 2.3 Candidate features for future inclusion

 * Online defrag (patches available but not well tested)
-* Inode allocation using large virtual block groups via flex_bg (patch
-  available; fragmentation issues due to prolong fs use still unknown)
 * reduced mke2fs time via uninit_bg feature (capability to do this is
   available in e2fsprogs but a kernel thread to do lazy zeroing of unused
   inode table blocks after filesystem is first mounted is required for
@@ -236,7 +241,9 @@ stripe=n Number of filesystem blocks th
 			to use for allocation size and alignment. For RAID5/6
 			systems this should be the number of data
 			disks * RAID chunk size in file system blocks.
-
+delalloc	(*)	Defer block allocation until write-out time.
+nodelalloc		Disable delayed allocation. Blocks are allocated
+			when data is copied from user to page cache.
 Data Mode
 =========
 There are 3 different data modes:
@@ -250,10 +257,19 @@ typically provide the best ext4 performa

 * ordered mode
 In data=ordered mode, ext4 only officially journals metadata, but it logically
-groups metadata and data blocks into a single unit called a transaction. When
-it's time to write the new metadata out to disk, the associated data blocks
-are written first. In general, this mode performs slightly slower than
-writeback but significantly faster than journal mode.
+groups metadata information related to data changes with the data blocks into a
+single unit called a transaction. When it's time to write the new metadata
+out to disk, the associated data blocks are written first. In general,
+this mode performs slightly slower than writeback but significantly faster than journal mode.
+
+In ext4/JBD2 the ordered mode implementation differs from the ext3/JBD
+ordered mode. First, it gets rid of using buffer heads to enforce the ordering
+between a metadata change and the related data change. Instead, the new
+ordered mode keeps track of a per-transaction journalled inode list, and
+flushes all the dirty pages for those inodes when committing that transaction.
+Second, the new ordered mode reverses the lock ordering of the page lock and
+transaction lock, to fix the locking issue in the new mode, and also provides
+easy support for delayed allocation over the new ordered mode.

 * journal mode
 data=journal mode provides full data and metadata journaling. All new data is
@@ -261,7 +277,8 @@ written to the journal first, and then t
 In the event of a crash, the journal can be replayed, bringing both data and
 metadata into a consistent state. This mode is the slowest except when data
 needs to be read from and written to disk at the same time where it
-outperforms all others modes.
+outperforms all other modes. Right now this mode does not have
+delayed allocation support.

 References
 ==========
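
Aside on the new ordered mode text in the patch: the commit-time ordering it
describes (track a per-transaction inode list, flush those inodes' dirty pages,
then write the metadata) can be illustrated with a toy model. This is only a
sketch in Python, not the JBD2 code; `Inode`, `Transaction`, `commit` and
`disk_log` are hypothetical names made up for the illustration, and locking is
not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Inode:
    """Hypothetical stand-in for an inode with dirty page-cache pages."""
    number: int
    dirty_pages: list = field(default_factory=list)


@dataclass
class Transaction:
    """Toy transaction: metadata updates plus the inodes whose data they cover."""
    metadata: list = field(default_factory=list)
    inodes: list = field(default_factory=list)

    def add_update(self, inode, description):
        # Record the metadata change and remember which inode it belongs to.
        self.metadata.append(description)
        if inode not in self.inodes:
            self.inodes.append(inode)


def commit(txn, disk_log):
    """Flush data for every tracked inode first, then write the metadata."""
    for inode in txn.inodes:
        for page in inode.dirty_pages:
            disk_log.append(("data", inode.number, page))
        inode.dirty_pages.clear()
    for description in txn.metadata:
        disk_log.append(("metadata", description))
    return disk_log


# Usage: one inode with two dirty pages and one metadata update.
log = []
ino = Inode(12, ["page0", "page1"])
txn = Transaction()
txn.add_update(ino, "update size of inode 12")
commit(txn, log)
kinds = [entry[0] for entry in log]
```

In this model every "data" entry lands in `disk_log` before any "metadata"
entry, which is the ordering guarantee data=ordered is meant to give.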