From: Andreas Dilger
Subject: Re: [PATCH 15/23] jbd2: Change disk layout for metadata checksumming
Date: Mon, 30 Apr 2012 10:51:43 -0600
Message-ID: <29C40967-4A9E-4D32-B356-A5D15E23EB38@dilger.ca>
References: <20120306204750.1663.96751.stgit@elm3b70.beaverton.ibm.com>
 <20120306204941.1663.56283.stgit@elm3b70.beaverton.ibm.com>
 <20120428141933.GB29481@thunk.org>
 <20120430155341.GC6938@tux1.beaverton.ibm.com>
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: "Ted Ts'o", Sunil Mushran, Martin K Petersen, Greg Freemyer,
 Amir Goldstein, linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
 linux-fsdevel, linux-ext4@vger.kernel.org, Coly Li
To: djwong@us.ibm.com
Return-path:
In-Reply-To: <20120430155341.GC6938@tux1.beaverton.ibm.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On 2012-04-30, at 9:53 AM, djwong wrote:
> As for putting half the checksum into the upper 16 bits of the flags
> field -- is journal space at such a premium that we need to overload
> the field and reduce the strength of the checksum?  Enabling journal
> checksums on a 4k block filesystem causes tags_per_block to decrease
> from 512 to 341 on a 32bit journal and from 341 to 256 on a 64bit
> journal.  Do transactions typically have that many blocks?  I didn't
> think most transactions had 1-2MB of dirty data.

I think on a busy filesystem there can be many thousands of blocks in
a single transaction.  We run Lustre with 400MB journals, and under
metadata-intensive workloads we can hit the 100MB transaction size
limit easily.  However, this doesn't mean there are 25k blocks in each
transaction, since most of these blocks are reserved for the worst
case, but not used.

As for the impact of reducing the number of tags in each block, for a
4096-block transaction this would currently mean 8 32-bit tag blocks,
and it would grow to 12 or 13, which isn't significant in the end.

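For anyone who wants to check the arithmetic, here is a rough Python
sketch.  The tag sizes (8, 12 and 16 bytes) are my assumption, inferred
from the tags_per_block figures quoted above rather than read from the
jbd2 headers, and the helper names are made up for illustration.  It
also ignores the descriptor block header, as the quoted numbers do.

```python
BLOCK_SIZE = 4096  # 4k block filesystem, as in the quoted message

# Assumed tag sizes in bytes, inferred from 4096/8 = 512,
# 4096/12 = 341 and 4096/16 = 256; not taken from the jbd2 source.
TAG_BYTES = {
    "32bit": 8,
    "32bit+csum": 12,
    "64bit": 12,
    "64bit+csum": 16,
}


def tags_per_block(tag_bytes, block_size=BLOCK_SIZE):
    """Whole tags that fit in one descriptor block (header ignored)."""
    return block_size // tag_bytes


def descriptor_blocks(nblocks, tag_bytes, block_size=BLOCK_SIZE):
    """Descriptor blocks needed to tag nblocks data blocks (rounded up)."""
    per_block = tags_per_block(tag_bytes, block_size)
    return -(-nblocks // per_block)  # ceiling division


def max_transaction_blocks(limit_bytes, block_size=BLOCK_SIZE):
    """Blocks that fit under a transaction size limit given in bytes."""
    return limit_bytes // block_size


if __name__ == "__main__":
    for name, size in TAG_BYTES.items():
        print(name, tags_per_block(size), descriptor_blocks(4096, size))
    print(max_transaction_blocks(100 * 1024 * 1024))
```

With these assumptions a 4096-block transaction needs 8 descriptor
blocks today and 13 with checksummed 32-bit tags (4096/341 is just over
12, so it rounds up), and a 100MB limit works out to 25600 blocks.
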
My suggestion was mostly to avoid problems with the disk format change.
If this can be handled in another manner, AND it doesn't break journal
recovery on older kernels/e2fsprogs, then I'm OK with the cleaner
approach.  Please ensure that this is tested.

Cheers, Andreas