From: Akira Fujita Subject: Re: Review of ext4-online-defrag-check-for-freespace-fragmentation.patch Date: Wed, 17 Sep 2008 15:43:41 +0900 Message-ID: <48D0A71D.7000004@rs.jp.nec.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Cc: Takashi Sato , linux-ext4@vger.kernel.org To: "Theodore Ts'o" Return-path: Received: from TYO202.gate.nec.co.jp ([202.32.8.206]:46456 "EHLO tyo202.gate.nec.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751525AbYIQGoH (ORCPT ); Wed, 17 Sep 2008 02:44:07 -0400 In-Reply-To: Sender: linux-ext4-owner@vger.kernel.org List-ID: Hi Ted, Thank you for your review comments. Theodore Ts'o wrote: > Hi, > > First, some general comments about your patch series --- I would prefer > if we didn't have a double layer of ioctl's dispatch. That is, don't > have have ext4_ioctl() do one level of case statements, only to then > call ext4_defrag_ioctl(), which dispatches on the ioctl's a second time. > It's much better to do all of the dispatching out of the top-level > ext4_ioctl() function. I will fix defrag to call each of functions directly from ext4_ioctl(). > Secondly, some of the ioctl numbers chosen by the defrag patches overlap > other, already existing patches. This is something we will need to fix, > long term. For now, folks should know that we can't count on the ioctl > numbers being stable, since we will probably need to move them. I will use renumbered defrag patches in the patch queue. > Finally, the patch comments don't always describe all of the changes in > the patch. For example, in the patch > "ext4-online-defrag-check-for-freespace-fragmentation.patch", the patch > contains *far* more than just code to return the free space > fragmentation. It defines three new ioctl: > > EXT4_IOC_GROUP_INFO > EXT4_IOC_FREE_BLOCKS_INFO > EXT4_IOC_EXTENTS_INFO OK, I will split defrag patches into each ioctls and add more descriptions about them. > These ioctl's are someone odd. First of all, EXT4_IOC_GROUP_INFO only > returns two bits of information: > > struct ext4_group_data_info { > int s_blocks_per_group; /* blocks per group */ > int s_inodes_per_group; /* inodes per group */ > }; > > Why just those two bits? Userspace can easily get this information by > opening the block device directly, but if you don't want to do that, why > not just return the full 1024 byte ext4 superblock? That would be far > more generally useful, instead of just returning these two fields. Either will be fine for me. But if defrag opens the block device directly to get superblock related information, it will make sense since we can reduce the number of ioctl. Or as you mentioned later, if defrag implements new ioctl which gets superblock information, it will be useful. So I am puzzled a little about which is better. > Also, the code implementing EXT4_IOC_GROUP_INFO needlessly does a > copy_from_user() before filling in the data structure (and overwriting > all of the user data copied using copy_from_user); that's not necessary, > and should be elimiated. You are right. I will remove it. > > The second ioctl, EXT4_IOC_FREE_BLOCKS_INFO, is very strange in that it > returns the free space available in a particular block group, but > instead of passing in a block group number, instead an inode number is > passed in. This seems very strange, and unnecessary. EXT4_IOC_FREE_BLOCKS_INFO is only used in the force defrag mode (-f) and it is necessary to get free block information. It reads block bitmap of target block group then sets free block distribution to ext4_extents_info structure as extents array and returns it to userspace. After that in userspace, defragger calculates the combination of extents which should be moved to other block group as victim extents with the results of EXT4_IOC_FREE_BLOCKS_INFO and EXT4_IOC_EXTENTS_INFO. > (Looking at the defrag command, it appears that the defragger only tries > to defragment an inode if can find free space in the block group > associated with the inode? That seems highly limited. Note > particularly that with ext4's flex_bg feature, we now freely try to > allocate blocks within a flex_bg which is composed of multiple block > groups. Also, some of the files that most badly need defragmentation > will be the larger ones that can't possibly fit in a single block > group.) In the force defrag mode (-f), the maximum file defrag size is limited to 128MB (It is the maximum size we can store blocks into a single block group when blocksize is 4KB). Because if defragger scans all of block groups and calculates the combination of free-blocks and used-blocks for victim extents, it will cost a lot. On the other hand, in usual defrag (no option) and the relevant defrag (-r), defragger scans all of block groups in ext4 for block allocation with mballoc (It means no need to calculate the combination of used/unused blocks). So there is no file size limitation in this case. > > The last ioctl, EXT4_IOC_EXTENTS_INFO, duplicates the generic FIEMAP > ioctl, and most of the implementation duplicates what is found in the > ext4-fiemap patch. If this patch had been separated into three separate > patches, one for each ioctl, it would be easier to eliminate the > EXT4_IOC_EXTENTS_INFO changes as duplicated by other code. There was no ioctl to get extents information and thus defrag implemented EXT4_IOC_EXTENTS_INFO originally at that time. But FIEMAP is already in the patch queue now, so defrag should use it instead of EXT4_IOC_EXTENTS_INFO. I am working on this now. > > More generally, I'm beginning to think the best way to make progress > with the defragmentation patches is to break out the patches into > separate ioctl's, and we should merge the read-only ioctls that return > information about the filesystem first. These ioctl's should be made as I agree with you. > generally useful as possible, and if they overlap ioctl's that already > exist or are planned to be merged, such as the FIBMAP or FIEMAP > interfaces, we should figure out why we can't use the pre-existing > ioctl's first. > > For example, if we change EXT4_IOC_GROUP_INFO into something which > returns the entire ext4 superblock, then in the future if the defrag > command gains the ability to take the flex_bg feature or the RAID layout > into account, we won't need to create new ioctl's to return those > filesystem parameters. Best regards, Akira Fujita