Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752222AbXFNJPU (ORCPT ); Thu, 14 Jun 2007 05:15:20 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751022AbXFNJPG (ORCPT ); Thu, 14 Jun 2007 05:15:06 -0400 Received: from mail.clusterfs.com ([206.168.112.78]:38414 "EHLO mail.clusterfs.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750886AbXFNJPC (ORCPT ); Thu, 14 Jun 2007 05:15:02 -0400 Date: Thu, 14 Jun 2007 03:14:58 -0600 From: Andreas Dilger To: David Chinner Cc: "Amit K. Arora" , Suparna Bhattacharya , torvalds@osdl.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, xfs@oss.sgi.com, cmm@us.ibm.com Subject: Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc Message-ID: <20070614091458.GH5181@schatzie.adilger.int> Mail-Followup-To: David Chinner , "Amit K. Arora" , Suparna Bhattacharya , torvalds@osdl.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, xfs@oss.sgi.com, cmm@us.ibm.com References: <20070426175056.GA25321@amitarora.in.ibm.com> <20070426180332.GA7209@amitarora.in.ibm.com> <20070509160102.GA30745@amitarora.in.ibm.com> <20070510005926.GT85884050@sgi.com> <20070510115620.GB21400@amitarora.in.ibm.com> <20070510223950.GD86004887@sgi.com> <20070511110301.GB28425@in.ibm.com> <20070512080157.GF85884050@sgi.com> <20070612061652.GA6320@amitarora.in.ibm.com> <20070613235217.GS86004887@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070613235217.GS86004887@sgi.com> User-Agent: Mutt/1.4.1i X-GPG-Key: 1024D/0D35BED6 X-GPG-Fingerprint: 7A37 5D79 BF1B CECA D44F 8A29 A488 39F5 0D35 BED6 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2822 Lines: 76 On Jun 14, 2007 09:52 +1000, David Chinner wrote: > B FA_PREALLOCATE > provides the same functionality as > B FA_ALLOCATE > except it does not ever change the file size. This allows allocation > of zero blocks beyond the end of file and is useful for optimising > append workloads. > TP > B FA_DEALLOCATE > removes the underlying disk space with the given range. The disk space > shall be removed regardless of it's contents so both allocated space > from > B FA_ALLOCATE > and > B FA_PREALLOCATE > as well as from > B write(3) > will be removed. > B FA_DEALLOCATE > shall never remove disk blocks outside the range specified. So this is essentially the same as "punch". There doesn't seem to be a mechanism to only unallocate unused FA_{PRE,}ALLOCATE space at the end. > B FA_DEALLOCATE > shall never change the file size. If changing the file size > is required when deallocating blocks from an offset to end > of file (or beyond end of file) is required, > B ftuncate64(3) > should be used. This also seems to be a bit of a wart, since it isn't a natural converse of either of the above functions. How about having two modes, similar to FA_ALLOCATE and FA_PREALLOCATE? Say, FA_PUNCH (which would be as you describe here - deletes all data in the specified range changing the file size if it overlaps EOF, and FA_DEALLOCATE, which only deallocates unused FA_{PRE,}ALLOCATE space? We might also consider making @mode be a mask instead of an enumeration: FA_FL_DEALLOC 0x01 (default allocate) FA_FL_KEEP_SIZE 0x02 (default extend/shrink size) FA_FL_DEL_DATA 0x04 (default keep written data on DEALLOC) We might then build FA_ALLOCATE and FA_DEALLOCATE out of these flags without making the interface sub-optimal. I suppose it might be a bit late in the game to add a "goal" parameter and e.g. FA_FL_REQUIRE_GOAL, FA_FL_NEAR_GOAL, etc to make the API more suitable for XFS? The goal could be a single __u64, or a struct with e.g. __u64 byte offset (possibly also __u32 lun like in FIEMAP). I guess the one potential limitation here is the number of function parameters on some architectures. > B ENOSPC > There is not enough space left on the device containing the file > referred to by > IR fd. Should probably say whether space is removed on failure or not. In some (primitive) implementations it might no longer be possible to distinguish between unwritten extents and zero-filled blocks, though at this point DEALLOC of zero-filled blocks might not be harmful either. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/