From: "Amit K. Arora" Subject: [PATCH 2/5][TAKE8] fallocate() implementation in i386, x86_64 and powerpc Date: Sat, 14 Jul 2007 00:19:09 +0530 Message-ID: <20070713184909.GC12156@amitarora.in.ibm.com> References: <20070713184125.GA12156@amitarora.in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: xfs@oss.sgi.com, tytso@mit.edu, cmm@us.ibm.com, suparna@in.ibm.com, adilger@clusterfs.com, dgc@sgi.com To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org Return-path: Received: from e6.ny.us.ibm.com ([32.97.182.146]:56302 "EHLO e6.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759612AbXGMStA (ORCPT ); Fri, 13 Jul 2007 14:49:00 -0400 Content-Disposition: inline In-Reply-To: <20070713184125.GA12156@amitarora.in.ibm.com> Sender: linux-ext4-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org From: Amit Arora sys_fallocate() implementation on i386, x86_64 and powerpc fallocate() is a new system call being proposed here which will allow applications to preallocate space to any file(s) in a file system. Each file system implementation that wants to use this feature will need to support an inode operation called ->fallocate(). Applications can use this feature to avoid fragmentation to certain level and thus get faster access speed. With preallocation, applications also get a guarantee of space for particular file(s) - even if later the the system becomes full. Currently, glibc provides an interface called posix_fallocate() which can be used for similar cause. Though this has the advantage of working on all file systems, but it is quite slow (since it writes zeroes to each block that has to be preallocated). Without a doubt, file systems can do this more efficiently within the kernel, by implementing the proposed fallocate() system call. It is expected that posix_fallocate() will be modified to call this new system call first and incase the kernel/filesystem does not implement it, it should fall back to the current implementation of writing zeroes to the new blocks. ToDos: 1. Implementation on other architectures (other than i386, x86_64, and ppc). Patches for s390(x) and ia64 are already available from previous posts, but it was decided that they should be added later once fallocate is in the mainline. Hence not including those patches in this take. 2. Changes to glibc, a) to support fallocate() system call b) to make posix_fallocate() and posix_fallocate64() call fallocate() CHANGELOG: --------- Following changed from TAKE7: 1. Added linux/falloc.h and moved FALLOC_FL_KEEP_SIZE flag to it. 2. Removed the two modes (FALLOC_ALLOCATE and FALLOC_RESV_SPACE). 3. Merged "revalidate write permissions" patch from David P. Quigley to this patch. 4. Deleted comment above sys_fallocate definition, as suggested by Christoph. Signed-off-by: Amit Arora Index: linux-2.6.22/arch/i386/kernel/syscall_table.S =================================================================== --- linux-2.6.22.orig/arch/i386/kernel/syscall_table.S +++ linux-2.6.22/arch/i386/kernel/syscall_table.S @@ -323,3 +323,4 @@ ENTRY(sys_call_table) .long sys_signalfd .long sys_timerfd .long sys_eventfd + .long sys_fallocate Index: linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c =================================================================== --- linux-2.6.22.orig/arch/powerpc/kernel/sys_ppc32.c +++ linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c @@ -773,6 +773,13 @@ asmlinkage int compat_sys_truncate64(con return sys_truncate(path, (high << 32) | low); } +asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo, + u32 lenhi, u32 lenlo) +{ + return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo, + ((loff_t)lenhi << 32) | lenlo); +} + asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high, unsigned long low) { Index: linux-2.6.22/arch/x86_64/ia32/ia32entry.S =================================================================== --- linux-2.6.22.orig/arch/x86_64/ia32/ia32entry.S +++ linux-2.6.22/arch/x86_64/ia32/ia32entry.S @@ -719,4 +719,5 @@ ia32_sys_call_table: .quad compat_sys_signalfd .quad compat_sys_timerfd .quad sys_eventfd + .quad sys32_fallocate ia32_syscall_end: Index: linux-2.6.22/fs/open.c =================================================================== --- linux-2.6.22.orig/fs/open.c +++ linux-2.6.22/fs/open.c @@ -26,6 +26,7 @@ #include #include #include +#include int vfs_statfs(struct dentry *dentry, struct kstatfs *buf) { @@ -352,6 +353,64 @@ asmlinkage long sys_ftruncate64(unsigned } #endif +asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len) +{ + struct file *file; + struct inode *inode; + long ret = -EINVAL; + + if (offset < 0 || len <= 0) + goto out; + + /* Return error if mode is not supported */ + ret = -EOPNOTSUPP; + if (mode && !(mode & FALLOC_FL_KEEP_SIZE)) + goto out; + + ret = -EBADF; + file = fget(fd); + if (!file) + goto out; + if (!(file->f_mode & FMODE_WRITE)) + goto out_fput; + /* + * Revalidate the write permissions, in case security policy has + * changed since the files were opened. + */ + ret = security_file_permission(file, MAY_WRITE); + if (ret) + goto out_fput; + + inode = file->f_path.dentry->d_inode; + + ret = -ESPIPE; + if (S_ISFIFO(inode->i_mode)) + goto out_fput; + + ret = -ENODEV; + /* + * Let individual file system decide if it supports preallocation + * for directories or not. + */ + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) + goto out_fput; + + ret = -EFBIG; + /* Check for wrap through zero too */ + if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0)) + goto out_fput; + + if (inode->i_op && inode->i_op->fallocate) + ret = inode->i_op->fallocate(inode, mode, offset, len); + else + ret = -ENOSYS; + +out_fput: + fput(file); +out: + return ret; +} + /* * access() needs to use the real uid/gid, not the effective uid/gid. * We do this by temporarily clearing all FS-related capabilities and Index: linux-2.6.22/include/asm-i386/unistd.h =================================================================== --- linux-2.6.22.orig/include/asm-i386/unistd.h +++ linux-2.6.22/include/asm-i386/unistd.h @@ -329,10 +329,11 @@ #define __NR_signalfd 321 #define __NR_timerfd 322 #define __NR_eventfd 323 +#define __NR_fallocate 324 #ifdef __KERNEL__ -#define NR_syscalls 324 +#define NR_syscalls 325 #define __ARCH_WANT_IPC_PARSE_VERSION #define __ARCH_WANT_OLD_READDIR Index: linux-2.6.22/include/asm-powerpc/systbl.h =================================================================== --- linux-2.6.22.orig/include/asm-powerpc/systbl.h +++ linux-2.6.22/include/asm-powerpc/systbl.h @@ -308,6 +308,7 @@ COMPAT_SYS_SPU(move_pages) SYSCALL_SPU(getcpu) COMPAT_SYS(epoll_pwait) COMPAT_SYS_SPU(utimensat) +COMPAT_SYS(fallocate) COMPAT_SYS_SPU(signalfd) COMPAT_SYS_SPU(timerfd) SYSCALL_SPU(eventfd) Index: linux-2.6.22/include/asm-powerpc/unistd.h =================================================================== --- linux-2.6.22.orig/include/asm-powerpc/unistd.h +++ linux-2.6.22/include/asm-powerpc/unistd.h @@ -331,10 +331,11 @@ #define __NR_timerfd 306 #define __NR_eventfd 307 #define __NR_sync_file_range2 308 +#define __NR_fallocate 309 #ifdef __KERNEL__ -#define __NR_syscalls 309 +#define __NR_syscalls 310 #define __NR__exit __NR_exit #define NR_syscalls __NR_syscalls Index: linux-2.6.22/include/asm-x86_64/unistd.h =================================================================== --- linux-2.6.22.orig/include/asm-x86_64/unistd.h +++ linux-2.6.22/include/asm-x86_64/unistd.h @@ -630,6 +630,8 @@ __SYSCALL(__NR_signalfd, sys_signalfd) __SYSCALL(__NR_timerfd, sys_timerfd) #define __NR_eventfd 284 __SYSCALL(__NR_eventfd, sys_eventfd) +#define __NR_fallocate 285 +__SYSCALL(__NR_fallocate, sys_fallocate) #ifndef __NO_STUBS #define __ARCH_WANT_OLD_READDIR Index: linux-2.6.22/include/linux/fs.h =================================================================== --- linux-2.6.22.orig/include/linux/fs.h +++ linux-2.6.22/include/linux/fs.h @@ -1138,6 +1138,8 @@ struct inode_operations { ssize_t (*listxattr) (struct dentry *, char *, size_t); int (*removexattr) (struct dentry *, const char *); void (*truncate_range)(struct inode *, loff_t, loff_t); + long (*fallocate)(struct inode *inode, int mode, loff_t offset, + loff_t len); }; struct seq_file; Index: linux-2.6.22/include/linux/syscalls.h =================================================================== --- linux-2.6.22.orig/include/linux/syscalls.h +++ linux-2.6.22/include/linux/syscalls.h @@ -610,6 +610,7 @@ asmlinkage long sys_signalfd(int ufd, si asmlinkage long sys_timerfd(int ufd, int clockid, int flags, const struct itimerspec __user *utmr); asmlinkage long sys_eventfd(unsigned int count); +asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len); int kernel_execve(const char *filename, char *const argv[], char *const envp[]); Index: linux-2.6.22/arch/x86_64/ia32/sys_ia32.c =================================================================== --- linux-2.6.22.orig/arch/x86_64/ia32/sys_ia32.c +++ linux-2.6.22/arch/x86_64/ia32/sys_ia32.c @@ -879,3 +879,11 @@ asmlinkage long sys32_fadvise64(int fd, return sys_fadvise64_64(fd, ((u64)offset_hi << 32) | offset_lo, len, advice); } + +asmlinkage long sys32_fallocate(int fd, int mode, unsigned offset_lo, + unsigned offset_hi, unsigned len_lo, + unsigned len_hi) +{ + return sys_fallocate(fd, mode, ((u64)offset_hi << 32) | offset_lo, + ((u64)len_hi << 32) | len_lo); +} Index: linux-2.6.22/include/linux/falloc.h =================================================================== --- /dev/null +++ linux-2.6.22/include/linux/falloc.h @@ -0,0 +1,6 @@ +#ifndef _FALLOC_H_ +#define _FALLOC_H_ + +#define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ + +#endif /* _FALLOC_H_ */