2007-05-14 14:45:19

by Amit K. Arora

[permalink] [raw]
Subject: [PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc

This patch implements sys_fallocate() and adds support on i386, x86_64
and powerpc platforms.

Changelog:
---------
Following changes were made to the previous version:
1) Added description before sys_fallocate() definition.
2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
posix_fallocate should return EINVAL for len <= 0.
3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
4) Do not return ENODEV for dirs (let individual file systems decide if
they want to support preallocation to directories or not.
5) Check for wrap through zero.
6) Update c/mtime if fallocate() succeeds.
7) Added mode descriptions in fs.h
8) Added variable names to function definition (fallocate inode op)

Here is the new patch:

Signed-off-by: Amit Arora <[email protected]>
---
arch/i386/kernel/syscall_table.S | 1
arch/powerpc/kernel/sys_ppc32.c | 7 +++
arch/x86_64/kernel/functionlist | 1
fs/open.c | 89 +++++++++++++++++++++++++++++++++++++++
include/asm-i386/unistd.h | 3 -
include/asm-powerpc/systbl.h | 1
include/asm-powerpc/unistd.h | 3 -
include/asm-x86_64/unistd.h | 4 +
include/linux/fs.h | 13 +++++
include/linux/syscalls.h | 1
10 files changed, 120 insertions(+), 3 deletions(-)

Index: linux-2.6.21/arch/i386/kernel/syscall_table.S
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6.21/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,4 @@ ENTRY(sys_call_table)
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+ .long sys_fallocate /* 320 */
Index: linux-2.6.21/arch/x86_64/kernel/functionlist
===================================================================
--- linux-2.6.21.orig/arch/x86_64/kernel/functionlist
+++ linux-2.6.21/arch/x86_64/kernel/functionlist
@@ -931,6 +931,7 @@
*(.text.sys_getitimer)
*(.text.sys_getgroups)
*(.text.sys_ftruncate)
+*(.text.sys_fallocate)
*(.text.sysfs_lookup)
*(.text.sys_exit_group)
*(.text.stub_fork)
Index: linux-2.6.21/fs/open.c
===================================================================
--- linux-2.6.21.orig/fs/open.c
+++ linux-2.6.21/fs/open.c
@@ -351,6 +351,95 @@ asmlinkage long sys_ftruncate64(unsigned
#endif

/*
+ * sys_fallocate - preallocate blocks or free preallocated blocks
+ * @fd: the file descriptor
+ * @mode: mode specifies if fallocate should preallocate blocks OR free
+ * (unallocate) preallocated blocks. Currently only FA_ALLOCATE and
+ * FA_DEALLOCATE modes are supported.
+ * @offset: The offset within file, from where (un)allocation is being
+ * requested. It should not have a negative value.
+ * @len: The amount (in bytes) of space to be (un)allocated, from the offset.
+ *
+ * This system call, depending on the mode, preallocates or unallocates blocks
+ * for a file. The range of blocks depends on the value of offset and len
+ * arguments provided by the user/application. For FA_ALLOCATE mode, if this
+ * system call succeeds, subsequent writes to the file in the given range
+ * (specified by offset & len) should not fail - even if the file system
+ * later becomes full. Hence the preallocation done is persistent (valid
+ * even after reopen of the file and remount/reboot).
+ *
+ * Note: Incase the file system does not support preallocation,
+ * posix_fallocate() should fall back to the library implementation (i.e.
+ * allocating zero-filled new blocks to the file).
+ *
+ * Return Values
+ * 0 : On SUCCESS a value of zero is returned.
+ * error : On Failure, an error code will be returned.
+ * An error code of -ENOSYS or -EOPNOTSUPP should make posix_fallocate()
+ * fall back on library implementation of fallocate.
+ *
+ * <TBD> Generic fallocate to be added for file systems that do not
+ * support fallocate it.
+ */
+asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
+{
+ struct file *file;
+ struct inode *inode;
+ long ret = -EINVAL;
+
+ if (offset < 0 || len <= 0)
+ goto out;
+
+ /* Return error if mode is not supported */
+ ret = -EOPNOTSUPP;
+ if (mode != FA_ALLOCATE && mode !=FA_DEALLOCATE)
+ goto out;
+
+ ret = -EBADF;
+ file = fget(fd);
+ if (!file)
+ goto out;
+ if (!(file->f_mode & FMODE_WRITE))
+ goto out_fput;
+
+ inode = file->f_path.dentry->d_inode;
+
+ ret = -ESPIPE;
+ if (S_ISFIFO(inode->i_mode))
+ goto out_fput;
+
+ ret = -ENODEV;
+ /*
+ * Let individual file system decide if it supports preallocation
+ * for directories or not.
+ */
+ if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+ goto out_fput;
+
+ ret = -EFBIG;
+ /* Check for wrap through zero too */
+ if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
+ goto out_fput;
+
+ if (inode->i_op && inode->i_op->fallocate)
+ ret = inode->i_op->fallocate(inode, mode, offset, len);
+ else
+ ret = -ENOSYS;
+
+ /*
+ * Update [cm]time.
+ * Partial allocation will not result in the time stamp changes,
+ * since ->fallocate will return error (say, -ENOSPC) in this case.
+ */
+ if (!ret)
+ file_update_time(file);
+out_fput:
+ fput(file);
+out:
+ return ret;
+}
+
+/*
* access() needs to use the real uid/gid, not the effective uid/gid.
* We do this by temporarily clearing all FS-related capabilities and
* switching the fsuid/fsgid around to the real ones.
Index: linux-2.6.21/include/asm-i386/unistd.h
===================================================================
--- linux-2.6.21.orig/include/asm-i386/unistd.h
+++ linux-2.6.21/include/asm-i386/unistd.h
@@ -325,10 +325,11 @@
#define __NR_move_pages 317
#define __NR_getcpu 318
#define __NR_epoll_pwait 319
+#define __NR_fallocate 320

#ifdef __KERNEL__

-#define NR_syscalls 320
+#define NR_syscalls 321

#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.21/include/asm-powerpc/systbl.h
===================================================================
--- linux-2.6.21.orig/include/asm-powerpc/systbl.h
+++ linux-2.6.21/include/asm-powerpc/systbl.h
@@ -307,3 +307,4 @@ COMPAT_SYS_SPU(set_robust_list)
COMPAT_SYS_SPU(move_pages)
SYSCALL_SPU(getcpu)
COMPAT_SYS(epoll_pwait)
+COMPAT_SYS(fallocate)
Index: linux-2.6.21/include/asm-powerpc/unistd.h
===================================================================
--- linux-2.6.21.orig/include/asm-powerpc/unistd.h
+++ linux-2.6.21/include/asm-powerpc/unistd.h
@@ -326,10 +326,11 @@
#define __NR_move_pages 301
#define __NR_getcpu 302
#define __NR_epoll_pwait 303
+#define __NR_fallocate 304

#ifdef __KERNEL__

-#define __NR_syscalls 304
+#define __NR_syscalls 305

#define __NR__exit __NR_exit
#define NR_syscalls __NR_syscalls
Index: linux-2.6.21/include/asm-x86_64/unistd.h
===================================================================
--- linux-2.6.21.orig/include/asm-x86_64/unistd.h
+++ linux-2.6.21/include/asm-x86_64/unistd.h
@@ -619,8 +619,10 @@ __SYSCALL(__NR_sync_file_range, sys_sync
__SYSCALL(__NR_vmsplice, sys_vmsplice)
#define __NR_move_pages 279
__SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_fallocate 280
+__SYSCALL(__NR_fallocate, sys_fallocate)

-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_fallocate

#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.21/include/linux/fs.h
===================================================================
--- linux-2.6.21.orig/include/linux/fs.h
+++ linux-2.6.21/include/linux/fs.h
@@ -264,6 +264,17 @@ extern int dir_notify_enable;
#define SYNC_FILE_RANGE_WRITE 2
#define SYNC_FILE_RANGE_WAIT_AFTER 4

+/*
+ * sys_fallocate modes
+ * Currently sys_fallocate supports two modes:
+ * FA_ALLOCATE : This is the preallocate mode, using which an application/user
+ * may request (pre)allocation of blocks.
+ * FA_DEALLOCATE: This is the deallocate mode, which can be used to free
+ * the preallocated blocks.
+ */
+#define FA_ALLOCATE 0x1
+#define FA_DEALLOCATE 0x2
+
#ifdef __KERNEL__

#include <linux/linkage.h>
@@ -1125,6 +1136,8 @@ struct inode_operations {
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*removexattr) (struct dentry *, const char *);
void (*truncate_range)(struct inode *, loff_t, loff_t);
+ long (*fallocate)(struct inode *inode, int mode, loff_t offset,
+ loff_t len);
};

struct seq_file;
Index: linux-2.6.21/include/linux/syscalls.h
===================================================================
--- linux-2.6.21.orig/include/linux/syscalls.h
+++ linux-2.6.21/include/linux/syscalls.h
@@ -602,6 +602,7 @@ asmlinkage long sys_get_robust_list(int
asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
size_t len);
asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache);
+asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);

int kernel_execve(const char *filename, char *const argv[], char *const envp[]);

Index: linux-2.6.21/arch/powerpc/kernel/sys_ppc32.c
===================================================================
--- linux-2.6.21.orig/arch/powerpc/kernel/sys_ppc32.c
+++ linux-2.6.21/arch/powerpc/kernel/sys_ppc32.c
@@ -777,6 +777,13 @@ asmlinkage int compat_sys_truncate64(con
return sys_truncate(path, (high << 32) | low);
}

+asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo,
+ u32 lenhi, u32 lenlo)
+{
+ return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo,
+ ((loff_t)lenhi << 32) | lenlo);
+}
+
asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high,
unsigned long low)
{


2007-05-14 23:44:36

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc

On Mon, 14 May 2007 20:15:24 +0530 "Amit K. Arora" <[email protected]> wrote:
>
> This patch implements sys_fallocate() and adds support on i386, x86_64
> and powerpc platforms.

This patch no longer applies to Linus' tree - for a start there is no file
arch/x86_64/kernel/functionlist any more.

Can you rebase it, please?

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (435.00 B)
(No filename) (189.00 B)
Download all attachments

2007-05-15 13:23:53

by Amit K. Arora

[permalink] [raw]
Subject: Re: [PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc

On Tue, May 15, 2007 at 09:44:36AM +1000, Stephen Rothwell wrote:
> On Mon, 14 May 2007 20:15:24 +0530 "Amit K. Arora" <[email protected]> wrote:
> >
> > This patch implements sys_fallocate() and adds support on i386, x86_64
> > and powerpc platforms.
>
> This patch no longer applies to Linus' tree - for a start there is no file
> arch/x86_64/kernel/functionlist any more.
>
> Can you rebase it, please?

I will rebase it to 2.6.22-rc1 and repost the patches soon.
Thanks!

--
Regards,
Amit Arora

2007-05-18 21:36:32

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc

On Tue, May 15, 2007 at 06:53:53PM +0530, Amit K. Arora wrote:
> I will rebase it to 2.6.22-rc1 and repost the patches soon.
> Thanks!

I've rebased to 2.6.22-rc1 and put it in the ext4-patch-queue.

Mingming had rebased your previous (take3) set to 2.6.22-rc1, but
apparently the series file was corrupted, so it referenced an
incorrect patch filename, and the patch series didn't apply cleanly.
I've fixed it and confirmed that it builds and boots under UML. Will
do more testing, but please take a look and confirm that it looks good.

Amit, we should probably get you access to repo.or.cz so you can
update the patches yourself. My normal process is to transfer the
patches into git using the 'guilt' tool, and then start doing test
builds from there. After I fix up the patches and do whatever is
necessary so they build, I copy them back into the ext4-patch-queue
directly, and then do a git-diff to see what has changed, and make the
changes to the patches look sane. Can you send me and/or mingming
your ssh key, and we can give you push access to repo.or.cz?

We've missed the -rc1 merge window, so the goal should be to make sure
that everything in the series file before the "unstable patches" is
ready for merging.

Regards,

- Ted

2007-05-18 23:11:06

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc

On Fri, 2007-05-18 at 17:36 -0400, Theodore Tso wrote:
> On Tue, May 15, 2007 at 06:53:53PM +0530, Amit K. Arora wrote:
> > I will rebase it to 2.6.22-rc1 and repost the patches soon.
> > Thanks!
>
> I've rebased to 2.6.22-rc1 and put it in the ext4-patch-queue.
>
> Mingming had rebased your previous (take3) set to 2.6.22-rc1, but
> apparently the series file was corrupted, so it referenced an
> incorrect patch filename, and the patch series didn't apply cleanly.
> I've fixed it and confirmed that it builds and boots under UML. Will
> do more testing, but please take a look and confirm that it looks good.
>

Thanks Ted. I am not sure how the series corrupted but I am glad that
you catch that and updated with fallocate patches.:-)

We don't need the ext4-fallocate-1b-fallocate_inode_op_fix.patch as Amit
fixed the ext4_fallocate() return value type to match VFS fallocate() in
takes 4, patch 5/6. I will update the series to reflect this and run
test.

I think Kalpak's patch to remove 32000 subdirs patch can be add to the
ext4 patch queue as well. Agreed?

> Amit, we should probably get you access to repo.or.cz so you can
> update the patches yourself. My normal process is to transfer the
> patches into git using the 'guilt' tool, and then start doing test
> builds from there. After I fix up the patches and do whatever is
> necessary so they build, I copy them back into the ext4-patch-queue
> directly, and then do a git-diff to see what has changed, and make the
> changes to the patches look sane. Can you send me and/or mingming
> your ssh key, and we can give you push access to repo.or.cz?
>

I am not sure Amit can response this before he leave for vacation.(from
May 19 for 10 days).

I will checked the fallocate patches and run auto tests.

> We've missed the -rc1 merge window, so the goal should be to make sure
> that everything in the series file before the "unstable patches" is
> ready for merging.
>
I tend to agree. But there are some bug-fix type or mount option
patches that can try to target for rc2, what do you think?


# New patch to fix whitespace before applying new patches
whitespace.patch

#New patch to remove unnecessary exported symbols
ext4_remove_exported_symbles.patch

# New patch to add mount option to turn off extents
ext4_noextent_mount_opt.patch
# Now Turn on extents feature by default
ext4_extents_on_by_default.patch

#New patch to propagate inode flags
ext4-propagate_flags.patch

#New patch to add extent sanity checks
ext4-extent-sanity-checks.patch

#New patch to free blocks when failed to insert an extent
ext4-free-blocks-on-insert-extent-failure.patch

Cheers,
Mingming

2007-05-20 12:39:41

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc

On Fri, 2007-05-18 at 16:10 -0700, Mingming Cao wrote:
> On Fri, 2007-05-18 at 17:36 -0400, Theodore Tso wrote:

> > We've missed the -rc1 merge window, so the goal should be to make sure
> > that everything in the series file before the "unstable patches" is
> > ready for merging.
> >
> I tend to agree. But there are some bug-fix type or mount option
> patches that can try to target for rc2, what do you think?

I agree with Mingming. There's no reason for these patches not to be in
mainline. I am curious why the fallocate patches were put at the top of
the series file in the first place. The older patches shouldn't be held
up by fallocate (which should wait until the next merge window).

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2007-05-21 05:38:57

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/5][TAKE2] fallocate() implementation on i86, x86_64 and powerpc

On Sun, May 20, 2007 at 07:39:32AM -0500, Dave Kleikamp wrote:
> On Fri, 2007-05-18 at 16:10 -0700, Mingming Cao wrote:
> > On Fri, 2007-05-18 at 17:36 -0400, Theodore Tso wrote:
>
> > > We've missed the -rc1 merge window, so the goal should be to make sure
> > > that everything in the series file before the "unstable patches" is
> > > ready for merging.
> > >
> > I tend to agree. But there are some bug-fix type or mount option
> > patches that can try to target for rc2, what do you think?
>
> I agree with Mingming. There's no reason for these patches not to be in
> mainline. I am curious why the fallocate patches were put at the top of
> the series file in the first place. The older patches shouldn't be held
> up by fallocate (which should wait until the next merge window).

I've rebased the ext4 patch queue for 2.6.22-rc2, and moved the
obvious bug fixes to the top of the queue. There's one patch which I
missed (ext4-free-blocks-on-insert-extent-failure.patch) which is also
a bug fixed, that should be moved up.

It's true that some of the older patches are below fallocate in the
queue, but they are still new features that probably shouldn't be
pushed at this point. But yes, I agree that the bug fixes should be
pushed to Linus before 2.6.22 ships.

Regards,

- Ted