LinuxLists.cc - [PATCH 1/3] copyfile: generic

2004-09-04 16:57:57

Subject: [PATCH 1/3] copyfile: generic_sendpage

Andrew,

looks like there are four independent groups interested in these
ground-laying patches for cowlinks:
o me (of course)
o vserver - they want cowlinks as well
o cifs - to optimize sys_copyfile() and reduce network transfers
o unionmount - you might remember Jan Blunck's fix to ext3

So I am, ready to accept my deserved beating and do the necessary
cleanups.

J?rn

--
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
Raph Levien, 1979

o Add sendfile() support for file targets to normal mm/filemap.c.
o Have ext[23] use that support.

fs/ext2/file.c | 1
fs/ext3/file.c | 1
include/linux/fs.h | 1
mm/filemap.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 219 insertions(+)

--- linux-2.6.5cow/include/linux/fs.h~generic_sendpage 2004-05-06 10:04:00.000000000 +0200
+++ linux-2.6.5cow/include/linux/fs.h 2004-05-14 18:30:50.000000000 +0200
@@ -1332,6 +1332,7 @@
ssize_t generic_file_write_nolock(struct file *file, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos);
extern ssize_t generic_file_sendfile(struct file *, loff_t *, size_t, read_actor_t, void *);
+extern ssize_t generic_file_sendpage(struct file *, struct page *, int, size_t, loff_t *, int);
extern void do_generic_mapping_read(struct address_space *mapping,
struct file_ra_state *, struct file *,
loff_t *, read_descriptor_t *, read_actor_t);
--- linux-2.6.5cow/mm/filemap.c~generic_sendpage 2004-05-06 10:03:59.000000000 +0200
+++ linux-2.6.5cow/mm/filemap.c 2004-05-06 10:04:33.000000000 +0200
@@ -896,6 +896,31 @@
return written;
}

+/* FIXME: It would be as simple as this, if we had a (void __user*) to write.
+ * We already have a kernel buffer, so it should be even simpler, right? ;)
+ *
+ * Yes, sorta. After duplicating the complete path of generic_file_write(),
+ * at least some special cases could be removed, so the copy is simpler than
+ * the original. But it remains a copy, so overall complexity increases.
+ */
+static ssize_t
+generic_kernel_file_write(struct file *, const char *, size_t, loff_t *);
+
+ssize_t generic_file_sendpage(struct file *file, struct page *page,
+ int offset, size_t size, loff_t *ppos, int more)
+{
+ ssize_t ret;
+ char *kaddr;
+
+ kaddr = kmap(page);
+ ret = generic_kernel_file_write(file, kaddr + offset, size, ppos);
+ kunmap(page);
+
+ return ret;
+}
+
+EXPORT_SYMBOL(generic_file_sendpage);
+
ssize_t generic_file_sendfile(struct file *in_file, loff_t *ppos,
size_t count, read_actor_t actor, void *target)
{
@@ -1554,6 +1579,19 @@
return bytes - left;
}

+static inline size_t
+filemap_copy_from_kernel(struct page *page, unsigned long offset,
+ const char *buf, unsigned bytes)
+{
+ char *kaddr;
+
+ kaddr = kmap(page);
+ memcpy(kaddr + offset, buf, bytes);
+ kunmap(page);
+
+ return bytes;
+}
+
static size_t
__filemap_copy_from_user_iovec(char *vaddr,
const struct iovec *iov, size_t base, size_t bytes)
@@ -1910,6 +1948,155 @@

EXPORT_SYMBOL(generic_file_aio_write_nolock);

+/*
+ * TODO:
+ * This largely tries to copy generic_file_aio_write_nolock(), although it
+ * doesn't have to be nearly as generic. A real cleanup should either
+ * merge this into generic_file_aio_write_nolock() as well or keep it special
+ * and remove as much code as possible.
+ */
+static ssize_t
+generic_kernel_file_aio_write_nolock(struct kiocb *iocb, const struct iovec*iov,
+ unsigned long nr_segs, loff_t *ppos)
+{
+ struct file *file = iocb->ki_filp;
+ struct address_space * mapping = file->f_mapping;
+ struct address_space_operations *a_ops = mapping->a_ops;
+ size_t ocount; /* original count */
+ size_t count; /* after file limit checks */
+ struct inode *inode = mapping->host;
+ long status = 0;
+ loff_t pos;
+ struct page *page;
+ struct page *cached_page = NULL;
+ const int isblk = S_ISBLK(inode->i_mode);
+ ssize_t written;
+ ssize_t err;
+ size_t bytes;
+ struct pagevec lru_pvec;
+ const struct iovec *cur_iov = iov; /* current iovec */
+ size_t iov_base = 0; /* offset in the current iovec */
+ unsigned long seg;
+ char *buf;
+
+ ocount = 0;
+ for (seg = 0; seg < nr_segs; seg++) {
+ const struct iovec *iv = &iov[seg];
+
+ /*
+ * If any segment has a negative length, or the cumulative
+ * length ever wraps negative then return -EINVAL.
+ */
+ ocount += iv->iov_len;
+ if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
+ return -EINVAL;
+ }
+
+ count = ocount;
+ pos = *ppos;
+ pagevec_init(&lru_pvec, 0);
+
+ /* We can write back this queue in page reclaim */
+ current->backing_dev_info = mapping->backing_dev_info;
+ written = 0;
+
+ err = generic_write_checks(file, &pos, &count, isblk);
+ if (err)
+ goto out;
+
+
+ if (count == 0)
+ goto out;
+
+ remove_suid(file->f_dentry);
+ inode_update_time(inode, 1);
+
+ /* There is no sane reason to use O_DIRECT */
+ BUG_ON(file->f_flags & O_DIRECT);
+
+ buf = (char *)iov->iov_base;
+ do {
+ unsigned long index;
+ unsigned long offset;
+ size_t copied;
+
+ offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
+ index = pos >> PAGE_CACHE_SHIFT;
+ bytes = PAGE_CACHE_SIZE - offset;
+ if (bytes > count)
+ bytes = count;
+
+ page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
+ if (!page) {
+ status = -ENOMEM;
+ break;
+ }
+
+ status = a_ops->prepare_write(file, page, offset, offset+bytes);
+ if (unlikely(status)) {
+ loff_t isize = i_size_read(inode);
+ /*
+ * prepare_write() may have instantiated a few blocks
+ * outside i_size. Trim these off again.
+ */
+ unlock_page(page);
+ page_cache_release(page);
+ if (pos + bytes > isize)
+ vmtruncate(inode, isize);
+ break;
+ }
+
+ BUG_ON(nr_segs != 1);
+ copied = filemap_copy_from_kernel(page, offset, buf, bytes);
+
+ flush_dcache_page(page);
+ status = a_ops->commit_write(file, page, offset, offset+bytes);
+ if (likely(copied > 0)) {
+ if (!status)
+ status = copied;
+
+ if (status >= 0) {
+ written += status;
+ count -= status;
+ pos += status;
+ buf += status;
+ if (unlikely(nr_segs > 1))
+ filemap_set_next_iovec(&cur_iov,
+ &iov_base, status);
+ }
+ }
+ if (unlikely(copied != bytes))
+ if (status >= 0)
+ status = -EFAULT;
+ unlock_page(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+ if (status < 0)
+ break;
+ balance_dirty_pages_ratelimited(mapping);
+ cond_resched();
+ } while (count);
+ *ppos = pos;
+
+ if (cached_page)
+ page_cache_release(cached_page);
+
+ /*
+ * For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
+ */
+ if (status >= 0) {
+ if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
+ status = generic_osync_inode(inode, mapping,
+ OSYNC_METADATA|OSYNC_DATA);
+ }
+
+ err = written ? written : status;
+out:
+ pagevec_lru_add(&lru_pvec);
+ current->backing_dev_info = 0;
+ return err;
+}
+
ssize_t
generic_file_write_nolock(struct file *file, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos)
@@ -1924,6 +2111,20 @@
return ret;
}

+static ssize_t
+generic_kernel_file_write_nolock(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos)
+{
+ struct kiocb kiocb;
+ ssize_t ret;
+
+ init_sync_kiocb(&kiocb, file);
+ ret = generic_kernel_file_aio_write_nolock(&kiocb, iov, nr_segs, ppos);
+ if (ret == -EIOCBQUEUED)
+ ret = wait_on_sync_kiocb(&kiocb);
+ return ret;
+}
+
EXPORT_SYMBOL(generic_file_write_nolock);

ssize_t generic_file_aio_write(struct kiocb *iocb, const char __user *buf,
@@ -1962,6 +2163,21 @@
}
EXPORT_SYMBOL(generic_file_write);

+static ssize_t generic_kernel_file_write(struct file *file, const char *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file->f_mapping->host;
+ ssize_t err;
+ struct iovec local_iov = {.iov_base = (void __user *)buf,
+ .iov_len = count };
+
+ down(&inode->i_sem);
+ err = generic_kernel_file_write_nolock(file, &local_iov, 1, ppos);
+ up(&inode->i_sem);
+
+ return err;
+}
+
ssize_t generic_file_readv(struct file *filp, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos)
{
--- linux-2.6.5cow/fs/ext2/file.c~generic_sendpage 2004-05-06 10:03:59.000000000 +0200
+++ linux-2.6.5cow/fs/ext2/file.c 2004-05-06 10:04:33.000000000 +0200
@@ -53,6 +53,7 @@
.readv = generic_file_readv,
.writev = generic_file_writev,
.sendfile = generic_file_sendfile,
+ .sendpage = generic_file_sendpage,
};

struct inode_operations ext2_file_inode_operations = {
--- linux-2.6.5cow/fs/ext3/file.c~generic_sendpage 2004-05-06 10:03:59.000000000 +0200
+++ linux-2.6.5cow/fs/ext3/file.c 2004-05-06 10:04:33.000000000 +0200
@@ -127,6 +127,7 @@
.release = ext3_release_file,
.fsync = ext3_sync_file,
.sendfile = generic_file_sendfile,
+ .sendpage = generic_file_sendpage,
};

struct inode_operations ext3_file_inode_operations = {

2004-09-04 17:01:20

by Jörn Engel

[permalink] [raw]

Subject: [PATCH 2/3] copyfile: sendfile

See below.

J?rn

--
The strong give up and move away, while the weak give up and stay.
-- unknown

Creates vfs_sendfile(), which can be called from other places within
the kernel. Such other places include copyfile() and cowlinks.

In principle, this just removes code from do_sendfile() to
vfs_sendfile(). On top of that, it adds a check to out_inode,
identical to the one on in_inode. True, the check for out_inode was
never needed, maybe that tells you something about the check to
in_inode as well. ;)

fs/read_write.c | 107 ++++++++++++++++++++++++++---------------------
include/linux/fs.h | 1
include/linux/syscalls.h | 2
3 files changed, 64 insertions(+), 46 deletions(-)

--- linux-2.6.9-rc1-mm3/fs/read_write.c~sendfile 2004-09-04 18:17:31.000000000 +0200
+++ linux-2.6.9-rc1-mm3/fs/read_write.c 2004-09-04 18:19:07.000000000 +0200
@@ -561,12 +561,70 @@
return ret;
}

+ssize_t vfs_sendfile(struct file *out_file, struct file *in_file, loff_t *ppos,
+ size_t count, loff_t max)
+{
+ struct inode * in_inode, * out_inode;
+ loff_t pos;
+ ssize_t ret;
+
+ /* verify in_file */
+ in_inode = in_file->f_dentry->d_inode;
+ if (!in_inode)
+ return -EINVAL;
+ if (!in_file->f_op || !in_file->f_op->sendfile)
+ return -EINVAL;
+
+ if (!ppos)
+ ppos = &in_file->f_pos;
+ else
+ if (!(in_file->f_mode & FMODE_PREAD))
+ return -ESPIPE;
+
+ ret = locks_verify_area(FLOCK_VERIFY_READ, in_inode, in_file, *ppos, count);
+ if (ret)
+ return ret;
+
+ /* verify out_file */
+ out_inode = out_file->f_dentry->d_inode;
+ if (!out_inode)
+ return -EINVAL;
+ if (!out_file->f_op || !out_file->f_op->sendpage)
+ return -EINVAL;
+
+ ret = locks_verify_area(FLOCK_VERIFY_WRITE, out_inode, out_file, out_file->f_pos, count);
+ if (ret)
+ return ret;
+
+ ret = security_file_permission (out_file, MAY_WRITE);
+ if (ret)
+ return ret;
+
+ if (!max)
+ max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
+
+ pos = *ppos;
+ if (unlikely(pos < 0))
+ return -EINVAL;
+ if (unlikely(pos + count > max)) {
+ if (pos >= max)
+ return -EOVERFLOW;
+ count = max - pos;
+ }
+
+ ret = in_file->f_op->sendfile(in_file, ppos, count, file_send_actor, out_file);
+
+ if (*ppos > max)
+ return -EOVERFLOW;
+ return ret;
+}
+
+EXPORT_SYMBOL(vfs_sendfile);
+
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
size_t count, loff_t max)
{
struct file * in_file, * out_file;
- struct inode * in_inode, * out_inode;
- loff_t pos;
ssize_t retval;
int fput_needed_in, fput_needed_out;

@@ -579,21 +637,6 @@
goto out;
if (!(in_file->f_mode & FMODE_READ))
goto fput_in;
- retval = -EINVAL;
- in_inode = in_file->f_dentry->d_inode;
- if (!in_inode)
- goto fput_in;
- if (!in_file->f_op || !in_file->f_op->sendfile)
- goto fput_in;
- retval = -ESPIPE;
- if (!ppos)
- ppos = &in_file->f_pos;
- else
- if (!(in_file->f_mode & FMODE_PREAD))
- goto fput_in;
- retval = locks_verify_area(FLOCK_VERIFY_READ, in_inode, in_file, *ppos, count);
- if (retval)
- goto fput_in;

retval = security_file_permission (in_file, MAY_READ);
if (retval)
@@ -608,36 +651,8 @@
goto fput_in;
if (!(out_file->f_mode & FMODE_WRITE))
goto fput_out;
- retval = -EINVAL;
- if (!out_file->f_op || !out_file->f_op->sendpage)
- goto fput_out;
- out_inode = out_file->f_dentry->d_inode;
- retval = locks_verify_area(FLOCK_VERIFY_WRITE, out_inode, out_file, out_file->f_pos, count);
- if (retval)
- goto fput_out;
-
- retval = security_file_permission (out_file, MAY_WRITE);
- if (retval)
- goto fput_out;
-
- if (!max)
- max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
-
- pos = *ppos;
- retval = -EINVAL;
- if (unlikely(pos < 0))
- goto fput_out;
- if (unlikely(pos + count > max)) {
- retval = -EOVERFLOW;
- if (pos >= max)
- goto fput_out;
- count = max - pos;
- }
-
- retval = in_file->f_op->sendfile(in_file, ppos, count, file_send_actor, out_file);

- if (*ppos > max)
- retval = -EOVERFLOW;
+ retval = vfs_sendfile(out_file, in_file, ppos, count, max);

fput_out:
fput_light(out_file, fput_needed_out);
--- linux-2.6.9-rc1-mm3/include/linux/fs.h~sendfile 2004-09-03 21:02:58.000000000 +0200
+++ linux-2.6.9-rc1-mm3/include/linux/fs.h 2004-09-04 18:17:14.000000000 +0200
@@ -1018,6 +1018,7 @@
unsigned long, loff_t *);
extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
unsigned long, loff_t *);
+ssize_t vfs_sendfile(struct file *, struct file *, loff_t *, size_t, loff_t);

/*
* NOTE: write_inode, delete_inode, clear_inode, put_inode can be called
--- linux-2.6.9-rc1-mm3/include/linux/syscalls.h~sendfile 2004-09-03 20:49:16.000000000 +0200
+++ linux-2.6.9-rc1-mm3/include/linux/syscalls.h 2004-09-04 18:17:15.000000000 +0200
@@ -285,6 +285,8 @@
asmlinkage long sys_unlink(const char __user *pathname);
asmlinkage long sys_rename(const char __user *oldname,
const char __user *newname);
+asmlinkage long sys_copyfile(const char __user *from, const char __user *to,
+ umode_t mode);
asmlinkage long sys_chmod(const char __user *filename, mode_t mode);
asmlinkage long sys_fchmod(unsigned int fd, mode_t mode);

2004-09-04 17:04:47

by Jörn Engel

[permalink] [raw]

Subject: [PATCH 3/3] copyfile: copyfile

This one might be a little controversion. The system call would be
needed for cifs, so they can optimize it and to the copyfile
completely on the server. It's also nice for testing, for people that
don't want to use the madcow patches (I do, but I agree those should
not go into mainline.

copyfile(2) is about 10% faster than cp(1) with the read/write loop on
a single large file and with warm cache. Arguably, using open(2),
sendfile(2), close(2) should give the same result, so that only leaves
cifs as a good reason for the system call.

Comment?

J?rn

--
A victorious army first wins and then seeks battle.
-- Sun Tzu

Adds a new syscall, copyfile() which does as the name sais.

This syscall is a preparation for real cowlinks, but it might make sense on
it's own as well. For cowlinks, the real copy is not done until the first
write (or similar operation) occurs, but at that point, both operations
should be the same.

arch/i386/kernel/entry.S | 1
fs/namei.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 143 insertions(+)

--- linux-2.6.7cow/arch/i386/kernel/entry.S~copyfile 2004-06-27 23:14:31.000000000 +0200
+++ linux-2.6.7cow/arch/i386/kernel/entry.S 2004-08-02 20:16:38.000000000 +0200
@@ -886,5 +886,6 @@
.long sys_vperfctr_unlink
.long sys_vperfctr_iresume
.long sys_vperfctr_read
+ .long sys_copyfile

syscall_table_size=(.-sys_call_table)
--- linux-2.6.7cow/fs/namei.c~copyfile 2004-06-27 23:14:31.000000000 +0200
+++ linux-2.6.7cow/fs/namei.c 2004-08-02 20:19:04.000000000 +0200
@@ -2144,6 +2144,148 @@
return error;
}

+/* note how we open() two struct file *, just to do the sendfile(). this
+ * should not be necessary, only nfs, smbfs and cifs depend on the struct
+ * file *, while afs and all local filesystems can do without.
+ * but changing all interfaces and fixing up three filesystems is a little
+ * too much work for now.
+ */
+#include <linux/file.h> /* for fput */
+static int copy_data(struct dentry *old_dentry, struct vfsmount *old_mnt,
+ struct dentry *new_dentry, struct vfsmount *new_mnt)
+{
+ int ret;
+ loff_t size;
+ ssize_t n;
+ struct file *old_file;
+ struct file *new_file;
+
+ dget(old_dentry);
+ mntget(old_mnt);
+ old_file = dentry_open(old_dentry, old_mnt, O_RDONLY);
+ if (IS_ERR(old_file))
+ return PTR_ERR(old_file);
+
+ dget(new_dentry);
+ mntget(new_mnt);
+ new_file = dentry_open(new_dentry, new_mnt, O_WRONLY);
+ ret = PTR_ERR(new_file);
+ if (IS_ERR(new_file))
+ goto out1;
+
+ size = i_size_read(old_file->f_dentry->d_inode);
+ if (((size_t)size != size) || ((ssize_t)size != size)) {
+ ret = -EFBIG;
+ goto out2;
+ }
+ n = vfs_sendfile(new_file, old_file, NULL, size, 0);
+ if (n == size)
+ ret = 0;
+ else if (n < 0)
+ ret = n;
+ else
+ ret = -ENOSPC;
+
+out2:
+ fput(new_file);
+out1:
+ fput(old_file);
+ return ret;
+}
+
+int vfs_copyfile(struct nameidata *old_nd, struct nameidata *new_nd,
+ struct dentry *new_dentry, int mode)
+{
+ int ret;
+
+ if (!old_nd->dentry->d_inode)
+ return -ENOENT;
+ if (!S_ISREG(old_nd->dentry->d_inode->i_mode))
+ return -EINVAL;
+ /* FIXME: replace with proper permission check */
+ if (new_dentry->d_inode)
+ return -EEXIST;
+
+ if (mode == -1)
+ mode = old_nd->dentry->d_inode->i_mode;
+
+ ret = vfs_create(new_nd->dentry->d_inode, new_dentry, mode, new_nd);
+ if (ret)
+ return ret;
+
+ ret = copy_data(old_nd->dentry, old_nd->mnt, new_dentry, new_nd->mnt);
+
+ if (ret) {
+ int error = vfs_unlink(new_nd->dentry->d_inode, new_dentry);
+ BUG_ON(error);
+ /* FIXME: not sure if there are return value we should not BUG()
+ * on */
+ }
+ return ret;
+}
+
+int do_copyfile(const char *from, const char *to, int mode)
+{
+ int ret;
+ struct dentry *new_dentry;
+ struct nameidata old_nd, new_nd;
+
+ ret = path_lookup(from, 0, &old_nd);
+ if (ret)
+ return ret;
+
+ ret = path_lookup(to, LOOKUP_PARENT|LOOKUP_OPEN|LOOKUP_CREATE, &new_nd);
+ if (ret)
+ goto out1;
+
+ new_dentry = lookup_create(&new_nd, 0);
+ ret = PTR_ERR(new_dentry);
+ if (IS_ERR(new_dentry))
+ goto out2;
+
+ ret = vfs_copyfile(&old_nd, &new_nd, new_dentry, mode);
+
+ dput(new_dentry);
+ up(&new_nd.dentry->d_inode->i_sem);
+out2:
+ path_release(&new_nd);
+out1:
+ path_release(&old_nd);
+ return ret;
+}
+
+/* copyfile() - create a new file and copy the old one's contents into it, all
+ * inside the kernel
+ *
+ * TODO:
+ * quota
+ * notification
+ * security
+ * unlock directory during data copy
+ * loop for data copy
+ */
+int sys_copyfile(const char __user *ufrom, const char __user *uto, umode_t mode)
+{
+ char *from, *to;
+ int ret;
+
+ from = getname(ufrom);
+ if (IS_ERR(from))
+ return PTR_ERR(from);
+
+ to = getname(uto);
+ ret = PTR_ERR(to);
+ if (IS_ERR(to))
+ goto err;
+
+ ret = do_copyfile(from, to, mode);
+
+ putname(to);
+err:
+ putname(from);
+ return ret;
+}
+
int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
{
int len;

2004-09-04 17:08:48

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 3/3] copyfile: copyfile

On Sat, 4 September 2004 19:04:25 +0200, J?rn Engel wrote:
>
> Comment?

Oh yes, I've been using the patches for a couple of month on my
private machine. Herbert Poetzl found and fixed one bug that I only
noticed on power-down (rare event) and was too lazy to care about.

Since then another month has passed and my disk still contains a few
files with useful data here and there. ;)

J?rn

--
When in doubt, punt. When somebody actually complains, go back and fix it...
The 90% solution is a good thing.
-- Rob Landley

2004-09-04 17:11:53

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH 2/3] copyfile: sendfile

On Sat, Sep 04, 2004 at 06:59:38PM +0200, J?rn Engel wrote:
> Creates vfs_sendfile(), which can be called from other places within
> the kernel. Such other places include copyfile() and cowlinks.
>
> In principle, this just removes code from do_sendfile() to
> vfs_sendfile(). On top of that, it adds a check to out_inode,
> identical to the one on in_inode. True, the check for out_inode was
> never needed, maybe that tells you something about the check to
> in_inode as well. ;)

Both checks aren't nessecary.

> +++ linux-2.6.9-rc1-mm3/include/linux/syscalls.h 2004-09-04 18:17:15.000000000 +0200
> @@ -285,6 +285,8 @@
> asmlinkage long sys_unlink(const char __user *pathname);
> asmlinkage long sys_rename(const char __user *oldname,
> const char __user *newname);
> +asmlinkage long sys_copyfile(const char __user *from, const char __user *to,
> + umode_t mode);

oesn't seem to belong into this patch.

2004-09-04 17:12:26

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

> o unionmount - you might remember Jan Blunck's fix to ext3

Could you give some context please?

2004-09-04 17:17:18

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 2/3] copyfile: sendfile

On Sat, 4 September 2004 18:11:43 +0100, Christoph Hellwig wrote:
> On Sat, Sep 04, 2004 at 06:59:38PM +0200, J?rn Engel wrote:
> > Creates vfs_sendfile(), which can be called from other places within
> > the kernel. Such other places include copyfile() and cowlinks.
> >
> > In principle, this just removes code from do_sendfile() to
> > vfs_sendfile(). On top of that, it adds a check to out_inode,
> > identical to the one on in_inode. True, the check for out_inode was
> > never needed, maybe that tells you something about the check to
> > in_inode as well. ;)
>
> Both checks aren't nessecary.

Ok, will remove them before resending.

> > +++ linux-2.6.9-rc1-mm3/include/linux/syscalls.h 2004-09-04 18:17:15.000000000 +0200
> > @@ -285,6 +285,8 @@
> > asmlinkage long sys_unlink(const char __user *pathname);
> > asmlinkage long sys_rename(const char __user *oldname,
> > const char __user *newname);
> > +asmlinkage long sys_copyfile(const char __user *from, const char __user *to,
> > + umode_t mode);
>
> oesn't seem to belong into this patch.

True, should have been in 3/3. Thanks.

J?rn

--
Anything that can go wrong, will.
-- Finagle's Law

2004-09-04 17:22:41

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Sat, 4 September 2004 18:12:20 +0100, Christoph Hellwig wrote:
>
> > o unionmount - you might remember Jan Blunck's fix to ext3
>
> Could you give some context please?

Trivial example union mount:

Top layer: <empty>
2nd layer: foo

Writes to foo have to be done in the top layer, so foo has to be
copied up first. And since that has to be done inside the kernel,
any possible implementation will be similar to my code.

For further details on unionmount, you should ask Jan directly. But
for politeness' sake, please wait a month or two, so he can focus on
his university exams before getting into the flamewars. ;)

J?rn

--
Eighty percent of success is showing up.
-- Woody Allen

2004-09-04 17:26:31

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Sat, Sep 04, 2004 at 07:22:23PM +0200, J?rn Engel wrote:
> On Sat, 4 September 2004 18:12:20 +0100, Christoph Hellwig wrote:
> >
> > > o unionmount - you might remember Jan Blunck's fix to ext3
> >
> > Could you give some context please?
>
> Trivial example union mount:
>
> Top layer: <empty>
> 2nd layer: foo
>
> Writes to foo have to be done in the top layer, so foo has to be
> copied up first. And since that has to be done inside the kernel,
> any possible implementation will be similar to my code.
>
> For further details on unionmount, you should ask Jan directly. But
> for politeness' sake, please wait a month or two, so he can focus on
> his university exams before getting into the flamewars. ;)

Oh, I know what union mounts are. But I wonder who's hacking on them
as they need some major VFS surgey to get right.

2004-09-04 22:41:04

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

J?rn Engel <[email protected]> wrote:
>
> o Add sendfile() support for file targets to normal mm/filemap.c.

This comes up fairly regularly. See, for example,
http://lwn.net/Articles/88751/ for what appears to be a much simpler
implementation.

I discussed file->file sendfile with Linus a while back and he said

> I think it was about doing a 2GB file-file sendfile, and see your system
> grind to a halt without being able to kill it.
>
> That said, we have some of the same problems with just regular read/write
> too. sendfile just makes it easier.
>
> We should probably make read/write be interruptible by _fatal_ signals.
> It would require a new task state, though (TASK_KILLABLE or something, and
> it would show up as a normal 'D' state).

I don't know how much of a problem this is in practice - there are all
sorts of nasty things which unprivileged apps can do to the system by
overloading filesystems. Although most of them can be killed off by the
sysadmin.

(My infamous bash-shared-mappings stresstest can spend ten or more minutes
within a single write() call, but you have to try hard to do this).

2004-09-05 00:17:10

by William Lee Irwin III

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Sat, Sep 04, 2004 at 03:39:02PM -0700, Andrew Morton wrote:
> I don't know how much of a problem this is in practice - there are all
> sorts of nasty things which unprivileged apps can do to the system by
> overloading filesystems. Although most of them can be killed off by the
> sysadmin.
> (My infamous bash-shared-mappings stresstest can spend ten or more minutes
> within a single write() call, but you have to try hard to do this).

This reminds me; I'm having a chicken and egg problem with several
stresstests I've written but withheld until fixes for the crashes they
trigger are available, but the fixes appear to be hard enough to arrange
they need public commentary to find acceptable methods of addressing
them. What's the recommended procedure for all this?

-- wli

2004-09-05 00:25:50

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

William Lee Irwin III <[email protected]> wrote:
>
> On Sat, Sep 04, 2004 at 03:39:02PM -0700, Andrew Morton wrote:
> > I don't know how much of a problem this is in practice - there are all
> > sorts of nasty things which unprivileged apps can do to the system by
> > overloading filesystems. Although most of them can be killed off by the
> > sysadmin.
> > (My infamous bash-shared-mappings stresstest can spend ten or more minutes
> > within a single write() call, but you have to try hard to do this).
>
> This reminds me; I'm having a chicken and egg problem with several
> stresstests I've written but withheld until fixes for the crashes they
> trigger are available, but the fixes appear to be hard enough to arrange
> they need public commentary to find acceptable methods of addressing
> them. What's the recommended procedure for all this?
>

A local DoS via resource exhaustion would not be an earth-shatteringly new
development. Just send 'em out.

2004-09-06 11:54:42

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Sat, 4 September 2004 15:39:02 -0700, Andrew Morton wrote:
> J?rn Engel <[email protected]> wrote:
> >
> > o Add sendfile() support for file targets to normal mm/filemap.c.
>
> This comes up fairly regularly. See, for example,
> http://lwn.net/Articles/88751/ for what appears to be a much simpler
> implementation.

Yes, I saw it and commented on it. Al didn't seem to like the set_fs
very much, so that specific implementation appears to be dead.

> I discussed file->file sendfile with Linus a while back and he said
>
> > I think it was about doing a 2GB file-file sendfile, and see your system
> > grind to a halt without being able to kill it.
> >
> > That said, we have some of the same problems with just regular read/write
> > too. sendfile just makes it easier.
> >
> > We should probably make read/write be interruptible by _fatal_ signals.
> > It would require a new task state, though (TASK_KILLABLE or something, and
> > it would show up as a normal 'D' state).
>
> I don't know how much of a problem this is in practice - there are all
> sorts of nasty things which unprivileged apps can do to the system by
> overloading filesystems. Although most of them can be killed off by the
> sysadmin.
>
> (My infamous bash-shared-mappings stresstest can spend ten or more minutes
> within a single write() call, but you have to try hard to do this).

Sure, a 2GB copyfile(2) is nasty (sendfile can at least return a short
count), esp. with slow media like a usb1 attached hard drive. Might
be a reasonable short-term solution to return -EFBIG whenever a user
tries to copy a file >$PRETTY_LARGE and doesn't have CAP_WHATNOT.

Would you accept such a solution? And if you would, which magic
constants should I use?

J?rn

--
When you close your hand, you own nothing. When you open it up, you
own the whole world.
-- Li Mu Bai in Tiger & Dragon

2004-09-06 12:45:59

by Gunnar Ritter

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

Andrew Morton <[email protected]> wrote:

> I discussed file->file sendfile with Linus a while back and he said
>
> > I think it was about doing a 2GB file-file sendfile, and see your system
> > grind to a halt without being able to kill it.
> >
> > That said, we have some of the same problems with just regular read/write
> > too. sendfile just makes it easier.
> >
> > We should probably make read/write be interruptible by _fatal_ signals.
> > It would require a new task state, though (TASK_KILLABLE or something, and
> > it would show up as a normal 'D' state).
>
> I don't know how much of a problem this is in practice

It is an even more serious problem in my experience. I have been
using sendfile() in my cp command at <http://heirloom.sourceforge.net>
for quite some time, and I quickly decided to send files separated in
some decently sized blocks. Otherwise if a whole file is sent at once
and the source file is e.g. on an uncached floppy disk, cp will become
uninterruptible for about a minute, which is a serious usability flaw.
The user might discover that he is copying the wrong file, or he might
simply change his mind and like to abort the copy or whatever. A
performance gain of only 10 % is neglegible in comparison to this
problem. Thus I think if copyfile() would not be interruptible by SIGINT
and friends, its practical value would be quite limited.

Gunnar

2004-09-06 13:36:18

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Mon, 6 September 2004 14:45:38 +0200, Gunnar Ritter wrote:
>
> It is an even more serious problem in my experience. I have been
> using sendfile() in my cp command at <http://heirloom.sourceforge.net>
> for quite some time, and I quickly decided to send files separated in
> some decently sized blocks. Otherwise if a whole file is sent at once
> and the source file is e.g. on an uncached floppy disk, cp will become
> uninterruptible for about a minute, which is a serious usability flaw.
> The user might discover that he is copying the wrong file, or he might
> simply change his mind and like to abort the copy or whatever. A
> performance gain of only 10 % is neglegible in comparison to this
> problem. Thus I think if copyfile() would not be interruptible by SIGINT
> and friends, its practical value would be quite limited.

Using a loop of 4k sendfile commands should be easy enough to do.
Problem is that copyfile(2) should do some decent cleanup after
receiving a signal. Hans Reiser got it right that all filesystem
operations should be atomic.

J?rn

--
Premature optimization is the root of all evil.
-- Donald Knuth

2004-09-06 14:32:17

by Gunnar Ritter

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

J?rn Engel <[email protected]> wrote:

> On Mon, 6 September 2004 14:45:38 +0200, Gunnar Ritter wrote:
> > It is an even more serious problem in my experience. I have been
> > using sendfile() in my cp command at <http://heirloom.sourceforge.net>
> > for quite some time, and I quickly decided to send files separated in
> > some decently sized blocks. Otherwise if a whole file is sent at once
> > and the source file is e.g. on an uncached floppy disk, cp will become
> > uninterruptible for about a minute, which is a serious usability flaw.
> > The user might discover that he is copying the wrong file, or he might
> > simply change his mind and like to abort the copy or whatever. A
> > performance gain of only 10 % is neglegible in comparison to this
> > problem. Thus I think if copyfile() would not be interruptible by SIGINT
> > and friends, its practical value would be quite limited.
>
> Using a loop of 4k sendfile commands should be easy enough to do.

Heck, guess what I did (although 4k seems a bit small).

> Problem is that copyfile(2) should do some decent cleanup after
> receiving a signal. Hans Reiser got it right that all filesystem
> operations should be atomic.

Then I don't see the point in having a copyfile system call. In
fact, I would consider to deactivate it in every kernel derivative
I'm responsible for to prevent hanging applications.

Gunnar

2004-09-06 14:45:00

by Oliver Neukum

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

Am Montag, 6. September 2004 16:32 schrieb Gunnar Ritter:
> Then I don't see the point in having a copyfile system call. In

Potentially tremendous speedups in networked filesystems.

Regards
Oliver

2004-09-06 15:51:53

by Gunnar Ritter

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

Oliver Neukum <[email protected]> wrote:

> Am Montag, 6. September 2004 16:32 schrieb Gunnar Ritter:
> > Then I don't see the point in having a copyfile system call. In
>
> Potentially tremendous speedups in networked filesystems.

Which are even particularly susceptible to operations the user
might want to interrupt.

Gunnar

2004-09-07 08:32:50

by Jan Blunck

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

Christoph Hellwig wrote:
>
>>>Could you give some context please?
>
>
> Oh, I know what union mounts are. But I wonder who's hacking on them
> as they need some major VFS surgey to get right.
>

Ok, first sorry for the late reply. I'm working on a vfs-based
union-mount implementation for linux. The focus is on the basic
functionality explained by McKusick [1] without a need for a new
separate filesystem, without options like "before". "after", "in
between" or something like that, with permanent whiteouts if the fstype
supports them. For the copyup I plan to use J?rn's patch.
At the moment some basic stuff (actually the unions) is working, so you
actually can "see" something but I don't think that the patches are
ready for being posted now. Please relax and wait a few weeks because I
have to go back to university for some final exams. After that I'm back
for some more full-time hacking on union-mounts.

[1]:http://www.usenix.org/publications/library/proceedings/neworl/full_papers/mckusick.a

Regards,
Jan

2004-09-07 11:09:58

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Mon, 6 September 2004 16:32:06 +0200, Gunnar Ritter wrote:
> J?rn Engel <[email protected]> wrote:
>
> > Using a loop of 4k sendfile commands should be easy enough to do.
>
> Heck, guess what I did (although 4k seems a bit small).

I did the loop inside the kernel, so the syscall overhead is less of
an issue. 4k is a safe bet for _really_ slow devices and if people
want to increase it, hey, it's just a single constant to touch.

> > Problem is that copyfile(2) should do some decent cleanup after
> > receiving a signal. Hans Reiser got it right that all filesystem
> > operations should be atomic.
>
> Then I don't see the point in having a copyfile system call. In
> fact, I would consider to deactivate it in every kernel derivative
> I'm responsible for to prevent hanging applications.

Personally, I don't care much either. It's nice to get some test
coverage and Steve French liked to have it for cifs. Anyway, for the
curious, here is the loop patch.

Tested, sendfile(2) returns a short count if you send a signal to the
calling process. Add another loop in the userspace caller to deal
with it, if you don't already have it. It's a valid and documented
return value, after all.

Andrew, I'll resend all four patches to you in new thread.

J?rn

--
Ninety percent of everything is crap.
-- Sturgeon's Law

Linus and Andrew are rightfully concerned about local DoS via a large
file->file sendfile(). This patch turns large sendfile() calls into a
loop of 4k chunks. After each chunk, it adds a cond_resched for
interactivity and a signal check to allow aborts etc. after the user
found out what a bad idea this may be.

Signed-off-by: J?rn Engel <[email protected]>
---

read_write.c | 31 ++++++++++++++++++++++++++++++-
1 files changed, 30 insertions(+), 1 deletion(-)

--- linux-2.6.8cow/fs/read_write.c~sendfile_loop 2004-09-05 12:06:39.000000000 +0200
+++ linux-2.6.8cow/fs/read_write.c 2004-09-07 11:18:55.000000000 +0200
@@ -561,6 +561,35 @@
return ret;
}

+/**
+ * sendfile() of a 2GB file over usb1-attached hard drives can take a moment.
+ * This little loop is supposed to stop now and then to check for signals,
+ * reschedule and generally play nice with others.
+ */
+ssize_t inline __vfs_sendfile(struct file *in_file, loff_t *ppos, size_t count,
+ read_actor_t actor, struct file *out_file)
+{
+ ssize_t done = 0, ret;
+ while (count) {
+ size_t n = min(count, (size_t)4096);
+ ret = in_file->f_op->sendfile(in_file, ppos, n, actor,out_file);
+ if (ret < 0) {
+ if (done)
+ return done;
+ else
+ return ret;
+ }
+
+ done += ret;
+ count -= ret;
+
+ cond_resched();
+ if (signal_pending(current))
+ break;
+ }
+ return done;
+}
+
ssize_t vfs_sendfile(struct file *out_file, struct file *in_file, loff_t *ppos,
size_t count, loff_t max)
{
@@ -608,7 +637,7 @@
count = max - pos;
}

- ret = in_file->f_op->sendfile(in_file, ppos, count, file_send_actor, out_file);
+ ret = __vfs_sendfile(in_file, ppos, count, file_send_actor, out_file);

if (*ppos > max)
return -EOVERFLOW;

2004-09-07 11:46:25

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Tue, 7 September 2004 13:09:13 +0200, J?rn Engel wrote:
>
> I did the loop inside the kernel, so the syscall overhead is less of
> an issue. 4k is a safe bet for _really_ slow devices and if people
> want to increase it, hey, it's just a single constant to touch.

And a stupid bug I missed. Call sendfile with count>filesize and the
loop doesn't terminate. Fixed patch below.

J?rn

--
Premature optimization is the root of all evil.
-- Donald Knuth

Linus and Andrew are rightfully concerned about local DoS via a large
file->file sendfile(). This patch turns large sendfile() calls into a
loop of 4k chunks. After each chunk, it adds a cond_resched for
interactivity and a signal check to allow aborts etc. after the user
found out what a bad idea this may be.

Signed-off-by: J?rn Engel <[email protected]>
---

read_write.c | 31 ++++++++++++++++++++++++++++++-
1 files changed, 30 insertions(+), 1 deletion(-)

--- linux-2.6.8cow/fs/read_write.c~sendfile_loop 2004-09-05 12:06:39.000000000 +0200
+++ linux-2.6.8cow/fs/read_write.c 2004-09-07 13:27:30.000000000 +0200
@@ -561,6 +561,35 @@
return ret;
}

+/**
+ * sendfile() of a 2GB file over usb1-attached hard drives can take a moment.
+ * This little loop is supposed to stop now and then to check for signals,
+ * reschedule and generally play nice with others.
+ */
+ssize_t inline __vfs_sendfile(struct file *in_file, loff_t *ppos, size_t count,
+ read_actor_t actor, struct file *out_file)
+{
+ ssize_t done = 0, ret;
+ while (count) {
+ size_t n = min(count, (size_t)4096);
+ ret = in_file->f_op->sendfile(in_file, ppos, n, actor,out_file);
+ if (ret < 0) {
+ if (done)
+ return done;
+ else
+ return ret;
+ }
+
+ done += ret;
+ count -= ret;
+
+ if ((ret == 0) || signal_pending(current))
+ break;
+ cond_resched();
+ }
+ return done;
+}
+
ssize_t vfs_sendfile(struct file *out_file, struct file *in_file, loff_t *ppos,
size_t count, loff_t max)
{
@@ -608,7 +637,7 @@
count = max - pos;
}

- ret = in_file->f_op->sendfile(in_file, ppos, count, file_send_actor, out_file);
+ ret = __vfs_sendfile(in_file, ppos, count, file_send_actor, out_file);

if (*ppos > max)
return -EOVERFLOW;

2004-09-07 11:50:28

by Gunnar Ritter

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

Jörn Engel <[email protected]> wrote:

> Tested, sendfile(2) returns a short count if you send a signal to the
> calling process.

Fine, thank you. This makes it really useful.

> Add another loop in the userspace caller to deal with it, if you don't
> already have it. It's a valid and documented return value, after all.

Of course (although cp will be terminated by the SIGINT anyway, so it
does not matter in this situation).

Do I understand this correctly that sendfile now behaves like write with
SA_RESTART not set for the signal? If so, it might perhaps make sense to
add SA_RESTART semantics too, so that sendfile then just continues if the
process catches the signal and does not abort (scenario: SIGWINCH is sent
to a curses-based file manager).

Gunnar

2004-09-07 12:03:36

by Gunnar Ritter

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

Jörn Engel <[email protected]> wrote:

Also it might perhaps make sense to add the kernel programmer's
equivalent of

> + while (count) {
> + size_t n = min(count, (size_t)4096);
> + ret = in_file->f_op->sendfile(in_file, ppos, n, actor,out_file);
> + if (ret < 0) {
> + if (done)
> + return done;
> + else
> + return ret;
> + }
> +
> + done += ret;
> + count -= ret;
> +
> + if ((ret == 0) || signal_pending(current))
{
if (count == 0) {
done = -1;
errno = EINTR;
}
> + break;
}
> + cond_resched();
> + }
> + return done;
> +}

here, for write-like semantics, too.

Gunnar

2004-09-07 12:08:48

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Tue, 7 September 2004 13:48:54 +0200, Gunnar Ritter wrote:
> J?rn Engel <[email protected]> wrote:
>
> > Add another loop in the userspace caller to deal with it, if you don't
> > already have it. It's a valid and documented return value, after all.
>
> Of course (although cp will be terminated by the SIGINT anyway, so it
> does not matter in this situation).

I tested by hitting Ctrl-z. My admittedly stupid test program didn't
have a loop, so the fg terminated it. :)

> Do I understand this correctly that sendfile now behaves like write with
> SA_RESTART not set for the signal?

Yes.

> If so, it might perhaps make sense to
> add SA_RESTART semantics too, so that sendfile then just continues if the
> process catches the signal and does not abort (scenario: SIGWINCH is sent
> to a curses-based file manager).

It might, not sure. I've never actually looked at it, never missed
it, don't need it. Maybe someone else has a strong feeling one way or
the other?

J?rn

--
He who knows others is wise.
He who knows himself is enlightened.
-- Lao Tsu

2004-09-07 12:09:30

by Jörn Engel

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

On Tue, 7 September 2004 14:03:19 +0200, Gunnar Ritter wrote:
> J?rn Engel <[email protected]> wrote:
>
> Also it might perhaps make sense to add the kernel programmer's
> equivalent of
>
> > + while (count) {
> > + size_t n = min(count, (size_t)4096);
> > + ret = in_file->f_op->sendfile(in_file, ppos, n, actor,out_file);
> > + if (ret < 0) {
> > + if (done)
> > + return done;
> > + else
> > + return ret;
> > + }
> > +
> > + done += ret;
> > + count -= ret;
> > +
> > + if ((ret == 0) || signal_pending(current))
> {
> if (count == 0) {
> done = -1;
> errno = EINTR;
> }
> > + break;
> }
> > + cond_resched();
> > + }
> > + return done;
> > +}
>
> here, for write-like semantics, too.

Is that really needed? If in_file->f_op->sendfile() returns some
error, it's already handled above. If it returns 0, done remains 0
and we return 0, indicating EOF. And the check for pending signals
happens after handling the first chunk, so -EINTR shouldn't be
necessary.

But I could be stupid and miss something.

J?rn

--
Public Domain - Free as in Beer
General Public - Free as in Speech
BSD License - Free as in Enterprise
Shared Source - Free as in "Work will make you..."

2004-09-07 12:32:45

by Gunnar Ritter

[permalink] [raw]

Subject: Re: [PATCH 1/3] copyfile: generic_sendpage

Jörn Engel <[email protected]> wrote:

> On Tue, 7 September 2004 14:03:19 +0200, Gunnar Ritter wrote:
> > if (count == 0) {
> > done = -1;
> > errno = EINTR; [...]
> Is that really needed? If in_file->f_op->sendfile() returns some
> error, it's already handled above. If it returns 0, done remains 0
> and we return 0, indicating EOF. And the check for pending signals
> happens after handling the first chunk, so -EINTR shouldn't be
> necessary.

Yes, this was obviously not the correct location for the check then.

Gunnar