Hi, Andrew. Please consider the patch below for the next open 2.6 release.
The problem is that close() syscalls can call a file system's flush
handler, which in turn might sleep interruptibly and ultimately pass
back an -ERESTARTSYS return value. This happens for files backed by
an interruptible NFS mount under nfs_file_flush() when a large file
has just been written and nfs_wait_bit_interruptible() detects that
there is a signal pending.
I have a test case where the "strace" command is used to attach to a
process sleeping in such a close(). Since the SIGSTOP is forced onto
the victim process (removing it from the thread's "blocked" mask in
force_sig_info()), the RPC wait is interrupted and the close() is
terminated early.
But the file table entry has already been cleared before the flush
handler was called. Thus, when the syscall is restarted, the file
descriptor appears closed and an EBADF error is returned (which is
wrong). What's worse, there is the hypothetical case where another
thread of a multi-threaded application might have reused the file
descriptor, in which case that file would be mistakenly closed.
The bottom line is that close() syscalls are not restartable, and
thus -ERESTARTSYS return values should be mapped to -EINTR. This
is consistent with the close(2) manual page. The fix is below.
Cheers. -ernie
Signed-off-by: Ernie Petrides <[email protected]>
--- linux-2.6.17/fs/open.c.orig
+++ linux-2.6.17/fs/open.c
@@ -1172,6 +1172,7 @@ asmlinkage long sys_close(unsigned int f
struct file * filp;
struct files_struct *files = current->files;
struct fdtable *fdt;
+ int retval;
spin_lock(&files->file_lock);
fdt = files_fdtable(files);
@@ -1184,7 +1185,10 @@ asmlinkage long sys_close(unsigned int f
FD_CLR(fd, fdt->close_on_exec);
__put_unused_fd(files, fd);
spin_unlock(&files->file_lock);
- return filp_close(filp, files);
+ retval = filp_close(filp, files);
+
+ /* can't restart close syscall because file table entry was cleared */
+ return (retval == -ERESTARTSYS) ? -EINTR : retval;
out_unlock:
spin_unlock(&files->file_lock);
On Thursday, 10-Aug-2006 at 19:36 EDT, Ernie Petrides wrote:
> [...]
> The bottom line is that close() syscalls are not restartable, and
> thus -ERESTARTSYS return values should be mapped to -EINTR. This
> is consistent with the close(2) manual page. The fix is below.
> [...]
> --- linux-2.6.17/fs/open.c.orig
> +++ linux-2.6.17/fs/open.c
> @@ -1172,6 +1172,7 @@ asmlinkage long sys_close(unsigned int f
> struct file * filp;
> struct files_struct *files = current->files;
> struct fdtable *fdt;
> + int retval;
>
> spin_lock(&files->file_lock);
> fdt = files_fdtable(files);
> @@ -1184,7 +1185,10 @@ asmlinkage long sys_close(unsigned int f
> FD_CLR(fd, fdt->close_on_exec);
> __put_unused_fd(files, fd);
> spin_unlock(&files->file_lock);
> - return filp_close(filp, files);
> + retval = filp_close(filp, files);
> +
> + /* can't restart close syscall because file table entry was cleared */
> + return (retval == -ERESTARTSYS) ? -EINTR : retval;
>
> out_unlock:
> spin_unlock(&files->file_lock);
Hi, Andrew. Roland McGrath has pointed out to me that my original patch
for this problem only handles the case of ERESTARTSYS, but doesn't treat
the three other restart pseudo-error codes similarly.
I don't see how any code currently in the 2.6.17 source tree could possibly
result in any ERESTARTNOINTR, ERESTARTNOHAND, or ERESTART_RESTARTBLOCK
errors being sent back through filp_close().
But if you prefer the extra checks for completeness, feel free to add
the incremental patch below (on top of my previous patch to check for
ERESTARTSYS).
Cheers. -ernie
--- linux-2.6.17/fs/open.c.prev
+++ linux-2.6.17/fs/open.c
@@ -1188,7 +1188,13 @@ asmlinkage long sys_close(unsigned int f
retval = filp_close(filp, files);
/* can't restart close syscall because file table entry was cleared */
- return (retval == -ERESTARTSYS) ? -EINTR : retval;
+ if (unlikely(retval == -ERESTARTSYS ||
+ retval == -ERESTARTNOINTR ||
+ retval == -ERESTARTNOHAND ||
+ retval == -ERESTART_RESTARTBLOCK))
+ retval = -EINTR;
+
+ return retval;
out_unlock:
spin_unlock(&files->file_lock);