2008-01-27 02:18:22

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [0/18] Implement some low hanging BKL removal fruit in fs/*


[Andrew: I believe this is -mm material for .25]

- Convert some more file systems (generally those who don't use the BKL
for anything except mount) to use unlocked_bkl.
- Implement BKL less fasync (see patch for the rationale)
This is currently a separate entry point, but since the number of fasync
users in the tree is relatively small I hope the older entry point can
be removed then in the not too far future
[help from other people converting more fasync users to unlocked_fasync
would be appreciated]
- Implement BKL less remote_llseek
- While I was at it I also added a few missing compat ioctl handlers
- Fix a few comments

This fixes a lot of relatively trivial BKL users in fs/*. The main
remaining non legacy offenders are now locks.c, nfs/nfsd and reiserfs.
I believe BKL removal for all of those is being worked on by other people.
Also a lot of "legacy" file systems still use it, but converting those
does not seem to be very pressing.

-Andi


2008-01-27 02:17:28

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [1/18] BKL-removal: Convert ext2 over to use unlocked_ioctl


I checked ext2_ioctl and could not find anything in there that would
need the BKL. So convert it over to use unlocked_ioctl

Signed-off-by: Andi Kleen <[email protected]>

---
fs/ext2/dir.c | 2 +-
fs/ext2/ext2.h | 3 +--
fs/ext2/file.c | 4 ++--
fs/ext2/ioctl.c | 12 +++---------
4 files changed, 7 insertions(+), 14 deletions(-)

Index: linux/fs/ext2/dir.c
===================================================================
--- linux.orig/fs/ext2/dir.c
+++ linux/fs/ext2/dir.c
@@ -703,7 +703,7 @@ const struct file_operations ext2_dir_op
.llseek = generic_file_llseek,
.read = generic_read_dir,
.readdir = ext2_readdir,
- .ioctl = ext2_ioctl,
+ .unlocked_ioctl = ext2_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext2_compat_ioctl,
#endif
Index: linux/fs/ext2/ext2.h
===================================================================
--- linux.orig/fs/ext2/ext2.h
+++ linux/fs/ext2/ext2.h
@@ -139,8 +139,7 @@ int __ext2_write_begin(struct file *file
struct page **pagep, void **fsdata);

/* ioctl.c */
-extern int ext2_ioctl (struct inode *, struct file *, unsigned int,
- unsigned long);
+extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
extern long ext2_compat_ioctl(struct file *, unsigned int, unsigned long);

/* namei.c */
Index: linux/fs/ext2/file.c
===================================================================
--- linux.orig/fs/ext2/file.c
+++ linux/fs/ext2/file.c
@@ -48,7 +48,7 @@ const struct file_operations ext2_file_o
.write = do_sync_write,
.aio_read = generic_file_aio_read,
.aio_write = generic_file_aio_write,
- .ioctl = ext2_ioctl,
+ .unlocked_ioctl = ext2_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext2_compat_ioctl,
#endif
@@ -65,7 +65,7 @@ const struct file_operations ext2_xip_fi
.llseek = generic_file_llseek,
.read = xip_file_read,
.write = xip_file_write,
- .ioctl = ext2_ioctl,
+ .unlocked_ioctl = ext2_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext2_compat_ioctl,
#endif
Index: linux/fs/ext2/ioctl.c
===================================================================
--- linux.orig/fs/ext2/ioctl.c
+++ linux/fs/ext2/ioctl.c
@@ -17,9 +17,9 @@
#include <asm/uaccess.h>


-int ext2_ioctl (struct inode * inode, struct file * filp, unsigned int cmd,
- unsigned long arg)
+long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
+ struct inode *inode = filp->f_dentry->d_inode;
struct ext2_inode_info *ei = EXT2_I(inode);
unsigned int flags;
unsigned short rsv_window_size;
@@ -141,9 +141,6 @@ int ext2_ioctl (struct inode * inode, st
#ifdef CONFIG_COMPAT
long ext2_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
- int ret;
-
/* These are just misnamed, they actually get/put from/to user an int */
switch (cmd) {
case EXT2_IOC32_GETFLAGS:
@@ -161,9 +158,6 @@ long ext2_compat_ioctl(struct file *file
default:
return -ENOIOCTLCMD;
}
- lock_kernel();
- ret = ext2_ioctl(inode, file, cmd, (unsigned long) compat_ptr(arg));
- unlock_kernel();
- return ret;
+ return ext2_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
}
#endif

2008-01-27 02:17:50

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [2/18] BKL-removal: Remove incorrect BKL comment in ext2


No BKL used anywhere, so don't mention it.

Signed-off-by: Andi Kleen <[email protected]>

---
fs/ext2/inode.c | 1 -
1 file changed, 1 deletion(-)

Index: linux/fs/ext2/inode.c
===================================================================
--- linux.orig/fs/ext2/inode.c
+++ linux/fs/ext2/inode.c
@@ -569,7 +569,6 @@ static void ext2_splice_branch(struct in
*
* `handle' can be NULL if create == 0.
*
- * The BKL may not be held on entry here. Be sure to take it early.
* return > 0, # of blocks mapped or allocated.
* return = 0, if plain lookup failed.
* return < 0, error case.

2008-01-27 02:18:45

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [4/18] ext3: Remove incorrect BKL comment


There is no BKL held on entry in ->fsync nor in the low level ext3_sync_file.

Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/ext3/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/fs/ext3/dir.c
===================================================================
--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -46,7 +46,7 @@ const struct file_operations ext3_dir_op
#ifdef CONFIG_COMPAT
.compat_ioctl = ext3_compat_ioctl,
#endif
- .fsync = ext3_sync_file, /* BKL held */
+ .fsync = ext3_sync_file,
.release = ext3_release_dir,
};

2008-01-27 02:19:10

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [7/18] BKL-removal: Remove incorrect comments refering to BKL from ext4


BKL is not hold in any of those

Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/ext4/dir.c | 2 +-
fs/ext4/inode.c | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)

Index: linux/fs/ext4/dir.c
===================================================================
--- linux.orig/fs/ext4/dir.c
+++ linux/fs/ext4/dir.c
@@ -46,7 +46,7 @@ const struct file_operations ext4_dir_op
#ifdef CONFIG_COMPAT
.compat_ioctl = ext4_compat_ioctl,
#endif
- .fsync = ext4_sync_file, /* BKL held */
+ .fsync = ext4_sync_file,
.release = ext4_release_dir,
};

Index: linux/fs/ext4/inode.c
===================================================================
--- linux.orig/fs/ext4/inode.c
+++ linux/fs/ext4/inode.c
@@ -778,7 +778,6 @@ err_out:
*
* `handle' can be NULL if create == 0.
*
- * The BKL may not be held on entry here. Be sure to take it early.
* return > 0, # of blocks mapped or allocated.
* return = 0, if plain lookup failed.
* return < 0, error case.

2008-01-27 02:19:29

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek


- Replace remote_llseek with remote_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use remote_llseek directly or take the
BKL around. I changed the file systems who don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
take the BKL, but explicitely in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/cifs/cifsfs.c | 2 +-
fs/gfs2/ops_file.c | 4 ++--
fs/ncpfs/file.c | 12 +++++++++++-
fs/nfs/file.c | 6 +++++-
fs/read_write.c | 7 +++----
fs/smbfs/file.c | 11 ++++++++++-
include/linux/fs.h | 3 ++-
7 files changed, 34 insertions(+), 11 deletions(-)

Index: linux/fs/cifs/cifsfs.c
===================================================================
--- linux.orig/fs/cifs/cifsfs.c
+++ linux/fs/cifs/cifsfs.c
@@ -549,7 +549,7 @@ static loff_t cifs_llseek(struct file *f
if (retval < 0)
return (loff_t)retval;
}
- return remote_llseek(file, offset, origin);
+ return remote_llseek_unlocked(file, offset, origin);
}

static struct file_system_type cifs_fs_type = {
Index: linux/fs/gfs2/ops_file.c
===================================================================
--- linux.orig/fs/gfs2/ops_file.c
+++ linux/fs/gfs2/ops_file.c
@@ -107,11 +107,11 @@ static loff_t gfs2_llseek(struct file *f
error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY,
&i_gh);
if (!error) {
- error = remote_llseek(file, offset, origin);
+ error = remote_llseek_unlocked(file, offset, origin);
gfs2_glock_dq_uninit(&i_gh);
}
} else
- error = remote_llseek(file, offset, origin);
+ error = remote_llseek_unlocked(file, offset, origin);

return error;
}
Index: linux/fs/ncpfs/file.c
===================================================================
--- linux.orig/fs/ncpfs/file.c
+++ linux/fs/ncpfs/file.c
@@ -18,6 +18,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/sched.h>
+#include <linux/smp_lock.h>

#include <linux/ncp_fs.h>
#include "ncplib_kernel.h"
@@ -281,9 +282,18 @@ static int ncp_release(struct inode *ino
return 0;
}

+static loff_t ncp_remote_llseek(struct file *file, loff_t offset, int origin)
+{
+ loff_t ret;
+ lock_kernel();
+ ret = remote_llseek_unlocked(file, offset, origin);
+ unlock_kernel();
+ return ret;
+}
+
const struct file_operations ncp_file_operations =
{
- .llseek = remote_llseek,
+ .llseek = ncp_remote_llseek,
.read = ncp_file_read,
.write = ncp_file_write,
.ioctl = ncp_ioctl,
Index: linux/fs/read_write.c
===================================================================
--- linux.orig/fs/read_write.c
+++ linux/fs/read_write.c
@@ -58,11 +58,10 @@ loff_t generic_file_llseek(struct file *

EXPORT_SYMBOL(generic_file_llseek);

-loff_t remote_llseek(struct file *file, loff_t offset, int origin)
+loff_t remote_llseek_unlocked(struct file *file, loff_t offset, int origin)
{
long long retval;

- lock_kernel();
switch (origin) {
case SEEK_END:
offset += i_size_read(file->f_path.dentry->d_inode);
@@ -73,15 +72,15 @@ loff_t remote_llseek(struct file *file,
retval = -EINVAL;
if (offset>=0 && offset<=file->f_path.dentry->d_inode->i_sb->s_maxbytes) {
if (offset != file->f_pos) {
+ /* AK: do we need a lock for those? */
file->f_pos = offset;
file->f_version = 0;
}
retval = offset;
}
- unlock_kernel();
return retval;
}
-EXPORT_SYMBOL(remote_llseek);
+EXPORT_SYMBOL(remote_llseek_unlocked);

loff_t no_llseek(struct file *file, loff_t offset, int origin)
{
Index: linux/fs/smbfs/file.c
===================================================================
--- linux.orig/fs/smbfs/file.c
+++ linux/fs/smbfs/file.c
@@ -422,9 +422,18 @@ smb_file_permission(struct inode *inode,
return error;
}

+static loff_t smb_remote_llseek(struct file *file, loff_t offset, int origin)
+{
+ loff_t ret;
+ lock_kernel();
+ ret = remote_llseek_unlocked(file, offset, origin);
+ unlock_kernel();
+ return ret;
+}
+
const struct file_operations smb_file_operations =
{
- .llseek = remote_llseek,
+ .llseek = smb_remote_llseek,
.read = do_sync_read,
.aio_read = smb_file_aio_read,
.write = do_sync_write,
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -1825,7 +1825,8 @@ extern void
file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
extern loff_t no_llseek(struct file *file, loff_t offset, int origin);
extern loff_t generic_file_llseek(struct file *file, loff_t offset, int origin);
-extern loff_t remote_llseek(struct file *file, loff_t offset, int origin);
+extern loff_t remote_llseek_unlocked(struct file *file, loff_t offset,
+ int origin);
extern int generic_file_open(struct inode * inode, struct file * filp);
extern int nonseekable_open(struct inode * inode, struct file * filp);

Index: linux/fs/nfs/file.c
===================================================================
--- linux.orig/fs/nfs/file.c
+++ linux/fs/nfs/file.c
@@ -166,6 +166,7 @@ force_reval:

static loff_t nfs_file_llseek(struct file *filp, loff_t offset, int origin)
{
+ loff_t loff;
/* origin == SEEK_END => we must revalidate the cached file length */
if (origin == SEEK_END) {
struct inode *inode = filp->f_mapping->host;
@@ -173,7 +174,10 @@ static loff_t nfs_file_llseek(struct fil
if (retval < 0)
return (loff_t)retval;
}
- return remote_llseek(filp, offset, origin);
+ lock_kernel(); /* BKL needed? */
+ loff = remote_llseek_unlocked(filp, offset, origin);
+ unlock_kernel();
+ return loff;
}

/*

2008-01-27 02:19:42

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [10/18] BKL-removal: Implement a compat_ioctl handler for JFS


The ioctls were already compatible except for the actual values so this
was fairly easy to do.

Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/jfs/file.c | 3 +++
fs/jfs/ioctl.c | 18 ++++++++++++++++++
fs/jfs/jfs_dinode.h | 2 ++
fs/jfs/jfs_inode.h | 1 +
fs/jfs/namei.c | 3 +++
5 files changed, 27 insertions(+)

Index: linux/fs/jfs/file.c
===================================================================
--- linux.orig/fs/jfs/file.c
+++ linux/fs/jfs/file.c
@@ -113,4 +113,7 @@ const struct file_operations jfs_file_op
.fsync = jfs_fsync,
.release = jfs_release,
.unlocked_ioctl = jfs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = jfs_compat_ioctl,
+#endif
};
Index: linux/fs/jfs/ioctl.c
===================================================================
--- linux.orig/fs/jfs/ioctl.c
+++ linux/fs/jfs/ioctl.c
@@ -117,3 +117,21 @@ long jfs_ioctl(struct file *filp, unsign
}
}

+#ifdef CONFIG_COMPAT
+long jfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ /* While these ioctl numbers defined with 'long' and have different
+ * numbers than the 64bit ABI,
+ * the actual implementation only deals with ints and is compatible.
+ */
+ switch (cmd) {
+ case JFS_IOC_GETFLAGS32:
+ cmd = JFS_IOC_GETFLAGS;
+ break;
+ case JFS_IOC_SETFLAGS32:
+ cmd = JFS_IOC_SETFLAGS;
+ break;
+ }
+ return jfs_ioctl(filp, cmd, arg);
+}
+#endif
Index: linux/fs/jfs/jfs_inode.h
===================================================================
--- linux.orig/fs/jfs/jfs_inode.h
+++ linux/fs/jfs/jfs_inode.h
@@ -23,6 +23,7 @@ struct fid;
extern struct inode *ialloc(struct inode *, umode_t);
extern int jfs_fsync(struct file *, struct dentry *, int);
extern long jfs_ioctl(struct file *, unsigned int, unsigned long);
+extern long jfs_compat_ioctl(struct file *, unsigned int, unsigned long);
extern void jfs_read_inode(struct inode *);
extern int jfs_commit_inode(struct inode *, int);
extern int jfs_write_inode(struct inode*, int);
Index: linux/fs/jfs/namei.c
===================================================================
--- linux.orig/fs/jfs/namei.c
+++ linux/fs/jfs/namei.c
@@ -1563,6 +1563,9 @@ const struct file_operations jfs_dir_ope
.readdir = jfs_readdir,
.fsync = jfs_fsync,
.unlocked_ioctl = jfs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = jfs_compat_ioctl,
+#endif
};

static int jfs_ci_hash(struct dentry *dir, struct qstr *this)
Index: linux/fs/jfs/jfs_dinode.h
===================================================================
--- linux.orig/fs/jfs/jfs_dinode.h
+++ linux/fs/jfs/jfs_dinode.h
@@ -170,5 +170,7 @@ struct dinode {
#define JFS_IOC_GETFLAGS _IOR('f', 1, long)
#define JFS_IOC_SETFLAGS _IOW('f', 2, long)

+#define JFS_IOC_GETFLAGS32 _IOR('f', 1, int)
+#define JFS_IOC_SETFLAGS32 _IOW('f', 2, int)

#endif /*_H_JFS_DINODE */

2008-01-27 02:19:58

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [14/18] BKL-removal: Add unlocked_fasync


Add a new fops entry point to allow fasync without BKL. While it's arguably
unclear this entry point is called often enough for it really matters
it was still relatively easy to do. And there are far less async users
in the tree than ioctls so it's likely they can be all converted
eventually and then the non unlocked async entry point could be dropped.

Signed-off-by: Andi Kleen <[email protected]>

---
Documentation/filesystems/vfs.txt | 5 ++++-
fs/fcntl.c | 6 +++++-
fs/ioctl.c | 5 ++++-
include/linux/fs.h | 1 +
4 files changed, 14 insertions(+), 3 deletions(-)

Index: linux/fs/fcntl.c
===================================================================
--- linux.orig/fs/fcntl.c
+++ linux/fs/fcntl.c
@@ -240,11 +240,15 @@ static int setfl(int fd, struct file * f

lock_kernel();
if ((arg ^ filp->f_flags) & FASYNC) {
- if (filp->f_op && filp->f_op->fasync) {
+ if (filp->f_op && filp->f_op->unlocked_fasync)
+ error = filp->f_op->unlocked_fasync(fd, filp,
+ !!(arg & FASYNC));
+ else if (filp->f_op && filp->f_op->fasync) {
error = filp->f_op->fasync(fd, filp, (arg & FASYNC) != 0);
if (error < 0)
goto out;
}
+ /* AK: no else error = -EINVAL here? */
}

filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
Index: linux/fs/ioctl.c
===================================================================
--- linux.orig/fs/ioctl.c
+++ linux/fs/ioctl.c
@@ -118,7 +118,10 @@ int vfs_ioctl(struct file *filp, unsigne

/* Did FASYNC state change ? */
if ((flag ^ filp->f_flags) & FASYNC) {
- if (filp->f_op && filp->f_op->fasync) {
+ if (filp->f_op && filp->f_op->unlocked_fasync)
+ error = filp->f_op->unlocked_fasync(fd,
+ filp, on);
+ else if (filp->f_op && filp->f_op->fasync) {
lock_kernel();
error = filp->f_op->fasync(fd, filp, on);
unlock_kernel();
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -1179,6 +1179,7 @@ struct file_operations {
int (*fsync) (struct file *, struct dentry *, int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
+ int (*unlocked_fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
Index: linux/Documentation/filesystems/vfs.txt
===================================================================
--- linux.orig/Documentation/filesystems/vfs.txt
+++ linux/Documentation/filesystems/vfs.txt
@@ -769,6 +769,7 @@ struct file_operations {
int (*fsync) (struct file *, struct dentry *, int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
+ int (*unlocked_fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
@@ -828,7 +829,9 @@ otherwise noted.
fsync: called by the fsync(2) system call

fasync: called by the fcntl(2) system call when asynchronous
- (non-blocking) mode is enabled for a file
+ (non-blocking) mode is enabled for a file. BKL hold
+
+ unlocked_fasync: like fasync, but without BKL

lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
commands

2008-01-27 02:20:29

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [15/18] BKL-removal: Convert pipe over to unlocked_fasync


Signed-off-by: Andi Kleen <[email protected]>

---
fs/pipe.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/fs/pipe.c
===================================================================
--- linux.orig/fs/pipe.c
+++ linux/fs/pipe.c
@@ -788,7 +788,7 @@ const struct file_operations read_fifo_f
.ioctl = pipe_ioctl,
.open = pipe_read_open,
.release = pipe_read_release,
- .fasync = pipe_read_fasync,
+ .unlocked_fasync = pipe_read_fasync,
};

const struct file_operations write_fifo_fops = {
@@ -800,7 +800,7 @@ const struct file_operations write_fifo_
.ioctl = pipe_ioctl,
.open = pipe_write_open,
.release = pipe_write_release,
- .fasync = pipe_write_fasync,
+ .unlocked_fasync = pipe_write_fasync,
};

const struct file_operations rdwr_fifo_fops = {
@@ -813,7 +813,7 @@ const struct file_operations rdwr_fifo_f
.ioctl = pipe_ioctl,
.open = pipe_rdwr_open,
.release = pipe_rdwr_release,
- .fasync = pipe_rdwr_fasync,
+ .unlocked_fasync = pipe_rdwr_fasync,
};

static const struct file_operations read_pipe_fops = {
@@ -825,7 +825,7 @@ static const struct file_operations read
.ioctl = pipe_ioctl,
.open = pipe_read_open,
.release = pipe_read_release,
- .fasync = pipe_read_fasync,
+ .unlocked_fasync = pipe_read_fasync,
};

static const struct file_operations write_pipe_fops = {
@@ -837,7 +837,7 @@ static const struct file_operations writ
.ioctl = pipe_ioctl,
.open = pipe_write_open,
.release = pipe_write_release,
- .fasync = pipe_write_fasync,
+ .unlocked_fasync = pipe_write_fasync,
};

static const struct file_operations rdwr_pipe_fops = {
@@ -850,7 +850,7 @@ static const struct file_operations rdwr
.ioctl = pipe_ioctl,
.open = pipe_rdwr_open,
.release = pipe_rdwr_release,
- .fasync = pipe_rdwr_fasync,
+ .unlocked_fasync = pipe_rdwr_fasync,
};

struct pipe_inode_info * alloc_pipe_info(struct inode *inode)

2008-01-27 02:20:49

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [17/18] BKL-removal: Convert fuse to unlocked_fasync


Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/fuse/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/fs/fuse/dev.c
===================================================================
--- linux.orig/fs/fuse/dev.c
+++ linux/fs/fuse/dev.c
@@ -1040,7 +1040,7 @@ const struct file_operations fuse_dev_op
.aio_write = fuse_dev_write,
.poll = fuse_dev_poll,
.release = fuse_dev_release,
- .fasync = fuse_dev_fasync,
+ .unlocked_fasync = fuse_dev_fasync,
};

static struct miscdevice fuse_miscdevice = {

2008-01-27 02:21:10

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [18/18] BKL-removal: Convert bad_inode to unlocked_fasync


Not that it matters much, but it was easy.

Signed-off-by: Andi Kleen <[email protected]>

---
fs/bad_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/fs/bad_inode.c
===================================================================
--- linux.orig/fs/bad_inode.c
+++ linux/fs/bad_inode.c
@@ -174,7 +174,7 @@ static const struct file_operations bad_
.release = bad_file_release,
.fsync = bad_file_fsync,
.aio_fsync = bad_file_aio_fsync,
- .fasync = bad_file_fasync,
+ .unlocked_fasync = bad_file_fasync,
.lock = bad_file_lock,
.sendpage = bad_file_sendpage,
.get_unmapped_area = bad_file_get_unmapped_area,

2008-01-27 02:21:33

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [9/18] BKL-removal: Use unlocked_ioctl for jfs


Convert jfs_ioctl over to not use the BKL. The only potential race
I could see was with two ioctls in parallel changing the flags
and losing the updates. Use the i_mutex to protect against this.

Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/jfs/file.c | 2 +-
fs/jfs/ioctl.c | 13 ++++++++++---
fs/jfs/jfs_inode.h | 3 +--
fs/jfs/namei.c | 2 +-
4 files changed, 13 insertions(+), 7 deletions(-)

Index: linux/fs/jfs/ioctl.c
===================================================================
--- linux.orig/fs/jfs/ioctl.c
+++ linux/fs/jfs/ioctl.c
@@ -51,9 +51,9 @@ static long jfs_map_ext2(unsigned long f
}


-int jfs_ioctl(struct inode * inode, struct file * filp, unsigned int cmd,
- unsigned long arg)
+long jfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
+ struct inode *inode = filp->f_dentry->d_inode;
struct jfs_inode_info *jfs_inode = JFS_IP(inode);
unsigned int flags;

@@ -82,6 +82,10 @@ int jfs_ioctl(struct inode * inode, stru
/* Is it quota file? Do not allow user to mess with it */
if (IS_NOQUOTA(inode))
return -EPERM;
+
+ /* Lock against other parallel changes of flags */
+ mutex_lock(&inode->i_mutex);
+
jfs_get_inode_flags(jfs_inode);
oldflags = jfs_inode->mode2;

@@ -92,8 +96,10 @@ int jfs_ioctl(struct inode * inode, stru
if ((oldflags & JFS_IMMUTABLE_FL) ||
((flags ^ oldflags) &
(JFS_APPEND_FL | JFS_IMMUTABLE_FL))) {
- if (!capable(CAP_LINUX_IMMUTABLE))
+ if (!capable(CAP_LINUX_IMMUTABLE)) {
+ mutex_unlock(&inode->i_mutex);
return -EPERM;
+ }
}

flags = flags & JFS_FL_USER_MODIFIABLE;
@@ -101,6 +107,7 @@ int jfs_ioctl(struct inode * inode, stru
jfs_inode->mode2 = flags;

jfs_set_inode_flags(inode);
+ mutex_unlock(&inode->i_mutex);
inode->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
return 0;
Index: linux/fs/jfs/file.c
===================================================================
--- linux.orig/fs/jfs/file.c
+++ linux/fs/jfs/file.c
@@ -112,5 +112,5 @@ const struct file_operations jfs_file_op
.splice_write = generic_file_splice_write,
.fsync = jfs_fsync,
.release = jfs_release,
- .ioctl = jfs_ioctl,
+ .unlocked_ioctl = jfs_ioctl,
};
Index: linux/fs/jfs/jfs_inode.h
===================================================================
--- linux.orig/fs/jfs/jfs_inode.h
+++ linux/fs/jfs/jfs_inode.h
@@ -22,8 +22,7 @@ struct fid;

extern struct inode *ialloc(struct inode *, umode_t);
extern int jfs_fsync(struct file *, struct dentry *, int);
-extern int jfs_ioctl(struct inode *, struct file *,
- unsigned int, unsigned long);
+extern long jfs_ioctl(struct file *, unsigned int, unsigned long);
extern void jfs_read_inode(struct inode *);
extern int jfs_commit_inode(struct inode *, int);
extern int jfs_write_inode(struct inode*, int);
Index: linux/fs/jfs/namei.c
===================================================================
--- linux.orig/fs/jfs/namei.c
+++ linux/fs/jfs/namei.c
@@ -1562,7 +1562,7 @@ const struct file_operations jfs_dir_ope
.read = generic_read_dir,
.readdir = jfs_readdir,
.fsync = jfs_fsync,
- .ioctl = jfs_ioctl,
+ .unlocked_ioctl = jfs_ioctl,
};

static int jfs_ci_hash(struct dentry *dir, struct qstr *this)

2008-01-27 02:21:47

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [16/18] BKL-removal: Convert socket fasync to unlocked_fasync


Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
net/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/net/socket.c
===================================================================
--- linux.orig/net/socket.c
+++ linux/net/socket.c
@@ -131,7 +131,7 @@ static const struct file_operations sock
.mmap = sock_mmap,
.open = sock_no_open, /* special open code to disallow open via /proc */
.release = sock_close,
- .fasync = sock_fasync,
+ .unlocked_fasync = sock_fasync,
.sendpage = sock_sendpage,
.splice_write = generic_splice_sendpage,
};

2008-01-27 02:22:07

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [13/18] BKL-removal: Add compat_ioctl for cifs


Similar to the compat handlers of other file systems. The ioctls
are compatible except that they have different numbers.

Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/cifs/cifsfs.c | 15 +++++++++++++++
fs/cifs/cifsfs.h | 2 ++
fs/cifs/ioctl.c | 19 +++++++++++++++++++
3 files changed, 36 insertions(+)

Index: linux/fs/cifs/cifsfs.c
===================================================================
--- linux.orig/fs/cifs/cifsfs.c
+++ linux/fs/cifs/cifsfs.c
@@ -626,6 +626,9 @@ const struct file_operations cifs_file_o
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_POSIX
.unlocked_ioctl = cifs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = cifs_compat_ioctl,
+#endif
#endif /* CONFIG_CIFS_POSIX */

#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -646,6 +649,9 @@ const struct file_operations cifs_file_d
.splice_read = generic_file_splice_read,
#ifdef CONFIG_CIFS_POSIX
.unlocked_ioctl = cifs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = cifs_compat_ioctl,
+#endif
#endif /* CONFIG_CIFS_POSIX */
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -666,6 +672,9 @@ const struct file_operations cifs_file_n
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_POSIX
.unlocked_ioctl = cifs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = cifs_compat_ioctl,
+#endif
#endif /* CONFIG_CIFS_POSIX */

#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -685,6 +694,9 @@ const struct file_operations cifs_file_d
.splice_read = generic_file_splice_read,
#ifdef CONFIG_CIFS_POSIX
.unlocked_ioctl = cifs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = cifs_compat_ioctl,
+#endif
#endif /* CONFIG_CIFS_POSIX */
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -700,6 +712,9 @@ const struct file_operations cifs_dir_op
.dir_notify = cifs_dir_notify,
#endif /* CONFIG_CIFS_EXPERIMENTAL */
.unlocked_ioctl = cifs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = cifs_compat_ioctl,
+#endif
};

static void
Index: linux/fs/cifs/cifsfs.h
===================================================================
--- linux.orig/fs/cifs/cifsfs.h
+++ linux/fs/cifs/cifsfs.h
@@ -101,6 +101,8 @@ extern ssize_t cifs_getxattr(struct dent
extern ssize_t cifs_listxattr(struct dentry *, char *, size_t);
extern long cifs_ioctl(struct file *filep, unsigned int command,
unsigned long arg);
+extern long cifs_compat_ioctl(struct file *filep, unsigned int command,
+ unsigned long arg);

#ifdef CONFIG_CIFS_EXPERIMENTAL
extern const struct export_operations cifs_export_ops;
Index: linux/fs/cifs/ioctl.c
===================================================================
--- linux.orig/fs/cifs/ioctl.c
+++ linux/fs/cifs/ioctl.c
@@ -108,3 +108,22 @@ long cifs_ioctl(struct file *filep, unsi
FreeXid(xid);
return rc;
}
+
+#ifdef CONFIG_COMPAT
+#define FS_IOC_GETFLAGS32 _IOR('f', 1, int)
+#define FS_IOC_SETFLAGS32 _IOR('f', 2, int)
+
+long cifs_compat_ioctl(struct file *f, unsigned int command, unsigned long arg)
+{
+ /* Native ioctl is just mistyped, the real type is always int */
+ switch (command) {
+ case FS_IOC_GETFLAGS32:
+ command = FS_IOC_GETFLAGS;
+ break;
+ case FS_IOC_SETFLAGS32:
+ command = FS_IOC_SETFLAGS;
+ break;
+ }
+ return cifs_ioctl(f, command, arg);
+}
+#endif

2008-01-27 02:22:34

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [12/18] BKL-removal: Convert CIFS over to unlocked_ioctl


cifs_ioctl doesn't seem to need the BKL for anything, so convert it over
to use unlocked_ioctl.

Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/cifs/cifsfs.c | 10 +++++-----
fs/cifs/cifsfs.h | 4 ++--
fs/cifs/ioctl.c | 4 ++--
3 files changed, 9 insertions(+), 9 deletions(-)

Index: linux/fs/cifs/cifsfs.c
===================================================================
--- linux.orig/fs/cifs/cifsfs.c
+++ linux/fs/cifs/cifsfs.c
@@ -625,7 +625,7 @@ const struct file_operations cifs_file_o
.splice_read = generic_file_splice_read,
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_POSIX
- .ioctl = cifs_ioctl,
+ .unlocked_ioctl = cifs_ioctl,
#endif /* CONFIG_CIFS_POSIX */

#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -645,7 +645,7 @@ const struct file_operations cifs_file_d
.flush = cifs_flush,
.splice_read = generic_file_splice_read,
#ifdef CONFIG_CIFS_POSIX
- .ioctl = cifs_ioctl,
+ .unlocked_ioctl = cifs_ioctl,
#endif /* CONFIG_CIFS_POSIX */
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -665,7 +665,7 @@ const struct file_operations cifs_file_n
.splice_read = generic_file_splice_read,
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_POSIX
- .ioctl = cifs_ioctl,
+ .unlocked_ioctl = cifs_ioctl,
#endif /* CONFIG_CIFS_POSIX */

#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -684,7 +684,7 @@ const struct file_operations cifs_file_d
.flush = cifs_flush,
.splice_read = generic_file_splice_read,
#ifdef CONFIG_CIFS_POSIX
- .ioctl = cifs_ioctl,
+ .unlocked_ioctl = cifs_ioctl,
#endif /* CONFIG_CIFS_POSIX */
.llseek = cifs_llseek,
#ifdef CONFIG_CIFS_EXPERIMENTAL
@@ -699,7 +699,7 @@ const struct file_operations cifs_dir_op
#ifdef CONFIG_CIFS_EXPERIMENTAL
.dir_notify = cifs_dir_notify,
#endif /* CONFIG_CIFS_EXPERIMENTAL */
- .ioctl = cifs_ioctl,
+ .unlocked_ioctl = cifs_ioctl,
};

static void
Index: linux/fs/cifs/cifsfs.h
===================================================================
--- linux.orig/fs/cifs/cifsfs.h
+++ linux/fs/cifs/cifsfs.h
@@ -99,8 +99,8 @@ extern int cifs_setxattr(struct dentry
size_t, int);
extern ssize_t cifs_getxattr(struct dentry *, const char *, void *, size_t);
extern ssize_t cifs_listxattr(struct dentry *, char *, size_t);
-extern int cifs_ioctl(struct inode *inode, struct file *filep,
- unsigned int command, unsigned long arg);
+extern long cifs_ioctl(struct file *filep, unsigned int command,
+ unsigned long arg);

#ifdef CONFIG_CIFS_EXPERIMENTAL
extern const struct export_operations cifs_export_ops;
Index: linux/fs/cifs/ioctl.c
===================================================================
--- linux.orig/fs/cifs/ioctl.c
+++ linux/fs/cifs/ioctl.c
@@ -30,9 +30,9 @@

#define CIFS_IOC_CHECKUMOUNT _IO(0xCF, 2)

-int cifs_ioctl (struct inode *inode, struct file *filep,
- unsigned int command, unsigned long arg)
+long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
{
+ struct inode *inode = filep->f_dentry->d_inode;
int rc = -ENOTTY; /* strange error - but the precedent */
int xid;
struct cifs_sb_info *cifs_sb;

2008-01-27 02:22:47

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [11/18] BKL-removal: Convert ocfs2 over to unlocked_ioctl


As far as I can see there is nothing in ocfs2_ioctl that requires the BKL,
so use unlocked_ioctl

Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/ocfs2/file.c | 4 ++--
fs/ocfs2/ioctl.c | 12 +++---------
fs/ocfs2/ioctl.h | 3 +--
3 files changed, 6 insertions(+), 13 deletions(-)

Index: linux/fs/ocfs2/ioctl.c
===================================================================
--- linux.orig/fs/ocfs2/ioctl.c
+++ linux/fs/ocfs2/ioctl.c
@@ -111,9 +111,9 @@ bail:
return status;
}

-int ocfs2_ioctl(struct inode * inode, struct file * filp,
- unsigned int cmd, unsigned long arg)
+long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
+ struct inode *inode = filp->f_path.dentry->d_inode;
unsigned int flags;
int status;
struct ocfs2_space_resv sr;
@@ -148,9 +148,6 @@ int ocfs2_ioctl(struct inode * inode, st
#ifdef CONFIG_COMPAT
long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
- int ret;
-
switch (cmd) {
case OCFS2_IOC32_GETFLAGS:
cmd = OCFS2_IOC_GETFLAGS;
@@ -167,9 +164,6 @@ long ocfs2_compat_ioctl(struct file *fil
return -ENOIOCTLCMD;
}

- lock_kernel();
- ret = ocfs2_ioctl(inode, file, cmd, arg);
- unlock_kernel();
- return ret;
+ return ocfs2_ioctl(file, cmd, arg);
}
#endif
Index: linux/fs/ocfs2/ioctl.h
===================================================================
--- linux.orig/fs/ocfs2/ioctl.h
+++ linux/fs/ocfs2/ioctl.h
@@ -10,8 +10,7 @@
#ifndef OCFS2_IOCTL_H
#define OCFS2_IOCTL_H

-int ocfs2_ioctl(struct inode * inode, struct file * filp,
- unsigned int cmd, unsigned long arg);
+long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg);

#endif /* OCFS2_IOCTL_H */
Index: linux/fs/ocfs2/file.c
===================================================================
--- linux.orig/fs/ocfs2/file.c
+++ linux/fs/ocfs2/file.c
@@ -2212,7 +2212,7 @@ const struct file_operations ocfs2_fops
.open = ocfs2_file_open,
.aio_read = ocfs2_file_aio_read,
.aio_write = ocfs2_file_aio_write,
- .ioctl = ocfs2_ioctl,
+ .unlocked_ioctl = ocfs2_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ocfs2_compat_ioctl,
#endif
@@ -2224,7 +2224,7 @@ const struct file_operations ocfs2_dops
.read = generic_read_dir,
.readdir = ocfs2_readdir,
.fsync = ocfs2_sync_file,
- .ioctl = ocfs2_ioctl,
+ .unlocked_ioctl = ocfs2_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ocfs2_compat_ioctl,
#endif

2008-01-27 02:23:09

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [3/18] BKL-removal: Convert ext3 to use unlocked_ioctl


I checked ext3_ioctl and it looked largely safe to not be used
without BKL. So convert it over to unlocked_ioctl.

The only case where I wasn't quite sure was for the
dynamic fs grow ioctls versus umounting -- I kept the BKL for those.

Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/ext3/dir.c | 2 +-
fs/ext3/file.c | 2 +-
fs/ext3/ioctl.c | 21 +++++++++++----------
include/linux/ext3_fs.h | 3 +--
4 files changed, 14 insertions(+), 14 deletions(-)

Index: linux/fs/ext3/dir.c
===================================================================
--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -42,7 +42,7 @@ const struct file_operations ext3_dir_op
.llseek = generic_file_llseek,
.read = generic_read_dir,
.readdir = ext3_readdir, /* we take BKL. needed?*/
- .ioctl = ext3_ioctl, /* BKL held */
+ .unlocked_ioctl = ext3_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext3_compat_ioctl,
#endif
Index: linux/fs/ext3/file.c
===================================================================
--- linux.orig/fs/ext3/file.c
+++ linux/fs/ext3/file.c
@@ -112,7 +112,7 @@ const struct file_operations ext3_file_o
.write = do_sync_write,
.aio_read = generic_file_aio_read,
.aio_write = ext3_file_write,
- .ioctl = ext3_ioctl,
+ .unlocked_ioctl = ext3_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext3_compat_ioctl,
#endif
Index: linux/fs/ext3/ioctl.c
===================================================================
--- linux.orig/fs/ext3/ioctl.c
+++ linux/fs/ext3/ioctl.c
@@ -17,9 +17,9 @@
#include <linux/smp_lock.h>
#include <asm/uaccess.h>

-int ext3_ioctl (struct inode * inode, struct file * filp, unsigned int cmd,
- unsigned long arg)
+long ext3_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
+ struct inode *inode = filp->f_dentry->d_inode;
struct ext3_inode_info *ei = EXT3_I(inode);
unsigned int flags;
unsigned short rsv_window_size;
@@ -224,10 +224,14 @@ flags_err:
if (get_user(n_blocks_count, (__u32 __user *)arg))
return -EFAULT;

+ /* AK: not sure the BKL is needed, but this might prevent
+ * races against umount */
+ lock_kernel();
err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
journal_lock_updates(EXT3_SB(sb)->s_journal);
journal_flush(EXT3_SB(sb)->s_journal);
journal_unlock_updates(EXT3_SB(sb)->s_journal);
+ unlock_kernel();

return err;
}
@@ -245,11 +249,14 @@ flags_err:
if (copy_from_user(&input, (struct ext3_new_group_input __user *)arg,
sizeof(input)))
return -EFAULT;
-
+ /* AK: not sure the BKL is needed, but this might prevent
+ * races against umount */
+ lock_kernel();
err = ext3_group_add(sb, &input);
journal_lock_updates(EXT3_SB(sb)->s_journal);
journal_flush(EXT3_SB(sb)->s_journal);
journal_unlock_updates(EXT3_SB(sb)->s_journal);
+ unlock_kernel();

return err;
}
@@ -263,9 +270,6 @@ flags_err:
#ifdef CONFIG_COMPAT
long ext3_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
- int ret;
-
/* These are just misnamed, they actually get/put from/to user an int */
switch (cmd) {
case EXT3_IOC32_GETFLAGS:
@@ -305,9 +309,6 @@ long ext3_compat_ioctl(struct file *file
default:
return -ENOIOCTLCMD;
}
- lock_kernel();
- ret = ext3_ioctl(inode, file, cmd, (unsigned long) compat_ptr(arg));
- unlock_kernel();
- return ret;
+ return ext3_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
}
#endif
Index: linux/include/linux/ext3_fs.h
===================================================================
--- linux.orig/include/linux/ext3_fs.h
+++ linux/include/linux/ext3_fs.h
@@ -838,8 +838,7 @@ extern void ext3_get_inode_flags(struct
extern void ext3_set_aops(struct inode *inode);

/* ioctl.c */
-extern int ext3_ioctl (struct inode *, struct file *, unsigned int,
- unsigned long);
+extern long ext3_ioctl(struct file *, unsigned int, unsigned long);
extern long ext3_compat_ioctl (struct file *, unsigned int, unsigned long);

/* namei.c */

2008-01-27 02:23:37

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [6/18] BKL-removal: Convert ext4 to use unlocked_ioctl


I checked ext4_ioctl and it looked largely safe to not be used
without BKL. So convert it over to unlocked_ioctl.

The only case where I wasn't quite sure was for the
dynamic fs grow ioctls versus umounting -- I kept the BKL for those.

Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/ext4/dir.c | 2 +-
fs/ext4/file.c | 2 +-
fs/ext4/ioctl.c | 20 +++++++++++---------
include/linux/ext4_fs.h | 3 +--
4 files changed, 14 insertions(+), 13 deletions(-)

Index: linux/fs/ext4/dir.c
===================================================================
--- linux.orig/fs/ext4/dir.c
+++ linux/fs/ext4/dir.c
@@ -42,7 +42,7 @@ const struct file_operations ext4_dir_op
.llseek = generic_file_llseek,
.read = generic_read_dir,
.readdir = ext4_readdir, /* we take BKL. needed?*/
- .ioctl = ext4_ioctl, /* BKL held */
+ .unlocked_ioctl = ext4_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext4_compat_ioctl,
#endif
Index: linux/fs/ext4/file.c
===================================================================
--- linux.orig/fs/ext4/file.c
+++ linux/fs/ext4/file.c
@@ -112,7 +112,7 @@ const struct file_operations ext4_file_o
.write = do_sync_write,
.aio_read = generic_file_aio_read,
.aio_write = ext4_file_write,
- .ioctl = ext4_ioctl,
+ .unlocked_ioctl = ext4_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext4_compat_ioctl,
#endif
Index: linux/fs/ext4/ioctl.c
===================================================================
--- linux.orig/fs/ext4/ioctl.c
+++ linux/fs/ext4/ioctl.c
@@ -17,9 +17,9 @@
#include <linux/smp_lock.h>
#include <asm/uaccess.h>

-int ext4_ioctl (struct inode * inode, struct file * filp, unsigned int cmd,
- unsigned long arg)
+long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
+ struct inode *inode = filp->f_dentry->d_inode;
struct ext4_inode_info *ei = EXT4_I(inode);
unsigned int flags;
unsigned short rsv_window_size;
@@ -224,10 +224,14 @@ flags_err:
if (get_user(n_blocks_count, (__u32 __user *)arg))
return -EFAULT;

+ /* AK: not sure the BKL is needed, but this might prevent
+ * races against umount. */
+ lock_kernel();
err = ext4_group_extend(sb, EXT4_SB(sb)->s_es, n_blocks_count);
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
+ unlock_kernel();

return err;
}
@@ -246,10 +250,14 @@ flags_err:
sizeof(input)))
return -EFAULT;

+ /* AK: not sure the BKL is needed, but this might prevent
+ * races against umount. */
+ lock_kernel();
err = ext4_group_add(sb, &input);
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
+ unlock_kernel();

return err;
}
@@ -262,9 +270,6 @@ flags_err:
#ifdef CONFIG_COMPAT
long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
- int ret;
-
/* These are just misnamed, they actually get/put from/to user an int */
switch (cmd) {
case EXT4_IOC32_GETFLAGS:
@@ -304,9 +309,6 @@ long ext4_compat_ioctl(struct file *file
default:
return -ENOIOCTLCMD;
}
- lock_kernel();
- ret = ext4_ioctl(inode, file, cmd, (unsigned long) compat_ptr(arg));
- unlock_kernel();
- return ret;
+ return ext4_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
}
#endif
Index: linux/include/linux/ext4_fs.h
===================================================================
--- linux.orig/include/linux/ext4_fs.h
+++ linux/include/linux/ext4_fs.h
@@ -939,8 +939,7 @@ extern int ext4_block_truncate_page(hand
struct address_space *mapping, loff_t from);

/* ioctl.c */
-extern int ext4_ioctl (struct inode *, struct file *, unsigned int,
- unsigned long);
+extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
extern long ext4_compat_ioctl (struct file *, unsigned int, unsigned long);

/* namei.c */

2008-01-27 02:23:54

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [5/18] BKL-removal: Remove incorrect comment refering to lock_kernel() from jbd/jbd2


None of the callers of this function does actually take the BKL
as far as I can see. So remove the comment refering to the BKL.

Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
fs/jbd/recovery.c | 2 +-
fs/jbd2/recovery.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/fs/jbd/recovery.c
===================================================================
--- linux.orig/fs/jbd/recovery.c
+++ linux/fs/jbd/recovery.c
@@ -354,7 +354,7 @@ static int do_one_pass(journal_t *journa
struct buffer_head * obh;
struct buffer_head * nbh;

- cond_resched(); /* We're under lock_kernel() */
+ cond_resched();

/* If we already know where to stop the log traversal,
* check right now that we haven't gone past the end of
Index: linux/fs/jbd2/recovery.c
===================================================================
--- linux.orig/fs/jbd2/recovery.c
+++ linux/fs/jbd2/recovery.c
@@ -364,7 +364,7 @@ static int do_one_pass(journal_t *journa
struct buffer_head * obh;
struct buffer_head * nbh;

- cond_resched(); /* We're under lock_kernel() */
+ cond_resched();

/* If we already know where to stop the log traversal,
* check right now that we haven't gone past the end of

2008-01-27 07:05:59

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] [14/18] BKL-removal: Add unlocked_fasync

> Add a new fops entry point to allow fasync without BKL. While it's arguably
> unclear this entry point is called often enough for it really matters
> it was still relatively easy to do. And there are far less async users
> in the tree than ioctls so it's likely they can be all converted
> eventually and then the non unlocked async entry point could be dropped.

Excellent!
I like this patch :)

2008-01-27 11:15:31

by Bodo Eggert

[permalink] [raw]
Subject: Re: [PATCH] [14/18] BKL-removal: Add unlocked_fasync

> +++ linux/fs/fcntl.c
> @@ -240,11 +240,15 @@ static int setfl(int fd, struct file * f
>
> lock_kernel();
> if ((arg ^ filp->f_flags) & FASYNC) {
> - if (filp->f_op && filp->f_op->fasync) {
> + if (filp->f_op && filp->f_op->unlocked_fasync)
> + error = filp->f_op->unlocked_fasync(fd, filp,
> + !!(arg & FASYNC));
> + else if (filp->f_op && filp->f_op->fasync) {
> error = filp->f_op->fasync(fd, filp, (arg & FASYNC) !=
0);
> if (error < 0)
> goto out;

No goto if you use unlocked_fasync?

> }
> + /* AK: no else error = -EINVAL here? */
> }
>
> filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
> --
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at ?http://vger.kernel.org/majordomo-info.html
Please read the FAQ at ?http://www.tux.org/lkml/

2008-01-27 16:32:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [14/18] BKL-removal: Add unlocked_fasync v2

> No goto if you use unlocked_fasync?

Indeed. There was another problem in the patch too. Here's an updated
patch that also fixes another latent bug.

The whole f_flags still seems to be somewhat fragile because
the checks tend to happen without any lock, but that has not
changed to the previous state.

---

Add unlocked_fasync v2

Add a new fops entry point to allow fasync without BKL. While it's arguably
unclear this entry point is called often enough for it really matters
it was still relatively easy to do. And there are far less async users
in the tree than ioctls so it's likely they can be all converted
eventually and then the non unlocked async entry point could be dropped.

There was still the problem of the actual flags change being
protected against other setters of flags. Instead of using BKL
for this use the i_mutex now.

I also added a mutex_lock against one other flags change
that was lockless and could potentially lose updates.

Signed-off-by: Andi Kleen <[email protected]>

---
Documentation/filesystems/vfs.txt | 5 ++++-
fs/fcntl.c | 6 +++++-
fs/ioctl.c | 5 ++++-
include/linux/fs.h | 1 +
4 files changed, 14 insertions(+), 3 deletions(-)

Index: linux/fs/fcntl.c
===================================================================
--- linux.orig/fs/fcntl.c
+++ linux/fs/fcntl.c
@@ -238,18 +238,26 @@ static int setfl(int fd, struct file * f
if (error)
return error;

- lock_kernel();
+ /* AK: potential race here. Since test is unlocked fasync could
+ be called several times in a row with on. We hope the drivers
+ deal with it. */
if ((arg ^ filp->f_flags) & FASYNC) {
- if (filp->f_op && filp->f_op->fasync) {
- error = filp->f_op->fasync(fd, filp, (arg & FASYNC) != 0);
- if (error < 0)
- goto out;
+ int on = !!(arg & FASYNC);
+ if (filp->f_op && filp->f_op->unlocked_fasync)
+ error = filp->f_op->unlocked_fasync(fd, filp, on);
+ else if (filp->f_op && filp->f_op->fasync) {
+ lock_kernel();
+ error = filp->f_op->fasync(fd, filp, on);
+ unlock_kernel();
}
+ /* AK: no else error = -EINVAL here? */
+ if (error < 0)
+ return error;
}

+ mutex_lock(&filp->f_dentry->d_inode->i_mutex);
filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
- out:
- unlock_kernel();
+ mutex_unlock(&filp->f_dentry->d_inode->i_mutex);
return error;
}

Index: linux/fs/ioctl.c
===================================================================
--- linux.orig/fs/ioctl.c
+++ linux/fs/ioctl.c
@@ -105,10 +105,14 @@ int vfs_ioctl(struct file *filp, unsigne
if(O_NONBLOCK != O_NDELAY)
flag |= O_NDELAY;
#endif
+
+ mutex_lock(&filp->f_dentry->d_inode->i_mutex);
if (on)
filp->f_flags |= flag;
else
filp->f_flags &= ~flag;
+
+ mutex_unlock(&filp->f_dentry->d_inode->i_mutex);
break;

case FIOASYNC:
@@ -117,8 +121,13 @@ int vfs_ioctl(struct file *filp, unsigne
flag = on ? FASYNC : 0;

/* Did FASYNC state change ? */
+ /* AK: potential race here: ->fasync could be called with on two times
+ in a row. We hope the drivers deal with it. */
if ((flag ^ filp->f_flags) & FASYNC) {
- if (filp->f_op && filp->f_op->fasync) {
+ if (filp->f_op && filp->f_op->unlocked_fasync)
+ error = filp->f_op->unlocked_fasync(fd,
+ filp, on);
+ else if (filp->f_op && filp->f_op->fasync) {
lock_kernel();
error = filp->f_op->fasync(fd, filp, on);
unlock_kernel();
@@ -128,10 +137,12 @@ int vfs_ioctl(struct file *filp, unsigne
if (error != 0)
break;

+ mutex_lock(&filp->f_dentry->d_inode->i_mutex);
if (on)
filp->f_flags |= FASYNC;
else
filp->f_flags &= ~FASYNC;
+ mutex_unlock(&filp->f_dentry->d_inode->i_mutex);
break;

case FIOQSIZE:
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -1179,6 +1179,7 @@ struct file_operations {
int (*fsync) (struct file *, struct dentry *, int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
+ int (*unlocked_fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
Index: linux/Documentation/filesystems/vfs.txt
===================================================================
--- linux.orig/Documentation/filesystems/vfs.txt
+++ linux/Documentation/filesystems/vfs.txt
@@ -769,6 +769,7 @@ struct file_operations {
int (*fsync) (struct file *, struct dentry *, int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
+ int (*unlocked_fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
@@ -828,7 +829,9 @@ otherwise noted.
fsync: called by the fsync(2) system call

fasync: called by the fcntl(2) system call when asynchronous
- (non-blocking) mode is enabled for a file
+ (non-blocking) mode is enabled for a file. BKL hold
+
+ unlocked_fasync: like fasync, but without BKL

lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
commands

2008-01-27 16:57:36

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

Don't you need to a spinlock/spinunlock(i_lock) or something similar
(there isn't a spinlock in the file struct unfortunately) around the
reads and writes from f_pos in fs/read_write.c in remote_llseek with
your patch since the reads/writes from that field are not necessarily
atomic and threads could be racing in seek on the same file struct?

On Jan 26, 2008 8:17 PM, Andi Kleen <[email protected]> wrote:
>
> - Replace remote_llseek with remote_llseek_unlocked (to force compilation
> failures in all users)
> - Change all users to either use remote_llseek directly or take the
> BKL around. I changed the file systems who don't use the BKL
> for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
> take the BKL, but explicitely in their own source now.
>
> I moved them all over in a single patch to avoid unbisectable sections.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> fs/cifs/cifsfs.c | 2 +-
> fs/gfs2/ops_file.c | 4 ++--
> fs/ncpfs/file.c | 12 +++++++++++-
> fs/nfs/file.c | 6 +++++-
> fs/read_write.c | 7 +++----
> fs/smbfs/file.c | 11 ++++++++++-
> include/linux/fs.h | 3 ++-
> 7 files changed, 34 insertions(+), 11 deletions(-)
>
> Index: linux/fs/cifs/cifsfs.c
> ===================================================================
> --- linux.orig/fs/cifs/cifsfs.c
> +++ linux/fs/cifs/cifsfs.c
> @@ -549,7 +549,7 @@ static loff_t cifs_llseek(struct file *f
> if (retval < 0)
> return (loff_t)retval;
> }
> - return remote_llseek(file, offset, origin);
> + return remote_llseek_unlocked(file, offset, origin);
> }
>
> static struct file_system_type cifs_fs_type = {
> Index: linux/fs/gfs2/ops_file.c
> ===================================================================
> --- linux.orig/fs/gfs2/ops_file.c
> +++ linux/fs/gfs2/ops_file.c
> @@ -107,11 +107,11 @@ static loff_t gfs2_llseek(struct file *f
> error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY,
> &i_gh);
> if (!error) {
> - error = remote_llseek(file, offset, origin);
> + error = remote_llseek_unlocked(file, offset, origin);
> gfs2_glock_dq_uninit(&i_gh);
> }
> } else
> - error = remote_llseek(file, offset, origin);
> + error = remote_llseek_unlocked(file, offset, origin);
>
> return error;
> }
> Index: linux/fs/ncpfs/file.c
> ===================================================================
> --- linux.orig/fs/ncpfs/file.c
> +++ linux/fs/ncpfs/file.c
> @@ -18,6 +18,7 @@
> #include <linux/slab.h>
> #include <linux/vmalloc.h>
> #include <linux/sched.h>
> +#include <linux/smp_lock.h>
>
> #include <linux/ncp_fs.h>
> #include "ncplib_kernel.h"
> @@ -281,9 +282,18 @@ static int ncp_release(struct inode *ino
> return 0;
> }
>
> +static loff_t ncp_remote_llseek(struct file *file, loff_t offset, int origin)
> +{
> + loff_t ret;
> + lock_kernel();
> + ret = remote_llseek_unlocked(file, offset, origin);
> + unlock_kernel();
> + return ret;
> +}
> +
> const struct file_operations ncp_file_operations =
> {
> - .llseek = remote_llseek,
> + .llseek = ncp_remote_llseek,
> .read = ncp_file_read,
> .write = ncp_file_write,
> .ioctl = ncp_ioctl,
> Index: linux/fs/read_write.c
> ===================================================================
> --- linux.orig/fs/read_write.c
> +++ linux/fs/read_write.c
> @@ -58,11 +58,10 @@ loff_t generic_file_llseek(struct file *
>
> EXPORT_SYMBOL(generic_file_llseek);
>
> -loff_t remote_llseek(struct file *file, loff_t offset, int origin)
> +loff_t remote_llseek_unlocked(struct file *file, loff_t offset, int origin)
> {
> long long retval;
>
> - lock_kernel();
> switch (origin) {
> case SEEK_END:
> offset += i_size_read(file->f_path.dentry->d_inode);
> @@ -73,15 +72,15 @@ loff_t remote_llseek(struct file *file,
> retval = -EINVAL;
> if (offset>=0 && offset<=file->f_path.dentry->d_inode->i_sb->s_maxbytes) {
> if (offset != file->f_pos) {
> + /* AK: do we need a lock for those? */
> file->f_pos = offset;
> file->f_version = 0;
> }
> retval = offset;
> }
> - unlock_kernel();
> return retval;
> }
> -EXPORT_SYMBOL(remote_llseek);
> +EXPORT_SYMBOL(remote_llseek_unlocked);
>
> loff_t no_llseek(struct file *file, loff_t offset, int origin)
> {
> Index: linux/fs/smbfs/file.c
> ===================================================================
> --- linux.orig/fs/smbfs/file.c
> +++ linux/fs/smbfs/file.c
> @@ -422,9 +422,18 @@ smb_file_permission(struct inode *inode,
> return error;
> }
>
> +static loff_t smb_remote_llseek(struct file *file, loff_t offset, int origin)
> +{
> + loff_t ret;
> + lock_kernel();
> + ret = remote_llseek_unlocked(file, offset, origin);
> + unlock_kernel();
> + return ret;
> +}
> +
> const struct file_operations smb_file_operations =
> {
> - .llseek = remote_llseek,
> + .llseek = smb_remote_llseek,
> .read = do_sync_read,
> .aio_read = smb_file_aio_read,
> .write = do_sync_write,
> Index: linux/include/linux/fs.h
> ===================================================================
> --- linux.orig/include/linux/fs.h
> +++ linux/include/linux/fs.h
> @@ -1825,7 +1825,8 @@ extern void
> file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
> extern loff_t no_llseek(struct file *file, loff_t offset, int origin);
> extern loff_t generic_file_llseek(struct file *file, loff_t offset, int origin);
> -extern loff_t remote_llseek(struct file *file, loff_t offset, int origin);
> +extern loff_t remote_llseek_unlocked(struct file *file, loff_t offset,
> + int origin);
> extern int generic_file_open(struct inode * inode, struct file * filp);
> extern int nonseekable_open(struct inode * inode, struct file * filp);
>
> Index: linux/fs/nfs/file.c
> ===================================================================
> --- linux.orig/fs/nfs/file.c
> +++ linux/fs/nfs/file.c
> @@ -166,6 +166,7 @@ force_reval:
>
> static loff_t nfs_file_llseek(struct file *filp, loff_t offset, int origin)
> {
> + loff_t loff;
> /* origin == SEEK_END => we must revalidate the cached file length */
> if (origin == SEEK_END) {
> struct inode *inode = filp->f_mapping->host;
> @@ -173,7 +174,10 @@ static loff_t nfs_file_llseek(struct fil
> if (retval < 0)
> return (loff_t)retval;
> }
> - return remote_llseek(filp, offset, origin);
> + lock_kernel(); /* BKL needed? */
> + loff = remote_llseek_unlocked(filp, offset, origin);
> + unlock_kernel();
> + return loff;
> }
>
> /*
>



--
Thanks,

Steve

2008-01-27 18:46:33

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek


On Sun, 2008-01-27 at 10:57 -0600, Steve French wrote:
> Don't you need to a spinlock/spinunlock(i_lock) or something similar
> (there isn't a spinlock in the file struct unfortunately) around the
> reads and writes from f_pos in fs/read_write.c in remote_llseek with
> your patch since the reads/writes from that field are not necessarily
> atomic and threads could be racing in seek on the same file struct?

Where does is state in POSIX or SUS that we need to cater to that kind
of application?
In any case, the current behaviour of f_pos if two threads are sharing
the file struct is undefined no matter whether you spinlock or not,
since there is no special locking around sys_read() or sys_write().

Trond

2008-01-27 22:18:38

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

If two seeks overlap, can't you end up with an f_pos value that is
different than what either thread seeked to? or if you have a seek and
a read overlap can't you end up with the read occurring in the midst
of an update of f_pos (which takes more than one instruction on
various architectures), e.g. reading an f_pos, which has only the
lower half of a 64 bit field updated? I agree that you shouldn't
have seeks racing in parallel but I think it is preferable to get
either the updated f_pos or the earlier f_pos not something 1/2
updated.

On Jan 27, 2008 11:56 AM, Trond Myklebust <[email protected]> wrote:
>
> On Sun, 2008-01-27 at 10:57 -0600, Steve French wrote:
> > Don't you need to a spinlock/spinunlock(i_lock) or something similar
> > (there isn't a spinlock in the file struct unfortunately) around the
> > reads and writes from f_pos in fs/read_write.c in remote_llseek with
> > your patch since the reads/writes from that field are not necessarily
> > atomic and threads could be racing in seek on the same file struct?
>
> Where does is state in POSIX or SUS that we need to cater to that kind
> of application?
> In any case, the current behaviour of f_pos if two threads are sharing
> the file struct is undefined no matter whether you spinlock or not,
> since there is no special locking around sys_read() or sys_write().
>
> Trond
>



--
Thanks,

Steve

2008-01-27 23:05:36

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] [9/18] BKL-removal: Use unlocked_ioctl for jfs

On Sun, 2008-01-27 at 03:17 +0100, Andi Kleen wrote:
> Convert jfs_ioctl over to not use the BKL. The only potential race
> I could see was with two ioctls in parallel changing the flags
> and losing the updates. Use the i_mutex to protect against this.
>
> Cc: [email protected]
>
> Signed-off-by: Andi Kleen <[email protected]>

Added to the jfs git tree.

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2008-01-27 23:06:19

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] [10/18] BKL-removal: Implement a compat_ioctl handler for JFS

On Sun, 2008-01-27 at 03:17 +0100, Andi Kleen wrote:
> The ioctls were already compatible except for the actual values so this
> was fairly easy to do.
>
> Cc: [email protected]
>
> Signed-off-by: Andi Kleen <[email protected]>

Added to the jfs git tree.

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2008-01-27 23:09:17

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek


On Sun, 2008-01-27 at 16:18 -0600, Steve French wrote:
> If two seeks overlap, can't you end up with an f_pos value that is
> different than what either thread seeked to? or if you have a seek and
> a read overlap can't you end up with the read occurring in the midst
> of an update of f_pos (which takes more than one instruction on
> various architectures), e.g. reading an f_pos, which has only the
> lower half of a 64 bit field updated? I agree that you shouldn't
> have seeks racing in parallel but I think it is preferable to get
> either the updated f_pos or the earlier f_pos not something 1/2
> updated.

Why? The threads are doing something inherently liable to corrupt data
anyway. If they can race over the seek, why wouldn't they race over the
read or write too?
The race in lseek() should probably be the least of your worries in this
case.

Trond

2008-01-28 01:59:53

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] [0/18] Implement some low hanging BKL removal fruit in fs/*

On Sunday 27 January 2008 13:17, Andi Kleen wrote:
> [Andrew: I believe this is -mm material for .25]
>
> - Convert some more file systems (generally those who don't use the BKL
> for anything except mount) to use unlocked_bkl.
> - Implement BKL less fasync (see patch for the rationale)
> This is currently a separate entry point, but since the number of fasync
> users in the tree is relatively small I hope the older entry point can
> be removed then in the not too far future
> [help from other people converting more fasync users to unlocked_fasync
> would be appreciated]
> - Implement BKL less remote_llseek
> - While I was at it I also added a few missing compat ioctl handlers
> - Fix a few comments
>
> This fixes a lot of relatively trivial BKL users in fs/*. The main
> remaining non legacy offenders are now locks.c, nfs/nfsd and reiserfs.
> I believe BKL removal for all of those is being worked on by other people.
> Also a lot of "legacy" file systems still use it, but converting those
> does not seem to be very pressing.

BTW. here is a patch I did a while back for minix. I know it isn't
a big deal, but the work is done so I guess I should send it along.


Attachments:
(No filename) (1.15 kB)
minix-no-bkl.patch (4.11 kB)
Download all attachments

2008-01-28 02:43:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Sunday 27 January 2008 17:57:14 Steve French wrote:
> Don't you need to a spinlock/spinunlock(i_lock) or something similar
> (there isn't a spinlock in the file struct unfortunately) around the
> reads and writes from f_pos in fs/read_write.c in remote_llseek with
> your patch since the reads/writes from that field are not necessarily
> atomic and threads could be racing in seek on the same file struct?

Funny that you mention it. I actually noticed this too while working on this,
but noticed that it is wrong everywhere (as in even plain sys_write/read gets
it wrong). So I decided to not address it because it is already
broken.

I did actually send email to a few people about this, but no answer
yet.

I agree it's probably all broken on 32bit platforms, but I'm not
sure how to best address this. When it is comprehensively addressed remote_llseek
can use that new method too.

-Andi

2008-01-28 02:44:46

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Sunday 27 January 2008 23:18:26 Steve French wrote:
> If two seeks overlap, can't you end up with an f_pos value that is
> different than what either thread seeked to?

Yes you can on 32bit. Especially during the 4GB wrap

-Andi

2008-01-28 02:58:31

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Monday 28 January 2008 00:08:56 Trond Myklebust wrote:
>
> On Sun, 2008-01-27 at 16:18 -0600, Steve French wrote:
> > If two seeks overlap, can't you end up with an f_pos value that is
> > different than what either thread seeked to? or if you have a seek and
> > a read overlap can't you end up with the read occurring in the midst
> > of an update of f_pos (which takes more than one instruction on
> > various architectures), e.g. reading an f_pos, which has only the
> > lower half of a 64 bit field updated? I agree that you shouldn't
> > have seeks racing in parallel but I think it is preferable to get
> > either the updated f_pos or the earlier f_pos not something 1/2
> > updated.
>
> Why? The threads are doing something inherently liable to corrupt data
> anyway. If they can race over the seek, why wouldn't they race over the
> read or write too?
> The race in lseek() should probably be the least of your worries in this
> case.

The problem is that it's not a race in who gets to do its thing first, but a
parallel reader can actually see a corrupted value from the two independent
words on 32bit (e.g. during a 4GB). And this could actually completely corrupt
f_pos when it happens with two racing relative seeks or read/write()s

I would consider that a bug.

Fixes would be either to always take a spinlock to update this (nasty on
platforms where spinlocks are expensive like P4) or define some architecture
specific way to read/write 64bit values consistently. In theory also some
lazy locking seqlock like mechanism could be used, but that had the disadvantage
of being theoretically starvable.

-Andi

2008-01-28 03:15:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [0/18] Implement some low hanging BKL removal fruit in fs/*


> BTW. here is a patch I did a while back for minix. I know it isn't
> a big deal, but the work is done so I guess I should send it along.

Looks safe, although I'm surprised it actually gets around with such
little locking in general.

-Andi


2008-01-28 04:13:24

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek


On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> The problem is that it's not a race in who gets to do its thing first, but a
> parallel reader can actually see a corrupted value from the two independent
> words on 32bit (e.g. during a 4GB). And this could actually completely corrupt
> f_pos when it happens with two racing relative seeks or read/write()s
>
> I would consider that a bug.

I disagree. The corruption occurs because this isn't a situation that is
allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
to here?

Trond

2008-01-28 04:38:40

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
>
> On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> > The problem is that it's not a race in who gets to do its thing first, but a
> > parallel reader can actually see a corrupted value from the two independent
> > words on 32bit (e.g. during a 4GB). And this could actually completely corrupt
> > f_pos when it happens with two racing relative seeks or read/write()s
> >
> > I would consider that a bug.
>
> I disagree. The corruption occurs because this isn't a situation that is
> allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
> to here?

No specific spec, just general quality of implementation. We normally don't have
non thread safe system calls even if it was in theory allowed by some specification.

-Andi

2008-01-28 04:52:09

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek


On Mon, 2008-01-28 at 05:38 +0100, Andi Kleen wrote:
> On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
> >
> > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> > > The problem is that it's not a race in who gets to do its thing first, but a
> > > parallel reader can actually see a corrupted value from the two independent
> > > words on 32bit (e.g. during a 4GB). And this could actually completely corrupt
> > > f_pos when it happens with two racing relative seeks or read/write()s
> > >
> > > I would consider that a bug.
> >
> > I disagree. The corruption occurs because this isn't a situation that is
> > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
> > to here?
>
> No specific spec, just general quality of implementation. We normally don't have
> non thread safe system calls even if it was in theory allowed by some specification.

We've had the existing implementation for quite some time. The arguments
against changing it have been the same all along: if your application
wants to share files between threads, the portability argument implies
that you should either use pread/pwrite or use a mutex or some other
form of synchronisation primitive in order to ensure that
lseek()/read()/write() do not overlap.

Cheers
Trond

2008-01-28 05:16:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Mon, 28 Jan 2008 05:38:25 +0100 Andi Kleen <[email protected]> wrote:

> On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
> >
> > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> > > The problem is that it's not a race in who gets to do its thing first, but a
> > > parallel reader can actually see a corrupted value from the two independent
> > > words on 32bit (e.g. during a 4GB). And this could actually completely corrupt
> > > f_pos when it happens with two racing relative seeks or read/write()s
> > >
> > > I would consider that a bug.
> >
> > I disagree. The corruption occurs because this isn't a situation that is
> > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
> > to here?
>
> No specific spec, just general quality of implementation.

I completely agree. If one thread writes A and another writes B then the
kernel should record either A or B, not ((A & 0xffffffff00000000) | (B &
0xffffffff))

2008-01-28 05:35:38

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] [3/18] BKL-removal: Convert ext3 to use unlocked_ioctl

On Sun, 27 Jan 2008 03:17:09 +0100 (CET) Andi Kleen <[email protected]> wrote:

>
> I checked ext3_ioctl and it looked largely safe to not be used
> without BKL. So convert it over to unlocked_ioctl.
>
> The only case where I wasn't quite sure was for the
> dynamic fs grow ioctls versus umounting -- I kept the BKL for those.
>

Please cpoy linux-ext4 on ext2/3/4 material.

I skippped a lot of these patches because I just got bored of fixing
rejects. Now is a very optimistic time to be raising patches against
mainline.

I'm going to work on getting a unified devel tree operating: one which
contains everyone's latest stuff and is updated daily. Basically it'll be
-mm without a couple of the quilt trees. People can then prepare patches
against that, as it seems that most can't be bothered patching against -mm,
let alone building and testing it. More later.

> + /* AK: not sure the BKL is needed, but this might prevent
> + * races against umount */
> + lock_kernel();
> err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
> journal_lock_updates(EXT3_SB(sb)->s_journal);
> journal_flush(EXT3_SB(sb)->s_journal);
> journal_unlock_updates(EXT3_SB(sb)->s_journal);
> + unlock_kernel();
>
> return err;
> }
> @@ -245,11 +249,14 @@ flags_err:
> if (copy_from_user(&input, (struct ext3_new_group_input __user *)arg,
> sizeof(input)))
> return -EFAULT;
> -
> + /* AK: not sure the BKL is needed, but this might prevent
> + * races against umount */
> + lock_kernel();
> err = ext3_group_add(sb, &input);
> journal_lock_updates(EXT3_SB(sb)->s_journal);
> journal_flush(EXT3_SB(sb)->s_journal);
> journal_unlock_updates(EXT3_SB(sb)->s_journal);
> + unlock_kernel();
>

The ext3_ioctl() caller has an open fd against the fs - should be
sufficient to keep unmount away?

(gets even more rejects, drops all the fasync patches too)

It's all reached the stage of stupid.

2008-01-28 06:04:10

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [3/18] BKL-removal: Convert ext3 to use unlocked_ioctl

On Monday 28 January 2008 06:33, Andrew Morton wrote:
> On Sun, 27 Jan 2008 03:17:09 +0100 (CET) Andi Kleen <[email protected]> wrote:
> > I checked ext3_ioctl and it looked largely safe to not be used
> > without BKL. So convert it over to unlocked_ioctl.
> >
> > The only case where I wasn't quite sure was for the
> > dynamic fs grow ioctls versus umounting -- I kept the BKL for those.
>
> Please cpoy linux-ext4 on ext2/3/4 material.

Ok I'll resubmit those to tytso/ext4-devel (or perhaps he has already seen
them)

>
> I skippped a lot of these patches because I just got bored of fixing
> rejects. Now is a very optimistic time to be raising patches against
> mainline.

JFS and CIFS are already taken care of by the maintainers. This leaves
remote_llseek which touches a couple of file systems. Could you
perhaps take that one only please? And perhaps Nick's minix
patchkit which looks safe to me and is unlikely to cause conflicts.

> > + /* AK: not sure the BKL is needed, but this might prevent
> > + * races against umount */
> > + lock_kernel();
> > err = ext3_group_add(sb, &input);
> > journal_lock_updates(EXT3_SB(sb)->s_journal);
> > journal_flush(EXT3_SB(sb)->s_journal);
> > journal_unlock_updates(EXT3_SB(sb)->s_journal);
> > + unlock_kernel();
>
> The ext3_ioctl() caller has an open fd against the fs - should be
> sufficient to keep unmount away?

True. I am still conservative because group_add is a lot of code
which I didn't fully check. But with the open fd it's likely safe
to not take the BKL because there is nothing else (except
readdir?) in ext* that takes it.

> It's all reached the stage of stupid.

I'll resubmit ->fasync_unlocked against -mm.

Also I wanted to recheck the ->f_flags locking. I found one bug in those
already and I can extract the bug fix for that one.

-Andi

2008-01-28 08:19:29

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek


> I completely agree. If one thread writes A and another writes B then the
> kernel should record either A or B, not ((A & 0xffffffff00000000) | (B &
> 0xffffffff))

The problem is pretty nasty unfortunately. To solve it properly I think
the file_operations->read/write prototypes would need to be changed
because otherwise it is not possible to do atomic relative updates
of f_pos. Right now the actual update is burrowed deeply in the low level
read/write implementation. But that would be a huge impact all over
the tree :/

Or maybe define a new read/write64 and keep the default as 32bit only-- i
suppose most users don't really need 64bit. Still would be a nasty API
change.

-Andi

2008-01-28 09:46:35

by Bodo Eggert

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

Trond Myklebust <[email protected]> wrote:
> On Mon, 2008-01-28 at 05:38 +0100, Andi Kleen wrote:
>> On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
>> > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:

>> > > The problem is that it's not a race in who gets to do its thing first,
>> > > but a parallel reader can actually see a corrupted value from the two
>> > > independent words on 32bit (e.g. during a 4GB). And this could actually
>> > > completely corrupt f_pos when it happens with two racing relative seeks
>> > > or read/write()s
>> > >
>> > > I would consider that a bug.
>> >
>> > I disagree. The corruption occurs because this isn't a situation that is
>> > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
>> > to here?
>>
>> No specific spec, just general quality of implementation. We normally don't
>> have non thread safe system calls even if it was in theory allowed by some
>> specification.
>
> We've had the existing implementation for quite some time. The arguments
> against changing it have been the same all along: if your application
> wants to share files between threads, the portability argument implies
> that you should either use pread/pwrite or use a mutex or some other
> form of synchronisation primitive in order to ensure that
> lseek()/read()/write() do not overlap.

Does anything in the kernel depend on f_pos being valid?
E.g. is it possible to read beyond the EOF using this race, or to have files
larger than the ulimit?

If not, update the manpage and be done. ??

2008-01-28 13:03:14

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

> > No specific spec, just general quality of implementation.
>
> I completely agree. If one thread writes A and another writes B then the
> kernel should record either A or B, not ((A & 0xffffffff00000000) | (B &
> 0xffffffff))

Agree entirely: the spec doesn't allow for random scribbling in the wrong
place. It doesn't cover which goes first or who "wins" the race but
provides pwrite/pread for that situation. Writing somewhere unrelated is
definitely not to spec and not good.

2008-01-28 13:27:52

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Monday 28 January 2008 13:56:05 Alan Cox wrote:
> > > No specific spec, just general quality of implementation.
> >
> > I completely agree. If one thread writes A and another writes B then the
> > kernel should record either A or B, not ((A & 0xffffffff00000000) | (B &
> > 0xffffffff))
>
> Agree entirely: the spec doesn't allow for random scribbling in the wrong
> place. It doesn't cover which goes first or who "wins" the race but
> provides pwrite/pread for that situation. Writing somewhere unrelated is
> definitely not to spec

Actually it would probably -- i guess it's undefined and in undefined
country such things can happen.

Also to be fair I think it's only a problem for the 4GB wrapping case
which is presumably rare (otherwise we would have heard about it)

Also worse really fixing it would be a major change to the VFS
because of the way ->read/write are defined :/

-Andi

2008-01-28 13:44:46

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

> Also worse really fixing it would be a major change to the VFS
> because of the way ->read/write are defined :/

I don't see a problem there. ->read and ->write update the passed pointer
which is not the real f_pos anyway. Just the copies need fixing.

Alan

2008-01-28 14:10:51

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Monday 28 January 2008 14:38:57 Alan Cox wrote:
> > Also worse really fixing it would be a major change to the VFS
> > because of the way ->read/write are defined :/
>
> I don't see a problem there. ->read and ->write update the passed pointer
> which is not the real f_pos anyway. Just the copies need fixing.

They are effectually doing a decoupled read/modify/write cycle. e.g.:

A B

read fpos

read fpos

fpos += A fpos += B
write fpos


write fpos

So you get overlapping reads. Probably not good.

-Andi

2008-01-28 14:55:54

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Mon, 28 Jan 2008 15:10:34 +0100
Andi Kleen <[email protected]> wrote:

> On Monday 28 January 2008 14:38:57 Alan Cox wrote:
> > > Also worse really fixing it would be a major change to the VFS
> > > because of the way ->read/write are defined :/
> >
> > I don't see a problem there. ->read and ->write update the passed pointer
> > which is not the real f_pos anyway. Just the copies need fixing.
>
> They are effectually doing a decoupled read/modify/write cycle. e.g.:
>
> A B
>
> read fpos
>
> read fpos
>
> fpos += A fpos += B
> write fpos
>
>
> write fpos
>
> So you get overlapping reads. Probably not good.

No unix system I'm aware of cares about the read/write positioning during
parallel simultaneous reads or writes, with the exception of O_APPEND
which is strictly defined. The problem case is getting fpos != either
valid value.

2008-01-28 15:16:05

by Diego Calleja

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

El Mon, 28 Jan 2008 15:10:34 +0100, Andi Kleen <[email protected]> escribi?:

> So you get overlapping reads. Probably not good.

This was discussed in the past i think ->

http://lkml.org/lkml/2006/4/13/124
http://lkml.org/lkml/2006/4/13/130

2008-01-28 18:34:09

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

On Jan 28, 2008 2:17 AM, Andi Kleen <[email protected]> wrote:
> > I completely agree. If one thread writes A and another writes B then the
> > kernel should record either A or B, not ((A & 0xffffffff00000000) | (B &
> > 0xffffffff))
>
> The problem is pretty nasty unfortunately. To solve it properly I think
> the file_operations->read/write prototypes would need to be changed
> because otherwise it is not possible to do atomic relative updates
> of f_pos. Right now the actual update is burrowed deeply in the low level
> read/write implementation. But that would be a huge impact all over
> the tree :/

If there were a wrapper around reads and writes of f_pos as there is
for i_size e.g. it would hit a lot of code, but not as many as I had
originally thought. the most important ones are in the vfs itself, where
there are only 59 uses of the field (not all need to be changed). ext3
has fewer (25), and cifs only 12 uses.


--
Thanks,

Steve

2008-01-28 19:35:09

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek


On Mon, 2008-01-28 at 12:33 -0600, Steve French wrote:
> On Jan 28, 2008 2:17 AM, Andi Kleen <[email protected]> wrote:
> > > I completely agree. If one thread writes A and another writes B then the
> > > kernel should record either A or B, not ((A & 0xffffffff00000000) | (B &
> > > 0xffffffff))
> >
> > The problem is pretty nasty unfortunately. To solve it properly I think
> > the file_operations->read/write prototypes would need to be changed
> > because otherwise it is not possible to do atomic relative updates
> > of f_pos. Right now the actual update is burrowed deeply in the low level
> > read/write implementation. But that would be a huge impact all over
> > the tree :/
>
> If there were a wrapper around reads and writes of f_pos as there is
> for i_size e.g. it would hit a lot of code, but not as many as I had
> originally thought. the most important ones are in the vfs itself, where
> there are only 59 uses of the field (not all need to be changed). ext3
> has fewer (25), and cifs only 12 uses.

Most of the uses in ext3 and cifs deal with a directory's f_pos in
readdir, which is protected by i_mutex, so I don't think we need to
worry about them at all.
--
David Kleikamp
IBM Linux Technology Center