2003-03-15 10:10:37

by Alex Tomas

[permalink] [raw]
Subject: [PATCH] remove BKL from ext2's readdir


hi!

I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.
As far as I see in sources, there is no need in this BKL. So, here is small
patch and some benchmarks. of course, the result was expected pretty much ;)


500 1000
readdir+BKL: 0m11.793s 0m23.403s
readdir-BKL: 0m6.060s 0m12.113s

description: two processes read own dir populated by files (500 and 1000 files).
this repeats for 100000 times. the iron is dual 1GHz P3.


diff -uNr linux/fs/ext2/dir.c edited/fs/ext2/dir.c
--- linux/fs/ext2/dir.c Sat Mar 15 13:08:24 2003
+++ edited/fs/ext2/dir.c Sat Mar 15 13:08:11 2003
@@ -259,8 +259,6 @@
int need_revalidate = (filp->f_version != inode->i_version);
int ret = 0;

- lock_kernel();
-
if (pos > inode->i_size - EXT2_DIR_REC_LEN(1))
goto done;

@@ -313,7 +311,6 @@
filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
filp->f_version = inode->i_version;
UPDATE_ATIME(inode);
- unlock_kernel();
return 0;
}



2003-03-15 10:25:30

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

Alex Tomas <[email protected]> wrote:
>
>
> hi!
>
> I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.

Yes, I had this in -mm for ages, seem to have lost it. Also removal of BKL
from lseek().

The theory is that the lock is there to avoid f_pos races against lseek. We
ended up deciding that the way to address this is to ensure that all readdir
implementations do:

foo_readdir()
{
loff_t pos = file->f_pos;

....
<code which doesn't touch file->f_pos, but which modifies pos>
...

file->f_pos = pos;
}

ext2 does this right and does not need the lock_kernel(). Once all
filesystems have been audited (and, if necessary, fixed) we can remove the
BKL from lseek also.

2003-03-15 10:52:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

Andrew Morton <[email protected]> wrote:
>
> Alex Tomas <[email protected]> wrote:
> >
> >
> > hi!
> >
> > I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.
>
> Yes, I had this in -mm for ages, seem to have lost it. Also removal of BKL
> from lseek().
>
> The theory is that the lock is there to avoid f_pos races against lseek. We
> ended up deciding that the way to address this is to ensure that all readdir
> implementations do:
>

hm, no. lseek is using the directory's i_sem now, and readdir runs
under that too. So we should be able to remove lock_kernel() from
the readdir implementation of all filesystems which are using
generic_file_llseek().

2003-03-15 16:19:44

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

>>>>> Andrew Morton (AM) writes:

AM> hm, no. lseek is using the directory's i_sem now, and readdir
AM> runs under that too. So we should be able to remove
AM> lock_kernel() from the readdir implementation of all filesystems
AM> which are using generic_file_llseek().

looks like only coda and ntfs use generic_file_llseek(). other use
default_llseek(). what's the reason do not use generic_file_llseek().
historical only? or not?

2003-03-15 16:18:08

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

On Sat, 2003-03-15 at 05:36, Andrew Morton wrote:

> > I took a look at readdir() in 2.5.64's ext2 and found it serialized by BKL.
>
> Yes, I had this in -mm for ages, seem to have lost it. Also removal of BKL
> from lseek().

I moved lseek() from the BKL to the i_sem in early 2.5.

Robert Love

2003-03-15 20:00:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

Alex Tomas <[email protected]> wrote:
>
> >>>>> Andrew Morton (AM) writes:
>
> AM> hm, no. lseek is using the directory's i_sem now, and readdir
> AM> runs under that too. So we should be able to remove
> AM> lock_kernel() from the readdir implementation of all filesystems
> AM> which are using generic_file_llseek().
>
> looks like only coda and ntfs use generic_file_llseek(). other use
> default_llseek(). what's the reason do not use generic_file_llseek().
> historical only? or not?

grep again.

I've made the change to ext2 and ext3. Other filesystems will require
some thought to verify that the lock_kernel()s are not protecting
against some other random codepath.

2003-03-15 20:07:10

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

>>>>> Andrew Morton (AM) writes:


AM> grep again.

AM> I've made the change to ext2 and ext3. Other filesystems will
AM> require some thought to verify that the lock_kernel()s are not
AM> protecting against some other random codepath.


hmm:

[root@proto edited]# head Makefile
VERSION = 2
PATCHLEVEL = 5
SUBLEVEL = 64
EXTRAVERSION =


fs/ext2/dir.c:

struct file_operations ext2_dir_operations = {
.read = generic_read_dir,
.readdir = ext2_readdir,
.ioctl = ext2_ioctl,
.fsync = ext2_sync_file,
};



fs/read_write.c:

static inline loff_t llseek(struct file *file, loff_t offset, int origin)
{
loff_t (*fn)(struct file *, loff_t, int);

fn = default_llseek;
if (file->f_op && file->f_op->llseek)
fn = file->f_op->llseek;
return fn(file, offset, origin);
}

2003-03-15 20:13:43

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

Alex Tomas <[email protected]> wrote:
>
> fs/ext2/dir.c:
>
> struct file_operations ext2_dir_operations = {
> .read = generic_read_dir,
> .readdir = ext2_readdir,
> .ioctl = ext2_ioctl,
> .fsync = ext2_sync_file,
> };

ah, OK. How about this?

diff -puN fs/ext2/dir.c~lseek-ext2_readdir fs/ext2/dir.c
--- 25/fs/ext2/dir.c~lseek-ext2_readdir 2003-03-15 03:20:22.000000000 -0800
+++ 25-akpm/fs/ext2/dir.c 2003-03-15 12:21:56.000000000 -0800
@@ -259,8 +259,6 @@ ext2_readdir (struct file * filp, void *
int need_revalidate = (filp->f_version != inode->i_version);
int ret = 0;

- lock_kernel();
-
if (pos > inode->i_size - EXT2_DIR_REC_LEN(1))
goto done;

@@ -313,7 +311,6 @@ done:
filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
filp->f_version = inode->i_version;
UPDATE_ATIME(inode);
- unlock_kernel();
return 0;
}

@@ -660,6 +657,7 @@ not_empty:
}

struct file_operations ext2_dir_operations = {
+ .llseek = generic_file_llseek,
.read = generic_read_dir,
.readdir = ext2_readdir,
.ioctl = ext2_ioctl,
diff -puN fs/ext3/dir.c~lseek-ext2_readdir fs/ext3/dir.c
--- 25/fs/ext3/dir.c~lseek-ext2_readdir 2003-03-15 03:20:22.000000000 -0800
+++ 25-akpm/fs/ext3/dir.c 2003-03-15 12:22:14.000000000 -0800
@@ -37,10 +37,11 @@ static int ext3_release_dir (struct inod
struct file * filp);

struct file_operations ext3_dir_operations = {
+ .llseek = generic_file_llseek,
.read = generic_read_dir,
.readdir = ext3_readdir, /* we take BKL. needed?*/
.ioctl = ext3_ioctl, /* BKL held */
- .fsync = ext3_sync_file, /* BKL held */
+ .fsync = ext3_sync_file, /* BKL held */
#ifdef CONFIG_EXT3_INDEX
.release = ext3_release_dir,
#endif
@@ -98,8 +99,7 @@ static int ext3_readdir(struct file * fi
struct super_block * sb;
int err;
struct inode *inode = filp->f_dentry->d_inode;
-
- lock_kernel();
+ int ret = 0;

sb = inode->i_sb;

@@ -110,8 +110,8 @@ static int ext3_readdir(struct file * fi
((inode->i_size >> sb->s_blocksize_bits) == 1))) {
err = ext3_dx_readdir(filp, dirent, filldir);
if (err != ERR_BAD_DX_DIR) {
- unlock_kernel();
- return err;
+ ret = err;
+ goto out;
}
/*
* We don't set the inode dirty flag since it's not
@@ -191,8 +191,8 @@ revalidate:
filp->f_pos = (filp->f_pos |
(sb->s_blocksize - 1)) + 1;
brelse (bh);
- unlock_kernel();
- return stored;
+ ret = stored;
+ goto out;
}
offset += le16_to_cpu(de->rec_len);
if (le32_to_cpu(de->inode)) {
@@ -222,8 +222,8 @@ revalidate:
brelse (bh);
}
UPDATE_ATIME(inode);
- unlock_kernel();
- return 0;
+out:
+ return ret;
}

#ifdef CONFIG_EXT3_INDEX
diff -puN fs/ext2/file.c~lseek-ext2_readdir fs/ext2/file.c

_


2003-03-15 20:16:45

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

>>>>> Andrew Morton (AM) writes:

AM> ah, OK. How about this?

looks fine

2003-03-15 21:44:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

Andrew Morton <[email protected]> writes:

> foo_readdir()
> {
> loff_t pos = file->f_pos;
>
> ....
> <code which doesn't touch file->f_pos, but which modifies pos>
> ...
>
> file->f_pos = pos;
> }

At least for alpha this will require an rmb_depends() between the read
and the write. Probably on x86 an rmb() wouldn't hurt neither.

Otherwise there is no guarantee other CPUs see that intended memory
modification order

-Andi

2003-03-15 21:54:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] remove BKL from ext2's readdir

Andi Kleen <[email protected]> wrote:
>
> Andrew Morton <[email protected]> writes:
>
> > foo_readdir()
> > {
> > loff_t pos = file->f_pos;
> >
> > ....
> > <code which doesn't touch file->f_pos, but which modifies pos>
> > ...
> >
> > file->f_pos = pos;
> > }
>
> At least for alpha this will require an rmb_depends() between the read
> and the write. Probably on x86 an rmb() wouldn't hurt neither.
>
> Otherwise there is no guarantee other CPUs see that intended memory
> modification order
>

Won't the atomic operations in down() and up() do that?

It's all a bit moot really. If two user threads or processes are fighting
over the value of f_pos at the same time in this manner then they're buggy
anyway. All we need to do here is to ensure that the kernel doesn't do
anything grossly wrong or insecure.