2009-01-15 22:55:45

by Jonathan Corbet

[permalink] [raw]
Subject: [PATCH, RFC] Remove fasync() BKL usage, take 3325

One of these years I've got to get this right. I've fixed the problem
pointed out by Oleg where f_flags would get changed even if fasync()
fails.

I have also taken out the ABI change. CCing the linux-api list because
I still think it's not quite right; fcntl() should not silently let
applications set the FASYNC flag if the underlying driver/filesystem
does not support it. But that's How We've Always Done It, and one
messes with such things at great risk. If we want fcntl() to return an
error in this case, it's an easy change.

This one's against 2.6.29-rc1. If I don't hear screaming, I'll drop
this one into linux-next.

Thanks,

jon

--

Accesses to the f_flags member of struct file involve read-modify-write
cycles; they have traditionally been done in a racy way. This patch
introduces a global spinlock to protect f_flags against concurrent
modifications.

Additionally, changes to the FASYNC flag and resulting calls to
f_op->fasync() need to be done in an atomic manner. Here, the BKL is
removed and FASYNC modifications are protected with a mutex.

Signed-off-by: Jonathan Corbet <[email protected]>
---
drivers/char/tty_io.c | 5 +--
fs/fcntl.c | 65 +++++++++++++++++++++++++++++++++++++++----------
fs/ioctl.c | 25 ++++---------------
fs/nfsd/vfs.c | 5 +++-
include/linux/fs.h | 17 +++++++++++++
5 files changed, 80 insertions(+), 37 deletions(-)

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index d33e5ab..8450316 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -2160,13 +2160,12 @@ static int fionbio(struct file *file, int __user *p)
if (get_user(nonblock, p))
return -EFAULT;

- /* file->f_flags is still BKL protected in the fs layer - vomit */
- lock_kernel();
+ lock_file_flags();
if (nonblock)
file->f_flags |= O_NONBLOCK;
else
file->f_flags &= ~O_NONBLOCK;
- unlock_kernel();
+ unlock_file_flags();
return 0;
}

diff --git a/fs/fcntl.c b/fs/fcntl.c
index cdc1419..ddd497d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -19,12 +19,16 @@
#include <linux/signal.h>
#include <linux/rcupdate.h>
#include <linux/pid_namespace.h>
-#include <linux/smp_lock.h>
+#include <linux/mutex.h>

#include <asm/poll.h>
#include <asm/siginfo.h>
#include <asm/uaccess.h>

+/* Serialize access to file->f_flags */
+DEFINE_SPINLOCK(file_flags_lock);
+EXPORT_SYMBOL(file_flags_lock);
+
void set_close_on_exec(unsigned int fd, int flag)
{
struct files_struct *files = current->files;
@@ -141,7 +145,7 @@ asmlinkage long sys_dup(unsigned int fildes)
return ret;
}

-#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | FASYNC | O_DIRECT | O_NOATIME)
+#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)

static int setfl(int fd, struct file * filp, unsigned long arg)
{
@@ -176,25 +180,60 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
if (error)
return error;

- /*
- * We still need a lock here for now to keep multiple FASYNC calls
- * from racing with each other.
- */
- lock_kernel();
if ((arg ^ filp->f_flags) & FASYNC) {
- if (filp->f_op && filp->f_op->fasync) {
- error = filp->f_op->fasync(fd, filp, (arg & FASYNC) != 0);
- if (error < 0)
- goto out;
- }
+ error = fasync_change(fd, filp, (arg & FASYNC) != 0);
+ if (error == -ENOTTY)
+ /*
+ * ABI compatibility: fcntl() has traditionally returned
+ * zero in this case (but ioctl() does not).
+ */
+ error = 0;
+ else if (error < 0)
+ goto out;
}

+ lock_file_flags();
filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
+ unlock_file_flags();
out:
- unlock_kernel();
return error;
}

+
+
+/*
+ * Change the setting of fasync, let the driver know.
+ * Not static because ioctl_fioasync() uses it too.
+ */
+int fasync_change(int fd, struct file *filp, int on)
+{
+ int ret = 0;
+ static DEFINE_MUTEX(fasync_mutex);
+
+ if (filp->f_op->fasync == NULL)
+ return -ENOTTY;
+
+ mutex_lock(&fasync_mutex);
+ /* Can test without flags lock, nobody else will change it */
+ if (((filp->f_flags & FASYNC) == 0) == (on == 0))
+ goto out;
+ ret = filp->f_op->fasync(fd, filp, on);
+ if (ret >= 0) {
+ lock_file_flags();
+ if (on)
+ filp->f_flags |= FASYNC;
+ else
+ filp->f_flags &= ~FASYNC;
+ unlock_file_flags();
+ }
+ out:
+ mutex_unlock(&fasync_mutex);
+ return ret;
+}
+
+
+
+
static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
uid_t uid, uid_t euid, int force)
{
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 20b0a8a..e5f85fb 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -404,10 +404,12 @@ static int ioctl_fionbio(struct file *filp, int __user *argp)
if (O_NONBLOCK != O_NDELAY)
flag |= O_NDELAY;
#endif
+ lock_file_flags();
if (on)
filp->f_flags |= flag;
else
filp->f_flags &= ~flag;
+ unlock_file_flags();
return error;
}

@@ -423,20 +425,9 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
flag = on ? FASYNC : 0;

/* Did FASYNC state change ? */
- if ((flag ^ filp->f_flags) & FASYNC) {
- if (filp->f_op && filp->f_op->fasync)
- error = filp->f_op->fasync(fd, filp, on);
- else
- error = -ENOTTY;
- }
- if (error)
- return error;
-
- if (on)
- filp->f_flags |= FASYNC;
- else
- filp->f_flags &= ~FASYNC;
- return error;
+ if ((flag ^ filp->f_flags) & FASYNC)
+ return fasync_change(fd, filp, on);
+ return 0;
}

static int ioctl_fsfreeze(struct file *filp)
@@ -499,17 +490,11 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
break;

case FIONBIO:
- /* BKL needed to avoid races tweaking f_flags */
- lock_kernel();
error = ioctl_fionbio(filp, argp);
- unlock_kernel();
break;

case FIOASYNC:
- /* BKL needed to avoid races tweaking f_flags */
- lock_kernel();
error = ioctl_fioasync(fd, filp, argp);
- unlock_kernel();
break;

case FIOQSIZE:
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6e50aaa..4965a5a 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -998,8 +998,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,

if (!EX_ISSYNC(exp))
stable = 0;
- if (stable && !EX_WGATHER(exp))
+ if (stable && !EX_WGATHER(exp)) {
+ lock_file_flags();
file->f_flags |= O_SYNC;
+ unlock_file_flags();
+ }

/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6022f44..ccba351 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -874,6 +874,23 @@ extern spinlock_t files_lock;
#define get_file(x) atomic_long_inc(&(x)->f_count)
#define file_count(x) atomic_long_read(&(x)->f_count)

+/*
+ * Serialize changes to file->f_flags. These should not be called
+ * from interrupt context.
+ */
+extern spinlock_t file_flags_lock;
+
+static inline void lock_file_flags(void)
+{
+ spin_lock(&file_flags_lock);
+}
+
+static inline void unlock_file_flags(void)
+{
+ spin_unlock(&file_flags_lock);
+}
+extern int fasync_change(int fd, struct file *filp, int on);
+
#ifdef CONFIG_DEBUG_WRITECOUNT
static inline void file_take_write(struct file *f)
{
--
1.6.1


2009-01-22 14:53:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

> On Thu, 15 Jan 2009 15:32:11 -0700 Jonathan Corbet <[email protected]> wrote:
> One of these years I've got to get this right. I've fixed the problem
> pointed out by Oleg where f_flags would get changed even if fasync()
> fails.
>
> I have also taken out the ABI change. CCing the linux-api list because
> I still think it's not quite right; fcntl() should not silently let
> applications set the FASYNC flag if the underlying driver/filesystem
> does not support it. But that's How We've Always Done It, and one
> messes with such things at great risk. If we want fcntl() to return an
> error in this case, it's an easy change.
>
> This one's against 2.6.29-rc1. If I don't hear screaming, I'll drop
> this one into linux-next.
>

scream.

>
> jon
>
> --
>
> Accesses to the f_flags member of struct file involve read-modify-write
> cycles; they have traditionally been done in a racy way. This patch
> introduces a global spinlock to protect f_flags against concurrent
> modifications.
>
> Additionally, changes to the FASYNC flag and resulting calls to
> f_op->fasync() need to be done in an atomic manner. Here, the BKL is
> removed and FASYNC modifications are protected with a mutex.
>
> Signed-off-by: Jonathan Corbet <[email protected]>
> ---
> drivers/char/tty_io.c | 5 +--
> fs/fcntl.c | 65 +++++++++++++++++++++++++++++++++++++++----------
> fs/ioctl.c | 25 ++++---------------
> fs/nfsd/vfs.c | 5 +++-
> include/linux/fs.h | 17 +++++++++++++
> 5 files changed, 80 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index d33e5ab..8450316 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -2160,13 +2160,12 @@ static int fionbio(struct file *file, int __user *p)
> if (get_user(nonblock, p))
> return -EFAULT;
>
> - /* file->f_flags is still BKL protected in the fs layer - vomit */
> - lock_kernel();
> + lock_file_flags();
> if (nonblock)
> file->f_flags |= O_NONBLOCK;
> else
> file->f_flags &= ~O_NONBLOCK;
> - unlock_kernel();
> + unlock_file_flags();

OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
straightforwad. But it's really really sad. It basically leaves a great
big FIXME in there. It'd be better to fix it.

We don't have a handy lock in struct file which could be borrowed.

- We could add one

- We could borrow file->f_path.dentry->d_inode->i_lock

- We could convert that field to long and use bitops (sounds nice?)

> return 0;
> }
>
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index cdc1419..ddd497d 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
>
> ...
>
> +/*
> + * Change the setting of fasync, let the driver know.
> + * Not static because ioctl_fioasync() uses it too.
> + */
> +int fasync_change(int fd, struct file *filp, int on)
> +{
> + int ret = 0;
> + static DEFINE_MUTEX(fasync_mutex);
> +
> + if (filp->f_op->fasync == NULL)
> + return -ENOTTY;
> +
> + mutex_lock(&fasync_mutex);
> + /* Can test without flags lock, nobody else will change it */
> + if (((filp->f_flags & FASYNC) == 0) == (on == 0))
> + goto out;
> + ret = filp->f_op->fasync(fd, filp, on);
> + if (ret >= 0) {
> + lock_file_flags();
> + if (on)
> + filp->f_flags |= FASYNC;
> + else
> + filp->f_flags &= ~FASYNC;
> + unlock_file_flags();
> + }
> + out:

column zero, please.

> + mutex_unlock(&fasync_mutex);
> + return ret;
> +}

It isn't completely obvious what fasync_mutex is protecting, why it exists.

A comment which explains this would be appropriate?

2009-01-22 15:54:27

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

> OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> straightforwad. But it's really really sad. It basically leaves a great
> big FIXME in there. It'd be better to fix it.

Also it might be that it's even worse than the BKL.

>
> We don't have a handy lock in struct file which could be borrowed.
>
> - We could add one
>
> - We could borrow file->f_path.dentry->d_inode->i_lock
>
> - We could convert that field to long and use bitops (sounds nice?)

It would still require a bitlock because some state in the low
level fasync needs to be protected.

Oleg has a proposal to do this using a flag bit which seemed
reasonable to me.

-Andi

2009-01-22 20:33:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Thu, Jan 22, 2009 at 06:51:04AM -0800, Andrew Morton wrote:
> OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> straightforwad. But it's really really sad. It basically leaves a great
> big FIXME in there. It'd be better to fix it.


Umm, we've been discussiong this in and out a guestimated million times.

Let's go forward with Jon's patch which is on obvious improvement and
if it shows problems later on we can revisit it.

There's this proverb about premature optimization, ya' know?

2009-01-23 04:41:35

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Thu, Jan 22, 2009 at 03:32:49PM -0500, Christoph Hellwig wrote:
> On Thu, Jan 22, 2009 at 06:51:04AM -0800, Andrew Morton wrote:
> > OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> > straightforwad. But it's really really sad. It basically leaves a great
> > big FIXME in there. It'd be better to fix it.
>
>
> Umm, we've been discussiong this in and out a guestimated million times.
>
> Let's go forward with Jon's patch which is on obvious improvement and
> if it shows problems later on we can revisit it.

The point was that we already have a better patch from Oleg.

-Andi
--
[email protected] -- Speaking for myself only.

2009-01-23 05:15:26

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Thu, 22 Jan 2009 06:51:04 -0800
Andrew Morton <[email protected]> wrote:

> OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> straightforwad. But it's really really sad. It basically leaves a
> great big FIXME in there. It'd be better to fix it.
>
> We don't have a handy lock in struct file which could be borrowed.

Yeah, I noticed that too.

> - We could add one

The problem there is that this bloats struct file, and that seemed like
something worth avoiding. It could easily be done, but I don't know
why we would before knowing that the global spinlock is a problem.

But... it's *already* protected by a global spinlock (the BKL) which is
(still) more widely used.

> - We could borrow file->f_path.dentry->d_inode->i_lock

I didn't think of that one. Using a lock which is three indirections
away seems a little obscure; again, I guess we could do that if the
global spinlock actually turns out to be a problem.

> - We could convert that field to long and use bitops (sounds nice?)

I did think of that one. Reasons not to include growing struct file
and the fact that there are places which set more than one flag at
once. So we'd replace assignments with loops - and we still don't
solve the fasync() problem.

So that was my thinking.

I'll address your other comments when I get back home.

Thanks,

jon

2009-01-23 05:21:59

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Thu, 22 Jan 2009 17:09:35 +0100
Andi Kleen <[email protected]> wrote:

> > OK, replacing a lock_kernel() with a spin_lock(&global_lock) is
> > pretty straightforwad. But it's really really sad. It basically
> > leaves a great big FIXME in there. It'd be better to fix it.
>
> Also it might be that it's even worse than the BKL.

I don't quite see how now. Like the BKL, it's a spinlock.

> It would still require a bitlock because some state in the low
> level fasync needs to be protected.
>
> Oleg has a proposal to do this using a flag bit which seemed
> reasonable to me.

I didn't see a reason to add a one-off custom locking regime for such a
non-hot-path situation. But it would certainly work; if we want to go
that way I'll not fight it.

jon

2009-01-23 05:32:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

> On Thu, 22 Jan 2009 22:15:00 -0700 Jonathan Corbet <[email protected]> wrote:
> On Thu, 22 Jan 2009 06:51:04 -0800
> Andrew Morton <[email protected]> wrote:
>
> > OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> > straightforwad. But it's really really sad. It basically leaves a
> > great big FIXME in there. It'd be better to fix it.
> >
> > We don't have a handy lock in struct file which could be borrowed.
>
> Yeah, I noticed that too.
>
> > - We could add one
>
> The problem there is that this bloats struct file, and that seemed like
> something worth avoiding.

Not a big deal, really. There's one of these for each presently-open file.
It's not like dentries and inodes, which we cache after userspace has
closed off the file handles.

> It could easily be done, but I don't know
> why we would before knowing that the global spinlock is a problem.
>
> But... it's *already* protected by a global spinlock (the BKL) which is
> (still) more widely used.
>
> > - We could borrow file->f_path.dentry->d_inode->i_lock
>
> I didn't think of that one. Using a lock which is three indirections
> away seems a little obscure; again, I guess we could do that if the
> global spinlock actually turns out to be a problem.
>
> > - We could convert that field to long and use bitops (sounds nice?)
>
> I did think of that one. Reasons not to include growing struct file
> and the fact that there are places which set more than one flag at
> once. So we'd replace assignments with loops - and we still don't
> solve the fasync() problem.
>

I don't know what "the fasync() problem" is?

2009-01-23 05:38:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

> I don't know what "the fasync() problem" is?

The state needs to be protected while the per driver ->fasync callback
runs, otherwise the bit can get out of sync with what the driver
thinks it is.

Mind you imho the best way would be to move the bit manipulation for
that into the drivers, but that would require to change them all.

-Andi
--
[email protected] -- Speaking for myself only.

2009-01-23 05:47:00

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Thu, 2009-01-22 at 21:31 -0800, Andrew Morton wrote:
> > On Thu, 22 Jan 2009 22:15:00 -0700 Jonathan Corbet <[email protected]> wrote:
> > On Thu, 22 Jan 2009 06:51:04 -0800
> > Andrew Morton <[email protected]> wrote:
> >
> > > OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> > > straightforwad. But it's really really sad. It basically leaves a
> > > great big FIXME in there. It'd be better to fix it.
> > >
> > > We don't have a handy lock in struct file which could be borrowed.
> >
> > Yeah, I noticed that too.
> >
> > > - We could add one
> >
> > The problem there is that this bloats struct file, and that seemed like
> > something worth avoiding.
>
> Not a big deal, really. There's one of these for each presently-open file.
> It's not like dentries and inodes, which we cache after userspace has
> closed off the file handles.

I have to agree with Christoph. The priority here is breaking down the
BKL and document all the things being protected by it and we've got a
reasonably obvious patch in that direction. Meanwhile, there's not
currently a pressing demand to make fasync in particular scale that I'm
aware of.

Having a single big lock here is quite possibly something we'll want to
fix down the road, agreed, but until we can actually measure it hurting
us, debating about whether to use a bit lock or reuse an existing lock
or add a new lock to all struct files is a bit premature.

--
http://selenic.com : development and support for Mercurial and Linux

2009-01-23 06:00:46

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

> I have to agree with Christoph. The priority here is breaking down the
> BKL and document all the things being protected by it and we've got a
> reasonably obvious patch in that direction. Meanwhile, there's not
> currently a pressing demand to make fasync in particular scale that I'm
> aware of.

The classic case is a high throughput network server that uses async
sockets. It has to call F_SETFL on each new socket it opens.

> Having a single big lock here is quite possibly something we'll want to
> fix down the road, agreed, but until we can actually measure it hurting
> us, debating about whether to use a bit lock or reuse an existing lock
> or add a new lock to all struct files is a bit premature.

I think i would agree with you if we didn't have a better patch
already, but if there's one it doesn't make sense not to use it.

-Andi

--
[email protected] -- Speaking for myself only.

2009-01-23 06:02:05

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Fri, 23 Jan 2009 06:54:04 +0100
Andi Kleen <[email protected]> wrote:

> The state needs to be protected while the per driver ->fasync callback
> runs, otherwise the bit can get out of sync with what the driver
> thinks it is.
>
> Mind you imho the best way would be to move the bit manipulation for
> that into the drivers, but that would require to change them all.

You know, I'm not sure why I didn't look into that. Do we want drivers
reaching directly into struct file and making changes? Maybe a helper
would be better. Hmm, maybe we could call it fasync_helper() and it
could just do the right thing? Will investigate further...

jon

2009-01-23 06:58:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

> On Fri, 23 Jan 2009 06:54:04 +0100 Andi Kleen <[email protected]> wrote:
> > I don't know what "the fasync() problem" is?
>
> The state needs to be protected while the per driver ->fasync callback
> runs, otherwise the bit can get out of sync with what the driver
> thinks it is.

That's the sort of gem which one thinks might have merited a code comment
and some changelog discussion.

> Mind you imho the best way would be to move the bit manipulation for
> that into the drivers, but that would require to change them all.

Do these mystery drivers do the ->f_flags changes under lock_kernel()? If
so, they all should be changed to take lock_file_flgs()?

2009-01-23 10:47:19

by Bernd Petrovitsch

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Fri, 2009-01-23 at 07:15 +0100, Andi Kleen wrote:
> > I have to agree with Christoph. The priority here is breaking down the
> > BKL and document all the things being protected by it and we've got a
> > reasonably obvious patch in that direction. Meanwhile, there's not
> > currently a pressing demand to make fasync in particular scale that I'm
> > aware of.
>
> The classic case is a high throughput network server that uses async
> sockets. It has to call F_SETFL on each new socket it opens.

Am I the only one missing an (additional) socket()-like sys-call with an
additional "flags" argument (somewhat similar to open())?
O_NONBLOCK is another flag that may be set quite often/regularly (at
least in my small world).

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2009-01-28 00:56:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Fri, 23 Jan 2009 05:56:46 +0100
Andi Kleen <[email protected]> wrote:

> On Thu, Jan 22, 2009 at 03:32:49PM -0500, Christoph Hellwig wrote:
> > On Thu, Jan 22, 2009 at 06:51:04AM -0800, Andrew Morton wrote:
> > > OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> > > straightforwad. But it's really really sad. It basically leaves a great
> > > big FIXME in there. It'd be better to fix it.
> >
> >
> > Umm, we've been discussiong this in and out a guestimated million times.
> >
> > Let's go forward with Jon's patch which is on obvious improvement and
> > if it shows problems later on we can revisit it.
>
> The point was that we already have a better patch from Oleg.
>

Where is this patch?

2009-01-28 03:17:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On 01/27, Andrew Morton wrote:
>
> On Fri, 23 Jan 2009 05:56:46 +0100
> Andi Kleen <[email protected]> wrote:
>
> > On Thu, Jan 22, 2009 at 03:32:49PM -0500, Christoph Hellwig wrote:
> > > On Thu, Jan 22, 2009 at 06:51:04AM -0800, Andrew Morton wrote:
> > > > OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> > > > straightforwad. But it's really really sad. It basically leaves a great
> > > > big FIXME in there. It'd be better to fix it.
> > >
> > >
> > > Umm, we've been discussiong this in and out a guestimated million times.
> > >
> > > Let's go forward with Jon's patch which is on obvious improvement and
> > > if it shows problems later on we can revisit it.
> >
> > The point was that we already have a better patch from Oleg.
> >
>
> Where is this patch?

I didn't send the actual patch. The idea is,

can't we use O_LOCK_FLAGS bit? I agree, it is a bit ugly,
and I won't insist if you don't like is.

static inline int try_lock_f_flags(struct file *file)
{
return !test_and_set_bit(O_LOCK_FLAGS, file->f_flags);
}

static inline set_f_flags(struct file *file, unsigned int flags)
{
file->f_flags = flags & ~O_LOCK_FLAGS;
}

Now, nobody should change ->f_flags directly (except create/open
pathes. For example, ioctl_fionbio() should be changed:

if (try_lock_f_flags(filp)) {
if (on)
set_f_flags(filp, filp->f_flags | flag);
else
set_f_flags(filp, filp->f_flags & ~flag);
}

If try_lock_f_flags() fails we do nothing, as if the current owner of
O_LOCK_FLAGS changes ->f_flags after us.

and, from another message,

No need to disable preemption, we never spin waiting for the
lock bit. If it is locked - somebody else updates ->f_flags,
we can pretend it does this after us. This can confuse F_GETFL
after F_SETFL (if F_SETFL "fails"), but I think in that case
user-space is wrong anyway, it must not do F_GETFL in parallel.

I'll try to make the patch tomorrow, but the problem is that I am not
sure this is not too ugly. At least Jonathan dislikes this approach,
and I do understand him ;)

Oleg.

2009-01-28 03:57:52

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, 28 Jan 2009 04:14:39 +0100
Oleg Nesterov <[email protected]> wrote:

> I'll try to make the patch tomorrow, but the problem is that I am not
> sure this is not too ugly.

FWIW, I'm partway through a new attempt using bitops for f_flags, moving
FASYNC flag handling into fasync_helper(), and doing away with
additional locks altogether. I have to get past the publication
deadline before I can finish it, though.

jon

2009-01-28 04:26:34

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On 01/27, Jonathan Corbet wrote:
>
> On Wed, 28 Jan 2009 04:14:39 +0100
> Oleg Nesterov <[email protected]> wrote:
>
> > I'll try to make the patch tomorrow, but the problem is that I am not
> > sure this is not too ugly.
>
> FWIW, I'm partway through a new attempt using bitops for f_flags, moving
> FASYNC flag handling into fasync_helper(), and doing away with
> additional locks altogether. I have to get past the publication
> deadline before I can finish it, though.

Great. I'd be happy to agree with another approach.

But please don't forget it is not strictly necessary f_op->fasync()
must use fasync_helper(). And we have users (pipe_rdwr_fasync) which
call fasync_helper() twice.

Oleg.

2009-01-28 14:14:09

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, 28 Jan 2009 05:23:37 +0100
Oleg Nesterov <[email protected]> wrote:

> But please don't forget it is not strictly necessary f_op->fasync()
> must use fasync_helper(). And we have users (pipe_rdwr_fasync) which
> call fasync_helper() twice.

Understood. Any such change will certainly require an audit of all
in-tree fasync() implementations; that's part of why it's taking a
little while. But it also cleans things up and solves the atomicity
problems nicely, so I think it's worth trying.

jon

2009-01-28 17:36:41

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, Jan 28, 2009 at 04:14:39AM +0100, Oleg Nesterov wrote:
> I didn't send the actual patch. The idea is,
>
> can't we use O_LOCK_FLAGS bit? I agree, it is a bit ugly,
> and I won't insist if you don't like is.
>
> static inline int try_lock_f_flags(struct file *file)
> {
> return !test_and_set_bit(O_LOCK_FLAGS, file->f_flags);
> }

->f_flags is an unsigned int and the bit macros need an unsigned long.
Increasing the size of struct file for this is probably a bad idea.

2009-01-28 17:44:28

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, 28 Jan 2009 12:36:18 -0500
Christoph Hellwig <[email protected]> wrote:

> ->f_flags is an unsigned int and the bit macros need an unsigned
> long. Increasing the size of struct file for this is probably a bad
> idea.

That was my concern too, initially, but akpm told me it was OK. From
earlier in the thread:

> > The problem there is that this bloats struct file, and that seemed like
> > something worth avoiding.
>
> Not a big deal, really. There's one of these for each presently-open file.
> It's not like dentries and inodes, which we cache after userspace has
> closed off the file handles.

If others disagree, and using bitops is not an idea which will fly, I'd
sure like to know sooner rather than later.

jon

2009-01-28 17:56:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, Jan 28, 2009 at 10:44:14AM -0700, Jonathan Corbet wrote:
> If others disagree, and using bitops is not an idea which will fly, I'd
> sure like to know sooner rather than later.

There are more than enough use cases that have large numbers of open
files (e.g. various high-end network servers). While it might not be
as sewer as for inodes I think it's really bad idea to do it for no
reason.

2009-01-28 18:15:29

by David Daney

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

Christoph Hellwig wrote:
> On Wed, Jan 28, 2009 at 04:14:39AM +0100, Oleg Nesterov wrote:
>> I didn't send the actual patch. The idea is,
>>
>> can't we use O_LOCK_FLAGS bit? I agree, it is a bit ugly,
>> and I won't insist if you don't like is.
>>
>> static inline int try_lock_f_flags(struct file *file)
>> {
>> return !test_and_set_bit(O_LOCK_FLAGS, file->f_flags);
>> }
>
> ->f_flags is an unsigned int and the bit macros need an unsigned long.
> Increasing the size of struct file for this is probably a bad idea.
>

Could that be seen as a deficiency in the bit macros?

Could we modify them so that they worked on unsigned int as well? I
know we could for some architectures.

David Daney

2009-01-28 18:15:48

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, 2009-01-28 at 12:55 -0500, Christoph Hellwig wrote:
> On Wed, Jan 28, 2009 at 10:44:14AM -0700, Jonathan Corbet wrote:
> > If others disagree, and using bitops is not an idea which will fly, I'd
> > sure like to know sooner rather than later.
>
> There are more than enough use cases that have large numbers of open
> files (e.g. various high-end network servers). While it might not be
> as sewer as for inodes I think it's really bad idea to do it for no
> reason.

Maybe we can just demote f_ep_lock to f_lock and share it?

Or extend flags and have two independent bitlocks in it. This actually
shrinks struct_file for most users.

--
http://selenic.com : development and support for Mercurial and Linux

2009-01-28 21:05:39

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, 28 Jan 2009 12:13:47 -0600
Matt Mackall <[email protected]> wrote:

> Maybe we can just demote f_ep_lock to f_lock and share it?

We'd have to move it out of the CONFIG_EPOLL ifdef, but that could be
done. The bigger problem is that there is resistance to putting a lock
around accesses to f_flags. I've tried that before...

It also doesn't solve the FASYNC atomicity problem unless we decide to
call ->fasync() with the lock held, and that opens yet another can of
worms.

> Or extend flags and have two independent bitlocks in it. This actually
> shrinks struct_file for most users.

That has some potential. We'd only need one bitlock (for epoll).
Biggest problem might be preventing the regular file flags from
expanding into the lock bit, especially since those flags are defined
in a pile of architecture-dependent .h files. I'll look into it.

(That's somebody's cue to tell me why this approach is all wrong
too...:)

Thanks,

jon

2009-01-29 14:22:20

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Wed, Jan 28, 2009 at 12:36:18PM -0500, Christoph Hellwig wrote:
> On Wed, Jan 28, 2009 at 04:14:39AM +0100, Oleg Nesterov wrote:
> > I didn't send the actual patch. The idea is,
> >
> > can't we use O_LOCK_FLAGS bit? I agree, it is a bit ugly,
> > and I won't insist if you don't like is.
> >
> > static inline int try_lock_f_flags(struct file *file)
> > {
> > return !test_and_set_bit(O_LOCK_FLAGS, file->f_flags);
> > }
>
> ->f_flags is an unsigned int and the bit macros need an unsigned long.
> Increasing the size of struct file for this is probably a bad idea.


I think very few architectures actually need the unsigned long.

For 2.4 I had a hack (when it did still matter for struct page)
to define a new type for this that denoted the minimal type size
needed here.

That could be reintroduced. Then only the few archs which really
require unsigned long here would pay the overhead.

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-23 21:07:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, RFC] Remove fasync() BKL usage, take 3325

On Fri, 23 Jan 2009 05:56:46 +0100
Andi Kleen <[email protected]> wrote:

> On Thu, Jan 22, 2009 at 03:32:49PM -0500, Christoph Hellwig wrote:
> > On Thu, Jan 22, 2009 at 06:51:04AM -0800, Andrew Morton wrote:
> > > OK, replacing a lock_kernel() with a spin_lock(&global_lock) is pretty
> > > straightforwad. But it's really really sad. It basically leaves a great
> > > big FIXME in there. It'd be better to fix it.
> >
> >
> > Umm, we've been discussiong this in and out a guestimated million times.
> >
> > Let's go forward with Jon's patch which is on obvious improvement and
> > if it shows problems later on we can revisit it.
>
> The point was that we already have a better patch from Oleg.
>

Where is this patch??