2007-10-16 18:54:39

by Roland Dreier

[permalink] [raw]
Subject: file operations: release can race with read/write?

I have a question about the synchronization of file_operations: is it
intended that the .release method can be called while a .read or
.write method is still running?

The reason I ask is that I've seen a crash in practice that appears to
be caused by this race, and I'm wondering whether the correct fix is
to add synchronization to the driver's .release method (this is a
character device), or whether the core kernel is supposed to provide
this synchronization.

I've written a quick test case that seems to show this happen in
practice, and from the code in fs/open.c and fs/read_write.c it looks
possible as well: sys_read() and sys_write() just do fget_light(),
which will not increment the file's reference count on the fast path,
so a racing sys_close() from another thread could do the final fput()
that ends up calling the file's .release method before the read or
write has finished.

I think the actual file data structure is OK, because it is freed with
RCU, so it won't be freed too soon. But a .release method may free
other driver-internal state that is still in use because of this.

It seems that we wouldn't want to give up the fget_light()
optimization in read/write, so probably the right place to handle this
is in the driver. But on the other hand, this is kind of a subtle
booby trap that has been laid. So I would appreciate guidance from
smarter people.

Thanks,
Roland


BTW, here's the test case -- it seems pretty conclusive to me, but
maybe I'm all wet and misunderstanding things. There's a kernel
module that just prints "Write raced with close?" if it sees the race,
and a userspace driver that just spawns two threads, one that does
open/close in a loop and one that does write in a loop. Running this
on my machine, I see several race messages get printed.

--- kernel module ---

#include <linux/module.h>
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/delay.h>

MODULE_LICENSE("GPL");

static unsigned long flag;

static int cw_open(struct inode *inode, struct file *filp)
{
return 0;
}

static int cw_close(struct inode *inode, struct file *filp)
{
clear_bit(1, &flag);

msleep(1);

if (test_and_set_bit(1, &flag))
printk(KERN_ERR "Write raced with close?\n");

return 0;
}

static ssize_t cw_write(struct file *filp, const char __user *buf,
size_t len, loff_t *pos)
{
set_bit(1, &flag);

return 0;
}

static const struct file_operations cw_fops = {
.owner = THIS_MODULE,
.open = cw_open,
.release = cw_close,
.write = cw_write,
};

static struct miscdevice cw_misc = {
.minor = MISC_DYNAMIC_MINOR,
.name = "cw",
.fops = &cw_fops,
};

static int __init cw_init(void)
{
return misc_register(&cw_misc);
}

static void __exit cw_cleanup(void)
{
misc_deregister(&cw_misc);
}

module_init(cw_init);
module_exit(cw_cleanup);

--- userspace code ---

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>

static int dfd = -1;

static void *write_loop(void *p)
{
char buf[1];

while (1)
write(dfd, buf, 1);

return NULL;
}

int main(int argc, char *argv[])
{
pthread_t thr;

if (pthread_create(&thr, NULL, write_loop, NULL))
return 1;

while (1) {
dfd = open("/dev/cw", O_RDWR);
close(dfd);
}

return 0;
}


2007-10-17 18:31:25

by Roland Dreier

[permalink] [raw]
Subject: [RESEND] file operations: release can race with read/write?

[Resending, directly to a couple likely suspects this time, in the
hope of getting a reply... thanks]

I have a question about the synchronization of file_operations: is it
intended that the .release method of a file can be called while a
.read or .write method is still running for that file?

The reason I ask is that I've seen a crash in practice that appears to
be caused by this race, and I'm wondering whether the correct fix is
to add synchronization between .read/.write and .release to the
driver's methods (this is a character device), or whether the core
kernel is supposed to provide this synchronization.

I've written a quick test case that seems to show this happen in
practice, and reading the code in fs/open.c and fs/read_write.c I see
no reason why this race can't happen: sys_read() and sys_write() just
do fget_light(), which will not increment the file's reference count
on the fast path, so a racing sys_close() from another thread could do
the final fput() that ends up calling the file's .release method
before the read or write has finished.

I think the actual file data structure is OK, because it is freed with
RCU, so it won't be freed too soon. But a .release method may free
other driver-internal state that is still in use because of this race.

It seems that we wouldn't want to give up the fget_light()
optimization in read/write, so probably the right place to handle this
is in the driver. But on the other hand, this is kind of a subtle
booby trap that has been laid, and maybe there is a clever way to
handle this. So I would appreciate guidance from smarter people.

Thanks,
Roland


BTW, here's the test case -- it seems pretty conclusive to me, but
maybe I'm all wet and misunderstanding things. There's a kernel
module that just prints "Write raced with close?" if it sees the race,
and a userspace driver that just spawns two threads, one that does
open/close in a loop and one that does write in a loop. Running this
on my machine, I see several race messages get printed.

--- kernel module ---

#include <linux/module.h>
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/delay.h>

MODULE_LICENSE("GPL");

static unsigned long flag;

static int cw_open(struct inode *inode, struct file *filp)
{
return 0;
}

static int cw_close(struct inode *inode, struct file *filp)
{
clear_bit(1, &flag);

msleep(1);

if (test_and_set_bit(1, &flag))
printk(KERN_ERR "Write raced with close?\n");

return 0;
}

static ssize_t cw_write(struct file *filp, const char __user *buf,
size_t len, loff_t *pos)
{
set_bit(1, &flag);

return 0;
}

static const struct file_operations cw_fops = {
.owner = THIS_MODULE,
.open = cw_open,
.release = cw_close,
.write = cw_write,
};

static struct miscdevice cw_misc = {
.minor = MISC_DYNAMIC_MINOR,
.name = "cw",
.fops = &cw_fops,
};

static int __init cw_init(void)
{
return misc_register(&cw_misc);
}

static void __exit cw_cleanup(void)
{
misc_deregister(&cw_misc);
}

module_init(cw_init);
module_exit(cw_cleanup);

--- userspace code ---

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>

static int dfd = -1;

static void *write_loop(void *p)
{
char buf[1];

while (1)
write(dfd, buf, 1);

return NULL;
}

int main(int argc, char *argv[])
{
pthread_t thr;

if (pthread_create(&thr, NULL, write_loop, NULL))
return 1;

while (1) {
dfd = open("/dev/cw", O_RDWR);
close(dfd);
}

return 0;
}

2007-10-17 19:17:31

by Andrew Morton

[permalink] [raw]
Subject: Re: [RESEND] file operations: release can race with read/write?

On Wed, 17 Oct 2007 11:30:44 -0700 Roland Dreier <[email protected]> wrote:

> [Resending, directly to a couple likely suspects this time, in the
> hope of getting a reply... thanks]
>
> I have a question about the synchronization of file_operations: is it
> intended that the .release method of a file can be called while a
> .read or .write method is still running for that file?
>
> The reason I ask is that I've seen a crash in practice that appears to
> be caused by this race, and I'm wondering whether the correct fix is
> to add synchronization between .read/.write and .release to the
> driver's methods (this is a character device), or whether the core
> kernel is supposed to provide this synchronization.
>
> I've written a quick test case that seems to show this happen in
> practice, and reading the code in fs/open.c and fs/read_write.c I see
> no reason why this race can't happen: sys_read() and sys_write() just
> do fget_light(), which will not increment the file's reference count
> on the fast path, so a racing sys_close() from another thread could do
> the final fput() that ends up calling the file's .release method
> before the read or write has finished.

After pthread_create()'s clone() you should have files->count==2, so
fget_light() will do the full atomic_inc_not_zero() thing?

> I think the actual file data structure is OK, because it is freed with
> RCU, so it won't be freed too soon. But a .release method may free
> other driver-internal state that is still in use because of this race.
>
> It seems that we wouldn't want to give up the fget_light()
> optimization in read/write, so probably the right place to handle this
> is in the driver. But on the other hand, this is kind of a subtle
> booby trap that has been laid, and maybe there is a clever way to
> handle this. So I would appreciate guidance from smarter people.
>
> Thanks,
> Roland
>
>
> BTW, here's the test case -- it seems pretty conclusive to me, but
> maybe I'm all wet and misunderstanding things. There's a kernel
> module that just prints "Write raced with close?" if it sees the race,
> and a userspace driver that just spawns two threads, one that does
> open/close in a loop and one that does write in a loop. Running this
> on my machine, I see several race messages get printed.
>

That seems pretty conclusive. I haven't actually thought about this yet -
I just wanted to clarify that fget_light() thing.

2007-10-17 19:27:52

by Al Viro

[permalink] [raw]
Subject: Re: [RESEND] file operations: release can race with read/write?

On Wed, Oct 17, 2007 at 11:30:44AM -0700, Roland Dreier wrote:
> [Resending, directly to a couple likely suspects this time, in the
> hope of getting a reply... thanks]
>
> I have a question about the synchronization of file_operations: is it
> intended that the .release method of a file can be called while a
> .read or .write method is still running for that file?

> I've written a quick test case that seems to show this happen in
> practice, and reading the code in fs/open.c and fs/read_write.c I see
> no reason why this race can't happen: sys_read() and sys_write() just
> do fget_light(), which will not increment the file's reference count
> on the fast path, so a racing sys_close() from another thread could do
> the final fput() that ends up calling the file's .release method
> before the read or write has finished.

If you _have_ another thread to call close() from, you won't hit that
fast path. Case closed.

2007-10-17 19:33:33

by Al Viro

[permalink] [raw]
Subject: Re: [RESEND] file operations: release can race with read/write?

On Wed, Oct 17, 2007 at 11:30:44AM -0700, Roland Dreier wrote:
> static unsigned long flag;
>
> static int cw_open(struct inode *inode, struct file *filp)
> {
> return 0;
> }
>
> static int cw_close(struct inode *inode, struct file *filp)
> {
> clear_bit(1, &flag);
>
> msleep(1);
>
> if (test_and_set_bit(1, &flag))
> printk(KERN_ERR "Write raced with close?\n");

Of _course_ you can trigger that - just open two files and have write
on one of them while another gets closed.

That is certainly possible, but it has nothing whatsoever with the
race you've described - you have two different objects and method
call on one of them might happen when destructor is called on another.

You have protection against ->release() for a struct file called
while you are in ->write() for _same_ struct file.

2007-10-17 22:07:04

by Roland Dreier

[permalink] [raw]
Subject: Re: [RESEND] file operations: release can race with read/write?

> After pthread_create()'s clone() you should have files->count==2, so
> fget_light() will do the full atomic_inc_not_zero() thing?

Indeed. Thanks for the clue.

I guess the search for the bug goes on. (I only have a picture of the
tail end of one oops message to work with so far, hence the wild
speculation).

Thanks,
Roland

2007-10-17 22:10:21

by Roland Dreier

[permalink] [raw]
Subject: Re: [RESEND] file operations: release can race with read/write?

> Of _course_ you can trigger that - just open two files and have write
> on one of them while another gets closed.

Right, thanks for the explanation.

- R.