2005-04-03 02:01:15

by Tomita, Haruo

[permalink] [raw]
Subject: Isn't there race issue during fput() and the dentry_open()?

Isn't there race issue during fput() and the dentry_open()?
When booting kernel, the following deadlocks are experienced.

Stack traceback for pid 2130
0xf717f1b0 2130 1 1 0 R 0xf717f400 *irqbalance
ESP EIP Function (args)
0xf75bfe38 0xc02d04b2 _spin_lock+0x2e (0xf7441a80)
0xf75bff34 0xc015667c file_move+0x14 (0xf63080e4, 0xf75bff58, 0x0, 0xf74bf000)
0xf75bff40 0xc0154e37 dentry_open+0xb9 (0xf63080e4, 0xf7f5ad80, 0xc02d00e6, 0x100100, 0x246)
0xf75bff58 0xc0154d78 filp_open+0x36
0xf75bffb4 0xc0155079 sys_open+0x31
0xf75bffc4 0xc02d196f syscall_call+0x7

The patch was made. Is this patch right?

diff -urN linux-2.6.12-rc1.orig/fs/file_table.c linux-2.6.12-rc1/fs/file_table.c
--- linux-2.6.12-rc1.orig/fs/file_table.c 2005-03-02 16:37:47.000000000 +0900
+++ linux-2.6.12-rc1/fs/file_table.c 2005-03-31 17:50:46.323999320 +0900
@@ -209,11 +209,11 @@

void file_kill(struct file *file)
{
+ file_list_lock();
if (!list_empty(&file->f_list)) {
- file_list_lock();
list_del_init(&file->f_list);
- file_list_unlock();
}
+ file_list_unlock();
}

int fs_may_remount_ro(struct super_block *sb)

--
Haruo


2005-04-03 05:00:34

by Al Viro

[permalink] [raw]
Subject: Re: Isn't there race issue during fput() and the dentry_open()?

On Sun, Apr 03, 2005 at 10:56:44AM +0900, Tomita, Haruo wrote:
> Isn't there race issue during fput() and the dentry_open()?
> When booting kernel, the following deadlocks are experienced.


> Stack traceback for pid 2130
> 0xf717f1b0 2130 1 1 0 R 0xf717f400 *irqbalance
> ESP EIP Function (args)
> 0xf75bfe38 0xc02d04b2 _spin_lock+0x2e (0xf7441a80)
> 0xf75bff34 0xc015667c file_move+0x14 (0xf63080e4, 0xf75bff58, 0x0, 0xf74bf000)
> 0xf75bff40 0xc0154e37 dentry_open+0xb9 (0xf63080e4, 0xf7f5ad80, 0xc02d00e6, 0x100100, 0x246)
> 0xf75bff58 0xc0154d78 filp_open+0x36
> 0xf75bffb4 0xc0155079 sys_open+0x31
> 0xf75bffc4 0xc02d196f syscall_call+0x7
>
> The patch was made. Is this patch right?
>
> diff -urN linux-2.6.12-rc1.orig/fs/file_table.c linux-2.6.12-rc1/fs/file_table.c
> --- linux-2.6.12-rc1.orig/fs/file_table.c 2005-03-02 16:37:47.000000000 +0900
> +++ linux-2.6.12-rc1/fs/file_table.c 2005-03-31 17:50:46.323999320 +0900
> @@ -209,11 +209,11 @@
>
> void file_kill(struct file *file)
> {
> + file_list_lock();
> if (!list_empty(&file->f_list)) {
> - file_list_lock();
> list_del_init(&file->f_list);
> - file_list_unlock();
> }
> + file_list_unlock();
> }

This is absolutely useless. What are you trying to protect and how the
hell could keeping a lock around that check prevent any sort of deadlock?

Besides, who could possibly call fput() on struct file allocated by
dentry_open() and do that before the latter returns a reference to
that struct file?

2005-04-03 23:46:59

by Tomita, Haruo

[permalink] [raw]
Subject: RE: Isn't there race issue during fput() and the dentry_open()?

Hi Viro,

Thank you for your replay.

> > Stack traceback for pid 2130
> > 0xf717f1b0 2130 1 1 0 R
> 0xf717f400 *irqbalance
> > ESP EIP Function (args)
> > 0xf75bfe38 0xc02d04b2 _spin_lock+0x2e (0xf7441a80)
> > 0xf75bff34 0xc015667c file_move+0x14 (0xf63080e4,
> 0xf75bff58, 0x0, 0xf74bf000)
> > 0xf75bff40 0xc0154e37 dentry_open+0xb9 (0xf63080e4,
> 0xf7f5ad80, 0xc02d00e6, 0x100100, 0x246)
> > 0xf75bff58 0xc0154d78 filp_open+0x36
> > 0xf75bffb4 0xc0155079 sys_open+0x31
> > 0xf75bffc4 0xc02d196f syscall_call+0x7
> >
> > The patch was made. Is this patch right?
> >
> > diff -urN linux-2.6.12-rc1.orig/fs/file_table.c
> linux-2.6.12-rc1/fs/file_table.c
> > --- linux-2.6.12-rc1.orig/fs/file_table.c 2005-03-02
> 16:37:47.000000000 +0900
> > +++ linux-2.6.12-rc1/fs/file_table.c 2005-03-31
> 17:50:46.323999320 +0900
> > @@ -209,11 +209,11 @@
> >
> > void file_kill(struct file *file)
> > {
> > + file_list_lock();
> > if (!list_empty(&file->f_list)) {
> > - file_list_lock();
> > list_del_init(&file->f_list);
> > - file_list_unlock();
> > }
> > + file_list_unlock();
> > }
>
> This is absolutely useless. What are you trying to protect
> and how the
> hell could keeping a lock around that check prevent any sort
> of deadlock?

I think that it is true not to be able to acquire file_list_lock()
that is called from file_move().

> Besides, who could possibly call fput() on struct file allocated by
> dentry_open() and do that before the latter returns a reference to
> that struct file?

Indeed, Is there a good method of debugging this issue?
In the check on the source, a doubtful place was not found except file_kill().
I might call not race of fput() and the dentry_open() but
the deadlock of file_list_lock().
--
Haruo

2005-04-04 10:50:38

by Al Viro

[permalink] [raw]
Subject: Re: Isn't there race issue during fput() and the dentry_open()?

On Mon, Apr 04, 2005 at 08:42:30AM +0900, Tomita, Haruo wrote:
> Indeed, Is there a good method of debugging this issue?
> In the check on the source, a doubtful place was not found except file_kill().

The obvious way would be to add a variable and do something like
#define file_list_lock() \
do { \
spin_lock(&files_lock); \
holder_pid = pid; \
} while(0)
and add a way to check its value (anything - sysrq, whatever).

That assumes that you can reproduce that deadlock at will, obviously...

2005-04-05 04:47:20

by Tomita, Haruo

[permalink] [raw]
Subject: RE: Isn't there race issue during fput() and the dentry_open()?

Hi Viro,

Thank you for your help and advice.
I made the following patches referring to your advice.
I try to debugging by using this patch.

Thanks again,
Haruo

diff -urN linux-2.6.12-rc2.orig/fs/file_table.c linux-2.6.12-rc2/fs/file_table.c
--- linux-2.6.12-rc2.orig/fs/file_table.c 2005-03-02 16:37:47.000000000 +0900
+++ linux-2.6.12-rc2/fs/file_table.c 2005-04-05 11:21:58.000000000 +0900
@@ -26,6 +26,7 @@

/* public. Not pretty! */
__cacheline_aligned_in_smp DEFINE_SPINLOCK(files_lock);
+pid_t holder_pid;

static DEFINE_SPINLOCK(filp_count_lock);

diff -urN linux-2.6.12-rc2.orig/include/linux/fs.h linux-2.6.12-rc2/include/linux/fs.h
--- linux-2.6.12-rc2.orig/include/linux/fs.h 2005-04-05 11:12:53.000000000 +0900
+++ linux-2.6.12-rc2/include/linux/fs.h 2005-04-05 11:20:15.000000000 +0900
@@ -602,8 +602,17 @@
struct address_space *f_mapping;
};
extern spinlock_t files_lock;
-#define file_list_lock() spin_lock(&files_lock);
-#define file_list_unlock() spin_unlock(&files_lock);
+extern pid_t holder_pid;
+#define file_list_lock() \
+ do { \
+ spin_lock(&files_lock); \
+ holder_pid = current->pid; \
+ }while(0)
+#define file_list_unlock() \
+ do { \
+ holder_pid = 0; \
+ spin_unlock(&files_lock); \
+ }while(0)

#define get_file(x) atomic_inc(&(x)->f_count)
#define file_count(x) atomic_read(&(x)->f_count)