Isn't there race issue during fput() and the dentry_open()?
When booting kernel, the following deadlocks are experienced.
Stack traceback for pid 2130
0xf717f1b0 2130 1 1 0 R 0xf717f400 *irqbalance
ESP EIP Function (args)
0xf75bfe38 0xc02d04b2 _spin_lock+0x2e (0xf7441a80)
0xf75bff34 0xc015667c file_move+0x14 (0xf63080e4, 0xf75bff58, 0x0, 0xf74bf000)
0xf75bff40 0xc0154e37 dentry_open+0xb9 (0xf63080e4, 0xf7f5ad80, 0xc02d00e6, 0x100100, 0x246)
0xf75bff58 0xc0154d78 filp_open+0x36
0xf75bffb4 0xc0155079 sys_open+0x31
0xf75bffc4 0xc02d196f syscall_call+0x7
The patch was made. Is this patch right?
diff -urN linux-2.6.12-rc1.orig/fs/file_table.c linux-2.6.12-rc1/fs/file_table.c
--- linux-2.6.12-rc1.orig/fs/file_table.c 2005-03-02 16:37:47.000000000 +0900
+++ linux-2.6.12-rc1/fs/file_table.c 2005-03-31 17:50:46.323999320 +0900
@@ -209,11 +209,11 @@
void file_kill(struct file *file)
{
+ file_list_lock();
if (!list_empty(&file->f_list)) {
- file_list_lock();
list_del_init(&file->f_list);
- file_list_unlock();
}
+ file_list_unlock();
}
int fs_may_remount_ro(struct super_block *sb)
--
Haruo
On Sun, Apr 03, 2005 at 10:56:44AM +0900, Tomita, Haruo wrote:
> Isn't there race issue during fput() and the dentry_open()?
> When booting kernel, the following deadlocks are experienced.
> Stack traceback for pid 2130
> 0xf717f1b0 2130 1 1 0 R 0xf717f400 *irqbalance
> ESP EIP Function (args)
> 0xf75bfe38 0xc02d04b2 _spin_lock+0x2e (0xf7441a80)
> 0xf75bff34 0xc015667c file_move+0x14 (0xf63080e4, 0xf75bff58, 0x0, 0xf74bf000)
> 0xf75bff40 0xc0154e37 dentry_open+0xb9 (0xf63080e4, 0xf7f5ad80, 0xc02d00e6, 0x100100, 0x246)
> 0xf75bff58 0xc0154d78 filp_open+0x36
> 0xf75bffb4 0xc0155079 sys_open+0x31
> 0xf75bffc4 0xc02d196f syscall_call+0x7
>
> The patch was made. Is this patch right?
>
> diff -urN linux-2.6.12-rc1.orig/fs/file_table.c linux-2.6.12-rc1/fs/file_table.c
> --- linux-2.6.12-rc1.orig/fs/file_table.c 2005-03-02 16:37:47.000000000 +0900
> +++ linux-2.6.12-rc1/fs/file_table.c 2005-03-31 17:50:46.323999320 +0900
> @@ -209,11 +209,11 @@
>
> void file_kill(struct file *file)
> {
> + file_list_lock();
> if (!list_empty(&file->f_list)) {
> - file_list_lock();
> list_del_init(&file->f_list);
> - file_list_unlock();
> }
> + file_list_unlock();
> }
This is absolutely useless. What are you trying to protect and how the
hell could keeping a lock around that check prevent any sort of deadlock?
Besides, who could possibly call fput() on struct file allocated by
dentry_open() and do that before the latter returns a reference to
that struct file?
Hi Viro,
Thank you for your replay.
> > Stack traceback for pid 2130
> > 0xf717f1b0 2130 1 1 0 R
> 0xf717f400 *irqbalance
> > ESP EIP Function (args)
> > 0xf75bfe38 0xc02d04b2 _spin_lock+0x2e (0xf7441a80)
> > 0xf75bff34 0xc015667c file_move+0x14 (0xf63080e4,
> 0xf75bff58, 0x0, 0xf74bf000)
> > 0xf75bff40 0xc0154e37 dentry_open+0xb9 (0xf63080e4,
> 0xf7f5ad80, 0xc02d00e6, 0x100100, 0x246)
> > 0xf75bff58 0xc0154d78 filp_open+0x36
> > 0xf75bffb4 0xc0155079 sys_open+0x31
> > 0xf75bffc4 0xc02d196f syscall_call+0x7
> >
> > The patch was made. Is this patch right?
> >
> > diff -urN linux-2.6.12-rc1.orig/fs/file_table.c
> linux-2.6.12-rc1/fs/file_table.c
> > --- linux-2.6.12-rc1.orig/fs/file_table.c 2005-03-02
> 16:37:47.000000000 +0900
> > +++ linux-2.6.12-rc1/fs/file_table.c 2005-03-31
> 17:50:46.323999320 +0900
> > @@ -209,11 +209,11 @@
> >
> > void file_kill(struct file *file)
> > {
> > + file_list_lock();
> > if (!list_empty(&file->f_list)) {
> > - file_list_lock();
> > list_del_init(&file->f_list);
> > - file_list_unlock();
> > }
> > + file_list_unlock();
> > }
>
> This is absolutely useless. What are you trying to protect
> and how the
> hell could keeping a lock around that check prevent any sort
> of deadlock?
I think that it is true not to be able to acquire file_list_lock()
that is called from file_move().
> Besides, who could possibly call fput() on struct file allocated by
> dentry_open() and do that before the latter returns a reference to
> that struct file?
Indeed, Is there a good method of debugging this issue?
In the check on the source, a doubtful place was not found except file_kill().
I might call not race of fput() and the dentry_open() but
the deadlock of file_list_lock().
--
Haruo
On Mon, Apr 04, 2005 at 08:42:30AM +0900, Tomita, Haruo wrote:
> Indeed, Is there a good method of debugging this issue?
> In the check on the source, a doubtful place was not found except file_kill().
The obvious way would be to add a variable and do something like
#define file_list_lock() \
do { \
spin_lock(&files_lock); \
holder_pid = pid; \
} while(0)
and add a way to check its value (anything - sysrq, whatever).
That assumes that you can reproduce that deadlock at will, obviously...
Hi Viro,
Thank you for your help and advice.
I made the following patches referring to your advice.
I try to debugging by using this patch.
Thanks again,
Haruo
diff -urN linux-2.6.12-rc2.orig/fs/file_table.c linux-2.6.12-rc2/fs/file_table.c
--- linux-2.6.12-rc2.orig/fs/file_table.c 2005-03-02 16:37:47.000000000 +0900
+++ linux-2.6.12-rc2/fs/file_table.c 2005-04-05 11:21:58.000000000 +0900
@@ -26,6 +26,7 @@
/* public. Not pretty! */
__cacheline_aligned_in_smp DEFINE_SPINLOCK(files_lock);
+pid_t holder_pid;
static DEFINE_SPINLOCK(filp_count_lock);
diff -urN linux-2.6.12-rc2.orig/include/linux/fs.h linux-2.6.12-rc2/include/linux/fs.h
--- linux-2.6.12-rc2.orig/include/linux/fs.h 2005-04-05 11:12:53.000000000 +0900
+++ linux-2.6.12-rc2/include/linux/fs.h 2005-04-05 11:20:15.000000000 +0900
@@ -602,8 +602,17 @@
struct address_space *f_mapping;
};
extern spinlock_t files_lock;
-#define file_list_lock() spin_lock(&files_lock);
-#define file_list_unlock() spin_unlock(&files_lock);
+extern pid_t holder_pid;
+#define file_list_lock() \
+ do { \
+ spin_lock(&files_lock); \
+ holder_pid = current->pid; \
+ }while(0)
+#define file_list_unlock() \
+ do { \
+ holder_pid = 0; \
+ spin_unlock(&files_lock); \
+ }while(0)
#define get_file(x) atomic_inc(&(x)->f_count)
#define file_count(x) atomic_read(&(x)->f_count)