2002-09-17 08:10:09

by Hirokazu Takahashi

[permalink] [raw]
Subject: [BUG] kNFSd may get a freeing inode.

Hello,

This is a big problem.

kNFSd might get inode which is just freeing.
We know inodes under freeing(I_FREEING) are not on the inode-hash.

Please suppose kNFSd calls iget() against freeing inodes.
It might cause weird problem as iget() calls get_new_inode() if the
inode isn't on the hash.

It would never happen on local filesysmtem as no one can
lookup freeing dentries or inodes while NFSd try to get a inode
with inode-number directly.

What is the best way to avoid this situation?
We shold keep freeing inodes the hash?

Thank you,
Hirokazu Takahashi.


-------------------------------------------------------
Sponsored by: AMD - Your access to the experts on Hammer Technology!
Open Source & Linux Developers, register now for the AMD Developer
Symposium. Code: EX8664 http://www.developwithamd.com/developerlab
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2002-09-20 11:13:47

by Hirokazu Takahashi

[permalink] [raw]
Subject: Re: [BUG] kNFSd may get a freeing inode.

Hello,

I've checked with printk that nfsd will sleep when it find a
freeing inode. It worked good as we expected.

taka> I was thinking about your code and I've changed my mind that
taka> your aproach isn't bad.
taka>
taka> > Brief description of patch:
taka> >
taka> > leaf about-to-be-delete inodes in hash table until
taka> > filesystem has signed off on them, then remove from table and
taka> > wake_up_inode.
taka> > If a FREEING inode is found on hash table, sleep on
taka> > the inodes wait_queue (which usefully is separate from the inode and
taka> > won't be freed when the inode goes) and then try again.


--- linux/fs/inode.c.ORG Wed Sep 18 15:58:03 2030
+++ linux/fs/inode.c Fri Sep 20 20:00:16 2030
@@ -444,6 +444,8 @@ int shrink_icache_memory(int priority, i
return 0;
}

+static void __wait_on_freeing_inode(struct inode *inode);
+
/*
* Called with the inode lock held.
* NOTE: we are not increasing the inode-refcount, you must call __iget()
@@ -454,7 +456,7 @@ static struct inode * find_inode(struct
{
struct list_head *tmp;
struct inode * inode;
-
+retry:
tmp = head;
for (;;) {
tmp = tmp->next;
@@ -466,6 +468,17 @@ static struct inode * find_inode(struct
continue;
if (!test(inode, data))
continue;
+ /*
+ * Keep freeing inodes on the hash or nfsd might try to
+ * read them from disk and put them on the hash as nfsd
+ * can call iget() directly at any time.
+ * If it's a freeing inode, nfsd has to wait until it's
+ * freed completely and try to search again.
+ */
+ if (inode->i_state & (I_FREEING|I_CLEAR)) {
+ __wait_on_freeing_inode(inode);
+ goto retry;
+ }
break;
}
return inode;
@@ -479,7 +492,7 @@ static struct inode * find_inode_fast(st
{
struct list_head *tmp;
struct inode * inode;
-
+retry:
tmp = head;
for (;;) {
tmp = tmp->next;
@@ -491,6 +504,17 @@ static struct inode * find_inode_fast(st
continue;
if (inode->i_sb != sb)
continue;
+ /*
+ * Keep freeing inodes on the hash or nfsd might try to
+ * read them from disk and put them on the hash as nfsd
+ * can call iget() directly at any time.
+ * If it's a freeing inode, nfsd has to wait until it's
+ * freed completely and try to search again.
+ */
+ if (inode->i_state & (I_FREEING|I_CLEAR)) {
+ __wait_on_freeing_inode(inode);
+ goto retry;
+ }
break;
}
return inode;
@@ -793,7 +817,6 @@ void generic_delete_inode(struct inode *
{
struct super_operations *op = inode->i_sb->s_op;

- list_del_init(&inode->i_hash);
list_del_init(&inode->i_list);
inode->i_state|=I_FREEING;
inodes_stat.nr_inodes--;
@@ -812,6 +835,10 @@ void generic_delete_inode(struct inode *
delete(inode);
} else
clear_inode(inode);
+ spin_lock(&inode_lock);
+ list_del_init(&inode->i_hash);
+ spin_unlock(&inode_lock);
+ wake_up_inode(inode);
if (inode->i_state != I_CLEAR)
BUG();
destroy_inode(inode);
@@ -1050,6 +1077,21 @@ void wake_up_inode(struct inode *inode)
if (waitqueue_active(wq))
wake_up_all(wq);
}
+
+static void __wait_on_freeing_inode(struct inode *inode)
+{
+ DECLARE_WAITQUEUE(wait, current);
+ wait_queue_head_t *wq = i_waitq_head(inode);
+
+ add_wait_queue(wq, &wait);
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock(&inode_lock);
+ schedule();
+ remove_wait_queue(wq, &wait);
+ current->state = TASK_RUNNING;
+ spin_lock(&inode_lock);
+}
+

/*
* Initialize the waitqueues and inode hash table.
--- linux/fs/fs-writeback.c.ORG Fri Sep 20 17:49:36 2030
+++ linux/fs/fs-writeback.c Fri Sep 20 17:53:48 2030
@@ -87,7 +87,7 @@ void __mark_inode_dirty(struct inode *in
* Only add valid (hashed) inode to the superblock's
* dirty list. Add blockdev inodes as well.
*/
- if (list_empty(&inode->i_hash) && !S_ISBLK(inode->i_mode))
+ if ((list_empty(&inode->i_hash) || inode->i_state & (I_FREEING|I_CLEAR)) && !S_ISBLK(inode->i_mode))
goto out;

/*


-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-18 06:29:47

by NeilBrown

[permalink] [raw]
Subject: Re: [BUG] kNFSd may get a freeing inode.

On Tuesday September 17, [email protected] wrote:
> Hello,
>
> This is a big problem.
>
> kNFSd might get inode which is just freeing.
> We know inodes under freeing(I_FREEING) are not on the inode-hash.
....
> What is the best way to avoid this situation?
> We shold keep freeing inodes the hash?

I think you are right.
If knfsd (or a filesystem's ->get_dentry) calls iget just as
iput_final is being called, the iget will fail to find the inode (it
is dropped from the hash before unlocking inode_lock) and so will call
->read_inode, possibly before generic_delete_inode has called
->delete_inode, and so will get an apparently valid inode just before
that inode actually gets destroyed.

The only solution I can see is, as you say, keeping it on the hash
table a bit longer.

Al, can you see any problem with this patch which should, I think,
close the race?

NeilBrown


--- fs/inode.c.~1.1~ 2002-09-18 16:21:51.000000000 +1000
+++ fs/inode.c 2002-09-18 16:16:29.000000000 +1000
@@ -787,37 +787,37 @@
spin_lock(&inode_lock);
list_del_init(&inode->i_hash);
spin_unlock(&inode_lock);
}

void generic_delete_inode(struct inode *inode)
{
struct super_operations *op = inode->i_sb->s_op;

- list_del_init(&inode->i_hash);
list_del_init(&inode->i_list);
inode->i_state|=I_FREEING;
inodes_stat.nr_inodes--;
spin_unlock(&inode_lock);

if (inode->i_data.nrpages)
truncate_inode_pages(&inode->i_data, 0);

security_ops->inode_delete(inode);

if (op && op->delete_inode) {
void (*delete)(struct inode *) = op->delete_inode;
if (!is_bad_inode(inode))
DQUOT_INIT(inode);
/* s_op->delete_inode internally recalls clear_inode() */
delete(inode);
} else
clear_inode(inode);
+ list_del_init(&inode->i_hash);
if (inode->i_state != I_CLEAR)
BUG();
destroy_inode(inode);
}
EXPORT_SYMBOL(generic_delete_inode);

static void generic_forget_inode(struct inode *inode)
{
struct super_block *sb = inode->i_sb;


-------------------------------------------------------
This SF.NET email is sponsored by: AMD - Your access to the experts
on Hammer Technology! Open Source & Linux Developers, register now
for the AMD Developer Symposium. Code: EX8664
http://www.developwithamd.com/developerlab
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-18 07:17:35

by Hirokazu Takahashi

[permalink] [raw]
Subject: Re: [BUG] kNFSd may get a freeing inode.

Hello,

> > This is a big problem.
> >
> > kNFSd might get inode which is just freeing.
> > We know inodes under freeing(I_FREEING) are not on the inode-hash.
> ....
> > What is the best way to avoid this situation?
> > We shold keep freeing inodes the hash?
>
> I think you are right.
> If knfsd (or a filesystem's ->get_dentry) calls iget just as
> iput_final is being called, the iget will fail to find the inode (it
> is dropped from the hash before unlocking inode_lock) and so will call
> ->read_inode, possibly before generic_delete_inode has called
> ->delete_inode, and so will get an apparently valid inode just before
> that inode actually gets destroyed.
>
> The only solution I can see is, as you say, keeping it on the hash
> table a bit longer.
>
> Al, can you see any problem with this patch which should, I think,
> close the race?

There would be some problems.

1) knfsd can get a freeing inode which will be destroyed soon,
so that knfsd may access unknown memory.
find_inode() should be fixed to not get a freeing inode

2) Just after calling op->delete_inode(inode) or clear_inode(inode),
others can create a new inode which have the same inode-number
even if old one is still on the hash.

3) Some code assumes inodes on the hash are valid.
I found __mark_inode_dirty() assumes this and may move freeing
inodes on the dirty-inode-list.
__mark_inode_dirty() should check i_state flag instead of i_hash.

Any others?


Thank you,
Hirokazu Takahashi.


-------------------------------------------------------
This SF.NET email is sponsored by: AMD - Your access to the experts
on Hammer Technology! Open Source & Linux Developers, register now
for the AMD Developer Symposium. Code: EX8664
http://www.developwithamd.com/developerlab
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-18 10:47:02

by NeilBrown

[permalink] [raw]
Subject: Re: [BUG] kNFSd may get a freeing inode.

On Wednesday September 18, [email protected] wrote:
> >
> > Al, can you see any problem with this patch which should, I think,
> > close the race?
>
> There would be some problems.
>
> 1) knfsd can get a freeing inode which will be destroyed soon,
> so that knfsd may access unknown memory.
> find_inode() should be fixed to not get a freeing inode

I thought of that too.... after I posted :-(

find_inode is always called immediately after inode_lock
is claimed (except for iunique). So it would be ok for find_inode
to drop the spinlock, wait, and try again. The logic works for
iunique too - there is no harm in find_inode_fast dropping and
reclaiming inode_lock.

So how about if find_inode{,_fast} checks the inode that it finds
and, if it is FREEING, it drops the spinlock and waits on some
waitqueue. destroy_inode calls wake_up on that waitqueue.
When find_inode is then woken up, it tries the lookup again.

The patch below (uncompiled) should do something like that.
>
> 2) Just after calling op->delete_inode(inode) or clear_inode(inode),
> others can create a new inode which have the same inode-number
> even if old one is still on the hash.

That's true.. but with the above approach, the attempt to
create the new inode should block until the delete_inode has
completed.

>
> 3) Some code assumes inodes on the hash are valid.
> I found __mark_inode_dirty() assumes this and may move freeing
> inodes on the dirty-inode-list.
> __mark_inode_dirty() should check i_state flag instead of i_hash.
>
> Any others?

Thanks. That patch makes that change. If search the tree for
i_hash and the only other references are in fs/inode.c and they
all seem safe to me.

NeilBrown

Brief description of patch:

leaf about-to-be-delete inodes in hash table until
filesystem has signed off on them, then remove from table and
wake_up_inode.
If a FREEING inode is found on hash table, sleep on
the inodes wait_queue (which usefully is separate from the inode and
won't be freed when the inode goes) and then try again.

This is against 2.5.35.

--- ./fs/inode.c 2002/09/18 06:16:13 1.1
+++ ./fs/inode.c 2002/09/18 10:43:21
@@ -454,6 +454,9 @@ static struct inode * find_inode(struct
{
struct list_head *tmp;
struct inode * inode;
+ wait_queue_t wt;
+ wait_queue_head_t *wq;
+ wt.task = NULL;

tmp = head;
for (;;) {
@@ -466,7 +469,23 @@ static struct inode * find_inode(struct
continue;
if (!test(inode, data))
continue;
- break;
+ if (!(inode->i_state & I_FREEING))
+ break;
+ spin_unlock(&inode_lock);
+ if (!wq.task) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ init_waitqueue_entry(&wt, current);
+ wq= i_waitq_head(inode);
+ add_wait_queue(wq, &wt);
+ } else {
+ schedule();
+ set_current_state(TASK_INTERRUPTIBLE);
+ }
+ spin_lock(&inode_lock);
+ }
+ if (wt.task) {
+ current->state = TASK_RUNNING;
+ remove_wait_queue(wq, &wt);
}
return inode;
}
@@ -479,6 +498,9 @@ static struct inode * find_inode_fast(st
{
struct list_head *tmp;
struct inode * inode;
+ wait_queue_t wt;
+ wait_queue_head_t *wq;
+ wt.task = NULL;

tmp = head;
for (;;) {
@@ -491,7 +513,23 @@ static struct inode * find_inode_fast(st
continue;
if (inode->i_sb != sb)
continue;
- break;
+ if (!(inode->i_state & I_FREEING))
+ break;
+ spin_unlock(&inode_lock);
+ if (!wq.task) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ init_waitqueue_entry(&wt, current);
+ wq= i_waitq_head(inode);
+ add_wait_queue(wq, &wt);
+ } else {
+ schedule();
+ set_current_state(TASK_INTERRUPTIBLE);
+ }
+ spin_lock(&inode_lock);
+ }
+ if (wt.task) {
+ current->state = TASK_RUNNING;
+ remove_wait_queue(wq, &wt);
}
return inode;
}
@@ -793,7 +831,6 @@ void generic_delete_inode(struct inode *
{
struct super_operations *op = inode->i_sb->s_op;

- list_del_init(&inode->i_hash);
list_del_init(&inode->i_list);
inode->i_state|=I_FREEING;
inodes_stat.nr_inodes--;
@@ -812,6 +849,11 @@ void generic_delete_inode(struct inode *
delete(inode);
} else
clear_inode(inode);
+ spin_lock(&inode_lock);
+ list_del_init(&inode->i_hash);
+ spin_unlock(&inode_lock);
+ wake_up_inode(inode);
+
if (inode->i_state != I_CLEAR)
BUG();
destroy_inode(inode);
--- ./fs/fs-writeback.c 2002/09/18 10:37:04 1.1
+++ ./fs/fs-writeback.c 2002/09/18 10:37:47
@@ -87,7 +87,8 @@ void __mark_inode_dirty(struct inode *in
* Only add valid (hashed) inode to the superblock's
* dirty list. Add blockdev inodes as well.
*/
- if (list_empty(&inode->i_hash) && !S_ISBLK(inode->i_mode))
+ if ((list_empty(&inode->i_hash) || (inode->i_state & I_FREEING))
+ && !S_ISBLK(inode->i_mode))
goto out;

/*


-------------------------------------------------------
This SF.NET email is sponsored by: AMD - Your access to the experts
on Hammer Technology! Open Source & Linux Developers, register now
for the AMD Developer Symposium. Code: EX8664
http://www.developwithamd.com/developerlab
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-18 12:39:21

by Hirokazu Takahashi

[permalink] [raw]
Subject: Re: [BUG] kNFSd may get a freeing inode.

Hello,

> So how about if find_inode{,_fast} checks the inode that it finds
> and, if it is FREEING, it drops the spinlock and waits on some
> waitqueue. destroy_inode calls wake_up on that waitqueue.
> When find_inode is then woken up, it tries the lookup again.

Mmmmm... I can see what you would like to do, but it looks
little bit complex to me.

How do you think of my idea?

1. If find_inode() finds a valid inode, iget() will return it.

2. If find_inode() finds nothing, iget() will call get_new_inode()
to make a new inode.

3. If find_inode() finds a I_FREEING inode, iget() will return NULL.
I guess this case will only happen on NFS. So kNFSd has to accept
NULL inode.

4. If find_inode() finds both of a I_FREEING inode and a valid inode,
iget() will return the valid inode.


neilb> --- ./fs/fs-writeback.c 2002/09/18 10:37:04 1.1
neilb> +++ ./fs/fs-writeback.c 2002/09/18 10:37:47
neilb> @@ -87,7 +87,8 @@ void __mark_inode_dirty(struct inode *in
neilb> * Only add valid (hashed) inode to the superblock's
neilb> * dirty list. Add blockdev inodes as well.
neilb> */
neilb> - if (list_empty(&inode->i_hash) && !S_ISBLK(inode->i_mode))
neilb> + if ((list_empty(&inode->i_hash) || (inode->i_state & I_FREEING))
neilb> + && !S_ISBLK(inode->i_mode))
neilb> goto out;
neilb>
neilb> /*

I think it should be like this:

if ((list_empty(&inode->i_hash) || (inode->i_state & (I_FREEING|I_CLEAN)))
^^^^^^^^^^

Following patch is just a sample which may not work.
And find_inode() and iget5_locked also should be fixed.


--- inode.c.ORG Wed Sep 18 15:58:03 2030
+++ inode.c Wed Sep 18 21:10:41 2030
@@ -479,18 +479,24 @@ static struct inode * find_inode_fast(st
{
struct list_head *tmp;
struct inode * inode;
+ struct inode * freeing_inode = NULL;

tmp = head;
for (;;) {
tmp = tmp->next;
- inode = NULL;
- if (tmp == head)
+ if (tmp == head) {
+ inode = freeing_inode;
break;
+ }
inode = list_entry(tmp, struct inode, i_hash);
if (inode->i_ino != ino)
continue;
if (inode->i_sb != sb)
continue;
+ if (inode->i_state & (I_FREEING|I_CLEAR)) {
+ freeing_inode = inode;
+ continue;
+ }
break;
}
return inode;
@@ -604,6 +610,11 @@ static struct inode * get_new_inode_fast
spin_lock(&inode_lock);
/* We released the lock, so.. */
old = find_inode_fast(sb, head, ino);
+ if (old && old->i_state & (I_FREEING|I_CLEAR)) {
+ spin_unlock(&inode_lock);
+ destroy_inode(inode);
+ return NULL;
+ }
if (!old) {
inode->i_ino = ino;
inodes_stat.nr_inodes++;
@@ -737,6 +748,10 @@ struct inode *iget_locked(struct super_b
spin_lock(&inode_lock);
inode = find_inode_fast(sb, head, ino);
if (inode) {
+ if (inode->i_state & (I_FREEING|I_CLEAR)) {
+ spin_unlock(&inode_lock);
+ return NULL;
+ }
__iget(inode);
spin_unlock(&inode_lock);
wait_on_inode(inode);
@@ -793,7 +808,6 @@ void generic_delete_inode(struct inode *
{
struct super_operations *op = inode->i_sb->s_op;

- list_del_init(&inode->i_hash);
list_del_init(&inode->i_list);
inode->i_state|=I_FREEING;
inodes_stat.nr_inodes--;
@@ -812,6 +826,7 @@ void generic_delete_inode(struct inode *
delete(inode);
} else
clear_inode(inode);
+ list_del_init(&inode->i_hash);
if (inode->i_state != I_CLEAR)
BUG();
destroy_inode(inode);



-------------------------------------------------------
This SF.NET email is sponsored by: AMD - Your access to the experts
on Hammer Technology! Open Source & Linux Developers, register now
for the AMD Developer Symposium. Code: EX8664
http://www.developwithamd.com/developerlab
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-19 16:01:31

by Hirokazu Takahashi

[permalink] [raw]
Subject: Re: [BUG] kNFSd may get a freeing inode.

Hi,

I was thinking about your code and I've changed my mind that
your aproach isn't bad.

> Brief description of patch:
>
> leaf about-to-be-delete inodes in hash table until
> filesystem has signed off on them, then remove from table and
> wake_up_inode.
> If a FREEING inode is found on hash table, sleep on
> the inodes wait_queue (which usefully is separate from the inode and
> won't be freed when the inode goes) and then try again.
.....

I refined your patch, how do you think about it?
(This patch is just sample)

--- inode.c.ORG Wed Sep 18 15:58:03 2030
+++ inode.c Fri Sep 20 00:48:31 2030
@@ -444,6 +444,20 @@ int shrink_icache_memory(int priority, i
return 0;
}

+void __wait_on_freeing_inode(struct inode *inode)
+{
+ DECLARE_WAITQUEUE(wait, current);
+ wait_queue_head_t *wq = i_waitq_head(inode);
+
+ add_wait_queue(wq, &wait);
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock(&inode_lock);
+ schedule();
+ remove_wait_queue(wq, &wait);
+ current->state = TASK_RUNNING;
+ spin_lock(&inode_lock);
+}
+
/*
* Called with the inode lock held.
* NOTE: we are not increasing the inode-refcount, you must call __iget()
@@ -466,6 +480,10 @@ static struct inode * find_inode(struct
continue;
if (!test(inode, data))
continue;
+ if (inode->i_state & (I_FREEING|I_CLEAR)) {
+ __wait_on_freeing_inode(inode);
+ goto retry;
+ }
break;
}
return inode;
@@ -479,7 +497,7 @@ static struct inode * find_inode_fast(st
{
struct list_head *tmp;
struct inode * inode;
-
+retry:
tmp = head;
for (;;) {
tmp = tmp->next;
@@ -491,6 +509,10 @@ static struct inode * find_inode_fast(st
continue;
if (inode->i_sb != sb)
continue;
+ if (inode->i_state & (I_FREEING|I_CLEAR)) {
+ __wait_on_freeing_inode(inode);
+ goto retry;
+ }
break;
}
return inode;
@@ -793,7 +815,6 @@ void generic_delete_inode(struct inode *
{
struct super_operations *op = inode->i_sb->s_op;

- list_del_init(&inode->i_hash);
list_del_init(&inode->i_list);
inode->i_state|=I_FREEING;
inodes_stat.nr_inodes--;
@@ -812,6 +833,10 @@ void generic_delete_inode(struct inode *
delete(inode);
} else
clear_inode(inode);
+ spin_lock(&inode_lock);
+ list_del_init(&inode->i_hash);
+ spin_unlock(&inode_lock);
+ wake_up_inode(inode);
if (inode->i_state != I_CLEAR)
BUG();
destroy_inode(inode);


-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs