2003-06-12 12:42:49

by John M Flinchbaugh

[permalink] [raw]
Subject: 2.5.70-bk16: nfs crash

running 2.5.70-bk16, i got this error and hang. sysrq worked for
reboot, etc.

it apparently crashed when it mounted an nfs export from a 2.4.18 box,
and tried to mv a file. i doubt it matters, but the nic is an
orinoco_cs wireless card. thanks.

Jun 12 02:00:04 density kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jun 12 02:00:04 density kernel: printing eip:
Jun 12 02:00:04 density kernel: c0169ef1
Jun 12 02:00:04 density kernel: *pde = 00000000
Jun 12 02:00:04 density kernel: Oops: 0002 [#1]
Jun 12 02:00:04 density kernel: CPU: 0
Jun 12 02:00:04 density kernel: EIP: 0060:[<c0169ef1>] Not tainted
Jun 12 02:00:04 density kernel: EFLAGS: 00010246
Jun 12 02:00:04 density kernel: EIP is at d_move+0x51/0x250
Jun 12 02:00:04 density kernel: eax: 00000000 ebx: cd5e6960 ecx: cd5e69d0 edx: 00000000
Jun 12 02:00:04 density kernel: esi: cd2b4440 edi: cfdb4af4 ebp: cd087e4c esp: cd087e20
Jun 12 02:00:04 density kernel: ds: 007b es: 007b ss: 0068
Jun 12 02:00:04 density kernel: Process mv (pid: 4960, threadinfo=cd086000 task=d41681a0)
Jun 12 02:00:04 density kernel: Stack: cdf4a508 cd5e69dc 0000000e cdf4a508 cd087e70 00000014 d8bf416d cdf4a6a0
Jun 12 02:00:04 density kernel: cd087e70 cd2b4440 cdf4a614 cd087ebc d8bf1ce7 cd5e6960 cd2b4440 cdf4a614
Jun 12 02:00:04 density kernel: cd087ea0 00000001 00000108 d78f0c98 73666e2e 30303030 31313030 30303030
Jun 12 02:00:04 density kernel: Call Trace:
Jun 12 02:00:04 density kernel: [<d8bf416d>] nfs_zap_caches+0x5d/0xd0 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf1ce7>] nfs_sillyrename+0x197/0x220 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf259c>] nfs_rename+0x20c/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c0161f50>] vfs_rename_other+0xc0/0x120
Jun 12 02:00:04 density kernel: [<d8bf2390>] nfs_rename+0x0/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c016213d>] vfs_rename+0x18d/0x3a0
Jun 12 02:00:04 density kernel: [<c01624f3>] sys_rename+0x1a3/0x1d0
Jun 12 02:00:04 density kernel: [<c01093cb>] syscall_call+0x7/0xb
Jun 12 02:00:04 density kernel:
Jun 12 02:00:04 density kernel: Code: 89 02 74 03 89 50 04 c7 43 70 00 01 10 00 c7 41 04 00 02 20
Jun 12 02:00:04 density kernel: <6>note: mv[4960] exited with preempt_count 5
Jun 12 02:00:04 density kernel: bad: scheduling while atomic!
Jun 12 02:00:04 density kernel: Call Trace:
Jun 12 02:00:04 density kernel: [<c0116d32>] schedule+0x3c2/0x3d0
Jun 12 02:00:04 density kernel: [<c01d5576>] __copy_from_user_ll+0x66/0x70
Jun 12 02:00:04 density kernel: [<c0135a06>] unlock_page+0x16/0x50
Jun 12 02:00:04 density kernel: [<c0137e95>] generic_file_aio_write_nolock+0x5f5/0xba0
Jun 12 02:00:04 density kernel: [<c01384b8>] generic_file_write_nolock+0x78/0x90
Jun 12 02:00:04 density kernel: [<c0225658>] vt_console_print+0x1e8/0x2f0
Jun 12 02:00:04 density kernel: [<c011a8de>] __call_console_drivers+0x5e/0x60
Jun 12 02:00:04 density kernel: [<c011a9b5>] call_console_drivers+0x65/0x120
Jun 12 02:00:04 density kernel: [<c01385b5>] generic_file_write+0x55/0x70
Jun 12 02:00:04 density kernel: [<c01af270>] reiserfs_file_write+0x50/0x652
Jun 12 02:00:04 density kernel: [<c02465f2>] vgacon_scroll+0x192/0x250
Jun 12 02:00:04 density kernel: [<c0245243>] vgacon_cursor+0xe3/0x1e0
Jun 12 02:00:04 density kernel: [<c0121ae1>] mod_timer+0x131/0x190
Jun 12 02:00:04 density kernel: [<c0132ea0>] check_free_space+0x100/0x190
Jun 12 02:00:04 density kernel: [<c022638b>] poke_blanked_console+0x6b/0x80
Jun 12 02:00:04 density kernel: [<c022567d>] vt_console_print+0x20d/0x2f0
Jun 12 02:00:04 density kernel: [<c01162e9>] try_to_wake_up+0xa9/0x150
Jun 12 02:00:04 density kernel: [<c0116dba>] default_wake_function+0x2a/0x30
Jun 12 02:00:04 density kernel: [<c013346a>] do_acct_process+0x27a/0x290
Jun 12 02:00:04 density kernel: [<c0114820>] do_page_fault+0x0/0x4cd
Jun 12 02:00:04 density kernel: [<c01334b9>] acct_process+0x39/0x61
Jun 12 02:00:04 density kernel: [<c011c81d>] do_exit+0x8d/0x460
Jun 12 02:00:04 density kernel: [<c0114820>] do_page_fault+0x0/0x4cd
Jun 12 02:00:04 density kernel: [<c0109c01>] die+0xe1/0xf0
Jun 12 02:00:04 density kernel: [<c011494f>] do_page_fault+0x12f/0x4cd
Jun 12 02:00:04 density kernel: [<d8c12988>] nfs_procedures+0x108/0x1b0 [nfs]
Jun 12 02:00:04 density kernel: [<d8babc70>] rpc_run_timer+0x0/0x80 [sunrpc]
Jun 12 02:00:04 density kernel: [<c0114820>] do_page_fault+0x0/0x4cd
Jun 12 02:00:04 density kernel: [<c0109575>] error_code+0x2d/0x38
Jun 12 02:00:04 density kernel: [<c0169ef1>] d_move+0x51/0x250
Jun 12 02:00:04 density kernel: [<d8bf416d>] nfs_zap_caches+0x5d/0xd0 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf1ce7>] nfs_sillyrename+0x197/0x220 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf259c>] nfs_rename+0x20c/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c0161f50>] vfs_rename_other+0xc0/0x120
Jun 12 02:00:04 density kernel: [<d8bf2390>] nfs_rename+0x0/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c016213d>] vfs_rename+0x18d/0x3a0
Jun 12 02:00:04 density kernel: [<c01624f3>] sys_rename+0x1a3/0x1d0
Jun 12 02:00:04 density kernel: [<c01093cb>] syscall_call+0x7/0xb
Jun 12 02:00:04 density kernel:

some info about the build environment:
Gnu C 3.3
Gnu make 3.80
util-linux 2.11z
mount 2.11z
Linux C Library 2.3.1
Dynamic linker (ldd) 2.3.1
Procps 3.1.9
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 5.0

--
____________________}John Flinchbaugh{______________________
| [email protected] http://www.hjsoft.com/~glynis/ |
~~Powered by Linux: Reboots are for hardware upgrades only~~


Attachments:
(No filename) (5.51 kB)
(No filename) (189.00 B)
Download all attachments

2003-06-12 13:36:25

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 12:57:26PM +0000, John M Flinchbaugh wrote:
> running 2.5.70-bk16, i got this error and hang. sysrq worked for
> reboot, etc.
>
> it apparently crashed when it mounted an nfs export from a 2.4.18 box,
> and tried to mv a file. i doubt it matters, but the nic is an
> orinoco_cs wireless card. thanks.
>
> Jun 12 02:00:04 density kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
> Jun 12 02:00:04 density kernel: printing eip:
> Jun 12 02:00:04 density kernel: c0169ef1
> Jun 12 02:00:04 density kernel: *pde = 00000000
> Jun 12 02:00:04 density kernel: Oops: 0002 [#1]
> Jun 12 02:00:04 density kernel: CPU: 0
> Jun 12 02:00:04 density kernel: EIP: 0060:[<c0169ef1>] Not tainted
> Jun 12 02:00:04 density kernel: EFLAGS: 00010246
> Jun 12 02:00:04 density kernel: EIP is at d_move+0x51/0x250
> Jun 12 02:00:04 density kernel: eax: 00000000 ebx: cd5e6960 ecx: cd5e69d0 edx: 00000000

I am not supprised at all by this, I can see two csets in Linus' tree
that will definitely break dcache -

1. http://linux.bkbits.net:8080/linux-2.5/[email protected]?nav=index.html|ChangeSet@-2d

__d_drop() *must not* initialize d_hash fields. Lockfree lookup depends on
that. If __d_drop() needs to be allowed on an unhashed dentry, the right
thing to do would be to check for DCACHE_UNHASHED before unhashing. I will
submit a patch a little later to do this.


2. http://linux.bkbits.net:8080/linux-2.5/[email protected]?nav=index.html|ChangeSet@-2d

hlist poison patch is broken. list_del_rcu() and hlist_del_rcu()
*must not* re-initialize the pointers. Maneesh submitted a patch
earlier today that corrects this -

http://marc.theaimsgroup.com/?l=linux-kernel&m=105541206017154&w=2


Thanks
Dipankar

2003-06-12 15:17:28

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 07:22:54PM +0530, Dipankar Sarma wrote:
> I am not supprised at all by this, I can see two csets in Linus' tree
> that will definitely break dcache -
>
> 1. http://linux.bkbits.net:8080/linux-2.5/[email protected]?nav=index.html|ChangeSet@-2d
>
> __d_drop() *must not* initialize d_hash fields. Lockfree lookup depends on
> that. If __d_drop() needs to be allowed on an unhashed dentry, the right
> thing to do would be to check for DCACHE_UNHASHED before unhashing. I will
> submit a patch a little later to do this.
>
>
> 2. http://linux.bkbits.net:8080/linux-2.5/[email protected]?nav=index.html|ChangeSet@-2d
>
> hlist poison patch is broken. list_del_rcu() and hlist_del_rcu()
> *must not* re-initialize the pointers. Maneesh submitted a patch
> earlier today that corrects this -
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=105541206017154&w=2

John,

Can you try the patch below along with Maneesh's patch mentioned
above to see if they fix your problem ?

Thanks
Dipankar



Fix __d_drop() to be allowed on unhashed dentries correctly. Don't
re-initialize the pointers, instead check for DCACHE_UNHASHED
before really unhashing it.


include/linux/dcache.h | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff -puN include/linux/dcache.h~dentry-hash-init-fix include/linux/dcache.h
--- linux-2.5.70-bk16/include/linux/dcache.h~dentry-hash-init-fix 2003-06-12 20:43:32.000000000 +0530
+++ linux-2.5.70-bk16-dipankar/include/linux/dcache.h 2003-06-12 20:47:10.000000000 +0530
@@ -174,8 +174,10 @@ extern spinlock_t dcache_lock;

static inline void __d_drop(struct dentry *dentry)
{
- dentry->d_vfs_flags |= DCACHE_UNHASHED;
- hlist_del_rcu_init(&dentry->d_hash);
+ if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) {
+ dentry->d_vfs_flags |= DCACHE_UNHASHED;
+ hlist_del_rcu(&dentry->d_hash);
+ }
}

static inline void d_drop(struct dentry *dentry)

_

2003-06-12 15:21:42

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

>>>>> " " == Dipankar Sarma <[email protected]> writes:

> I am not supprised at all by this, I can see two csets in
> Linus' tree that will definitely break dcache -

> 1. http://linux.bkbits.net:8080/linux-2.5/[email protected]?nav=index.html|ChangeSet@-2d

> __d_drop() *must not* initialize d_hash fields. Lockfree lookup
> depends on that. If __d_drop() needs to be allowed on an
> unhashed dentry, the right thing to do would be to check for
> DCACHE_UNHASHED before unhashing. I will submit a patch a
> little later to do this.

Can you please remind us exactly what the benefits of all this are?
Why can't d_free() immediately free the memory instead of relying on
the RCU mechanism?

Cheers,
Trond

2003-06-12 15:36:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash


On Thu, 12 Jun 2003, Dipankar Sarma wrote:
>
> hlist poison patch is broken. list_del_rcu() and hlist_del_rcu()
> *must not* re-initialize the pointers. Maneesh submitted a patch
> earlier today that corrects this -

Sorry, but you're wrong.

If you depend on not re-initializing the pointers, you should not use the
"xxx_del()" function, and you should document it.

This is that the __xxx_del() functions are there for - they won't do the
poisoning. The regular delete functions have historically always poisoned
the pointers - it was only removed a few months ago by Andrew.

Linus

2003-06-12 15:37:20

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 08:35:14AM -0700, Trond Myklebust wrote:
> >>>>> " " == Dipankar Sarma <[email protected]> writes:
> > __d_drop() *must not* initialize d_hash fields. Lockfree lookup
> > depends on that. If __d_drop() needs to be allowed on an
> > unhashed dentry, the right thing to do would be to check for
> > DCACHE_UNHASHED before unhashing. I will submit a patch a
> > little later to do this.
>
> Can you please remind us exactly what the benefits of all this are?
> Why can't d_free() immediately free the memory instead of relying on
> the RCU mechanism?

Because we no longer hold the dcache_lock while doing a d_lookup().
With the dentry still around (RCU wouldn't happen until all CPUs
do a context switch or execute user-level code), lookup can continue
to traverse the hash list while another CPU deletes the currrent
dentry. Once RCU happens, it is guranteed that no other CPU
could be in that dentry during hash list traversal. That is why
we have _rcu versions of the list deletion macros.
Lockfree d_lookup() gives us significant benefits in larger
SMP machines.

Does my patch meet the requirements that you had for __d_drop() ?

Thanks
Dipankar

2003-06-12 15:52:45

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 08:49:53AM -0700, Linus Torvalds wrote:
> On Thu, 12 Jun 2003, Dipankar Sarma wrote:
> >
> > hlist poison patch is broken. list_del_rcu() and hlist_del_rcu()
> > *must not* re-initialize the pointers. Maneesh submitted a patch
> > earlier today that corrects this -
>
> Sorry, but you're wrong.
>
> If you depend on not re-initializing the pointers, you should not use the
> "xxx_del()" function, and you should document it.

Right. That is why they are list_del_rcu() and hlist_del_rcu().
The comments for list_del_rcu() clearly say that pointers
are not re-initialized.

>
> This is that the __xxx_del() functions are there for - they won't do the
> poisoning. The regular delete functions have historically always poisoned
> the pointers - it was only removed a few months ago by Andrew.

Poisoning xxx_del() functions is ok. That should be done.

Thanks
Dipankar

2003-06-12 16:04:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash


On Thu, 12 Jun 2003, Linus Torvalds wrote:
>
> If you depend on not re-initializing the pointers, you should not use the
> "xxx_del()" function, and you should document it.

Besides, the code doesn't actually depend on not re-initializing the
pointers, it depends on the _forward_ pointers still being walkable in
case some other CPU is traversing the list just as we remove the entry.

Which means that I think the proper patch is to (a) document this and also
(b) poison the back pointer.

A patch like the attached, in short.

Linus

---
===== include/linux/dcache.h 1.32 vs edited =====
--- 1.32/include/linux/dcache.h Tue Jun 10 14:56:43 2003
+++ edited/include/linux/dcache.h Thu Jun 12 09:12:27 2003
@@ -174,8 +174,10 @@

static inline void __d_drop(struct dentry *dentry)
{
- dentry->d_vfs_flags |= DCACHE_UNHASHED;
- hlist_del_rcu_init(&dentry->d_hash);
+ if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) {
+ dentry->d_vfs_flags |= DCACHE_UNHASHED;
+ hlist_del_rcu(&dentry->d_hash);
+ }
}

static inline void d_drop(struct dentry *dentry)
===== include/linux/list.h 1.32 vs edited =====
--- 1.32/include/linux/list.h Tue Jun 10 15:46:31 2003
+++ edited/include/linux/list.h Thu Jun 12 08:59:31 2003
@@ -152,14 +152,17 @@
/**
* list_del_rcu - deletes entry from list without re-initialization
* @entry: the element to delete from the list.
+ *
* Note: list_empty on entry does not return true after this,
* the entry is in an undefined state. It is useful for RCU based
* lockfree traversal.
+ *
+ * In particular, it means that we can not poison the forward
+ * pointers that may still be used for path walking.
*/
static inline void list_del_rcu(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
- entry->next = LIST_POISON1;
entry->prev = LIST_POISON2;
}

@@ -431,7 +434,22 @@
n->pprev = LIST_POISON2;
}

-#define hlist_del_rcu hlist_del /* list_del_rcu is identical too? */
+/**
+ * hlist_del_rcu - deletes entry from hash list without re-initialization
+ * @entry: the element to delete from the list.
+ *
+ * Note: list_empty on entry does not return true after this,
+ * the entry is in an undefined state. It is useful for RCU based
+ * lockfree traversal.
+ *
+ * In particular, it means that we can not poison the forward
+ * pointers that may still be used for path walking.
+ */
+static inline void hlist_del_rcu(struct hlist_node *n)
+{
+ __hlist_del(n);
+ n->pprev = LIST_POISON2;
+}

static __inline__ void hlist_del_init(struct hlist_node *n)
{

2003-06-12 16:12:53

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

>>>>> " " == Dipankar Sarma <[email protected]> writes:

> Does my patch meet the requirements that you had for __d_drop()
> ?

I still need a real fix for d_move(). In addition, I'm getting worried
about the changes in functionality that you've introduced here. It
seems to me that your lockless scheme opens for a *lot* of races:

Look at all those functions that take dcache_lock, and then test
dentry->d_count. Unless I'm missing something here, your d_lookup()
clearly has them all screwed, no?

Cheers,
Trond

2003-06-12 16:17:03

by Al Viro

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 09:23:45PM +0530, Dipankar Sarma wrote:

> Lockfree d_lookup() gives us significant benefits in larger
> SMP machines.

I wonder if they outweight debugging time wasted after any change...

Note that for vfsmounts proposed RCU patch had been utterly useless -
practically all improvements had been from separate lock for vfsmounts
(see akpm tree).

2003-06-12 16:21:14

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 09:18:11AM -0700, Linus Torvalds wrote:
>
> On Thu, 12 Jun 2003, Linus Torvalds wrote:
> >
> > If you depend on not re-initializing the pointers, you should not use the
> > "xxx_del()" function, and you should document it.
>
> Besides, the code doesn't actually depend on not re-initializing the
> pointers, it depends on the _forward_ pointers still being walkable in
> case some other CPU is traversing the list just as we remove the entry.
>
> Which means that I think the proper patch is to (a) document this and also
> (b) poison the back pointer.

That should work. However, I do have once concern. At the generic
list macro level, we don't know if the lockfree traversal is being
done in forward or backward direction. So, I am not sure if
list_del_rcu() should poison the backward pointer or atleast
document the fact that RCU-based traversal would not work on
backward pointers. hlist_del_rcu() can indeed poison pprev.


>
> A patch like the attached, in short.

Since we are at it, I can submit a patch documenting the rest of
hlist functions, if you like.

Thanks
Dipankar

2003-06-12 16:34:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash


On Thu, 12 Jun 2003, Dipankar Sarma wrote:
>
> That should work. However, I do have once concern. At the generic
> list macro level, we don't know if the lockfree traversal is being
> done in forward or backward direction.

Sure we do. We do have backwards list traversal, but it's already not
available for the RCU case (ie we have "list_for_each_entry_rcu()", but we
don't have "list_for_each_entry_reverse_rcu()").

Of course, somebody may be using the non-RCU versions of the list
traversal macros on a RCU list, but that would be a bug anyway.

Linus

2003-06-12 16:36:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash


On Thu, 12 Jun 2003, Trond Myklebust wrote:
>
> I still need a real fix for d_move().

How about something like this.. It still breaks if the _target_ is
unhashed, but quite frankly, if you have unhashed the target, I think
something is seriously wrong.

Linus

---
===== fs/dcache.c 1.57 vs edited =====
--- 1.57/fs/dcache.c Sat Jun 7 10:17:01 2003
+++ edited/fs/dcache.c Thu Jun 12 09:42:48 2003
@@ -1221,10 +1221,14 @@
}

/* Move the dentry to the target hash queue, if on different bucket */
+ if (dentry->d_vfs_flags & DCACHE_UNHASHED)
+ goto already_unhashed;
if (dentry->d_bucket != target->d_bucket) {
- dentry->d_bucket = target->d_bucket;
hlist_del_rcu(&dentry->d_hash);
+already_unhashed:
+ dentry->d_bucket = target->d_bucket;
hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
+ dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
}

/* Unhash the target: dput() will then get rid of it */

2003-06-12 16:39:42

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 05:30:45PM +0100, [email protected] wrote:
> On Thu, Jun 12, 2003 at 09:23:45PM +0530, Dipankar Sarma wrote:
>
> > Lockfree d_lookup() gives us significant benefits in larger
> > SMP machines.
>
> I wonder if they outweight debugging time wasted after any change...

Several sets of numbers have been published in lkml on this.
I will work on sending out my updates to the vfs locking document
you wrote ASAP. AFAICS, most dcache APIs work as is despite lockfree
lookup. As long as we follow those rules, we should be ok.

>
> Note that for vfsmounts proposed RCU patch had been utterly useless -
> practically all improvements had been from separate lock for vfsmounts
> (see akpm tree).

Yes and that is why Maneesh's patch had two parts. In that case benefit
came from reducing acquisition of dcache_lock by the first part itself -
using a separate lock for vfsmounts. It does not seem possible to
split up dcache_lock any further without very significant changes in vfs.
The acquisition of vfsmount_lock by itself was not significant enough
to really warrant lockfree lookup.

Thanks
Dipankar

2003-06-12 16:41:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash


On Thu, 12 Jun 2003, Linus Torvalds wrote:
>
> How about something like this.. It still breaks if the _target_ is
> unhashed

Actually, it doesn't _break_, it just does something surprising (but
possibly quite correct): it will hash the dentry to the same hash chain
the target _used_ to be on before being unhashed.

Which might actually be the right thing, but it still sounds to me like
a bad idea to have a unhashed target.

Linus

2003-06-12 19:37:34

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 09:26:29AM -0700, Trond Myklebust wrote:
> I still need a real fix for d_move(). In addition, I'm getting worried

Linus' fix allowing unhashed src dentries seems ok, if that is what
you are looking for.


> about the changes in functionality that you've introduced here. It
> seems to me that your lockless scheme opens for a *lot* of races:
>
> Look at all those functions that take dcache_lock, and then test
> dentry->d_count. Unless I'm missing something here, your d_lookup()
> clearly has them all screwed, no?

Not necessarily. One example is the fact that d_lookup() can
only increase d_count. Besides, dput() decrements d_count
without dcache_lock, so I am not sure holding dcache_lock during
d_count test buys you much.

We will do some audit tomorrow and see if there are issues here.

Thanks
Dipankar

2003-06-13 05:11:02

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

>>>>> " " == Dipankar Sarma <[email protected]> writes:

>> Look at all those functions that take dcache_lock, and then
>> test
>> dentry-> d_count. Unless I'm missing something here, your
>> d_lookup() clearly has them all screwed, no?

> Not necessarily. One example is the fact that d_lookup() can
> only increase d_count. Besides, dput() decrements d_count
> without dcache_lock, so I am not sure holding dcache_lock
> during d_count test buys you much.

Wrong. Look at the VFS code. In all cases the test is of the form.

spin_lock(&dcache_lock);
/* Are we the sole users of this dentry */
if (atomic_read(&dentry->d_count) == 1) {
/* Yes - do some operation */
}


Knowing that d_lookup() can *increase* d_count is not a plus here. The
whole idea is to have a test for sole use.

In most cases, the "do some operation" above is

d_drop(dentry);

in order (for instance) to ensure that nobody else can look up this
dentry while we're working on it (e.g. rename or unlink,...).

Your d_lookup() screws the above example of code which you can find in
any number of VFS functions. dput(), d_delete(), d_invalidate(),
d_prune_aliases(), prune_dcache(), shrink_dcache_sb() are but a few
functions that rely on the above code snippet working to keep
d_lookup() from intruding.

Cheers,
Trond

2003-06-13 05:33:32

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 10:24:16PM -0700, Trond Myklebust wrote:
> Wrong. Look at the VFS code. In all cases the test is of the form.
>
> spin_lock(&dcache_lock);
> /* Are we the sole users of this dentry */
> if (atomic_read(&dentry->d_count) == 1) {
> /* Yes - do some operation */
> }
>
>
> Knowing that d_lookup() can *increase* d_count is not a plus here. The
> whole idea is to have a test for sole use.

Well, d_lookup() isn't the only place that does a dget() without
holding dcache_lock. There are *many* places where dget() is
done without holding dcache_lock. That didn't seem to be a
requirement in the pre-RCU dcache model.

>
> In most cases, the "do some operation" above is
>
> d_drop(dentry);
>

I don't think that would work in pre or post-RCU dcache.

> in order (for instance) to ensure that nobody else can look up this
> dentry while we're working on it (e.g. rename or unlink,...).

rename, unlink etc. hold the per-dentry lock, so they are protected
against lockfree d_lookup().

>
> Your d_lookup() screws the above example of code which you can find in
> any number of VFS functions. dput(), d_delete(), d_invalidate(),
> d_prune_aliases(), prune_dcache(), shrink_dcache_sb() are but a few
> functions that rely on the above code snippet working to keep
> d_lookup() from intruding.

Those routines hold the per-dentry lock as required and that protects
them from intruding lockfree d_lookup().

Thanks
Dipankar

2003-06-13 05:49:55

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 10:24:16PM -0700, Trond Myklebust wrote:
> Wrong. Look at the VFS code. In all cases the test is of the form.
>
> spin_lock(&dcache_lock);
> /* Are we the sole users of this dentry */
> if (atomic_read(&dentry->d_count) == 1) {
> /* Yes - do some operation */
> }
>
>
> Knowing that d_lookup() can *increase* d_count is not a plus here. The
> whole idea is to have a test for sole use.

I missed this part. If you want to do this, I would suggest taking
the per-dentry lock instead. Most dcache routines have been fixed
to do this. We will look around and see anything violates this rule.

Thanks
Dipankar

2003-06-13 06:00:01

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

>>>>> " " == Dipankar Sarma <[email protected]> writes:

> Well, d_lookup() isn't the only place that does a dget()
> without holding dcache_lock. There are *many* places where
> dget() is done without holding dcache_lock. That didn't seem to
> be a requirement in the pre-RCU dcache model.

d_lookup() is the only place where someone can pick up an existing
dentry for which they do not already hold a reference.

>> d_invalidate(), d_prune_aliases(), prune_dcache(),
>> shrink_dcache_sb() are but a few functions that rely on the
>> above code snippet working to keep d_lookup() from intruding.

> Those routines hold the per-dentry lock as required and that
> protects them from intruding lockfree d_lookup().

d_invalidate() does not. d_prune_aliases() does not. d_unhash() does
not.

Down in the per-filesystem code, I know of several locations in the
NFS code that do not. There's one in procfs. I'm sure you can find
more examples in the other filesystems too...

Your argument only holds water if you demand that all callers of
d_drop() should also take the per-dentry lock. AFAICS this is not
being enforced.

Cheers,
Trond

2003-06-13 06:37:43

by Dipankar Sarma

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 11:13:31PM -0700, Trond Myklebust wrote:
> >>>>> " " == Dipankar Sarma <[email protected]> writes:
> d_lookup() is the only place where someone can pick up an existing
> dentry for which they do not already hold a reference.

AFAICS, the way to do a perfectly safe "sole ownership" check is
to grab the per-dentry lock (instead of dcache_lock) and
do the check.

> >> d_invalidate(), d_prune_aliases(), prune_dcache(),
> >> shrink_dcache_sb() are but a few functions that rely on the
> >> above code snippet working to keep d_lookup() from intruding.
>
> > Those routines hold the per-dentry lock as required and that
> > protects them from intruding lockfree d_lookup().
>
> d_invalidate() does not. d_prune_aliases() does not. d_unhash() does
> not.
>
> Down in the per-filesystem code, I know of several locations in the
> NFS code that do not. There's one in procfs. I'm sure you can find
> more examples in the other filesystems too...

Yes and we need an audit. Besides I see places where d_count check
is being done without holding *any* lock. That is ok only if the
dentry is guranteed to be unhashed and we are doing a "sole ownership"
check. Otherwise it may not work.

>
> Your argument only holds water if you demand that all callers of
> d_drop() should also take the per-dentry lock. AFAICS this is not
> being enforced.

There are some rules that we need to work out irrespective of
RCU and document clearly -

1. d_unhashed() checks that are being done with and without dcache_lock -

if (!d_unhashed(new_dentry)) {
d_drop(new_dentry);
rehash = new_dentry;
}
This seems to require that either we hold the dcache_lock or the operations
that we do on the dentry allow unhashed dentries.

2. Checking for "sole ownership" of dentries. Depending on whether the
dentry is hashed and what operation we are going to do, we will
need to hold the per-dentry lock.

I am glad that you are reviewing this. Is there any other dcache
operations in filesystems that you would like to add to the list
above ?

Thanks
Dipankar

2003-06-13 12:32:51

by Maneesh Soni

[permalink] [raw]
Subject: Re: 2.5.70-bk16: nfs crash

On Thu, Jun 12, 2003 at 09:18:11AM -0700, Linus Torvalds wrote:
>
>
[..]
> ---
> ===== include/linux/dcache.h 1.32 vs edited =====
> --- 1.32/include/linux/dcache.h Tue Jun 10 14:56:43 2003
> +++ edited/include/linux/dcache.h Thu Jun 12 09:12:27 2003
> @@ -174,8 +174,10 @@
>
> static inline void __d_drop(struct dentry *dentry)
> {
> - dentry->d_vfs_flags |= DCACHE_UNHASHED;
> - hlist_del_rcu_init(&dentry->d_hash);
> + if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) {
> + dentry->d_vfs_flags |= DCACHE_UNHASHED;
> + hlist_del_rcu(&dentry->d_hash);
> + }
> }

Looks like there is some problem in this. With this conditional
d_drop, umounting an NFS mount goes in a loop and oopses in dput()
This is on 2.5.70-bk17.

Unable to handle kernel paging request at virtual address 00100104
printing eip:
c016f9b1
*pde = 00000000
Oops: 0002 [#1]
CPU: 3
EIP: 0060:[<c016f9b1>] Not tainted
EFLAGS: 00010246
EIP is at dput+0x1f1/0x350
eax: 00100100 ebx: f72a8c80 ecx: f72a8c9c edx: 00200200
esi: f64d2000 edi: f72a8c88 ebp: 00000000 esp: f64d3e8c
ds: 007b es: 007b ss: 0068
Process umount (pid: 926, threadinfo=f64d2000 task=f79d9900)
Stack: 00000010 c016007b 0000007b ffffffef f72a8c80 f6448380 f64481e0 c02fccca
f72a8c80 f6448380 f72a8cf8 f72a8820 f72a8820 f6448860 c035f8e0 c02fd1c0
f72a8820 f78775e0 f7fde560 f74971d9 00000005 10ee271a 00000010 00000000
Call Trace:
[<c016007b>] bd_claim+0x2b/0xa0
[<c02fccca>] rpc_depopulate+0x16a/0x190
[<c02fd1c0>] rpc_rmdir+0x60/0xa0
[<c02f2a4d>] rpcauth_destroy+0xd/0x60
[<c02ec45d>] rpc_destroy_client+0x4d/0x70
[<c01b6427>] nfs_put_super+0x17/0x40
[<c015e03f>] generic_shutdown_super+0xbf/0x200
[<c015eec0>] kill_anon_super+0x10/0x80
[<c01b8811>] nfs_kill_super+0x11/0x20
[<c015dcf9>] deactivate_super+0xa9/0x140
[<c0175d08>] __mntput+0x18/0x30
[<c0165e79>] path_release+0x29/0x30
[<c0176641>] sys_umount+0x81/0x90
[<c015710f>] sys_close+0x9f/0x100
[<c017665c>] sys_oldumount+0xc/0x10
[<c0109477>] syscall_call+0x7/0xb


Removing the DCACHE_UNHASHED check makes it work again. Needs more
investigation.


--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: [email protected]
http://lse.sourceforge.net/