2004-03-03 03:09:52

by Mike Fedyk

Subject: bad: scheduling while atomic in nfs with 2.6.3

I'm running 2.6.3-zonebal-lofft-slabfaz

That's with the nfsd loff_t patch and two VM patches from -mm.

More info available upon request.

Mike

loop: loaded (max 8 devices)
ISO 9660 Extensions: Microsoft Joliet Level 3
ISOFS: changing to secondary root
ISO 9660 Extensions: Microsoft Joliet Level 3
ISOFS: changing to secondary root
Debug: sleeping function called from invalid context at
include/linux/rwsem.h:66
in_atomic():1, irqs_disabled():0
Call Trace:
[<c012258d>] __might_sleep+0x9d/0xe0
[<c01651d8>] deactivate_super+0x58/0x100
[<f89e9fba>] svc_export_put+0x7a/0x80 [nfsd]
[<f898167c>] cache_clean+0x18c/0x2e0 [sunrpc]
[<f89817d9>] do_cache_clean+0x9/0x50 [sunrpc]
[<c0136128>] worker_thread+0x1b8/0x260
[<f89817d0>] do_cache_clean+0x0/0x50 [sunrpc]
[<c0120750>] default_wake_function+0x0/0x20
[<c0109e16>] ret_from_fork+0x6/0x20
[<c0120750>] default_wake_function+0x0/0x20
[<c0135f70>] worker_thread+0x0/0x260
[<c0107d95>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
[<c01206eb>] schedule+0x6fb/0x710
[<c014a06c>] __pagevec_release+0x1c/0x30
[<c014a7d4>] truncate_inode_pages+0xc4/0x2a0
[<c010cb7b>] do_IRQ+0x16b/0x1a0
[<c0179d00>] dispose_list+0xc0/0xd0
[<c0179e82>] invalidate_inodes+0xb2/0xf0
[<c016549b>] generic_shutdown_super+0x9b/0x200
[<c0166454>] kill_block_super+0x14/0x30
[<c01651f7>] deactivate_super+0x77/0x100
[<f89e9fba>] svc_export_put+0x7a/0x80 [nfsd]
[<f898167c>] cache_clean+0x18c/0x2e0 [sunrpc]
[<f89817d9>] do_cache_clean+0x9/0x50 [sunrpc]
[<c0136128>] worker_thread+0x1b8/0x260
[<f89817d0>] do_cache_clean+0x0/0x50 [sunrpc]
[<c0120750>] default_wake_function+0x0/0x20
[<c0109e16>] ret_from_fork+0x6/0x20
[<c0120750>] default_wake_function+0x0/0x20
[<c0135f70>] worker_thread+0x0/0x260
[<c0107d95>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
[<c01206eb>] schedule+0x6fb/0x710
[<c0163fae>] free_buffer_head+0x3e/0x70
[<c013fe48>] __remove_from_page_cache+0x18/0x80
[<c014a06c>] __pagevec_release+0x1c/0x30
[<c014aa3f>] invalidate_mapping_pages+0x8f/0xf0
[<c0119a95>] smp_apic_timer_interrupt+0xe5/0x160
[<c0161c9d>] invalidate_bh_lru+0x2d/0x60
[<c014aab0>] invalidate_inode_pages+0x10/0x20
[<c016690f>] kill_bdev+0xf/0x30
[<c0167c9c>] blkdev_put+0x1fc/0x220
[<c0166466>] kill_block_super+0x26/0x30
[<c01651f7>] deactivate_super+0x77/0x100
[<f89e9fba>] svc_export_put+0x7a/0x80 [nfsd]
[<f898167c>] cache_clean+0x18c/0x2e0 [sunrpc]
[<f89817d9>] do_cache_clean+0x9/0x50 [sunrpc]
[<c0136128>] worker_thread+0x1b8/0x260
[<f89817d0>] do_cache_clean+0x0/0x50 [sunrpc]
[<c0120750>] default_wake_function+0x0/0x20
[<c0109e16>] ret_from_fork+0x6/0x20
[<c0120750>] default_wake_function+0x0/0x20
[<c0135f70>] worker_thread+0x0/0x260
[<c0107d95>] kernel_thread_helper+0x5/0x10


2004-03-03 04:02:01

by dan carpenter

Subject: Re: bad: scheduling while atomic in nfs with 2.6.3

On Tuesday 02 March 2004 07:09 pm, Mike Fedyk wrote:
> I'm running 2.6.3-zonebal-lofft-slabfaz
> Call Trace:
> [<c012258d>] __might_sleep+0x9d/0xe0
> [<c01651d8>] deactivate_super+0x58/0x100
> [<f89e9fba>] svc_export_put+0x7a/0x80 [nfsd]

The bad call path goes something like this:
svc_export_put() -> mntput() -> __mntput() -> deactivate_super()
-> down_write() -> might_sleep()

I don't have a fix. Neil Brown might. I've CC'd him.

regards,
dan carpenter

2004-03-03 05:30:24

by J. Bruce Fields

Subject: Re: bad: scheduling while atomic in nfs with 2.6.3

On Tue, Mar 02, 2004 at 07:09:35PM -0800, Mike Fedyk wrote:
> I'm running 2.6.3-zonebal-lofft-slabfaz
>
> That's with the nfsd loff_t patch and two VM patches from -mm.
>
> Call Trace:
> [<c012258d>] __might_sleep+0x9d/0xe0
> [<c01651d8>] deactivate_super+0x58/0x100
> [<f89e9fba>] svc_export_put+0x7a/0x80 [nfsd]
> [<f898167c>] cache_clean+0x18c/0x2e0 [sunrpc]
> [<f89817d9>] do_cache_clean+0x9/0x50 [sunrpc]
> [<c0136128>] worker_thread+0x1b8/0x260
> [<f89817d0>] do_cache_clean+0x0/0x50 [sunrpc]
> [<c0120750>] default_wake_function+0x0/0x20
> [<c0109e16>] ret_from_fork+0x6/0x20
> [<c0120750>] default_wake_function+0x0/0x20
> [<c0135f70>] worker_thread+0x0/0x260
> [<c0107d95>] kernel_thread_helper+0x5/0x10

This is fixed in 2.6.4-rc1, with the following patch.

--Bruce Fields



We currently call cache_put, which can schedule(), under a spin_lock. This
patch moves that call outside the spinlock.

(From neilb)


net/sunrpc/cache.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)

diff -puN net/sunrpc/cache.c~neil_cache_clean_fix net/sunrpc/cache.c
--- linux-2.6.2/net/sunrpc/cache.c~neil_cache_clean_fix 2004-02-11 12:44:13.000000000 -0500
+++ linux-2.6.2-bfields/net/sunrpc/cache.c 2004-02-11 12:44:13.000000000 -0500
@@ -325,6 +325,7 @@ int cache_clean(void)

if (current_detail && current_index < current_detail->hash_size) {
struct cache_head *ch, **cp;
+ struct cache_detail *d;

write_lock(&current_detail->hash_lock);

@@ -354,12 +355,14 @@ int cache_clean(void)
rv = 1;
}
write_unlock(&current_detail->hash_lock);
- if (ch)
- current_detail->cache_put(ch, current_detail);
- else
+ d = current_detail;
+ if (!ch)
current_index ++;
- }
- spin_unlock(&cache_list_lock);
+ spin_unlock(&cache_list_lock);
+ if (ch)
+ d->cache_put(ch, d);
+ } else
+ spin_unlock(&cache_list_lock);

return rv;
}

_