2006-11-06 19:16:47

by Josh Boyer

Subject: NFS hang

Hi,

We've got an IBM bladecenter we're using to do some NFS stress testing
(v3). It seems with kernel 2.6.16.21 we get a hang during our stress
tests after roughly an hour or two. The hanging process has a
backtrace that looks like this:

cp D 000000000ff0b198 6672 16505 16296
(NOTLB)
Call Trace:
[C00000000F43B1D0] [C00000002DD5B088] 0xc00000002dd5b088 (unreliable)
[C00000000F43B3A0] [C00000000000F0B4] .__switch_to+0x12c/0x150
[C00000000F43B430] [C00000000039980C] .schedule+0xcec/0xe4c
[C00000000F43B540] [C00000000039A688] .io_schedule+0x68/0xb8
[C00000000F43B5D0] [C000000000093D60] .sync_page+0x7c/0x98
[C00000000F43B650] [C00000000039A8B4] .__wait_on_bit_lock+0x8c/0x114
[C00000000F43B700] [C000000000093C94] .__lock_page+0xa0/0xc8
[C00000000F43B820] [C000000000094A6C] .do_generic_mapping_read+0x21c/0x4a0
[C00000000F43B970] [C000000000095624] .__generic_file_aio_read+0x17c/0x228
[C00000000F43BA40] [C0000000000957C4] .generic_file_aio_read+0x44/0x54
[C00000000F43BAD0] [D000000000320F58] .nfs_file_read+0xb8/0xec [nfs]
[C00000000F43BB70] [C0000000000C5298] .do_sync_read+0xd4/0x130
[C00000000F43BCF0] [C0000000000C60E0] .vfs_read+0x128/0x20c
[C00000000F43BD90] [C0000000000C65C0] .sys_read+0x4c/0x8c
[C00000000F43BE30] [C00000000000871C] syscall_exit+0x0/0x40

So far, we've only been able to recreate this with the client and server
blades both in the same chassis. We tried recreating with nfs_debug
enabled, but it did not hang. So perhaps there is a correlation with
the high-speed network (gigabit ethernet over fibre) that comes with
that configuration.

Does anyone recognize the backtrace of the process or the symptoms of
the problem? We'd appreciate any info that you may have.

Please CC me in responses as I'm not subscribed to this list.

josh




2006-11-27 19:09:20

by Chris Caputo

Subject: Re: NFS hang

On Thu, 23 Nov 2006, Chris Caputo wrote:
> On Mon, 6 Nov 2006, Josh Boyer wrote:
> > We've got an IBM bladecenter we're using to do some NFS stress testing
> > (v3). It seems with kernel 2.6.16.21 we get a hang during our stress
> > tests after roughly an hour or two. The hanging process has a
> > backtrace that looks like this:
> >
> > cp D 000000000ff0b198 6672 16505 16296
> > (NOTLB)
> > Call Trace:
> > [C00000000F43B1D0] [C00000002DD5B088] 0xc00000002dd5b088 (unreliable)
> > [C00000000F43B3A0] [C00000000000F0B4] .__switch_to+0x12c/0x150
> > [C00000000F43B430] [C00000000039980C] .schedule+0xcec/0xe4c
> > [C00000000F43B540] [C00000000039A688] .io_schedule+0x68/0xb8
> > [C00000000F43B5D0] [C000000000093D60] .sync_page+0x7c/0x98
> > [C00000000F43B650] [C00000000039A8B4] .__wait_on_bit_lock+0x8c/0x114
> > [C00000000F43B700] [C000000000093C94] .__lock_page+0xa0/0xc8
> > [C00000000F43B820] [C000000000094A6C] .do_generic_mapping_read+0x21c/0x4a0
> > [C00000000F43B970] [C000000000095624] .__generic_file_aio_read+0x17c/0x228
> > [C00000000F43BA40] [C0000000000957C4] .generic_file_aio_read+0x44/0x54
> > [C00000000F43BAD0] [D000000000320F58] .nfs_file_read+0xb8/0xec [nfs]
> > [C00000000F43BB70] [C0000000000C5298] .do_sync_read+0xd4/0x130
> > [C00000000F43BCF0] [C0000000000C60E0] .vfs_read+0x128/0x20c
> > [C00000000F43BD90] [C0000000000C65C0] .sys_read+0x4c/0x8c
> > [C00000000F43BE30] [C00000000000871C] syscall_exit+0x0/0x40
> >
> > So far, we've only been able to recreate this with the client and server
> > blades both in the same chassis. We tried recreating with nfs_debug
> > enabled, but it did not hang. So perhaps there is a correlation with
> > the high-speed network (gigabit ethernet over fibre) that comes with
> > that configuration.
> >
> > Does anyone recognize the backtrace of the process or the symptoms of
> > the problem? We'd appreciate any info that you may have.
> >
> > Please CC me in responses as I'm not subscribed to this list.
>
> Hi. I am digging into a file hang issue (albeit NFSv2) that I've narrowed
> down to beginning with 2.6.11 and continuing to happen through 2.6.18, but
> which does not happen on 2.6.10 or prior.
>
> I wonder if you could run your tests on 2.6.10 and 2.6.11.12 and share the
> results with me.

If you were running into the same problem I was, the following should fix
it. The below patch should apply with ease to 2.6.16.21.

Chris

---

[PATCH 2.6.19-rc6] sunrpc: fix race condition

Patch linux-2.6.10-01-rpc_workqueue.dif introduced a race condition into
net/sunrpc/sched.c in kernels 2.6.11-rc1 through 2.6.19-rc6. The race
scenario is as follows...

Given: RPC_TASK_QUEUED, RPC_TASK_RUNNING and RPC_TASK_ASYNC are set.

__rpc_execute() (no spinlock)          rpc_make_runnable() (queue spinlock held)
-----------------------------          -----------------------------------------
                                       do_ret = rpc_test_and_set_running(task);
rpc_clear_running(task);
if (RPC_IS_ASYNC(task)) {
        if (RPC_IS_QUEUED(task))
                return 0;
                                       rpc_clear_queued(task);
                                       if (do_ret)
                                               return;

Thus both threads return and the task is abandoned forever.
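
For context, the helpers in that trace are plain atomic bit operations on
task->tk_runstate, with no common lock tying them together, which is what
allows the two threads to interleave like that. From memory, the definitions
in include/linux/sunrpc/sched.h look roughly like this (treat the exact form
as an approximate reconstruction, not the literal source):

/* Approximate reconstruction: each flag is an independent bit in
 * tk_runstate; nothing serializes a reader of one bit against a writer
 * of the other. */
#define RPC_IS_QUEUED(t)	test_bit(RPC_TASK_QUEUED, &(t)->tk_runstate)
#define rpc_clear_queued(t) \
	do { \
		smp_mb__before_clear_bit(); \
		clear_bit(RPC_TASK_QUEUED, &(t)->tk_runstate); \
		smp_mb__after_clear_bit(); \
	} while (0)

#define rpc_test_and_set_running(t) \
	test_and_set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
#define rpc_clear_running(t) \
	do { \
		smp_mb__before_clear_bit(); \
		clear_bit(RPC_TASK_RUNNING, &(t)->tk_runstate); \
		smp_mb__after_clear_bit(); \
	} while (0)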

In my test NFS client usage (~200 Mb/s at ~3,000 RPC calls/s) this race
condition has resulted in processes getting permanently stuck in 'D' state
often in less than 15 minutes of uptime.

The following patch fixes the problem by returning to use of a spinlock in
__rpc_execute().

Signed-off-by: Chris Caputo <[email protected]>
---

diff -up a/net/sunrpc/sched.c b/net/sunrpc/sched.c
--- a/net/sunrpc/sched.c	2006-11-27 08:41:07.000000000 +0000
+++ b/net/sunrpc/sched.c	2006-11-27 11:14:21.000000000 +0000
@@ -587,6 +587,7 @@ EXPORT_SYMBOL(rpc_exit_task);
 static int __rpc_execute(struct rpc_task *task)
 {
 	int status = 0;
+	struct rpc_wait_queue *queue;
 
 	dprintk("RPC: %4d rpc_execute flgs %x\n",
 			task->tk_pid, task->tk_flags);
@@ -631,22 +632,27 @@ static int __rpc_execute(struct rpc_task
 			lock_kernel();
 			task->tk_action(task);
 			unlock_kernel();
+			/* micro-optimization to avoid spinlock */
+			if (!RPC_IS_QUEUED(task))
+				continue;
 		}
 
 		/*
-		 * Lockless check for whether task is sleeping or not.
+		 * Check whether task is sleeping.
 		 */
-		if (!RPC_IS_QUEUED(task))
-			continue;
-		rpc_clear_running(task);
+		queue = task->u.tk_wait.rpc_waitq;
+		spin_lock_bh(&queue->lock);
 		if (RPC_IS_ASYNC(task)) {
-			/* Careful! we may have raced... */
-			if (RPC_IS_QUEUED(task))
-				return 0;
-			if (rpc_test_and_set_running(task))
+			if (RPC_IS_QUEUED(task)) {
+				rpc_clear_running(task);
+				spin_unlock_bh(&queue->lock);
 				return 0;
+			}
+			spin_unlock_bh(&queue->lock);
 			continue;
 		}
+		rpc_clear_running(task);
+		spin_unlock_bh(&queue->lock);
 
 		/* sync task: sleep here */
 		dprintk("RPC: %4d sync task going to sleep\n", task->tk_pid);


2006-11-27 19:17:52

by Trond Myklebust

Subject: Re: NFS hang

On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> On Thu, 23 Nov 2006, Chris Caputo wrote:
> > On Mon, 6 Nov 2006, Josh Boyer wrote:
> > > We've got an IBM bladecenter we're using to do some NFS stress testing
> > > (v3). It seems with kernel 2.6.16.21 we get a hang during our stress
> > > tests after roughly an hour or two. The hanging process has a
> > > backtrace that looks like this:
> > >
> > > cp D 000000000ff0b198 6672 16505 16296
> > > (NOTLB)
> > > Call Trace:
> > > [C00000000F43B1D0] [C00000002DD5B088] 0xc00000002dd5b088 (unreliable)
> > > [C00000000F43B3A0] [C00000000000F0B4] .__switch_to+0x12c/0x150
> > > [C00000000F43B430] [C00000000039980C] .schedule+0xcec/0xe4c
> > > [C00000000F43B540] [C00000000039A688] .io_schedule+0x68/0xb8
> > > [C00000000F43B5D0] [C000000000093D60] .sync_page+0x7c/0x98
> > > [C00000000F43B650] [C00000000039A8B4] .__wait_on_bit_lock+0x8c/0x114
> > > [C00000000F43B700] [C000000000093C94] .__lock_page+0xa0/0xc8
> > > [C00000000F43B820] [C000000000094A6C] .do_generic_mapping_read+0x21c/0x4a0
> > > [C00000000F43B970] [C000000000095624] .__generic_file_aio_read+0x17c/0x228
> > > [C00000000F43BA40] [C0000000000957C4] .generic_file_aio_read+0x44/0x54
> > > [C00000000F43BAD0] [D000000000320F58] .nfs_file_read+0xb8/0xec [nfs]
> > > [C00000000F43BB70] [C0000000000C5298] .do_sync_read+0xd4/0x130
> > > [C00000000F43BCF0] [C0000000000C60E0] .vfs_read+0x128/0x20c
> > > [C00000000F43BD90] [C0000000000C65C0] .sys_read+0x4c/0x8c
> > > [C00000000F43BE30] [C00000000000871C] syscall_exit+0x0/0x40
> > >
> > > So far, we've only been able to recreate this with the client and server
> > > blades both in the same chassis. We tried recreating with nfs_debug
> > > enabled, but it did not hang. So perhaps there is a correlation with
> > > the high-speed network (gigabit ethernet over fibre) that comes with
> > > that configuration.
> > >
> > > Does anyone recognize the backtrace of the process or the symptoms of
> > > the problem? We'd appreciate any info that you may have.
> > >
> > > Please CC me in responses as I'm not subscribed to this list.
> >
> > Hi. I am digging into a file hang issue (albeit NFSv2) that I've narrowed
> > down to beginning with 2.6.11 and continuing to happen through 2.6.18, but
> > which does not happen on 2.6.10 or prior.
> >
> > I wonder if you could run your tests on 2.6.10 and 2.6.11.12 and share the
> > results with me.
>
> If you were running into the same problem I was, the following should fix
> it. The below patch should apply with ease to 2.6.16.21.
>
> Chris
>
> ---
>
> [PATCH 2.6.19-rc6] sunrpc: fix race condition
>
> Patch linux-2.6.10-01-rpc_workqueue.dif introduced a race condition into
> net/sunrpc/sched.c in kernels 2.6.11-rc1 through 2.6.19-rc6. The race
> scenario is as follows...
>
> Given: RPC_TASK_QUEUED, RPC_TASK_RUNNING and RPC_TASK_ASYNC are set.
>
> __rpc_execute() (no spinlock) rpc_make_runnable() (queue spinlock held)
> ----------------------------- -----------------------------------------
> do_ret = rpc_test_and_set_running(task);
> rpc_clear_running(task);
> if (RPC_IS_ASYNC(task)) {
> if (RPC_IS_QUEUED(task))
> return 0;
> rpc_clear_queued(task);
> if (do_ret)
> return;
>
> Thus both threads return and the task is abandoned forever.
>
> In my test NFS client usage (~200 Mb/s at ~3,000 RPC calls/s) this race
> condition has resulted in processes getting permanently stuck in 'D' state
> often in less than 15 minutes of uptime.
>
> The following patch fixes the problem by returning to use of a spinlock in
> __rpc_execute().
>
> Signed-off-by: Chris Caputo <[email protected]>
> ---
>
> diff -up a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> --- a/net/sunrpc/sched.c 2006-11-27 08:41:07.000000000 +0000
> +++ b/net/sunrpc/sched.c 2006-11-27 11:14:21.000000000 +0000
> @@ -587,6 +587,7 @@ EXPORT_SYMBOL(rpc_exit_task);
> static int __rpc_execute(struct rpc_task *task)
> {
> int status = 0;
> + struct rpc_wait_queue *queue;
>
> dprintk("RPC: %4d rpc_execute flgs %x\n",
> task->tk_pid, task->tk_flags);
> @@ -631,22 +632,27 @@ static int __rpc_execute(struct rpc_task
> lock_kernel();
> task->tk_action(task);
> unlock_kernel();
> + /* micro-optimization to avoid spinlock */
> + if (!RPC_IS_QUEUED(task))
> + continue;
> }
>
> /*
> - * Lockless check for whether task is sleeping or not.
> + * Check whether task is sleeping.
> */
> - if (!RPC_IS_QUEUED(task))
> - continue;
> - rpc_clear_running(task);
> + queue = task->u.tk_wait.rpc_waitq;

NACK... There is no guarantee that task->u.tk_wait has any meaning here.
Particularly not so in the case of an asynchronous task, where the
storage is shared with the work_struct.
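
For reference, the union in question looks roughly like this in
include/linux/sunrpc/sched.h (quoted from memory, so take the exact field
names and layout as approximate):

/* approximate reconstruction of the relevant fields */
struct rpc_wait {
	struct list_head	list;		/* wait queue links */
	struct list_head	links;		/* links to related tasks */
	struct rpc_wait_queue	*rpc_waitq;	/* the rpc_wait_queue we are on */
};

struct rpc_task {
	/* ... */
	union {
		struct work_struct	tk_work;	/* async task: workqueue entry */
		struct rpc_wait		tk_wait;	/* queued task: wait info */
	} u;
	/* ... */
};

Once rpc_make_runnable() has handed an async task to the workqueue, tk_work
reuses that storage, so by the time __rpc_execute() looks at it,
u.tk_wait.rpc_waitq may already be garbage.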

> + spin_lock_bh(&queue->lock);
> if (RPC_IS_ASYNC(task)) {
> - /* Careful! we may have raced... */
> - if (RPC_IS_QUEUED(task))
> - return 0;
> - if (rpc_test_and_set_running(task))
> + if (RPC_IS_QUEUED(task)) {
> + rpc_clear_running(task);
> + spin_unlock_bh(&queue->lock);
> return 0;
> + }
> + spin_unlock_bh(&queue->lock);
> continue;
> }
> + rpc_clear_running(task);
> + spin_unlock_bh(&queue->lock);
>
> /* sync task: sleep here */
> dprintk("RPC: %4d sync task going to sleep\n", task->tk_pid);
>



2006-11-27 19:33:21

by Chris Caputo

Subject: Re: NFS hang

On Mon, 27 Nov 2006, Trond Myklebust wrote:
> On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > - if (!RPC_IS_QUEUED(task))
> > - continue;
> > - rpc_clear_running(task);
> > + queue = task->u.tk_wait.rpc_waitq;
>
> NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> Particularly not so in the case of an asynchronous task, where the
> storage is shared with the work_struct.

Yikes. Would you suggest I move the lock outside of the union and try
again?

Thanks,
Chris


2006-11-27 21:22:53

by Chris Caputo

Subject: Re: NFS hang

On Mon, 27 Nov 2006, Trond Myklebust wrote:
> On Mon, 2006-11-27 at 19:33 +0000, Chris Caputo wrote:
> > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > > > - if (!RPC_IS_QUEUED(task))
> > > > - continue;
> > > > - rpc_clear_running(task);
> > > > + queue = task->u.tk_wait.rpc_waitq;
> > >
> > > NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> > > Particularly not so in the case of an asynchronous task, where the
> > > storage is shared with the work_struct.
> >
> > Yikes. Would you suggest I move the lock outside of the union and try
> > again?
>
> No. There is no way this can work. You would need something that
> guarantees that the task stays queued while you are taking the queue
> lock.
>
> Have you instead tried Christophe Saout's patch (see attachment)?

Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
November 5th patch is looking good. For reference, normally I see the
race happen in under 15 minutes.

I'll report back if any problems develop. This machine is an outgoing
newsfeed server and so it pounds on NFS client routines 24x7.

Chris


2006-11-27 21:40:30

by Josh Boyer

Subject: Re: NFS hang

On Mon, 2006-11-27 at 21:22 +0000, Chris Caputo wrote:
> On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > On Mon, 2006-11-27 at 19:33 +0000, Chris Caputo wrote:
> > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > > > > - if (!RPC_IS_QUEUED(task))
> > > > > - continue;
> > > > > - rpc_clear_running(task);
> > > > > + queue = task->u.tk_wait.rpc_waitq;
> > > >
> > > > NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> > > > Particularly not so in the case of an asynchronous task, where the
> > > > storage is shared with the work_struct.
> > >
> > > Yikes. Would you suggest I move the lock outside of the union and try
> > > again?
> >
> > No. There is no way this can work. You would need something that
> > guarantees that the task stays queued while you are taking the queue
> > lock.
> >
> > Have you instead tried Christophe Saout's patch (see attachment)?
>
> Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> November 5th patch is looking good. For reference, normally I see the
> race happen in under 15 minutes.
>
> I'll report back if any problems develop. This machine is an outgoing
> newsfeed server and so it pounds on NFS client routines 24x7.

Would the race condition that Chris described potentially lead to the
stack trace I originally posted? If so, I can try to test this patch
out myself.

josh



2006-11-27 21:46:45

by Chris Caputo

Subject: Re: NFS hang

On Mon, 27 Nov 2006, Josh Boyer wrote:
> On Mon, 2006-11-27 at 21:22 +0000, Chris Caputo wrote:
> > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > On Mon, 2006-11-27 at 19:33 +0000, Chris Caputo wrote:
> > > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > > On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > > > > > - if (!RPC_IS_QUEUED(task))
> > > > > > - continue;
> > > > > > - rpc_clear_running(task);
> > > > > > + queue = task->u.tk_wait.rpc_waitq;
> > > > >
> > > > > NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> > > > > Particularly not so in the case of an asynchronous task, where the
> > > > > storage is shared with the work_struct.
> > > >
> > > > Yikes. Would you suggest I move the lock outside of the union and try
> > > > again?
> > >
> > > No. There is no way this can work. You would need something that
> > > guarantees that the task stays queued while you are taking the queue
> > > lock.
> > >
> > > Have you instead tried Christophe Saout's patch (see attachment)?
> >
> > Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> > November 5th patch is looking good. For reference, normally I see the
> > race happen in under 15 minutes.
> >
> > I'll report back if any problems develop. This machine is an outgoing
> > newsfeed server and so it pounds on NFS client routines 24x7.
>
> Would the race condition that Chris described potentially lead to the
> stack trace I originally posted? If so, I can try to test this patch
> out myself.

Yes. Your stack showed that your process was waiting for data to be read
via NFS. It is possible that this bug resulted in your read request being
lost forever and thus your process hung forever. And any other processes
which then attempted to read the same data would also hang. It's
contagious. :-)
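
That is also exactly what the sync_page/__lock_page frames in your trace
show: the reader is waiting for PG_locked to clear on a page whose read will
never complete. Roughly, from the 2.6.16-era code (quoted from memory, so
treat it as approximate):

/* include/linux/pagemap.h (approximate) */
static inline void lock_page(struct page *page)
{
	might_sleep();
	if (TestSetPageLocked(page))
		__lock_page(page);	/* blocks in sync_page()/io_schedule() */
}

/* mm/filemap.c (approximate): for a page locked for a read, this runs when
 * the read completes, so a lost read RPC means the page stays locked and
 * every later reader of that page parks in __lock_page(). */
void unlock_page(struct page *page)
{
	smp_mb__before_clear_bit();
	if (!TestClearPageLocked(page))
		BUG();
	smp_mb__after_clear_bit();
	wake_up_page(page, PG_locked);
}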

Chris


2006-11-27 21:54:39

by Trond Myklebust

Subject: Re: NFS hang

On Mon, 2006-11-27 at 15:40 -0600, Josh Boyer wrote:
> On Mon, 2006-11-27 at 21:22 +0000, Chris Caputo wrote:
> > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > On Mon, 2006-11-27 at 19:33 +0000, Chris Caputo wrote:
> > > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > > On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > > > > > - if (!RPC_IS_QUEUED(task))
> > > > > > - continue;
> > > > > > - rpc_clear_running(task);
> > > > > > + queue = task->u.tk_wait.rpc_waitq;
> > > > >
> > > > > NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> > > > > Particularly not so in the case of an asynchronous task, where the
> > > > > storage is shared with the work_struct.
> > > >
> > > > Yikes. Would you suggest I move the lock outside of the union and try
> > > > again?
> > >
> > > No. There is no way this can work. You would need something that
> > > guarantees that the task stays queued while you are taking the queue
> > > lock.
> > >
> > > Have you instead tried Christophe Saout's patch (see attachment)?
> >
> > Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> > November 5th patch is looking good. For reference, normally I see the
> > race happen in under 15 minutes.
> >
> > I'll report back if any problems develop. This machine is an outgoing
> > newsfeed server and so it pounds on NFS client routines 24x7.
>
> Would the race condition that Chris described potentially lead to the
> stack trace I originally posted? If so, I can try to test this patch
> out myself.

Possibly. It is worth trying.

Trond



2006-11-27 21:56:02

by Josh Boyer

Subject: Re: NFS hang

On Mon, 2006-11-27 at 21:46 +0000, Chris Caputo wrote:
> On Mon, 27 Nov 2006, Josh Boyer wrote:
> > On Mon, 2006-11-27 at 21:22 +0000, Chris Caputo wrote:
> > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > On Mon, 2006-11-27 at 19:33 +0000, Chris Caputo wrote:
> > > > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > > > On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > > > > > > - if (!RPC_IS_QUEUED(task))
> > > > > > > - continue;
> > > > > > > - rpc_clear_running(task);
> > > > > > > + queue = task->u.tk_wait.rpc_waitq;
> > > > > >
> > > > > > NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> > > > > > Particularly not so in the case of an asynchronous task, where the
> > > > > > storage is shared with the work_struct.
> > > > >
> > > > > Yikes. Would you suggest I move the lock outside of the union and try
> > > > > again?
> > > >
> > > > No. There is no way this can work. You would need something that
> > > > guarantees that the task stays queued while you are taking the queue
> > > > lock.
> > > >
> > > > Have you instead tried Christophe Saout's patch (see attachment)?
> > >
> > > Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> > > November 5th patch is looking good. For reference, normally I see the
> > > race happen in under 15 minutes.
> > >
> > > I'll report back if any problems develop. This machine is an outgoing
> > > newsfeed server and so it pounds on NFS client routines 24x7.
> >
> > Would the race condition that Chris described potentially lead to the
> > stack trace I originally posted? If so, I can try to test this patch
> > out myself.
>
> Yes. Your stack showed that your process was waiting for data to be read
> via NFS. It is possible that this bug resulted in your read request being
> lost forever and thus your process hung forever. And any other processes
> which then attempted to read the same data would also hang. It's
> contagious. :-)

Ok. I will try to give this a shot soon. Our test setup takes anywhere
from a couple of hours to a couple of days to recreate the hang, but it
typically fails overnight. I'll report back as soon as I have results.

josh



2006-11-30 12:07:19

by Josh Boyer

Subject: Re: NFS hang

On Mon, 2006-11-27 at 16:54 -0500, Trond Myklebust wrote:
> On Mon, 2006-11-27 at 15:40 -0600, Josh Boyer wrote:
> > On Mon, 2006-11-27 at 21:22 +0000, Chris Caputo wrote:
> > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > On Mon, 2006-11-27 at 19:33 +0000, Chris Caputo wrote:
> > > > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > > > On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > > > > > > - if (!RPC_IS_QUEUED(task))
> > > > > > > - continue;
> > > > > > > - rpc_clear_running(task);
> > > > > > > + queue = task->u.tk_wait.rpc_waitq;
> > > > > >
> > > > > > NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> > > > > > Particularly not so in the case of an asynchronous task, where the
> > > > > > storage is shared with the work_struct.
> > > > >
> > > > > Yikes. Would you suggest I move the lock outside of the union and try
> > > > > again?
> > > >
> > > > No. There is no way this can work. You would need something that
> > > > guarantees that the task stays queued while you are taking the queue
> > > > lock.
> > > >
> > > > Have you instead tried Christophe Saout's patch (see attachment)?
> > >
> > > Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> > > November 5th patch is looking good. For reference, normally I see the
> > > race happen in under 15 minutes.
> > >
> > > I'll report back if any problems develop. This machine is an outgoing
> > > newsfeed server and so it pounds on NFS client routines 24x7.
> >
> > Would the race condition that Chris described potentially lead to the
> > stack trace I originally posted? If so, I can try to test this patch
> > out myself.
>
> Possibly. It is worth trying.

We've been testing this patch in our kernel and it looks good. The
tests have run for 2 days and 13 hours so far, which is longer than
previous attempts. I would suggest adding this to upstream (if it isn't
already), and possibly even the stable release.

josh



2006-11-30 21:19:58

by Chris Caputo

Subject: Re: NFS hang

On Mon, 27 Nov 2006, Chris Caputo wrote:
> On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > No. There is no way this can work. You would need something that
> > guarantees that the task stays queued while you are taking the queue
> > lock.
> >
> > Have you instead tried Christophe Saout's patch (see attachment)?
>
> Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> November 5th patch is looking good. For reference, normally I see the
> race happen in under 15 minutes.
>
> I'll report back if any problems develop. This machine is an outgoing
> newsfeed server and so it pounds on NFS client routines 24x7.

I am not sure if this is related, but at just over 3 days of uptime with
2.6.19-rc6 and the Saout patch, I had this happen:

---
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000028
printing eip:
c029cf64
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: weasel_pci
CPU: 1
EIP: 0060:[<c029cf64>] Not tainted VLI
EFLAGS: 00010202 (2.6.19-rc6 #5)
EIP is at call_start+0x5c/0x6f
eax: c030ae00 ebx: d2c80ac4 ecx: 00000001 edx: f6ad7550
esi: 00000000 edi: 00000000 ebp: f6adbdd0 esp: f6adbd18
ds: 007b es: 007b ss: 0068
Process feeder (pid: 4557, ti=f6adb000 task=f6ad7550 task.ti=f6adb000)
Stack: 00000000 00000282 c02ade34 00000001 f6ad7550 c0112a05 d2c80ac4 d2c80b2c
c02a22de d2c80c54 f78ca6e0 f6adbdd0 c02adc03 d2c80ac4 d2c80c54 f78ca6e0
c01a0795 80000000 00000000 d2c80ac0 d2c80c54 00002000 c01a0cf8 00002000
Call Trace:
[<c02ade34>] __down+0xa1/0xd7
[<c0112a05>] default_wake_function+0x0/0xc
[<c02a22de>] __rpc_execute+0x82/0x1b8
[<c02adc03>] __down_failed+0x7/0xc
[<c01a0795>] nfs_execute_read+0x35/0x49
[<c01a0cf8>] nfs_pagein_one+0x239/0x251
[<c01a0fb2>] nfs_readpages+0x140/0x1a2
[<c01377d1>] __alloc_pages+0x55/0x284
[<c0138ead>] __do_page_cache_readahead+0x126/0x1e5
[<c025bdc0>] lock_sock+0x88/0x90
[<c025eebc>] __alloc_skb+0x49/0xef
[<c02adf66>] _spin_lock_bh+0x8/0x18
[<c025bcaf>] release_sock+0xf/0x98
[<c027b43a>] tcp_sendpage+0x55b/0x575
[<c02ac57a>] __sched_text_start+0x69a/0x73d
[<c0138fb8>] blockable_page_cache_readahead+0x4c/0x9f
[<c0139087>] make_ahead_window+0x7c/0x99
[<c0139216>] page_cache_readahead+0x172/0x195
[<c0133e83>] do_generic_mapping_read+0x13f/0x42e
[<c01341c0>] generic_file_sendfile+0x4e/0x60
[<c0133283>] file_send_actor+0x0/0x53
[<c014c1b9>] do_sendfile+0x193/0x23e
[<c0133283>] file_send_actor+0x0/0x53
[<c014c346>] sys_sendfile+0x4d/0x94
[<c0102a03>] syscall_call+0x7/0xb
=======================
Code: 8b 46 14 89 44 24 0c 8b 46 20 c7 04 24 d7 dc 2e c0 89 44 24 08 0f b7 83 b0 00 00 00 89 44 24 04 e8 69 b3 e7 ff 8b 43 1c ff 40 10 <8b> 46 28 ff 40 18 c7 43 3c 77 cf 29 c0 83 c4 18 5b 5e c3 53 89
EIP: [<c029cf64>] call_start+0x5c/0x6f SS:ESP 0068:f6adbd18
---

# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
747221801 4444 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 19831747 2% 0 0% 0 0% 16718708 2% 0 0%
read wrcache write create remove rename
710082513 95% 0 0% 588780 0% 6 0% 3 0% 3 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 9 0% 29 0%

I am now compiling 2.6.19 with the Saout patch...

Chris


2006-11-23 07:40:25

by Chris Caputo

Subject: Re: NFS hang

On Mon, 6 Nov 2006, Josh Boyer wrote:
> We've got an IBM bladecenter we're using to do some NFS stress testing
> (v3). It seems with kernel 2.6.16.21 we get a hang during our stress
> tests after roughly an hour or two. The hanging process has a
> backtrace that looks like this:
>
> cp D 000000000ff0b198 6672 16505 16296
> (NOTLB)
> Call Trace:
> [C00000000F43B1D0] [C00000002DD5B088] 0xc00000002dd5b088 (unreliable)
> [C00000000F43B3A0] [C00000000000F0B4] .__switch_to+0x12c/0x150
> [C00000000F43B430] [C00000000039980C] .schedule+0xcec/0xe4c
> [C00000000F43B540] [C00000000039A688] .io_schedule+0x68/0xb8
> [C00000000F43B5D0] [C000000000093D60] .sync_page+0x7c/0x98
> [C00000000F43B650] [C00000000039A8B4] .__wait_on_bit_lock+0x8c/0x114
> [C00000000F43B700] [C000000000093C94] .__lock_page+0xa0/0xc8
> [C00000000F43B820] [C000000000094A6C] .do_generic_mapping_read+0x21c/0x4a0
> [C00000000F43B970] [C000000000095624] .__generic_file_aio_read+0x17c/0x228
> [C00000000F43BA40] [C0000000000957C4] .generic_file_aio_read+0x44/0x54
> [C00000000F43BAD0] [D000000000320F58] .nfs_file_read+0xb8/0xec [nfs]
> [C00000000F43BB70] [C0000000000C5298] .do_sync_read+0xd4/0x130
> [C00000000F43BCF0] [C0000000000C60E0] .vfs_read+0x128/0x20c
> [C00000000F43BD90] [C0000000000C65C0] .sys_read+0x4c/0x8c
> [C00000000F43BE30] [C00000000000871C] syscall_exit+0x0/0x40
>
> So far, we've only been able to recreate this with the client and server
> blades both in the same chassis. We tried recreating with nfs_debug
> enabled, but it did not hang. So perhaps there is a correlation with
> the high-speed network (gigabit ethernet over fibre) that comes with
> that configuration.
>
> Does anyone recognize the backtrace of the process or the symptoms of
> the problem? We'd appreciate any info that you may have.
>
> Please CC me in responses as I'm not subscribed to this list.

Hi. I am digging into a file hang issue (albeit NFSv2) that I've narrowed
down to beginning with 2.6.11 and continuing to happen through 2.6.18, but
which does not happen on 2.6.10 or prior.

I wonder if you could run your tests on 2.6.10 and 2.6.11.12 and share the
results with me.

Thanks,
Chris


2006-12-06 18:36:09

by Chris Caputo

Subject: Re: NFS hang

On Fri, 1 Dec 2006, Chris Caputo wrote:
> On Thu, 30 Nov 2006, Trond Myklebust wrote:
> > On Fri, 2006-12-01 at 04:20 +0000, Chris Caputo wrote:
> > > On Thu, 30 Nov 2006, Chris Caputo wrote:
> > > > I am not sure if this is related, but at just over 3 days of uptime with
> > > > 2.6.19-rc6 and the Saout patch, I had this happen:
> > > >
> > > > ---
> > > > BUG: unable to handle kernel NULL pointer dereference at virtual address 00000028
> > > > printing eip:
> > > > c029cf64
> > > > EIP is at call_start+0x5c/0x6f
> > >
> > > I believe the above is the following line in clnt.c:call_start():
> > >
> > > clnt->cl_stats->rpccnt++;
> > >
> > > So call_start() was called with a NULL task->tk_client.
> > >
> > > Similar result with 2.6.19...
> > >
> > > BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
> > > printing eip:
> > > c029ef8e
> > > EIP is at xprt_reserve+0x28/0x119
> > >
> > > In this one xprt.c:xprt_reserve() is I believe crashing at:
> > >
> > > spin_lock(&xprt->reserve_lock);
> > >
> > > due to a NULL task->tk_xprt.
> > >
> > > Ideas on whether this is related to the Saout sched race patch or if this
> > > is something else entirely?
> >
> > I suspect that it is related, but not the same race. I've identified a
> > couple of other possible race conditions, mainly to do with the fact
> > that nothing prevents an rpc_task from being freed while you are inside
> > rpc_wake_up_task(). Could you try applying both the attached patches,
> > and see if that helps?
>
> I compiled 2.6.19 with your patches in addition to the Saout patch and a
> test is now in progress.

With:

linux-2.6.19-001-fix_rpc_wakeup_race.dif
linux-2.6.19-002-cleanup_rpc_wakeup.dif
linux-2.6.19-003-rpc_wakeup_fix_pot_race.dif

My 2.6.19 test machine is now at 5 days 9+ hours of uptime with no
problems and the following number of RPC calls:

Client rpc stats:
calls retrans authrefrsh
1314188163 2344 0

Previously a problem happened within anywhere from a few hours to 3 days
of uptime, so this may be progress.

I'll leave the machine running and report back any problems.

Thanks,
Chris


2006-12-01 04:20:05

by Chris Caputo

Subject: Re: NFS hang

On Thu, 30 Nov 2006, Chris Caputo wrote:
> On Mon, 27 Nov 2006, Chris Caputo wrote:
> > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > No. There is no way this can work. You would need something that
> > > guarantees that the task stays queued while you are taking the queue
> > > lock.
> > >
> > > Have you instead tried Christophe Saout's patch (see attachment)?
> >
> > Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> > November 5th patch is looking good. For reference, normally I see the
> > race happen in under 15 minutes.
> >
> > I'll report back if any problems develop. This machine is an outgoing
> > newsfeed server and so it pounds on NFS client routines 24x7.
>
> I am not sure if this is related, but at just over 3 days of uptime with
> 2.6.19-rc6 and the Saout patch, I had this happen:
>
> ---
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000028
> printing eip:
> c029cf64
> EIP is at call_start+0x5c/0x6f

I believe the above is the following line in clnt.c:call_start():

clnt->cl_stats->rpccnt++;

So call_start() was called with a NULL task->tk_client.

Similar result with 2.6.19...

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
c029ef8e
EIP is at xprt_reserve+0x28/0x119

In this one xprt.c:xprt_reserve() is I believe crashing at:

spin_lock(&xprt->reserve_lock);

due to a NULL task->tk_xprt.
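
(For anyone wanting to double-check that mapping: with a vmlinux built with
debug info, the EIP values from the oopses above can be resolved with
something like the following.)

# gdb vmlinux
(gdb) list *(call_start+0x5c)
(gdb) list *(xprt_reserve+0x28)

# or, non-interactively:
# addr2line -f -i -e vmlinux c029cf64 c029ef8e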

Ideas on whether this is related to the Saout sched race patch or if this
is something else entirely?

Chris


2006-12-01 09:14:52

by Chris Caputo

Subject: Re: NFS hang

On Thu, 30 Nov 2006, Trond Myklebust wrote:
> On Fri, 2006-12-01 at 04:20 +0000, Chris Caputo wrote:
> > On Thu, 30 Nov 2006, Chris Caputo wrote:
> > > I am not sure if this is related, but at just over 3 days of uptime with
> > > 2.6.19-rc6 and the Saout patch, I had this happen:
> > >
> > > ---
> > > BUG: unable to handle kernel NULL pointer dereference at virtual address 00000028
> > > printing eip:
> > > c029cf64
> > > EIP is at call_start+0x5c/0x6f
> >
> > I believe the above is the following line in clnt.c:call_start():
> >
> > clnt->cl_stats->rpccnt++;
> >
> > So call_start() was called with a NULL task->tk_client.
> >
> > Similar result with 2.6.19...
> >
> > BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
> > printing eip:
> > c029ef8e
> > EIP is at xprt_reserve+0x28/0x119
> >
> > In this one xprt.c:xprt_reserve() is I believe crashing at:
> >
> > spin_lock(&xprt->reserve_lock);
> >
> > due to a NULL task->tk_xprt.
> >
> > Ideas on whether this is related to the Saout sched race patch or if this
> > is something else entirely?
>
> I suspect that it is related, but not the same race. I've identified a
> couple of other possible race conditions, mainly to do with the fact
> that nothing prevents an rpc_task from being freed while you are inside
> rpc_wake_up_task(). Could you try applying both the attached patches,
> and see if that helps?

I compiled 2.6.19 with your patches in addition to the Saout patch and a
test is now in progress.

Thanks,
Chris
