2005-02-21 23:29:58

by Richard J.Farnsworth

Subject: Bug in NFS client handling of EJUKEBOX?

There seems to be a bug in the 2.6.11-rc code related to the handling
of nfsd's EJUKEBOX error return. The code for __rpc_execute in
net/sunrpc/sched.c looks like:

=============================

static int __rpc_execute(struct rpc_task *task)
{
    int status = 0;

    dprintk("RPC: %4d rpc_execute flgs %x\n",
            task->tk_pid, task->tk_flags);

    BUG_ON(RPC_IS_QUEUED(task));

restarted:
    while (1) {
        /*
         * Garbage collection of pending timers...
         */
        rpc_delete_timer(task);

        .....

        if (task->tk_exit) {
            lock_kernel();
>>>>        task->tk_exit(task);
            unlock_kernel();
            /* If tk_action is non-null, the user wants us to restart */
            if (task->tk_action) {
                if (!RPC_ASSASSINATED(task)) {
                    /* Release RPC slot and buffer memory */
                    if (task->tk_rqstp)
                        xprt_release(task);
                    rpc_free(task);
                    goto restarted;
                }
                printk(KERN_ERR "RPC: dead task tries to walk away.\n");
            }
        }

=============================

If the tk_exit procedure called at ">>>>" above (e.g., nfs_read_done)
sees EJUKEBOX as the result of the call, it calls rpc_restart_call()
and rpc_delay() on the task. rpc_delay() sets a timer and puts the task
to sleep until that timer fires. However, when the main loop is
re-entered via "goto restarted", the first thing it does is remove all
pending timers for the task (rpc_delete_timer). Nothing is then left to
wake the task, so it hangs in an uninterruptible wait.

Am I missing something here or is this really a problem?

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-07-11 09:34:51

by Olaf Kirch

Subject: [PATCH]: Fix EJUKEBOX handling

Hi all,

this problem still seems to be present in 2.6.12, or am I
missing something?

The patch I've been using is attached below.

Olaf

On Thu, Feb 24, 2005 at 05:08:35AM +0000, Richard J.Farnsworth wrote:
> There seems to be a bug in the 2.6.11-rc code related to the handling
> of nfsd's EJUKEBOX error return.
> [...]

--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax


Attachments:
(No filename) (2.28 kB)
sunrpc-restart-delay-fix (880.00 B)

2005-07-11 12:10:00

by Trond Myklebust

Subject: Re: [PATCH]: Fix EJUKEBOX handling

On Monday 11.07.2005 at 11:34 (+0200), Olaf Kirch wrote:
> Hi all,
>
> this problem still seems to be present in 2.6.12, or am I
> missing something?

It should already be fixed in the 2.6.13-rcX series.

Cheers,
Trond
