2007-05-08 19:41:00

by Rogier Wolff

Subject: nbd problem.


Hi,

The nbd client still reliably hangs when I use it.

While looking into this, I found:


    req->errors = 0;
    spin_unlock_irq(q->queue_lock);
    ^^^^^^^^^^^^^^^^^^^^

    mutex_lock(&lo->tx_lock);
    if (unlikely(!lo->sock)) {
        mutex_unlock(&lo->tx_lock);
        printk(KERN_ERR "%s: Attempted send on closed socket\n",
               lo->disk->disk_name);
        req->errors++;
        nbd_end_request(req);
        spin_lock_irq(q->queue_lock);
        continue;
    }

    lo->active_req = req;

    if (nbd_send_req(lo, req) != 0) {
        printk(KERN_ERR "%s: Request send failed\n",
               lo->disk->disk_name);
        req->errors++;
        nbd_end_request(req);
    } else {
        spin_lock(&lo->queue_lock);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        list_add(&req->queuelist, &lo->queue_head);
        spin_unlock(&lo->queue_lock);
    }

    lo->active_req = NULL;


As far as I read things, the function is called with the lock
held and interrupts disabled. The lock is then released and
retaken without disabling interrupts again.

Should this be fixed?

(it doesn't fix my hang though....)

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ


2007-05-08 20:34:00

by Satyam Sharma

Subject: Re: nbd problem.

On 5/8/07, Rogier Wolff <[email protected]> wrote:
>
> Hi,
>
> The nbd client still reliably hangs when I use it.
>
> While looking into this, I found:
>
>
>     req->errors = 0;
>     spin_unlock_irq(q->queue_lock);
>     ^^^^^^^^^^^^^^^^^^^^

BTW (this could be unrelated to the original issue here), but can
anybody ever have a _genuine_ excuse to use spin_lock_irq /
spin_unlock_irq and not spin_lock_irqsave / spin_unlock_irqrestore? I
find the latter primitives more tasteful even when I *know* something
is being called with interrupts enabled / disabled -- you never know
when some code is re-used again somewhere else and/or ripped out of
one place and put inside another ... the former API only invites
trouble, if anything.
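
To make the concern concrete, here is a minimal userspace sketch. It is only a model, not kernel code: a single flag stands in for the CPU's interrupt-enable state, and every `model_*` and `helper_*` name is hypothetical. It shows what happens when a helper written with each pair of primitives is called from a context that already has interrupts disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Model only: one flag stands in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts unconditionally. */
void model_lock_irq(void)
{
    irqs_enabled = false;
}

/* Models spin_unlock_irq(): enables interrupts unconditionally. */
void model_unlock_irq(void)
{
    assert(!irqs_enabled);  /* the lock was taken with interrupts off */
    irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remembers the previous state, then disables. */
bool model_lock_irqsave(void)
{
    bool saved = irqs_enabled;

    irqs_enabled = false;
    return saved;
}

/* Models spin_unlock_irqrestore(): puts back whatever state was saved. */
void model_unlock_irqrestore(bool saved)
{
    irqs_enabled = saved;
}

/* Hypothetical helper written with the _irq pair. */
void helper_irq(void)
{
    model_lock_irq();
    /* critical section would go here */
    model_unlock_irq();
}

/* The same hypothetical helper written with the _irqsave pair. */
void helper_irqsave(void)
{
    bool flags = model_lock_irqsave();

    /* critical section would go here */
    model_unlock_irqrestore(flags);
}

/*
 * Calls `helper` from a context that already has interrupts disabled and
 * reports whether interrupts are enabled once the helper returns.
 */
bool irqs_after_call_with_irqs_off(void (*helper)(void))
{
    bool after;

    irqs_enabled = false;   /* caller is already running with interrupts off */
    helper();
    after = irqs_enabled;
    irqs_enabled = true;    /* reset the model for the next experiment */
    return after;
}
```

In this model `irqs_after_call_with_irqs_off(helper_irq)` reports interrupts enabled behind the caller's back, while the `helper_irqsave` version leaves them disabled as the caller expected.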

2007-05-09 05:50:32

by Jens Axboe

Subject: Re: nbd problem.

On Tue, May 08 2007, Rogier Wolff wrote:
>
> Hi,
>
> The nbd client still reliably hangs when I use it.
>
> While looking into this, I found:
>
>
>     req->errors = 0;
>     spin_unlock_irq(q->queue_lock);
>     ^^^^^^^^^^^^^^^^^^^^
>
>     mutex_lock(&lo->tx_lock);
>     if (unlikely(!lo->sock)) {
>         mutex_unlock(&lo->tx_lock);
>         printk(KERN_ERR "%s: Attempted send on closed socket\n",
>                lo->disk->disk_name);
>         req->errors++;
>         nbd_end_request(req);
>         spin_lock_irq(q->queue_lock);
>         continue;
>     }
>
>     lo->active_req = req;
>
>     if (nbd_send_req(lo, req) != 0) {
>         printk(KERN_ERR "%s: Request send failed\n",
>                lo->disk->disk_name);
>         req->errors++;
>         nbd_end_request(req);
>     } else {
>         spin_lock(&lo->queue_lock);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         list_add(&req->queuelist, &lo->queue_head);
>         spin_unlock(&lo->queue_lock);
>     }
>
>     lo->active_req = NULL;
>
>
> As far as I read things, the function is called with the lock
> held and interrupts disabled. The lock is then released and
> retaken without disabling interrupts again.
>
> Should this be fixed?
>
> (it doesn't fix my hang though....)

Note lo->queue_lock vs q->queue_lock.
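
A rough userspace sketch of that distinction, for illustration only. The `model_*` types and functions are hypothetical stand-ins, not the driver's real structures, and the model stops where the quoted excerpt stops. It tracks which of the two locks is held at the point where the request is added to the device's list.

```c
#include <assert.h>
#include <stdbool.h>

/* Model only: a lock is reduced to a "held" flag. */
struct model_lock {
    bool held;
};

void model_lock_acquire(struct model_lock *l)
{
    assert(!l->held);
    l->held = true;
}

void model_lock_release(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Stands in for the request queue `q`, which has its own queue_lock. */
struct model_queue {
    struct model_lock queue_lock;
};

/* Stands in for the nbd device `lo`, which has a separate queue_lock. */
struct model_nbd {
    struct model_lock queue_lock;
    int queued;     /* stands in for the list at lo->queue_head */
};

/*
 * Follows the successful-send path of the quoted excerpt. Returns true if,
 * at the moment the request is queued, lo's lock is held and q's is not.
 */
bool model_send_path(struct model_queue *q, struct model_nbd *lo)
{
    bool only_lo_held;

    assert(q->queue_lock.held);             /* entered with q's lock held */
    model_lock_release(&q->queue_lock);     /* spin_unlock_irq(q->queue_lock) */

    model_lock_acquire(&lo->queue_lock);    /* spin_lock(&lo->queue_lock) */
    only_lo_held = lo->queue_lock.held && !q->queue_lock.held;
    lo->queued++;                           /* list_add(...) */
    model_lock_release(&lo->queue_lock);    /* spin_unlock(&lo->queue_lock) */

    return only_lo_held;
}
```

In the model, the lock taken in the else branch is never the one that was dropped earlier, so the excerpt does not show the request-queue lock being retaken without the _irq variant.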

--
Jens Axboe

2007-05-09 11:11:05

by Rogier Wolff

Subject: Re: nbd problem.

On Tue, May 08, 2007 at 01:33:52PM -0700, Satyam Sharma wrote:
> On 5/8/07, Rogier Wolff <[email protected]> wrote:
> >
> >Hi,
> >
> >The nbd client still reliably hangs when I use it.

Someone suggested to use

http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=summary

and that fixed it. (i.e. there is something in there that should
be merged....)

Jens, thanks for pointing out that there were different locks
involved.

Roger.

(I seem to have lost all other EMails in this thread. Apparently
my delete-old-list-emails is too aggressive today...)


2007-05-09 12:38:34

by Rogier Wolff

Subject: Re: nbd problem.

On Wed, May 09, 2007 at 01:10:49PM +0200, Rogier Wolff wrote:
> On Tue, May 08, 2007 at 01:33:52PM -0700, Satyam Sharma wrote:
> > On 5/8/07, Rogier Wolff <[email protected]> wrote:
> > >
> > >Hi,
> > >
> > >The nbd client still reliably hangs when I use it.
>
> Someone suggested to use
>
> http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=summary
>
> and that fixed it. (i.e. there is something in there that should
> be merged....)

Cancel the party! It got MUCH further than before, but crashed
eventually.

ozon:~> ps auxww | grep D
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root       110  0.4  0.0     0    0 ?        D    10:28   0:31 [pdflush]
root       112  0.0  0.0     0    0 ?        D<   10:28   0:05 [kswapd0]
root      1649  0.0  0.1  1604  108 pts/0    D+   11:17   0:03 nbd-client petisuix 1234 /dev/nd0
root      1654  0.9  4.5  4648 2816 pts/0    D+   11:17   0:44 rsync /usr/src/linux-2.6.21.ozon /mnt/test1 -av --progress
wolff     1716  0.0  0.9  1648  560 pts/1    R+   12:33   0:00 grep D
ozon:~>

Can anybody help me figure out what these processes are waiting for?

Roger.


2007-05-09 12:53:36

by Jan Engelhardt

Subject: Re: nbd problem.


On May 9 2007 14:38, Rogier Wolff wrote:
>
>ozon:~> ps auxww | grep D
>USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
>root       110  0.4  0.0     0    0 ?        D    10:28   0:31 [pdflush]
>root       112  0.0  0.0     0    0 ?        D<   10:28   0:05 [kswapd0]
>root      1649  0.0  0.1  1604  108 pts/0    D+   11:17   0:03 nbd-client petisuix 1234 /dev/nd0
>root      1654  0.9  4.5  4648 2816 pts/0    D+   11:17   0:44 rsync /usr/src/linux-2.6.21.ozon /mnt/test1 -av --progress
>wolff     1716  0.0  0.9  1648  560 pts/1    R+   12:33   0:00 grep D
>ozon:~>
>
>Can anybody help me figure out what these processes are waiting for?

echo t >/proc/sysrq-trigger

dumps a ton to /var/log/messages.
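
The same request can be made from a program. This is a minimal sketch, assuming the usual /proc/sysrq-trigger interface and root privileges. The function name `request_task_dump` is made up for illustration, and the path is a parameter so the code can be tried against an ordinary file first.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Writes the single SysRq command character 't' (dump task states) to
 * `path`. Returns 0 on success and -1 if the file cannot be opened or
 * written.
 */
int request_task_dump(const char *path)
{
    FILE *f;
    int ok;

    assert(path != NULL);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = (fputc('t', f) != EOF);
    if (fclose(f) != 0)     /* the write may only fail when flushed */
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling `request_task_dump("/proc/sysrq-trigger")` as root is meant to have the same effect as the echo above; the task dump itself goes to the kernel log, not to the caller.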



Jan
--

2007-05-09 13:41:38

by Jens Axboe

Subject: Re: nbd problem.

On Wed, May 09 2007, Rogier Wolff wrote:
> On Tue, May 08, 2007 at 01:33:52PM -0700, Satyam Sharma wrote:
> > On 5/8/07, Rogier Wolff <[email protected]> wrote:
> > >
> > >Hi,
> > >
> > >The nbd client still reliably hangs when I use it.
>
> Someone suggested to use
>
> http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=summary
>
> and that fixed it. (i.e. there is something in there that should
> be merged....)

Hmm, which branch? Most of my stuff is merged up with Linus at this
point.

> Jens, thanks for pointing out that there were different locks
> involved.

You're welcome.

--
Jens Axboe