2006-01-18 00:30:25

by Blaisorblade

[permalink] [raw]
Subject: [PATCH 4/9] uml: fix spinlock recursion and sleep-inside-spinlock in error path


From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

In this error path, when the interface has had a problem, we call dev_close(),
which is disallowed for two reasons:

*) takes again the UML internal spinlock, inside the ->stop method of this
device
*) can be called in process context only, while we're in interrupt context.

I've also thought that calling dev_close() may be a wrong policy to follow, but
it's not up to me to decide that.

However, we may end up with multiple dev_close() queued on the same device.
But the initial test for (dev->flags & IFF_UP) makes this harmless, though -
and dev_close() is supposed to care about races with itself. So there's no harm
in delaying the shutdown, IMHO.

Something to mark the interface as "going to shutdown" would be appreciated, but
dev_deactivate has the same problems as dev_close(), so we can't use it either.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/drivers/net_kern.c | 15 +++++++++++++--
1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c
index 98350bb..178f68b 100644
--- a/arch/um/drivers/net_kern.c
+++ b/arch/um/drivers/net_kern.c
@@ -68,6 +68,11 @@ static int uml_net_rx(struct net_device
return pkt_len;
}

+static void uml_dev_close(void* dev)
+{
+ dev_close( (struct net_device *) dev);
+}
+
irqreturn_t uml_net_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
struct net_device *dev = dev_id;
@@ -80,15 +85,21 @@ irqreturn_t uml_net_interrupt(int irq, v
spin_lock(&lp->lock);
while((err = uml_net_rx(dev)) > 0) ;
if(err < 0) {
+ DECLARE_WORK(close_work, uml_dev_close, dev);
printk(KERN_ERR
"Device '%s' read returned %d, shutting it down\n",
dev->name, err);
- dev_close(dev);
+ /* dev_close can't be called in interrupt context, and takes
+ * again lp->lock.
+ * And dev_close() can be safely called multiple times on the
+ * same device, since it tests for (dev->flags & IFF_UP). So
+ * there's no harm in delaying the device shutdown. */
+ schedule_work(&close_work);
goto out;
}
reactivate_fd(lp->fd, UM_ETH_IRQ);

- out:
+out:
spin_unlock(&lp->lock);
return(IRQ_HANDLED);
}


2006-01-18 00:50:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 4/9] uml: fix spinlock recursion and sleep-inside-spinlock in error path

"Paolo 'Blaisorblade' Giarrusso" <[email protected]> wrote:
>
>
> From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
>
> In this error path, when the interface has had a problem, we call dev_close(),
> which is disallowed for two reasons:
>
> *) takes again the UML internal spinlock, inside the ->stop method of this
> device
> *) can be called in process context only, while we're in interrupt context.
>
> I've also thought that calling dev_close() may be a wrong policy to follow, but
> it's not up to me to decide that.
>
> However, we may end up with multiple dev_close() queued on the same device.
> But the initial test for (dev->flags & IFF_UP) makes this harmless, though -
> and dev_close() is supposed to care about races with itself. So there's no harm
> in delaying the shutdown, IMHO.
>
> Something to mark the interface as "going to shutdown" would be appreciated, but
> dev_deactivate has the same problems as dev_close(), so we can't use it either.
>
> ...
> + /* dev_close can't be called in interrupt context, and takes
> + * again lp->lock.
> + * And dev_close() can be safely called multiple times on the
> + * same device, since it tests for (dev->flags & IFF_UP). So
> + * there's no harm in delaying the device shutdown. */
> + schedule_work(&close_work);
> goto out;
> }

This callback can be pending for an arbitrary amount of time. I'd have
expected to see a flush_sceduled_work() somewhere in the driver to force
all such pending work to complete before we destroy things which that
callback wil be using.

2006-01-18 11:44:37

by Blaisorblade

[permalink] [raw]
Subject: Re: [uml-devel] Re: [PATCH 4/9] uml: fix spinlock recursion and sleep-inside-spinlock in error path

On Wednesday 18 January 2006 01:52, Andrew Morton wrote:
> "Paolo 'Blaisorblade' Giarrusso" <[email protected]> wrote:
> > From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
> >
> > In this error path, when the interface has had a problem, we call
> > dev_close(), which is disallowed for two reasons:

> > *) takes again the UML internal spinlock, inside the ->stop method of
> > this device
> > *) can be called in process context only, while we're in interrupt
> > context.

> > I've also thought that calling dev_close() may be a wrong policy to
> > follow, but it's not up to me to decide that.

> > However, we may end up with multiple dev_close() queued on the same
> > device. But the initial test for (dev->flags & IFF_UP) makes this
> > harmless, though - and dev_close() is supposed to care about races with
> > itself. So there's no harm in delaying the shutdown, IMHO.

> > Something to mark the interface as "going to shutdown" would be
> > appreciated, but dev_deactivate has the same problems as dev_close(), so
> > we can't use it either.

> > ...
> > + /* dev_close can't be called in interrupt context, and takes
> > + * again lp->lock.
> > + * And dev_close() can be safely called multiple times on the
> > + * same device, since it tests for (dev->flags & IFF_UP). So
> > + * there's no harm in delaying the device shutdown. */
> > + schedule_work(&close_work);
> > goto out;
> > }

> This callback can be pending for an arbitrary amount of time. I'd have
> expected to see a flush_sceduled_work() somewhere in the driver to force
> all such pending work to complete before we destroy things which that
> callback wil be using.

Thanks for the tip (it's good I got schedule_work right, it wasn't hard but
1st time I use it), however the device is not later freed, it seems, in this
context... even because it's an unlikely event (I reproduced it with a
damn-bad configuration).

Not going to cause seeable leaks anyway, but yes, to correct.

However, network devices are created from the host only, so only the host
should free them (as done by net_remove on "hot-unplug" request).

Maybe that's to fix too, however, but in a separate patch; Jeff, what about
this?
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


___________________________________
Yahoo! Messenger with Voice: chiama da PC a telefono a tariffe esclusive
http://it.messenger.yahoo.com