2007-09-19 14:38:42

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH] PHYLIB: IRQ event workqueue handling fixes

Keep track of disable_irq_nosync() invocations and call enable_irq() the
right number of times if work has been cancelled that would include them.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
Now that the call to flush_work_keventd() (problematic because of
rtnl_mutex being held) has been replaced by cancel_work_sync() another
issue has arisen and been left unresolved. As the MDIO bus cannot be
accessed from the interrupt context the PHY interrupt handler uses
disable_irq_nosync() to prevent from looping and schedules some work to be
done as a softirq, which, apart from handling the state change of the
originating PHY, is responsible for reenabling the interrupt. Now if the
interrupt line is shared by another device and a call to the softirq
handler has been cancelled, that call to enable_irq() never happens and
the other device cannot use its interrupt anymore as its stuck disabled.

I decided to use a counter rather than a flag because there may be more
than one call to phy_change() cancelled in the queue -- a real one and a
fake one triggered by free_irq() if DEBUG_SHIRQ is used, if nothing else.
Therefore because of its nesting property enable_irq() has to be called
the right number of times to match the number disable_irq_nosync() was
called and restore the original state. This DEBUG_SHIRQ feature is also
the reason why free_irq() has to be called before cancel_work_sync().

While at it I updated the comment about phy_stop_interrupts() being
called from `keventd' -- this is no longer relevant as the use of
cancel_work_sync() makes such an approach unnecessary. OTOH a similar
comment referring to flush_scheduled_work() in phy_stop() still applies as
using cancel_work_sync() there would be dangerous.

Checked with checkpatch.pl and at the run time (with and without
DEBUG_SHIRQ).

Please apply.

Maciej

patch-mips-2.6.23-rc5-20070904-phy-irq-fix-9
diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/drivers/net/phy/phy.c linux-mips-2.6.23-rc5-20070904/drivers/net/phy/phy.c
--- linux-mips-2.6.23-rc5-20070904.macro/drivers/net/phy/phy.c 2007-09-16 17:17:20.000000000 +0000
+++ linux-mips-2.6.23-rc5-20070904/drivers/net/phy/phy.c 2007-09-18 14:28:07.000000000 +0000
@@ -7,7 +7,7 @@
* Author: Andy Fleming
*
* Copyright (c) 2004 Freescale Semiconductor, Inc.
- * Copyright (c) 2006 Maciej W. Rozycki
+ * Copyright (c) 2006, 2007 Maciej W. Rozycki
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
@@ -35,6 +35,7 @@
#include <linux/timer.h>
#include <linux/workqueue.h>

+#include <asm/atomic.h>
#include <asm/io.h>
#include <asm/irq.h>
#include <asm/uaccess.h>
@@ -561,6 +562,7 @@ static irqreturn_t phy_interrupt(int irq
* queue will write the PHY to disable and clear the
* interrupt, and then reenable the irq line. */
disable_irq_nosync(irq);
+ atomic_inc(&phydev->irq_disable);

schedule_work(&phydev->phy_queue);

@@ -631,6 +633,7 @@ int phy_start_interrupts(struct phy_devi

INIT_WORK(&phydev->phy_queue, phy_change);

+ atomic_set(&phydev->irq_disable, 0);
if (request_irq(phydev->irq, phy_interrupt,
IRQF_SHARED,
"phy_interrupt",
@@ -661,13 +664,22 @@ int phy_stop_interrupts(struct phy_devic
if (err)
phy_error(phydev);

+ free_irq(phydev->irq, phydev);
+
/*
- * Finish any pending work; we might have been scheduled to be called
- * from keventd ourselves, but cancel_work_sync() handles that.
+ * Cannot call flush_scheduled_work() here as desired because
+ * of rtnl_lock(), but we do not really care about what would
+ * be done, except from enable_irq(), so cancel any work
+ * possibly pending and take care of the matter below.
*/
cancel_work_sync(&phydev->phy_queue);
-
- free_irq(phydev->irq, phydev);
+ /*
+ * If work indeed has been cancelled, disable_irq() will have
+ * been left unbalanced from phy_interrupt() and enable_irq()
+ * has to be called so that other devices on the line work.
+ */
+ while (atomic_dec_return(&phydev->irq_disable) >= 0)
+ enable_irq(phydev->irq);

return err;
}
@@ -694,6 +706,7 @@ static void phy_change(struct work_struc
phydev->state = PHY_CHANGELINK;
spin_unlock(&phydev->lock);

+ atomic_dec(&phydev->irq_disable);
enable_irq(phydev->irq);

/* Reenable interrupts */
@@ -707,6 +720,7 @@ static void phy_change(struct work_struc

irq_enable_err:
disable_irq(phydev->irq);
+ atomic_inc(&phydev->irq_disable);
phy_err:
phy_error(phydev);
}
diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/include/linux/phy.h linux-mips-2.6.23-rc5-20070904/include/linux/phy.h
--- linux-mips-2.6.23-rc5-20070904.macro/include/linux/phy.h 2007-07-10 04:56:57.000000000 +0000
+++ linux-mips-2.6.23-rc5-20070904/include/linux/phy.h 2007-09-11 23:05:41.000000000 +0000
@@ -25,6 +25,8 @@
#include <linux/timer.h>
#include <linux/workqueue.h>

+#include <asm/atomic.h>
+
#define PHY_BASIC_FEATURES (SUPPORTED_10baseT_Half | \
SUPPORTED_10baseT_Full | \
SUPPORTED_100baseT_Half | \
@@ -281,6 +283,7 @@ struct phy_device {
/* Interrupt and Polling infrastructure */
struct work_struct phy_queue;
struct timer_list phy_timer;
+ atomic_t irq_disable;

spinlock_t lock;


2007-09-20 23:54:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Wed, 19 Sep 2007 15:38:19 +0100 (BST)
"Maciej W. Rozycki" <[email protected]> wrote:

> Keep track of disable_irq_nosync() invocations and call enable_irq() the
> right number of times if work has been cancelled that would include them.
>
> Signed-off-by: Maciej W. Rozycki <[email protected]>
> ---
> Now that the call to flush_work_keventd() (problematic because of
> rtnl_mutex being held) has been replaced by cancel_work_sync() another
> issue has arisen and been left unresolved. As the MDIO bus cannot be
> accessed from the interrupt context the PHY interrupt handler uses
> disable_irq_nosync() to prevent from looping and schedules some work to be
> done as a softirq, which, apart from handling the state change of the
> originating PHY, is responsible for reenabling the interrupt. Now if the
> interrupt line is shared by another device and a call to the softirq
> handler has been cancelled, that call to enable_irq() never happens and
> the other device cannot use its interrupt anymore as its stuck disabled.
>
> I decided to use a counter rather than a flag because there may be more
> than one call to phy_change() cancelled in the queue -- a real one and a
> fake one triggered by free_irq() if DEBUG_SHIRQ is used, if nothing else.
> Therefore because of its nesting property enable_irq() has to be called
> the right number of times to match the number disable_irq_nosync() was
> called and restore the original state. This DEBUG_SHIRQ feature is also
> the reason why free_irq() has to be called before cancel_work_sync().
>
> While at it I updated the comment about phy_stop_interrupts() being
> called from `keventd' -- this is no longer relevant as the use of
> cancel_work_sync() makes such an approach unnecessary. OTOH a similar
> comment referring to flush_scheduled_work() in phy_stop() still applies as
> using cancel_work_sync() there would be dangerous.
>
> Checked with checkpatch.pl and at the run time (with and without
> DEBUG_SHIRQ).

You always put boring, crappy, insufficient text in the for-the-changelog
section and interesting, useful, sufficient text in the not-for-the-changelog
section.

But you can't fool me! I have an editor and I fix it up.

2007-09-21 12:51:37

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, 20 Sep 2007, Andrew Morton wrote:

> You always put boring, crappy, insufficient text in the for-the-changelog
> section and interesting, useful, sufficient text in the not-for-the-changelog
> section.

I'll swap the sections in the future then. ;-) Frankly I was not sure
whether the changelog was happy about being fed with lengthy explanations
and it has not spoken out.

I have to admit this is a habit carried over from the FSF-style ChangeLog
-- where the enforced rule is actually *not* to provide any explanation
for why changes are done and only describe what has been modified (with
any discussion around archived in the respective mailing list).

> But you can't fool me! I have an editor and I fix it up.

Thank you and sorry for the extra work I caused you -- I shall keep your
suggestion in mind in the future.

Maciej

2007-09-21 18:43:11

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Fri, 21 Sep 2007 13:51:12 +0100 (BST)
"Maciej W. Rozycki" <[email protected]> wrote:

> On Thu, 20 Sep 2007, Andrew Morton wrote:
>
> > You always put boring, crappy, insufficient text in the for-the-changelog
> > section and interesting, useful, sufficient text in the not-for-the-changelog
> > section.
>
> I'll swap the sections in the future then. ;-) Frankly I was not sure
> whether the changelog was happy about being fed with lengthy explanations
> and it has not spoken out.

I think it's worth putting plenty of details in the changelog: it's compressed
on-disk and on-the-wire and is overall pretty cheap. If people don't
actually seek the information out, it has close to zero impact on them.

But on those occasions when people _do_ seek the information out (and it
can be years later) then they want every drop of information they can get.

Numerous times I've gone back to the 2.5.x mm/ changelogs to work out
what on earth we were thinking when we did something, and it has proved
quite useful in explaining the existing code, or in suggesting possible
problems which we had forgotten about by 2007.


otoh, you can get a lot of handy info by googling for strategic parts of
the kernel code, or by googling snippets of the existing-but-short
changelog. For example, this patch: google for "Keep track of
disable_irq_nosync() invocations" and voila. Perhaps we don't need
changelogs at all ;)

2007-10-15 12:50:19

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On 19-09-2007 16:38, Maciej W. Rozycki wrote:
...
> @@ -661,13 +664,22 @@ int phy_stop_interrupts(struct phy_devic
> if (err)
> phy_error(phydev);
>
> + free_irq(phydev->irq, phydev);
> +
> /*
> - * Finish any pending work; we might have been scheduled to be called
> - * from keventd ourselves, but cancel_work_sync() handles that.
> + * Cannot call flush_scheduled_work() here as desired because
> + * of rtnl_lock(), but we do not really care about what would
> + * be done, except from enable_irq(), so cancel any work
> + * possibly pending and take care of the matter below.
> */
> cancel_work_sync(&phydev->phy_queue);

Hi,

Could you explain why cancel_work_sync() is better here than
flush_scheduled_work() wrt. rtnl_lock()?

Regards,
Jarek P.

2007-10-15 17:04:01

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Mon, 15 Oct 2007, Jarek Poplawski wrote:

> Could you explain why cancel_work_sync() is better here than
> flush_scheduled_work() wrt. rtnl_lock()?

Well, this is actually the bit that made cancel_work_sync() be written in
the first place. The short story is the netlink lock is most probably
held at this point (depending on the usage of phy_disconnect()) and there
is also an event waiting in the queue that requires the lock, so if
flush_scheduled_work() is called here a deadlock will happen.

Let me find a reference for a longer story...:

http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=Pine.LNX.4.64N.0610031509380.4642%40blysk.ds.pg.gda.pl

and then discussed again:

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0612.0/0593.html

Maciej

2007-10-16 06:18:23

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Mon, Oct 15, 2007 at 06:03:20PM +0100, Maciej W. Rozycki wrote:
> On Mon, 15 Oct 2007, Jarek Poplawski wrote:
>
> > Could you explain why cancel_work_sync() is better here than
> > flush_scheduled_work() wrt. rtnl_lock()?
>
> Well, this is actually the bit that made cancel_work_sync() be written in
> the first place. The short story is the netlink lock is most probably
> held at this point (depending on the usage of phy_disconnect()) and there
> is also an event waiting in the queue that requires the lock, so if
> flush_scheduled_work() is called here a deadlock will happen.
>
> Let me find a reference for a longer story...:
>
> http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=Pine.LNX.4.64N.0610031509380.4642%40blysk.ds.pg.gda.pl
>
> and then discussed again:
>
> http://www.uwsg.indiana.edu/hypermail/linux/kernel/0612.0/0593.html
>

Yes, it's all right here. Sorry for bothering - I should've found this
by myself.

I've still some doubts about this possible enable_irq() after
free_irq(). If it's the only handler the status would be changed again
and at least some of this code in check_irq_resend() would be run, but
I can miss something again or/and this doesn't matter, as well.

Thanks,
Jarek P.

2007-10-16 17:19:58

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Tue, 16 Oct 2007, Jarek Poplawski wrote:

> Yes, it's all right here. Sorry for bothering - I should've found this
> by myself.

Ah, no problem -- even with the right keys you may sometimes get lost in
the results.

> I've still some doubts about this possible enable_irq() after
> free_irq(). If it's the only handler the status would be changed again
> and at least some of this code in check_irq_resend() would be run, but
> I can miss something again or/and this doesn't matter, as well.

Well, enable_irq() and disable_irq() themselves are nesting, so they are
not a problem. OTOH, free_irq() does not seem to maintain the depth count
correctly, which looks like a bug to me and which could trigger regardless
of whether flush_scheduled_work() or cancel_work_sync() was called.

The reason is CONFIG_DEBUG_SHIRQ which makes a simulated interrupt event
be sent at the end of free_irq(). It looks like a problem that is
complementary to one I signalled here:

http://lkml.org/lkml/2007/9/12/82

with respect to request_irq(), where, similarly, such an interrupt event
is sent at the beginning. It looks like nobody was concerned back then,
but perhaps it is time to do a better investigation now and propose a
solution.

I'll think about it and thanks for your inquisitiveness that has led to
these conclusions.

Maciej

2007-10-17 08:55:27

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Tue, Oct 16, 2007 at 06:19:32PM +0100, Maciej W. Rozycki wrote:
...
> Well, enable_irq() and disable_irq() themselves are nesting, so they are
> not a problem. OTOH, free_irq() does not seem to maintain the depth count
> correctly, which looks like a bug to me and which could trigger regardless
> of whether flush_scheduled_work() or cancel_work_sync() was called.

I'm not sure free_irq() should maintain the depth count - rather warn
on not zero. But, IMHO, any activity on freed irq seems suspicious to
me (and doesn't look like very common), even if it's safe with current
implementation.

> The reason is CONFIG_DEBUG_SHIRQ which makes a simulated interrupt event
> be sent at the end of free_irq(). It looks like a problem that is
> complementary to one I signalled here:
>
> http://lkml.org/lkml/2007/9/12/82
>
> with respect to request_irq(), where, similarly, such an interrupt event
> is sent at the beginning. It looks like nobody was concerned back then,
> but perhaps it is time to do a better investigation now and propose a
> solution.

Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
to be reasonable only in the case of possible resent irqs (so not for
all irqs). On the other hand, it seems, proper irq handler with proper
hardware shouldn't have any problems with such a check.

> I'll think about it and thanks for your inquisitiveness that has led to
> these conclusions.

Btw., since, because of this patch, I've had a one more look at phy.c
and there are a few more doubts, so this time I'll try to bother you
for real:

1) phy_change() checks PHY_HALTED flag without lock; I think it's
racy: eg. if it's done during phy_stop() it can check just before
the flag is set and reenable interrupts just after phy_stop() ends.

2) phy_change() doesn't reenable irq line after it sees returns
with errors; IMHO it should at least write some warning, but maybe
try some safety plan, so enable_irq() and try to disable interrupts
and free_irq() on the next call (if it happens). (But, I can be very
wrong with this - maybe it's OK and official way.)

3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
not sure now if it could be dangerous after fixing #1; on the other
hand even if we know it's not our regular interrupt, with current
DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
we are sure it's before/in free_irq() yet.

4) phy_interrupt() should check return value from schedule_work() and
enable irq on 0.

5) phy_stop_interrupts(): maybe I miss something, but it seems
phy_stop() is required before this, so maybe there should be a
comment on this?

6) phy_stop_interrupts(): if I'm not wrong with #3 calling
phy_disable_interrupts() looks like we are not sure this phy_stop()
really works; than maybe a WARN_ON?

7) phy_stop_interrupts(): after above mentioned changes in
phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
any reason why there should be more than one skipped enable_irq(),
so checking return from cancel_work_sync() shouldn't be enough
instead of this atomic counter.

8) phy_stop_interrupts(): I'm not sure this additional call from
DEBUG_SHIRQ should be so dangerous, eg.:

/*
* status == PHY_HALTED &&
* interrupts are stopped after phy_stop()
*/
if (cancel_work_sync(...))
enable_irq();

free_irq(...);
/*
* possible schedule_work() from DEBUG_SHIRQ only,
* but proper check for PHY_HALTED is done;
* so, let's flush after this too:
*/
cancel_work_sync();


Of course, I don't know phy.c enough, so most of this can be terribly
wrong, then feel free to forget about this - I don't expect you should
waste any time for explaining me these things - after all they are
doubts only.

Regards,
Jarek P.

2007-10-17 09:06:53

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Wed, Oct 17, 2007 at 10:58:09AM +0200, Jarek Poplawski wrote:
...
> 5) phy_stop_interrupts(): maybe I miss something, but it seems
> phy_stop() is required before this, so maybe there should be a
> comment on this?
>
> 6) phy_stop_interrupts(): if I'm not wrong with #3 calling

Should be:
6) phy_stop_interrupts(): if I'm not wrong with #5 calling

Jarek P.

2007-10-17 09:09:23

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes


> Btw., since, because of this patch, I've had a one more look at phy.c
> and there are a few more doubts, so this time I'll try to bother you
> for real:

.../...

While there... is somebody interested in making the whole PHY lib
operate a task level and use mutexes instead of spinlock ? I need that
for drivers like EMAC (who use their own PHY layer for now), and I might
even give a go at adapting phylib myself, but I'd like to take the
temperature about it first.

Basically, there is nothing in phylib that is performance critical or
such that requires it to run at irq time and/or use locks. On the other
hand, it complicates things in various areas. The most obvious one being
that it prevents the network driver mii access callbacks from sleeping
which can be annoying as MDIO can be slow, and some drivers have fancy
muxes in there that are better off mutexed than spinlocked.

Ben.


2007-10-18 06:29:11

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Wed, Oct 17, 2007 at 10:58:09AM +0200, Jarek Poplawski wrote:
...
> 8) phy_stop_interrupts(): I'm not sure this additional call from
> DEBUG_SHIRQ should be so dangerous, eg.:
>
> /*
> * status == PHY_HALTED &&
> * interrupts are stopped after phy_stop()
> */
> if (cancel_work_sync(...))
> enable_irq();
>
> free_irq(...);
> /*
> * possible schedule_work() from DEBUG_SHIRQ only,
> * but proper check for PHY_HALTED is done;
> * so, let's flush after this too:
> */
> cancel_work_sync();

After rethinking, it looks like this last cancel should be useless.
So, if phy_interrupt() schedules only if !PHY_HALTED and phy_change()
does enable_irq() with no exeptions, it seems phy_interrupt() even
without lock must see PHY_HALTED state before this free_irq() with
possible DEBUG_SHIRQ call, then maybe only this safety:

WARN_ON(work_pending(&phydev->phy_queue));


Btw, I've read this was considered and not liked, but IMHO, if this
really has to be like this, creating phy's own workqueue seems to be
resonable, at the very least to reduce latencies to other users of
this irq.

Jarek P.

2007-10-18 07:02:39

by Jarek Poplawski

[permalink] [raw]
Subject: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

After reading this and earlier threads about phylib's way of using
workqueue I think such a lighter and safer wrt. locking alternative
for flush_scheduled_work should be useful, but maybe it's only my
imagination.

So, let's ask Oleg Nesterov, whose solutions are here only
copy-cut-pasted & possibly abused by myslef.

--------->
Subject: flush_work_sync as an alternative for flush_scheduled_work

Similar to cancel_work_sync() but will only busy wait & block
(without cancel).

Signed-off-by: Jarek Poplawski <[email protected]>

---

include/linux/workqueue.h | 1 +
kernel/workqueue.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+)

diff -Nurp 2.6.23-mm1-/include/linux/workqueue.h 2.6.23-mm1/include/linux/workqueue.h
--- 2.6.23-mm1-/include/linux/workqueue.h 2007-10-12 23:45:24.000000000 +0200
+++ 2.6.23-mm1/include/linux/workqueue.h 2007-10-17 20:55:26.000000000 +0200
@@ -192,6 +192,7 @@ extern void init_workqueues(void);
int execute_in_process_context(work_func_t fn, struct execute_work *);

extern int cancel_work_sync(struct work_struct *work);
+extern void flush_work_sync(struct work_struct *work);

/*
* Kill off a pending schedule_delayed_work(). Note that the work callback
diff -Nurp 2.6.23-mm1-/kernel/workqueue.c 2.6.23-mm1/kernel/workqueue.c
--- 2.6.23-mm1-/kernel/workqueue.c 2007-10-12 23:45:25.000000000 +0200
+++ 2.6.23-mm1/kernel/workqueue.c 2007-10-17 20:54:03.000000000 +0200
@@ -539,6 +539,30 @@ int cancel_delayed_work_sync(struct dela
}
EXPORT_SYMBOL(cancel_delayed_work_sync);

+/**
+ * flush_work_sync - block until a work_struct's callback has terminated
+ * @work: the work which is to be flushed
+ *
+ * Similar to cancel_work_sync() but will only busy wait (without cancel)
+ * if the work is queued. If the work's callback appears to be running,
+ * flush_work_sync() will block until it has completed (but doesn't block
+ * while other callbacks are running, like flush_scheduled_work() does).
+ *
+ * It is not allowed to use this function if the work re-queues itself.
+ */
+void flush_work_sync(struct work_struct *work)
+{
+ int ret;
+
+ do {
+ ret = work_pending(work);
+ wait_on_work(work);
+ if (ret)
+ cpu_relax();
+ } while (ret);
+}
+EXPORT_SYMBOL(flush_work_sync);
+
static struct workqueue_struct *keventd_wq __read_mostly;

/**

2007-10-18 11:30:58

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Wed, 17 Oct 2007, Jarek Poplawski wrote:

> I'm not sure free_irq() should maintain the depth count - rather warn
> on not zero. But, IMHO, any activity on freed irq seems suspicious to
> me (and doesn't look like very common), even if it's safe with current
> implementation.

No way to avoid it with DEBUG_SHIRQ.

> Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
> to be reasonable only in the case of possible resent irqs (so not for
> all irqs). On the other hand, it seems, proper irq handler with proper
> hardware shouldn't have any problems with such a check.

What do you mean by "proper irq handler with proper hardware"? Using
softirqs (they used to be called bottom-halves) is actually a natural way
of handling any interrupt which requires extensive processing.

> 1) phy_change() checks PHY_HALTED flag without lock; I think it's
> racy: eg. if it's done during phy_stop() it can check just before
> the flag is set and reenable interrupts just after phy_stop() ends.

I remember having a look into it, but it was long ago and I cannot
immediately recall the conclusion. Which means it is either broken or
deserves a comment as non-obvious. I will have a look into it again, but
I am resource-starved a little at the moment, sorry.

> 2) phy_change() doesn't reenable irq line after it sees returns
> with errors; IMHO it should at least write some warning, but maybe
> try some safety plan, so enable_irq() and try to disable interrupts
> and free_irq() on the next call (if it happens). (But, I can be very
> wrong with this - maybe it's OK and official way.)

No way to do this safely -- at this point the device probably still has
its interrupt output asserted and the register to clear it is
inaccessible, so enabling the line will enter an infinite loop. At this
point the system is no longer stable, so it is better to keep at least
some functionality, so that it may be attempted to be shut down cleanly,
rather than make it completely irresponsive. The alternative is panic().

> 3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
> not sure now if it could be dangerous after fixing #1; on the other
> hand even if we know it's not our regular interrupt, with current
> DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
> we are sure it's before/in free_irq() yet.

See #1 above.

> 4) phy_interrupt() should check return value from schedule_work() and
> enable irq on 0.

No -- the work already pending will do that.

> 5) phy_stop_interrupts(): maybe I miss something, but it seems
> phy_stop() is required before this, so maybe there should be a
> comment on this?

The API is documented in: Documentation/networking/phy.txt -- you are
welcome to improve. If you do not want to get into the gory details, just
use the cooked interface and phy_disconnect() will do the dirty work for
you.

> 6) phy_stop_interrupts(): if I'm not wrong with #3 calling
> phy_disable_interrupts() looks like we are not sure this phy_stop()
> really works; than maybe a WARN_ON?

WARN_ON what?

> 7) phy_stop_interrupts(): after above mentioned changes in
> phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
> any reason why there should be more than one skipped enable_irq(),
> so checking return from cancel_work_sync() shouldn't be enough
> instead of this atomic counter.

CONFIG_DEBUG_SHIRQ. Barring this option, cancel_work_sync() could have
been moved to the front of free_irq(). I think I have seen this in
reality, with the interrupt line left stuck disabled afterwards, but I
will double check when I have an opportunity. The approach implemented
with this patch does work, which is the important bit, and if
simplification is possible, then it may be applied later.

> 8) phy_stop_interrupts(): I'm not sure this additional call from
> DEBUG_SHIRQ should be so dangerous, eg.:
>
> /*
> * status == PHY_HALTED &&
> * interrupts are stopped after phy_stop()
> */
> if (cancel_work_sync(...))
> enable_irq();
>
> free_irq(...);
> /*
> * possible schedule_work() from DEBUG_SHIRQ only,
> * but proper check for PHY_HALTED is done;
> * so, let's flush after this too:
> */
> cancel_work_sync();

Well, if there is another handler registered on this line, you'll get
your interrupt line stuck disabled.

> Of course, I don't know phy.c enough, so most of this can be terribly
> wrong, then feel free to forget about this - I don't expect you should
> waste any time for explaining me these things - after all they are
> doubts only.

I am not sure I know phy.c well enough either, ;-) and your concerns are
appreciated as interesting conclusions may develop. If someone disagrees
with what I have written here, they are welcome to speak out too.

Maciej

2007-10-18 11:37:53

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, 18 Oct 2007, Jarek Poplawski wrote:

> After rethinking, it looks like this last cancel should be useless.
> So, if phy_interrupt() schedules only if !PHY_HALTED and phy_change()
> does enable_irq() with no exeptions, it seems phy_interrupt() even
> without lock must see PHY_HALTED state before this free_irq() with
> possible DEBUG_SHIRQ call, then maybe only this safety:
>
> WARN_ON(work_pending(&phydev->phy_queue));

Good point.

Maciej

2007-10-18 14:34:59

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, Oct 18, 2007 at 12:30:35PM +0100, Maciej W. Rozycki wrote:
> On Wed, 17 Oct 2007, Jarek Poplawski wrote:
>
> > I'm not sure free_irq() should maintain the depth count - rather warn
> > on not zero. But, IMHO, any activity on freed irq seems suspicious to
> > me (and doesn't look like very common), even if it's safe with current
> > implementation.
>
> No way to avoid it with DEBUG_SHIRQ.
>
> > Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
> > to be reasonable only in the case of possible resent irqs (so not for
> > all irqs). On the other hand, it seems, proper irq handler with proper
> > hardware shouldn't have any problems with such a check.
>
> What do you mean by "proper irq handler with proper hardware"? Using
> softirqs (they used to be called bottom-halves) is actually a natural way
> of handling any interrupt which requires extensive processing.

Technically until free_irq returns a handler should respond to calls
and with proper hardware it should have no problem with checking if
it's its interrupt, even after disabling this hardware, because of
possible races.

But with a scenario like this:

- disable_irq()
- disable_my_hadrware_irq()
- clear_my_hardware_irq()
- free_irq()
- enable_irq()

it seems the handler should respond even after free_irq because there
could be still interrupts to resend, originally generated by its
hardware, so such behavior looks very suspicious, at least with some
type of interrupts.

So, I think, the idea of DEBUG_SHIRQ is generally right and very
useful - but, of course, there could be exceptions, which btw. could
try some hacks under DEBUG_SHIRQ too. And my opinion about
'properness' was very general (not about phy) too, just like my
'knowledge' of drivers.

>
> > 1) phy_change() checks PHY_HALTED flag without lock; I think it's
> > racy: eg. if it's done during phy_stop() it can check just before
> > the flag is set and reenable interrupts just after phy_stop() ends.
>
> I remember having a look into it, but it was long ago and I cannot
> immediately recall the conclusion. Which means it is either broken or
> deserves a comment as non-obvious. I will have a look into it again, but
> I am resource-starved a little at the moment, sorry.
>
> > 2) phy_change() doesn't reenable irq line after it sees returns
> > with errors; IMHO it should at least write some warning, but maybe
> > try some safety plan, so enable_irq() and try to disable interrupts
> > and free_irq() on the next call (if it happens). (But, I can be very
> > wrong with this - maybe it's OK and official way.)
>
> No way to do this safely -- at this point the device probably still has
> its interrupt output asserted and the register to clear it is
> inaccessible, so enabling the line will enter an infinite loop. At this
> point the system is no longer stable, so it is better to keep at least
> some functionality, so that it may be attempted to be shut down cleanly,
> rather than make it completely irresponsive. The alternative is panic().

Right! But then some warning could be useful, I presume.

>
> > 3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
> > not sure now if it could be dangerous after fixing #1; on the other
> > hand even if we know it's not our regular interrupt, with current
> > DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
> > we are sure it's before/in free_irq() yet.
>
> See #1 above.
>
> > 4) phy_interrupt() should check return value from schedule_work() and
> > enable irq on 0.
>
> No -- the work already pending will do that.

How? If schedule_work won't succeed because there is a pending one,
we did disable_irq_nosync 2x, so we would stay at least 1x too deep!

>
> > 5) phy_stop_interrupts(): maybe I miss something, but it seems
> > phy_stop() is required before this, so maybe there should be a
> > comment on this?
>
> The API is documented in: Documentation/networking/phy.txt -- you are
> welcome to improve. If you do not want to get into the gory details, just
> use the cooked interface and phy_disconnect() will do the dirty work for
> you.
>
> > 6) phy_stop_interrupts(): if I'm not wrong with #3 calling
> > phy_disable_interrupts() looks like we are not sure this phy_stop()
> > really works; than maybe a WARN_ON?
>
> WARN_ON what?

I've thought that phy_stop() could be needed before
phy_stop_interrupt() to set PHY_HALTED, but since it disables and
clears interrupts too, then there should be no need to repeat this,
maybe only check it's done. But if there is no such dependency, then
no point at all.

>
> > 7) phy_stop_interrupts(): after above mentioned changes in
> > phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
> > any reason why there should be more than one skipped enable_irq(),
> > so checking return from cancel_work_sync() shouldn't be enough
> > instead of this atomic counter.
>
> CONFIG_DEBUG_SHIRQ. Barring this option, cancel_work_sync() could have
> been moved to the front of free_irq(). I think I have seen this in
> reality, with the interrupt line left stuck disabled afterwards, but I
> will double check when I have an opportunity. The approach implemented
> with this patch does work, which is the important bit, and if
> simplification is possible, then it may be applied later.
>
> > 8) phy_stop_interrupts(): I'm not sure this additional call from
> > DEBUG_SHIRQ should be so dangerous, eg.:
> >
> > /*
> > * status == PHY_HALTED &&
> > * interrupts are stopped after phy_stop()
> > */
> > if (cancel_work_sync(...))
> > enable_irq();
> >
> > free_irq(...);
> > /*
> > * possible schedule_work() from DEBUG_SHIRQ only,
> > * but proper check for PHY_HALTED is done;
> > * so, let's flush after this too:
> > */
> > cancel_work_sync();
>
> Well, if there is another handler registered on this line, you'll get
> your interrupt line stuck disabled.

I'll rethink this yet...

>
> > Of course, I don't know phy.c enough, so most of this can be terribly
> > wrong, then feel free to forget about this - I don't expect you should
> > waste any time for explaining me these things - after all they are
> > doubts only.
>
> I am not sure I know phy.c well enough either, ;-) and your concerns are
> appreciated as interesting conclusions may develop. If someone disagrees
> with what I have written here, they are welcome to speak out too.

But, I've enough of other concerns too, so nothing urgent here...

Many thanks,
Jarek P.

2007-10-18 15:31:45

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, 18 Oct 2007, Jarek Poplawski wrote:

> Technically until free_irq returns a handler should respond to calls
> and with proper hardware it should have no problem with checking if
> it's its interrupt, even after disabling this hardware, because of
> possible races.

Well, the hardirq handler can check it, no problem -- it is just it is so
slow, the latency would be unacceptable. The problem with the softirq
handler is we do really not want it to be called after the driver has
already been shut down and its structures freed. It used to happen before
this flush/cancel call was added with the usual effect (oops) as by then
they may well have been stamped on already.

> But with a scenario like this:
>
> - disable_irq()
> - disable_my_hadrware_irq()
> - clear_my_hardware_irq()
> - free_irq()
> - enable_irq()
>
> it seems the handler should respond even after free_irq because there
> could be still interrupts to resend, originally generated by its
> hardware, so such behavior looks very suspicious, at least with some
> type of interrupts.

These are softirqs, not hardware interrupts, so they are as such not
related to *_irq() infrastructure. The flaw is the depth count of IRQ
lines is not maintained consistently. This is, according to comments
around the code in question, to cover up bugs elsewhere. Not a brillant
idea, I am afraid -- there should be no need to reset the depth upon
request_irq() and likewise with free_irq(), but there you go. I would be
happy to investigate a possible solution and rewrite the necessary bits,
but right now I am committed to other stuff, overdue already, sorry.

The view could change if we supported hot-pluggable interrupt
controllers, but it is not the case at the moment right now, so the
interrupt lines are there to stay for the duration of the system lifespan
and could be manipulated as necessary.

> So, I think, the idea of DEBUG_SHIRQ is generally right and very
> useful - but, of course, there could be exceptions, which btw. could
> try some hacks under DEBUG_SHIRQ too. And my opinion about
> 'properness' was very general (not about phy) too, just like my
> 'knowledge' of drivers.

The idea is right, no question, but I am not quite sure it has been
properly architected into our current design. Actually I am almost sure
of the reverse. This is why I was (and still am) interested in feedback
on it.

> Right! But then some warning could be useful, I presume.

Of course; adding one to phy_error() should be straightforward.

> > > 4) phy_interrupt() should check return value from schedule_work() and
> > > enable irq on 0.
> >
> > No -- the work already pending will do that.
>
> How? If schedule_work won't succeed because there is a pending one,
> we did disable_irq_nosync 2x, so we would stay at least 1x too deep!

Correct and my note is misleading, sorry.

The thing is we shouldn't have come here the second time in the first
place (which is I think why the check is not there) as handlers for the
same line are not allowed to run in parallel (cf. irq_desc->lock and
IRQ_INPROGRESS). Perhaps BUG_ON(!schedule_work()) would be appropriate,
though I am not sure if we should handle every "impossible" condition we
can imagine. Quite a lot of hardirq handlers assume two instances will
not run in parallel, so if it ever happened, then a serious breakage would
unleash.

> But, I've enough of other concerns too, so nothing urgent here...

The problem is at the moment I am still probably the only user of this
code ;-) -- `grep' through the sources to see how many drivers request an
IRQ for their PHY through phylib; unless something has changed very
recently.

Maciej

2007-10-18 15:44:08

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On 10/18, Jarek Poplawski wrote:
>
> +/**
> + * flush_work_sync - block until a work_struct's callback has terminated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Hmm...

> + * Similar to cancel_work_sync() but will only busy wait (without cancel)
> + * if the work is queued.

Yes, it won't block, but will spin in busy-wait loop until all other works
scheduled before this work are finished. Not good. After that it really
blocks waiting for this work to complete.

And I am a bit confused. We can't use flush_workqueue() because some of the
queued work_structs may take rtnl_lock, yes? But in that case we can't use
the new flush_work_sync() helper as well, no?

If we can't just cancel the work, can't we do something like

if (cancel_work_sync(w))
w->func(w);

instead?

> +void flush_work_sync(struct work_struct *work)
> +{
> + int ret;
> +
> + do {
> + ret = work_pending(work);
> + wait_on_work(work);
> + if (ret)
> + cpu_relax();
> + } while (ret);
> +}

If we really the new helper, perhaps we can make it a bit better?

1. Modify insert_work() to take the "struct list_head *at" parameter instead
of "int tail". I think this patch will also cleanup the code a bit, and
shrink a couple of bytes from .text

2. flush_work_sync() inserts a barrier right after this work and blocks.
We still need some retry logic to handle the queueing is in progress
of course, but we won't spin waiting for the other works.

What do you think?

Oleg.

2007-10-18 15:59:33

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, 18 Oct 2007, Oleg Nesterov wrote:

> If we can't just cancel the work, can't we do something like
>
> if (cancel_work_sync(w))
> w->func(w);
>
> instead?

We do an equivalent of this -- all that we care about that w->func(w)
would do is enable_irq() and the rest we want to skip at this point. We
probably do not need the counter in the end though.

Maciej

2007-10-19 07:47:19

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
> On 10/18, Jarek Poplawski wrote:
> >
> > +/**
> > + * flush_work_sync - block until a work_struct's callback has terminated
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Hmm...
>
> > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
> > + * if the work is queued.
>
> Yes, it won't block, but will spin in busy-wait loop until all other works
> scheduled before this work are finished. Not good. After that it really
> blocks waiting for this work to complete.
>
> And I am a bit confused. We can't use flush_workqueue() because some of the
> queued work_structs may take rtnl_lock, yes? But in that case we can't use
> the new flush_work_sync() helper as well, no?

OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOPS!

Of course, we can't!!! I remembered there was this issue long time
ago, but then I've had some break in tracking net & workqueue. So,
while reading this patch I was alarmed at first, and self-misled
later. I think, there is definitely needed some warning about
locking (or unlocking) during these flush_ & cancel_ functions.
(Btw, I've very much wondered now, why I didn't notice at that 'old'
time, that you added such a great feature (wrt. locking) and I even
didn't notice this...).

So, Maciej (and other readers of this thread) - I withdraw my false
opinion from my second message here: it's very wrong to call this
sched_work_sync() with rtnl_lock(). It's only less probable to lockup
with this than with flush_schedule_work().

>
> If we can't just cancel the work, can't we do something like
>
> if (cancel_work_sync(w))
> w->func(w);
>
> instead?
>
> > +void flush_work_sync(struct work_struct *work)
> > +{
> > + int ret;
> > +
> > + do {
> > + ret = work_pending(work);
> > + wait_on_work(work);
> > + if (ret)
> > + cpu_relax();
> > + } while (ret);
> > +}
>
> If we really the new helper, perhaps we can make it a bit better?
>
> 1. Modify insert_work() to take the "struct list_head *at" parameter instead
> of "int tail". I think this patch will also cleanup the code a bit, and
> shrink a couple of bytes from .text

Looks like a very good idea, but I need more time to rethink this.
Probably some code example should be helpful.

>
> 2. flush_work_sync() inserts a barrier right after this work and blocks.
> We still need some retry logic to handle the queueing is in progress
> of course, but we won't spin waiting for the other works.

Until monday I should have an opinion on that (today a bit under
fire...).

>
> What do you think?

Since there is no gain wrt. locking with my current proposal, I
withdraw this patch of course.

It looks like my wrong patch was great idea because we got this very
precious Oleg's opinion! (I know I'm a genius sometimes...)

Thanks very much,
Jarek P.

2007-10-19 07:59:38

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
...
> sched_work_sync() with rtnl_lock(). It's only less probable to lockup
> with this than with flush_schedule_work().

...But, not much less...

Jarek P.

2007-10-19 08:01:31

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, 2007-10-18 at 19:48 +0400, Oleg Nesterov wrote:

> > +void flush_work_sync(struct work_struct *work)

> If we really the new helper, perhaps we can make it a bit better?
>
> 1. Modify insert_work() to take the "struct list_head *at" parameter instead
> of "int tail". I think this patch will also cleanup the code a bit, and
> shrink a couple of bytes from .text
>
> 2. flush_work_sync() inserts a barrier right after this work and blocks.
> We still need some retry logic to handle the queueing is in progress
> of course, but we won't spin waiting for the other works.

3. Add lockdep annotation like the other API. :) Andrew just sent my
patch (used to be two patches by somebody's request but that's fine)
titled "workqueue: debug flushing deadlocks with lockdep" to Linus.

johannes


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2007-10-19 08:14:18

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, Oct 18, 2007 at 12:30:35PM +0100, Maciej W. Rozycki wrote:
> On Wed, 17 Oct 2007, Jarek Poplawski wrote:
...
> > 2) phy_change() doesn't reenable irq line after it sees returns
> > with errors; IMHO it should at least write some warning, but maybe
> > try some safety plan, so enable_irq() and try to disable interrupts
> > and free_irq() on the next call (if it happens). (But, I can be very
> > wrong with this - maybe it's OK and official way.)
>
> No way to do this safely -- at this point the device probably still has
> its interrupt output asserted and the register to clear it is
> inaccessible, so enabling the line will enter an infinite loop. At this
> point the system is no longer stable, so it is better to keep at least
> some functionality, so that it may be attempted to be shut down cleanly,
> rather than make it completely irresponsive. The alternative is panic().

But then... your patch seems to make it possible, because it enables
irq to the initial state of the counter. Of course, this could happen
on closing only.

Jarek P.

2007-10-19 11:38:50

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Thu, 18 Oct 2007, Maciej W. Rozycki wrote:

> > 1) phy_change() checks PHY_HALTED flag without lock; I think it's
> > racy: eg. if it's done during phy_stop() it can check just before
> > the flag is set and reenable interrupts just after phy_stop() ends.
>
> I remember having a look into it, but it was long ago and I cannot
> immediately recall the conclusion. Which means it is either broken or
> deserves a comment as non-obvious. I will have a look into it again, but
> I am resource-starved a little at the moment, sorry.

Well, I have now recalled what the issue is -- we just plainly and simply
want to avoid a hardirq spinlock for the very reason we do not do all the
processing in the hardirq handler. The thing is we make accesses to the
MDIO bus with the phydev lock held and it may take ages until these
accesses will have completed. And we cannot afford keeping interrupts
disabled for so long.

So the only way is to make the check for the HALTED state lockless and
make sure any race condition is handled gracefully and does not lead to
inconsistent behaviour. Which I think as of what we have in the
net-2.6.24 tree is the case, but there are never too many eyes to look at
a piece of code, so if anybody feels like proving me wrong, then just go
ahead!

Maciej

2007-10-19 12:57:59

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Fri, 19 Oct 2007, Jarek Poplawski wrote:

> But then... your patch seems to make it possible, because it enables
> irq to the initial state of the counter. Of course, this could happen
> on closing only.

That's because free_irq() does not disable the interrupt in the correct
manner. The scenario is more or less like this:

phy_interrupt() [depth == 0]
disable_irq()
depth++; status |= IRQ_DISABLED;
...
free_irq() [depth == 1]
status |= IRQ_DISABLED;
...
phy_change() [depth == 1]
enable_irq()
depth--; status &= ~IRQ_DISABLED;
oops!

Now if free_irq() correctly incremented the depth counter, then the last
enable_irq() would still decrement it, but with its initial value of 2 it
would not change the status to reenable the line.

Maciej

2007-10-19 14:36:28

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Fri, Oct 19, 2007 at 12:38:29PM +0100, Maciej W. Rozycki wrote:
> On Thu, 18 Oct 2007, Maciej W. Rozycki wrote:
>
> > > 1) phy_change() checks PHY_HALTED flag without lock; I think it's
> > > racy: eg. if it's done during phy_stop() it can check just before
> > > the flag is set and reenable interrupts just after phy_stop() ends.
> >
> > I remember having a look into it, but it was long ago and I cannot
> > immediately recall the conclusion. Which means it is either broken or
> > deserves a comment as non-obvious. I will have a look into it again, but
> > I am resource-starved a little at the moment, sorry.
>
> Well, I have now recalled what the issue is -- we just plainly and simply
> want to avoid a hardirq spinlock for the very reason we do not do all the
> processing in the hardirq handler. The thing is we make accesses to the
> MDIO bus with the phydev lock held and it may take ages until these
> accesses will have completed. And we cannot afford keeping interrupts
> disabled for so long.
>
> So the only way is to make the check for the HALTED state lockless and
> make sure any race condition is handled gracefully and does not lead to
> inconsistent behaviour. Which I think as of what we have in the
> net-2.6.24 tree is the case, but there are never too many eyes to look at
> a piece of code, so if anybody feels like proving me wrong, then just go
> ahead!

Actually I'm not convinced with this explanation. It seems to me that
since there are such serious locking problems (especially with rntl),
there could be once more considered a private workqueue. You've
written earlier about being a lonely user of this code. But, since
Benjamin offered his help with changing to mutexes, which looks like
very reasonable idea to me (probably I miss most of the points...),
maybe it's very good opportunity to both: make this code better and
double the user base! I'm interested in looking for such solution
if Benjamin thinks there could be too few problems for him... So,
let somebody tell us what could be wrong with this idea?

Cheers (till Monday),
Jarek P.

2007-10-19 17:59:04

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Fri, 19 Oct 2007, Jarek Poplawski wrote:

> Actually I'm not convinced with this explanation. It seems to me that
> since there are such serious locking problems (especially with rntl),
> there could be once more considered a private workqueue. You've
> written earlier about being a lonely user of this code. But, since

Well, this will change eventually when I submit the patch for Broadcom
SiByte platforms so that sb1250-mac.c will be able to request an interrupt
for the PHYs. All the infrastructure is in place except from platform
code to configure the SOC's GPIO line used for the interrupt input (when
applicable) and let the driver know what the line actually is.

> Benjamin offered his help with changing to mutexes, which looks like
> very reasonable idea to me (probably I miss most of the points...),
> maybe it's very good opportunity to both: make this code better and
> double the user base! I'm interested in looking for such solution
> if Benjamin thinks there could be too few problems for him... So,
> let somebody tell us what could be wrong with this idea?

I do not object and can certainly cooperate, but cannot make it a high
priority work for me at the moment -- sorry.

Maciej

2007-10-19 21:49:00

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes


> Actually I'm not convinced with this explanation. It seems to me that
> since there are such serious locking problems (especially with rntl),
> there could be once more considered a private workqueue. You've
> written earlier about being a lonely user of this code. But, since
> Benjamin offered his help with changing to mutexes, which looks like
> very reasonable idea to me (probably I miss most of the points...),
> maybe it's very good opportunity to both: make this code better and
> double the user base! I'm interested in looking for such solution
> if Benjamin thinks there could be too few problems for him... So,
> let somebody tell us what could be wrong with this idea?

My main problem is time :-) But I need to do the change to mutex if I am
to use phylib for emac. I can't tell when I'll have time to do it tho.

Ben.

2007-10-22 06:08:40

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
> On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
> > On 10/18, Jarek Poplawski wrote:
> > >
> > > +/**
> > > + * flush_work_sync - block until a work_struct's callback has terminated
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > Hmm...
> >
> > > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
> > > + * if the work is queued.
> >
> > Yes, it won't block, but will spin in busy-wait loop until all other works
> > scheduled before this work are finished. Not good. After that it really
> > blocks waiting for this work to complete.
> >
> > And I am a bit confused. We can't use flush_workqueue() because some of the
> > queued work_structs may take rtnl_lock, yes? But in that case we can't use
> > the new flush_work_sync() helper as well, no?

OK, I know I'm dumber and dumber everyday, but it seems in a hurry I
got it wrong again or miss something (as usual): these all flushes are
rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
looks perfectly fine... (Or am I wrong because: ...?)

Then, if by any chance I'm right, something like flush_work_sync
(or changed flush_scheduled_work, if there is no problem with such
a change of implementation) could be safely (if it's called without
locks used by flushed work only) done cancel_work_sync() way, by
running a work function after try_to_grab_pending() returns 1 (after
list_del_init - of course without respecting a queue order).

Regards,
Jarek P.

2007-10-22 17:58:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On 10/22, Jarek Poplawski wrote:
>
> On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
> > On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
> > > On 10/18, Jarek Poplawski wrote:
> > > >
> > > > +/**
> > > > + * flush_work_sync - block until a work_struct's callback has terminated
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > Hmm...
> > >
> > > > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
> > > > + * if the work is queued.
> > >
> > > Yes, it won't block, but will spin in busy-wait loop until all other works
> > > scheduled before this work are finished. Not good. After that it really
> > > blocks waiting for this work to complete.
> > >
> > > And I am a bit confused. We can't use flush_workqueue() because some of the
> > > queued work_structs may take rtnl_lock, yes? But in that case we can't use
> > > the new flush_work_sync() helper as well, no?
>
> OK, I know I'm dumber and dumber everyday,

You are not alone. I have the same feeling about myself!

> these all flushes are
> rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
> looks perfectly fine

Yes,

> Then, if by any chance I'm right, something like flush_work_sync
> (or changed flush_scheduled_work, if there is no problem with such
> a change of implementation) could be safely (if it's called without
> locks used by flushed work only) done cancel_work_sync() way, by
> running a work function after try_to_grab_pending() returns 1

If this work doesn't rearm itself - yes. (otherwise, the same ->func
can run twice _at the same time_)

But again, in this case wait_on_work() after try_to_grab_pending() == 1
doesn't block, so we can just do

if (cancel_work_sync(w))
w->func();

Oleg.

2007-10-23 06:56:47

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote:
> On 10/22, Jarek Poplawski wrote:
...
> > OK, I know I'm dumber and dumber everyday,
>
> You are not alone. I have the same feeling about myself!

Feeling is not the same, only true knowledge counts!

>
> > these all flushes are
> > rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
> > looks perfectly fine
>
> Yes,
>
> > Then, if by any chance I'm right, something like flush_work_sync
> > (or changed flush_scheduled_work, if there is no problem with such
> > a change of implementation) could be safely (if it's called without
> > locks used by flushed work only) done cancel_work_sync() way, by
> > running a work function after try_to_grab_pending() returns 1
>
> If this work doesn't rearm itself - yes. (otherwise, the same ->func
> can run twice _at the same time_)
>
> But again, in this case wait_on_work() after try_to_grab_pending() == 1
> doesn't block, so we can just do
>
> if (cancel_work_sync(w))
> w->func();

Of course, I should have written it much shorter by saying
flush_scheduled_work could be done internally just like you suggested!

My point is to make this all safer and simpler, so we don't have to
remember each time about these differences wrt. locking. Then checking
of such patches could be much easier. Unless this current behavior
of flush_scheduled_work has any real advantages, worth of this
potential unsafety. (Btw. is there any reason to use this with
rearming works, anyway?)

Thanks,
Jarek P.

2007-10-23 09:18:22

by Jarek Poplawski

[permalink] [raw]
Subject: Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote:
...
> If this work doesn't rearm itself - yes. (otherwise, the same ->func
> can run twice _at the same time_)
>
> But again, in this case wait_on_work() after try_to_grab_pending() == 1
> doesn't block, so we can just do
>
> if (cancel_work_sync(w))
> w->func();
>

...but, if it were run just before work_clear_pending()?

Jarek P.