2003-03-27 13:21:03

by Shmuel Hen

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.

Furthermore, holding a lock_irq doesn't mean bottom halves are disabled
too; it just means interrupts are disabled and no *new* softirq can be
queued. Consider the following situation:

In bond_release() we hold write_lock_irqsave(&bond->lock, flags) and then
do all the releasing stuff. If, for example, we need to call
dev_mc_upload() for the released slave, the following happens:

        spin_lock_bh(&dev->xmit_lock);
        __dev_mc_upload(dev);
        spin_unlock_bh(&dev->xmit_lock);

spin_unlock_bh() calls local_bh_enable(), which checks local_bh_count. If
local_bh_count reaches zero (and it does), it directly executes
do_softirq(). The in_interrupt() check in do_softirq() is false, so the
softirqs that were queued begin to run and process the Tx and Rx
backlogs. dev_queue_xmit() is called on the bond device, which calls,
let's say, bond_xmit_xor(). The first thing bond_xmit_xor() does is try
to grab read_lock_irqsave(&bond->lock, flags). Since this lock is already
held by bond_release(), and we're on the same CPU without any context
switch, we've got ourselves a deadlock. This actually happened to us, and
it took us a while to figure out the system halt, but we've got the kdb
trace to prove it.
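
Boiled down to a sketch (function and lock names modeled on the 2.4
bonding driver, not copied from it), the two colliding paths look like
this:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>

static rwlock_t bond_lock = RW_LOCK_UNLOCKED;

/* Softirq context: reached via dev_queue_xmit() on the bond device. */
static int bond_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
{
        unsigned long flags;

        /* Spins forever if this CPU already holds the write lock below. */
        read_lock_irqsave(&bond_lock, flags);
        /* ... pick a slave and transmit ... */
        read_unlock_irqrestore(&bond_lock, flags);
        return 0;
}

/* User context: the release ioctl. */
static void bond_release_sketch(struct net_device *slave_dev)
{
        unsigned long flags;

        write_lock_irqsave(&bond_lock, flags);  /* IRQs off, write lock held */

        spin_lock_bh(&slave_dev->xmit_lock);    /* local_bh_count++ */
        __dev_mc_upload(slave_dev);
        spin_unlock_bh(&slave_dev->xmit_lock);  /* local_bh_count drops to 0;
                                                 * local_bh_enable() runs
                                                 * do_softirq(), net_tx_action()
                                                 * reaches bond_xmit_sketch()
                                                 * on this same CPU: deadlock */

        write_unlock_irqrestore(&bond_lock, flags);
}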

Specifically for bonding, as Dan states below, it is indeed not
necessary to hold a lock_irq at every entry point in the driver. From our
experience in previous projects, we found that it is sufficient to grab a
read_lock when accessing the slaves list in any softirq-level function
(receive, transmit and timer), and to hold a write_lock_bh() only when
changing the slaves list in ioctl calls like bond_enslave(),
bond_release() and bond_release_all(), which all run in user context.
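
In outline (a sketch with hypothetical names, not the patch itself), the
scheme is:

#include <linux/netdevice.h>
#include <linux/spinlock.h>

static rwlock_t bond_lock = RW_LOCK_UNLOCKED;

/* Softirq level (transmit, receive, timer): only reads the slaves list. */
static int bond_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
{
        read_lock(&bond_lock);          /* no IRQ disabling needed */
        /* ... choose a slave and queue the skb to it ... */
        read_unlock(&bond_lock);
        return 0;
}

/* User context (ioctl): the only place the slaves list changes. */
static int bond_enslave_sketch(struct net_device *slave_dev)
{
        write_lock_bh(&bond_lock);      /* _bh keeps the softirq readers on
                                         * this CPU from running under us;
                                         * readers on other CPUs just spin */
        /* ... link the slave into the list ... */
        write_unlock_bh(&bond_lock);
        return 0;
}

The read side never runs with interrupts blocked, which is where the
expected throughput gain comes from.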

We have created a version that uses the above scheme; it is being tested
by our QA group these days. Such a major change in the locking scheme
requires a lot of testing to catch potential hidden bugs and corner
cases. We expect this will also increase total throughput, since
interrupts won't be blocked each time a packet is transmitted or the
miimon timer fires. We believe we will be able to post the patch
(+results) next week.


On Tue, 25 Mar 2003, Dan Eble wrote:
>
> (kernel is ppc 2.4.21-pre4)
>
> In bond_enslave() [drivers/net/bonding.c]:
>
>         write_lock_irqsave(&bond->lock, flags);
>         ...
>         err = netdev_set_master(slave_dev, master_dev);
>         ...
>         write_unlock_irqrestore(&bond->lock, flags);
>
> In netdev_set_master() [net/core/dev.c]:
>
>         rtmsg_ifinfo(RTM_NEWLINK, slave, IFF_SLAVE);
>
> In rtmsg_ifinfo() [net/core/rtnetlink.c]:
>
>         skb = alloc_skb(size, GFP_KERNEL);
>         ...
>         netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_KERNEL);
>
> Doesn't this admit the possibility of sleeping with interrupts disabled?
> I found it because I'm working on a driver that uses a master-slave
> relationship like the bonding driver, and decided I didn't really need to
> disable interrupts, so I tried using write_lock_bh() instead. The result
> was an "alloc_skb called nonatomically from interrupt" message because
> write_lock_bh() increments the local BH count (which seems reasonable).
>
> A bigger question: Why are the IRQ check and the BH check inconsistent?
> That is, local_bh_count() says "yes" if you are currently running in BH
> context OR have disabled BHs; however, local_irq_count() says "yes" if you
> are currently running in interrupt context, but it says nothing (as far as
> I have seen) about whether IRQs are enabled or disabled. Is this (a) the
> Right Way, (b) something that's more trouble to fix than to be burned by
> once and then avoid for the rest of your life, or (c) totally horked?
>
> --
> Dan Eble <[email protected]>   _____  .
>                             |  _  |/|
> Applied Innovation Inc.     | |_| | |
> http://www.aiinet.com/      |__/|_|_|
>

--
| Shmulik Hen |
| Israel Design Center (Jerusalem) |
| LAN Access Division |
| Intel Communications Group, Intel corp. |



2003-03-27 13:36:40

by David Miller

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.

From: [email protected]
Date: Thu, 27 Mar 2003 15:32:02 +0200 (IST)

> Furthermore, holding a lock_irq doesn't mean bottom halves are disabled
> too; it just means interrupts are disabled and no *new* softirq can be
> queued. Consider the following situation:

I think local_bh_enable() should check irqs_disabled() and honour that.
What you are showing here, that BH's can run via local_bh_enable()
even when IRQs are disabled, is a BUG().

IRQ disabling is meant to be stronger than softint disabling.
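
In 2.4 terms the proposed behaviour might look like this (a sketch only;
the real local_bh_enable() is a per-arch macro):

        /* Sketch: don't run softirqs from local_bh_enable() while hardware
         * interrupts are off; they stay pending until a later enable. */
        static inline void local_bh_enable(void)
        {
                barrier();
                if (!--local_bh_count(smp_processor_id())
                    && softirq_pending(smp_processor_id())
                    && !irqs_disabled())        /* the proposed check */
                        do_softirq();
        }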

Ingo/Linus?

2003-03-27 14:05:19

by David Miller

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.

From: Trond Myklebust <[email protected]>
Date: 27 Mar 2003 15:11:56 +0100

> > IRQ disabling is meant to be stronger than softint disabling.
>
> In that case, you'll need to have things like spin_unlock_irqrestore()
> call local_bh_enable() in order to run the pending softirqs. Is that
> worth the trouble?

"trouble" is a weird word to use when the current behavior is
just wrong. :-)

My point is that it doesn't matter what the fix is, running
softints while hw IRQs are disabled must be fixed.

2003-03-27 14:10:17

by Trond Myklebust

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.

>>>>> " " == David S Miller <[email protected]> writes:

> From: [email protected]
> Date: Thu, 27 Mar 2003 15:32:02 +0200 (IST)
>
> > Furthermore, holding a lock_irq doesn't mean bottom halves
> > are disabled too; it just means interrupts are disabled and
> > no *new* softirq can be queued. Consider the following
> > situation:
>
> I think local_bh_enable() should check irqs_disabled() and
> honour that. What you are showing here, that BH's can run via
> local_bh_enable() even when IRQs are disabled, is a BUG().
>
> IRQ disabling is meant to be stronger than softint disabling.

In that case, you'll need to have things like spin_unlock_irqrestore()
call local_bh_enable() in order to run the pending softirqs. Is that
worth the trouble?

Cheers,
Trond

2003-03-27 17:14:38

by Linus Torvalds

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.


On Thu, 27 Mar 2003, David S. Miller wrote:
>
> > Furthermore, holding a lock_irq doesn't mean bottom halves are disabled
> > too; it just means interrupts are disabled and no *new* softirq can be
> > queued. Consider the following situation:
>
> I think local_bh_enable() should check irqs_disabled() and honour that.
> What you are showing here, that BH's can run via local_bh_enable()
> even when IRQs are disabled, is a BUG().

I'd disagree.

I do agree that we should obviously not run bottom halves with interrupts
disabled, but I think the _real_ bug is doing "local_bh_enable()" in the
first place. It's a nesting bug: you must nest the "stronger" lock inside
the weaker one, which means that the following is right:

        local_bh_disable()
        ..
        local_irq_disable()
        ...
        local_irq_enable()
        ..
        local_bh_enable()

and this is WRONG:

        local_irq_disable()     (or spinlock)
        ..
        local_bh_disable()
        ..
        local_bh_enable()       !BUG BUG BUG!
        ..
        local_irq_enable()

So the bug is, in my opinion, not in the BH handling, but in the caller.

I missed the start of this thread, so I don't know how hard this is to
fix. But if you have a buggy sequence, the _simple_ fix may be to do
something like this:

   +++  local_bh_disable()
        local_irq_disable()     (or spinlock)
        ..
        local_bh_disable()
        ..
        local_bh_enable()       ! now it's a no-op and no longer a bug
        ..
        local_irq_enable()
   +++  local_bh_enable()
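
With spinlocks standing in for the raw disables (a sketch; outer_lock and
inner_lock are hypothetical names), the fixed sequence reads:

        unsigned long flags;

        local_bh_disable();                     /* the added outer disable */
        spin_lock_irqsave(&outer_lock, flags);  /* IRQs off */

        spin_lock_bh(&inner_lock);              /* local_bh_count++ */
        /* ... */
        spin_unlock_bh(&inner_lock);            /* count stays above zero, so
                                                 * no do_softirq() with IRQs
                                                 * disabled */

        spin_unlock_irqrestore(&outer_lock, flags);
        local_bh_enable();                      /* IRQs back on; any pending
                                                 * softirqs run safely here */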

What's the code sequence?

Linus

2003-03-27 17:48:30

by David Miller

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.

From: Linus Torvalds <[email protected]>
Date: Thu, 27 Mar 2003 09:22:29 -0800 (PST)

> I do agree that we should obviously not run bottom halves with
> interrupts disabled

Ok, so can we add a:

        if (irqs_disabled())
                BUG();

check to do_softirq()?

I'll address the rest of your email in a bit.

2003-03-27 18:01:08

by David Miller

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.

From: Linus Torvalds <[email protected]>
Date: Thu, 27 Mar 2003 10:04:52 -0800 (PST)

> I'd suggest making it a counting warning (with a static counter per
> local-bh-enable macro expansion) and adding it to local_bh_enable() -
> otherwise it will only BUG() when the (potentially rare) condition
> happens - instead of always giving a nice backtrace of exact problem
> spots.

Ok, maybe it's time to move local_bh_enable() out of line, it's
getting large and it's expanded in hundreds of places.

2003-03-27 17:56:12

by Linus Torvalds

Subject: Re: BUG or not? GFP_KERNEL with interrupts disabled.


On Thu, 27 Mar 2003, David S. Miller wrote:
>
> I do agree that we should obviously not run bottom halves with
> interrupts disabled
>
> Ok, so can we add a:
>
>         if (irqs_disabled())
>                 BUG();
>
> check to do_softirq()?

I'd suggest making it a counting warning (with a static counter per
local-bh-enable macro expansion) and adding it to local_bh_enable() -
otherwise it will only BUG() when the (potentially rare) condition
happens - instead of always giving a nice backtrace of exact problem
spots.
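
For instance (a sketch of the idea only; __do_local_bh_enable() stands in
for the existing enable logic):

#define local_bh_enable()                                               \
do {                                                                    \
        static int warned;                                              \
        if (irqs_disabled() && warned < 5) {                            \
                warned++;                                               \
                printk(KERN_ERR "local_bh_enable() with IRQs disabled " \
                       "at %s:%d\n", __FILE__, __LINE__);               \
        }                                                               \
        __do_local_bh_enable();         /* the existing enable logic */ \
} while (0)

Each macro expansion gets its own static counter, so every problem call
site reports itself a bounded number of times with its file and line.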

Linus