LinuxLists.cc - Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

2010-06-01 09:54:27

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

On 06/01/10 03:08, Flavio Leitner wrote:
> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>> Hi, Flavio,
>>
>> Please use the attached patch instead, try to see if it solves
>> all your problems.
>
> I tried and it hangs. No backtraces this time.
> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
> notification, so I think it won't work.

Ah, I thought the same.

>
> Please, correct if I'm wrong, but when a failover happens with your
> patch applied, the netconsole would be disabled forever even with
> another healthy slave, right?
>

Yes, this is an easy solution, because bonding has several modes,
it is complex to make netpoll work in different modes.

Would you like to test the following patch?

Thanks much!

Attachments:

drivers-net-bonding-fix-activebackup-deadlock.diff (591.00 B)

2010-06-01 18:42:54

by Jay Vosburgh

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

Cong Wang <[email protected]> wrote:

>On 06/01/10 03:08, Flavio Leitner wrote:
>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>> Hi, Flavio,
>>>
>>> Please use the attached patch instead, try to see if it solves
>>> all your problems.
>>
>> I tried and it hangs. No backtraces this time.
>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>> notification, so I think it won't work.
>
>Ah, I thought the same.
>
>>
>> Please, correct if I'm wrong, but when a failover happens with your
>> patch applied, the netconsole would be disabled forever even with
>> another healthy slave, right?
>>
>
>Yes, this is an easy solution, because bonding has several modes,
>it is complex to make netpoll work in different modes.

If I understand correctly, the root cause of the problem with
netconsole and bonding is that bonding is, ultimately, performing
printks with a write lock held, and when netconsole recursively calls
into bonding to send the printk over the netconsole, there is a deadlock
(when the bonding xmit function attempts to acquire the same lock for
read).

You're trying to avoid the deadlock by shutting off netconsole
(permanently, it looks like) for one problem case: a failover, which
does some printks with a write lock held.

This doesn't look to me like a complete solution, there are
other cases in bonding that will do printk with write locks held. I
suspect those will also hang netconsole as things exist today, and won't
be affected by your patch below.

For example:

The sysfs functions to set the primary (bonding_store_primary)
or active (bonding_store_active_slave) options: a pr_info is called to
provide a log message of the results. These could be tested by setting
the primary or active options via sysfs, e.g.,

echo eth0 > /sys/class/net/bond0/bonding/primary
echo eth0 > /sys/class/net/bond0/bonding/active

If the kernel is defined with DEBUG, there are a few pr_debug
calls within write_locks (bond_del_vlan, for example).

If the slave's underlying device driver's ndo_vlan_rx_register
or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
for error cases, e.g., igbvf, ehea, enic), those would also presumably
deadlock (because bonding holds its write_lock when calling the ndo_
vlan functions).

It also appears that (with the patch below) some nominally
normal usage patterns will immediately disable netconsole. The one that
comes to mind is if the primary= option is set (to "eth1" for this
example), but that slave not enslaved first (the slaves are added, say,
eth0 then eth1). In that situation, when the primary slave (eth1 here)
is added, the first thing that will happen is a failover, and that will
disable netconsole.

Thoughts?

-J

>Would you like to test the following patch?
>
>Thanks much!
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 5e12462..59ade92 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1109,6 +1109,14 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
> if (old_active == new_active)
> return;
>
>+ write_unlock_bh(&bond->curr_slave_lock);
>+ read_unlock(&bond->lock);
>+
>+ netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
>+
>+ read_lock(&bond->lock);
>+ write_lock_bh(&bond->curr_slave_lock);
>+
> if (new_active) {
> new_active->jiffies = jiffies;
>

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2010-06-02 10:01:23

by Cong Wang

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

On 06/02/10 02:42, Jay Vosburgh wrote:
> Cong Wang<[email protected]> wrote:
>
>> On 06/01/10 03:08, Flavio Leitner wrote:
>>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>>> Hi, Flavio,
>>>>
>>>> Please use the attached patch instead, try to see if it solves
>>>> all your problems.
>>>
>>> I tried and it hangs. No backtraces this time.
>>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>>> notification, so I think it won't work.
>>
>> Ah, I thought the same.
>>
>>>
>>> Please, correct if I'm wrong, but when a failover happens with your
>>> patch applied, the netconsole would be disabled forever even with
>>> another healthy slave, right?
>>>
>>
>> Yes, this is an easy solution, because bonding has several modes,
>> it is complex to make netpoll work in different modes.
>
> If I understand correctly, the root cause of the problem with
> netconsole and bonding is that bonding is, ultimately, performing
> printks with a write lock held, and when netconsole recursively calls
> into bonding to send the printk over the netconsole, there is a deadlock
> (when the bonding xmit function attempts to acquire the same lock for
> read).

Yes.

>
> You're trying to avoid the deadlock by shutting off netconsole
> (permanently, it looks like) for one problem case: a failover, which
> does some printks with a write lock held.
>
> This doesn't look to me like a complete solution, there are
> other cases in bonding that will do printk with write locks held. I
> suspect those will also hang netconsole as things exist today, and won't
> be affected by your patch below.

I can expect that, bonding modes are complex.

>
> For example:
>
> The sysfs functions to set the primary (bonding_store_primary)
> or active (bonding_store_active_slave) options: a pr_info is called to
> provide a log message of the results. These could be tested by setting
> the primary or active options via sysfs, e.g.,
>
> echo eth0> /sys/class/net/bond0/bonding/primary
> echo eth0> /sys/class/net/bond0/bonding/active
>
> If the kernel is defined with DEBUG, there are a few pr_debug
> calls within write_locks (bond_del_vlan, for example).
>
> If the slave's underlying device driver's ndo_vlan_rx_register
> or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
> for error cases, e.g., igbvf, ehea, enic), those would also presumably
> deadlock (because bonding holds its write_lock when calling the ndo_
> vlan functions).
>
> It also appears that (with the patch below) some nominally
> normal usage patterns will immediately disable netconsole. The one that
> comes to mind is if the primary= option is set (to "eth1" for this
> example), but that slave not enslaved first (the slaves are added, say,
> eth0 then eth1). In that situation, when the primary slave (eth1 here)
> is added, the first thing that will happen is a failover, and that will
> disable netconsole.
>

Thanks for your detailed explanation!

This is why I said bonding is complex. I guess we would have to adjust
netpoll code for different bonding cases, one solution seems not fix all.
I am not sure how much work to do, since I am not familiar with bonding
code. Maybe Andy can help?

For the previous patch, it at least can make Flavio happy. :)

Thanks!

2010-06-04 19:19:21

by Andy Gospodarek

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

On Wed, Jun 02, 2010 at 06:04:45PM +0800, Cong Wang wrote:
> On 06/02/10 02:42, Jay Vosburgh wrote:
>> Cong Wang<[email protected]> wrote:
>>
>>> On 06/01/10 03:08, Flavio Leitner wrote:
>>>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>>>> Hi, Flavio,
>>>>>
>>>>> Please use the attached patch instead, try to see if it solves
>>>>> all your problems.
>>>>
>>>> I tried and it hangs. No backtraces this time.
>>>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>>>> notification, so I think it won't work.
>>>
>>> Ah, I thought the same.
>>>
>>>>
>>>> Please, correct if I'm wrong, but when a failover happens with your
>>>> patch applied, the netconsole would be disabled forever even with
>>>> another healthy slave, right?
>>>>
>>>
>>> Yes, this is an easy solution, because bonding has several modes,
>>> it is complex to make netpoll work in different modes.
>>
>> If I understand correctly, the root cause of the problem with
>> netconsole and bonding is that bonding is, ultimately, performing
>> printks with a write lock held, and when netconsole recursively calls
>> into bonding to send the printk over the netconsole, there is a deadlock
>> (when the bonding xmit function attempts to acquire the same lock for
>> read).
>
>
> Yes.
>
>>
>> You're trying to avoid the deadlock by shutting off netconsole
>> (permanently, it looks like) for one problem case: a failover, which
>> does some printks with a write lock held.
>>
>> This doesn't look to me like a complete solution, there are
>> other cases in bonding that will do printk with write locks held. I
>> suspect those will also hang netconsole as things exist today, and won't
>> be affected by your patch below.
>
>
> I can expect that, bonding modes are complex.
>
>>
>> For example:
>>
>> The sysfs functions to set the primary (bonding_store_primary)
>> or active (bonding_store_active_slave) options: a pr_info is called to
>> provide a log message of the results. These could be tested by setting
>> the primary or active options via sysfs, e.g.,
>>
>> echo eth0> /sys/class/net/bond0/bonding/primary
>> echo eth0> /sys/class/net/bond0/bonding/active
>>
>> If the kernel is defined with DEBUG, there are a few pr_debug
>> calls within write_locks (bond_del_vlan, for example).
>>
>> If the slave's underlying device driver's ndo_vlan_rx_register
>> or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
>> for error cases, e.g., igbvf, ehea, enic), those would also presumably
>> deadlock (because bonding holds its write_lock when calling the ndo_
>> vlan functions).
>>
>> It also appears that (with the patch below) some nominally
>> normal usage patterns will immediately disable netconsole. The one that
>> comes to mind is if the primary= option is set (to "eth1" for this
>> example), but that slave not enslaved first (the slaves are added, say,
>> eth0 then eth1). In that situation, when the primary slave (eth1 here)
>> is added, the first thing that will happen is a failover, and that will
>> disable netconsole.
>>
>
> Thanks for your detailed explanation!
>
> This is why I said bonding is complex. I guess we would have to adjust
> netpoll code for different bonding cases, one solution seems not fix all.
> I am not sure how much work to do, since I am not familiar with bonding
> code. Maybe Andy can help?
>

Sorry I've been silent until now. This does seem quite similar to a
problem I've previously encountered when dealing with bonding+netpoll on
some old 2.6.9-based kernels. There is no guarantee the methods used
there will apply here, but I'll talk about them anyway.

As Flavio noticed, recursive calls into the bond transmit routines were
not a good idea. I discovered the same and worked around this issue by
checking to see if we could take the bond->lock for writing before
continuing. If we could not get, I wanted to signal that this should be
queued for transmission later. Based on the flow of netpoll_send_skb
(or possibly for another reason that is escaping me right now) I added
one of these checks in bond_poll_controller too. These aren't the
prettiest fixes, but seemed to work well for me when I did this work in
the past. I realize the differences are not that great compared to some
of the patches posted by Flavio, but I think they are worth trying.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index ef60244..d7b9b99 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1290,6 +1290,12 @@ static bool slaves_support_netpoll(struct net_device *bond_dev)
static void bond_poll_controller(struct net_device *bond_dev)
{
struct net_device *dev = bond_dev->npinfo->netpoll->real_dev;
+ struct bonding *bond = netdev_priv(bond_dev);
+
+ if (!write_trylock(&bond->lock))
+ return;
+ write_unlock(&bond->lock);
+
if (dev != bond_dev)
netpoll_poll_dev(dev);
}
@@ -4418,7 +4424,11 @@ static void bond_set_xmit_hash_policy(struct bonding *bond)

static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
- const struct bonding *bond = netdev_priv(dev);
+ struct bonding *bond = netdev_priv(dev);
+
+ if (!write_trylock(&bond->lock))
+ return NETDEV_TX_BUSY;
+ write_unlock(&bond->lock);

switch (bond->params.mode) {
case BOND_MODE_ROUNDROBIN:

The other key to all of this is to make sure that queuing is done
correctly now that we expect to queue these frames and have them sent at
some point when there is a member of the bond that is actually capable
of sending them out.

The new style of sending queued skbs in a workqueue is much better than
what was done in the 2.6.9 timeframe, but careful attention should still
be paid to txq lock and which processor is the owner. Returning
something other than NETDEV_TX_OK from bond_start_xmit and checking for
locks being held there should also help with any deadlocks that show up
while running in queue_process (though they would not be recursive).

I'm not in a good spot to test this right now, but I can take a look at
next week and we can try and track down any of the other deadlocks that
currently exist as I suspect this will not resolve all of the issues.

2010-06-07 09:55:24

by Cong Wang

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

On 06/05/10 03:18, Andy Gospodarek wrote:
> On Wed, Jun 02, 2010 at 06:04:45PM +0800, Cong Wang wrote:
>> On 06/02/10 02:42, Jay Vosburgh wrote:
>>> Cong Wang<[email protected]> wrote:
>>>
>>>> On 06/01/10 03:08, Flavio Leitner wrote:
>>>>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>>>>> Hi, Flavio,
>>>>>>
>>>>>> Please use the attached patch instead, try to see if it solves
>>>>>> all your problems.
>>>>>
>>>>> I tried and it hangs. No backtraces this time.
>>>>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>>>>> notification, so I think it won't work.
>>>>
>>>> Ah, I thought the same.
>>>>
>>>>>
>>>>> Please, correct if I'm wrong, but when a failover happens with your
>>>>> patch applied, the netconsole would be disabled forever even with
>>>>> another healthy slave, right?
>>>>>
>>>>
>>>> Yes, this is an easy solution, because bonding has several modes,
>>>> it is complex to make netpoll work in different modes.
>>>
>>> If I understand correctly, the root cause of the problem with
>>> netconsole and bonding is that bonding is, ultimately, performing
>>> printks with a write lock held, and when netconsole recursively calls
>>> into bonding to send the printk over the netconsole, there is a deadlock
>>> (when the bonding xmit function attempts to acquire the same lock for
>>> read).
>>
>>
>> Yes.
>>
>>>
>>> You're trying to avoid the deadlock by shutting off netconsole
>>> (permanently, it looks like) for one problem case: a failover, which
>>> does some printks with a write lock held.
>>>
>>> This doesn't look to me like a complete solution, there are
>>> other cases in bonding that will do printk with write locks held. I
>>> suspect those will also hang netconsole as things exist today, and won't
>>> be affected by your patch below.
>>
>>
>> I can expect that, bonding modes are complex.
>>
>>>
>>> For example:
>>>
>>> The sysfs functions to set the primary (bonding_store_primary)
>>> or active (bonding_store_active_slave) options: a pr_info is called to
>>> provide a log message of the results. These could be tested by setting
>>> the primary or active options via sysfs, e.g.,
>>>
>>> echo eth0> /sys/class/net/bond0/bonding/primary
>>> echo eth0> /sys/class/net/bond0/bonding/active
>>>
>>> If the kernel is defined with DEBUG, there are a few pr_debug
>>> calls within write_locks (bond_del_vlan, for example).
>>>
>>> If the slave's underlying device driver's ndo_vlan_rx_register
>>> or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
>>> for error cases, e.g., igbvf, ehea, enic), those would also presumably
>>> deadlock (because bonding holds its write_lock when calling the ndo_
>>> vlan functions).
>>>
>>> It also appears that (with the patch below) some nominally
>>> normal usage patterns will immediately disable netconsole. The one that
>>> comes to mind is if the primary= option is set (to "eth1" for this
>>> example), but that slave not enslaved first (the slaves are added, say,
>>> eth0 then eth1). In that situation, when the primary slave (eth1 here)
>>> is added, the first thing that will happen is a failover, and that will
>>> disable netconsole.
>>>
>>
>> Thanks for your detailed explanation!
>>
>> This is why I said bonding is complex. I guess we would have to adjust
>> netpoll code for different bonding cases, one solution seems not fix all.
>> I am not sure how much work to do, since I am not familiar with bonding
>> code. Maybe Andy can help?
>>
>
> Sorry I've been silent until now. This does seem quite similar to a
> problem I've previously encountered when dealing with bonding+netpoll on
> some old 2.6.9-based kernels. There is no guarantee the methods used
> there will apply here, but I'll talk about them anyway.
>
> As Flavio noticed, recursive calls into the bond transmit routines were
> not a good idea. I discovered the same and worked around this issue by
> checking to see if we could take the bond->lock for writing before
> continuing. If we could not get, I wanted to signal that this should be
> queued for transmission later. Based on the flow of netpoll_send_skb
> (or possibly for another reason that is escaping me right now) I added
> one of these checks in bond_poll_controller too. These aren't the
> prettiest fixes, but seemed to work well for me when I did this work in
> the past. I realize the differences are not that great compared to some
> of the patches posted by Flavio, but I think they are worth trying.

Hmm, I still feel like this way is ugly, although it may work.
I guess David doesn't like it either.

Anyway, Flavio, could you try the following patch as well?

Thanks a lot!

>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index ef60244..d7b9b99 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1290,6 +1290,12 @@ static bool slaves_support_netpoll(struct net_device *bond_dev)
> static void bond_poll_controller(struct net_device *bond_dev)
> {
> struct net_device *dev = bond_dev->npinfo->netpoll->real_dev;
> + struct bonding *bond = netdev_priv(bond_dev);
> +
> + if (!write_trylock(&bond->lock))
> + return;
> + write_unlock(&bond->lock);
> +
> if (dev != bond_dev)
> netpoll_poll_dev(dev);
> }
> @@ -4418,7 +4424,11 @@ static void bond_set_xmit_hash_policy(struct bonding *bond)
>
> static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> - const struct bonding *bond = netdev_priv(dev);
> + struct bonding *bond = netdev_priv(dev);
> +
> + if (!write_trylock(&bond->lock))
> + return NETDEV_TX_BUSY;
> + write_unlock(&bond->lock);
>
> switch (bond->params.mode) {
> case BOND_MODE_ROUNDROBIN:
>
> The other key to all of this is to make sure that queuing is done
> correctly now that we expect to queue these frames and have them sent at
> some point when there is a member of the bond that is actually capable
> of sending them out.
>
> The new style of sending queued skbs in a workqueue is much better than
> what was done in the 2.6.9 timeframe, but careful attention should still
> be paid to txq lock and which processor is the owner. Returning
> something other than NETDEV_TX_OK from bond_start_xmit and checking for
> locks being held there should also help with any deadlocks that show up
> while running in queue_process (though they would not be recursive).
>
> I'm not in a good spot to test this right now, but I can take a look at
> next week and we can try and track down any of the other deadlocks that
> currently exist as I suspect this will not resolve all of the issues.

2010-06-07 10:00:59

by David Miller

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

From: Cong Wang <[email protected]>
Date: Mon, 07 Jun 2010 17:57:49 +0800

> Hmm, I still feel like this way is ugly, although it may work.
> I guess David doesn't like it either.

Of course I don't like it. :-)

I suspect the locking scheme will need to be changed.

Besides, if we're going to hack this up and do write lock attempts in
the read locking paths, there is no point in using a rwlock any more.
And I'm personally in disfavor of all rwlock usage anyways (it dirties
the cacheline for readers just as equally for writers, and if the
critically protected code path is short enough, that shared cache
line atomic operation will be the predominant cost).

So I'd say, 1) make this a spinlock and 2) try to use RCU for the
read path.

That would fix everything.

2010-06-07 13:04:32

by Andy Gospodarek

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

On Mon, Jun 07, 2010 at 05:57:49PM +0800, Cong Wang wrote:
> On 06/05/10 03:18, Andy Gospodarek wrote:
>> On Wed, Jun 02, 2010 at 06:04:45PM +0800, Cong Wang wrote:
>>> On 06/02/10 02:42, Jay Vosburgh wrote:
>>>> Cong Wang<[email protected]> wrote:
>>>>
>>>>> On 06/01/10 03:08, Flavio Leitner wrote:
>>>>>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>>>>>> Hi, Flavio,
>>>>>>>
>>>>>>> Please use the attached patch instead, try to see if it solves
>>>>>>> all your problems.
>>>>>>
>>>>>> I tried and it hangs. No backtraces this time.
>>>>>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>>>>>> notification, so I think it won't work.
>>>>>
>>>>> Ah, I thought the same.
>>>>>
>>>>>>
>>>>>> Please, correct if I'm wrong, but when a failover happens with your
>>>>>> patch applied, the netconsole would be disabled forever even with
>>>>>> another healthy slave, right?
>>>>>>
>>>>>
>>>>> Yes, this is an easy solution, because bonding has several modes,
>>>>> it is complex to make netpoll work in different modes.
>>>>
>>>> If I understand correctly, the root cause of the problem with
>>>> netconsole and bonding is that bonding is, ultimately, performing
>>>> printks with a write lock held, and when netconsole recursively calls
>>>> into bonding to send the printk over the netconsole, there is a deadlock
>>>> (when the bonding xmit function attempts to acquire the same lock for
>>>> read).
>>>
>>>
>>> Yes.
>>>
>>>>
>>>> You're trying to avoid the deadlock by shutting off netconsole
>>>> (permanently, it looks like) for one problem case: a failover, which
>>>> does some printks with a write lock held.
>>>>
>>>> This doesn't look to me like a complete solution, there are
>>>> other cases in bonding that will do printk with write locks held. I
>>>> suspect those will also hang netconsole as things exist today, and won't
>>>> be affected by your patch below.
>>>
>>>
>>> I can expect that, bonding modes are complex.
>>>
>>>>
>>>> For example:
>>>>
>>>> The sysfs functions to set the primary (bonding_store_primary)
>>>> or active (bonding_store_active_slave) options: a pr_info is called to
>>>> provide a log message of the results. These could be tested by setting
>>>> the primary or active options via sysfs, e.g.,
>>>>
>>>> echo eth0> /sys/class/net/bond0/bonding/primary
>>>> echo eth0> /sys/class/net/bond0/bonding/active
>>>>
>>>> If the kernel is defined with DEBUG, there are a few pr_debug
>>>> calls within write_locks (bond_del_vlan, for example).
>>>>
>>>> If the slave's underlying device driver's ndo_vlan_rx_register
>>>> or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
>>>> for error cases, e.g., igbvf, ehea, enic), those would also presumably
>>>> deadlock (because bonding holds its write_lock when calling the ndo_
>>>> vlan functions).
>>>>
>>>> It also appears that (with the patch below) some nominally
>>>> normal usage patterns will immediately disable netconsole. The one that
>>>> comes to mind is if the primary= option is set (to "eth1" for this
>>>> example), but that slave not enslaved first (the slaves are added, say,
>>>> eth0 then eth1). In that situation, when the primary slave (eth1 here)
>>>> is added, the first thing that will happen is a failover, and that will
>>>> disable netconsole.
>>>>
>>>
>>> Thanks for your detailed explanation!
>>>
>>> This is why I said bonding is complex. I guess we would have to adjust
>>> netpoll code for different bonding cases, one solution seems not fix all.
>>> I am not sure how much work to do, since I am not familiar with bonding
>>> code. Maybe Andy can help?
>>>
>>
>> Sorry I've been silent until now. This does seem quite similar to a
>> problem I've previously encountered when dealing with bonding+netpoll on
>> some old 2.6.9-based kernels. There is no guarantee the methods used
>> there will apply here, but I'll talk about them anyway.
>>
>> As Flavio noticed, recursive calls into the bond transmit routines were
>> not a good idea. I discovered the same and worked around this issue by
>> checking to see if we could take the bond->lock for writing before
>> continuing. If we could not get, I wanted to signal that this should be
>> queued for transmission later. Based on the flow of netpoll_send_skb
>> (or possibly for another reason that is escaping me right now) I added
>> one of these checks in bond_poll_controller too. These aren't the
>> prettiest fixes, but seemed to work well for me when I did this work in
>> the past. I realize the differences are not that great compared to some
>> of the patches posted by Flavio, but I think they are worth trying.
>
>
> Hmm, I still feel like this way is ugly, although it may work.
> I guess David doesn't like it either.
>

Notice how I referred to it as a work-around? :)

It certainly isn't a great way to resolve the issue, but I wanted to
offer my opinon on the issue since you asked.

> Anyway, Flavio, could you try the following patch as well?
>
> Thanks a lot!
>
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index ef60244..d7b9b99 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1290,6 +1290,12 @@ static bool slaves_support_netpoll(struct net_device *bond_dev)
>> static void bond_poll_controller(struct net_device *bond_dev)
>> {
>> struct net_device *dev = bond_dev->npinfo->netpoll->real_dev;
>> + struct bonding *bond = netdev_priv(bond_dev);
>> +
>> + if (!write_trylock(&bond->lock))
>> + return;
>> + write_unlock(&bond->lock);
>> +
>> if (dev != bond_dev)
>> netpoll_poll_dev(dev);
>> }
>> @@ -4418,7 +4424,11 @@ static void bond_set_xmit_hash_policy(struct bonding *bond)
>>
>> static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> {
>> - const struct bonding *bond = netdev_priv(dev);
>> + struct bonding *bond = netdev_priv(dev);
>> +
>> + if (!write_trylock(&bond->lock))
>> + return NETDEV_TX_BUSY;
>> + write_unlock(&bond->lock);
>>
>> switch (bond->params.mode) {
>> case BOND_MODE_ROUNDROBIN:
>>
>> The other key to all of this is to make sure that queuing is done
>> correctly now that we expect to queue these frames and have them sent at
>> some point when there is a member of the bond that is actually capable
>> of sending them out.
>>
>> The new style of sending queued skbs in a workqueue is much better than
>> what was done in the 2.6.9 timeframe, but careful attention should still
>> be paid to txq lock and which processor is the owner. Returning
>> something other than NETDEV_TX_OK from bond_start_xmit and checking for
>> locks being held there should also help with any deadlocks that show up
>> while running in queue_process (though they would not be recursive).
>>
>> I'm not in a good spot to test this right now, but I can take a look at
>> next week and we can try and track down any of the other deadlocks that
>> currently exist as I suspect this will not resolve all of the issues.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-06-07 19:26:08

by Flavio Leitner

[permalink] [raw]

Subject: [PATCH] netconsole: queue console messages to send later

There are some networking drivers that hold a lock in the
transmit path. Therefore, if a console message is printed
after that, netconsole will push it through the transmit path,
resulting in a deadlock.

This patch fixes the re-injection problem by queuing the console
messages in a preallocated circular buffer and then scheduling a
workqueue to send them later with another context.

Signed-off-by: Flavio Leitner <[email protected]>
---
drivers/net/netconsole.c | 156 +++++++++++++++++++++++++++++++++++++++-------
1 files changed, 133 insertions(+), 23 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index ca142c4..874376d 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -44,6 +44,8 @@
#include <linux/netpoll.h>
#include <linux/inet.h>
#include <linux/configfs.h>
+#include <linux/workqueue.h>
+#include <linux/circ_buf.h>

MODULE_AUTHOR("Maintainer: Matt Mackall <[email protected]>");
MODULE_DESCRIPTION("Console driver for network interfaces");
@@ -56,6 +58,10 @@ static char config[MAX_PARAM_LENGTH];
module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
MODULE_PARM_DESC(netconsole, " netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]");

+static int logsize = PAGE_SIZE;
+module_param(logsize, int, 0444);
+MODULE_PARM_DESC(logsize, "netconsole buffer size");
+
#ifndef MODULE
static int __init option_setup(char *opt)
{
@@ -100,6 +106,75 @@ struct netconsole_target {
struct netpoll np;
};

+struct netconsole_msg_ctl {
+ struct circ_buf messages;
+ unsigned long ring_size;
+ struct page *buffer_page;
+ struct work_struct tx_work;
+};
+static struct netconsole_msg_ctl *netconsole_ctl;
+
+#define RING_INC_POS(pos, inc, size) ((pos + inc) & (size - 1))
+
+static void netconsole_target_get(struct netconsole_target *nt);
+static void netconsole_target_put(struct netconsole_target *nt);
+
+static void netconsole_start_xmit(const char *msg, unsigned int length)
+{
+ int frag, left;
+ unsigned long flags;
+ struct netconsole_target *nt;
+ const char *tmp;
+
+ /* Avoid taking lock and disabling interrupts unnecessarily */
+ if (list_empty(&target_list))
+ return;
+
+ spin_lock_irqsave(&target_list_lock, flags);
+ list_for_each_entry(nt, &target_list, list) {
+ netconsole_target_get(nt);
+ if (nt->enabled && netif_running(nt->np.dev)) {
+ /*
+ * We nest this inside the for-each-target loop above
+ * so that we're able to get as much logging out to
+ * at least one target if we die inside here, instead
+ * of unnecessarily keeping all targets in lock-step.
+ */
+ tmp = msg;
+ for (left = length; left;) {
+ frag = min(left, MAX_PRINT_CHUNK);
+ netpoll_send_udp(&nt->np, tmp, frag);
+ tmp += frag;
+ left -= frag;
+ }
+ }
+ netconsole_target_put(nt);
+ }
+ spin_unlock_irqrestore(&target_list_lock, flags);
+}
+
+static void netconsole_process_queue(struct work_struct *work)
+{
+ struct circ_buf *messages = &netconsole_ctl->messages;
+ unsigned long ring_size = netconsole_ctl->ring_size;
+ unsigned long head = ACCESS_ONCE(messages->head);
+ unsigned long len;
+
+ while (CIRC_CNT(head, messages->tail, ring_size) >= 1) {
+ /* read index before reading contents at that index */
+ smp_read_barrier_depends();
+
+ /* pick up a length that don't wrap in the middle */
+ len = CIRC_CNT_TO_END(head, messages->tail, ring_size);
+ netconsole_start_xmit(&messages->buf[messages->tail], len);
+
+ /* finish reading descriptor before incrementing tail */
+ smp_mb();
+ messages->tail = RING_INC_POS(messages->tail, len, ring_size);
+ head = ACCESS_ONCE(messages->head);
+ }
+}
+
#ifdef CONFIG_NETCONSOLE_DYNAMIC

static struct configfs_subsystem netconsole_subsys;
@@ -702,38 +777,43 @@ static struct notifier_block netconsole_netdev_notifier = {
.notifier_call = netconsole_netdev_event,
};

+/* called with console sem, interrupts disabled */
static void write_msg(struct console *con, const char *msg, unsigned int len)
{
- int frag, left;
- unsigned long flags;
- struct netconsole_target *nt;
- const char *tmp;
+ struct circ_buf *messages = &netconsole_ctl->messages;
+ unsigned long ring_size = netconsole_ctl->ring_size;
+ unsigned long tail = ACCESS_ONCE(messages->tail);
+ unsigned long left;
+ unsigned long end;
+ unsigned long pos;

/* Avoid taking lock and disabling interrupts unnecessarily */
if (list_empty(&target_list))
return;

- spin_lock_irqsave(&target_list_lock, flags);
- list_for_each_entry(nt, &target_list, list) {
- netconsole_target_get(nt);
- if (nt->enabled && netif_running(nt->np.dev)) {
- /*
- * We nest this inside the for-each-target loop above
- * so that we're able to get as much logging out to
- * at least one target if we die inside here, instead
- * of unnecessarily keeping all targets in lock-step.
- */
- tmp = msg;
- for (left = len; left;) {
- frag = min(left, MAX_PRINT_CHUNK);
- netpoll_send_udp(&nt->np, tmp, frag);
- tmp += frag;
- left -= frag;
- }
+ pos = 0;
+ left = len;
+ while (left && CIRC_SPACE(messages->head, tail, ring_size) >= 1) {
+ end = CIRC_SPACE_TO_END(messages->head, tail, ring_size);
+ /* fast path, no wrapping is needed */
+ if (end >= left) {
+ memcpy(&messages->buf[messages->head], &msg[pos], left);
+ smp_wmb();
+ messages->head = RING_INC_POS(messages->head, left, ring_size);
+ left = 0;
}
- netconsole_target_put(nt);
+ else {
+ /* copy up to the end */
+ memcpy(&messages->buf[messages->head], &msg[pos], end);
+ smp_wmb();
+ messages->head = RING_INC_POS(messages->head, end, ring_size);
+ left -= end;
+ pos += end;
+ }
+
}
- spin_unlock_irqrestore(&target_list_lock, flags);
+
+ schedule_work(&netconsole_ctl->tx_work);
}

static struct console netconsole = {
@@ -746,9 +826,25 @@ static int __init init_netconsole(void)
{
int err;
struct netconsole_target *nt, *tmp;
+ struct circ_buf *messages;
unsigned long flags;
char *target_config;
char *input = config;
+ int order = get_order(logsize);
+
+ err = -ENOMEM;
+ netconsole_ctl = kzalloc(sizeof(*netconsole_ctl), GFP_KERNEL);
+ if (netconsole_ctl == NULL)
+ goto nomem;
+
+ netconsole_ctl->buffer_page = alloc_pages(GFP_KERNEL, order);
+ if (netconsole_ctl->buffer_page == NULL)
+ goto nopage;
+
+ netconsole_ctl->ring_size = (PAGE_SIZE << order);
+ messages = &netconsole_ctl->messages;
+ messages->buf = page_address(netconsole_ctl->buffer_page);
+ INIT_WORK(&netconsole_ctl->tx_work, netconsole_process_queue);

if (strnlen(input, MAX_PARAM_LENGTH)) {
while ((target_config = strsep(&input, ";"))) {
@@ -795,6 +891,11 @@ fail:
free_param_target(nt);
}

+ __free_pages(netconsole_ctl->buffer_page, order);
+nopage:
+ kfree(netconsole_ctl);
+
+nomem:
return err;
}

@@ -806,6 +907,10 @@ static void __exit cleanup_netconsole(void)
dynamic_netconsole_exit();
unregister_netdevice_notifier(&netconsole_netdev_notifier);

+ flush_work(&netconsole_ctl->tx_work);
+ cancel_work_sync(&netconsole_ctl->tx_work);
+ netconsole_process_queue(NULL);
+
/*
* Targets created via configfs pin references on our module
* and would first be rmdir(2)'ed from userspace. We reach
@@ -818,6 +923,11 @@ static void __exit cleanup_netconsole(void)
list_del(&nt->list);
free_param_target(nt);
}
+
+ __free_pages(netconsole_ctl->buffer_page,
+ get_order(netconsole_ctl->ring_size));
+
+ kfree(netconsole_ctl);
}

module_init(init_netconsole);
--
1.7.0.1

2010-06-07 19:50:53

by Matt Mackall

[permalink] [raw]

Subject: Re: [PATCH] netconsole: queue console messages to send later

On Mon, 2010-06-07 at 16:24 -0300, Flavio Leitner wrote:
> There are some networking drivers that hold a lock in the
> transmit path. Therefore, if a console message is printed
> after that, netconsole will push it through the transmit path,
> resulting in a deadlock.

This is an ongoing pain we've known about since before introducing the
netpoll code to the tree.

My take has always been that any form of queueing is contrary to the
goal of netpoll: timely delivery of messages even during machine-killing
situations like oopses. There may never be a second chance to deliver
the message as the machine may be locked solid. And there may be no
other way to get the message out of the box in such situations. Adding
queueing is a throwing-the-baby-out-with-the-bathwater fix.

I think Dave agrees with me here, and I believe he's said in the past
that drivers trying to print messages in such contexts should be
considered buggy.

--
Mathematics is the supreme nostalgia of our time.

2010-06-07 20:01:11

by Stephen Hemminger

[permalink] [raw]

Subject: Re: [PATCH] netconsole: queue console messages to send later

On Mon, 07 Jun 2010 14:50:48 -0500
Matt Mackall <[email protected]> wrote:

> On Mon, 2010-06-07 at 16:24 -0300, Flavio Leitner wrote:
> > There are some networking drivers that hold a lock in the
> > transmit path. Therefore, if a console message is printed
> > after that, netconsole will push it through the transmit path,
> > resulting in a deadlock.
>
> This is an ongoing pain we've known about since before introducing the
> netpoll code to the tree.
>
> My take has always been that any form of queueing is contrary to the
> goal of netpoll: timely delivery of messages even during machine-killing
> situations like oopses. There may never be a second chance to deliver
> the message as the machine may be locked solid. And there may be no
> other way to get the message out of the box in such situations. Adding
> queueing is a throwing-the-baby-out-with-the-bathwater fix.
>
> I think Dave agrees with me here, and I believe he's said in the past
> that drivers trying to print messages in such contexts should be
> considered buggy.
>

Because it to hard to fix all possible device configurations.
There should be any way to detect recursion and just drop the message to
avoid deadlock.

--

2010-06-07 20:21:37

by Matt Mackall

[permalink] [raw]

Subject: Re: [PATCH] netconsole: queue console messages to send later

On Mon, 2010-06-07 at 13:00 -0700, Stephen Hemminger wrote:
> On Mon, 07 Jun 2010 14:50:48 -0500
> Matt Mackall <[email protected]> wrote:
>
> > On Mon, 2010-06-07 at 16:24 -0300, Flavio Leitner wrote:
> > > There are some networking drivers that hold a lock in the
> > > transmit path. Therefore, if a console message is printed
> > > after that, netconsole will push it through the transmit path,
> > > resulting in a deadlock.
> >
> > This is an ongoing pain we've known about since before introducing the
> > netpoll code to the tree.
> >
> > My take has always been that any form of queueing is contrary to the
> > goal of netpoll: timely delivery of messages even during machine-killing
> > situations like oopses. There may never be a second chance to deliver
> > the message as the machine may be locked solid. And there may be no
> > other way to get the message out of the box in such situations. Adding
> > queueing is a throwing-the-baby-out-with-the-bathwater fix.
> >
> > I think Dave agrees with me here, and I believe he's said in the past
> > that drivers trying to print messages in such contexts should be
> > considered buggy.
> >
>
> Because it to hard to fix all possible device configurations.
> There should be any way to detect recursion and just drop the message to
> avoid deadlock.

Open to suggestions. The locks in question are driver-internal. There
also may not be any actual recursion taking place:

driver path a takes private lock x
driver path a attempts printk
printk calls into netconsole
netconsole calls into driver path b
driver path b attempts to take lock x -> deadlock

So we can't even try to walk back the stack looking for such nonsense.
Though we could perhaps force queuing of all messages -from- the driver
bound to netconsole. Tricky, and not quite foolproof.

--
Mathematics is the supreme nostalgia of our time.

2010-06-07 23:50:16

by David Miller

[permalink] [raw]

Subject: Re: [PATCH] netconsole: queue console messages to send later

From: Flavio Leitner <[email protected]>
Date: Mon, 7 Jun 2010 16:24:52 -0300

> There are some networking drivers that hold a lock in the
> transmit path. Therefore, if a console message is printed
> after that, netconsole will push it through the transmit path,
> resulting in a deadlock.
>
> This patch fixes the re-injection problem by queuing the console
> messages in a preallocated circular buffer and then scheduling a
> workqueue to send them later with another context.
>
> Signed-off-by: Flavio Leitner <[email protected]>

You absolutely and positively MUST NOT do this. Otherwise netconsole
becomes completely useless. Your idea has been proposed several times
as far back as 6 years ago, it was unacceptable then and it's
unacceptable now.

The whole point of netconsole is that we may be deep in an interrupt
or other atomic context, the machine is about to hard hang, and it's
absolutely essential that we get out any and all kernel logging
messages that we can, immediately.

There may not be another timer or workqueue able to execute after the
printk() we're trying to emit. We may never get to that point.

So if we defer messages, that means we won't get the message and we
won't be able to debug the problem.

Fix the locking in the drivers or layers that cause the issue instead
of breaking netconsole.

2010-06-07 23:52:16

by David Miller

[permalink] [raw]

Subject: Re: [PATCH] netconsole: queue console messages to send later

From: Matt Mackall <[email protected]>
Date: Mon, 07 Jun 2010 15:21:31 -0500

> Open to suggestions. The locks in question are driver-internal. There
> also may not be any actual recursion taking place:
>
> driver path a takes private lock x
> driver path a attempts printk
> printk calls into netconsole
> netconsole calls into driver path b
> driver path b attempts to take lock x -> deadlock
>
> So we can't even try to walk back the stack looking for such nonsense.
> Though we could perhaps force queuing of all messages -from- the driver
> bound to netconsole. Tricky, and not quite foolproof.

Look, this is all nonsense talk.

This is only coming about because of the recent discussions about
bonding, so let's fix bonding's locking. I've made concrete
suggestions on converting it's rwlocks over to spinlocks and RCU to
fix the specific problem bonding has.

Every time we hit some new locking issue the knee jerk reaction is
to do something stupid to the generic netconsole code instead of
fixing the real source of the problem.

2010-06-08 00:37:17

by Flavio Leitner

[permalink] [raw]

Subject: Re: [PATCH] netconsole: queue console messages to send later

On Mon, Jun 07, 2010 at 04:50:24PM -0700, David Miller wrote:
> From: Flavio Leitner <[email protected]>
> Date: Mon, 7 Jun 2010 16:24:52 -0300
>
> > There are some networking drivers that hold a lock in the
> > transmit path. Therefore, if a console message is printed
> > after that, netconsole will push it through the transmit path,
> > resulting in a deadlock.
> >
> > This patch fixes the re-injection problem by queuing the console
> > messages in a preallocated circular buffer and then scheduling a
> > workqueue to send them later with another context.
> >
> > Signed-off-by: Flavio Leitner <[email protected]>
>
> You absolutely and positively MUST NOT do this. Otherwise netconsole
> becomes completely useless. Your idea has been proposed several times
> as far back as 6 years ago, it was unacceptable then and it's
> unacceptable now.
>
> The whole point of netconsole is that we may be deep in an interrupt
> or other atomic context, the machine is about to hard hang, and it's
> absolutely essential that we get out any and all kernel logging
> messages that we can, immediately.

Got it. I've never assumed that netconsole would work reliable on
such situations, so I thought as we have better ways now it would
be helpful. See another idea below.

> There may not be another timer or workqueue able to execute after the
> printk() we're trying to emit. We may never get to that point.

What if in the netpoll, before we push the skb to the driver, we check
for a bit saying that it's already pushing another skb. In this case,
queue the new skb inside of netpoll and soon as the first call returns
and try to clear the bit, it will send the next skb?

printk("message 1")
...
netconsole called
netpoll sets the flag bit
pushes to the bonding driver which does another printk("message 2")
netconsole called again
netpoll checks for the flag, queue the message, returns.
so, bonding can finish up to send the first message
netpoll is about to return, checks for new queued messages, and pushes them.
bonding finishes up to send the second message
....

No deadlocks, skbs are ordered and still under the same opportunity
to send something. Does it sound acceptable?
It's off the top of my head, so probably this idea has some problems.

> Fix the locking in the drivers or layers that cause the issue instead
> of breaking netconsole.

Someday, somewhere, I know because I did this before, someone will
use a debugging printk() and will see the entire box hanging with
absolutely no message in any console because of this problem.
I'm not saying that fixing driver isn't the right way to go but
it seems not enough to me.

--
Flavio

2010-06-08 08:32:38

by Cong Wang

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

On 06/07/10 18:01, David Miller wrote:
> From: Cong Wang<[email protected]>
> Date: Mon, 07 Jun 2010 17:57:49 +0800
>
>> Hmm, I still feel like this way is ugly, although it may work.
>> I guess David doesn't like it either.
>
> Of course I don't like it. :-)
>
> I suspect the locking scheme will need to be changed.
>
> Besides, if we're going to hack this up and do write lock attempts in
> the read locking paths, there is no point in using a rwlock any more.
> And I'm personally in disfavor of all rwlock usage anyways (it dirties
> the cacheline for readers just as equally for writers, and if the
> critically protected code path is short enough, that shared cache
> line atomic operation will be the predominant cost).
>
> So I'd say, 1) make this a spinlock and 2) try to use RCU for the
> read path.
>
> That would fix everything.

Yeah, agreed. Even not talking about netconsole, bonding code
does have locking problems, netconsole just makes this problem
clear.

I will try your suggestions above.

Thanks!

2010-06-08 08:35:17

by Cong Wang

[permalink] [raw]

Subject: Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices

On 06/07/10 21:03, Andy Gospodarek wrote:
> On Mon, Jun 07, 2010 at 05:57:49PM +0800, Cong Wang wrote:
>> On 06/05/10 03:18, Andy Gospodarek wrote:
>>> On Wed, Jun 02, 2010 at 06:04:45PM +0800, Cong Wang wrote:
>>>> On 06/02/10 02:42, Jay Vosburgh wrote:
>>>>> Cong Wang<[email protected]> wrote:
>>>>>
>>>>>> On 06/01/10 03:08, Flavio Leitner wrote:
>>>>>>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>>>>>>> Hi, Flavio,
>>>>>>>>
>>>>>>>> Please use the attached patch instead, try to see if it solves
>>>>>>>> all your problems.
>>>>>>>
>>>>>>> I tried and it hangs. No backtraces this time.
>>>>>>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>>>>>>> notification, so I think it won't work.
>>>>>>
>>>>>> Ah, I thought the same.
>>>>>>
>>>>>>>
>>>>>>> Please, correct if I'm wrong, but when a failover happens with your
>>>>>>> patch applied, the netconsole would be disabled forever even with
>>>>>>> another healthy slave, right?
>>>>>>>
>>>>>>
>>>>>> Yes, this is an easy solution, because bonding has several modes,
>>>>>> it is complex to make netpoll work in different modes.
>>>>>
>>>>> If I understand correctly, the root cause of the problem with
>>>>> netconsole and bonding is that bonding is, ultimately, performing
>>>>> printks with a write lock held, and when netconsole recursively calls
>>>>> into bonding to send the printk over the netconsole, there is a deadlock
>>>>> (when the bonding xmit function attempts to acquire the same lock for
>>>>> read).
>>>>
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>> You're trying to avoid the deadlock by shutting off netconsole
>>>>> (permanently, it looks like) for one problem case: a failover, which
>>>>> does some printks with a write lock held.
>>>>>
>>>>> This doesn't look to me like a complete solution, there are
>>>>> other cases in bonding that will do printk with write locks held. I
>>>>> suspect those will also hang netconsole as things exist today, and won't
>>>>> be affected by your patch below.
>>>>
>>>>
>>>> I can expect that, bonding modes are complex.
>>>>
>>>>>
>>>>> For example:
>>>>>
>>>>> The sysfs functions to set the primary (bonding_store_primary)
>>>>> or active (bonding_store_active_slave) options: a pr_info is called to
>>>>> provide a log message of the results. These could be tested by setting
>>>>> the primary or active options via sysfs, e.g.,
>>>>>
>>>>> echo eth0> /sys/class/net/bond0/bonding/primary
>>>>> echo eth0> /sys/class/net/bond0/bonding/active
>>>>>
>>>>> If the kernel is defined with DEBUG, there are a few pr_debug
>>>>> calls within write_locks (bond_del_vlan, for example).
>>>>>
>>>>> If the slave's underlying device driver's ndo_vlan_rx_register
>>>>> or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
>>>>> for error cases, e.g., igbvf, ehea, enic), those would also presumably
>>>>> deadlock (because bonding holds its write_lock when calling the ndo_
>>>>> vlan functions).
>>>>>
>>>>> It also appears that (with the patch below) some nominally
>>>>> normal usage patterns will immediately disable netconsole. The one that
>>>>> comes to mind is if the primary= option is set (to "eth1" for this
>>>>> example), but that slave not enslaved first (the slaves are added, say,
>>>>> eth0 then eth1). In that situation, when the primary slave (eth1 here)
>>>>> is added, the first thing that will happen is a failover, and that will
>>>>> disable netconsole.
>>>>>
>>>>
>>>> Thanks for your detailed explanation!
>>>>
>>>> This is why I said bonding is complex. I guess we would have to adjust
>>>> netpoll code for different bonding cases, one solution seems not fix all.
>>>> I am not sure how much work to do, since I am not familiar with bonding
>>>> code. Maybe Andy can help?
>>>>
>>>
>>> Sorry I've been silent until now. This does seem quite similar to a
>>> problem I've previously encountered when dealing with bonding+netpoll on
>>> some old 2.6.9-based kernels. There is no guarantee the methods used
>>> there will apply here, but I'll talk about them anyway.
>>>
>>> As Flavio noticed, recursive calls into the bond transmit routines were
>>> not a good idea. I discovered the same and worked around this issue by
>>> checking to see if we could take the bond->lock for writing before
>>> continuing. If we could not get, I wanted to signal that this should be
>>> queued for transmission later. Based on the flow of netpoll_send_skb
>>> (or possibly for another reason that is escaping me right now) I added
>>> one of these checks in bond_poll_controller too. These aren't the
>>> prettiest fixes, but seemed to work well for me when I did this work in
>>> the past. I realize the differences are not that great compared to some
>>> of the patches posted by Flavio, but I think they are worth trying.
>>
>>
>> Hmm, I still feel like this way is ugly, although it may work.
>> I guess David doesn't like it either.
>>
>
> Notice how I referred to it as a work-around? :)
>
> It certainly isn't a great way to resolve the issue, but I wanted to
> offer my opinon on the issue since you asked.

Sorry for my misunderstanding.

2010-06-08 08:58:53

by Cong Wang

[permalink] [raw]

Subject: Re: [PATCH] netconsole: queue console messages to send later

Thanks for your fix, Flavio!

On 06/08/10 08:37, Flavio Leitner wrote:
>> There may not be another timer or workqueue able to execute after the
>> printk() we're trying to emit. We may never get to that point.
>
> What if in the netpoll, before we push the skb to the driver, we check
> for a bit saying that it's already pushing another skb. In this case,
> queue the new skb inside of netpoll and soon as the first call returns
> and try to clear the bit, it will send the next skb?
>
> printk("message 1")
> ...
> netconsole called
> netpoll sets the flag bit
> pushes to the bonding driver which does another printk("message 2")
> netconsole called again
> netpoll checks for the flag, queue the message, returns.
> so, bonding can finish up to send the first message
> netpoll is about to return, checks for new queued messages, and pushes them.
> bonding finishes up to send the second message
> ....
>
> No deadlocks, skbs are ordered and still under the same opportunity
> to send something. Does it sound acceptable?
> It's off the top of my head, so probably this idea has some problems.
>

I am not a net expert, I am not sure if this solution really addresses
David's concern, but it makes sense for me.

>
>> Fix the locking in the drivers or layers that cause the issue instead
>> of breaking netconsole.
>
> Someday, somewhere, I know because I did this before, someone will
> use a debugging printk() and will see the entire box hanging with
> absolutely no message in any console because of this problem.
> I'm not saying that fixing driver isn't the right way to go but
> it seems not enough to me.

Well, I think netconsole is not alone, other console drivers could
have the same problem, printk() is not always available in some
situation like this.

Thanks.