2009-04-01 15:40:30

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH 1/1] Tell linkwatch about new interfaces

From: Andrew Lutomirski <[email protected]>

When a network driver registers a new interface, linkwatch will not notice,
and hence not set the rfc2863 operstate, until netif_carrier_on gets called.
If the new interface has no carrier when it is connected, then a status of
"unknown" is reported to userspace, which confuses various tools
(NetworkManager, for example).

This fires a linkwatch event for all new interfaces, so that operstate
gets set reasonably quickly.

Signed-off-by: Andrew Lutomirski <[email protected]>

---

This looks like the root cause of the bug I reported here:

http://lkml.org/lkml/2009/3/24/499

Without knowing all the locking and ordering constraints imposed on network
drivers, this seemed like the safest way to fix the bug. Alternative
approaches would be to call rfc2863_policy directly or to initialize the
relevent fields to sane values (for the carrier off state) in alloc_netdev.

This applies to 2.6.29. I can rediff it for any other tree, but I didn't see
any changes that would conflict.

diff --git a/net/core/dev.c b/net/core/dev.c
index e438f54..45911fd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4445,6 +4445,9 @@ int register_netdevice(struct net_device *dev)
dev->reg_state = NETREG_UNREGISTERED;
}

+ /* Update link state. */
+ linkwatch_fire_event(dev);
+
out:
return ret;


2009-04-05 00:06:05

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

From: Andrew Lutomirski <[email protected]>
Date: Wed, 1 Apr 2009 11:40:06 -0400

> When a network driver registers a new interface, linkwatch will not notice,
> and hence not set the rfc2863 operstate, until netif_carrier_on gets called.
> If the new interface has no carrier when it is connected, then a status of
> "unknown" is reported to userspace, which confuses various tools
> (NetworkManager, for example).
>
> This fires a linkwatch event for all new interfaces, so that operstate
> gets set reasonably quickly.
>
> Signed-off-by: Andrew Lutomirski <[email protected]>

The default assumed state for a freshly registered network
device is that the link is up.

If that disagrees from reality, the driver should make the
appropriate netif_carrier_off() call.

I'm sure you'll find that the e1000 driver is not doing this
and that is what causes the bug you are seeing.

2009-04-05 04:01:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

On Sat, Apr 4, 2009 at 8:05 PM, David Miller <[email protected]> wrote:
> From: Andrew Lutomirski <[email protected]>
> Date: Wed, 1 Apr 2009 11:40:06 -0400
>
>> When a network driver registers a new interface, linkwatch will not notice,
>> and hence not set the rfc2863 operstate, until netif_carrier_on gets called.
>> If the new interface has no carrier when it is connected, then a status of
>> "unknown" is reported to userspace, which confuses various tools
>> (NetworkManager, for example).
>>
>> This fires a linkwatch event for all new interfaces, so that operstate
>> gets set reasonably quickly.
>>
>> Signed-off-by: Andrew Lutomirski <[email protected]>
>
> The default assumed state for a freshly registered network
> device is that the link is up.
>
> If that disagrees from reality, the driver should make the
> appropriate netif_carrier_off() call.
>
> I'm sure you'll find that the e1000 driver is not doing this
> and that is what causes the bug you are seeing.
>

Nope. The end of e1000_probe (in e1000e) is:

/* tell the stack to leave us alone until e1000_open() is called */
netif_carrier_off(netdev);
netif_tx_stop_all_queues(netdev);

strcpy(netdev->name, "eth%d");
err = register_netdev(netdev);
if (err)
goto err_register;

e1000_print_device_info(adapter);

netif_carrier_off does:

void netif_carrier_off(struct net_device *dev)
{
if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
if (dev->reg_state == NETREG_UNINITIALIZED)
return;
linkwatch_fire_event(dev);
}
}

So, either it should be illegal to call netif_carrier_off on an
unregistered net_device (and there should be a WARN() in
netif_carrier_off), or it should work correctly, and
register_netdevice should do the right thing when there is no carrier
(i.e. something like my patch).

--Andy

2009-04-10 00:48:29

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

On Sat, 4 Apr 2009, Andrew Lutomirski wrote:
> On Sat, Apr 4, 2009 at 8:05 PM, David Miller <[email protected]> wrote:
> > From: Andrew Lutomirski <[email protected]>
> > Date: Wed, 1 Apr 2009 11:40:06 -0400
> >
> >> When a network driver registers a new interface, linkwatch will not notice,
> >> and hence not set the rfc2863 operstate, until netif_carrier_on gets called.
> >> If the new interface has no carrier when it is connected, then a status of
> >> "unknown" is reported to userspace, which confuses various tools
> >> (NetworkManager, for example).
> >>
> >> This fires a linkwatch event for all new interfaces, so that operstate
> >> gets set reasonably quickly.
> >>
> >> Signed-off-by: Andrew Lutomirski <[email protected]>
> >
> > The default assumed state for a freshly registered network
> > device is that the link is up.
> >
> > If that disagrees from reality, the driver should make the
> > appropriate netif_carrier_off() call.
> >
> > I'm sure you'll find that the e1000 driver is not doing this
> > and that is what causes the bug you are seeing.
> >

Dave, if we move the netif_carrier_off call to our dev->open like tg3 has,
do you think this is sufficient?

I note that we were calling netif_tx_stop_all_queues here, but I'm curious
if davem thinks we need to lock out our tx routine until dev->open
completes or whether the starting state of the netdev is sufficient.

> Nope. The end of e1000_probe (in e1000e) is:
>
> /* tell the stack to leave us alone until e1000_open() is called */
> netif_carrier_off(netdev);
> netif_tx_stop_all_queues(netdev);
>
> strcpy(netdev->name, "eth%d");
> err = register_netdev(netdev);
> if (err)
> goto err_register;
>
> e1000_print_device_info(adapter);
>
> netif_carrier_off does:
>
> void netif_carrier_off(struct net_device *dev)
> {
> if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
> if (dev->reg_state == NETREG_UNINITIALIZED)
> return;
> linkwatch_fire_event(dev);
> }
> }
>
> So, either it should be illegal to call netif_carrier_off on an
> unregistered net_device (and there should be a WARN() in
> netif_carrier_off), or it should work correctly, and
> register_netdevice should do the right thing when there is no carrier
> (i.e. something like my patch).

does this patch also fix the issue?

===== begin =====

e1000e: indicate link down at load

From: Jesse Brandeburg <[email protected]>

same kind of patch as e1000, let driver explicitly push link state
once link comes up.

Signed-off-by: Jesse Brandeburg <[email protected]>
---

drivers/net/e1000e/netdev.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index fb78278..6a0411e 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -3072,6 +3072,8 @@ static int e1000_open(struct net_device *netdev)
if (test_bit(__E1000_TESTING, &adapter->state))
return -EBUSY;

+ netif_carrier_off(netdev);
+
/* allocate transmit descriptors */
err = e1000e_setup_tx_resources(adapter);
if (err)
@@ -5006,10 +5008,6 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
if (!(adapter->flags & FLAG_HAS_AMT))
e1000_get_hw_control(adapter);

- /* tell the stack to leave us alone until e1000_open() is called */
- netif_carrier_off(netdev);
- netif_tx_stop_all_queues(netdev);
-
strcpy(netdev->name, "eth%d");
err = register_netdev(netdev);
if (err)

2009-04-11 15:46:22

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

On Thu, Apr 9, 2009 at 8:48 PM, Brandeburg, Jesse
<[email protected]> wrote:
>
> does this patch also fix the issue?

Yes.

2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN qlen 1000

Tested-by: Andy Lutomirski <[email protected]>

The link operational states are still funny, though:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
3: wmaster0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
pfifo_fast state UNKNOWN qlen 1000

It looks like all interfaces that don't call netif_carrier_down after
registration end up in UNKNOWN until something happens. The case of
carrier_on but UNKNOWN doesn't seem to confuse my userspace, but it's
still odd.

--Andy

>
> ===== begin =====
>
> e1000e: indicate link down at load
>
> From: Jesse Brandeburg <[email protected]>
>
> same kind of patch as e1000, let driver explicitly push link state
> once link comes up.
>
> Signed-off-by: Jesse Brandeburg <[email protected]>
> ---
>
> ?drivers/net/e1000e/netdev.c | ? ?6 ++----
> ?1 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index fb78278..6a0411e 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -3072,6 +3072,8 @@ static int e1000_open(struct net_device *netdev)
> ? ? ? ?if (test_bit(__E1000_TESTING, &adapter->state))
> ? ? ? ? ? ? ? ?return -EBUSY;
>
> + ? ? ? netif_carrier_off(netdev);
> +
> ? ? ? ?/* allocate transmit descriptors */
> ? ? ? ?err = e1000e_setup_tx_resources(adapter);
> ? ? ? ?if (err)
> @@ -5006,10 +5008,6 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
> ? ? ? ?if (!(adapter->flags & FLAG_HAS_AMT))
> ? ? ? ? ? ? ? ?e1000_get_hw_control(adapter);
>
> - ? ? ? /* tell the stack to leave us alone until e1000_open() is called */
> - ? ? ? netif_carrier_off(netdev);
> - ? ? ? netif_tx_stop_all_queues(netdev);
> -
> ? ? ? ?strcpy(netdev->name, "eth%d");
> ? ? ? ?err = register_netdev(netdev);
> ? ? ? ?if (err)
>

2009-04-13 23:23:17

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

From: "Brandeburg, Jesse" <[email protected]>
Date: Thu, 9 Apr 2009 17:48:04 -0700 (Pacific Daylight Time)

> Dave, if we move the netif_carrier_off call to our dev->open like tg3 has,
> do you think this is sufficient?

Yes.

> e1000e: indicate link down at load
>
> From: Jesse Brandeburg <[email protected]>
>
> same kind of patch as e1000, let driver explicitly push link state
> once link comes up.
>
> Signed-off-by: Jesse Brandeburg <[email protected]>

I'll wait to get this via Jeff Kirsher.

Thanks.

2009-04-14 07:43:34

by Jeff Kirsher

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

On Mon, Apr 13, 2009 at 4:22 PM, David Miller <[email protected]> wrote:
> From: "Brandeburg, Jesse" <[email protected]>
> Date: Thu, 9 Apr 2009 17:48:04 -0700 (Pacific Daylight Time)
>
>> Dave, if we move the netif_carrier_off call to our dev->open like tg3 has,
>> do you think this is sufficient?
>
> Yes.
>
>> e1000e: indicate link down at load
>>
>> From: Jesse Brandeburg <[email protected]>
>>
>> same kind of patch as e1000, let driver explicitly push link state
>> once link comes up.
>>
>> Signed-off-by: Jesse Brandeburg <[email protected]>
>
> I'll wait to get this via Jeff Kirsher.
>
> Thanks.
> --

I have added this to my queue and will be sent out with the other
e1000 patch I have,in the next day or two.

--
Cheers,
Jeff

2009-07-14 17:17:25

by Sergio Luis

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

Hello Dave,

On Sat, Apr 4, 2009 at 9:05 PM, David Miller<[email protected]> wrote:
> From: Andrew Lutomirski <[email protected]>
> Date: Wed, 1 Apr 2009 11:40:06 -0400
>
>> When a network driver registers a new interface, linkwatch will not notice,
>> and hence not set the rfc2863 operstate, until netif_carrier_on gets called.
>> If the new interface has no carrier when it is connected, then a status of
>> "unknown" is reported to userspace, which confuses various tools
>> (NetworkManager, for example).
>>
>> This fires a linkwatch event for all new interfaces, so that operstate
>> gets set reasonably quickly.
>>
>> Signed-off-by: Andrew Lutomirski <[email protected]>
>
> The default assumed state for a freshly registered network
> device is that the link is up.
>
> If that disagrees from reality, the driver should make the
> appropriate netif_carrier_off() call.
>
> I'm sure you'll find that the e1000 driver is not doing this
> and that is what causes the bug you are seeing.
> --

is this patch incorrect, though? with the linkwatch_fire_event() call,
the rfc2863 operstate will be set for everyone at device register
time.
in here I am having the interface operstate as 'unknown', but I do
ifconfig down and up or unplug/plug the cable again it will finally
set the correct rfc2863 operstate.

or should this be fixed on a per-driver basis, like it apparently was
in this case, for his e1000? (drivers/net/skge.c in here).

thanks,
sergio

2009-07-14 18:33:46

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

From: Sergio Luis <[email protected]>
Date: Tue, 14 Jul 2009 14:17:21 -0300

> is this patch incorrect, though? with the linkwatch_fire_event() call,
> the rfc2863 operstate will be set for everyone at device register
> time.

The issue is dumb drivers that do not manage their link state
at all. We want them to always have their links up, from the
moment they are registered.

This is especially important for virtual devices.

2009-07-14 18:37:24

by Sergio Luis

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

On Tue, Jul 14, 2009 at 3:33 PM, David Miller<[email protected]> wrote:
> From: Sergio Luis <[email protected]>
> Date: Tue, 14 Jul 2009 14:17:21 -0300
>
>> is this patch incorrect, though? with the linkwatch_fire_event() call,
>> the rfc2863 operstate will be set for everyone at device register
>> time.
>
> The issue is dumb drivers that do not manage their link state
> at all. ?We want them to always have their links up, from the
> moment they are registered.
>
> This is especially important for virtual devices.
>

I see. I will try to take a look at the driver in question, then. Thanks.

2009-07-14 18:58:11

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 1/1] Tell linkwatch about new interfaces

On Tue, Jul 14, 2009 at 2:33 PM, David Miller<[email protected]> wrote:
> From: Sergio Luis <[email protected]>
> Date: Tue, 14 Jul 2009 14:17:21 -0300
>
>> is this patch incorrect, though? with the linkwatch_fire_event() call,
>> the rfc2863 operstate will be set for everyone at device register
>> time.
>
> The issue is dumb drivers that do not manage their link state
> at all. ?We want them to always have their links up, from the
> moment they are registered.

Such dumb drivers still end up with bogus operstate.

>
> This is especially important for virtual devices.

$ ip link show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

I've never noticed this causing a problem, but it seems a little
silly. Presumably lo should be "UP."

--Andy

>