2009-10-16 19:43:40

by Nix

Subject: Keeping network device renaming working in the presence of netconsole?

So I'm testing suspend/resume and getting a lot of wildly variable
panics on resume. I'd like to report them, so I need to capture them
somehow.

Netconsole looks like just the thing: it even says it's nonintrusive.

Unfortunately it intrudes in one very unfortunate way: if netconsole is
jabbering out of some network interface, that interface is up, so you
can't rename it: and because (to catch early panics) netconsole has to
start before userspace kicks up, this means that there is *no*
opportunity to rename network interfaces that netconsole is operating
over.

This breaks userspace more than slightly if you rely on udev's
persistent net generator rules to keep network interface names constant,
or if you rename the lot to something more memorable than ethN. Any
userspace setup of that interface, assignment of additional addresses,
routing, MTU setting et al is all toast: and you can't stop using
interface renaming unless you like your interfaces to change identities
intermittently (but we've had that flamewar).

Is there a way to rename interfaces netconsole is working over? Why
can't you rename interfaces that are in use anyway? Yes, there's the
name_hlist, but that's protected by the dev_base_lock anyway.


2009-10-16 19:58:05

by Matt Mackall

Subject: Re: Keeping network device renaming working in the presence of netconsole?

On Fri, 2009-10-16 at 20:43 +0100, Nix wrote:
> So I'm testing suspend/resume and getting a lot of wildly variable
> panics on resume. I'd like to report them, so I need to capture them
> somehow.
>
> Netconsole looks like just the thing: it even says it's nonintrusive.
>
> Unfortunately it intrudes in one very unfortunate way: if netconsole is
> jabbering out of some network interface, that interface is up, so you
> can't rename it: and because (to catch early panics) netconsole has to
> start before userspace kicks up, this means that there is *no*
> opportunity to rename network interfaces that netconsole is operating
> over.
>
> This breaks userspace more than slightly if you rely on udev's
> persistent net generator rules to keep network interface names constant,
> or if you rename the lot to something more memorable than ethN. Any
> userspace setup of that interface, assignment of additional addresses,
> routing, MTU setting et al is all toast: and you can't stop using
> interface renaming unless you like your interfaces to change identities
> intermittently (but we've had that flamewar).

A device definitely does have to be up for netconsole to work.

But as far as I know, there's no good reason you can't rename an
interface that's up.

But back to your original problem: netconsole is actually probably a
poor match for debugging suspend/resume as getting from an off state to
a working state in the network driver takes a non-trivial amount of
code.

A useful technique here is capturing kernel message buffers in RAM
across resets, something that can be done on most systems (provided you
can disable memory test). Alternately, you might look at firewire
techniques.

--
http://selenic.com : development and support for Mercurial and Linux

2009-10-17 11:08:15

by Nix

Subject: Re: Keeping network device renaming working in the presence of netconsole?

On 16 Oct 2009, Matt Mackall said:

> On Fri, 2009-10-16 at 20:43 +0100, Nix wrote:
>> This breaks userspace more than slightly if you rely on udev's
>> persistent net generator rules to keep network interface names constant,
>> or if you rename the lot to something more memorable than ethN. Any
>> userspace setup of that interface, assignment of additional addresses,
>> routing, MTU setting et al is all toast: and you can't stop using
>> interface renaming unless you like your interfaces to change identities
>> intermittently (but we've had that flamewar).
>
> A device definitely does have to be up for netconsole to work.

Thought so (hell, it brings it up itself). (I wish I knew how BMCs did
it, shadowing a real network device with a virtual one which can be
independently up and has a different MAC and everything. Probably it
takes hardware hacks.)

> But as far as I know, there's no good reason you can't rename an
> interface that's up.

I'll hack out the test and see what happens :)

> But back to your original problem: netconsole is actually probably a
> poor match for debugging suspend/resume as getting from an off state to
> a working state in the network driver takes a non-trivial amount of
> code.

I'm resuming from hibernation using TuxOnIce, so the network device has
initialized conventionally before the process state starts to
load, though it isn't *up* yet. (IIRC, anyway.)

(I probably used the wrong term: I always get suspension and hibernation
mixed up. This is a desktop, so the likelihood of suspend-to-RAM working
seemed remote, and also seemed unlikely to achieve the sorts of power
savings I was looking for.)

(Why TuxOnIce rather than swsusp? 'cos TuxOnIce was the only thing I
could ever get to work, is why, and 'cos it Just Worked until 2.6.31
when it suddenly started oopsing and panicking all over the place
in numerous exciting ways.)

> A useful technique here is capturing kernel message buffers in RAM
> across resets, something that can be done on most systems (provided you
> can disable memory test). Alternately, you might look at firewire
> techniques.

Neither of those work if the machine's been powered off for nine hours :)

2009-10-20 18:54:11

by Nix

Subject: [PATCH] Allow renaming of network interfaces that are up.

The ancient restriction banning renaming of busy network interfaces appears
to be purposeless. Everything that depends on a network interface's name is
done under the dev_base_lock in any case.

This makes it much easier to use things like netconsole which bring up a
network interface before userspace has started: presently these will cause
interface renamings to fail, breaking any userspace that relies on renaming
devices to avoid reliance on the potentially-unstable kernel-assigned name.

Signed-off-by: Nick Alcock <[email protected]>
---
net/core/dev.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

Note: this may very well be wrong, as I know essentially nothing about this
part of the kernel. All I know is that without a patch something like this,
netconsole is nearly useless to me, so many panics are uncapturable. Maybe
there is some constraint preventing the kernel from reliably renaming
up interfaces. In that case, netconsole, DHCP netboot discovery and so on
probably need to grow the ability to rename them themselves.

diff --git a/net/core/dev.c b/net/core/dev.c
index b8f74cf..87e9f88 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -901,8 +901,6 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	BUG_ON(!dev_net(dev));

 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
-		return -EBUSY;

 	if (!dev_valid_name(newname))
 		return -EINVAL;
--
1.6.5.1
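
For readers who want to see what the two removed lines change, here is a minimal userspace sketch of the check order in the hunk above. It is not kernel code: `struct model_dev`, `model_valid_name()`, `model_change_name()` and the `allow_up` switch are hypothetical names invented for illustration, and the name check is only a rough approximation of `dev_valid_name()`. The real function also handles things this model leaves out, such as name collisions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16	/* same size as the kernel's IFNAMSIZ */
#define MODEL_IFF_UP   0x1	/* same bit value as IFF_UP */

/* Hypothetical stand-in for struct net_device: just a name and flags. */
struct model_dev {
	char name[MODEL_IFNAMSIZ];
	unsigned int flags;
};

/* Simplified name check; an approximation of dev_valid_name(). */
static int model_valid_name(const char *name)
{
	size_t len = strlen(name);

	if (len == 0 || len >= MODEL_IFNAMSIZ)
		return 0;
	if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
		return 0;
	return strpbrk(name, "/ \t\n") == NULL;
}

/*
 * Model of the rename path. With allow_up == 0 the busy test runs
 * first, as in the unpatched hunk; with allow_up != 0 it is skipped,
 * as in the patched one.
 */
static int model_change_name(struct model_dev *dev, const char *newname,
			     int allow_up)
{
	assert(dev != NULL && newname != NULL);

	if (!allow_up && (dev->flags & MODEL_IFF_UP))
		return -EBUSY;
	if (!model_valid_name(newname))
		return -EINVAL;

	strcpy(dev->name, newname);	/* safe: length checked above */
	return 0;
}
```

In this model, an interface with `MODEL_IFF_UP` set refuses the rename with `-EBUSY` when `allow_up` is 0 and accepts it when `allow_up` is 1, which is the whole behavioural difference the patch proposes.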

2009-10-21 00:39:25

by David Miller

Subject: Re: [PATCH] Allow renaming of network interfaces that are up.


You'll reach networking developers at [email protected]
for patch postings, linux-net is for user questions.

2009-10-21 01:38:53

by Stephen Hemminger

Subject: Re: [PATCH] Allow renaming of network interfaces that are up.

On Tue, 20 Oct 2009 19:54:02 +0100
Nix <[email protected]> wrote:

> The ancient restriction banning renaming of busy network interfaces appears
> to be purposeless. Everything that depends on a network interface's name is
> done under the dev_base_lock in any case.
>
> This makes it much easier to use things like netconsole which bring up a
> network interface before userspace has started: presently these will cause
> interface renamings to fail, breaking any userspace that relies on renaming
> devices to avoid reliance on the potentially-unstable kernel-assigned name.
>
> Signed-off-by: Nick Alcock <[email protected]>
> ---
> net/core/dev.c | 2 --
> 1 files changed, 0 insertions(+), 2 deletions(-)

This breaks quagga and other applications that track renames.

2009-10-21 06:50:39

by Nix

Subject: Re: [PATCH] Allow renaming of network interfaces that are up.

[Cc:s adjusted, thanks davem]

On 21 Oct 2009, Stephen Hemminger said:

> On Tue, 20 Oct 2009 19:54:02 +0100
> Nix <[email protected]> wrote:
[...]
>> This makes it much easier to use things like netconsole which bring up a
>> network interface before userspace has started: presently these will cause
>> interface renamings to fail, breaking any userspace that relies on renaming
>> devices to avoid reliance on the potentially-unstable kernel-assigned name.
[...]
> This breaks quagga and other applications that track renames.

So it's only userspace that's the problem? We have a choice of breaking
apps that assume that only downed interfaces can be renamed, and thus
breaking routing while the system is running, or breaking userspaces
that assume that they can rename interfaces, and thus breaking routing
at bootup when netconsole is on? Great :/

(How many systems run things that track renames? Is this, ew, a reason
to make this constraint configurable, maybe even at runtime, so you
could start with interfaces renameable and then lock them down once
static route assignment is up, just before you fire up quagga?)

2009-10-23 19:50:30

by Nix

Subject: [PATCH] Make it clear how to rename netconsole-used network interfaces.

On 21 Oct 2009, Stephen Hemminger stated:

> On Tue, 20 Oct 2009 19:54:02 +0100
> Nix <[email protected]> wrote:
>> This makes it much easier to use things like netconsole which bring up a
>> network interface before userspace has started: presently these will cause
>> interface renamings to fail, breaking any userspace that relies on renaming
>> devices to avoid reliance on the potentially-unstable kernel-assigned name.
[...]
> This breaks quagga and other applications that track renames.

I've figured out how to do it without patches now. The following doc patch
may help other puzzled users.

---
Documentation/networking/netconsole.txt | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt
index 8d02207..11c7b90 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -132,6 +132,13 @@ the sender, it is suggested to try specifying the MAC address of the
 default gateway (you may use /sbin/route -n to find it out) as the
 remote MAC address instead.

+TIP: if you need to rename the network interface (as is done by many
+distributions in their startup scripts), you may find that it fails
+for interfaces managed by netconsole, because you cannot rename
+interfaces that are up. The solution here is to take the interface
+down around the renaming, then bring it up again or let the normal
+boot process do so.
+
 NOTE: the network device (eth1 in the above case) can run any kind
 of other network traffic, netconsole is not intrusive. Netconsole
 might cause slight delays in other traffic if the volume of kernel
--
1.6.5.1
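
The down/rename/up sequence described in the TIP can be sketched in C using the standard Linux interface ioctls (`SIOCGIFFLAGS`, `SIOCSIFFLAGS`, `SIOCSIFNAME`). This is only an illustrative sketch, not what any distribution's startup scripts actually run: the helper names `copy_ifname()`, `set_up()` and `rename_while_down()` are made up for this example, the caller is assumed to have CAP_NET_ADMIN, and since netconsole needs the interface to be up, nothing will be logged over it during the short window in which it is down.

```c
#define _DEFAULT_SOURCE		/* for struct ifreq and IFF_UP under strict -std */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

/* Copy an interface name into a fixed IFNAMSIZ buffer; 0 on success. */
static int copy_ifname(char *dst, const char *src)
{
	size_t len = strlen(src);

	if (len == 0 || len >= IFNAMSIZ)
		return -EINVAL;
	memcpy(dst, src, len + 1);
	return 0;
}

/* Set or clear IFF_UP on the named interface. */
static int set_up(int sock, const char *name, int up)
{
	struct ifreq ifr;

	memset(&ifr, 0, sizeof(ifr));
	if (copy_ifname(ifr.ifr_name, name) < 0)
		return -EINVAL;
	if (ioctl(sock, SIOCGIFFLAGS, &ifr) < 0)
		return -errno;
	if (up)
		ifr.ifr_flags |= IFF_UP;
	else
		ifr.ifr_flags &= ~IFF_UP;
	if (ioctl(sock, SIOCSIFFLAGS, &ifr) < 0)
		return -errno;
	return 0;
}

/* Take oldname down, rename it to newname, and bring it back up. */
static int rename_while_down(int sock, const char *oldname,
			     const char *newname)
{
	struct ifreq ifr;
	int err;

	assert(oldname != NULL && newname != NULL);
	memset(&ifr, 0, sizeof(ifr));
	if (copy_ifname(ifr.ifr_name, oldname) < 0 ||
	    copy_ifname(ifr.ifr_newname, newname) < 0)
		return -EINVAL;

	err = set_up(sock, oldname, 0);
	if (err < 0)
		return err;
	if (ioctl(sock, SIOCSIFNAME, &ifr) < 0) {
		err = -errno;
		set_up(sock, oldname, 1);	/* best effort: restore the link */
		return err;
	}
	return set_up(sock, newname, 1);
}
```

A caller would typically open a datagram socket with `socket(AF_INET, SOCK_DGRAM, 0)` and pass it as `sock`, for example `rename_while_down(sock, "eth1", "lan0")`, where both interface names are placeholders. The function returns 0 on success or a negative errno value on failure.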