2008-01-02 21:05:12

by Richard Jonsson

[permalink] [raw]
Subject: forcedeth: MAC-address reversed on resume from suspend

Bugreport regarding forcedeth driver.

When returning from suspend-to-RAM the MAC-address byteorder is
reversed. After another suspend-resume cycle the MAC-address is again
correct. This brings a great deal of pain since the NIC is assigned a
random MAC-address when it is detected as invalid.

I cannot say if this issue appeared recently or if it's been in the
kernel for a while, as I've been using wireless until recently.

I read somewhere on lkml that the mac is read from the device, then
reversed and finally written back to the device. Can it be that it is
read again on resume and reversed, and then again written to the device?
This would explain why it's ok every other resume cycle.

MAC is always correct when cold booting.

Random computer info:
HP dv2140eu, amd64 smp, nvidia mcp51, kernel 2.6.24-rc3, ubuntu feisty

dmesg when fail:
forcedeth 0000:00:14.0: Invalid Mac address detected: e1:97:11:d3:16:00
forcedeth 0000:00:14.0: ifname eth0, PHY OUI 0x5043 @ 1, addr
00:00:6c:1e:34:69

dmesg when good:
forcedeth 0000:00:14.0: ifname eth0, PHY OUI 0x5043 @ 1, addr
00:16:d3:11:97:e1


2008-01-02 21:48:57

by Andreas Mohr

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

Hi,

On Wed, Jan 02, 2008 at 10:04:49PM +0100, Richard Jonsson wrote:
> Bugreport regarding forcedeth driver.
>
> When returning from suspend-to-RAM the MAC-address byteorder is reversed.
> After another suspend-resume cycle the MAC-address is again correct. This
> brings a great deal of pain since the NIC is assigned a random MAC-address
> when it is detected as invalid.
>
> I cannot say if this issue appeared recently or if it's been in the kernel
> for a while, as I've been using wireless until recently.
>
> I read somewhere on lkml that the mac is read from the device, then
> reversed and finally written back to the device. Can it be that it is read
> again on resume and reversed, and then again written to the device? This
> would explain why it's ok every other resume cycle.

Uh, Use The Source, Luke?

But no, I think it's simply driver dainbreadness, not a matter of
complicated writing and reading back in reversed order.

drivers/net/forcedeth.c has a nice DEV_HAS_CORRECT_MACADDR flag which is
being enabled for certain cards (those which need this) and disabled for others.

The nv_probe() function then nicely obeys this flag by reversing the
address if needed, _however_ there's another function, nv_copy_mac_to_hw(),
which does _NOT_ obey it and simply copies the _raw_ netdev dev_addr
to the card register (NvRegMacAddrA etc.).

I don't know, this all looks a bit dirty to me, MAC reading/writing
should have been implemented in a more central way, then those people
wouldn't have confused heaven and hell with MAC address fiddling.

And yeah, this certainly looks like a bug that should be fixed ASAP,
unless my short analysis somehow happened to be wrong.
(I could probably fix it, but then the forcedeth people
most likely know better how they would like to see it implemented)

Thank you for your report!

Now the only thing remaining would be: is there a specific way to
contact forcedeth-related people? I didn't find anything within a short
search.

Andreas Mohr

2008-01-02 23:42:45

by Adrian Bunk

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

[ original bug report: http://lkml.org/lkml/2008/1/2/253 ]

On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
> Hi,
>
> On Wed, Jan 02, 2008 at 10:04:49PM +0100, Richard Jonsson wrote:
> > Bugreport regarding forcedeth driver.
> >
> > When returning from suspend-to-RAM the MAC-address byteorder is reversed.
> > After another suspend-resume cycle the MAC-address is again correct. This
> > brings a great deal of pain since the NIC is assigned a random MAC-address
> > when it is detected as invalid.
> >
> > I cannot say if this issue appeared recently or if it's been in the kernel
> > for a while, as I've been using wireless until recently.
> >
> > I read somewhere on lkml that the mac is read from the device, then
> > reversed and finally written back to the device. Can it be that it is read
> > again on resume and reversed, and then again written to the device? This
> > would explain why it's ok every other resume cycle.
>
> Uh, Use The Source, Luke?
>
> But no, I think it's simply driver dainbreadness, not a matter of
> complicated writing and reading back in reversed order.
>
> drivers/net/forcedeth.c has a nice DEV_HAS_CORRECT_MACADDR flag which is
> being enabled for certain cards (those which need this) and disabled for others.
>
> The nv_probe() function then nicely obeys this flag by reversing the
> address if needed, _however_ there's another function, nv_copy_mac_to_hw(),
> which does _NOT_ obey it and simply copies the _raw_ netdev dev_addr
> to the card register (NvRegMacAddrA etc.).
>
> I don't know, this all looks a bit dirty to me, MAC reading/writing
> should have been implemented in a more central way, then those people
> wouldn't have confused heaven and hell with MAC address fiddling.
>
> And yeah, this certainly looks like a bug that should be fixed ASAP,
> unless my short analysis somehow happened to be wrong.
> (I could probably fix it, but then the forcedeth people
> most likely know better how they would like to see it implemented)
>
> Thank you for your report!
>
> Now the only thing remaining would be: is there a specific way to
> contact forcedeth-related people? I didn't find anything within a short
> search.

Cc's added.

> Andreas Mohr

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-01-04 01:36:03

by Richard Jonsson

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

> I don't know, this all looks a bit dirty to me, MAC reading/writing
> should have been implemented in a more central way, then those people
> wouldn't have confused heaven and hell with MAC address fiddling.
>
> And yeah, this certainly looks like a bug that should be fixed ASAP,
> unless my short analysis somehow happened to be wrong.
> (I could probably fix it, but then the forcedeth people
> most likely know better how they would like to see it implemented)
>
> Thank you for your report!
>
Thank you for looking into this, it's a major pain for me. If I knew how
to, I would have submitted a patch along with the report, if I had the
know-how. I'd be happy to help in any way I can.

> Now the only thing remaining would be: is there a specific way to
> contact forcedeth-related people? I didn't find anything within a short
> search.
>
> Andreas Mohr

Well, I did search the maintainers file, and did some googling, but
didn't find any contact info.

/ Richard

2008-01-04 03:44:19

by Björn Steinbrink

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote:
> [ original bug report: http://lkml.org/lkml/2008/1/2/253 ]
>
> On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
> > Hi,
> >
> > On Wed, Jan 02, 2008 at 10:04:49PM +0100, Richard Jonsson wrote:
> > > Bugreport regarding forcedeth driver.
> > >
> > > When returning from suspend-to-RAM the MAC-address byteorder is
> > > reversed. After another suspend-resume cycle the MAC-address is
> > > again correct. This brings a great deal of pain since the NIC is
> > > assigned a random MAC-address when it is detected as invalid.
> > >
> > > I cannot say if this issue appeared recently or if it's been in
> > > the kernel for a while, as I've been using wireless until
> > > recently.
> > >
> > > I read somewhere on lkml that the mac is read from the device,
> > > then reversed and finally written back to the device. Can it be
> > > that it is read again on resume and reversed, and then again
> > > written to the device? This would explain why it's ok every other
> > > resume cycle.
> >
> > Uh, Use The Source, Luke?
> >
> > But no, I think it's simply driver dainbreadness, not a matter of
> > complicated writing and reading back in reversed order.
> >
> > drivers/net/forcedeth.c has a nice DEV_HAS_CORRECT_MACADDR flag
> > which is being enabled for certain cards (those which need this) and
> > disabled for others.
> >
> > The nv_probe() function then nicely obeys this flag by reversing the
> > address if needed, _however_ there's another function,
> > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the
> > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.).

nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address
has been fixed, and from nv_set_mac_address() which is supposed to get a
correct MAC address anyway. So I don't see any problem there.


After some digging, I guess it's related to
5070d3408405ae1941f259acac7a9882045c3be4

That introduced a flag in a register to signal if the MAC address has
been corrected, so that for example kexec won't reverse the adddress
again, when calling nv_probe() again.

As I know neither the suspend nor the kexec code well enough, I'll have
to make a few smart guesses (let's hope that that works out *g*):

a) suspend manages to reverse the MAC address again, so it must call
nv_probe. And we have lost the flag that stops us from reversing the
address. We cannot have lost the fixed MAC address, because then we'd
always get the reversed address, and not go back and forth. I'll assume
that it will also call nv_remove during suspend, because it's nicely
symmetrical then :-)

b) kexec does not call nv_remove, because otherwise, it would see a
reversed address on its nv_probe call anyway, and the flag wouldn't be
necessary.

Now the commit that introduced the flag did also introduce an indirect
change to nv_remove. Although it still says that it writes back the
broken MAC address, that's no longer true, because orig_mac now also
gets the correct address in nv_probe.

Smart guessing time again: That was required because otherwise a "rmood
forcedeth && modprobe forcedeth" would have produced a reversed MAC
address.

But that also causes the problem that we get when we loose either the
flag, we start reversing the fixed address. If we instead return the
card to its initial state in nv_remove, ie. unset the flag and write
back the reversed address, then everyone going through nv_remove _and_
nv_probe should be happy. And kexec, which only goes through nv_probe
again and doesn't loose the flag should be happy, too.

Richard, could you give this a spin? And then we'd likely need someone
to test that with kexec...

thanks,
Bj?rn
---
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a96583c..f84c752 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -5199,10 +5199,6 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
dev->dev_addr[3] = (np->orig_mac[0] >> 16) & 0xff;
dev->dev_addr[4] = (np->orig_mac[0] >> 8) & 0xff;
dev->dev_addr[5] = (np->orig_mac[0] >> 0) & 0xff;
- /* set permanent address to be correct aswell */
- np->orig_mac[0] = (dev->dev_addr[0] << 0) + (dev->dev_addr[1] << 8) +
- (dev->dev_addr[2] << 16) + (dev->dev_addr[3] << 24);
- np->orig_mac[1] = (dev->dev_addr[4] << 0) + (dev->dev_addr[5] << 8);
writel(txreg|NVREG_TRANSMITPOLL_MAC_ADDR_REV, base + NvRegTransmitPoll);
}
memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
@@ -5414,6 +5410,8 @@ static void __devexit nv_remove(struct pci_dev *pci_dev)
*/
writel(np->orig_mac[0], base + NvRegMacAddrA);
writel(np->orig_mac[1], base + NvRegMacAddrB);
+ writel(readl(base + NvRegTransmitPoll) & ~NVREG_TRANSMITPOLL_MAC_ADDR_REV,
+ base + NvRegTransmitPoll);

/* free all structures */
free_rings(dev);

2008-01-04 08:45:31

by Andreas Mohr

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

Hi,

On Fri, Jan 04, 2008 at 04:43:57AM +0100, Bj?rn Steinbrink wrote:
> On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote:
> > [ original bug report: http://lkml.org/lkml/2008/1/2/253 ]
> >
> > On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
> > > The nv_probe() function then nicely obeys this flag by reversing the
> > > address if needed, _however_ there's another function,
> > > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the
> > > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.).
>
> nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address
> has been fixed, and from nv_set_mac_address() which is supposed to get a
> correct MAC address anyway. So I don't see any problem there.

/* set permanent address to be correct aswell */
... orig_mac fumbling ...

Why, then, does this driver do such a nice hack as:

/* special op: write back the misordered MAC address - otherwise
* the next nv_probe would see a wrong address.
*/
writel(np->orig_mac[0], base + NvRegMacAddrA);
writel(np->orig_mac[1], base + NvRegMacAddrB);

I really don't see why this driver needs to do these things in such a
messy way.

One thing is for certain: netdev->dev_addr is always in operating system
order, and order should always be handled properly when copying to/from
hardware.

Such a driver needs exactly *one* central helper method
to reverse an input MAC address in 6 bytes u8 format
(which could arguably be added to include/linux/etherdevice.h even,
since a wee bit too many devices seem to be having trouble
with wrongly ordered MAC addresses).
Then it needs exactly *one* function for I/O register access
to read a MAC address from a device and exactly *one* function
to write it back (both doing raw, unsorted format transfers only,
and possibly as inline function).

And then it needs these card I/O functions wrapped into two functions which
interface with driver- and OS-related MAC variables
(struct variables ALWAYS stored in usual system order, NOT H/W order!!!!!!)
which optionally reverse the address (if needed for a particular card).

And then there will never be any confusion about any MAC address format
anywhere any more, right?

I really don't mean to sound cranky, but this driver did ask for trouble,
and lots of it ;)

Thank you for your reply, and especially thank you for this very fast
patch!

I might even have enough time this weekend to work on this...

Andreas Mohr

2008-01-04 10:17:55

by Björn Steinbrink

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote:
> Hi,
>
> On Fri, Jan 04, 2008 at 04:43:57AM +0100, Bj?rn Steinbrink wrote:
> > On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote:
> > > [ original bug report: http://lkml.org/lkml/2008/1/2/253 ]
> > >
> > > On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
> > > > The nv_probe() function then nicely obeys this flag by reversing the
> > > > address if needed, _however_ there's another function,
> > > > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the
> > > > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.).
> >
> > nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address
> > has been fixed, and from nv_set_mac_address() which is supposed to get a
> > correct MAC address anyway. So I don't see any problem there.
>
> /* set permanent address to be correct aswell */
> ... orig_mac fumbling ...
>
> Why, then, does this driver do such a nice hack as:
>
> /* special op: write back the misordered MAC address - otherwise
> * the next nv_probe would see a wrong address.
> */
> writel(np->orig_mac[0], base + NvRegMacAddrA);
> writel(np->orig_mac[1], base + NvRegMacAddrB);
>
> I really don't see why this driver needs to do these things in such a
> messy way.

Sure you can add some layers of wrappers and then have
"nv_store_mac(np->orig_mac)" instead. Congratulations, you saved 2 lines
and added 6 new ones (assuming that you don't go universally u8, that
adds even more). This is the only place that writes the MAC address
besides the existing nv_copy_mac_to_hw wrapper which deals with
net_devices and thus is perfectly usable, internally and externally.

> One thing is for certain: netdev->dev_addr is always in operating system
> order, and order should always be handled properly when copying to/from
> hardware.
>
> Such a driver needs exactly *one* central helper method
> to reverse an input MAC address in 6 bytes u8 format
> (which could arguably be added to include/linux/etherdevice.h even,
> since a wee bit too many devices seem to be having trouble
> with wrongly ordered MAC addresses).
> Then it needs exactly *one* function for I/O register access
> to read a MAC address from a device and exactly *one* function
> to write it back (both doing raw, unsorted format transfers only,
> and possibly as inline function).

A function to read the MAC address from the hardware? There's only a
single place (nv_probe()) that ever reads it, why bother?

> And then it needs these card I/O functions wrapped into two functions which
> interface with driver- and OS-related MAC variables
> (struct variables ALWAYS stored in usual system order, NOT H/W order!!!!!!)
> which optionally reverse the address (if needed for a particular card).

Hu? The MAC address is only ever reversed when the card is not in use,
why the hell would you want to reverse anything when the rest of the OS
is involved? This whole reversing stuff is purely before and after the
card is actually used. It's not that you need to reverse the address to
communicate with the card, it's just initially wrong on the card.

> And then there will never be any confusion about any MAC address format
> anywhere any more, right?

At probing time, you'll have to know whether the address you read from
the hardware is reversed or not. Unless you write it back in reversed
order when you release the card, you need a flag that at least survives
until nv_probe is called the next time. kexec does not write it back, so
you do need such a flag. But the flag apparently doesn't survive a
suspend/resume cycle, so you need to write back the reversed address
then. But the flag will survive a rmmod + modprobe cycle, so you need to
reset it when writing back the reversed address.

> I really don't mean to sound cranky, but this driver did ask for trouble,
> and lots of it ;)
>
> Thank you for your reply, and especially thank you for this very fast
> patch!

Well, let's see if my analysis was correct and the patch works, first. I
saw the forcedeth code before, but kexec and suspend is totally new to
me and I just made some assumptions based on the reported behaviour and
the commit messages... ;-)

Bj?rn

2008-01-04 14:09:21

by Richard Jonsson

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

Bj?rn Steinbrink skrev:
> Richard, could you give this a spin? And then we'd likely need someone
> to test that with kexec...
>
> thanks,
> Bj?rn

The patch you sent does the trick, works fine now, thanks!
I cannot test this with kexec as I barely know what it is, I'll leave
that to someone else.

/ Richard

2008-01-04 22:26:49

by Björn Steinbrink

[permalink] [raw]
Subject: [PATCH] Fix forcedeth reversing the MAC address on suspend

For cards that initially have the MAC address stored in reverse order,
the forcedeth driver uses a flag to signal whether the address was
already corrected, so that it is not reversed again on a subsequent
probe.

Unfortunately this flag, which is stored in a register of the card,
seems to get lost during suspend, resulting in the MAC address being
reversed again. To fix that, the MAC address needs to be written back in
reversed order before we suspend and the flag needs to be reset.

The flag is still required because at least kexec will never write back
the reversed address and thus needs to know what state the card is in.

Signed-off-by: Bj?rn Steinbrink <[email protected]>
---
On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote:
> Bj?rn Steinbrink skrev:
>> Richard, could you give this a spin? And then we'd likely need someone
>> to test that with kexec...
>
> The patch you sent does the trick, works fine now, thanks!
> I cannot test this with kexec as I barely know what it is, I'll leave that
> to someone else.

Thanks.

Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the
problem correct? If so, I'm quite sure that the patch DTRT. Still it
should be tested for the rmmod+modprobe and the kexec case. I'll try to
get my box free for some testing, but that's unlikely in the next few
days. Plus, I've never used kexec myself either. So I'd be grateful if
someone else would step up.

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a96583c..f84c752 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -5199,10 +5199,6 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
dev->dev_addr[3] = (np->orig_mac[0] >> 16) & 0xff;
dev->dev_addr[4] = (np->orig_mac[0] >> 8) & 0xff;
dev->dev_addr[5] = (np->orig_mac[0] >> 0) & 0xff;
- /* set permanent address to be correct aswell */
- np->orig_mac[0] = (dev->dev_addr[0] << 0) + (dev->dev_addr[1] << 8) +
- (dev->dev_addr[2] << 16) + (dev->dev_addr[3] << 24);
- np->orig_mac[1] = (dev->dev_addr[4] << 0) + (dev->dev_addr[5] << 8);
writel(txreg|NVREG_TRANSMITPOLL_MAC_ADDR_REV, base + NvRegTransmitPoll);
}
memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
@@ -5414,6 +5410,8 @@ static void __devexit nv_remove(struct pci_dev *pci_dev)
*/
writel(np->orig_mac[0], base + NvRegMacAddrA);
writel(np->orig_mac[1], base + NvRegMacAddrB);
+ writel(readl(base + NvRegTransmitPoll) & ~NVREG_TRANSMITPOLL_MAC_ADDR_REV,
+ base + NvRegTransmitPoll);

/* free all structures */
free_rings(dev);

2008-01-04 22:44:09

by Andreas Mohr

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

On Fri, Jan 04, 2008 at 11:17:40AM +0100, Bj?rn Steinbrink wrote:
> On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote:
> > And then it needs these card I/O functions wrapped into two functions which
> > interface with driver- and OS-related MAC variables
> > (struct variables ALWAYS stored in usual system order, NOT H/W order!!!!!!)
> > which optionally reverse the address (if needed for a particular card).
>
> Hu? The MAC address is only ever reversed when the card is not in use,
> why the hell would you want to reverse anything when the rest of the OS
> is involved? This whole reversing stuff is purely before and after the
> card is actually used. It's not that you need to reverse the address to
> communicate with the card, it's just initially wrong on the card.

Huhrmm? OK, let me ask this then:
So what you're saying is that the address is only initially wrong
(e.g. due to card EEPROM content somehow initializing the registers
in wrong order on power-on), it's not _always_ supposed to be stored
in wrong order during operation.

IOW, the default card state after power-on is _unusable_ (due to
invalidly ordered MAC address) and has to be _corrected_ by the driver,
_initially_ only?

OTOH I know that at least acx100 has a _permanently_ wrong-ordered
MAC address setup, i.e. it's required to have it in "wrong" order
_during operation_. And I wouldn't be surprised to see several examples
of other hardware having a permanently wrongly-ordered address
requirement, given the amount of MAC reversal in Linux drivers.


Couldn't it be by chance that it's _believed_ to be reversed-on-power-on only,
whereas those cards should _actually_ have it reversed-during-lifetime
instead? Such a misunderstanding might explain a lot of trouble...

Obviously I was expecting the latter case, which my code layout proposal
was supposed to support in a clean way.

> > And then there will never be any confusion about any MAC address format
> > anywhere any more, right?
>
> At probing time, you'll have to know whether the address you read from
> the hardware is reversed or not. Unless you write it back in reversed
> order when you release the card, you need a flag that at least survives
> until nv_probe is called the next time. kexec does not write it back, so
> you do need such a flag. But the flag apparently doesn't survive a
> suspend/resume cycle, so you need to write back the reversed address
> then. But the flag will survive a rmmod + modprobe cycle, so you need to
> reset it when writing back the reversed address.

If it's indeed reversed-on-power-on only, then one probably cannot do
much about it, thus I'd give up and simply check the MAC address for
validity (should be very easy to check for a simple reversal by
checking for the manufacturer-specific fingerprint in reverse order).
Keeping _any_ sort of state to record the fact that it used to be reversed
or now is reversed is futile (or rather: catastrophic) given the complexity of
suspend / virtualisation or whichever other nice operations Linux may invent
in the future ;)

> Well, let's see if my analysis was correct and the patch works, first. I
> saw the forcedeth code before, but kexec and suspend is totally new to
> me and I just made some assumptions based on the reported behaviour and
> the commit messages... ;-)

You most likely did more analysis than me, I just said
"this very obviously must be the problem" and replied ;)

Andreas Mohr

2008-01-05 06:10:18

by Björn Steinbrink

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

On 2008.01.04 23:43:52 +0100, Andreas Mohr wrote:
> On Fri, Jan 04, 2008 at 11:17:40AM +0100, Bj?rn Steinbrink wrote:
> > On 2008.01.04 09:45:17 +0100, Andreas Mohr wrote:
> > > And then it needs these card I/O functions wrapped into two
> > > functions which interface with driver- and OS-related MAC
> > > variables (struct variables ALWAYS stored in usual system order,
> > > NOT H/W order!!!!!!) which optionally reverse the address (if
> > > needed for a particular card).
> >
> > Hu? The MAC address is only ever reversed when the card is not in
> > use, why the hell would you want to reverse anything when the rest
> > of the OS is involved? This whole reversing stuff is purely before
> > and after the card is actually used. It's not that you need to
> > reverse the address to communicate with the card, it's just
> > initially wrong on the card.
>
> Huhrmm? OK, let me ask this then: So what you're saying is that the
> address is only initially wrong (e.g. due to card EEPROM content
> somehow initializing the registers in wrong order on power-on), it's
> not _always_ supposed to be stored in wrong order during operation.
>
> IOW, the default card state after power-on is _unusable_ (due to
> invalidly ordered MAC address) and has to be _corrected_ by the
> driver, _initially_ only?

Yeah. _But_, all deduced from the code.

> OTOH I know that at least acx100 has a _permanently_ wrong-ordered MAC
> address setup, i.e. it's required to have it in "wrong" order _during
> operation_. And I wouldn't be surprised to see several examples of
> other hardware having a permanently wrongly-ordered address
> requirement, given the amount of MAC reversal in Linux drivers.
>
> Couldn't it be by chance that it's _believed_ to be
> reversed-on-power-on only, whereas those cards should _actually_ have
> it reversed-during-lifetime instead? Such a misunderstanding might
> explain a lot of trouble...

Humm... Wouldn't that have made itself evident? The card's not running
in promisc mode all the time, so there should be some problems if the
card would expect the MAC in the reversed order, right? (At least I see
some code that can switch it into promisc mode, so I assume that it is
not enabled all the time).

There _is_ a comment about some cards not generating any TX interrupts
at all. But that seems to predate any card that stores the MAC address
in correct order (the patch for that came after git, the comment
predates git). So assuming that the card wants the address in reversed
order at runtime, would likely imply that _no_ card would generate TX
interrupts (unless promisc?), and the comment basically invalidates that
assumption.

> > > And then there will never be any confusion about any MAC address
> > > format anywhere any more, right?
> >
> > At probing time, you'll have to know whether the address you read
> > from the hardware is reversed or not. Unless you write it back in
> > reversed order when you release the card, you need a flag that at
> > least survives until nv_probe is called the next time. kexec does
> > not write it back, so you do need such a flag. But the flag
> > apparently doesn't survive a suspend/resume cycle, so you need to
> > write back the reversed address then. But the flag will survive a
> > rmmod + modprobe cycle, so you need to reset it when writing back
> > the reversed address.
>
> If it's indeed reversed-on-power-on only, then one probably cannot do
> much about it, thus I'd give up and simply check the MAC address for
> validity (should be very easy to check for a simple reversal by
> checking for the manufacturer-specific fingerprint in reverse order).
> Keeping _any_ sort of state to record the fact that it used to be
> reversed or now is reversed is futile (or rather: catastrophic) given
> the complexity of suspend / virtualisation or whichever other nice
> operations Linux may invent in the future ;)

There was once a patch that checked the vendor specific address part,
long ago. No idea what happened to it. But for kexec, I'd say that that
is broken, too.

- ifconfig eth0 hw ether 00:11:22:00:00:01
- $kexec_incantation

- nv_probe sees 00:11:22:00:00:01
- doesn't match the vendor stuff
- becomes 01:00:00:22:11:00

Oops, that's not quite the expected thing.

No way to tell whether we have to reverse from the address alone AFAICT.

Bj?rn

2008-01-05 06:39:18

by Björn Steinbrink

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

On 2008.01.05 07:10:02 +0100, Bj?rn Steinbrink wrote:
> - nv_probe sees 00:11:22:00:00:01
> - doesn't match the vendor stuff
> - becomes 01:00:00:22:11:00
>
> Oops, that's not quite the expected thing.

Haha, I just noticed that that even is a multicast address, so you'd get
a random MAC address in that case *lol.

Bj?rn

2008-01-06 21:50:01

by Adolfo R. Brandes

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

I have this forcedeth MAC address reversal problem when suspending
on 2 distinct boxes. I can confirm Steinbrink's patch fixes the
problem on only one of them:

00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)

On the other one the problem persists:

00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)

Thanks,
Adolfo

2008-01-07 01:47:01

by Björn Steinbrink

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

On 2008.01.06 19:49:49 -0200, Adolfo R. Brandes wrote:
> I have this forcedeth MAC address reversal problem when suspending
> on 2 distinct boxes. I can confirm Steinbrink's patch fixes the
> problem on only one of them:
>
> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)
>
> On the other one the problem persists:
>
> 00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)

Thanks. Leaves me pretty clueless though. Especially since it worked for
Richard, who also has a MCP51. In a private mail, you said that you had
hardcoded the mac address in the source. Any chance that you applied the
patch on your modified sources and didn't get it right?

thanks,
Bj?rn

2008-01-07 01:48:17

by Björn Steinbrink

[permalink] [raw]
Subject: Re: [PATCH] Fix forcedeth reversing the MAC address on suspend

On 2008.01.04 23:26:33 +0100, Bj?rn Steinbrink wrote:
> For cards that initially have the MAC address stored in reverse order,
> the forcedeth driver uses a flag to signal whether the address was
> already corrected, so that it is not reversed again on a subsequent
> probe.
>
> Unfortunately this flag, which is stored in a register of the card,
> seems to get lost during suspend, resulting in the MAC address being
> reversed again. To fix that, the MAC address needs to be written back in
> reversed order before we suspend and the flag needs to be reset.
>
> The flag is still required because at least kexec will never write back
> the reversed address and thus needs to know what state the card is in.
>
> Signed-off-by: Bj?rn Steinbrink <[email protected]>
> ---
> On 2008.01.04 15:08:42 +0100, Richard Jonsson wrote:
> > Bj?rn Steinbrink skrev:
> >> Richard, could you give this a spin? And then we'd likely need someone
> >> to test that with kexec...
> >
> > The patch you sent does the trick, works fine now, thanks!
> > I cannot test this with kexec as I barely know what it is, I'll leave that
> > to someone else.
>
> Thanks.
>
> Ayaz, you originally wrote the kexec fix (IIRC), was my analysis of the
> problem correct? If so, I'm quite sure that the patch DTRT. Still it
> should be tested for the rmmod+modprobe and the kexec case. I'll try to
> get my box free for some testing, but that's unlikely in the next few
> days. Plus, I've never used kexec myself either. So I'd be grateful if
> someone else would step up.

Found a few minutes to test, but kexec would just hang for me. So I'm
unable to test that atm. :-(

Bj?rn

2008-01-07 12:02:39

by Adolfo R. Brandes

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

Hi,

On Jan 6, 2008 10:46 PM, Bj?rn Steinbrink <[email protected]> wrote:
> Any chance that you applied the patch on your modified sources and didn't get it right?

It is perfectly possible that I messed something up, although I
double checked what I was doing on the MCP51. However, on that board
(an Asus M2NPV-VM) I'm running the 2.6.22-based Ubuntu Gutsy default
kernel with a recompiled forcedeth, as opposed to a vanilla 2.6.24-rc6
on the CK804, where your patch worked. It may be that the different
kernel is throwing the patch off somehow.

Later on today I'm going to try the MCP51 with the same build of
2.6.24-rc6 (built on a .config from the Ubuntu Hardy development
kernel) I'm running on the CK804, and post results.

Adolfo

2008-01-07 16:10:37

by Richard Jonsson

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

Bj?rn Steinbrink skrev:
> On 2008.01.06 19:49:49 -0200, Adolfo R. Brandes wrote:
>> I have this forcedeth MAC address reversal problem when suspending
>> on 2 distinct boxes. I can confirm Steinbrink's patch fixes the
>> problem on only one of them:
>>
>> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)
>>
>> On the other one the problem persists:
>>
>> 00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)
>
> Thanks. Leaves me pretty clueless though. Especially since it worked for
> Richard, who also has a MCP51. In a private mail, you said that you had
> hardcoded the mac address in the source. Any chance that you applied the
> patch on your modified sources and didn't get it right?
>
> thanks,
> Bj?rn
In case it matters, the revision of my device differs from Adolfo's:
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)

/ Richard

2008-01-08 07:24:04

by David Miller

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

From: Bj?rn Steinbrink <[email protected]>
Date: Mon, 7 Jan 2008 02:46:38 +0100

> On 2008.01.06 19:49:49 -0200, Adolfo R. Brandes wrote:
> > I have this forcedeth MAC address reversal problem when suspending
> > on 2 distinct boxes. I can confirm Steinbrink's patch fixes the
> > problem on only one of them:
> >
> > 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)
> >
> > On the other one the problem persists:
> >
> > 00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)
>
> Thanks. Leaves me pretty clueless though. Especially since it worked for
> Richard, who also has a MCP51. In a private mail, you said that you had
> hardcoded the mac address in the source. Any chance that you applied the
> patch on your modified sources and didn't get it right?

I am going to push this patch upstream, it is correct and we
have a positive test case that failed before, so overall it's
a net improvement even if we still don't exactly know why
Adolfo's case still fails.

Let me know if there are any follow-on patches.

Thanks.

2008-02-16 09:41:29

by Adolfo R. Brandes

[permalink] [raw]
Subject: Re: forcedeth: MAC-address reversed on resume from suspend

On Jan 8, 2008 5:23 AM, David Miller <[email protected]> wrote:
> I am going to push this patch upstream, it is correct and we
> have a positive test case that failed before, so overall it's
> a net improvement even if we still don't exactly know why
> Adolfo's case still fails.

Sorry it took so long, but I got around to testing the patch properly,
i.e. same kernel on CK804 and MCP51, and it worked beautifully!
Thanks!

Adolfo