2007-10-23 15:26:40

by Adam Jackson

[permalink] [raw]
Subject: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

When the EEPROM gets corrupted, you can fix it with ethtool, but only if
the module loads and creates a network device. But, without this option,
if the EEPROM is corrupted, the driver will not create a network device.

Signed-off-by: Adam Jackson <[email protected]>
---
drivers/net/e1000/e1000_main.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index f1ce348..b308c32 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -255,6 +255,10 @@ static int debug = NETIF_MSG_DRV | NETIF_MSG_PROBE;
module_param(debug, int, 0);
MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");

+static int eeprom_bad_csum_allow = 0;
+module_param(eeprom_bad_csum_allow, int, 0);
+MODULE_PARM_DESC(eeprom_bad_csum_allow, "Allow bad eeprom checksums");
+
/**
* e1000_init_module - Driver Registration Routine
*
@@ -1012,7 +1016,8 @@ e1000_probe(struct pci_dev *pdev,

if (e1000_validate_eeprom_checksum(&adapter->hw) < 0) {
DPRINTK(PROBE, ERR, "The EEPROM Checksum Is Not Valid\n");
- goto err_eeprom;
+ if (!eeprom_bad_csum_allow)
+ goto err_eeprom;
}

/* copy the MAC address out of the EEPROM */
@@ -1024,7 +1029,8 @@ e1000_probe(struct pci_dev *pdev,

if (!is_valid_ether_addr(netdev->perm_addr)) {
DPRINTK(PROBE, ERR, "Invalid MAC Address\n");
- goto err_eeprom;
+ if (!eeprom_bad_csum_allow)
+ goto err_eeprom;
}

e1000_get_bus_info(&adapter->hw);
--
1.5.2.4


2007-10-23 16:19:17

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

Adam Jackson wrote:
> When the EEPROM gets corrupted, you can fix it with ethtool, but only if
> the module loads and creates a network device. But, without this option,
> if the EEPROM is corrupted, the driver will not create a network device.
>
> Signed-off-by: Adam Jackson <[email protected]>


NAK

wrong list, not sent to me, and while for e100 I was OK with this patch, for e1000
it really does not make sense to 'just allow' a bad checksum - if your eeprom is
randomly messed up then you cannot just fix it like this anyway.

Auke



> ---
> drivers/net/e1000/e1000_main.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index f1ce348..b308c32 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -255,6 +255,10 @@ static int debug = NETIF_MSG_DRV | NETIF_MSG_PROBE;
> module_param(debug, int, 0);
> MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
>
> +static int eeprom_bad_csum_allow = 0;
> +module_param(eeprom_bad_csum_allow, int, 0);
> +MODULE_PARM_DESC(eeprom_bad_csum_allow, "Allow bad eeprom checksums");
> +
> /**
> * e1000_init_module - Driver Registration Routine
> *
> @@ -1012,7 +1016,8 @@ e1000_probe(struct pci_dev *pdev,
>
> if (e1000_validate_eeprom_checksum(&adapter->hw) < 0) {
> DPRINTK(PROBE, ERR, "The EEPROM Checksum Is Not Valid\n");
> - goto err_eeprom;
> + if (!eeprom_bad_csum_allow)
> + goto err_eeprom;
> }
>
> /* copy the MAC address out of the EEPROM */
> @@ -1024,7 +1029,8 @@ e1000_probe(struct pci_dev *pdev,
>
> if (!is_valid_ether_addr(netdev->perm_addr)) {
> DPRINTK(PROBE, ERR, "Invalid MAC Address\n");
> - goto err_eeprom;
> + if (!eeprom_bad_csum_allow)
> + goto err_eeprom;
> }
>
> e1000_get_bus_info(&adapter->hw);

2007-10-23 16:49:14

by Adam Jackson

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

On Tue, 2007-10-23 at 09:18 -0700, Kok, Auke wrote:
> Adam Jackson wrote:
> > When the EEPROM gets corrupted, you can fix it with ethtool, but only if
> > the module loads and creates a network device. But, without this option,
> > if the EEPROM is corrupted, the driver will not create a network device.
> >
> > Signed-off-by: Adam Jackson <[email protected]>
>
> NAK
>
> wrong list, not sent to me, and while for e100 I was OK with this patch, for e1000
> it really does not make sense to 'just allow' a bad checksum - if your eeprom is
> randomly messed up then you cannot just fix it like this anyway.

That's strange, I managed to recover an otherwise horked e1000 with it.
What should I have done instead?

- ajax

2007-10-23 17:12:31

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

Adam Jackson wrote:
> On Tue, 2007-10-23 at 09:18 -0700, Kok, Auke wrote:
>> Adam Jackson wrote:
>>> When the EEPROM gets corrupted, you can fix it with ethtool, but only if
>>> the module loads and creates a network device. But, without this option,
>>> if the EEPROM is corrupted, the driver will not create a network device.
>>>
>>> Signed-off-by: Adam Jackson <[email protected]>
>> NAK
>>
>> wrong list, not sent to me, and while for e100 I was OK with this patch, for e1000
>> it really does not make sense to 'just allow' a bad checksum - if your eeprom is
>> randomly messed up then you cannot just fix it like this anyway.
>
> That's strange, I managed to recover an otherwise horked e1000 with it.
> What should I have done instead?


Dump the eeprom and send us a copy, plus any and all information to the card,
system etc.. I realize that you need the patch to actually create it but the
danger is that people will start using it *without* troubleshooting the real
issue. In various systems the eeprom checksum failure is actually due to a
misconfigured powersavings feature and the checksum is really not bad at all, but
the card just reports random values.

In any case, this patch should not be merged. We often send it around to users to
debug their issue in case it involves eeproms, but merging it will just conceal
the real issue and all of a sudden a flood of people stop reporting *real* issues
to us.

for e100 the case is completely different: there are many boarded e100 chips out
there mostly on embedded devices where the embedded manufacturer just forgot to
even program the eeprom, and the device really does not care that much at all.

Cheers,

Auke


Auke

2007-10-23 20:40:20

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

Kok, Auke wrote:
> Adam Jackson wrote:
>> On Tue, 2007-10-23 at 09:18 -0700, Kok, Auke wrote:
>>> Adam Jackson wrote:
>>>> When the EEPROM gets corrupted, you can fix it with ethtool, but only if
>>>> the module loads and creates a network device. But, without this option,
>>>> if the EEPROM is corrupted, the driver will not create a network device.
>>>>
>>>> Signed-off-by: Adam Jackson <[email protected]>
>>> NAK
>>>
>>> wrong list, not sent to me, and while for e100 I was OK with this patch, for e1000
>>> it really does not make sense to 'just allow' a bad checksum - if your eeprom is
>>> randomly messed up then you cannot just fix it like this anyway.
>> That's strange, I managed to recover an otherwise horked e1000 with it.
>> What should I have done instead?
>
>
> Dump the eeprom and send us a copy, plus any and all information to the card,
> system etc.. I realize that you need the patch to actually create it but the
> danger is that people will start using it *without* troubleshooting the real
> issue. In various systems the eeprom checksum failure is actually due to a
> misconfigured powersavings feature and the checksum is really not bad at all, but
> the card just reports random values.
>
> In any case, this patch should not be merged. We often send it around to users to
> debug their issue in case it involves eeproms, but merging it will just conceal
> the real issue and all of a sudden a flood of people stop reporting *real* issues
> to us.


Sorry, I disagree. Just as with e100, if there is a clear way the user
can recover their setup -- and Adam says his was effective -- I don't
see why we should be denying users the ability to use their own hardware.

Jeff


2007-10-23 21:02:20

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

Jeff Garzik wrote:
> Kok, Auke wrote:
>> Adam Jackson wrote:
>>> On Tue, 2007-10-23 at 09:18 -0700, Kok, Auke wrote:
>>>> Adam Jackson wrote:
>>>>> When the EEPROM gets corrupted, you can fix it with ethtool, but
>>>>> only if
>>>>> the module loads and creates a network device. But, without this
>>>>> option,
>>>>> if the EEPROM is corrupted, the driver will not create a network
>>>>> device.
>>>>>
>>>>> Signed-off-by: Adam Jackson <[email protected]>
>>>> NAK
>>>>
>>>> wrong list, not sent to me, and while for e100 I was OK with this
>>>> patch, for e1000
>>>> it really does not make sense to 'just allow' a bad checksum - if
>>>> your eeprom is
>>>> randomly messed up then you cannot just fix it like this anyway.
>>> That's strange, I managed to recover an otherwise horked e1000 with it.
>>> What should I have done instead?
>>
>>
>> Dump the eeprom and send us a copy, plus any and all information to
>> the card,
>> system etc.. I realize that you need the patch to actually create it
>> but the
>> danger is that people will start using it *without* troubleshooting
>> the real
>> issue. In various systems the eeprom checksum failure is actually due
>> to a
>> misconfigured powersavings feature and the checksum is really not bad
>> at all, but
>> the card just reports random values.
>>
>> In any case, this patch should not be merged. We often send it around
>> to users to
>> debug their issue in case it involves eeproms, but merging it will
>> just conceal
>> the real issue and all of a sudden a flood of people stop reporting
>> *real* issues
>> to us.
>
>
> Sorry, I disagree. Just as with e100, if there is a clear way the user
> can recover their setup -- and Adam says his was effective -- I don't
> see why we should be denying users the ability to use their own hardware.


That's not even relevant, I already offer the same patch offline to people who
*really* only have a wrong checksum, AFTER we check the contents of their eeprom
for them.

We help everyone out, and if you merge this patch you will prevent users from
getting to us for support in the first place.

I realize that we should probably document the "bad eeprom checksum" case more
decently but I think merging this patch is a bad idea for the *user* in all cases.

You completely bypass the fact that e100 eeproms and e1000 eeproms are completely
different beasts as well, one can be practically empty in all cases (e100), the
other one every bit counts (most e1000's), which is an unfair representation and
falsely tells the user that he can just do this without any information or
disclaimer as to what he may expect afterwards.


Auke

2007-10-23 21:21:12

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:

> > In any case, this patch should not be merged. We often send it around to users to
> > debug their issue in case it involves eeproms, but merging it will just conceal
> > the real issue and all of a sudden a flood of people stop reporting *real* issues
> > to us.
>
> Sorry, I disagree. Just as with e100, if there is a clear way the user
> can recover their setup -- and Adam says his was effective -- I don't
> see why we should be denying users the ability to use their own hardware.

Indeed. This is a common enough problem that not including it causes more pain
than its worth. I have two affected boxes myself that I actually thought
the hardware was dead before I tried ajax's patch.

People aren't going to report this as a bug. They aren't going to try out patches,
they're going to do what I did and stick another network card in the box and
go on with life.

Our users deserve better than this.

Dave

--
http://www.codemonkey.org.uk

2007-10-23 21:35:38

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

> People aren't going to report this as a bug. They aren't going to try out patches,
> they're going to do what I did and stick another network card in the box and
> go on with life.
>
> Our users deserve better than this.

Agreed. By all means warn people, or give them a 1-800 Intel number to
phone, but they should be able to continue as well.

Alan

2007-10-23 21:48:40

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

From: Jeff Garzik <[email protected]>
Date: Tue, 23 Oct 2007 16:40:01 -0400

> Sorry, I disagree. Just as with e100, if there is a clear way the user
> can recover their setup -- and Adam says his was effective -- I don't
> see why we should be denying users the ability to use their own hardware.

I'd like to second these sentiments.

Just because you can come up with cases where using a wrong
eeprom would fail, does not mean the facility should not be
provided at all.

2007-10-23 21:50:56

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

From: "Kok, Auke" <[email protected]>
Date: Tue, 23 Oct 2007 14:01:21 -0700

> We help everyone out, and if you merge this patch you will prevent
> users from getting to us for support in the first place.

If using the bad eeprom has to be explicitly enabled by the user, your
argument holds no water. We just need to make sure the patch does
that.

2007-10-23 21:53:20

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

From: Dave Jones <[email protected]>
Date: Tue, 23 Oct 2007 17:20:26 -0400

> Indeed. This is a common enough problem that not including it causes
> more pain than its worth. I have two affected boxes myself that I
> actually thought the hardware was dead before I tried ajax's patch.
>
> People aren't going to report this as a bug. They aren't going to
> try out patches, they're going to do what I did and stick another
> network card in the box and go on with life.
>
> Our users deserve better than this.

Seconded. The resistence to this patch is just flat-out rediculious,
just like it was in the e100 case.

And I think all of this "e1000 is different!" talk is merely a
scarecrow for the fact that Intel simply doesn't want this patch
merged for some other reason.

2007-10-23 23:04:20

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

Dave Jones wrote:
> On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:
>
> > > In any case, this patch should not be merged. We often send it around to users to
> > > debug their issue in case it involves eeproms, but merging it will just conceal
> > > the real issue and all of a sudden a flood of people stop reporting *real* issues
> > > to us.
> >
> > Sorry, I disagree. Just as with e100, if there is a clear way the user
> > can recover their setup -- and Adam says his was effective -- I don't
> > see why we should be denying users the ability to use their own hardware.
>
> Indeed. This is a common enough problem that not including it causes more pain
> than its worth. I have two affected boxes myself that I actually thought
> the hardware was dead before I tried ajax's patch.


look: You should have reported this to us and you didn't. Now you are using the
fact that you did not report it as an argument which is out of place.

why do you say it is common? how often have you seen this and not reported it back
to our support? are you willingly trying to frustrate this issue?


Auke

2007-10-23 23:19:57

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

David Miller wrote:
> From: Dave Jones <[email protected]>
> Date: Tue, 23 Oct 2007 17:20:26 -0400
>
>> Indeed. This is a common enough problem that not including it causes
>> more pain than its worth. I have two affected boxes myself that I
>> actually thought the hardware was dead before I tried ajax's patch.
>>
>> People aren't going to report this as a bug. They aren't going to
>> try out patches, they're going to do what I did and stick another
>> network card in the box and go on with life.
>>
>> Our users deserve better than this.
>
> Seconded. The resistence to this patch is just flat-out rediculious,
> just like it was in the e100 case.
>
> And I think all of this "e1000 is different!" talk is merely a
> scarecrow for the fact that Intel simply doesn't want this patch
> merged for some other reason.

no, e1000 eeproms contain many timing information and bits crucial to getting the
adapter working in the first place. All of these are documented in our PUBLICALLY
available SDM which is downloadable from our e1000.sf.net website. (e.g. 8254x
sdm, section 5.6, page 98+). For pci-e silicon this gets much more complex.

we haven't even heard from the user what hardware he has nor gotten an eeprom dump
from him.

I'm not hiding anything and you're deliberately creating a negative atmosphere here.

The people who do have eeprom checksum issues have come to us in the past. The
fact that you didn't see them means that they *properly* made it to us.

As a matter of fact I am still working on a permanent solution for bad eeprom
checksums on lenovo T60 laptops. Should I just drop that issue and leave the real
problem unsolved?

This patch only affirms *YOUR* point of view, not that of many customers who have
come to us and received help with many issues. You're completely ignoring that and
that is unfair.

If we want this patch in the kernel in some form that actually shows in a decent
way what a user *should* do and more importantly should *know*, then maybe we can
talk about that.

The patch in question does not add any extra information nor does it do some
sanity checking on the eeprom values or turn off any of the problematic features
that we really should disable. We even have code that allows some of the hardware
to run without a properly setup eeprom in a few hardware cases. And we definately
should print out a lot more warnings to the user that running with garbage eeprom
data is _not_ a good idea.


Auke


2007-10-23 23:54:32

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

On Tue, 23 Oct 2007 16:03:38 -0700
"Kok, Auke" <[email protected]> wrote:

> Dave Jones wrote:
> > On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:
> >
> > > > In any case, this patch should not be merged. We often send it around to users to
> > > > debug their issue in case it involves eeproms, but merging it will just conceal
> > > > the real issue and all of a sudden a flood of people stop reporting *real* issues
> > > > to us.
> > >
> > > Sorry, I disagree. Just as with e100, if there is a clear way the user
> > > can recover their setup -- and Adam says his was effective -- I don't
> > > see why we should be denying users the ability to use their own hardware.
> >
> > Indeed. This is a common enough problem that not including it causes more pain
> > than its worth. I have two affected boxes myself that I actually thought
> > the hardware was dead before I tried ajax's patch.
>
>
> look: You should have reported this to us and you didn't. Now you are using the
> fact that you did not report it as an argument which is out of place.
>
> why do you say it is common? how often have you seen this and not reported it back
> to our support? are you willingly trying to frustrate this issue?
>
>
> Auke

What about a compromise like "ignore_checksum" module option?
That way users with bad checksums wouldn't just ignore the problem (no one reads console logs),
but would have a way to correct the checksum.

There are many reasons would want the ability to fix the problem themselves without
asking Intel.

--
Stephen Hemminger <[email protected]>

2007-10-24 00:55:46

by Jeff Garzik

[permalink] [raw]
Subject: [PATCH] e1000, e1000e valid-addr fixes

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index f1ce348..8936d48 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1022,11 +1022,6 @@ e1000_probe(struct pci_dev *pdev,
memcpy(netdev->dev_addr, adapter->hw.mac_addr, netdev->addr_len);
memcpy(netdev->perm_addr, adapter->hw.mac_addr, netdev->addr_len);

- if (!is_valid_ether_addr(netdev->perm_addr)) {
- DPRINTK(PROBE, ERR, "Invalid MAC Address\n");
- goto err_eeprom;
- }
-
e1000_get_bus_info(&adapter->hw);

init_timer(&adapter->tx_fifo_stall_timer);
@@ -1134,7 +1129,9 @@ e1000_probe(struct pci_dev *pdev,
"32-bit"));
}

- printk("%s\n", print_mac(mac, netdev->dev_addr));
+ printk("%s%s\n",
+ print_mac(mac, netdev->dev_addr),
+ is_valid_ether_addr(netdev->dev_addr) ? "" : " (INVALID)");

/* reset the hardware with the new settings */
e1000_reset(adapter);
@@ -1396,6 +1393,9 @@ e1000_open(struct net_device *netdev)
struct e1000_adapter *adapter = netdev_priv(netdev);
int err;

+ if (!is_valid_ether_addr(netdev->dev_addr))
+ return -EINVAL;
+
/* disallow open during test */
if (test_bit(__E1000_TESTING, &adapter->flags))
return -EBUSY;
@@ -2377,7 +2377,7 @@ e1000_set_mac(struct net_device *netdev, void *p)
struct sockaddr *addr = p;

if (!is_valid_ether_addr(addr->sa_data))
- return -EADDRNOTAVAIL;
+ return -EINVAL;

/* 82542 2.0 needs to be in reset to write receive address registers */

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 033e124..0e3216c 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2557,6 +2557,9 @@ static int e1000_open(struct net_device *netdev)
struct e1000_hw *hw = &adapter->hw;
int err;

+ if (!is_valid_ether_addr(netdev->dev_addr))
+ return -EINVAL;
+
/* disallow open during test */
if (test_bit(__E1000_TESTING, &adapter->state))
return -EBUSY;
@@ -2670,7 +2673,7 @@ static int e1000_set_mac(struct net_device *netdev, void *p)
struct sockaddr *addr = p;

if (!is_valid_ether_addr(addr->sa_data))
- return -EADDRNOTAVAIL;
+ return -EINVAL;

memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
memcpy(adapter->hw.mac.addr, addr->sa_data, netdev->addr_len);
@@ -3960,14 +3963,16 @@ static void e1000_print_device_info(struct e1000_adapter *adapter)

/* print bus type/speed/width info */
ndev_info(netdev, "(PCI Express:2.5GB/s:%s) "
- "%02x:%02x:%02x:%02x:%02x:%02x\n",
+ "%02x:%02x:%02x:%02x:%02x:%02x%s\n",
/* bus width */
((hw->bus.width == e1000_bus_width_pcie_x4) ? "Width x4" :
"Width x1"),
/* MAC address */
netdev->dev_addr[0], netdev->dev_addr[1],
netdev->dev_addr[2], netdev->dev_addr[3],
- netdev->dev_addr[4], netdev->dev_addr[5]);
+ netdev->dev_addr[4], netdev->dev_addr[5],
+ is_valid_ether_addr(netdev->dev_addr) ?
+ "" : " (INVALID)");
ndev_info(netdev, "Intel(R) PRO/%s Network Connection\n",
(hw->phy.type == e1000_phy_ife)
? "10/100" : "1000");
@@ -4170,16 +4175,6 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
memcpy(netdev->dev_addr, adapter->hw.mac.addr, netdev->addr_len);
memcpy(netdev->perm_addr, adapter->hw.mac.addr, netdev->addr_len);

- if (!is_valid_ether_addr(netdev->perm_addr)) {
- ndev_err(netdev, "Invalid MAC Address: "
- "%02x:%02x:%02x:%02x:%02x:%02x\n",
- netdev->perm_addr[0], netdev->perm_addr[1],
- netdev->perm_addr[2], netdev->perm_addr[3],
- netdev->perm_addr[4], netdev->perm_addr[5]);
- err = -EIO;
- goto err_eeprom;
- }
-
init_timer(&adapter->watchdog_timer);
adapter->watchdog_timer.function = &e1000_watchdog;
adapter->watchdog_timer.data = (unsigned long) adapter;


Attachments:
patch.e1000-addr (3.65 kB)

2007-10-24 01:03:49

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

Jeff Garzik wrote:
> Actually, looking over the code I see obvious bugs in the logic:
>
> An invalid ethernet address should not cause device loading to fail,
> because the user is given the opportunity to supply a MAC address via
> userspace (ifconfig or whatever) before the interface goes up.
>
> I just created the attached -bug fix- patch as illustration, though I
> have not committed it, waiting for comment.
>
> This patch will make no difference for users hitting invalid-eep-csum
> rather than invalid-MAC-addr condition, but it's a problem I noticed
> while reviewing Adam's patch in detail.


Adding my own comment :)

Does the ethernet stack check is_valid_ether_addr() before permitting
interface-up?

I'm wondering if there is a way to avoid adding

if (!is_valid_ether_addr(dev->dev_addr))
return -EINVAL;

to every ethernet driver's ->open() hook.

Jeff


2007-10-24 01:07:29

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

From: Jeff Garzik <[email protected]>
Date: Tue, 23 Oct 2007 21:03:36 -0400

> I'm wondering if there is a way to avoid adding
>
> if (!is_valid_ether_addr(dev->dev_addr))
> return -EINVAL;
>
> to every ethernet driver's ->open() hook.

The first idea I get is:

1) Create netdev->validate_dev_addr().

2) If it exists, invoke it before ->open(), abort
and return if any errors signaled.

etherdev init hooks up a function that does the above
check, which allows us to avoid editing every ethernet
driver

What do you think?

2007-10-24 01:15:23

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

On Tue, Oct 23, 2007 at 08:55:29PM -0400, Jeff Garzik wrote:
> Actually, looking over the code I see obvious bugs in the logic:
>
> An invalid ethernet address should not cause device loading to fail,
> because the user is given the opportunity to supply a MAC address via
> userspace (ifconfig or whatever) before the interface goes up.
>
> I just created the attached -bug fix- patch as illustration, though I have
> not committed it, waiting for comment.
>...

Is there a good reason why we have such checks duplicated in the drivers
(with every driver doing it differently...) instead of doing it in
net/core/dev.c?

> Jeff

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-10-24 02:20:44

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4a3f54e..962d1de 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -669,6 +669,8 @@ struct net_device
#define HAVE_SET_MAC_ADDR
int (*set_mac_address)(struct net_device *dev,
void *addr);
+#define HAVE_VALIDATE_ADDR
+ int (*validate_addr)(struct net_device *dev);
#define HAVE_PRIVATE_IOCTL
int (*do_ioctl)(struct net_device *dev,
struct ifreq *ifr, int cmd);
diff --git a/net/core/dev.c b/net/core/dev.c
index 8726589..f861555 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1007,17 +1007,20 @@ int dev_open(struct net_device *dev)
* Call device private open method
*/
set_bit(__LINK_STATE_START, &dev->state);
- if (dev->open) {
+
+ if (dev->validate_addr)
+ ret = dev->validate_addr(dev);
+
+ if (!ret && dev->open)
ret = dev->open(dev);
- if (ret)
- clear_bit(__LINK_STATE_START, &dev->state);
- }

/*
* If it went open OK then:
*/

- if (!ret) {
+ if (ret)
+ clear_bit(__LINK_STATE_START, &dev->state);
+ else {
/*
* Set the flags.
*/
@@ -1038,6 +1041,7 @@ int dev_open(struct net_device *dev)
*/
call_netdevice_notifiers(NETDEV_UP, dev);
}
+
return ret;
}

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index ed8a3d4..5471cd2 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -298,6 +298,14 @@ static int eth_change_mtu(struct net_device *dev, int new_mtu)
return 0;
}

+static int eth_validate_addr(struct net_device *dev)
+{
+ if (!is_valid_ether_addr(dev->dev_addr))
+ return -EINVAL;
+
+ return 0;
+}
+
const struct header_ops eth_header_ops ____cacheline_aligned = {
.create = eth_header,
.parse = eth_header_parse,
@@ -317,6 +325,7 @@ void ether_setup(struct net_device *dev)

dev->change_mtu = eth_change_mtu;
dev->set_mac_address = eth_mac_addr;
+ dev->validate_addr = eth_validate_addr;

dev->type = ARPHRD_ETHER;
dev->hard_header_len = ETH_HLEN;


Attachments:
patch.validate-addr (1.95 kB)

2007-10-24 02:22:58

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

From: Jeff Garzik <[email protected]>
Date: Tue, 23 Oct 2007 22:20:30 -0400

> David Miller wrote:
> > From: Jeff Garzik <[email protected]>
> > Date: Tue, 23 Oct 2007 21:03:36 -0400
> >
> >> I'm wondering if there is a way to avoid adding
> >>
> >> if (!is_valid_ether_addr(dev->dev_addr))
> >> return -EINVAL;
> >>
> >> to every ethernet driver's ->open() hook.
> >
> > The first idea I get is:
> >
> > 1) Create netdev->validate_dev_addr().
> >
> > 2) If it exists, invoke it before ->open(), abort
> > and return if any errors signaled.
> >
> > etherdev init hooks up a function that does the above
> > check, which allows us to avoid editing every ethernet
> > driver
> >
> > What do you think?
>
> Seems sane to me. Something like this (attached)?

Looks great:

Acked-by: David S. Miller <[email protected]>

2007-10-24 05:25:39

by speedy

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

Hello Auke, Kernel crew,

> I realize that you need the patch to actually create it but the
> danger is that people will start using it *without* troubleshooting the real
> issue.

Just write out a big fat kernel output with explanations of
the override parameter, possible repercusionss of not fixing it
and an email address to which you (or, better yet - Intel) want
stuff "this and that" reported.

I hadn't the slightest idea from the kernel messages or the skim of
the driver source that anyone is wanting feedback on this issue...

ps. Were there recently (2.6.22+) any known issues about e1000 refusing
to send packets out / getting stuck upon jumbo-frames being enabled?

ps2. I'm not following this list, sorry about the poor thread CC:
quality


--
Best regards,
speedy mailto:[email protected]

2007-10-24 05:41:22

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

On Tue, Oct 23, 2007 at 04:03:38PM -0700, Kok, Auke wrote:
> Dave Jones wrote:
> > On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:
> >
> > > > In any case, this patch should not be merged. We often send it around to users to
> > > > debug their issue in case it involves eeproms, but merging it will just conceal
> > > > the real issue and all of a sudden a flood of people stop reporting *real* issues
> > > > to us.
> > >
> > > Sorry, I disagree. Just as with e100, if there is a clear way the user
> > > can recover their setup -- and Adam says his was effective -- I don't
> > > see why we should be denying users the ability to use their own hardware.
> >
> > Indeed. This is a common enough problem that not including it causes more pain
> > than its worth. I have two affected boxes myself that I actually thought
> > the hardware was dead before I tried ajax's patch.
>
>
> look: You should have reported this to us and you didn't. Now you are using the
> fact that you did not report it as an argument which is out of place.

you're missing the point. It looks like a hardware failure. Why would I report this?

> why do you say it is common? how often have you seen this and not reported it back
> to our support? are you willingly trying to frustrate this issue?

Not at all. The only frustration here is that I used to have a kernel that
worked, upgraded, and thought that my hardware was broken.
How many other users thought the same ?

Dave

--
http://www.codemonkey.org.uk

2007-11-01 18:07:01

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

David Miller wrote:
> From: Jeff Garzik <[email protected]>
> Date: Tue, 23 Oct 2007 22:20:30 -0400
>
>> David Miller wrote:
>>> From: Jeff Garzik <[email protected]>
>>> Date: Tue, 23 Oct 2007 21:03:36 -0400
>>>
>>>> I'm wondering if there is a way to avoid adding
>>>>
>>>> if (!is_valid_ether_addr(dev->dev_addr))
>>>> return -EINVAL;
>>>>
>>>> to every ethernet driver's ->open() hook.
>>> The first idea I get is:
>>>
>>> 1) Create netdev->validate_dev_addr().
>>>
>>> 2) If it exists, invoke it before ->open(), abort
>>> and return if any errors signaled.
>>>
>>> etherdev init hooks up a function that does the above
>>> check, which allows us to avoid editing every ethernet
>>> driver
>>>
>>> What do you think?
>> Seems sane to me. Something like this (attached)?
>
> Looks great:
>
> Acked-by: David S. Miller <[email protected]>

I like it.

Should I start sending patches to remove the checks from e1000/e1000e/ixgb/ixgbe
already (to David, I assume?)?

Auke

2007-11-01 18:13:57

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

How about:

static int eth_validate_addr(const struct net_device *dev)
{
return is_valid_ether_addr(dev->dev_addr) ? 0 : -EINVAL;
}

--
Stephen Hemminger <[email protected]>

2007-11-01 18:48:18

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

Kok, Auke wrote:
> David Miller wrote:
>> From: Jeff Garzik <[email protected]>
>> Date: Tue, 23 Oct 2007 22:20:30 -0400
>>
>>> David Miller wrote:
>>>> From: Jeff Garzik <[email protected]>
>>>> Date: Tue, 23 Oct 2007 21:03:36 -0400
>>>>
>>>>> I'm wondering if there is a way to avoid adding
>>>>>
>>>>> if (!is_valid_ether_addr(dev->dev_addr))
>>>>> return -EINVAL;
>>>>>
>>>>> to every ethernet driver's ->open() hook.
>>>> The first idea I get is:
>>>>
>>>> 1) Create netdev->validate_dev_addr().
>>>>
>>>> 2) If it exists, invoke it before ->open(), abort
>>>> and return if any errors signaled.
>>>>
>>>> etherdev init hooks up a function that does the above
>>>> check, which allows us to avoid editing every ethernet
>>>> driver
>>>>
>>>> What do you think?
>>> Seems sane to me. Something like this (attached)?
>> Looks great:
>>
>> Acked-by: David S. Miller <[email protected]>
>
> I like it.
>
> Should I start sending patches to remove the checks from e1000/e1000e/ixgb/ixgbe
> already (to David, I assume?)?

Send the patches to me like normal...

Jeff



2007-11-01 19:31:40

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] e1000, e1000e valid-addr fixes

Stephen Hemminger wrote:
> How about:
>
> static int eth_validate_addr(const struct net_device *dev)
> {
> return is_valid_ether_addr(dev->dev_addr) ? 0 : -EINVAL;
> }

hmmm -- its a slow path, so I don't see the value of marking the
argument 'const' -- right now this implementation merely reads the
dev->dev_addr[], but that need not always be the case.

And I don't see the value of squashing everything onto one line, IMO the
current version is more readable.

Jeff