2006-03-30 21:39:50

by linas

[permalink] [raw]
Subject: [PATCH]: e1000: prevent statistics from getting garbled during reset.


Please review, sign-off/ack, and forward upstream.
--linas

[PATCH]: e1000: prevent statistics from getting garbled during reset.

If a PCI bus error/fault triggers a PCI bus reset, attempts to get the
ethernet packet count statistics from the hardware will fail, returning
garbage data upstream. This patch skips statistics data collection
if the PCI device is not on the bus.

This patch presumes that an earlier patch,
[PATCH] PCI Error Recovery: e1000 network device driver
has already been applied.

Signed-off-by: Linas Vepstas <[email protected]>

----
drivers/net/e1000/e1000_main.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6.16-git6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.16-git6.orig/drivers/net/e1000/e1000_main.c 2006-03-30 14:42:33.000000000 -0600
+++ linux-2.6.16-git6/drivers/net/e1000/e1000_main.c 2006-03-30 14:44:52.000000000 -0600
@@ -3069,13 +3069,17 @@ void
e1000_update_stats(struct e1000_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
+ struct pci_dev *pdev = adapter->pdev;
unsigned long flags;
uint16_t phy_tmp;

#define PHY_IDLE_ERROR_COUNT_MASK 0x00FF

- /* Prevent stats update while adapter is being reset */
- if (adapter->link_speed == 0)
+ /* Prevent stats update while adapter is being reset, or if
+ * the pci connection is down. */
+ if ((adapter->link_speed == 0) ||
+ ((pdev->error_state != 0) &&
+ (pdev->error_state != pci_channel_io_normal)))
return;

spin_lock_irqsave(&adapter->stats_lock, flags);


2006-03-31 00:02:14

by linas

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

On Thu, Mar 30, 2006 at 03:39:28PM -0600, Linas Vepstas wrote:
>
> Please review, sign-off/ack, and forward upstream.
> --linas

Per feedback, here's a slightly more human-readable version.
--linas

[PATCH]: e1000: prevent statistics from getting garbled during reset.

If a PCI bus error/fault triggers a PCI bus reset, attempts to get the
ethernet packet count statistics from the hardware will fail, returning
garbage data upstream. This patch skips statistics data collection
if the PCI device is not on the bus.

This patch presumes that an earlier patch,
[PATCH] PCI Error Recovery: e1000 network device driver
has already been applied.

Signed-off-by: Linas Vepstas <[email protected]>

----
drivers/net/e1000/e1000_main.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6.16-git6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.16-git6.orig/drivers/net/e1000/e1000_main.c 2006-03-30 17:51:37.924162779 -0600
+++ linux-2.6.16-git6/drivers/net/e1000/e1000_main.c 2006-03-30 17:54:07.659188391 -0600
@@ -3069,14 +3069,18 @@ void
e1000_update_stats(struct e1000_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
+ struct pci_dev *pdev = adapter->pdev;
unsigned long flags;
uint16_t phy_tmp;

#define PHY_IDLE_ERROR_COUNT_MASK 0x00FF

- /* Prevent stats update while adapter is being reset */
+ /* Prevent stats update while adapter is being reset,
+ * or if the pci connection is down. */
if (adapter->link_speed == 0)
return;
+ if (pdev->error_state && pdev->error_state != pci_channel_io_normal)
+ return;

spin_lock_irqsave(&adapter->stats_lock, flags);

2006-03-31 00:14:41

by Jeffrey V. Merkey

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.


Linas Vepstas wrote:

>On Thu, Mar 30, 2006 at 03:39:28PM -0600, Linas Vepstas wrote:
>
>
>>Please review, sign-off/ack, and forward upstream.
>>--linas
>>
>>
>
>Per feedback, here's a slightly more human-readable version.
>--linas
>
>[PATCH]: e1000: prevent statistics from getting garbled during reset.
>
>If a PCI bus error/fault triggers a PCI bus reset, attempts to get the
>ethernet packet count statistics from the hardware will fail, returning
>garbage data upstream. This patch skips statistics data collection
>if the PCI device is not on the bus.
>
>This patch presumes that an earlier patch,
>[PATCH] PCI Error Recovery: e1000 network device driver
>has already been applied.
>
>Signed-off-by: Linas Vepstas <[email protected]>
>
>----
> drivers/net/e1000/e1000_main.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletion(-)
>
>Index: linux-2.6.16-git6/drivers/net/e1000/e1000_main.c
>===================================================================
>--- linux-2.6.16-git6.orig/drivers/net/e1000/e1000_main.c 2006-03-30 17:51:37.924162779 -0600
>+++ linux-2.6.16-git6/drivers/net/e1000/e1000_main.c 2006-03-30 17:54:07.659188391 -0600
>@@ -3069,14 +3069,18 @@ void
> e1000_update_stats(struct e1000_adapter *adapter)
> {
> struct e1000_hw *hw = &adapter->hw;
>+ struct pci_dev *pdev = adapter->pdev;
> unsigned long flags;
> uint16_t phy_tmp;
>
> #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
>
>- /* Prevent stats update while adapter is being reset */
>+ /* Prevent stats update while adapter is being reset,
>+ * or if the pci connection is down. */
> if (adapter->link_speed == 0)
> return;
>+ if (pdev->error_state && pdev->error_state != pci_channel_io_normal)
>+ return;
>
> spin_lock_irqsave(&adapter->stats_lock, flags);
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>

The driver also needs to be fixed to allow clearing of the stats (like
all the other adapter drivers). At present, when I run performance
and packet drop counts on the cards, I cannot reset the stats with this
code because the driver stores them in the e100_adapter
structure. This is busted.

This function:

int clear_network_device_stats(BYTE *name)
{
register int i;

i = get_dev_index_by_name(name);
if (i != -1)
{
if (ndevs[i])
{
struct net_device_stats *stats;

stats = (ndevs[i]->get_stats)(ndevs[i]);
if (stats)
{
stats->rx_packets = 0;
stats->tx_packets = 0;
stats->rx_bytes = 0;
stats->tx_bytes = 0;
stats->multicast = 0;
stats->collisions = 0;
stats->rx_errors = 0;
stats->rx_length_errors = 0;
stats->rx_crc_errors = 0;
stats->rx_frame_errors = 0;
stats->rx_fifo_errors = 0;
stats->rx_missed_errors = 0;
stats->tx_errors = 0;
stats->tx_aborted_errors = 0;
stats->tx_window_errors = 0;
stats->tx_carrier_errors = 0;
P_Print("solera: stats cleared for %s\n", ndevs[i]->name);
return 0;
}
}
return 0;
}
return -1;
}

does not work on e1000 due to this section of code:


void
e1000_update_stats(struct e1000_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
unsigned long flags;
uint16_t phy_tmp;

#define PHY_IDLE_ERROR_COUNT_MASK 0x00FF

spin_lock_irqsave(&adapter->stats_lock, flags);

/* these counters are modified from e1000_adjust_tbi_stats,
* called from the interrupt context, so they must only
* be written while holding adapter->stats_lock
*/

adapter->stats.crcerrs += E1000_READ_REG(hw, CRCERRS);
adapter->stats.gprc += E1000_READ_REG(hw, GPRC);
adapter->stats.gorcl += E1000_READ_REG(hw, GORCL);
adapter->stats.gorch += E1000_READ_REG(hw, GORCH);
adapter->stats.bprc += E1000_READ_REG(hw, BPRC);
adapter->stats.mprc += E1000_READ_REG(hw, MPRC);
adapter->stats.roc += E1000_READ_REG(hw, ROC);
adapter->stats.prc64 += E1000_READ_REG(hw, PRC64);
adapter->stats.prc127 += E1000_READ_REG(hw, PRC127);
adapter->stats.prc255 += E1000_READ_REG(hw, PRC255);
adapter->stats.prc511 += E1000_READ_REG(hw, PRC511);
adapter->stats.prc1023 += E1000_READ_REG(hw, PRC1023);
adapter->stats.prc1522 += E1000_READ_REG(hw, PRC1522);

adapter->stats.symerrs += E1000_READ_REG(hw, SYMERRS);
adapter->stats.mpc += E1000_READ_REG(hw, MPC);
adapter->stats.scc += E1000_READ_REG(hw, SCC);
adapter->stats.ecol += E1000_READ_REG(hw, ECOL);
adapter->stats.mcc += E1000_READ_REG(hw, MCC);
adapter->stats.latecol += E1000_READ_REG(hw, LATECOL);
adapter->stats.dc += E1000_READ_REG(hw, DC);
adapter->stats.sec += E1000_READ_REG(hw, SEC);
adapter->stats.rlec += E1000_READ_REG(hw, RLEC);
adapter->stats.xonrxc += E1000_READ_REG(hw, XONRXC);
adapter->stats.xontxc += E1000_READ_REG(hw, XONTXC);
adapter->stats.xoffrxc += E1000_READ_REG(hw, XOFFRXC);
adapter->stats.xofftxc += E1000_READ_REG(hw, XOFFTXC);
adapter->stats.fcruc += E1000_READ_REG(hw, FCRUC);
adapter->stats.gptc += E1000_READ_REG(hw, GPTC);
adapter->stats.gotcl += E1000_READ_REG(hw, GOTCL);
adapter->stats.gotch += E1000_READ_REG(hw, GOTCH);
adapter->stats.rnbc += E1000_READ_REG(hw, RNBC);
adapter->stats.ruc += E1000_READ_REG(hw, RUC);
adapter->stats.rfc += E1000_READ_REG(hw, RFC);
adapter->stats.rjc += E1000_READ_REG(hw, RJC);
adapter->stats.torl += E1000_READ_REG(hw, TORL);
adapter->stats.torh += E1000_READ_REG(hw, TORH);
adapter->stats.totl += E1000_READ_REG(hw, TOTL);
adapter->stats.toth += E1000_READ_REG(hw, TOTH);
adapter->stats.tpr += E1000_READ_REG(hw, TPR);
adapter->stats.ptc64 += E1000_READ_REG(hw, PTC64);
adapter->stats.ptc127 += E1000_READ_REG(hw, PTC127);
adapter->stats.ptc255 += E1000_READ_REG(hw, PTC255);
adapter->stats.ptc511 += E1000_READ_REG(hw, PTC511);
adapter->stats.ptc1023 += E1000_READ_REG(hw, PTC1023);
adapter->stats.ptc1522 += E1000_READ_REG(hw, PTC1522);
adapter->stats.mptc += E1000_READ_REG(hw, MPTC);
adapter->stats.bptc += E1000_READ_REG(hw, BPTC);

//NOTE These stats need to be stored in the stats structure so they can
be cleared by
statistics monitoring programs.

/* used for adaptive IFS */

hw->tx_packet_delta = E1000_READ_REG(hw, TPT);
adapter->stats.tpt += hw->tx_packet_delta;
hw->collision_delta = E1000_READ_REG(hw, COLC);
adapter->stats.colc += hw->collision_delta;

if(hw->mac_type >= e1000_82543) {
adapter->stats.algnerrc += E1000_READ_REG(hw, ALGNERRC);
adapter->stats.rxerrc += E1000_READ_REG(hw, RXERRC);
adapter->stats.tncrs += E1000_READ_REG(hw, TNCRS);
adapter->stats.cexterr += E1000_READ_REG(hw, CEXTERR);
adapter->stats.tsctc += E1000_READ_REG(hw, TSCTC);
adapter->stats.tsctfc += E1000_READ_REG(hw, TSCTFC);
}

/* Fill out the OS statistics structure */

adapter->net_stats.rx_packets = adapter->stats.gprc;
adapter->net_stats.tx_packets = adapter->stats.gptc;
adapter->net_stats.rx_bytes = adapter->stats.gorcl;
adapter->net_stats.tx_bytes = adapter->stats.gotcl;
adapter->net_stats.multicast = adapter->stats.mprc;
adapter->net_stats.collisions = adapter->stats.colc;

/* Rx Errors */

adapter->net_stats.rx_errors = adapter->stats.rxerrc +
adapter->stats.crcerrs + adapter->stats.algnerrc +
adapter->stats.rlec + adapter->stats.rnbc +
adapter->stats.mpc + adapter->stats.cexterr;
adapter->net_stats.rx_dropped = adapter->stats.rnbc;
adapter->net_stats.rx_length_errors = adapter->stats.rlec;
adapter->net_stats.rx_crc_errors = adapter->stats.crcerrs;
adapter->net_stats.rx_frame_errors = adapter->stats.algnerrc;
adapter->net_stats.rx_fifo_errors = adapter->stats.mpc;
adapter->net_stats.rx_missed_errors = adapter->stats.mpc;

/* Tx Errors */

adapter->net_stats.tx_errors = adapter->stats.ecol +
adapter->stats.latecol;
adapter->net_stats.tx_aborted_errors = adapter->stats.ecol;
adapter->net_stats.tx_window_errors = adapter->stats.latecol;
adapter->net_stats.tx_carrier_errors = adapter->stats.tncrs;

/* Tx Dropped needs to be maintained elsewhere */

/* Phy Stats */

if(hw->media_type == e1000_media_type_copper) {
if((adapter->link_speed == SPEED_1000) &&
(!e1000_read_phy_reg(hw, PHY_1000T_STATUS, &phy_tmp))) {
phy_tmp &= PHY_IDLE_ERROR_COUNT_MASK;
adapter->phy_stats.idle_errors += phy_tmp;
}

if((hw->mac_type <= e1000_82546) &&
(hw->phy_type == e1000_phy_m88) &&
!e1000_read_phy_reg(hw, M88E1000_RX_ERR_CNTR, &phy_tmp))
adapter->phy_stats.receive_errors += phy_tmp;
}

spin_unlock_irqrestore(&adapter->stats_lock, flags);
}

2006-03-31 00:35:13

by linas

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

On Thu, Mar 30, 2006 at 06:05:45PM -0700, Jeff V. Merkey wrote:
>
> Linas Vepstas wrote:

Well, these comments have nothing to do with my patch, but ...
anyway ...

> The driver also needs to be fixed to allow clearing of the stats (like
> all the other adapter drivers). At present, when I run performance
> and packet drop counts on the cards, I cannot reset the stats with this
> code because the driver stores them in the e100_adapter
> structure. This is busted.
>
> This function:
>
> int clear_network_device_stats(BYTE *name)

I couldn't find such a function in the kernel.

> does not work on e1000 due to this section of code:
>
> void
> e1000_update_stats(struct e1000_adapter *adapter)
> {
>
> adapter->stats.xofftxc += E1000_READ_REG(hw, XOFFTXC);
> adapter->stats.fcruc += E1000_READ_REG(hw, FCRUC);

These are hardware stats ... presumably useless without
a detailed understanding of the guts of the e1000.

> //NOTE These stats need to be stored in the stats structure so they can
> be cleared by
> statistics monitoring programs.

I can't imagine what generic interface would allow these
to be viewed.

> /* Fill out the OS statistics structure */
>
> adapter->net_stats.rx_packets = adapter->stats.gprc;
> adapter->net_stats.tx_packets = adapter->stats.gptc;
> adapter->net_stats.rx_bytes = adapter->stats.gorcl;
> adapter->net_stats.tx_bytes = adapter->stats.gotcl;

Now *these* are generic ... and fixing this so that the
stats increment instead of over-riding would take
maybe half-an-hour or so; this is not hard to do ... !?

Do you want me to write a patch to do this?

--linas

2006-03-31 04:12:14

by Jeffrey V. Merkey

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

Linas Vepstas wrote:

>On Thu, Mar 30, 2006 at 06:05:45PM -0700, Jeff V. Merkey wrote:
>
>
>>Linas Vepstas wrote:
>>
>>
>
>Well, these comments have nothing to do with my patch, but ...
>anyway ...
>
>
>
>>The driver also needs to be fixed to allow clearing of the stats (like
>>all the other adapter drivers). At present, when I run performance
>>and packet drop counts on the cards, I cannot reset the stats with this
>>code because the driver stores them in the e100_adapter
>>structure. This is busted.
>>
>>This function:
>>
>>int clear_network_device_stats(BYTE *name)
>>
>>
>
>I couldn't find such a function in the kernel.
>
>
>
>>does not work on e1000 due to this section of code:
>>
>>void
>>e1000_update_stats(struct e1000_adapter *adapter)
>>{
>>
>>adapter->stats.xofftxc += E1000_READ_REG(hw, XOFFTXC);
>>adapter->stats.fcruc += E1000_READ_REG(hw, FCRUC);
>>
>>
>
>These are hardware stats ... presumably useless without
>a detailed understanding of the guts of the e1000.
>
>
>
>>//NOTE These stats need to be stored in the stats structure so they can
>>be cleared by
>>statistics monitoring programs.
>>
>>
>
>I can't imagine what generic interface would allow these
>to be viewed.
>
>
>
>>/* Fill out the OS statistics structure */
>>
>>adapter->net_stats.rx_packets = adapter->stats.gprc;
>>adapter->net_stats.tx_packets = adapter->stats.gptc;
>>adapter->net_stats.rx_bytes = adapter->stats.gorcl;
>>adapter->net_stats.tx_bytes = adapter->stats.gotcl;
>>
>>
>
>Now *these* are generic ... and fixing this so that the
>stats increment instead of over-riding would take
>maybe half-an-hour or so; this is not hard to do ... !?
>
>Do you want me to write a patch to do this?
>
>--lina
>
>
Yes, we need one. The adapter needs to maintain these stats from the
registers in the kernel structure and not
its own local variables. That way, when someone calls to clear the stats
for testing and analysis purposes,
they zero out and are reset.

The code fragment above is in our analysis code not the kernel, but we
use it to test performance of various
vendor cards. It's provided as an example of this capability.

Jeff

>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>


2006-03-31 05:51:46

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

On Thu, Mar 30, 2006 at 06:02:08PM -0600, Linas Vepstas wrote:
> - /* Prevent stats update while adapter is being reset */
> + /* Prevent stats update while adapter is being reset,
> + * or if the pci connection is down. */
> if (adapter->link_speed == 0)
> return;
> + if (pdev->error_state && pdev->error_state != pci_channel_io_normal)
> + return;

Coding style is still wrong here :(

(hint, use a tab...)

thanks,

greg k-h

2006-03-31 17:03:34

by linas

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

On Thu, Mar 30, 2006 at 09:14:56PM -0700, Jeffrey V. Merkey wrote:
> Yes, we need one. The adapter needs to maintain these stats from the
> registers in the kernel structure and not
> its own local variables.

Did you read the code to see what the adapter does with these stats?
Among other things, it uses them to adaptively modulate transmit rates
to avoid collisions. Just clearing the hardware-private stats will mess
up that function.

> That way, when someone calls to clear the stats
> for testing and analysis purposes,
> they zero out and are reset.

1) ifdown/ifup is guarenteed to to clear things. Try that.

2) What's wrong with taking deltas? Typical through-put performance
measurement is done by pre-loading the pipes (i.e. running for
a few minutes wihtout measuring, then starting the measurement).
I'd think that snapshotting the numbers would be easier, and is
trivially doable in user-space. I guess I don't understand why
you need a new kernel featre to imlement this.

--linas

2006-03-31 17:06:24

by linas

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

On Thu, Mar 30, 2006 at 09:46:54PM -0800, Greg KH wrote:
>
> (hint, use a tab...)

glurg.



[PATCH]: e1000: prevent statistics from getting garbled during reset.

If a PCI bus error/fault triggers a PCI bus reset, attempts to get the
ethernet packet count statistics from the hardware will fail, returning
garbage data upstream. This patch skips statistics data collection
if the PCI device is not on the bus.

This patch presumes that an earlier patch,
[PATCH] PCI Error Recovery: e1000 network device driver
has already been applied.

Signed-off-by: Linas Vepstas <[email protected]>

----
drivers/net/e1000/e1000_main.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6.16-git6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.16-git6.orig/drivers/net/e1000/e1000_main.c 2006-03-30 17:51:37.924162779 -0600
+++ linux-2.6.16-git6/drivers/net/e1000/e1000_main.c 2006-03-30 17:54:07.659188391 -0600
@@ -3069,14 +3069,18 @@ void
e1000_update_stats(struct e1000_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
+ struct pci_dev *pdev = adapter->pdev;
unsigned long flags;
uint16_t phy_tmp;

#define PHY_IDLE_ERROR_COUNT_MASK 0x00FF

- /* Prevent stats update while adapter is being reset */
+ /* Prevent stats update while adapter is being reset,
+ * or if the pci connection is down. */
if (adapter->link_speed == 0)
return;
+ if (pdev->error_state && pdev->error_state != pci_channel_io_normal)
+ return;

spin_lock_irqsave(&adapter->stats_lock, flags);

2006-03-31 17:36:32

by Rick Jones

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

> 2) What's wrong with taking deltas? Typical through-put performance
> measurement is done by pre-loading the pipes (i.e. running for
> a few minutes wihtout measuring, then starting the measurement).
> I'd think that snapshotting the numbers would be easier, and is
> trivially doable in user-space. I guess I don't understand why
> you need a new kernel featre to imlement this.

ftp://ftp.cup.hp.com/dist/networking/tools/beforeafter.tar.gz

Not my code, I've used it with success against ethtool -S output.

rick jones

2006-03-31 17:49:13

by Jeffrey V. Merkey

[permalink] [raw]
Subject: Re: [PATCH]: e1000: prevent statistics from getting garbled during reset.

Linas Vepstas wrote:

>On Thu, Mar 30, 2006 at 09:14:56PM -0700, Jeffrey V. Merkey wrote:
>
>
>>Yes, we need one. The adapter needs to maintain these stats from the
>>registers in the kernel structure and not
>>its own local variables.
>>
>>
>
>Did you read the code to see what the adapter does with these stats?
>Among other things, it uses them to adaptively modulate transmit rates
>to avoid collisions. Just clearing the hardware-private stats will mess
>up that function.
>
>
>
I noticed that.

>>That way, when someone calls to clear the stats
>>for testing and analysis purposes,
>>they zero out and are reset.
>>
>>
>
>1) ifdown/ifup is guarenteed to to clear things. Try that.
>
>
No, not dynamic. I'll patch the driver locally, thanks.

Jeff

>2) What's wrong with taking deltas? Typical through-put performance
>measurement is done by pre-loading the pipes (i.e. running for
>a few minutes wihtout measuring, then starting the measurement).
>I'd think that snapshotting the numbers would be easier, and is
>trivially doable in user-space. I guess I don't understand why
>you need a new kernel featre to imlement this.
>
>--linas
>
>
>