2003-06-16 20:20:39

by Janice M Girouard

[permalink] [raw]
Subject: patch for common networking error messages


Below is a patch that demonstrates standard messages for ethernet device
drivers. I would like your feedback on the concept of standard network
messages, and any suggestions for messages to include.

The intent of the standard message change is to:
1) Ensure key events are communicated to user space in a predictable
way, enabling automated diagnostic systems or error log analysis,
2) Reduce the number of puzzling messages that are logged -- in this
case, by replacing them with standard messages, and/or
3) Identify the device (or driver name) that is responsible for the error.

The patch includes changes for two drivers, the e1000 and tg3, to
provide a concrete example of the concept. Below is a snapshot of an
error log, with the new messages:

Jun 4 14:54:06 dyn95394175 kernel: e1000: Intel(R) PRO/1000 Network
Driver - version 5.0.43-k3
Jun 4 14:54:06 dyn95394175 kernel: Copyright (c) 1999-2003 Intel
Corporation.
Jun 4 14:54:06 dyn95394175 kernel: eth2: Intel(R) PRO/1000 Network
Connection
Jun 4 14:54:06 dyn95394175 kernel: eth2: scatter/gather I/O enabled
Jun 4 14:54:06 dyn95394175 kernel: eth2: all IP checksums on transmit
enabled
Jun 4 14:54:06 dyn95394175 kernel: eth3: Intel(R) PRO/1000 Network
Connection
Jun 4 14:54:06 dyn95394175 kernel: eth3: scatter/gather I/O enabled
Jun 4 14:54:06 dyn95394175 kernel: eth3: all IP checksums on transmit
enabled
...
Jun 4 14:54:06 dyn95394175 kernel: tg3: Broadcom Tigon3 ethernet driver
- version 1.5
...

Below is the text for the most basic standard messages:

EMSG_NET_LINK_FAIL "%s: transient problem: link error detected - MII
status %x\n"
EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n"
EMSG_NET_HUNG "%s: software failure: ethernet controller hung\n"
EMSG_NET_RX_ERR "%s: transient problem: packet receive error,
rx_errors = %ld\n"
EMSG_NET_TX_ERR "%s: transient problem: packet transmit error,
tx_errors = %ld\n"
EMSG_NET_START_QUEUE "%s: performance event: (re)starting netdev queue\n"
EMSG_NET_STOP_QUEUE "%s: performance event: stopping netdev queue\n"
EMSG_NET_SGATHER "%s: scatter/gather I/O enabled\n"
EMSG_NET_NO_SGATHER "%s: performance event: scatter/gather I/O
disabled\n"
EMSG_NET_HW_CSUMS "%s: all IP checksums on transmit enabled\n"
EMSG_NET_CSUMS "%s: TCP/UDP over IPv6 checksums on transmit
enabled\n"
EMSG_NET_NO_CSUMS "%s: performance event: IP checksums on transmit
disabled\n"

Janice Girouard
[email protected]
===================================================

diff -Naur linux-2.5.69.orig/drivers/net/e1000/e1000_hw.c
linux-2.5.69.newMsgs/drivers/net/e1000/e1000_hw.c
--- linux-2.5.69.orig/drivers/net/e1000/e1000_hw.c 2003-06-04
13:24:46.000000000 -0500
+++ linux-2.5.69.newMsgs/drivers/net/e1000/e1000_hw.c 2003-06-04
13:14:58.000000000 -0500
@@ -31,6 +31,7 @@
*/

#include "e1000_hw.h"
+#include <linux/stdmsgs.h>

static int32_t e1000_set_phy_type(struct e1000_hw *hw);
static void e1000_phy_init_script(struct e1000_hw *hw);
@@ -468,7 +469,7 @@
* be initialized based on a value in the EEPROM.
*/
if(e1000_read_eeprom(hw, EEPROM_INIT_CONTROL2_REG, 1, &eeprom_data)
< 0) {
- DEBUGOUT("EEPROM Read Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}

@@ -666,7 +667,11 @@
hw->autoneg_failed = 1;
ret_val = e1000_check_for_link(hw);
if(ret_val < 0) {
+ uint16_t mii_status_reg;
DEBUGOUT("Error while checking for link\n");
+ e1000_read_phy_reg( hw, PHY_STATUS, &mii_status_reg);
+ DEBUGOUT1(EMSG_NET_LINK_FAIL, hw->back->adapter->netdev->name,
+ mii_status_reg);
return ret_val;
}
hw->autoneg_failed = 0;
@@ -730,7 +735,7 @@
msec_delay(15);

if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PAGE_SELECT, 0x0000)
< 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -746,29 +751,29 @@
/* Disable SmartSpeed */
if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_CONFIG,
&phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data &= ~IGP01E1000_PSCFR_SMART_SPEED;
if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PORT_CONFIG,
phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
/* Set auto Master/Slave resolution process */
if(e1000_read_phy_reg(hw, PHY_1000T_CTRL, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data &= ~CR_1000T_MS_ENABLE;
if(e1000_write_phy_reg(hw, PHY_1000T_CTRL, phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
}

if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_CTRL, &phy_data)
< 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -779,14 +784,14 @@
hw->mdix = 1;

if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PORT_CTRL, phy_data)
< 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

} else {
/* Enable CRS on TX. This must be set for half-duplex operation. */
if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data |= M88E1000_PSCR_ASSERT_CRS_ON_TX;
@@ -826,7 +831,7 @@
if(hw->disable_polarity_correction == 1)
phy_data |= M88E1000_PSCR_POLARITY_REVERSAL;
if(e1000_write_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -834,7 +839,7 @@
* to 25MHz clock.
*/
if(e1000_read_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL,
&phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data |= M88E1000_EPSCR_TX_CLK_25;
@@ -847,7 +852,7 @@
M88E1000_EPSCR_SLAVE_DOWNSHIFT_1X);
if(e1000_write_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL,
phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
}
@@ -855,7 +860,7 @@
/* SW Reset the PHY so all changes take effect */
ret_val = e1000_phy_reset(hw);
if(ret_val < 0) {
- DEBUGOUT("Error Resetting the PHY\n");
+ DEBUGOUT1(EMSG_DEV_SW_RESET, hw->back->adapter->netdev->name);
return ret_val;
}
}
@@ -899,12 +904,12 @@
* the Auto Neg Restart bit in the PHY control register.
*/
if(e1000_read_phy_reg(hw, PHY_CTRL, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data |= (MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
if(e1000_write_phy_reg(hw, PHY_CTRL, phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -933,11 +938,11 @@
*/
for(i = 0; i < 10; i++) {
if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(phy_data & MII_SR_LINK_STATUS) {
@@ -988,13 +993,13 @@

/* Read the MII Auto-Neg Advertisement Register (Address 4). */
if(e1000_read_phy_reg(hw, PHY_AUTONEG_ADV, &mii_autoneg_adv_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

/* Read the MII 1000Base-T Control Register (Address 9). */
if(e1000_read_phy_reg(hw, PHY_1000T_CTRL, &mii_1000t_ctrl_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1103,14 +1108,14 @@
}

if(e1000_write_phy_reg(hw, PHY_AUTONEG_ADV, mii_autoneg_adv_reg) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

DEBUGOUT1("Auto-Neg Advertising %x\n", mii_autoneg_adv_reg);

if(e1000_write_phy_reg(hw, PHY_1000T_CTRL, mii_1000t_ctrl_reg) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
return 0;
@@ -1150,7 +1155,7 @@

/* Read the MII Control Register. */
if(e1000_read_phy_reg(hw, PHY_CTRL, &mii_ctrl_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1199,7 +1204,7 @@

if (hw->phy_type == e1000_phy_m88) {
if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1208,7 +1213,7 @@
*/
phy_data &= ~M88E1000_PSCR_AUTO_X_MODE;
if(e1000_write_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
DEBUGOUT1("M88E1000 PSCR: %x \n", phy_data);
@@ -1220,7 +1225,7 @@
* forced whenever speed or duplex are forced.
*/
if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_CTRL, &phy_data)
< 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1228,14 +1233,14 @@
phy_data &= ~IGP01E1000_PSCR_FORCE_MDI_MDIX;

if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PORT_CTRL, phy_data)
< 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
}

/* Write back the modified PHY MII control register. */
if(e1000_write_phy_reg(hw, PHY_CTRL, mii_ctrl_reg) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
udelay(1);
@@ -1258,11 +1263,11 @@
* to be set.
*/
if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(mii_status_reg & MII_SR_LINK_STATUS) break;
@@ -1285,11 +1290,11 @@
* to be set.
*/
if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
}
@@ -1301,12 +1306,12 @@
* defaults back to a 2.5MHz clock when the PHY is reset.
*/
if(e1000_read_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL,
&phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data |= M88E1000_EPSCR_TX_CLK_25;
if(e1000_write_phy_reg(hw, M88E1000_EXT_PHY_SPEC_CTRL,
phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1314,12 +1319,12 @@
* TX. This must be set for both full and half duplex operation.
*/
if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data |= M88E1000_PSCR_ASSERT_CRS_ON_TX;
if(e1000_write_phy_reg(hw, M88E1000_PHY_SPEC_CTRL, phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
}
@@ -1379,7 +1384,7 @@
*/
if (hw->phy_type == e1000_phy_igp) {
if(e1000_read_phy_reg(hw, IGP01E1000_PHY_PORT_STATUS,
&phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(phy_data & IGP01E1000_PSSR_FULL_DUPLEX) ctrl |= E1000_CTRL_FD;
@@ -1398,7 +1403,7 @@
ctrl |= E1000_CTRL_SPD_100;
} else {
if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_STATUS, &phy_data)
< 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(phy_data & M88E1000_PSSR_DPLX) ctrl |= E1000_CTRL_FD;
@@ -1533,11 +1538,11 @@
* some "sticky" (latched) bits.
*/
if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) {
- DEBUGOUT("PHY Read Error \n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_STATUS, &mii_status_reg) < 0) {
- DEBUGOUT("PHY Read Error \n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1549,11 +1554,11 @@
* negotiated.
*/
if(e1000_read_phy_reg(hw, PHY_AUTONEG_ADV,
&mii_nway_adv_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_LP_ABILITY,
&mii_nway_lp_ability_reg) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1735,11 +1740,11 @@
* Read the register twice since the link bit is sticky.
*/
if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}

@@ -1798,7 +1803,7 @@
*/
if(hw->tbi_compatibility_en) {
if(e1000_read_phy_reg(hw, PHY_LP_ABILITY, &lp_capability) <
0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(lp_capability & (NWAY_LPAR_10T_HD_CAPS |
@@ -1941,11 +1946,11 @@
* Complete bit to be set.
*/
if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(phy_data & MII_SR_AUTONEG_COMPLETE) {
@@ -2286,7 +2291,7 @@

if((hw->mac_type == e1000_82541) || (hw->mac_type == e1000_82547)) {
if(e1000_write_phy_reg(hw, IGP01E1000_PHY_PAGE_SELECT, 0x0000)
< 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return;
}

@@ -2315,12 +2320,12 @@
DEBUGFUNC("e1000_phy_reset");

if(e1000_read_phy_reg(hw, PHY_CTRL, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
phy_data |= MII_CR_RESET;
if(e1000_write_phy_reg(hw, PHY_CTRL, phy_data) < 0) {
- DEBUGOUT("PHY Write Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_WRITE, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
udelay(1);
@@ -2346,13 +2351,13 @@

/* Read the PHY ID Registers to identify which PHY is onboard. */
if(e1000_read_phy_reg(hw, PHY_ID1, &phy_id_high) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
hw->phy_id = (uint32_t) (phy_id_high << 16);
udelay(20);
if(e1000_read_phy_reg(hw, PHY_ID2, &phy_id_low) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
hw->phy_id |= (uint32_t) (phy_id_low & PHY_REVISION_MASK);
@@ -2406,7 +2411,7 @@
ret_val = 0;
} while(0);

- if(ret_val < 0) DEBUGOUT("PHY Write Error\n");
+ if(ret_val < 0) DEBUGOUT1(EMSG_DEV_PHY_WRITE,
hw->back->adapter->netdev->name);
return ret_val;
}

@@ -2566,11 +2571,11 @@
}

if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if(e1000_read_phy_reg(hw, PHY_STATUS, &phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
if((phy_data & MII_SR_LINK_STATUS) != MII_SR_LINK_STATUS) {
@@ -3121,7 +3126,7 @@

for(i = 0; i < (EEPROM_CHECKSUM_REG + 1); i++) {
if(e1000_read_eeprom(hw, i, 1, &eeprom_data) < 0) {
- DEBUGOUT("EEPROM Read Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}
checksum += eeprom_data;
@@ -3153,14 +3158,14 @@

for(i = 0; i < EEPROM_CHECKSUM_REG; i++) {
if(e1000_read_eeprom(hw, i, 1, &eeprom_data) < 0) {
- DEBUGOUT("EEPROM Read Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}
checksum += eeprom_data;
}
checksum = (uint16_t) EEPROM_SUM - checksum;
if(e1000_write_eeprom(hw, EEPROM_CHECKSUM_REG, 1, &checksum) < 0) {
- DEBUGOUT("EEPROM Write Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_WRITE, hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}
return 0;
@@ -3381,7 +3386,7 @@

/* Get word 0 from EEPROM */
if(e1000_read_eeprom(hw, offset, 1, &eeprom_data) < 0) {
- DEBUGOUT("EEPROM Read Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}
/* Save word 0 in upper half of part_num */
@@ -3389,7 +3394,7 @@

/* Get word 1 from EEPROM */
if(e1000_read_eeprom(hw, ++offset, 1, &eeprom_data) < 0) {
- DEBUGOUT("EEPROM Read Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}
/* Save word 1 in lower half of part_num */
@@ -3415,7 +3420,7 @@
for(i = 0; i < NODE_ADDRESS_SIZE; i += 2) {
offset = i >> 1;
if(e1000_read_eeprom(hw, offset, 1, &eeprom_data) < 0) {
- DEBUGOUT("EEPROM Read Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_READ,
hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}
hw->perm_mac_addr[i] = (uint8_t) (eeprom_data & 0x00FF);
@@ -3715,7 +3720,7 @@
hw->ledctl_mode2 = hw->ledctl_default;

if(e1000_read_eeprom(hw, EEPROM_ID_LED_SETTINGS, 1, &eeprom_data) <
0) {
- DEBUGOUT("EEPROM Read Error\n");
+ DEBUGOUT1(EMSG_DEV_EEPROM_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_EEPROM;
}
if((eeprom_data== ID_LED_RESERVED_0000) ||
@@ -3807,7 +3812,7 @@
E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_mode1);
break;
default:
- DEBUGOUT("Invalid device ID\n");
+ DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name);
return -E1000_ERR_CONFIG;
}
return 0;
@@ -3849,7 +3854,7 @@
E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_default);
break;
default:
- DEBUGOUT("Invalid device ID\n");
+ DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name);
return -E1000_ERR_CONFIG;
}
return 0;
@@ -3902,7 +3907,7 @@
E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_mode2);
break;
default:
- DEBUGOUT("Invalid device ID\n");
+ DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name);
return -E1000_ERR_CONFIG;
}
return 0;
@@ -3955,7 +3960,7 @@
E1000_WRITE_REG(hw, LEDCTL, hw->ledctl_mode1);
break;
default:
- DEBUGOUT("Invalid device ID\n");
+ DEBUGOUT1(EMSG_PCI_BAD_ID, hw->back->adapter->netdev->name);
return -E1000_ERR_CONFIG;
}
return 0;
@@ -4468,14 +4473,14 @@

if(hw->phy_type == e1000_phy_igp) {
if(e1000_read_phy_reg(hw, IGP01E1000_PHY_LINK_HEALTH,
&phy_data) < 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
hw->speed_downgraded = (phy_data &
IGP01E1000_PLHR_SS_DOWNGRADE) ? 1 : 0;
}
else if(hw->phy_type == e1000_phy_m88) {
if(e1000_read_phy_reg(hw, M88E1000_PHY_SPEC_STATUS, &phy_data)
< 0) {
- DEBUGOUT("PHY Read Error\n");
+ DEBUGOUT1(EMSG_DEV_PHY_READ, hw->back->adapter->netdev->name);
return -E1000_ERR_PHY;
}
hw->speed_downgraded = (phy_data & M88E1000_PSSR_DOWNSHIFT) >>
diff -Naur linux-2.5.69.orig/drivers/net/e1000/e1000_main.c
linux-2.5.69.newMsgs/drivers/net/e1000/e1000_main.c
--- linux-2.5.69.orig/drivers/net/e1000/e1000_main.c 2003-06-04
13:24:46.000000000 -0500
+++ linux-2.5.69.newMsgs/drivers/net/e1000/e1000_main.c 2003-06-04
13:14:58.000000000 -0500
@@ -27,6 +27,7 @@
*******************************************************************************/

#include "e1000.h"
+#include <linux/stdmsgs.h>

/* Change Log
*
@@ -228,10 +229,11 @@
e1000_init_module(void)
{
int ret;
- printk(KERN_INFO "%s - version %s\n",
- e1000_driver_string, e1000_driver_version);

- printk(KERN_INFO "%s\n", e1000_copyright);
+ printk(KERN_INFO EMSG_BASICS, e1000_driver_name ,
+ e1000_driver_string , e1000_driver_version);
+
+ printk(KERN_INFO " %s\n", e1000_copyright);

ret = pci_module_init(&e1000_driver);
if(ret >= 0)
@@ -508,6 +510,22 @@
netif_stop_queue(netdev);

printk(KERN_INFO "%s: %s\n", netdev->name, adapter->id_string);
+
+ if (netdev->features & NETIF_F_SG)
+ printk(KERN_INFO EMSG_NET_SGATHER, netdev->name);
+ else
+ printk(KERN_INFO EMSG_NET_NO_SGATHER, netdev->name);
+
+ if (netdev->features & NETIF_F_HW_CSUM) {
+ printk(KERN_INFO EMSG_NET_HW_CSUMS, netdev->name);
+ } else {
+ if (netdev->features & NETIF_F_IP_CSUM)
+ printk(KERN_INFO EMSG_NET_CSUMS, netdev->name);
+ else
+ printk(KERN_INFO EMSG_NET_NO_CSUMS, netdev->name);
+ }
+
+
e1000_check_options(adapter);

/* Initial Wake on LAN setting
@@ -597,9 +615,11 @@
hw->subsystem_vendor_id = pdev->subsystem_vendor;
hw->subsystem_id = pdev->subsystem_device;

- pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id);
+ if (pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id))
+ printk(KERN_ERR EMSG_PCI_READ, netdev->name);

- pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);
+ if (pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word))
+ printk(KERN_ERR EMSG_PCI_READ, netdev->name);

adapter->rx_buffer_len = E1000_RXBUFFER_2048;
hw->max_frame_size = netdev->mtu +
@@ -1334,6 +1354,7 @@

adapter->tx_fifo_head = 0;
atomic_set(&adapter->tx_fifo_stall, 0);
+ printk(KERN_INFO EMSG_NET_START_QUEUE, netdev->name );
netif_wake_queue(netdev);
} else {
mod_timer(&adapter->tx_fifo_stall_timer, jiffies + 1);
@@ -1361,13 +1382,10 @@
e1000_get_speed_and_duplex(&adapter->hw,
&adapter->link_speed,
&adapter->link_duplex);
-
- printk(KERN_INFO
- "e1000: %s NIC Link is Up %d Mbps %s\n",
+ printk(KERN_INFO EMSG_NET_LINK_UP,
netdev->name, adapter->link_speed,
adapter->link_duplex == FULL_DUPLEX ?
- "Full Duplex" : "Half Duplex");
-
+ "full" : "half");
netif_carrier_on(netdev);
netif_wake_queue(netdev);
mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
@@ -1375,13 +1393,14 @@
}
} else {
if(netif_carrier_ok(netdev)) {
+ uint16_t mii_status_reg;
adapter->link_speed = 0;
adapter->link_duplex = 0;
- printk(KERN_INFO
- "e1000: %s NIC Link is Down\n",
- netdev->name);
+ e1000_read_phy_reg(&adapter->hw, PHY_STATUS,
+ &mii_status_reg);
+ printk(KERN_INFO EMSG_NET_LINK_FAIL,
+ netdev->name, mii_status_reg);
netif_carrier_off(netdev);
- netif_stop_queue(netdev);
mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
}

@@ -1420,9 +1439,21 @@
i = txdr->next_to_clean;
if(txdr->buffer_info[i].dma &&
time_after(jiffies, txdr->buffer_info[i].time_stamp + HZ) &&
- !(E1000_READ_REG(&adapter->hw, STATUS) & E1000_STATUS_TXOFF))
+ !(E1000_READ_REG(&adapter->hw, STATUS) & E1000_STATUS_TXOFF)) {
+ printk(KERN_INFO EMSG_NET_HUNG, netdev->name );
netif_stop_queue(netdev);
+ }

+ /*
+ * Need to add some code here to see if an individual tx has timed out.
+ * Right now we only look for hangs when the entire tx buffer fills up
+ * and there is nowhere to put an in-bound transmit packet. Then we
+ * could log the following error:
+ *
+ * netdev_err(netdev, EMSG_NET_TX_ERR);
+ *
+ */
+
/* Reset the timer */
mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
}
@@ -1697,12 +1728,14 @@
}

if(E1000_DESC_UNUSED(&adapter->tx_ring) < DESC_NEEDED) {
+ printk(KERN_INFO EMSG_NET_STOP_QUEUE, netdev->name );
netif_stop_queue(netdev);
return 1;
}

if(adapter->hw.mac_type == e1000_82547) {
if(e1000_82547_fifo_workaround(adapter, skb)) {
+ printk(KERN_INFO EMSG_NET_STOP_QUEUE, netdev->name );
netif_stop_queue(netdev);
mod_timer(&adapter->tx_fifo_stall_timer, jiffies);
return 1;
@@ -1828,6 +1861,8 @@
struct e1000_hw *hw = &adapter->hw;
unsigned long flags;
uint16_t phy_tmp;
+ unsigned long rx_errors;
+ unsigned long tx_errors;

#define PHY_IDLE_ERROR_COUNT_MASK 0x00FF

@@ -1920,10 +1955,14 @@

/* Rx Errors */

+ rx_errors = adapter->net_stats.rx_errors;
adapter->net_stats.rx_errors = adapter->stats.rxerrc +
adapter->stats.crcerrs + adapter->stats.algnerrc +
adapter->stats.rlec + adapter->stats.rnbc +
adapter->stats.mpc + adapter->stats.cexterr;
+ if (rx_errors != adapter->net_stats.rx_errors)
+ printk(KERN_INFO EMSG_NET_RX_ERR, adapter->ifname,
+ adapter->net_stats.rx_errors);
adapter->net_stats.rx_dropped = adapter->stats.rnbc;
adapter->net_stats.rx_length_errors = adapter->stats.rlec;
adapter->net_stats.rx_crc_errors = adapter->stats.crcerrs;
@@ -1933,8 +1972,12 @@

/* Tx Errors */

+ tx_errors = adapter->net_stats.tx_errors;
adapter->net_stats.tx_errors = adapter->stats.ecol +
adapter->stats.latecol;
+ if ( tx_errors != adapter->net_stats.tx_errors)
+ printk(KERN_INFO EMSG_NET_TX_ERR, adapter->ifname,
+ adapter->net_stats.tx_errors);
adapter->net_stats.tx_aborted_errors = adapter->stats.ecol;
adapter->net_stats.tx_window_errors = adapter->stats.latecol;
adapter->net_stats.tx_carrier_errors = adapter->stats.tncrs;
@@ -2105,8 +2148,10 @@

tx_ring->next_to_clean = i;

- if(cleaned && netif_queue_stopped(netdev) && netif_carrier_ok(netdev))
+ if(cleaned && netif_queue_stopped(netdev) &&
netif_carrier_ok(netdev)) {
+ printk(KERN_INFO EMSG_NET_START_QUEUE, netdev->name );
netif_wake_queue(netdev);
+ }

return cleaned;
}
@@ -2516,7 +2561,8 @@
{
struct e1000_adapter *adapter = hw->back;

- pci_read_config_word(adapter->pdev, reg, value);
+ if (pci_read_config_word(adapter->pdev, reg, value))
+ printk(KERN_ERR EMSG_PCI_READ, adapter->netdev->name);
}

void
@@ -2524,7 +2570,8 @@
{
struct e1000_adapter *adapter = hw->back;

- pci_write_config_word(adapter->pdev, reg, *value);
+ if (pci_write_config_word(adapter->pdev, reg, *value))
+ printk(KERN_ERR EMSG_PCI_WRITE, adapter->netdev->name);
}

uint32_t
diff -Naur linux-2.5.69.orig/drivers/net/e1000/e1000_param.c
linux-2.5.69.newMsgs/drivers/net/e1000/e1000_param.c
--- linux-2.5.69.orig/drivers/net/e1000/e1000_param.c 2003-06-04
13:24:46.000000000 -0500
+++ linux-2.5.69.newMsgs/drivers/net/e1000/e1000_param.c 2003-06-04
13:14:58.000000000 -0500
@@ -27,6 +27,7 @@
*******************************************************************************/

#include "e1000.h"
+#include <linux/stdmsgs.h>

/* This is the only thing that needs to be changed to adjust the
* maximum number of ports that the driver can manage.
@@ -244,7 +245,8 @@
};

static int __devinit
-e1000_validate_option(int *value, struct e1000_option *opt)
+e1000_validate_option(struct e1000_adapter *adapter, int *value,
+ struct e1000_option *opt)
{
if(*value == OPTION_UNSET) {
*value = opt->def;
@@ -255,16 +257,19 @@
case enable_option:
switch (*value) {
case OPTION_ENABLED:
- printk(KERN_INFO "%s Enabled\n", opt->name);
+ printk(KERN_INFO EMSG_DEV_CFG_ENABLED,
+ adapter->netdev->name, opt->name);
return 0;
case OPTION_DISABLED:
- printk(KERN_INFO "%s Disabled\n", opt->name);
+ printk(KERN_INFO EMSG_DEV_CFG_DISABLED,
+ adapter->netdev->name, opt->name);
return 0;
}
break;
case range_option:
if(*value >= opt->arg.r.min && *value <= opt->arg.r.max) {
- printk(KERN_INFO "%s set to %i\n", opt->name, *value);
+ printk(KERN_INFO EMSG_DEV_CFG_ISET,
+ adapter->netdev->name, opt->name, *value);
return 0;
}
break;
@@ -330,7 +335,7 @@
MAX_TXD : MAX_82544_TXD;

tx_ring->count = TxDescriptors[bd];
- e1000_validate_option(&tx_ring->count, &opt);
+ e1000_validate_option(adapter, &tx_ring->count, &opt);
E1000_ROUNDUP(tx_ring->count, REQ_TX_DESCRIPTOR_MULTIPLE);
}
{ /* Receive Descriptor Count */
@@ -346,7 +351,7 @@
opt.arg.r.max = mac_type < e1000_82544 ? MAX_RXD : MAX_82544_RXD;

rx_ring->count = RxDescriptors[bd];
- e1000_validate_option(&rx_ring->count, &opt);
+ e1000_validate_option(adapter, &rx_ring->count, &opt);
E1000_ROUNDUP(rx_ring->count, REQ_RX_DESCRIPTOR_MULTIPLE);
}
{ /* Checksum Offload Enable/Disable */
@@ -358,7 +363,7 @@
};

int rx_csum = XsumRX[bd];
- e1000_validate_option(&rx_csum, &opt);
+ e1000_validate_option(adapter, &rx_csum, &opt);
adapter->rx_csum = rx_csum;
}
{ /* Flow Control */
@@ -380,7 +385,7 @@
};

int fc = FlowControl[bd];
- e1000_validate_option(&fc, &opt);
+ e1000_validate_option(adapter, &fc, &opt);
adapter->hw.fc = adapter->hw.original_fc = fc;
}
{ /* Transmit Interrupt Delay */
@@ -394,7 +399,7 @@
};

adapter->tx_int_delay = TxIntDelay[bd];
- e1000_validate_option(&adapter->tx_int_delay, &opt);
+ e1000_validate_option(adapter, &adapter->tx_int_delay, &opt);
}
{ /* Transmit Absolute Interrupt Delay */
struct e1000_option opt = {
@@ -407,7 +412,7 @@
};

adapter->tx_abs_int_delay = TxAbsIntDelay[bd];
- e1000_validate_option(&adapter->tx_abs_int_delay, &opt);
+ e1000_validate_option(adapter, &adapter->tx_abs_int_delay, &opt);
}
{ /* Receive Interrupt Delay */
struct e1000_option opt = {
@@ -420,7 +425,7 @@
};

adapter->rx_int_delay = RxIntDelay[bd];
- e1000_validate_option(&adapter->rx_int_delay, &opt);
+ e1000_validate_option(adapter, &adapter->rx_int_delay, &opt);
}
{ /* Receive Absolute Interrupt Delay */
struct e1000_option opt = {
@@ -433,7 +438,7 @@
};

adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
- e1000_validate_option(&adapter->rx_abs_int_delay, &opt);
+ e1000_validate_option(adapter, &adapter->rx_abs_int_delay, &opt);
}
{ /* Interrupt Throttling Rate */
struct e1000_option opt = {
@@ -452,7 +457,7 @@
/* Dynamic mode */
adapter->itr = 1;
} else {
- e1000_validate_option(&adapter->itr, &opt);
+ e1000_validate_option(adapter, &adapter->itr, &opt);
}
}

@@ -525,7 +530,7 @@
};

speed = Speed[bd];
- e1000_validate_option(&speed, &opt);
+ e1000_validate_option(adapter, &speed, &opt);
}
{ /* Duplex */
struct e1000_opt_list dplx_list[] = {{ 0, "" },
@@ -542,7 +547,7 @@
};

dplx = Duplex[bd];
- e1000_validate_option(&dplx, &opt);
+ e1000_validate_option(adapter, &dplx, &opt);
}

if(AutoNeg[bd] != OPTION_UNSET && (speed != 0 || dplx != 0)) {
@@ -595,7 +600,7 @@
};

int an = AutoNeg[bd];
- e1000_validate_option(&an, &opt);
+ e1000_validate_option(adapter, &an, &opt);
adapter->hw.autoneg_advertised = an;
}

diff -Naur linux-2.5.69.orig/drivers/net/tg3.c
linux-2.5.69.newMsgs/drivers/net/tg3.c
--- linux-2.5.69.orig/drivers/net/tg3.c 2003-06-04 13:24:35.000000000
-0500
+++ linux-2.5.69.newMsgs/drivers/net/tg3.c 2003-06-04
13:15:45.000000000 -0500
@@ -26,6 +26,7 @@
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/workqueue.h>
+#include <linux/stdmsgs.h>

#include <asm/system.h>
#include <asm/io.h>
@@ -347,6 +348,7 @@
udelay(40);
}

+ if (ret) printk(KERN_ERR EMSG_DEV_PHY_READ, tp->dev->name);
return ret;
}

@@ -393,6 +395,7 @@
udelay(40);
}

+ if (ret) printk(KERN_ERR EMSG_DEV_PHY_WRITE, tp->dev->name);
return ret;
}

@@ -634,9 +637,11 @@
static void tg3_link_report(struct tg3 *tp)
{
if (!netif_carrier_ok(tp->dev)) {
- printk(KERN_INFO PFX "%s: Link is down.\n", tp->dev->name);
+ u32 mii_regval;
+ tg3_readphy(tp, MII_TG3_PHY_STAT, &mii_regval);
+ printk(KERN_INFO EMSG_NET_LINK_FAIL, tp->dev->name, mii_regval);
} else {
- printk(KERN_INFO PFX "%s: Link is up at %d Mbps, %s duplex.\n",
+ printk(KERN_INFO EMSG_NET_LINK_UP,
tp->dev->name,
(tp->link_config.active_speed == SPEED_1000 ?
1000 :
@@ -1780,8 +1785,10 @@
tp->tx_cons = sw_idx;

if (netif_queue_stopped(tp->dev) &&
- (TX_BUFFS_AVAIL(tp) > TG3_TX_WAKEUP_THRESH))
+ (TX_BUFFS_AVAIL(tp) > TG3_TX_WAKEUP_THRESH)) {
+ printk(KERN_INFO EMSG_NET_START_QUEUE, tp->dev->name);
netif_wake_queue(tp->dev);
+ }
}

/* Returns size of skb allocated or < 0 on error.
@@ -2580,8 +2587,10 @@
}

tp->tx_prod = entry;
- if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1))
+ if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) {
+ printk(KERN_INFO EMSG_NET_STOP_QUEUE, dev->name);
netif_stop_queue(dev);
+ }

out_unlock:
spin_unlock_irqrestore(&tp->tx_lock, flags);
@@ -2727,8 +2736,10 @@
}

tp->tx_prod = entry;
- if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1))
+ if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1)) {
+ printk(KERN_INFO EMSG_NET_STOP_QUEUE, dev->name);
netif_stop_queue(dev);
+ }

spin_unlock_irqrestore(&tp->tx_lock, flags);

@@ -3436,10 +3447,7 @@
}

if (i >= 10000) {
- printk(KERN_ERR PFX "tg3_reset_cpu timed out for %s, "
- "and %s CPU\n",
- tp->dev->name,
- (offset == RX_CPU_BASE ? "RX" : "TX"));
+ printk(KERN_ERR EMSG_DEV_SW_RESET, tp->dev->name);
return -ENODEV;
}
return 0;
@@ -4462,7 +4470,7 @@
tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);

err = tg3_reset_hw(tp);
-
+ if (err) printk( KERN_ERR EMSG_DEV_SW_RESET, tp->dev->name);
out:
return err;
}
@@ -4963,11 +4971,15 @@

stats->rx_errors = old_stats->rx_errors +
get_stat64(&hw_stats->rx_errors);
+ if (stats->rx_errors != old_stats->rx_errors)
+ printk(KERN_INFO EMSG_NET_RX_ERR, dev->name, stats->rx_errors);
stats->tx_errors = old_stats->tx_errors +
get_stat64(&hw_stats->tx_errors) +
get_stat64(&hw_stats->tx_mac_errors) +
get_stat64(&hw_stats->tx_carrier_sense_errors) +
get_stat64(&hw_stats->tx_discards);
+ if (stats->tx_errors != old_stats->tx_errors)
+ printk(KERN_INFO EMSG_NET_TX_ERR, dev->name, stats->tx_errors);

stats->multicast = old_stats->multicast +
get_stat64(&hw_stats->rx_mcast_packets);
@@ -5661,8 +5673,10 @@
int i;

if (offset > EEPROM_ADDR_ADDR_MASK ||
- (offset % 4) != 0)
+ (offset % 4) != 0) {
+ printk(KERN_ERR EMSG_DEV_EEPROM_READ, tp->dev->name);
return -EINVAL;
+ }

tmp = tr32(GRC_EEPROM_ADDR) & ~(EEPROM_ADDR_ADDR_MASK |
EEPROM_ADDR_DEVID_MASK |
@@ -5681,8 +5695,10 @@
break;
udelay(100);
}
- if (!(tmp & EEPROM_ADDR_COMPLETE))
+ if (!(tmp & EEPROM_ADDR_COMPLETE)) {
+ printk(KERN_ERR EMSG_NET_HUNG, tp->dev->name);
return -EBUSY;
+ }

*val = tr32(GRC_EEPROM_DATA);
return 0;
@@ -5875,8 +5891,10 @@
*/
if (tp->phy_id == PHY_ID_INVALID) {
if (!eeprom_signature_found ||
- !KNOWN_PHY_ID(eeprom_phy_id & PHY_ID_MASK))
+ !KNOWN_PHY_ID(eeprom_phy_id & PHY_ID_MASK)) {
+ printk(KERN_ERR EMSG_PCI_BAD_ID, tp->dev->name);
return -ENODEV;
+ }
tp->phy_id = eeprom_phy_id;
}
}
@@ -5953,6 +5971,7 @@
~(ADVERTISED_1000baseT_Half |
ADVERTISED_1000baseT_Full);

+ if (err) printk(KERN_ERR EMSG_DEV_PHY_READ, tp->dev->name);
return err;
}

@@ -6046,9 +6065,11 @@
* workaround but turns MWI off all the times so never uses
* it. This seems to suggest that the workaround is insufficient.
*/
- pci_read_config_word(tp->pdev, PCI_COMMAND, &pci_cmd);
+ if (pci_read_config_word(tp->pdev, PCI_COMMAND, &pci_cmd))
+ printk(KERN_ERR EMSG_PCI_READ, tp->dev->name);
pci_cmd &= ~PCI_COMMAND_INVALIDATE;
- pci_write_config_word(tp->pdev, PCI_COMMAND, pci_cmd);
+ if (pci_write_config_word(tp->pdev, PCI_COMMAND, pci_cmd))
+ printk(KERN_ERR EMSG_PCI_WRITE, tp->dev->name);

/* It is absolutely critical that TG3PCI_MISC_HOST_CTRL
* has the register indirect write enable bit set before
@@ -6056,8 +6077,9 @@
* critical that the PCI-X hw workaround situation is decided
* before that as well.
*/
- pci_read_config_dword(tp->pdev, TG3PCI_MISC_HOST_CTRL,
- &misc_ctrl_reg);
+ if (pci_read_config_dword(tp->pdev, TG3PCI_MISC_HOST_CTRL,
+ &misc_ctrl_reg))
+ printk(KERN_ERR EMSG_PCI_READ, tp->dev->name);

tp->pci_chip_rev_id = (misc_ctrl_reg >>
MISC_HOST_CTRL_CHIPREV_SHIFT);
@@ -6870,6 +6892,20 @@
} else
tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;

+ if (dev->features & NETIF_F_SG)
+ printk(KERN_INFO EMSG_NET_SGATHER, dev->name);
+ else
+ printk(KERN_INFO EMSG_NET_NO_SGATHER, dev->name);
+
+ if (dev->features & NETIF_F_HW_CSUM) {
+ printk(KERN_INFO EMSG_NET_HW_CSUMS, dev->name);
+ } else {
+ if (dev->features & NETIF_F_IP_CSUM)
+ printk(KERN_INFO EMSG_NET_CSUMS, dev->name);
+ else
+ printk(KERN_INFO EMSG_NET_NO_CSUMS, dev->name);
+ }
+
#if TG3_DO_TSO != 0
if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5700 ||
(GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701 &&
@@ -7027,6 +7063,8 @@

static int __init tg3_init(void)
{
+ printk(KERN_INFO EMSG_BASICS, DRV_MODULE_NAME,
+ "Broadcom Tigon3 ethernet driver" , DRV_MODULE_VERSION);
return pci_module_init(&tg3_driver);
}

diff -Naur linux-2.5.69.orig/drivers/net/tg3.h
linux-2.5.69.newMsgs/drivers/net/tg3.h
--- linux-2.5.69.orig/drivers/net/tg3.h 2003-06-04 13:24:35.000000000
-0500
+++ linux-2.5.69.newMsgs/drivers/net/tg3.h 2003-06-04
13:15:45.000000000 -0500
@@ -1319,6 +1319,8 @@
/* Tigon3 specific PHY MII registers. */
#define TG3_BMCR_SPEED1000 0x0040

+#define MII_TG3_PHY_STAT 0x01 /* Status Register*/
+
#define MII_TG3_CTRL 0x09 /* 1000-baseT control register */
#define MII_TG3_CTRL_ADV_1000_HALF 0x0100
#define MII_TG3_CTRL_ADV_1000_FULL 0x0200
diff -Naur linux-2.5.69.orig/include/linux/stdmsgs.h
linux-2.5.69.newMsgs/include/linux/stdmsgs.h
--- linux-2.5.69.orig/include/linux/stdmsgs.h 1969-12-31
18:00:00.000000000 -0600
+++ linux-2.5.69.newMsgs/include/linux/stdmsgs.h 2003-06-10
10:44:12.000000000 -0500
@@ -0,0 +1,57 @@
+#ifndef _STDMSGS_
+#define _STDMSGS_
+
+/*
+ * Some common error messages for logging.
+ *
+ * Note: the "%s:" text preceeding each message
+ * is used to describe the device name for
+ * messages unique to a specific piece of h/w,
+ * or the device driver name otherwise.
+ *
+
+/*********************************************************
+ * common system errors/msgs
+ */
+#define EMSG_BASICS "%s: %s - version %s\n"
+#define EMGS_NOMEM
+
+
+/*********************************************************
+ * device errors/msgs
+ */
+#define EMSG_DEV_EEPROM_READ "%s: hardware failure: EEPROM read error\n"
+#define EMSG_DEV_EEPROM_WRITE "%s: hardware failure: EEPROM write
error\n"
+#define EMSG_DEV_PHY_READ "%s: hardware failure: read error on
physical interface\n"
+#define EMSG_DEV_PHY_WRITE "%s: hardware failure: write error on
physical interface\n"
+#define EMSG_DEV_SW_RESET "%s: software failure: unable to reset
device \n"
+#define EMSG_DEV_CFG_ENABLED "%s: configuration note: %s enabled\n"
+#define EMSG_DEV_CFG_DISABLED "%s: configuration note: %s disabled\n"
+#define EMSG_DEV_CFG_ISET "%s: configuration note: %s set to %i\n"
+
+
+
+/*********************************************************
+ * network errors/msgs
+ */
+#define EMSG_NET_LINK_FAIL "%s: transient problem: link error
detected - MII status %x\n"
+#define EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps,
%s-duplex\n"
+#define EMSG_NET_HUNG "%s: software failure: ethernet controller
hung\n"
+#define EMSG_NET_RX_ERR "%s: transient problem: packet receive
error, rx_errors = %ld\n"
+#define EMSG_NET_TX_ERR "%s: transient problem: packet transmit
error, tx_errors = %ld\n"
+#define EMSG_NET_START_QUEUE "%s: performance event: (re)starting
netdev queue\n"
+#define EMSG_NET_STOP_QUEUE "%s: performance event: stopping netdev
queue\n"
+#define EMSG_NET_SGATHER "%s: scatter/gather I/O enabled\n"
+#define EMSG_NET_NO_SGATHER "%s: performance event: scatter/gather
I/O disabled\n"
+#define EMSG_NET_HW_CSUMS "%s: all IP checksums on transmit
enabled\n"
+#define EMSG_NET_CSUMS "%s: TCP/UDP over IPv6 checksums on
transmit enabled\n"
+#define EMSG_NET_NO_CSUMS "%s: performance event: IP checksums on
transmit disabled\n"
+
+/*********************************************************
+ * PCI device errors/msgs
+ */
+#define EMSG_PCI_BAD_ID "%s: invalid device ID in PCI config
header\n"
+#define EMSG_PCI_READ "%s: hardware failure: PCI read error \n"
+#define EMSG_PCI_WRITE "%s: hardware failure: PCI write error \n"
+
+#endif /* _STDMSGS_ */






2003-06-16 20:29:08

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice M Girouard <[email protected]>
Date: Mon, 16 Jun 2003 15:30:22 -0500

EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n"

Should indicate flow control state too.

EMSG_NET_START_QUEUE "%s: performance event: (re)starting netdev queue\n"
EMSG_NET_STOP_QUEUE "%s: performance event: stopping netdev queue\n"

Oh _ABSOLUTELY NOT_, you're not printing a message
for normal events like this. Especially those that are
going to occur on highly loaded systems.

2003-06-16 20:39:52

by Andi Kleen

[permalink] [raw]
Subject: Re: patch for common networking error messages

On Mon, Jun 16, 2003 at 01:38:41PM -0700, David S. Miller wrote:
> From: Janice M Girouard <[email protected]>
> Date: Mon, 16 Jun 2003 15:30:22 -0500
>
> EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n"
>
> Should indicate flow control state too.

It would be actually useful to wrap these in real functions.

Why? It will make supporting netconsole easier which has to be careful
to never recurse in the network driver.

-Andi

2003-06-16 20:42:07

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Andi Kleen <[email protected]>
Date: Mon, 16 Jun 2003 22:53:42 +0200

Why? It will make supporting netconsole easier which has to be careful
to never recurse in the network driver.

printk can check this

2003-06-16 20:48:25

by Janice M Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages

I agree that it's not desirable to introduce a bunch of messages that we
aren't already logging. I didn't show the netif_msg prefix because I
was trying to focus the patch on the common messages, but you would
normally proceed a message with:

if netif_msg_link()
printk("some text to indicate the link is up/down")

The netif_msg_link test would normally filter out what messages should
be logged.

Or, just leave out the message call. I added one or two messages to the
tg3 and e1000 drivers to demonstrate where you might use these common
messages... just to show that various drivers could use the text.
Actually using the specific message would be completely up to the
developer.

Jaince



David S. Miller wrote:

> From: Janice M Girouard <[email protected]>
> Date: Mon, 16 Jun 2003 15:30:22 -0500
>
> EMSG_NET_LINK_UP "%s: state change: link up, %d Mbps, %s-duplex\n"
>
>Should indicate flow control state too.
>
> EMSG_NET_START_QUEUE "%s: performance event: (re)starting netdev queue\n"
> EMSG_NET_STOP_QUEUE "%s: performance event: stopping netdev queue\n"
>
>Oh _ABSOLUTELY NOT_, you're not printing a message
>for normal events like this. Especially those that are
>going to occur on highly loaded systems.
>
>
>


2003-06-16 22:00:49

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: patch for common networking error messages

Janice M Girouard wrote:

> Below is a patch that demonstrates standard messages for ethernet device
> drivers. I would like your feedback on the concept of standard network
> messages, and any suggestions for messages to include.

Very useful! I'd like to see a short note on this in Documentation/
networking..Or perhaps if there is already a RAS best practices
kind of doc or something, add to that? (sorry, havent checked)
But it would be handy for people who wanted to contribute
patches for other drivers.

Essentially, things like some guidelines on classifying some
of those messages, when creating new messages. eg when is
something a state change and when is it a performance event?
I notice some slight ambiguity in your defs..(sorry, very
minor nitpick :)).

I'd certainly like to see messages from the driver when the
card enters/leaves promiscuous mode, as an example of things
we'd like to add...

thanks,
Nivedita


> The intent of the standard message change is to:
> 1) Ensure key events are communicated to user space in a predictable
> way, enabling automated diagnostic systems or error log analysis,
> 2) Reduce the number of puzzling messages that are logged -- in this
> case, by replacing them with standard messages, and/or
> 3) Identify the device (or driver name) that is responsible for the error.

2003-06-16 22:03:37

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Nivedita Singhvi <[email protected]>
Date: Mon, 16 Jun 2003 15:13:05 -0700

I'd certainly like to see messages from the driver when the
card enters/leaves promiscuous mode,

egrep "promiscuous mode" net/core/dev.c | grep printk

2003-06-16 22:12:42

by Andrew Morton

[permalink] [raw]
Subject: Re: patch for common networking error messages

"David S. Miller" <[email protected]> wrote:
>
> From: Andi Kleen <[email protected]>
> Date: Mon, 16 Jun 2003 22:53:42 +0200
>
> Why? It will make supporting netconsole easier which has to be careful
> to never recurse in the network driver.
>
> printk can check this

Actually it already does, to cover the case where an interrupt handler calls
printk while process-context code is performing a printk.

The nested printk will squirt the message into the log buffer and the
"outer" code will display it.

2003-06-16 22:15:47

by Janice Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages


From: "David S. Miller" <[email protected]>
Date : 06/16/2003 05:13 PM

egrep "promiscuous mode" net/core/dev.c | grep printk

I noticed when I performed the grep, the printk shows:
printk(KERN_INFO "device %s %s promiscuous mode\n"

For the sake of consistency and automatic error log analysis, it might be
nice to standardize on a message closer to:
printk(KERN_INFO "%s: %s promiscuous mode\n"

It's somewhat common, but not universal to start the error message with the
device name followed by a colon.

Janice



2003-06-16 22:18:16

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice Girouard <[email protected]>
Date: Mon, 16 Jun 2003 17:29:15 -0500

For the sake of consistency and automatic error log analysis, it might be

And all the scripts checking for the existing messages
in log files? Screw them, right?

This whole idea is starting to leave a very bad taste in
my mouth...

2003-06-16 22:32:28

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: patch for common networking error messages

David S. Miller wrote:
> From: Janice Girouard <[email protected]>
> Date: Mon, 16 Jun 2003 17:29:15 -0500
>
> For the sake of consistency and automatic error log analysis, it might be
>
> And all the scripts checking for the existing messages
> in log files? Screw them, right?

Are you saying we never get to change any current
log messages ever again on accnt of the scripts that are
monitoring for those precise words? Hope not :)

I'd agree a lot of thought (and agreement :))has to go
into this before changing minor nits and stuff, and not
causing too much disruption..Evolution, as opposed to
revolution ;). I would hope that most wouldnt need changing..

thanks,
Nivedita



2003-06-16 22:34:36

by Jeff Garzik

[permalink] [raw]
Subject: Re: patch for common networking error messages

Janice M Girouard wrote:
> I agree that it's not desirable to introduce a bunch of messages that we
> aren't already logging. I didn't show the netif_msg prefix because I
> was trying to focus the patch on the common messages, but you would
> normally proceed a message with:
>
> if netif_msg_link()
> printk("some text to indicate the link is up/down")
>
> The netif_msg_link test would normally filter out what messages should
> be logged.


There are several issues at play here.

1) In general, I think you're approaching the logging from the wrong
angle. Start with netif_msg_xxx/NETIF_MSG_xxx first, and figure out the
logging API for those cases. These cover the majority of common cases,
and most are not specific to hardware at all. Starting at the driver
level and trying to move driver-specific messages into the upper layers
is the wrong direction, I feel.

2) If we are going to do major surgery on messages, make them more
computer-parseable at the same time. Human readable, since it must
above-all-else be kernel hacker readable, ... but computer parseable.

Here is an example. DISCLAIMER: No doubt there is a better format, it
is merely for illustration.
"%s: performance event: scatter/gather I/O disabled\n"
becomes
"dev=%s evt=perf sgio=disabled\n"

Basically a key-value format. Resist the urge to use numeric response
codes. For stuff like this, I think both Linus and the typical human
brain prefer English words to numeric response codes. This suggested
output is not unlike some arch's show-processor-state sysrq output.

3) _Somebody_ needs to do some "ground pounding", and figure out what
info sysadmins and users want to see. Event logging in general, so far,
seems to me more like a management checklist item than a real user
need... but I am quite willing to be proved wrong. Until we get
feedback along these lines, I tend to resist changes like this in
general. My initial read of your attached patch was that it was a long
of source churn, and I couldn't fathom what any user would gain from it all.

Jeff



There are a whole bunch of netif_msg_xxx and corresponding NETIF_MSG_xxx
bits. I don't see much need to change that I think getting the logging
API right for those would be an important first step.

Jeff


P.S. It is important to note the bits are laid out in increasing verbosity.

2003-06-16 22:39:27

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: patch for common networking error messages

David S. Miller wrote:
> From: Nivedita Singhvi <[email protected]>
> Date: Mon, 16 Jun 2003 15:13:05 -0700
>
> I'd certainly like to see messages from the driver when the
> card enters/leaves promiscuous mode,
>
> egrep "promiscuous mode" net/core/dev.c | grep printk


Yeah, but dev_mc_upload() doesnt return any status ;).
(For those of us who distrust hw (Sorry Scott! :))).

But it was just my example, I assure you. I'm not
holding up a flag in the wind on this particular nit.
I do see positives in the feature as a whole though.
It would be a shame to get grounded for minor things..

thanks,
Nivedita



2003-06-16 22:37:09

by Janice Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages


From: David S. Miller <[email protected]>
Date:06/16/2003 05:27 PM

And all the scripts checking for the existing messages in log files?
Screw them, right?

That's a good point. One possible suggestion would be to submit more than
one stdmsgs.h files. One a legacy file, and one that is more consistent
from message to message.. shooting for a gradual migration.

Ultimately, I think standard messages would greatly support/simplify
scripts, especially between the myriad of ethernet drivers. Each one
reports the data slightly differently, so you're error log analysis needs
to recognize 100 or so ways of being told that the link just went down.

Janice







"David S. Miller"
<[email protected] To: Janice Girouard/Austin/IBM@IBMUS
> cc: Daniel Stekloff/Beaverton/IBM@IBMUS,
[email protected], [email protected],
06/16/2003 05:27 [email protected], Larry Kessler/Beaverton/IBM@IBMUS,
PM [email protected], [email protected],
[email protected]
Subject: Re: patch for common networking error messages






From: Janice Girouard <[email protected]>
Date: Mon, 16 Jun 2003 17:29:15 -0500

For the sake of consistency and automatic error log analysis, it might
be

And all the scripts checking for the existing messages
in log files? Screw them, right?

This whole idea is starting to leave a very bad taste in
my mouth...




2003-06-16 22:43:25

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Nivedita Singhvi <[email protected]>
Date: Mon, 16 Jun 2003 15:45:20 -0700

[ I removed this [email protected] from the CC:, it bounces... ]

I'd agree a lot of thought (and agreement :))has to go
into this before changing minor nits and stuff, and not
causing too much disruption..Evolution, as opposed to
revolution ;). I would hope that most wouldnt need changing..

There would be absolutely ZERO disruption if you guys would use you
brains and implement what you're actually trying to achieve, a system
event logging mechanism.

We have a message queueing mechanism using sockets, called netlink,
and you can make whatever actions in the kernel you think should be
monitored go and stuff messages into this system event netlink socket.

Then, you don't have to standardize a bunch of absolutely silly
strings (I mean, the concept is so incredibly stupid), you get events
that are in a precisely defined format going over this netlink socket.

Then whoever in userspace reads out the messages can interpret them
however the fuck it wants to. It is then trivial to parse the
messages and filter them. Furthermore, you could even transmit such
messages over a network connection to a remote logging server as-is.

And hey, look, for network links going up and down we have the hooks
already. Funny that...

2003-06-16 22:41:07

by Jeff Garzik

[permalink] [raw]
Subject: Re: patch for common networking error messages

Jeff Garzik wrote:
> 3) _Somebody_ needs to do some "ground pounding", and figure out what
> info sysadmins and users want to see. Event logging in general, so far,
> seems to me more like a management checklist item than a real user
> need... but I am quite willing to be proved wrong. Until we get
> feedback along these lines, I tend to resist changes like this in
> general. My initial read of your attached patch was that it was a long
> of source churn, and I couldn't fathom what any user would gain from it

make that "a lot of"


> There are a whole bunch of netif_msg_xxx and corresponding NETIF_MSG_xxx
> bits. I don't see much need to change that I think getting the logging
> API right for those would be an important first step.
>
> Jeff

arg :) I should fire my editor.

Jeff



2003-06-16 22:46:10

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice Girouard <[email protected]>
Date: Mon, 16 Jun 2003 17:50:08 -0500

One possible suggestion would be to submit more than one stdmsgs.h
files. One a legacy file, and one that is more consistent from
message to message.. shooting for a gradual migration.

Let me know when you're back on planet earth ok?

Standardizing strings is an absolutely FRUITLESS exercise.

If you want events, standardize events and push them over
a queueing based communications channel to userspace, namely
using netlink sockets.

2003-06-16 22:47:57

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Nivedita Singhvi <[email protected]>
Date: Mon, 16 Jun 2003 15:50:34 -0700

I do see positives in the feature as a whole though.

Would you design a network protocol this way? By passing
strings like "open connection please", "sure no problem"
back and forth between server and client?

Of course not.

So why are we even remotely considering the standardization
of _STRINGS_ for event reporting?

2003-06-16 23:03:25

by Donald Becker

[permalink] [raw]
Subject: Re: patch for common networking error messages

On Mon, 16 Jun 2003, Nivedita Singhvi wrote:
> Janice M Girouard wrote:
>
> > Below is a patch that demonstrates standard messages for ethernet device
> > drivers. I would like your feedback on the concept of standard network
> > messages, and any suggestions for messages to include.

There are many things that can go wrong, and most of them are very
dependent on the NIC's architecture. A few minutes of thought would
lead to the understanding that a enforced-common list of error/problem
messages is equivalent to a set of error counters. We have had that
since the initial Linux network hardware layer design. (Note: BSD of
that era had only a single "error" counter. )

> Essentially, things like some guidelines on classifying some
> of those messages, when creating new messages. eg when is
> something a state change and when is it a performance event?

The NETIF_MSG_* enable bits classify the status message types.

> I'd certainly like to see messages from the driver when the
> card enters/leaves promiscuous mode, as an example of things
> we'd like to add...

Most Ethernet drivers should already do this, right around the message
/* Unconditionally log net taps. */

> > 2) Reduce the number of puzzling messages that are logged -- in this
> > case, by replacing them with standard messages, and/or
> > 3) Identify the device (or driver name) that is responsible for the error.

The (physical) interface name, dev->name, should prefix every error
message. This is a recommendation, not enforced, so not every driver
does it.
[[[
Comment: Status/error messages are not an area to demonstrate
creativity, but all too frequently it becomes one.
]]]

--
Donald Becker [email protected]
Scyld Computing Corporation http://www.scyld.com
914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system
Annapolis MD 21403 410-990-9993

2003-06-16 23:54:14

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: patch for common networking error messages

David S. Miller wrote:

> There would be absolutely ZERO disruption if you guys would use you
> brains and implement what you're actually trying to achieve, a system
> event logging mechanism.

> We have a message queueing mechanism using sockets, called netlink,
> and you can make whatever actions in the kernel you think should be
> monitored go and stuff messages into this system event netlink socket.

I should clarify here that I was speaking strictly for my lonesome sorry
self :), and have no knowledge of what the state of the various
RAS projects currently are, and the approaches they are trying..
For all I know, they may be currently trying precisely that..

Janice's patch is the first I've seen in this area (Luckily,
most of the time they keep me in a cave :) :)), and I do
appreciate *something* being done in this area, it seemed a
good start and really, I dont care how its implemented, I'll
leave that to the folks who have spent longer than the
8 mins I currently have on it..

> Then, you don't have to standardize a bunch of absolutely silly
> strings (I mean, the concept is so incredibly stupid), you get events
> that are in a precisely defined format going over this netlink socket.

Well, right now, thats all we have, right? Silly strings? But
thats not really my position, which is more like:
Whatever! Whatever! Somebody! Make it so! :) :).

> Then whoever in userspace reads out the messages can interpret them
> however the fuck it wants to. It is then trivial to parse the
> messages and filter them. Furthermore, you could even transmit such
> messages over a network connection to a remote logging server as-is.
>
> And hey, look, for network links going up and down we have the hooks
> already. Funny that...

OK, that is a good idea.. :)

thanks,
Nivedita



2003-06-17 00:31:44

by Janice Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages


From: David S. Miller" <[email protected]>
Date: Mon, 16 Jun 2003 17:50:08 -0500

If you want events, standardize events and push them over
a queueing based communications channel to userspace, namely
using netlink sockets.


It sounds like you are proposing a new family for the netlink
subsystem. I've included a list of the families I see in 2.5.70
the end of this note. I'm trying to understand which family handles
events such as link state changes or device initialization failure?
It sounds like you're proposing somethink equivalent to
NETLINK_TCP_DIAG. Is that right?

There seems to be some overlap between netlink and netdev notifier
events. Netdev notifier reports many key events. One event I don't
see reported is any indication that device initialization failed.
#define NETDEV_UP 0x0001
#define NETDEV_DOWN 0x0002
#define NETDEV_REBOOT 0x0003
#define NETDEV_CHANGE 0x0004
#define NETDEV_REGISTER 0x0005
#define NETDEV_UNREGISTER 0x0006
#define NETDEV_CHANGEMTU 0x0007
#define NETDEV_CHANGEADDR 0x0008
#define NETDEV_GOING_DOWN 0x0009
#define NETDEV_CHANGENAME 0x000A

One issue with netdev, is that it doesn't seem to allow
for the flexibility of the information passed that netlink has.
For example, when you issue a NETDEV_CHANGE, it's not clear
from the event what the specific change was.


==============================
NETLINK 2.5.70

netlink.h

#define NETLINK_ROUTE 0 /* Routing/device hook */
#define NETLINK_SKIP 1 /* Reserved for ENskip */
#define NETLINK_USERSOCK 2 /* Reserved for user mode socket protocols */
#define NETLINK_FIREWALL 3 /* Firewalling hook */
#define NETLINK_TCPDIAG 4 /* TCP socket monitoring */
#define NETLINK_NFLOG 5 /* netfilter/iptables ULOG */
#define NETLINK_XFRM 6 /* ipsec */
#define NETLINK_ARPD 8
#define NETLINK_ROUTE6 11 /* af_inet6 route comm channel */
#define NETLINK_IP6_FW 13
#define NETLINK_DNRTMSG 14 /* DECnet routing messages */
#define NETLINK_TAPBASE 16 /* 16 to 31 are ethertap */

Descriptions for many of these events can be found at
http://www.europe.redhat.com/documentation/man-pages/man7/netlink.7.php3


NETLINK_ROUTE
Receives routing updates and may be used
to modify the IPv4 routing table (see
NETLINK_FIREWALL
Receives packets sent by the IPv4 firewall code.
NETLINK_ARPD
For managing the arp table in user space.
NETLINK_ROUTE6
Receives and sends IPv6 routing table updates.
NETLINK_IP6_FW
to receive packets that failed the IPv6 firewall
checks (currently not implemented).
NETLINK_TAPBASE...NETLINK_TAPBASE+15
are the instances of the ethertap device. Ethertap
is a pseudo network tunnel device that allows an
ethernet driver to be simulated from user space.
NETLINK_SKIP
Reserved for ENskip.
NETLINK_USERSOCK
is reserved for future user space protocols.








2003-06-17 01:10:17

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice Girouard <[email protected]>
Date: Mon, 16 Jun 2003 19:44:22 -0500

It sounds like you are proposing a new family for the netlink
subsystem.

Exactly, you have to create this.

2003-06-17 02:01:59

by Janice Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages





From: Janice Girouard <[email protected]>
Date: Mon, 16 Jun 2003 19:44:22 -0500

It sounds like you are proposing a new family for the netlink
subsystem.

From: "David S. Miller" <[email protected]>

Date:06/16/2003 08:19 PM


Exactly, you have to create this.



Okay. That solves the issue of events generated in a plethora of formats
for the same event. Any suggestions on what should be included in this new
family? I can present a patch to suggest a starting point. However, it
would be great to hear from everyone that has any initial thoughts.



One question that comes to mind, since there is some overlap with netdev
notifier events, should we include those events in the new family? I can
envision a couple of approaches:



1) keep the two interfaces (netdev notifier and netlink), with separate end
users in mind and duplicate the events to each interface. Possibly
thinking about migrating to just one interface over time. Applications
would then just receive one set of events.



2) keep the two interfaces, with no duplication of messages, clarifying the
uses for the two interfaces. An application would then register, and
obtain events from the two separate mechanisms.

p.s. thanks for all the input so far.





2003-06-17 04:21:03

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: patch for common networking error messages

On Mon, 16 Jun 2003 21:12:50 CDT, Janice Girouard said:

> Okay. That solves the issue of events generated in a plethora of formats
> for the same event. Any suggestions on what should be included in this new
> family? I can present a patch to suggest a starting point. However, it
> would be great to hear from everyone that has any initial thoughts.

Well, at the risk of torquing off any SCO supporters, I'd suggest a quick
peek over at the general design of the AIX errpt/trace systems - in both
cases, data comes out of the kernel in a formatted binary stream, and then
a 'template' file is used to drive the parsing of the formatted data.

Quite slick overall, and nicely extensible - you add a new kernel subsystem
that has more trace points, you just tack its templates onto the end of
the format file and you're good to go....

(And SCO can't even claim that's their code - it's pretty obvious the
parentage of the AIX errpt/trace logging is the OS/VS1 and MVS SMF logging
features :)


Attachments:
(No filename) (226.00 B)

2003-06-17 06:56:21

by Andi Kleen

[permalink] [raw]
Subject: Re: patch for common networking error messages

> Actually it already does, to cover the case where an interrupt handler calls
> printk while process-context code is performing a printk.

I don't think it'll work. Both printk and release_console_sem take the logbuf_lock,
which will deadlock if the same CPU already holds it.

It would need to use the usual linux recursive lock hack.

-Andi

2003-06-17 07:05:58

by Andrew Morton

[permalink] [raw]
Subject: Re: patch for common networking error messages

Andi Kleen <[email protected]> wrote:
>
> > Actually it already does, to cover the case where an interrupt handler calls
> > printk while process-context code is performing a printk.
>
> I don't think it'll work. Both printk and release_console_sem take the logbuf_lock,
> which will deadlock if the same CPU already holds it.

Look more closely.

logbuf_lock is only held to protect the logbuf contents and its indices.
And to pin down the current console_sem holder to reliably ensure that
he'll print the text which the nested printer just placed in the buffer.

We do not call the console drivers while holding logbuf_lock. console_sem
is held across the console driver call.

2003-06-17 14:22:34

by Mr. James W. Laferriere

[permalink] [raw]
Subject: Re: patch for common networking error messages

Hello Dave , I know your time is constrained . BUT DAMN IT there
were other questions ask in the (partially quoted) email you are
replying to . Some of those answers would be nice to know .
RSVP , JimL

On Mon, 16 Jun 2003, David S. Miller wrote:
> From: Janice Girouard <[email protected]>
> Date: Mon, 16 Jun 2003 19:44:22 -0500
> It sounds like you are proposing a new family for the netlink
> subsystem.

> Exactly, you have to create this.

--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| [email protected] | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+

2003-06-17 15:55:38

by Stephen Hemminger

[permalink] [raw]
Subject: Re: patch for common networking error messages

Binary interface's will never cut it. Read the hotplug thread to see how Linus
said, he will never add a binary event daemon interface.

That said, there is an oppurtunity for to provide a ascii interface (similar to
/sbin/hotplug) decodes the data from the rtnetlink interface in a standardized format.

Then it would be easy to write things like perl monitoring scripts that do things
like:
perl phone-me-if-network-dies.pl < /proc/net/events

Don't flame me about the choce of name. /proc/net/events is not the right name
to use for such an interface since adding more to /proc is probably not desired.

2003-06-17 16:00:08

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Stephen Hemminger <[email protected]>
Date: Tue, 17 Jun 2003 09:08:59 -0700

Read the hotplug thread to see how Linus
said, he will never add a binary event daemon interface.

Funny, rtnetlink is exactly this and it is in the tree :-)

Every networking configuration event is transmitted over
rtnetlink sockets to all listeners, in a fixed binary
format.

What Linus doesn't want is this for configuration events,
ie. things like "device appears" not "my ethernet burped"

2003-06-17 18:32:53

by Jeff Garzik

[permalink] [raw]
Subject: Re: patch for common networking error messages

David S. Miller wrote:
> From: Stephen Hemminger <[email protected]>
> Date: Tue, 17 Jun 2003 09:08:59 -0700
>
> Read the hotplug thread to see how Linus
> said, he will never add a binary event daemon interface.
>
> Funny, rtnetlink is exactly this and it is in the tree :-)


...and it's been in the tree for quite a while too. It's a shame people
aren't taking advantage of it's obvious utility...

Jeff



2003-06-17 18:55:14

by Janice M Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Jeff Gazik <[email protected]>
Date: Tue, 17 Jun 2003 01:46

...and it's been in the tree for quite a while too. It's a shame
people aren't taking advantage of it's obvious utility...


I'd like to hear what others believe belong in this new netlink family.
The two events that come to mind for this family (or netdev notified if
that's more appropriate) are:
1) device initialization failures,
2) events that drive load balancing software. Right now if we need to
throttle the card, we don't send events up to indicate we have reached
capacity.

Possibly the first might belong in the netdev notified family, and the
second in the netlink family, since you might want to present more than
a two state (success/failure.. or just failure in this case) result.
-




2003-06-17 19:09:27

by Jeff Garzik

[permalink] [raw]
Subject: Re: patch for common networking error messages

Janice M Girouard wrote:
> 2) events that drive load balancing software. Right now if we need to
> throttle the card, we don't send events up to indicate we have reached
> capacity.


Question related to this item specifically :)

Do you want to individually send 4000 - 16000 (or more) TX stop / start
events per second to userspace? :) At some point Heisenburg defeats
low latency :)

If not (and I hope not), perhaps also look into the net stack statistics
already kept (or add more sampling stats if necessary), and instead
trigger events based on sampling those statistics.

Jeff



2003-06-17 19:35:42

by Janice M Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages

Jeff Garzik wrote:

Do you want to individually send 4000 - 16000 (or more) TX stop /
start events per second to userspace? :) At some point Heisenburg
defeats low latency :)

How about looking at 1000 byte packet transmit example. A gigabit
adapter would send 125,000 packets per second. I'm thinking that most
of the time, you will have enough available buffers in the adapter that
you don't start to see the adapter buffers completely fill up. Are you
saying that 3.2% - 12.8% of the time in this case you're disabling the
tcp/ip stack because the transmit buffers on your card are completely
full? Perhaps with zero copy enabled, but the tcp/ip cpu load alone
will throttle your ability to fill the adapter buffers up.

What does your own experience indicate for gigabit adapter cards?

I could see the buffers backing up for 10/100 cards. So that case
favors your point. I'm still thinking that it's a sign someone should
be buying a 2nd card and ramping up their network capability. But I can
see your point.


2003-06-17 19:41:22

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice M Girouard <[email protected]>
Date: Tue, 17 Jun 2003 14:46:56 -0500

I could see the buffers backing up for 10/100 cards. So that case
favors your point. I'm still thinking that it's a sign someone should
be buying a 2nd card and ramping up their network capability. But I can
see your point.

And when we have 1GHZ memory busses and 10GHz cpus tomorrow,
what does this say for 1gbit and 10gbit cards?

You want to define a machine as having too much "work" or not, yet you
only want to consider one metric to do so. Such schemes are
fundamentally flawed.

2003-06-17 20:13:08

by Janice M Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages


From David S. Miller:

And when we have 1GHZ memory busses and 10GHz cpus tomorrow,
what does this say for 1gbit and 10gbit cards?

Such schemes are fundamentally flawed.

Bottom line.. I was asking for input, and I received it. It's valid
to say... look at the statistics. I really like the concept of driving
events through netlink, but querying statistics works.


p.s. It's been my experience that the memory system is the main
bottleneck when trying to support a heavy network load. When the 10
Gigabit card emerges, and it's here today, the memory system will be
pressed to support it, especially if you're not using zerocopy and
you're thinking of using more than one card. Perhaps if RDMA is
capabilities are added to Linux, then things might be different.

So.. when do you think RDMA will show up on Linx?

2003-06-17 20:18:00

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice M Girouard <[email protected]>
Date: Tue, 17 Jun 2003 15:24:20 -0500

Perhaps if RDMA is capabilities are added to Linux, then things
might be different.

So.. when do you think RDMA will show up on Linx?

RDMA is total junk.

On RX, clever RX buffer management is what we need.

On TX zerocopy + TSO is more than sufficient and we have
that today.

2003-06-17 20:27:28

by Janice Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages


From: David S. Miller" <[email protected]>
Date: 06/17/2003 03:27 PM

On RX, clever RX buffer management is what we need.

What RX buffer management are you proposing? I'm having a hard time
understanding how you'll get rid of the copy without support from the
card.

2003-06-17 20:32:44

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice Girouard <[email protected]>
Date: Tue, 17 Jun 2003 15:40:48 -0500

From: David S. Miller" <[email protected]>
Date: 06/17/2003 03:27 PM

On RX, clever RX buffer management is what we need.

What RX buffer management are you proposing? I'm having a hard time
understanding how you'll get rid of the copy without support from the
card.

Sigh... someone write store email down somewhere for the next time
someone asks about this.

The "one true way (tm)" works like this:

1) Chip has a "flow cache", LRU based, managed like routing caches
in many production router implementations. Difference is
that it merely does flow watching.

Flow entries are keyed on saddr/daddr/sport/dport. Flow misses
kill the oldest entry, and replace it with the new one.

Entries are only created in response to full sized data
packets.

2) The receive buffering is segmented into small (256 byte) and
PAGE sized buffers. IP/TCP/whatever headers (determined using
a simply programmable header parser logic, so you can do things
like RPC etc. headers for NFS) are put into the "small" buffers,
data portions for matching flows get accumulated into the PAGE
sized buffers.

It is implied that the card's flow cache keeps track of the
pointers into page it is currently trying to fill for that
flow.

So the first time you see a flow, you add a entry and grab a page
buffer and stick the data part into the page buffer and the
TCP/IP/etc. headers into a "small" buffer. You defer a configurable
amount of time waiting for more TCP data packets (a packet train)
to accumulate more into the PAGE buffer for that flow.

Such receive buffers are presented to the stack as a linked list
of packets, with some indicator that together their data parts are
filling a page.

Things like "sys_receivefile()" and NFS flip these things into the
filesystem page cache.

I'm surprised this isn't evident to more people...

2003-06-17 20:44:27

by Janice Girouard

[permalink] [raw]
Subject: Re: patch for common networking error messages


Did I understand:

> 1) Chip has a "flow cache", LRU based, managed like routing caches

You need the chip to support your technique. Are the vendors picking up on
this? I still don't see how this gets rid of the copy_to_user space once
you've gathered the buffers. How do you feed the user buffer addresses to
the card? You must have something equivalent to the queue pair management
supported in RDMA. What technique are you using? Is it proprietary?




"David S. Miller"
<[email protected] To: Janice Girouard/Austin/IBM@IBMUS
> cc: [email protected], [email protected],
[email protected], Daniel Stekloff/Beaverton/IBM@IBMUS,
06/17/2003 03:42 Larry Kessler/Beaverton/IBM@IBMUS,
PM [email protected], [email protected],
[email protected]
Subject: Re: patch for common networking error messages






From: Janice Girouard <[email protected]>
Date: Tue, 17 Jun 2003 15:40:48 -0500

From: David S. Miller" <[email protected]>
Date: 06/17/2003 03:27 PM

On RX, clever RX buffer management is what we need.

What RX buffer management are you proposing? I'm having a hard time
understanding how you'll get rid of the copy without support from the
card.

Sigh... someone write store email down somewhere for the next time
someone asks about this.

The "one true way (tm)" works like this:

1) Chip has a "flow cache", LRU based, managed like routing caches
in many production router implementations. Difference is
that it merely does flow watching.

Flow entries are keyed on saddr/daddr/sport/dport. Flow misses
kill the oldest entry, and replace it with the new one.

Entries are only created in response to full sized data
packets.

2) The receive buffering is segmented into small (256 byte) and
PAGE sized buffers. IP/TCP/whatever headers (determined using
a simply programmable header parser logic, so you can do things
like RPC etc. headers for NFS) are put into the "small" buffers,
data portions for matching flows get accumulated into the PAGE
sized buffers.

It is implied that the card's flow cache keeps track of the
pointers into page it is currently trying to fill for that
flow.

So the first time you see a flow, you add a entry and grab a page
buffer and stick the data part into the page buffer and the
TCP/IP/etc. headers into a "small" buffer. You defer a configurable
amount of time waiting for more TCP data packets (a packet train)
to accumulate more into the PAGE buffer for that flow.

Such receive buffers are presented to the stack as a linked list
of packets, with some indicator that together their data parts are
filling a page.

Things like "sys_receivefile()" and NFS flip these things into the
filesystem page cache.

I'm surprised this isn't evident to more people...




2003-06-17 21:05:20

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Janice Girouard <[email protected]>
Date: Tue, 17 Jun 2003 15:57:59 -0500

Did I understand:

> 1) Chip has a "flow cache", LRU based, managed like routing caches

You need the chip to support your technique.

No shit Sherlock.

But it should be noted that the idea can be fully verified in software
by adding the scheme into the RX processing of some existing ethernet
driver.

Are the vendors picking up on this?

If they're going to ignore my ideas, that's not my problem.

I still don't see how this gets rid of the copy_to_user space once
you've gathered the buffers. How do you feed the user buffer addresses to
the card?

You flip the pages into userspace, ie. you replace the page the user
currently has with the one the networking buffer is using.

What technique are you using? Is it proprietary?

ROFL!

2003-06-21 12:25:19

by Alan

[permalink] [raw]
Subject: Re: patch for common networking error messages

On Llu, 2003-06-16 at 23:55, David S. Miller wrote:
> Let me know when you're back on planet earth ok?
>
> Standardizing strings is an absolutely FRUITLESS exercise.

Standardising strings is a real help for end users, but its not the way
to approach logging issues I agree.

2003-06-21 14:13:44

by Jamal Hadi

[permalink] [raw]
Subject: Re: patch for common networking error messages



On Sat, 21 Jun 2003, Alan Cox wrote:

> On Llu, 2003-06-16 at 23:55, David S. Miller wrote:
> > Let me know when you're back on planet earth ok?
> >
> > Standardizing strings is an absolutely FRUITLESS exercise.
>
> Standardising strings is a real help for end users, but its not the way
> to approach logging issues I agree.

now that xml is the holy grail ive seen people actually
preach xml strings as encoding for protocols ;-> The arguement
i have seen put forward is that strings are easier to read
for users than binary encoding ;-> Therefore they can debug problems.
There maybe cases where this may be valid[1] - the only problem is
a lot of loonies will think this is the next sliced bread.

what about all that bandwidth stoopid xml consumes?
"bandwidth? Who has a problem with bandwidth?;->
what about all that involved processiong of stoopid xml?
"cpu? who has CPU problems?"
Intel has a 10Gige NIC, a 2Mhz cpu, adn 4G DDR Ram for your hungry
applications.
Its a conspiracy i tell ya ;->

cheers,
jamal

[1] For people who use expect for example to send string commands
to a remote system to configure things, when expect (simple req-resp)
becomes too simple you may need something more sophisticated.
They are already sending strings across tcp probably.
Infact a IETF working group has been formed to standardixe this.
http://www.ietf.org/html.charters/netconf-charter.html
theres a draft at :
http://www.ietf.org/internet-drafts/draft-enns-xmlconf-spec-00.txt

The only unfortunate side effect to this is you will see a lot
idjots putting XML in protocols from now on just because.

2003-06-23 00:38:04

by David Miller

[permalink] [raw]
Subject: Re: patch for common networking error messages

From: Alan Cox <[email protected]>
Date: 21 Jun 2003 13:36:54 +0100

Standardising strings is a real help for end users,

I agree. But my objections are in the context of doing this
inside the kernel, where such things do not belong.

2003-06-23 11:42:35

by Alan

[permalink] [raw]
Subject: Re: patch for common networking error messages

On Llu, 2003-06-23 at 01:46, David S. Miller wrote:
> From: Alan Cox <[email protected]>
> Date: 21 Jun 2003 13:36:54 +0100
>
> Standardising strings is a real help for end users,
>
> I agree. But my objections are in the context of doing this
> inside the kernel, where such things do not belong.

Standardising strings for end users in the kernel is also good
because it both saves space and makes things more consistent for
the poor human wondering what blew up.

Standardising them for programs to parse is not a good idea