2006-08-18 22:07:06

by linas

[permalink] [raw]
Subject: [PATCH 0/6]: powerpc/cell spidernet ethernet driver update


Jeff,

Can you apply and forward upstream the following series of patches?
This is the same set of patches I sent only a few days ago; however,
it appears that I failed to cc you on some of them, and I failed to
ask anyone in particular to actually apply them. So I'm asking now. :-)

The maintainership of the spidernet driver is transitioning from
Utz Bacher and Jens Osterkamp to Jim Lewis; you should be receiving
a patch to the MAINTAINERS file from Jim, noting this, shortly.
I suspect that neither Utz nor Jens will be ack'ing or nack'ing
these patches unless prodded; let me know if you want some particlar
ack from someone before you apply these. (BenH and Arnd Bergmann
are the ones who usually keep us honest about these things.)

Although these are based on a week-old mm tree, they should
apply cleanly just about anywhere, as there has been little/no
change to this code base in a long while.

The patches bring the transmit performance of the driver up
to a decent level, as reviewed in greter detail in the patch
descriptions.

--linas


2006-08-18 22:20:42

by linas

[permalink] [raw]
Subject: [PATCH 1/6]: powerpc/cell spidernet burst alignment patch



This patch increases the Burst Address alignment from 64 to 1024 in the
Spidernet driver. This improves transmit performance for arge packets
from about 100Mbps to 300-400Mbps.

From: James K Lewis <[email protected]>
Signed-off-by: James K Lewis <[email protected]>
Signed-off-by: Linas Vepstas <[email protected]>
Cc: Utz Bacher <[email protected]>
Cc: Jens Osterkamp <[email protected]>
Cc: Arnd Bergmann <[email protected]>

----
drivers/net/spider_net.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h 2006-08-07 14:37:10.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h 2006-08-11 11:09:57.000000000 -0500
@@ -209,7 +209,7 @@ extern char spider_net_driver_name[];
#define SPIDER_NET_DMA_RX_FEND_VALUE 0x00030003
/* to set TX_DMA_EN */
#define SPIDER_NET_TX_DMA_EN 0x80000000
-#define SPIDER_NET_GDTDCEIDIS 0x00000002
+#define SPIDER_NET_GDTDCEIDIS 0x00000302
#define SPIDER_NET_DMA_TX_VALUE SPIDER_NET_TX_DMA_EN | \
SPIDER_NET_GDTDCEIDIS
#define SPIDER_NET_DMA_TX_FEND_VALUE 0x00030003

2006-08-18 22:21:56

by linas

[permalink] [raw]
Subject: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.



Implement basic low-watermark support for the transmit queue.
Hardware low-watermarks allow a properly configured kernel
to continously stream data to a device and not have to handle
any interrupts at all in doing so. Correct zero-interrupt
operation can be actually observed for this driver, when the
socket buffer is made large enough.

The basic idea of a low-watermark interrupt is as follows.
The device driver queues up a bunch of packets for the hardware
to transmit, and then kicks the hardware to get it started.
As the hardware drains the queue of pending, untransmitted
packets, the device driver will want to know when the queue
is almost empty, so that it can queue some more packets.

If the queue drains down to the low waterark, then an interrupt
will be generated. However, if the kernel/driver continues
to add enough packets to keep the queue partially filled,
no interrupt will actually be generated, and the hardware
can continue streaming packets indefinitely in this mode.

The impelmentation is done by setting the DESCR_TXDESFLG flag
in one of the packets. When the hardware sees this flag, it will
interrupt the device driver. Because this flag is on a fixed
packet, rather than at fixed location in the queue, the
code below needs to move the flag as more packets are
queued up. This implementation attempts to keep the flag
at about 1/4 from "empty".

This patch boosts driver performance from about
300-400Mbps for 1500 byte packets, to about 710-740Mbps.


Signed-off-by: Linas Vepstas <[email protected]>
Signed-off-by: James K Lewis <[email protected]>
Cc: Utz Bacher <[email protected]>
Cc: Jens Osterkamp <[email protected]>
Cc: Arnd Bergmann <[email protected]>

----
drivers/net/spider_net.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
drivers/net/spider_net.h | 6 +++--
2 files changed, 55 insertions(+), 7 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c 2006-08-07 14:39:38.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c 2006-08-11 11:23:24.000000000 -0500
@@ -700,6 +700,39 @@ spider_net_release_tx_descr(struct spide
dev_kfree_skb_any(skb);
}

+static void
+spider_net_set_low_watermark(struct spider_net_card *card)
+{
+ int status;
+ int cnt=0;
+ int i;
+ struct spider_net_descr *descr = card->tx_chain.tail;
+
+ /* Measure the length of the queue. */
+ while (descr != card->tx_chain.head) {
+ status = descr->dmac_cmd_status & SPIDER_NET_DESCR_NOT_IN_USE;
+ if (status == SPIDER_NET_DESCR_NOT_IN_USE)
+ break;
+ descr = descr->next;
+ cnt++;
+ }
+ if (cnt == 0)
+ return;
+
+ /* Set low-watermark 3/4th's of the way into the queue. */
+ descr = card->tx_chain.tail;
+ cnt = (cnt*3)/4;
+ for (i=0;i<cnt; i++)
+ descr = descr->next;
+
+ /* Set the new watermark, clear the old wtermark */
+ descr->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
+ if (card->low_watermark && card->low_watermark != descr)
+ card->low_watermark->dmac_cmd_status =
+ card->low_watermark->dmac_cmd_status & ~SPIDER_NET_DESCR_TXDESFLG;
+ card->low_watermark = descr;
+}
+
/**
* spider_net_release_tx_chain - processes sent tx descriptors
* @card: adapter structure
@@ -717,6 +750,7 @@ spider_net_release_tx_chain(struct spide
{
struct spider_net_descr_chain *chain = &card->tx_chain;
int status;
+ int rc=0;

spider_net_read_reg(card, SPIDER_NET_GDTDMACCNTR);

@@ -729,8 +763,10 @@ spider_net_release_tx_chain(struct spide
break;

case SPIDER_NET_DESCR_CARDOWNED:
- if (!brutal)
- return 1;
+ if (!brutal) {
+ rc = 1;
+ goto done;
+ }
/* fallthrough, if we release the descriptors
* brutally (then we don't care about
* SPIDER_NET_DESCR_CARDOWNED) */
@@ -747,12 +783,15 @@ spider_net_release_tx_chain(struct spide

default:
card->netdev_stats.tx_dropped++;
- return 1;
+ rc = 1;
+ goto done;
}
spider_net_release_tx_descr(card);
}
-
- return 0;
+done:
+ if (rc == 1)
+ spider_net_set_low_watermark(card);
+ return rc;
}

/**
@@ -1453,6 +1492,10 @@ spider_net_interrupt(int irq, void *ptr,
spider_net_rx_irq_off(card);
netif_rx_schedule(netdev);
}
+ if (status_reg & SPIDER_NET_TXINT ) {
+ spider_net_cleanup_tx_ring(card);
+ netif_wake_queue(netdev);
+ }

if (status_reg & SPIDER_NET_ERRINT )
spider_net_handle_error_irq(card, status_reg);
@@ -1615,6 +1658,9 @@ spider_net_open(struct net_device *netde
card->descr,
PCI_DMA_TODEVICE, tx_descriptors))
goto alloc_tx_failed;
+
+ card->low_watermark = NULL;
+
if (spider_net_init_chain(card, &card->rx_chain,
card->descr + tx_descriptors,
PCI_DMA_FROMDEVICE, rx_descriptors))
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h 2006-08-11 11:09:57.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h 2006-08-11 11:19:47.000000000 -0500
@@ -47,7 +47,7 @@ extern char spider_net_driver_name[];
#define SPIDER_NET_TX_DESCRIPTORS_MIN 16
#define SPIDER_NET_TX_DESCRIPTORS_MAX 512

-#define SPIDER_NET_TX_TIMER 20
+#define SPIDER_NET_TX_TIMER (HZ/5)

#define SPIDER_NET_RX_CSUM_DEFAULT 1

@@ -209,7 +209,7 @@ extern char spider_net_driver_name[];
#define SPIDER_NET_DMA_RX_FEND_VALUE 0x00030003
/* to set TX_DMA_EN */
#define SPIDER_NET_TX_DMA_EN 0x80000000
-#define SPIDER_NET_GDTDCEIDIS 0x00000302
+#define SPIDER_NET_GDTDCEIDIS 0x00000300
#define SPIDER_NET_DMA_TX_VALUE SPIDER_NET_TX_DMA_EN | \
SPIDER_NET_GDTDCEIDIS
#define SPIDER_NET_DMA_TX_FEND_VALUE 0x00030003
@@ -349,6 +349,7 @@ enum spider_net_int2_status {
#define SPIDER_NET_DESCR_FORCE_END 0x50000000 /* used in rx and tx */
#define SPIDER_NET_DESCR_CARDOWNED 0xA0000000 /* used in rx and tx */
#define SPIDER_NET_DESCR_NOT_IN_USE 0xF0000000
+#define SPIDER_NET_DESCR_TXDESFLG 0x00800000

struct spider_net_descr {
/* as defined by the hardware */
@@ -424,6 +425,7 @@ struct spider_net_card {

struct spider_net_descr_chain tx_chain;
struct spider_net_descr_chain rx_chain;
+ struct spider_net_descr *low_watermark;

struct net_device_stats netdev_stats;

2006-08-18 22:23:14

by linas

[permalink] [raw]
Subject: [PATCH 3/6]: powerpc/cell spidernet stop error printing patch.



Turn off mis-interpretation of the queue-empty interrupt
status bit as an error. This bit is set as a part of
the previous low-watermark patch.

Signed-off-by: Linas Vepstas <[email protected]>
Signed-off-by: James K Lewis <[email protected]>
Cc: Utz Bacher <[email protected]>
Cc: Jens Osterkamp <[email protected]>
Cc: Arnd Bergmann <[email protected]>

----
drivers/net/spider_net.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c 2006-08-11 11:23:24.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c 2006-08-11 11:34:16.000000000 -0500
@@ -1275,12 +1275,15 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_PHYINT:
case SPIDER_NET_GMAC2INT:
case SPIDER_NET_GMAC1INT:
- case SPIDER_NET_GIPSINT:
case SPIDER_NET_GFIFOINT:
case SPIDER_NET_DMACINT:
case SPIDER_NET_GSYSINT:
break; */

+ case SPIDER_NET_GIPSINT:
+ show_error = 0;
+ break;
+
case SPIDER_NET_GPWOPCMPINT:
/* PHY write operation completed */
show_error = 0;
@@ -1339,9 +1342,10 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDTDCEINT:
/* chain end. If a descriptor should be sent, kick off
* tx dma
- if (card->tx_chain.tail == card->tx_chain.head)
+ if (card->tx_chain.tail != card->tx_chain.head)
spider_net_kick_tx_dma(card);
- show_error = 0; */
+ */
+ show_error = 0;
break;

/* case SPIDER_NET_G1TMCNTINT: not used. print a message */
@@ -1455,8 +1459,9 @@ spider_net_handle_error_irq(struct spide
}

if ((show_error) && (netif_msg_intr(card)))
- pr_err("Got error interrupt, GHIINT0STS = 0x%08x, "
+ pr_err("Got error interrupt on %s, GHIINT0STS = 0x%08x, "
"GHIINT1STS = 0x%08x, GHIINT2STS = 0x%08x\n",
+ card->netdev->name,
status_reg, error_reg1, error_reg2);

/* clear interrupt sources */

2006-08-18 22:25:05

by linas

[permalink] [raw]
Subject: [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info.



This patch adds version information as reported by
ethtool -i to the Spidernet driver.

From: James K Lewis <[email protected]>
Signed-off-by: James K Lewis <[email protected]>
Signed-off-by: Linas Vepstas <[email protected]>
Cc: Utz Bacher <[email protected]>
Cc: Jens Osterkamp <[email protected]>

----
drivers/net/spider_net.c | 3 +++
drivers/net/spider_net.h | 2 ++
drivers/net/spider_net_ethtool.c | 2 +-
3 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c 2006-08-11 11:34:16.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c 2006-08-11 11:38:48.000000000 -0500
@@ -55,6 +55,7 @@ MODULE_AUTHOR("Utz Bacher <utz.bacher@de
"<[email protected]>");
MODULE_DESCRIPTION("Spider Southbridge Gigabit Ethernet driver");
MODULE_LICENSE("GPL");
+MODULE_VERSION(VERSION);

static int rx_descriptors = SPIDER_NET_RX_DESCRIPTORS_DEFAULT;
static int tx_descriptors = SPIDER_NET_TX_DESCRIPTORS_DEFAULT;
@@ -2293,6 +2294,8 @@ static struct pci_driver spider_net_driv
*/
static int __init spider_net_init(void)
{
+ printk("spidernet Version %s.\n",VERSION);
+
if (rx_descriptors < SPIDER_NET_RX_DESCRIPTORS_MIN) {
rx_descriptors = SPIDER_NET_RX_DESCRIPTORS_MIN;
pr_info("adjusting rx descriptors to %i.\n", rx_descriptors);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h 2006-08-11 11:19:47.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h 2006-08-11 11:38:48.000000000 -0500
@@ -24,6 +24,8 @@
#ifndef _SPIDER_NET_H
#define _SPIDER_NET_H

+#define VERSION "1.1 A"
+
#include "sungem_phy.h"

extern int spider_net_stop(struct net_device *netdev);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net_ethtool.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net_ethtool.c 2006-06-17 20:49:35.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net_ethtool.c 2006-08-11 11:38:48.000000000 -0500
@@ -55,7 +55,7 @@ spider_net_ethtool_get_drvinfo(struct ne
/* clear and fill out info */
memset(drvinfo, 0, sizeof(struct ethtool_drvinfo));
strncpy(drvinfo->driver, spider_net_driver_name, 32);
- strncpy(drvinfo->version, "0.1", 32);
+ strncpy(drvinfo->version, VERSION, 32);
strcpy(drvinfo->fw_version, "no information");
strncpy(drvinfo->bus_info, pci_name(card->pdev), 32);
}

2006-08-18 22:27:00

by linas

[permalink] [raw]
Subject: [PATCH 5/6]: powerpc/cell spidernet bottom half


The recent set of low-waterark patches for the spider result in a
significant amount of computing being done in an interrupt context.
This patch moves this to a "bottom half" aka work queue, so that
the code runs in a normal kernel context. Curiously, this seems to
result in a performance boost of about 5% for large packets.

Signed-off-by: Linas Vepstas <[email protected]>
Cc: Utz Bacher <[email protected]>
Cc: Jens Osterkamp <[email protected]>
Cc: James K Lewis <[email protected]>

----
drivers/net/spider_net.c | 21 +++++++++++++++------
drivers/net/spider_net.h | 4 +++-
2 files changed, 18 insertions(+), 7 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c 2006-08-15 14:25:50.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c 2006-08-15 14:28:56.000000000 -0500
@@ -883,9 +883,10 @@ out:
* spider_net_cleanup_tx_ring - cleans up the TX ring
* @card: card structure
*
- * spider_net_cleanup_tx_ring is called by the tx_timer (as we don't use
- * interrupts to cleanup our TX ring) and returns sent packets to the stack
- * by freeing them
+ * spider_net_cleanup_tx_ring is called by either the tx_timer
+ * or from a work-queue scheduled by the tx-empty interrupt.
+ * This routine releases resources associted with transmitted
+ * packets, including updating the queue tail pointer.
*/
static void
spider_net_cleanup_tx_ring(struct spider_net_card *card)
@@ -895,12 +896,20 @@ spider_net_cleanup_tx_ring(struct spider
spin_lock_irqsave(&card->tx_chain.lock, flags);

if ((spider_net_release_tx_chain(card, 0) != 0) &&
- (card->netdev->flags & IFF_UP))
+ (card->netdev->flags & IFF_UP)) {
spider_net_kick_tx_dma(card);
+ netif_wake_queue(card->netdev);
+ }

spin_unlock_irqrestore(&card->tx_chain.lock, flags);
}

+static void
+spider_net_tx_cleanup_task(void * data)
+{
+ spider_net_cleanup_tx_ring((struct spider_net_card *) data);
+}
+
/**
* spider_net_do_ioctl - called for device ioctls
* @netdev: interface device structure
@@ -1499,8 +1508,7 @@ spider_net_interrupt(int irq, void *ptr,
netif_rx_schedule(netdev);
}
if (status_reg & SPIDER_NET_TXINT ) {
- spider_net_cleanup_tx_ring(card);
- netif_wake_queue(netdev);
+ schedule_work(&card->tx_cleanup_task);
}

if (status_reg & SPIDER_NET_ERRINT )
@@ -2117,6 +2125,7 @@ spider_net_alloc_card(void)
card->netdev = netdev;
card->msg_enable = SPIDER_NET_DEFAULT_MSG;
INIT_WORK(&card->tx_timeout_task, spider_net_tx_timeout_task, netdev);
+ INIT_WORK(&card->tx_cleanup_task, spider_net_tx_cleanup_task, card);
init_waitqueue_head(&card->waitq);
atomic_set(&card->tx_timeout_task_counter, 0);

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h 2006-08-15 14:25:50.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h 2006-08-15 14:28:56.000000000 -0500
@@ -24,7 +24,7 @@
#ifndef _SPIDER_NET_H
#define _SPIDER_NET_H

-#define VERSION "1.1 A"
+#define VERSION "1.1 B"

#include "sungem_phy.h"

@@ -441,6 +441,8 @@ struct spider_net_card {
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;

+ struct work_struct tx_cleanup_task;
+
/* for ethtool */
int msg_enable;

2006-08-18 22:29:06

by linas

[permalink] [raw]
Subject: [PATCH 6/6]: powerpc/cell spidernet refine locking



The transmit side of the spider ethernet driver currently
places locks around some very large chunks of code. This
results in a fair amount of lock contention is some cases.
This patch makes the locks much more fine-grained, protecting
only the cirtical sections. One lock is used to protect
three locations: the queue head and tail pointers, and the
queue low-watermark location.

This, with the previous patches, result in the following
performance, using netperf, averaged over 5 minute runs:

pkt size rate
-------- ----
1500 804 Mbits/sec
800 701 Mbits/sec
600 600 Mbits/sec
300 280 Mbits/sec
60 60 Mbits/sec


Signed-off-by: Linas Vepstas <[email protected]>
Cc: Utz Bacher <[email protected]>
Cc: Jens Osterkamp <[email protected]>
Cc: James K Lewis <[email protected]>

----
drivers/net/spider_net.c | 77 ++++++++++++++++++++---------------------------
drivers/net/spider_net.h | 2 -
2 files changed, 35 insertions(+), 44 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c 2006-08-15 14:28:56.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c 2006-08-15 14:29:36.000000000 -0500
@@ -644,8 +644,9 @@ static int
spider_net_prepare_tx_descr(struct spider_net_card *card,
struct sk_buff *skb)
{
- struct spider_net_descr *descr = card->tx_chain.head;
+ struct spider_net_descr *descr;
dma_addr_t buf;
+ unsigned long flags;

buf = pci_map_single(card->pdev, skb->data, skb->len, PCI_DMA_TODEVICE);
if (buf == DMA_ERROR_CODE) {
@@ -655,6 +656,10 @@ spider_net_prepare_tx_descr(struct spide
return -ENOMEM;
}

+ spin_lock_irqsave(&card->tx_chain.lock, flags);
+ descr = card->tx_chain.head;
+ card->tx_chain.head = card->tx_chain.head->next;
+
descr->buf_addr = buf;
descr->buf_size = skb->len;
descr->next_descr_addr = 0;
@@ -663,6 +668,8 @@ spider_net_prepare_tx_descr(struct spide

descr->dmac_cmd_status =
SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
+ spin_unlock_irqrestore(&card->tx_chain.lock, flags);
+
if (skb->protocol == htons(ETH_P_IP))
switch (skb->nh.iph->protocol) {
case IPPROTO_TCP:
@@ -673,37 +680,16 @@ spider_net_prepare_tx_descr(struct spide
break;
}

+ /* Chain the bus address, so that the DMA engine finds this descr. */
descr->prev->next_descr_addr = descr->bus_addr;

return 0;
}

-/**
- * spider_net_release_tx_descr - processes a used tx descriptor
- * @card: card structure
- * @descr: descriptor to release
- *
- * releases a used tx descriptor (unmapping, freeing of skb)
- */
-static inline void
-spider_net_release_tx_descr(struct spider_net_card *card)
-{
- struct spider_net_descr *descr = card->tx_chain.tail;
- struct sk_buff *skb;
-
- card->tx_chain.tail = card->tx_chain.tail->next;
- descr->dmac_cmd_status |= SPIDER_NET_DESCR_NOT_IN_USE;
-
- /* unmap the skb */
- skb = descr->skb;
- pci_unmap_single(card->pdev, descr->buf_addr, skb->len,
- PCI_DMA_TODEVICE);
- dev_kfree_skb_any(skb);
-}
-
static void
spider_net_set_low_watermark(struct spider_net_card *card)
{
+ unsigned long flags;
int status;
int cnt=0;
int i;
@@ -727,11 +713,13 @@ spider_net_set_low_watermark(struct spid
descr = descr->next;

/* Set the new watermark, clear the old wtermark */
+ spin_lock_irqsave(&card->tx_chain.lock, flags);
descr->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
if (card->low_watermark && card->low_watermark != descr)
card->low_watermark->dmac_cmd_status =
card->low_watermark->dmac_cmd_status & ~SPIDER_NET_DESCR_TXDESFLG;
card->low_watermark = descr;
+ spin_unlock_irqrestore(&card->tx_chain.lock, flags);
}

/**
@@ -750,22 +738,30 @@ static int
spider_net_release_tx_chain(struct spider_net_card *card, int brutal)
{
struct spider_net_descr_chain *chain = &card->tx_chain;
+ struct spider_net_descr *descr;
+ struct sk_buff *skb;
+ u32 buf_addr;
+ unsigned long flags;
int status;
int rc=0;

spider_net_read_reg(card, SPIDER_NET_GDTDMACCNTR);

while (chain->tail != chain->head) {
- status = spider_net_get_descr_status(chain->tail);
+ spin_lock_irqsave(&chain->lock, flags);
+ descr = chain->tail;
+
+ status = spider_net_get_descr_status(descr);
switch (status) {
case SPIDER_NET_DESCR_COMPLETE:
card->netdev_stats.tx_packets++;
- card->netdev_stats.tx_bytes += chain->tail->skb->len;
+ card->netdev_stats.tx_bytes += descr->skb->len;
break;

case SPIDER_NET_DESCR_CARDOWNED:
if (!brutal) {
rc = 1;
+ spin_unlock_irqrestore(&chain->lock, flags);
goto done;
}
/* fallthrough, if we release the descriptors
@@ -785,9 +781,19 @@ spider_net_release_tx_chain(struct spide
default:
card->netdev_stats.tx_dropped++;
rc = 1;
+ spin_unlock_irqrestore(&chain->lock, flags);
goto done;
}
- spider_net_release_tx_descr(card);
+
+ chain->tail = chain->tail->next;
+ descr->dmac_cmd_status |= SPIDER_NET_DESCR_NOT_IN_USE;
+ skb = descr->skb;
+ buf_addr = descr->buf_addr;
+ spin_unlock_irqrestore(&chain->lock, flags);
+
+ /* unmap the skb */
+ pci_unmap_single(card->pdev, buf_addr, skb->len, PCI_DMA_TODEVICE);
+ dev_kfree_skb_any(skb);
}
done:
if (rc == 1)
@@ -844,11 +850,8 @@ spider_net_xmit(struct sk_buff *skb, str
struct spider_net_card *card = netdev_priv(netdev);
struct spider_net_descr_chain *chain = &card->tx_chain;
struct spider_net_descr *descr = chain->head;
- unsigned long flags;
int result;

- spin_lock_irqsave(&chain->lock, flags);
-
spider_net_release_tx_chain(card, 0);

if (chain->head->next == chain->tail->prev) {
@@ -869,12 +872,9 @@ spider_net_xmit(struct sk_buff *skb, str
}

result = NETDEV_TX_OK;
-
spider_net_kick_tx_dma(card);
- card->tx_chain.head = card->tx_chain.head->next;

out:
- spin_unlock_irqrestore(&chain->lock, flags);
netif_wake_queue(netdev);
return result;
}
@@ -891,17 +891,11 @@ out:
static void
spider_net_cleanup_tx_ring(struct spider_net_card *card)
{
- unsigned long flags;
-
- spin_lock_irqsave(&card->tx_chain.lock, flags);
-
if ((spider_net_release_tx_chain(card, 0) != 0) &&
(card->netdev->flags & IFF_UP)) {
spider_net_kick_tx_dma(card);
netif_wake_queue(card->netdev);
}
-
- spin_unlock_irqrestore(&card->tx_chain.lock, flags);
}

static void
@@ -1932,10 +1926,7 @@ spider_net_stop(struct net_device *netde
spider_net_disable_rxdmac(card);

/* release chains */
- if (spin_trylock(&card->tx_chain.lock)) {
- spider_net_release_tx_chain(card, 1);
- spin_unlock(&card->tx_chain.lock);
- }
+ spider_net_release_tx_chain(card, 1);

spider_net_free_chain(card, &card->tx_chain);
spider_net_free_chain(card, &card->rx_chain);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h 2006-08-15 14:28:56.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h 2006-08-15 14:29:36.000000000 -0500
@@ -24,7 +24,7 @@
#ifndef _SPIDER_NET_H
#define _SPIDER_NET_H

-#define VERSION "1.1 B"
+#define VERSION "1.1 C"

#include "sungem_phy.h"

2006-08-18 22:51:20

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/6]: powerpc/cell spidernet burst alignment patch

On Saturday 19 August 2006 00:20, Linas Vepstas wrote:
> This patch increases the Burst Address alignment from 64 to 1024 in the
> Spidernet driver. This improves transmit performance for arge packets
> from about 100Mbps to 300-400Mbps.

Acked-by: Arnd Bergmann <[email protected]>

2006-08-18 22:56:55

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info.

On Saturday 19 August 2006 00:25, Linas Vepstas wrote:
> This patch adds version information as reported by
> ethtool -i to the Spidernet driver.

Acked-by: Arnd Bergmann <[email protected]>

except for

> @@ -2293,6 +2294,8 @@ static struct pci_driver spider_net_driv
> ? */
> ?static int __init spider_net_init(void)
> ?{
> +???????printk("spidernet Version %s.\n",VERSION);
> +

This printk is missing a level (KERN_INFO or similar). Moreover,
it is rather strange for a driver to print a message when no
device is actually used by it. I'd rather drop the version
printk completely, but I know that Jim has strong feelings about
what to do with version information. I suggest that if we decide
to keep something like that in the driver, it should be printed
in spider_net_probe().

Arnd <><

2006-08-18 23:03:19

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 5/6]: powerpc/cell spidernet bottom half

On Saturday 19 August 2006 00:26, Linas Vepstas wrote:
> The recent set of low-waterark patches for the spider result in a
> significant amount of computing being done in an interrupt context.
> This patch moves this to a "bottom half" aka work queue, so that
> the code runs in a normal kernel context. Curiously, this seems to
> result in a performance boost of about 5% for large packets.

I guess this one still needs some work. We already have a bottom
half mechanism in the network layer, using the NAPI poll function
that is strongly serialized.

Linas, you wrote that you have tried doing the TX descriptor cleanup
in dev->poll(), but I think you missed the point that this function
needs should then be scheduled whenever necessary.

This seems a little strange, but I think what we need to do is in the
low-watermark interrupt call netif_rx_schedule(netdev) in order to
arrange for the ->poll function to be called in the next softirq.

Someone should probably document that in
Documentation/networking/NAPI_HOWTO.txt, I might end up doing that
once we get it right for spidernet.

Arnd <><

2006-08-19 00:56:48

by Arnd Bergmann

[permalink] [raw]
Subject: [RFC] HOWTO use NAPI to reduce TX interrupts

On Saturday 19 August 2006 01:03, Arnd Bergmann wrote:
> Someone should probably document that in
> Documentation/networking/NAPI_HOWTO.txt, I might end up doing that
> once we get it right for spidernet.

Oh well, what else is there to do on a Friday night ;-)

This is a first draft, I expect to have gotten some aspects wrong,
so please review well.

For those that did not follow the spidernet discussion, it turns out
that all good ethernet drivers call their TX descriptor reclaim function
from their poll() method, but there is no documentation explaining
why this is a good idea . I talked to a number of people that have
written network drivers before and most of them have not understood
this well enough. I'm sure spidernet is not the only driver that
implemented this wrong.

Thanks to benh and davem for being really patient about explaining this
to me!

Arnd <><

diff --git a/Documentation/networking/NAPI_HOWTO.txt b/Documentation/networking/NAPI_HOWTO.txt
index 54376e8..4d333a4 100644
--- a/Documentation/networking/NAPI_HOWTO.txt
+++ b/Documentation/networking/NAPI_HOWTO.txt
@@ -738,6 +738,60 @@ USER PID %CPU %MEM SIZE RSS TTY
root 3 0.2 0.0 0 0 ? RWN Aug 15 602:00 (ksoftirqd_CPU0)
root 232 0.0 7.9 41400 40884 ? S Aug 15 74:12 gated

+
+APPENDIX 4: Using NAPI for TX skb cleanup
+=========================================
+
+While most of the discussion is focused on optimising the receive path,
+in most drivers it is also beneficial to free TX buffers from the
+dev->poll() function. Many devices trigger an interrupt for each
+frame that has been sent out to notify the driver that it can free
+the skb. This results in a large amount of interrupt processing that
+we want to avoid.
+
+The simplistic approach of setting a long kernel timer to clean up
+descriptors results in poor throughput because a user process that
+tries to send out a lot of data then blocks on its socket send buffer,
+while the driver never frees up the skbs in that buffer until the
+timeout.
+
+In order to get optimal throughput on transmit, the sent skbs need to
+be cleaned up before the chip runs out of data to transmit, so
+relying on an end of queue interrupt means that in the window between
+the interrupt and the time that new user packets have arrived in the
+adapter, there is no outgoing data on the wire, even if user data is
+available.
+It may also be bad to defer freeing skbs too long because they may consume
+a significan amount of memory.
+
+Experience shows that combination of events that trigger skb reclaim
+works best. These events include:
+- new packets coming in through hard_start_xmit()
+- packets coming in from the network through dev->poll()
+- time has passed since the first frame was send over the wire
+ but has not been reclaimed (tx_coalesce_usecs)
+- a number of packets have been sent (tx_max_coalesced_frames)
+
+We can avoid expensive locking between these by using the poll()
+function as the only place to call skb reclaim. This also means
+that in the interrupt handler, we always call netif_rx_schedule()
+for any interrupt, regardless of whether it is an rx or tx
+event. Don't get confused by the fact that we're scheduling
+the rx softirq to do tx work.
+
+Depending on the actual hardware, slightly different methods
+for coalesced tx interrupts may be used:
+- a timer that starts with the successful transmission of a packet
+ may need to be replaced with a timer that is started at when a
+ packet is submitted to the adapter.
+- instead of an interrupt that is triggered after a fixed number
+ of transmitted packets, it may be possible to mark a specific
+ packet so it generates an interrupt after processing.
+- If the adapter knows about the number of packets that have been
+ queued, a low-watermark interrupt may be used that fires
+ when the number drops below a user-defined value.
+
+
--------------------------------------------------------------------

relevant sites:

2006-08-19 01:31:58

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [RFC] HOWTO use NAPI to reduce TX interrupts

Arnd Bergmann wrote:
> On Saturday 19 August 2006 01:03, Arnd Bergmann wrote:
>
>> Someone should probably document that in
>> Documentation/networking/NAPI_HOWTO.txt, I might end up doing that
>> once we get it right for spidernet
>>

The reason reclaim via poll() is efficient is because it avoid causing a
softirq that is
necessary when skb_free_irq() is done. Instead it reuses the softirq
from the poll()
routine. Like all Rx NAPI, using poll() for reclaim means:
+ aggregating multiple frames in one irq
- increased overhead of twiddling with the IRQ mask
- more ways to get driver stuck

Some drivers do all their irq work in the poll() routine (including PHY
handling).
This is good if reading the IRQ status does an auto mask operation.

The whole NAPI documentation area is a mess and needs a good writer
to do some major restructuring. It should also be split into reference
information,
tutorial and guide sections.

2006-08-19 11:25:47

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC] HOWTO use NAPI to reduce TX interrupts

On Sunday 20 August 2006 03:31, Stephen Hemminger wrote:
>
> The reason reclaim via poll() is efficient is because it avoid causing a
> softirq that is
> necessary when skb_free_irq() is done. Instead it reuses the softirq
> from the poll() routine.

Ok, I completely missed this point so far, thanks for the info.

> Like all Rx NAPI, using poll() for reclaim means:
> ? ? + aggregating multiple frames in one irq
> ? ? - increased overhead of twiddling with the IRQ mask
> ? ? - more ways to get driver stuck

What is the best way to treat the IRQ mask for TX interrupts?
I guess it should be roughly:

- off when we expect ->poll() to be called, i.e. after calling
netif_rx_schedule() or returning after a partial rx from poll().
- off when there are no packets left in the TX queue
- on while RX interrupts are on and we're waiting for packets
to be transmitted.

> Some drivers do all their irq work in the poll() routine (including PHY
> handling).
> This is good if reading the IRQ status does an auto mask operation.
>
> The whole NAPI documentation area is a mess and needs a good writer
> to do some major restructuring. It should also be split into reference
> information,
> tutorial and guide sections.

I won't be able to do that work, I'm neither a good writer nor a networking
person.

Do you think we should still merge a section like the text I wrote up, even
if it makes the text even less well structured? Should I maybe add it
somewhere else than the appendix?

Arnd <><

2006-08-20 06:32:40

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.


> card->low_watermark->next->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
> mb();
> card->low_watermark->dmac_cmd_status &= ~SPIDER_NET_DESCR_TXDESFLG;
> card->low_watermark = card->low_watermark->next;
>
> when we queue another frame for TX.

I would have expected those to be racy vs. the hardware... what if the
hardware is updating dmac_cmd_status just as your are trying to and the
bit out of it ?

Ben


2006-08-20 10:04:04

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.

On Sunday 20 August 2006 08:31, Benjamin Herrenschmidt wrote:
> > card->low_watermark->next->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
> > mb();
> > card->low_watermark->dmac_cmd_status &= ~SPIDER_NET_DESCR_TXDESFLG;
> > card->low_watermark = card->low_watermark->next;
> >
> > when we queue another frame for TX.
>
> I would have expected those to be racy vs. the hardware... what if the
> hardware is updating dmac_cmd_status just as your are trying to and the
> bit out of it ?

Right, that doesn't work. It is the only bit we use in that byte though,
so maybe it can be done with a single byte write.

Arnd <><

2006-08-20 17:48:34

by Arnd Bergmann

[permalink] [raw]
Subject: [RFC v2] HOWTO use NAPI to reduce TX interrupts

A recent discussion about the spidernet driver resulted in the dicovery
that network drivers are supposed to use NAPI for both their receive and
transmit paths, but this is documented nowhere.

In order to help the next person writing a NAPI based driver, I wrote
down what I found missing about this.

Please tell me if anything in here is still wrong or could use better
wording.

Signed-off-by: Arnd Bergmann <[email protected]>

---
This is the second version of my mini howto, after a few comments
I got from Stephen Hemminger and Avuton Olrich.

Index: linux-cg/Documentation/networking/NAPI_HOWTO.txt
===================================================================
--- linux-cg.orig/Documentation/networking/NAPI_HOWTO.txt 2006-08-20 16:51:12.000000000 +0200
+++ linux-cg/Documentation/networking/NAPI_HOWTO.txt 2006-08-20 19:42:20.000000000 +0200
@@ -1,11 +1,6 @@
-HISTORY:
-February 16/2002 -- revision 0.2.1:
-COR typo corrected
-February 10/2002 -- revision 0.2:
-some spell checking ;->
-January 12/2002 -- revision 0.1
-This is still work in progress so may change.
-To keep up to date please watch this space.
+Note: this document could use a serious cleanup by a good writer.
+It would be nice to split out the reference parts into a kerneldoc
+document and turn the rest into a tutorial.

Introduction to NAPI
====================
@@ -738,6 +733,64 @@
root 3 0.2 0.0 0 0 ? RWN Aug 15 602:00 (ksoftirqd_CPU0)
root 232 0.0 7.9 41400 40884 ? S Aug 15 74:12 gated

+
+APPENDIX 4: Using NAPI for TX skb cleanup
+=========================================
+
+While most of the discussion is focused on optimizing the receive path, in
+most drivers it is also beneficial to free TX buffers from the dev->poll()
+function. Many devices trigger an interrupt for each packet that has been
+sent out to notify the driver that it can free the skb. This results in
+a large amount of interrupt processing that we want to avoid. It is also
+suboptimal to free skbs in a hardirq context, because dev_kfree_skb_irq()
+needs to schedule a softirq to do the actual work. Calling dev_kfree_skb()
+from dev->poll() directly avoids these extra softirq schedules.
+
+The simplistic approach of setting a long kernel timer to clean up
+descriptors results in poor throughput because a user process that tries
+to send out a lot of data then blocks on its socket send buffer, while
+the driver never frees up the skbs in that buffer until the timeout.
+
+Trying the cleanup every time that hard_start_xmit() is entered provides
+relatively good throughput, but typically causes extra processing overhead
+because of mmio accesses and/or spinlocks, so you normally want to batch
+skb reclaim.
+
+In order to get optimal throughput on transmit, the sent skbs need to be
+cleaned up before the chip runs out of data to transmit, so relying on
+an end of queue interrupt means that in the window between the interrupt
+and the time that new user packets have arrived in the adapter, there is
+no outgoing data on the wire, even if user data is available. It may
+also be bad to defer freeing skbs too long because they may consume a
+significant amount of memory.
+
+Experience shows that combination of events that trigger skb reclaim
+works best. These events include:
+- new packets coming in through hard_start_xmit()
+- packets coming in from the network through dev->poll()
+- time has passed since the first packet was send over the wire
+ but has not been reclaimed (tx_coalesce_usecs)
+- a number of packets have been sent (tx_max_coalesced_frames)
+
+We can avoid expensive locking between these by using the poll() function
+as the only place to call skb reclaim. This also means that in the
+interrupt handler, we always call netif_rx_schedule() for any interrupt,
+including those for tx or e.g. PHY handling. This is particularly
+helpful if reading the IRQ status does an auto mask operation.
+
+Depending on the actual hardware, slightly different methods for coalesced
+tx interrupts may be used:
+- a timer that starts with the successful transmission of a packet
+ may need to be replaced with a timer that is started at when a packet
+ is submitted to the adapter.
+- instead of an interrupt that is triggered after a fixed number
+ of transmitted packets, it may be possible to mark a specific packet
+ so it generates an interrupt after processing.
+- If the adapter knows about the number of packets that have been
+ queued, a low-watermark interrupt may be used that fires when the
+ number drops below a user-defined value.
+
+
--------------------------------------------------------------------

relevant sites:
@@ -764,3 +817,4 @@
Manfred Spraul <[email protected]>
Donald Becker <[email protected]>
Jeff Garzik <[email protected]>
+Arnd Bergmann <[email protected]>

2006-08-21 20:41:20

by Stephen Hemminger

[permalink] [raw]
Subject: NAPI documentation

I took this opportunity to get a start on improving NAPI documentation.
I mashed together the short text about NAPI (with permission) from lwn.net
and the existing NAPI-HOWTO, to make a page on the Linux net wiki.

I took the liberty of removing some of the bits that were out of date, or referred
to old Becker code. It still needs lots of editing to be presentable.

Please edit and improve
http://linux-net.osdl.org/index.php/NAPI

When the page is in good shape, I will de-wiki it to place in kernel doc tree.

2006-08-21 22:04:56

by David Miller

[permalink] [raw]
Subject: Re: NAPI documentation

From: Stephen Hemminger <[email protected]>
Date: Mon, 21 Aug 2006 13:40:53 -0700

> Please edit and improve
> http://linux-net.osdl.org/index.php/NAPI
>
> When the page is in good shape, I will de-wiki it to place in kernel doc tree.

How do I edit the introduction paragraphs at the top? I want to edit
this sentence since it sounds awful:

NAPI ("New API") is a modification to the packet process, ...

I want to change "packet process" to something more descriptive
and accurate.

2006-08-21 22:10:48

by Stephen Hemminger

[permalink] [raw]
Subject: Re: NAPI documentation

On Mon, 21 Aug 2006 15:05:09 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Stephen Hemminger <[email protected]>
> Date: Mon, 21 Aug 2006 13:40:53 -0700
>
> > Please edit and improve
> > http://linux-net.osdl.org/index.php/NAPI
> >
> > When the page is in good shape, I will de-wiki it to place in kernel doc tree.
>
> How do I edit the introduction paragraphs at the top?

Click edit on wiki and it is the first sentence.


> I want to edit
> this sentence since it sounds awful:
>
> NAPI ("New API") is a modification to the packet process, ...
>
> I want to change "packet process" to something more descriptive
> and accurate.
>

Yes.

Also, does Jamal have a link to his UKUUG paper?

--
Stephen Hemminger <[email protected]>

All non-trivial abstractions, to some degree, are leaky.

2006-08-21 22:17:32

by David Miller

[permalink] [raw]
Subject: Re: NAPI documentation

From: Stephen Hemminger <[email protected]>
Date: Mon, 21 Aug 2006 15:09:35 -0700

> Click edit on wiki and it is the first sentence.

Thanks a lot.

> Also, does Jamal have a link to his UKUUG paper?

I don't think there is a copy online right now.

2006-08-21 23:52:51

by linas

[permalink] [raw]
Subject: Re: [RFC] HOWTO use NAPI to reduce TX interrupts

On Sat, Aug 19, 2006 at 01:25:18PM +0200, Arnd Bergmann wrote:
>
> What is the best way to treat the IRQ mask for TX interrupts?
> I guess it should be roughly:
>
> - off when we expect ->poll() to be called, i.e. after calling
> netif_rx_schedule() or returning after a partial rx from poll().

Under what circumstance does one turn TX interrupts back on?
I couldn't quite figure that out.

--linas

2006-08-21 23:56:04

by David Miller

[permalink] [raw]
Subject: Re: [RFC] HOWTO use NAPI to reduce TX interrupts

From: [email protected] (Linas Vepstas)
Date: Mon, 21 Aug 2006 18:52:44 -0500

> Under what circumstance does one turn TX interrupts back on?
> I couldn't quite figure that out.

Don't touch interrupts until both RX and TX queue work is
fullydepleted. You seem to have this notion that RX and TX interrupts
are seperate. They aren't, even if your device can generate those
events individually. Whatever interrupt you get, you shut down all
interrupt sources and schedule the ->poll(). Then ->poll() does
something like:

all_tx_completion_work();
ret = as_much_rx_work_as_budget_and_quota_allows();
if (!ret)
reenable_interrupts_and_complet_napi_poll();

You always run the TX completion work fully, then you do the RX work
within the quota/budget.

See the tg3 driver for details, really...

2006-08-22 00:29:11

by Roland Dreier

[permalink] [raw]
Subject: Re: [RFC] HOWTO use NAPI to reduce TX interrupts

David> Don't touch interrupts until both RX and TX queue work is
David> fullydepleted. You seem to have this notion that RX and TX
David> interrupts are seperate. They aren't, even if your device
David> can generate those events individually. Whatever interrupt
David> you get, you shut down all interrupt sources and schedule
David> the ->poll(). Then ->poll() does something like:

This is a digression from spidernet, but what if a device is able to
generate separate MSIs for TX and RX? Some people from IBM have
suggested that it is beneficial for throughput to handle TX work and
RX work for IP-over-InfiniBand in parallel on separate CPUs, and
handling everything through the ->poll() method would defeat this.

- R.

2006-08-22 00:32:12

by David Miller

[permalink] [raw]
Subject: Re: [RFC] HOWTO use NAPI to reduce TX interrupts

From: Roland Dreier <[email protected]>
Date: Mon, 21 Aug 2006 17:29:05 -0700

> This is a digression from spidernet, but what if a device is able to
> generate separate MSIs for TX and RX? Some people from IBM have
> suggested that it is beneficial for throughput to handle TX work and
> RX work for IP-over-InfiniBand in parallel on separate CPUs, and
> handling everything through the ->poll() method would defeat this.

The TX work is so incredibly cheap, relatively speaking, compared
to the full input packet processing path that the RX side runs
that I see no real benefit.

In fact, you might even get better locality due to the way the
locking can be performed if TX reclaim runs inside of ->poll()

2006-08-23 21:36:46

by linas

[permalink] [raw]
Subject: Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.

On Sun, Aug 20, 2006 at 12:03:14PM +0200, Arnd Bergmann wrote:
> On Sunday 20 August 2006 08:31, Benjamin Herrenschmidt wrote:
> > > card->low_watermark->next->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
> > > mb();
> > > card->low_watermark->dmac_cmd_status &= ~SPIDER_NET_DESCR_TXDESFLG;
> > > card->low_watermark = card->low_watermark->next;
> > >
> > > when we queue another frame for TX.
> >
> > I would have expected those to be racy vs. the hardware... what if the
> > hardware is updating dmac_cmd_status just as your are trying to and the
> > bit out of it ?
>
> Right, that doesn't work. It is the only bit we use in that byte though,
> so maybe it can be done with a single byte write.

Thanks, you're right, I missed that. I'll change this to byte access
shortly. Any recommendations for style/api for byte access?

I could create a searate patch to change struct descr {} to split
the u32 into several u8's; there's a dozen spots that get touched.

Alternatel, I could do a cheesy cast to char[4] and access that way.
Opinions?

--linas

2006-08-23 21:52:50

by linas

[permalink] [raw]
Subject: Re: [PATCH 5/6]: powerpc/cell spidernet bottom half

On Sat, Aug 19, 2006 at 01:03:04AM +0200, Arnd Bergmann wrote:
> using the NAPI poll function

Still fiddling with this. Getting side-tracked after noticing
that the RX side generates a *huge* numbe of interrupts, despite
code in the driver which superficially appears to be RX NAPI.
One step forward, two steps back, isn't there a dance like that?

--linas

2006-08-23 22:03:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.

From: [email protected] (Linas Vepstas)
Date: Wed, 23 Aug 2006 16:36:42 -0500

> I could create a searate patch to change struct descr {} to split
> the u32 into several u8's; there's a dozen spots that get touched.
>
> Alternatel, I could do a cheesy cast to char[4] and access that way.
> Opinions?

The most portable scheme would be a "u32/u8[4]" union with
appropriate endianness checks when determining which byte
to access in the u8[] view.