LinuxLists.cc - arm: Optimization for ethernet MAC handling at91

2010-01-12 15:37:43

Subject: arm: Optimization for ethernet MAC handling at91_ether.c

Everyone,

Since, a AT91_EMAC_TUND only happens when the transmitter is unable to
transfer the frame in time for a frame to be sent. It makes sense to
RETRY the packet in this condition in the ISR.
Or would this overcomplicate a simple task?
... see below ...

James

---- code snippet ----

/*
* MAC interrupt handler
*/
static irqreturn_t at91ether_interrupt(int irq, void *dev_id) { ...

if (intstatus & AT91_EMAC_TCOM) { /* Transmit
complete */
/* The TCOM bit is set even if the
transmission failed. */
if (intstatus & (AT91_EMAC_TUND |
AT91_EMAC_RTRY))
dev->stats.tx_errors +=
1;

if (lp->skb) {

dev_kfree_skb_irq(lp->skb);
lp->skb = NULL;
dma_unmap_single(NULL,
lp->skb_physaddr, lp->skb_length, DMA_TO_DEVICE);
}
netif_wake_queue(dev);
}
...

---- Alternate approach ----

/* The TCOM bit is set even if the
transmission failed. */
if (intstatus & (AT91_EMAC_TUND)) {
/* Set address of the
data in the Transmit Address register */

at91_emac_write(AT91_EMAC_TAR, lp->skb_physaddr);
/* Set length of the
packet in the Transmit Control register */

at91_emac_write(AT91_EMAC_TCR, skb->len);
}
else if (intstatus & (AT91_EMAC_RTRY))
dev->stats.tx_errors +=
1;

...
I do know there needs to be a bit more code then to handle the
successful case below this; but, this is enough to understand what I am
talking about. The UNDERRUN error should happen infrequently and in
ideal circumstances not happen at all.

----
James Kosin
Software Engineer

2010-01-12 16:36:13

by Eric Dumazet

[permalink] [raw]

Subject: Re: arm: Optimization for ethernet MAC handling at91_ether.c

CC to netdev

Le 12/01/2010 16:39, James Kosin a ?crit :
> Everyone,
>
> Since, a AT91_EMAC_TUND only happens when the transmitter is unable to
> transfer the frame in time for a frame to be sent. It makes sense to
> RETRY the packet in this condition in the ISR.
> Or would this overcomplicate a simple task?
> ... see below ...
>
...

>
> ...
> I do know there needs to be a bit more code then to handle the
> successful case below this; but, this is enough to understand what I am
> talking about. The UNDERRUN error should happen infrequently and in
> ideal circumstances not happen at all.
>

If this happens once in a while, why do you want driver to retry the transmit ?

2010-01-12 17:50:16

by James Kosin

[permalink] [raw]

Subject: RE: arm: Optimization for ethernet MAC handling at91_ether.c

On 1/12/2010 11:40 AM, Eric Dumazet wrote:
> CC to netdev
>
> Le 12/01/2010 16:39, James Kosin a ?crit :
>> Everyone,
>>
>> Since, a AT91_EMAC_TUND only happens when the transmitter is unable to
>> transfer the frame in time for a frame to be sent. It makes sense to
>> RETRY the packet in this condition in the ISR.
>> Or would this overcomplicate a simple task?
>> ... see below ...
>>
> ...
>
>>
>> ...
>> I do know there needs to be a bit more code then to handle the
>> successful case below this; but, this is enough to understand what I am
>> talking about. The UNDERRUN error should happen infrequently and in
>> ideal circumstances not happen at all.
>>
>
>
> If this happens once in a while, why do you want driver to retry the transmit ?

(a) It would improve performance by allowing the ISR to handle the re-transmit in this case.
(b) It would help in the case of small glitches that may happen from external SDRAM without taxing the polling required to handle the re-transmit of the packet... ie: overhead required to re-queue and initiate a packet delivery... since the packet is already scheduled for delivery now.

James

2010-01-12 18:08:43

by Eric Dumazet

[permalink] [raw]

Subject: Re: arm: Optimization for ethernet MAC handling at91_ether.c

Le 12/01/2010 18:51, James Kosin a ?crit :
> On 1/12/2010 11:40 AM, Eric Dumazet wrote:

>> If this happens once in a while, why do you want driver to retry the transmit ?
>
> (a) It would improve performance by allowing the ISR to handle the re-transmit in this case.
> (b) It would help in the case of small glitches that may happen from external SDRAM without taxing the polling required to handle the re-transmit of the packet... ie: overhead required to re-queue and initiate a packet delivery... since the packet is already scheduled for delivery now.
>

OK, but then this also adds an extra check for each tx completion.

I dont have this piece of hardware, but seeing it has a one skb tx queue (!),
I suppose TX performance is not very good anyway...

at91ether_interrupt() should probably handle tx completion before rx, to feed
next frame faster.

2010-01-12 18:40:36

by James Kosin

[permalink] [raw]

Subject: RE: arm: Optimization for ethernet MAC handling at91_ether.c

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Eric Dumazet
Sent: Tuesday, January 12, 2010 1:08 PM
To: James Kosin
Cc: [email protected]; Linux Netdev List
Subject: Re: arm: Optimization for ethernet MAC handling at91_ether.c

Le 12/01/2010 18:51, James Kosin a ?crit :
> On 1/12/2010 11:40 AM, Eric Dumazet wrote:

>> If this happens once in a while, why do you want driver to retry the transmit ?
>
> (a) It would improve performance by allowing the ISR to handle the re-transmit in this case.
> (b) It would help in the case of small glitches that may happen from external SDRAM without taxing the polling required to handle the re-transmit of the packet... ie: overhead required to re-queue and initiate a packet delivery... since the packet is already scheduled for delivery now.
>

OK, but then this also adds an extra check for each tx completion.

I dont have this piece of hardware, but seeing it has a one skb tx queue (!),
I suppose TX performance is not very good anyway...

at91ether_interrupt() should probably handle tx completion before rx, to feed
next frame faster.

---

I know. Actually, I am using the hardware; and the part is capable of more than this driver is displaying.

1) The part is able to queue up two packets (1) actively being transmitted, and (1) queued up behind that packet. The driver doesn't exploit this; probably because this would cause some confusion, since we wouldn't really know which packet failed when this happens.

2) TX performance is so..so. With bing on another computer using '-z' option it is reporting a fairly good value of 74Mbps. I'll have to try with a re-compiled kernel at some point to give better feedback if we can improve this more.

Your interpretation makes some sense on the transmit side. The RX currently has a DMA queue that extends to a depth of 9 (currently). So getting one transmit out before checking the RX may improve things a bit.

James

2010-01-12 19:02:08

by James Kosin

[permalink] [raw]

Subject: RE: arm: Optimization for ethernet MAC handling at91_ether.c

On 1/12/2010 1:50 PM, James Kosin wrote:
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Eric Dumazet
> Sent: Tuesday, January 12, 2010 1:08 PM
> To: James Kosin
> Cc: [email protected]; Linux Netdev List
> Subject: Re: arm: Optimization for ethernet MAC handling at91_ether.c
>
> Le 12/01/2010 18:51, James Kosin a ?crit :
>> On 1/12/2010 11:40 AM, Eric Dumazet wrote:
>
>
> I know. Actually, I am using the hardware; and the part is capable of more than this driver is displaying.
>
> 1) The part is able to queue up two packets (1) actively being transmitted, and (1) queued up behind that packet. The driver doesn't exploit this; probably because this would cause some confusion, since we wouldn't really know which packet failed when this happens.
>
> 2) TX performance is so..so. With bing on another computer using '-z' option it is reporting a fairly good value of 74Mbps. I'll have to try with a re-compiled kernel at some point to give better feedback if we can improve this more.
>
> Your interpretation makes some sense on the transmit side. The RX currently has a DMA queue that extends to a depth of 9 (currently). So getting one transmit out before checking the RX may improve things a bit.
>

Scratch that. The interrupt doesn't queue up or send another packet directly. So, it wouldn't help on performance here. But, may in other implementations that queue/transmit packets in the ISR. At least in the case where the transmitter is limited to one.

James

2010-01-12 19:24:49

by Eric Dumazet

[permalink] [raw]

Subject: Re: arm: Optimization for ethernet MAC handling at91_ether.c

Le 12/01/2010 20:03, James Kosin a ?crit :
>
> Scratch that. The interrupt doesn't queue up or send another packet directly. So, it wouldn't help on performance here. But, may in other implementations that queue/transmit packets in the ISR. At least in the case where the transmitter is limited to one.
>

It could, at least on SMP. tx completion wakes a blocked sender, while
this cpu continue with RX handling (possibly expensive)

But even on UP, doing tx completion before rx handling allows
a better reuse of skb just freed (and partly present in cpu cache, if available).

Start of IRQ

1) tx completion
-> free a skb

2) rx handling:
-> allocate an skb, kmalloc() reuses previous one, still in cpu cache.

End of IRQ

2010-01-12 19:38:07

by James Kosin

[permalink] [raw]

Subject: RE: arm: Optimization for ethernet MAC handling at91_ether.c

-----Original Message-----
>From: Eric Dumazet [mailto:[email protected]]
>Sent: Tuesday, January 12, 2010 2:25 PM
>To: James Kosin
>Cc: [email protected]; Linux Netdev List
>Subject: Re: arm: Optimization for ethernet MAC handling at91_ether.c
>
>Le 12/01/2010 20:03, James Kosin a ?crit :
>>
>> Scratch that. The interrupt doesn't queue up or send another packet directly. So, it wouldn't help on performance here. But, may in other implementations that queue/transmit packets in the ISR. At least in the case where the transmitter is limited to one.
>>
>
>It could, at least on SMP. tx completion wakes a blocked sender, while
>this cpu continue with RX handling (possibly expensive)
>
>But even on UP, doing tx completion before rx handling allows
>a better reuse of skb just freed (and partly present in cpu cache, if available).
>
>Start of IRQ
>
>1) tx completion
> -> free a skb
>
>2) rx handling:
> -> allocate an skb, kmalloc() reuses previous one, still in cpu cache.
>
>End of IRQ

I think this may work to improve things slightly; since, the transmitter always frees the skb regardless of error or success currently.

What I was proposing was to modify the sequence slightly to be more like this:

1) tx completion
-> test for TX TUND error
-> resend the current skb (to avoid having to re-do)
-> else
-> test for TX RTRY error
-> increment error count as before
-> free the skb