2006-11-08 21:50:18

by Jeff V. Merkey

[permalink] [raw]
Subject: e1000 driver 2.6.18 - how to waste processor cycles


Is there a good reason the skb refill routine in e1000_alloc_rx_buffers
needs to go and touch and remap skb memory
on already loaded descriptors/ This seems extremely wasteful of
processor cycles when refilling the ring buffer.

I note that the archtiecture has changed and is recycling buffers from
the rx_irq routine and when the routine is called
to refill the ring buffers, a lot of wasteful and needless calls for
map_skb is occurring.

Jeff


2006-11-09 01:01:48

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: e1000 driver 2.6.18 - how to waste processor cycles

included netdev...

On 11/8/06, Jeff V. Merkey <[email protected]> wrote:
>
> Is there a good reason the skb refill routine in e1000_alloc_rx_buffers
> needs to go and touch and remap skb memory
> on already loaded descriptors/ This seems extremely wasteful of
> processor cycles when refilling the ring buffer.
>
> I note that the archtiecture has changed and is recycling buffers from
> the rx_irq routine and when the routine is called
> to refill the ring buffers, a lot of wasteful and needless calls for
> map_skb is occurring.

we have to unmap the descriptor (or at least do
pci_dma_sync_single_for_cpu / pci_dma_sync_single_for_device) because
the dma API says we can't be guaranteed the cacheable memory is
consistent until we do one of the afore mentioned pci dma ops.

we have to do *something* before we access it. Simplest path is to
unmap it and then recycle/map it.

If you can show that it is faster to use pci_dma_sync_single_for_cpu
and friends I'd be glad to take a patch.

Hope this helps,
Jesse

2006-11-09 02:43:49

by David Miller

[permalink] [raw]
Subject: Re: e1000 driver 2.6.18 - how to waste processor cycles

From: "Jesse Brandeburg" <[email protected]>
Date: Wed, 8 Nov 2006 17:01:44 -0800

> If you can show that it is faster to use pci_dma_sync_single_for_cpu
> and friends I'd be glad to take a patch.

The problem is if you don't recycle the buffer and really unmap,
you'll flush twice. That can potentially be expensive.

2006-11-09 07:37:38

by Jeff V. Merkey

[permalink] [raw]
Subject: Re: e1000 driver 2.6.18 - how to waste processor cycles

Jesse Brandeburg wrote:

> included netdev...
>
> On 11/8/06, Jeff V. Merkey <[email protected]> wrote:
>
>>
>> Is there a good reason the skb refill routine in e1000_alloc_rx_buffers
>> needs to go and touch and remap skb memory
>> on already loaded descriptors/ This seems extremely wasteful of
>> processor cycles when refilling the ring buffer.
>>
>> I note that the archtiecture has changed and is recycling buffers from
>> the rx_irq routine and when the routine is called
>> to refill the ring buffers, a lot of wasteful and needless calls for
>> map_skb is occurring.
>
>
> we have to unmap the descriptor (or at least do
> pci_dma_sync_single_for_cpu / pci_dma_sync_single_for_device) because
> the dma API says we can't be guaranteed the cacheable memory is
> consistent until we do one of the afore mentioned pci dma ops.

In the case I am referring to, the memory is already mapped with a
previous call, which means it may be getting
mapped twice.

Jeff

>
> we have to do *something* before we access it. Simplest path is to
> unmap it and then recycle/map it.
>
> If you can show that it is faster to use pci_dma_sync_single_for_cpu
> and friends I'd be glad to take a patch.
>
> Hope this helps,
> Jesse
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2006-11-09 22:45:38

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: e1000 driver 2.6.18 - how to waste processor cycles

On 11/9/06, Jeffrey V. Merkey <[email protected]> wrote:
> In the case I am referring to, the memory is already mapped with a
> previous call, which means it may be getting
> mapped twice.

I guess maybe I'm not keeping up with you. This is what I see looking
in 2.6.18, i see e1000_clean_rx_irq:

check done bit
pci_unmap_single
copybreak and recycle
OR
hand buffer up stack

the only branch before the unmap is the napi break out, and in that
case we don't change any memory state, so alloc will not do anything.

As for alloc rx, we always map, because we always unmapped.

Did I miss something? I would appreciate a more detailed explanation
of what you see going wrong.

2006-11-09 23:02:10

by Jeff V. Merkey

[permalink] [raw]
Subject: Re: e1000 driver 2.6.18 - how to waste processor cycles

Jesse Brandeburg wrote:

> On 11/9/06, Jeffrey V. Merkey <[email protected]> wrote:
>
>> In the case I am referring to, the memory is already mapped with a
>> previous call, which means it may be getting
>> mapped twice.
>
>
> I guess maybe I'm not keeping up with you. This is what I see looking
> in 2.6.18, i see e1000_clean_rx_irq:


Check e1000_alloc_rx_buffers:

if (skb already exists in ring buffer)
goto map_skb:
else
dev_alloc_skb
( drop through to map_skb)


map_skb:
pci_map_single

Jeff

>
> check done bit
> pci_unmap_single
> copybreak and recycle
> OR
> hand buffer up stack
>
> the only branch before the unmap is the napi break out, and in that
> case we don't change any memory state, so alloc will not do anything.
>
> As for alloc rx, we always map, because we always unmapped.
>
> Did I miss something? I would appreciate a more detailed explanation
> of what you see going wrong.
>
>

2006-11-09 23:06:34

by Jeffrey V. Merkey

[permalink] [raw]
Subject: Re: e1000 driver 2.6.18 - how to waste processor cycles

Jeff V. Merkey wrote:

> Jesse Brandeburg wrote:
>
>> On 11/9/06, Jeffrey V. Merkey <[email protected]> wrote:
>>
>>> In the case I am referring to, the memory is already mapped with a
>>> previous call, which means it may be getting
>>> mapped twice.
>>
>>
>>
>> I guess maybe I'm not keeping up with you. This is what I see looking
>> in 2.6.18, i see e1000_clean_rx_irq:
>
>
>
> Check e1000_alloc_rx_buffers:
>
> if (skb already exists in ring buffer)
> goto map_skb:
> else
> dev_alloc_skb
> ( drop through to map_skb)
>
>
> map_skb:
> pci_map_single
>
> Jeff


>
>>
>> check done bit
>> pci_unmap_single
>> copybreak and recycle
>> OR
>> hand buffer up stack
>>
>> the only branch before the unmap is the napi break out, and in that
>> case we don't change any memory state, so alloc will not do anything.
>>
>> As for alloc rx, we always map, because we always unmapped.
>
Unmapping every single buffer in rx_irq the remapping them in
alloc_rx_buffers is wasteful of cycles.

Jeff

>>
>> Did I miss something? I would appreciate a more detailed explanation
>> of what you see going wrong.
>>
>>
>
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>