2002-08-22 17:20:52

by Mala Anand

Subject: Re: [Lse-tech] Re: (RFC): SKB Initialization


>On Wed, Aug 21, 2002 at 01:07:09PM -0500, Mala Anand wrote:
>>
>> >On Wed, Aug 21, 2002 at 11:59:44AM -0500, Mala Anand wrote:
>> >> The patch reduces the number of cycles by 25%
>>
>> >The data you are reporting is flawed: where are the average cycle
>> >times spent in __kfree_skb with the patch?
>>
>> I measured the cycles for only the initialization code in alloc_skb
>> and __kfree_skb. Since the init code is removed from __kfree_skb,
>> no cycles are spent there.

>Then the testing technique is flawed. You should include all of the
>operations included in an alloc_skb/kfree_skb pair in order to see
>the overall effect of the change, otherwise your change could have a
>net negative effect which would not be noticed.

Cycles for the whole routines alloc_skb and __kfree_skb are as follows:

Baseline 2.5.25
---------------
alloc/free average cycles

Runs:   1st        2nd        3rd
CPU0:   337/1163   336/1132   304/1100
CPU1:   318/1164   309/1153   311/1127


2.5.25+skbinit patch
--------------------
alloc/free average cycles

Runs:   1st        2nd        3rd
CPU0:   447/1015   580/846    402/905
CPU1:   419/1003   383/915    547/856

The above figures indicate that the combined cycles spent in alloc_skb
and __kfree_skb improve by about 5% in the patch case. However, if you
take the absolute cycles and average them over the three runs, the
saving comes to around 145 cycles, which is close to what I posted
earlier from measuring just the changed code. As the scope of the code
measured widens, the percentage improvement comes down.
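
As a cross-check of the arithmetic, the sketch below recomputes both
figures from the two tables. It assumes that the 145-cycle saving is the
alloc+free total summed over CPU0 and CPU1 and then averaged over the
three runs; that reading is an assumption, chosen because it reproduces
both the 145 cycles and the roughly 5% quoted. The names sample,
avg_pair_cycles, saving_cycles and saving_percent are illustrative only.

```c
#define NCPUS 2
#define NRUNS 3

/* One table cell: average cycles in alloc_skb and in __kfree_skb. */
struct sample {
    int alloc;
    int free_;
};

/* Figures copied from the "Baseline 2.5.25" table. */
const struct sample baseline[NCPUS][NRUNS] = {
    { {337, 1163}, {336, 1132}, {304, 1100} },   /* CPU0 */
    { {318, 1164}, {309, 1153}, {311, 1127} },   /* CPU1 */
};

/* Figures copied from the "2.5.25+skbinit patch" table. */
const struct sample patched[NCPUS][NRUNS] = {
    { {447, 1015}, {580, 846}, {402, 905} },     /* CPU0 */
    { {419, 1003}, {383, 915}, {547, 856} },     /* CPU1 */
};

/*
 * Assumed aggregation: add alloc+free cycles over both CPUs,
 * then average over the runs.
 */
double avg_pair_cycles(const struct sample t[NCPUS][NRUNS])
{
    long total = 0;

    for (int cpu = 0; cpu < NCPUS; cpu++)
        for (int run = 0; run < NRUNS; run++)
            total += t[cpu][run].alloc + t[cpu][run].free_;

    return (double)total / NRUNS;
}

/* Absolute saving of the patched kernel over the baseline. */
double saving_cycles(void)
{
    return avg_pair_cycles(baseline) - avg_pair_cycles(patched);
}

/* The same saving as a percentage of the baseline. */
double saving_percent(void)
{
    return 100.0 * saving_cycles() / avg_pair_cycles(baseline);
}
```

Under that assumption saving_cycles() comes out a little above 145 and
saving_percent() just under 5. The tables also show the shift the patch
is expected to cause: the alloc column rises while the free column
falls by a larger amount.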

So the results from the first two scopes, (1) measuring the cycles spent
in the changed code and (2) measuring the cycles spent in alloc_skb and
__kfree_skb, are consistent.
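
For illustration, widening the scope from the changed code to the whole
routine only moves where the two timestamps are taken; the bookkeeping
stays the same. The sketch below shows one way to accumulate an average
per CPU. It is not the instrumentation used for the numbers above, which
is not described in this thread; struct cycle_stat and both helpers are
hypothetical, and the timestamps are assumed to come from a cycle
counter read at entry and exit of the region being measured.

```c
#include <stdint.h>

/* Hypothetical accumulator, one per CPU and per measured routine. */
struct cycle_stat {
    uint64_t total;   /* sum of elapsed cycles over all calls */
    uint64_t calls;   /* number of measured calls */
};

/* Record one call given the counter values at entry and exit. */
void cycle_stat_add(struct cycle_stat *st, uint64_t start, uint64_t end)
{
    st->total += end - start;
    st->calls++;
}

/* Average cycles per call; 0 if nothing has been recorded yet. */
uint64_t cycle_stat_avg(const struct cycle_stat *st)
{
    return st->calls ? st->total / st->calls : 0;
}
```

Placing the start and end reads around only the initialization code
gives scope 1; placing them at the top and bottom of alloc_skb and
__kfree_skb gives scope 2.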

The third scope is measuring this patch in a workload environment.
We measured it under a web-serving workload and saw a 0.7%
improvement.

I would like to stress again that this patch helps only when the
allocations and frees occur on two different CPUs. I measured it in a
UNI system and did not see any impact.

Regards,
Mala


Mala Anand
IBM Linux Technology Center - Kernel Performance
E-mail:[email protected]
http://www-124.ibm.com/developerworks/opensource/linuxperf
http://www-124.ibm.com/developerworks/projects/linuxperf
Phone:838-8088; Tie-line:678-8088












2002-08-22 18:28:45

by Benjamin LaHaise

Subject: Re: [Lse-tech] Re: (RFC): SKB Initialization

On Thu, Aug 22, 2002 at 12:22:34PM -0500, Mala Anand wrote:
> I would like to stress again that this patch helps only when the
> allocations and frees occur on two different CPUs. I measured it in a
> UNI system and did not see any impact.

Thanks, that looks a lot more complete. We discussed this on IRC a bit, and
Andi Kleen pointed out that several years of hacking on skbs has probably
changed the layout significantly from the original intention of keeping all
the initializations to a cacheline or two. I also pointed out that it might
be worth looking at cache misses and perhaps adding a prefetch instruction
or two, especially during allocation when an skb will be used immediately.
Another point is to check the order of writes that gcc is generating to the
skb: if the writes are sequential, the cpu can combine them and make use of
the internal 64-bit bus to the cache. In combination with write buffers in
the cpu, that makes the writes in __kfree_skb almost free, but if the cache
lines are spread out or cold, that would explain the degradation you're
seeing. Cheers,

-ben
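
The two suggestions above (prefetch the skb before it is initialized,
and issue the initializing stores in address order) can be sketched
roughly as follows. This is a minimal illustration, not kernel code:
struct fake_skb is a hypothetical, much simplified stand-in for
struct sk_buff, and both function names are made up. The only real
interface used is the GCC/Clang builtin __builtin_prefetch.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for an skb header.
 * NOT the kernel's struct sk_buff; the fields are illustrative only.
 */
struct fake_skb {
    struct fake_skb *next;
    struct fake_skb *prev;
    void *dev;
    unsigned int len;
    unsigned int data_len;
    unsigned int users;
};

/*
 * Hint that the object is about to be written (second argument 1)
 * and should stay in cache (third argument 3). To be useful this
 * has to run some time before the stores, e.g. as soon as the
 * pointer is known.
 */
void fake_skb_prefetch(struct fake_skb *skb)
{
    __builtin_prefetch(skb, 1, 3);
}

/*
 * Initialize the fields in declaration order, i.e. ascending
 * address order, so that adjacent stores are candidates for being
 * combined. The compiler is free to reorder them, so the generated
 * assembly still has to be inspected.
 */
void fake_skb_init(struct fake_skb *skb)
{
    skb->next = NULL;
    skb->prev = NULL;
    skb->dev = NULL;
    skb->len = 0;
    skb->data_len = 0;
    skb->users = 1;
}
```

Whether either change helps in practice depends on the real layout of
the structure and on how far ahead of the stores the prefetch can be
issued, so it would need to be measured the same way as the patch.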