2002-08-23 20:10:42

by Bill Hartner

Subject: Re: [Lse-tech] Re: (RFC): SKB Initialization



Dave Hansen wrote:
>
> Mala Anand wrote:
> > Readprofile ticks are not as accurate as the cycles I measured.
> > Moreover readprofile can give misleading information as it profiles
> > on timer interrupts. The alloc_skb and __kfree_skb call memory
> > management routines and interrupts are disabled in many parts of that code.
> > So I don't trust the readprofile data.
>
> I don't believe your results to be accurate. They may be _precise_
> for a small case, but you couldn't have been measuring them for very
> long. A claim of accuracy requires a large number of samples, which
> you apparently did not do.

Dave,

What is your definition of a "very long time"?

Read the 1st email. There were 2.4 million samples.

How many do you think is sufficient?

>
> I can't use oprofile or other NMI-based profilers on my hardware, so
> we'll just have to guess. Is there any chance that you have access to
> a large Specweb setup on hardware that is close to mine and can run
> oprofile?

Why do you think oprofile is a better way to measure this?
BTW, Mala works with Troy Wilson who is running SPECweb99 on
an 8-way system using Apache. Troy has run with Mala's patch
and that data will be posted.

>
> Where are interrupts disabled? I just went through a set of kernprof
> data and traced up the call graph. In the most common __kfree_skb
> case, I do not believe that it has interrupts disabled. I could be
> wrong, but I didn't see it.

What is the relevance of the above?

>
> http://www.sr71.net/~specweb99/old/run-specweb-2300-nodynpost-2.5.31-bk+profilers-08-14-2002-02.19.22/callgraph
>
> The end result, as I can see it, is that your patches hurt Specweb
> performance.

Based on what? A callgraph? A profile?

Bill


2002-08-23 20:47:09

by Rick Lindsley

Subject: Re: [Lse-tech] Re: (RFC): SKB Initialization

> Read the 1st email. There were 2.4 million samples.
>
> How many do you think is sufficient?

I looked at my hand 2.4 million times and it was not wet each time.
Therefore, it is not raining.

Of course, if I am inside a roofed structure, the sampling is faulty.
And (correct me if I'm wrong here, Dave) I think that's what we're
asking about. Are the samples you're getting pertinent and
significant? If, as you suggested in another email, you disable
interrupts in the functions to take these measurements, you may be
significantly altering the very environment you hope to measure.
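To make the sampling-bias point concrete, here is a small illustrative sketch (the 30% figure and the function names are invented, not taken from this thread). A timer-interrupt profiler like readprofile cannot fire while interrupts are disabled; the deferred tick is delivered at the first instruction after interrupts are re-enabled, so the profiler charges those ticks to the wrong code. Collecting more samples does not reduce this bias.

```python
import random

# Hypothetical model: 30% of wall time is spent in a critical section
# with interrupts disabled.  A timer tick that lands there is deferred
# until interrupts are re-enabled, so the profiler misattributes it.
def profile(ticks, frac_irqs_off=0.3):
    counts = {"critical_section": 0, "after_sti": 0, "other": 0}
    for _ in range(ticks):
        if random.random() < frac_irqs_off:
            counts["after_sti"] += 1   # deferred tick, charged to wrong code
        else:
            counts["other"] += 1
    return counts

c = profile(2_400_000)
# "critical_section" stays at 0 and "after_sti" absorbs roughly 30% of
# the ticks -- the bias persists no matter how large the sample count.
print(c)
```

The point is not that 2.4 million samples are too few; it is that a systematically biased sampler stays biased at any sample count.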

> Why do you think oprofile is a better way to measure this? BTW,
> Mala works with Troy Wilson who is running SPECweb99 on an 8-way
> system using Apache. Troy has run with Mala's patch and that data
> will be posted.

That will be helpful. Microbenchmarks which measure cycles are far
less interesting to the community than the end results of actual
workloads. Note that Mala said "I measured the cycles for only the
initialization code in alloc_skb and __kfree_skb" which could mean that
even other parts of alloc_skb() or __kfree_skb() may have gotten worse
and you would not have known. Later she admits, "As the scope of the
code measured widens the percentage improvement comes down" and finally
observes "We measured it in a web serving workload and found that we
get 0.7% improvement" which is practically in the noise. Dave's
observation was that it was slightly worse (0.35%). Either could be
statistical noise. If the patch only creates statistical noise, the
community won't be interested.
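One way to see why deltas like +0.7% and -0.35% can both be noise: compare the measured delta against the run-to-run spread of the benchmark itself. The sketch below uses invented throughput numbers (not the SPECweb99 results under discussion) purely to illustrate the comparison.

```python
import statistics

# Invented throughput figures for five runs each, baseline vs. patched.
baseline = [1000.0, 1004.1, 996.3, 1002.7, 998.9]
patched  = [1006.8, 999.5, 1003.2, 1008.4, 1001.1]

mb = statistics.mean(baseline)
mp = statistics.mean(patched)
sb = statistics.stdev(baseline)

delta_pct  = 100.0 * (mp - mb) / mb   # measured improvement
spread_pct = 100.0 * sb / mb          # per-run variability

# If the delta is comparable to the per-run spread, the result is
# indistinguishable from noise without many more runs.
print(f"delta = {delta_pct:+.2f}%, spread ~ {spread_pct:.2f}% per run")
```

With these made-up numbers the "improvement" is about 0.34% while single runs vary by about 0.3%, so neither a small gain nor a small loss would be distinguishable from noise.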

Also, it is well known that adding CPUs increases complexity through
cache and code interactions. Have you
tested this on an 8-way machine, rather than a 2-way, and do the
results still hold? Things which look very good on 2-proc can start to
lose their lustre on 8-proc or bigger.

I'm unfamiliar with netperf -- does it yield "results" which can be
compared? If so, since it was used to generate the load, how did the
results of the two runs compare?

Rick