Alan Cox wrote:
>
> > It's not an incompatibility with the k7 chip, just bad code in
> > include/asm-i386/string.h. in_interrupt() cannot be called from there.
>
> The string.h code was fine, someone came along and put in a ridiculous loop
> in the include dependencies and broke it. Nobody has had the time to untangle
> it cleanly since

Yes, bitrot. I don't see a rearrangement of system headers happening in 2.4.
I'm pretty sure if I committed such a patch it would have no measurable
lifetime.

>
> > I have posted a patch here many times since last May. Most recent was
> > Saturday.
>
> uninlining the code is too high a cost.

I question that. Athlon does branch prediction on call targets, function
calls are cheap. 3dnow saves 25%-50% of cycles on a copy. How many function
calls can be paid for with 1000 cycles or so?

My patch still inlines the standard string const_memcpy for the case of
small known length.

If I configure SMP for a UP box, performance is clearly not my first
concern. If I have a real SMP Athlon system, performance should not improve
by only using one processor.

How about we get it to build before we optimize it?
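A minimal sketch of the split described above, not the patch itself: copies whose length is a small compile-time constant stay inline, and everything else goes through one out-of-line call. The names sketch_memcpy, small_const_copy and big_copy and the 64-byte SMALL_COPY_MAX threshold are hypothetical, chosen only for illustration. The out-of-line path just calls memcpy() where a 3DNow-based routine would go. It assumes GCC or Clang for __builtin_constant_p and the noinline attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold: constant-size copies at or below this stay inline. */
#define SMALL_COPY_MAX 64

/* Out-of-line path. A stand-in for where an optimized (e.g. 3DNow-based)
 * routine would live; here it only forwards to memcpy(). */
__attribute__((noinline))
static void *big_copy(void *to, const void *from, size_t n)
{
    return memcpy(to, from, n);
}

/* Inline path: with a constant n the compiler can expand this in place. */
static inline void *small_const_copy(void *to, const void *from, size_t n)
{
    return __builtin_memcpy(to, from, n);
}

/* Pick the path at compile time where possible. Note that n is
 * evaluated more than once, so it must not have side effects. */
#define sketch_memcpy(to, from, n)                          \
    ((__builtin_constant_p(n) && (n) <= SMALL_COPY_MAX)     \
        ? small_const_copy((to), (from), (n))               \
        : big_copy((to), (from), (n)))
```

With this shape, sketch_memcpy(dst, src, 16) can be expanded inline, while a copy whose length is only known at run time pays for one function call.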
Regards,
Tom
--
The Daemons lurk and are dumb. -- Emerson
Make it and I will care and post it on kernel.org for you.
I need that patch soon.

On Thu, 1 Feb 2001, Tom Leete wrote:
> Alan Cox wrote:
> >
> > > It's not an incompatibility with the k7 chip, just bad code in
> > > include/asm-i386/string.h. in_interrupt() cannot be called from there.
> >
> > The string.h code was fine, someone came along and put in a ridiculous loop
> > in the include dependencies and broke it. Nobody has had the time to untangle
> > it cleanly since
>
> Yes, bitrot. I don't see a rearrangement of system headers happening in 2.4.
> I'm pretty sure if I committed such a patch it would have no measurable
> lifetime.
>
> >
> > > I have posted a patch here many times since last May. Most recent was
> > > Saturday.
> >
> > uninlining the code is too high a cost.
>
> I question that. Athlon does branch prediction on call targets, function
> calls are cheap. 3dnow saves 25%-50% of cycles on a copy. How many function
> calls can be paid for with 1000 cycles or so?
>
> My patch still inlines the standard string const_memcpy for the case of
> small known length.
>
> If I configure SMP for a UP box, performance is clearly not my first
> concern. If I have a real SMP Athlon system, performance should not improve
> by only using one processor.
>
> How about we get it to build before we optimize it?
>
> Regards,
> Tom
>
> --
> The Daemons lurk and are dumb. -- Emerson
Andre Hedrick
Linux ATA Development
> > uninlining the code is too high a cost.
>
> I question that. Athlon does branch prediction on call targets, function
> calls are cheap. 3dnow saves 25%-50% of cycles on a copy. How many function
> calls can be paid for with 1000 cycles or so?
>
> My patch still inlines the standard string const_memcpy for the case of
> small known length.

We have a very large number of memcpy's of unknown short length (often in
interrupts) that are close to branches. A lot of

    if(foo==NULL)
        return
    memcpy(..

stuff for example.

I'm more than happy for someone to do the benches and prove me wrong.
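A rough starting point for such a bench, as a sketch under stated assumptions rather than a kernel measurement: it times the NULL-check-then-short-copy pattern in user space, once with an out-of-line copy and once with an inlined byte loop, using lengths that are only known at run time. Every name here (copy_outofline, copy_inline, guarded_outofline, guarded_inline, bench) is hypothetical, the length table is arbitrary, and GCC or Clang is assumed for the noinline attribute and the empty asm barrier. Results will depend on compiler, flags and CPU, so no numbers are claimed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Out-of-line copy: every use pays for a function call. */
__attribute__((noinline))
static void *copy_outofline(void *to, const void *from, size_t n)
{
    return memcpy(to, from, n);
}

/* Inline byte loop, standing in for an inlined short copy.
 * An optimizing compiler may still turn this into something else. */
static inline void *copy_inline(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    while (n--)
        *d++ = *s++;
    return to;
}

/* The pattern from the mail: a NULL check right before a short copy. */
static int guarded_outofline(void *to, const void *from, size_t n)
{
    if (from == NULL)
        return -1;
    copy_outofline(to, from, n);
    return 0;
}

static int guarded_inline(void *to, const void *from, size_t n)
{
    if (from == NULL)
        return -1;
    copy_inline(to, from, n);
    return 0;
}

/* Time `iters` guarded copies whose lengths vary at run time.
 * Returns elapsed CPU seconds; use_inline selects the variant. */
static double bench(int use_inline, long iters)
{
    static const size_t lens[8] = { 1, 3, 7, 12, 20, 33, 48, 60 };
    static char src[64] = "short payload";
    static char dst[64];
    volatile int sink = 0;
    clock_t start = clock();

    for (long i = 0; i < iters; i++) {
        size_t n = lens[i & 7];

        sink += use_inline ? guarded_inline(dst, src, n)
                           : guarded_outofline(dst, src, n);
        /* Keep the compiler from discarding the stores to dst. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling bench(1, iters) and bench(0, iters) with the same iteration count gives two timings to compare. A real answer would still need the copies measured in kernel context, including interrupt paths.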
Andre Hedrick wrote:
>
> Make it and I will care and post it on kernel.org for you.
> I need that patch soon.
>
> On Thu, 1 Feb 2001, Tom Leete wrote:
>
> > Alan Cox wrote:
> > > The string.h code was fine, someone came along and put in a ridiculous loop
> > > in the include dependencies and broke it. Nobody has had the time to untangle
> > > it cleanly since
> >
> > Yes, bitrot. I don't see a rearrangement of system headers happening in 2.4.
> > I'm pretty sure if I committed such a patch it would have no measurable
> > lifetime.

Hi Andre,

I meant that nobody should be reshuffling 2.4 headers now; I didn't intend to
sound like I take that personally.

I'll take a look. I may be able to do something with include guards or other
#defines + multiple passes. We already have the multiple passes.

I think my arguments for the present patch are good. I'm making a mod of
Arjan's athlon.c to see if I'm right. If you have a suggestion for another
benchmark, I'd like to hear about it. Whatever the results, I'll post them
here.

I'm glad if whatever comes out is useful to you.

Cheers,
Tom
--
The Daemons lurk and are dumb. -- Emerson
Alan Cox wrote:
>
> We have a very large number of memcpy's of unknown short length (often in
> interrupts) that are close to branches. A lot of
>
>     if(foo==NULL)
>         return
>     memcpy(..
>
> stuff for example.
>
> I'm more than happy for someone to do the benches and prove me wrong.

Agreed, that is a bad case, and there is overhead for it in my patch. I'm
putting together some metrics and will post the results here.
Regards,
Tom
--
The Daemons lurk and are dumb. -- Emerson