Hello all,
Here is a cleaned-up version of the previous experimental patch:
http://marc.info/?l=linux-bluetooth&m=123245036109697&w=2
I changed it to be alignment- and byte-order-neutral (input data is read one
byte at a time). It's a bit slower than reading via an int16_t * pointer, but
it avoids the headache of worrying about alignment and similar portability
problems. Endian conversion is still kept as well (when reading one byte at a
time, it does not affect performance anyway).
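The byte-at-a-time approach can be illustrated with a minimal C sketch. The names read_s16 and unpack_pcm are hypothetical and are not taken from the actual patch or the SBC sources; the sketch only shows why assembling each 16-bit sample from individual bytes works for unaligned buffers and for either input byte order, with the endian conversion folded into the same load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble one signed 16-bit sample from two bytes at an arbitrary
 * (possibly unaligned) address. Only byte loads are used, so the
 * result does not depend on the host's alignment rules or native
 * byte order; big_endian selects the byte order of the input stream. */
static int16_t read_s16(const uint8_t *p, int big_endian)
{
	int32_t v;

	if (big_endian)
		v = ((int32_t)p[0] << 8) | p[1];
	else
		v = p[0] | ((int32_t)p[1] << 8);

	/* Map 0x8000..0xFFFF to negative values without relying on
	 * implementation-defined unsigned-to-signed conversion. */
	if (v >= 0x8000)
		v -= 0x10000;

	return (int16_t)v;
}

/* Hypothetical helper: de-interleave `frames` PCM frames with
 * `channels` channels from a raw byte buffer into per-channel arrays. */
static void unpack_pcm(const uint8_t *buf, size_t frames, int channels,
		       int big_endian, int16_t *out[])
{
	size_t i;
	int ch;

	assert(channels == 1 || channels == 2);

	for (i = 0; i < frames; i++) {
		for (ch = 0; ch < channels; ch++) {
			out[ch][i] = read_s16(buf, big_endian);
			buf += 2;
		}
	}
}
```

For example, given the little-endian bytes 34 12 FF FF starting at an odd address, unpack_pcm yields 4660 (0x1234) for the left channel and -1 for the right channel of the first stereo frame. Compared with casting the buffer to int16_t * and swapping afterwards, this costs two byte loads per sample, but byte loads can never be misaligned.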
The patch should be safe to apply.
Benchmarks show a consistent performance improvement of ~30% on both x86
and ARM Cortex-A8. This is even more than I measured before, because
optimizations are cumulative: the effect of each individual change becomes
more visible when the other parts also get faster (the previous benchmark was
run before the "-funroll-loops" optimization was committed).
ARM Cortex-A8:
before:
real 1m 24.78s
user 1m 21.20s
sys 0m 3.57s
after:
real 1m 4.72s
user 1m 1.03s
sys 0m 3.68s
Intel Core2:
before:
real 0m10.210s
user 0m9.761s
sys 0m0.324s
after:
real 0m7.729s
user 0m7.268s
sys 0m0.376s
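As a quick check on the numbers above, the ratios can be worked out with a small C sketch (the struct and function names are made up for illustration; the timings are copied from the results above). It suggests that the ~30% figure corresponds to the speedup factor, roughly 1.31x to 1.34x, which is equivalent to about 24% less run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Wall-clock ("real") and user times in seconds, copied from the
 * benchmark results above. */
struct timing {
	const char *cpu;
	double real_before, real_after;
	double user_before, user_after;
};

static const struct timing results[] = {
	{ "ARM Cortex-A8", 84.78, 64.72, 81.20, 61.03 },
	{ "Intel Core2", 10.210, 7.729, 9.761, 7.268 },
};

/* Speedup factor: how many times faster the new code runs. */
static double speedup(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}

/* Fraction of the original run time that was saved. */
static double time_saved(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before;
}

static void print_results(void)
{
	size_t i;

	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
		const struct timing *t = &results[i];

		printf("%s: real %.2fx (%.1f%% less time), user %.2fx\n",
		       t->cpu,
		       speedup(t->real_before, t->real_after),
		       100.0 * time_saved(t->real_before, t->real_after),
		       speedup(t->user_before, t->user_after));
	}
}
```

Calling print_results() from a main() function should print:

ARM Cortex-A8: real 1.31x (23.7% less time), user 1.33x
Intel Core2: real 1.32x (24.3% less time), user 1.34x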
Best regards,
Siarhei Siamashka
On Wednesday 28 January 2009 07:46:53 ext Marcel Holtmann wrote:
> Hi Siarhei,
>
> > Here is a cleaned-up version of the previous experimental patch:
> > http://marc.info/?l=linux-bluetooth&m=123245036109697&w=2
> >
> > I changed it to be alignment- and byte-order-neutral (input data is read
> > one byte at a time). It's a bit slower than reading via an int16_t *
> > pointer, but it avoids the headache of worrying about alignment and
> > similar portability problems. Endian conversion is still kept as well
> > (when reading one byte at a time, it does not affect performance anyway).
> >
> > The patch should be safe to apply.
>
> your patch has been applied. Thanks.
Thanks.
> > Benchmarks show a consistent performance improvement of ~30% on both x86
> > and ARM Cortex-A8. This is even more than I measured before, because
> > optimizations are cumulative: the effect of each individual change
> > becomes more visible when the other parts also get faster (the previous
> > benchmark was run before the "-funroll-loops" optimization was
> > committed).
>
> Sounds great to me. Keep optimizing it :)
Fortunately not so much is left to be optimized :)
Joint stereo encoding performance can still be improved. A few other
tweaks could also be tried for 'sbc_pack_frame_internal'. After that, the only
thing left to do would be implementing SIMD optimizations for various CPU
cores (primarily ARM11).
Best regards,
Siarhei Siamashka
Hi Siarhei,
> Here is a cleaned-up version of the previous experimental patch:
> http://marc.info/?l=linux-bluetooth&m=123245036109697&w=2
>
> I changed it to be alignment- and byte-order-neutral (input data is read one
> byte at a time). It's a bit slower than reading via an int16_t * pointer, but
> it avoids the headache of worrying about alignment and similar portability
> problems. Endian conversion is still kept as well (when reading one byte at a
> time, it does not affect performance anyway).
>
> The patch should be safe to apply.
your patch has been applied. Thanks.
> Benchmarks show a consistent performance improvement of ~30% on both x86
> and ARM Cortex-A8. This is even more than I measured before, because
> optimizations are cumulative: the effect of each individual change becomes
> more visible when the other parts also get faster (the previous benchmark
> was run before the "-funroll-loops" optimization was committed).
Sounds great to me. Keep optimizing it :)
Regards
Marcel