2009-01-27 18:14:13

by Siarhei Siamashka

Subject: [PATCH] Performance optimizations for input data processing in SBC encoder

Hello all,

Here is a cleaned up version of the previous experimental patch:
http://marc.info/?l=linux-bluetooth&m=123245036109697&w=2

I changed it to be alignment- and byte-order-neutral (input data is read one
byte at a time). This is a bit slower than reading via an int16_t * pointer,
but it avoids the headache of worrying about alignment and byte order
problems. Endian conversion is still kept (when reading one byte at a time,
it does not affect performance anyway).
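
For readers unfamiliar with the technique, here is a minimal sketch of what
reading 16-bit samples one byte at a time can look like. It is not taken from
the patch; the helper names read_s16_le and read_s16_be are hypothetical and
only illustrate why such a read works for any pointer alignment and any host
byte order.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not the code from the patch. */

/* Assemble a signed 16-bit sample from two bytes stored low byte first.
 * Only single bytes are loaded, so 'p' needs no particular alignment and
 * the result does not depend on the byte order of the host CPU. */
static inline int16_t read_s16_le(const uint8_t *p)
{
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Same idea for samples stored high byte first. */
static inline int16_t read_s16_be(const uint8_t *p)
{
    return (int16_t)(uint16_t)((p[0] << 8) | p[1]);
}
```

With the buffer { 0x00, 0x34, 0x12, 0xfe, 0xff }, reading at the odd offset 1
gives 0x1234 as little endian and 0x3412 as big endian, and reading at offset 3
gives -2 as little endian. Choosing between the two helpers is what takes the
place of a separate endian conversion pass in this sketch.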

The patch should be safe to apply.

Benchmarks show a consistent performance improvement of ~30% for both x86
and ARM Cortex-A8. That is even more than I measured before, simply because
the optimizations are cumulative and the effect of each individual change
becomes more visible when the other parts also get faster (the previous
benchmark was run before the "-funroll-loops" optimization got committed).

ARM Cortex-A8:

before:
real 1m 24.78s
user 1m 21.20s
sys 0m 3.57s

after:
real 1m 4.72s
user 1m 1.03s
sys 0m 3.68s

Intel Core2:

before:
real 0m10.210s
user 0m9.761s
sys 0m0.324s

after:
real 0m7.729s
user 0m7.268s
sys 0m0.376s
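
As a rough cross-check of the ~30% figure, the wall-clock times above can be
turned into a throughput gain and a time reduction. This is a small sketch
with hypothetical helper names; the only inputs are the "real" times listed
above, converted to seconds.

```c
/* Hypothetical helpers for illustration; inputs are wall-clock seconds. */

/* Throughput gain in percent: how much more work is done per second. */
static double speedup_percent(double before_s, double after_s)
{
    return (before_s / after_s - 1.0) * 100.0;
}

/* Reduction of the run time in percent. */
static double time_saved_percent(double before_s, double after_s)
{
    return (before_s - after_s) / before_s * 100.0;
}
```

For ARM Cortex-A8 (84.78s before, 64.72s after) this gives a throughput gain
of about 31% and a run time reduction of about 24%. For Intel Core2 (10.210s
before, 7.729s after) the numbers are about 32% and 24%. So the ~30% claim
matches the throughput gain, not the reduction in run time.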


Best regards,
Siarhei Siamashka


Attachments:
(No filename) (1.12 kB)
0001-Performance-optimizations-for-input-data-processing.patch (20.63 kB)

2009-01-29 01:27:53

by Siarhei Siamashka

Subject: Re: [PATCH] Performance optimizations for input data processing in SBC encoder

On Wednesday 28 January 2009 07:46:53 ext Marcel Holtmann wrote:
> Hi Siarhei,
>
> > Here is a cleaned up version of the previous experimental patch:
> > http://marc.info/?l=linux-bluetooth&m=123245036109697&w=2
> >
> > I changed it to be alignment and byte order neutral (input data is read
> > one byte at a time). It's a bit slower than reading via int16_t *
> > pointer, but avoids headache of worrying about the other problems. Endian
> > conversion is still also kept (when reading one byte at a time, it does
> > not affect performance anyway).
> >
> > The patch should be safe to apply.
>
> your patch has been applied. Thanks.

Thanks.

> > Benchmarks show consistent performance improvement ~30% for both x86
> > and ARM Cortex-A8. It's even more than I measured before just because
> > optimizations are cumulative and the effect of each individual change
> > becomes more visible when the other parts also get faster (the previous
> > benchmark was run before "-funroll-loops" optimization got committed).
>
> Sounds great to me. Keep optimizing it :)

Fortunately, not much is left to optimize :)

Joint stereo encoding performance can still be improved. A few other
tweaks could also be tried for 'sbc_pack_frame_internal'. After that, only
implementing SIMD optimizations for various CPU cores (primarily ARM11)
would be left to do.


Best regards,
Siarhei Siamashka

2009-01-28 05:46:53

by Marcel Holtmann

Subject: Re: [PATCH] Performance optimizations for input data processing in SBC encoder

Hi Siarhei,

> Here is a cleaned up version of the previous experimental patch:
> http://marc.info/?l=linux-bluetooth&m=123245036109697&w=2
>
> I changed it to be alignment and byte order neutral (input data is read one
> byte at a time). It's a bit slower than reading via int16_t * pointer, but
> avoids headache of worrying about the other problems. Endian conversion
> is still also kept (when reading one byte at a time, it does not affect
> performance anyway).
>
> The patch should be safe to apply.

your patch has been applied. Thanks.

> Benchmarks show consistent performance improvement ~30% for both x86
> and ARM Cortex-A8. It's even more than I measured before just because
> optimizations are cumulative and the effect of each individual change
> becomes more visible when the other parts also get faster (the previous
> benchmark was run before "-funroll-loops" optimization got committed).

Sounds great to me. Keep optimizing it :)

Regards

Marcel