2009-01-29 01:10:03

by Siarhei Siamashka

[permalink] [raw]
Subject: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

Hello all,

The attached patch contains optimization for scale factors calculation which
provides additional SBC encoder speedup.

For non-gcc compilers, CLZ function is implemented with a very simple and
slow straightforward code (but it is still faster than current git code even
if used instead of __builtin_clz). Something better could be done like:
http://groups.google.com/group/comp.sys.arm/msg/5ae56e3a95a2345e?hl=en
But I'm not sure about license/copyright of the code at this link and decided
not to touch it. Anyway, I don't think that gcc implementation of
__builtin_clz for the CPU cores which do not support CLZ instruction is any
worse.

Joint stereo processing also involves recalculation of scale factors, which
can use a similar optimization or even exactly the same function.
I intentionally did not benchmark encoding with joint stereo yet as it would
spoil the nice numbers :) That's something to improve next.

Benchmark results (sbcenc with default settings):

====

ARM Cortex-A8:

before:
real 1m 4.84s
user 1m 1.05s
sys 0m 3.78s

after:
real 0m 58.93s
user 0m 55.15s
sys 0m 3.78s

Intel Core2:

before:
real ? ?0m7.729s
user ? ?0m7.268s
sys ? ? 0m0.376s

after:
real 0m6.473s
user 0m6.116s
sys 0m0.292s

====

Overall, CPU usage in SBC encoder looks more or less like this (oprofile log
from ARM Cortex-A8):

samples % image name symbol name
2173 30.6791 sbcenc.neon_new sbc_encode
1774 25.0459 sbcenc.neon_new sbc_analyze_4b_8s_neon
1525 21.5304 sbcenc.neon_new sbc_calculate_bits
916 12.9324 sbcenc.neon_new sbc_calc_scalefactors
600 8.4710 sbcenc.neon_new sbc_enc_process_input_8s_be
75 1.0589 libc-2.5.so memcpy
13 0.1835 sbcenc.neon_new main
4 0.0565 libc-2.5.so write
2 0.0282 sbcenc.neon_new .plt
1 0.0141 ld-2.5.so _dl_relocate_object


Best regards,
Siarhei Siamashka


Attachments:
(No filename) (1.98 kB)
0001-SBC-encoder-scale-factors-calculation-optimized-with.patch (4.59 kB)
Download all attachments

2009-01-30 17:05:51

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

On Friday 30 January 2009 13:14:47 ext Christian Hoene wrote:
> > > > Thanks for finding the bug. A common things about these failed
>
> testcases
>
> > > > is that block size is not 16.
> > > >
> > > > Looks like this was not covered by my own regression tests, I'll try
>
> to do
>
> > > > something about this problem now.
> > >
> > > A fix is attached. I also extended my regression test script to cover
>
> such
>
> > > cases.
> >
> > patch has been applied. Thanks.
>
> The tests have passed all.
> http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
>
> Regards,
>
> Christian

Thanks a lot for keeping an eye on bluez sbc implementation quality.

--
Best regards,
Siarhei Siamashka

2009-01-30 11:14:47

by Christian Hoene

[permalink] [raw]
Subject: RE: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

> > > Thanks for finding the bug. A common things about these failed
testcases
> > > is that block size is not 16.
> > >
> > > Looks like this was not covered by my own regression tests, I'll try
to do
> > > something about this problem now.
> >
> > A fix is attached. I also extended my regression test script to cover
such
> > cases.
>
> patch has been applied. Thanks.
>

The tests have passed all.
http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/

Regards,

Christian

2009-01-29 17:01:35

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

Hi Siarhei,

> > > The testing results are not positive. It is better to revoke the patch.
> > > http://net.cs.uni-tuebingen.de/html/nexgenvoip/
> > > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/encoder.bluez.03.wav
> >
> > Thanks for finding the bug. A common things about these failed testcases
> > is that block size is not 16.
> >
> > Looks like this was not covered by my own regression tests, I'll try to do
> > something about this problem now.
>
> A fix is attached. I also extended my regression test script to cover such
> cases.

patch has been applied. Thanks.

Regards

Marcel



2009-01-29 16:31:49

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

On Thursday 29 January 2009 17:30:31 ext Siarhei Siamashka wrote:
> On Thursday 29 January 2009 17:00:11 ext Christian Hoene wrote:
> > > > The attached patch contains optimization for scale factors
> > > > calculation
> >
> > which
> >
> > > > provides additional SBC encoder speedup.
> > > >
> > > > For non-gcc compilers, CLZ function is implemented with a very simple
> >
> > and
> >
> > > > slow straightforward code (but it is still faster than current git
> > > > code
> >
> > even
> >
> > > > if used instead of __builtin_clz). Something better could be done
> > > > like:
> > > > http://groups.google.com/group/comp.sys.arm/msg/5ae56e3a95a2345e?hl=e
> > > >n But I'm not sure about license/copyright of the code at this link
> > > > and
> > >
> > > decided
> > >
> > > > not to touch it. Anyway, I don't think that gcc implementation of
> > > > __builtin_clz for the CPU cores which do not support CLZ instruction
> > > > is
> >
> > any
> >
> > > > worse.
> > >
> > > personally I don't really care about non-gcc compilers. I think that
> > > some of the BlueZ source might not even compile without gcc.
> > >
> > > Anyway, patch has been applied. Thanks.
> >
> > The testing results are not positive. It is better to revoke the patch.
> > http://net.cs.uni-tuebingen.de/html/nexgenvoip/
> > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/encoder.bluez.03.wav
>
> Thanks for finding the bug. A common things about these failed testcases
> is that block size is not 16.
>
> Looks like this was not covered by my own regression tests, I'll try to do
> something about this problem now.

A fix is attached. I also extended my regression test script to cover such
cases.


Best regards,
Siarhei Siamashka


Attachments:
(No filename) (1.67 kB)
0001-Fix-for-SBC-encoding-with-block-sizes-other-than-16.patch (1.72 kB)
Download all attachments

2009-01-29 15:30:31

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

On Thursday 29 January 2009 17:00:11 ext Christian Hoene wrote:
> > > The attached patch contains optimization for scale factors calculation
>
> which
>
> > > provides additional SBC encoder speedup.
> > >
> > > For non-gcc compilers, CLZ function is implemented with a very simple
>
> and
>
> > > slow straightforward code (but it is still faster than current git code
>
> even
>
> > > if used instead of __builtin_clz). Something better could be done like:
> > > http://groups.google.com/group/comp.sys.arm/msg/5ae56e3a95a2345e?hl=en
> > > But I'm not sure about license/copyright of the code at this link and
> >
> > decided
> >
> > > not to touch it. Anyway, I don't think that gcc implementation of
> > > __builtin_clz for the CPU cores which do not support CLZ instruction is
>
> any
>
> > > worse.
> >
> > personally I don't really care about non-gcc compilers. I think that
> > some of the BlueZ source might not even compile without gcc.
> >
> > Anyway, patch has been applied. Thanks.
>
> The testing results are not positive. It is better to revoke the patch.
> http://net.cs.uni-tuebingen.de/html/nexgenvoip/
> http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/encoder.bluez.03.wav

Thanks for finding the bug. A common things about these failed testcases
is that block size is not 16.

Looks like this was not covered by my own regression tests, I'll try to do
something about this problem now.


Best regards,
Siarhei Siamashka

2009-01-29 15:00:11

by Christian Hoene

[permalink] [raw]
Subject: RE: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

>
> > The attached patch contains optimization for scale factors calculation
which
> > provides additional SBC encoder speedup.
> >
> > For non-gcc compilers, CLZ function is implemented with a very simple
and
> > slow straightforward code (but it is still faster than current git code
even
> > if used instead of __builtin_clz). Something better could be done like:
> > http://groups.google.com/group/comp.sys.arm/msg/5ae56e3a95a2345e?hl=en
> > But I'm not sure about license/copyright of the code at this link and
> decided
> > not to touch it. Anyway, I don't think that gcc implementation of
> > __builtin_clz for the CPU cores which do not support CLZ instruction is
any
> > worse.
>
> personally I don't really care about non-gcc compilers. I think that
> some of the BlueZ source might not even compile without gcc.
>
> Anyway, patch has been applied. Thanks.

The testing results are not positive. It is better to revoke the patch.
http://net.cs.uni-tuebingen.de/html/nexgenvoip/
http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/encoder.bluez.03.wav


Greetings

Christian

2009-01-29 09:51:15

by Johan Hedberg

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

Hi Siarhei,

On Thu, Jan 29, 2009, Siarhei Siamashka wrote:
> On Thursday 29 January 2009 03:10:03 ext Siarhei Siamashka wrote:
> > The attached patch contains optimization for scale factors calculation
> > which provides additional SBC encoder speedup.
>
> And MMX variant of this optimization can be implemented with something
> like this patch. It still needs to be tested on X86-64 systems though.

Seems to work fine on my Lenovo X301 (Core 2 Duo) with ubuntu intrepid
amd64.

Johan

2009-01-29 09:07:48

by Christian Hoene

[permalink] [raw]
Subject: RE: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

> Hi Siarhei,
>
> > > The attached patch contains optimization for scale factors calculation
> > > which provides additional SBC encoder speedup.
> >
> > And MMX variant of this optimization can be implemented with something
> > like this patch. It still needs to be tested on X86-64 systems though.
>
> I am holding off on this until I get confirmation that is does the right
> thing. So tester, please test this patch.

Yes, Sir!
I'll do it.

Regards,
Christian


>
> Regards
>
> Marcel
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth"
in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-01-29 07:28:57

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

Hi Siarhei,

> > The attached patch contains optimization for scale factors calculation
> > which provides additional SBC encoder speedup.
>
> And MMX variant of this optimization can be implemented with something
> like this patch. It still needs to be tested on X86-64 systems though.

I am holding off on this until I get confirmation that is does the right
thing. So tester, please test this patch.

Regards

Marcel



2009-01-29 07:27:38

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

Hi Siarhei,

> The attached patch contains optimization for scale factors calculation which
> provides additional SBC encoder speedup.
>
> For non-gcc compilers, CLZ function is implemented with a very simple and
> slow straightforward code (but it is still faster than current git code even
> if used instead of __builtin_clz). Something better could be done like:
> http://groups.google.com/group/comp.sys.arm/msg/5ae56e3a95a2345e?hl=en
> But I'm not sure about license/copyright of the code at this link and decided
> not to touch it. Anyway, I don't think that gcc implementation of
> __builtin_clz for the CPU cores which do not support CLZ instruction is any
> worse.

personally I don't really care about non-gcc compilers. I think that
some of the BlueZ source might not even compile without gcc.

Anyway, patch has been applied. Thanks.

Regards

Marcel



2009-01-29 01:20:25

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

On Thursday 29 January 2009 03:10:03 ext Siarhei Siamashka wrote:
> The attached patch contains optimization for scale factors calculation
> which provides additional SBC encoder speedup.

And MMX variant of this optimization can be implemented with something
like this patch. It still needs to be tested on X86-64 systems though.

Best regards,
Siarhei Siamashka


Attachments:
(No filename) (364.00 B)
sbc-scalefactors-mmx.diff (1.97 kB)
Download all attachments

2009-02-02 15:20:56

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

Hi Siarhei,

> > > > > > cases.
> > > > >
> > > > > patch has been applied. Thanks.
> > > >
> > > > The tests have passed all.
> > > > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> > >
> > > Thanks a lot for keeping an eye on bluez sbc implementation quality.
> >
> > are all patches applied now or am I missing one?
>
> Yes, thanks, all the patches have been applied except for this one:
> http://marc.info/?l=linux-bluetooth&m=123319235104928&w=2
>
> It should work fine, but I'm still considering how to better optimize
> processing of joint stereo, which might also require changing this code
> a bit. For now it's more like a demonstration that scale factors processing
> can be SIMD optimized quite fine.
>
> I will provide a more complete patch with optimizations for both scale factors
> and joint stereo a bit later.

please re-sent this one with a proper commit message and I apply it.
Then you can start from there is you like. Or send me a complete new
one.

Regards

Marcel



2009-02-02 11:11:00

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

On Thursday 29 January 2009 09:27:38 ext Marcel Holtmann wrote:
> Hi Siarhei,
>
> > The attached patch contains optimization for scale factors calculation
> > which provides additional SBC encoder speedup.
> >
> > For non-gcc compilers, CLZ function is implemented with a very simple and
> > slow straightforward code (but it is still faster than current git code
> > even if used instead of __builtin_clz). Something better could be done
> > like:
> > http://groups.google.com/group/comp.sys.arm/msg/5ae56e3a95a2345e?hl=en
> > But I'm not sure about license/copyright of the code at this link and
> > decided not to touch it. Anyway, I don't think that gcc implementation of
> > __builtin_clz for the CPU cores which do not support CLZ instruction is
> > any worse.
>
> personally I don't really care about non-gcc compilers. I think that
> some of the BlueZ source might not even compile without gcc.

At least for the SBC codec, there is no harm in trying to keep it as portable
as it is possible. Somebody might want to compile it with some other, better
optimizing compiler and link with the rest of gcc compiled bluez. Or SBC
processing can be offloaded to DSP (C55x port exists) with the rest of bluez
stack running on main core.

> Anyway, patch has been applied. Thanks.

Thanks for applying it.

--
Best regards,
Siarhei Siamashka

2009-02-02 10:48:14

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

On Sunday 01 February 2009 18:53:43 ext Marcel Holtmann wrote:
> Hi Siarhei,
>
> > > > > cases.
> > > >
> > > > patch has been applied. Thanks.
> > >
> > > The tests have passed all.
> > > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> >
> > Thanks a lot for keeping an eye on bluez sbc implementation quality.
>
> are all patches applied now or am I missing one?

Yes, thanks, all the patches have been applied except for this one:
http://marc.info/?l=linux-bluetooth&m=123319235104928&w=2

It should work fine, but I'm still considering how to better optimize
processing of joint stereo, which might also require changing this code
a bit. For now it's more like a demonstration that scale factors processing
can be SIMD optimized quite fine.

I will provide a more complete patch with optimizations for both scale factors
and joint stereo a bit later.

--
Best regards,
Siarhei Siamashka

2009-02-01 16:53:43

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

Hi Siarhei,

> > > > cases.
> > >
> > > patch has been applied. Thanks.
> >
> > The tests have passed all.
> > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> >
>
> Thanks a lot for keeping an eye on bluez sbc implementation quality.

are all patches applied now or am I missing one?

Regards

Marcel



2009-03-16 19:32:26

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [PATCH] SBC encoder scale factors calculation optimized with __builtin_clz

On Monday 02 February 2009 17:20:56 ext Marcel Holtmann wrote:
> Hi Siarhei,
>
> > > > > > > cases.
> > > > > >
> > > > > > patch has been applied. Thanks.
> > > > >
> > > > > The tests have passed all.
> > > > > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> > > >
> > > > Thanks a lot for keeping an eye on bluez sbc implementation quality.
> > >
> > > are all patches applied now or am I missing one?
> >
> > Yes, thanks, all the patches have been applied except for this one:
> > http://marc.info/?l=linux-bluetooth&m=123319235104928&w=2
> >
> > It should work fine, but I'm still considering how to better optimize
> > processing of joint stereo, which might also require changing this code
> > a bit. For now it's more like a demonstration that scale factors
> > processing can be SIMD optimized quite fine.
> >
> > I will provide a more complete patch with optimizations for both scale
> > factors and joint stereo a bit later.
>
> please re-sent this one with a proper commit message and I apply it.
> Then you can start from there is you like. Or send me a complete new
> one.

Done. Commit message added and patch attached.

time ./sbcenc tesfile.au > /dev/null

before:
real 0m6.404s
user 0m6.152s
sys 0m0.244s

after:
real 0m5.630s
user 0m5.376s
sys 0m0.248s

--
Best regards,
Siarhei Siamashka


Attachments:
(No filename) (1.30 kB)
0002-sbc-MMX-optimization-for-scale-factors-calculation.patch (2.38 kB)
Download all attachments