2015-04-10 01:00:52

by Olivier Crête

Subject: libva decoding performance regression with kernel 4.0-rc

Hello,

Using an Atom E3845 board, we had a pretty bad performance regression
when upgrading to 4.0-rc6 from 3.19. With the help of git bisect, I
traced it back to commit 78a42377. Reverting this commit and subsequent
related commits (b9ffd80, 71745376, etc) fixes the performance
regression for me.

Without those patches, I can play 8-9 1080p MPEG2 streams; with them,
it's down to 5-6.

I tested using a libdrm checkout from Feb 16, and the latest git master
of libva, libva-intel-driver and gst-plugins-vaapi. The "identity
drop-probability=1" is to prevent anything from being displayed, so it's
purely decoding performance.

Pure decode, single stream not displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! identity drop-probability=1 ! vaapisink

With kernel 3.18.0-rc7-01052-g493018d
real 0m11.429s
user 0m6.516s
sys 0m1.640s

With kernel 3.18.0-rc7-01053-g78a4237
real 0m12.694s
user 0m6.744s
sys 0m2.680s


8 simultaneous streams displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0
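
A note on reproducing the test above with a different stream count: the eight branches are identical, so the command line can be generated rather than pasted. The following is a minimal sketch, not part of the original report; the helper name build_pipeline and its parameters are assumptions made for illustration, while the element names and properties are copied from the command above.

```python
import shlex


def build_pipeline(location, streams=8, sync=False):
    """Return a gst-launch-1.0 argument list with `streams` identical branches.

    Hypothetical helper: each branch mirrors the pipeline in the report,
    filesrc ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0.
    """
    branch = [
        "filesrc", f"location={location}", "!",
        "tsdemux", "!",
        "mpegvideoparse", "!",
        "vaapidecode", "!",
        "vaapisink", f"sync={int(sync)}",
    ]
    # gst-launch-1.0 treats consecutive unlinked chains as separate branches.
    return ["gst-launch-1.0"] + branch * streams


if __name__ == "__main__":
    clip = "18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t"
    # Print a shell-safe command line; prefix it with `time` to measure.
    print(shlex.join(build_pipeline(clip, streams=8)))
```

Passing the list straight to subprocess.run avoids shell quoting altogether; printing it is only convenient when the run is timed from an interactive shell.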

With kernel 3.18.0-rc7-01052-g493018d
real 2m45.317s
user 1m21.296s
sys 0m51.080s

With kernel 3.18.0-rc7-01053-g78a4237
real 3m1.275s
user 1m24.336s
sys 1m38.360s
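
To make the size of the regression easier to read, the `time` figures above can be converted to seconds and compared. This is a small sketch using only the numbers from this message; the function names to_seconds and percent_change are made up for the example.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '2m45.317s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    b, a = to_seconds(before), to_seconds(after)
    return (a - b) / b * 100.0


# (g493018d, g78a4237) pairs copied from the measurements in this message.
RUNS = {
    "single stream": {
        "real": ("0m11.429s", "0m12.694s"),
        "user": ("0m6.516s", "0m6.744s"),
        "sys": ("0m1.640s", "0m2.680s"),
    },
    "8 streams": {
        "real": ("2m45.317s", "3m1.275s"),
        "user": ("1m21.296s", "1m24.336s"),
        "sys": ("0m51.080s", "1m38.360s"),
    },
}

if __name__ == "__main__":
    for name, times in RUNS.items():
        for kind, (before, after) in times.items():
            print(f"{name:14s} {kind:4s} {percent_change(before, after):+6.1f}%")
```

Worked through by hand, the increase is concentrated in sys time: about 63% for the single stream and about 93% for eight streams, while user time rises by less than 4% in both cases.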


--
Olivier Crête


2015-04-10 06:23:44

by Chris Wilson

Subject: Re: libva decoding performance regression with kernel 4.0-rc

On Thu, Apr 09, 2015 at 09:00:43PM -0400, Olivier Crête wrote:
> Hello,
>
> Using an Atom E3845 board, we had a pretty bad performance regression
> when upgrading to 4.0-rc6 from 3.19. With the help of git bisect, I
> traced it back to commit 78a42377. Reverting this commit and subsequent
> related commits (b9ffd80, 71745376, etc) fixes the performance
> regression for me.

Can you please test

http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=for-olivier-crete

on your setup.

First
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=044307a99b418258ac0d775460d73b20b80277c1
to get a baseline with nightly as that contains some fine tuning to the
batch allocations, which is pretty significant for libva on Atom (only
double clflushing one or two pages every batch rather than 128) and then
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=0a24802a5b61403b887ce401ce3efd52f5fd1eac
to see if the command parser tuning helps.

Hope this helps,
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2015-04-10 23:26:10

by Olivier Crête

Subject: Re: libva decoding performance regression with kernel 4.0-rc

Hello,

Thanks for the quick reply!

With my real use-cases:

1. 9x 720p60 mpeg2 videos
- 4.0-rc6: ~12 frames per second are on time
- 4.0-rc6 + reverts: a stable 45 frames per second are on time
- 044307a9: 40-45 frames per second are on time
- 0a24802a: 45-46 frames per second are on time

2. 1080i30 mpeg2 videos
- 4.0-rc6: 5 videos
- 044307a9: 10 videos
- 0a24802a: 10 videos

So you basically beat my baseline too, good job, thanks a lot! Any
chance you can sneak this into 4.0?

Olivier

On Fri, 2015-04-10 at 07:23 +0100, Chris Wilson wrote:
> On Thu, Apr 09, 2015 at 09:00:43PM -0400, Olivier Crête wrote:
> > Hello,
> >
> > Using an Atom E3845 board, we had a pretty bad performance regression
> > when upgrading to 4.0-rc6 from 3.19. With the help of git bisect, I
> > traced it back to commit 78a42377. Reverting this commit and subsequent
> > related commits (b9ffd80, 71745376, etc) fixes the performance
> > regression for me.
>
> Can you please test
>
> http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=for-olivier-crete
>
> on your setup.
>
> First
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=044307a99b418258ac0d775460d73b20b80277c1
> to get a baseline with nightly as that contains some fine tuning to the
> batch allocations, which is pretty significant for libva on Atom (only
> double clflushing one or two pages every batch rather than 128) and then
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=0a24802a5b61403b887ce401ce3efd52f5fd1eac
> to see if the command parser tuning helps.
>
> Hope this helps,
> -Chris
>

--
Olivier Crête