2021-02-03 16:39:00

by Sven Van Asbroeck

Subject: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

From: Sven Van Asbroeck <[email protected]>

We have observed that under certain repeatable circumstances, the CODA
mem2mem device consistently generates corrupted frames. This happens only
on an i.MX6qp (Plus) - the classic imx6q is not affected.

This happens when the virtual X screen is 0x900 (2304) pixels wide or wider (1).

Quite strange, because CODA is a mem2mem device and presumably does not touch
any of the IPU/GPU2D/GPU3D infrastructure used by X, unless there is a hidden
dependency somewhere.

I have captured and visualized generated CODA frames as follows:
gst-launch-1.0 playbin uri=file:///home/default/nycTrain1080p.mp4 flags=0x45 \
    video-sink='multifilesink location=frame%d.yuv'
See (2) for how I converted the raw YUV frame to a PNG image.

For example, the following will break CODA mpeg4 decode (width >= 0x900):
# xrandr --fb 2400x1088
Screen 0: minimum 1 x 1, current 2400 x 1088, maximum 4096 x 4096
HDMI1 disconnected (normal left inverted right x axis y axis)
LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
1280x800 59.79*+

Resulting frame when dumped with multifilesink (NOT written to the display):
https://gitlab.com/TheSven73/coda-investigation/-/blob/master/stripes.png

And the following will restore CODA mpeg4 decode (width < 0x900):
# xrandr --fb 2300x1088
Screen 0: minimum 1 x 1, current 2300 x 1088, maximum 4096 x 4096
HDMI1 disconnected (normal left inverted right x axis y axis)
LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
1280x800 59.79*+

Resulting frame when dumped with multifilesink (NOT written to the display):
https://gitlab.com/TheSven73/coda-investigation/-/blob/master/ok.png

Additional info:
- only the virtual X screen width seems to trigger the issue, it is
independent of the height.
- issue seems independent of the pixel format. Forcing CODA to output NV12
shows the same behaviour.

System description:
- i.MX6 QuadPlus:
[ 0.144518] CPU identified as i.MX6QP, silicon rev 1.1
- mainline Linux v5.9.16 with a small private patchset on top
(patchset does not touch CODA)
- CODA960 silicon contained within i.MX6 QuadPlus:
[ 4798.510033] coda 2040000.vpu: Firmware code revision: 46076
[ 4798.515916] coda 2040000.vpu: Initialized CODA960.
[ 4798.520779] coda 2040000.vpu: Firmware version: 3.1.1
- gstreamer from buildroot:
gst-launch-1.0 version 1.16.2
GStreamer 1.16.2
- X from buildroot, using armada and etnadrm_gpu plugins:
X.Org X Server 1.20.7
X Protocol Version 11, Revision 0
[ 99.527] (II) LoadModule: "armada"
[ 99.527] (II) Loading /usr/lib/xorg/modules/drivers/armada_drv.so
[ 99.538] (II) Module armada: vendor="X.Org Foundation"
[ 99.538] compiled for 1.20.7, module version = 0.0.0
[ 99.538] Module class: X.Org Video Driver
[ 99.538] ABI class: X.Org Video Driver, version 24.1
[ 99.538] (II) armada: Support for Marvell LCD Controller: 88AP510
[ 99.539] (II) armada: Support for Freescale IPU: i.MX6
[ 99.545] (II) armada(0): Added screen for KMS device /dev/dri/card1
[ 99.561] (II) armada(0): hardware: imx-drm
[ 99.563] (**) armada(0): Option "AccelModule" "etnadrm_gpu"
[ 99.563] (II) Loading sub module "etnadrm_gpu"
[ 99.563] (II) LoadModule: "etnadrm_gpu"
[ 99.564] (II) Loading /usr/lib/xorg/modules/drivers/etnadrm_gpu.so
[ 99.576] (II) Module Etnaviv GPU driver (DRM): vendor="X.Org Foundation"
[ 99.576] compiled for 1.20.7, module version = 0.0.0


(1) When using multiple displays, the virtual X screen is typically the bounding
rectangle which includes all screens. That's why it can become wider than
1920 pixels.
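
A minimal sketch of footnote (1): the virtual screen is taken here as the bounding rectangle of all outputs. The layout below (HDMI at +0+0, LVDS placed to its right) is an assumption for illustration and is not taken from the report; `virtual_screen` is a hypothetical helper name.

```python
def virtual_screen(outputs):
    """Return (width, height) of the bounding rectangle of all outputs.

    Each output is (width, height, x_offset, y_offset), mirroring
    xrandr's WxH+X+Y geometry notation.
    """
    width = max(w + x for w, h, x, y in outputs)
    height = max(h + y for w, h, x, y in outputs)
    return width, height


# Assumed layout: 1920x1088 at +0+0 and 1280x800 at +1920+0.
outputs = [(1920, 1088, 0, 0), (1280, 800, 1920, 0)]
print(virtual_screen(outputs))
```

With this assumed side-by-side layout the result is 3200x1088, well above the 0x900 (2304) pixel threshold.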

(2)

# Convert raw YUYV to PNG
# Python, runs out of the box on a stock Google Colab notebook
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

img = np.fromfile('frame1.yuv', dtype=np.uint8)
# YUYV has two 8-bit channels per pixel
img.shape = (1088, 1920, 2)

img2 = cv2.cvtColor(img, cv2.COLOR_YUV2RGB_YUYV)
plt.imshow(img2)
matplotlib.image.imsave('frame1.png', img2)
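
Before reshaping a dumped frame, it may help to confirm that the file really holds one 1920x1088 frame in the expected format. The following is a minimal sketch; `expected_frame_bytes` and `check_frame_file` are hypothetical helper names, and the bytes-per-pixel values are the standard ones for packed YUYV (16 bits per pixel) and NV12 (12 bits per pixel).

```python
import os

# Average bytes per pixel for the two formats mentioned in the report.
BYTES_PER_PIXEL = {"YUYV": 2.0, "NV12": 1.5}


def expected_frame_bytes(width, height, fmt):
    """Size in bytes of one raw frame of the given format."""
    return int(width * height * BYTES_PER_PIXEL[fmt])


def check_frame_file(path, width=1920, height=1088, fmt="YUYV"):
    """Return True if the file holds exactly one frame of the given format."""
    return os.path.getsize(path) == expected_frame_bytes(width, height, fmt)
```

For example, `check_frame_file('frame1.yuv')` should be true for the YUYV dump above; a mismatch suggests a different pixel format or frame size.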

To: Philipp Zabel <[email protected]>
To: Mauro Carvalho Chehab <[email protected]>
Cc: Adrian Ratiu <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: [email protected]
Cc: [email protected]


2021-02-10 16:15:32

by Nicolas Dufresne

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Hi Sven,

On Wednesday, 3 February 2021 at 11:33 -0500, Sven Van Asbroeck wrote:
> From: Sven Van Asbroeck <[email protected]>
>
> We have observed that under certain repeatable circumstances, the CODA
> mem2mem device consistently generates corrupted frames. This happens only
> on an i.MX6qp (Plus) - the classic imx6q is not affected.
>
> This happens when the virtual X screen is wider than 0x900 pixels (1).

Are you sure you aren't just running out of CMA? This is the only thing that
comes to mind at the moment, sorry if it's not that useful.



2021-02-10 18:26:56

by Sven Van Asbroeck

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Bonjour Nicolas,

On Wed, Feb 10, 2021 at 11:11 AM Nicolas Dufresne <[email protected]> wrote:
>
> Are you sure you aren't just running out of CMA? This is the only thing that
> comes to mind at the moment, sorry if it's not that useful.

Thanks for the suggestion! No worries, this is such a strange
problem that basically any idea has merit at this point.

I tried increasing the CMA area from 256M to 512M, but there was no
impact. The critical framebuffer width remains the same
(=0x900).

And everything works fine on a classic i.MX6Quad, it's only the
i.MX6QuadPlus that has the problem. I am running i.MX6Quad and
i.MX6QuadPlus side-by-side with identical kernels/rootfses. Obviously
the devicetree is slightly different.

Sven

2021-02-10 18:44:31

by Sven Van Asbroeck

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Found it!

The i.MX6QuadPlus has two pairs of PREs, which use the extended
section of the iRAM. The Classic does not have any PREs or extended
iRAM:

pre1: pre@21c8000 {
	compatible = "fsl,imx6qp-pre";
	<snip>
	fsl,iram = <&ocram2>;
};

pre3: pre@21ca000 {
	compatible = "fsl,imx6qp-pre";
	<snip>
	fsl,iram = <&ocram3>;
};

The CODA (VPU) driver uses the common section of iRAM:

vpu: vpu@2040000 {
	compatible = "cnm,coda960";
	<snip>
	iram = <&ocram>;
};

The VPU or the PREs are overrunning their assigned iRAM area. How do I
know? Because if I change the PRE iRAM order, the problem disappears!

PRE1: ocram2 change to ocram3
PRE2: ocram2 change to ocram3
PRE3: ocram3 change to ocram2
PRE4: ocram3 change to ocram2

Sven

2021-02-11 14:38:50

by Philipp Zabel

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Hi Sven,

On Wed, Feb 10, 2021 at 01:29:29PM -0500, Sven Van Asbroeck wrote:
> Found it!
>
> The i.MX6QuadPlus has two pairs of PREs, which use the extended
> section of the iRAM. The Classic does not have any PREs or extended
> iRAM:
>
> pre1: pre@21c8000 {
> compatible = "fsl,imx6qp-pre";
> <snip>
> fsl,iram = <&ocram2>;
> };
>
> pre3: pre@21ca000 {
> compatible = "fsl,imx6qp-pre";
> <snip>
> fsl,iram = <&ocram3>;
> };
>
> The CODA (VPU) driver uses the common section of iRAM:
>
> vpu: vpu@2040000 {
> compatible = "cnm,coda960";
> <snip>
> iram = <&ocram>;
> };
>
> The VPU or the PREs are overrunning their assigned iRAM area. How do I
> know? Because if I change the PRE iRAM order, the problem disappears!
>
> PRE1: ocram2 change to ocram3
> PRE2: ocram2 change to ocram3
> PRE3: ocram3 change to ocram2
> PRE4: ocram3 change to ocram2

Thank you for debugging this. Given that CODA uses the OCRAM address
range 0x900000-0x940000 and the PREs use OCRAM2 at 0x940000-0x960000
and OCRAM3 at 0x960000-0x980000, it seems unlikely that the PREs would
overrun into the CODA iRAM. But maybe there is some stride related
overflow that causes it to write at negative offsets or some other kind
of oversight.
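
The address ranges quoted above can be captured in a small lookup, which may be handy when interpreting addresses from an OCRAM dump. This is only a sketch: the range values come from the paragraph above, while `REGIONS` and `region_of` are hypothetical names.

```python
# Address ranges as quoted above; end addresses are exclusive.
REGIONS = [
    ("OCRAM (CODA)", 0x900000, 0x940000),
    ("OCRAM2 (PRE)", 0x940000, 0x960000),
    ("OCRAM3 (PRE)", 0x960000, 0x980000),
]


def region_of(addr):
    """Return the name of the region containing addr, or None."""
    for name, start, end in REGIONS:
        if start <= addr < end:
            return name
    return None


print(region_of(0x93FFFF), region_of(0x940000), region_of(0x980000))
```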

Could you check /sys/kernel/debug/dri/?/state while running the error case?

Another thing that might help to identify who is writing where might be to
clear the whole OCRAM region and dump it after running only decode or only
PRE/PRG scanout, for example:

----------8<----------
/* Clear OCRAM */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define OCRAM_START 0x900000
#define OCRAM_SIZE 0x80000

int main(int argc, char *argv[])
{
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0)
		return EXIT_FAILURE;
	/* Map the whole OCRAM range (OCRAM, OCRAM2 and OCRAM3) */
	void *map = mmap(NULL, OCRAM_SIZE, PROT_WRITE, MAP_SHARED, fd, OCRAM_START);
	if (map == MAP_FAILED) {
		close(fd);
		return EXIT_FAILURE;
	}
	memset(map, 0, OCRAM_SIZE);
	munmap(map, OCRAM_SIZE);
	close(fd);
	return EXIT_SUCCESS;
}
---------->8----------

----------8<----------
/* Dump OCRAM to stdout */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define OCRAM_START 0x900000
#define OCRAM_SIZE 0x80000

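int main(int argc, char *argv[])
{
	int fd = open("/dev/mem", O_RDONLY | O_SYNC);
	if (fd < 0)
		return EXIT_FAILURE;
	void *map = mmap(NULL, OCRAM_SIZE, PROT_READ, MAP_SHARED, fd, OCRAM_START);
	if (map == MAP_FAILED) {
		close(fd);
		return EXIT_FAILURE;
	}
	/* write() may be short, e.g. when stdout is a pipe */
	size_t done = 0;
	while (done < OCRAM_SIZE) {
		ssize_t n = write(STDOUT_FILENO, (char *)map + done, OCRAM_SIZE - done);
		if (n <= 0)
			break;
		done += n;
	}
	munmap(map, OCRAM_SIZE);
	close(fd);
	return done == OCRAM_SIZE ? EXIT_SUCCESS : EXIT_FAILURE;
}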
---------->8----------
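
Once OCRAM has been cleared and dumped with the two programs above, the dump still has to be searched for whatever was written. The following is a minimal sketch of one way to do that: it lists the address ranges that contain non-zero bytes. `nonzero_spans`, `report` and the `gap` merge threshold are hypothetical names and choices, not part of the suggestion above.

```python
OCRAM_START = 0x900000  # same base address as in the C programs above


def nonzero_spans(data, base=OCRAM_START, gap=64):
    """Return [(start, end)] address ranges that contain non-zero bytes.

    Runs separated by fewer than `gap` zero bytes are merged into one
    span. `end` is exclusive.
    """
    spans = []
    start = last = None
    for i, b in enumerate(data):
        if b == 0:
            continue
        if start is None:
            start = i
        elif i - last > gap:
            spans.append((base + start, base + last + 1))
            start = i
        last = i
    if start is not None:
        spans.append((base + start, base + last + 1))
    return spans


def report(path):
    """Print the non-zero spans found in a dump file."""
    with open(path, "rb") as f:
        for s, e in nonzero_spans(f.read()):
            print(f"{s:#08x}-{e:#08x} ({e - s} bytes)")
```

Calling `report("ocram.bin")` on a dump taken after running only the decoder, and again after running only PRE/PRG scanout, would show which address ranges each of them touches (the file name is just an example).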

regards
Philipp

2021-02-11 16:32:57

by Sven Van Asbroeck

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Hi Philipp, thank you so much for looking into this, I really appreciate it!

On Thu, Feb 11, 2021 at 9:32 AM Philipp Zabel <[email protected]> wrote:
>
> Another thing that might help to identify who is writing where might be to
> clear the whole OCRAM region and dump it after running only decode or only
> PRE/PRG scanout, for example:

Great idea, I will try that out. This might take a few days, as I am also
dealing with higher-priority issues.

>
> Could you check /sys/kernel/debug/dri/?/state while running the error case?

dri state in non-error case:
============================

# cat state
plane[31]: plane-0
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[35]: plane-1
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=1
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[38]: plane-2
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[42]: plane-3
crtc=crtc-2
fb=59
allocated by = X
refcount=2
format=XR24 little-endian (0x34325258)
modifier=0x0
size=1280x1088
layers:
size[0]=1280x1088
pitch[0]=5120
offset[0]=0
obj[0]:
name=2
refcount=4
start=000105e4
size=5570560
imported=no
paddr=0xee800000
vaddr=78a02004
crtc-pos=1280x800+0+0
src-pos=1280.000000x800.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[46]: plane-4
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=1
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[49]: plane-5
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
crtc[34]: crtc-0
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[41]: crtc-1
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[45]: crtc-2
enable=1
active=1
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=8
connector_mask=2
encoder_mask=2
mode: "": 60 67880 1280 1344 1345 1350 800 838 839 841 0x0 0x0
crtc[52]: crtc-3
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
connector[54]: HDMI-A-1
crtc=(null)
self_refresh_aware=0
connector[57]: LVDS-1
crtc=crtc-2
self_refresh_aware=0

dri state in error case:
========================
# cat state
plane[31]: plane-0
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[35]: plane-1
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=1
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[38]: plane-2
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[42]: plane-3
crtc=crtc-2
fb=60
allocated by = X
refcount=2
format=XR24 little-endian (0x34325258)
modifier=0x0
size=3000x1088
layers:
size[0]=3000x1088
pitch[0]=12000
offset[0]=0
obj[0]:
name=1
refcount=4
start=00010b34
size=13058048
imported=no
paddr=0xeee00000
vaddr=37dd5aa6
crtc-pos=1280x800+0+0
src-pos=1280.000000x800.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[46]: plane-4
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=1
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[49]: plane-5
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
crtc[34]: crtc-0
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[41]: crtc-1
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[45]: crtc-2
enable=1
active=1
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=8
connector_mask=2
encoder_mask=2
mode: "": 60 67880 1280 1344 1345 1350 800 838 839 841 0x0 0x0
crtc[52]: crtc-3
enable=0
active=0
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=0
connector_mask=0
encoder_mask=0
mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
connector[54]: HDMI-A-1
crtc=(null)
self_refresh_aware=0
connector[57]: LVDS-1
crtc=crtc-2
self_refresh_aware=0

2021-02-12 23:55:04

by Sven Van Asbroeck

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Philipp, Fabio,

I was able to verify that the PREs do indeed overrun their allocated ocram area.

Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size
required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this
comes to 64K. This is what the PRE driver allocates. So far, so good.
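
The documented size requirement can be checked with a few lines of arithmetic. This is only a sketch of the formula quoted above (width x 8 lines x 4 bytes); the function and constant names are hypothetical.

```python
BYTES_PER_PIXEL = 4  # 4 bytes per pixel, per the formula above
PREFETCH_LINES = 8   # 8 lines, per the formula above


def pre_ocram_required(width_pixels):
    """OCRAM bytes required by one PRE for a given line width."""
    return width_pixels * PREFETCH_LINES * BYTES_PER_PIXEL


print(pre_ocram_required(2048))  # 65536 bytes, i.e. 64 KiB
```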

The trouble starts when we're displaying a section of a much wider
bitmap. This happens in X when using two displays. e.g.:
HDMI 1920x1088
LVDS 1280x800
X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS.

In such a case, the stride will be much larger than the width of a
display scanline.

This is where things start to go very wrong.

I found that the ocram area used by the PREs increases with the
stride. I experimentally found a formula:
ocram_used = display_width x 8 x 4 + (bitmap_width - display_width) x 7 x 4

As the stride increases, the PRE eventually overruns the ocram and...
ends up in the "ocram aliased" area, where it overwrites the ocram in
use by the vpu/coda !
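
To make the overrun concrete, the experimentally found formula can be evaluated for the dual-display example given earlier (3200-pixel-wide bitmap, 1920-wide HDMI, 1280-wide LVDS). This is a sketch only: the formula is the empirical one from this message, not a documented hardware property, and the names are hypothetical.

```python
PRE_ALLOC = 2048 * 8 * 4  # 64 KiB allocated per PRE, as stated earlier


def pre_ocram_used(display_width, bitmap_width):
    """Empirical OCRAM usage in bytes, per the formula in this message."""
    return display_width * 8 * 4 + (bitmap_width - display_width) * 7 * 4


for name, display_width in (("HDMI", 1920), ("LVDS", 1280)):
    used = pre_ocram_used(display_width, 3200)
    print(name, used, "overrun" if used > PRE_ALLOC else "fits")
```

Under this formula the HDMI case uses 97280 bytes and the LVDS case 94720 bytes, both beyond the 65536 bytes allocated, whereas a 1920-wide bitmap shown on a 1920-wide display stays at 61440 bytes.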

I could not find any PRE register setting that changes the used ocram area.

Sven

2021-02-15 10:19:02

by Lucas Stach

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Hi Sven,

On Friday, 12.02.2021 at 18:52 -0500, Sven Van Asbroeck wrote:
> Philipp, Fabio,
>
> I was able to verify that the PREs do indeed overrun their allocated ocram area.
>
> Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size
> required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this
> comes to 64K. This is what the PRE driver allocates. So far, so good.
>
> The trouble starts when we're displaying a section of a much wider
> bitmap. This happens in X when using two displays. e.g.:
> HDMI 1920x1088
> LVDS 1280x800
> X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS.
>
> In such a case, the stride will be much larger than the width of a
> display scanline.

Urgh, a badly tested corner case.

> This is where things start to go very wrong.
>
> I found that the ocram area used by the PREs increases with the
> stride. I experimentally found a formula:
> ocram_used = display_width x 8 x 4 + (bitmap_width - display_width) x 7 x 4
>
> As the stride increases, the PRE eventually overruns the ocram and...
> ends up in the "ocram aliased" area, where it overwrites the ocram in
> use by the vpu/coda !
>
> I could not find any PRE register setting that changes the used ocram area.

There is no such setting. The PRE always prefetches a doublebuffer of
2x4 scanlines and the scanline size is defined by the store engine
pitch.
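
This description appears to agree with the experimentally found formula quoted above, under two assumptions that are not stated in the thread: the store engine pitch equals the framebuffer stride (bitmap_width x 4 bytes), and the last of the 8 buffered lines is filled only up to the display width. The sketch below checks that algebra; all names are hypothetical.

```python
LINES = 2 * 4  # double buffer of 2x4 scanlines
BPP = 4        # assumed bytes per pixel (32-bit XR24)


def extent_from_pitch(display_width, store_pitch):
    """Bytes from the first buffered pixel to the end of the last line.

    Seven full pitches, plus one line of display_width pixels.
    """
    return (LINES - 1) * store_pitch + display_width * BPP


def empirical(display_width, bitmap_width):
    """Empirical formula quoted above."""
    return display_width * 8 * 4 + (bitmap_width - display_width) * 7 * 4


# Assumption: store pitch == bitmap_width * BPP (the framebuffer stride).
for dw, bw in ((1280, 2300), (1280, 2400), (1280, 3000), (1920, 3200)):
    assert extent_from_pitch(dw, bw * BPP) == empirical(dw, bw)
```

Both expressions reduce to 28 x bitmap_width + 4 x display_width, so the footprint grows with the pitch and not with the visible width.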

The straightforward way to fix this would be to just disable the PRE
when the stride is getting too large, which might not work well with
all userspace requirements, as it effectively disables the ability to
scan GPU tiled surfaces when the stride is getting too large.

I'm not sure if this works in practice, as the PRG address rewriting
might make this harder than it seems, but one could probably try to
rewrite the prefetch start address, input pitch, input width/height and
store pitch of the PRE settings to cover only the area used by the
CRTC to reduce OCRAM requirements.

Regards,
Lucas

2021-02-15 17:11:38

by Sven Van Asbroeck

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

Hi Lucas,

On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach <[email protected]> wrote:
>
> The straightforward way to fix this would be to just disable the PRE
> when the stride is getting too large, which might not work well with
> all userspace requirements, as it effectively disables the ability to
> scan GPU tiled surfaces when the stride is getting too large.

Thank you for your very knowledgeable input, really appreciate it.

I am wondering why I am the first to notice this particular corner
case. Is this perhaps because X+armada plugin allocate a huge bitmap
that fits all displays, and other software frameworks do not? Are
people on i.MX6 mostly using X or Wayland? If Wayland allocates a
separate bitmap for each display, this PRE bug will of course never
show up...

Sven

2021-02-15 17:19:18

by Lucas Stach

Subject: Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

On Monday, 15.02.2021 at 10:54 -0500, Sven Van Asbroeck wrote:
> Hi Lucas,
>
> On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach <[email protected]> wrote:
> >
> > The straightforward way to fix this would be to just disable the PRE
> > when the stride is getting too large, which might not work well with
> > all userspace requirements, as it effectively disables the ability to
> > scan GPU tiled surfaces when the stride is getting too large.
>
> Thank you for your very knowledgeable input, really appreciate it.
>
> I am wondering why I am the first to notice this particular corner
> case. Is this perhaps because X+armada plugin allocate a huge bitmap
> that fits all displays, and other software frameworks do not? Are
> people on i.MX6 mostly using X or Wayland? If Wayland allocates a
> separate bitmap for each display, this PRE bug will of course never
> show up...

Yep, I really doubt that there are a lot of i.MX6QP, multi-display, X.Org
based devices out there.

While it's not anywhere in a protocol or similar fixed API, Wayland
compositors mostly opted to have a separate surface per display. The
weston reference compositor started out this way (as it makes surface
repaint easier) and others followed the lead, so Wayland based stacks
won't hit this case.

Regards,
Lucas