2023-04-13 10:53:38

by Benjamin Gaignard

[permalink] [raw]
Subject: [PATCH v2] media: verisilicon: Fix crash when probing encoder

ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
so assigne it to vpu_fmt led to crash the kernel.
Like for decoder case use 'fmt' as format for encoder and clean up
the code.

Signed-off-by: Benjamin Gaignard <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
---
version 2:
- Remove useless vpu_fmt.

drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
index 8f1414085f47..d71f79471396 100644
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
struct v4l2_pix_format_mplane *pix_mp,
enum v4l2_buf_type type)
{
- const struct hantro_fmt *fmt, *vpu_fmt;
+ const struct hantro_fmt *fmt;
bool capture = V4L2_TYPE_IS_CAPTURE(type);
bool coded;

@@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,

if (coded) {
pix_mp->num_planes = 1;
- vpu_fmt = fmt;
- } else if (ctx->is_encoder) {
- vpu_fmt = ctx->vpu_dst_fmt;
- } else {
- vpu_fmt = fmt;
+ } else if (!ctx->is_encoder) {
/*
* Width/height on the CAPTURE end of a decoder are ignored and
* replaced by the OUTPUT ones.
@@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
pix_mp->field = V4L2_FIELD_NONE;

v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
- &vpu_fmt->frmsize);
+ &fmt->frmsize);

if (!coded) {
/* Fill remaining fields */
--
2.34.1


2023-04-13 19:51:47

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

Hi,

cause I sent a Rb on old one after v2 was sent, the extra cleanup make sense to
me.

Le jeudi 13 avril 2023 à 12:47 +0200, Benjamin Gaignard a écrit :
> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> so assigne it to vpu_fmt led to crash the kernel.
> Like for decoder case use 'fmt' as format for encoder and clean up
> the code.
>
> Signed-off-by: Benjamin Gaignard <[email protected]>
> Tested-by: Marek Szyprowski <[email protected]>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")


Reviewed-by: Nicolas Dufresne <[email protected]>

> ---
> version 2:
> - Remove useless vpu_fmt.
>
> drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
> index 8f1414085f47..d71f79471396 100644
> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> struct v4l2_pix_format_mplane *pix_mp,
> enum v4l2_buf_type type)
> {
> - const struct hantro_fmt *fmt, *vpu_fmt;
> + const struct hantro_fmt *fmt;
> bool capture = V4L2_TYPE_IS_CAPTURE(type);
> bool coded;
>
> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>
> if (coded) {
> pix_mp->num_planes = 1;
> - vpu_fmt = fmt;
> - } else if (ctx->is_encoder) {
> - vpu_fmt = ctx->vpu_dst_fmt;
> - } else {
> - vpu_fmt = fmt;
> + } else if (!ctx->is_encoder) {
> /*
> * Width/height on the CAPTURE end of a decoder are ignored and
> * replaced by the OUTPUT ones.
> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> pix_mp->field = V4L2_FIELD_NONE;
>
> v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> - &vpu_fmt->frmsize);
> + &fmt->frmsize);
>
> if (!coded) {
> /* Fill remaining fields */
> --
> 2.34.1
>
>

2023-04-13 19:55:46

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

Hi,

Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> Benjamin,
>
> Please include the crash stracktrace in the commit.
>
>
Careful with HTML message, they don't always make it in these ML and tooling
might not play well with the tooling. Perhaps it can be edited while pulling ?
Here's the info from Marek's bug report:

hantro-vpu fdea0000.video-codec: Adding to iommu group 0
hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
/dev/video0
hantro-vpu fdee0000.video-codec: Adding to iommu group 1
hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
/dev/video1
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000008
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
[0000000000000008] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
videobuf2_dma_contig snd_soc_simple_card display_connector
snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
x_tables ipv6
CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
...
Call trace:
hantro_try_fmt+0xb4/0x280 [hantro_vpu]
hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
hantro_reset_fmts+0x94/0xcc [hantro_vpu]
hantro_open+0xd4/0x20c [hantro_vpu]
v4l2_open+0x80/0x120 [videodev]
chrdev_open+0xc0/0x22c
do_dentry_open+0x13c/0x490
vfs_open+0x2c/0x38
path_openat+0x550/0x938
do_filp_open+0x80/0x12c
do_sys_openat2+0xb4/0x16c
__arm64_sys_openat+0x64/0xac
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0xfc/0x11c
do_el0_svc+0x38/0xa4
el0_svc+0x48/0xb8
el0t_64_sync_handler+0xb8/0xbc
el0t_64_sync+0x190/0x194
Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
---[ end trace 0000000000000000 ]---


>
> Thanks,
> Ezequiel
>
>
> On Thu, Apr 13, 2023 at 7:48 AM Benjamin Gaignard
> <[email protected]> wrote:
> > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> > so assigne it to vpu_fmt led to crash the kernel.
> > Like for decoder case use 'fmt' as format for encoder and clean up
> > the code.
> >
> > Signed-off-by: Benjamin Gaignard <[email protected]>
> > Tested-by: Marek Szyprowski <[email protected]>
> > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats
> > in reset functions")
> > ---
> > version 2:
> > - Remove useless vpu_fmt.
> >
> >  drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c
> > b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > index 8f1414085f47..d71f79471396 100644
> > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> >                           struct v4l2_pix_format_mplane *pix_mp,
> >                           enum v4l2_buf_type type)
> >  {
> > -       const struct hantro_fmt *fmt, *vpu_fmt;
> > +       const struct hantro_fmt *fmt;
> >         bool capture = V4L2_TYPE_IS_CAPTURE(type);
> >         bool coded;
> >
> > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> >
> >         if (coded) {
> >                 pix_mp->num_planes = 1;
> > -               vpu_fmt = fmt;
> > -       } else if (ctx->is_encoder) {
> > -               vpu_fmt = ctx->vpu_dst_fmt;
> > -       } else {
> > -               vpu_fmt = fmt;
> > +       } else if (!ctx->is_encoder) {
> >                 /*
> >                  * Width/height on the CAPTURE end of a decoder are ignored
> > and
> >                  * replaced by the OUTPUT ones.
> > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> >         pix_mp->field = V4L2_FIELD_NONE;
> >
> >         v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> > -                                      &vpu_fmt->frmsize);
> > +                                      &fmt->frmsize);
> >
> >         if (!coded) {
> >                 /* Fill remaining fields */

Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

Il 13/04/23 12:47, Benjamin Gaignard ha scritto:
> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> so assigne it to vpu_fmt led to crash the kernel.
> Like for decoder case use 'fmt' as format for encoder and clean up
> the code.
>
> Signed-off-by: Benjamin Gaignard <[email protected]>
> Tested-by: Marek Szyprowski <[email protected]>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")

Reviewed-by: AngeloGioacchino Del Regno <[email protected]>


2023-05-19 22:54:24

by Diederik de Haas

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

Hi,

On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> Le jeudi 13 avril 2023 ? 10:10 -0300, Ezequiel Garcia a ?crit :
> > Benjamin,
> >
> > Please include the crash stracktrace in the commit.
>
> Careful with HTML message, they don't always make it in these ML and tooling
> might not play well with the tooling. Perhaps it can be edited while
> pulling ? Here's the info from Marek's bug report:
>
> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
> /dev/video0
> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
> /dev/video1
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000008
> Mem abort info:
> ESR = 0x0000000096000004
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x04: level 0 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000004
> CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
> videobuf2_dma_contig snd_soc_simple_card display_connector
> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
> x_tables ipv6
> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
> Hardware name: Hardkernel ODROID-M1 (DT)
> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
> ...
> Call trace:
> hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
> hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
> hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
> hantro_reset_fmts+0x94/0xcc [hantro_vpu]
> hantro_open+0xd4/0x20c [hantro_vpu]
> v4l2_open+0x80/0x120 [videodev]
> chrdev_open+0xc0/0x22c
> do_dentry_open+0x13c/0x490
> vfs_open+0x2c/0x38
> path_openat+0x550/0x938
> do_filp_open+0x80/0x12c
> do_sys_openat2+0xb4/0x16c
> __arm64_sys_openat+0x64/0xac
> invoke_syscall+0x48/0x114
> el0_svc_common.constprop.0+0xfc/0x11c
> do_el0_svc+0x38/0xa4
> el0_svc+0x48/0xb8
> el0t_64_sync_handler+0xb8/0xbc
> el0t_64_sync+0x190/0x194
> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
> ---[ end trace 0000000000000000 ]---

When I booted into my 6.4-rc1 (but also rc2) kernel on my
Pine64 Quartz64 Model A, I noticed a crash which seems the same as
above, but I didn't have such a crash with my 6.3 kernel.
Searching for 'hantro' led me to this commit as the most likely culprit
but when I build a new 6.4-rcX kernel with this commit reverted,
I still had this crash.
Do you have suggestions which commit would then be the likely culprit?

Cheers,
Diederik

For completeness, this is the error I got with 6.4-rcX:

[ 26.976766] panfrost fde60000.gpu: clock rate = 594000000
[ 26.977297] panfrost fde60000.gpu: bus_clock rate = 500000000
[ 26.996012] random: crng init done
[ 27.072438] videodev: Linux video capture interface: v2.00
[ 27.119012] Registered IR keymap rc-cec
[ 27.125161] rc rc0: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0
[ 27.125427] panfrost fde60000.gpu: mali-g52 id 0x7402 major 0x1 minor 0x0 status 0x0
[ 27.125905] input: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0/input1
[ 27.126427] panfrost fde60000.gpu: features: 00000000,00000cf7, issues: 00000000,00000400
[ 27.127785] panfrost fde60000.gpu: Features: L2:0x07110206 Shader:0x00000002 Tiler:0x00000209 Mem:0x1 MMU:0x00002823 AS:0xff JS:0x7
[ 27.128954] panfrost fde60000.gpu: shader_present=0x1 l2_present=0x1
[ 27.148920] gpio-fan gpio_fan: GPIO fan initialized
[ 27.191131] [drm] Initialized panfrost 1.2.0 20180908 for fde60000.gpu on minor 1
[ 27.265920] hantro-vpu fdea0000.video-codec: Adding to iommu group 0
[ 27.267535] hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0
[ 27.270668] hantro-vpu fdee0000.video-codec: Adding to iommu group 1
[ 27.272590] hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1
[ 27.573417] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 27.574238] Mem abort info:
[ 27.574499] ESR = 0x0000000096000004
[ 27.574836] EC = 0x25: DABT (current EL), IL = 32 bits
[ 27.575310] SET = 0, FnV = 0
[ 27.575586] EA = 0, S1PTW = 0
[ 27.575868] FSC = 0x04: level 0 translation fault
[ 27.576368] Data abort info:
[ 27.576637] ISV = 0, ISS = 0x00000004
[ 27.576980] CM = 0, WnR = 0
[ 27.577247] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010718b000
[ 27.577818] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[ 27.578430] Internal error: Oops: 0000000096000004 [#1] SMP
[ 27.578934] Modules linked in: polyval_generic(E+) ghash_ce(E) gf128mul(E) snd_soc_hdmi_codec(E+) sha2_ce(E) ecdh_generic(E+) sha256_arm64(E) sha1_ce(E) rfkill(E) snd_soc_spdif_tx(E) ecc(E) leds_gpio(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) display_connector(E) gpio_fan(E) hantro_vpu(E) v4l2_vp9(E) snd_soc_rockchip_i2s_tdm(E) v4l2_h264(E) videobuf2_dma_contig(E) snd_soc_rockchip_spdif(E) snd_soc_rk817(E) v4l2_mem2mem(E) videobuf2_memops(E) governor_simpleondemand(E) rockchip_thermal(E) videobuf2_v4l2(E) dw_wdt(E) snd_soc_core(E) dw_hdmi_i2s_audio(E) dw_hdmi_cec(E) videodev(E) snd_pcm_dmaengine(E) panfrost(E) videobuf2_common(E) snd_pcm(E) rk805_pwrkey(E) snd_timer(E) gpu_sched(E) snd(E) soundcore(E) mc(E) drm_shmem_helper(E) cpufreq_dt(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) dax(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) xhci_plat_hcd(E) xhci_hcd(E) motorcomm(E) rk808_regulator(E) fan53555(E) gpio_rockchip(E) dwmac_rk(E) stmmac_platform(E)
[ 27.579108] crct10dif_ce(E) stmmac(E) pcs_xpcs(E) crct10dif_common(E) phylink(E) fixed(E) of_mdio(E) pinctrl_rockchip(E) phy_rockchip_inno_usb2(E) fixed_phy(E) sdhci_of_dwcmshc(E) dw_mmc_rockchip(E) fwnode_mdio(E) sdhci_pltfm(E) phy_rockchip_naneng_combphy(E) dw_mmc_pltfm(E) sdhci(E) dw_mmc(E) pl330(E) libphy(E) rockchipdrm(E) ptp(E) drm_dma_helper(E) analogix_dp(E) dw_hdmi(E) cec(E) rc_core(E) drm_display_helper(E) dw_mipi_dsi(E) pps_core(E) drm_kms_helper(E) ohci_platform(E) ohci_hcd(E) ehci_platform(E) io_domain(E) ehci_hcd(E) dwc3(E) i2c_rk3x(E) udc_core(E) usbcore(E) roles(E) ulpi(E) drm(E) usb_common(E)
[ 27.591758] CPU: 1 PID: 407 Comm: v4l_id Tainted: G E 6.4.0-0-pine64-arm64 #1 Debian 6.4~rc2-1~pine64
[ 27.592705] Hardware name: Pine64 RK3566 Quartz64-A Board (DT)
[ 27.593223] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 27.593843] pc : hantro_try_fmt+0xb8/0x290 [hantro_vpu]
[ 27.594336] lr : hantro_try_fmt+0xac/0x290 [hantro_vpu]
[ 27.594811] sp : ffff80000aa9b670
[ 27.595107] x29: ffff80000aa9b670 x28: ffff80000aa9bb60 x27: 0000000000000000
[ 27.595746] x26: 0000000000000000 x25: ffff000108380008 x24: ffff8000014ebac0
[ 27.596383] x23: 0000000000000001 x22: ffff000108380000 x21: 0000000000000000
[ 27.597019] x20: ffff8000014f10f0 x19: ffff80000aa9b708 x18: 0000000000000010
[ 27.597657] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000aa9b6a0
[ 27.598292] x14: ffff0001043fe280 x13: 0000000000000001 x12: 0000000000000001
[ 27.598927] x11: 0000000000000002 x10: 0000000000000003 x9 : 0000000000000002
[ 27.599563] x8 : 000000000000001f x7 : 000000000000005f x6 : 0000000000000003
[ 27.600199] x5 : ffff8000013e2583 x4 : ffff8000013e2580 x3 : 00000000ffffffff
[ 27.600834] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000000034363253
[ 27.601470] Call trace:
[ 27.601693] hantro_try_fmt+0xb8/0x290 [hantro_vpu]
[ 27.602143] hantro_set_fmt_out+0x44/0x388 [hantro_vpu]
[ 27.602617] hantro_reset_raw_fmt+0x7c/0xe0 [hantro_vpu]
[ 27.603096] hantro_set_fmt_cap+0x29c/0x2b8 [hantro_vpu]
[ 27.603575] hantro_reset_encoded_fmt+0x80/0xc0 [hantro_vpu]
[ 27.604086] hantro_reset_fmts+0x20/0x48 [hantro_vpu]
[ 27.604545] hantro_open+0xe0/0x208 [hantro_vpu]
[ 27.604965] v4l2_open+0x84/0x130 [videodev]
[ 27.605389] chrdev_open+0xd8/0x2d8
[ 27.605712] do_dentry_open+0x1bc/0x490
[ 27.606060] vfs_open+0x34/0x40
[ 27.606345] path_openat+0x9c8/0xf20
[ 27.606670] do_filp_open+0xa4/0x160
[ 27.606991] do_sys_openat2+0xc8/0x188
[ 27.607327] __arm64_sys_openat+0x6c/0xb8
[ 27.607687] invoke_syscall+0x78/0x108
[ 27.608029] el0_svc_common.constprop.0+0xd4/0x100
[ 27.608456] do_el0_svc+0x40/0xa8
[ 27.608755] el0_svc+0x34/0xd8
[ 27.609036] el0t_64_sync_handler+0xf4/0x120
[ 27.609420] el0t_64_sync+0x190/0x198
[ 27.609753] Code: 97fbc8e2 f94056c1 52864a60 72a686c0 (b9400821)
[ 27.610297] ---[ end trace 0000000000000000 ]---


Attachments:
signature.asc (235.00 B)
This is a digitally signed message part.

2023-05-22 16:27:59

by Benjamin Gaignard

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder


Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> Hi,
>
> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
>> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
>>> Benjamin,
>>>
>>> Please include the crash stracktrace in the commit.
>> Careful with HTML message, they don't always make it in these ML and tooling
>> might not play well with the tooling. Perhaps it can be edited while
>> pulling ? Here's the info from Marek's bug report:
>>
>> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
>> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
>> /dev/video0
>> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
>> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
>> /dev/video1
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 0000000000000008
>> Mem abort info:
>> ESR = 0x0000000096000004
>> EC = 0x25: DABT (current EL), IL = 32 bits
>> SET = 0, FnV = 0
>> EA = 0, S1PTW = 0
>> FSC = 0x04: level 0 translation fault
>> Data abort info:
>> ISV = 0, ISS = 0x00000004
>> CM = 0, WnR = 0
>> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
>> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
>> videobuf2_dma_contig snd_soc_simple_card display_connector
>> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
>> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
>> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
>> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
>> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
>> x_tables ipv6
>> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
>> Hardware name: Hardkernel ODROID-M1 (DT)
>> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
>> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
>> ...
>> Call trace:
>> hantro_try_fmt+0xb4/0x280 [hantro_vpu]
>> hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
>> hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
>> hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
>> hantro_reset_fmts+0x94/0xcc [hantro_vpu]
>> hantro_open+0xd4/0x20c [hantro_vpu]
>> v4l2_open+0x80/0x120 [videodev]
>> chrdev_open+0xc0/0x22c
>> do_dentry_open+0x13c/0x490
>> vfs_open+0x2c/0x38
>> path_openat+0x550/0x938
>> do_filp_open+0x80/0x12c
>> do_sys_openat2+0xb4/0x16c
>> __arm64_sys_openat+0x64/0xac
>> invoke_syscall+0x48/0x114
>> el0_svc_common.constprop.0+0xfc/0x11c
>> do_el0_svc+0x38/0xa4
>> el0_svc+0x48/0xb8
>> el0t_64_sync_handler+0xb8/0xbc
>> el0t_64_sync+0x190/0x194
>> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
>> ---[ end trace 0000000000000000 ]---
> When I booted into my 6.4-rc1 (but also rc2) kernel on my
> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> above, but I didn't have such a crash with my 6.3 kernel.
> Searching for 'hantro' led me to this commit as the most likely culprit
> but when I build a new 6.4-rcX kernel with this commit reverted,
> I still had this crash.
> Do you have suggestions which commit would then be the likely culprit?
>
This patch fix the crash at boot time, revert it doesn't seem to be the solution.
Maybe this proposal from Marek can help you ?
https://patchwork.kernel.org/project/linux-media/patch/[email protected]/

Regards,
Benjamin

> Cheers,
> Diederik
>
> For completeness, this is the error I got with 6.4-rcX:
>
> [ 26.976766] panfrost fde60000.gpu: clock rate = 594000000
> [ 26.977297] panfrost fde60000.gpu: bus_clock rate = 500000000
> [ 26.996012] random: crng init done
> [ 27.072438] videodev: Linux video capture interface: v2.00
> [ 27.119012] Registered IR keymap rc-cec
> [ 27.125161] rc rc0: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0
> [ 27.125427] panfrost fde60000.gpu: mali-g52 id 0x7402 major 0x1 minor 0x0 status 0x0
> [ 27.125905] input: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0/input1
> [ 27.126427] panfrost fde60000.gpu: features: 00000000,00000cf7, issues: 00000000,00000400
> [ 27.127785] panfrost fde60000.gpu: Features: L2:0x07110206 Shader:0x00000002 Tiler:0x00000209 Mem:0x1 MMU:0x00002823 AS:0xff JS:0x7
> [ 27.128954] panfrost fde60000.gpu: shader_present=0x1 l2_present=0x1
> [ 27.148920] gpio-fan gpio_fan: GPIO fan initialized
> [ 27.191131] [drm] Initialized panfrost 1.2.0 20180908 for fde60000.gpu on minor 1
> [ 27.265920] hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> [ 27.267535] hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0
> [ 27.270668] hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> [ 27.272590] hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1
> [ 27.573417] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> [ 27.574238] Mem abort info:
> [ 27.574499] ESR = 0x0000000096000004
> [ 27.574836] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 27.575310] SET = 0, FnV = 0
> [ 27.575586] EA = 0, S1PTW = 0
> [ 27.575868] FSC = 0x04: level 0 translation fault
> [ 27.576368] Data abort info:
> [ 27.576637] ISV = 0, ISS = 0x00000004
> [ 27.576980] CM = 0, WnR = 0
> [ 27.577247] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010718b000
> [ 27.577818] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> [ 27.578430] Internal error: Oops: 0000000096000004 [#1] SMP
> [ 27.578934] Modules linked in: polyval_generic(E+) ghash_ce(E) gf128mul(E) snd_soc_hdmi_codec(E+) sha2_ce(E) ecdh_generic(E+) sha256_arm64(E) sha1_ce(E) rfkill(E) snd_soc_spdif_tx(E) ecc(E) leds_gpio(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) display_connector(E) gpio_fan(E) hantro_vpu(E) v4l2_vp9(E) snd_soc_rockchip_i2s_tdm(E) v4l2_h264(E) videobuf2_dma_contig(E) snd_soc_rockchip_spdif(E) snd_soc_rk817(E) v4l2_mem2mem(E) videobuf2_memops(E) governor_simpleondemand(E) rockchip_thermal(E) videobuf2_v4l2(E) dw_wdt(E) snd_soc_core(E) dw_hdmi_i2s_audio(E) dw_hdmi_cec(E) videodev(E) snd_pcm_dmaengine(E) panfrost(E) videobuf2_common(E) snd_pcm(E) rk805_pwrkey(E) snd_timer(E) gpu_sched(E) snd(E) soundcore(E) mc(E) drm_shmem_helper(E) cpufreq_dt(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) dax(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) xhci_plat_hcd(E) xhci_hcd(E) motorcomm(E) rk808_regulator(E) fan53555(E) gpio_rockchip(E) dwmac_rk(E) stmmac_platform(E)
> [ 27.579108] crct10dif_ce(E) stmmac(E) pcs_xpcs(E) crct10dif_common(E) phylink(E) fixed(E) of_mdio(E) pinctrl_rockchip(E) phy_rockchip_inno_usb2(E) fixed_phy(E) sdhci_of_dwcmshc(E) dw_mmc_rockchip(E) fwnode_mdio(E) sdhci_pltfm(E) phy_rockchip_naneng_combphy(E) dw_mmc_pltfm(E) sdhci(E) dw_mmc(E) pl330(E) libphy(E) rockchipdrm(E) ptp(E) drm_dma_helper(E) analogix_dp(E) dw_hdmi(E) cec(E) rc_core(E) drm_display_helper(E) dw_mipi_dsi(E) pps_core(E) drm_kms_helper(E) ohci_platform(E) ohci_hcd(E) ehci_platform(E) io_domain(E) ehci_hcd(E) dwc3(E) i2c_rk3x(E) udc_core(E) usbcore(E) roles(E) ulpi(E) drm(E) usb_common(E)
> [ 27.591758] CPU: 1 PID: 407 Comm: v4l_id Tainted: G E 6.4.0-0-pine64-arm64 #1 Debian 6.4~rc2-1~pine64
> [ 27.592705] Hardware name: Pine64 RK3566 Quartz64-A Board (DT)
> [ 27.593223] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 27.593843] pc : hantro_try_fmt+0xb8/0x290 [hantro_vpu]
> [ 27.594336] lr : hantro_try_fmt+0xac/0x290 [hantro_vpu]
> [ 27.594811] sp : ffff80000aa9b670
> [ 27.595107] x29: ffff80000aa9b670 x28: ffff80000aa9bb60 x27: 0000000000000000
> [ 27.595746] x26: 0000000000000000 x25: ffff000108380008 x24: ffff8000014ebac0
> [ 27.596383] x23: 0000000000000001 x22: ffff000108380000 x21: 0000000000000000
> [ 27.597019] x20: ffff8000014f10f0 x19: ffff80000aa9b708 x18: 0000000000000010
> [ 27.597657] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000aa9b6a0
> [ 27.598292] x14: ffff0001043fe280 x13: 0000000000000001 x12: 0000000000000001
> [ 27.598927] x11: 0000000000000002 x10: 0000000000000003 x9 : 0000000000000002
> [ 27.599563] x8 : 000000000000001f x7 : 000000000000005f x6 : 0000000000000003
> [ 27.600199] x5 : ffff8000013e2583 x4 : ffff8000013e2580 x3 : 00000000ffffffff
> [ 27.600834] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000000034363253
> [ 27.601470] Call trace:
> [ 27.601693] hantro_try_fmt+0xb8/0x290 [hantro_vpu]
> [ 27.602143] hantro_set_fmt_out+0x44/0x388 [hantro_vpu]
> [ 27.602617] hantro_reset_raw_fmt+0x7c/0xe0 [hantro_vpu]
> [ 27.603096] hantro_set_fmt_cap+0x29c/0x2b8 [hantro_vpu]
> [ 27.603575] hantro_reset_encoded_fmt+0x80/0xc0 [hantro_vpu]
> [ 27.604086] hantro_reset_fmts+0x20/0x48 [hantro_vpu]
> [ 27.604545] hantro_open+0xe0/0x208 [hantro_vpu]
> [ 27.604965] v4l2_open+0x84/0x130 [videodev]
> [ 27.605389] chrdev_open+0xd8/0x2d8
> [ 27.605712] do_dentry_open+0x1bc/0x490
> [ 27.606060] vfs_open+0x34/0x40
> [ 27.606345] path_openat+0x9c8/0xf20
> [ 27.606670] do_filp_open+0xa4/0x160
> [ 27.606991] do_sys_openat2+0xc8/0x188
> [ 27.607327] __arm64_sys_openat+0x6c/0xb8
> [ 27.607687] invoke_syscall+0x78/0x108
> [ 27.608029] el0_svc_common.constprop.0+0xd4/0x100
> [ 27.608456] do_el0_svc+0x40/0xa8
> [ 27.608755] el0_svc+0x34/0xd8
> [ 27.609036] el0t_64_sync_handler+0xf4/0x120
> [ 27.609420] el0t_64_sync+0x190/0x198
> [ 27.609753] Code: 97fbc8e2 f94056c1 52864a60 72a686c0 (b9400821)
> [ 27.610297] ---[ end trace 0000000000000000 ]---

2023-05-22 23:07:42

by Diederik de Haas

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> Le 20/05/2023 ? 00:34, Diederik de Haas a ?crit :
> > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> >> Le jeudi 13 avril 2023 ? 10:10 -0300, Ezequiel Garcia a ?crit :
> >>> Benjamin,
> >>>
> >>> Please include the crash stracktrace in the commit.
> >>
> >> Careful with HTML message, they don't always make it in these ML and
> >> tooling might not play well with the tooling. Perhaps it can be edited
> >> while pulling ? Here's the info from Marek's bug report:
> >>
> >> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> >> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
> >> /dev/video0
> >> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> >> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
> >> /dev/video1
> >> Unable to handle kernel NULL pointer dereference at virtual address
> >> 0000000000000008
> >> Mem abort info:
> >> ESR = 0x0000000096000004
> >> EC = 0x25: DABT (current EL), IL = 32 bits
> >> SET = 0, FnV = 0
> >> EA = 0, S1PTW = 0
> >> FSC = 0x04: level 0 translation fault
> >> Data abort info:
> >> ISV = 0, ISS = 0x00000004
> >> CM = 0, WnR = 0
> >> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
> >> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
> >> videobuf2_dma_contig snd_soc_simple_card display_connector
> >> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
> >> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
> >> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
> >> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
> >> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
> >> x_tables ipv6
> >> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
> >> Hardware name: Hardkernel ODROID-M1 (DT)
> >> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> >> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
> >> ...
> >> Call trace:
> >> hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> >> hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
> >> hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
> >> hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
> >> hantro_reset_fmts+0x94/0xcc [hantro_vpu]
> >> hantro_open+0xd4/0x20c [hantro_vpu]
> >> v4l2_open+0x80/0x120 [videodev]
> >> chrdev_open+0xc0/0x22c
> >> do_dentry_open+0x13c/0x490
> >> vfs_open+0x2c/0x38
> >> path_openat+0x550/0x938
> >> do_filp_open+0x80/0x12c
> >> do_sys_openat2+0xb4/0x16c
> >> __arm64_sys_openat+0x64/0xac
> >> invoke_syscall+0x48/0x114
> >> el0_svc_common.constprop.0+0xfc/0x11c
> >> do_el0_svc+0x38/0xa4
> >> el0_svc+0x48/0xb8
> >> el0t_64_sync_handler+0xb8/0xbc
> >> el0t_64_sync+0x190/0x194
> >> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
> >> ---[ end trace 0000000000000000 ]---
> >
> > When I booted into my 6.4-rc1 (but also rc2) kernel on my
> > Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> > above, but I didn't have such a crash with my 6.3 kernel.
> > Searching for 'hantro' led me to this commit as the most likely culprit
> > but when I build a new 6.4-rcX kernel with this commit reverted,
> > I still had this crash.
> > Do you have suggestions which commit would then be the likely culprit?
>
> This patch fix the crash at boot time, revert it doesn't seem to be the
> solution. Maybe this proposal from Marek can help you ?
> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.223646
> [email protected]/

That helped :) After applying that patch I no longer have the crash.
Thanks!

Regards,
Diederik


Attachments:
signature.asc (235.00 B)
This is a digitally signed message part.

2023-05-23 10:58:33

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 20.05.23 00:34, Diederik de Haas wrote:
>
> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
>> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> [....]
>
> When I booted into my 6.4-rc1 (but also rc2) kernel on my
> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> above, but I didn't have such a crash with my 6.3 kernel.
> Searching for 'hantro' led me to this commit as the most likely culprit
> but when I build a new 6.4-rcX kernel with this commit reverted,
> I still had this crash.
> Do you have suggestions which commit would then be the likely culprit?
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced db6f68b51e5c
#regzbot title media: verisilicon: null pointer dereference in try_fmt
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-05-23 11:11:12

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

CCing the Regression list and a bunch of other people that were CCed in
threads that look related:

On 23.05.23 00:38, Diederik de Haas wrote:
> On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
>> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
>>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> [...]
>>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
>>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
>>> above, but I didn't have such a crash with my 6.3 kernel.
>>> Searching for 'hantro' led me to this commit as the most likely culprit
>>> but when I build a new 6.4-rcX kernel with this commit reverted,
>>> I still had this crash.
>>> Do you have suggestions which commit would then be the likely culprit?
>>
>> This patch fix the crash at boot time, revert it doesn't seem to be the
>> solution. Maybe this proposal from Marek can help you ?
>>
>> https://patchwork.kernel.org/project/linux-media/patch/[email protected]/
>
> That helped :) After applying that patch I no longer have the crash.
> Thanks!

That regression fix is now a month old, but not yet merged afaics --
guess due to Nicolas comment that wasn't addressed yet and likely
requires a updated patch.

Michael afaics a week ago posted a patch that to my *very limited
understanding of things* (I hope I don't confuse matters here!) seems to
address the same problem, but slightly differently:
https://lore.kernel.org/all/[email protected]/

No reply yet.

That's all a bit unfortunate, as it's not how regression fixes should be
dealt with -- and caused multiple people headaches that could have been
avoided. :-/

But well, things happen. But it leads to the question:

How can we finally address the issue quickly now to ensure is doesn't
cause headaches for even more people?

Marek, Michael, could you work on a patch together that we then get
somewhat fast-tracked to Linus to avoid him getting even more unhappy
about the state of things[1]?

Ciao, Thorsten

[1] see
https://lore.kernel.org/all/CAHk-=wgzU8_dGn0Yg+DyX7ammTkDUCyEJ4C=NvnHRhxKWC7Wpw@mail.gmail.com/

2023-05-23 15:15:36

by Michael Tretter

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> CCing the Regression list and a bunch of other people that were CCed in
> threads that look related:

Thanks!

>
> On 23.05.23 00:38, Diederik de Haas wrote:
> > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> >> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> >>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> > [...]
> >>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
> >>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> >>> above, but I didn't have such a crash with my 6.3 kernel.
> >>> Searching for 'hantro' led me to this commit as the most likely culprit
> >>> but when I build a new 6.4-rcX kernel with this commit reverted,
> >>> I still had this crash.
> >>> Do you have suggestions which commit would then be the likely culprit?
> >>
> >> This patch fix the crash at boot time, revert it doesn't seem to be the
> >> solution. Maybe this proposal from Marek can help you ?
> >>
> >> https://patchwork.kernel.org/project/linux-media/patch/[email protected]/
> >
> > That helped :) After applying that patch I no longer have the crash.
> > Thanks!
>
> That regression fix is now a month old, but not yet merged afaics --
> guess due to Nicolas comment that wasn't addressed yet and likely
> requires a updated patch.

I agree with Nicolas comment on that patch and it needs to be updated.

>
> Michael afaics a week ago posted a patch that to my *very limited
> understanding of things* (I hope I don't confuse matters here!) seems to
> address the same problem, but slightly differently:
> https://lore.kernel.org/all/[email protected]/

Correct, my patch addresses the same problem.

>
> No reply yet.
>
> That's all a bit unfortunate, as it's not how regression fixes should be
> dealt with -- and caused multiple people headaches that could have been
> avoided. :-/
>
> But well, things happen. But it leads to the question:
>
> How can we finally address the issue quickly now to ensure is doesn't
> cause headaches for even more people?
>
> Marek, Michael, could you work on a patch together that we then get
> somewhat fast-tracked to Linus to avoid him getting even more unhappy
> about the state of things[1]?

Marek, if you have an updated patch, I will happily test and review it.
Otherwise, please take a look at my patch.

Michael

2023-05-23 16:47:10

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

On 23.05.2023 16:54, Michael Tretter wrote:
> On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> CCing the Regression list and a bunch of other people that were CCed in
>> threads that look related:
> Thanks!
>
>> On 23.05.23 00:38, Diederik de Haas wrote:
>>> On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
>>>> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
>>>>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
>>> [...]
>>>>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
>>>>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
>>>>> above, but I didn't have such a crash with my 6.3 kernel.
>>>>> Searching for 'hantro' led me to this commit as the most likely culprit
>>>>> but when I build a new 6.4-rcX kernel with this commit reverted,
>>>>> I still had this crash.
>>>>> Do you have suggestions which commit would then be the likely culprit?
>>>> This patch fix the crash at boot time, revert it doesn't seem to be the
>>>> solution. Maybe this proposal from Marek can help you ?
>>>>
>>>> https://patchwork.kernel.org/project/linux-media/patch/[email protected]/
>>> That helped :) After applying that patch I no longer have the crash.
>>> Thanks!
>> That regression fix is now a month old, but not yet merged afaics --
>> guess due to Nicolas comment that wasn't addressed yet and likely
>> requires a updated patch.
> I agree with Nicolas comment on that patch and it needs to be updated.
>
>> Michael afaics a week ago posted a patch that to my *very limited
>> understanding of things* (I hope I don't confuse matters here!) seems to
>> address the same problem, but slightly differently:
>> https://lore.kernel.org/all/[email protected]/
> Correct, my patch addresses the same problem.
>
>> No reply yet.
>>
>> That's all a bit unfortunate, as it's not how regression fixes should be
>> dealt with -- and caused multiple people headaches that could have been
>> avoided. :-/
>>
>> But well, things happen. But it leads to the question:
>>
>> How can we finally address the issue quickly now to ensure is doesn't
>> cause headaches for even more people?
>>
>> Marek, Michael, could you work on a patch together that we then get
>> somewhat fast-tracked to Linus to avoid him getting even more unhappy
>> about the state of things[1]?
> Marek, if you have an updated patch, I will happily test and review it.
> Otherwise, please take a look at my patch.

Go ahead with your, my was just a blind try eliminating the oops during
driver probe.


Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland


2023-05-25 14:47:23

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

Hi Micheal,

Le mardi 23 mai 2023 à 16:54 +0200, Michael Tretter a écrit :
> On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > CCing the Regression list and a bunch of other people that were CCed in
> > threads that look related:
>
> Thanks!
>
> >
> > On 23.05.23 00:38, Diederik de Haas wrote:
> > > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> > > > Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> > > > > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> > > [...]
> > > > > When I booted into my 6.4-rc1 (but also rc2) kernel on my
> > > > > Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> > > > > above, but I didn't have such a crash with my 6.3 kernel.
> > > > > Searching for 'hantro' led me to this commit as the most likely culprit
> > > > > but when I build a new 6.4-rcX kernel with this commit reverted,
> > > > > I still had this crash.
> > > > > Do you have suggestions which commit would then be the likely culprit?
> > > >
> > > > This patch fix the crash at boot time, revert it doesn't seem to be the
> > > > solution. Maybe this proposal from Marek can help you ?
> > > >
> > > > https://patchwork.kernel.org/project/linux-media/patch/[email protected]/
> > >
> > > That helped :) After applying that patch I no longer have the crash.
> > > Thanks!
> >
> > That regression fix is now a month old, but not yet merged afaics --
> > guess due to Nicolas comment that wasn't addressed yet and likely
> > requires a updated patch.
>
> I agree with Nicolas comment on that patch and it needs to be updated.
>
> >
> > Michael afaics a week ago posted a patch that to my *very limited
> > understanding of things* (I hope I don't confuse matters here!) seems to
> > address the same problem, but slightly differently:
> > https://lore.kernel.org/all/[email protected]/
>
> Correct, my patch addresses the same problem.

Sorry, just got really busy and missed the second fix. From a hot fix stand
point, your patch seems a lot safer. It does not go as far, and probably does
not make the driver better, but considering we had such a slow response we need
to do something about it.

Ezequiel, will you be fine with the approach ?

>
> >
> > No reply yet.
> >
> > That's all a bit unfortunate, as it's not how regression fixes should be
> > dealt with -- and caused multiple people headaches that could have been
> > avoided. :-/
> >
> > But well, things happen. But it leads to the question:
> >
> > How can we finally address the issue quickly now to ensure is doesn't
> > cause headaches for even more people?
> >
> > Marek, Michael, could you work on a patch together that we then get
> > somewhat fast-tracked to Linus to avoid him getting even more unhappy
> > about the state of things[1]?
>
> Marek, if you have an updated patch, I will happily test and review it.
> Otherwise, please take a look at my patch.
>
> Michael
>


2023-06-01 13:46:09

by Michael Tretter

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

Hi Benjamin,

On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote:
> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> so assigne it to vpu_fmt led to crash the kernel.
> Like for decoder case use 'fmt' as format for encoder and clean up
> the code.
>
> Signed-off-by: Benjamin Gaignard <[email protected]>
> Tested-by: Marek Szyprowski <[email protected]>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
> ---
> version 2:
> - Remove useless vpu_fmt.
>
> drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
> index 8f1414085f47..d71f79471396 100644
> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> struct v4l2_pix_format_mplane *pix_mp,
> enum v4l2_buf_type type)
> {
> - const struct hantro_fmt *fmt, *vpu_fmt;
> + const struct hantro_fmt *fmt;
> bool capture = V4L2_TYPE_IS_CAPTURE(type);
> bool coded;
>
> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>
> if (coded) {
> pix_mp->num_planes = 1;
> - vpu_fmt = fmt;
> - } else if (ctx->is_encoder) {
> - vpu_fmt = ctx->vpu_dst_fmt;
> - } else {
> - vpu_fmt = fmt;
> + } else if (!ctx->is_encoder) {
> /*
> * Width/height on the CAPTURE end of a decoder are ignored and
> * replaced by the OUTPUT ones.
> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> pix_mp->field = V4L2_FIELD_NONE;
>
> v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> - &vpu_fmt->frmsize);
> + &fmt->frmsize);

This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is
only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an
encoder will be a raw format. This results in width and height to be clamped
to 0.

I think the correct fix would be to apply the frmsize constraints of the
currently configured coded format, but as ctx->vpu_dst_fmt is not initialized
before calling this code, I don't know how to get the coded format.

Michael

>
> if (!coded) {
> /* Fill remaining fields */
> --
> 2.34.1
>

2023-06-01 14:15:43

by Benjamin Gaignard

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder


Le 01/06/2023 à 15:27, Michael Tretter a écrit :
> Hi Benjamin,
>
> On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote:
>> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
>> so assigne it to vpu_fmt led to crash the kernel.
>> Like for decoder case use 'fmt' as format for encoder and clean up
>> the code.
>>
>> Signed-off-by: Benjamin Gaignard <[email protected]>
>> Tested-by: Marek Szyprowski <[email protected]>
>> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
>> ---
>> version 2:
>> - Remove useless vpu_fmt.
>>
>> drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
>> 1 file changed, 3 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
>> index 8f1414085f47..d71f79471396 100644
>> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
>> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
>> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>> struct v4l2_pix_format_mplane *pix_mp,
>> enum v4l2_buf_type type)
>> {
>> - const struct hantro_fmt *fmt, *vpu_fmt;
>> + const struct hantro_fmt *fmt;
>> bool capture = V4L2_TYPE_IS_CAPTURE(type);
>> bool coded;
>>
>> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>>
>> if (coded) {
>> pix_mp->num_planes = 1;
>> - vpu_fmt = fmt;
>> - } else if (ctx->is_encoder) {
>> - vpu_fmt = ctx->vpu_dst_fmt;
>> - } else {
>> - vpu_fmt = fmt;
>> + } else if (!ctx->is_encoder) {
>> /*
>> * Width/height on the CAPTURE end of a decoder are ignored and
>> * replaced by the OUTPUT ones.
>> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>> pix_mp->field = V4L2_FIELD_NONE;
>>
>> v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
>> - &vpu_fmt->frmsize);
>> + &fmt->frmsize);
> This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is
> only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an
> encoder will be a raw format. This results in width and height to be clamped
> to 0.
>
> I think the correct fix would be to apply the frmsize constraints of the
> currently configured coded format, but as ctx->vpu_dst_fmt is not initialized
> before calling this code, I don't know how to get the coded format.

if ctx->dst_fmt is correctly set (and it should be) then doing:

pix_mp->width = ctx->dst_fmt.width;
pix_mp->height = ctx->dst_fmt.height;

should solve the issue.

Benjamin

>
> Michael
>
>>
>> if (!coded) {
>> /* Fill remaining fields */
>> --
>> 2.34.1
>>

2023-07-06 07:51:55

by Michael Tretter

[permalink] [raw]
Subject: Re: [PATCH v2] media: verisilicon: Fix crash when probing encoder

On Thu, 01 Jun 2023 16:08:13 +0200, Benjamin Gaignard wrote:
> Le 01/06/2023 à 15:27, Michael Tretter a écrit :
> > On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote:
> > > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> > > so assigne it to vpu_fmt led to crash the kernel.
> > > Like for decoder case use 'fmt' as format for encoder and clean up
> > > the code.
> > >
> > > Signed-off-by: Benjamin Gaignard <[email protected]>
> > > Tested-by: Marek Szyprowski <[email protected]>
> > > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
> > > ---
> > > version 2:
> > > - Remove useless vpu_fmt.
> > >
> > > drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
> > > 1 file changed, 3 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > > index 8f1414085f47..d71f79471396 100644
> > > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> > > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> > > struct v4l2_pix_format_mplane *pix_mp,
> > > enum v4l2_buf_type type)
> > > {
> > > - const struct hantro_fmt *fmt, *vpu_fmt;
> > > + const struct hantro_fmt *fmt;
> > > bool capture = V4L2_TYPE_IS_CAPTURE(type);
> > > bool coded;
> > > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> > > if (coded) {
> > > pix_mp->num_planes = 1;
> > > - vpu_fmt = fmt;
> > > - } else if (ctx->is_encoder) {
> > > - vpu_fmt = ctx->vpu_dst_fmt;
> > > - } else {
> > > - vpu_fmt = fmt;
> > > + } else if (!ctx->is_encoder) {
> > > /*
> > > * Width/height on the CAPTURE end of a decoder are ignored and
> > > * replaced by the OUTPUT ones.
> > > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> > > pix_mp->field = V4L2_FIELD_NONE;
> > > v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> > > - &vpu_fmt->frmsize);
> > > + &fmt->frmsize);
> > This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is
> > only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an
> > encoder will be a raw format. This results in width and height to be clamped
> > to 0.
> >
> > I think the correct fix would be to apply the frmsize constraints of the
> > currently configured coded format, but as ctx->vpu_dst_fmt is not initialized
> > before calling this code, I don't know how to get the coded format.
>
> if ctx->dst_fmt is correctly set (and it should be) then doing:
>
> pix_mp->width = ctx->dst_fmt.width;
> pix_mp->height = ctx->dst_fmt.height;
>
> should solve the issue.

Using the width and height of dst_fmt for OUTPUT is not correct, since the
v4l2 stateless encoder spec dictates that the size is set on the OUTPUT device
and will be ignored on the CAPTURE device.

I sent a patch [0] to apply the frame constraints using dst_fmt.

Michael

[0] https://lore.kernel.org/linux-media/[email protected]/

> > > if (!coded) {
> > > /* Fill remaining fields */
> > > --
> > > 2.34.1
> > >
>