2018-09-20 11:24:13

by John Garry

Subject: Re: Bug report: HiBMC crash

On 20/09/2018 11:04, John Garry wrote:
> Hi,
>
> I am seeing this crash below on linux-next (20 Sept).
>
> This is on an arm64 D05 board, which includes the HiBMC device. D06 was
> also crashing for what looked like same reason. I am using standard
> defconfig, except DRM and DRM_HISI_HIBMC are built-in.
>
> Is this a known issue? I tested v4.19-rc3 and it had no such crash.
>
> The origin seems to be here, where pointer info is not checked for NULL
> for safety:
> static int framebuffer_check(struct drm_device *dev,
> const struct drm_mode_fb_cmd2 *r)
> {
> ...
>
> /* now let the driver pick its own format info */
> info = drm_get_format_info(dev, r);
>
> ...
>
> for (i = 0; i < info->num_planes; i++) {
> unsigned int width = fb_plane_width(r->width, info, i);
> unsigned int height = fb_plane_height(r->height, info, i);
> unsigned int cpp = info->cpp[i];
>
>

On closer inspection, the crash actually comes from the hibmc probe error
handling path: hibmc_fbdev_destroy() calls drm_framebuffer_put() while
fbdev->fb still holds the error value returned by
hibmc_framebuffer_init(), as shown:

static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
struct drm_fb_helper_surface_size *sizes)
{

...

hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
if (IS_ERR(hi_fbdev->fb)) {
ret = PTR_ERR(hi_fbdev->fb);

*** hi_fbdev->fb holds error code ***

DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
goto out_release_fbi;
}


static void hibmc_fbdev_destroy(struct hibmc_fbdev *fbdev)
{
struct hibmc_framebuffer *gfb = fbdev->fb;
struct drm_fb_helper *fbh = &fbdev->helper;

drm_fb_helper_unregister_fbi(fbh);

drm_fb_helper_fini(fbh);

*** gfb holds the error code, not a valid pointer, so the check below passes ***

if (gfb)
drm_framebuffer_put(&gfb->fb);
}
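
To illustrate why the "if (gfb)" check does not help here, below is a minimal userspace sketch. The ERR_PTR()/PTR_ERR()/IS_ERR() helpers are simplified stand-ins for the kernel's <linux/err.h> versions, and fake_framebuffer_init() and passes_null_check() are hypothetical names made up for this example. It only shows that an encoded error pointer is non-NULL:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for hibmc_framebuffer_init() failing with -EINVAL. */
static inline void *fake_framebuffer_init(void)
{
	return ERR_PTR(-EINVAL);
}

/* Mirrors a plain "if (gfb)" test: returns 1 when the value is non-NULL. */
static inline int passes_null_check(const void *p)
{
	return p != NULL;
}
```

With these definitions, passes_null_check(fake_framebuffer_init()) returns 1 even though IS_ERR() is true for the same value, which is the situation the destroy path ends up in.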

This change fixes the crash for me:

hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
if (IS_ERR(hi_fbdev->fb)) {
ret = PTR_ERR(hi_fbdev->fb);
+ hi_fbdev->fb = NULL;
DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
goto out_release_fbi;
}
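
The register dump in the quoted oops looks consistent with this reading: x21 holds 0xffffffffffffffea (-22), x0 holds 0x2, and the fault address is 0x1a. The sketch below just redoes that arithmetic. Both offsets are assumptions picked because they reproduce the logged values (0x18 for the mode object inside the framebuffer, with fb assumed to be the first member of struct hibmc_framebuffer, and the #24 from the faulting instruction f9400c01, which I read as "ldr x1, [x0, #24]"); they are not taken from the structure definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offsets: chosen to match the oops, not read from the headers. */
#define FB_BASE_OFFSET  0x18u	/* assumed offset of the mode object in the fb */
#define OBJ_LOAD_OFFSET 0x18u	/* offset used by the faulting load (#24) */

/* Address passed on as the mode object when the "fb" is really an errno. */
static inline uint64_t mode_object_addr(int64_t err)
{
	/* Unsigned arithmetic wraps, so -22 + 0x18 becomes 0x2. */
	return (uint64_t)err + FB_BASE_OFFSET;
}

/* Address the faulting load would then touch. */
static inline uint64_t fault_addr(int64_t err)
{
	return mode_object_addr(err) + OBJ_LOAD_OFFSET;
}
```

Under those assumptions, mode_object_addr(-22) gives 0x2 (the value in x0) and fault_addr(-22) gives 0x1a (the reported fault address).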

I don't know why we're hitting the error path in the first place.

Having said all that, the code I pointed out in framebuffer_check()
still does not look safe, for the same reason I mentioned originally.
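
As a rough sketch of the kind of check I mean, using hypothetical cut-down types rather than the real DRM structures (fake_format_info, fake_get_format_info() and fake_framebuffer_check() are all invented for illustration, and this is not a proposed patch):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct drm_format_info. */
struct fake_format_info {
	unsigned int num_planes;
	unsigned int cpp[3];
};

/* Hypothetical lookup that returns NULL for an unknown format. */
static inline const struct fake_format_info *fake_get_format_info(int known)
{
	static const struct fake_format_info xrgb = {
		.num_planes = 1,
		.cpp = { 4 },
	};

	return known ? &xrgb : NULL;
}

/* Sums bytes per pixel over all planes, rejecting a NULL info first. */
static inline int fake_framebuffer_check(int known)
{
	const struct fake_format_info *info = fake_get_format_info(known);
	unsigned int i, total = 0;

	if (!info)
		return -EINVAL;	/* bail out before info->num_planes is read */

	for (i = 0; i < info->num_planes; i++)
		total += info->cpp[i];

	return (int)total;
}
```

Whether such a check is actually needed in framebuffer_check() depends on whether the lookup can still return NULL at that point, which I have not verified.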

John

> John
>
> [ 9.220446] pci 0007:90:00.0: can't derive routing for PCI INT A
> [ 9.226517] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
> [ 9.231847] [TTM] Zone kernel: Available graphics memory: 16297696 kiB
> [ 9.238536] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [ 9.245133] [TTM] Initializing pool allocator
> [ 9.249536] [TTM] Initializing DMA pool allocator
> [ 9.254340] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 9.261026] [drm] No driver support for vblank timestamp query.
> [ 9.272431] WARNING: CPU: 16 PID: 293 at
> drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
> [ 9.282014] Modules linked in:
> [ 9.285095] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
> [ 9.294677] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [ 9.303915] Workqueue: events work_for_cpu_fn
> [ 9.308314] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 9.313150] pc : drm_format_info.part.1+0x0/0x8
> [ 9.317724] lr : drm_get_format_info+0x90/0x98
> [ 9.322208] sp : ffff00000af1baf0
> [ 9.325549] x29: ffff00000af1baf0 x28: 0000000000000000
> [ 9.330915] x27: ffff00000af1bcb0 x26: ffff8017d3018800
> [ 9.336279] x25: ffff8017d28a0018 x24: ffff8017d2f80018
> [ 9.341644] x23: ffff8017d3018670 x22: ffff00000af1bbf0
> [ 9.347009] x21: ffff8017d3018a70 x20: ffff00000af1bbf0
> [ 9.352373] x19: ffff00000af1bbf0 x18: ffffffffffffffff
> [ 9.357737] x17: 0000000000000000 x16: 0000000000000000
> [ 9.363102] x15: ffff0000092296c8 x14: ffff000009074000
> [ 9.368466] x13: 0000000000000000 x12: 0000000000000000
> [ 9.373831] x11: ffff8017fbffe008 x10: ffff8017db9307e8
> [ 9.379195] x9 : 0000000000000000 x8 : ffff8017b517c800
> [ 9.384560] x7 : 0000000000000000 x6 : 000000000000003f
> [ 9.389924] x5 : 0000000000000040 x4 : 0000000000000000
> [ 9.395289] x3 : ffff000008d04000 x2 : 0000000056555941
> [ 9.400654] x1 : ffff000008d04f70 x0 : 0000000000000044
> [ 9.406019] Call trace:
> [ 9.408483] drm_format_info.part.1+0x0/0x8
> [ 9.412705] drm_helper_mode_fill_fb_struct+0x20/0x80
> [ 9.417807] hibmc_framebuffer_init+0x48/0xd0
> [ 9.422204] hibmc_drm_fb_create+0x1ec/0x3c8
> [ 9.426513] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
> [ 9.432756] drm_fb_helper_initial_config+0x3c/0x48
> [ 9.437681] hibmc_fbdev_init+0xb4/0x198
> [ 9.441638] hibmc_pci_probe+0x2f4/0x3c8
> [ 9.445598] local_pci_probe+0x3c/0xb0
> [ 9.449379] work_for_cpu_fn+0x18/0x28
> [ 9.453161] process_one_work+0x1e0/0x318
> [ 9.457207] worker_thread+0x228/0x450
> [ 9.460988] kthread+0x128/0x130
> [ 9.464244] ret_from_fork+0x10/0x18
> [ 9.467850] ---[ end trace 2695ffa0af5be373 ]---
> [ 9.472525] WARNING: CPU: 16 PID: 293 at
> drivers/gpu/drm/drm_framebuffer.c:730 drm_framebuffer_init+0x18/0x110
> [ 9.482634] Modules linked in:
> [ 9.485714] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
> [ 9.496702] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [ 9.505936] Workqueue: events work_for_cpu_fn
> [ 9.510333] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 9.515170] pc : drm_framebuffer_init+0x18/0x110
> [ 9.519831] lr : hibmc_framebuffer_init+0x60/0xd0
> [ 9.524578] sp : ffff00000af1baf0
> [ 9.527920] x29: ffff00000af1baf0 x28: 0000000000000000
> [ 9.533284] x27: ffff00000af1bcb0 x26: ffff8017d3018800
> [ 9.538649] x25: ffff8017d28a0018 x24: ffff8017d2f80018
> [ 9.544014] x23: ffff8017d3018670 x22: ffff00000af1bbf0
> [ 9.549378] x21: ffff8017d3018a70 x20: ffff8017d2420000
> [ 9.554743] x19: ffff8017b517c700 x18: ffffffffffffffff
> [ 9.560108] x17: 0000000000000000 x16: 0000000000000000
> [ 9.565472] x15: ffff0000092296c8 x14: ffff000009074000
> [ 9.570837] x13: 0000000000000000 x12: 0000000000000000
> [ 9.576201] x11: ffff8017fbffe008 x10: ffff8017db9307e8
> [ 9.581566] x9 : 0000000000000000 x8 : ffff8017b517c800
> [ 9.586930] x7 : 0000000000000000 x6 : 000000000000003f
> [ 9.592295] x5 : 0000000000000040 x4 : 0000000000000000
> [ 9.597660] x3 : ffff00000af1bc24 x2 : ffff000008d23f50
> [ 9.603024] x1 : ffff8017b517c700 x0 : 0000000000000000
> [ 9.608389] Call trace:
> [ 9.610852] drm_framebuffer_init+0x18/0x110
> [ 9.615161] hibmc_framebuffer_init+0x60/0xd0
> [ 9.619558] hibmc_drm_fb_create+0x1ec/0x3c8
> [ 9.623867] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
> [ 9.630110] drm_fb_helper_initial_config+0x3c/0x48
> [ 9.635034] hibmc_fbdev_init+0xb4/0x198
> [ 9.638991] hibmc_pci_probe+0x2f4/0x3c8
> [ 9.642949] local_pci_probe+0x3c/0xb0
> [ 9.646731] work_for_cpu_fn+0x18/0x28
> [ 9.650513] process_one_work+0x1e0/0x318
> [ 9.654558] worker_thread+0x228/0x450
> [ 9.658339] kthread+0x128/0x130
> [ 9.661594] ret_from_fork+0x10/0x18
> [ 9.665199] ---[ end trace 2695ffa0af5be374 ]---
> [ 9.669868] [drm:hibmc_framebuffer_init] *ERROR* drm_framebuffer_init
> failed: -22
> [ 9.677434] [drm:hibmc_drm_fb_create] *ERROR* failed to initialize
> framebuffer: -22
> [ 9.685182] [drm:hibmc_fbdev_init] *ERROR* failed to setup initial
> conn config: -22
> [ 9.692926] [drm:hibmc_pci_probe] *ERROR* failed to initialize fbdev:
> -22
> [ 9.699791] Unable to handle kernel NULL pointer dereference at
> virtual address 000000000000001a
> [ 9.708672] Mem abort info:
> [ 9.711489] ESR = 0x96000004
> [ 9.714570] Exception class = DABT (current EL), IL = 32 bits
> [ 9.720551] SET = 0, FnV = 0
> [ 9.723631] EA = 0, S1PTW = 0
> [ 9.726799] Data abort info:
> [ 9.729702] ISV = 0, ISS = 0x00000004
> [ 9.733573] CM = 0, WnR = 0
> [ 9.736566] [000000000000001a] user address but active_mm is swapper
> [ 9.742987] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 9.748614] Modules linked in:
> [ 9.751694] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
> [ 9.762681] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [ 9.771915] Workqueue: events work_for_cpu_fn
> [ 9.776312] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 9.781150] pc : drm_mode_object_put+0x0/0x20
> [ 9.785547] lr : hibmc_fbdev_fini+0x40/0x58
> [ 9.789767] sp : ffff00000af1bcf0
> [ 9.793108] x29: ffff00000af1bcf0 x28: 0000000000000000
> [ 9.798473] x27: 0000000000000000 x26: ffff000008f66630
> [ 9.803838] x25: 0000000000000000 x24: ffff0000095abb98
> [ 9.809203] x23: ffff8017db92fe00 x22: ffff8017d2b13000
> [ 9.814568] x21: ffffffffffffffea x20: ffff8017d2f80018
> [ 9.819933] x19: ffff8017d28a0018 x18: ffffffffffffffff
> [ 9.825297] x17: 0000000000000000 x16: 0000000000000000
> [ 9.830662] x15: ffff0000092296c8 x14: ffff00008939970f
> [ 9.836026] x13: ffff00000939971d x12: ffff000009229940
> [ 9.841391] x11: ffff0000085f8fc0 x10: ffff00000af1b9a0
> [ 9.846756] x9 : 000000000000000d x8 : 6620657a696c6169
> [ 9.852121] x7 : ffff8017d3340580 x6 : ffff8017d4168000
> [ 9.857486] x5 : 0000000000000000 x4 : ffff8017db92fb20
> [ 9.862850] x3 : 0000000000002690 x2 : ffff8017d3340480
> [ 9.868214] x1 : 0000000000000028 x0 : 0000000000000002
> [ 9.873580] Process kworker/16:1 (pid: 293, stack limit =
> 0x(____ptrval____))
> [ 9.880788] Call trace:
> [ 9.883252] drm_mode_object_put+0x0/0x20
> [ 9.887297] hibmc_unload+0x1c/0x80
> [ 9.890815] hibmc_pci_probe+0x170/0x3c8
> [ 9.894773] local_pci_probe+0x3c/0xb0
> [ 9.898555] work_for_cpu_fn+0x18/0x28
> [ 9.902337] process_one_work+0x1e0/0x318
> [ 9.906382] worker_thread+0x228/0x450
> [ 9.910164] kthread+0x128/0x130
> [ 9.913418] ret_from_fork+0x10/0x18
> [ 9.917024] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
> [ 9.923180] ---[ end trace 2695ffa0af5be375 ]---
>
> On Thu, 20 Sep 2018 at 10:06, John Garry wrote:
> [ 9.196615] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
> (features 0x00000f0d)
> [ 9.206296] arm-smmu-v3 arm-smmu-v3.4.auto: no evtq irq - events will
> not be reported!
> [ 9.214302] arm-smmu-v3 arm-smmu-v3.4.auto: no gerr irq - errors will
> not be reported!
> [ 9.222673] pci 0007:90:00.0: can't derive routing for PCI INT A
> [ 9.228746] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
> [ 9.234073] [TTM] Zone kernel: Available graphics memory: 16297696 kiB
> [ 9.240763] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [ 9.247361] [TTM] Initializing pool allocator
> [ 9.251763] [TTM] Initializing DMA pool allocator
> [ 9.256565] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 9.263250] [drm] No driver support for vblank timestamp query.
> [ 9.274661] WARNING: CPU: 16 PID: 293 at
> drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
> [ 9.284244] Modules linked in:
> [ 9.287326] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
> [ 9.297435] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [ 9.306674] Workqueue: events work_for_cpu_fn
> [ 9.311072] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 9.315909] pc : drm_format_info.part.1+0x0/0x8
> [ 9.320482] lr : drm_get_format_info+0x90/0x98
> [ 9.324966] sp : ffff00000af1baf0
> [ 9.328307] x29: ffff00000af1baf0 x28: 0000000000000000
> [ 9.333673] x27: ffff00000af1bcb0 x26: ffff8017b4d78800
> [ 9.339037] x25: ffff8017b4d68018 x24: ffff8017b4d94018
> [ 9.344402] x23: ffff8017b4d78670 x22: ffff00000af1bbf0
> [ 9.349767] x21: ffff8017b4d78a70 x20: ffff00000af1bbf0
> [ 9.355131] x19: ffff00000af1bbf0 x18: ffffffffffffffff
> [ 9.360495] x17: 0000000000000000 x16: 0000000000000000
> [ 9.365860] x15: ffff0000092296c8 x14: ffff000009074000
> [ 9.371225] x13: 0000000000000000 x12: 0000000000000000
> [ 9.376589] x11: ffff8017fbffe008 x10: ffff8017db9307e8
> [ 9.381954] x9 : 0000000000000000 x8 : ffff8017b4d66800
> [ 9.387319] x7 : 0000000000000000 x6 : 000000000000003f
> [ 9.392683] x5 : 0000000000000040 x4 : 0000000000000000
> [ 9.398048] x3 : ffff000008d04000 x2 : 0000000056555941
> [ 9.403412] x1 : ffff000008d04f30 x0 : 0000000000000044
> [ 9.408777] Call trace:
> [ 9.411241] drm_format_info.part.1+0x0/0x8
> [ 9.415462] drm_helper_mode_fill_fb_struct+0x20/0x80
> [ 9.420564] hibmc_framebuffer_init+0x48/0xd0
> [ 9.424961] hibmc_drm_fb_create+0x1ec/0x3c8
> [ 9.429271] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
> [ 9.435513] drm_fb_helper_initial_config+0x3c/0x48
> [ 9.440438] hibmc_fbdev_init+0xb4/0x198
> [ 9.444395] hibmc_pci_probe+0x2f4/0x3c8
> [ 9.448356] local_pci_probe+0x3c/0xb0
> [ 9.452137] work_for_cpu_fn+0x18/0x28
> [ 9.455919] process_one_work+0x1e0/0x318
> [ 9.459964] worker_thread+0x228/0x450
> [ 9.463746] kthread+0x128/0x130
> [ 9.467002] ret_from_fork+0x10/0x18
> [ 9.470608] ---[ end trace b05497eb4d842ec0 ]---
> [ 9.475285] WARNING: CPU: 16 PID: 293 at
> drivers/gpu/drm/drm_framebuffer.c:730 drm_framebuffer_init+0x18/0x110
> [ 9.485394] Modules linked in:
> [ 9.488474] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
> [ 9.499989] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [ 9.509223] Workqueue: events work_for_cpu_fn
> [ 9.513621] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 9.518457] pc : drm_framebuffer_init+0x18/0x110
> [ 9.523118] lr : hibmc_framebuffer_init+0x60/0xd0
> [ 9.527865] sp : ffff00000af1baf0
> [ 9.531207] x29: ffff00000af1baf0 x28: 0000000000000000
> [ 9.536571] x27: ffff00000af1bcb0 x26: ffff8017b4d78800
> [ 9.541936] x25: ffff8017b4d68018 x24: ffff8017b4d94018
> [ 9.547301] x23: ffff8017b4d78670 x22: ffff00000af1bbf0
> [ 9.552666] x21: ffff8017b4d78a70 x20: ffff8017b4d48000
> [ 9.558030] x19: ffff8017b4d66700 x18: ffffffffffffffff
> [ 9.563395] x17: 0000000000000000 x16: 0000000000000000
> [ 9.568760] x15: ffff0000092296c8 x14: ffff000009074000
> [ 9.574124] x13: 0000000000000000 x12: 0000000000000000
> [ 9.579489] x11: ffff8017fbffe008 x10: ffff8017db9307e8
> [ 9.584854] x9 : 0000000000000000 x8 : ffff8017b4d66800
> [ 9.590218] x7 : 0000000000000000 x6 : 000000000000003f
> [ 9.595582] x5 : 0000000000000040 x4 : 0000000000000000
> [ 9.600946] x3 : ffff00000af1bc24 x2 : ffff000008d23f10
> [ 9.606311] x1 : ffff8017b4d66700 x0 : 0000000000000000
> [ 9.611675] Call trace:
> [ 9.614138] drm_framebuffer_init+0x18/0x110
> [ 9.618447] hibmc_framebuffer_init+0x60/0xd0
> [ 9.622845] hibmc_drm_fb_create+0x1ec/0x3c8
> [ 9.627154] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
> [ 9.633397] drm_fb_helper_initial_config+0x3c/0x48
> [ 9.638321] hibmc_fbdev_init+0xb4/0x198
> [ 9.642278] hibmc_pci_probe+0x2f4/0x3c8
> [ 9.646236] local_pci_probe+0x3c/0xb0
> [ 9.650018] work_for_cpu_fn+0x18/0x28
> [ 9.653800] process_one_work+0x1e0/0x318
> [ 9.657845] worker_thread+0x228/0x450
> [ 9.661627] kthread+0x128/0x130
> [ 9.664881] ret_from_fork+0x10/0x18
> [ 9.668486] ---[ end trace b05497eb4d842ec1 ]---
> [ 9.673153] [drm:hibmc_framebuffer_init] *ERROR* drm_framebuffer_init
> failed: -22
> [ 9.680720] [drm:hibmc_drm_fb_create] *ERROR* failed to initialize
> framebuffer: -22
> [ 9.688468] [drm:hibmc_fbdev_init] *ERROR* failed to setup initial
> conn config: -22
> [ 9.696212] [drm:hibmc_pci_probe] *ERROR* failed to initialize fbdev:
> -22
> [ 9.703075] Unable to handle kernel NULL pointer dereference at
> virtual address 000000000000001a
> [ 9.711957] Mem abort info:
> [ 9.714774] ESR = 0x96000004
> [ 9.717855] Exception class = DABT (current EL), IL = 32 bits
> [ 9.723835] SET = 0, FnV = 0
> [ 9.726916] EA = 0, S1PTW = 0
> [ 9.730084] Data abort info:
> [ 9.732986] ISV = 0, ISS = 0x00000004
> [ 9.736858] CM = 0, WnR = 0
> [ 9.739850] [000000000000001a] user address but active_mm is swapper
> [ 9.746271] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 9.751898] Modules linked in:
> [ 9.754978] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
> [ 9.766493] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [ 9.775727] Workqueue: events work_for_cpu_fn
> [ 9.780124] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 9.784962] pc : drm_mode_object_put+0x0/0x20
> [ 9.789359] lr : hibmc_fbdev_fini+0x40/0x58
> [ 9.793579] sp : ffff00000af1bcf0
> [ 9.796920] x29: ffff00000af1bcf0 x28: 0000000000000000
> [ 9.802285] x27: 0000000000000000 x26: ffff000008f66530
> [ 9.807649] x25: 0000000000000000 x24: ffff0000095abb98
> [ 9.813014] x23: ffff8017db92fe00 x22: ffff8017d2aeb000
> [ 9.818378] x21: ffffffffffffffea x20: ffff8017b4d94018
> [ 9.823742] x19: ffff8017b4d68018 x18: ffffffffffffffff
> [ 9.829106] x17: 0000000000000000 x16: 0000000000000000
> [ 9.834471] x15: ffff0000092296c8 x14: ffff00008939970f
> [ 9.839835] x13: ffff00000939971d x12: ffff000009229940
> [ 9.845200] x11: ffff0000085f8840 x10: ffff00000af1b9a0
> [ 9.850564] x9 : 000000000000000d x8 : 696c616974696e69
> [ 9.855929] x7 : ffff8017d2b96580 x6 : ffff8017d4168000
> [ 9.861294] x5 : 0000000000000000 x4 : ffff8017db92fb20
> [ 9.866659] x3 : 0000000000002650 x2 : ffff8017d2b96480
> [ 9.872023] x1 : 0000000000000028 x0 : 0000000000000002
> [ 9.877389] Process kworker/16:1 (pid: 293, stack limit =
> 0x(____ptrval____))
> [ 9.884598] Call trace:
> [ 9.887061] drm_mode_object_put+0x0/0x20
> [ 9.891107] hibmc_unload+0x1c/0x80
> [ 9.894625] hibmc_pci_probe+0x170/0x3c8
> [ 9.898583] local_pci_probe+0x3c/0xb0
> [ 9.902364] work_for_cpu_fn+0x18/0x28
> [ 9.906146] process_one_work+0x1e0/0x318
> [ 9.910192] worker_thread+0x228/0x450
> [ 9.913973] kthread+0x128/0x130
> [ 9.917227] ret_from_fork+0x10/0x18
> [ 9.920833] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
> [ 9.926989] ---[ end trace b05497eb4d842ec2 ]---
>
>
>
> .
>




2018-09-21 05:51:49

by xinliang

Subject: Re: Bug report: HiBMC crash

Hi John,
Thank you for reporting the bug.
I am currently on 4.18.7 and haven't seen this issue there.
I will try linux-next and figure out what is going wrong.

Thanks,
Xinliang


On 2018/9/20 19:23, John Garry wrote:
> On 20/09/2018 11:04, John Garry wrote:
>> Hi,
>>
>> I am seeing this crash below on linux-next (20 Sept).
>>
>> This is on an arm64 D05 board, which includes the HiBMC device. D06 was
>> also crashing for what looked like same reason. I am using standard
>> defconfig, except DRM and DRM_HISI_HIBMC are built-in.
>>
>> Is this a known issue? I tested v4.19-rc3 and it had no such crash.
>>
>> The origin seems to be here, where pointer info is not checked for NULL
>> for safety:
>> static int framebuffer_check(struct drm_device *dev,
>> const struct drm_mode_fb_cmd2 *r)
>> {
>> ...
>>
>> /* now let the driver pick its own format info */
>> info = drm_get_format_info(dev, r);
>>
>> ...
>>
>> for (i = 0; i < info->num_planes; i++) {
>> unsigned int width = fb_plane_width(r->width, info, i);
>> unsigned int height = fb_plane_height(r->height, info, i);
>> unsigned int cpp = info->cpp[i];
>>
>>
>
> Upon closer inspection the crash is actually from hibmc probe error
> handling path, specifically
> hibmc_fbdev_destroy()->drm_framebuffer_put() is called with fb holding
> the error value from hibmc_framebuffer_init(), as shown:
>
> static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
> struct drm_fb_helper_surface_size *sizes)
> {
>
> ...
>
> hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
> if (IS_ERR(hi_fbdev->fb)) {
> ret = PTR_ERR(hi_fbdev->fb);
>
> *** hi_fbdev->fb holds error code ***
>
> DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
> goto out_release_fbi;
> }
>
>
> static void hibmc_fbdev_destroy(struct hibmc_fbdev *fbdev)
> {
> struct hibmc_framebuffer *gfb = fbdev->fb;
> struct drm_fb_helper *fbh = &fbdev->helper;
>
> drm_fb_helper_unregister_fbi(fbh);
>
> drm_fb_helper_fini(fbh);
>
> ** &gfb->fb holds error code, not pointer ***
>
> if (gfb)
> drm_framebuffer_put(&gfb->fb);
> }
>
> This change fixes the crash for me:
>
> hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
> if (IS_ERR(hi_fbdev->fb)) {
> ret = PTR_ERR(hi_fbdev->fb);
> + hi_fbdev->fb = NULL;
> DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
> goto out_release_fbi;
> }
>
> Why we're hitting the error path at all, I don't know.
>
> And, having said all that, the code I pointed out in
> framebuffer_check() still does not seem safe for same reason I
> mentioned originally.
>
> John
>
>> John
>>
>> [ 9.220446] pci 0007:90:00.0: can't derive routing for PCI INT A
>> [ 9.226517] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
>> [ 9.231847] [TTM] Zone kernel: Available graphics memory:
>> 16297696 kiB
>> [ 9.238536] [TTM] Zone dma32: Available graphics memory: 2097152
>> kiB
>> [ 9.245133] [TTM] Initializing pool allocator
>> [ 9.249536] [TTM] Initializing DMA pool allocator
>> [ 9.254340] [drm] Supports vblank timestamp caching Rev 2
>> (21.10.2013).
>> [ 9.261026] [drm] No driver support for vblank timestamp query.
>> [ 9.272431] WARNING: CPU: 16 PID: 293 at
>> drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
>> [ 9.282014] Modules linked in:
>> [ 9.285095] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
>> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
>> [ 9.294677] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [ 9.303915] Workqueue: events work_for_cpu_fn
>> [ 9.308314] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 9.313150] pc : drm_format_info.part.1+0x0/0x8
>> [ 9.317724] lr : drm_get_format_info+0x90/0x98
>> [ 9.322208] sp : ffff00000af1baf0
>> [ 9.325549] x29: ffff00000af1baf0 x28: 0000000000000000
>> [ 9.330915] x27: ffff00000af1bcb0 x26: ffff8017d3018800
>> [ 9.336279] x25: ffff8017d28a0018 x24: ffff8017d2f80018
>> [ 9.341644] x23: ffff8017d3018670 x22: ffff00000af1bbf0
>> [ 9.347009] x21: ffff8017d3018a70 x20: ffff00000af1bbf0
>> [ 9.352373] x19: ffff00000af1bbf0 x18: ffffffffffffffff
>> [ 9.357737] x17: 0000000000000000 x16: 0000000000000000
>> [ 9.363102] x15: ffff0000092296c8 x14: ffff000009074000
>> [ 9.368466] x13: 0000000000000000 x12: 0000000000000000
>> [ 9.373831] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>> [ 9.379195] x9 : 0000000000000000 x8 : ffff8017b517c800
>> [ 9.384560] x7 : 0000000000000000 x6 : 000000000000003f
>> [ 9.389924] x5 : 0000000000000040 x4 : 0000000000000000
>> [ 9.395289] x3 : ffff000008d04000 x2 : 0000000056555941
>> [ 9.400654] x1 : ffff000008d04f70 x0 : 0000000000000044
>> [ 9.406019] Call trace:
>> [ 9.408483] drm_format_info.part.1+0x0/0x8
>> [ 9.412705] drm_helper_mode_fill_fb_struct+0x20/0x80
>> [ 9.417807] hibmc_framebuffer_init+0x48/0xd0
>> [ 9.422204] hibmc_drm_fb_create+0x1ec/0x3c8
>> [ 9.426513] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>> [ 9.432756] drm_fb_helper_initial_config+0x3c/0x48
>> [ 9.437681] hibmc_fbdev_init+0xb4/0x198
>> [ 9.441638] hibmc_pci_probe+0x2f4/0x3c8
>> [ 9.445598] local_pci_probe+0x3c/0xb0
>> [ 9.449379] work_for_cpu_fn+0x18/0x28
>> [ 9.453161] process_one_work+0x1e0/0x318
>> [ 9.457207] worker_thread+0x228/0x450
>> [ 9.460988] kthread+0x128/0x130
>> [ 9.464244] ret_from_fork+0x10/0x18
>> [ 9.467850] ---[ end trace 2695ffa0af5be373 ]---
>> [ 9.472525] WARNING: CPU: 16 PID: 293 at
>> drivers/gpu/drm/drm_framebuffer.c:730 drm_framebuffer_init+0x18/0x110
>> [ 9.482634] Modules linked in:
>> [ 9.485714] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
>> [ 9.496702] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [ 9.505936] Workqueue: events work_for_cpu_fn
>> [ 9.510333] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 9.515170] pc : drm_framebuffer_init+0x18/0x110
>> [ 9.519831] lr : hibmc_framebuffer_init+0x60/0xd0
>> [ 9.524578] sp : ffff00000af1baf0
>> [ 9.527920] x29: ffff00000af1baf0 x28: 0000000000000000
>> [ 9.533284] x27: ffff00000af1bcb0 x26: ffff8017d3018800
>> [ 9.538649] x25: ffff8017d28a0018 x24: ffff8017d2f80018
>> [ 9.544014] x23: ffff8017d3018670 x22: ffff00000af1bbf0
>> [ 9.549378] x21: ffff8017d3018a70 x20: ffff8017d2420000
>> [ 9.554743] x19: ffff8017b517c700 x18: ffffffffffffffff
>> [ 9.560108] x17: 0000000000000000 x16: 0000000000000000
>> [ 9.565472] x15: ffff0000092296c8 x14: ffff000009074000
>> [ 9.570837] x13: 0000000000000000 x12: 0000000000000000
>> [ 9.576201] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>> [ 9.581566] x9 : 0000000000000000 x8 : ffff8017b517c800
>> [ 9.586930] x7 : 0000000000000000 x6 : 000000000000003f
>> [ 9.592295] x5 : 0000000000000040 x4 : 0000000000000000
>> [ 9.597660] x3 : ffff00000af1bc24 x2 : ffff000008d23f50
>> [ 9.603024] x1 : ffff8017b517c700 x0 : 0000000000000000
>> [ 9.608389] Call trace:
>> [ 9.610852] drm_framebuffer_init+0x18/0x110
>> [ 9.615161] hibmc_framebuffer_init+0x60/0xd0
>> [ 9.619558] hibmc_drm_fb_create+0x1ec/0x3c8
>> [ 9.623867] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>> [ 9.630110] drm_fb_helper_initial_config+0x3c/0x48
>> [ 9.635034] hibmc_fbdev_init+0xb4/0x198
>> [ 9.638991] hibmc_pci_probe+0x2f4/0x3c8
>> [ 9.642949] local_pci_probe+0x3c/0xb0
>> [ 9.646731] work_for_cpu_fn+0x18/0x28
>> [ 9.650513] process_one_work+0x1e0/0x318
>> [ 9.654558] worker_thread+0x228/0x450
>> [ 9.658339] kthread+0x128/0x130
>> [ 9.661594] ret_from_fork+0x10/0x18
>> [ 9.665199] ---[ end trace 2695ffa0af5be374 ]---
>> [ 9.669868] [drm:hibmc_framebuffer_init] *ERROR* drm_framebuffer_init
>> failed: -22
>> [ 9.677434] [drm:hibmc_drm_fb_create] *ERROR* failed to initialize
>> framebuffer: -22
>> [ 9.685182] [drm:hibmc_fbdev_init] *ERROR* failed to setup initial
>> conn config: -22
>> [ 9.692926] [drm:hibmc_pci_probe] *ERROR* failed to initialize fbdev:
>> -22
>> [ 9.699791] Unable to handle kernel NULL pointer dereference at
>> virtual address 000000000000001a
>> [ 9.708672] Mem abort info:
>> [ 9.711489] ESR = 0x96000004
>> [ 9.714570] Exception class = DABT (current EL), IL = 32 bits
>> [ 9.720551] SET = 0, FnV = 0
>> [ 9.723631] EA = 0, S1PTW = 0
>> [ 9.726799] Data abort info:
>> [ 9.729702] ISV = 0, ISS = 0x00000004
>> [ 9.733573] CM = 0, WnR = 0
>> [ 9.736566] [000000000000001a] user address but active_mm is swapper
>> [ 9.742987] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [ 9.748614] Modules linked in:
>> [ 9.751694] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
>> [ 9.762681] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [ 9.771915] Workqueue: events work_for_cpu_fn
>> [ 9.776312] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 9.781150] pc : drm_mode_object_put+0x0/0x20
>> [ 9.785547] lr : hibmc_fbdev_fini+0x40/0x58
>> [ 9.789767] sp : ffff00000af1bcf0
>> [ 9.793108] x29: ffff00000af1bcf0 x28: 0000000000000000
>> [ 9.798473] x27: 0000000000000000 x26: ffff000008f66630
>> [ 9.803838] x25: 0000000000000000 x24: ffff0000095abb98
>> [ 9.809203] x23: ffff8017db92fe00 x22: ffff8017d2b13000
>> [ 9.814568] x21: ffffffffffffffea x20: ffff8017d2f80018
>> [ 9.819933] x19: ffff8017d28a0018 x18: ffffffffffffffff
>> [ 9.825297] x17: 0000000000000000 x16: 0000000000000000
>> [ 9.830662] x15: ffff0000092296c8 x14: ffff00008939970f
>> [ 9.836026] x13: ffff00000939971d x12: ffff000009229940
>> [ 9.841391] x11: ffff0000085f8fc0 x10: ffff00000af1b9a0
>> [ 9.846756] x9 : 000000000000000d x8 : 6620657a696c6169
>> [ 9.852121] x7 : ffff8017d3340580 x6 : ffff8017d4168000
>> [ 9.857486] x5 : 0000000000000000 x4 : ffff8017db92fb20
>> [ 9.862850] x3 : 0000000000002690 x2 : ffff8017d3340480
>> [ 9.868214] x1 : 0000000000000028 x0 : 0000000000000002
>> [ 9.873580] Process kworker/16:1 (pid: 293, stack limit =
>> 0x(____ptrval____))
>> [ 9.880788] Call trace:
>> [ 9.883252] drm_mode_object_put+0x0/0x20
>> [ 9.887297] hibmc_unload+0x1c/0x80
>> [ 9.890815] hibmc_pci_probe+0x170/0x3c8
>> [ 9.894773] local_pci_probe+0x3c/0xb0
>> [ 9.898555] work_for_cpu_fn+0x18/0x28
>> [ 9.902337] process_one_work+0x1e0/0x318
>> [ 9.906382] worker_thread+0x228/0x450
>> [ 9.910164] kthread+0x128/0x130
>> [ 9.913418] ret_from_fork+0x10/0x18
>> [ 9.917024] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
>> [ 9.923180] ---[ end trace 2695ffa0af5be375 ]---
>>
>> On Thu, 20 Sep 2018 at 10:06, John Garry wrote:
>> [ 9.196615] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
>> (features 0x00000f0d)
>> [ 9.206296] arm-smmu-v3 arm-smmu-v3.4.auto: no evtq irq - events will
>> not be reported!
>> [ 9.214302] arm-smmu-v3 arm-smmu-v3.4.auto: no gerr irq - errors will
>> not be reported!
>> [ 9.222673] pci 0007:90:00.0: can't derive routing for PCI INT A
>> [ 9.228746] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
>> [ 9.234073] [TTM] Zone kernel: Available graphics memory:
>> 16297696 kiB
>> [ 9.240763] [TTM] Zone dma32: Available graphics memory: 2097152
>> kiB
>> [ 9.247361] [TTM] Initializing pool allocator
>> [ 9.251763] [TTM] Initializing DMA pool allocator
>> [ 9.256565] [drm] Supports vblank timestamp caching Rev 2
>> (21.10.2013).
>> [ 9.263250] [drm] No driver support for vblank timestamp query.
>> [ 9.274661] WARNING: CPU: 16 PID: 293 at
>> drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
>> [ 9.284244] Modules linked in:
>> [ 9.287326] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
>> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
>> [ 9.297435] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [ 9.306674] Workqueue: events work_for_cpu_fn
>> [ 9.311072] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 9.315909] pc : drm_format_info.part.1+0x0/0x8
>> [ 9.320482] lr : drm_get_format_info+0x90/0x98
>> [ 9.324966] sp : ffff00000af1baf0
>> [ 9.328307] x29: ffff00000af1baf0 x28: 0000000000000000
>> [ 9.333673] x27: ffff00000af1bcb0 x26: ffff8017b4d78800
>> [ 9.339037] x25: ffff8017b4d68018 x24: ffff8017b4d94018
>> [ 9.344402] x23: ffff8017b4d78670 x22: ffff00000af1bbf0
>> [ 9.349767] x21: ffff8017b4d78a70 x20: ffff00000af1bbf0
>> [ 9.355131] x19: ffff00000af1bbf0 x18: ffffffffffffffff
>> [ 9.360495] x17: 0000000000000000 x16: 0000000000000000
>> [ 9.365860] x15: ffff0000092296c8 x14: ffff000009074000
>> [ 9.371225] x13: 0000000000000000 x12: 0000000000000000
>> [ 9.376589] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>> [ 9.381954] x9 : 0000000000000000 x8 : ffff8017b4d66800
>> [ 9.387319] x7 : 0000000000000000 x6 : 000000000000003f
>> [ 9.392683] x5 : 0000000000000040 x4 : 0000000000000000
>> [ 9.398048] x3 : ffff000008d04000 x2 : 0000000056555941
>> [ 9.403412] x1 : ffff000008d04f30 x0 : 0000000000000044
>> [ 9.408777] Call trace:
>> [ 9.411241] drm_format_info.part.1+0x0/0x8
>> [ 9.415462] drm_helper_mode_fill_fb_struct+0x20/0x80
>> [ 9.420564] hibmc_framebuffer_init+0x48/0xd0
>> [ 9.424961] hibmc_drm_fb_create+0x1ec/0x3c8
>> [ 9.429271] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>> [ 9.435513] drm_fb_helper_initial_config+0x3c/0x48
>> [ 9.440438] hibmc_fbdev_init+0xb4/0x198
>> [ 9.444395] hibmc_pci_probe+0x2f4/0x3c8
>> [ 9.448356] local_pci_probe+0x3c/0xb0
>> [ 9.452137] work_for_cpu_fn+0x18/0x28
>> [ 9.455919] process_one_work+0x1e0/0x318
>> [ 9.459964] worker_thread+0x228/0x450
>> [ 9.463746] kthread+0x128/0x130
>> [ 9.467002] ret_from_fork+0x10/0x18
>> [ 9.470608] ---[ end trace b05497eb4d842ec0 ]---
>> [ 9.475285] WARNING: CPU: 16 PID: 293 at
>> drivers/gpu/drm/drm_framebuffer.c:730 drm_framebuffer_init+0x18/0x110
>> [ 9.485394] Modules linked in:
>> [ 9.488474] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
>> [ 9.499989] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [ 9.509223] Workqueue: events work_for_cpu_fn
>> [ 9.513621] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 9.518457] pc : drm_framebuffer_init+0x18/0x110
>> [ 9.523118] lr : hibmc_framebuffer_init+0x60/0xd0
>> [ 9.527865] sp : ffff00000af1baf0
>> [ 9.531207] x29: ffff00000af1baf0 x28: 0000000000000000
>> [ 9.536571] x27: ffff00000af1bcb0 x26: ffff8017b4d78800
>> [ 9.541936] x25: ffff8017b4d68018 x24: ffff8017b4d94018
>> [ 9.547301] x23: ffff8017b4d78670 x22: ffff00000af1bbf0
>> [ 9.552666] x21: ffff8017b4d78a70 x20: ffff8017b4d48000
>> [ 9.558030] x19: ffff8017b4d66700 x18: ffffffffffffffff
>> [ 9.563395] x17: 0000000000000000 x16: 0000000000000000
>> [ 9.568760] x15: ffff0000092296c8 x14: ffff000009074000
>> [ 9.574124] x13: 0000000000000000 x12: 0000000000000000
>> [ 9.579489] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>> [ 9.584854] x9 : 0000000000000000 x8 : ffff8017b4d66800
>> [ 9.590218] x7 : 0000000000000000 x6 : 000000000000003f
>> [ 9.595582] x5 : 0000000000000040 x4 : 0000000000000000
>> [ 9.600946] x3 : ffff00000af1bc24 x2 : ffff000008d23f10
>> [ 9.606311] x1 : ffff8017b4d66700 x0 : 0000000000000000
>> [ 9.611675] Call trace:
>> [ 9.614138] drm_framebuffer_init+0x18/0x110
>> [ 9.618447] hibmc_framebuffer_init+0x60/0xd0
>> [ 9.622845] hibmc_drm_fb_create+0x1ec/0x3c8
>> [ 9.627154] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>> [ 9.633397] drm_fb_helper_initial_config+0x3c/0x48
>> [ 9.638321] hibmc_fbdev_init+0xb4/0x198
>> [ 9.642278] hibmc_pci_probe+0x2f4/0x3c8
>> [ 9.646236] local_pci_probe+0x3c/0xb0
>> [ 9.650018] work_for_cpu_fn+0x18/0x28
>> [ 9.653800] process_one_work+0x1e0/0x318
>> [ 9.657845] worker_thread+0x228/0x450
>> [ 9.661627] kthread+0x128/0x130
>> [ 9.664881] ret_from_fork+0x10/0x18
>> [ 9.668486] ---[ end trace b05497eb4d842ec1 ]---
>> [ 9.673153] [drm:hibmc_framebuffer_init] *ERROR* drm_framebuffer_init
>> failed: -22
>> [ 9.680720] [drm:hibmc_drm_fb_create] *ERROR* failed to initialize
>> framebuffer: -22
>> [ 9.688468] [drm:hibmc_fbdev_init] *ERROR* failed to setup initial
>> conn config: -22
>> [ 9.696212] [drm:hibmc_pci_probe] *ERROR* failed to initialize fbdev:
>> -22
>> [ 9.703075] Unable to handle kernel NULL pointer dereference at
>> virtual address 000000000000001a
>> [ 9.711957] Mem abort info:
>> [ 9.714774] ESR = 0x96000004
>> [ 9.717855] Exception class = DABT (current EL), IL = 32 bits
>> [ 9.723835] SET = 0, FnV = 0
>> [ 9.726916] EA = 0, S1PTW = 0
>> [ 9.730084] Data abort info:
>> [ 9.732986] ISV = 0, ISS = 0x00000004
>> [ 9.736858] CM = 0, WnR = 0
>> [ 9.739850] [000000000000001a] user address but active_mm is swapper
>> [ 9.746271] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [ 9.751898] Modules linked in:
>> [ 9.754978] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
>> [ 9.766493] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [ 9.775727] Workqueue: events work_for_cpu_fn
>> [ 9.780124] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 9.784962] pc : drm_mode_object_put+0x0/0x20
>> [ 9.789359] lr : hibmc_fbdev_fini+0x40/0x58
>> [ 9.793579] sp : ffff00000af1bcf0
>> [ 9.796920] x29: ffff00000af1bcf0 x28: 0000000000000000
>> [ 9.802285] x27: 0000000000000000 x26: ffff000008f66530
>> [ 9.807649] x25: 0000000000000000 x24: ffff0000095abb98
>> [ 9.813014] x23: ffff8017db92fe00 x22: ffff8017d2aeb000
>> [ 9.818378] x21: ffffffffffffffea x20: ffff8017b4d94018
>> [ 9.823742] x19: ffff8017b4d68018 x18: ffffffffffffffff
>> [ 9.829106] x17: 0000000000000000 x16: 0000000000000000
>> [ 9.834471] x15: ffff0000092296c8 x14: ffff00008939970f
>> [ 9.839835] x13: ffff00000939971d x12: ffff000009229940
>> [ 9.845200] x11: ffff0000085f8840 x10: ffff00000af1b9a0
>> [ 9.850564] x9 : 000000000000000d x8 : 696c616974696e69
>> [ 9.855929] x7 : ffff8017d2b96580 x6 : ffff8017d4168000
>> [ 9.861294] x5 : 0000000000000000 x4 : ffff8017db92fb20
>> [ 9.866659] x3 : 0000000000002650 x2 : ffff8017d2b96480
>> [ 9.872023] x1 : 0000000000000028 x0 : 0000000000000002
>> [ 9.877389] Process kworker/16:1 (pid: 293, stack limit =
>> 0x(____ptrval____))
>> [ 9.884598] Call trace:
>> [ 9.887061] drm_mode_object_put+0x0/0x20
>> [ 9.891107] hibmc_unload+0x1c/0x80
>> [ 9.894625] hibmc_pci_probe+0x170/0x3c8
>> [ 9.898583] local_pci_probe+0x3c/0xb0
>> [ 9.902364] work_for_cpu_fn+0x18/0x28
>> [ 9.906146] process_one_work+0x1e0/0x318
>> [ 9.910192] worker_thread+0x228/0x450
>> [ 9.913973] kthread+0x128/0x130
>> [ 9.917227] ret_from_fork+0x10/0x18
>> [ 9.920833] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
>> [ 9.926989] ---[ end trace b05497eb4d842ec2 ]---



2018-09-21 08:13:41

by John Garry

[permalink] [raw]
Subject: Re: Bug report: HiBMC crash

On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote:
> Hi John,
> Thank you for reporting bug.
> I am now using 4.18.7. I haven't found this issue yet.
> I will try linux-next and figure out what's wrong with it.
>
> Thanks,
> Xinliang
>
>

As mentioned in internal mail, the issue may be that the surface
depth/bpp we were using in the driver was previously invalid, but
code has since been added in v4.19 to reject this. Specifically it looks
like this patch:

commit 70109354fed232dfce8fb2c7cadf635acbe03e19
Author: Chris Wilson <[email protected]>
Date: Wed Sep 5 16:31:16 2018 +0100

drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl
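
To illustrate why a driver that used to get away with an unusual bpp/depth pair would now fail, here is a minimal userspace sketch of a legacy (bpp, depth) lookup that rejects unknown pairs. The names legacy_fb_format(), check_surface(), the FMT_* constants and the table contents are simplified stand-ins of mine, not the kernel's code; the real lookup lives in the DRM core and may differ. The (16, 32) pair in the usage function is only an example of an unknown combination, since the thread does not say which bpp the driver actually requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for fourcc codes; 0 plays the role of "invalid". */
enum fmt {
	FMT_INVALID = 0,
	FMT_C8, FMT_XRGB1555, FMT_RGB565, FMT_RGB888,
	FMT_XRGB8888, FMT_XRGB2101010, FMT_ARGB8888,
};

struct legacy_entry {
	unsigned int bpp;
	unsigned int depth;
	enum fmt fmt;
};

/* Assumed set of known legacy (bpp, depth) pairs. */
static const struct legacy_entry legacy_table[] = {
	{  8,  8, FMT_C8 },
	{ 16, 15, FMT_XRGB1555 },
	{ 16, 16, FMT_RGB565 },
	{ 24, 24, FMT_RGB888 },
	{ 32, 24, FMT_XRGB8888 },
	{ 32, 30, FMT_XRGB2101010 },
	{ 32, 32, FMT_ARGB8888 },
};

/* Return the format for a legacy pair, or FMT_INVALID if it is unknown. */
enum fmt legacy_fb_format(unsigned int bpp, unsigned int depth)
{
	size_t i;

	for (i = 0; i < sizeof(legacy_table) / sizeof(legacy_table[0]); i++) {
		if (legacy_table[i].bpp == bpp && legacy_table[i].depth == depth)
			return legacy_table[i].fmt;
	}
	return FMT_INVALID;
}

/* Reject the surface up front instead of carrying an invalid format around. */
int check_surface(unsigned int bpp, unsigned int depth)
{
	if (legacy_fb_format(bpp, depth) == FMT_INVALID)
		return -EINVAL;	/* shows up as -22 in a Linux log */
	return 0;
}

/* Usage: a known pair passes, a mismatched pair is rejected. */
void legacy_format_example(void)
{
	assert(check_surface(32, 32) == 0);
	assert(check_surface(16, 32) == -EINVAL);
}
```

With a lookup like this, forcing the depth to a value that does not match the bpp turns into an error return, which would be consistent with the -22 seen in the logs.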


Thanks,
John

> On 2018/9/20 19:23, John Garry wrote:
>> On 20/09/2018 11:04, John Garry wrote:
>>> Hi,
>>>
>>> I am seeing this crash below on linux-next (20 Sept).
>>>
>>> This is on an arm64 D05 board, which includes the HiBMC device. D06 was
>>> also crashing for what looked like same reason. I am using standard
>>> defconfig, except DRM and DRM_HISI_HIBMC are built-in.
>>>
>>> Is this a known issue? I tested v4.19-rc3 and it had no such crash.
>>>
>>> The origin seems to be here, where pointer info is not checked for NULL
>>> for safety:
>>> static int framebuffer_check(struct drm_device *dev,
>>> const struct drm_mode_fb_cmd2 *r)
>>> {
>>> ...
>>>
>>> /* now let the driver pick its own format info */
>>> info = drm_get_format_info(dev, r);
>>>
>>> ...
>>>
>>> for (i = 0; i < info->num_planes; i++) {
>>> unsigned int width = fb_plane_width(r->width, info, i);
>>> unsigned int height = fb_plane_height(r->height, info, i);
>>> unsigned int cpp = info->cpp[i];
>>>
>>>
>>
>> Upon closer inspection the crash is actually from hibmc probe error
>> handling path, specifically
>> hibmc_fbdev_destroy()->drm_framebuffer_put() is called with fb holding
>> the error value from hibmc_framebuffer_init(), as shown:
>>
>> static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
>> struct drm_fb_helper_surface_size *sizes)
>> {
>>
>> ...
>>
>> hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
>> if (IS_ERR(hi_fbdev->fb)) {
>> ret = PTR_ERR(hi_fbdev->fb);
>>
>> *** hi_fbdev->fb holds error code ***
>>
>> DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
>> goto out_release_fbi;
>> }
>>
>>
>> static void hibmc_fbdev_destroy(struct hibmc_fbdev *fbdev)
>> {
>> struct hibmc_framebuffer *gfb = fbdev->fb;
>> struct drm_fb_helper *fbh = &fbdev->helper;
>>
>> drm_fb_helper_unregister_fbi(fbh);
>>
>> drm_fb_helper_fini(fbh);
>>
>> ** &gfb->fb holds error code, not pointer ***
>>
>> if (gfb)
>> drm_framebuffer_put(&gfb->fb);
>> }
>>
>> This change fixes the crash for me:
>>
>> hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
>> if (IS_ERR(hi_fbdev->fb)) {
>> ret = PTR_ERR(hi_fbdev->fb);
>> + hi_fbdev->fb = NULL;
>> DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
>> goto out_release_fbi;
>> }
>>
>> Why we're hitting the error path at all, I don't know.
>>
>> And, having said all that, the code I pointed out in
>> framebuffer_check() still does not seem safe for same reason I
>> mentioned originally.
>>
>> John
>>
>>> John
>>>
>>> [ 9.220446] pci 0007:90:00.0: can't derive routing for PCI INT A
>>> [ 9.226517] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
>>> [ 9.231847] [TTM] Zone kernel: Available graphics memory:
>>> 16297696 kiB
>>> [ 9.238536] [TTM] Zone dma32: Available graphics memory: 2097152
>>> kiB
>>> [ 9.245133] [TTM] Initializing pool allocator
>>> [ 9.249536] [TTM] Initializing DMA pool allocator
>>> [ 9.254340] [drm] Supports vblank timestamp caching Rev 2
>>> (21.10.2013).
>>> [ 9.261026] [drm] No driver support for vblank timestamp query.
>>> [ 9.272431] WARNING: CPU: 16 PID: 293 at
>>> drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
>>> [ 9.282014] Modules linked in:
>>> [ 9.285095] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
>>> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
>>> [ 9.294677] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>>> [ 9.303915] Workqueue: events work_for_cpu_fn
>>> [ 9.308314] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 9.313150] pc : drm_format_info.part.1+0x0/0x8
>>> [ 9.317724] lr : drm_get_format_info+0x90/0x98
>>> [ 9.322208] sp : ffff00000af1baf0
>>> [ 9.325549] x29: ffff00000af1baf0 x28: 0000000000000000
>>> [ 9.330915] x27: ffff00000af1bcb0 x26: ffff8017d3018800
>>> [ 9.336279] x25: ffff8017d28a0018 x24: ffff8017d2f80018
>>> [ 9.341644] x23: ffff8017d3018670 x22: ffff00000af1bbf0
>>> [ 9.347009] x21: ffff8017d3018a70 x20: ffff00000af1bbf0
>>> [ 9.352373] x19: ffff00000af1bbf0 x18: ffffffffffffffff
>>> [ 9.357737] x17: 0000000000000000 x16: 0000000000000000
>>> [ 9.363102] x15: ffff0000092296c8 x14: ffff000009074000
>>> [ 9.368466] x13: 0000000000000000 x12: 0000000000000000
>>> [ 9.373831] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>>> [ 9.379195] x9 : 0000000000000000 x8 : ffff8017b517c800
>>> [ 9.384560] x7 : 0000000000000000 x6 : 000000000000003f
>>> [ 9.389924] x5 : 0000000000000040 x4 : 0000000000000000
>>> [ 9.395289] x3 : ffff000008d04000 x2 : 0000000056555941
>>> [ 9.400654] x1 : ffff000008d04f70 x0 : 0000000000000044
>>> [ 9.406019] Call trace:
>>> [ 9.408483] drm_format_info.part.1+0x0/0x8
>>> [ 9.412705] drm_helper_mode_fill_fb_struct+0x20/0x80
>>> [ 9.417807] hibmc_framebuffer_init+0x48/0xd0
>>> [ 9.422204] hibmc_drm_fb_create+0x1ec/0x3c8
>>> [ 9.426513] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>>> [ 9.432756] drm_fb_helper_initial_config+0x3c/0x48
>>> [ 9.437681] hibmc_fbdev_init+0xb4/0x198
>>> [ 9.441638] hibmc_pci_probe+0x2f4/0x3c8
>>> [ 9.445598] local_pci_probe+0x3c/0xb0
>>> [ 9.449379] work_for_cpu_fn+0x18/0x28
>>> [ 9.453161] process_one_work+0x1e0/0x318
>>> [ 9.457207] worker_thread+0x228/0x450
>>> [ 9.460988] kthread+0x128/0x130
>>> [ 9.464244] ret_from_fork+0x10/0x18
>>> [ 9.467850] ---[ end trace 2695ffa0af5be373 ]---
>>> [ 9.472525] WARNING: CPU: 16 PID: 293 at
>>> drivers/gpu/drm/drm_framebuffer.c:730 drm_framebuffer_init+0x18/0x110
>>> [ 9.482634] Modules linked in:
>>> [ 9.485714] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>>> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
>>> [ 9.496702] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>>> [ 9.505936] Workqueue: events work_for_cpu_fn
>>> [ 9.510333] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 9.515170] pc : drm_framebuffer_init+0x18/0x110
>>> [ 9.519831] lr : hibmc_framebuffer_init+0x60/0xd0
>>> [ 9.524578] sp : ffff00000af1baf0
>>> [ 9.527920] x29: ffff00000af1baf0 x28: 0000000000000000
>>> [ 9.533284] x27: ffff00000af1bcb0 x26: ffff8017d3018800
>>> [ 9.538649] x25: ffff8017d28a0018 x24: ffff8017d2f80018
>>> [ 9.544014] x23: ffff8017d3018670 x22: ffff00000af1bbf0
>>> [ 9.549378] x21: ffff8017d3018a70 x20: ffff8017d2420000
>>> [ 9.554743] x19: ffff8017b517c700 x18: ffffffffffffffff
>>> [ 9.560108] x17: 0000000000000000 x16: 0000000000000000
>>> [ 9.565472] x15: ffff0000092296c8 x14: ffff000009074000
>>> [ 9.570837] x13: 0000000000000000 x12: 0000000000000000
>>> [ 9.576201] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>>> [ 9.581566] x9 : 0000000000000000 x8 : ffff8017b517c800
>>> [ 9.586930] x7 : 0000000000000000 x6 : 000000000000003f
>>> [ 9.592295] x5 : 0000000000000040 x4 : 0000000000000000
>>> [ 9.597660] x3 : ffff00000af1bc24 x2 : ffff000008d23f50
>>> [ 9.603024] x1 : ffff8017b517c700 x0 : 0000000000000000
>>> [ 9.608389] Call trace:
>>> [ 9.610852] drm_framebuffer_init+0x18/0x110
>>> [ 9.615161] hibmc_framebuffer_init+0x60/0xd0
>>> [ 9.619558] hibmc_drm_fb_create+0x1ec/0x3c8
>>> [ 9.623867] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>>> [ 9.630110] drm_fb_helper_initial_config+0x3c/0x48
>>> [ 9.635034] hibmc_fbdev_init+0xb4/0x198
>>> [ 9.638991] hibmc_pci_probe+0x2f4/0x3c8
>>> [ 9.642949] local_pci_probe+0x3c/0xb0
>>> [ 9.646731] work_for_cpu_fn+0x18/0x28
>>> [ 9.650513] process_one_work+0x1e0/0x318
>>> [ 9.654558] worker_thread+0x228/0x450
>>> [ 9.658339] kthread+0x128/0x130
>>> [ 9.661594] ret_from_fork+0x10/0x18
>>> [ 9.665199] ---[ end trace 2695ffa0af5be374 ]---
>>> [ 9.669868] [drm:hibmc_framebuffer_init] *ERROR* drm_framebuffer_init
>>> failed: -22
>>> [ 9.677434] [drm:hibmc_drm_fb_create] *ERROR* failed to initialize
>>> framebuffer: -22
>>> [ 9.685182] [drm:hibmc_fbdev_init] *ERROR* failed to setup initial
>>> conn config: -22
>>> [ 9.692926] [drm:hibmc_pci_probe] *ERROR* failed to initialize fbdev:
>>> -22
>>> [ 9.699791] Unable to handle kernel NULL pointer dereference at
>>> virtual address 000000000000001a
>>> [ 9.708672] Mem abort info:
>>> [ 9.711489] ESR = 0x96000004
>>> [ 9.714570] Exception class = DABT (current EL), IL = 32 bits
>>> [ 9.720551] SET = 0, FnV = 0
>>> [ 9.723631] EA = 0, S1PTW = 0
>>> [ 9.726799] Data abort info:
>>> [ 9.729702] ISV = 0, ISS = 0x00000004
>>> [ 9.733573] CM = 0, WnR = 0
>>> [ 9.736566] [000000000000001a] user address but active_mm is swapper
>>> [ 9.742987] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>> [ 9.748614] Modules linked in:
>>> [ 9.751694] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>>> 4.19.0-rc4-next-20180920-00001-g9b0012c #322
>>> [ 9.762681] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>>> [ 9.771915] Workqueue: events work_for_cpu_fn
>>> [ 9.776312] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 9.781150] pc : drm_mode_object_put+0x0/0x20
>>> [ 9.785547] lr : hibmc_fbdev_fini+0x40/0x58
>>> [ 9.789767] sp : ffff00000af1bcf0
>>> [ 9.793108] x29: ffff00000af1bcf0 x28: 0000000000000000
>>> [ 9.798473] x27: 0000000000000000 x26: ffff000008f66630
>>> [ 9.803838] x25: 0000000000000000 x24: ffff0000095abb98
>>> [ 9.809203] x23: ffff8017db92fe00 x22: ffff8017d2b13000
>>> [ 9.814568] x21: ffffffffffffffea x20: ffff8017d2f80018
>>> [ 9.819933] x19: ffff8017d28a0018 x18: ffffffffffffffff
>>> [ 9.825297] x17: 0000000000000000 x16: 0000000000000000
>>> [ 9.830662] x15: ffff0000092296c8 x14: ffff00008939970f
>>> [ 9.836026] x13: ffff00000939971d x12: ffff000009229940
>>> [ 9.841391] x11: ffff0000085f8fc0 x10: ffff00000af1b9a0
>>> [ 9.846756] x9 : 000000000000000d x8 : 6620657a696c6169
>>> [ 9.852121] x7 : ffff8017d3340580 x6 : ffff8017d4168000
>>> [ 9.857486] x5 : 0000000000000000 x4 : ffff8017db92fb20
>>> [ 9.862850] x3 : 0000000000002690 x2 : ffff8017d3340480
>>> [ 9.868214] x1 : 0000000000000028 x0 : 0000000000000002
>>> [ 9.873580] Process kworker/16:1 (pid: 293, stack limit =
>>> 0x(____ptrval____))
>>> [ 9.880788] Call trace:
>>> [ 9.883252] drm_mode_object_put+0x0/0x20
>>> [ 9.887297] hibmc_unload+0x1c/0x80
>>> [ 9.890815] hibmc_pci_probe+0x170/0x3c8
>>> [ 9.894773] local_pci_probe+0x3c/0xb0
>>> [ 9.898555] work_for_cpu_fn+0x18/0x28
>>> [ 9.902337] process_one_work+0x1e0/0x318
>>> [ 9.906382] worker_thread+0x228/0x450
>>> [ 9.910164] kthread+0x128/0x130
>>> [ 9.913418] ret_from_fork+0x10/0x18
>>> [ 9.917024] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
>>> [ 9.923180] ---[ end trace 2695ffa0af5be375 ]---
>>>
>>> On Thu, 20 Sep 2018 at 10:06, John Garry <[email protected]>
>>> wrote:
>>> [ 9.196615] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
>>> (features 0x00000f0d)
>>> [ 9.206296] arm-smmu-v3 arm-smmu-v3.4.auto: no evtq irq - events will
>>> not be reported!
>>> [ 9.214302] arm-smmu-v3 arm-smmu-v3.4.auto: no gerr irq - errors will
>>> not be reported!
>>> [ 9.222673] pci 0007:90:00.0: can't derive routing for PCI INT A
>>> [ 9.228746] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
>>> [ 9.234073] [TTM] Zone kernel: Available graphics memory:
>>> 16297696 kiB
>>> [ 9.240763] [TTM] Zone dma32: Available graphics memory: 2097152
>>> kiB
>>> [ 9.247361] [TTM] Initializing pool allocator
>>> [ 9.251763] [TTM] Initializing DMA pool allocator
>>> [ 9.256565] [drm] Supports vblank timestamp caching Rev 2
>>> (21.10.2013).
>>> [ 9.263250] [drm] No driver support for vblank timestamp query.
>>> [ 9.274661] WARNING: CPU: 16 PID: 293 at
>>> drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
>>> [ 9.284244] Modules linked in:
>>> [ 9.287326] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
>>> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
>>> [ 9.297435] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>>> [ 9.306674] Workqueue: events work_for_cpu_fn
>>> [ 9.311072] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 9.315909] pc : drm_format_info.part.1+0x0/0x8
>>> [ 9.320482] lr : drm_get_format_info+0x90/0x98
>>> [ 9.324966] sp : ffff00000af1baf0
>>> [ 9.328307] x29: ffff00000af1baf0 x28: 0000000000000000
>>> [ 9.333673] x27: ffff00000af1bcb0 x26: ffff8017b4d78800
>>> [ 9.339037] x25: ffff8017b4d68018 x24: ffff8017b4d94018
>>> [ 9.344402] x23: ffff8017b4d78670 x22: ffff00000af1bbf0
>>> [ 9.349767] x21: ffff8017b4d78a70 x20: ffff00000af1bbf0
>>> [ 9.355131] x19: ffff00000af1bbf0 x18: ffffffffffffffff
>>> [ 9.360495] x17: 0000000000000000 x16: 0000000000000000
>>> [ 9.365860] x15: ffff0000092296c8 x14: ffff000009074000
>>> [ 9.371225] x13: 0000000000000000 x12: 0000000000000000
>>> [ 9.376589] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>>> [ 9.381954] x9 : 0000000000000000 x8 : ffff8017b4d66800
>>> [ 9.387319] x7 : 0000000000000000 x6 : 000000000000003f
>>> [ 9.392683] x5 : 0000000000000040 x4 : 0000000000000000
>>> [ 9.398048] x3 : ffff000008d04000 x2 : 0000000056555941
>>> [ 9.403412] x1 : ffff000008d04f30 x0 : 0000000000000044
>>> [ 9.408777] Call trace:
>>> [ 9.411241] drm_format_info.part.1+0x0/0x8
>>> [ 9.415462] drm_helper_mode_fill_fb_struct+0x20/0x80
>>> [ 9.420564] hibmc_framebuffer_init+0x48/0xd0
>>> [ 9.424961] hibmc_drm_fb_create+0x1ec/0x3c8
>>> [ 9.429271] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>>> [ 9.435513] drm_fb_helper_initial_config+0x3c/0x48
>>> [ 9.440438] hibmc_fbdev_init+0xb4/0x198
>>> [ 9.444395] hibmc_pci_probe+0x2f4/0x3c8
>>> [ 9.448356] local_pci_probe+0x3c/0xb0
>>> [ 9.452137] work_for_cpu_fn+0x18/0x28
>>> [ 9.455919] process_one_work+0x1e0/0x318
>>> [ 9.459964] worker_thread+0x228/0x450
>>> [ 9.463746] kthread+0x128/0x130
>>> [ 9.467002] ret_from_fork+0x10/0x18
>>> [ 9.470608] ---[ end trace b05497eb4d842ec0 ]---
>>> [ 9.475285] WARNING: CPU: 16 PID: 293 at
>>> drivers/gpu/drm/drm_framebuffer.c:730 drm_framebuffer_init+0x18/0x110
>>> [ 9.485394] Modules linked in:
>>> [ 9.488474] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>>> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
>>> [ 9.499989] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>>> [ 9.509223] Workqueue: events work_for_cpu_fn
>>> [ 9.513621] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 9.518457] pc : drm_framebuffer_init+0x18/0x110
>>> [ 9.523118] lr : hibmc_framebuffer_init+0x60/0xd0
>>> [ 9.527865] sp : ffff00000af1baf0
>>> [ 9.531207] x29: ffff00000af1baf0 x28: 0000000000000000
>>> [ 9.536571] x27: ffff00000af1bcb0 x26: ffff8017b4d78800
>>> [ 9.541936] x25: ffff8017b4d68018 x24: ffff8017b4d94018
>>> [ 9.547301] x23: ffff8017b4d78670 x22: ffff00000af1bbf0
>>> [ 9.552666] x21: ffff8017b4d78a70 x20: ffff8017b4d48000
>>> [ 9.558030] x19: ffff8017b4d66700 x18: ffffffffffffffff
>>> [ 9.563395] x17: 0000000000000000 x16: 0000000000000000
>>> [ 9.568760] x15: ffff0000092296c8 x14: ffff000009074000
>>> [ 9.574124] x13: 0000000000000000 x12: 0000000000000000
>>> [ 9.579489] x11: ffff8017fbffe008 x10: ffff8017db9307e8
>>> [ 9.584854] x9 : 0000000000000000 x8 : ffff8017b4d66800
>>> [ 9.590218] x7 : 0000000000000000 x6 : 000000000000003f
>>> [ 9.595582] x5 : 0000000000000040 x4 : 0000000000000000
>>> [ 9.600946] x3 : ffff00000af1bc24 x2 : ffff000008d23f10
>>> [ 9.606311] x1 : ffff8017b4d66700 x0 : 0000000000000000
>>> [ 9.611675] Call trace:
>>> [ 9.614138] drm_framebuffer_init+0x18/0x110
>>> [ 9.618447] hibmc_framebuffer_init+0x60/0xd0
>>> [ 9.622845] hibmc_drm_fb_create+0x1ec/0x3c8
>>> [ 9.627154] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
>>> [ 9.633397] drm_fb_helper_initial_config+0x3c/0x48
>>> [ 9.638321] hibmc_fbdev_init+0xb4/0x198
>>> [ 9.642278] hibmc_pci_probe+0x2f4/0x3c8
>>> [ 9.646236] local_pci_probe+0x3c/0xb0
>>> [ 9.650018] work_for_cpu_fn+0x18/0x28
>>> [ 9.653800] process_one_work+0x1e0/0x318
>>> [ 9.657845] worker_thread+0x228/0x450
>>> [ 9.661627] kthread+0x128/0x130
>>> [ 9.664881] ret_from_fork+0x10/0x18
>>> [ 9.668486] ---[ end trace b05497eb4d842ec1 ]---
>>> [ 9.673153] [drm:hibmc_framebuffer_init] *ERROR* drm_framebuffer_init
>>> failed: -22
>>> [ 9.680720] [drm:hibmc_drm_fb_create] *ERROR* failed to initialize
>>> framebuffer: -22
>>> [ 9.688468] [drm:hibmc_fbdev_init] *ERROR* failed to setup initial
>>> conn config: -22
>>> [ 9.696212] [drm:hibmc_pci_probe] *ERROR* failed to initialize fbdev:
>>> -22
>>> [ 9.703075] Unable to handle kernel NULL pointer dereference at
>>> virtual address 000000000000001a
>>> [ 9.711957] Mem abort info:
>>> [ 9.714774] ESR = 0x96000004
>>> [ 9.717855] Exception class = DABT (current EL), IL = 32 bits
>>> [ 9.723835] SET = 0, FnV = 0
>>> [ 9.726916] EA = 0, S1PTW = 0
>>> [ 9.730084] Data abort info:
>>> [ 9.732986] ISV = 0, ISS = 0x00000004
>>> [ 9.736858] CM = 0, WnR = 0
>>> [ 9.739850] [000000000000001a] user address but active_mm is swapper
>>> [ 9.746271] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>> [ 9.751898] Modules linked in:
>>> [ 9.754978] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G W
>>> 4.19.0-rc4-next-20180919-00001-gcb2f9f4-dirty #321
>>> [ 9.766493] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>>> [ 9.775727] Workqueue: events work_for_cpu_fn
>>> [ 9.780124] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 9.784962] pc : drm_mode_object_put+0x0/0x20
>>> [ 9.789359] lr : hibmc_fbdev_fini+0x40/0x58
>>> [ 9.793579] sp : ffff00000af1bcf0
>>> [ 9.796920] x29: ffff00000af1bcf0 x28: 0000000000000000
>>> [ 9.802285] x27: 0000000000000000 x26: ffff000008f66530
>>> [ 9.807649] x25: 0000000000000000 x24: ffff0000095abb98
>>> [ 9.813014] x23: ffff8017db92fe00 x22: ffff8017d2aeb000
>>> [ 9.818378] x21: ffffffffffffffea x20: ffff8017b4d94018
>>> [ 9.823742] x19: ffff8017b4d68018 x18: ffffffffffffffff
>>> [ 9.829106] x17: 0000000000000000 x16: 0000000000000000
>>> [ 9.834471] x15: ffff0000092296c8 x14: ffff00008939970f
>>> [ 9.839835] x13: ffff00000939971d x12: ffff000009229940
>>> [ 9.845200] x11: ffff0000085f8840 x10: ffff00000af1b9a0
>>> [ 9.850564] x9 : 000000000000000d x8 : 696c616974696e69
>>> [ 9.855929] x7 : ffff8017d2b96580 x6 : ffff8017d4168000
>>> [ 9.861294] x5 : 0000000000000000 x4 : ffff8017db92fb20
>>> [ 9.866659] x3 : 0000000000002650 x2 : ffff8017d2b96480
>>> [ 9.872023] x1 : 0000000000000028 x0 : 0000000000000002
>>> [ 9.877389] Process kworker/16:1 (pid: 293, stack limit =
>>> 0x(____ptrval____))
>>> [ 9.884598] Call trace:
>>> [ 9.887061] drm_mode_object_put+0x0/0x20
>>> [ 9.891107] hibmc_unload+0x1c/0x80
>>> [ 9.894625] hibmc_pci_probe+0x170/0x3c8
>>> [ 9.898583] local_pci_probe+0x3c/0xb0
>>> [ 9.902364] work_for_cpu_fn+0x18/0x28
>>> [ 9.906146] process_one_work+0x1e0/0x318
>>> [ 9.910192] worker_thread+0x228/0x450
>>> [ 9.913973] kthread+0x128/0x130
>>> [ 9.917227] ret_from_fork+0x10/0x18
>>> [ 9.920833] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
>>> [ 9.926989] ---[ end trace b05497eb4d842ec2 ]---



2018-09-21 14:29:50

by Chris Wilson

[permalink] [raw]
Subject: Re: Bug report: HiBMC crash

Quoting John Garry (2018-09-21 09:11:19)
> On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote:
> > Hi John,
> > Thank you for reporting bug.
> > I am now using 4.18.7. I haven't found this issue yet.
> > I will try linux-next and figure out what's wrong with it.
> >
> > Thanks,
> > Xinliang
> >
> >
>
> As mentioned in internal mail, the issue may be that the surface
> depth/bpp we were using in the driver was previously invalid, but
> code has since been added in v4.19 to reject this. Specifically it looks
> like this patch:
>
> commit 70109354fed232dfce8fb2c7cadf635acbe03e19
> Author: Chris Wilson <[email protected]>
> Date: Wed Sep 5 16:31:16 2018 +0100
>
> drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl


diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
index b92595c477ef..f3e7f41e6781 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
@@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n",
sizes->surface_width, sizes->surface_height,
sizes->surface_bpp);
- sizes->surface_depth = 32;

bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8);

@@ -192,7 +191,6 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
return -ENOMEM;
}

- priv->fbdev = hifbdev;
drm_fb_helper_prepare(priv->dev, &hifbdev->helper,
&hibmc_fbdev_helper_funcs);

@@ -246,6 +244,7 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
fix->ypanstep, fix->ywrapstep, fix->line_length,
fix->accel, fix->capabilities);

+ priv->fbdev = hifbdev;
return 0;

fini:

Apply chunks 2&3 first to confirm they fix the GPF.
-Chris
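
The idea behind chunks 2 and 3 can be sketched as "publish the pointer only once setup has fully succeeded", so that a later teardown sees NULL after a failed init. The following is a standalone userspace model, not the driver code: struct fbdev, struct priv, setup_step() and the fail flag are hypothetical stand-ins, and the real driver allocates and cleans up differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fbdev { int ready; };		/* stand-in for the driver's fbdev object */
struct priv { struct fbdev *fbdev; };	/* stand-in for the driver's private data */

/* Hypothetical setup step; fails when 'fail' is non-zero. */
static int setup_step(struct fbdev *f, int fail)
{
	if (fail)
		return -EINVAL;
	f->ready = 1;
	return 0;
}

/* Store the object in priv only after every step has succeeded. */
int fbdev_init(struct priv *p, int fail)
{
	struct fbdev *f;
	int ret;

	assert(p != NULL);
	f = calloc(1, sizeof(*f));
	if (!f)
		return -ENOMEM;

	ret = setup_step(f, fail);
	if (ret) {
		free(f);	/* local cleanup; p->fbdev is left untouched */
		return ret;
	}

	p->fbdev = f;		/* publish last */
	return 0;
}

/* Teardown can then treat NULL as "nothing to undo". */
void fbdev_fini(struct priv *p)
{
	if (!p->fbdev)
		return;
	free(p->fbdev);
	p->fbdev = NULL;
}
```

In this model a failed fbdev_init() followed by fbdev_fini() is harmless, because the teardown never sees a half-initialised object.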

2018-09-21 16:24:31

by John Garry

[permalink] [raw]
Subject: Re: Bug report: HiBMC crash

On 21/09/2018 15:28, Chris Wilson wrote:
> Quoting John Garry (2018-09-21 09:11:19)
>> On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote:
>>> Hi John,
>>> Thank you for reporting bug.
>>> I am now using 4.18.7. I haven't found this issue yet.
>>> I will try linux-next and figure out what's wrong with it.
>>>
>>> Thanks,
>>> Xinliang
>>>
>>>
>>
>> As mentioned in internal mail, the issue may be that the surface
>> depth/bpp we were using in the driver was previously invalid, but
>> code has since been added in v4.19 to reject this. Specifically it looks
>> like this patch:
>>
>> commit 70109354fed232dfce8fb2c7cadf635acbe03e19
>> Author: Chris Wilson <[email protected]>
>> Date: Wed Sep 5 16:31:16 2018 +0100
>>
>> drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl
>
>
> diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
> index b92595c477ef..f3e7f41e6781 100644
> --- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
> +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
> @@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
> DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n",
> sizes->surface_width, sizes->surface_height,
> sizes->surface_bpp);
> - sizes->surface_depth = 32;
>
> bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8);
>
> @@ -192,7 +191,6 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
> return -ENOMEM;
> }
>
> - priv->fbdev = hifbdev;
> drm_fb_helper_prepare(priv->dev, &hifbdev->helper,
> &hibmc_fbdev_helper_funcs);
>
> @@ -246,6 +244,7 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
> fix->ypanstep, fix->ywrapstep, fix->line_length,
> fix->accel, fix->capabilities);
>
> + priv->fbdev = hifbdev;
> return 0;
>
> fini:
>
> Apply chunks 2&3 first to confirm they fix the GPF.
> -Chris

Hi Chris,

So relocating where priv->fbdev is set does fix the crash.
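
For reference, the fault address in the original oops can be reproduced with some pointer arithmetic on the register dump: x21 holds ffffffffffffffea (-22 as a pointer), x0 holds 2, and the reported address is 0x1a. The sketch below is a userspace model of that arithmetic. The two offsets of 24 are inferred from the dump (x0 minus the error pointer, and my reading of the faulting word f9400c01 as a load from [x0, #24]), not taken from the kernel headers, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets inferred from the register dump, not from the structure layout. */
#define EMBED_OFFSET 24u	/* x0 (0x2) minus the error pointer (-22) */
#define LOAD_OFFSET  24u	/* assumed immediate of the faulting load */

/* Model of an error pointer: a negative errno stored in a pointer-sized value. */
uint64_t err_ptr(long err)
{
	return (uint64_t)(int64_t)err;
}

/* A plain "if (ptr)" check does not catch an error pointer. */
int passes_null_check(uint64_t p)
{
	return p != 0;
}

/* Address touched when reading a member at member_off of an object embedded at embed_off. */
uint64_t fault_address(uint64_t p, uint64_t embed_off, uint64_t member_off)
{
	return p + embed_off + member_off;	/* unsigned arithmetic wraps like the CPU */
}

/* The asserts restate values from the log. */
void fault_address_example(void)
{
	uint64_t gfb = err_ptr(-22);

	assert(gfb == 0xffffffffffffffeaULL);				/* x21 */
	assert(passes_null_check(gfb));					/* "if (gfb)" is true */
	assert(fault_address(gfb, EMBED_OFFSET, 0) == 0x2);		/* x0 */
	assert(fault_address(gfb, EMBED_OFFSET, LOAD_OFFSET) == 0x1a);	/* fault address */
}
```

So the "NULL pointer dereference" at 0x1a is really a dereference of the -22 error value after it wrapped around through two small offsets.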

However, then applying chunk #1 introduces another crash:

[ 9.229007] pci 0007:90:00.0: can't derive routing for PCI INT A
[ 9.235082] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[ 9.240457] [TTM] Zone kernel: Available graphics memory: 16297792 kiB
[ 9.247147] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 9.253744] [TTM] Initializing pool allocator
[ 9.258148] [TTM] Initializing DMA pool allocator
[ 9.262951] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 9.269636] [drm] No driver support for vblank timestamp query.
[ 9.280967] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000150
[ 9.289849] Mem abort info:
[ 9.292666] ESR = 0x96000044
[ 9.295747] Exception class = DABT (current EL), IL = 32 bits
[ 9.301728] SET = 0, FnV = 0
[ 9.304809] EA = 0, S1PTW = 0
[ 9.307977] Data abort info:
[ 9.310882] ISV = 0, ISS = 0x00000044
[ 9.314754] CM = 0, WnR = 1
[ 9.317744] [0000000000000150] user address but active_mm is swapper
[ 9.324166] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 9.329793] Modules linked in:
[ 9.332874] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
4.19.0-rc4-next-20180920-00001-g9b0012c-dirty #345
[ 9.342983] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
D05 IT21 Nemo 2.0 RC0 04/18/2018
[ 9.352223] Workqueue: events work_for_cpu_fn
[ 9.356621] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 9.361461] pc : hibmc_drm_fb_create+0x20c/0x3c0
[ 9.366122] lr : hibmc_drm_fb_create+0x1e4/0x3c0
[ 9.370781] sp : ffff00000aeebb50
[ 9.374123] x29: ffff00000aeebb50 x28: 0000000000000000
[ 9.379489] x27: ffff00000aeebca0 x26: ffff8017b3830800
[ 9.384854] x25: ffff8017b3828018 x24: ffff8017b3850018
[ 9.390219] x23: ffff8017b3830670 x22: ffff8017b3830800
[ 9.395583] x21: 00000000000eb000 x20: ffff8017b3830a70
[ 9.400948] x19: ffff0000091f9000 x18: ffffffffffffffff
[ 9.406313] x17: 0000000000000000 x16: ffff8017d4168000
[ 9.411678] x15: ffff0000091f96c8 x14: ffff000009049000
[ 9.417042] x13: 0000000000000000 x12: 0000000000000000
[ 9.422407] x11: ffff8017daf39940 x10: 0000000000000040
[ 9.427772] x9 : ffff8017b53e02b0 x8 : ffff8017daf39918
[ 9.433136] x7 : ffff8017daf39a60 x6 : ffff8017b3840800
[ 9.438500] x5 : 0000000000000000 x4 : 0000000000000000
[ 9.443865] x3 : ffff8017b53e0290 x2 : ffff000009306000
[ 9.449229] x1 : ffff000008fe1d70 x0 : 0000000000000000
[ 9.454594] Process kworker/16:1 (pid: 293, stack limit =
0x(____ptrval____))
[ 9.461803] Call trace:
[ 9.464267] hibmc_drm_fb_create+0x20c/0x3c0
[ 9.468578] __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
[ 9.474820] drm_fb_helper_initial_config+0x3c/0x48
[ 9.479744] hibmc_fbdev_init+0xb8/0x1b0
[ 9.483701] hibmc_pci_probe+0x2f4/0x3c8
[ 9.487660] local_pci_probe+0x3c/0xb0
[ 9.491442] work_for_cpu_fn+0x18/0x28
[ 9.495225] process_one_work+0x1e0/0x318
[ 9.499270] worker_thread+0x228/0x450
[ 9.503052] kthread+0x128/0x130
[ 9.506308] ret_from_fork+0x10/0x18
[ 9.509914] Code: 12144eb5 b0004841 9135c021 d0006162 (b9015015)
[ 9.516071] ---[ end trace ce5de8f0d3370702 ]---
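
As a cross-check of this oops, the bracketed word in the Code: line can be decoded by hand. The sketch below is a small userspace decoder, assuming the standard A64 encoding for LDR/STR (immediate, unsigned offset) on general-purpose registers; struct ldst and decode_ldst_uimm() are names I made up. For b9015015 it gives a 32-bit store of w21 to [x0, #0x150], which matches x0 = 0, WnR = 1 and the fault address 0x150 in the dump. Which structure member sits at that offset cannot be told from the log alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ldst {
	int is_load;		/* 1 for LDR, 0 for STR */
	unsigned int size_bytes;	/* access size in bytes */
	unsigned int rt;		/* data register number */
	unsigned int rn;		/* base register number */
	unsigned int offset;	/* byte offset added to the base */
};

/*
 * Decode an A64 LDR/STR (immediate, unsigned offset) instruction.
 * Returns 0 on success, -1 if the word is not in that class.
 */
int decode_ldst_uimm(uint32_t insn, struct ldst *out)
{
	unsigned int size, opc;

	assert(out != NULL);

	/* Bits 29:24 must be 111001 for this class (integer registers). */
	if ((insn & 0x3f000000u) != 0x39000000u)
		return -1;

	size = insn >> 30;		/* log2 of the access size */
	opc = (insn >> 22) & 0x3u;	/* 0 = STR, 1 = LDR */
	if (opc > 1)
		return -1;		/* sign-extending loads and prefetch not handled */

	out->is_load = (opc == 1);
	out->size_bytes = 1u << size;
	out->offset = ((insn >> 10) & 0xfffu) << size;	/* imm12 scaled by size */
	out->rn = (insn >> 5) & 0x1fu;
	out->rt = insn & 0x1fu;
	return 0;
}
```

A store through a base register that holds 0 suggests some pointer obtained earlier in hibmc_drm_fb_create() was NULL when one of its members was written.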



I had already added the following locally to fix the error path, keeping a
chunk identical to #1 and using this instead of chunks #2 and #3:

index b92595c..8bd2907 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
@@ -122,6 +122,7 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
 	hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
 	if (IS_ERR(hi_fbdev->fb)) {
 		ret = PTR_ERR(hi_fbdev->fb);
+		hi_fbdev->fb = NULL;
 		DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
 		goto out_release_fbi;
 	}
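
To illustrate why the extra assignment matters: an ERR_PTR-style value is non-NULL, so a plain NULL check in the teardown path lets it through. The sketch below is a userspace model only. ERR_PTR/IS_ERR/PTR_ERR here are simplified stand-ins for the kernel helpers, and struct fake_fbdev, fake_framebuffer_init(), fake_fb_create() and destroy_would_put_fb() are hypothetical names, not hibmc code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, cut-down model of the fbdev state (illustration only). */
struct fake_fbdev {
	void *fb;
};

/* Hypothetical init that always fails, mimicking the failing probe. */
static void *fake_framebuffer_init(void)
{
	return ERR_PTR(-EINVAL);
}

/* Models the create path; clear_on_error selects the fixed behaviour. */
static int fake_fb_create(struct fake_fbdev *fbdev, bool clear_on_error)
{
	fbdev->fb = fake_framebuffer_init();
	if (IS_ERR(fbdev->fb)) {
		int ret = (int)PTR_ERR(fbdev->fb);

		if (clear_on_error)
			fbdev->fb = NULL; /* the one-line fix */
		return ret;
	}
	return 0;
}

/* Mirrors the "if (gfb)" test: true means teardown would touch fb. */
static bool destroy_would_put_fb(const struct fake_fbdev *fbdev)
{
	return fbdev->fb != NULL;
}
```

Calling fake_fb_create(&fbdev, false) leaves destroy_would_put_fb() returning true, which corresponds to the crash path. With clear_on_error set to true it returns false, which is the effect of the one-line change above.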

With this change, VGA seems to work OK:
[ 9.233035] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[ 9.238361] [TTM] Zone kernel: Available graphics memory: 16297762 kiB
[ 9.245051] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 9.251650] [TTM] Initializing pool allocator
[ 9.256052] [TTM] Initializing DMA pool allocator
[ 9.260856] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 9.267541] [drm] No driver support for vblank timestamp query.
[ 9.306234] Console: switching to colour frame buffer device 100x37
[ 9.329622] hibmc-drm 0007:91:00.0: fb0: hibmcdrmfb frame buffer device
[ 9.336530] [drm] Initialized hibmc 1.0.0 20160828 for 0007:91:00.0 on minor 0
[ 9.356393] loop: module loaded

I can send a patchset, but it would be good for a hibmc maintainer to
comment as well.

Thanks,
John
