Date: Fri, 26 Jun 2015 11:27:40 +0200
Subject: Re: drm/mgag200: doesn't work in panic context
From: Daniel Vetter
To: Rui Wang
Cc: Dave Airlie, "Clark, Rob", Matthew D Roper, "Luck, Tony", gong.chen@intel.com, Borislav Petkov, Linux Kernel Mailing List
In-Reply-To: <1435305314-14337-1-git-send-email-rui.y.wang@intel.com>

On Fri, Jun 26, 2015 at 9:55 AM, Rui Wang wrote:
> Hi all,
>
> I'm here to report two panics which hang forever (the machine cannot reboot). It is because mgag200 doesn't work in panic context. It sleeps and allocates memory non-atomically.

This is the same for all drm drivers, the drm atomic handling with fbcon/fbdev is totally broken. It would be serious work to fix this properly.
-Daniel

>
> These were triggered while injecting machine checks using einj.
>
> 1)
>
> [321381.466885] ------------[ cut here ]------------
> [321381.472144] WARNING: CPU: 136 PID: 0 at kernel/time/timer.c:1098 del_timer_sync+0x36/0x60()
> [321381.481571] Modules linked in: einj(E) nmioe(E) iscsi_ibft(E) iscsi_boot_sysfs(E) af_packet(E) x86_pkg_temp_thermal(E) btrfs(E) intel_powerclamp(E) coretemp(E) kvm(E) xor(E) crct10dif_pclmul(E) raid6_pq(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) iTCO_wdt(E) iTCO_vendor_support(E) joydev(E) aesni_intel(E) lpc_ich(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) sb_edac(E) ablk_helper(E) cryptd(E) pcspkr(E) mfd_core(E) i2c_i801(E) wmi(E) edac_core(E) shpchp(E) ipmi_si(E) ipmi_msghandler(E) processor(E) acpi_pad(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) sd_mod(E) mgag200(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ehci_pci(E) ehci_hcd(E) drm_kms_helper(E) ixgbe(E) ahci(E) igb(E) mdio(E) ttm(E) libahci(E) ptp(E) i2c_algo_bit(E) usbcore(E) pps_core(E) drm(E) libata(E) megaraid_sas(E) usb_common(E) dca(E) sg(E) scsi_mod(E) autofs4(E)
> [321381.572300] CPU: 136 PID: 0 Comm: swapper/136 Tainted: G W E 4.1.0-rc8-7-default+ #4
> [321381.582117] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
> [321381.593777] ffffffff81818089 ffff88047fc88808 ffffffff8157d67e 0000000000000000
> [321381.602184] 0000000000000000 ffff88047fc88848 ffffffff810637fa ffff88046e4bc740
> [321381.610595] ffff88047fc888a8 ffff88047fc888a8 0000000104c6f0f8 ffff88047f5cdb00
> [321381.619006] Call Trace:
> [321381.621834] <#MC> [] dump_stack+0x4c/0x65
> [321381.628358] [] warn_slowpath_common+0x8a/0xc0
> [321381.635168] [] warn_slowpath_null+0x1a/0x20
> [321381.641775] [] del_timer_sync+0x36/0x60
> [321381.647995] [] schedule_timeout+0x150/0x280
> [321381.654611] [] ? idr_alloc+0x7b/0xe0
> [321381.660547] [] ? internal_add_timer+0x80/0x80
> [321381.667359] [] msleep+0x3c/0x50
> [321381.672812] [] mga_crtc_prepare+0x167/0x370 [mgag200]
> [321381.680404] [] drm_crtc_helper_set_mode+0x2d6/0x530 [drm_kms_helper]
> [321381.689453] [] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
> [321381.698706] [] drm_mode_set_config_internal+0x68/0x100 [drm]
> [321381.706971] [] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
> [321381.715244] [] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
> [321381.724780] [] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
> [321381.733144] [] notifier_call_chain+0x4d/0x80
> [321381.739859] [] atomic_notifier_call_chain+0x21/0x30
> [321381.747252] [] panic+0xee/0x1f5
> [321381.752704] [] mce_panic+0x1e2/0x200
> [321381.758640] [] mce_timed_out+0x73/0x80
> [321381.764762] [] do_machine_check+0x5f1/0xae0
> [321381.771377] [] ? intel_idle+0xbf/0x130
> [321381.777499] [] machine_check+0x29/0x50
> [321381.783630] [] ? intel_idle+0xbf/0x130
> [321381.789760] <> [] cpuidle_enter_state+0x70/0x1f0
> [321381.797457] [] cpuidle_enter+0x17/0x20
> [321381.803586] [] cpu_startup_entry+0x308/0x390
> [321381.810297] [] start_secondary+0x143/0x170
> [321381.816814] ---[ end trace 9f2a977c4a9be24e ]---
> [321381.822068] bad: scheduling from the idle thread!
> [321381.827421] CPU: 136 PID: 0 Comm: swapper/136 Tainted: G W E 4.1.0-rc8-7-default+ #4
> [321381.837238] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
> [321381.848898] ffff88046e4bc740 ffff88047fc887a8 ffffffff8157d67e 0000000000000000
> [321381.857305] ffff88047fc95300 ffff88047fc887c8 ffffffff81093675 ffff88047fc88808
> [321381.865713] ffff88047fc95300 ffff88047fc887f8 ffffffff8108796c 0000000100000000
> [321381.874124] Call Trace:
> [321381.876951] <#MC> [] dump_stack+0x4c/0x65
> [321381.883483] [] dequeue_task_idle+0x35/0x50
> [321381.890001] [] dequeue_task+0x5c/0x80
> [321381.896027] [] deactivate_task+0x2b/0x30
> [321381.902352] [] __schedule+0x64a/0x910
> [321381.908385] [] schedule+0x3e/0x90
> [321381.914030] [] schedule_timeout+0x148/0x280
> [321381.920636] [] ? idr_alloc+0x7b/0xe0
> [321381.926570] [] ? internal_add_timer+0x80/0x80
> [321381.933382] [] msleep+0x3c/0x50
> [321381.938835] [] mga_crtc_prepare+0x167/0x370 [mgag200]
> [321381.946428] [] drm_crtc_helper_set_mode+0x2d6/0x530 [drm_kms_helper]
> [321381.955478] [] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
> [321381.964731] [] drm_mode_set_config_internal+0x68/0x100 [drm]
> [321381.973004] [] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
> [321381.981277] [] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
> [321381.990811] [] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
> [321381.999174] [] notifier_call_chain+0x4d/0x80
> [321382.005887] [] atomic_notifier_call_chain+0x21/0x30
> [321382.013280] [] panic+0xee/0x1f5
> [321382.018731] [] mce_panic+0x1e2/0x200
> [321382.024660] [] mce_timed_out+0x73/0x80
> [321382.030787] [] do_machine_check+0x5f1/0xae0
> [321382.037404] [] ? intel_idle+0xbf/0x130
> [321382.043533] [] machine_check+0x29/0x50
> [321382.049665] [] ? intel_idle+0xbf/0x130
> [321382.055794] <> [] cpuidle_enter_state+0x70/0x1f0
> [321382.063491] [] cpuidle_enter+0x17/0x20
> [321382.069623] [] cpu_startup_entry+0x308/0x390
> [321382.076335] [] start_secondary+0x143/0x170
> [321382.082877] ------------[ cut here ]------------
>
>
> 2)
>
> bkd04sdp:~ # [58109.056018] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
> [58109.056058] mce: [Hardware Error]: Machine check events logged
> [58110.109873] Shutting down cpus with NMI
> [58110.176778] Kernel Offset: disabled
> [58110.180667] drm_kms_helper: panic occurred, switching back to text console
> [58110.188367] mga_delay choosing mdelay...
> [58110.242399] mga_delay choosing mdelay...
> [58110.266768] ------------[ cut here ]------------
> [58110.271926] kernel BUG at mm/vmalloc.c:1335!
> [58110.276695] invalid opcode: 0000 [#1] SMP
> [58110.281289] Modules linked in: einj(E) nmioe(E) iscsi_ibft(E) iscsi_boot_sysfs(E) af_packet(E) btrfs(E) xor(E) raid6_pq(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm(E) joydev(E) iTCO_wdt(E) iTCO_vendor_support(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) sb_edac(E) ablk_helper(E) lpc_ich(E) cryptd(E) pcspkr(E) edac_core(E) mfd_core(E) i2c_i801(E) shpchp(E) wmi(E) ipmi_si(E) ipmi_msghandler(E) acpi_pad(E) processor(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) sd_mod(E) mgag200(E) syscopyarea(E) sysfillrect(E) ahci(E) ehci_pci(E) sysimgblt(E) drm_kms_helper(E) ehci_hcd(E) ixgbe(E) igb(E) ttm(E) libahci(E) mdio(E) ptp(E) usbcore(E) pps_core(E) drm(E) libata(E) i2c_algo_bit(E) usb_common(E) dca(E) megaraid_sas(E) sg(E) scsi_mod(E) autofs4(E)
> [58110.371884] CPU: 75 PID: 0 Comm: swapper/75 Tainted: G E 4.1.0-rc8-7-default+ #10
> [58110.381506] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
> [58110.393063] task: ffff88046ea6d580 ti: ffff88046ea70000 task.ti: ffff88046ea70000
> [58110.401422] RIP: 0010:[] [] __get_vm_area_node+0x155/0x160
> [58110.411156] RSP: 0018:ffff88047f7284b8 EFLAGS: 00010006
> [58110.417091] RAX: 0000000080010003 RBX: 0000000091000000 RCX: ffffc90000000000
> [58110.425065] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000000eb000
> [58110.433038] RBP: ffff88047f7284f8 R08: ffffe8ffffffffff R09: 00000000ffffffff
> [58110.441010] R10: ffff880036a6a700 R11: ffff880460ab69c0 R12: 00000000910eb000
> [58110.448983] R13: 0000000000000001 R14: 0000000091000000 R15: 00000000000eb000
> [58110.456955] FS: 0000000000000000(0000) GS:ffff88047f720000(0000) knlGS:0000000000000000
> [58110.465994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [58110.472413] CR2: 00007f5bdb9ac095 CR3: 0000000001a0b000 CR4: 00000000001407e0
> [58110.480386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [58110.488358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [58110.496331] Stack:
> [58110.498577] ffff88047f728518 ffffc90000000000 00000000910eafff 0000000091000000
> [58110.506885] 00000000910eb000 0000000000000001 0000000091000000 00000000000eb000
> [58110.515192] ffff88047f728518 ffffffff8118ad20 00000000000000d0 ffffffffa0334f78
> [58110.523499] Call Trace:
> [58110.526229] <#MC>
> [58110.528382] [] get_vm_area_caller+0x40/0x50
> [58110.535111] [] ? ttm_mem_reg_ioremap+0xc8/0x110 [ttm]
> [58110.542607] [] __ioremap_caller+0x188/0x390
> [58110.549127] [] ? find_next_bit+0x19/0x20
> [58110.555353] [] ioremap_wc+0x17/0x20
> [58110.561099] [] ttm_mem_reg_ioremap+0xc8/0x110 [ttm]
> [58110.568398] [] ttm_bo_move_memcpy+0xd1/0x700 [ttm]
> [58110.575598] [] ? __kmalloc+0x4b5/0x4c0
> [58110.581632] [] mgag200_bo_move+0x18/0x20 [mgag200]
> [58110.588830] [] ttm_bo_handle_move_mem+0x260/0x590 [ttm]
> [58110.596514] [] ? ttm_bo_mem_space+0xd2/0x320 [ttm]
> [58110.603705] [] ttm_bo_validate+0x1c2/0x1d0 [ttm]
> [58110.610711] [] ? irq_work_queue+0x11/0x90
> [58110.617037] [] mgag200_bo_push_sysram+0x93/0xe0 [mgag200]
> [58110.624915] [] mga_crtc_do_set_base.isra.8.constprop.21+0x76/0x410 [mgag200]
> [58110.634636] [] mga_crtc_mode_set+0x1042/0x2140 [mgag200]
> [58110.642416] [] ? mga_crtc_prepare+0x132/0x370 [mgag200]
> [58110.650106] [] drm_crtc_helper_set_mode+0x2fb/0x530 [drm_kms_helper]
> [58110.659052] [] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
> [58110.668217] [] drm_mode_set_config_internal+0x68/0x100 [drm]
> [58110.676388] [] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
> [58110.684558] [] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
> [58110.693989] [] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
> [58110.702260] [] notifier_call_chain+0x4d/0x80
> [58110.708873] [] atomic_notifier_call_chain+0x21/0x30
> [58110.716169] [] panic+0xee/0x1f5
> [58110.721530] [] mce_panic+0x1e2/0x200
> [58110.727366] [] mce_timed_out+0x73/0x80
> [58110.733396] [] do_machine_check+0x5f1/0xae0
> [58110.739915] [] ? intel_idle+0xbf/0x130
> [58110.745952] [] machine_check+0x29/0x50
> [58110.751984] [] ? intel_idle+0xbf/0x130
> [58110.758017] <>
> [58110.760362] [] cpuidle_enter_state+0x70/0x1f0
> [58110.767275] [] cpuidle_enter+0x17/0x20
> [58110.773309] [] cpu_startup_entry+0x308/0x390
> [58110.779916] [] start_secondary+0x143/0x170
> [58110.786325] Code: 00 00 48 0f bd cf 83 c1 01 83 f9 0c 0f 4c c8 b0 1e 83 f9 1e 0f 4f c8 49 d3 e6 e9 f8 fe ff ff 48 89 df e8 9f a8 01 00 31 c0 eb b8 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 c8 41
> [58110.808146] RIP [] __get_vm_area_node+0x155/0x160
> [58110.815257] RSP
> [58110.820218] ---[ end trace ab0c230901a0ee95 ]---
>
> Thanks
> Rui
>

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch