From: Sitsofe Wheeler
Date: Tue, 4 Oct 2016 07:17:19 +0100
Subject: BUG and Oops while trying to issue a discard to LVM on RAID1 md
To: Arvind Kumar
Cc: VMware PV-Drivers, "James E.J. Bottomley", "Martin K. Petersen", "linux-scsi@vger.kernel.org", "linux-kernel@vger.kernel.org"

While trying to do a discard inside an ESXi 6 VM to an LVM device atop an md RAID1 device (composed of two SATA SSDs passed up as raw disk mappings through a PVSCSI controller), the following BUG, followed by an Oops, was hit:

[ 86.902888] ------------[ cut here ]------------
[ 86.904600] kernel BUG at arch/x86/kernel/pci-nommu.c:66!
[ 86.906538] invalid opcode: 0000 [#1] SMP
[ 86.907991] Modules linked in: vmw_vsock_vmci_transport vsock sb_edac edac_core intel_powerclamp coretemp crct10dif_pclmul raid1 crc32_pclmul ppdev ghash_clmulni_intel vmw_balloon joydev intel_rapl_perf vmxnet3 acpi_cpufreq tpm_tis fjes parport_pc tpm vmw_vmci parport shpchp i2c_piix4 dm_multipath vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw ata_generic vmw_pvscsi pata_acpi
[ 86.914919] CPU: 0 PID: 214 Comm: kworker/0:1H Not tainted 4.7.5-200.fc24.x86_64 #1
[ 86.916123] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[ 86.917720] Workqueue: kblockd blk_delay_work
[ 86.918395] task: ffff88003a65bd00 ti: ffff88003fb18000 task.ti: ffff88003fb18000
[ 86.919478] RIP: 0010:[] [] nommu_map_sg+0x91/0xc0
[ 86.920697] RSP: 0018:ffff88003fb1bc70 EFLAGS: 00010046
[ 86.921471] RAX: 0000000000000200 RBX: 0000000000000001 RCX: 0000000000000001
[ 86.922539] RDX: 0000000000000000 RSI: ffff88003f8ca600 RDI: ffff88003ce820a0
[ 86.923611] RBP: ffff88003fb1bc98 R08: 0000000000000000 R09: 0000000000000000
[ 86.924692] R10: ffff88003f053000 R11: ffff88003c38c900 R12: 0000000000000001
[ 86.925733] R13: ffff88003ce820a0 R14: 0000000000000001 R15: ffff88003f8ca600
[ 86.926817] FS: 0000000000000000(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
[ 86.928084] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 86.928958] CR2: 00007fc762951dd0 CR3: 000000003b6f8000 CR4: 00000000001406f0
[ 86.930034] Stack:
[ 86.930334] 0000000000000001 ffff88003ce820a0 ffffffffb4c1b1c0 0000000000000001
[ 86.931541] 0000000000000001 ffff88003fb1bcd8 ffffffffb4565e87 ffff88003f8ca600
[ 86.932762] ffff88003f075f80 ffff88003c38c900 ffff88003f944000 ffff88003f082c90
[ 86.933990] Call Trace:
[ 86.934361] [] scsi_dma_map+0x97/0xc0
[ 86.935122] [] pvscsi_queue+0x4a5/0x860 [vmw_pvscsi]
[ 86.936129] [] ? scsi_test_unit_ready+0x150/0x150
[ 86.937099] [] scsi_dispatch_cmd+0xdd/0x220
[ 86.937936] [] scsi_request_fn+0x461/0x5f0
[ 86.938811] [] ? __switch_to+0x29a/0x4a0
[ 86.939705] [] __blk_run_queue+0x33/0x40
[ 86.940504] [] blk_delay_work+0x25/0x40
[ 86.941308] [] process_one_work+0x184/0x440
[ 86.942246] [] worker_thread+0x4e/0x480
[ 86.943054] [] ? process_one_work+0x440/0x440
[ 86.944033] [] ? process_one_work+0x440/0x440
[ 86.944961] [] kthread+0xd8/0xf0
[ 86.945699] [] ret_from_fork+0x1f/0x40
[ 86.946507] [] ? kthread_worker_fn+0x180/0x180
[ 86.947479] Code: ff ff ff 85 c0 74 3c 41 8b 47 0c 4c 89 ff 83 c3 01 41 89 47 18 e8 10 d1 3b 00 41 39 dc 49 89 c7 74 1e 49 8b 17 48 83 e2 fc 75 af <0f> 0b be 3f 00 00 00 48 c7 c7 59 51 a2 b4 e8 5c 1f 07 00 eb 80
[ 86.951837] RIP [] nommu_map_sg+0x91/0xc0
[ 86.952728] RSP
[ 86.953238] ---[ end trace 9ce6a81b32bb6ab3 ]---
[ 86.954013] BUG: unable to handle kernel paging request at ffffffffffffffd8
[ 86.955145] IP: [] kthread_data+0x10/0x20
[ 86.956059] PGD 34c09067 PUD 34c0b067 PMD 0
[ 86.956776] Oops: 0000 [#2] SMP
[ 86.957245] Modules linked in: vmw_vsock_vmci_transport vsock sb_edac edac_core intel_powerclamp coretemp crct10dif_pclmul raid1 crc32_pclmul ppdev ghash_clmulni_intel vmw_balloon joydev intel_rapl_perf vmxnet3 acpi_cpufreq tpm_tis fjes parport_pc tpm vmw_vmci parport shpchp i2c_piix4 dm_multipath vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw ata_generic vmw_pvscsi pata_acpi
[ 86.962934] CPU: 0 PID: 214 Comm: kworker/0:1H Tainted: G D 4.7.5-200.fc24.x86_64 #1
[ 86.964251] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[ 86.965871] task: ffff88003a65bd00 ti: ffff88003fb18000 task.ti: ffff88003fb18000
[ 86.966969] RIP: 0010:[] [] kthread_data+0x10/0x20
[ 86.968221] RSP: 0018:ffff88003fb1b940 EFLAGS: 00010002
[ 86.968978] RAX: 0000000000000000 RBX: ffff88003ec18100 RCX: 0000000000000000
[ 86.970043] RDX: ffff88003e803090 RSI: ffff88003a65bd80 RDI: ffff88003a65bd00
[ 86.971059] RBP: ffff88003fb1b940 R08: ffff88003a65bda8 R09: 0000000000000000
[ 86.972105] R10: 0000000000000000 R11: 000000000015eb90 R12: ffff88003a65c350
[ 86.973160] R13: ffff88003ec18100 R14: ffff88003a65bd00 R15: 0000000000018100
[ 86.974262] FS: 0000000000000000(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
[ 86.975480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 86.976352] CR2: 0000000000000028 CR3: 000000003b6f8000 CR4: 00000000001406f0
[ 86.977517] Stack:
[ 86.977843] ffff88003fb1b950 ffffffffb40bb23e ffff88003fb1b9a8 ffffffffb47e802f
[ 86.979065] 00ff88003fb1b9c0 ffff88003a65bd00 ffff88003a65c238 0000000000000000
[ 86.980295] ffff88003fb1c000 0000000000000000 ffff88003fb1ba00 ffff88003fb1b4a0
[ 86.981525] Call Trace:
[ 86.981938] [] wq_worker_sleeping+0xe/0x90
[ 86.982843] [] __schedule+0x50f/0x780
[ 86.983672] [] schedule+0x35/0x80
[ 86.984465] [] do_exit+0x7c3/0xb50
[ 86.985193] [] oops_end+0x9c/0xd0
[ 86.985974] [] die+0x4b/0x70
[ 86.986684] [] do_trap+0xb2/0x140
[ 86.987444] [] do_error_trap+0x89/0x110
[ 86.988279] [] ? nommu_map_sg+0x91/0xc0
[ 86.989098] [] ? sg_init_table+0x1a/0x40
[ 86.989904] [] do_invalid_op+0x20/0x30
[ 86.990757] [] invalid_op+0x1e/0x30
[ 86.991574] [] ? nommu_map_sg+0x91/0xc0
[ 86.992410] [] scsi_dma_map+0x97/0xc0
[ 86.993241] [] pvscsi_queue+0x4a5/0x860 [vmw_pvscsi]
[ 86.994239] [] ? scsi_test_unit_ready+0x150/0x150
[ 86.995184] [] scsi_dispatch_cmd+0xdd/0x220
[ 86.996109] [] scsi_request_fn+0x461/0x5f0
[ 86.996943] [] ? __switch_to+0x29a/0x4a0
[ 86.997807] [] __blk_run_queue+0x33/0x40
[ 86.998665] [] blk_delay_work+0x25/0x40
[ 86.999504] [] process_one_work+0x184/0x440
[ 87.000365] [] worker_thread+0x4e/0x480
[ 87.001150] [] ? process_one_work+0x440/0x440
[ 87.002088] [] ? process_one_work+0x440/0x440
[ 87.003017] [] kthread+0xd8/0xf0
[ 87.003800] [] ret_from_fork+0x1f/0x40
[ 87.004576] [] ? kthread_worker_fn+0x180/0x180
[ 87.005463] Code: f7 89 72 00 e9 53 ff ff ff e8 dd fb fd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 d8 05 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 87.009721] RIP [] kthread_data+0x10/0x20
[ 87.010615] RSP
[ 87.011170] CR2: ffffffffffffffd8
[ 87.011732] ---[ end trace 9ce6a81b32bb6ab4 ]---
[ 87.012411] Fixing recursive fault but reboot is needed!

Once this happens, the VM freezes and has to be hard reset. The kernel is 4.7.5-200.fc24.x86_64 from Fedora 24.

A slightly related issue, where only a "BUG at block/bio.c:1785" is hit, is described in this thread: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1243446.html

-- 
Sitsofe | http://sucs.org/~sits/
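The report does not say which tool issued the discard. As an assumption, here is a minimal Python sketch of issuing one directly with the BLKDISCARD ioctl (the request blkdiscard(8) from util-linux sends by default). The function names and any device path are hypothetical, it needs root, and running it destroys the data in the range.

```python
import fcntl
import os
import struct

# BLKDISCARD is _IO(0x12, 119) in <linux/fs.h>; Python's fcntl module does
# not export it, so the numeric value is spelled out here.
BLKDISCARD = 0x1277


def discard_range_arg(start: int, length: int) -> bytes:
    """Pack the {start, length} byte range BLKDISCARD expects: two native
    uint64_t values. The kernel returns EINVAL unless both are multiples
    of 512, so the same check is done up front."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    if start % 512 or length % 512:
        raise ValueError("start and length must be multiples of 512 bytes")
    return struct.pack("=QQ", start, length)


def discard(device_path: str, start: int, length: int) -> None:
    """Hypothetical helper: discard one range on a block device.

    WARNING: the data in the range is thrown away. The device must be
    opened for writing, otherwise the kernel rejects the ioctl (EBADF).
    """
    fd = os.open(device_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKDISCARD, discard_range_arg(start, length))
    finally:
        os.close(fd)
```

For example, `discard("/dev/vg0/lv0", 0, 1 << 20)` would discard the first MiB of a hypothetical logical volume vg0/lv0 sitting on the md RAID1 array. Running fstrim(8) on a filesystem mounted from the LV is another common way to generate discards.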
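The Code: bytes of the first trace end in `mov (%r15),%rdx; and $0xfffffffffffffffc,%rdx; jne ...; ud2`, so the BUG fires when the first word of a scatterlist entry (page_link) is zero once its two low flag bits are masked off. That looks like the `BUG_ON(!sg_page(s))` check in the per-entry loop of nommu_map_sg(). This is an inference from the disassembly, not something confirmed against the 4.7.5 source. The userspace Python model below only illustrates that condition; the class, function, and constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# In 4.7-era kernels the low two bits of page_link are flags:
# bit 0 marks a chain entry, bit 1 marks the last entry.
# These names are hypothetical stand-ins for the literal 0x01 and 0x02.
SG_CHAIN_BIT = 0x1
SG_LAST_BIT = 0x2


@dataclass
class SgEntry:
    page_link: int  # struct page pointer (aligned) ORed with the flag bits
    offset: int
    length: int


def sg_page(entry: SgEntry) -> int:
    """Strip the flag bits, as sg_page() does with page_link & ~0x3."""
    return entry.page_link & ~0x3


def map_sg_model(entries: List[SgEntry], nents: int) -> int:
    """Walk the first nents entries and reject any entry without a page,
    mirroring the suspected BUG_ON(!sg_page(s)); returns nents on success.
    Simplification: chained scatterlists are not followed here."""
    for index, entry in enumerate(entries[:nents]):
        if sg_page(entry) == 0:
            raise RuntimeError(f"BUG: scatterlist entry {index} has no page")
    return nents
```

The real loop uses for_each_sg(), which goes through sg_next() and so follows chain entries; the model skips that because a page-less entry triggers the same check either way.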