From: Xiang Zheng
CC: Wang Haibin, Guoheyi, yebiaoxiang
Subject: Kernel panic while doing vfio-pci hot-plug/unplug test
Message-ID: <79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com>
Date: Wed, 16 Oct 2019 21:36:23 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Hi all,

Recently I encountered a kernel panic while repeatedly running a vfio-pci hot-plug/unplug test on my Arm-KVM virtual machines.
See the call stack below:

[66628.697280] vfio-pci 0000:06:03.5: enabling device (0000 -> 0002)
[66628.809290] vfio-pci 0000:06:03.1: enabling device (0000 -> 0002)
[66628.921283] vfio-pci 0000:06:02.7: enabling device (0000 -> 0002)
[66629.029280] vfio-pci 0000:06:03.6: enabling device (0000 -> 0002)
[66629.137338] vfio-pci 0000:06:03.2: enabling device (0000 -> 0002)
[66629.249285] vfio-pci 0000:06:03.7: enabling device (0000 -> 0002)
[66630.237261] Unable to handle kernel read from unreadable memory at virtual address ffff802dac469000
[66630.246266] Mem abort info:
[66630.249047]   ESR = 0x8600000d
[66630.252088]   Exception class = IABT (current EL), IL = 32 bits
[66630.257981]   SET = 0, FnV = 0
[66630.261022]   EA = 0, S1PTW = 0
[66630.264150] swapper pgtable: 4k pages, 48-bit VAs, pgdp = 00000000fb16886e
[66630.270992] [ffff802dac469000] pgd=0000203fffff6803, pud=00e8002d80000f11
[66630.277751] Internal error: Oops: 8600000d [#1] SMP
[66630.282606] Process qemu-kvm (pid: 37201, stack limit = 0x00000000d8f19858)
[66630.289537] CPU: 41 PID: 37201 Comm: qemu-kvm Kdump: loaded Tainted: G OE 4.19.36-vhulk1907.1.0.h453.eulerosv2r8.aarch64 #1
[66630.301822] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 0.88 07/24/2019
[66630.309270] pstate: 80400089 (Nzcv daIf +PAN -UAO)
[66630.314042] pc : 0xffff802dac469000
[66630.317519] lr : __wake_up_common+0x90/0x1a8
[66630.321768] sp : ffff00027746bb00
[66630.325067] x29: ffff00027746bb00 x28: 0000000000000000
[66630.330355] x27: 0000000000000000 x26: ffff0000092755b8
[66630.335643] x25: 0000000000000000 x24: 0000000000000000
[66630.340930] x23: 0000000000000003 x22: ffff00027746bbc0
[66630.346219] x21: 000000000954c000 x20: ffff0001f542bc6c
[66630.351506] x19: ffff0001f542bb90 x18: 0000000000000000
[66630.356793] x17: 0000000000000000 x16: 0000000000000000
[66630.362081] x15: 0000000000000000 x14: 0000000000000000
[66630.367368] x13: 0000000000000000 x12: 0000000000000000
[66630.372655] x11: 0000000000000000 x10: 0000000000000bb0
[66630.377942] x9 : ffff00027746ba50 x8 : ffff80367ff6ca10
[66630.383229] x7 : ffff802e20d59200 x6 : 000000000000003f
[66630.388517] x5 : ffff00027746bbc0 x4 : ffff802dac469000
[66630.393806] x3 : 0000000000000000 x2 : 0000000000000000
[66630.399093] x1 : 0000000000000003 x0 : ffff0001f542bb90
[66630.404381] Call trace:
[66630.406818]  0xffff802dac469000
[66630.409945]  __wake_up_common_lock+0xa8/0x1a0
[66630.414283]  __wake_up+0x40/0x50
[66630.417499]  pci_cfg_access_unlock+0x9c/0xd0
[66630.421752]  pci_try_reset_function+0x58/0x78
[66630.426095]  vfio_pci_ioctl+0x478/0xdb8 [vfio_pci]
[66630.430870]  vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
[66630.436158]  do_vfs_ioctl+0xc4/0x8c0
[66630.439718]  ksys_ioctl+0x8c/0xa0
[66630.443018]  __arm64_sys_ioctl+0x28/0x38
[66630.446925]  el0_svc_common+0x78/0x130
[66630.450657]  el0_svc_handler+0x38/0x78
[66630.454389]  el0_svc+0x8/0xc
[66630.457260] Code: 00000000 00000000 00000000 00000000 (ac46d000)
[66630.463325] kernel fault(0x1) notification starting on CPU 41
[66630.469044] kernel fault(0x1) notification finished on CPU 41

The chance of reproducing this problem is very small. Our initial analysis found that it was caused by an illegal value of 'curr->func' in the __wake_up_common() function. I cannot imagine how 'curr->func' could have been written as 0xffff802dac469000.

Could there be a race between the pci_wait_cfg() function and the wake_up_all() function?

--
Thanks,
Xiang
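
For illustration, here is a minimal user-space sketch of the pattern under suspicion. It is not kernel code: the types and helpers (wait_entry, wait_head, add_waiter, remove_waiter, wake_up_all_model, demo) are simplified stand-ins made up for this example. It only shows that a wake-up walk branches through whatever 'func' value each linked entry holds at that moment. In the dump above, pc and x4 both hold 0xffff802dac469000, the exception class is an instruction abort, and lr points into __wake_up_common(), which is consistent with such a branch through a bad 'func' value. How the value became bad (for example, a list modified concurrently with the walk) is an assumption, not something the log proves.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model; all names here are hypothetical. */
struct wait_entry;
typedef int (*wait_func_t)(struct wait_entry *entry, void *key);

struct wait_entry {
    wait_func_t func;         /* callback invoked at wake-up time */
    struct wait_entry *next;  /* singly linked for brevity */
};

struct wait_head {
    struct wait_entry *first;
};

static int woken; /* number of callbacks that actually ran */

static int default_wake(struct wait_entry *entry, void *key)
{
    (void)entry;
    (void)key;
    woken++;
    return 1;
}

/* Link an entry (typically living on the waiter's stack) into the queue. */
static void add_waiter(struct wait_head *head, struct wait_entry *entry)
{
    assert(head != NULL && entry != NULL);
    entry->func = default_wake;
    entry->next = head->first;
    head->first = entry;
}

/* Unlink an entry; after this its storage may be reused safely. */
static void remove_waiter(struct wait_head *head, struct wait_entry *entry)
{
    for (struct wait_entry **pp = &head->first; *pp; pp = &(*pp)->next) {
        if (*pp == entry) {
            *pp = entry->next;
            entry->next = NULL;
            return;
        }
    }
}

/*
 * Walk the queue and call every entry's callback. The indirect call uses
 * whatever curr->func contains right now, so the walk is only safe if
 * every linked entry is still valid and the list is not changed under it.
 */
static int wake_up_all_model(struct wait_head *head)
{
    int called = 0;

    for (struct wait_entry *curr = head->first; curr != NULL; ) {
        struct wait_entry *next = curr->next;

        curr->func(curr, NULL);
        called++;
        curr = next;
    }
    return called;
}

/* Two waiters queue up, one leaves properly, then everyone is woken. */
static int demo(void)
{
    struct wait_head head = { NULL };
    struct wait_entry a, b;

    add_waiter(&head, &a);
    add_waiter(&head, &b);
    remove_waiter(&head, &a);
    return wake_up_all_model(&head);
}
```

In this model demo() is safe only because the add, remove and walk steps never overlap. If they used different locks, the walker could follow a half-updated link or an entry whose memory had already been reused, and the indirect call would jump to an arbitrary address.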