Date: Sun, 05 May 2019 11:38:13 +0100
Message-ID: <86lfzl9ofe.wl-marc.zyngier@arm.com>
From: Marc Zyngier
To: Heyi Guo
Cc: , wanghaibin 00208455 , kvmarm
Subject: Re: ARM/gic-v4: deadlock occurred
In-Reply-To: <9efe0260-4a84-7489-ecdd-2e9561599320@huawei.com>
References: <9efe0260-4a84-7489-ecdd-2e9561599320@huawei.com>
Organization: ARM Ltd
[+ kvmarm]

Hi Heyi,

On Sun, 05 May 2019 03:26:18 +0100,
Heyi Guo wrote:
> 
> Hi folks,
> 
> We observed deadlocks after enabling GICv4 and PCI passthrough on
> ARM64 virtual machines, when not pinning VCPUs to physical CPUs.
> 
> We observed the below warnings after enabling lockdep debugging in
> the kernel:
> 
> [ 362.847021] =====================================================
> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 362.864840] 4.19.34+ #7 Tainted: G        W
> [ 362.872314] -----------------------------------------------------
> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
> [ 362.901413]
> [ 362.901413] and this task is already holding:
> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
> [ 362.928626] which would create a new lock dependency:
> [ 362.936837]  (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
> [ 362.946449]
> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 362.960877]  (&irq_desc_lock_class){-.-.}
> [ 362.960880]
> [ 362.960880] ... which became HARDIRQ-irq-safe at:
> [ 362.981234]   lock_acquire+0xf0/0x258
> [ 362.988337]   _raw_spin_lock+0x54/0x90
> [ 362.995543]   handle_fasteoi_irq+0x2c/0x198
> [ 363.003205]   generic_handle_irq+0x34/0x50
> [ 363.010787]   __handle_domain_irq+0x68/0xc0
> [ 363.018500]   gic_handle_irq+0xf4/0x1e0
> [ 363.025913]   el1_irq+0xc8/0x180
> [ 363.032683]   _raw_spin_unlock_irq+0x40/0x60
> [ 363.040512]   finish_task_switch+0x98/0x258
> [ 363.048254]   __schedule+0x350/0xca8
> [ 363.055359]   schedule+0x40/0xa8
> [ 363.062098]   worker_thread+0xd8/0x410
> [ 363.069340]   kthread+0x134/0x138
> [ 363.076070]   ret_from_fork+0x10/0x18
> [ 363.083111]
> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
> [ 363.095213]  (fs_reclaim){+.+.}
> [ 363.095216]
> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
> [ 363.114527] ...
> [ 363.114530]   lock_acquire+0xf0/0x258
> [ 363.126269]   fs_reclaim_acquire.part.22+0x3c/0x48
> [ 363.134206]   fs_reclaim_acquire+0x2c/0x38
> [ 363.141363]   kmem_cache_alloc_trace+0x44/0x368
> [ 363.148892]   acpi_os_map_iomem+0x9c/0x208
> [ 363.155934]   acpi_os_map_memory+0x28/0x38
> [ 363.162831]   acpi_tb_acquire_table+0x58/0x8c
> [ 363.170021]   acpi_tb_validate_table+0x34/0x58
> [ 363.177162]   acpi_tb_get_table+0x4c/0x90
> [ 363.183741]   acpi_get_table+0x94/0xc4
> [ 363.190020]   find_acpi_cpu_topology_tag+0x54/0x240
> [ 363.197404]   find_acpi_cpu_topology_package+0x28/0x38
> [ 363.204985]   init_cpu_topology+0xdc/0x1e4
> [ 363.211498]   smp_prepare_cpus+0x2c/0x108
> [ 363.217882]   kernel_init_freeable+0x130/0x508
> [ 363.224699]   kernel_init+0x18/0x118
> [ 363.230624]   ret_from_fork+0x10/0x18
> [ 363.236611]
> [ 363.236611] other info that might help us debug this:
> [ 363.236611]
> [ 363.251604] Chain exists of:
> [ 363.251604]   &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
> [ 363.251604]
> [ 363.270508]  Possible interrupt unsafe locking scenario:
> [ 363.270508]
> [ 363.282238]        CPU0                    CPU1
> [ 363.289228]        ----                    ----
> [ 363.296189]   lock(fs_reclaim);
> [ 363.301726]                                local_irq_disable();
> [ 363.310122]                                lock(&irq_desc_lock_class);
> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
> [ 363.328617]   <Interrupt>
> [ 363.333713]     lock(&irq_desc_lock_class);
> [ 363.340414]
> [ 363.340414]  *** DEADLOCK ***
> [ 363.340414]
> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
> [ 363.360412]  #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
> [ 363.370915]  #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
> [ 363.382139]  #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
> [ 363.396625]  #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
> [ 363.408486]  #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
> 
> 
> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
> acquires an atomic context via irq_get_desc_lock() at the beginning,
> but in its_irq_set_vcpu_affinity()
> (drivers/irqchip/irq-gic-v3-its.c) we are still using mutex_lock,
> kcalloc, kfree, etc., which we think should be forbidden in atomic
> context.
> 
> Though the issue was observed in 4.19.34, we haven't found any
> related fixes in mainline yet.

Thanks for the report. Given that you're the only users of GICv4,
you're bound to find a number of these issues.

Can you try the patch below and let me know whether it helps? This is
the simplest thing I can think of to paper over the issue, but it
isn't pretty, and I'm looking at possible alternatives (ideally, we'd
be able to allocate the map outside of the irqdesc lock, but this
requires some API change between KVM, the GICv4 layer and the ITS
code).

Note that I'm travelling for the next two weeks without access to my
test rig, so I'm relying on you to test this stuff.

Thanks,

	M.
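For context, the driver callback ends up in atomic context because the
core code takes the irqdesc lock (an IRQ-safe raw spinlock) with
interrupts disabled before calling into the irqchip. The sketch below
is a simplified paraphrase of irq_set_vcpu_affinity() in
kernel/irq/manage.c, not the verbatim kernel source (the irq_data
hierarchy walk is omitted):

/*
 * Simplified paraphrase of kernel/irq/manage.c:irq_set_vcpu_affinity(),
 * for illustration only.
 */
int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
{
	unsigned long flags;
	/* Takes desc->lock with interrupts disabled: atomic from here on. */
	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
	struct irq_data *data;
	struct irq_chip *chip;
	int ret = -ENOSYS;

	if (!desc)
		return -EINVAL;

	data = irq_desc_get_irq_data(desc);
	chip = irq_data_get_irq_chip(data);

	/*
	 * For a forwarded vLPI this lands in its_irq_set_vcpu_affinity(),
	 * which calls its_vlpi_map()/its_vlpi_get()/its_vlpi_unmap() -- so
	 * taking a mutex or doing a GFP_KERNEL allocation there can sleep
	 * while an IRQ-safe lock is held, which is what lockdep flags.
	 */
	if (chip && chip->irq_set_vcpu_affinity)
		ret = chip->irq_set_vcpu_affinity(data, vcpu_info);

	irq_put_desc_unlock(desc, flags);
	return ret;
}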
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7577755bdcf4..18aa04b6a9f4 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -142,7 +142,7 @@ struct event_lpi_map {
 	u16			*col_map;
 	irq_hw_number_t		lpi_base;
 	int			nr_lpis;
-	struct mutex		vlpi_lock;
+	raw_spinlock_t		vlpi_lock;
 	struct its_vm		*vm;
 	struct its_vlpi_map	*vlpi_maps;
 	int			nr_vlpis;
@@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
 	if (!info->map)
 		return -EINVAL;
 
-	mutex_lock(&its_dev->event_map.vlpi_lock);
+	raw_spin_lock(&its_dev->event_map.vlpi_lock);
 
 	if (!its_dev->event_map.vm) {
 		struct its_vlpi_map *maps;
 
 		maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
-			       GFP_KERNEL);
+			       GFP_ATOMIC);
 		if (!maps) {
 			ret = -ENOMEM;
 			goto out;
@@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
 	}
 
 out:
-	mutex_unlock(&its_dev->event_map.vlpi_lock);
+	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
 	return ret;
 }
 
@@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
 	u32 event = its_get_event_id(d);
 	int ret = 0;
 
-	mutex_lock(&its_dev->event_map.vlpi_lock);
+	raw_spin_lock(&its_dev->event_map.vlpi_lock);
 
 	if (!its_dev->event_map.vm ||
 	    !its_dev->event_map.vlpi_maps[event].vm) {
@@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
 	*info->map = its_dev->event_map.vlpi_maps[event];
 
 out:
-	mutex_unlock(&its_dev->event_map.vlpi_lock);
+	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
 	return ret;
 }
 
@@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
 	u32 event = its_get_event_id(d);
 	int ret = 0;
 
-	mutex_lock(&its_dev->event_map.vlpi_lock);
+	raw_spin_lock(&its_dev->event_map.vlpi_lock);
 
 	if (!its_dev->event_map.vm ||
 	    !irqd_is_forwarded_to_vcpu(d)) {
 		ret = -EINVAL;
@@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
 	}
 
 out:
-	mutex_unlock(&its_dev->event_map.vlpi_lock);
+	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
 	return ret;
 }
 
@@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	dev->event_map.col_map = col_map;
 	dev->event_map.lpi_base = lpi_base;
 	dev->event_map.nr_lpis = nr_lpis;
-	mutex_init(&dev->event_map.vlpi_lock);
+	raw_spin_lock_init(&dev->event_map.vlpi_lock);
 	dev->device_id = dev_id;
 
 	INIT_LIST_HEAD(&dev->entry);

-- 
Jazz is not dead, it just smell funny.