Subject: Re: ARM/gic-v4: deadlock occurred
To: Marc Zyngier
References: <9efe0260-4a84-7489-ecdd-2e9561599320@huawei.com> <86lfzl9ofe.wl-marc.zyngier@arm.com>
CC: wanghaibin 00208455, kvmarm
From: Heyi Guo
Date: Sun, 5 May 2019 19:15:51 +0800
In-Reply-To: <86lfzl9ofe.wl-marc.zyngier@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org
BTW, since its_irq_set_vcpu_affinity() is already in atomic context, do we really need the separate lock its_dev->event_map.vlpi_lock? I didn't find anything outside the its_irq_set_vcpu_affinity() call chain that acquires this lock.

Thanks,

Heyi

On 2019/5/5 18:38, Marc Zyngier wrote:
> [+ kvmarm]
>
> Hi Heyi,
>
> On Sun, 05 May 2019 03:26:18 +0100,
> Heyi Guo wrote:
>> Hi folks,
>>
>> We observed deadlocks after enabling GICv4 and PCI passthrough on
>> ARM64 virtual machines, when not pinning VCPUs to physical CPUs.
>>
>> We observed the warnings below after enabling lockdep debugging in the kernel:
>>
>> [ 362.847021] =====================================================
>> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> [ 362.864840] 4.19.34+ #7 Tainted: G        W
>> [ 362.872314] -----------------------------------------------------
>> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
>> [ 362.901413]
>> [ 362.901413] and this task is already holding:
>> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>> [ 362.928626] which would create a new lock dependency:
>> [ 362.936837]  (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
>> [ 362.946449]
>> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
>> [ 362.960877]  (&irq_desc_lock_class){-.-.}
>> [ 362.960880]
>> [ 362.960880] ... which became HARDIRQ-irq-safe at:
>> [ 362.981234]   lock_acquire+0xf0/0x258
>> [ 362.988337]   _raw_spin_lock+0x54/0x90
>> [ 362.995543]   handle_fasteoi_irq+0x2c/0x198
>> [ 363.003205]   generic_handle_irq+0x34/0x50
>> [ 363.010787]   __handle_domain_irq+0x68/0xc0
>> [ 363.018500]   gic_handle_irq+0xf4/0x1e0
>> [ 363.025913]   el1_irq+0xc8/0x180
>> [ 363.032683]   _raw_spin_unlock_irq+0x40/0x60
>> [ 363.040512]   finish_task_switch+0x98/0x258
>> [ 363.048254]   __schedule+0x350/0xca8
>> [ 363.055359]   schedule+0x40/0xa8
>> [ 363.062098]   worker_thread+0xd8/0x410
>> [ 363.069340]   kthread+0x134/0x138
>> [ 363.076070]   ret_from_fork+0x10/0x18
>> [ 363.083111]
>> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
>> [ 363.095213]  (fs_reclaim){+.+.}
>> [ 363.095216]
>> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
>> [ 363.114527] ...
>> [ 363.114530]   lock_acquire+0xf0/0x258
>> [ 363.126269]   fs_reclaim_acquire.part.22+0x3c/0x48
>> [ 363.134206]   fs_reclaim_acquire+0x2c/0x38
>> [ 363.141363]   kmem_cache_alloc_trace+0x44/0x368
>> [ 363.148892]   acpi_os_map_iomem+0x9c/0x208
>> [ 363.155934]   acpi_os_map_memory+0x28/0x38
>> [ 363.162831]   acpi_tb_acquire_table+0x58/0x8c
>> [ 363.170021]   acpi_tb_validate_table+0x34/0x58
>> [ 363.177162]   acpi_tb_get_table+0x4c/0x90
>> [ 363.183741]   acpi_get_table+0x94/0xc4
>> [ 363.190020]   find_acpi_cpu_topology_tag+0x54/0x240
>> [ 363.197404]   find_acpi_cpu_topology_package+0x28/0x38
>> [ 363.204985]   init_cpu_topology+0xdc/0x1e4
>> [ 363.211498]   smp_prepare_cpus+0x2c/0x108
>> [ 363.217882]   kernel_init_freeable+0x130/0x508
>> [ 363.224699]   kernel_init+0x18/0x118
>> [ 363.230624]   ret_from_fork+0x10/0x18
>> [ 363.236611]
>> [ 363.236611] other info that might help us debug this:
>> [ 363.236611]
>> [ 363.251604] Chain exists of:
>> [ 363.251604]   &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
>> [ 363.251604]
>> [ 363.270508]  Possible interrupt unsafe locking scenario:
>> [ 363.270508]
>> [ 363.282238]        CPU0                    CPU1
>> [ 363.289228]        ----                    ----
>> [ 363.296189]   lock(fs_reclaim);
>> [ 363.301726]                                local_irq_disable();
>> [ 363.310122]                                lock(&irq_desc_lock_class);
>> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
>> [ 363.328617]   <Interrupt>
>> [ 363.333713]     lock(&irq_desc_lock_class);
>> [ 363.340414]
>> [ 363.340414]  *** DEADLOCK ***
>> [ 363.340414]
>> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
>> [ 363.360412]  #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
>> [ 363.370915]  #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
>> [ 363.382139]  #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
>> [ 363.396625]  #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
>> [ 363.408486]  #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>>
>>
>> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
>> acquires an atomic context via irq_get_desc_lock() at the beginning,
>> but in its_irq_set_vcpu_affinity()
>> (drivers/irqchip/irq-gic-v3-its.c) we are still using mutex_lock(),
>> kcalloc(), kfree(), etc., which we think should be forbidden in atomic
>> context.
>>
>> Though the issue is observed in 4.19.34, we haven't found any related
>> fixes in mainline yet.
> Thanks for the report. Given that you're the only users of GICv4,
> you're bound to find a number of these issues.
>
> Can you try the patch below and let me know whether it helps? This is
> the simplest thing I can think of to paper over the issue, but it
> isn't pretty, and I'm looking at possible alternatives (ideally, we'd
> be able to allocate the map outside of the irqdesc lock, but this
> requires some API change between KVM, the GICv4 layer and the ITS
> code).
>
> Note that I'm travelling for the next two weeks without access to my
> test rig, so I'm relying on you to test this stuff.
>
> Thanks,
>
> 	M.
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7577755bdcf4..18aa04b6a9f4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -142,7 +142,7 @@ struct event_lpi_map {
>  	u16			*col_map;
>  	irq_hw_number_t		lpi_base;
>  	int			nr_lpis;
> -	struct mutex		vlpi_lock;
> +	raw_spinlock_t		vlpi_lock;
>  	struct its_vm		*vm;
>  	struct its_vlpi_map	*vlpi_maps;
>  	int			nr_vlpis;
> @@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
>  	if (!info->map)
>  		return -EINVAL;
>
> -	mutex_lock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
>  	if (!its_dev->event_map.vm) {
>  		struct its_vlpi_map *maps;
>
>  		maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
> -			       GFP_KERNEL);
> +			       GFP_ATOMIC);
>  		if (!maps) {
>  			ret = -ENOMEM;
>  			goto out;
> @@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
>  	}
>
>  out:
> -	mutex_unlock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
>  	return ret;
>  }
>
> @@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
>  	u32 event = its_get_event_id(d);
>  	int ret = 0;
>
> -	mutex_lock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
>  	if (!its_dev->event_map.vm ||
>  	    !its_dev->event_map.vlpi_maps[event].vm) {
> @@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
>  	*info->map = its_dev->event_map.vlpi_maps[event];
>
>  out:
> -	mutex_unlock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
>  	return ret;
>  }
>
> @@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
>  	u32 event = its_get_event_id(d);
>  	int ret = 0;
>
> -	mutex_lock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
>  	if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
>  		ret = -EINVAL;
> @@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
>  	}
>
>  out:
> -	mutex_unlock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
>  	return ret;
>  }
>
> @@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  	dev->event_map.col_map = col_map;
>  	dev->event_map.lpi_base = lpi_base;
>  	dev->event_map.nr_lpis = nr_lpis;
> -	mutex_init(&dev->event_map.vlpi_lock);
> +	raw_spin_lock_init(&dev->event_map.vlpi_lock);
>  	dev->device_id = dev_id;
>  	INIT_LIST_HEAD(&dev->entry);
>
>
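
[Editorial note: for readers who want to see why the ITS callback cannot sleep here, the generic code takes the descriptor's raw spinlock, with interrupts disabled, before invoking the irqchip's ->irq_set_vcpu_affinity() hook. The sketch below is a simplified paraphrase of irq_set_vcpu_affinity() from kernel/irq/manage.c of that era, with the irq-domain hierarchy walk and some error handling abridged; it is not a verbatim copy.]

/*
 * Simplified paraphrase of kernel/irq/manage.c:irq_set_vcpu_affinity();
 * the irq-domain hierarchy walk and some checks are abridged.
 */
int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
{
	unsigned long flags;
	/* Takes desc->lock, a raw spinlock, with local interrupts disabled. */
	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
	struct irq_data *data;
	struct irq_chip *chip;
	int ret = -ENOSYS;

	if (!desc)
		return -EINVAL;

	data = irq_desc_get_irq_data(desc);
	chip = irq_data_get_irq_chip(data);
	if (chip && chip->irq_set_vcpu_affinity)
		/*
		 * For a forwarded LPI this lands in its_irq_set_vcpu_affinity(),
		 * still under desc->lock: mutex_lock() and GFP_KERNEL allocations
		 * (which may enter fs_reclaim) are not allowed here, which is
		 * exactly what the lockdep splat above is complaining about.
		 */
		ret = chip->irq_set_vcpu_affinity(data, vcpu_info);

	irq_put_desc_unlock(desc, flags);
	return ret;
}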
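
[Editorial note: for context on Marc's remark about allocating the map outside the irqdesc lock, the GICv4 layer funnels KVM's forwarding request through irq_set_vcpu_affinity(), so the lazy vlpi_maps allocation in its_vlpi_map() can only happen once desc->lock is already held. The entry point looks roughly like the sketch below, paraphrased from drivers/irqchip/irq-gic-v4.c with error handling abridged; moving the allocation earlier would therefore ripple through the KVM / GICv4 / ITS API, as Marc notes.]

/*
 * Rough paraphrase of drivers/irqchip/irq-gic-v4.c:its_map_vlpi();
 * error handling and the unlazy-disable handling are abridged.
 */
int its_map_vlpi(int irq, struct its_vlpi_map *map)
{
	struct its_cmd_info info = {
		.cmd_type = MAP_VLPI,
		.map	  = map,
	};

	/*
	 * This goes through the generic irq_set_vcpu_affinity() helper,
	 * i.e. the ITS code only learns about the mapping after the
	 * descriptor lock has been taken -- hence the raw spinlock and
	 * GFP_ATOMIC conversion in the patch above.
	 */
	return irq_set_vcpu_affinity(irq, &info);
}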