Subject: Re: ARM/gic-v4: deadlock occurred
From: Heyi Guo
To: Marc Zyngier
Cc: wanghaibin 00208455, kvmarm
Date: Sun, 5 May 2019 19:07:02 +0800
Message-ID: <0b413592-7d98-ebe8-35c5-da330f800326@huawei.com>
In-Reply-To: <86lfzl9ofe.wl-marc.zyngier@arm.com>
References: <9efe0260-4a84-7489-ecdd-2e9561599320@huawei.com> <86lfzl9ofe.wl-marc.zyngier@arm.com>

Hi Marc,

Appreciate your quick patch :) We'll test it and let you know the result.

Heyi

On 2019/5/5 18:38, Marc Zyngier wrote:
> [+ kvmarm]
>
> Hi Heyi,
>
> On Sun, 05 May 2019 03:26:18 +0100,
> Heyi Guo wrote:
>> Hi folks,
>>
>> We observed deadlocks after enabling GICv4 and PCI passthrough on
>> ARM64 virtual machines, when not pinning VCPUs to physical CPUs.
>>
>> We observed the warnings below after enabling lockdep debugging in
>> the kernel:
>>
>> [ 362.847021] =====================================================
>> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> [ 362.864840] 4.19.34+ #7 Tainted: G        W
>> [ 362.872314] -----------------------------------------------------
>> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
>> [ 362.901413]
>> [ 362.901413] and this task is already holding:
>> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>> [ 362.928626] which would create a new lock dependency:
>> [ 362.936837]  (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
>> [ 362.946449]
>> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
>> [ 362.960877]  (&irq_desc_lock_class){-.-.}
>> [ 362.960880]
>> [ 362.960880] ... which became HARDIRQ-irq-safe at:
>> [ 362.981234]   lock_acquire+0xf0/0x258
>> [ 362.988337]   _raw_spin_lock+0x54/0x90
>> [ 362.995543]   handle_fasteoi_irq+0x2c/0x198
>> [ 363.003205]   generic_handle_irq+0x34/0x50
>> [ 363.010787]   __handle_domain_irq+0x68/0xc0
>> [ 363.018500]   gic_handle_irq+0xf4/0x1e0
>> [ 363.025913]   el1_irq+0xc8/0x180
>> [ 363.032683]   _raw_spin_unlock_irq+0x40/0x60
>> [ 363.040512]   finish_task_switch+0x98/0x258
>> [ 363.048254]   __schedule+0x350/0xca8
>> [ 363.055359]   schedule+0x40/0xa8
>> [ 363.062098]   worker_thread+0xd8/0x410
>> [ 363.069340]   kthread+0x134/0x138
>> [ 363.076070]   ret_from_fork+0x10/0x18
>> [ 363.083111]
>> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
>> [ 363.095213]  (fs_reclaim){+.+.}
>> [ 363.095216]
>> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
>> [ 363.114527] ...
>> [ 363.114530]   lock_acquire+0xf0/0x258
>> [ 363.126269]   fs_reclaim_acquire.part.22+0x3c/0x48
>> [ 363.134206]   fs_reclaim_acquire+0x2c/0x38
>> [ 363.141363]   kmem_cache_alloc_trace+0x44/0x368
>> [ 363.148892]   acpi_os_map_iomem+0x9c/0x208
>> [ 363.155934]   acpi_os_map_memory+0x28/0x38
>> [ 363.162831]   acpi_tb_acquire_table+0x58/0x8c
>> [ 363.170021]   acpi_tb_validate_table+0x34/0x58
>> [ 363.177162]   acpi_tb_get_table+0x4c/0x90
>> [ 363.183741]   acpi_get_table+0x94/0xc4
>> [ 363.190020]   find_acpi_cpu_topology_tag+0x54/0x240
>> [ 363.197404]   find_acpi_cpu_topology_package+0x28/0x38
>> [ 363.204985]   init_cpu_topology+0xdc/0x1e4
>> [ 363.211498]   smp_prepare_cpus+0x2c/0x108
>> [ 363.217882]   kernel_init_freeable+0x130/0x508
>> [ 363.224699]   kernel_init+0x18/0x118
>> [ 363.230624]   ret_from_fork+0x10/0x18
>> [ 363.236611]
>> [ 363.236611] other info that might help us debug this:
>> [ 363.236611]
>> [ 363.251604] Chain exists of:
>> [ 363.251604]   &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
>> [ 363.251604]
>> [ 363.270508]  Possible interrupt unsafe locking scenario:
>> [ 363.270508]
>> [ 363.282238]        CPU0                    CPU1
>> [ 363.289228]        ----                    ----
>> [ 363.296189]   lock(fs_reclaim);
>> [ 363.301726]                                local_irq_disable();
>> [ 363.310122]                                lock(&irq_desc_lock_class);
>> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
>> [ 363.328617]   <Interrupt>
>> [ 363.333713]     lock(&irq_desc_lock_class);
>> [ 363.340414]
>> [ 363.340414]  *** DEADLOCK ***
>> [ 363.340414]
>> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
>> [ 363.360412]  #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
>> [ 363.370915]  #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
>> [ 363.382139]  #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
>> [ 363.396625]  #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
>> [ 363.408486]  #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>>
>>
>> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
>> acquires an atomic context via irq_get_desc_lock() at the beginning,
>> but its_irq_set_vcpu_affinity()
>> (drivers/irqchip/irq-gic-v3-its.c) still uses mutex_lock, kcalloc,
>> kfree, etc., which we think should be forbidden in atomic context.
>>
>> Though the issue was observed on 4.19.34, we haven't found any
>> related fixes in mainline yet.
> Thanks for the report. Given that you're the only users of GICv4,
> you're bound to find a number of these issues.
>
> Can you try the patch below and let me know whether it helps? This is
> the simplest thing I can think of to paper over the issue, but it
> isn't pretty, and I'm looking at possible alternatives (ideally, we'd
> be able to allocate the map outside of the irqdesc lock, but this
> requires some API change between KVM, the GICv4 layer and the ITS
> code).
>
> Note that I'm travelling for the next two weeks without access to my
> test rig, so I'm relying on you to test this stuff.
>
> Thanks,
>
> M.
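For anyone following along, the calling context is roughly the following.
This is a simplified sketch of irq_set_vcpu_affinity() in
kernel/irq/manage.c, not the exact mainline code (parent-domain walking
and some error handling are omitted). Because desc->lock is a raw
spinlock taken with interrupts disabled, nothing reached from the chip
callback is allowed to sleep:

int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
{
        unsigned long flags;
        /* takes raw_spin_lock_irqsave(&desc->lock, flags) internally */
        struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
        struct irq_chip *chip;
        int ret = -ENOSYS;

        if (!desc)
                return -EINVAL;

        chip = irq_data_get_irq_chip(&desc->irq_data);

        /*
         * For a GICv4 vLPI this lands in its_irq_set_vcpu_affinity(),
         * which (before the patch below) took a mutex and called
         * kcalloc(GFP_KERNEL) -- both may sleep, hence the splat above.
         */
        if (chip && chip->irq_set_vcpu_affinity)
                ret = chip->irq_set_vcpu_affinity(&desc->irq_data, vcpu_info);

        irq_put_desc_unlock(desc, flags);
        return ret;
}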
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7577755bdcf4..18aa04b6a9f4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -142,7 +142,7 @@ struct event_lpi_map {
>  	u16			*col_map;
>  	irq_hw_number_t		lpi_base;
>  	int			nr_lpis;
> -	struct mutex		vlpi_lock;
> +	raw_spinlock_t		vlpi_lock;
>  	struct its_vm		*vm;
>  	struct its_vlpi_map	*vlpi_maps;
>  	int			nr_vlpis;
> @@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
>  	if (!info->map)
>  		return -EINVAL;
>  
> -	mutex_lock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_lock(&its_dev->event_map.vlpi_lock);
>  
>  	if (!its_dev->event_map.vm) {
>  		struct its_vlpi_map *maps;
>  
>  		maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
> -			       GFP_KERNEL);
> +			       GFP_ATOMIC);
>  		if (!maps) {
>  			ret = -ENOMEM;
>  			goto out;
> @@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
>  	}
>  
>  out:
> -	mutex_unlock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
>  	return ret;
>  }
>  
> @@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
>  	u32 event = its_get_event_id(d);
>  	int ret = 0;
>  
> -	mutex_lock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_lock(&its_dev->event_map.vlpi_lock);
>  
>  	if (!its_dev->event_map.vm ||
>  	    !its_dev->event_map.vlpi_maps[event].vm) {
> @@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
>  	*info->map = its_dev->event_map.vlpi_maps[event];
>  
>  out:
> -	mutex_unlock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
>  	return ret;
>  }
>  
> @@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
>  	u32 event = its_get_event_id(d);
>  	int ret = 0;
>  
> -	mutex_lock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_lock(&its_dev->event_map.vlpi_lock);
>  
>  	if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
>  		ret = -EINVAL;
> @@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
>  	}
>  
>  out:
> -	mutex_unlock(&its_dev->event_map.vlpi_lock);
> +	raw_spin_unlock(&its_dev->event_map.vlpi_lock);
>  	return ret;
>  }
>  
> @@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  	dev->event_map.col_map = col_map;
>  	dev->event_map.lpi_base = lpi_base;
>  	dev->event_map.nr_lpis = nr_lpis;
> -	mutex_init(&dev->event_map.vlpi_lock);
> +	raw_spin_lock_init(&dev->event_map.vlpi_lock);
>  	dev->device_id = dev_id;
>  	INIT_LIST_HEAD(&dev->entry);
>  
>
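As a side note on the alternative Marc mentions (doing the allocation
before the irqdesc lock is taken), the shape of it would be roughly the
sketch below. This is only an illustration of the idea: the helpers
its_alloc_vlpi_maps() and its_install_vlpi_maps() are invented names and
do not exist, and the real thing would need the API changes between KVM,
the GICv4 layer and the ITS code he refers to, since the irqdesc lock is
taken by the generic IRQ code rather than by the caller:

/* Hypothetical pre-allocation pattern; helper names invented for illustration. */
static int forward_vlpi_prealloc(unsigned int host_irq, struct its_cmd_info *info)
{
        struct its_vlpi_map *maps;
        int ret;

        /* Sleepable context: a GFP_KERNEL allocation is fine here. */
        maps = its_alloc_vlpi_maps(host_irq, GFP_KERNEL);       /* hypothetical */
        if (!maps)
                return -ENOMEM;

        /*
         * Hand the pre-allocated array down through the vcpu-affinity
         * path; irq_set_vcpu_affinity() takes the irq_desc raw spinlock,
         * so the ITS callback would only install the buffer, not allocate.
         */
        ret = its_install_vlpi_maps(host_irq, info, maps);      /* hypothetical */

        /* If another mapping raced us and won, free ours outside the lock. */
        if (ret == -EEXIST)
                kfree(maps);

        return ret;
}

The patch above instead stays atomic throughout (raw spinlock plus
GFP_ATOMIC), which matches the calling context but means the map
allocation can now fail under memory pressure where a GFP_KERNEL
allocation would have waited -- presumably part of why it is described
as papering over the issue.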