2019-05-05 02:57:49

by Heyi Guo

Subject: ARM/gic-v4: deadlock occurred

Hi folks,

We observed deadlocks after enabling GICv4 and PCI passthrough on ARM64 virtual
machines, when not pinning VCPUs to physical CPUs.

We observed the warnings below after enabling lockdep debugging in the kernel:

[ 362.847021] =====================================================
[ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 362.864840] 4.19.34+ #7 Tainted: G W
[ 362.872314] -----------------------------------------------------
[ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
[ 362.901413]
[ 362.901413] and this task is already holding:
[ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
[ 362.928626] which would create a new lock dependency:
[ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
[ 362.946449]
[ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 362.960877] (&irq_desc_lock_class){-.-.}
[ 362.960880]
[ 362.960880] ... which became HARDIRQ-irq-safe at:
[ 362.981234] lock_acquire+0xf0/0x258
[ 362.988337] _raw_spin_lock+0x54/0x90
[ 362.995543] handle_fasteoi_irq+0x2c/0x198
[ 363.003205] generic_handle_irq+0x34/0x50
[ 363.010787] __handle_domain_irq+0x68/0xc0
[ 363.018500] gic_handle_irq+0xf4/0x1e0
[ 363.025913] el1_irq+0xc8/0x180
[ 363.032683] _raw_spin_unlock_irq+0x40/0x60
[ 363.040512] finish_task_switch+0x98/0x258
[ 363.048254] __schedule+0x350/0xca8
[ 363.055359] schedule+0x40/0xa8
[ 363.062098] worker_thread+0xd8/0x410
[ 363.069340] kthread+0x134/0x138
[ 363.076070] ret_from_fork+0x10/0x18
[ 363.083111]
[ 363.083111] to a HARDIRQ-irq-unsafe lock:
[ 363.095213] (fs_reclaim){+.+.}
[ 363.095216]
[ 363.095216] ... which became HARDIRQ-irq-unsafe at:
[ 363.114527] ...
[ 363.114530] lock_acquire+0xf0/0x258
[ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
[ 363.134206] fs_reclaim_acquire+0x2c/0x38
[ 363.141363] kmem_cache_alloc_trace+0x44/0x368
[ 363.148892] acpi_os_map_iomem+0x9c/0x208
[ 363.155934] acpi_os_map_memory+0x28/0x38
[ 363.162831] acpi_tb_acquire_table+0x58/0x8c
[ 363.170021] acpi_tb_validate_table+0x34/0x58
[ 363.177162] acpi_tb_get_table+0x4c/0x90
[ 363.183741] acpi_get_table+0x94/0xc4
[ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
[ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
[ 363.204985] init_cpu_topology+0xdc/0x1e4
[ 363.211498] smp_prepare_cpus+0x2c/0x108
[ 363.217882] kernel_init_freeable+0x130/0x508
[ 363.224699] kernel_init+0x18/0x118
[ 363.230624] ret_from_fork+0x10/0x18
[ 363.236611]
[ 363.236611] other info that might help us debug this:
[ 363.236611]
[ 363.251604] Chain exists of:
[ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
[ 363.251604]
[ 363.270508] Possible interrupt unsafe locking scenario:
[ 363.270508]
[ 363.282238]        CPU0                    CPU1
[ 363.289228]        ----                    ----
[ 363.296189]   lock(fs_reclaim);
[ 363.301726]                                local_irq_disable();
[ 363.310122]                                lock(&irq_desc_lock_class);
[ 363.319143]                                lock(&dev->event_map.vlpi_lock);
[ 363.328617]   <Interrupt>
[ 363.333713]     lock(&irq_desc_lock_class);
[ 363.340414]
[ 363.340414] *** DEADLOCK ***
[ 363.340414]
[ 363.353682] 5 locks held by CPU 0/KVM/51468:
[ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
[ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
[ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
[ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
[ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638


Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c acquires an
atomic context via irq_get_desc_lock() at the beginning, but
its_irq_set_vcpu_affinity() (drivers/irqchip/irq-gic-v3-its.c) still uses
mutex_lock, kcalloc, kfree, etc., which we think should be forbidden in atomic
context.
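
For clarity, this is roughly the calling context we are referring to (a
simplified sketch paraphrased from kernel/irq/manage.c, not the exact 4.19
source):

/*
 * Simplified sketch of kernel/irq/manage.c:irq_set_vcpu_affinity()
 * (paraphrased), showing why the irqchip callback runs in atomic context.
 */
int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
{
        unsigned long flags;
        /* Takes desc->lock (a raw spinlock) with interrupts disabled. */
        struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
        struct irq_chip *chip;
        int ret = -ENOSYS;

        if (!desc)
                return -EINVAL;

        chip = irq_desc_get_chip(desc);
        if (chip && chip->irq_set_vcpu_affinity)
                /*
                 * For the ITS this eventually reaches
                 * its_irq_set_vcpu_affinity(), which today takes a mutex
                 * and calls kcalloc()/kfree() with GFP_KERNEL -- all of
                 * which may sleep.
                 */
                ret = chip->irq_set_vcpu_affinity(irq_desc_get_irq_data(desc),
                                                  vcpu_info);

        /* Releases desc->lock and restores the interrupt state. */
        irq_put_desc_unlock(desc, flags);
        return ret;
}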

Though the issue is observed on 4.19.34, we haven't found any related fixes in mainline yet.

Please advise.

Thanks,

Heyi




2019-05-05 10:39:57

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

[+ kvmarm]

Hi Heyi,

On Sun, 05 May 2019 03:26:18 +0100,
Heyi Guo <[email protected]> wrote:
>
> Hi folks,
>
> We observed deadlocks after enabling GICv4 and PCI passthrough on
> ARM64 virtual machines, when not pinning VCPU to physical CPU.
>
> We observed below warnings after enabling lockdep debug in kernel:
>
> [ 362.847021] =====================================================
> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 362.864840] 4.19.34+ #7 Tainted: G W
> [ 362.872314] -----------------------------------------------------
> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
> [ 362.901413]
> [ 362.901413] and this task is already holding:
> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
> [ 362.928626] which would create a new lock dependency:
> [ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
> [ 362.946449]
> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 362.960877] (&irq_desc_lock_class){-.-.}
> [ 362.960880]
> [ 362.960880] ... which became HARDIRQ-irq-safe at:
> [ 362.981234] lock_acquire+0xf0/0x258
> [ 362.988337] _raw_spin_lock+0x54/0x90
> [ 362.995543] handle_fasteoi_irq+0x2c/0x198
> [ 363.003205] generic_handle_irq+0x34/0x50
> [ 363.010787] __handle_domain_irq+0x68/0xc0
> [ 363.018500] gic_handle_irq+0xf4/0x1e0
> [ 363.025913] el1_irq+0xc8/0x180
> [ 363.032683] _raw_spin_unlock_irq+0x40/0x60
> [ 363.040512] finish_task_switch+0x98/0x258
> [ 363.048254] __schedule+0x350/0xca8
> [ 363.055359] schedule+0x40/0xa8
> [ 363.062098] worker_thread+0xd8/0x410
> [ 363.069340] kthread+0x134/0x138
> [ 363.076070] ret_from_fork+0x10/0x18
> [ 363.083111]
> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
> [ 363.095213] (fs_reclaim){+.+.}
> [ 363.095216]
> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
> [ 363.114527] ...
> [ 363.114530] lock_acquire+0xf0/0x258
> [ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
> [ 363.134206] fs_reclaim_acquire+0x2c/0x38
> [ 363.141363] kmem_cache_alloc_trace+0x44/0x368
> [ 363.148892] acpi_os_map_iomem+0x9c/0x208
> [ 363.155934] acpi_os_map_memory+0x28/0x38
> [ 363.162831] acpi_tb_acquire_table+0x58/0x8c
> [ 363.170021] acpi_tb_validate_table+0x34/0x58
> [ 363.177162] acpi_tb_get_table+0x4c/0x90
> [ 363.183741] acpi_get_table+0x94/0xc4
> [ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
> [ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
> [ 363.204985] init_cpu_topology+0xdc/0x1e4
> [ 363.211498] smp_prepare_cpus+0x2c/0x108
> [ 363.217882] kernel_init_freeable+0x130/0x508
> [ 363.224699] kernel_init+0x18/0x118
> [ 363.230624] ret_from_fork+0x10/0x18
> [ 363.236611]
> [ 363.236611] other info that might help us debug this:
> [ 363.236611]
> [ 363.251604] Chain exists of:
> [ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
> [ 363.251604]
> [ 363.270508] Possible interrupt unsafe locking scenario:
> [ 363.270508]
> [ 363.282238]        CPU0                    CPU1
> [ 363.289228]        ----                    ----
> [ 363.296189]   lock(fs_reclaim);
> [ 363.301726]                                local_irq_disable();
> [ 363.310122]                                lock(&irq_desc_lock_class);
> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
> [ 363.328617]   <Interrupt>
> [ 363.333713]     lock(&irq_desc_lock_class);
> [ 363.340414]
> [ 363.340414] *** DEADLOCK ***
> [ 363.340414]
> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
> [ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
> [ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
> [ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
> [ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
> [ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>
>
> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
> acquires an atomic context via irq_get_desc_lock() at the beginning,
> but its_irq_set_vcpu_affinity()
> (drivers/irqchip/irq-gic-v3-its.c) still uses mutex_lock,
> kcalloc, kfree, etc., which we think should be forbidden in atomic
> context.
>
> Though the issue is observed in 4.19.34, we don't find any related
> fixes in the mainline yet.

Thanks for the report. Given that you're the only users of GICv4,
you're bound to find a number of these issues.

Can you try the patch below and let me know whether it helps? This is
the simplest thing I can think of to paper over the issue, but it
isn't pretty, and I'm looking at possible alternatives (ideally, we'd
be able to allocate the map outside of the irqdesc lock, but this
requires some API change between KVM, the GICv4 layer and the ITS
code).

Note that I'm travelling for the next two weeks without access to my
test rig, so I'm relying on you to test this stuff.

Thanks,

M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7577755bdcf4..18aa04b6a9f4 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -142,7 +142,7 @@ struct event_lpi_map {
u16 *col_map;
irq_hw_number_t lpi_base;
int nr_lpis;
- struct mutex vlpi_lock;
+ raw_spinlock_t vlpi_lock;
struct its_vm *vm;
struct its_vlpi_map *vlpi_maps;
int nr_vlpis;
@@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
if (!info->map)
return -EINVAL;

- mutex_lock(&its_dev->event_map.vlpi_lock);
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);

if (!its_dev->event_map.vm) {
struct its_vlpi_map *maps;

maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
- GFP_KERNEL);
+ GFP_ATOMIC);
if (!maps) {
ret = -ENOMEM;
goto out;
@@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
}

out:
- mutex_unlock(&its_dev->event_map.vlpi_lock);
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
return ret;
}

@@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
u32 event = its_get_event_id(d);
int ret = 0;

- mutex_lock(&its_dev->event_map.vlpi_lock);
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);

if (!its_dev->event_map.vm ||
!its_dev->event_map.vlpi_maps[event].vm) {
@@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
*info->map = its_dev->event_map.vlpi_maps[event];

out:
- mutex_unlock(&its_dev->event_map.vlpi_lock);
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
return ret;
}

@@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
u32 event = its_get_event_id(d);
int ret = 0;

- mutex_lock(&its_dev->event_map.vlpi_lock);
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);

if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
ret = -EINVAL;
@@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
}

out:
- mutex_unlock(&its_dev->event_map.vlpi_lock);
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
return ret;
}

@@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
dev->event_map.col_map = col_map;
dev->event_map.lpi_base = lpi_base;
dev->event_map.nr_lpis = nr_lpis;
- mutex_init(&dev->event_map.vlpi_lock);
+ raw_spin_lock_init(&dev->event_map.vlpi_lock);
dev->device_id = dev_id;
INIT_LIST_HEAD(&dev->entry);


--
Jazz is not dead, it just smells funny.

2019-05-05 11:09:38

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

Appreciate your quick patch :) We'll test it and let you know the result.

Heyi


On 2019/5/5 18:38, Marc Zyngier wrote:
> [+ kvmarm]
>
> Hi Heyi,
>
> On Sun, 05 May 2019 03:26:18 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi folks,
>>
>> We observed deadlocks after enabling GICv4 and PCI passthrough on
>> ARM64 virtual machines, when not pinning VCPU to physical CPU.
>>
>> We observed below warnings after enabling lockdep debug in kernel:
>>
>> [ 362.847021] =====================================================
>> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> [ 362.864840] 4.19.34+ #7 Tainted: G W
>> [ 362.872314] -----------------------------------------------------
>> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
>> [ 362.901413]
>> [ 362.901413] and this task is already holding:
>> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>> [ 362.928626] which would create a new lock dependency:
>> [ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
>> [ 362.946449]
>> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
>> [ 362.960877] (&irq_desc_lock_class){-.-.}
>> [ 362.960880]
>> [ 362.960880] ... which became HARDIRQ-irq-safe at:
>> [ 362.981234] lock_acquire+0xf0/0x258
>> [ 362.988337] _raw_spin_lock+0x54/0x90
>> [ 362.995543] handle_fasteoi_irq+0x2c/0x198
>> [ 363.003205] generic_handle_irq+0x34/0x50
>> [ 363.010787] __handle_domain_irq+0x68/0xc0
>> [ 363.018500] gic_handle_irq+0xf4/0x1e0
>> [ 363.025913] el1_irq+0xc8/0x180
>> [ 363.032683] _raw_spin_unlock_irq+0x40/0x60
>> [ 363.040512] finish_task_switch+0x98/0x258
>> [ 363.048254] __schedule+0x350/0xca8
>> [ 363.055359] schedule+0x40/0xa8
>> [ 363.062098] worker_thread+0xd8/0x410
>> [ 363.069340] kthread+0x134/0x138
>> [ 363.076070] ret_from_fork+0x10/0x18
>> [ 363.083111]
>> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
>> [ 363.095213] (fs_reclaim){+.+.}
>> [ 363.095216]
>> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
>> [ 363.114527] ...
>> [ 363.114530] lock_acquire+0xf0/0x258
>> [ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
>> [ 363.134206] fs_reclaim_acquire+0x2c/0x38
>> [ 363.141363] kmem_cache_alloc_trace+0x44/0x368
>> [ 363.148892] acpi_os_map_iomem+0x9c/0x208
>> [ 363.155934] acpi_os_map_memory+0x28/0x38
>> [ 363.162831] acpi_tb_acquire_table+0x58/0x8c
>> [ 363.170021] acpi_tb_validate_table+0x34/0x58
>> [ 363.177162] acpi_tb_get_table+0x4c/0x90
>> [ 363.183741] acpi_get_table+0x94/0xc4
>> [ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
>> [ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
>> [ 363.204985] init_cpu_topology+0xdc/0x1e4
>> [ 363.211498] smp_prepare_cpus+0x2c/0x108
>> [ 363.217882] kernel_init_freeable+0x130/0x508
>> [ 363.224699] kernel_init+0x18/0x118
>> [ 363.230624] ret_from_fork+0x10/0x18
>> [ 363.236611]
>> [ 363.236611] other info that might help us debug this:
>> [ 363.236611]
>> [ 363.251604] Chain exists of:
>> [ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
>> [ 363.251604]
>> [ 363.270508] Possible interrupt unsafe locking scenario:
>> [ 363.270508]
>> [ 363.282238]        CPU0                    CPU1
>> [ 363.289228]        ----                    ----
>> [ 363.296189]   lock(fs_reclaim);
>> [ 363.301726]                                local_irq_disable();
>> [ 363.310122]                                lock(&irq_desc_lock_class);
>> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
>> [ 363.328617]   <Interrupt>
>> [ 363.333713]     lock(&irq_desc_lock_class);
>> [ 363.340414]
>> [ 363.340414] *** DEADLOCK ***
>> [ 363.340414]
>> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
>> [ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
>> [ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
>> [ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
>> [ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
>> [ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>>
>>
>> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
>> acquires an atomic context via irq_get_desc_lock() at the beginning,
>> but its_irq_set_vcpu_affinity()
>> (drivers/irqchip/irq-gic-v3-its.c) still uses mutex_lock,
>> kcalloc, kfree, etc., which we think should be forbidden in atomic
>> context.
>>
>> Though the issue is observed in 4.19.34, we don't find any related
>> fixes in the mainline yet.
> Thanks for the report. Given that you're the only users of GICv4,
> you're bound to find a number of these issues.
>
> Can you try the patch below and let me know whether it helps? This is
> the simplest thing I can think of to paper over the issue, but it
> isn't pretty, and I'm looking at possible alternatives (ideally, we'd
> be able to allocate the map outside of the irqdesc lock, but this
> requires some API change between KVM, the GICv4 layer and the ITS
> code).
>
> Note that I'm travelling for the next two weeks without access to my
> test rig, so I'm relying on you to test this stuff.
>
> Thanks,
>
> M.
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7577755bdcf4..18aa04b6a9f4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -142,7 +142,7 @@ struct event_lpi_map {
> u16 *col_map;
> irq_hw_number_t lpi_base;
> int nr_lpis;
> - struct mutex vlpi_lock;
> + raw_spinlock_t vlpi_lock;
> struct its_vm *vm;
> struct its_vlpi_map *vlpi_maps;
> int nr_vlpis;
> @@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> if (!info->map)
> return -EINVAL;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm) {
> struct its_vlpi_map *maps;
>
> maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
> - GFP_KERNEL);
> + GFP_ATOMIC);
> if (!maps) {
> ret = -ENOMEM;
> goto out;
> @@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm ||
> !its_dev->event_map.vlpi_maps[event].vm) {
> @@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> *info->map = its_dev->event_map.vlpi_maps[event];
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
> ret = -EINVAL;
> @@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
> dev->event_map.col_map = col_map;
> dev->event_map.lpi_base = lpi_base;
> dev->event_map.nr_lpis = nr_lpis;
> - mutex_init(&dev->event_map.vlpi_lock);
> + raw_spin_lock_init(&dev->event_map.vlpi_lock);
> dev->device_id = dev_id;
> INIT_LIST_HEAD(&dev->entry);
>
>


2019-05-05 11:18:43

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

BTW, since its_irq_set_vcpu_affinity() is already called in atomic context, do we really need the separate lock its_dev->event_map.vlpi_lock? I didn't find anything outside the its_irq_set_vcpu_affinity() call chain that acquires this lock.

Thanks,

Heyi


On 2019/5/5 18:38, Marc Zyngier wrote:
> [+ kvmarm]
>
> Hi Heyi,
>
> On Sun, 05 May 2019 03:26:18 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi folks,
>>
>> We observed deadlocks after enabling GICv4 and PCI passthrough on
>> ARM64 virtual machines, when not pinning VCPU to physical CPU.
>>
>> We observed below warnings after enabling lockdep debug in kernel:
>>
>> [ 362.847021] =====================================================
>> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> [ 362.864840] 4.19.34+ #7 Tainted: G W
>> [ 362.872314] -----------------------------------------------------
>> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
>> [ 362.901413]
>> [ 362.901413] and this task is already holding:
>> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>> [ 362.928626] which would create a new lock dependency:
>> [ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
>> [ 362.946449]
>> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
>> [ 362.960877] (&irq_desc_lock_class){-.-.}
>> [ 362.960880]
>> [ 362.960880] ... which became HARDIRQ-irq-safe at:
>> [ 362.981234] lock_acquire+0xf0/0x258
>> [ 362.988337] _raw_spin_lock+0x54/0x90
>> [ 362.995543] handle_fasteoi_irq+0x2c/0x198
>> [ 363.003205] generic_handle_irq+0x34/0x50
>> [ 363.010787] __handle_domain_irq+0x68/0xc0
>> [ 363.018500] gic_handle_irq+0xf4/0x1e0
>> [ 363.025913] el1_irq+0xc8/0x180
>> [ 363.032683] _raw_spin_unlock_irq+0x40/0x60
>> [ 363.040512] finish_task_switch+0x98/0x258
>> [ 363.048254] __schedule+0x350/0xca8
>> [ 363.055359] schedule+0x40/0xa8
>> [ 363.062098] worker_thread+0xd8/0x410
>> [ 363.069340] kthread+0x134/0x138
>> [ 363.076070] ret_from_fork+0x10/0x18
>> [ 363.083111]
>> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
>> [ 363.095213] (fs_reclaim){+.+.}
>> [ 363.095216]
>> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
>> [ 363.114527] ...
>> [ 363.114530] lock_acquire+0xf0/0x258
>> [ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
>> [ 363.134206] fs_reclaim_acquire+0x2c/0x38
>> [ 363.141363] kmem_cache_alloc_trace+0x44/0x368
>> [ 363.148892] acpi_os_map_iomem+0x9c/0x208
>> [ 363.155934] acpi_os_map_memory+0x28/0x38
>> [ 363.162831] acpi_tb_acquire_table+0x58/0x8c
>> [ 363.170021] acpi_tb_validate_table+0x34/0x58
>> [ 363.177162] acpi_tb_get_table+0x4c/0x90
>> [ 363.183741] acpi_get_table+0x94/0xc4
>> [ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
>> [ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
>> [ 363.204985] init_cpu_topology+0xdc/0x1e4
>> [ 363.211498] smp_prepare_cpus+0x2c/0x108
>> [ 363.217882] kernel_init_freeable+0x130/0x508
>> [ 363.224699] kernel_init+0x18/0x118
>> [ 363.230624] ret_from_fork+0x10/0x18
>> [ 363.236611]
>> [ 363.236611] other info that might help us debug this:
>> [ 363.236611]
>> [ 363.251604] Chain exists of:
>> [ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
>> [ 363.251604]
>> [ 363.270508] Possible interrupt unsafe locking scenario:
>> [ 363.270508]
>> [ 363.282238]        CPU0                    CPU1
>> [ 363.289228]        ----                    ----
>> [ 363.296189]   lock(fs_reclaim);
>> [ 363.301726]                                local_irq_disable();
>> [ 363.310122]                                lock(&irq_desc_lock_class);
>> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
>> [ 363.328617]   <Interrupt>
>> [ 363.333713]     lock(&irq_desc_lock_class);
>> [ 363.340414]
>> [ 363.340414] *** DEADLOCK ***
>> [ 363.340414]
>> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
>> [ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
>> [ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
>> [ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
>> [ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
>> [ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>>
>>
>> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
>> acquires an atomic context via irq_get_desc_lock() at the beginning,
>> but its_irq_set_vcpu_affinity()
>> (drivers/irqchip/irq-gic-v3-its.c) still uses mutex_lock,
>> kcalloc, kfree, etc., which we think should be forbidden in atomic
>> context.
>>
>> Though the issue is observed in 4.19.34, we don't find any related
>> fixes in the mainline yet.
> Thanks for the report. Given that you're the only users of GICv4,
> you're bound to find a number of these issues.
>
> Can you try the patch below and let me know whether it helps? This is
> the simplest thing I can think of to paper over the issue, but it
> isn't pretty, and I'm looking at possible alternatives (ideally, we'd
> be able to allocate the map outside of the irqdesc lock, but this
> requires some API change between KVM, the GICv4 layer and the ITS
> code).
>
> Note that I'm travelling for the next two weeks without access to my
> test rig, so I'm relying on you to test this stuff.
>
> Thanks,
>
> M.
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7577755bdcf4..18aa04b6a9f4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -142,7 +142,7 @@ struct event_lpi_map {
> u16 *col_map;
> irq_hw_number_t lpi_base;
> int nr_lpis;
> - struct mutex vlpi_lock;
> + raw_spinlock_t vlpi_lock;
> struct its_vm *vm;
> struct its_vlpi_map *vlpi_maps;
> int nr_vlpis;
> @@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> if (!info->map)
> return -EINVAL;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm) {
> struct its_vlpi_map *maps;
>
> maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
> - GFP_KERNEL);
> + GFP_ATOMIC);
> if (!maps) {
> ret = -ENOMEM;
> goto out;
> @@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm ||
> !its_dev->event_map.vlpi_maps[event].vm) {
> @@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> *info->map = its_dev->event_map.vlpi_maps[event];
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
> ret = -EINVAL;
> @@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
> dev->event_map.col_map = col_map;
> dev->event_map.lpi_base = lpi_base;
> dev->event_map.nr_lpis = nr_lpis;
> - mutex_init(&dev->event_map.vlpi_lock);
> + raw_spin_lock_init(&dev->event_map.vlpi_lock);
> dev->device_id = dev_id;
> INIT_LIST_HEAD(&dev->entry);
>
>


2019-05-06 09:23:25

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

On Sun, 05 May 2019 12:15:51 +0100,
Heyi Guo <[email protected]> wrote:
>
> BTW, since its_irq_set_vcpu_affinity() is already called in atomic
> context, do we really need the separate lock
> its_dev->event_map.vlpi_lock? I didn't find anything outside the
> its_irq_set_vcpu_affinity() call chain that acquires this lock.

The reason is that the vlpi_maps array covers the whole of the
generating device, and not just a single interrupt. Relying on the
irq_desc lock to protect the array wouldn't work, as you could still
have concurrent accesses to the array (map, unmap and get all access
the same data).
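
To make it concrete, here is a rough illustration (not the actual driver
code; the field names come from the event_lpi_map structure shown in the
patch above, and the lock is the raw spinlock from that patch) of the kind
of race the device-wide lock guards against:

/*
 * Illustration only: two events of the same its_dev run under *different*
 * irq_desc locks, yet both manipulate shared per-device state.
 */
static void event0_map(struct its_device *its_dev, struct its_vlpi_map *map)
{
        /* e.g. its_vlpi_map() on CPU0, holding the irq_desc lock of event 0 */
        raw_spin_lock(&its_dev->event_map.vlpi_lock);
        its_dev->event_map.vlpi_maps[0] = *map;         /* shared array   */
        its_dev->event_map.nr_vlpis++;                  /* shared counter */
        raw_spin_unlock(&its_dev->event_map.vlpi_lock);
}

static void event1_unmap(struct its_device *its_dev)
{
        /* e.g. its_vlpi_unmap() on CPU1, holding the irq_desc lock of event 1 */
        raw_spin_lock(&its_dev->event_map.vlpi_lock);
        if (!--its_dev->event_map.nr_vlpis)
                /* the last unmap tears down state the map path relies on */
                its_dev->event_map.vm = NULL;
        raw_spin_unlock(&its_dev->event_map.vlpi_lock);
}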

So one way or another, we need some form of mutual exclusion at this
level. I guess one of the design mistakes that we have in the current
code is that there is no "device wide" operation, and that we rely on
map/unmap to perform the allocations on demand in the low level code.

What we could potentially do would be to move this allocation higher
up in the stack, and track the first time an LPI is turned into a VLPI
at that level. That's an invasive change though...

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-05-08 12:34:43

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

On Sun, 05 May 2019 12:07:02 +0100,
Heyi Guo <[email protected]> wrote:
>
> Hi Marc,
>
> Appreciate your quick patch :) We'll test it and let you know the
> result.

For what it is worth, I've just pushed out a branch (irq/vlpi-map-rework)
with a more complete fix. It is of course untested (it does compile though).

Please let me know how this fares.

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-05-08 14:43:52

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

The bad news is that though your previous patch fixed the lockdep warnings, we can still reproduce soft lockup panics and some other exceptions... So our issue may not be related to this lock defect.

Most of the call traces are as below, stuck in smp_call_function_many:

[ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
[ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
[ 6862.668519] irq event stamp: 1670812
[ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
[ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
[ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
[ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
[ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
[ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
[ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
[ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
[ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
[ 6862.668563] sp : ffff000028f338e0
[ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
[ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
[ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
[ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
[ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
[ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
[ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
[ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
[ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
[ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
[ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
[ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
[ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
[ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
[ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
[ 6862.668689] Call trace:
[ 6862.668693] smp_call_function_many+0x360/0x3b8
[ 6862.668697] kvm_make_vcpus_request_mask+0x1dc/0x240
[ 6862.668701] kvm_make_all_cpus_request+0x70/0x98
[ 6862.668705] kvm_arm_halt_guest+0x70/0x80
[ 6862.668708] kvm_arch_irq_bypass_stop+0x24/0x30
[ 6862.668714] __disconnect+0x3c/0x90
[ 6862.668718] irq_bypass_unregister_producer+0xe0/0x138
[ 6862.668724] vfio_msi_set_vector_signal+0x9c/0x2a8
[ 6862.668727] vfio_msi_set_block+0x78/0x108
[ 6862.668730] vfio_pci_set_msi_trigger+0x238/0x2e0
[ 6862.668734] vfio_pci_set_irqs_ioctl+0xf8/0x140
[ 6862.668737] vfio_pci_ioctl+0x30c/0xed0
[ 6862.668743] vfio_device_fops_unl_ioctl+0x44/0x70
[ 6862.668748] do_vfs_ioctl+0xc4/0x7f0
[ 6862.668752] ksys_ioctl+0x8c/0xa0
[ 6862.668755] __arm64_sys_ioctl+0x28/0x38
[ 6862.668760] el0_svc_common+0xa0/0x190
[ 6862.668763] el0_svc_handler+0x38/0x78
[ 6862.668767] el0_svc+0x8/0xc
[ 6862.668771] Kernel panic - not syncing: softlockup: hung tasks
[ 6862.674729] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W L 4.19.36-1.2.141.aarch64 #1
[ 6862.685479] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
[ 6862.695346] Call trace:
[ 6862.697831] dump_backtrace+0x0/0x198
[ 6862.701562] show_stack+0x24/0x30
[ 6862.704940] dump_stack+0xd0/0x11c
[ 6862.708409] panic+0x138/0x324
[ 6862.711521] watchdog_timer_fn+0x344/0x348
[ 6862.715700] __run_hrtimer+0xfc/0x4d0
[ 6862.719431] __hrtimer_run_queues+0xd4/0x140
[ 6862.723785] hrtimer_interrupt+0xd8/0x250
[ 6862.727876] arch_timer_handler_phys+0x38/0x58
[ 6862.732410] handle_percpu_devid_irq+0xd0/0x3f8
[ 6862.737033] generic_handle_irq+0x34/0x50
[ 6862.741119] __handle_domain_irq+0x68/0xc0
[ 6862.745295] gic_handle_irq+0xf4/0x1e0
[ 6862.749117] el1_irq+0xc8/0x180
[ 6862.752316] smp_call_function_many+0x360/0x3b8
[ 6862.756937] kvm_make_vcpus_request_mask+0x1dc/0x240
[ 6862.762003] kvm_make_all_cpus_request+0x70/0x98
[ 6862.766714] kvm_arm_halt_guest+0x70/0x80
[ 6862.770805] kvm_arch_irq_bypass_stop+0x24/0x30
[ 6862.775428] __disconnect+0x3c/0x90
[ 6862.778982] irq_bypass_unregister_producer+0xe0/0x138
[ 6862.784228] vfio_msi_set_vector_signal+0x9c/0x2a8
[ 6862.789116] vfio_msi_set_block+0x78/0x108
[ 6862.793294] vfio_pci_set_msi_trigger+0x238/0x2e0
[ 6862.798093] vfio_pci_set_irqs_ioctl+0xf8/0x140
[ 6862.802714] vfio_pci_ioctl+0x30c/0xed0
[ 6862.809692] vfio_device_fops_unl_ioctl+0x44/0x70
[ 6862.817609] do_vfs_ioctl+0xc4/0x7f0
[ 6862.824336] ksys_ioctl+0x8c/0xa0
[ 6862.830680] __arm64_sys_ioctl+0x28/0x38
[ 6862.837507] el0_svc_common+0xa0/0x190
[ 6862.844116] el0_svc_handler+0x38/0x78
[ 6862.850624] el0_svc+0x8/0xc


Any idea is appreciated.

We will find some time and a board to test your new patch set, but right now our top priority is to debug the above issue, so it may take some time to get back with the test result. Sorry for that.

Thanks,
Heyi


On 2019/5/8 20:31, Marc Zyngier wrote:
> On Sun, 05 May 2019 12:07:02 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi Marc,
>>
>> Appreciate your quick patch :) We'll test it and let you know the
>> result.
> For what it is worth, I've just pushed out a branch (irq/vlpi-map-rework)
> with a more complete fix. It is of course untested (it does compile though).
>
> Please let me know how this fares.
>
> Thanks,
>
> M.
>


2019-05-09 07:50:05

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Heyi,

On Wed, 08 May 2019 14:01:48 +0100,
Heyi Guo <[email protected]> wrote:
>
> Hi Marc,
>
> The bad news is that though your previous patch fixed the lockdep
> warnings, we can still reproduce soft lockup panics and some other
> exceptions... So our issue may not be related with this lock defect.
>
> Most of the call traces are as below, stuck in smp_call_function_many:
>
> [ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
> [ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
> [ 6862.668519] irq event stamp: 1670812
> [ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
> [ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
> [ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
> [ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
> [ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
> [ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
> [ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
> [ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
> [ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
> [ 6862.668563] sp : ffff000028f338e0
> [ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
> [ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
> [ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
> [ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
> [ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
> [ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
> [ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
> [ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
> [ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
> [ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
> [ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
> [ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
> [ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
> [ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
> [ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
> [ 6862.668689] Call trace:
> [ 6862.668693] smp_call_function_many+0x360/0x3b8

This would tend to indicate that one of the CPUs isn't responding to
the IPI because it has its interrupts disabled, or has crashed badly
already. Can you check where in smp_call_function_many this is
hanging? My bet is on the wait loop at the end of the function.
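
For reference, the tail of that function looks roughly like this in 4.19
(paraphrased from kernel/smp.c, not the exact source):

/*
 * With wait == true the caller spins until every target CPU has run the
 * IPI handler and released its csd. A CPU that never takes the IPI
 * (interrupts masked, or already wedged) keeps the caller here until the
 * soft-lockup watchdog fires.
 */
if (wait) {
        for_each_cpu(cpu, cfd->cpumask) {
                call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);

                /* spins on csd->flags until the target CPU clears CSD_FLAG_LOCK */
                csd_lock_wait(csd);
        }
}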

You'll need to find out what this unresponsive CPU is doing...

> Any idea is appreciated.
>
> We will find some time and board to test your new patch set, but
> right now our top priority is to debug the above issue, so it may
> take some time to get back with the test result. Sorry for that.

No worries, that can wait.

M.

--
Jazz is not dead, it just smells funny.

2019-05-09 11:39:50

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred



On 2019/5/9 15:48, Marc Zyngier wrote:
> Hi Heyi,
>
> On Wed, 08 May 2019 14:01:48 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi Marc,
>>
>> The bad news is that though your previous patch fixed the lockdep
>> warnings, we can still reproduce soft lockup panics and some other
>> exceptions... So our issue may not be related with this lock defect.
>>
>> Most of the call traces are as below, stuck in smp_call_function_many:
>>
>> [ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
>> [ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
>> [ 6862.668519] irq event stamp: 1670812
>> [ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
>> [ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
>> [ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
>> [ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
>> [ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
>> [ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
>> [ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
>> [ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
>> [ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
>> [ 6862.668563] sp : ffff000028f338e0
>> [ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
>> [ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
>> [ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
>> [ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
>> [ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
>> [ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
>> [ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
>> [ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
>> [ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
>> [ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
>> [ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
>> [ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
>> [ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
>> [ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
>> [ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
>> [ 6862.668689] Call trace:
>> [ 6862.668693] smp_call_function_many+0x360/0x3b8
> This would tend to indicate that one of the CPUs isn't responding to
> the IPI because it has its interrupts disabled, or has crashed badly
> already. Can you check where in smp_call_function_many this is
> hanging? My bet is on the wait loop at the end of the function.

Yes.

>
> You'll need to find out what this unresponsive CPU is doing...

True; we need to dig more deeply...

Appreciate it.

Heyi

>
>> Any idea is appreciated.
>>
>> We will find some time and board to test your new patch set, but
>> right now our top priority is to debug the above issue, so it may
>> take some time to get back with the test result. Sorry for that.
> No worries, that can wait.
>
> M.
>


2019-07-13 11:10:04

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

Really sorry for the delay in testing the rework patches. I picked up
the work these days and applied the patches to our 4.19.36 stable
branch. However, I got the below panic during the boot process of the
host (guests not yet booted).

I suppose the result is not related to my testing kernel version, since
we don't have many differences in the ITS driver; I can test against
mainline if you think it is necessary.

Thanks,

Heyi


[ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
[ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
[ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
[ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
[ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
[ 17.026467] Mem abort info:
[ 17.029251] ESR = 0x96000004
[ 17.032313] Exception class = DABT (current EL), IL = 32 bits
[ 17.038207] SET = 0, FnV = 0
[ 17.041258] EA = 0, S1PTW = 0
[ 17.044391] Data abort info:
[ 17.047260] ISV = 0, ISS = 0x00000004
[ 17.051081] CM = 0, WnR = 0
[ 17.054035] [00686361635f746f] address between user and kernel address ranges
[ 17.061140] Internal error: Oops: 96000004 [#1] SMP
[ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
[ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
[ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
[ 17.086788] Workqueue: events work_for_cpu_fn
[ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
[ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
[ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
[ 17.105429] sp : ffff00002920ba00
[ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
[ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
[ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
[ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
[ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
[ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
[ 17.140448] x17: 000000000000000e x16: 0000000000000007
[ 17.145734] x15: ffff000009119708 x14: 0000000000000000
[ 17.151021] x13: 0000000000000003 x12: 0000000000000000
[ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
[ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
[ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
[ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
[ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
[ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
[ 17.188027] Call trace:
[ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
[ 17.194886] kvasprintf+0x7c/0x108
[ 17.198271] kasprintf+0x60/0x80
[ 17.201488] populate_msi_sysfs+0xe4/0x250
[ 17.205564] __pci_enable_msi_range+0x278/0x450
[ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
[ 17.215188] pcie_port_device_register+0x134/0x558
[ 17.219955] pcie_portdrv_probe+0x3c/0xf0
[ 17.223947] local_pci_probe+0x44/0xa8
[ 17.227679] work_for_cpu_fn+0x20/0x30
[ 17.231411] process_one_work+0x1b4/0x3f8
[ 17.235401] worker_thread+0x210/0x470
[ 17.239134] kthread+0x134/0x138
[ 17.242348] ret_from_fork+0x10/0x18
[ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
[ 17.251970] kernel fault(0x1) notification starting on CPU 0
[ 17.257602] kernel fault(0x1) notification finished on CPU 0
[ 17.263234] Modules linked in:
[ 17.266277] ---[ end trace 023e6b19cb68b94f ]---



On 2019/5/9 15:48, Marc Zyngier wrote:
> Hi Heyi,
>
> On Wed, 08 May 2019 14:01:48 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi Marc,
>>
>> The bad news is that though your previous patch fixed the lockdep
>> warnings, we can still reproduce soft lockup panics and some other
>> exceptions... So our issue may not be related with this lock defect.
>>
>> Most of the call traces are as below, stuck in smp_call_function_many:
>>
>> [ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
>> [ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
>> [ 6862.668519] irq event stamp: 1670812
>> [ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
>> [ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
>> [ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
>> [ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
>> [ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
>> [ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
>> [ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
>> [ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
>> [ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
>> [ 6862.668563] sp : ffff000028f338e0
>> [ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
>> [ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
>> [ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
>> [ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
>> [ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
>> [ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
>> [ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
>> [ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
>> [ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
>> [ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
>> [ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
>> [ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
>> [ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
>> [ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
>> [ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
>> [ 6862.668689] Call trace:
>> [ 6862.668693] smp_call_function_many+0x360/0x3b8
> This would tend to indicate that one of the CPUs isn't responding to
> the IPI because it has its interrupts disabled, or has crashed badly
> already. Can you check where in smp_call_function_many this is
> hanging? My bet is on the wait loop at the end of the function.
>
> You'll need to find out what this unresponsive CPU is doing...
>
>> Any idea is appreciated.
>>
>> We will find some time and board to test your new patch set, but
>> right now our top priority is to debug the above issue, so it may
>> take some time to get back with the test result. Sorry for that.
> No worries, that can wait.
>
> M.
>


2019-07-13 11:40:45

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

On Sat, 13 Jul 2019 19:08:57 +0800
Guoheyi <[email protected]> wrote:

Hi Heyi,

> Hi Marc,
>
> Really sorry for the delay in testing the rework patches. I picked up
> the work these days and applied the patches to our 4.19.36 stable
> branch. However, I got the below panic during the boot process of the
> host (guests not yet booted).
>
> I suppose the result is not related to my testing kernel version,
> since we don't have many differences in the ITS driver; I can test
> against mainline if you think it is necessary.

In general, please report bugs against mainline. There isn't much I can
do about your private tree...

That being said, a couple of comments below.

> Thanks,
>
> Heyi
>
>
> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
> [ 17.026467] Mem abort info:
> [ 17.029251] ESR = 0x96000004
> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
> [ 17.038207] SET = 0, FnV = 0
> [ 17.041258] EA = 0, S1PTW = 0
> [ 17.044391] Data abort info:
> [ 17.047260] ISV = 0, ISS = 0x00000004
> [ 17.051081] CM = 0, WnR = 0
> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
> [ 17.086788] Workqueue: events work_for_cpu_fn
> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
> [ 17.105429] sp : ffff00002920ba00
> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
> [ 17.188027] Call trace:
> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
> [ 17.194886] kvasprintf+0x7c/0x108
> [ 17.198271] kasprintf+0x60/0x80
> [ 17.201488] populate_msi_sysfs+0xe4/0x250
> [ 17.205564] __pci_enable_msi_range+0x278/0x450
> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
> [ 17.215188] pcie_port_device_register+0x134/0x558
> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
> [ 17.223947] local_pci_probe+0x44/0xa8
> [ 17.227679] work_for_cpu_fn+0x20/0x30
> [ 17.231411] process_one_work+0x1b4/0x3f8
> [ 17.235401] worker_thread+0x210/0x470
> [ 17.239134] kthread+0x134/0x138
> [ 17.242348] ret_from_fork+0x10/0x18
> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
> [ 17.263234] Modules linked in:
> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---

What in this trace makes you think that this has anything to do with an
ITS change? The system crashes in a completely unrelated piece of code.
Also, if you look at the VA that indicates the crash, it should be
obvious that this isn't a kernel address. Worse, this is a piece of
ASCII text:

$ echo 00686361635f746f | xxd -r -p
hcac_to

This tends to indicate some form of memory corruption ("hcac_to" looks
like the beginning of a symbol), and I'm not sure how the ITS would be
involved in this... Furthermore, this happens on the host at boot time,
while the patch you suspect only affects VMs...

I think you need to spend more time analysing this issue.

Thanks,

M.
--
Without deviation from the norm, progress is not possible.

2019-07-15 06:33:26

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

The issue only occurs after applying the vlpi_map_rework patches, and we
can see the patches are mainly about VMs; however, they change
its_create_device() a little, so they may affect host booting in some
ways. That's why I took the lazy way of sending this out for some insights.

I suspect the code below; what will happen if alloc_lpis == false?
Anyway, I will investigate this further.


if (alloc_lpis) {
        lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
        if (lpi_map)
                col_map = kcalloc(nr_lpis, sizeof(*col_map),
                                  GFP_KERNEL);
} else {
        col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
        nr_lpis = 0;
        lpi_base = 0;
}
if (its->is_v4)
        vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);

if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
    (!vlpi_map && its->is_v4)) {
        kfree(dev);
        kfree(itt);
        kfree(lpi_map);
        kfree(col_map);
        kfree(vlpi_map);
        return NULL;
}
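
For reference, my current reading of the alloc_lpis == false branch (just
an observation, not a confirmed root cause):

/*
 * Observation only: with alloc_lpis == false we end up with
 *
 *         nr_lpis  = 0;
 *         vlpi_map = kcalloc(0, sizeof(*vlpi_map), GFP_KERNEL);
 *
 * and kcalloc(0, ...) returns ZERO_SIZE_PTR, which is non-NULL, so the
 * "(!vlpi_map && its->is_v4)" check does not trigger; the device is then
 * created with nr_lpis == 0 and a zero-length vlpi_map.
 */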


Thanks,

Heyi


On 2019/7/13 19:37, Marc Zyngier wrote:
> On Sat, 13 Jul 2019 19:08:57 +0800
> Guoheyi <[email protected]> wrote:
>
> Hi Heyi,
>
>> Hi Marc,
>>
>> Really sorry for the delay of testing the rework patches. I picked up
>> the work these days and applied the patches to our 4.19.36 stable
>> branch. However, I got below panic during the boot process of host
>> (not yet to boot guests).
>>
>> I supposed the result was not related with my testing kernel version,
>> for we don't have many differences in ITS driver; I can test against
>> mainline if you think it is necessary.
> In general, please report bugs against mainline. There isn't much I can
> do about your private tree...
>
> That being said, a couple of comments below.
>
>> Thanks,
>>
>> Heyi
>>
>>
>> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
>> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
>> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
>> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
>> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
>> [ 17.026467] Mem abort info:
>> [ 17.029251] ESR = 0x96000004
>> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
>> [ 17.038207] SET = 0, FnV = 0
>> [ 17.041258] EA = 0, S1PTW = 0
>> [ 17.044391] Data abort info:
>> [ 17.047260] ISV = 0, ISS = 0x00000004
>> [ 17.051081] CM = 0, WnR = 0
>> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
>> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
>> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
>> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
>> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
>> [ 17.086788] Workqueue: events work_for_cpu_fn
>> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
>> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
>> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
>> [ 17.105429] sp : ffff00002920ba00
>> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
>> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
>> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
>> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
>> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
>> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
>> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
>> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
>> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
>> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
>> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
>> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
>> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
>> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
>> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
>> [ 17.188027] Call trace:
>> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
>> [ 17.194886] kvasprintf+0x7c/0x108
>> [ 17.198271] kasprintf+0x60/0x80
>> [ 17.201488] populate_msi_sysfs+0xe4/0x250
>> [ 17.205564] __pci_enable_msi_range+0x278/0x450
>> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
>> [ 17.215188] pcie_port_device_register+0x134/0x558
>> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
>> [ 17.223947] local_pci_probe+0x44/0xa8
>> [ 17.227679] work_for_cpu_fn+0x20/0x30
>> [ 17.231411] process_one_work+0x1b4/0x3f8
>> [ 17.235401] worker_thread+0x210/0x470
>> [ 17.239134] kthread+0x134/0x138
>> [ 17.242348] ret_from_fork+0x10/0x18
>> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
>> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
>> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
>> [ 17.263234] Modules linked in:
>> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---
> What in this trace makes you think that this has anything to do with an
> ITS change? The system crashes in a completely unrelated piece of code.
> Also, if you look at the VA that indicates the crash, it should be
> obvious that this isn't a kernel address. Worse, this is a piece of
> ASCII text:
>
> $ echo 00686361635f746f | xxd -r -p
> hcac_to
>
> This tends to indicate some memory form of corruption ("hcac_to" looks
> like the begining of a symbol), and I'm not sure how the ITS would be
> involved in this... Furthermore, this happens on the host at boot time,
> while the patch you suspect only affects VMs...
>
> I think you need to spend more time analysing this issue.
>
> Thanks,
>
> M.


2019-07-15 09:11:12

by Marc Zyngier

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred

On 15/07/2019 07:32, Guoheyi wrote:
> Hi Marc,
>
> The issue only occurs after applying the vlpi_map_rework patches, and we
> can see the patches only affect VM; it changes its_create_device() a
> little so it may affect host booting in some ways, so I took the lazy
> way to send it out for some insights.
>
> I am suspecting below code; if alloc_lpis == false, what will happen?

If !alloc_lpis, then we don't allocate the lpi_map, which is the
intended effect.
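
To spell it out (an annotated reading of that path, not the literal driver
code; it relies on kcalloc() with a zero count returning the non-NULL
ZERO_SIZE_PTR, which kfree() silently ignores):

    /* !alloc_lpis: no new LPIs are allocated for this device */
    col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
    nr_lpis = 0;
    lpi_base = 0;
    /* lpi_map deliberately stays NULL */

    if (its->is_v4)
        /* nr_lpis == 0, so this returns ZERO_SIZE_PTR, which is non-NULL */
        vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);

    /*
     * The bail-out condition only rejects a NULL lpi_map when alloc_lpis
     * is true, so this path is not treated as an allocation failure, and
     * kfree() on NULL or ZERO_SIZE_PTR in the error path is a no-op.
     */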

> Anyway, I will investigate more on this.
>
>
> if (alloc_lpis) {
> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
> if (lpi_map)
> col_map = kcalloc(nr_lpis, sizeof(*col_map),
> GFP_KERNEL);
> } else {
> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
> nr_lpis = 0;
> lpi_base = 0;
> }
> if (its->is_v4)
> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>
> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
> (!vlpi_map && its->is_v4)) {
> kfree(dev);
> kfree(itt);
> kfree(lpi_map);
> kfree(col_map);
> kfree(vlpi_map);
> return NULL;
> }

How does this relate to the patch posted in this discussion? The
proposed changes turn the locking from a mutex into a raw_spinlock.
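
To recap for the thread, the shape of that change is simply the following
(a from-memory sketch with made-up structure and function names, not the
actual patch): the lock protecting the per-device vLPI mapping stops being
a mutex, which can sleep and therefore cannot be taken from IRQ-safe
paths, and becomes a raw spinlock.

#include <linux/mutex.h>
#include <linux/spinlock.h>

struct vlpi_map_state_old {
    struct mutex    vlpi_lock;  /* sleeping lock: unusable in atomic context */
};

struct vlpi_map_state_new {
    raw_spinlock_t  vlpi_lock;  /* can be taken with interrupts disabled */
};

static void update_map_old(struct vlpi_map_state_old *s)
{
    mutex_lock(&s->vlpi_lock);
    /* ... modify the vLPI mapping ... */
    mutex_unlock(&s->vlpi_lock);
}

static void update_map_new(struct vlpi_map_state_new *s)
{
    unsigned long flags;

    raw_spin_lock_irqsave(&s->vlpi_lock, flags);
    /* ... modify the vLPI mapping ... */
    raw_spin_unlock_irqrestore(&s->vlpi_lock, flags);
}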

That's not to say there is no bug in the GICv4 code, but I'd expect a
bit more analysis before you start pointing at random pieces of code
without establishing any link between effects and possible causes.

Thanks,

M.
>
>
> Thanks,
>
> Heyi
>
>
> On 2019/7/13 19:37, Marc Zyngier wrote:
>> On Sat, 13 Jul 2019 19:08:57 +0800
>> Guoheyi <[email protected]> wrote:
>>
>> Hi Heyi,
>>
>>> Hi Marc,
>>>
>>> Really sorry for the delay of testing the rework patches. I picked up
>>> the work these days and applied the patches to our 4.19.36 stable
>>> branch. However, I got below panic during the boot process of host
>>> (not yet to boot guests).
>>>
>>> I supposed the result was not related with my testing kernel version,
>>> for we don't have many differences in ITS driver; I can test against
>>> mainline if you think it is necessary.
>> In general, please report bugs against mainline. There isn't much I can
>> do about your private tree...
>>
>> That being said, a couple of comments below.
>>
>>> Thanks,
>>>
>>> Heyi
>>>
>>>
>>> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
>>> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
>>> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
>>> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
>>> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
>>> [ 17.026467] Mem abort info:
>>> [ 17.029251] ESR = 0x96000004
>>> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
>>> [ 17.038207] SET = 0, FnV = 0
>>> [ 17.041258] EA = 0, S1PTW = 0
>>> [ 17.044391] Data abort info:
>>> [ 17.047260] ISV = 0, ISS = 0x00000004
>>> [ 17.051081] CM = 0, WnR = 0
>>> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
>>> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
>>> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
>>> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
>>> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
>>> [ 17.086788] Workqueue: events work_for_cpu_fn
>>> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
>>> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
>>> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
>>> [ 17.105429] sp : ffff00002920ba00
>>> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
>>> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
>>> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
>>> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
>>> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
>>> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
>>> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
>>> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
>>> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
>>> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
>>> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
>>> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
>>> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
>>> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
>>> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
>>> [ 17.188027] Call trace:
>>> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
>>> [ 17.194886] kvasprintf+0x7c/0x108
>>> [ 17.198271] kasprintf+0x60/0x80
>>> [ 17.201488] populate_msi_sysfs+0xe4/0x250
>>> [ 17.205564] __pci_enable_msi_range+0x278/0x450
>>> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
>>> [ 17.215188] pcie_port_device_register+0x134/0x558
>>> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
>>> [ 17.223947] local_pci_probe+0x44/0xa8
>>> [ 17.227679] work_for_cpu_fn+0x20/0x30
>>> [ 17.231411] process_one_work+0x1b4/0x3f8
>>> [ 17.235401] worker_thread+0x210/0x470
>>> [ 17.239134] kthread+0x134/0x138
>>> [ 17.242348] ret_from_fork+0x10/0x18
>>> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
>>> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
>>> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
>>> [ 17.263234] Modules linked in:
>>> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---
>> What in this trace makes you think that this has anything to do with an
>> ITS change? The system crashes in a completely unrelated piece of code.
>> Also, if you look at the VA that indicates the crash, it should be
>> obvious that this isn't a kernel address. Worse, this is a piece of
>> ASCII text:
>>
>> $ echo 00686361635f746f | xxd -r -p
>> hcac_to
>>
>> This tends to indicate some memory form of corruption ("hcac_to" looks
>> like the begining of a symbol), and I'm not sure how the ITS would be
>> involved in this... Furthermore, this happens on the host at boot time,
>> while the patch you suspect only affects VMs...
>>
>> I think you need to spend more time analysing this issue.
>>
>> Thanks,
>>
>> M.
>
>


--
Jazz is not dead. It just smells funny...

2019-07-15 10:44:43

by Heyi Guo

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred



On 2019/7/15 17:07, Marc Zyngier wrote:
> On 15/07/2019 07:32, Guoheyi wrote:
>> Hi Marc,
>>
>> The issue only occurs after applying the vlpi_map_rework patches, and we
>> can see the patches only affect VM; it changes its_create_device() a
>> little so it may affect host booting in some ways, so I took the lazy
>> way to send it out for some insights.
>>
>> I am suspecting below code; if alloc_lpis == false, what will happen?
> If !alloc_lpis, then we don't allocate the lpi_map, which is the
> intended effect.
>
>> Anyway, I will investigate more on this.
>>
>>
>> if (alloc_lpis) {
>> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
>> if (lpi_map)
>> col_map = kcalloc(nr_lpis, sizeof(*col_map),
>> GFP_KERNEL);
>> } else {
>> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
>> nr_lpis = 0;
>> lpi_base = 0;
>> }
>> if (its->is_v4)
>> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>>
>> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
>> (!vlpi_map && its->is_v4)) {
>> kfree(dev);
>> kfree(itt);
>> kfree(lpi_map);
>> kfree(col_map);
>> kfree(vlpi_map);
>> return NULL;
>> }
> How does this relate to the patch posted in this discussion? The
> proposed changes turn the locking from a mutex into a raw_spinlock.

I'm testing the patchset in
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/vlpi-map-rework,
not only the patch posted directly in this mail thread. The first patch
"irqchip/gic-v3-its: Make vlpi_map allocations atomic" works well in
our internal tree, and my new testing is against the other 3 patches in
your vlpi-map-rework branch, as I promised. I'm sorry if I didn't state
this clearly.

Heyi
>
> That's not to say there is no bug in the GICv4 code, but I'd expect a
> bit more analysis before you start pointing at random pieces of code
> without establishing any link between effects and possible causes.
>
> Thanks,
>
> M.
>>
>> Thanks,
>>
>> Heyi
>>
>>
>> On 2019/7/13 19:37, Marc Zyngier wrote:
>>> On Sat, 13 Jul 2019 19:08:57 +0800
>>> Guoheyi <[email protected]> wrote:
>>>
>>> Hi Heyi,
>>>
>>>> Hi Marc,
>>>>
>>>> Really sorry for the delay of testing the rework patches. I picked up
>>>> the work these days and applied the patches to our 4.19.36 stable
>>>> branch. However, I got below panic during the boot process of host
>>>> (not yet to boot guests).
>>>>
>>>> I supposed the result was not related with my testing kernel version,
>>>> for we don't have many differences in ITS driver; I can test against
>>>> mainline if you think it is necessary.
>>> In general, please report bugs against mainline. There isn't much I can
>>> do about your private tree...
>>>
>>> That being said, a couple of comments below.
>>>
>>>> Thanks,
>>>>
>>>> Heyi
>>>>
>>>>
>>>> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
>>>> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
>>>> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
>>>> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
>>>> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
>>>> [ 17.026467] Mem abort info:
>>>> [ 17.029251] ESR = 0x96000004
>>>> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
>>>> [ 17.038207] SET = 0, FnV = 0
>>>> [ 17.041258] EA = 0, S1PTW = 0
>>>> [ 17.044391] Data abort info:
>>>> [ 17.047260] ISV = 0, ISS = 0x00000004
>>>> [ 17.051081] CM = 0, WnR = 0
>>>> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
>>>> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
>>>> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
>>>> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
>>>> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
>>>> [ 17.086788] Workqueue: events work_for_cpu_fn
>>>> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
>>>> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
>>>> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
>>>> [ 17.105429] sp : ffff00002920ba00
>>>> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
>>>> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
>>>> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
>>>> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
>>>> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
>>>> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
>>>> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
>>>> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
>>>> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
>>>> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
>>>> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
>>>> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
>>>> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
>>>> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
>>>> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
>>>> [ 17.188027] Call trace:
>>>> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
>>>> [ 17.194886] kvasprintf+0x7c/0x108
>>>> [ 17.198271] kasprintf+0x60/0x80
>>>> [ 17.201488] populate_msi_sysfs+0xe4/0x250
>>>> [ 17.205564] __pci_enable_msi_range+0x278/0x450
>>>> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
>>>> [ 17.215188] pcie_port_device_register+0x134/0x558
>>>> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
>>>> [ 17.223947] local_pci_probe+0x44/0xa8
>>>> [ 17.227679] work_for_cpu_fn+0x20/0x30
>>>> [ 17.231411] process_one_work+0x1b4/0x3f8
>>>> [ 17.235401] worker_thread+0x210/0x470
>>>> [ 17.239134] kthread+0x134/0x138
>>>> [ 17.242348] ret_from_fork+0x10/0x18
>>>> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
>>>> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
>>>> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
>>>> [ 17.263234] Modules linked in:
>>>> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---
>>> What in this trace makes you think that this has anything to do with an
>>> ITS change? The system crashes in a completely unrelated piece of code.
>>> Also, if you look at the VA that indicates the crash, it should be
>>> obvious that this isn't a kernel address. Worse, this is a piece of
>>> ASCII text:
>>>
>>> $ echo 00686361635f746f | xxd -r -p
>>> hcac_to
>>>
>>> This tends to indicate some memory form of corruption ("hcac_to" looks
>>> like the begining of a symbol), and I'm not sure how the ITS would be
>>> involved in this... Furthermore, this happens on the host at boot time,
>>> while the patch you suspect only affects VMs...
>>>
>>> I think you need to spend more time analysing this issue.
>>>
>>> Thanks,
>>>
>>> M.
>>
>


2019-07-15 11:14:41

by Marc Zyngier

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred

On 15/07/2019 11:43, Guoheyi wrote:
>
>
> On 2019/7/15 17:07, Marc Zyngier wrote:
>> On 15/07/2019 07:32, Guoheyi wrote:
>>> Hi Marc,
>>>
>>> The issue only occurs after applying the vlpi_map_rework patches, and we
>>> can see the patches only affect VM; it changes its_create_device() a
>>> little so it may affect host booting in some ways, so I took the lazy
>>> way to send it out for some insights.
>>>
>>> I am suspecting below code; if alloc_lpis == false, what will happen?
>> If !alloc_lpis, then we don't allocate the lpi_map, which is the
>> intended effect.
>>
>>> Anyway, I will investigate more on this.
>>>
>>>
>>> if (alloc_lpis) {
>>> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
>>> if (lpi_map)
>>> col_map = kcalloc(nr_lpis, sizeof(*col_map),
>>> GFP_KERNEL);
>>> } else {
>>> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
>>> nr_lpis = 0;
>>> lpi_base = 0;
>>> }
>>> if (its->is_v4)
>>> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>>>
>>> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
>>> (!vlpi_map && its->is_v4)) {
>>> kfree(dev);
>>> kfree(itt);
>>> kfree(lpi_map);
>>> kfree(col_map);
>>> kfree(vlpi_map);
>>> return NULL;
>>> }
>> How does this relate to the patch posted in this discussion? The
>> proposed changes turn the locking from a mutex into a raw_spinlock.
>
> I'm testing the patchset in
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/vlpi-map-rework,
> not only the patch posted in the mail directly. The first patch
> *"**irqchip/gic-v3-its: Make vlpi_map allocations atomic" works well in
> our internal tree, and my new testing is against the other 3 patches in
> your vlpi-map-rework branch, as I promised. I'm sorry if I didn't state
> this clearly.

Ah, I had completely forgotten about this branch. As I said, it is
completely untested. I'll see if I can get some brain bandwidth in the
next couple of weeks to get back to it...

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2019-07-15 12:50:19

by Heyi Guo

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred


On 2019/7/15 19:13, Marc Zyngier wrote:
> On 15/07/2019 11:43, Guoheyi wrote:
>>
>> On 2019/7/15 17:07, Marc Zyngier wrote:
>>> On 15/07/2019 07:32, Guoheyi wrote:
>>>> Hi Marc,
>>>>
>>>> The issue only occurs after applying the vlpi_map_rework patches, and we
>>>> can see the patches only affect VM; it changes its_create_device() a
>>>> little so it may affect host booting in some ways, so I took the lazy
>>>> way to send it out for some insights.
>>>>
>>>> I am suspecting below code; if alloc_lpis == false, what will happen?
>>> If !alloc_lpis, then we don't allocate the lpi_map, which is the
>>> intended effect.
>>>
>>>> Anyway, I will investigate more on this.
>>>>
>>>>
>>>> if (alloc_lpis) {
>>>> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
>>>> if (lpi_map)
>>>> col_map = kcalloc(nr_lpis, sizeof(*col_map),
>>>> GFP_KERNEL);
>>>> } else {
>>>> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
>>>> nr_lpis = 0;
>>>> lpi_base = 0;
>>>> }
>>>> if (its->is_v4)
>>>> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>>>>
>>>> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
>>>> (!vlpi_map && its->is_v4)) {
>>>> kfree(dev);
>>>> kfree(itt);
>>>> kfree(lpi_map);
>>>> kfree(col_map);
>>>> kfree(vlpi_map);
>>>> return NULL;
>>>> }
>>> How does this relate to the patch posted in this discussion? The
>>> proposed changes turn the locking from a mutex into a raw_spinlock.
>> I'm testing the patchset in
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/vlpi-map-rework,
>> not only the patch posted in the mail directly. The first patch
>> *"**irqchip/gic-v3-its: Make vlpi_map allocations atomic" works well in
>> our internal tree, and my new testing is against the other 3 patches in
>> your vlpi-map-rework branch, as I promised. I'm sorry if I didn't state
>> this clearly.
> Ah, I had completely forgot about this branch. As I said, it is
> completely untested. I'll see if I can get some brain bandwidth in the
> next couple of weeks to get back to it...
Yes, a bit too long ago...

And finally I found that the panic is caused by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/vlpi-map-rework&id=fe3dd7e06ee0e82bade4f2a107ef6422e5c9021e

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 18aa04b..6f55886 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2458,6 +2458,8 @@ static void its_free_device(struct its_device *its_dev)
 	list_del(&its_dev->entry);
 	raw_spin_unlock_irqrestore(&its_dev->its->lock, flags);
 	kfree(its_dev->itt);
+	kfree(its_dev->event_map.lpi_map);
+	kfree(its_dev->event_map.col_map);
 	kfree(its_dev);
 }

This patch causes a double free of both lpi_map and col_map in
its_irq_domain_free():

    if (!its_dev->shared &&
        bitmap_empty(its_dev->event_map.lpi_map,
                     its_dev->event_map.nr_lpis)) {
        its_lpi_free(its_dev->event_map.lpi_map,   /* ----> event_map.lpi_map is freed here */
                     its_dev->event_map.lpi_base,
                     its_dev->event_map.nr_lpis);
        kfree(its_dev->event_map.col_map);         /* ----> event_map.col_map is freed here */

        /* Unmap device/itt */
        its_send_mapd(its_dev, 0);
        its_free_device(its_dev);                  /* ----> lpi_map and col_map are freed again */
    }
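
If it is of any use, the simplest workaround I can see (just an untested
sketch on top of your branch, not a proper patch) is to clear the pointers
in its_irq_domain_free() once they have been released, so that the extra
kfree() calls added to its_free_device() become harmless no-ops:

        its_lpi_free(its_dev->event_map.lpi_map,
                     its_dev->event_map.lpi_base,
                     its_dev->event_map.nr_lpis);
        its_dev->event_map.lpi_map = NULL;      /* already freed by its_lpi_free() */

        kfree(its_dev->event_map.col_map);
        its_dev->event_map.col_map = NULL;      /* avoid the second kfree() */

        /* Unmap device/itt */
        its_send_mapd(its_dev, 0);
        its_free_device(its_dev);               /* kfree(NULL) is a no-op */

Alternatively, dropping the two kfree() calls from its_free_device() again
would have the same effect; I will leave the proper fix to you.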

Thanks,

Heyi

>
> Thanks,
>
> M.