2024-02-13 09:38:01

by Oliver Upton

Subject: [PATCH v2 00/23] KVM: arm64: Improvements to LPI injection

For full details on the what/why, please see the cover letter in v1.

Apologies for the delay on v2; I wanted to spend some time getting a
microbenchmark in place to slam the ITS code pretty hard, and based on
the results I'm glad I did.

The test is built around having vCPU threads and device threads, where
each device can signal a particular number of event IDs (alien,
right?).

Anyway, here are the results of that test in some carefully-selected
examples:

+----------------------------+---------------------+------------------------------+
|           Config           | v6.8-rc1 (LPIs/sec) | v6.8-rc1 + series (LPIs/sec) |
+----------------------------+---------------------+------------------------------+
| -v 1 -d 1 -e 1 -i 1000000  |           780151.37 |                   1255291.00 |
| -v 16 -d 16 -e 16 -i 10000 |           437081.55 |                   5078225.70 |
| -v 16 -d 16 -e 17 -i 10000 |           506446.50 |                   1126118.10 |
| -v 64 -d 64 -e 1 -i 100000 |           295097.03 |                   5565964.87 |
| -v 1 -d 1 -e 17 -i 1000    |           732934.43 |                       149.24 |
+----------------------------+---------------------+------------------------------+

While there is an 18x improvement in the scaled-out config (64 vCPUs, 64
devices, 1 event per device), there is an extremely disappointing 4911x
regression in the example that effectively forces a cache eviction for
every lookup.

Clearly, RCU synchronization is the bottleneck in this case. I think
other scenarios where the cache is overcommitted (16 vCPUs, 16
devices, 17 events / device) are able to hide the effects somewhat, as
some threads can make forward progress while others are stuck waiting
on RCU.

A few ideas on next steps:

1) Rework the lpi_list_lock as an rwlock. This would obviate the need
for RCU protection in the LPI cache as well as memory allocations on
the injection path. This is actually what I had in the internal
version of the series, although it was very incomplete.

I'd expect this to nullify the improvement on the
slightly-overcommitted case and 'fix' the pathological case.

2) call_rcu() and move on (a rough sketch follows this list). This
feels somewhat abusive of the API, as the guest can flood the host
with RCU callbacks, but I wasn't able to make my machine fall over
in any mean configuration of the test.

I haven't studied the degree to which such a malicious VM could
adversely affect neighboring workloads.

3) Redo the whole ITS representation with xarrays and allow RCU
readers outside of the ITS lock (reader side sketched after this
list). I haven't fully thought this out yet; if we pursue this
option we will need a secondary data structure to track where ITSes
have been placed in guest memory, so as to avoid taking the SRCU
lock. We can then stick RCU synchronization in ITS command
processing, which feels right to me, and dump the translation cache
altogether.

I'd expect slightly worse average case performance in favor of more
consistent performance.
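
To make (2) and (3) a bit more concrete, here are rough sketches (not
part of this series). For (2), assuming an rcu_head is added to
struct vgic_translation_cache_entry, eviction could defer the free
rather than block on a grace period:

  static void vgic_its_cache_entry_free(struct rcu_head *rcu)
  {
          struct vgic_translation_cache_entry *cte;

          cte = container_of(rcu, struct vgic_translation_cache_entry, rcu);
          kfree(cte);
  }

  /* on eviction, instead of waiting out readers before freeing: */
  list_del_rcu(&victim->entry);
  call_rcu(&victim->rcu, vgic_its_cache_entry_free);

For (3), the injection path would boil down to an RCU-protected xarray
lookup with no separate cache in front (the xarray name and index are
hypothetical; it'd be keyed on whatever the ITS ends up using to
represent a translation):

  rcu_read_lock();
  irq = xa_load(&its->translations, index);
  if (irq && !vgic_try_get_irq_kref(irq))
          irq = NULL;
  rcu_read_unlock();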

Even though it is more work, I'm slightly in favor of (3), as it is a
net reduction in the overall complexity of the ITS implementation. But
I wanted to send out what I had to gauge opinions on these options,
and get feedback on the first 10 patches, which are an overall win.

v1: https://lore.kernel.org/kvmarm/[email protected]/

v1 -> v2:
- Add the microbenchmark
- Add tracepoints / VM stats for the important bits of LPI injection.
This was extremely useful for making sense of test results.
- Fix a silly lock imbalance on error path in vgic_add_lpi() (Dan)
- Constrain xas_for_each() based on the properties of the INTID space
(Marc)
- Remove some missed vestiges of the LPI linked-list (Marc)
- Explicitly free unused cache entry on failed insertion race (Marc)
- Don't explode people's machines with a boatload of xchg() (I said it
was WIP!) (Marc)

Oliver Upton (23):
KVM: arm64: Add tracepoints + stats for LPI cache effectiveness
KVM: arm64: vgic: Store LPIs in an xarray
KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi()
KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs
KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list()
KVM: arm64: vgic: Get rid of the LPI linked-list
KVM: arm64: vgic: Use atomics to count LPIs
KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner
KVM: arm64: vgic: Rely on RCU protection in vgic_get_lpi()
KVM: arm64: vgic: Ensure the irq refcount is nonzero when taking a ref
KVM: arm64: vgic: Don't acquire the lpi_list_lock in vgic_put_irq()
KVM: arm64: vgic-its: Lazily allocate LPI translation cache
KVM: arm64: vgic-its: Pick cache victim based on usage count
KVM: arm64: vgic-its: Protect cached vgic_irq pointers with RCU
KVM: arm64: vgic-its: Treat the LPI translation cache as an rculist
KVM: arm64: vgic-its: Rely on RCU to protect translation cache reads
KVM: selftests: Align with kernel's GIC definitions
KVM: selftests: Standardise layout of GIC frames
KVM: selftests: Add a minimal library for interacting with an ITS
KVM: selftests: Add helper for enabling LPIs on a redistributor
KVM: selftests: Use MPIDR_HWID_BITMASK from cputype.h
KVM: selftests: Hack in support for aligned page allocations
KVM: selftests: Add stress test for LPI injection

arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/kvm/guest.c | 5 +-
arch/arm64/kvm/vgic/trace.h | 66 ++
arch/arm64/kvm/vgic/vgic-debug.c | 2 +-
arch/arm64/kvm/vgic/vgic-init.c | 7 +-
arch/arm64/kvm/vgic/vgic-its.c | 220 ++++---
arch/arm64/kvm/vgic/vgic-v3.c | 3 +-
arch/arm64/kvm/vgic/vgic.c | 56 +-
arch/arm64/kvm/vgic/vgic.h | 15 +-
include/kvm/arm_vgic.h | 10 +-
include/linux/kvm_host.h | 4 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/arch_timer.c | 8 +-
.../testing/selftests/kvm/aarch64/psci_test.c | 2 +
.../testing/selftests/kvm/aarch64/vgic_irq.c | 15 +-
.../selftests/kvm/aarch64/vgic_lpi_stress.c | 388 ++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 6 +-
.../selftests/kvm/dirty_log_perf_test.c | 5 +-
.../selftests/kvm/include/aarch64/gic.h | 15 +-
.../selftests/kvm/include/aarch64/gic_v3.h | 586 +++++++++++++++++-
.../selftests/kvm/include/aarch64/processor.h | 2 -
.../selftests/kvm/include/aarch64/vgic.h | 27 +-
.../selftests/kvm/include/kvm_util_base.h | 2 +
tools/testing/selftests/kvm/lib/aarch64/gic.c | 18 +-
.../selftests/kvm/lib/aarch64/gic_private.h | 4 +-
.../selftests/kvm/lib/aarch64/gic_v3.c | 69 ++-
.../testing/selftests/kvm/lib/aarch64/vgic.c | 337 +++++++++-
tools/testing/selftests/kvm/lib/kvm_util.c | 27 +-
28 files changed, 1641 insertions(+), 262 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_lpi_stress.c


base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
--
2.43.0.687.g38aa6559b0-goog



2024-02-13 09:42:38

by Oliver Upton

Subject: [PATCH v2 05/23] KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list()

Start iterating the LPI xarray in anticipation of removing the LPI
linked-list.
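
For reference, here is the xas_for_each() idiom this switches to,
distilled (a generic sketch, not vgic-specific): the XA_STATE is
primed with the first index of interest and the walk visits every
present entry up to a caller-supplied max, which is what bounds the
walk below to the INTID space. The walk must run under the xa_lock or
an RCU read-side critical section, hence the rcu_read_lock() added
here.

  XA_STATE(xas, &xa, first_index);

  rcu_read_lock();
  xas_for_each(&xas, entry, last_index) {
          /* sees each non-NULL entry in [first_index, last_index] */
  }
  rcu_read_unlock();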

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/kvm/vgic/vgic-its.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 0265cd1f2d6e..aefe0794b71c 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -336,6 +336,7 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
{
struct vgic_dist *dist = &kvm->arch.vgic;
+ XA_STATE(xas, &dist->lpi_xa, GIC_LPI_OFFSET);
struct vgic_irq *irq;
unsigned long flags;
u32 *intids;
@@ -354,7 +355,9 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
return -ENOMEM;

raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
- list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
+ rcu_read_lock();
+
+ xas_for_each(&xas, irq, INTERRUPT_ID_BITS_ITS) {
if (i == irq_count)
break;
/* We don't need to "get" the IRQ, as we hold the list lock. */
@@ -362,6 +365,8 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
continue;
intids[i++] = irq->intid;
}
+
+ rcu_read_unlock();
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);

*intid_ptr = intids;
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:43:18

by Oliver Upton

Subject: [PATCH v2 07/23] KVM: arm64: vgic: Use atomics to count LPIs

Switch to using atomics for LPI accounting, allowing vgic_irq references
to be dropped in parallel.

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/kvm/vgic/vgic-debug.c | 2 +-
arch/arm64/kvm/vgic/vgic-its.c | 4 ++--
arch/arm64/kvm/vgic/vgic.c | 2 +-
include/kvm/arm_vgic.h | 4 ++--
4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
index 85606a531dc3..389025ce7749 100644
--- a/arch/arm64/kvm/vgic/vgic-debug.c
+++ b/arch/arm64/kvm/vgic/vgic-debug.c
@@ -149,7 +149,7 @@ static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
seq_printf(s, "vgic_model:\t%s\n", v3 ? "GICv3" : "GICv2");
seq_printf(s, "nr_spis:\t%d\n", dist->nr_spis);
if (v3)
- seq_printf(s, "nr_lpis:\t%d\n", dist->lpi_list_count);
+ seq_printf(s, "nr_lpis:\t%d\n", atomic_read(&dist->lpi_count));
seq_printf(s, "enabled:\t%d\n", dist->enabled);
seq_printf(s, "\n");

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index c68164d6cba0..048226812974 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -97,7 +97,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
goto out_unlock;
}

- dist->lpi_list_count++;
+ atomic_inc(&dist->lpi_count);

out_unlock:
if (ret)
@@ -345,7 +345,7 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
* command). If coming from another path (such as enabling LPIs),
* we must be careful not to overrun the array.
*/
- irq_count = READ_ONCE(dist->lpi_list_count);
+ irq_count = atomic_read(&dist->lpi_count);
intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL_ACCOUNT);
if (!intids)
return -ENOMEM;
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index e58ce68e325c..5988d162b765 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -122,7 +122,7 @@ void __vgic_put_lpi_locked(struct kvm *kvm, struct vgic_irq *irq)
return;

xa_erase(&dist->lpi_xa, irq->intid);
- dist->lpi_list_count--;
+ atomic_dec(&dist->lpi_count);

kfree(irq);
}
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index aeff363e3ba6..71e9d719533b 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -273,10 +273,10 @@ struct vgic_dist {
*/
u64 propbaser;

- /* Protects the lpi_list and the count value below. */
+ /* Protects the lpi_list. */
raw_spinlock_t lpi_list_lock;
struct xarray lpi_xa;
- int lpi_list_count;
+ atomic_t lpi_count;

/* LPI translation cache */
struct list_head lpi_translation_cache;
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:43:49

by Oliver Upton

Subject: [PATCH v2 08/23] KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner

Free the vgic_irq structs in an RCU-safe manner to allow reads of the
LPI configuration data to happen in parallel with the release of LPIs.
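
A distilled sketch of the reader pattern this unblocks (the actual
conversion of vgic_get_lpi() happens in the next patch, and the tryget
semantics come later in the series):

  rcu_read_lock();
  irq = xa_load(&dist->lpi_xa, intid);
  if (irq && !vgic_try_get_irq_kref(irq))
          irq = NULL;
  rcu_read_unlock();

A reader that observes the vgic_irq between xa_erase() and the end of
the grace period still dereferences valid memory, and the tryget
rejects an IRQ whose refcount has already dropped to zero.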

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/kvm/vgic/vgic.c | 2 +-
include/kvm/arm_vgic.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 5988d162b765..be3ed4c5e1fa 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -124,7 +124,7 @@ void __vgic_put_lpi_locked(struct kvm *kvm, struct vgic_irq *irq)
xa_erase(&dist->lpi_xa, irq->intid);
atomic_dec(&dist->lpi_count);

- kfree(irq);
+ kfree_rcu(irq, rcu);
}

void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 71e9d719533b..47035946648e 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -117,6 +117,7 @@ struct irq_ops {

struct vgic_irq {
raw_spinlock_t irq_lock; /* Protects the content of the struct */
+ struct rcu_head rcu;
struct list_head ap_list;

struct kvm_vcpu *vcpu; /* SGIs and PPIs: The VCPU
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:45:44

by Oliver Upton

Subject: [PATCH v2 13/23] KVM: arm64: vgic-its: Pick cache victim based on usage count

To date the translation cache LRU policy relies on the ordering of the
linked-list to pick the victim, as entries are moved to the head of
the list on every cache hit. This sort of transformation is
incompatible with an rculist, necessitating a different strategy for
recording usage in-place.

Count the number of cache hits since the last translation cache miss for
every entry. The preferences for selecting a victim are as follows:

- Invalid entries over valid entries

- Valid entry with the lowest usage count

- In the case of a tie, pick the entry closest to the tail (oldest)
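
For illustration (hypothetical usage counts; new entries are inserted
at the head, so the tail is oldest):

  /*
   * head -> A(usage=3) -> B(usage=1) -> C(usage=1) -> tail
   *
   * The scan selects C: it has the lowest usage count, and the tie
   * with B is broken in favor of the entry nearer the tail. An entry
   * with a NULL irq anywhere in the list would win immediately.
   */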

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/kvm/vgic/vgic-its.c | 46 ++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index a7ba20b57264..35926d5ae561 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -157,6 +157,7 @@ struct vgic_translation_cache_entry {
u32 devid;
u32 eventid;
struct vgic_irq *irq;
+ atomic64_t usage_count;
};

/**
@@ -580,13 +581,7 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
cte->eventid != eventid)
continue;

- /*
- * Move this entry to the head, as it is the most
- * recently used.
- */
- if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
- list_move(&cte->entry, &dist->lpi_translation_cache);
-
+ atomic64_inc(&cte->usage_count);
return cte->irq;
}

@@ -619,6 +614,36 @@ static unsigned int vgic_its_max_cache_size(struct kvm *kvm)
return atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE;
}

+static struct vgic_translation_cache_entry *vgic_its_cache_victim(struct vgic_dist *dist,
+ s64 *max_usage)
+{
+ struct vgic_translation_cache_entry *cte, *victim = NULL;
+ s64 min, tmp, max = S64_MIN;
+
+ /*
+ * Find the least used cache entry since the last cache miss, preferring
+ * older entries in the case of a tie. Return the max usage count seen
+ * during the scan to initialize the new cache entry.
+ */
+ list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
+ tmp = atomic64_read(&cte->usage_count);
+ max = max(max, tmp);
+
+ if (!cte->irq) {
+ victim = cte;
+ break;
+ }
+
+ if (!victim || tmp <= min) {
+ victim = cte;
+ min = tmp;
+ }
+ }
+
+ *max_usage = max;
+ return victim;
+}
+
static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
u32 devid, u32 eventid,
struct vgic_irq *irq)
@@ -627,6 +652,7 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned long flags;
phys_addr_t db;
+ s64 usage = 0;

/* Do not cache a directly injected interrupt */
if (irq->hw)
@@ -650,9 +676,8 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
}

if (dist->lpi_cache_count >= vgic_its_max_cache_size(kvm)) {
- /* Always reuse the last entry (LRU policy) */
- victim = list_last_entry(&dist->lpi_translation_cache,
- typeof(*cte), entry);
+ victim = vgic_its_cache_victim(dist, &usage);
+
list_del(&victim->entry);
dist->lpi_cache_count--;
}
@@ -664,6 +689,7 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
lockdep_assert_held(&its->its_lock);
vgic_get_irq_kref(irq);

+ atomic64_set(&new->usage_count, usage);
new->db = db;
new->devid = devid;
new->eventid = eventid;
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:46:34

by Oliver Upton

Subject: [PATCH v2 17/23] KVM: selftests: Align with kernel's GIC definitions

There are a few subtle incongruencies between the GIC definitions used
by the kernel and selftests. Furthermore, the selftests header blends
implementation detail (e.g. default priority) with the architectural
definitions.

This is all rather annoying, since bulk imports of the kernel header
are not possible. Move selftests-specific definitions out of the
offending header and realign tests on the canonical definitions for
things like sysregs. Finally, haul in a fresh copy of the gicv3 header
to enable a forthcoming ITS selftest.

Signed-off-by: Oliver Upton <[email protected]>
---
.../testing/selftests/kvm/aarch64/vgic_irq.c | 4 +-
.../selftests/kvm/include/aarch64/gic_v3.h | 586 +++++++++++++++++-
.../selftests/kvm/lib/aarch64/gic_v3.c | 13 +-
3 files changed, 568 insertions(+), 35 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/vgic_irq.c b/tools/testing/selftests/kvm/aarch64/vgic_irq.c
index 2e64b4856e38..d61a6302f467 100644
--- a/tools/testing/selftests/kvm/aarch64/vgic_irq.c
+++ b/tools/testing/selftests/kvm/aarch64/vgic_irq.c
@@ -152,7 +152,7 @@ static void reset_stats(void)

static uint64_t gic_read_ap1r0(void)
{
- uint64_t reg = read_sysreg_s(SYS_ICV_AP1R0_EL1);
+ uint64_t reg = read_sysreg_s(SYS_ICC_AP1R0_EL1);

dsb(sy);
return reg;
@@ -160,7 +160,7 @@ static uint64_t gic_read_ap1r0(void)

static void gic_write_ap1r0(uint64_t val)
{
- write_sysreg_s(val, SYS_ICV_AP1R0_EL1);
+ write_sysreg_s(val, SYS_ICC_AP1R0_EL1);
isb();
}

diff --git a/tools/testing/selftests/kvm/include/aarch64/gic_v3.h b/tools/testing/selftests/kvm/include/aarch64/gic_v3.h
index ba0886e8a2bb..a76615fa39a1 100644
--- a/tools/testing/selftests/kvm/include/aarch64/gic_v3.h
+++ b/tools/testing/selftests/kvm/include/aarch64/gic_v3.h
@@ -1,82 +1,604 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0-only */
/*
- * ARM Generic Interrupt Controller (GIC) v3 specific defines
+ * Copyright (C) 2013, 2014 ARM Limited, All Rights Reserved.
+ * Author: Marc Zyngier <[email protected]>
*/
-
-#ifndef SELFTEST_KVM_GICV3_H
-#define SELFTEST_KVM_GICV3_H
-
-#include <asm/sysreg.h>
+#ifndef __SELFTESTS_GIC_V3_H
+#define __SELFTESTS_GIC_V3_H

/*
- * Distributor registers
+ * Distributor registers. We assume we're running non-secure, with ARE
+ * being set. Secure-only and non-ARE registers are not described.
*/
#define GICD_CTLR 0x0000
#define GICD_TYPER 0x0004
+#define GICD_IIDR 0x0008
+#define GICD_TYPER2 0x000C
+#define GICD_STATUSR 0x0010
+#define GICD_SETSPI_NSR 0x0040
+#define GICD_CLRSPI_NSR 0x0048
+#define GICD_SETSPI_SR 0x0050
+#define GICD_CLRSPI_SR 0x0058
#define GICD_IGROUPR 0x0080
#define GICD_ISENABLER 0x0100
#define GICD_ICENABLER 0x0180
#define GICD_ISPENDR 0x0200
#define GICD_ICPENDR 0x0280
-#define GICD_ICACTIVER 0x0380
#define GICD_ISACTIVER 0x0300
+#define GICD_ICACTIVER 0x0380
#define GICD_IPRIORITYR 0x0400
#define GICD_ICFGR 0x0C00
+#define GICD_IGRPMODR 0x0D00
+#define GICD_NSACR 0x0E00
+#define GICD_IGROUPRnE 0x1000
+#define GICD_ISENABLERnE 0x1200
+#define GICD_ICENABLERnE 0x1400
+#define GICD_ISPENDRnE 0x1600
+#define GICD_ICPENDRnE 0x1800
+#define GICD_ISACTIVERnE 0x1A00
+#define GICD_ICACTIVERnE 0x1C00
+#define GICD_IPRIORITYRnE 0x2000
+#define GICD_ICFGRnE 0x3000
+#define GICD_IROUTER 0x6000
+#define GICD_IROUTERnE 0x8000
+#define GICD_IDREGS 0xFFD0
+#define GICD_PIDR2 0xFFE8
+
+#define ESPI_BASE_INTID 4096

/*
- * The assumption is that the guest runs in a non-secure mode.
- * The following bits of GICD_CTLR are defined accordingly.
+ * Those registers are actually from GICv2, but the spec demands that they
+ * are implemented as RES0 if ARE is 1 (which we do in KVM's emulated GICv3).
*/
+#define GICD_ITARGETSR 0x0800
+#define GICD_SGIR 0x0F00
+#define GICD_CPENDSGIR 0x0F10
+#define GICD_SPENDSGIR 0x0F20
+
#define GICD_CTLR_RWP (1U << 31)
#define GICD_CTLR_nASSGIreq (1U << 8)
+#define GICD_CTLR_DS (1U << 6)
#define GICD_CTLR_ARE_NS (1U << 4)
#define GICD_CTLR_ENABLE_G1A (1U << 1)
#define GICD_CTLR_ENABLE_G1 (1U << 0)

+#define GICD_IIDR_IMPLEMENTER_SHIFT 0
+#define GICD_IIDR_IMPLEMENTER_MASK (0xfff << GICD_IIDR_IMPLEMENTER_SHIFT)
+#define GICD_IIDR_REVISION_SHIFT 12
+#define GICD_IIDR_REVISION_MASK (0xf << GICD_IIDR_REVISION_SHIFT)
+#define GICD_IIDR_VARIANT_SHIFT 16
+#define GICD_IIDR_VARIANT_MASK (0xf << GICD_IIDR_VARIANT_SHIFT)
+#define GICD_IIDR_PRODUCT_ID_SHIFT 24
+#define GICD_IIDR_PRODUCT_ID_MASK (0xff << GICD_IIDR_PRODUCT_ID_SHIFT)
+
+
+/*
+ * In systems with a single security state (what we emulate in KVM)
+ * the meaning of the interrupt group enable bits is slightly different
+ */
+#define GICD_CTLR_ENABLE_SS_G1 (1U << 1)
+#define GICD_CTLR_ENABLE_SS_G0 (1U << 0)
+
+#define GICD_TYPER_RSS (1U << 26)
+#define GICD_TYPER_LPIS (1U << 17)
+#define GICD_TYPER_MBIS (1U << 16)
+#define GICD_TYPER_ESPI (1U << 8)
+
+#define GICD_TYPER_ID_BITS(typer) ((((typer) >> 19) & 0x1f) + 1)
+#define GICD_TYPER_NUM_LPIS(typer) ((((typer) >> 11) & 0x1f) + 1)
#define GICD_TYPER_SPIS(typer) ((((typer) & 0x1f) + 1) * 32)
-#define GICD_INT_DEF_PRI_X4 0xa0a0a0a0
+#define GICD_TYPER_ESPIS(typer) \
+ (((typer) & GICD_TYPER_ESPI) ? GICD_TYPER_SPIS((typer) >> 27) : 0)
+
+#define GICD_TYPER2_nASSGIcap (1U << 8)
+#define GICD_TYPER2_VIL (1U << 7)
+#define GICD_TYPER2_VID GENMASK(4, 0)
+
+#define GICD_IROUTER_SPI_MODE_ONE (0U << 31)
+#define GICD_IROUTER_SPI_MODE_ANY (1U << 31)
+
+#define GIC_PIDR2_ARCH_MASK 0xf0
+#define GIC_PIDR2_ARCH_GICv3 0x30
+#define GIC_PIDR2_ARCH_GICv4 0x40
+
+#define GIC_V3_DIST_SIZE 0x10000
+
+#define GIC_PAGE_SIZE_4K 0ULL
+#define GIC_PAGE_SIZE_16K 1ULL
+#define GIC_PAGE_SIZE_64K 2ULL
+#define GIC_PAGE_SIZE_MASK 3ULL

/*
- * Redistributor registers
+ * Re-Distributor registers, offsets from RD_base
*/
-#define GICR_CTLR 0x000
-#define GICR_WAKER 0x014
+#define GICR_CTLR GICD_CTLR
+#define GICR_IIDR 0x0004
+#define GICR_TYPER 0x0008
+#define GICR_STATUSR GICD_STATUSR
+#define GICR_WAKER 0x0014
+#define GICR_SETLPIR 0x0040
+#define GICR_CLRLPIR 0x0048
+#define GICR_PROPBASER 0x0070
+#define GICR_PENDBASER 0x0078
+#define GICR_INVLPIR 0x00A0
+#define GICR_INVALLR 0x00B0
+#define GICR_SYNCR 0x00C0
+#define GICR_IDREGS GICD_IDREGS
+#define GICR_PIDR2 GICD_PIDR2
+
+#define GICR_CTLR_ENABLE_LPIS (1UL << 0)
+#define GICR_CTLR_CES (1UL << 1)
+#define GICR_CTLR_IR (1UL << 2)
+#define GICR_CTLR_RWP (1UL << 3)

-#define GICR_CTLR_RWP (1U << 3)
+#define GICR_TYPER_CPU_NUMBER(r) (((r) >> 8) & 0xffff)
+
+#define EPPI_BASE_INTID 1056
+
+#define GICR_TYPER_NR_PPIS(r) \
+ ({ \
+ unsigned int __ppinum = ((r) >> 27) & 0x1f; \
+ unsigned int __nr_ppis = 16; \
+ if (__ppinum == 1 || __ppinum == 2) \
+ __nr_ppis += __ppinum * 32; \
+ \
+ __nr_ppis; \
+ })

#define GICR_WAKER_ProcessorSleep (1U << 1)
#define GICR_WAKER_ChildrenAsleep (1U << 2)

+#define GIC_BASER_CACHE_nCnB 0ULL
+#define GIC_BASER_CACHE_SameAsInner 0ULL
+#define GIC_BASER_CACHE_nC 1ULL
+#define GIC_BASER_CACHE_RaWt 2ULL
+#define GIC_BASER_CACHE_RaWb 3ULL
+#define GIC_BASER_CACHE_WaWt 4ULL
+#define GIC_BASER_CACHE_WaWb 5ULL
+#define GIC_BASER_CACHE_RaWaWt 6ULL
+#define GIC_BASER_CACHE_RaWaWb 7ULL
+#define GIC_BASER_CACHE_MASK 7ULL
+#define GIC_BASER_NonShareable 0ULL
+#define GIC_BASER_InnerShareable 1ULL
+#define GIC_BASER_OuterShareable 2ULL
+#define GIC_BASER_SHAREABILITY_MASK 3ULL
+
+#define GIC_BASER_CACHEABILITY(reg, inner_outer, type) \
+ (GIC_BASER_CACHE_##type << reg##_##inner_outer##_CACHEABILITY_SHIFT)
+
+#define GIC_BASER_SHAREABILITY(reg, type) \
+ (GIC_BASER_##type << reg##_SHAREABILITY_SHIFT)
+
+/* encode a size field of width @w containing @n - 1 units */
+#define GIC_ENCODE_SZ(n, w) (((unsigned long)(n) - 1) & GENMASK_ULL(((w) - 1), 0))
+
+#define GICR_PROPBASER_SHAREABILITY_SHIFT (10)
+#define GICR_PROPBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT (56)
+#define GICR_PROPBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_PROPBASER, SHAREABILITY_MASK)
+#define GICR_PROPBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, MASK)
+#define GICR_PROPBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PROPBASER, OUTER, MASK)
+#define GICR_PROPBASER_CACHEABILITY_MASK GICR_PROPBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_PROPBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable)
+
+#define GICR_PROPBASER_nCnB GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, nCnB)
+#define GICR_PROPBASER_nC GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, nC)
+#define GICR_PROPBASER_RaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWt)
+#define GICR_PROPBASER_RaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWb)
+#define GICR_PROPBASER_WaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, WaWt)
+#define GICR_PROPBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, WaWb)
+#define GICR_PROPBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWt)
+#define GICR_PROPBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWb)
+
+#define GICR_PROPBASER_IDBITS_MASK (0x1f)
+#define GICR_PROPBASER_ADDRESS(x) ((x) & GENMASK_ULL(51, 12))
+#define GICR_PENDBASER_ADDRESS(x) ((x) & GENMASK_ULL(51, 16))
+
+#define GICR_PENDBASER_SHAREABILITY_SHIFT (10)
+#define GICR_PENDBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT (56)
+#define GICR_PENDBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_PENDBASER, SHAREABILITY_MASK)
+#define GICR_PENDBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, MASK)
+#define GICR_PENDBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_PENDBASER, OUTER, MASK)
+#define GICR_PENDBASER_CACHEABILITY_MASK GICR_PENDBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_PENDBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable)
+
+#define GICR_PENDBASER_nCnB GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, nCnB)
+#define GICR_PENDBASER_nC GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, nC)
+#define GICR_PENDBASER_RaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWt)
+#define GICR_PENDBASER_RaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWb)
+#define GICR_PENDBASER_WaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, WaWt)
+#define GICR_PENDBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, WaWb)
+#define GICR_PENDBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWaWt)
+#define GICR_PENDBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWaWb)
+
+#define GICR_PENDBASER_PTZ BIT_ULL(62)
+
/*
- * Redistributor registers, offsets from SGI base
+ * Re-Distributor registers, offsets from SGI_base
*/
#define GICR_IGROUPR0 GICD_IGROUPR
#define GICR_ISENABLER0 GICD_ISENABLER
#define GICR_ICENABLER0 GICD_ICENABLER
#define GICR_ISPENDR0 GICD_ISPENDR
+#define GICR_ICPENDR0 GICD_ICPENDR
#define GICR_ISACTIVER0 GICD_ISACTIVER
#define GICR_ICACTIVER0 GICD_ICACTIVER
-#define GICR_ICENABLER GICD_ICENABLER
-#define GICR_ICACTIVER GICD_ICACTIVER
#define GICR_IPRIORITYR0 GICD_IPRIORITYR
+#define GICR_ICFGR0 GICD_ICFGR
+#define GICR_IGRPMODR0 GICD_IGRPMODR
+#define GICR_NSACR GICD_NSACR
+
+#define GICR_TYPER_PLPIS (1U << 0)
+#define GICR_TYPER_VLPIS (1U << 1)
+#define GICR_TYPER_DIRTY (1U << 2)
+#define GICR_TYPER_DirectLPIS (1U << 3)
+#define GICR_TYPER_LAST (1U << 4)
+#define GICR_TYPER_RVPEID (1U << 7)
+#define GICR_TYPER_COMMON_LPI_AFF GENMASK_ULL(25, 24)
+#define GICR_TYPER_AFFINITY GENMASK_ULL(63, 32)
+
+#define GICR_INVLPIR_INTID GENMASK_ULL(31, 0)
+#define GICR_INVLPIR_VPEID GENMASK_ULL(47, 32)
+#define GICR_INVLPIR_V GENMASK_ULL(63, 63)
+
+#define GICR_INVALLR_VPEID GICR_INVLPIR_VPEID
+#define GICR_INVALLR_V GICR_INVLPIR_V
+
+#define GIC_V3_REDIST_SIZE 0x20000
+
+#define LPI_PROP_GROUP1 (1 << 1)
+#define LPI_PROP_ENABLED (1 << 0)
+
+/*
+ * Re-Distributor registers, offsets from VLPI_base
+ */
+#define GICR_VPROPBASER 0x0070
+
+#define GICR_VPROPBASER_IDBITS_MASK 0x1f
+
+#define GICR_VPROPBASER_SHAREABILITY_SHIFT (10)
+#define GICR_VPROPBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_VPROPBASER_OUTER_CACHEABILITY_SHIFT (56)
+
+#define GICR_VPROPBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_VPROPBASER, SHAREABILITY_MASK)
+#define GICR_VPROPBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, MASK)
+#define GICR_VPROPBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPROPBASER, OUTER, MASK)
+#define GICR_VPROPBASER_CACHEABILITY_MASK \
+ GICR_VPROPBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPROPBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_VPROPBASER, InnerShareable)
+
+#define GICR_VPROPBASER_nCnB GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nCnB)
+#define GICR_VPROPBASER_nC GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nC)
+#define GICR_VPROPBASER_RaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWt)
+#define GICR_VPROPBASER_RaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWb)
+#define GICR_VPROPBASER_WaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWt)
+#define GICR_VPROPBASER_WaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWb)
+#define GICR_VPROPBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWt)
+#define GICR_VPROPBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWb)
+
+/*
+ * GICv4.1 VPROPBASER reinvention. A subtle mix between the old
+ * VPROPBASER and ITS_BASER. Just not quite any of the two.
+ */
+#define GICR_VPROPBASER_4_1_VALID (1ULL << 63)
+#define GICR_VPROPBASER_4_1_ENTRY_SIZE GENMASK_ULL(61, 59)
+#define GICR_VPROPBASER_4_1_INDIRECT (1ULL << 55)
+#define GICR_VPROPBASER_4_1_PAGE_SIZE GENMASK_ULL(54, 53)
+#define GICR_VPROPBASER_4_1_Z (1ULL << 52)
+#define GICR_VPROPBASER_4_1_ADDR GENMASK_ULL(51, 12)
+#define GICR_VPROPBASER_4_1_SIZE GENMASK_ULL(6, 0)
+
+#define GICR_VPENDBASER 0x0078
+
+#define GICR_VPENDBASER_SHAREABILITY_SHIFT (10)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_SHIFT (7)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_SHIFT (56)
+#define GICR_VPENDBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GICR_VPENDBASER, SHAREABILITY_MASK)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, MASK)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GICR_VPENDBASER, OUTER, MASK)
+#define GICR_VPENDBASER_CACHEABILITY_MASK \
+ GICR_VPENDBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPENDBASER_NonShareable \
+ GIC_BASER_SHAREABILITY(GICR_VPENDBASER, NonShareable)
+
+#define GICR_VPENDBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GICR_VPENDBASER, InnerShareable)
+
+#define GICR_VPENDBASER_nCnB GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nCnB)
+#define GICR_VPENDBASER_nC GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nC)
+#define GICR_VPENDBASER_RaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWt)
+#define GICR_VPENDBASER_RaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWb)
+#define GICR_VPENDBASER_WaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWt)
+#define GICR_VPENDBASER_WaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWb)
+#define GICR_VPENDBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWt)
+#define GICR_VPENDBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWb)
+
+#define GICR_VPENDBASER_Dirty (1ULL << 60)
+#define GICR_VPENDBASER_PendingLast (1ULL << 61)
+#define GICR_VPENDBASER_IDAI (1ULL << 62)
+#define GICR_VPENDBASER_Valid (1ULL << 63)
+
+/*
+ * GICv4.1 VPENDBASER, used for VPE residency. On top of these fields,
+ * also use the above Valid, PendingLast and Dirty.
+ */
+#define GICR_VPENDBASER_4_1_DB (1ULL << 62)
+#define GICR_VPENDBASER_4_1_VGRP0EN (1ULL << 59)
+#define GICR_VPENDBASER_4_1_VGRP1EN (1ULL << 58)
+#define GICR_VPENDBASER_4_1_VPEID GENMASK_ULL(15, 0)
+
+#define GICR_VSGIR 0x0080
+
+#define GICR_VSGIR_VPEID GENMASK(15, 0)
+
+#define GICR_VSGIPENDR 0x0088
+
+#define GICR_VSGIPENDR_BUSY (1U << 31)
+#define GICR_VSGIPENDR_PENDING GENMASK(15, 0)
+
+/*
+ * ITS registers, offsets from ITS_base
+ */
+#define GITS_CTLR 0x0000
+#define GITS_IIDR 0x0004
+#define GITS_TYPER 0x0008
+#define GITS_MPIDR 0x0018
+#define GITS_CBASER 0x0080
+#define GITS_CWRITER 0x0088
+#define GITS_CREADR 0x0090
+#define GITS_BASER 0x0100
+#define GITS_IDREGS_BASE 0xffd0
+#define GITS_PIDR0 0xffe0
+#define GITS_PIDR1 0xffe4
+#define GITS_PIDR2 GICR_PIDR2
+#define GITS_PIDR4 0xffd0
+#define GITS_CIDR0 0xfff0
+#define GITS_CIDR1 0xfff4
+#define GITS_CIDR2 0xfff8
+#define GITS_CIDR3 0xfffc
+
+#define GITS_TRANSLATER 0x10040
+
+#define GITS_SGIR 0x20020
+
+#define GITS_SGIR_VPEID GENMASK_ULL(47, 32)
+#define GITS_SGIR_VINTID GENMASK_ULL(3, 0)
+
+#define GITS_CTLR_ENABLE (1U << 0)
+#define GITS_CTLR_ImDe (1U << 1)
+#define GITS_CTLR_ITS_NUMBER_SHIFT 4
+#define GITS_CTLR_ITS_NUMBER (0xFU << GITS_CTLR_ITS_NUMBER_SHIFT)
+#define GITS_CTLR_QUIESCENT (1U << 31)
+
+#define GITS_TYPER_PLPIS (1UL << 0)
+#define GITS_TYPER_VLPIS (1UL << 1)
+#define GITS_TYPER_ITT_ENTRY_SIZE_SHIFT 4
+#define GITS_TYPER_ITT_ENTRY_SIZE GENMASK_ULL(7, 4)
+#define GITS_TYPER_IDBITS_SHIFT 8
+#define GITS_TYPER_DEVBITS_SHIFT 13
+#define GITS_TYPER_DEVBITS GENMASK_ULL(17, 13)
+#define GITS_TYPER_PTA (1UL << 19)
+#define GITS_TYPER_HCC_SHIFT 24
+#define GITS_TYPER_HCC(r) (((r) >> GITS_TYPER_HCC_SHIFT) & 0xff)
+#define GITS_TYPER_VMOVP (1ULL << 37)
+#define GITS_TYPER_VMAPP (1ULL << 40)
+#define GITS_TYPER_SVPET GENMASK_ULL(42, 41)

-/* CPU interface registers */
-#define SYS_ICC_PMR_EL1 sys_reg(3, 0, 4, 6, 0)
-#define SYS_ICC_IAR1_EL1 sys_reg(3, 0, 12, 12, 0)
-#define SYS_ICC_EOIR1_EL1 sys_reg(3, 0, 12, 12, 1)
-#define SYS_ICC_DIR_EL1 sys_reg(3, 0, 12, 11, 1)
-#define SYS_ICC_CTLR_EL1 sys_reg(3, 0, 12, 12, 4)
-#define SYS_ICC_SRE_EL1 sys_reg(3, 0, 12, 12, 5)
-#define SYS_ICC_GRPEN1_EL1 sys_reg(3, 0, 12, 12, 7)
+#define GITS_IIDR_REV_SHIFT 12
+#define GITS_IIDR_REV_MASK (0xf << GITS_IIDR_REV_SHIFT)
+#define GITS_IIDR_REV(r) (((r) >> GITS_IIDR_REV_SHIFT) & 0xf)
+#define GITS_IIDR_PRODUCTID_SHIFT 24

-#define SYS_ICV_AP1R0_EL1 sys_reg(3, 0, 12, 9, 0)
+#define GITS_CBASER_VALID (1ULL << 63)
+#define GITS_CBASER_SHAREABILITY_SHIFT (10)
+#define GITS_CBASER_INNER_CACHEABILITY_SHIFT (59)
+#define GITS_CBASER_OUTER_CACHEABILITY_SHIFT (53)
+#define GITS_CBASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GITS_CBASER, SHAREABILITY_MASK)
+#define GITS_CBASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, MASK)
+#define GITS_CBASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_CBASER, OUTER, MASK)
+#define GITS_CBASER_CACHEABILITY_MASK GITS_CBASER_INNER_CACHEABILITY_MASK

-#define ICC_PMR_DEF_PRIO 0xf0
+#define GITS_CBASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GITS_CBASER, InnerShareable)

+#define GITS_CBASER_nCnB GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nCnB)
+#define GITS_CBASER_nC GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nC)
+#define GITS_CBASER_RaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWt)
+#define GITS_CBASER_RaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWb)
+#define GITS_CBASER_WaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWt)
+#define GITS_CBASER_WaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWb)
+#define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt)
+#define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)
+
+#define GITS_CBASER_ADDRESS(cbaser) ((cbaser) & GENMASK_ULL(51, 12))
+
+#define GITS_BASER_NR_REGS 8
+
+#define GITS_BASER_VALID (1ULL << 63)
+#define GITS_BASER_INDIRECT (1ULL << 62)
+
+#define GITS_BASER_INNER_CACHEABILITY_SHIFT (59)
+#define GITS_BASER_OUTER_CACHEABILITY_SHIFT (53)
+#define GITS_BASER_INNER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_BASER, INNER, MASK)
+#define GITS_BASER_CACHEABILITY_MASK GITS_BASER_INNER_CACHEABILITY_MASK
+#define GITS_BASER_OUTER_CACHEABILITY_MASK \
+ GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, MASK)
+#define GITS_BASER_SHAREABILITY_MASK \
+ GIC_BASER_SHAREABILITY(GITS_BASER, SHAREABILITY_MASK)
+
+#define GITS_BASER_nCnB GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nCnB)
+#define GITS_BASER_nC GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nC)
+#define GITS_BASER_RaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWt)
+#define GITS_BASER_RaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb)
+#define GITS_BASER_WaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, WaWt)
+#define GITS_BASER_WaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, WaWb)
+#define GITS_BASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWaWt)
+#define GITS_BASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWaWb)
+
+#define GITS_BASER_TYPE_SHIFT (56)
+#define GITS_BASER_TYPE(r) (((r) >> GITS_BASER_TYPE_SHIFT) & 7)
+#define GITS_BASER_ENTRY_SIZE_SHIFT (48)
+#define GITS_BASER_ENTRY_SIZE(r) ((((r) >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1)
+#define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48)
+#define GITS_BASER_PHYS_52_to_48(phys) \
+ (((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12)
+#define GITS_BASER_ADDR_48_to_52(baser) \
+ (((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48)
+
+#define GITS_BASER_SHAREABILITY_SHIFT (10)
+#define GITS_BASER_InnerShareable \
+ GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
+#define GITS_BASER_PAGE_SIZE_SHIFT (8)
+#define __GITS_BASER_PSZ(sz) (GIC_PAGE_SIZE_ ## sz << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_4K __GITS_BASER_PSZ(4K)
+#define GITS_BASER_PAGE_SIZE_16K __GITS_BASER_PSZ(16K)
+#define GITS_BASER_PAGE_SIZE_64K __GITS_BASER_PSZ(64K)
+#define GITS_BASER_PAGE_SIZE_MASK __GITS_BASER_PSZ(MASK)
+#define GITS_BASER_PAGES_MAX 256
+#define GITS_BASER_PAGES_SHIFT (0)
+#define GITS_BASER_NR_PAGES(r) (((r) & 0xff) + 1)
+
+#define GITS_BASER_TYPE_NONE 0
+#define GITS_BASER_TYPE_DEVICE 1
+#define GITS_BASER_TYPE_VCPU 2
+#define GITS_BASER_TYPE_RESERVED3 3
+#define GITS_BASER_TYPE_COLLECTION 4
+#define GITS_BASER_TYPE_RESERVED5 5
+#define GITS_BASER_TYPE_RESERVED6 6
+#define GITS_BASER_TYPE_RESERVED7 7
+
+#define GITS_LVL1_ENTRY_SIZE (8UL)
+
+/*
+ * ITS commands
+ */
+#define GITS_CMD_MAPD 0x08
+#define GITS_CMD_MAPC 0x09
+#define GITS_CMD_MAPTI 0x0a
+#define GITS_CMD_MAPI 0x0b
+#define GITS_CMD_MOVI 0x01
+#define GITS_CMD_DISCARD 0x0f
+#define GITS_CMD_INV 0x0c
+#define GITS_CMD_MOVALL 0x0e
+#define GITS_CMD_INVALL 0x0d
+#define GITS_CMD_INT 0x03
+#define GITS_CMD_CLEAR 0x04
+#define GITS_CMD_SYNC 0x05
+
+/*
+ * GICv4 ITS specific commands
+ */
+#define GITS_CMD_GICv4(x) ((x) | 0x20)
+#define GITS_CMD_VINVALL GITS_CMD_GICv4(GITS_CMD_INVALL)
+#define GITS_CMD_VMAPP GITS_CMD_GICv4(GITS_CMD_MAPC)
+#define GITS_CMD_VMAPTI GITS_CMD_GICv4(GITS_CMD_MAPTI)
+#define GITS_CMD_VMOVI GITS_CMD_GICv4(GITS_CMD_MOVI)
+#define GITS_CMD_VSYNC GITS_CMD_GICv4(GITS_CMD_SYNC)
+/* VMOVP, VSGI and INVDB are the odd ones, as they don't have a physical counterpart */
+#define GITS_CMD_VMOVP GITS_CMD_GICv4(2)
+#define GITS_CMD_VSGI GITS_CMD_GICv4(3)
+#define GITS_CMD_INVDB GITS_CMD_GICv4(0xe)
+
+/*
+ * ITS error numbers
+ */
+#define E_ITS_MOVI_UNMAPPED_INTERRUPT 0x010107
+#define E_ITS_MOVI_UNMAPPED_COLLECTION 0x010109
+#define E_ITS_INT_UNMAPPED_INTERRUPT 0x010307
+#define E_ITS_CLEAR_UNMAPPED_INTERRUPT 0x010507
+#define E_ITS_MAPD_DEVICE_OOR 0x010801
+#define E_ITS_MAPD_ITTSIZE_OOR 0x010802
+#define E_ITS_MAPC_PROCNUM_OOR 0x010902
+#define E_ITS_MAPC_COLLECTION_OOR 0x010903
+#define E_ITS_MAPTI_UNMAPPED_DEVICE 0x010a04
+#define E_ITS_MAPTI_ID_OOR 0x010a05
+#define E_ITS_MAPTI_PHYSICALID_OOR 0x010a06
+#define E_ITS_INV_UNMAPPED_INTERRUPT 0x010c07
+#define E_ITS_INVALL_UNMAPPED_COLLECTION 0x010d09
+#define E_ITS_MOVALL_PROCNUM_OOR 0x010e01
+#define E_ITS_DISCARD_UNMAPPED_INTERRUPT 0x010f07
+
+/*
+ * CPU interface registers
+ */
+#define ICC_CTLR_EL1_EOImode_SHIFT (1)
+#define ICC_CTLR_EL1_EOImode_drop_dir (0U << ICC_CTLR_EL1_EOImode_SHIFT)
+#define ICC_CTLR_EL1_EOImode_drop (1U << ICC_CTLR_EL1_EOImode_SHIFT)
+#define ICC_CTLR_EL1_EOImode_MASK (1 << ICC_CTLR_EL1_EOImode_SHIFT)
+#define ICC_CTLR_EL1_CBPR_SHIFT 0
+#define ICC_CTLR_EL1_CBPR_MASK (1 << ICC_CTLR_EL1_CBPR_SHIFT)
+#define ICC_CTLR_EL1_PMHE_SHIFT 6
+#define ICC_CTLR_EL1_PMHE_MASK (1 << ICC_CTLR_EL1_PMHE_SHIFT)
+#define ICC_CTLR_EL1_PRI_BITS_SHIFT 8
+#define ICC_CTLR_EL1_PRI_BITS_MASK (0x7 << ICC_CTLR_EL1_PRI_BITS_SHIFT)
+#define ICC_CTLR_EL1_ID_BITS_SHIFT 11
+#define ICC_CTLR_EL1_ID_BITS_MASK (0x7 << ICC_CTLR_EL1_ID_BITS_SHIFT)
+#define ICC_CTLR_EL1_SEIS_SHIFT 14
+#define ICC_CTLR_EL1_SEIS_MASK (0x1 << ICC_CTLR_EL1_SEIS_SHIFT)
+#define ICC_CTLR_EL1_A3V_SHIFT 15
+#define ICC_CTLR_EL1_A3V_MASK (0x1 << ICC_CTLR_EL1_A3V_SHIFT)
+#define ICC_CTLR_EL1_RSS (0x1 << 18)
+#define ICC_CTLR_EL1_ExtRange (0x1 << 19)
+#define ICC_PMR_EL1_SHIFT 0
+#define ICC_PMR_EL1_MASK (0xff << ICC_PMR_EL1_SHIFT)
+#define ICC_BPR0_EL1_SHIFT 0
+#define ICC_BPR0_EL1_MASK (0x7 << ICC_BPR0_EL1_SHIFT)
+#define ICC_BPR1_EL1_SHIFT 0
+#define ICC_BPR1_EL1_MASK (0x7 << ICC_BPR1_EL1_SHIFT)
+#define ICC_IGRPEN0_EL1_SHIFT 0
+#define ICC_IGRPEN0_EL1_MASK (1 << ICC_IGRPEN0_EL1_SHIFT)
+#define ICC_IGRPEN1_EL1_SHIFT 0
+#define ICC_IGRPEN1_EL1_MASK (1 << ICC_IGRPEN1_EL1_SHIFT)
+#define ICC_SRE_EL1_DIB (1U << 2)
+#define ICC_SRE_EL1_DFB (1U << 1)
#define ICC_SRE_EL1_SRE (1U << 0)

-#define ICC_IGRPEN1_EL1_ENABLE (1U << 0)
+/* These are for GICv2 emulation only */
+#define GICH_LR_VIRTUALID (0x3ffUL << 0)
+#define GICH_LR_PHYSID_CPUID_SHIFT (10)
+#define GICH_LR_PHYSID_CPUID (7UL << GICH_LR_PHYSID_CPUID_SHIFT)
+
+#define ICC_IAR1_EL1_SPURIOUS 0x3ff
+
+#define ICC_SRE_EL2_SRE (1 << 0)
+#define ICC_SRE_EL2_ENABLE (1 << 3)

-#define GICV3_MAX_CPUS 512
+#define ICC_SGI1R_TARGET_LIST_SHIFT 0
+#define ICC_SGI1R_TARGET_LIST_MASK (0xffff << ICC_SGI1R_TARGET_LIST_SHIFT)
+#define ICC_SGI1R_AFFINITY_1_SHIFT 16
+#define ICC_SGI1R_AFFINITY_1_MASK (0xff << ICC_SGI1R_AFFINITY_1_SHIFT)
+#define ICC_SGI1R_SGI_ID_SHIFT 24
+#define ICC_SGI1R_SGI_ID_MASK (0xfULL << ICC_SGI1R_SGI_ID_SHIFT)
+#define ICC_SGI1R_AFFINITY_2_SHIFT 32
+#define ICC_SGI1R_AFFINITY_2_MASK (0xffULL << ICC_SGI1R_AFFINITY_2_SHIFT)
+#define ICC_SGI1R_IRQ_ROUTING_MODE_BIT 40
+#define ICC_SGI1R_RS_SHIFT 44
+#define ICC_SGI1R_RS_MASK (0xfULL << ICC_SGI1R_RS_SHIFT)
+#define ICC_SGI1R_AFFINITY_3_SHIFT 48
+#define ICC_SGI1R_AFFINITY_3_MASK (0xffULL << ICC_SGI1R_AFFINITY_3_SHIFT)

-#endif /* SELFTEST_KVM_GICV3_H */
+#endif
diff --git a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
index 263bf3ed8fd5..cd8f0e209599 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
@@ -9,9 +9,20 @@
#include "processor.h"
#include "delay.h"

+#include "gic.h"
#include "gic_v3.h"
#include "gic_private.h"

+#define GICV3_MAX_CPUS 512
+
+#define GICD_INT_DEF_PRI 0xa0
+#define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\
+ (GICD_INT_DEF_PRI << 16) |\
+ (GICD_INT_DEF_PRI << 8) |\
+ GICD_INT_DEF_PRI)
+
+#define ICC_PMR_DEF_PRIO 0xf0
+
struct gicv3_data {
void *dist_base;
void *redist_base[GICV3_MAX_CPUS];
@@ -320,7 +331,7 @@ static void gicv3_cpu_init(unsigned int cpu, void *redist_base)
write_sysreg_s(ICC_PMR_DEF_PRIO, SYS_ICC_PMR_EL1);

/* Enable non-secure Group-1 interrupts */
- write_sysreg_s(ICC_IGRPEN1_EL1_ENABLE, SYS_ICC_GRPEN1_EL1);
+ write_sysreg_s(ICC_IGRPEN1_EL1_MASK, SYS_ICC_IGRPEN1_EL1);

gicv3_data.redist_base[cpu] = redist_base_cpu;
}
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:49:10

by Oliver Upton

Subject: [PATCH v2 20/23] KVM: selftests: Add helper for enabling LPIs on a redistributor

The selftests GIC library presently does not support LPIs. Add a
userspace helper for configuring a redistributor for LPIs, installing
an LPI configuration table and an LPI pending table.
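
A sketch of intended usage from a test (GPAs and sizes hypothetical;
note the helper derives GICR_PROPBASER.IDbits from the config table
size, so a 64KiB table, one byte per LPI, corresponds to 16 bits of
INTID space):

  vm_paddr_t cfg_table, pend_table;

  /* guest-physical allocations for the two tables elided */
  vgic_rdist_enable_lpis(gic_fd, vcpu, cfg_table, SZ_64K, pend_table);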

Signed-off-by: Oliver Upton <[email protected]>
---
.../selftests/kvm/include/aarch64/vgic.h | 4 +
.../testing/selftests/kvm/lib/aarch64/vgic.c | 78 +++++++++++++++++++
2 files changed, 82 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/aarch64/vgic.h b/tools/testing/selftests/kvm/include/aarch64/vgic.h
index d45b2902439d..ae6a54453c62 100644
--- a/tools/testing/selftests/kvm/include/aarch64/vgic.h
+++ b/tools/testing/selftests/kvm/include/aarch64/vgic.h
@@ -32,6 +32,10 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu);

#define KVM_IRQCHIP_NUM_PINS (1020 - 32)

+void vgic_rdist_enable_lpis(int gic_fd, struct kvm_vcpu *vcpu,
+ vm_paddr_t cfg_table, size_t cfg_table_size,
+ vm_paddr_t pend_table);
+
struct vgic_its {
int its_fd;
void *cmdq_hva;
diff --git a/tools/testing/selftests/kvm/lib/aarch64/vgic.c b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
index fc7b4fbe6453..2e1e0aa8cabc 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/vgic.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
@@ -3,8 +3,10 @@
* ARM Generic Interrupt Controller (GIC) v3 host support
*/

+#include <linux/kernel.h>
#include <linux/kvm.h>
#include <linux/sizes.h>
+#include <asm/cputype.h>
#include <asm/kvm_para.h>
#include <asm/kvm.h>

@@ -168,6 +170,82 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu)
vgic_poke_irq(gic_fd, intid, vcpu, GICD_ISACTIVER);
}

+#define VGIC_AFFINITY_0_SHIFT 0
+#define VGIC_AFFINITY_1_SHIFT 8
+#define VGIC_AFFINITY_2_SHIFT 16
+#define VGIC_AFFINITY_3_SHIFT 24
+
+#define MPIDR_TO_VGIC_LEVEL(mpidr, level) \
+ ((((mpidr) >> MPIDR_LEVEL_SHIFT(level)) & MPIDR_LEVEL_MASK) << \
+ VGIC_AFFINITY_## level ##_SHIFT)
+
+#define MPIDR_TO_VGIC(mpidr) \
+ ((MPIDR_TO_VGIC_LEVEL(mpidr, 0) | \
+ MPIDR_TO_VGIC_LEVEL(mpidr, 1) | \
+ MPIDR_TO_VGIC_LEVEL(mpidr, 2) | \
+ MPIDR_TO_VGIC_LEVEL(mpidr, 3)) << 32)
+
+static u32 vgic_rdist_read_reg(int gic_fd, struct kvm_vcpu *vcpu,
+ unsigned long offset)
+{
+ u64 mpidr, attr;
+ u32 val;
+
+ vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &mpidr);
+
+ attr = MPIDR_TO_VGIC(mpidr) | offset;
+ kvm_device_attr_get(gic_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
+ attr, &val);
+
+ return val;
+}
+
+static void vgic_rdist_write_reg(int gic_fd, struct kvm_vcpu *vcpu,
+ unsigned long offset, u32 val)
+{
+ u64 mpidr, attr;
+
+ vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &mpidr);
+
+ attr = MPIDR_TO_VGIC(mpidr) | offset;
+ kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
+ attr, &val);
+}
+
+static void vgic_rdist_write_baser(int gic_fd, struct kvm_vcpu *vcpu,
+ unsigned long offset, u64 val)
+{
+ u32 attr = val;
+
+ vgic_rdist_write_reg(gic_fd, vcpu, offset, attr);
+
+ attr = val >> 32;
+ vgic_rdist_write_reg(gic_fd, vcpu, offset + 4, attr);
+}
+
+void vgic_rdist_enable_lpis(int gic_fd, struct kvm_vcpu *vcpu,
+ vm_paddr_t cfg_table, size_t cfg_table_size,
+ vm_paddr_t pend_table)
+{
+ u32 ctlr;
+ u64 val;
+
+ val = (cfg_table |
+ GICR_PROPBASER_InnerShareable |
+ GICR_PROPBASER_RaWaWb |
+ ((ilog2(cfg_table_size) - 1) & GICR_PROPBASER_IDBITS_MASK));
+ vgic_rdist_write_baser(gic_fd, vcpu, GICR_PROPBASER, val);
+
+ val = (pend_table |
+ GICR_PENDBASER_InnerShareable |
+ GICR_PENDBASER_RaWaWb);
+ vgic_rdist_write_baser(gic_fd, vcpu, GICR_PENDBASER, val);
+
+ ctlr = vgic_rdist_read_reg(gic_fd, vcpu, GICR_CTLR);
+ ctlr |= GICR_CTLR_ENABLE_LPIS;
+ vgic_rdist_write_reg(gic_fd, vcpu, GICR_CTLR, ctlr);
+}
+
static u64 vgic_its_read_reg(int its_fd, unsigned long offset)
{
u64 attr;
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:49:33

by Oliver Upton

Subject: [PATCH v2 21/23] KVM: selftests: Use MPIDR_HWID_BITMASK from cputype.h

No need for a home-rolled definition, just rely on the common header.

Signed-off-by: Oliver Upton <[email protected]>
---
tools/testing/selftests/kvm/aarch64/psci_test.c | 2 ++
tools/testing/selftests/kvm/include/aarch64/processor.h | 2 --
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
index 9b004905d1d3..9fa3578d47d5 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_test.c
@@ -13,7 +13,9 @@

#define _GNU_SOURCE

+#include <linux/kernel.h>
#include <linux/psci.h>
+#include <asm/cputype.h>

#include "kvm_util.h"
#include "processor.h"
diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index cf20e44e86f2..94e8f1892754 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -58,8 +58,6 @@
MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT))

-#define MPIDR_HWID_BITMASK (0xff00fffffful)
-
void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
struct kvm_vcpu *aarch64_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
struct kvm_vcpu_init *init, void *guest_code);
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:50:02

by Oliver Upton

Subject: [PATCH v2 15/23] KVM: arm64: vgic-its: Treat the LPI translation cache as an rculist

Convert the LPI translation cache to an rculist such that readers can
walk it while only holding the RCU read lock.
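
Note that list_for_each_entry_rcu() is only safe against concurrent
list_del_rcu()/list_add_rcu() when the walker is inside an RCU
read-side critical section (the updaters here remain serialized by the
lpi_list_lock). A distilled sketch of the lock-free reader this
conversion enables, assuming entry frees are deferred past a grace
period:

  rcu_read_lock();
  list_for_each_entry_rcu(cte, &dist->lpi_translation_cache, entry) {
          /* cte may be concurrently unlinked, but not yet freed */
  }
  rcu_read_unlock();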

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/kvm/vgic/vgic-its.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 99042ecc9c85..55463eb84763 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -569,7 +569,7 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
{
struct vgic_translation_cache_entry *cte;

- list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
+ list_for_each_entry_rcu(cte, &dist->lpi_translation_cache, entry) {
/*
* If we hit a NULL entry, there is nothing after this
* point.
@@ -625,7 +625,7 @@ static struct vgic_translation_cache_entry *vgic_its_cache_victim(struct vgic_di
* older entries in the case of a tie. Return the max usage count seen
* during the scan to initialize the new cache entry.
*/
- list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
+ list_for_each_entry_rcu(cte, &dist->lpi_translation_cache, entry) {
tmp = atomic64_read(&cte->usage_count);
max = max(max, tmp);

@@ -679,7 +679,7 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
if (dist->lpi_cache_count >= vgic_its_max_cache_size(kvm)) {
victim = vgic_its_cache_victim(dist, &usage);

- list_del(&victim->entry);
+ list_del_rcu(&victim->entry);
dist->lpi_cache_count--;
}

@@ -697,7 +697,7 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
rcu_assign_pointer(new->irq, irq);

/* Move the new translation to the head of the list */
- list_add(&new->entry, &dist->lpi_translation_cache);
+ list_add_rcu(&new->entry, &dist->lpi_translation_cache);
dist->lpi_cache_count++;

out:
@@ -734,7 +734,7 @@ void vgic_its_invalidate_cache(struct kvm *kvm)
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
rcu_read_lock();

- list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
+ list_for_each_entry_rcu(cte, &dist->lpi_translation_cache, entry) {
/*
* If we hit a NULL entry, there is nothing after this
* point.
@@ -1981,7 +1981,7 @@ void vgic_lpi_translation_cache_destroy(struct kvm *kvm)

list_for_each_entry_safe(cte, tmp,
&dist->lpi_translation_cache, entry) {
- list_del(&cte->entry);
+ list_del_rcu(&cte->entry);
kfree(cte);
}
}
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:50:16

by Oliver Upton

Subject: [PATCH v2 16/23] KVM: arm64: vgic-its: Rely on RCU to protect translation cache reads

Stop taking the lpi_list_lock when reading the LPI translation cache.
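
This is safe because both the cache entry and the vgic_irq it points
to are RCU-protected, and the tryget refuses to hand out an IRQ whose
refcount has already hit zero. A sketch of the semantics assumed of
vgic_try_get_irq_kref(), introduced earlier in the series (the exact
implementation may differ):

  static bool vgic_try_get_irq_kref(struct vgic_irq *irq)
  {
          if (!irq)
                  return false;

          return kref_get_unless_zero(&irq->refcount);
  }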

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/kvm/vgic/vgic-its.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 55463eb84763..0e449f8c1f01 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -593,15 +593,14 @@ static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db,
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
- unsigned long flags;

- raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
+ rcu_read_lock();

irq = __vgic_its_check_cache(dist, db, devid, eventid);
if (!vgic_try_get_irq_kref(irq))
irq = NULL;

- raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
+ rcu_read_unlock();

return irq;
}
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:51:00

by Oliver Upton

Subject: [PATCH v2 19/23] KVM: selftests: Add a minimal library for interacting with an ITS

A prerequisite of testing LPI injection performance is of course
instantiating an ITS for the guest. Add a small library for creating an
ITS and interacting with it *from userspace*.

Yep, you read that right. KVM unintentionally allows userspace to send
commands to the virtual ITS via the command queue. Besides adding test
coverage for an elusive UAPI, interacting with the ITS in userspace
simplifies the handling of commands that need to allocate memory, like a
MAPD command with an ITT.
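
A sketch of intended usage from a test, built on the helpers declared
below (GPAs, sizes, and IDs hypothetical):

  struct vgic_its *its;

  its = vgic_its_setup(vm, coll_tbl_gpa, SZ_64K,
                       device_tbl_gpa, SZ_64K,
                       cmdq_gpa, SZ_64K);

  vgic_its_send_mapd_cmd(its, device_id, itt_gpa, SZ_64K, true);
  vgic_its_send_mapc_cmd(its, vcpu, collection_id, true);
  vgic_its_send_mapti_cmd(its, device_id, event_id, collection_id, intid);
  vgic_its_send_invall_cmd(its, collection_id);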

Signed-off-by: Oliver Upton <[email protected]>
---
.../selftests/kvm/include/aarch64/gic.h | 7 +-
.../selftests/kvm/include/aarch64/vgic.h | 20 ++
.../testing/selftests/kvm/lib/aarch64/vgic.c | 241 ++++++++++++++++++
3 files changed, 267 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/aarch64/gic.h b/tools/testing/selftests/kvm/include/aarch64/gic.h
index 16d944486e9c..abb41d67880c 100644
--- a/tools/testing/selftests/kvm/include/aarch64/gic.h
+++ b/tools/testing/selftests/kvm/include/aarch64/gic.h
@@ -11,7 +11,12 @@ enum gic_type {
GIC_TYPE_MAX,
};

-#define GICD_BASE_GPA 0x8000000ULL
+/*
+ * Note that the redistributor frames are at the end, as the range scales
+ * with the number of vCPUs in the VM.
+ */
+#define GITS_BASE_GPA 0x8000000ULL
+#define GICD_BASE_GPA (GITS_BASE_GPA + SZ_128K)
#define GICR_BASE_GPA (GICD_BASE_GPA + SZ_64K)

/* The GIC is identity-mapped into the guest at the time of setup. */
diff --git a/tools/testing/selftests/kvm/include/aarch64/vgic.h b/tools/testing/selftests/kvm/include/aarch64/vgic.h
index ce19aa0a8360..d45b2902439d 100644
--- a/tools/testing/selftests/kvm/include/aarch64/vgic.h
+++ b/tools/testing/selftests/kvm/include/aarch64/vgic.h
@@ -32,4 +32,24 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu);

#define KVM_IRQCHIP_NUM_PINS (1020 - 32)

+struct vgic_its {
+ int its_fd;
+ void *cmdq_hva;
+ size_t cmdq_size;
+};
+
+struct vgic_its *vgic_its_setup(struct kvm_vm *vm,
+ vm_paddr_t coll_tbl, size_t coll_tbl_sz,
+ vm_paddr_t device_tbl, size_t device_tbl_sz,
+ vm_paddr_t cmdq, size_t cmdq_size);
+void vgic_its_destroy(struct vgic_its *its);
+
+void vgic_its_send_mapd_cmd(struct vgic_its *its, u32 device_id,
+ vm_paddr_t itt_base, size_t itt_size, bool valid);
+void vgic_its_send_mapc_cmd(struct vgic_its *its, struct kvm_vcpu *vcpu,
+ u32 collection_id, bool valid);
+void vgic_its_send_mapti_cmd(struct vgic_its *its, u32 device_id,
+ u32 event_id, u32 collection_id, u32 intid);
+void vgic_its_send_invall_cmd(struct vgic_its *its, u32 collection_id);
+
#endif // SELFTEST_KVM_VGIC_H
diff --git a/tools/testing/selftests/kvm/lib/aarch64/vgic.c b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
index ac55b6c2e915..fc7b4fbe6453 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/vgic.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
@@ -12,6 +12,7 @@
#include "vgic.h"
#include "gic.h"
#include "gic_v3.h"
+#include "processor.h"

/*
* vGIC-v3 default host setup
@@ -166,3 +167,243 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu)
{
vgic_poke_irq(gic_fd, intid, vcpu, GICD_ISACTIVER);
}
+
+static u64 vgic_its_read_reg(int its_fd, unsigned long offset)
+{
+ u64 attr;
+
+ kvm_device_attr_get(its_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+ offset, &attr);
+ return attr;
+}
+
+static void vgic_its_write_reg(int its_fd, unsigned long offset, u64 val)
+{
+ kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+ offset, &val);
+}
+
+static unsigned long vgic_its_find_baser(int its_fd, unsigned int type)
+{
+ int i;
+
+ for (i = 0; i < GITS_BASER_NR_REGS; i++) {
+ u64 baser;
+ unsigned long offset = GITS_BASER + (i * sizeof(baser));
+
+ baser = vgic_its_read_reg(its_fd, offset);
+ if (GITS_BASER_TYPE(baser) == type)
+ return offset;
+ }
+
+ TEST_FAIL("Couldn't find an ITS BASER of type %u", type);
+ return -1;
+}
+
+static void vgic_its_install_table(int its_fd, unsigned int type, vm_paddr_t base,
+ size_t size)
+{
+ unsigned long offset = vgic_its_find_baser(its_fd, type);
+ u64 baser;
+
+ baser = ((size / SZ_64K) - 1) |
+ GITS_BASER_PAGE_SIZE_64K |
+ GITS_BASER_InnerShareable |
+ base |
+ GITS_BASER_RaWaWb |
+ GITS_BASER_VALID;
+
+ vgic_its_write_reg(its_fd, offset, baser);
+}
+
+static void vgic_its_install_cmdq(int its_fd, vm_paddr_t base, size_t size)
+{
+ u64 cbaser;
+
+ cbaser = ((size / SZ_4K) - 1) |
+ GITS_CBASER_InnerShareable |
+ base |
+ GITS_CBASER_RaWaWb |
+ GITS_CBASER_VALID;
+
+ vgic_its_write_reg(its_fd, GITS_CBASER, cbaser);
+}
+
+struct vgic_its *vgic_its_setup(struct kvm_vm *vm,
+ vm_paddr_t coll_tbl, size_t coll_tbl_sz,
+ vm_paddr_t device_tbl, size_t device_tbl_sz,
+ vm_paddr_t cmdq, size_t cmdq_size)
+{
+ int its_fd = kvm_create_device(vm, KVM_DEV_TYPE_ARM_VGIC_ITS);
+ struct vgic_its *its = malloc(sizeof(struct vgic_its));
+ u64 attr, ctlr;
+
+ attr = GITS_BASE_GPA;
+ kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
+ KVM_VGIC_ITS_ADDR_TYPE, &attr);
+
+ kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
+ KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
+
+ vgic_its_install_table(its_fd, GITS_BASER_TYPE_COLLECTION, coll_tbl,
+ coll_tbl_sz);
+ vgic_its_install_table(its_fd, GITS_BASER_TYPE_DEVICE, device_tbl,
+ device_tbl_sz);
+
+ vgic_its_install_cmdq(its_fd, cmdq, cmdq_size);
+
+ ctlr = vgic_its_read_reg(its_fd, GITS_CTLR);
+ ctlr |= GITS_CTLR_ENABLE;
+ vgic_its_write_reg(its_fd, GITS_CTLR, ctlr);
+
+ *its = (struct vgic_its) {
+ .its_fd = its_fd,
+ .cmdq_hva = addr_gpa2hva(vm, cmdq),
+ .cmdq_size = cmdq_size,
+ };
+
+ return its;
+}
+
+void vgic_its_destroy(struct vgic_its *its)
+{
+ close(its->its_fd);
+ free(its);
+}
+
+struct its_cmd_block {
+ union {
+ u64 raw_cmd[4];
+ __le64 raw_cmd_le[4];
+ };
+};
+
+static inline void its_fixup_cmd(struct its_cmd_block *cmd)
+{
+ /* Let's fixup BE commands */
+ cmd->raw_cmd_le[0] = cpu_to_le64(cmd->raw_cmd[0]);
+ cmd->raw_cmd_le[1] = cpu_to_le64(cmd->raw_cmd[1]);
+ cmd->raw_cmd_le[2] = cpu_to_le64(cmd->raw_cmd[2]);
+ cmd->raw_cmd_le[3] = cpu_to_le64(cmd->raw_cmd[3]);
+}
+
+static void its_mask_encode(u64 *raw_cmd, u64 val, int h, int l)
+{
+ u64 mask = GENMASK_ULL(h, l);
+ *raw_cmd &= ~mask;
+ *raw_cmd |= (val << l) & mask;
+}
+
+static void its_encode_cmd(struct its_cmd_block *cmd, u8 cmd_nr)
+{
+ its_mask_encode(&cmd->raw_cmd[0], cmd_nr, 7, 0);
+}
+
+static void its_encode_devid(struct its_cmd_block *cmd, u32 devid)
+{
+ its_mask_encode(&cmd->raw_cmd[0], devid, 63, 32);
+}
+
+static void its_encode_event_id(struct its_cmd_block *cmd, u32 id)
+{
+ its_mask_encode(&cmd->raw_cmd[1], id, 31, 0);
+}
+
+static void its_encode_phys_id(struct its_cmd_block *cmd, u32 phys_id)
+{
+ its_mask_encode(&cmd->raw_cmd[1], phys_id, 63, 32);
+}
+
+static void its_encode_size(struct its_cmd_block *cmd, u8 size)
+{
+ its_mask_encode(&cmd->raw_cmd[1], size, 4, 0);
+}
+
+static void its_encode_itt(struct its_cmd_block *cmd, u64 itt_addr)
+{
+ its_mask_encode(&cmd->raw_cmd[2], itt_addr >> 8, 51, 8);
+}
+
+static void its_encode_valid(struct its_cmd_block *cmd, int valid)
+{
+ its_mask_encode(&cmd->raw_cmd[2], !!valid, 63, 63);
+}
+
+static void its_encode_target(struct its_cmd_block *cmd, u64 target_addr)
+{
+ its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16);
+}
+
+static void its_encode_collection(struct its_cmd_block *cmd, u16 col)
+{
+ its_mask_encode(&cmd->raw_cmd[2], col, 15, 0);
+}
+
+static void vgic_its_send_cmd(struct vgic_its *its, struct its_cmd_block *cmd)
+{
+ u64 cwriter = vgic_its_read_reg(its->its_fd, GITS_CWRITER);
+ struct its_cmd_block *dst = its->cmdq_hva + cwriter;
+ u64 next;
+
+ its_fixup_cmd(cmd);
+
+ WRITE_ONCE(*dst, *cmd);
+ dsb(ishst);
+
+ next = (cwriter + sizeof(*cmd)) % its->cmdq_size;
+ vgic_its_write_reg(its->its_fd, GITS_CWRITER, next);
+
+ TEST_ASSERT(vgic_its_read_reg(its->its_fd, GITS_CREADR) == next,
+ "ITS didn't process command at offset: %lu\n", cwriter);
+}
+
+void vgic_its_send_mapd_cmd(struct vgic_its *its, u32 device_id,
+ vm_paddr_t itt_base, size_t itt_size, bool valid)
+{
+ struct its_cmd_block cmd = {};
+
+ its_encode_cmd(&cmd, GITS_CMD_MAPD);
+ its_encode_devid(&cmd, device_id);
+ its_encode_size(&cmd, ilog2(itt_size) - 1);
+ its_encode_itt(&cmd, itt_base);
+ its_encode_valid(&cmd, valid);
+
+ vgic_its_send_cmd(its, &cmd);
+}
+
+void vgic_its_send_mapc_cmd(struct vgic_its *its, struct kvm_vcpu *vcpu,
+ u32 collection_id, bool valid)
+{
+ struct its_cmd_block cmd = {};
+
+ its_encode_cmd(&cmd, GITS_CMD_MAPC);
+ its_encode_collection(&cmd, collection_id);
+ its_encode_target(&cmd, vcpu->id);
+ its_encode_valid(&cmd, valid);
+
+ vgic_its_send_cmd(its, &cmd);
+}
+
+void vgic_its_send_mapti_cmd(struct vgic_its *its, u32 device_id,
+ u32 event_id, u32 collection_id, u32 intid)
+{
+ struct its_cmd_block cmd = {};
+
+ its_encode_cmd(&cmd, GITS_CMD_MAPTI);
+ its_encode_devid(&cmd, device_id);
+ its_encode_event_id(&cmd, event_id);
+ its_encode_phys_id(&cmd, intid);
+ its_encode_collection(&cmd, collection_id);
+
+ vgic_its_send_cmd(its, &cmd);
+}
+
+void vgic_its_send_invall_cmd(struct vgic_its *its, u32 collection_id)
+{
+ struct its_cmd_block cmd = {};
+
+ its_encode_cmd(&cmd, GITS_CMD_INVALL);
+ its_encode_collection(&cmd, collection_id);
+
+ vgic_its_send_cmd(its, &cmd);
+}
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:53:50

by Oliver Upton

[permalink] [raw]
Subject: [PATCH v2 23/23] KVM: selftests: Add stress test for LPI injection

Now that all the infrastructure is in place, add a test to stress KVM's
LPI injection. Keep a 1:1 mapping of device IDs to signalling threads,
allowing the user to scale the sender side of the test up/down. Make use
of the new VM stats for the translation cache to estimate the
translation hit rate.

Since the primary focus of the test is on performance, you'll notice
that the guest code is not pedantic about the LPIs it receives. Counting
the number of LPIs would require synchronization between the device and
vCPU threads to avoid coalescing and would get in the way of performance
numbers.

Signed-off-by: Oliver Upton <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/vgic_lpi_stress.c | 388 ++++++++++++++++++
2 files changed, 389 insertions(+)
create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_lpi_stress.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 492e937fab00..8a240d20ec5e 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -153,6 +153,7 @@ TEST_GEN_PROGS_aarch64 += aarch64/smccc_filter
TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
TEST_GEN_PROGS_aarch64 += aarch64/vgic_irq
+TEST_GEN_PROGS_aarch64 += aarch64/vgic_lpi_stress
TEST_GEN_PROGS_aarch64 += aarch64/vpmu_counter_access
TEST_GEN_PROGS_aarch64 += access_tracking_perf_test
TEST_GEN_PROGS_aarch64 += demand_paging_test
diff --git a/tools/testing/selftests/kvm/aarch64/vgic_lpi_stress.c b/tools/testing/selftests/kvm/aarch64/vgic_lpi_stress.c
new file mode 100644
index 000000000000..d557d907728a
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/vgic_lpi_stress.c
@@ -0,0 +1,388 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vgic_lpi_stress - Stress test for KVM's ITS emulation
+ *
+ * Copyright (c) 2024 Google LLC
+ */
+
+#include <linux/sizes.h>
+#include <pthread.h>
+#include <sys/sysinfo.h>
+
+#include "kvm_util.h"
+#include "gic.h"
+#include "gic_v3.h"
+#include "processor.h"
+#include "ucall.h"
+#include "vgic.h"
+
+#define TEST_MEMSLOT_INDEX 1
+
+#define GIC_LPI_OFFSET 8192
+
+static u32 nr_vcpus = 1;
+static u32 nr_devices = 1;
+static u32 nr_event_ids = 16;
+static size_t nr_iterations = 1000;
+static vm_paddr_t gpa_base;
+
+static struct kvm_vm *vm;
+static struct kvm_vcpu **vcpus;
+static struct vgic_its *its;
+static int gic_fd;
+
+static bool request_vcpus_stop;
+
+static void guest_irq_handler(struct ex_regs *regs)
+{
+ u32 intid = gic_get_and_ack_irq();
+
+ if (intid == IAR_SPURIOUS)
+ return;
+
+ GUEST_ASSERT(intid >= GIC_LPI_OFFSET);
+ gic_set_eoi(intid);
+}
+
+static void guest_code(size_t nr_lpis)
+{
+ gic_init(GIC_V3, nr_vcpus);
+
+ GUEST_SYNC(0);
+
+ /*
+ * Don't use WFI here to avoid blocking the vCPU thread indefinitely and
+ * never getting the stop signal.
+ */
+ while (!READ_ONCE(request_vcpus_stop))
+ cpu_relax();
+
+ GUEST_DONE();
+}
+
+static void setup_memslot(void)
+{
+ size_t pages;
+ size_t sz;
+
+ /*
+ * For the ITS:
+ * - A single level device table
+ * - A single level collection table
+ * - The command queue
+ * - An ITT for each device
+ */
+ sz = (3 + nr_devices) * SZ_64K;
+
+ /*
+ * For the redistributors:
+ * - A shared LPI configuration table
+ * - An LPI pending table for each vCPU
+ */
+ sz += (1 + nr_vcpus) * SZ_64K;
+
+ pages = sz / vm->page_size;
+ gpa_base = ((vm_compute_max_gfn(vm) + 1) * vm->page_size) - sz;
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, gpa_base,
+ TEST_MEMSLOT_INDEX, pages, 0);
+}
+
+#define LPI_PROP_DEFAULT_PRIO 0xa0
+
+static void configure_lpis(vm_paddr_t prop_table)
+{
+ u8 *tbl = addr_gpa2hva(vm, prop_table);
+ size_t i;
+
+ for (i = 0; i < (nr_devices * nr_event_ids); i++) {
+ tbl[i] = LPI_PROP_DEFAULT_PRIO |
+ LPI_PROP_GROUP1 |
+ LPI_PROP_ENABLED;
+ }
+}
+
+static void setup_gic(void)
+{
+ vm_paddr_t coll_table, device_table, cmdq_base;
+
+ gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
+ __TEST_REQUIRE(gic_fd >= 0, "Failed to create GICv3");
+
+ coll_table = vm_phy_pages_alloc_aligned(vm, SZ_64K / vm->page_size,
+ gpa_base, TEST_MEMSLOT_INDEX);
+ device_table = vm_phy_pages_alloc_aligned(vm, SZ_64K / vm->page_size,
+ gpa_base, TEST_MEMSLOT_INDEX);
+ cmdq_base = vm_phy_pages_alloc_aligned(vm, SZ_64K / vm->page_size,
+ gpa_base, TEST_MEMSLOT_INDEX);
+
+ its = vgic_its_setup(vm, coll_table, SZ_64K,
+ device_table, SZ_64K, cmdq_base, SZ_64K);
+}
+
+static void setup_its_mappings(void)
+{
+ u32 coll_id, device_id, event_id, intid = GIC_LPI_OFFSET;
+
+ for (coll_id = 0; coll_id < nr_vcpus; coll_id++)
+ vgic_its_send_mapc_cmd(its, vcpus[coll_id], coll_id, true);
+
+ /* Round-robin the LPIs to all of the vCPUs in the VM */
+ coll_id = 0;
+ for (device_id = 0; device_id < nr_devices; device_id++) {
+ vm_paddr_t itt_base = vm_phy_pages_alloc_aligned(vm, SZ_64K / vm->page_size,
+ gpa_base, TEST_MEMSLOT_INDEX);
+
+ vgic_its_send_mapd_cmd(its, device_id, itt_base, SZ_64K, true);
+
+ for (event_id = 0; event_id < nr_event_ids; event_id++) {
+ vgic_its_send_mapti_cmd(its, device_id, event_id, coll_id,
+ intid++);
+
+ coll_id = (coll_id + 1) % nr_vcpus;
+ }
+ }
+}
+
+static void setup_rdists_for_lpis(void)
+{
+ size_t i;
+
+ vm_paddr_t prop_table = vm_phy_pages_alloc_aligned(vm, SZ_64K / vm->page_size,
+ gpa_base, TEST_MEMSLOT_INDEX);
+
+ configure_lpis(prop_table);
+
+ for (i = 0; i < nr_vcpus; i++) {
+ vm_paddr_t pend_table;
+
+ pend_table = vm_phy_pages_alloc_aligned(vm, SZ_64K / vm->page_size,
+ gpa_base, TEST_MEMSLOT_INDEX);
+
+ vgic_rdist_enable_lpis(gic_fd, vcpus[i], prop_table, SZ_64K, pend_table);
+ }
+}
+
+static void invalidate_all_rdists(void)
+{
+ int i;
+
+ for (i = 0; i < nr_vcpus; i++)
+ vgic_its_send_invall_cmd(its, i);
+}
+
+static void signal_lpi(u32 device_id, u32 event_id)
+{
+ vm_paddr_t db_addr = GITS_BASE_GPA + GITS_TRANSLATER;
+
+ struct kvm_msi msi = {
+ .address_lo = db_addr,
+ .address_hi = db_addr >> 32,
+ .data = event_id,
+ .devid = device_id,
+ .flags = KVM_MSI_VALID_DEVID,
+ };
+
+ /*
+ * KVM_SIGNAL_MSI returns 1 if the MSI wasn't 'blocked' by the VM,
+ * which for arm64 implies having a valid translation in the ITS.
+ */
+ TEST_ASSERT(__vm_ioctl(vm, KVM_SIGNAL_MSI, &msi) == 1,
+ "KVM_SIGNAL_MSI ioctl failed");
+}
+
+static pthread_barrier_t test_setup_barrier;
+static pthread_barrier_t test_start_barrier;
+
+static void *lpi_worker_thread(void *data)
+{
+ u32 device_id = (size_t)data;
+ u32 event_id;
+ size_t i;
+
+ pthread_barrier_wait(&test_start_barrier);
+
+ for (i = 0; i < nr_iterations; i++)
+ for (event_id = 0; event_id < nr_event_ids; event_id++)
+ signal_lpi(device_id, event_id);
+
+ return NULL;
+}
+
+static void *vcpu_worker_thread(void *data)
+{
+ struct kvm_vcpu *vcpu = data;
+ struct ucall uc;
+
+ while (true) {
+ vcpu_run(vcpu);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC:
+ /*
+ * Tell the main thread to complete its last bit of
+ * setup and wait for the signal to start the test.
+ */
+ pthread_barrier_wait(&test_setup_barrier);
+ pthread_barrier_wait(&test_start_barrier);
+ break;
+ case UCALL_DONE:
+ return NULL;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ default:
+ TEST_FAIL("Unknown ucall: %lu", uc.cmd);
+ }
+ }
+
+ return NULL;
+}
+
+static void report_stats(struct timespec delta)
+{
+ u64 cache_hits, cache_misses, cache_accesses;
+ double nr_lpis;
+ double time;
+
+ nr_lpis = nr_devices * nr_event_ids * nr_iterations;
+
+ time = delta.tv_sec;
+ time += ((double)delta.tv_nsec) / NSEC_PER_SEC;
+
+ pr_info("Rate: %.2f LPIs/sec\n", nr_lpis / time);
+
+ __vm_get_stat(vm, "vgic_its_trans_cache_hit", &cache_hits, 1);
+ __vm_get_stat(vm, "vgic_its_trans_cache_miss", &cache_misses, 1);
+
+ cache_accesses = cache_hits + cache_misses;
+
+ pr_info("Translation Cache\n");
+ pr_info(" %lu hits\n", cache_hits);
+ pr_info(" %lu misses\n", cache_misses);
+ pr_info(" %.2f%% hit rate\n", 100 * (((double)cache_hits) / cache_accesses));
+}
+
+static void run_test(void)
+{
+ pthread_t *lpi_threads = malloc(nr_devices * sizeof(pthread_t));
+ pthread_t *vcpu_threads = malloc(nr_vcpus * sizeof(pthread_t));
+ struct timespec start, delta;
+ size_t i;
+
+ TEST_ASSERT(lpi_threads && vcpu_threads, "Failed to allocate pthread arrays");
+
+ /* Only the vCPU threads need to do setup before starting the VM. */
+ pthread_barrier_init(&test_setup_barrier, NULL, nr_vcpus + 1);
+ pthread_barrier_init(&test_start_barrier, NULL, nr_devices + nr_vcpus + 1);
+
+ for (i = 0; i < nr_vcpus; i++)
+ pthread_create(&vcpu_threads[i], NULL, vcpu_worker_thread, vcpus[i]);
+
+ for (i = 0; i < nr_devices; i++)
+ pthread_create(&lpi_threads[i], NULL, lpi_worker_thread, (void *)i);
+
+ pthread_barrier_wait(&test_setup_barrier);
+
+ /*
+ * Setup LPIs for the VM after the guest has initialized the GIC. Yes,
+ * this is weird to be doing in userspace, but creating ITS translations
+ * requires allocating an ITT for every device.
+ */
+ setup_rdists_for_lpis();
+ setup_its_mappings();
+ invalidate_all_rdists();
+
+ clock_gettime(CLOCK_MONOTONIC, &start);
+ pthread_barrier_wait(&test_start_barrier);
+
+ for (i = 0; i < nr_devices; i++)
+ pthread_join(lpi_threads[i], NULL);
+
+ delta = timespec_elapsed(start);
+ write_guest_global(vm, request_vcpus_stop, true);
+
+ for (i = 0; i < nr_vcpus; i++)
+ pthread_join(vcpu_threads[i], NULL);
+
+ report_stats(delta);
+}
+
+static void setup_vm(void)
+{
+ int i;
+
+ vcpus = malloc(nr_vcpus * sizeof(struct kvm_vcpu *));
+ TEST_ASSERT(vcpus, "Failed to allocate vCPU array");
+
+ vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus);
+
+ vm_init_descriptor_tables(vm);
+ for (i = 0; i < nr_vcpus; i++)
+ vcpu_init_descriptor_tables(vcpus[i]);
+
+ vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT, guest_irq_handler);
+
+ setup_memslot();
+
+ setup_gic();
+
+ /* gic_init() demands the number of vCPUs in the VM */
+ sync_global_to_guest(vm, nr_vcpus);
+}
+
+static void destroy_vm(void)
+{
+ vgic_its_destroy(its);
+ close(gic_fd);
+ kvm_vm_free(vm);
+ free(vcpus);
+}
+
+static void pr_usage(const char *name)
+{
+ pr_info("%s [-v NR_VCPUS] [-d NR_DEVICES] [-e NR_EVENTS] [-i ITERS] -h\n", name);
+ pr_info(" -v:\tnumber of vCPUs (default: %u)\n", nr_vcpus);
+ pr_info(" -d:\tnumber of devices (default: %u)\n", nr_devices);
+ pr_info(" -e:\tnumber of event IDs per device (default: %u)\n", nr_event_ids);
+ pr_info(" -i:\tnumber of iterations (default: %lu)\n", nr_iterations);
+}
+
+int main(int argc, char **argv)
+{
+ u32 nr_threads;
+ int c;
+
+ while ((c = getopt(argc, argv, "hv:d:e:i:")) != -1) {
+ switch (c) {
+ case 'v':
+ nr_vcpus = atoi(optarg);
+ break;
+ case 'd':
+ nr_devices = atoi(optarg);
+ break;
+ case 'e':
+ nr_event_ids = atoi(optarg);
+ break;
+ case 'i':
+ nr_iterations = strtoul(optarg, NULL, 0);
+ break;
+ case 'h':
+ default:
+ pr_usage(argv[0]);
+ return 1;
+ }
+ }
+
+ nr_threads = nr_vcpus + nr_devices;
+ if (nr_threads > get_nprocs())
+ pr_info("WARNING: running %u threads on %d CPUs; performance is degraded.\n",
+ nr_threads, get_nprocs());
+
+ setup_vm();
+
+ run_test();
+
+ destroy_vm();
+
+ return 0;
+}
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 09:55:25

by Oliver Upton

[permalink] [raw]
Subject: [PATCH v2 22/23] KVM: selftests: Hack in support for aligned page allocations

Many of the memory allocations handed off to a GIC ITS require 64K
alignment. One would think that requesting 64K worth of pages would be
naturally aligned in selftests (much like alloc_pages() in the kernel),
but that is unfortunately not the case.

Add a new helper to allow a caller to decide if they want
naturally-aligned page allocations. Deliberately avoid making this the
default in case this subtly breaks assumptions or memory overheads in
other selftests.

Signed-off-by: Oliver Upton <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 2 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 27 ++++++++++++++++---
2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 9e5afc472c14..750638cfa849 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -830,6 +830,8 @@ vm_paddr_t vm_phy_page_alloc(struct kvm_vm *vm, vm_paddr_t paddr_min,
vm_paddr_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
vm_paddr_t paddr_min, uint32_t memslot);
vm_paddr_t vm_alloc_page_table(struct kvm_vm *vm);
+vm_paddr_t vm_phy_pages_alloc_aligned(struct kvm_vm *vm, size_t num,
+ vm_paddr_t paddr_min, uint32_t memslot);

/*
* ____vm_create() does KVM_CREATE_VM and little else. __vm_create() also
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index e066d584c656..60948a004012 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1984,8 +1984,9 @@ const char *exit_reason_str(unsigned int exit_reason)
* and their base address is returned. A TEST_ASSERT failure occurs if
* not enough pages are available at or above paddr_min.
*/
-vm_paddr_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
- vm_paddr_t paddr_min, uint32_t memslot)
+static vm_paddr_t __vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
+ vm_paddr_t paddr_min, uint32_t memslot,
+ bool aligned)
{
struct userspace_mem_region *region;
sparsebit_idx_t pg, base;
@@ -1997,14 +1998,20 @@ vm_paddr_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
" paddr_min: 0x%lx page_size: 0x%x",
paddr_min, vm->page_size);

+ TEST_ASSERT(!aligned || (paddr_min % vm->page_size * num) == 0,
+ "Min physical address isn't naturally aligned.\n"
+ " paddr_min: 0x%lx page_size: 0x%x num: %lu",
+ paddr_min, vm->page_size, num);
+
region = memslot2region(vm, memslot);
base = pg = paddr_min >> vm->page_shift;

do {
for (; pg < base + num; ++pg) {
if (!sparsebit_is_set(region->unused_phy_pages, pg)) {
- base = pg = sparsebit_next_set(region->unused_phy_pages, pg);
- break;
+ do {
+ base = pg = sparsebit_next_set(region->unused_phy_pages, pg);
+ } while (aligned && ((pg % num) != 0));
}
}
} while (pg && pg != base + num);
@@ -2024,6 +2031,12 @@ vm_paddr_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
return base * vm->page_size;
}

+vm_paddr_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
+ vm_paddr_t paddr_min, uint32_t memslot)
+{
+ return __vm_phy_pages_alloc(vm, num, paddr_min, memslot, false);
+}
+
vm_paddr_t vm_phy_page_alloc(struct kvm_vm *vm, vm_paddr_t paddr_min,
uint32_t memslot)
{
@@ -2036,6 +2049,12 @@ vm_paddr_t vm_alloc_page_table(struct kvm_vm *vm)
vm->memslots[MEM_REGION_PT]);
}

+vm_paddr_t vm_phy_pages_alloc_aligned(struct kvm_vm *vm, size_t num,
+ vm_paddr_t paddr_min, uint32_t memslot)
+{
+ return __vm_phy_pages_alloc(vm, num, paddr_min, memslot, true);
+}
+
/*
* Address Guest Virtual to Host Virtual
*
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 10:01:34

by Oliver Upton

[permalink] [raw]
Subject: [PATCH v2 14/23] KVM: arm64: vgic-its: Protect cached vgic_irq pointers with RCU

RCU readers of the LPI translation cache will be able to run in parallel
with a cache invalidation, which clears the RCU pointer. Start using RCU
protection on the cached irq pointer in light of this.

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/kvm/vgic/vgic-its.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 35926d5ae561..99042ecc9c85 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -156,7 +156,7 @@ struct vgic_translation_cache_entry {
phys_addr_t db;
u32 devid;
u32 eventid;
- struct vgic_irq *irq;
+ struct vgic_irq __rcu *irq;
atomic64_t usage_count;
};

@@ -574,7 +574,7 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
* If we hit a NULL entry, there is nothing after this
* point.
*/
- if (!cte->irq)
+ if (!rcu_access_pointer(cte->irq))
break;

if (cte->db != db || cte->devid != devid ||
@@ -582,7 +582,7 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
continue;

atomic64_inc(&cte->usage_count);
- return cte->irq;
+ return rcu_dereference(cte->irq);
}

return NULL;
@@ -629,7 +629,7 @@ static struct vgic_translation_cache_entry *vgic_its_cache_victim(struct vgic_di
tmp = atomic64_read(&cte->usage_count);
max = max(max, tmp);

- if (!cte->irq) {
+ if (!rcu_access_pointer(cte->irq)) {
victim = cte;
break;
}
@@ -663,6 +663,7 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
return;

raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
+ rcu_read_lock();

/*
* We could have raced with another CPU caching the same
@@ -693,27 +694,32 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
new->db = db;
new->devid = devid;
new->eventid = eventid;
- new->irq = irq;
+ rcu_assign_pointer(new->irq, irq);

/* Move the new translation to the head of the list */
list_add(&new->entry, &dist->lpi_translation_cache);
dist->lpi_cache_count++;

out:
+ rcu_read_unlock();
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);

if (!victim)
return;

+ synchronize_rcu();
+
/*
* Caching the translation implies having an extra reference
* to the interrupt, so drop the potential reference on what
- * was in the cache, and increment it on the new interrupt.
+ * was in the cache.
*/
if (victim->irq) {
+ struct vgic_irq *irq = rcu_dereference_raw(victim->irq);
+
KVM_VM_TRACE_EVENT(kvm, vgic_its_trans_cache_victim, victim->db,
- victim->devid, victim->eventid, victim->irq->intid);
- vgic_put_irq(kvm, victim->irq);
+ victim->devid, victim->eventid, irq->intid);
+ vgic_put_irq(kvm, irq);
}

kfree(victim);
@@ -726,19 +732,21 @@ void vgic_its_invalidate_cache(struct kvm *kvm)
unsigned long flags;

raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
+ rcu_read_lock();

list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
/*
* If we hit a NULL entry, there is nothing after this
* point.
*/
- if (!cte->irq)
+ if (!rcu_access_pointer(cte->irq))
break;

vgic_put_irq(kvm, cte->irq);
- cte->irq = NULL;
+ rcu_assign_pointer(cte->irq, NULL);
}

+ rcu_read_unlock();
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
}

--
2.43.0.687.g38aa6559b0-goog


2024-02-13 10:05:36

by Oliver Upton

[permalink] [raw]
Subject: [PATCH v2 18/23] KVM: selftests: Standardise layout of GIC frames

It would appear that all of the selftests are using the same exact
layout for the GIC frames. Fold this back into the library
implementation to avoid defining magic values all over the selftests.

This is an extension of Colton's change, ripping out parameterization of
from the library internals in addition to the public interfaces.

Co-developed-by: Colton Lewis <[email protected]>
Signed-off-by: Colton Lewis <[email protected]>
Signed-off-by: Oliver Upton <[email protected]>
---
.../selftests/kvm/aarch64/arch_timer.c | 8 +--
.../testing/selftests/kvm/aarch64/vgic_irq.c | 11 +---
.../kvm/aarch64/vpmu_counter_access.c | 6 +-
.../selftests/kvm/dirty_log_perf_test.c | 5 +-
.../selftests/kvm/include/aarch64/gic.h | 10 +++-
.../selftests/kvm/include/aarch64/vgic.h | 3 +-
tools/testing/selftests/kvm/lib/aarch64/gic.c | 18 +++---
.../selftests/kvm/lib/aarch64/gic_private.h | 4 +-
.../selftests/kvm/lib/aarch64/gic_v3.c | 56 +++++++++----------
.../testing/selftests/kvm/lib/aarch64/vgic.c | 18 +++---
10 files changed, 57 insertions(+), 82 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index 274b8465b42a..f5101898c46a 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -59,9 +59,6 @@ static struct test_args test_args = {

#define msecs_to_usecs(msec) ((msec) * 1000LL)

-#define GICD_BASE_GPA 0x8000000ULL
-#define GICR_BASE_GPA 0x80A0000ULL
-
enum guest_stage {
GUEST_STAGE_VTIMER_CVAL = 1,
GUEST_STAGE_VTIMER_TVAL,
@@ -204,8 +201,7 @@ static void guest_code(void)

local_irq_disable();

- gic_init(GIC_V3, test_args.nr_vcpus,
- (void *)GICD_BASE_GPA, (void *)GICR_BASE_GPA);
+ gic_init(GIC_V3, test_args.nr_vcpus);

timer_set_ctl(VIRTUAL, CTL_IMASK);
timer_set_ctl(PHYSICAL, CTL_IMASK);
@@ -391,7 +387,7 @@ static struct kvm_vm *test_vm_create(void)
vcpu_init_descriptor_tables(vcpus[i]);

test_init_timer_irq(vm);
- gic_fd = vgic_v3_setup(vm, nr_vcpus, 64, GICD_BASE_GPA, GICR_BASE_GPA);
+ gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3");

/* Make all the test's cmdline args visible to the guest */
diff --git a/tools/testing/selftests/kvm/aarch64/vgic_irq.c b/tools/testing/selftests/kvm/aarch64/vgic_irq.c
index d61a6302f467..a51dbd2a5f84 100644
--- a/tools/testing/selftests/kvm/aarch64/vgic_irq.c
+++ b/tools/testing/selftests/kvm/aarch64/vgic_irq.c
@@ -19,9 +19,6 @@
#include "gic_v3.h"
#include "vgic.h"

-#define GICD_BASE_GPA 0x08000000ULL
-#define GICR_BASE_GPA 0x080A0000ULL
-
/*
* Stores the user specified args; it's passed to the guest and to every test
* function.
@@ -49,9 +46,6 @@ struct test_args {
#define IRQ_DEFAULT_PRIO (LOWEST_PRIO - 1)
#define IRQ_DEFAULT_PRIO_REG (IRQ_DEFAULT_PRIO << KVM_PRIO_SHIFT) /* 0xf0 */

-static void *dist = (void *)GICD_BASE_GPA;
-static void *redist = (void *)GICR_BASE_GPA;
-
/*
* The kvm_inject_* utilities are used by the guest to ask the host to inject
* interrupts (e.g., using the KVM_IRQ_LINE ioctl).
@@ -478,7 +472,7 @@ static void guest_code(struct test_args *args)
bool level_sensitive = args->level_sensitive;
struct kvm_inject_desc *f, *inject_fns;

- gic_init(GIC_V3, 1, dist, redist);
+ gic_init(GIC_V3, 1);

for (i = 0; i < nr_irqs; i++)
gic_irq_enable(i);
@@ -764,8 +758,7 @@ static void test_vgic(uint32_t nr_irqs, bool level_sensitive, bool eoi_split)
memcpy(addr_gva2hva(vm, args_gva), &args, sizeof(args));
vcpu_args_set(vcpu, 1, args_gva);

- gic_fd = vgic_v3_setup(vm, 1, nr_irqs,
- GICD_BASE_GPA, GICR_BASE_GPA);
+ gic_fd = vgic_v3_setup(vm, 1, nr_irqs);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3, skipping");

vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT,
diff --git a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
index 9d51b5691349..496a7a3c615b 100644
--- a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
@@ -420,9 +420,6 @@ static void guest_code(uint64_t expected_pmcr_n)
GUEST_DONE();
}

-#define GICD_BASE_GPA 0x8000000ULL
-#define GICR_BASE_GPA 0x80A0000ULL
-
/* Create a VM that has one vCPU with PMUv3 configured. */
static void create_vpmu_vm(void *guest_code)
{
@@ -454,8 +451,7 @@ static void create_vpmu_vm(void *guest_code)
init.features[0] |= (1 << KVM_ARM_VCPU_PMU_V3);
vpmu_vm.vcpu = aarch64_vcpu_add(vpmu_vm.vm, 0, &init, guest_code);
vcpu_init_descriptor_tables(vpmu_vm.vcpu);
- vpmu_vm.gic_fd = vgic_v3_setup(vpmu_vm.vm, 1, 64,
- GICD_BASE_GPA, GICR_BASE_GPA);
+ vpmu_vm.gic_fd = vgic_v3_setup(vpmu_vm.vm, 1, 64);
__TEST_REQUIRE(vpmu_vm.gic_fd >= 0,
"Failed to create vgic-v3, skipping");

diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index d374dbcf9a53..53c9d7a2611d 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -22,9 +22,6 @@
#ifdef __aarch64__
#include "aarch64/vgic.h"

-#define GICD_BASE_GPA 0x8000000ULL
-#define GICR_BASE_GPA 0x80A0000ULL
-
static int gic_fd;

static void arch_setup_vm(struct kvm_vm *vm, unsigned int nr_vcpus)
@@ -33,7 +30,7 @@ static void arch_setup_vm(struct kvm_vm *vm, unsigned int nr_vcpus)
* The test can still run even if hardware does not support GICv3, as it
* is only an optimization to reduce guest exits.
*/
- gic_fd = vgic_v3_setup(vm, nr_vcpus, 64, GICD_BASE_GPA, GICR_BASE_GPA);
+ gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
}

static void arch_cleanup_vm(struct kvm_vm *vm)
diff --git a/tools/testing/selftests/kvm/include/aarch64/gic.h b/tools/testing/selftests/kvm/include/aarch64/gic.h
index b217ea17cac5..16d944486e9c 100644
--- a/tools/testing/selftests/kvm/include/aarch64/gic.h
+++ b/tools/testing/selftests/kvm/include/aarch64/gic.h
@@ -11,6 +11,13 @@ enum gic_type {
GIC_TYPE_MAX,
};

+#define GICD_BASE_GPA 0x8000000ULL
+#define GICR_BASE_GPA (GICD_BASE_GPA + SZ_64K)
+
+/* The GIC is identity-mapped into the guest at the time of setup. */
+#define GICD_BASE_GVA ((void *)GICD_BASE_GPA)
+#define GICR_BASE_GVA ((void *)GICR_BASE_GPA)
+
#define MIN_SGI 0
#define MIN_PPI 16
#define MIN_SPI 32
@@ -21,8 +28,7 @@ enum gic_type {
#define INTID_IS_PPI(intid) (MIN_PPI <= (intid) && (intid) < MIN_SPI)
#define INTID_IS_SPI(intid) (MIN_SPI <= (intid) && (intid) <= MAX_SPI)

-void gic_init(enum gic_type type, unsigned int nr_cpus,
- void *dist_base, void *redist_base);
+void gic_init(enum gic_type type, unsigned int nr_cpus);
void gic_irq_enable(unsigned int intid);
void gic_irq_disable(unsigned int intid);
unsigned int gic_get_and_ack_irq(void);
diff --git a/tools/testing/selftests/kvm/include/aarch64/vgic.h b/tools/testing/selftests/kvm/include/aarch64/vgic.h
index 0ac6f05c63f9..ce19aa0a8360 100644
--- a/tools/testing/selftests/kvm/include/aarch64/vgic.h
+++ b/tools/testing/selftests/kvm/include/aarch64/vgic.h
@@ -16,8 +16,7 @@
((uint64_t)(flags) << 12) | \
index)

-int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
- uint64_t gicd_base_gpa, uint64_t gicr_base_gpa);
+int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs);

#define VGIC_MAX_RESERVED 1023

diff --git a/tools/testing/selftests/kvm/lib/aarch64/gic.c b/tools/testing/selftests/kvm/lib/aarch64/gic.c
index 55668631d546..7abbf8866512 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/gic.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/gic.c
@@ -17,13 +17,12 @@
static const struct gic_common_ops *gic_common_ops;
static struct spinlock gic_lock;

-static void gic_cpu_init(unsigned int cpu, void *redist_base)
+static void gic_cpu_init(unsigned int cpu)
{
- gic_common_ops->gic_cpu_init(cpu, redist_base);
+ gic_common_ops->gic_cpu_init(cpu);
}

-static void
-gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
+static void gic_dist_init(enum gic_type type, unsigned int nr_cpus)
{
const struct gic_common_ops *gic_ops = NULL;

@@ -40,7 +39,7 @@ gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)

GUEST_ASSERT(gic_ops);

- gic_ops->gic_init(nr_cpus, dist_base);
+ gic_ops->gic_init(nr_cpus);
gic_common_ops = gic_ops;

/* Make sure that the initialized data is visible to all the vCPUs */
@@ -49,18 +48,15 @@ gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
spin_unlock(&gic_lock);
}

-void gic_init(enum gic_type type, unsigned int nr_cpus,
- void *dist_base, void *redist_base)
+void gic_init(enum gic_type type, unsigned int nr_cpus)
{
uint32_t cpu = guest_get_vcpuid();

GUEST_ASSERT(type < GIC_TYPE_MAX);
- GUEST_ASSERT(dist_base);
- GUEST_ASSERT(redist_base);
GUEST_ASSERT(nr_cpus);

- gic_dist_init(type, nr_cpus, dist_base);
- gic_cpu_init(cpu, redist_base);
+ gic_dist_init(type, nr_cpus);
+ gic_cpu_init(cpu);
}

void gic_irq_enable(unsigned int intid)
diff --git a/tools/testing/selftests/kvm/lib/aarch64/gic_private.h b/tools/testing/selftests/kvm/lib/aarch64/gic_private.h
index 75d07313c893..d24e9ecc96c6 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/gic_private.h
+++ b/tools/testing/selftests/kvm/lib/aarch64/gic_private.h
@@ -8,8 +8,8 @@
#define SELFTEST_KVM_GIC_PRIVATE_H

struct gic_common_ops {
- void (*gic_init)(unsigned int nr_cpus, void *dist_base);
- void (*gic_cpu_init)(unsigned int cpu, void *redist_base);
+ void (*gic_init)(unsigned int nr_cpus);
+ void (*gic_cpu_init)(unsigned int cpu);
void (*gic_irq_enable)(unsigned int intid);
void (*gic_irq_disable)(unsigned int intid);
uint64_t (*gic_read_iar)(void);
diff --git a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
index cd8f0e209599..de53d193d0be 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
@@ -24,8 +24,6 @@
#define ICC_PMR_DEF_PRIO 0xf0

struct gicv3_data {
- void *dist_base;
- void *redist_base[GICV3_MAX_CPUS];
unsigned int nr_cpus;
unsigned int nr_spis;
};
@@ -46,17 +44,23 @@ static void gicv3_gicd_wait_for_rwp(void)
{
unsigned int count = 100000; /* 1s */

- while (readl(gicv3_data.dist_base + GICD_CTLR) & GICD_CTLR_RWP) {
+ while (readl(GICD_BASE_GVA + GICD_CTLR) & GICD_CTLR_RWP) {
GUEST_ASSERT(count--);
udelay(10);
}
}

-static void gicv3_gicr_wait_for_rwp(void *redist_base)
+static inline void *gicr_base_cpu(uint32_t cpu)
+{
+ /* Align all the redistributors sequentially */
+ return GICR_BASE_GVA + cpu * SZ_64K * 2;
+}
+
+static void gicv3_gicr_wait_for_rwp(uint32_t cpu)
{
unsigned int count = 100000; /* 1s */

- while (readl(redist_base + GICR_CTLR) & GICR_CTLR_RWP) {
+ while (readl(gicr_base_cpu(cpu) + GICR_CTLR) & GICR_CTLR_RWP) {
GUEST_ASSERT(count--);
udelay(10);
}
@@ -67,7 +71,7 @@ static void gicv3_wait_for_rwp(uint32_t cpu_or_dist)
if (cpu_or_dist & DIST_BIT)
gicv3_gicd_wait_for_rwp();
else
- gicv3_gicr_wait_for_rwp(gicv3_data.redist_base[cpu_or_dist]);
+ gicv3_gicr_wait_for_rwp(cpu_or_dist);
}

static enum gicv3_intid_range get_intid_range(unsigned int intid)
@@ -127,15 +131,15 @@ static void gicv3_set_eoi_split(bool split)

uint32_t gicv3_reg_readl(uint32_t cpu_or_dist, uint64_t offset)
{
- void *base = cpu_or_dist & DIST_BIT ? gicv3_data.dist_base
- : sgi_base_from_redist(gicv3_data.redist_base[cpu_or_dist]);
+ void *base = cpu_or_dist & DIST_BIT ? GICD_BASE_GVA
+ : sgi_base_from_redist(gicr_base_cpu(cpu_or_dist));
return readl(base + offset);
}

void gicv3_reg_writel(uint32_t cpu_or_dist, uint64_t offset, uint32_t reg_val)
{
- void *base = cpu_or_dist & DIST_BIT ? gicv3_data.dist_base
- : sgi_base_from_redist(gicv3_data.redist_base[cpu_or_dist]);
+ void *base = cpu_or_dist & DIST_BIT ? GICD_BASE_GVA
+ : sgi_base_from_redist(gicr_base_cpu(cpu_or_dist));
writel(reg_val, base + offset);
}

@@ -289,13 +293,7 @@ static void gicv3_enable_redist(void *redist_base)
}
}

-static inline void *gicr_base_cpu(void *redist_base, uint32_t cpu)
-{
- /* Align all the redistributors sequentially */
- return redist_base + cpu * SZ_64K * 2;
-}
-
-static void gicv3_cpu_init(unsigned int cpu, void *redist_base)
+static void gicv3_cpu_init(unsigned int cpu)
{
void *sgi_base;
unsigned int i;
@@ -303,7 +301,7 @@ static void gicv3_cpu_init(unsigned int cpu, void *redist_base)

GUEST_ASSERT(cpu < gicv3_data.nr_cpus);

- redist_base_cpu = gicr_base_cpu(redist_base, cpu);
+ redist_base_cpu = gicr_base_cpu(cpu);
sgi_base = sgi_base_from_redist(redist_base_cpu);

gicv3_enable_redist(redist_base_cpu);
@@ -321,7 +319,7 @@ static void gicv3_cpu_init(unsigned int cpu, void *redist_base)
writel(GICD_INT_DEF_PRI_X4,
sgi_base + GICR_IPRIORITYR0 + i);

- gicv3_gicr_wait_for_rwp(redist_base_cpu);
+ gicv3_gicr_wait_for_rwp(cpu);

/* Enable the GIC system register (ICC_*) access */
write_sysreg_s(read_sysreg_s(SYS_ICC_SRE_EL1) | ICC_SRE_EL1_SRE,
@@ -332,17 +330,14 @@ static void gicv3_cpu_init(unsigned int cpu, void *redist_base)

/* Enable non-secure Group-1 interrupts */
write_sysreg_s(ICC_IGRPEN1_EL1_MASK, SYS_ICC_IGRPEN1_EL1);
-
- gicv3_data.redist_base[cpu] = redist_base_cpu;
}

static void gicv3_dist_init(void)
{
- void *dist_base = gicv3_data.dist_base;
unsigned int i;

/* Disable the distributor until we set things up */
- writel(0, dist_base + GICD_CTLR);
+ writel(0, GICD_BASE_GVA + GICD_CTLR);
gicv3_gicd_wait_for_rwp();

/*
@@ -350,33 +345,32 @@ static void gicv3_dist_init(void)
* Also, deactivate and disable them.
*/
for (i = 32; i < gicv3_data.nr_spis; i += 32) {
- writel(~0, dist_base + GICD_IGROUPR + i / 8);
- writel(~0, dist_base + GICD_ICACTIVER + i / 8);
- writel(~0, dist_base + GICD_ICENABLER + i / 8);
+ writel(~0, GICD_BASE_GVA + GICD_IGROUPR + i / 8);
+ writel(~0, GICD_BASE_GVA + GICD_ICACTIVER + i / 8);
+ writel(~0, GICD_BASE_GVA + GICD_ICENABLER + i / 8);
}

/* Set a default priority for all the SPIs */
for (i = 32; i < gicv3_data.nr_spis; i += 4)
writel(GICD_INT_DEF_PRI_X4,
- dist_base + GICD_IPRIORITYR + i);
+ GICD_BASE_GVA + GICD_IPRIORITYR + i);

/* Wait for the settings to sync-in */
gicv3_gicd_wait_for_rwp();

/* Finally, enable the distributor globally with ARE */
writel(GICD_CTLR_ARE_NS | GICD_CTLR_ENABLE_G1A |
- GICD_CTLR_ENABLE_G1, dist_base + GICD_CTLR);
+ GICD_CTLR_ENABLE_G1, GICD_BASE_GVA + GICD_CTLR);
gicv3_gicd_wait_for_rwp();
}

-static void gicv3_init(unsigned int nr_cpus, void *dist_base)
+static void gicv3_init(unsigned int nr_cpus)
{
GUEST_ASSERT(nr_cpus <= GICV3_MAX_CPUS);

gicv3_data.nr_cpus = nr_cpus;
- gicv3_data.dist_base = dist_base;
gicv3_data.nr_spis = GICD_TYPER_SPIS(
- readl(gicv3_data.dist_base + GICD_TYPER));
+ readl(GICD_BASE_GVA + GICD_TYPER));
if (gicv3_data.nr_spis > 1020)
gicv3_data.nr_spis = 1020;

diff --git a/tools/testing/selftests/kvm/lib/aarch64/vgic.c b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
index b5f28d21a947..ac55b6c2e915 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/vgic.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
@@ -19,8 +19,6 @@
* Input args:
* vm - KVM VM
* nr_vcpus - Number of vCPUs supported by this VM
- * gicd_base_gpa - Guest Physical Address of the Distributor region
- * gicr_base_gpa - Guest Physical Address of the Redistributor region
*
* Output args: None
*
@@ -30,11 +28,10 @@
* redistributor regions of the guest. Since it depends on the number of
* vCPUs for the VM, it must be called after all the vCPUs have been created.
*/
-int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
- uint64_t gicd_base_gpa, uint64_t gicr_base_gpa)
+int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs)
{
int gic_fd;
- uint64_t redist_attr;
+ uint64_t attr;
struct list_head *iter;
unsigned int nr_gic_pages, nr_vcpus_created = 0;

@@ -60,18 +57,19 @@ int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);

+ attr = GICD_BASE_GPA;
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
- KVM_VGIC_V3_ADDR_TYPE_DIST, &gicd_base_gpa);
+ KVM_VGIC_V3_ADDR_TYPE_DIST, &attr);
nr_gic_pages = vm_calc_num_guest_pages(vm->mode, KVM_VGIC_V3_DIST_SIZE);
- virt_map(vm, gicd_base_gpa, gicd_base_gpa, nr_gic_pages);
+ virt_map(vm, GICD_BASE_GPA, GICD_BASE_GPA, nr_gic_pages);

/* Redistributor setup */
- redist_attr = REDIST_REGION_ATTR_ADDR(nr_vcpus, gicr_base_gpa, 0, 0);
+ attr = REDIST_REGION_ATTR_ADDR(nr_vcpus, GICR_BASE_GPA, 0, 0);
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
- KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &redist_attr);
+ KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &attr);
nr_gic_pages = vm_calc_num_guest_pages(vm->mode,
KVM_VGIC_V3_REDIST_SIZE * nr_vcpus);
- virt_map(vm, gicr_base_gpa, gicr_base_gpa, nr_gic_pages);
+ virt_map(vm, GICR_BASE_GPA, GICR_BASE_GPA, nr_gic_pages);

kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
--
2.43.0.687.g38aa6559b0-goog


2024-02-13 20:12:40

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 00/23] KVM: arm64: Improvements to LPI injection

On Tue, Feb 13, 2024 at 09:32:37AM +0000, Oliver Upton wrote:

[...]

> Clearly the RCU synchronization is a bounding issue in this case. I
> think other scenarios where the cache is overcommitted (16 vCPUs, 16
> devices, 17 events / device) are able to hide effects somewhat, as other
> threads can make forward progress while others are stuck waiting on RCU.
>
> A few ideas on next steps:
>
> 1) Rework the lpi_list_lock as an rwlock. This would obviate the need
> for RCU protection in the LPI cache as well as memory allocations on
> the injection path. This is actually what I had in the internal
> version of the series, although it was very incomplete.
>
> I'd expect this to nullify the improvement on the
> slightly-overcommitted case and 'fix' the pathological case.
>
> 2) call_rcu() and move on. This feels somewhat abusive of the API, as
> the guest can flood the host with RCU callbacks, but I wasn't able
> to make my machine fall over in any mean configuration of the test.
>
> I haven't studied the degree to which such a malicious VM could
> adversely affect neighboring workloads.
>
> 3) Redo the whole ITS representation with xarrays and allow RCU readers
> outside of the ITS lock. I haven't fully thought this out, and if we
> pursue this option then we will need a secondary data structure to
> track where ITSes have been placed in guest memory to avoid taking
> the SRCU lock. We can then stick RCU synchronization in ITS command
> processing, which feels right to me, and dump the translation cache
> altogether.
>
> I'd expect slightly worse average case performance in favor of more
> consistent performance.

Marc and I had an off-list conversation about this and agreed on option
4!

It is somewhat similar in spirit to (3), in that KVM will maintain an
xarray translation cache per ITS, indexed by (device_id, event_id). This
will be a perfect cache that can fit the entire range addressed by the
ITS. The memory overheads of the xarray are not anticipated to be
consequential, as the ITS memory footprint already scales linearly with
the number of entries in the ITS.

Separately the DB -> ITS translation will be resolved by walking the
ITSes present in the VM.

The existing invalidation calls will be scoped to a single ITS, except
for the case where the guest disables LPIs on a redistributor.
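
To make that a bit more concrete, here's a rough sketch of the lookup
and insert paths (hand-wavy: the field and helper names here are
invented, not necessarily what the series will end up using):

	/*
	 * Hypothetical: each vgic_its carries an xarray keyed on
	 * (device_id, event_id). KVM/arm64 is 64-bit only, so the two
	 * 32-bit IDs pack into a single unsigned long index.
	 */
	static unsigned long vgic_its_cache_key(u32 devid, u32 eventid)
	{
		return ((unsigned long)devid << 32) | eventid;
	}

	static struct vgic_irq *vgic_its_check_cache(struct vgic_its *its,
						     u32 devid, u32 eventid)
	{
		return xa_load(&its->translation_cache,
			       vgic_its_cache_key(devid, eventid));
	}

	static int vgic_its_cache_translation(struct vgic_its *its, u32 devid,
					      u32 eventid, struct vgic_irq *irq)
	{
		return xa_err(xa_store(&its->translation_cache,
				       vgic_its_cache_key(devid, eventid),
				       irq, GFP_KERNEL_ACCOUNT));
	}

Invalidation then becomes an xa_erase() / xa_destroy() scoped to the
ITS in question.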

--
Thanks,
Oliver

2024-02-14 16:48:04

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 07/23] KVM: arm64: vgic: Use atomics to count LPIs

On Tue, 13 Feb 2024 09:32:44 +0000,
Oliver Upton <[email protected]> wrote:
>
> Switch to using atomics for LPI accounting, allowing vgic_irq references
> to be dropped in parallel.
>
> Signed-off-by: Oliver Upton <[email protected]>
> ---
> arch/arm64/kvm/vgic/vgic-debug.c | 2 +-
> arch/arm64/kvm/vgic/vgic-its.c | 4 ++--
> arch/arm64/kvm/vgic/vgic.c | 2 +-
> include/kvm/arm_vgic.h | 4 ++--
> 4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
> index 85606a531dc3..389025ce7749 100644
> --- a/arch/arm64/kvm/vgic/vgic-debug.c
> +++ b/arch/arm64/kvm/vgic/vgic-debug.c
> @@ -149,7 +149,7 @@ static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
> seq_printf(s, "vgic_model:\t%s\n", v3 ? "GICv3" : "GICv2");
> seq_printf(s, "nr_spis:\t%d\n", dist->nr_spis);
> if (v3)
> - seq_printf(s, "nr_lpis:\t%d\n", dist->lpi_list_count);
> + seq_printf(s, "nr_lpis:\t%d\n", atomic_read(&dist->lpi_count));
> seq_printf(s, "enabled:\t%d\n", dist->enabled);
> seq_printf(s, "\n");
>
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index c68164d6cba0..048226812974 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -97,7 +97,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
> goto out_unlock;
> }
>
> - dist->lpi_list_count++;
> + atomic_inc(&dist->lpi_count);
>
> out_unlock:
> if (ret)
> @@ -345,7 +345,7 @@ int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
> * command). If coming from another path (such as enabling LPIs),
> * we must be careful not to overrun the array.
> */
> - irq_count = READ_ONCE(dist->lpi_list_count);
> + irq_count = atomic_read(&dist->lpi_count);

I'd like to propose an alternative approach here. I've always hated
this "copy a bunch of INTIDs" thing, and the only purpose of this
silly counter is to dimension the resulting array.

Could we instead rely on an xarray marking a bunch of entries (the
ones we want to 'copy'), and get the reader to clear these marks once
done?

Of course, we only have 3 marks, so that's a bit restrictive from a
concurrency perspective, but since most callers hold a lock, it should
be OK.
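
Something like this, very roughly (untested, and assuming the
per-distributor xarray from earlier in the series; XA_MARK_0 stands in
for whichever mark we'd dedicate to this):

	unsigned long intid;
	struct vgic_irq *irq;

	/* writer: tag the entries we would previously have 'copied' */
	xa_lock(&dist->lpi_xa);
	xa_for_each(&dist->lpi_xa, intid, irq)
		__xa_set_mark(&dist->lpi_xa, intid, XA_MARK_0);
	xa_unlock(&dist->lpi_xa);

	/* reader: visit only the marked entries, clearing marks as we go */
	xa_for_each_marked(&dist->lpi_xa, intid, irq, XA_MARK_0) {
		/* ... do whatever the copy-based caller used to do ... */
		xa_clear_mark(&dist->lpi_xa, intid, XA_MARK_0);
	}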

What do you think?

M.

--
Without deviation from the norm, progress is not possible.

2024-02-14 17:33:31

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 19/23] KVM: selftests: Add a minimal library for interacting with an ITS

On Tue, 13 Feb 2024 09:41:14 +0000,
Oliver Upton <[email protected]> wrote:
>
> A prerequisite of testing LPI injection performance is of course
> instantiating an ITS for the guest. Add a small library for creating an
> ITS and interacting with it *from userspace*.
>
> Yep, you read that right. KVM unintentionally allows userspace to send
> commands to the virtual ITS via the command queue. Besides adding test
> coverage for an elusive UAPI, interacting with the ITS in userspace
> simplifies the handling of commands that need to allocate memory, like a
> MAPD command with an ITT.

I don't mean to derail the party, but I really think we should plug
this hole. Either that, or we make it an official interface for state
restore. And don't we all love to have multiple interfaces to do the
same thing?

>
> Signed-off-by: Oliver Upton <[email protected]>
> ---
> .../selftests/kvm/include/aarch64/gic.h | 7 +-
> .../selftests/kvm/include/aarch64/vgic.h | 20 ++
> .../testing/selftests/kvm/lib/aarch64/vgic.c | 241 ++++++++++++++++++
> 3 files changed, 267 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/include/aarch64/gic.h b/tools/testing/selftests/kvm/include/aarch64/gic.h
> index 16d944486e9c..abb41d67880c 100644
> --- a/tools/testing/selftests/kvm/include/aarch64/gic.h
> +++ b/tools/testing/selftests/kvm/include/aarch64/gic.h
> @@ -11,7 +11,12 @@ enum gic_type {
> GIC_TYPE_MAX,
> };
>
> -#define GICD_BASE_GPA 0x8000000ULL
> +/*
> + * Note that the redistributor frames are at the end, as the range scales
> + * with the number of vCPUs in the VM.
> + */
> +#define GITS_BASE_GPA 0x8000000ULL
> +#define GICD_BASE_GPA (GITS_BASE_GPA + SZ_128K)
> #define GICR_BASE_GPA (GICD_BASE_GPA + SZ_64K)
>
> /* The GIC is identity-mapped into the guest at the time of setup. */
> diff --git a/tools/testing/selftests/kvm/include/aarch64/vgic.h b/tools/testing/selftests/kvm/include/aarch64/vgic.h
> index ce19aa0a8360..d45b2902439d 100644
> --- a/tools/testing/selftests/kvm/include/aarch64/vgic.h
> +++ b/tools/testing/selftests/kvm/include/aarch64/vgic.h
> @@ -32,4 +32,24 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu);
>
> #define KVM_IRQCHIP_NUM_PINS (1020 - 32)
>
> +struct vgic_its {
> + int its_fd;
> + void *cmdq_hva;
> + size_t cmdq_size;
> +};
> +
> +struct vgic_its *vgic_its_setup(struct kvm_vm *vm,
> + vm_paddr_t coll_tbl, size_t coll_tbl_sz,
> + vm_paddr_t device_tbl, size_t device_tbl_sz,
> + vm_paddr_t cmdq, size_t cmdq_size);
> +void vgic_its_destroy(struct vgic_its *its);
> +
> +void vgic_its_send_mapd_cmd(struct vgic_its *its, u32 device_id,
> + vm_paddr_t itt_base, size_t itt_size, bool valid);
> +void vgic_its_send_mapc_cmd(struct vgic_its *its, struct kvm_vcpu *vcpu,
> + u32 collection_id, bool valid);
> +void vgic_its_send_mapti_cmd(struct vgic_its *its, u32 device_id,
> + u32 event_id, u32 collection_id, u32 intid);
> +void vgic_its_send_invall_cmd(struct vgic_its *its, u32 collection_id);
> +
> #endif // SELFTEST_KVM_VGIC_H
> diff --git a/tools/testing/selftests/kvm/lib/aarch64/vgic.c b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
> index ac55b6c2e915..fc7b4fbe6453 100644
> --- a/tools/testing/selftests/kvm/lib/aarch64/vgic.c
> +++ b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
> @@ -12,6 +12,7 @@
> #include "vgic.h"
> #include "gic.h"
> #include "gic_v3.h"
> +#include "processor.h"
>
> /*
> * vGIC-v3 default host setup
> @@ -166,3 +167,243 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu)
> {
> vgic_poke_irq(gic_fd, intid, vcpu, GICD_ISACTIVER);
> }
> +
> +static u64 vgic_its_read_reg(int its_fd, unsigned long offset)
> +{
> + u64 attr;
> +
> + kvm_device_attr_get(its_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
> + offset, &attr);
> + return attr;
> +}
> +
> +static void vgic_its_write_reg(int its_fd, unsigned long offset, u64 val)
> +{
> + kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
> + offset, &val);
> +}
> +
> +static unsigned long vgic_its_find_baser(int its_fd, unsigned int type)
> +{
> + int i;
> +
> + for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> + u64 baser;
> + unsigned long offset = GITS_BASER + (i * sizeof(baser));
> +
> + baser = vgic_its_read_reg(its_fd, offset);
> + if (GITS_BASER_TYPE(baser) == type)
> + return offset;
> + }
> +
> + TEST_FAIL("Couldn't find an ITS BASER of type %u", type);
> + return -1;
> +}
> +
> +static void vgic_its_install_table(int its_fd, unsigned int type, vm_paddr_t base,
> + size_t size)
> +{
> + unsigned long offset = vgic_its_find_baser(its_fd, type);
> + u64 baser;
> +
> + baser = ((size / SZ_64K) - 1) |
> + GITS_BASER_PAGE_SIZE_64K |
> + GITS_BASER_InnerShareable |
> + base |
> + GITS_BASER_RaWaWb |
> + GITS_BASER_VALID;
> +
> + vgic_its_write_reg(its_fd, offset, baser);
> +}
> +
> +static void vgic_its_install_cmdq(int its_fd, vm_paddr_t base, size_t size)
> +{
> + u64 cbaser;
> +
> + cbaser = ((size / SZ_4K) - 1) |
> + GITS_CBASER_InnerShareable |
> + base |
> + GITS_CBASER_RaWaWb |
> + GITS_CBASER_VALID;
> +
> + vgic_its_write_reg(its_fd, GITS_CBASER, cbaser);
> +}
> +
> +struct vgic_its *vgic_its_setup(struct kvm_vm *vm,
> + vm_paddr_t coll_tbl, size_t coll_tbl_sz,
> + vm_paddr_t device_tbl, size_t device_tbl_sz,
> + vm_paddr_t cmdq, size_t cmdq_size)
> +{
> + int its_fd = kvm_create_device(vm, KVM_DEV_TYPE_ARM_VGIC_ITS);
> + struct vgic_its *its = malloc(sizeof(struct vgic_its));
> + u64 attr, ctlr;
> +
> + attr = GITS_BASE_GPA;
> + kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> + KVM_VGIC_ITS_ADDR_TYPE, &attr);
> +
> + kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
> + KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
> +
> + vgic_its_install_table(its_fd, GITS_BASER_TYPE_COLLECTION, coll_tbl,
> + coll_tbl_sz);
> + vgic_its_install_table(its_fd, GITS_BASER_TYPE_DEVICE, device_tbl,
> + device_tbl_sz);
> +
> + vgic_its_install_cmdq(its_fd, cmdq, cmdq_size);
> +
> + ctlr = vgic_its_read_reg(its_fd, GITS_CTLR);
> + ctlr |= GITS_CTLR_ENABLE;
> + vgic_its_write_reg(its_fd, GITS_CTLR, ctlr);
> +
> + *its = (struct vgic_its) {
> + .its_fd = its_fd,
> + .cmdq_hva = addr_gpa2hva(vm, cmdq),
> + .cmdq_size = cmdq_size,
> + };
> +
> + return its;
> +}
> +
> +void vgic_its_destroy(struct vgic_its *its)
> +{
> + close(its->its_fd);
> + free(its);
> +}
> +
> +struct its_cmd_block {
> + union {
> + u64 raw_cmd[4];
> + __le64 raw_cmd_le[4];
> + };
> +};
> +
> +static inline void its_fixup_cmd(struct its_cmd_block *cmd)
> +{
> + /* Let's fixup BE commands */
> + cmd->raw_cmd_le[0] = cpu_to_le64(cmd->raw_cmd[0]);
> + cmd->raw_cmd_le[1] = cpu_to_le64(cmd->raw_cmd[1]);
> + cmd->raw_cmd_le[2] = cpu_to_le64(cmd->raw_cmd[2]);
> + cmd->raw_cmd_le[3] = cpu_to_le64(cmd->raw_cmd[3]);
> +}
> +
> +static void its_mask_encode(u64 *raw_cmd, u64 val, int h, int l)
> +{
> + u64 mask = GENMASK_ULL(h, l);
> + *raw_cmd &= ~mask;
> + *raw_cmd |= (val << l) & mask;
> +}
> +
> +static void its_encode_cmd(struct its_cmd_block *cmd, u8 cmd_nr)
> +{
> + its_mask_encode(&cmd->raw_cmd[0], cmd_nr, 7, 0);
> +}
> +
> +static void its_encode_devid(struct its_cmd_block *cmd, u32 devid)
> +{
> + its_mask_encode(&cmd->raw_cmd[0], devid, 63, 32);
> +}
> +
> +static void its_encode_event_id(struct its_cmd_block *cmd, u32 id)
> +{
> + its_mask_encode(&cmd->raw_cmd[1], id, 31, 0);
> +}
> +
> +static void its_encode_phys_id(struct its_cmd_block *cmd, u32 phys_id)
> +{
> + its_mask_encode(&cmd->raw_cmd[1], phys_id, 63, 32);
> +}
> +
> +static void its_encode_size(struct its_cmd_block *cmd, u8 size)
> +{
> + its_mask_encode(&cmd->raw_cmd[1], size, 4, 0);
> +}
> +
> +static void its_encode_itt(struct its_cmd_block *cmd, u64 itt_addr)
> +{
> + its_mask_encode(&cmd->raw_cmd[2], itt_addr >> 8, 51, 8);
> +}
> +
> +static void its_encode_valid(struct its_cmd_block *cmd, int valid)
> +{
> + its_mask_encode(&cmd->raw_cmd[2], !!valid, 63, 63);
> +}
> +
> +static void its_encode_target(struct its_cmd_block *cmd, u64 target_addr)
> +{
> + its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16);
> +}
> +
> +static void its_encode_collection(struct its_cmd_block *cmd, u16 col)
> +{
> + its_mask_encode(&cmd->raw_cmd[2], col, 15, 0);
> +}
> +
> +static void vgic_its_send_cmd(struct vgic_its *its, struct its_cmd_block *cmd)
> +{
> + u64 cwriter = vgic_its_read_reg(its->its_fd, GITS_CWRITER);
> + struct its_cmd_block *dst = its->cmdq_hva + cwriter;
> + u64 next;
> +
> + its_fixup_cmd(cmd);
> +
> + WRITE_ONCE(*dst, *cmd);
> + dsb(ishst);
> +
> + next = (cwriter + sizeof(*cmd)) % its->cmdq_size;
> + vgic_its_write_reg(its->its_fd, GITS_CWRITER, next);
> +
> + TEST_ASSERT(vgic_its_read_reg(its->its_fd, GITS_CREADR) == next,
> + "ITS didn't process command at offset: %lu\n", cwriter);
> +}
> +
> +void vgic_its_send_mapd_cmd(struct vgic_its *its, u32 device_id,
> + vm_paddr_t itt_base, size_t itt_size, bool valid)
> +{
> + struct its_cmd_block cmd = {};
> +
> + its_encode_cmd(&cmd, GITS_CMD_MAPD);
> + its_encode_devid(&cmd, device_id);
> + its_encode_size(&cmd, ilog2(itt_size) - 1);
> + its_encode_itt(&cmd, itt_base);
> + its_encode_valid(&cmd, valid);
> +
> + vgic_its_send_cmd(its, &cmd);
> +}
> +
> +void vgic_its_send_mapc_cmd(struct vgic_its *its, struct kvm_vcpu *vcpu,
> + u32 collection_id, bool valid)
> +{
> + struct its_cmd_block cmd = {};
> +
> + its_encode_cmd(&cmd, GITS_CMD_MAPC);
> + its_encode_collection(&cmd, collection_id);
> + its_encode_target(&cmd, vcpu->id);
> + its_encode_valid(&cmd, valid);
> +
> + vgic_its_send_cmd(its, &cmd);
> +}
> +
> +void vgic_its_send_mapti_cmd(struct vgic_its *its, u32 device_id,
> + u32 event_id, u32 collection_id, u32 intid)
> +{
> + struct its_cmd_block cmd = {};
> +
> + its_encode_cmd(&cmd, GITS_CMD_MAPTI);
> + its_encode_devid(&cmd, device_id);
> + its_encode_event_id(&cmd, event_id);
> + its_encode_phys_id(&cmd, intid);
> + its_encode_collection(&cmd, collection_id);
> +
> + vgic_its_send_cmd(its, &cmd);
> +}
> +
> +void vgic_its_send_invall_cmd(struct vgic_its *its, u32 collection_id)
> +{
> + struct its_cmd_block cmd = {};
> +
> + its_encode_cmd(&cmd, GITS_CMD_INVALL);
> + its_encode_collection(&cmd, collection_id);
> +
> + vgic_its_send_cmd(its, &cmd);
> +}

Holy crap, that's a whole ITS driver in loserspace. *mindblown*.

M.

--
Without deviation from the norm, progress is not possible.

2024-02-14 17:43:52

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 00/23] KVM: arm64: Improvements to LPI injection

On Tue, 13 Feb 2024 09:32:37 +0000,
Oliver Upton <[email protected]> wrote:
>
> For full details on the what/why, please see the cover letter in v1.
>
> Apologies for the delay on v2, I wanted to spend some time to get a
> microbenchmark in place to slam the ITS code pretty hard, and based on
> the results I'm glad I did.

[...]

Buglets and potential improvements aside, I like the smell of this. At
least the first handful of patches could easily be taken as a separate
improvement series.

Let me know how you'd like to play this.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-02-14 18:34:40

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 07/23] KVM: arm64: vgic: Use atomics to count LPIs

Hey,

On Wed, Feb 14, 2024 at 04:47:49PM +0000, Marc Zyngier wrote:
> I'd like to propose an alternative approach here. I've always hated
> this "copy a bunch of INTIDs" thing,

Agree.

> and the only purpose of this
> silly counter is to dimension the resulting array.

Well, we also use it to trivially print the number of LPIs for a
particular vgic in the debug interface.

> Could we instead rely on an xarray marking a bunch of entries (the
> ones we want to 'copy'), and get the reader to clear these marks once
> done?

I think that'd work. I'm trying to convince myself we don't have bugs
lurking in some of the existing usage of vgic_copy_lpi_list()...

> Of course, we only have 3 marks, so that's a bit restrictive from a
> concurrency perspective, but since most callers hold a lock, it should
> be OK.

They all hold *a* lock, but maybe not the same one! :)

Maybe we should serialize the use of markers on the LPI list on the
config_lock. A slight misuse, but we need a mutex since we're poking at
guest memory. Then we can go through the whole N-dimensional locking
puzzle and convince ourselves it is still correct.
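
If we do go down that road, I'd imagine it looks roughly like the
below. Completely untested sketch; the mark name and both helpers are
invented, and it assumes the LPIs wind up living in an xarray at
dist->lpi_xa:

#include <linux/xarray.h>

#define LPI_XA_MARK_ITER	XA_MARK_0

/* Mark every LPI we intend to visit instead of copying the INTIDs. */
static void vgic_mark_lpis(struct vgic_dist *dist)
{
	struct vgic_irq *irq;
	unsigned long intid;

	xa_lock(&dist->lpi_xa);
	xa_for_each(&dist->lpi_xa, intid, irq)
		__xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_ITER);
	xa_unlock(&dist->lpi_xa);
}

/* The reader then walks the marked entries, clearing marks as it goes. */
static void vgic_for_each_marked_lpi(struct vgic_dist *dist,
				     void (*fn)(struct vgic_irq *irq))
{
	struct vgic_irq *irq;
	unsigned long intid;

	xa_for_each_marked(&dist->lpi_xa, intid, irq, LPI_XA_MARK_ITER) {
		fn(irq);
		xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_ITER);
	}
}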

--
Thanks,
Oliver

2024-02-14 19:02:45

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 19/23] KVM: selftests: Add a minimal library for interacting with an ITS

On Wed, Feb 14, 2024 at 05:32:25PM +0000, Marc Zyngier wrote:
> On Tue, 13 Feb 2024 09:41:14 +0000,
> Oliver Upton <[email protected]> wrote:
> >
> > A prerequisite of testing LPI injection performance is of course
> > instantiating an ITS for the guest. Add a small library for creating an
> > ITS and interacting with it *from userspace*.
> >
> > Yep, you read that right. KVM unintentionally allows userspace to send
> > commands to the virtual ITS via the command queue. Besides adding test
> > coverage for an elusive UAPI, interacting with the ITS in userspace
> > simplifies the handling of commands that need to allocate memory, like a
> > MAPD command with an ITT.
>
> I don't mean to derail the party, but I really think we should plug
> this hole. Either that, or we make it an official interface for state
> restore. And don't we all love to have multiple interfaces to do the
> same thing?

Ok, I've thought about it a bit more and I'm fully convinced we need to
shut the door on this stupidity.

We expect CREADR == CWRITER at the time userspace saves the ITS
registers, but we have a *hideous* ordering issue on the restore path.

If the order of restore from userspace is CBASER, CWRITER, CREADR then
we **wind up replaying the entire command queue**. While insane, I'm
pretty sure it is legal for the guest to write garbage after the read
pointer has moved past a particular command index.

Fsck!!!

So, how about we do this:

- Provide a uaccess hook for CWRITER that changes the write-pointer
without processing any commands

- Assert an invariant that, any time CWRITER or CREADR is read from
  userspace, CREADR == CWRITER. Fail the ioctl and scream if that
  isn't the case, so that we never need to worry about processing
  'in-flight' commands at the destination.
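
For the first bullet, I'd expect something shaped like the existing
CWRITER handler minus the call into command processing. Untested
sketch, reusing the update_64bit_reg() / ITS_CMD_OFFSET() /
ITS_CMD_BUFFER_SIZE() helpers we already have:

static int vgic_mmio_uaccess_write_its_cwriter(struct kvm *kvm,
					       struct vgic_its *its,
					       gpa_t addr, unsigned int len,
					       unsigned long val)
{
	u64 reg;

	mutex_lock(&its->cmd_lock);

	reg = update_64bit_reg(its->cwriter, addr & 7, len, val);
	reg = ITS_CMD_OFFSET(reg);
	if (reg >= ITS_CMD_BUFFER_SIZE(its->cbaser)) {
		mutex_unlock(&its->cmd_lock);
		return -EINVAL;
	}

	/* Move the write pointer without processing any commands */
	its->cwriter = reg;

	mutex_unlock(&its->cmd_lock);

	return 0;
}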

--
Thanks,
Oliver

2024-02-14 19:09:18

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 00/23] KVM: arm64: Improvements to LPI injection

On Wed, Feb 14, 2024 at 05:43:13PM +0000, Marc Zyngier wrote:
> On Tue, 13 Feb 2024 09:32:37 +0000,
> Oliver Upton <[email protected]> wrote:
> >
> > For full details on the what/why, please see the cover letter in v1.
> >
> > Apologies for the delay on v2, I wanted to spend some time to get a
> > microbenchmark in place to slam the ITS code pretty hard, and based on
> > the results I'm glad I did.
>
> [...]
>
> Buglets and potential improvements aside, I like the smell of this. At
> least the first handful of patches could easily be taken as a separate
> improvement series.
>
> Let me know how you'd like to play this.

Yeah, I think there's 3 independent series here if we want to take the
initial improvements:

- Address contention around vgic_get_irq() / vgic_put_irq() with the
first 10 patches. Appears there is violent agreement these are good
to go.

- Changing out the translation cache into a per-ITS xarray

- A final series cleaning up a lot of the warts we have in LPI
management, like vgic_copy_lpi_list(). I believe we can get rid of
the lpi_list_lock as well, but this needs to be ordered after the
first 2.

I'd really like to de-risk the performance changes from the cleanups, as
I'm convinced they're going to have their own respective piles of bugs.

How does that sound?

--
Thanks,
Oliver

2024-02-14 20:37:31

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 07/23] KVM: arm64: vgic: Use atomics to count LPIs

On Wed, 14 Feb 2024 18:32:02 +0000,
Oliver Upton <[email protected]> wrote:
>
> Hey,
>
> On Wed, Feb 14, 2024 at 04:47:49PM +0000, Marc Zyngier wrote:
> > I'd like to propose an alternative approach here. I've always hated
> > this "copy a bunch of INTIDs" thing,
>
> Agree.
>
> > and the only purpose of this
> > silly counter is to dimension the resulting array.
>
> Well, we also use it to trivially print the number of LPIs for a
> particular vgic in the debug interface.

I think we can survive this... ;-)

>
> > Could we instead rely on an xarray marking a bunch of entries (the
> > ones we want to 'copy'), and get the reader to clear these marks once
> > done?
>
> I think that'd work. I'm trying to convince myself we don't have bugs
> lurking in some of the existing usage of vgic_copy_lpi_list()...
>
> > Of course, we only have 3 marks, so that's a bit restrictive from a
> > concurrency perspective, but since most callers hold a lock, it should
> > be OK.
>
> They all hold *a* lock, but maybe not the same one! :)

Indeed. But as long as there aren't more than 3 locks (and the
xarray is OK being concurrently updated with marks), we're good!

> Maybe we should serialize the use of markers on the LPI list on the
> config_lock. A slight misuse, but we need a mutex since we're poking at
> guest memory. Then we can go through the whole N-dimensional locking
> puzzle and convince ourselves it is still correct.

Maybe. This thing is already seeing so many abuses that one more may
not matter much. Need to see how it fits in the whole hierarchy of
GIC-related locks...

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-02-14 20:41:19

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 19/23] KVM: selftests: Add a minimal library for interacting with an ITS

On Wed, 14 Feb 2024 19:00:00 +0000,
Oliver Upton <[email protected]> wrote:
>
> On Wed, Feb 14, 2024 at 05:32:25PM +0000, Marc Zyngier wrote:
> > On Tue, 13 Feb 2024 09:41:14 +0000,
> > Oliver Upton <[email protected]> wrote:
> > >
> > > A prerequisite of testing LPI injection performance is of course
> > > instantiating an ITS for the guest. Add a small library for creating an
> > > ITS and interacting with it *from userspace*.
> > >
> > > Yep, you read that right. KVM unintentionally allows userspace to send
> > > commands to the virtual ITS via the command queue. Besides adding test
> > > coverage for an elusive UAPI, interacting with the ITS in userspace
> > > simplifies the handling of commands that need to allocate memory, like a
> > > MAPD command with an ITT.
> >
> > I don't mean to derail the party, but I really think we should plug
> > this hole. Either that, or we make it an official interface for state
> > restore. And don't we all love to have multiple interfaces to do the
> > same thing?
>
> Ok, I've thought about it a bit more and I'm fully convinced we need to
> shut the door on this stupidity.
>
> We expect CREADR == CWRITER at the time userspace saves the ITS
> registers, but we have a *hideous* ordering issue on the restore path.
>
> If the order of restore from userspace is CBASER, CWRITER, CREADR then
> we **wind up replaying the entire command queue**. While insane, I'm
> pretty sure it is legal for the guest to write garbage after the read
> pointer has moved past a particular command index.
>
> Fsck!!!

This is documented in Documentation/virt/kvm/devices/arm-vgic-its.rst to
some extent, and the guest is allowed to crap itself on behalf of
userspace if the ordering isn't respected.

> So, how about we do this:
>
> - Provide a uaccess hook for CWRITER that changes the write-pointer
> without processing any commands
>
> - Assert an invariant that, any time CWRITER or CREADR is read from
> userspace, CREADR == CWRITER. Fail the ioctl and scream if that
> isn't the case, so that we never need to worry about processing
> 'in-flight' commands at the destination.

Are we guaranteed that we cannot ever see CWRITER != CREADR at VM
dumping time? I'm not convinced that we cannot preempt the vcpu thread
at the right spot, especially given that you can have an arbitrarily
large batch of commands to execute.

Just add a page-fault to the mix, and a signal pending. Pronto, you
see a guest exit and you should be able to start dumping things
without the ITS having processed much. I haven't tried, but that
doesn't seem totally unlikely.

M.

--
Without deviation from the norm, progress is not possible.

2024-02-14 20:55:34

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 19/23] KVM: selftests: Add a minimal library for interacting with an ITS

On Wed, Feb 14, 2024 at 08:09:52PM +0000, Marc Zyngier wrote:
> > If the order of restore from userspace is CBASER, CWRITER, CREADR then
> > we **wind up replaying the entire command queue**. While insane, I'm
> > pretty sure it is legal for the guest to write garbage after the read
> > pointer has moved past a particular command index.
> >
> > Fsck!!!
>
> This is documented in Documentation/virt/kvm/devices/arm-vgic-its.rst to
> some extent, and the guest is allowed to crap itself on behalf of
> userspace if the ordering isn't respected.

Ah, fair, I missed the documentation here. If we require userspace to
write CTLR last then we _should_ be fine, but damn is this a tricky set
of expectations.

> > So, how about we do this:
> >
> > - Provide a uaccess hook for CWRITER that changes the write-pointer
> > without processing any commands
> >
> > - Assert an invariant that, any time CWRITER or CREADR is read from
> > userspace, CREADR == CWRITER. Fail the ioctl and scream if that
> > isn't the case, so that we never need to worry about processing
> > 'in-flight' commands at the destination.
>
> Are we guaranteed that we cannot ever see CWRITER != CREADR at VM
> dumping time? I'm not convinced that we cannot preempt the vcpu thread
> at the right spot, especially given that you can have an arbitrarily
> large batch of commands to execute.
>
> Just add a page-fault to the mix, and a signal pending. Pronto, you
> see a guest exit and you should be able to start dumping things
> without the ITS having processed much. I haven't tried, but that
> doesn't seem totally unlikely.

Well, we would need to run all userspace reads and writes through the
cmd_lock in this case, which is what we already do for the CREADR
uaccess hook. To me the 'racy' queue accessors only make sense for guest
accesses, since the driver is expecting to poll for completion in that
case.
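
Concretely, something like the below for every userspace accessor that
touches the command queue registers (a made-up helper, purely to
illustrate the locking):

static u64 vgic_its_uaccess_read_cmd_reg(struct vgic_its *its, u64 *reg)
{
	u64 val;

	/* Serialize against a vcpu thread midway through the command queue */
	mutex_lock(&its->cmd_lock);
	val = *reg;
	mutex_unlock(&its->cmd_lock);

	return val;
}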

Otherwise we decide the existing rules for restoring the ITS are fine
and I get to keep my funky driver :)

--
Thanks,
Oliver

2024-02-14 21:06:38

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 19/23] KVM: selftests: Add a minimal library for interacting with an ITS

On Wed, Feb 14, 2024 at 12:55:11PM -0800, Oliver Upton wrote:
> On Wed, Feb 14, 2024 at 08:09:52PM +0000, Marc Zyngier wrote:
> > > If the order of restore from userspace is CBASER, CWRITER, CREADR then
> > > we **wind up replaying the entire command queue**. While insane, I'm
> > > pretty sure it is legal for the guest to write garbage after the read
> > > pointer has moved past a particular command index.
> > >
> > > Fsck!!!
> >
> > This is documented in Documentation/virt/kvm/devices/arm-vgic-its.rst to
> > some extent, and the guest is allowed to crap itself on behalf of
> > userspace if the ordering isn't respected.
>
> Ah, fair, I missed the documentation here. If we require userspace to
> write CTLR last then we _should_ be fine, but damn is this a tricky set
> of expectations.
>
> > > So, how about we do this:
> > >
> > > - Provide a uaccess hook for CWRITER that changes the write-pointer
> > > without processing any commands
> > >
> > > - Assert an invariant that, any time CWRITER or CREADR is read from
> > > userspace, CREADR == CWRITER. Fail the ioctl and scream if that
> > > isn't the case, so that we never need to worry about processing
> > > 'in-flight' commands at the destination.
> >
> > Are we guaranteed that we cannot ever see CWRITER != CREADR at VM
> > dumping time? I'm not convinced that we cannot preempt the vcpu thread
> > at the right spot, specially given that you can have an arbitrary
> > large batch of commands to execute.
> >
> > Just add a page-fault to the mix, and a signal pending. Pronto, you
> > see a guest exit and you should be able to start dumping things
> > without the ITS having processed much. I haven't tried, but that
> > doesn't seem totally unlikely.
>
> Well, we would need to run all userspace reads and writes through the
> cmd_lock in this case, which is what we already do for the CREADR
> uaccess hook. To me the 'racy' queue accessors only make sense for guest
> accesses, since the driver is expecting to poll for completion in that
> case.

My proposed invariant cannot be maintained, of course, since userspace
can do whatever it pleases with the cmdq pointers.

> Otherwise we decide the existing rules for restoring the ITS are fine
> and I get to keep my funky driver :)
>
> --
> Thanks,
> Oliver
>

2024-02-14 23:25:22

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 07/23] KVM: arm64: vgic: Use atomics to count LPIs

On Wed, Feb 14, 2024 at 08:01:19PM +0000, Marc Zyngier wrote:
> > > Of course, we only have 3 marks, so that's a bit restrictive from a
> > > concurrency perspective, but since most callers hold a lock, it should
> > > be OK.
> >
> > They all hold *a* lock, but maybe not the same one! :)
>
> Indeed. But as long as there aren't more than 3 locks (and the
> xarray is OK being concurrently updated with marks), we're good!

Oh, you mean to give each existing caller their own mark?

> > Maybe we should serialize the use of markers on the LPI list on the
> > config_lock. A slight misuse, but we need a mutex since we're poking at
> > guest memory. Then we can go through the whole N-dimensional locking
> > puzzle and convince ourselves it is still correct.
>
> Maybe. This thing is already seeing so many abuses that one more may
> not matter much. Need to see how it fits in the whole hierarchy of
> GIC-related locks...

It doesn't work. Our lock ordering requires the config_lock to be
taken outside the its_lock.

Too many damn locks!

--
Thanks,
Oliver

2024-02-15 09:44:19

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 07/23] KVM: arm64: vgic: Use atomics to count LPIs

On Wed, 14 Feb 2024 23:01:04 +0000,
Oliver Upton <[email protected]> wrote:
>
> On Wed, Feb 14, 2024 at 08:01:19PM +0000, Marc Zyngier wrote:
> > > > Of course, we only have 3 marks, so that's a bit restrictive from a
> > > > concurrency perspective, but since most callers hold a lock, it should
> > > > be OK.
> > >
> > > They all hold *a* lock, but maybe not the same one! :)
> >
> > Indeed. But as long as there aren't more than 3 locks (and the
> > xarray is OK being concurrently updated with marks), we're good!
>
> Oh, you mean to give each existing caller their own mark?

Well, each caller "class", where "class" means "holding lock
'foo'". Same lock, same mark. With a maximum of 3 (and I think we can
get away with 2).
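
Something like this, with invented names:

/* One mark per caller class, i.e. per lock held across the walk */
#define LPI_XA_MARK_DEBUG_ITER	XA_MARK_0	/* walkers holding config_lock */
#define LPI_XA_MARK_ITS_ITER	XA_MARK_1	/* walkers holding its_lock */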

> > > Maybe we should serialize the use of markers on the LPI list on the
> > > config_lock. A slight misuse, but we need a mutex since we're poking at
> > > guest memory. Then we can go through the whole N-dimensional locking
> > > puzzle and convince ourselves it is still correct.
> >
> > Maybe. This thing is already seeing so many abuses that one more may
> > not matter much. Need to see how it fits in the whole hierarchy of
> > GIC-related locks...
>
> It doesn't work. Our lock ordering requires the config_lock to be
> taken outside the its_lock.
>
> Too many damn locks!

Well, the joys of emulating highly complex HW with a braindead
programming interface. I'd explore the above suggestion to avoid
introducing a new lock, if at all possible.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-02-15 15:38:54

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 00/23] KVM: arm64: Improvements to LPI injection

On Wed, 14 Feb 2024 18:40:27 +0000,
Oliver Upton <[email protected]> wrote:
>
> On Wed, Feb 14, 2024 at 05:43:13PM +0000, Marc Zyngier wrote:
> > On Tue, 13 Feb 2024 09:32:37 +0000,
> > Oliver Upton <[email protected]> wrote:
> > >
> > > For full details on the what/why, please see the cover letter in v1.
> > >
> > > Apologies for the delay on v2, I wanted to spend some time to get a
> > > microbenchmark in place to slam the ITS code pretty hard, and based on
> > > the results I'm glad I did.
> >
> > [...]
> >
> > Buglets and potential improvements aside, I like the smell of this. At
> > least the first handful of patches could easily be taken as a separate
> > improvement series.
> >
> > Let me know how you'd like to play this.
>
> Yeah, I think there's 3 independent series here if we want to take the
> initial improvements:
>
> - Address contention around vgic_get_irq() / vgic_put_irq() with the
> first 10 patches. Appears there is violent agreement these are good
> to go.
>
> - Changing out the translation cache into a per-ITS xarray
>
> - A final series cleaning up a lot of the warts we have in LPI
> management, like vgic_copy_lpi_list(). I believe we can get rid of
> the lpi_list_lock as well, but this needs to be ordered after the
> first 2.
>
> I'd really like to de-risk the performance changes from the cleanups, as
> I'm convinced they're going to have their own respective piles of bugs.
>
> How does that sound?

Yup, I'd be on board with that. If you can respin the first part with
bugs fixed and without the stats, that'd be great. We can further
bikeshed on the rest in the 6.10 time frame.

Also please Cc: Eric Auger, as he dealt with a lot of the ITS
save/restore stuff.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-02-15 20:15:38

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 00/23] KVM: arm64: Improvements to LPI injection

On Thu, Feb 15, 2024 at 03:37:45PM +0000, Marc Zyngier wrote:
> > I'd really like to de-risk the performance changes from the cleanups, as
> > I'm convinced they're going to have their own respective piles of bugs.
> >
> > How does that sound?
>
> Yup, I'd be on board with that. If you can respin the first part with
> bugs fixed and without the stats, that'd be great. We can further
> bikeshed on the rest in the 6.10 time frame.

Cool. That makes things much easier to manage. I'll respin the get / put
improvements shortly, and hopefully this time I actually fix the stupid
lock imbalance :)

> Also please Cc: Eric Auger, as he dealt with a lot of the ITS
> save/restore stuff.

Always happy to designate another victim to review my crap.

--
Thanks,
Oliver