2021-05-27 12:52:45

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 0/7] KVM: arm64: Reduce hyp_vmemmap overhead

Hi all,

When running in nVHE protected mode, the hypervisor manages its own
vmemmap and uses it to store page metadata, e.g. refcounts. This series
shrinks the size of struct hyp_page from 32 bytes to 4 bytes without
loss of functionality, hence reducing the cost of the hyp vmemmap from
8MB/GB to 1MB/GB with 4K pages.

The series has two immediate benefits:
- the memory overhead of the nVHE protected mode is reduced;
- it refactors the host stage-2 memory pools in a way that allows
better re-use of pages to map MMIO ranges, allowing more MMIO
mappings (currently limited to 1GB IPA space) most of the time.

But more importantly, the series reduces the hyp vmemmap overhead enough
that we might consider covering _all_ of memory with it at EL2 in the
future. This would simplify significantly the dynamic admission of
memory into the EL2 allocator, which will be required when the
hypervisor will allocate stage-2 page-tables of guests for instance.
This would also allow the hypervisor to refcount pages it doesn't 'own',
which be useful to track shared pages and such.

The series is split as follows
- patches 01-03 move the list_head of each page from struct hyp_page
to the page itself -- the pages are attached to the free lists only
when they are free by definition;
- patches 04-05 remove the hyp_pool pointer from struct hyp_page as
that information can be inferred from the context;
- patches 06-07 reduce the size of the remaining members of struct
hyp_page which are currently oversized for the needs of the
hypervisor.

On a last note, I believe we could actually make hyp_page fit in 16bits
when using 4K pages: limiting the MAX_ORDER to 7 should suffice and
require only 3 bits, and 13bits should be enough for the refcount for
the existing use-cases. I decided not to implement this as we probably
want to keep some room to grow in hyp_page (e.g. add flags, ...), that
might cause issues to make refcounts atomic, and 16bits are not enough
with 64K pages so we'd have to deal with that separately, but that _is_
a possibility.

Thanks!
Quentin

Quentin Perret (7):
KVM: arm64: Move hyp_pool locking out of refcount helpers
KVM: arm64: Use refcount at hyp to check page availability
KVM: arm64: Remove list_head from hyp_page
KVM: arm64: Unify MMIO and mem host stage-2 pools
KVM: arm64: Remove hyp_pool pointer from struct hyp_page
KVM: arm64: Use less bits for hyp_page order
KVM: arm64: Use less bits for hyp_page refcount

arch/arm64/kvm/hyp/include/nvhe/gfp.h | 33 ++-----
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +-
arch/arm64/kvm/hyp/include/nvhe/memory.h | 7 +-
arch/arm64/kvm/hyp/include/nvhe/mm.h | 13 +--
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 59 +++++++------
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 87 ++++++++++++-------
arch/arm64/kvm/hyp/nvhe/setup.c | 30 ++++---
arch/arm64/kvm/hyp/reserved_mem.c | 3 +-
8 files changed, 123 insertions(+), 111 deletions(-)

--
2.31.1.818.g46aad6cb9e-goog


2021-05-27 12:52:45

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 1/7] KVM: arm64: Move hyp_pool locking out of refcount helpers

The hyp_page refcount helpers currently rely on the hyp_pool lock for
serialization. However, this means the refcounts can't be changed from
the buddy allocator core as it already holds the lock, which means pages
have to go through odd transient states.

For example, when a page is freed, its refcount is set to 0, and the
lock is transiently released before the page can be attached to a free
list in the buddy tree. This is currently harmless as the allocator
checks the list node of each page to see if it is available for
allocation or not, but it means the page refcount can't be trusted to
represent the state of the page even if the pool lock is held.

In order to fix this, remove the pool locking from the refcount helpers,
and move all the logic to the buddy allocator. This will simplify the
removal of the list node from struct hyp_page in a later patch.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 21 ++-------------------
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 19 ++++++++-----------
2 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 18a4494337bd..aada4d97de49 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -24,37 +24,20 @@ struct hyp_pool {

static inline void hyp_page_ref_inc(struct hyp_page *p)
{
- struct hyp_pool *pool = hyp_page_to_pool(p);
-
- hyp_spin_lock(&pool->lock);
p->refcount++;
- hyp_spin_unlock(&pool->lock);
}

static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
{
- struct hyp_pool *pool = hyp_page_to_pool(p);
- int ret;
-
- hyp_spin_lock(&pool->lock);
p->refcount--;
- ret = (p->refcount == 0);
- hyp_spin_unlock(&pool->lock);
-
- return ret;
+ return (p->refcount == 0);
}

static inline void hyp_set_page_refcounted(struct hyp_page *p)
{
- struct hyp_pool *pool = hyp_page_to_pool(p);
-
- hyp_spin_lock(&pool->lock);
- if (p->refcount) {
- hyp_spin_unlock(&pool->lock);
+ if (p->refcount)
BUG();
- }
p->refcount = 1;
- hyp_spin_unlock(&pool->lock);
}

/* Allocation */
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 237e03bf0cb1..04573bf35441 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -93,15 +93,6 @@ static void __hyp_attach_page(struct hyp_pool *pool,
list_add_tail(&p->node, &pool->free_area[order]);
}

-static void hyp_attach_page(struct hyp_page *p)
-{
- struct hyp_pool *pool = hyp_page_to_pool(p);
-
- hyp_spin_lock(&pool->lock);
- __hyp_attach_page(pool, p);
- hyp_spin_unlock(&pool->lock);
-}
-
static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
struct hyp_page *p,
unsigned int order)
@@ -128,16 +119,22 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
void hyp_put_page(void *addr)
{
struct hyp_page *p = hyp_virt_to_page(addr);
+ struct hyp_pool *pool = hyp_page_to_pool(p);

+ hyp_spin_lock(&pool->lock);
if (hyp_page_ref_dec_and_test(p))
- hyp_attach_page(p);
+ __hyp_attach_page(pool, p);
+ hyp_spin_unlock(&pool->lock);
}

void hyp_get_page(void *addr)
{
struct hyp_page *p = hyp_virt_to_page(addr);
+ struct hyp_pool *pool = hyp_page_to_pool(p);

+ hyp_spin_lock(&pool->lock);
hyp_page_ref_inc(p);
+ hyp_spin_unlock(&pool->lock);
}

void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
@@ -159,8 +156,8 @@ void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
p = list_first_entry(&pool->free_area[i], struct hyp_page, node);
p = __hyp_extract_page(pool, p, order);

- hyp_spin_unlock(&pool->lock);
hyp_set_page_refcounted(p);
+ hyp_spin_unlock(&pool->lock);

return hyp_page_to_virt(p);
}
--
2.31.1.818.g46aad6cb9e-goog

2021-05-27 12:52:46

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 2/7] KVM: arm64: Use refcount at hyp to check page availability

The hyp buddy allocator currently checks the struct hyp_page list node
to see if a page is available for allocation or not when trying to
coalesce memory. Now that decrementing the refcount and attaching to
the buddy tree is done in the same critical section, we can rely on the
refcount of the buddy page to be in sync, which allows to replace the
list node check by a refcount check. This will ease removing the list
node from struct hyp_page later on.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 04573bf35441..7ee882f36767 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -55,7 +55,7 @@ static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
{
struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);

- if (!buddy || buddy->order != order || list_empty(&buddy->node))
+ if (!buddy || buddy->order != order || buddy->refcount)
return NULL;

return buddy;
@@ -116,14 +116,19 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
return p;
}

+static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
+{
+ if (hyp_page_ref_dec_and_test(p))
+ __hyp_attach_page(pool, p);
+}
+
void hyp_put_page(void *addr)
{
struct hyp_page *p = hyp_virt_to_page(addr);
struct hyp_pool *pool = hyp_page_to_pool(p);

hyp_spin_lock(&pool->lock);
- if (hyp_page_ref_dec_and_test(p))
- __hyp_attach_page(pool, p);
+ __hyp_put_page(pool, p);
hyp_spin_unlock(&pool->lock);
}

@@ -178,15 +183,16 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,

/* Init the vmemmap portion */
p = hyp_phys_to_page(phys);
- memset(p, 0, sizeof(*p) * nr_pages);
for (i = 0; i < nr_pages; i++) {
p[i].pool = pool;
+ p[i].order = 0;
INIT_LIST_HEAD(&p[i].node);
+ hyp_set_page_refcounted(&p[i]);
}

/* Attach the unused pages to the buddy tree */
for (i = reserved_pages; i < nr_pages; i++)
- __hyp_attach_page(pool, &p[i]);
+ __hyp_put_page(pool, &p[i]);

return 0;
}
--
2.31.1.818.g46aad6cb9e-goog

2021-05-27 12:52:51

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 3/7] KVM: arm64: Remove list_head from hyp_page

The list_head member of struct hyp_page is only needed when the page is
attached to a free-list, which by definition implies the page is free.
As such, nothing prevents us from using the page itself to store the
list_head, hence reducing the size of the vmemmap.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/memory.h | 1 -
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 39 ++++++++++++++++++++----
2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index fd78bde939ee..7691ab495eb4 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -12,7 +12,6 @@ struct hyp_page {
unsigned int refcount;
unsigned int order;
struct hyp_pool *pool;
- struct list_head node;
};

extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 7ee882f36767..ce7379f1480b 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -62,6 +62,34 @@ static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,

}

+/*
+ * Pages that are available for allocation are tracked in free-lists, so we use
+ * the pages themselves to store the list nodes to avoid wasting space. As the
+ * allocator always returns zeroed pages (which are zeroed on the hyp_put_page()
+ * path to optimize allocation speed), we also need to clean-up the list node in
+ * each page when we take it out of the list.
+ */
+static inline void page_remove_from_list(struct hyp_page *p)
+{
+ struct list_head *node = (struct list_head *)hyp_page_to_virt(p);
+
+ __list_del_entry(node);
+ memset(node, 0, sizeof(*node));
+}
+
+static inline void page_add_to_list(struct hyp_page *p, struct list_head *head)
+{
+ struct list_head *node = (struct list_head *)hyp_page_to_virt(p);
+
+ INIT_LIST_HEAD(node);
+ list_add_tail(node, head);
+}
+
+static inline struct hyp_page *node_to_page(struct list_head *node)
+{
+ return (struct hyp_page *)hyp_virt_to_page(node);
+}
+
static void __hyp_attach_page(struct hyp_pool *pool,
struct hyp_page *p)
{
@@ -83,14 +111,14 @@ static void __hyp_attach_page(struct hyp_pool *pool,
break;

/* Take the buddy out of its list, and coallesce with @p */
- list_del_init(&buddy->node);
+ page_remove_from_list(buddy);
buddy->order = HYP_NO_ORDER;
p = min(p, buddy);
}

/* Mark the new head, and insert it */
p->order = order;
- list_add_tail(&p->node, &pool->free_area[order]);
+ page_add_to_list(p, &pool->free_area[order]);
}

static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
@@ -99,7 +127,7 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
{
struct hyp_page *buddy;

- list_del_init(&p->node);
+ page_remove_from_list(p);
while (p->order > order) {
/*
* The buddy of order n - 1 currently has HYP_NO_ORDER as it
@@ -110,7 +138,7 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
p->order--;
buddy = __find_buddy_nocheck(pool, p, p->order);
buddy->order = p->order;
- list_add_tail(&buddy->node, &pool->free_area[buddy->order]);
+ page_add_to_list(buddy, &pool->free_area[buddy->order]);
}

return p;
@@ -158,7 +186,7 @@ void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
}

/* Extract it from the tree at the right order */
- p = list_first_entry(&pool->free_area[i], struct hyp_page, node);
+ p = node_to_page(pool->free_area[i].next);
p = __hyp_extract_page(pool, p, order);

hyp_set_page_refcounted(p);
@@ -186,7 +214,6 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
for (i = 0; i < nr_pages; i++) {
p[i].pool = pool;
p[i].order = 0;
- INIT_LIST_HEAD(&p[i].node);
hyp_set_page_refcounted(&p[i]);
}

--
2.31.1.818.g46aad6cb9e-goog

2021-05-27 12:53:32

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 6/7] KVM: arm64: Use less bits for hyp_page order

The hyp_page order is currently encoded on 4 bytes even though it is
guaranteed to be smaller than this. Make it 2 bytes to reduce the hyp
vmemmap overhead.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 6 +++---
arch/arm64/kvm/hyp/include/nvhe/memory.h | 2 +-
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 12 ++++++------
3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 9ed374648364..d420e5c0845f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -7,7 +7,7 @@
#include <nvhe/memory.h>
#include <nvhe/spinlock.h>

-#define HYP_NO_ORDER UINT_MAX
+#define HYP_NO_ORDER USHRT_MAX

struct hyp_pool {
/*
@@ -19,7 +19,7 @@ struct hyp_pool {
struct list_head free_area[MAX_ORDER];
phys_addr_t range_start;
phys_addr_t range_end;
- unsigned int max_order;
+ unsigned short max_order;
};

static inline void hyp_page_ref_inc(struct hyp_page *p)
@@ -41,7 +41,7 @@ static inline void hyp_set_page_refcounted(struct hyp_page *p)
}

/* Allocation */
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order);
+void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
void hyp_get_page(void *addr, struct hyp_pool *pool);
void hyp_put_page(void *addr, struct hyp_pool *pool);

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 991636be2f46..3fe34fa30ea4 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -9,7 +9,7 @@

struct hyp_page {
unsigned int refcount;
- unsigned int order;
+ unsigned short order;
};

extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index e453108a2d95..2ba2c550ab03 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -32,7 +32,7 @@ u64 __hyp_vmemmap;
*/
static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
struct hyp_page *p,
- unsigned int order)
+ unsigned short order)
{
phys_addr_t addr = hyp_page_to_phys(p);

@@ -51,7 +51,7 @@ static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
/* Find a buddy page currently available for allocation */
static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
struct hyp_page *p,
- unsigned int order)
+ unsigned short order)
{
struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);

@@ -93,7 +93,7 @@ static inline struct hyp_page *node_to_page(struct list_head *node)
static void __hyp_attach_page(struct hyp_pool *pool,
struct hyp_page *p)
{
- unsigned int order = p->order;
+ unsigned short order = p->order;
struct hyp_page *buddy;

memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
@@ -123,7 +123,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,

static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
struct hyp_page *p,
- unsigned int order)
+ unsigned short order)
{
struct hyp_page *buddy;

@@ -168,9 +168,9 @@ void hyp_get_page(void *addr, struct hyp_pool *pool)
hyp_spin_unlock(&pool->lock);
}

-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
+void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
{
- unsigned int i = order;
+ unsigned short i = order;
struct hyp_page *p;

hyp_spin_lock(&pool->lock);
--
2.31.1.818.g46aad6cb9e-goog

2021-05-27 12:54:27

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 4/7] KVM: arm64: Unify MMIO and mem host stage-2 pools

We currently maintain two separate memory pools for the host stage-2,
one for pages used in the page-table when mapping memory regions, and
the other to map MMIO regions. The former is large enough to map all of
memory with page granularity and the latter can cover an arbitrary
portion of IPA space, but allows to 'recycle' pages.

However, this split makes accounting difficult to manage as pages at
intermediate levels of the page-table may be used to map both memory and
MMIO regions. Simplify the scheme by merging both pools into one. This
means we can now hit the -ENOMEM case in the memory abort path, but
we're still guaranteed forward-progress in the worst case by unmapping
MMIO regions. On the plus side this also means we can usually map a lot
more MMIO space at once if memory ranges happen to be mapped with block
mappings.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +-
arch/arm64/kvm/hyp/include/nvhe/mm.h | 13 +++---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 46 ++++++++-----------
arch/arm64/kvm/hyp/nvhe/setup.c | 16 ++-----
arch/arm64/kvm/hyp/reserved_mem.c | 3 +-
5 files changed, 32 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 42d81ec739fa..9c227d87c36d 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -23,7 +23,7 @@ extern struct host_kvm host_kvm;
int __pkvm_prot_finalize(void);
int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);

-int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool);
+int kvm_host_prepare_stage2(void *pgt_pool_base);
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);

static __always_inline void __load_host_stage2(void)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 0095f6289742..8ec3a5a7744b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -78,19 +78,20 @@ static inline unsigned long hyp_s1_pgtable_pages(void)
return res;
}

-static inline unsigned long host_s2_mem_pgtable_pages(void)
+static inline unsigned long host_s2_pgtable_pages(void)
{
+ unsigned long res;
+
/*
* Include an extra 16 pages to safely upper-bound the worst case of
* concatenated pgds.
*/
- return __hyp_pgtable_total_pages() + 16;
-}
+ res = __hyp_pgtable_total_pages() + 16;

-static inline unsigned long host_s2_dev_pgtable_pages(void)
-{
/* Allow 1 GiB for MMIO mappings */
- return __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
+ res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
+
+ return res;
}

#endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e342f7f4f4fb..fdd5b5702e8a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -23,8 +23,7 @@
extern unsigned long hyp_nr_cpus;
struct host_kvm host_kvm;

-struct hyp_pool host_s2_mem;
-struct hyp_pool host_s2_dev;
+struct hyp_pool host_s2_pool;

/*
* Copies of the host's CPU features registers holding sanitized values.
@@ -36,7 +35,7 @@ static const u8 pkvm_hyp_id = 1;

static void *host_s2_zalloc_pages_exact(size_t size)
{
- return hyp_alloc_pages(&host_s2_mem, get_order(size));
+ return hyp_alloc_pages(&host_s2_pool, get_order(size));
}

static void *host_s2_zalloc_page(void *pool)
@@ -44,20 +43,14 @@ static void *host_s2_zalloc_page(void *pool)
return hyp_alloc_pages(pool, 0);
}

-static int prepare_s2_pools(void *mem_pgt_pool, void *dev_pgt_pool)
+static int prepare_s2_pool(void *pgt_pool_base)
{
unsigned long nr_pages, pfn;
int ret;

- pfn = hyp_virt_to_pfn(mem_pgt_pool);
- nr_pages = host_s2_mem_pgtable_pages();
- ret = hyp_pool_init(&host_s2_mem, pfn, nr_pages, 0);
- if (ret)
- return ret;
-
- pfn = hyp_virt_to_pfn(dev_pgt_pool);
- nr_pages = host_s2_dev_pgtable_pages();
- ret = hyp_pool_init(&host_s2_dev, pfn, nr_pages, 0);
+ pfn = hyp_virt_to_pfn(pgt_pool_base);
+ nr_pages = host_s2_pgtable_pages();
+ ret = hyp_pool_init(&host_s2_pool, pfn, nr_pages, 0);
if (ret)
return ret;

@@ -86,7 +79,7 @@ static void prepare_host_vtcr(void)
id_aa64mmfr1_el1_sys_val, phys_shift);
}

-int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool)
+int kvm_host_prepare_stage2(void *pgt_pool_base)
{
struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
int ret;
@@ -94,7 +87,7 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool)
prepare_host_vtcr();
hyp_spin_lock_init(&host_kvm.lock);

- ret = prepare_s2_pools(mem_pgt_pool, dev_pgt_pool);
+ ret = prepare_s2_pool(pgt_pool_base);
if (ret)
return ret;

@@ -199,11 +192,10 @@ static bool range_is_memory(u64 start, u64 end)
}

static inline int __host_stage2_idmap(u64 start, u64 end,
- enum kvm_pgtable_prot prot,
- struct hyp_pool *pool)
+ enum kvm_pgtable_prot prot)
{
return kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start, start,
- prot, pool);
+ prot, &host_s2_pool);
}

static int host_stage2_idmap(u64 addr)
@@ -211,7 +203,6 @@ static int host_stage2_idmap(u64 addr)
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
struct kvm_mem_range range;
bool is_memory = find_mem_range(addr, &range);
- struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
int ret;

if (is_memory)
@@ -222,22 +213,21 @@ static int host_stage2_idmap(u64 addr)
if (ret)
goto unlock;

- ret = __host_stage2_idmap(range.start, range.end, prot, pool);
- if (is_memory || ret != -ENOMEM)
+ ret = __host_stage2_idmap(range.start, range.end, prot);
+ if (ret != -ENOMEM)
goto unlock;

/*
- * host_s2_mem has been provided with enough pages to cover all of
- * memory with page granularity, so we should never hit the ENOMEM case.
- * However, it is difficult to know how much of the MMIO range we will
- * need to cover upfront, so we may need to 'recycle' the pages if we
- * run out.
+ * The pool has been provided with enough pages to cover all of memory
+ * with page granularity, but it is difficult to know how much of the
+ * MMIO range we will need to cover upfront, so we may need to 'recycle'
+ * the pages if we run out.
*/
ret = host_stage2_unmap_dev_all();
if (ret)
goto unlock;

- ret = __host_stage2_idmap(range.start, range.end, prot, pool);
+ ret = __host_stage2_idmap(range.start, range.end, prot);

unlock:
hyp_spin_unlock(&host_kvm.lock);
@@ -258,7 +248,7 @@ int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)

hyp_spin_lock(&host_kvm.lock);
ret = kvm_pgtable_stage2_set_owner(&host_kvm.pgt, start, end - start,
- &host_s2_mem, pkvm_hyp_id);
+ &host_s2_pool, pkvm_hyp_id);
hyp_spin_unlock(&host_kvm.lock);

return ret != -EAGAIN ? ret : 0;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 7488f53b0aa2..709cb3d19eb7 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -25,8 +25,7 @@ unsigned long hyp_nr_cpus;

static void *vmemmap_base;
static void *hyp_pgt_base;
-static void *host_s2_mem_pgt_base;
-static void *host_s2_dev_pgt_base;
+static void *host_s2_pgt_base;

static int divide_memory_pool(void *virt, unsigned long size)
{
@@ -45,14 +44,9 @@ static int divide_memory_pool(void *virt, unsigned long size)
if (!hyp_pgt_base)
return -ENOMEM;

- nr_pages = host_s2_mem_pgtable_pages();
- host_s2_mem_pgt_base = hyp_early_alloc_contig(nr_pages);
- if (!host_s2_mem_pgt_base)
- return -ENOMEM;
-
- nr_pages = host_s2_dev_pgtable_pages();
- host_s2_dev_pgt_base = hyp_early_alloc_contig(nr_pages);
- if (!host_s2_dev_pgt_base)
+ nr_pages = host_s2_pgtable_pages();
+ host_s2_pgt_base = hyp_early_alloc_contig(nr_pages);
+ if (!host_s2_pgt_base)
return -ENOMEM;

return 0;
@@ -158,7 +152,7 @@ void __noreturn __pkvm_init_finalise(void)
if (ret)
goto out;

- ret = kvm_host_prepare_stage2(host_s2_mem_pgt_base, host_s2_dev_pgt_base);
+ ret = kvm_host_prepare_stage2(host_s2_pgt_base);
if (ret)
goto out;

diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
index 83ca23ac259b..d654921dd09b 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -71,8 +71,7 @@ void __init kvm_hyp_reserve(void)
}

hyp_mem_pages += hyp_s1_pgtable_pages();
- hyp_mem_pages += host_s2_mem_pgtable_pages();
- hyp_mem_pages += host_s2_dev_pgtable_pages();
+ hyp_mem_pages += host_s2_pgtable_pages();

/*
* The hyp_vmemmap needs to be backed by pages, but these pages
--
2.31.1.818.g46aad6cb9e-goog

2021-05-27 12:54:58

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 7/7] KVM: arm64: Use less bits for hyp_page refcount

The hyp_page refcount is currently encoded on 4 bytes even though we
never need to count that many objects in a page. Make it 2 bytes to save
some space in the vmemmap.

As overflows are more likely to happen as well, make sure to catch those
with a BUG in the increment function.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 2 ++
arch/arm64/kvm/hyp/include/nvhe/memory.h | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index d420e5c0845f..a82f73faf41e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -24,6 +24,8 @@ struct hyp_pool {

static inline void hyp_page_ref_inc(struct hyp_page *p)
{
+ if (p->refcount == USHRT_MAX)
+ BUG();
p->refcount++;
}

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 3fe34fa30ea4..592b7edb3edb 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -8,7 +8,7 @@
#include <linux/types.h>

struct hyp_page {
- unsigned int refcount;
+ unsigned short refcount;
unsigned short order;
};

--
2.31.1.818.g46aad6cb9e-goog

2021-05-27 20:50:10

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 5/7] KVM: arm64: Remove hyp_pool pointer from struct hyp_page

Each struct hyp_page currently contains a pointer to a hyp_pool struct
where the page should be freed if its refcount reaches 0. However, this
information can always be inferred from the context in the EL2 code, so
drop the pointer to save a few bytes in the vmemmap.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 4 ++--
arch/arm64/kvm/hyp/include/nvhe/memory.h | 2 --
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 13 +++++++++++--
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 7 ++-----
arch/arm64/kvm/hyp/nvhe/setup.c | 14 ++++++++++++--
5 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index aada4d97de49..9ed374648364 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -42,8 +42,8 @@ static inline void hyp_set_page_refcounted(struct hyp_page *p)

/* Allocation */
void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order);
-void hyp_get_page(void *addr);
-void hyp_put_page(void *addr);
+void hyp_get_page(void *addr, struct hyp_pool *pool);
+void hyp_put_page(void *addr, struct hyp_pool *pool);

/* Used pages cannot be freed */
int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 7691ab495eb4..991636be2f46 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -7,11 +7,9 @@

#include <linux/types.h>

-struct hyp_pool;
struct hyp_page {
unsigned int refcount;
unsigned int order;
- struct hyp_pool *pool;
};

extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index fdd5b5702e8a..3603311eb41c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -42,6 +42,15 @@ static void *host_s2_zalloc_page(void *pool)
{
return hyp_alloc_pages(pool, 0);
}
+static void host_s2_get_page(void *addr)
+{
+ hyp_get_page(addr, &host_s2_pool);
+}
+
+static void host_s2_put_page(void *addr)
+{
+ hyp_put_page(addr, &host_s2_pool);
+}

static int prepare_s2_pool(void *pgt_pool_base)
{
@@ -60,8 +69,8 @@ static int prepare_s2_pool(void *pgt_pool_base)
.phys_to_virt = hyp_phys_to_virt,
.virt_to_phys = hyp_virt_to_phys,
.page_count = hyp_page_count,
- .get_page = hyp_get_page,
- .put_page = hyp_put_page,
+ .get_page = host_s2_get_page,
+ .put_page = host_s2_put_page,
};

return 0;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index ce7379f1480b..e453108a2d95 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -150,20 +150,18 @@ static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
__hyp_attach_page(pool, p);
}

-void hyp_put_page(void *addr)
+void hyp_put_page(void *addr, struct hyp_pool *pool)
{
struct hyp_page *p = hyp_virt_to_page(addr);
- struct hyp_pool *pool = hyp_page_to_pool(p);

hyp_spin_lock(&pool->lock);
__hyp_put_page(pool, p);
hyp_spin_unlock(&pool->lock);
}

-void hyp_get_page(void *addr)
+void hyp_get_page(void *addr, struct hyp_pool *pool)
{
struct hyp_page *p = hyp_virt_to_page(addr);
- struct hyp_pool *pool = hyp_page_to_pool(p);

hyp_spin_lock(&pool->lock);
hyp_page_ref_inc(p);
@@ -212,7 +210,6 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
/* Init the vmemmap portion */
p = hyp_phys_to_page(phys);
for (i = 0; i < nr_pages; i++) {
- p[i].pool = pool;
p[i].order = 0;
hyp_set_page_refcounted(&p[i]);
}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 709cb3d19eb7..bf61abd4a330 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -137,6 +137,16 @@ static void *hyp_zalloc_hyp_page(void *arg)
return hyp_alloc_pages(&hpool, 0);
}

+static void hpool_get_page(void *addr)
+{
+ hyp_get_page(addr, &hpool);
+}
+
+static void hpool_put_page(void *addr)
+{
+ hyp_put_page(addr, &hpool);
+}
+
void __noreturn __pkvm_init_finalise(void)
{
struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
@@ -160,8 +170,8 @@ void __noreturn __pkvm_init_finalise(void)
.zalloc_page = hyp_zalloc_hyp_page,
.phys_to_virt = hyp_phys_to_virt,
.virt_to_phys = hyp_virt_to_phys,
- .get_page = hyp_get_page,
- .put_page = hyp_put_page,
+ .get_page = hpool_get_page,
+ .put_page = hpool_put_page,
};
pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;

--
2.31.1.818.g46aad6cb9e-goog

2021-06-01 12:03:43

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/7] KVM: arm64: Move hyp_pool locking out of refcount helpers

On Thu, 27 May 2021 13:51:28 +0100,
Quentin Perret <[email protected]> wrote:
>
> The hyp_page refcount helpers currently rely on the hyp_pool lock for
> serialization. However, this means the refcounts can't be changed from
> the buddy allocator core as it already holds the lock, which means pages
> have to go through odd transient states.
>
> For example, when a page is freed, its refcount is set to 0, and the
> lock is transiently released before the page can be attached to a free
> list in the buddy tree. This is currently harmless as the allocator
> checks the list node of each page to see if it is available for
> allocation or not, but it means the page refcount can't be trusted to
> represent the state of the page even if the pool lock is held.
>
> In order to fix this, remove the pool locking from the refcount helpers,
> and move all the logic to the buddy allocator. This will simplify the
> removal of the list node from struct hyp_page in a later patch.

Is there any chance some documentation could be added so that we have
a record of what the locking boundaries are? Something along the line
of what we have in arch/arm64/kvm/vgic/vgic.c, for example.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-06-01 13:32:22

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 1/7] KVM: arm64: Move hyp_pool locking out of refcount helpers

On Tuesday 01 Jun 2021 at 13:02:00 (+0100), Marc Zyngier wrote:
> On Thu, 27 May 2021 13:51:28 +0100,
> Quentin Perret <[email protected]> wrote:
> >
> > The hyp_page refcount helpers currently rely on the hyp_pool lock for
> > serialization. However, this means the refcounts can't be changed from
> > the buddy allocator core as it already holds the lock, which means pages
> > have to go through odd transient states.
> >
> > For example, when a page is freed, its refcount is set to 0, and the
> > lock is transiently released before the page can be attached to a free
> > list in the buddy tree. This is currently harmless as the allocator
> > checks the list node of each page to see if it is available for
> > allocation or not, but it means the page refcount can't be trusted to
> > represent the state of the page even if the pool lock is held.
> >
> > In order to fix this, remove the pool locking from the refcount helpers,
> > and move all the logic to the buddy allocator. This will simplify the
> > removal of the list node from struct hyp_page in a later patch.
>
> Is there any chance some documentation could be added so that we have
> a record of what the locking boundaries are? Something along the line
> of what we have in arch/arm64/kvm/vgic/vgic.c, for example.

Sounds like a good idea, I'll go write something.

Cheers,
Quentin

2021-06-01 14:39:25

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 3/7] KVM: arm64: Remove list_head from hyp_page

On Thu, 27 May 2021 13:51:30 +0100,
Quentin Perret <[email protected]> wrote:
>
> The list_head member of struct hyp_page is only needed when the page is
> attached to a free-list, which by definition implies the page is free.
> As such, nothing prevents us from using the page itself to store the
> list_head, hence reducing the size of the vmemmap.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/memory.h | 1 -
> arch/arm64/kvm/hyp/nvhe/page_alloc.c | 39 ++++++++++++++++++++----
> 2 files changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index fd78bde939ee..7691ab495eb4 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -12,7 +12,6 @@ struct hyp_page {
> unsigned int refcount;
> unsigned int order;
> struct hyp_pool *pool;
> - struct list_head node;
> };
>
> extern u64 __hyp_vmemmap;
> diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> index 7ee882f36767..ce7379f1480b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> @@ -62,6 +62,34 @@ static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
>
> }
>
> +/*
> + * Pages that are available for allocation are tracked in free-lists, so we use
> + * the pages themselves to store the list nodes to avoid wasting space. As the
> + * allocator always returns zeroed pages (which are zeroed on the hyp_put_page()
> + * path to optimize allocation speed), we also need to clean-up the list node in
> + * each page when we take it out of the list.
> + */
> +static inline void page_remove_from_list(struct hyp_page *p)
> +{
> + struct list_head *node = (struct list_head *)hyp_page_to_virt(p);

Nit: How about changing hyp_page_to_virt() so that it returns a
convenient 'void *', and get rid of the ugly casts?

> +
> + __list_del_entry(node);
> + memset(node, 0, sizeof(*node));
> +}
> +
> +static inline void page_add_to_list(struct hyp_page *p, struct list_head *head)
> +{
> + struct list_head *node = (struct list_head *)hyp_page_to_virt(p);
> +
> + INIT_LIST_HEAD(node);
> + list_add_tail(node, head);
> +}
> +
> +static inline struct hyp_page *node_to_page(struct list_head *node)
> +{
> + return (struct hyp_page *)hyp_virt_to_page(node);

Why is this cast necessary? If I'm not mistaken, hyp_vmemmap is
already cast as a 'struct hyp_page *', so hyp_virt_to_page() should
return the same type.

> +}
> +
> static void __hyp_attach_page(struct hyp_pool *pool,
> struct hyp_page *p)
> {
> @@ -83,14 +111,14 @@ static void __hyp_attach_page(struct hyp_pool *pool,
> break;
>
> /* Take the buddy out of its list, and coallesce with @p */
> - list_del_init(&buddy->node);
> + page_remove_from_list(buddy);
> buddy->order = HYP_NO_ORDER;
> p = min(p, buddy);
> }
>
> /* Mark the new head, and insert it */
> p->order = order;
> - list_add_tail(&p->node, &pool->free_area[order]);
> + page_add_to_list(p, &pool->free_area[order]);
> }
>
> static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
> @@ -99,7 +127,7 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
> {
> struct hyp_page *buddy;
>
> - list_del_init(&p->node);
> + page_remove_from_list(p);
> while (p->order > order) {
> /*
> * The buddy of order n - 1 currently has HYP_NO_ORDER as it
> @@ -110,7 +138,7 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
> p->order--;
> buddy = __find_buddy_nocheck(pool, p, p->order);
> buddy->order = p->order;
> - list_add_tail(&buddy->node, &pool->free_area[buddy->order]);
> + page_add_to_list(buddy, &pool->free_area[buddy->order]);
> }
>
> return p;
> @@ -158,7 +186,7 @@ void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
> }
>
> /* Extract it from the tree at the right order */
> - p = list_first_entry(&pool->free_area[i], struct hyp_page, node);
> + p = node_to_page(pool->free_area[i].next);
> p = __hyp_extract_page(pool, p, order);
>
> hyp_set_page_refcounted(p);
> @@ -186,7 +214,6 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
> for (i = 0; i < nr_pages; i++) {
> p[i].pool = pool;
> p[i].order = 0;
> - INIT_LIST_HEAD(&p[i].node);
> hyp_set_page_refcounted(&p[i]);
> }
>
> --
> 2.31.1.818.g46aad6cb9e-goog
>
>

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-06-01 15:04:09

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 5/7] KVM: arm64: Remove hyp_pool pointer from struct hyp_page

On Thu, 27 May 2021 13:51:32 +0100,
Quentin Perret <[email protected]> wrote:
>
> Each struct hyp_page currently contains a pointer to a hyp_pool struct
> where the page should be freed if its refcount reaches 0. However, this
> information can always be inferred from the context in the EL2 code, so
> drop the pointer to save a few bytes in the vmemmap.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/gfp.h | 4 ++--
> arch/arm64/kvm/hyp/include/nvhe/memory.h | 2 --
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 13 +++++++++++--
> arch/arm64/kvm/hyp/nvhe/page_alloc.c | 7 ++-----
> arch/arm64/kvm/hyp/nvhe/setup.c | 14 ++++++++++++--
> 5 files changed, 27 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> index aada4d97de49..9ed374648364 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> @@ -42,8 +42,8 @@ static inline void hyp_set_page_refcounted(struct hyp_page *p)
>
> /* Allocation */
> void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order);
> -void hyp_get_page(void *addr);
> -void hyp_put_page(void *addr);
> +void hyp_get_page(void *addr, struct hyp_pool *pool);
> +void hyp_put_page(void *addr, struct hyp_pool *pool);

It'd be good to be consistent with __hyp_put_page(), which has these
arguments in the opposite order. See below for an example.

>
> /* Used pages cannot be freed */
> int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index 7691ab495eb4..991636be2f46 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -7,11 +7,9 @@
>
> #include <linux/types.h>
>
> -struct hyp_pool;
> struct hyp_page {
> unsigned int refcount;
> unsigned int order;
> - struct hyp_pool *pool;
> };
>
> extern u64 __hyp_vmemmap;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index fdd5b5702e8a..3603311eb41c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -42,6 +42,15 @@ static void *host_s2_zalloc_page(void *pool)
> {
> return hyp_alloc_pages(pool, 0);
> }
> +static void host_s2_get_page(void *addr)

nit: missing blank line.

> +{
> + hyp_get_page(addr, &host_s2_pool);
> +}
> +
> +static void host_s2_put_page(void *addr)
> +{
> + hyp_put_page(addr, &host_s2_pool);
> +}
>
> static int prepare_s2_pool(void *pgt_pool_base)
> {
> @@ -60,8 +69,8 @@ static int prepare_s2_pool(void *pgt_pool_base)
> .phys_to_virt = hyp_phys_to_virt,
> .virt_to_phys = hyp_virt_to_phys,
> .page_count = hyp_page_count,
> - .get_page = hyp_get_page,
> - .put_page = hyp_put_page,
> + .get_page = host_s2_get_page,
> + .put_page = host_s2_put_page,
> };
>
> return 0;
> diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> index ce7379f1480b..e453108a2d95 100644
> --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> @@ -150,20 +150,18 @@ static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
> __hyp_attach_page(pool, p);
> }
>
> -void hyp_put_page(void *addr)
> +void hyp_put_page(void *addr, struct hyp_pool *pool)
> {
> struct hyp_page *p = hyp_virt_to_page(addr);
> - struct hyp_pool *pool = hyp_page_to_pool(p);
>
> hyp_spin_lock(&pool->lock);
> __hyp_put_page(pool, p);

When I see this, my eyes get crossed, and I don't know what I'm
reading anymore! ;-) In general, I like the "container" as a first
argument, followed by the element that can be contained.

> hyp_spin_unlock(&pool->lock);
> }
>
> -void hyp_get_page(void *addr)
> +void hyp_get_page(void *addr, struct hyp_pool *pool)
> {
> struct hyp_page *p = hyp_virt_to_page(addr);
> - struct hyp_pool *pool = hyp_page_to_pool(p);
>
> hyp_spin_lock(&pool->lock);
> hyp_page_ref_inc(p);
> @@ -212,7 +210,6 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
> /* Init the vmemmap portion */
> p = hyp_phys_to_page(phys);
> for (i = 0; i < nr_pages; i++) {
> - p[i].pool = pool;
> p[i].order = 0;
> hyp_set_page_refcounted(&p[i]);
> }
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 709cb3d19eb7..bf61abd4a330 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -137,6 +137,16 @@ static void *hyp_zalloc_hyp_page(void *arg)
> return hyp_alloc_pages(&hpool, 0);
> }
>
> +static void hpool_get_page(void *addr)
> +{
> + hyp_get_page(addr, &hpool);
> +}
> +
> +static void hpool_put_page(void *addr)
> +{
> + hyp_put_page(addr, &hpool);
> +}
> +
> void __noreturn __pkvm_init_finalise(void)
> {
> struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> @@ -160,8 +170,8 @@ void __noreturn __pkvm_init_finalise(void)
> .zalloc_page = hyp_zalloc_hyp_page,
> .phys_to_virt = hyp_phys_to_virt,
> .virt_to_phys = hyp_virt_to_phys,
> - .get_page = hyp_get_page,
> - .put_page = hyp_put_page,
> + .get_page = hpool_get_page,
> + .put_page = hpool_put_page,
> };
> pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
>
> --
> 2.31.1.818.g46aad6cb9e-goog
>
>

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-06-01 15:23:20

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 7/7] KVM: arm64: Use less bits for hyp_page refcount

On Thu, 27 May 2021 13:51:34 +0100,
Quentin Perret <[email protected]> wrote:
>
> The hyp_page refcount is currently encoded on 4 bytes even though we
> never need to count that many objects in a page. Make it 2 bytes to save
> some space in the vmemmap.
>
> As overflows are more likely to happen as well, make sure to catch those
> with a BUG in the increment function.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/gfp.h | 2 ++
> arch/arm64/kvm/hyp/include/nvhe/memory.h | 2 +-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> index d420e5c0845f..a82f73faf41e 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> @@ -24,6 +24,8 @@ struct hyp_pool {
>
> static inline void hyp_page_ref_inc(struct hyp_page *p)
> {
> + if (p->refcount == USHRT_MAX)
> + BUG();

nit: BUG_ON(p->refcount == USHRT_MAX);

> p->refcount++;
> }
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index 3fe34fa30ea4..592b7edb3edb 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -8,7 +8,7 @@
> #include <linux/types.h>
>
> struct hyp_page {
> - unsigned int refcount;
> + unsigned short refcount;
> unsigned short order;
> };
>
> --
> 2.31.1.818.g46aad6cb9e-goog
>
>

--
Without deviation from the norm, progress is not possible.

2021-06-01 15:49:48

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 3/7] KVM: arm64: Remove list_head from hyp_page

On Tuesday 01 Jun 2021 at 15:38:22 (+0100), Marc Zyngier wrote:
> On Thu, 27 May 2021 13:51:30 +0100,
> Quentin Perret <[email protected]> wrote:
> > +/*
> > + * Pages that are available for allocation are tracked in free-lists, so we use
> > + * the pages themselves to store the list nodes to avoid wasting space. As the
> > + * allocator always returns zeroed pages (which are zeroed on the hyp_put_page()
> > + * path to optimize allocation speed), we also need to clean-up the list node in
> > + * each page when we take it out of the list.
> > + */
> > +static inline void page_remove_from_list(struct hyp_page *p)
> > +{
> > + struct list_head *node = (struct list_head *)hyp_page_to_virt(p);
>
> Nit: How about changing hyp_page_to_virt() so that it returns a
> convenient 'void *', and get rid of the ugly casts?

It should already return void *, but I kind of liked the explicit cast
here for documentation purpose. We're turning a 'random' piece of unused
memory into a typed object, so that felt like a useful annotation. Happy
to get rid of it though.

> > +
> > + __list_del_entry(node);
> > + memset(node, 0, sizeof(*node));
> > +}
> > +
> > +static inline void page_add_to_list(struct hyp_page *p, struct list_head *head)
> > +{
> > + struct list_head *node = (struct list_head *)hyp_page_to_virt(p);
> > +
> > + INIT_LIST_HEAD(node);
> > + list_add_tail(node, head);
> > +}
> > +
> > +static inline struct hyp_page *node_to_page(struct list_head *node)
> > +{
> > + return (struct hyp_page *)hyp_virt_to_page(node);
>
> Why is this cast necessary? If I'm not mistaken, hyp_vmemmap is
> already cast as a 'struct hyp_page *', so hyp_virt_to_page() should
> return the same type.

Right, that one is totally unnecessary, I'll remove.

Cheers,
Quentin

2021-06-01 17:42:43

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 3/7] KVM: arm64: Remove list_head from hyp_page

On Tue, 01 Jun 2021 16:48:06 +0100,
Quentin Perret <[email protected]> wrote:
>
> On Tuesday 01 Jun 2021 at 15:38:22 (+0100), Marc Zyngier wrote:
> > On Thu, 27 May 2021 13:51:30 +0100,
> > Quentin Perret <[email protected]> wrote:
> > > +/*
> > > + * Pages that are available for allocation are tracked in free-lists, so we use
> > > + * the pages themselves to store the list nodes to avoid wasting space. As the
> > > + * allocator always returns zeroed pages (which are zeroed on the hyp_put_page()
> > > + * path to optimize allocation speed), we also need to clean-up the list node in
> > > + * each page when we take it out of the list.
> > > + */
> > > +static inline void page_remove_from_list(struct hyp_page *p)
> > > +{
> > > + struct list_head *node = (struct list_head *)hyp_page_to_virt(p);
> >
> > Nit: How about changing hyp_page_to_virt() so that it returns a
> > convenient 'void *', and get rid of the ugly casts?
>
> It should already return void *, but I kind of liked the explicit cast
> here for documentation purpose. We're turning a 'random' piece of unused
> memory into a typed object, so that felt like a useful annotation. Happy
> to get rid of it though.

My expectations were that using hyp_page_to_virt() already serves as a
pretty big warning that we're doing something unusual.

I guess that if we want to be really careful about those, we should
then be consistent and make it return a uintptr_t (or unsigned long)
instead, actively making use of the cast, consistently, everywhere.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-06-02 09:58:44

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 3/7] KVM: arm64: Remove list_head from hyp_page

On Tuesday 01 Jun 2021 at 18:41:49 (+0100), Marc Zyngier wrote:
> On Tue, 01 Jun 2021 16:48:06 +0100,
> Quentin Perret <[email protected]> wrote:
> >
> > On Tuesday 01 Jun 2021 at 15:38:22 (+0100), Marc Zyngier wrote:
> > > On Thu, 27 May 2021 13:51:30 +0100,
> > > Quentin Perret <[email protected]> wrote:
> > > > +/*
> > > > + * Pages that are available for allocation are tracked in free-lists, so we use
> > > > + * the pages themselves to store the list nodes to avoid wasting space. As the
> > > > + * allocator always returns zeroed pages (which are zeroed on the hyp_put_page()
> > > > + * path to optimize allocation speed), we also need to clean-up the list node in
> > > > + * each page when we take it out of the list.
> > > > + */
> > > > +static inline void page_remove_from_list(struct hyp_page *p)
> > > > +{
> > > > + struct list_head *node = (struct list_head *)hyp_page_to_virt(p);
> > >
> > > Nit: How about changing hyp_page_to_virt() so that it returns a
> > > convenient 'void *', and get rid of the ugly casts?
> >
> > It should already return void *, but I kind of liked the explicit cast
> > here for documentation purpose. We're turning a 'random' piece of unused
> > memory into a typed object, so that felt like a useful annotation. Happy
> > to get rid of it though.
>
> My expectations were that using hyp_page_to_virt() already serves as a
> pretty big warning that we're doing something unusual.
>
> I guess that if we want to be really careful about those, we should
> then be consistent and make it return a uintptr_t (or unsigned long)
> instead, actively making use of the cast, consistently, everywhere.

Right, so I'd prefer keeping it void * for consistency with the EL1
equivalent -- much of this aims to be API compatible with the Linux
stuff. I'll get rid of the cast and post a v2 shortly.

Cheers,
Quentin