Subject: [PATCH] KVM: selftests: replace assertion with warning in access_tracking_perf_test

Page_idle uses {ptep/pmdp}_clear_young_notify which in turn calls
the mmu notifier callback ->clear_young(), which purposefully
does not flush the TLB.

When running the test in a nested guest, point 1. of the test
doc header is violated, because KVM TLB is unbounded by size
and since no flush is forced, KVM does not update the sptes
accessed/idle bits resulting in guest assertion failure.

More precisely, only the first ACCESS_WRITE in run_test() actually
makes visible changes, because sptes are created and the accessed
bit is set to 1 (or idle bit is 0). Then the first mark_memory_idle()
passes since access bit is still one, and sets all pages as idle
(or not accessed). When the next write is performed, the update
is not flushed therefore idle is still 1 and next mark_memory_idle()
fails.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
---
.../selftests/kvm/access_tracking_perf_test.c | 25 ++++++++++++-------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index 1c2749b1481a..87b0bd5ebc65 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -31,8 +31,9 @@
* These limitations are worked around in this test by using a large enough
* region of memory for each vCPU such that the number of translations cached in
* the TLB and the number of pages held in pagevecs are a small fraction of the
- * overall workload. And if either of those conditions are not true this test
- * will fail rather than silently passing.
+ * overall workload. And if either of those conditions are not true (for example
+ * in nesting, where TLB size is unlimited) this test will print a warning
+ * rather than silently passing.
*/
#include <inttypes.h>
#include <limits.h>
@@ -172,17 +173,23 @@ static void mark_vcpu_memory_idle(struct kvm_vm *vm,
vcpu_idx, no_pfn, pages);

/*
- * Test that at least 90% of memory has been marked idle (the rest might
- * not be marked idle because the pages have not yet made it to an LRU
- * list or the translations are still cached in the TLB). 90% is
+ * Check that at least 90% of memory has been marked idle (the rest
+ * might not be marked idle because the pages have not yet made it to an
+ * LRU list or the translations are still cached in the TLB). 90% is
* arbitrary; high enough that we ensure most memory access went through
* access tracking but low enough as to not make the test too brittle
* over time and across architectures.
+ *
+ * Note that when run in nested virtualization, this check will trigger
+ * much more frequently because TLB size is unlimited and since no flush
+ * happens, much more pages are cached there and guest won't see the
+ * "idle" bit cleared.
*/
- TEST_ASSERT(still_idle < pages / 10,
- "vCPU%d: Too many pages still idle (%"PRIu64 " out of %"
- PRIu64 ").\n",
- vcpu_idx, still_idle, pages);
+ if (still_idle < pages / 10)
+ printf("WARNING: vCPU%d: Too many pages still idle (%"PRIu64 "
+ out of %" PRIu64 "), this will affect performance results
+ .\n",
+ vcpu_idx, still_idle, pages);

close(page_idle_fd);
close(pagemap_fd);
--
2.31.1


2022-09-27 12:42:01

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: replace assertion with warning in access_tracking_perf_test

On 9/26/22 10:29, Emanuele Giuseppe Esposito wrote:
> Page_idle uses {ptep/pmdp}_clear_young_notify which in turn calls
> the mmu notifier callback ->clear_young(), which purposefully
> does not flush the TLB.
>
> When running the test in a nested guest, point 1. of the test
> doc header is violated, because KVM TLB is unbounded by size
> and since no flush is forced, KVM does not update the sptes
> accessed/idle bits resulting in guest assertion failure.
>
> More precisely, only the first ACCESS_WRITE in run_test() actually
> makes visible changes, because sptes are created and the accessed
> bit is set to 1 (or idle bit is 0). Then the first mark_memory_idle()
> passes since access bit is still one, and sets all pages as idle
> (or not accessed). When the next write is performed, the update
> is not flushed therefore idle is still 1 and next mark_memory_idle()
> fails.

Queued, thanks.

Paolo