2023-06-26 18:59:38

by Mingwei Zhang

[permalink] [raw]
Subject: [PATCH v2 0/6] Update document description for kvm_mmu_page and kvm_mmu_page_role

This is the 2nd version and I made some changes according to feedback:

v1 -> v2:
- Update the shortlog and commit messages [Zhi].
- Refactor the description in mmu.rst [Yilun, Kai]

v1: https://lore.kernel.org/all/[email protected]/

Mingwei Zhang (6):
KVM: Documentation: Add the missing description for guest_mode in
kvm_mmu_page_role
KVM: Documentation: Update the field name gfns and its description in
kvm_mmu_page
KVM: Documentation: Add the missing description for ptep in
kvm_mmu_page
KVM: Documentation: Add the missing description for tdp_mmu_root_count
into kvm_mmu_page
KVM: Documentation: Add the missing description for mmu_valid_gen into
kvm_mmu_page
KVM: Documentation: Add the missing description for tdp_mmu_page into
kvm_mmu_page

Documentation/virt/kvm/x86/mmu.rst | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)


base-commit: 0b210faf337314e4bc88e796218bc70c72a51209
--
2.41.0.162.gfafddb0af9-goog



2023-06-26 19:19:52

by Mingwei Zhang

[permalink] [raw]
Subject: [PATCH v2 2/6] KVM: Documentation: Update the field name gfns and its description in kvm_mmu_page

Update the field 'gfns' in kvm_mmu_page to 'shadowed_translation' to be
consistent with the code. Also update the corresponding 'gfns' in the
comments. The more detailed description of 'shadowed_translation' is
already inlined in the data structure definition, so no need to duplicate
the text but simply just update the names.

Signed-off-by: Mingwei Zhang <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
---
Documentation/virt/kvm/x86/mmu.rst | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 561efa8ec7d7..4c9044b4dc6c 100644
--- a/Documentation/virt/kvm/x86/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
@@ -221,11 +221,12 @@ Shadow pages contain the following information:
at __pa(sp2->spt). sp2 will point back at sp1 through parent_pte.
The spt array forms a DAG structure with the shadow page as a node, and
guest pages as leaves.
- gfns:
- An array of 512 guest frame numbers, one for each present pte. Used to
- perform a reverse map from a pte to a gfn. When role.direct is set, any
+ shadowed_translation:
+ An array of 512 shadow translation entries, one for each present pte. Used
+ to perform a reverse map from a pte to a gfn. When role.direct is set, any
element of this array can be calculated from the gfn field when used, in
- this case, the array of gfns is not allocated. See role.direct and gfn.
+ this case, the array of shadowed_translation is not allocated. See
+ role.direct and gfn.
root_count:
A counter keeping track of how many hardware registers (guest cr3 or
pdptrs) are now pointing at the page. While this counter is nonzero, the
--
2.41.0.162.gfafddb0af9-goog


2023-06-26 19:26:18

by Mingwei Zhang

[permalink] [raw]
Subject: [PATCH v2 1/6] KVM: Documentation: Add the missing description for guest_mode in kvm_mmu_page_role

Add the missing description for guest_mode in kvm_mmu_page_role
description. guest_mode tells KVM whether a shadow page is used for the L1
or an L2. Update the missing field in documentation.

Signed-off-by: Mingwei Zhang <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
---
Documentation/virt/kvm/x86/mmu.rst | 2 ++
1 file changed, 2 insertions(+)

diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 8364afa228ec..561efa8ec7d7 100644
--- a/Documentation/virt/kvm/x86/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
@@ -202,6 +202,8 @@ Shadow pages contain the following information:
Is 1 if the MMU instance cannot use A/D bits. EPT did not have A/D
bits before Haswell; shadow EPT page tables also cannot use A/D bits
if the L1 hypervisor does not enable them.
+ role.guest_mode:
+ Indicates the shadow page is created for a nested guest.
role.passthrough:
The page is not backed by a guest page table, but its first entry
points to one. This is set if NPT uses 5-level page tables (host
--
2.41.0.162.gfafddb0af9-goog


2023-06-26 19:27:49

by Mingwei Zhang

[permalink] [raw]
Subject: [PATCH v2 4/6] KVM: Documentation: Add the missing description for tdp_mmu_root_count into kvm_mmu_page

Add the description of tdp_mmu_root_count into kvm_mmu_page description.
tdp_mmu_root_count is an atomic counter used only in TDP MMU. Its usage and
meaning is slightly different with root_counter in shadow MMU. Update the
doc.

Signed-off-by: Mingwei Zhang <[email protected]>
---
Documentation/virt/kvm/x86/mmu.rst | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 5cd6cd5e8926..97d695207e11 100644
--- a/Documentation/virt/kvm/x86/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
@@ -231,6 +231,11 @@ Shadow pages contain the following information:
A counter keeping track of how many hardware registers (guest cr3 or
pdptrs) are now pointing at the page. While this counter is nonzero, the
page cannot be destroyed. See role.invalid.
+ tdp_mmu_root_count:
+ An atomic reference counter in TDP MMU root page that allows for parallel
+ accesses. Accessing the page requires lifting the counter value. The
+ initial value is set to 2 indicating one reference from vCPU and one
+ from TDP MMU itself. Note this field is a union with root_count.
parent_ptes:
The reverse mapping for the pte/ptes pointing at this page's spt. If
parent_ptes bit 0 is zero, only one spte points at this page and
--
2.41.0.162.gfafddb0af9-goog


2023-06-26 19:32:39

by Mingwei Zhang

[permalink] [raw]
Subject: [PATCH v2 5/6] KVM: Documentation: Add the missing description for mmu_valid_gen into kvm_mmu_page

Add the description for mmu_valid_gen into kvm_mmu_page description.
mmu_valid_gen is used in shadow MMU for fast zapping. Update the doc to
reflect that.

Signed-off-by: Mingwei Zhang <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
---
Documentation/virt/kvm/x86/mmu.rst | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 97d695207e11..cc4bd190c93d 100644
--- a/Documentation/virt/kvm/x86/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
@@ -208,6 +208,10 @@ Shadow pages contain the following information:
The page is not backed by a guest page table, but its first entry
points to one. This is set if NPT uses 5-level page tables (host
CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=1).
+ mmu_valid_gen:
+ Used by comparing against kvm->arch.mmu_valid_gen to check whether the
+ shadow page is obsolete thus a convenient variable for fast zapping.
+ Note that TDP MMU does not use mmu_valid_gen.
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
--
2.41.0.162.gfafddb0af9-goog


2023-06-26 19:53:32

by Mingwei Zhang

[permalink] [raw]
Subject: [PATCH v2 3/6] KVM: Documentation: Add the missing description for ptep in kvm_mmu_page

Add the missing description for ptep in kvm_mmu_page description. ptep is
used when TDP MMU is enabled and it shares the storage with parent_ptes.
Update the doc to help readers to get up-to-date info.

Signed-off-by: Mingwei Zhang <[email protected]>
---
Documentation/virt/kvm/x86/mmu.rst | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 4c9044b4dc6c..5cd6cd5e8926 100644
--- a/Documentation/virt/kvm/x86/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
@@ -237,6 +237,11 @@ Shadow pages contain the following information:
parent_ptes points at this single spte, otherwise, there exists multiple
sptes pointing at this page and (parent_ptes & ~0x1) points at a data
structure with a list of parent sptes.
+ ptep:
+ The reverse mapping for the pte pointing at this page's spt. This field is
+ used in replace of parent_ptes when TDP MMU is used. In TDP MMU, each
+ non-root shadow page will have one parent, while each root shadow page has
+ no parent. Note that this field is a union with parent_ptes.
unsync:
If true, then the translations in this page may not match the guest's
translation. This is equivalent to the state of the tlb when a pte is
--
2.41.0.162.gfafddb0af9-goog


2023-06-26 21:51:55

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] KVM: Documentation: Update the field name gfns and its description in kvm_mmu_page

Hi--

On 6/26/23 11:20, Mingwei Zhang wrote:
> Update the field 'gfns' in kvm_mmu_page to 'shadowed_translation' to be
> consistent with the code. Also update the corresponding 'gfns' in the
> comments. The more detailed description of 'shadowed_translation' is
> already inlined in the data structure definition, so no need to duplicate
> the text but simply just update the names.
>
> Signed-off-by: Mingwei Zhang <[email protected]>
> Reviewed-by: Kai Huang <[email protected]>
> ---
> Documentation/virt/kvm/x86/mmu.rst | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> index 561efa8ec7d7..4c9044b4dc6c 100644
> --- a/Documentation/virt/kvm/x86/mmu.rst
> +++ b/Documentation/virt/kvm/x86/mmu.rst
> @@ -221,11 +221,12 @@ Shadow pages contain the following information:
> at __pa(sp2->spt). sp2 will point back at sp1 through parent_pte.
> The spt array forms a DAG structure with the shadow page as a node, and
> guest pages as leaves.
> - gfns:
> - An array of 512 guest frame numbers, one for each present pte. Used to
> - perform a reverse map from a pte to a gfn. When role.direct is set, any
> + shadowed_translation:
> + An array of 512 shadow translation entries, one for each present pte. Used
> + to perform a reverse map from a pte to a gfn. When role.direct is set, any
> element of this array can be calculated from the gfn field when used, in
> - this case, the array of gfns is not allocated. See role.direct and gfn.
> + this case, the array of shadowed_translation is not allocated. See

I cannot parse the before version nor the after version of this sentence (new version):

When role.direct is set, any
element of this array can be calculated from the gfn field when used, in
this case, the array of shadowed_translation is not allocated.


> + role.direct and gfn.
> root_count:
> A counter keeping track of how many hardware registers (guest cr3 or
> pdptrs) are now pointing at the page. While this counter is nonzero, the

--
~Randy

2023-06-26 22:13:11

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] KVM: Documentation: Add the missing description for ptep in kvm_mmu_page



On 6/26/23 11:20, Mingwei Zhang wrote:
> Add the missing description for ptep in kvm_mmu_page description. ptep is
> used when TDP MMU is enabled and it shares the storage with parent_ptes.
> Update the doc to help readers to get up-to-date info.
>
> Signed-off-by: Mingwei Zhang <[email protected]>
> ---
> Documentation/virt/kvm/x86/mmu.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> index 4c9044b4dc6c..5cd6cd5e8926 100644
> --- a/Documentation/virt/kvm/x86/mmu.rst
> +++ b/Documentation/virt/kvm/x86/mmu.rst
> @@ -237,6 +237,11 @@ Shadow pages contain the following information:
> parent_ptes points at this single spte, otherwise, there exists multiple
> sptes pointing at this page and (parent_ptes & ~0x1) points at a data
> structure with a list of parent sptes.
> + ptep:
> + The reverse mapping for the pte pointing at this page's spt. This field is
> + used in replace of parent_ptes when TDP MMU is used. In TDP MMU, each

in replacement of

> + non-root shadow page will have one parent, while each root shadow page has
> + no parent. Note that this field is a union with parent_ptes.
> unsync:
> If true, then the translations in this page may not match the guest's
> translation. This is equivalent to the state of the tlb when a pte is

--
~Randy

2023-06-26 22:41:35

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: Documentation: Add the missing description for tdp_mmu_root_count into kvm_mmu_page

On Mon, 2023-06-26 at 18:20 +0000, Mingwei Zhang wrote:
> Add the description of tdp_mmu_root_count into kvm_mmu_page description.
> tdp_mmu_root_count is an atomic counter used only in TDP MMU. Its usage and
> meaning is slightly different with root_counter in shadow MMU. Update the
> doc.
>
> Signed-off-by: Mingwei Zhang <[email protected]>
> ---
> Documentation/virt/kvm/x86/mmu.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> index 5cd6cd5e8926..97d695207e11 100644
> --- a/Documentation/virt/kvm/x86/mmu.rst
> +++ b/Documentation/virt/kvm/x86/mmu.rst
> @@ -231,6 +231,11 @@ Shadow pages contain the following information:
> A counter keeping track of how many hardware registers (guest cr3 or
> pdptrs) are now pointing at the page. While this counter is nonzero, the
> page cannot be destroyed. See role.invalid.
> + tdp_mmu_root_count:
> + An atomic reference counter in TDP MMU root page that allows for parallel
> + accesses. Accessing the page requires lifting the counter value. The
> + initial value is set to 2 indicating one reference from vCPU and one
> + from TDP MMU itself. Note this field is a union with root_count.
> parent_ptes:
> The reverse mapping for the pte/ptes pointing at this page's spt. If
> parent_ptes bit 0 is zero, only one spte points at this page and
> --
> 2.41.0.162.gfafddb0af9-goog
>

Reviewed-by: Kai Huang <[email protected]>

2023-06-26 22:54:38

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] KVM: Documentation: Add the missing description for ptep in kvm_mmu_page

On Mon, 2023-06-26 at 18:20 +0000, Mingwei Zhang wrote:
> Add the missing description for ptep in kvm_mmu_page description. ptep is
> used when TDP MMU is enabled and it shares the storage with parent_ptes.
> Update the doc to help readers to get up-to-date info.
>
> Signed-off-by: Mingwei Zhang <[email protected]>
> ---
> Documentation/virt/kvm/x86/mmu.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> index 4c9044b4dc6c..5cd6cd5e8926 100644
> --- a/Documentation/virt/kvm/x86/mmu.rst
> +++ b/Documentation/virt/kvm/x86/mmu.rst
> @@ -237,6 +237,11 @@ Shadow pages contain the following information:
> parent_ptes points at this single spte, otherwise, there exists multiple
> sptes pointing at this page and (parent_ptes & ~0x1) points at a data
> structure with a list of parent sptes.
> + ptep:
> + The reverse mapping for the pte pointing at this page's spt. This field is
> + used in replace of parent_ptes when TDP MMU is used. In TDP MMU, each
> + non-root shadow page will have one parent, while each root shadow page has
> + no parent. Note that this field is a union with parent_ptes.
> unsync:
> If true, then the translations in this page may not match the guest's
> translation. This is equivalent to the state of the tlb when a pte is
> --
> 2.41.0.162.gfafddb0af9-goog
>

Reviewed-by: Kai Huang <[email protected]>

2023-06-27 15:48:52

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] KVM: Documentation: Add the missing description for ptep in kvm_mmu_page

On Mon, Jun 26, 2023, Mingwei Zhang wrote:
> Add the missing description for ptep in kvm_mmu_page description. ptep is
> used when TDP MMU is enabled and it shares the storage with parent_ptes.
> Update the doc to help readers to get up-to-date info.
>
> Signed-off-by: Mingwei Zhang <[email protected]>
> ---
> Documentation/virt/kvm/x86/mmu.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> index 4c9044b4dc6c..5cd6cd5e8926 100644
> --- a/Documentation/virt/kvm/x86/mmu.rst
> +++ b/Documentation/virt/kvm/x86/mmu.rst
> @@ -237,6 +237,11 @@ Shadow pages contain the following information:
> parent_ptes points at this single spte, otherwise, there exists multiple
> sptes pointing at this page and (parent_ptes & ~0x1) points at a data
> structure with a list of parent sptes.
> + ptep:
> + The reverse mapping for the pte pointing at this page's spt. This field is

I don't think describing "reverse mapping" is necessary, and it's arguably even
misleading. A "reverse mapping" typically provides a way to find mappings given
a (guest) physical address. The TDP MMU doesn't bother with reverse mappings
because there is exactly one possible mapping for any given gfn. The "ptep" exists
specifically to expedite zapping a single TDP MMU shadow page, i.e. allows zapping
without having to traverse the paging tree.

The ptep field is just a pointer at the SPTE, no more no less. Something like
this?

ptep:
The kernel virtual address of the SPTE that points at this shadow page.
Used exclusively by the TDP MMU, this field is a union with parent_ptes.

> + used in replace of parent_ptes when TDP MMU is used. In TDP MMU, each
> + non-root shadow page will have one parent, while each root shadow page has
> + no parent. Note that this field is a union with parent_ptes.
> unsync:
> If true, then the translations in this page may not match the guest's
> translation. This is equivalent to the state of the tlb when a pte is
> --
> 2.41.0.162.gfafddb0af9-goog
>

2023-06-27 16:11:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: Documentation: Add the missing description for tdp_mmu_root_count into kvm_mmu_page

On Mon, Jun 26, 2023, Mingwei Zhang wrote:
> Add the description of tdp_mmu_root_count into kvm_mmu_page description.
> tdp_mmu_root_count is an atomic counter used only in TDP MMU. Its usage and
> meaning is slightly different with root_counter in shadow MMU. Update the
> doc.
>
> Signed-off-by: Mingwei Zhang <[email protected]>
> ---
> Documentation/virt/kvm/x86/mmu.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> index 5cd6cd5e8926..97d695207e11 100644
> --- a/Documentation/virt/kvm/x86/mmu.rst
> +++ b/Documentation/virt/kvm/x86/mmu.rst
> @@ -231,6 +231,11 @@ Shadow pages contain the following information:
> A counter keeping track of how many hardware registers (guest cr3 or
> pdptrs) are now pointing at the page. While this counter is nonzero, the
> page cannot be destroyed. See role.invalid.
> + tdp_mmu_root_count:
> + An atomic reference counter in TDP MMU root page that allows for parallel
> + accesses.

I find the "parallel accesses" simultaneously redundant and confusing. The fact
that's it's an atomic implies that there are concurrent accesses. And need for
an atomic is really just a minor note, i.e. shouldn't be the focus of the
documentation.

On a related topic, the description for "root_count" is stale now that KVM keeps
references to roots.

What if we take this opportunity to unify the documentation?

root_count / tdp_mmu_rount_count:

A reference counter for root shadow pages. vCPUs elevate the refcount when
getting a shadow page that will be used as a root, i.e. will be loaded into
hardware directly (CR3, PDPTRs, nCR3 EPTP). Root pages cannnot be freed
while their refcount is non-zero. The TDP MMU uses an atomic refcount as
vCPUs can acquire references while holding mmu_lock for read. See
role.invalid and Root Pages.

And then add a section specifically for root pages? I think trying to cram
everything important about root pages into the description for their refcount
will be difficult and kludgy. E.g. this doc should also provide an explanation of
previous roots.

Root Pages
==========

Key talking points:

- Definition of a root page
- Lifecycle of roots for both the shadow MMU and TDP MMU
- Previous root tracking, and why only KVM doesn'y track previous roots when
using PAE paging
- The importance of preserving roots that are currently not referenced by any
vCPU, i.e. why TDP MMU roots are initialized with a refcount of '2'
- Why shadow MMU roots don't gift a reference to the MMU itself, i.e. why they
naturally survive their refcount going to zero


> Accessing the page requires lifting the counter value. The
> + initial value is set to 2 indicating one reference from vCPU and one
> + from TDP MMU itself. Note this field is a union with root_count.
> parent_ptes:
> The reverse mapping for the pte/ptes pointing at this page's spt. If
> parent_ptes bit 0 is zero, only one spte points at this page and
> --
> 2.41.0.162.gfafddb0af9-goog
>

2023-06-27 16:45:02

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] KVM: Documentation: Add the missing description for mmu_valid_gen into kvm_mmu_page

On Mon, Jun 26, 2023, Mingwei Zhang wrote:
> Add the description for mmu_valid_gen into kvm_mmu_page description.
> mmu_valid_gen is used in shadow MMU for fast zapping. Update the doc to
> reflect that.
>
> Signed-off-by: Mingwei Zhang <[email protected]>
> Reviewed-by: Kai Huang <[email protected]>
> ---
> Documentation/virt/kvm/x86/mmu.rst | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> index 97d695207e11..cc4bd190c93d 100644
> --- a/Documentation/virt/kvm/x86/mmu.rst
> +++ b/Documentation/virt/kvm/x86/mmu.rst
> @@ -208,6 +208,10 @@ Shadow pages contain the following information:
> The page is not backed by a guest page table, but its first entry
> points to one. This is set if NPT uses 5-level page tables (host
> CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=1).
> + mmu_valid_gen:
> + Used by comparing against kvm->arch.mmu_valid_gen to check whether the

This needs to explain what the generation is, and where it comes from.

The MMU generation of this page, used to effect a "fast" zap of all MMU pages
across all roots. To zap all pages in all roots without blocking vCPUs, e.g.
when deleting a memslot, KVM updates the per-VM valid MMU generation to mark
all existing pages and roots invalid/obsolete. Obsolete pages can't be used,
e.g. vCPUs must load a new, valid root before re-entering the guest.

The MMU generation is only ever '0' or '1', as slots_lock must be held until
all obsolete pages are zapped and freed, i.e. there is exactly one valid
generation and (at most) one invalid generation.

Note, the TDP MMU doesn't use mmu_gen as non-root TDP MMU pages are reachable
only from their owning root, whereas all pages for shadow MMUs are reachable
via the hash map. The TDP MMU uses role.invalid to track obsolete roots.

And then big bonus points if you add

Page Role
=========

to explain the purpose of the role, and how/when it's used in the shadow MMU versus
the TDP MMU. The shadow MMU's use of a hash map is a fundemental aspect that really
should be documented here.

> + shadow page is obsolete thus a convenient variable for fast zapping.
> + Note that TDP MMU does not use mmu_valid_gen.
> gfn:
> Either the guest page table containing the translations shadowed by this
> page, or the base page frame for linear translations. See role.direct.
> --
> 2.41.0.162.gfafddb0af9-goog
>

2023-07-31 18:46:12

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] KVM: Documentation: Update the field name gfns and its description in kvm_mmu_page

On Mon, Jul 31, 2023, Mingwei Zhang wrote:
> On Mon, Jun 26, 2023, Randy Dunlap wrote:
> > Hi--
> >
> > On 6/26/23 11:20, Mingwei Zhang wrote:
> > > Update the field 'gfns' in kvm_mmu_page to 'shadowed_translation' to be
> > > consistent with the code. Also update the corresponding 'gfns' in the
> > > comments. The more detailed description of 'shadowed_translation' is
> > > already inlined in the data structure definition, so no need to duplicate
> > > the text but simply just update the names.
> > >
> > > Signed-off-by: Mingwei Zhang <[email protected]>
> > > Reviewed-by: Kai Huang <[email protected]>
> > > ---
> > > Documentation/virt/kvm/x86/mmu.rst | 9 +++++----
> > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> > > index 561efa8ec7d7..4c9044b4dc6c 100644
> > > --- a/Documentation/virt/kvm/x86/mmu.rst
> > > +++ b/Documentation/virt/kvm/x86/mmu.rst
> > > @@ -221,11 +221,12 @@ Shadow pages contain the following information:
> > > at __pa(sp2->spt). sp2 will point back at sp1 through parent_pte.
> > > The spt array forms a DAG structure with the shadow page as a node, and
> > > guest pages as leaves.
> > > - gfns:
> > > - An array of 512 guest frame numbers, one for each present pte. Used to
> > > - perform a reverse map from a pte to a gfn. When role.direct is set, any
> > > + shadowed_translation:
> > > + An array of 512 shadow translation entries, one for each present pte. Used
> > > + to perform a reverse map from a pte to a gfn. When role.direct is set, any
> > > element of this array can be calculated from the gfn field when used, in
> > > - this case, the array of gfns is not allocated. See role.direct and gfn.
> > > + this case, the array of shadowed_translation is not allocated. See
> >
> > I cannot parse the before version nor the after version of this sentence (new version):
> >
> > When role.direct is set, any
> > element of this array can be calculated from the gfn field when used, in
> > this case, the array of shadowed_translation is not allocated.
> >
> >
>
> Sorry for the late reply. Why is it not parsed? It just means that when
> role.direct is set, do not use gfns. The gfn can be calculated from the
> base address + offset. The base address here is the 'gfn' field in
> kvm_mmu_page.

It's a bit of a run-on sentence with confusing pronoun usage. How about this?

When role.direct is set, the shadow_translation array is not allocated as the
per-SPTE gfn is simply an offset from the base gfn, and KVM doesn't track
access permissions for direct shadow pages.

2023-07-31 19:09:04

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: Documentation: Add the missing description for tdp_mmu_root_count into kvm_mmu_page

On Tue, Jun 27, 2023, Sean Christopherson wrote:
> On Mon, Jun 26, 2023, Mingwei Zhang wrote:
> > Add the description of tdp_mmu_root_count into kvm_mmu_page description.
> > tdp_mmu_root_count is an atomic counter used only in TDP MMU. Its usage and
> > meaning is slightly different with root_counter in shadow MMU. Update the
> > doc.
> >
> > Signed-off-by: Mingwei Zhang <[email protected]>
> > ---
> > Documentation/virt/kvm/x86/mmu.rst | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> > index 5cd6cd5e8926..97d695207e11 100644
> > --- a/Documentation/virt/kvm/x86/mmu.rst
> > +++ b/Documentation/virt/kvm/x86/mmu.rst
> > @@ -231,6 +231,11 @@ Shadow pages contain the following information:
> > A counter keeping track of how many hardware registers (guest cr3 or
> > pdptrs) are now pointing at the page. While this counter is nonzero, the
> > page cannot be destroyed. See role.invalid.
> > + tdp_mmu_root_count:
> > + An atomic reference counter in TDP MMU root page that allows for parallel
> > + accesses.
>
> I find the "parallel accesses" simultaneously redundant and confusing. The fact
> that's it's an atomic implies that there are concurrent accesses. And need for
> an atomic is really just a minor note, i.e. shouldn't be the focus of the
> documentation.
>
> On a related topic, the description for "root_count" is stale now that KVM keeps
> references to roots.
>
> What if we take this opportunity to unify the documentation?
>
> root_count / tdp_mmu_rount_count:
>
> A reference counter for root shadow pages. vCPUs elevate the refcount when
> getting a shadow page that will be used as a root, i.e. will be loaded into
> hardware directly (CR3, PDPTRs, nCR3 EPTP). Root pages cannnot be freed
> while their refcount is non-zero. The TDP MMU uses an atomic refcount as
> vCPUs can acquire references while holding mmu_lock for read. See
> role.invalid and Root Pages.

Ok, I think this one is reasonable and this clarifies that this field
only works for root pages.
>
> And then add a section specifically for root pages? I think trying to cram
> everything important about root pages into the description for their refcount
> will be difficult and kludgy. E.g. this doc should also provide an explanation of
> previous roots.
>
> Root Pages
> ==========
>
> Key talking points:
>
> - Definition of a root page
> - Lifecycle of roots for both the shadow MMU and TDP MMU
> - Previous root tracking, and why only KVM doesn'y track previous roots when
> using PAE paging
> - The importance of preserving roots that are currently not referenced by any
> vCPU, i.e. why TDP MMU roots are initialized with a refcount of '2'
> - Why shadow MMU roots don't gift a reference to the MMU itself, i.e. why they
> naturally survive their refcount going to zero
>

I am not sure if I can add the whole section in this commit. Maybe
I can push it back separately into a different series. For root_count, a
brief introduction of root pages should be good enough, which is explain
in your suggestion: page that "will be loaded into hardware directly
(CR3, PDPTRs, nCR3 EPTP)".
>
> > Accessing the page requires lifting the counter value. The
> > + initial value is set to 2 indicating one reference from vCPU and one
> > + from TDP MMU itself. Note this field is a union with root_count.
> > parent_ptes:
> > The reverse mapping for the pte/ptes pointing at this page's spt. If
> > parent_ptes bit 0 is zero, only one spte points at this page and
> > --
> > 2.41.0.162.gfafddb0af9-goog
> >

2023-07-31 19:46:41

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] KVM: Documentation: Update the field name gfns and its description in kvm_mmu_page

On Mon, Jun 26, 2023, Randy Dunlap wrote:
> Hi--
>
> On 6/26/23 11:20, Mingwei Zhang wrote:
> > Update the field 'gfns' in kvm_mmu_page to 'shadowed_translation' to be
> > consistent with the code. Also update the corresponding 'gfns' in the
> > comments. The more detailed description of 'shadowed_translation' is
> > already inlined in the data structure definition, so no need to duplicate
> > the text but simply just update the names.
> >
> > Signed-off-by: Mingwei Zhang <[email protected]>
> > Reviewed-by: Kai Huang <[email protected]>
> > ---
> > Documentation/virt/kvm/x86/mmu.rst | 9 +++++----
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> > index 561efa8ec7d7..4c9044b4dc6c 100644
> > --- a/Documentation/virt/kvm/x86/mmu.rst
> > +++ b/Documentation/virt/kvm/x86/mmu.rst
> > @@ -221,11 +221,12 @@ Shadow pages contain the following information:
> > at __pa(sp2->spt). sp2 will point back at sp1 through parent_pte.
> > The spt array forms a DAG structure with the shadow page as a node, and
> > guest pages as leaves.
> > - gfns:
> > - An array of 512 guest frame numbers, one for each present pte. Used to
> > - perform a reverse map from a pte to a gfn. When role.direct is set, any
> > + shadowed_translation:
> > + An array of 512 shadow translation entries, one for each present pte. Used
> > + to perform a reverse map from a pte to a gfn. When role.direct is set, any
> > element of this array can be calculated from the gfn field when used, in
> > - this case, the array of gfns is not allocated. See role.direct and gfn.
> > + this case, the array of shadowed_translation is not allocated. See
>
> I cannot parse the before version nor the after version of this sentence (new version):
>
> When role.direct is set, any
> element of this array can be calculated from the gfn field when used, in
> this case, the array of shadowed_translation is not allocated.
>
>

Sorry for the late reply. Why is it not parsed? It just means that when
role.direct is set, do not use gfns. The gfn can be calculated from the
base address + offset. The base address here is the 'gfn' field in
kvm_mmu_page.

> > + role.direct and gfn.
> > root_count:
> > A counter keeping track of how many hardware registers (guest cr3 or
> > pdptrs) are now pointing at the page. While this counter is nonzero, the
>
> --
> ~Randy

2023-07-31 20:33:58

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] KVM: Documentation: Update the field name gfns and its description in kvm_mmu_page

On Mon, Jul 31, 2023, Sean Christopherson wrote:
> On Mon, Jul 31, 2023, Mingwei Zhang wrote:
> > On Mon, Jun 26, 2023, Randy Dunlap wrote:
> > > Hi--
> > >
> > > On 6/26/23 11:20, Mingwei Zhang wrote:
> > > > Update the field 'gfns' in kvm_mmu_page to 'shadowed_translation' to be
> > > > consistent with the code. Also update the corresponding 'gfns' in the
> > > > comments. The more detailed description of 'shadowed_translation' is
> > > > already inlined in the data structure definition, so no need to duplicate
> > > > the text but simply just update the names.
> > > >
> > > > Signed-off-by: Mingwei Zhang <[email protected]>
> > > > Reviewed-by: Kai Huang <[email protected]>
> > > > ---
> > > > Documentation/virt/kvm/x86/mmu.rst | 9 +++++----
> > > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> > > > index 561efa8ec7d7..4c9044b4dc6c 100644
> > > > --- a/Documentation/virt/kvm/x86/mmu.rst
> > > > +++ b/Documentation/virt/kvm/x86/mmu.rst
> > > > @@ -221,11 +221,12 @@ Shadow pages contain the following information:
> > > > at __pa(sp2->spt). sp2 will point back at sp1 through parent_pte.
> > > > The spt array forms a DAG structure with the shadow page as a node, and
> > > > guest pages as leaves.
> > > > - gfns:
> > > > - An array of 512 guest frame numbers, one for each present pte. Used to
> > > > - perform a reverse map from a pte to a gfn. When role.direct is set, any
> > > > + shadowed_translation:
> > > > + An array of 512 shadow translation entries, one for each present pte. Used
> > > > + to perform a reverse map from a pte to a gfn. When role.direct is set, any
> > > > element of this array can be calculated from the gfn field when used, in
> > > > - this case, the array of gfns is not allocated. See role.direct and gfn.
> > > > + this case, the array of shadowed_translation is not allocated. See
> > >
> > > I cannot parse the before version nor the after version of this sentence (new version):
> > >
> > > When role.direct is set, any
> > > element of this array can be calculated from the gfn field when used, in
> > > this case, the array of shadowed_translation is not allocated.
> > >
> > >
> >
> > Sorry for the late reply. Why is it not parsed? It just means that when
> > role.direct is set, do not use gfns. The gfn can be calculated from the
> > base address + offset. The base address here is the 'gfn' field in
> > kvm_mmu_page.
>
> It's a bit of a run-on sentence with confusing pronoun usage. How about this?
>
> When role.direct is set, the shadow_translation array is not allocated as the
> per-SPTE gfn is simply an offset from the base gfn, and KVM doesn't track
> access permissions for direct shadow pages.

I think the problem might be that the sentence is slightly long. To be
accurate, we have to mention access permission which the original text
did not. Also, I split the sentences and try only using short ones. The
overall description will be longer. How about this?

shadowed_translation:
An array of 512 shadow translation entries, one for each present pte. Used
to perform a reverse map from a pte to a gfn as well as its access
permission. When role.direct is set, the shadow_translation array is not
allocated. This is because the gfn contained in any element of this array
can be calculated from the gfn field when used. In addition, when
role.direct is set, KVM does not track access permission for each of the
gfn. See role.direct and gfn.

2023-07-31 20:53:12

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] KVM: Documentation: Add the missing description for ptep in kvm_mmu_page

On Tue, Jun 27, 2023, Sean Christopherson wrote:
> On Mon, Jun 26, 2023, Mingwei Zhang wrote:
> > Add the missing description for ptep in kvm_mmu_page description. ptep is
> > used when TDP MMU is enabled and it shares the storage with parent_ptes.
> > Update the doc to help readers to get up-to-date info.
> >
> > Signed-off-by: Mingwei Zhang <[email protected]>
> > ---
> > Documentation/virt/kvm/x86/mmu.rst | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> > index 4c9044b4dc6c..5cd6cd5e8926 100644
> > --- a/Documentation/virt/kvm/x86/mmu.rst
> > +++ b/Documentation/virt/kvm/x86/mmu.rst
> > @@ -237,6 +237,11 @@ Shadow pages contain the following information:
> > parent_ptes points at this single spte, otherwise, there exists multiple
> > sptes pointing at this page and (parent_ptes & ~0x1) points at a data
> > structure with a list of parent sptes.
> > + ptep:
> > + The reverse mapping for the pte pointing at this page's spt. This field is
>
> I don't think describing "reverse mapping" is necessary, and it's arguably even
> misleading. A "reverse mapping" typically provides a way to find mappings given
> a (guest) physical address. The TDP MMU doesn't bother with reverse mappings
> because there is exactly one possible mapping for any given gfn. The "ptep" exists
> specifically to expedite zapping a single TDP MMU shadow page, i.e. allows zapping
> without having to traverse the paging tree.
>
> The ptep field is just a pointer at the SPTE, no more no less. Something like
> this?
>
> ptep:
> The kernel virtual address of the SPTE that points at this shadow page.
> Used exclusively by the TDP MMU, this field is a union with parent_ptes.
>

Sure, I can this version instead. Technically, it is still a reverse
mapping, but yeah, I agree that introducing this one is confusing.

> > + used in replace of parent_ptes when TDP MMU is used. In TDP MMU, each
> > + non-root shadow page will have one parent, while each root shadow page has
> > + no parent. Note that this field is a union with parent_ptes.
> > unsync:
> > If true, then the translations in this page may not match the guest's
> > translation. This is equivalent to the state of the tlb when a pte is
> > --
> > 2.41.0.162.gfafddb0af9-goog
> >

2023-07-31 23:18:50

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] KVM: Documentation: Add the missing description for mmu_valid_gen into kvm_mmu_page

On Tue, Jun 27, 2023, Sean Christopherson wrote:
> On Mon, Jun 26, 2023, Mingwei Zhang wrote:
> > Add the description for mmu_valid_gen into kvm_mmu_page description.
> > mmu_valid_gen is used in shadow MMU for fast zapping. Update the doc to
> > reflect that.
> >
> > Signed-off-by: Mingwei Zhang <[email protected]>
> > Reviewed-by: Kai Huang <[email protected]>
> > ---
> > Documentation/virt/kvm/x86/mmu.rst | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
> > index 97d695207e11..cc4bd190c93d 100644
> > --- a/Documentation/virt/kvm/x86/mmu.rst
> > +++ b/Documentation/virt/kvm/x86/mmu.rst
> > @@ -208,6 +208,10 @@ Shadow pages contain the following information:
> > The page is not backed by a guest page table, but its first entry
> > points to one. This is set if NPT uses 5-level page tables (host
> > CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=1).
> > + mmu_valid_gen:
> > + Used by comparing against kvm->arch.mmu_valid_gen to check whether the
>
> This needs to explain what the generation is, and where it comes from.
>
> The MMU generation of this page, used to effect a "fast" zap of all MMU pages
> across all roots. To zap all pages in all roots without blocking vCPUs, e.g.
> when deleting a memslot, KVM updates the per-VM valid MMU generation to mark
> all existing pages and roots invalid/obsolete. Obsolete pages can't be used,
> e.g. vCPUs must load a new, valid root before re-entering the guest.
>
> The MMU generation is only ever '0' or '1', as slots_lock must be held until
> all obsolete pages are zapped and freed, i.e. there is exactly one valid
> generation and (at most) one invalid generation.
>
> Note, the TDP MMU doesn't use mmu_gen as non-root TDP MMU pages are reachable
> only from their owning root, whereas all pages for shadow MMUs are reachable
> via the hash map. The TDP MMU uses role.invalid to track obsolete roots.

Sean, thanks for the detailed explanation. I will pick the most of the
content and get into the next version.
>
> And then big bonus points if you add
>
> Page Role
> =========
>
> to explain the purpose of the role, and how/when it's used in the shadow MMU versus
> the TDP MMU. The shadow MMU's use of a hash map is a fundemental aspect that really
> should be documented here.
>
> > + shadow page is obsolete thus a convenient variable for fast zapping.
> > + Note that TDP MMU does not use mmu_valid_gen.
> > gfn:
> > Either the guest page table containing the translations shadowed by this
> > page, or the base page frame for linear translations. See role.direct.
> > --
> > 2.41.0.162.gfafddb0af9-goog
> >