2022-05-02 23:22:09

by Yosry Ahmed

[permalink] [raw]
Subject: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
KVM mmu. This provides more insights on the kernel memory used
by a workload.

This stat will be used by subsequent patches to count KVM mmu
memory usage.

Signed-off-by: Yosry Ahmed <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 5 +++++
Documentation/filesystems/proc.rst | 4 ++++
drivers/base/node.c | 2 ++
fs/proc/meminfo.c | 2 ++
include/linux/mmzone.h | 1 +
mm/memcontrol.c | 1 +
mm/page_alloc.c | 6 +++++-
mm/vmstat.c | 1 +
8 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 69d7a6983f78..828cb6b6f918 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
pagetables
Amount of memory allocated for page tables.

+ secondary_pagetables
+ Amount of memory allocated for secondary page tables,
+ this currently includes KVM mmu allocations on x86
+ and arm64.
+
percpu (npn)
Amount of memory used for storing per-cpu kernel
data structures.
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 061744c436d9..894d6317f3bd 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -973,6 +973,7 @@ You may not have all of these fields.
SReclaimable: 159856 kB
SUnreclaim: 124508 kB
PageTables: 24448 kB
+ SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
@@ -1067,6 +1068,9 @@ SUnreclaim
PageTables
amount of memory dedicated to the lowest level of page
tables.
+SecPageTables
+ amount of memory dedicated to secondary page tables, this
+ currently includes KVM mmu allocations on x86 and arm64.
NFS_Unstable
Always zero. Previous counted pages which had been written to
the server, but has not been committed to stable storage.
diff --git a/drivers/base/node.c b/drivers/base/node.c
index ec8bb24a5a22..9fe716832546 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -433,6 +433,7 @@ static ssize_t node_read_meminfo(struct device *dev,
"Node %d ShadowCallStack:%8lu kB\n"
#endif
"Node %d PageTables: %8lu kB\n"
+ "Node %d SecPageTables: %8lu kB\n"
"Node %d NFS_Unstable: %8lu kB\n"
"Node %d Bounce: %8lu kB\n"
"Node %d WritebackTmp: %8lu kB\n"
@@ -459,6 +460,7 @@ static ssize_t node_read_meminfo(struct device *dev,
nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
#endif
nid, K(node_page_state(pgdat, NR_PAGETABLE)),
+ nid, K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
nid, 0UL,
nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..fad29024eb2e 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -108,6 +108,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
#endif
show_val_kb(m, "PageTables: ",
global_node_page_state(NR_PAGETABLE));
+ show_val_kb(m, "SecPageTables: ",
+ global_node_page_state(NR_SECONDARY_PAGETABLE));

show_val_kb(m, "NFS_Unstable: ", 0);
show_val_kb(m, "Bounce: ",
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 962b14d403e8..35f57f2578c0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -219,6 +219,7 @@ enum node_stat_item {
NR_KERNEL_SCS_KB, /* measured in KiB */
#endif
NR_PAGETABLE, /* used for pagetables */
+ NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. kvm shadow pagetables */
#ifdef CONFIG_SWAP
NR_SWAPCACHE,
#endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 725f76723220..89fbd1793960 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,7 @@ static const struct memory_stat memory_stats[] = {
{ "kernel", MEMCG_KMEM },
{ "kernel_stack", NR_KERNEL_STACK_KB },
{ "pagetables", NR_PAGETABLE },
+ { "secondary_pagetables", NR_SECONDARY_PAGETABLE },
{ "percpu", MEMCG_PERCPU_B },
{ "sock", MEMCG_SOCK },
{ "vmalloc", MEMCG_VMALLOC },
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2db95780e003..96d00ae9d5c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5932,7 +5932,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
" unevictable:%lu dirty:%lu writeback:%lu\n"
" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
- " mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
+ " mapped:%lu shmem:%lu pagetables:%lu\n"
+ " secondary_pagetables:%lu bounce:%lu\n"
" kernel_misc_reclaimable:%lu\n"
" free:%lu free_pcp:%lu free_cma:%lu\n",
global_node_page_state(NR_ACTIVE_ANON),
@@ -5949,6 +5950,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
global_node_page_state(NR_FILE_MAPPED),
global_node_page_state(NR_SHMEM),
global_node_page_state(NR_PAGETABLE),
+ global_node_page_state(NR_SECONDARY_PAGETABLE),
global_zone_page_state(NR_BOUNCE),
global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
global_zone_page_state(NR_FREE_PAGES),
@@ -5982,6 +5984,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
" shadow_call_stack:%lukB"
#endif
" pagetables:%lukB"
+ " secondary_pagetables:%lukB"
" all_unreclaimable? %s"
"\n",
pgdat->node_id,
@@ -6007,6 +6010,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
node_page_state(pgdat, NR_KERNEL_SCS_KB),
#endif
K(node_page_state(pgdat, NR_PAGETABLE)),
+ K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
"yes" : "no");
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b75b1a64b54c..50bbec73809b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1240,6 +1240,7 @@ const char * const vmstat_text[] = {
"nr_shadow_call_stack",
#endif
"nr_page_table_pages",
+ "nr_secondary_page_table_pages",
#ifdef CONFIG_SWAP
"nr_swapcached",
#endif
--
2.36.0.464.gb9c8b46e94-goog


2022-05-03 00:16:39

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Fri, 29 Apr 2022 21:11:28 +0100,
Yosry Ahmed <[email protected]> wrote:
>
> Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> KVM mmu. This provides more insights on the kernel memory used
> by a workload.
>
> This stat will be used by subsequent patches to count KVM mmu
> memory usage.
>
> Signed-off-by: Yosry Ahmed <[email protected]>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 5 +++++
> Documentation/filesystems/proc.rst | 4 ++++
> drivers/base/node.c | 2 ++
> fs/proc/meminfo.c | 2 ++
> include/linux/mmzone.h | 1 +
> mm/memcontrol.c | 1 +
> mm/page_alloc.c | 6 +++++-
> mm/vmstat.c | 1 +
> 8 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 69d7a6983f78..828cb6b6f918 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
> pagetables
> Amount of memory allocated for page tables.
>
> + secondary_pagetables
> + Amount of memory allocated for secondary page tables,
> + this currently includes KVM mmu allocations on x86
> + and arm64.

Can you please explain what the rationale is for this? We already
account for the (arm64) S2 PTs as a userspace allocation (see
115bae923ac8bb29ee635). You are saying that this is related to a
'workload', but given that the accounting is global, I fail to see how
you can attribute these allocations on a particular VM.

What do you plan to do for IOMMU page tables? After all, they serve
the exact same purpose, and I'd expect these to be handled the same
way (i.e. why is this KVM specific?).

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2022-05-03 00:36:56

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <[email protected]> wrote:
>
> On Fri, 29 Apr 2022 21:11:28 +0100,
> Yosry Ahmed <[email protected]> wrote:
> >
> > Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> > KVM mmu. This provides more insights on the kernel memory used
> > by a workload.
> >
> > This stat will be used by subsequent patches to count KVM mmu
> > memory usage.
> >
> > Signed-off-by: Yosry Ahmed <[email protected]>
> > ---
> > Documentation/admin-guide/cgroup-v2.rst | 5 +++++
> > Documentation/filesystems/proc.rst | 4 ++++
> > drivers/base/node.c | 2 ++
> > fs/proc/meminfo.c | 2 ++
> > include/linux/mmzone.h | 1 +
> > mm/memcontrol.c | 1 +
> > mm/page_alloc.c | 6 +++++-
> > mm/vmstat.c | 1 +
> > 8 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 69d7a6983f78..828cb6b6f918 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
> > pagetables
> > Amount of memory allocated for page tables.
> >
> > + secondary_pagetables
> > + Amount of memory allocated for secondary page tables,
> > + this currently includes KVM mmu allocations on x86
> > + and arm64.
>
> Can you please explain what the rationale is for this? We already
> account for the (arm64) S2 PTs as a userspace allocation (see

This can be considered as continuation for that work. The mentioned
commit accounts S2 PTs to the VM process cgroup kernel memory. We have
stats for the total kernel memory, and some fine-grained categories of
that, like (pagetables, stack, slab, etc.).

This patch just adds another category to give further insights into
what exactly is using kernel memory.

> 115bae923ac8bb29ee635). You are saying that this is related to a
> 'workload', but given that the accounting is global, I fail to see how
> you can attribute these allocations on a particular VM.

The main motivation is having the memcg stats, which give attribution
to workloads. If you think it's more appropriate, we can add it as a
memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
per-memcg vmalloc stat")). The only reason I made this as a global
stat too is to be consistent with NR_PAGETABLE.

>
> What do you plan to do for IOMMU page tables? After all, they serve
> the exact same purpose, and I'd expect these to be handled the same
> way (i.e. why is this KVM specific?).

The reason this was named NR_SECONDARY_PAGTABLE instead of
NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
account other types of secondary page tables to this stat. It is just
that we are currently interested in the KVM MMU usage.

>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

2022-05-09 16:45:05

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Mon, May 2, 2022 at 11:46 AM Yosry Ahmed <[email protected]> wrote:
>
> On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <[email protected]> wrote:
> >
> > On Fri, 29 Apr 2022 21:11:28 +0100,
> > Yosry Ahmed <[email protected]> wrote:
> > >
> > > Add NR_SECONDARY_PAGETABLE stat to count secondary page table uses, e.g.
> > > KVM mmu. This provides more insights on the kernel memory used
> > > by a workload.
> > >
> > > This stat will be used by subsequent patches to count KVM mmu
> > > memory usage.
> > >
> > > Signed-off-by: Yosry Ahmed <[email protected]>
> > > ---
> > > Documentation/admin-guide/cgroup-v2.rst | 5 +++++
> > > Documentation/filesystems/proc.rst | 4 ++++
> > > drivers/base/node.c | 2 ++
> > > fs/proc/meminfo.c | 2 ++
> > > include/linux/mmzone.h | 1 +
> > > mm/memcontrol.c | 1 +
> > > mm/page_alloc.c | 6 +++++-
> > > mm/vmstat.c | 1 +
> > > 8 files changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 69d7a6983f78..828cb6b6f918 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1312,6 +1312,11 @@ PAGE_SIZE multiple when read back.
> > > pagetables
> > > Amount of memory allocated for page tables.
> > >
> > > + secondary_pagetables
> > > + Amount of memory allocated for secondary page tables,
> > > + this currently includes KVM mmu allocations on x86
> > > + and arm64.
> >
> > Can you please explain what the rationale is for this? We already
> > account for the (arm64) S2 PTs as a userspace allocation (see
>
> This can be considered as continuation for that work. The mentioned
> commit accounts S2 PTs to the VM process cgroup kernel memory. We have
> stats for the total kernel memory, and some fine-grained categories of
> that, like (pagetables, stack, slab, etc.).
>
> This patch just adds another category to give further insights into
> what exactly is using kernel memory.
>
> > 115bae923ac8bb29ee635). You are saying that this is related to a
> > 'workload', but given that the accounting is global, I fail to see how
> > you can attribute these allocations on a particular VM.
>
> The main motivation is having the memcg stats, which give attribution
> to workloads. If you think it's more appropriate, we can add it as a
> memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
> per-memcg vmalloc stat")). The only reason I made this as a global
> stat too is to be consistent with NR_PAGETABLE.
>
> >
> > What do you plan to do for IOMMU page tables? After all, they serve
> > the exact same purpose, and I'd expect these to be handled the same
> > way (i.e. why is this KVM specific?).
>
> The reason this was named NR_SECONDARY_PAGTABLE instead of
> NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> account other types of secondary page tables to this stat. It is just
> that we are currently interested in the KVM MMU usage.
>


Any thoughts on this? Do you think MEMCG_SECONDARY_PAGETABLE would be
more appropriate here?

2022-05-13 04:43:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Thu, May 12, 2022, Johannes Weiner wrote:
> Hey Yosry,
>
> On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <[email protected]> wrote:
> > > 115bae923ac8bb29ee635). You are saying that this is related to a
> > > 'workload', but given that the accounting is global, I fail to see how
> > > you can attribute these allocations on a particular VM.
> >
> > The main motivation is having the memcg stats, which give attribution
> > to workloads. If you think it's more appropriate, we can add it as a
> > memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
> > per-memcg vmalloc stat")). The only reason I made this as a global
> > stat too is to be consistent with NR_PAGETABLE.
>
> Please no memcg-specific stats if a regular vmstat item is possible
> and useful at the system level as well, like in this case. It's extra
> memcg code, extra callbacks, and it doesn't have NUMA node awareness.
>
> > > What do you plan to do for IOMMU page tables? After all, they serve
> > > the exact same purpose, and I'd expect these to be handled the same
> > > way (i.e. why is this KVM specific?).
> >
> > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > account other types of secondary page tables to this stat. It is just
> > that we are currently interested in the KVM MMU usage.
>
> Do you actually care at the supervisor level that this memory is used
> for guest page tables?

Hmm, yes? KVM does have a decent number of large-ish allocations that aren't
for page tables, but except for page tables, the number/size of those allocations
scales linearly with either the number of vCPUs or the amount of memory assigned
to the VM (with no room for improvement barring KVM changes).

Off the top of my head, KVM's secondary page tables are the only allocations that
don't scale linearly, especially when nested virtualization is in use.

> It seems to me you primarily care that it is reported *somewhere*
> (hence the piggybacking off of NR_PAGETABLE at first). And whether
> it's page tables or iommu tables or whatever else allocated for the
> purpose of virtualization, it doesn't make much of a difference to the
> host/cgroup that is tracking it, right?
>
> (The proximity to nr_pagetable could also be confusing. A high page
> table count can be a hint to userspace to enable THP. It seems
> actionable in a different way than a high number of kvm page tables or
> iommu page tables.)

I don't know about iommu page tables, but on the KVM side a high count can also
be a good signal that enabling THP would be beneficial. It's definitely actionable
in a different way though too.

> How about NR_VIRT? It's shorter, seems descriptive enough, less room
> for confusion, and is more easily extensible in the future.

I don't like NR_VIRT because VFIO/iommu can be used for non-virtualization things,
and we'd be lying by omission unless KVM (and other users) updates all of its
large-ish allocations to account them correctly.

2022-05-14 00:44:28

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

Thanks everyone for participating in this discussion and looking into this.

On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <[email protected]> wrote:
>
> On Fri, May 13, 2022, Johannes Weiner wrote:
> > On Thu, May 12, 2022 at 11:29:38PM +0000, Sean Christopherson wrote:
> > > On Thu, May 12, 2022, Johannes Weiner wrote:
> > > > On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > > > > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <[email protected]> wrote:
> > > > > > What do you plan to do for IOMMU page tables? After all, they serve
> > > > > > the exact same purpose, and I'd expect these to be handled the same
> > > > > > way (i.e. why is this KVM specific?).
> > > > >
> > > > > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > > > > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > > > > account other types of secondary page tables to this stat. It is just
> > > > > that we are currently interested in the KVM MMU usage.
> > > >
> > > > Do you actually care at the supervisor level that this memory is used
> > > > for guest page tables?
> > >
> > > Hmm, yes? KVM does have a decent number of large-ish allocations that aren't
> > > for page tables, but except for page tables, the number/size of those allocations
> > > scales linearly with either the number of vCPUs or the amount of memory assigned
> > > to the VM (with no room for improvement barring KVM changes).
> > >
> > > Off the top of my head, KVM's secondary page tables are the only allocations that
> > > don't scale linearly, especially when nested virtualization is in use.
> >
> > Thanks, that's useful information.
> >
> > Are these other allocations accounted somewhere? If not, are they
> > potential containment holes that will need fixing eventually?
>
> All allocations that are tied to specific VM/vCPU are tagged GFP_KERNEL_ACCOUNT,
> so we should be good on that front.
>
> > > > It seems to me you primarily care that it is reported *somewhere*
> > > > (hence the piggybacking off of NR_PAGETABLE at first). And whether
> > > > it's page tables or iommu tables or whatever else allocated for the
> > > > purpose of virtualization, it doesn't make much of a difference to the
> > > > host/cgroup that is tracking it, right?
> > > >
> > > > (The proximity to nr_pagetable could also be confusing. A high page
> > > > table count can be a hint to userspace to enable THP. It seems
> > > > actionable in a different way than a high number of kvm page tables or
> > > > iommu page tables.)
> > >
> > > I don't know about iommu page tables, but on the KVM side a high count can also
> > > be a good signal that enabling THP would be beneficial.
> >
> > Well, maybe.
> >
> > It might help, but ultimately it's the process that's in control in
> > all cases: it's unmovable kernel memory allocated to manage virtual
> > address space inside the task.
> >
> > So I'm still a bit at a loss whether these things should all be lumped
> > in together or kept separately. meminfo and memory.stat are permanent
> > ABI, so we should try to establish in advance whether the new itme is
> > really a first-class consumer or part of something bigger.
> >
> > The patch initially piggybacked on NR_PAGETABLE. I found an email of
> > you asking why it couldn't be a separate item, but it didn't provide a
> > reasoning for that decision. Could you share your thoughts on that?
>
> It was mostly an honest question, I too am trying to understand what userspace
> wants to do with this information. I was/am also trying to understand the benefits
> of doing the tracking through page_state and not a dedicated KVM stat. E.g. KVM
> already has specific stats for the number of leaf pages mapped into a VM, why not
> do the same for non-leaf pages?

Let me cast some light on this. The reason this started being
piggybacked on NR_PAGETABLE is that we had a remnant of an old
internal implementation of NR_PAGETABLE before it was introduced
upstream, that accounted KVM secondary page tables as normal page
tables. This made me think this behavior was preferable. Personally, I
wanted to make it a separate thing since the beginning. When I found
opinions here that also suggested a separate stat I went ahead for
that.

As for where to put this information, it does not have to be
NR_SECONDARY_PAGETABLE. Ultimately, people working on KVM are the ones
that will interpret and act upon this data, so if you have somewhere
else in mind please let me know, Sean.

2022-05-14 01:37:17

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Thu, May 12, 2022 at 11:29:38PM +0000, Sean Christopherson wrote:
> On Thu, May 12, 2022, Johannes Weiner wrote:
> > On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <[email protected]> wrote:
> > > > What do you plan to do for IOMMU page tables? After all, they serve
> > > > the exact same purpose, and I'd expect these to be handled the same
> > > > way (i.e. why is this KVM specific?).
> > >
> > > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > > account other types of secondary page tables to this stat. It is just
> > > that we are currently interested in the KVM MMU usage.
> >
> > Do you actually care at the supervisor level that this memory is used
> > for guest page tables?
>
> Hmm, yes? KVM does have a decent number of large-ish allocations that aren't
> for page tables, but except for page tables, the number/size of those allocations
> scales linearly with either the number of vCPUs or the amount of memory assigned
> to the VM (with no room for improvement barring KVM changes).
>
> Off the top of my head, KVM's secondary page tables are the only allocations that
> don't scale linearly, especially when nested virtualization is in use.

Thanks, that's useful information.

Are these other allocations accounted somewhere? If not, are they
potential containment holes that will need fixing eventually?

> > It seems to me you primarily care that it is reported *somewhere*
> > (hence the piggybacking off of NR_PAGETABLE at first). And whether
> > it's page tables or iommu tables or whatever else allocated for the
> > purpose of virtualization, it doesn't make much of a difference to the
> > host/cgroup that is tracking it, right?
> >
> > (The proximity to nr_pagetable could also be confusing. A high page
> > table count can be a hint to userspace to enable THP. It seems
> > actionable in a different way than a high number of kvm page tables or
> > iommu page tables.)
>
> I don't know about iommu page tables, but on the KVM side a high count can also
> be a good signal that enabling THP would be beneficial.

Well, maybe.

It might help, but ultimately it's the process that's in control in
all cases: it's unmovable kernel memory allocated to manage virtual
address space inside the task.

So I'm still a bit at a loss whether these things should all be lumped
in together or kept separately. meminfo and memory.stat are permanent
ABI, so we should try to establish in advance whether the new itme is
really a first-class consumer or part of something bigger.

The patch initially piggybacked on NR_PAGETABLE. I found an email of
you asking why it couldn't be a separate item, but it didn't provide a
reasoning for that decision. Could you share your thoughts on that?

Thanks

2022-05-14 02:04:42

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

Hey Yosry,

On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <[email protected]> wrote:
> > 115bae923ac8bb29ee635). You are saying that this is related to a
> > 'workload', but given that the accounting is global, I fail to see how
> > you can attribute these allocations on a particular VM.
>
> The main motivation is having the memcg stats, which give attribution
> to workloads. If you think it's more appropriate, we can add it as a
> memcg-only stat, like MEMCG_VMALLOC (see 4e5aa1f4c2b4 ("memcg: add
> per-memcg vmalloc stat")). The only reason I made this as a global
> stat too is to be consistent with NR_PAGETABLE.

Please no memcg-specific stats if a regular vmstat item is possible
and useful at the system level as well, like in this case. It's extra
memcg code, extra callbacks, and it doesn't have NUMA node awareness.

> > What do you plan to do for IOMMU page tables? After all, they serve
> > the exact same purpose, and I'd expect these to be handled the same
> > way (i.e. why is this KVM specific?).
>
> The reason this was named NR_SECONDARY_PAGTABLE instead of
> NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> account other types of secondary page tables to this stat. It is just
> that we are currently interested in the KVM MMU usage.

Do you actually care at the supervisor level that this memory is used
for guest page tables?

It seems to me you primarily care that it is reported *somewhere*
(hence the piggybacking off of NR_PAGETABLE at first). And whether
it's page tables or iommu tables or whatever else allocated for the
purpose of virtualization, it doesn't make much of a difference to the
host/cgroup that is tracking it, right?

(The proximity to nr_pagetable could also be confusing. A high page
table count can be a hint to userspace to enable THP. It seems
actionable in a different way than a high number of kvm page tables or
iommu page tables.)

How about NR_VIRT? It's shorter, seems descriptive enough, less room
for confusion, and is more easily extensible in the future.

2022-05-14 02:15:52

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Mon, May 9, 2022 at 9:38 AM Yosry Ahmed <[email protected]> wrote:
>
[...]
> > >
> > > What do you plan to do for IOMMU page tables? After all, they serve
> > > the exact same purpose, and I'd expect these to be handled the same
> > > way (i.e. why is this KVM specific?).
> >
> > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > account other types of secondary page tables to this stat. It is just
> > that we are currently interested in the KVM MMU usage.
> >
>
>
> Any thoughts on this? Do you think MEMCG_SECONDARY_PAGETABLE would be
> more appropriate here?

I think NR_SECONDARY_PAGTABLE is good. Later it can include pagetables
from other subsystems. The only nit (which you can ignore) I have is
the very long memcg stat and vmstat names. Other than that the patch
looks good to me.

2022-05-14 03:53:26

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <[email protected]> wrote:
>
[...]
>
> It was mostly an honest question, I too am trying to understand what userspace
> wants to do with this information. I was/am also trying to understand the benefits
> of doing the tracking through page_state and not a dedicated KVM stat. E.g. KVM
> already has specific stats for the number of leaf pages mapped into a VM, why not
> do the same for non-leaf pages?

Let me answer why a more general stat is useful and the potential
userspace reaction:

For a memory type which is significant enough, it is useful to expose
it in the general interfaces, so that the general data/stat collection
infra can collect them instead of having workload dependent stat
collectors. In addition, not necessarily that stat has to have a
userspace reaction in an online fashion. We do collect stats for
offline analysis which greatly influence the priority order of
optimization workitems.

Next the question is do we really need a separate stat item
(secondary_pagetable instead of just plain pagetable) exposed in the
stable API? To me secondary_pagetable is general (not kvm specific)
enough and can be significant, so having a separate dedicated stat
should be ok. Though I am ok with lump it with pagetable stat for now
but we do want it to be accounted somewhere.

2022-05-14 03:59:37

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Fri, May 13, 2022, Johannes Weiner wrote:
> On Thu, May 12, 2022 at 11:29:38PM +0000, Sean Christopherson wrote:
> > On Thu, May 12, 2022, Johannes Weiner wrote:
> > > On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > > > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier <[email protected]> wrote:
> > > > > What do you plan to do for IOMMU page tables? After all, they serve
> > > > > the exact same purpose, and I'd expect these to be handled the same
> > > > > way (i.e. why is this KVM specific?).
> > > >
> > > > The reason this was named NR_SECONDARY_PAGTABLE instead of
> > > > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > > > account other types of secondary page tables to this stat. It is just
> > > > that we are currently interested in the KVM MMU usage.
> > >
> > > Do you actually care at the supervisor level that this memory is used
> > > for guest page tables?
> >
> > Hmm, yes? KVM does have a decent number of large-ish allocations that aren't
> > for page tables, but except for page tables, the number/size of those allocations
> > scales linearly with either the number of vCPUs or the amount of memory assigned
> > to the VM (with no room for improvement barring KVM changes).
> >
> > Off the top of my head, KVM's secondary page tables are the only allocations that
> > don't scale linearly, especially when nested virtualization is in use.
>
> Thanks, that's useful information.
>
> Are these other allocations accounted somewhere? If not, are they
> potential containment holes that will need fixing eventually?

All allocations that are tied to specific VM/vCPU are tagged GFP_KERNEL_ACCOUNT,
so we should be good on that front.

> > > It seems to me you primarily care that it is reported *somewhere*
> > > (hence the piggybacking off of NR_PAGETABLE at first). And whether
> > > it's page tables or iommu tables or whatever else allocated for the
> > > purpose of virtualization, it doesn't make much of a difference to the
> > > host/cgroup that is tracking it, right?
> > >
> > > (The proximity to nr_pagetable could also be confusing. A high page
> > > table count can be a hint to userspace to enable THP. It seems
> > > actionable in a different way than a high number of kvm page tables or
> > > iommu page tables.)
> >
> > I don't know about iommu page tables, but on the KVM side a high count can also
> > be a good signal that enabling THP would be beneficial.
>
> Well, maybe.
>
> It might help, but ultimately it's the process that's in control in
> all cases: it's unmovable kernel memory allocated to manage virtual
> address space inside the task.
>
> So I'm still a bit at a loss whether these things should all be lumped
> in together or kept separately. meminfo and memory.stat are permanent
> ABI, so we should try to establish in advance whether the new itme is
> really a first-class consumer or part of something bigger.
>
> The patch initially piggybacked on NR_PAGETABLE. I found an email of
> you asking why it couldn't be a separate item, but it didn't provide a
> reasoning for that decision. Could you share your thoughts on that?

It was mostly an honest question, I too am trying to understand what userspace
wants to do with this information. I was/am also trying to understand the benefits
of doing the tracking through page_state and not a dedicated KVM stat. E.g. KVM
already has specific stats for the number of leaf pages mapped into a VM, why not
do the same for non-leaf pages?

2022-05-21 09:14:15

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Thu, May 19, 2022 at 06:56:54PM -0700, Yosry Ahmed wrote:
> On Fri, May 13, 2022 at 10:14 AM Shakeel Butt <[email protected]> wrote:
> >
> > On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <[email protected]> wrote:
> > >
> > [...]
> > >
> > > It was mostly an honest question, I too am trying to understand what userspace
> > > wants to do with this information. I was/am also trying to understand the benefits
> > > of doing the tracking through page_state and not a dedicated KVM stat. E.g. KVM
> > > already has specific stats for the number of leaf pages mapped into a VM, why not
> > > do the same for non-leaf pages?
> >
> > Let me answer why a more general stat is useful and the potential
> > userspace reaction:
> >
> > For a memory type which is significant enough, it is useful to expose
> > it in the general interfaces, so that the general data/stat collection
> > infra can collect them instead of having workload dependent stat
> > collectors. In addition, not necessarily that stat has to have a
> > userspace reaction in an online fashion. We do collect stats for
> > offline analysis which greatly influence the priority order of
> > optimization workitems.
> >
> > Next the question is do we really need a separate stat item
> > (secondary_pagetable instead of just plain pagetable) exposed in the
> > stable API? To me secondary_pagetable is general (not kvm specific)
> > enough and can be significant, so having a separate dedicated stat
> > should be ok. Though I am ok with lump it with pagetable stat for now
> > but we do want it to be accounted somewhere.
>
> Any thoughts on this? Johannes?

I agree that this memory should show up in vmstat/memory.stat in some
form or another.

The arguments on whether this should be part of NR_PAGETABLE or a
separate entry seem a bit vague to me. I was hoping somebody more
familiar with KVM could provide a better picture of memory consumption
in that area.

Sean had mentioned that these allocations already get tracked through
GFP_KERNEL_ACCOUNT. That's good, but if they are significant enough to
track, they should be represented in memory.stat in some form. Sean
also pointed out though that those allocations tend to scale rather
differently than the page tables, so it probably makes sense to keep
those two things separate at least.

Any thoughts on putting shadow page tables and iommu page tables into
the existing NR_PAGETABLE item? If not, what are the cons?

And creating (maybe later) a separate NR_VIRT for the other
GPF_KERNEL_ACCOUNT allocations in kvm?

2022-05-21 11:14:42

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Fri, May 13, 2022 at 10:14 AM Shakeel Butt <[email protected]> wrote:
>
> On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <[email protected]> wrote:
> >
> [...]
> >
> > It was mostly an honest question, I too am trying to understand what userspace
> > wants to do with this information. I was/am also trying to understand the benefits
> > of doing the tracking through page_state and not a dedicated KVM stat. E.g. KVM
> > already has specific stats for the number of leaf pages mapped into a VM, why not
> > do the same for non-leaf pages?
>
> Let me answer why a more general stat is useful and the potential
> userspace reaction:
>
> For a memory type which is significant enough, it is useful to expose
> it in the general interfaces, so that the general data/stat collection
> infra can collect them instead of having workload dependent stat
> collectors. In addition, not necessarily that stat has to have a
> userspace reaction in an online fashion. We do collect stats for
> offline analysis which greatly influence the priority order of
> optimization workitems.
>
> Next the question is do we really need a separate stat item
> (secondary_pagetable instead of just plain pagetable) exposed in the
> stable API? To me secondary_pagetable is general (not kvm specific)
> enough and can be significant, so having a separate dedicated stat
> should be ok. Though I am ok with lump it with pagetable stat for now
> but we do want it to be accounted somewhere.

Any thoughts on this? Johannes?

2022-05-25 19:58:18

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Fri, May 20, 2022 at 7:39 AM Johannes Weiner <[email protected]> wrote:
>
> On Thu, May 19, 2022 at 06:56:54PM -0700, Yosry Ahmed wrote:
> > On Fri, May 13, 2022 at 10:14 AM Shakeel Butt <[email protected]> wrote:
> > >
> > > On Fri, May 13, 2022 at 9:12 AM Sean Christopherson <[email protected]> wrote:
> > > >
> > > [...]
> > > >
> > > > It was mostly an honest question, I too am trying to understand what userspace
> > > > wants to do with this information. I was/am also trying to understand the benefits
> > > > of doing the tracking through page_state and not a dedicated KVM stat. E.g. KVM
> > > > already has specific stats for the number of leaf pages mapped into a VM, why not
> > > > do the same for non-leaf pages?
> > >
> > > Let me answer why a more general stat is useful and the potential
> > > userspace reaction:
> > >
> > > For a memory type which is significant enough, it is useful to expose
> > > it in the general interfaces, so that the general data/stat collection
> > > infra can collect them instead of having workload dependent stat
> > > collectors. In addition, not necessarily that stat has to have a
> > > userspace reaction in an online fashion. We do collect stats for
> > > offline analysis which greatly influence the priority order of
> > > optimization workitems.
> > >
> > > Next the question is do we really need a separate stat item
> > > (secondary_pagetable instead of just plain pagetable) exposed in the
> > > stable API? To me secondary_pagetable is general (not kvm specific)
> > > enough and can be significant, so having a separate dedicated stat
> > > should be ok. Though I am ok with lump it with pagetable stat for now
> > > but we do want it to be accounted somewhere.
> >
> > Any thoughts on this? Johannes?
>
> I agree that this memory should show up in vmstat/memory.stat in some
> form or another.
>
> The arguments on whether this should be part of NR_PAGETABLE or a
> separate entry seem a bit vague to me. I was hoping somebody more
> familiar with KVM could provide a better picture of memory consumption
> in that area.
>
> Sean had mentioned that these allocations already get tracked through
> GFP_KERNEL_ACCOUNT. That's good, but if they are significant enough to
> track, they should be represented in memory.stat in some form. Sean
> also pointed out though that those allocations tend to scale rather
> differently than the page tables, so it probably makes sense to keep
> those two things separate at least.
>
> Any thoughts on putting shadow page tables and iommu page tables into
> the existing NR_PAGETABLE item? If not, what are the cons?
>
> And creating (maybe later) a separate NR_VIRT for the other
> GPF_KERNEL_ACCOUNT allocations in kvm?

I agree with Sean that a NR_VIRT stat would be inaccurate by omission,
unless we account for all KVM allocations under this stat. This might
be an unnecessary burden according to what Sean said, as most other
allocations scale linearly with the number of vCPUs or the memory
assigned to the VM.

I don't have enough context to say whether we should piggyback KVM MMU
pages to the existing NR_PAGETABLE item, but from a high level it
seems like it would be more helpful if they are a separate stat.
Anyway, I am willing to go with whatever Sean thinks is best.

2022-05-25 20:34:38

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> On Fri, May 20, 2022 at 7:39 AM Johannes Weiner <[email protected]> wrote:
> > I agree that this memory should show up in vmstat/memory.stat in some
> > form or another.
> >
> > The arguments on whether this should be part of NR_PAGETABLE or a
> > separate entry seem a bit vague to me. I was hoping somebody more
> > familiar with KVM could provide a better picture of memory consumption
> > in that area.
> >
> > Sean had mentioned that these allocations already get tracked through
> > GFP_KERNEL_ACCOUNT. That's good, but if they are significant enough to
> > track, they should be represented in memory.stat in some form. Sean
> > also pointed out though that those allocations tend to scale rather
> > differently than the page tables, so it probably makes sense to keep
> > those two things separate at least.
> >
> > Any thoughts on putting shadow page tables and iommu page tables into
> > the existing NR_PAGETABLE item? If not, what are the cons?
> >
> > And creating (maybe later) a separate NR_VIRT for the other
> > GPF_KERNEL_ACCOUNT allocations in kvm?
>
> I agree with Sean that a NR_VIRT stat would be inaccurate by omission,
> unless we account for all KVM allocations under this stat. This might
> be an unnecessary burden according to what Sean said, as most other
> allocations scale linearly with the number of vCPUs or the memory
> assigned to the VM.

I think it's fine to table the addition of NR_VIRT for now. My
conclusion from this discussion was just that if we do want to add
more KVM-related allocation sites later on, they likely would be
something separate and not share an item with the shadow tables. This
simplifies the discussion around how to present the shadow tables.

That said, stats can be incremental and still useful. memory.current
itself lies by ommission. It's more important to catch what's of
significance and allow users to narrow down pathological cases.

> I don't have enough context to say whether we should piggyback KVM MMU
> pages to the existing NR_PAGETABLE item, but from a high level it
> seems like it would be more helpful if they are a separate stat.
> Anyway, I am willing to go with whatever Sean thinks is best.

Somebody should work this out and put it into a changelog. It's
permanent ABI.

2022-05-26 08:39:13

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Wed, May 25, 2022, Johannes Weiner wrote:
> On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> > I don't have enough context to say whether we should piggyback KVM MMU
> > pages to the existing NR_PAGETABLE item, but from a high level it
> > seems like it would be more helpful if they are a separate stat.
> > Anyway, I am willing to go with whatever Sean thinks is best.
>
> Somebody should work this out and put it into a changelog. It's
> permanent ABI.

After a lot of waffling, my vote is to add a dedicated NR_SECONDARY_PAGETABLE.

It's somewhat redundant from a KVM perspective, as NR_SECONDARY_PAGETABLE will
scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something
bizarre, e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to
allocate a large number of page tables even though the guest isn't accessing that
much memory.

But, someone would need to either understand how KVM works to make that connection,
or know (or be told) to go look at KVM's stats if they're running VMs to better
decipher the stats.

And even in the little bit of time I played with this, I found having
nr_page_table_pages side-by-side with nr_secondary_page_table_pages to be very
informative. E.g. when backing a VM with THP versus HugeTLB,
nr_secondary_page_table_pages is roughly the same, but nr_page_table_pages is an
order of a magnitude higher with THP. I'm guessing the THP behavior is due to
something triggering DoubleMap, but now I want to find out why that's happening.

So while I'm pretty sure a clever user could glean the same info by cross-referencing
NR_PAGETABLE stats with KVM stats, I think having NR_SECONDARY_PAGETABLE will at the
very least prove to be helpful for understanding tradeoffs between VM backing types,
and likely even steer folks towards potential optimizations.

Baseline:
# grep page_table /proc/vmstat
nr_page_table_pages 2830
nr_secondary_page_table_pages 0

THP:
# grep page_table /proc/vmstat
nr_page_table_pages 7584
nr_secondary_page_table_pages 140

HugeTLB:
# grep page_table /proc/vmstat
nr_page_table_pages 3153
nr_secondary_page_table_pages 153


2022-05-28 09:16:49

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Wed, May 25, 2022 at 5:39 PM Sean Christopherson <[email protected]> wrote:
>
> On Wed, May 25, 2022, Johannes Weiner wrote:
> > On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> > > I don't have enough context to say whether we should piggyback KVM MMU
> > > pages to the existing NR_PAGETABLE item, but from a high level it
> > > seems like it would be more helpful if they are a separate stat.
> > > Anyway, I am willing to go with whatever Sean thinks is best.
> >
> > Somebody should work this out and put it into a changelog. It's
> > permanent ABI.
>
> After a lot of waffling, my vote is to add a dedicated NR_SECONDARY_PAGETABLE.
>
> It's somewhat redundant from a KVM perspective, as NR_SECONDARY_PAGETABLE will
> scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something
> bizarre, e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to
> allocate a large number of page tables even though the guest isn't accessing that
> much memory.
>
> But, someone would need to either understand how KVM works to make that connection,
> or know (or be told) to go look at KVM's stats if they're running VMs to better
> decipher the stats.
>
> And even in the little bit of time I played with this, I found having
> nr_page_table_pages side-by-side with nr_secondary_page_table_pages to be very
> informative. E.g. when backing a VM with THP versus HugeTLB,
> nr_secondary_page_table_pages is roughly the same, but nr_page_table_pages is an
> order of a magnitude higher with THP. I'm guessing the THP behavior is due to
> something triggering DoubleMap, but now I want to find out why that's happening.
>
> So while I'm pretty sure a clever user could glean the same info by cross-referencing
> NR_PAGETABLE stats with KVM stats, I think having NR_SECONDARY_PAGETABLE will at the
> very least prove to be helpful for understanding tradeoffs between VM backing types,
> and likely even steer folks towards potential optimizations.
>
> Baseline:
> # grep page_table /proc/vmstat
> nr_page_table_pages 2830
> nr_secondary_page_table_pages 0
>
> THP:
> # grep page_table /proc/vmstat
> nr_page_table_pages 7584
> nr_secondary_page_table_pages 140
>
> HugeTLB:
> # grep page_table /proc/vmstat
> nr_page_table_pages 3153
> nr_secondary_page_table_pages 153
>

Interesting findings! Thanks for taking the time to look into this, Sean!
I will refresh this patchset and summarize the discussion in the
commit message, and also fix some nits on the KVM side. Does this
sound good to everyone?

2022-06-06 04:49:21

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

On Fri, May 27, 2022 at 11:33:27AM -0700, Yosry Ahmed wrote:
> On Wed, May 25, 2022 at 5:39 PM Sean Christopherson <[email protected]> wrote:
> >
> > On Wed, May 25, 2022, Johannes Weiner wrote:
> > > On Tue, May 24, 2022 at 03:31:52PM -0700, Yosry Ahmed wrote:
> > > > I don't have enough context to say whether we should piggyback KVM MMU
> > > > pages to the existing NR_PAGETABLE item, but from a high level it
> > > > seems like it would be more helpful if they are a separate stat.
> > > > Anyway, I am willing to go with whatever Sean thinks is best.
> > >
> > > Somebody should work this out and put it into a changelog. It's
> > > permanent ABI.
> >
> > After a lot of waffling, my vote is to add a dedicated NR_SECONDARY_PAGETABLE.
> >
> > It's somewhat redundant from a KVM perspective, as NR_SECONDARY_PAGETABLE will
> > scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something
> > bizarre, e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to
> > allocate a large number of page tables even though the guest isn't accessing that
> > much memory.
> >
> > But, someone would need to either understand how KVM works to make that connection,
> > or know (or be told) to go look at KVM's stats if they're running VMs to better
> > decipher the stats.
> >
> > And even in the little bit of time I played with this, I found having
> > nr_page_table_pages side-by-side with nr_secondary_page_table_pages to be very
> > informative. E.g. when backing a VM with THP versus HugeTLB,
> > nr_secondary_page_table_pages is roughly the same, but nr_page_table_pages is an
> > order of a magnitude higher with THP. I'm guessing the THP behavior is due to
> > something triggering DoubleMap, but now I want to find out why that's happening.
> >
> > So while I'm pretty sure a clever user could glean the same info by cross-referencing
> > NR_PAGETABLE stats with KVM stats, I think having NR_SECONDARY_PAGETABLE will at the
> > very least prove to be helpful for understanding tradeoffs between VM backing types,
> > and likely even steer folks towards potential optimizations.
> >
> > Baseline:
> > # grep page_table /proc/vmstat
> > nr_page_table_pages 2830
> > nr_secondary_page_table_pages 0
> >
> > THP:
> > # grep page_table /proc/vmstat
> > nr_page_table_pages 7584
> > nr_secondary_page_table_pages 140
> >
> > HugeTLB:
> > # grep page_table /proc/vmstat
> > nr_page_table_pages 3153
> > nr_secondary_page_table_pages 153
> >
>
> Interesting findings! Thanks for taking the time to look into this, Sean!
> I will refresh this patchset and summarize the discussion in the
> commit message, and also fix some nits on the KVM side. Does this
> sound good to everyone?

Yes, thanks for summarizing this. Sounds good to me!