Hi all,
In this series, [PATCH 1/3] and [PATCH 2/3] fix some issues discovered
during code inspection.
[PATCH 3/3] is a supplement to the ptdesc conversion in s390. I don't know
why this was not done in commit 6326c26c1514 ("s390: convert various pgalloc
functions to use ptdescs"); maybe I missed something. Since I don't have an
s390 environment, I hope the kernel test robot can help compile and test this,
which is why I did not fold [PATCH 2/3] and [PATCH 3/3] into one patch.
Comments and suggestions are welcome.
Thanks,
Qi
Qi Zheng (3):
mm: pgtable: correct the wrong comment about ptdesc->__page_flags
mm: pgtable: add missing pt_index to struct ptdesc
s390: supplement for ptdesc conversion
arch/s390/include/asm/pgalloc.h | 4 ++--
arch/s390/mm/gmap.c | 38 +++++++++++++++++----------------
arch/s390/mm/pgalloc.c | 8 +++----
include/linux/mm_types.h | 5 ++++-
4 files changed, 30 insertions(+), 25 deletions(-)
--
2.30.2
Commit 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables
sharing page") introduced the use of the PageActive flag for page table
fragment tracking, so ptdesc->__page_flags is no longer unused.
Correct the wrong comment.
Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/mm_types.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a7223ba3ea1e..5ea77969daae 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -419,7 +419,7 @@ FOLIO_MATCH(compound_head, _head_2a);
/**
* struct ptdesc - Memory descriptor for page tables.
- * @__page_flags: Same as page flags. Unused for page tables.
+ * @__page_flags: Same as page flags. Powerpc only.
* @pt_rcu_head: For freeing page table pages.
* @pt_list: List of used page tables. Used for s390 and x86.
* @_pt_pad_1: Padding that aliases with page's compound head.
--
2.30.2
On s390, the page->index field is used for gmap (see gmap_shadow_pgt()),
so add a corresponding pt_index field to struct ptdesc and add a comment
to clarify this.
Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/mm_types.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5ea77969daae..5240bd7bca33 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -425,6 +425,7 @@ FOLIO_MATCH(compound_head, _head_2a);
* @_pt_pad_1: Padding that aliases with page's compound head.
* @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
* @__page_mapping: Aliases with page->mapping. Unused for page tables.
+ * @pt_index: Used for s390 gmap.
* @pt_mm: Used for x86 pgds.
* @pt_frag_refcount: For fragmented page table tracking. Powerpc only.
* @_pt_pad_2: Padding to ensure proper alignment.
@@ -450,6 +451,7 @@ struct ptdesc {
unsigned long __page_mapping;
union {
+ pgoff_t pt_index;
struct mm_struct *pt_mm;
atomic_t pt_frag_refcount;
};
@@ -475,6 +477,7 @@ TABLE_MATCH(flags, __page_flags);
TABLE_MATCH(compound_head, pt_list);
TABLE_MATCH(compound_head, _pt_pad_1);
TABLE_MATCH(mapping, __page_mapping);
+TABLE_MATCH(index, pt_index);
TABLE_MATCH(rcu_head, pt_rcu_head);
TABLE_MATCH(page_type, __page_type);
TABLE_MATCH(_refcount, __page_refcount);
--
2.30.2
After commit 6326c26c1514 ("s390: convert various pgalloc functions to use
ptdescs"), there are still some places that use page->{lru, index}
instead of ptdesc->{pt_list, pt_index}. To make the use of
ptdesc->{pt_list, pt_index} clearer, convert them as well.
Signed-off-by: Qi Zheng <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/s390/include/asm/pgalloc.h | 4 ++--
arch/s390/mm/gmap.c | 38 +++++++++++++++++----------------
arch/s390/mm/pgalloc.c | 8 +++----
3 files changed, 26 insertions(+), 24 deletions(-)
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 502d655fe6ae..7b84ef6dc4b6 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -23,9 +23,9 @@ unsigned long *crst_table_alloc(struct mm_struct *);
void crst_table_free(struct mm_struct *, unsigned long *);
unsigned long *page_table_alloc(struct mm_struct *);
-struct page *page_table_alloc_pgste(struct mm_struct *mm);
+struct ptdesc *page_table_alloc_pgste(struct mm_struct *mm);
void page_table_free(struct mm_struct *, unsigned long *);
-void page_table_free_pgste(struct page *page);
+void page_table_free_pgste(struct ptdesc *ptdesc);
extern int page_table_allocate_pgste;
static inline void crst_table_init(unsigned long *crst, unsigned long entry)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8da39deb56ca..4d2674f89322 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -206,9 +206,11 @@ static void gmap_free(struct gmap *gmap)
/* Free additional data for a shadow gmap */
if (gmap_is_shadow(gmap)) {
+ struct ptdesc *ptdesc;
+
/* Free all page tables. */
- list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
- page_table_free_pgste(page);
+ list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list)
+ page_table_free_pgste(ptdesc);
gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
/* Release reference to the parent */
gmap_put(gmap->parent);
@@ -1348,7 +1350,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
{
unsigned long *ste;
phys_addr_t sto, pgt;
- struct page *page;
+ struct ptdesc *ptdesc;
BUG_ON(!gmap_is_shadow(sg));
ste = gmap_table_walk(sg, raddr, 1); /* get segment pointer */
@@ -1361,9 +1363,9 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
*ste = _SEGMENT_ENTRY_EMPTY;
__gmap_unshadow_pgt(sg, raddr, __va(pgt));
/* Free page table */
- page = phys_to_page(pgt);
- list_del(&page->lru);
- page_table_free_pgste(page);
+ ptdesc = page_ptdesc(phys_to_page(pgt));
+ list_del(&ptdesc->pt_list);
+ page_table_free_pgste(ptdesc);
}
/**
@@ -1377,7 +1379,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
unsigned long *sgt)
{
- struct page *page;
+ struct ptdesc *ptdesc;
phys_addr_t pgt;
int i;
@@ -1389,9 +1391,9 @@ static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
sgt[i] = _SEGMENT_ENTRY_EMPTY;
__gmap_unshadow_pgt(sg, raddr, __va(pgt));
/* Free page table */
- page = phys_to_page(pgt);
- list_del(&page->lru);
- page_table_free_pgste(page);
+ ptdesc = page_ptdesc(phys_to_page(pgt));
+ list_del(&ptdesc->pt_list);
+ page_table_free_pgste(ptdesc);
}
}
@@ -2058,19 +2060,19 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
{
unsigned long raddr, origin;
unsigned long *table;
- struct page *page;
+ struct ptdesc *ptdesc;
phys_addr_t s_pgt;
int rc;
BUG_ON(!gmap_is_shadow(sg) || (pgt & _SEGMENT_ENTRY_LARGE));
/* Allocate a shadow page table */
- page = page_table_alloc_pgste(sg->mm);
- if (!page)
+ ptdesc = page_table_alloc_pgste(sg->mm);
+ if (!ptdesc)
return -ENOMEM;
- page->index = pgt & _SEGMENT_ENTRY_ORIGIN;
+ ptdesc->pt_index = pgt & _SEGMENT_ENTRY_ORIGIN;
if (fake)
- page->index |= GMAP_SHADOW_FAKE_TABLE;
- s_pgt = page_to_phys(page);
+ ptdesc->pt_index |= GMAP_SHADOW_FAKE_TABLE;
+ s_pgt = page_to_phys(ptdesc_page(ptdesc));
/* Install shadow page table */
spin_lock(&sg->guest_table_lock);
table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */
@@ -2088,7 +2090,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
/* mark as invalid as long as the parent table is not protected */
*table = (unsigned long) s_pgt | _SEGMENT_ENTRY |
(pgt & _SEGMENT_ENTRY_PROTECT) | _SEGMENT_ENTRY_INVALID;
- list_add(&page->lru, &sg->pt_list);
+ list_add(&ptdesc->pt_list, &sg->pt_list);
if (fake) {
/* nothing to protect for fake tables */
*table &= ~_SEGMENT_ENTRY_INVALID;
@@ -2114,7 +2116,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
return rc;
out_free:
spin_unlock(&sg->guest_table_lock);
- page_table_free_pgste(page);
+ page_table_free_pgste(ptdesc);
return rc;
}
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 008e487c94a6..abb629d7e131 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -135,7 +135,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
#ifdef CONFIG_PGSTE
-struct page *page_table_alloc_pgste(struct mm_struct *mm)
+struct ptdesc *page_table_alloc_pgste(struct mm_struct *mm)
{
struct ptdesc *ptdesc;
u64 *table;
@@ -147,12 +147,12 @@ struct page *page_table_alloc_pgste(struct mm_struct *mm)
memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
}
- return ptdesc_page(ptdesc);
+ return ptdesc;
}
-void page_table_free_pgste(struct page *page)
+void page_table_free_pgste(struct ptdesc *ptdesc)
{
- pagetable_free(page_ptdesc(page));
+ pagetable_free(ptdesc);
}
#endif /* CONFIG_PGSTE */
--
2.30.2
After commit 6326c26c1514 ("s390: convert various pgalloc functions to use
ptdescs"), there are still some places that use page->{lru, index}
instead of ptdesc->{pt_list, pt_index}. To make the use of
ptdesc->{pt_list, pt_index} clearer, convert them as well.
Signed-off-by: Qi Zheng <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
v1 -> v2: fix build failure (cross compilation successful)
arch/s390/include/asm/pgalloc.h | 4 ++--
arch/s390/mm/gmap.c | 38 +++++++++++++++++----------------
arch/s390/mm/pgalloc.c | 8 +++----
3 files changed, 26 insertions(+), 24 deletions(-)
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 502d655fe6ae..7b84ef6dc4b6 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -23,9 +23,9 @@ unsigned long *crst_table_alloc(struct mm_struct *);
void crst_table_free(struct mm_struct *, unsigned long *);
unsigned long *page_table_alloc(struct mm_struct *);
-struct page *page_table_alloc_pgste(struct mm_struct *mm);
+struct ptdesc *page_table_alloc_pgste(struct mm_struct *mm);
void page_table_free(struct mm_struct *, unsigned long *);
-void page_table_free_pgste(struct page *page);
+void page_table_free_pgste(struct ptdesc *ptdesc);
extern int page_table_allocate_pgste;
static inline void crst_table_init(unsigned long *crst, unsigned long entry)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8da39deb56ca..e43a5a3befd4 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -206,9 +206,11 @@ static void gmap_free(struct gmap *gmap)
/* Free additional data for a shadow gmap */
if (gmap_is_shadow(gmap)) {
+ struct ptdesc *ptdesc, *n;
+
/* Free all page tables. */
- list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
- page_table_free_pgste(page);
+ list_for_each_entry_safe(ptdesc, n, &gmap->pt_list, pt_list)
+ page_table_free_pgste(ptdesc);
gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
/* Release reference to the parent */
gmap_put(gmap->parent);
@@ -1348,7 +1350,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
{
unsigned long *ste;
phys_addr_t sto, pgt;
- struct page *page;
+ struct ptdesc *ptdesc;
BUG_ON(!gmap_is_shadow(sg));
ste = gmap_table_walk(sg, raddr, 1); /* get segment pointer */
@@ -1361,9 +1363,9 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
*ste = _SEGMENT_ENTRY_EMPTY;
__gmap_unshadow_pgt(sg, raddr, __va(pgt));
/* Free page table */
- page = phys_to_page(pgt);
- list_del(&page->lru);
- page_table_free_pgste(page);
+ ptdesc = page_ptdesc(phys_to_page(pgt));
+ list_del(&ptdesc->pt_list);
+ page_table_free_pgste(ptdesc);
}
/**
@@ -1377,7 +1379,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
unsigned long *sgt)
{
- struct page *page;
+ struct ptdesc *ptdesc;
phys_addr_t pgt;
int i;
@@ -1389,9 +1391,9 @@ static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
sgt[i] = _SEGMENT_ENTRY_EMPTY;
__gmap_unshadow_pgt(sg, raddr, __va(pgt));
/* Free page table */
- page = phys_to_page(pgt);
- list_del(&page->lru);
- page_table_free_pgste(page);
+ ptdesc = page_ptdesc(phys_to_page(pgt));
+ list_del(&ptdesc->pt_list);
+ page_table_free_pgste(ptdesc);
}
}
@@ -2058,19 +2060,19 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
{
unsigned long raddr, origin;
unsigned long *table;
- struct page *page;
+ struct ptdesc *ptdesc;
phys_addr_t s_pgt;
int rc;
BUG_ON(!gmap_is_shadow(sg) || (pgt & _SEGMENT_ENTRY_LARGE));
/* Allocate a shadow page table */
- page = page_table_alloc_pgste(sg->mm);
- if (!page)
+ ptdesc = page_table_alloc_pgste(sg->mm);
+ if (!ptdesc)
return -ENOMEM;
- page->index = pgt & _SEGMENT_ENTRY_ORIGIN;
+ ptdesc->pt_index = pgt & _SEGMENT_ENTRY_ORIGIN;
if (fake)
- page->index |= GMAP_SHADOW_FAKE_TABLE;
- s_pgt = page_to_phys(page);
+ ptdesc->pt_index |= GMAP_SHADOW_FAKE_TABLE;
+ s_pgt = page_to_phys(ptdesc_page(ptdesc));
/* Install shadow page table */
spin_lock(&sg->guest_table_lock);
table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */
@@ -2088,7 +2090,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
/* mark as invalid as long as the parent table is not protected */
*table = (unsigned long) s_pgt | _SEGMENT_ENTRY |
(pgt & _SEGMENT_ENTRY_PROTECT) | _SEGMENT_ENTRY_INVALID;
- list_add(&page->lru, &sg->pt_list);
+ list_add(&ptdesc->pt_list, &sg->pt_list);
if (fake) {
/* nothing to protect for fake tables */
*table &= ~_SEGMENT_ENTRY_INVALID;
@@ -2114,7 +2116,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
return rc;
out_free:
spin_unlock(&sg->guest_table_lock);
- page_table_free_pgste(page);
+ page_table_free_pgste(ptdesc);
return rc;
}
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 008e487c94a6..abb629d7e131 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -135,7 +135,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
#ifdef CONFIG_PGSTE
-struct page *page_table_alloc_pgste(struct mm_struct *mm)
+struct ptdesc *page_table_alloc_pgste(struct mm_struct *mm)
{
struct ptdesc *ptdesc;
u64 *table;
@@ -147,12 +147,12 @@ struct page *page_table_alloc_pgste(struct mm_struct *mm)
memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
}
- return ptdesc_page(ptdesc);
+ return ptdesc;
}
-void page_table_free_pgste(struct page *page)
+void page_table_free_pgste(struct ptdesc *ptdesc)
{
- pagetable_free(page_ptdesc(page));
+ pagetable_free(ptdesc);
}
#endif /* CONFIG_PGSTE */
--
2.30.2
On Tue, Mar 05, 2024 at 03:21:54PM +0800, Qi Zheng wrote:
> After commit 6326c26c1514 ("s390: convert various pgalloc functions to use
> ptdescs"), there are still some places that use page->{lru, index}
> instead of ptdesc->{pt_list, pt_index}. To make the use of
> ptdesc->{pt_list, pt_index} clearer, convert them as well.
>
> Signed-off-by: Qi Zheng <[email protected]>
> Cc: Christian Borntraeger <[email protected]>
> Cc: Janosch Frank <[email protected]>
> Cc: Claudio Imbrenda <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> v1 -> v2: fix build failure (cross compilation successful)
>
> arch/s390/include/asm/pgalloc.h | 4 ++--
> arch/s390/mm/gmap.c | 38 +++++++++++++++++----------------
> arch/s390/mm/pgalloc.c | 8 +++----
> 3 files changed, 26 insertions(+), 24 deletions(-)
Same here: Christian, Janosch, or Claudio, please have a look and
provide an ACK if this is ok with you.
On Mon, Mar 04, 2024 at 07:07:17PM +0800, Qi Zheng wrote:
> Hi all,
Sorry for the late review. Thanks for looking at doing some ptdesc
conversions. This patchset has the right idea and looks *mostly* fine.
> In this series, [PATCH 1/3] and [PATCH 2/3] fix some issues
> discovered during code inspection.
> 
> [PATCH 3/3] is a supplement to the ptdesc conversion in s390. I don't know
> why this was not done in commit 6326c26c1514 ("s390: convert various pgalloc
> functions to use ptdescs"); maybe I missed something. Since I don't have an
It's important to keep in mind the end goal of ptdescs, cleaning up much
of the struct page field misuse by standardizing their usage. s390 page
tables and gmap are similar but not the same, so the conversions require
deeper thought.
My initial "Split ptdesc from struct page" patchset tried to focus on the
most straightforward, simple conversions in order to introduce the
descriptor and lay a foundation for future conversions - you can see some
more complicated iterations in v6 and prior.
When converting to ptdescs (and any other newer descriptors), we should
be careful about generating superficial code churn instead of using
them to solve the problems they are trying to solve.
On Mon, Mar 04, 2024 at 07:07:18PM +0800, Qi Zheng wrote:
> Commit 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables
> sharing page") introduced the use of the PageActive flag for page table
> fragment tracking, so ptdesc->__page_flags is no longer unused.
> Correct the wrong comment.
Thanks for catching this!
In regards to naming, we're trying to prefix unused variables with
__underscores so I'd prefer to see the __ eliminated from the
ptdesc->page_flags field here as well. This doesn't warrant a fix
as it is already in 6.9-rc1, but is something to keep in
mind for the future. Aside from that, LGTM.
Reviewed-by: Vishal Moola (Oracle) <[email protected]>
> Signed-off-by: Qi Zheng <[email protected]>
> ---
> include/linux/mm_types.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index a7223ba3ea1e..5ea77969daae 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -419,7 +419,7 @@ FOLIO_MATCH(compound_head, _head_2a);
>
> /**
> * struct ptdesc - Memory descriptor for page tables.
> - * @__page_flags: Same as page flags. Unused for page tables.
> + * @__page_flags: Same as page flags. Powerpc only.
> * @pt_rcu_head: For freeing page table pages.
> * @pt_list: List of used page tables. Used for s390 and x86.
> * @_pt_pad_1: Padding that aliases with page's compound head.
> --
> 2.30.2
>
On Mon, Mar 04, 2024 at 07:07:19PM +0800, Qi Zheng wrote:
> In s390, the page->index field is used for gmap (see gmap_shadow_pgt()),
> so add the corresponding pt_index to struct ptdesc and add a comment to
> clarify this.
Yes s390 gmap 'uses' page->index, but not for the purpose page->index is
supposed to hold. It's alright to have a variable here, but I'd rather
see it named something more appropriate to the purpose it serves.
You can take a look at this patch from v5 of my ptdesc conversion series
for more info:
https://lore.kernel.org/linux-mm/[email protected]/
> Signed-off-by: Qi Zheng <[email protected]>
> ---
> include/linux/mm_types.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5ea77969daae..5240bd7bca33 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -425,6 +425,7 @@ FOLIO_MATCH(compound_head, _head_2a);
> * @_pt_pad_1: Padding that aliases with page's compound head.
> * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
> * @__page_mapping: Aliases with page->mapping. Unused for page tables.
> + * @pt_index: Used for s390 gmap.
> * @pt_mm: Used for x86 pgds.
> * @pt_frag_refcount: For fragmented page table tracking. Powerpc only.
> * @_pt_pad_2: Padding to ensure proper alignment.
> @@ -450,6 +451,7 @@ struct ptdesc {
> unsigned long __page_mapping;
>
> union {
> + pgoff_t pt_index;
> struct mm_struct *pt_mm;
> atomic_t pt_frag_refcount;
> };
> @@ -475,6 +477,7 @@ TABLE_MATCH(flags, __page_flags);
> TABLE_MATCH(compound_head, pt_list);
> TABLE_MATCH(compound_head, _pt_pad_1);
> TABLE_MATCH(mapping, __page_mapping);
> +TABLE_MATCH(index, pt_index);
> TABLE_MATCH(rcu_head, pt_rcu_head);
> TABLE_MATCH(page_type, __page_type);
> TABLE_MATCH(_refcount, __page_refcount);
> --
> 2.30.2
>
On Mon, Mar 04, 2024 at 07:07:20PM +0800, Qi Zheng wrote:
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -206,9 +206,11 @@ static void gmap_free(struct gmap *gmap)
>
> /* Free additional data for a shadow gmap */
> if (gmap_is_shadow(gmap)) {
> + struct ptdesc *ptdesc;
> +
> /* Free all page tables. */
> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
> - page_table_free_pgste(page);
> + list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list)
> + page_table_free_pgste(ptdesc);
An important note: ptdesc allocation/freeing is different from the
standard alloc_pages()/free_pages() routines architectures are used to.
Are we sure we don't have memory leaks here?
We always allocate and free ptdescs as compound pages; for page table
struct pages, most architectures do not. s390 has order-CRST_ALLOC_ORDER
page tables, meaning if we free anything using the ptdesc api, we better
be sure it was allocated using the ptdesc api as well.
Like you, I don't have a s390 to test on, so hopefully some s390 expert
can chime in to let us know if we need a fix for this.
On 2024/3/27 03:12, Vishal Moola wrote:
> On Mon, Mar 04, 2024 at 07:07:18PM +0800, Qi Zheng wrote:
>> Commit 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables
>> sharing page") introduced the use of the PageActive flag for page table
>> fragment tracking, so ptdesc->__page_flags is no longer unused.
>> Correct the wrong comment.
>
> Thanks for catching this!
>
> In regards to naming, we're trying to prefix unused variables with
> __underscores so I'd prefer to see the __ eliminated from the
> ptdesc->page_flags field here as well. This doesn't warrant a fix
> as it is already in 6.9-rc1, but is something to keep in
> mind for the future.
Got it.
> Aside from that, LGTM.
>
> Reviewed-by: Vishal Moola (Oracle) <[email protected]>
Thanks!
>> Signed-off-by: Qi Zheng <[email protected]>
>> ---
>> include/linux/mm_types.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index a7223ba3ea1e..5ea77969daae 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -419,7 +419,7 @@ FOLIO_MATCH(compound_head, _head_2a);
>>
>> /**
>> * struct ptdesc - Memory descriptor for page tables.
>> - * @__page_flags: Same as page flags. Unused for page tables.
>> + * @__page_flags: Same as page flags. Powerpc only.
>> * @pt_rcu_head: For freeing page table pages.
>> * @pt_list: List of used page tables. Used for s390 and x86.
>> * @_pt_pad_1: Padding that aliases with page's compound head.
>> --
>> 2.30.2
>>
On 2024/3/27 03:48, Vishal Moola wrote:
> On Mon, Mar 04, 2024 at 07:07:20PM +0800, Qi Zheng wrote:
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -206,9 +206,11 @@ static void gmap_free(struct gmap *gmap)
>>
>> /* Free additional data for a shadow gmap */
>> if (gmap_is_shadow(gmap)) {
>> + struct ptdesc *ptdesc;
>> +
>> /* Free all page tables. */
>> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
>> - page_table_free_pgste(page);
>> + list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list)
>> + page_table_free_pgste(ptdesc);
>
> An important note: ptdesc allocation/freeing is different from the
> standard alloc_pages()/free_pages() routines architectures are used to.
> Are we sure we don't have memory leaks here?
> 
> We always allocate and free ptdescs as compound pages; for page table
> struct pages, most architectures do not. s390 has order-CRST_ALLOC_ORDER
> page tables, meaning if we free anything using the ptdesc api, we better
> be sure it was allocated using the ptdesc api as well.
According to code inspection, all ptdescs added to the gmap->pt_list
are allocated via page_table_alloc_pgste().
>
> Like you, I don't have a s390 to test on, so hopefully some s390 expert
> can chime in to let us know if we need a fix for this.
Yes, hope so!
On 2024/3/27 03:25, Vishal Moola wrote:
> On Mon, Mar 04, 2024 at 07:07:19PM +0800, Qi Zheng wrote:
>> In s390, the page->index field is used for gmap (see gmap_shadow_pgt()),
>> so add the corresponding pt_index to struct ptdesc and add a comment to
>> clarify this.
>
> Yes s390 gmap 'uses' page->index, but not for the purpose page->index is
> supposed to hold. It's alright to have a variable here, but I'd rather
> see it named something more appropriate to the purpose it serves.
Makes sense.
>
> You can take a look at this patch from v5 of my ptdesc conversion series
> for more info:
> https://lore.kernel.org/linux-mm/[email protected]/
Oh, but it seems that this patch has not been merged?
>
>> Signed-off-by: Qi Zheng <[email protected]>
>> ---
>> include/linux/mm_types.h | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 5ea77969daae..5240bd7bca33 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -425,6 +425,7 @@ FOLIO_MATCH(compound_head, _head_2a);
>> * @_pt_pad_1: Padding that aliases with page's compound head.
>> * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
>> * @__page_mapping: Aliases with page->mapping. Unused for page tables.
>> + * @pt_index: Used for s390 gmap.
>> * @pt_mm: Used for x86 pgds.
>> * @pt_frag_refcount: For fragmented page table tracking. Powerpc only.
>> * @_pt_pad_2: Padding to ensure proper alignment.
>> @@ -450,6 +451,7 @@ struct ptdesc {
>> unsigned long __page_mapping;
>>
>> union {
>> + pgoff_t pt_index;
>> struct mm_struct *pt_mm;
>> atomic_t pt_frag_refcount;
>> };
>> @@ -475,6 +477,7 @@ TABLE_MATCH(flags, __page_flags);
>> TABLE_MATCH(compound_head, pt_list);
>> TABLE_MATCH(compound_head, _pt_pad_1);
>> TABLE_MATCH(mapping, __page_mapping);
>> +TABLE_MATCH(index, pt_index);
>> TABLE_MATCH(rcu_head, pt_rcu_head);
>> TABLE_MATCH(page_type, __page_type);
>> TABLE_MATCH(_refcount, __page_refcount);
>> --
>> 2.30.2
>>
On 26.03.24 20:07, Vishal Moola wrote:
> On Mon, Mar 04, 2024 at 07:07:17PM +0800, Qi Zheng wrote:
>> Hi all,
>
> Sorry for the late review. Thanks for looking at doing some ptdesc
> conversions. This patchset has the right idea and looks *mostly* fine.
>
>> In this series, [PATCH 1/3] and [PATCH 2/3] fix some issues
>> discovered during code inspection.
>>
>> [PATCH 3/3] is a supplement to the ptdesc conversion in s390. I don't know
>> why this was not done in commit 6326c26c1514 ("s390: convert various pgalloc
>> functions to use ptdescs"); maybe I missed something. Since I don't have an
>
> It's important to keep in mind the end goal of ptdescs, cleaning up much
> of the struct page field misuse by standardizing their usage. s390 page
> tables and gmap are similar but not the same, so the conversions require
> deeper thought.
>
> My initial "Split ptdesc from struct page" patchset tried to focus on the
> most straightforward, simple conversions in order to introduce the
> descriptor and lay a foundation for future conversions - you can see some
> more complicated iterations in v6 and prior.
>
> When converting to ptdescs (and any other newer descriptors), we should
> be careful about generating superficial code churn instead of using
> them to solve the problems they are trying to solve.
The gmap shadow pages are page tables that are not linked into the user
page tables.
I recall I raised in the past that using ptdesc for them is confusing.
--
Cheers,
David / dhildenb