2024-05-08 18:30:14

by David Hildenbrand

Subject: [PATCH v3 00/10] s390: PG_arch_1+folio cleanups for uv+hugetlb

Rebased on s390x/features. Cleanups around PG_arch_1 and folio handling
in UV and hugetlb code.

One "easy" fix upfront. Another issue I spotted is documented in [1].

Once this hits upstream, we can remove HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
from core-mm and s390x, so only the folio variant will remain.

Compile tested, but not runtime tested with UV; I'd appreciate some
testing help from people with UV access and experience.

[1] https://lkml.kernel.org/r/[email protected]

v2 -> v3:
* "s390/uv: split large folios in gmap_make_secure()"
-> Spelling fix
* "s390/hugetlb: convert PG_arch_1 code to work on folio->flags"
-> Extended patch description

v1 -> v2:
* Rebased on s390x/features:
* "s390/hugetlb: convert PG_arch_1 code to work on folio->flags"
-> pmd_folio() not available on s390x/features
* "s390/uv: don't call folio_wait_writeback() without a folio reference"
-> Willy's folio conversion is in s390x/features
* "s390/uv: convert PG_arch_1 users to only work on small folios"
-> Add comments
* Rearrange code and handle split_folio() return values properly. New
patches to handle splitting:
-> "s390/uv: gmap_make_secure() cleanups for further changes"
-> "s390/uv: split large folios in gmap_make_secure()"
* Added more cleanups:
-> "s390/uv: make uv_convert_from_secure() a static function"
-> "s390/uv: convert uv_destroy_owned_page() to uv_destroy_(folio|pte)()"
-> "s390/uv: convert uv_convert_owned_from_secure() to
uv_convert_from_secure_(folio|pte)()"
-> "s390/mm: implement HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE"

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Thomas Huth <[email protected]>

David Hildenbrand (10):
s390/uv: don't call folio_wait_writeback() without a folio reference
s390/uv: gmap_make_secure() cleanups for further changes
s390/uv: split large folios in gmap_make_secure()
s390/uv: convert PG_arch_1 users to only work on small folios
s390/uv: update PG_arch_1 comment
s390/uv: make uv_convert_from_secure() a static function
s390/uv: convert uv_destroy_owned_page() to uv_destroy_(folio|pte)()
s390/uv: convert uv_convert_owned_from_secure() to
uv_convert_from_secure_(folio|pte)()
s390/uv: implement HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
s390/hugetlb: convert PG_arch_1 code to work on folio->flags

arch/s390/include/asm/page.h | 5 +
arch/s390/include/asm/pgtable.h | 8 +-
arch/s390/include/asm/uv.h | 12 +-
arch/s390/kernel/uv.c | 207 +++++++++++++++++++++-----------
arch/s390/mm/fault.c | 14 ++-
arch/s390/mm/gmap.c | 10 +-
arch/s390/mm/hugetlbpage.c | 8 +-
7 files changed, 172 insertions(+), 92 deletions(-)

--
2.45.0



2024-05-08 18:30:37

by David Hildenbrand

Subject: [PATCH v3 01/10] s390/uv: don't call folio_wait_writeback() without a folio reference

folio_wait_writeback() requires that no spinlocks are held and that
a folio reference is held, as documented. After we dropped the PTL, the
folio could get freed concurrently. So grab a temporary reference.
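
For illustration, the pattern being established is roughly the following
(a minimal sketch, not the exact gmap code):

    folio_get(folio);               /* take a reference while still holding the PTL */
    pte_unmap_unlock(ptep, ptelock);
    folio_wait_writeback(folio);    /* may sleep; the reference keeps the folio alive */
    folio_put(folio);

Without the reference, the folio could be unmapped and freed between
dropping the PTL and the writeback wait.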

Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests")
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/kernel/uv.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 265fea37e030..016993e9eb72 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -318,6 +318,13 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
rc = make_folio_secure(folio, uvcb);
folio_unlock(folio);
}
+
+ /*
+ * Once we drop the PTL, the folio may get unmapped and
+ * freed immediately. We need a temporary reference.
+ */
+ if (rc == -EAGAIN)
+ folio_get(folio);
}
unlock:
pte_unmap_unlock(ptep, ptelock);
@@ -330,6 +337,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
* completion, this is just a useless check, but it is safe.
*/
folio_wait_writeback(folio);
+ folio_put(folio);
} else if (rc == -EBUSY) {
/*
* If we have tried a local drain and the folio refcount
--
2.45.0


2024-05-08 18:30:45

by David Hildenbrand

Subject: [PATCH v3 02/10] s390/uv: gmap_make_secure() cleanups for further changes

Let's factor out handling of LRU cache draining and convert the if-else
chain to a switch-case.
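
The helper's calling convention can be sketched as follows (illustrative
only; the actual caller is gmap_make_secure()):

    bool drain_lru_called = false;
    ...
    case -EBUSY:
            /*
             * First invocation: drain the local CPU's LRU caches and
             * retry immediately. Second invocation: drain all CPUs
             * and give up, letting the caller retry via -EAGAIN.
             */
            if (drain_lru(&drain_lru_called))
                    goto again;
            return -EAGAIN;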

Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/kernel/uv.c | 66 ++++++++++++++++++++++++++-----------------
1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 016993e9eb72..25fe28d189df 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -266,6 +266,36 @@ static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_str
return atomic_read(&mm->context.protected_count) > 1;
}

+/*
+ * Drain LRU caches: the local one on first invocation and the ones of all
+ * CPUs on successive invocations. Returns "true" on the first invocation.
+ */
+static bool drain_lru(bool *drain_lru_called)
+{
+ /*
+ * If we have tried a local drain and the folio refcount
+ * still does not match our expected safe value, try with a
+ * system wide drain. This is needed if the pagevecs holding
+ * the page are on a different CPU.
+ */
+ if (*drain_lru_called) {
+ lru_add_drain_all();
+ /* We give up here, don't retry immediately. */
+ return false;
+ }
+ /*
+ * We are here if the folio refcount does not match the
+ * expected safe value. The main culprits are usually
+ * pagevecs. With lru_add_drain() we drain the pagevecs
+ * on the local CPU so that hopefully the refcount will
+ * reach the expected safe value.
+ */
+ lru_add_drain();
+ *drain_lru_called = true;
+ /* The caller should try again immediately */
+ return true;
+}
+
/*
* Requests the Ultravisor to make a page accessible to a guest.
* If it's brought in the first time, it will be cleared. If
@@ -275,7 +305,7 @@ static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_str
int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
{
struct vm_area_struct *vma;
- bool local_drain = false;
+ bool drain_lru_called = false;
spinlock_t *ptelock;
unsigned long uaddr;
struct folio *folio;
@@ -331,37 +361,21 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
out:
mmap_read_unlock(gmap->mm);

- if (rc == -EAGAIN) {
+ switch (rc) {
+ case -EAGAIN:
/*
* If we are here because the UVC returned busy or partial
* completion, this is just a useless check, but it is safe.
*/
folio_wait_writeback(folio);
folio_put(folio);
- } else if (rc == -EBUSY) {
- /*
- * If we have tried a local drain and the folio refcount
- * still does not match our expected safe value, try with a
- * system wide drain. This is needed if the pagevecs holding
- * the page are on a different CPU.
- */
- if (local_drain) {
- lru_add_drain_all();
- /* We give up here, and let the caller try again */
- return -EAGAIN;
- }
- /*
- * We are here if the folio refcount does not match the
- * expected safe value. The main culprits are usually
- * pagevecs. With lru_add_drain() we drain the pagevecs
- * on the local CPU so that hopefully the refcount will
- * reach the expected safe value.
- */
- lru_add_drain();
- local_drain = true;
- /* And now we try again immediately after draining */
- goto again;
- } else if (rc == -ENXIO) {
+ return -EAGAIN;
+ case -EBUSY:
+ /* Additional folio references. */
+ if (drain_lru(&drain_lru_called))
+ goto again;
+ return -EAGAIN;
+ case -ENXIO:
if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
return -EFAULT;
return -EAGAIN;
--
2.45.0


2024-05-08 18:31:14

by David Hildenbrand

Subject: [PATCH v3 03/10] s390/uv: split large folios in gmap_make_secure()

While s390x makes sure to never have PMD-mapped THP in processes that use
KVM -- by remapping them using PTEs in
thp_split_walk_pmd_entry()->split_huge_pmd() -- there is still the
possibility of having PTE-mapped THPs (large folios) mapped into guest
memory.

This would happen if user space allocates memory before calling
KVM_CREATE_VM (which would call s390_enable_sie()). With upstream QEMU,
this currently doesn't happen, because guest memory is setup and
conditionally preallocated after KVM_CREATE_VM.

Could it happen with shmem/file-backed memory when another process
allocates memory in the pagecache? Likely, although it is currently not a
common setup.

Trying to split any PTE-mapped large folios sounds like the right and
future-proof thing to do here. So let's call split_folio() and handle the
return values accordingly.
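
The contract relied upon here can be sketched like this (simplified; see
the diff below for the full error handling):

    /* split_folio() wants the folio locked and a reference held */
    folio_lock(folio);
    rc = split_folio(folio);
    folio_unlock(folio);
    folio_put(folio);
    /*
     * rc == 0:       split worked, retry the page table lookup
     * rc == -EAGAIN: additional folio references, try draining LRU caches
     * rc == -EBUSY:  unexpected race, let the caller retry
     */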

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/kernel/uv.c | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 25fe28d189df..3c6d86e3e828 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -338,11 +338,10 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
goto out;
if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
folio = page_folio(pte_page(*ptep));
- rc = -EINVAL;
- if (folio_test_large(folio))
- goto unlock;
rc = -EAGAIN;
- if (folio_trylock(folio)) {
+ if (folio_test_large(folio)) {
+ rc = -E2BIG;
+ } else if (folio_trylock(folio)) {
if (should_export_before_import(uvcb, gmap->mm))
uv_convert_from_secure(PFN_PHYS(folio_pfn(folio)));
rc = make_folio_secure(folio, uvcb);
@@ -353,15 +352,35 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
* Once we drop the PTL, the folio may get unmapped and
* freed immediately. We need a temporary reference.
*/
- if (rc == -EAGAIN)
+ if (rc == -EAGAIN || rc == -E2BIG)
folio_get(folio);
}
-unlock:
pte_unmap_unlock(ptep, ptelock);
out:
mmap_read_unlock(gmap->mm);

switch (rc) {
+ case -E2BIG:
+ folio_lock(folio);
+ rc = split_folio(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+
+ switch (rc) {
+ case 0:
+ /* Splitting succeeded, try again immediately. */
+ goto again;
+ case -EAGAIN:
+ /* Additional folio references. */
+ if (drain_lru(&drain_lru_called))
+ goto again;
+ return -EAGAIN;
+ case -EBUSY:
+ /* Unexpected race. */
+ return -EAGAIN;
+ }
+ WARN_ON_ONCE(1);
+ return -ENXIO;
case -EAGAIN:
/*
* If we are here because the UVC returned busy or partial
--
2.45.0


2024-05-08 18:31:31

by David Hildenbrand

Subject: [PATCH v3 04/10] s390/uv: convert PG_arch_1 users to only work on small folios

Now that make_folio_secure() may only set PG_arch_1 for small folios,
let's convert relevant remaining UV code to only work on (small) folios
and simply reject large folios early. This way, we'll never end up
touching PG_arch_1 on tail pages of a large folio in UV code.

The folio_get()/folio_put() for functions that are documented to already
hold a folio reference look weird; likely they are required to make
concurrent gmap_make_secure() back off because the caller might only hold
an implicit reference due to the page mapping. So leave that alone for now.
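
The guard these helpers now share looks roughly like this (a sketch of
the pattern repeated throughout the diff):

    struct folio *folio = phys_to_folio(paddr);

    /* See gmap_make_secure(): large folios cannot be secure */
    if (unlikely(folio_test_large(folio)))
            return 0;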

Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/include/asm/page.h | 2 ++
arch/s390/kernel/uv.c | 41 ++++++++++++++++++++++--------------
2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 224ff9d433ea..ecbf4b626f46 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -247,7 +247,9 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
#define pfn_to_phys(pfn) ((pfn) << PAGE_SHIFT)

#define phys_to_page(phys) pfn_to_page(phys_to_pfn(phys))
+#define phys_to_folio(phys) page_folio(phys_to_page(phys))
#define page_to_phys(page) pfn_to_phys(page_to_pfn(page))
+#define folio_to_phys(folio) pfn_to_phys(folio_pfn(folio))

static inline void *pfn_to_virt(unsigned long pfn)
{
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 3c6d86e3e828..914dcec27329 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -135,14 +135,18 @@ static int uv_destroy_page(unsigned long paddr)
*/
int uv_destroy_owned_page(unsigned long paddr)
{
- struct page *page = phys_to_page(paddr);
+ struct folio *folio = phys_to_folio(paddr);
int rc;

- get_page(page);
+ /* See gmap_make_secure(): large folios cannot be secure */
+ if (unlikely(folio_test_large(folio)))
+ return 0;
+
+ folio_get(folio);
rc = uv_destroy_page(paddr);
if (!rc)
- clear_bit(PG_arch_1, &page->flags);
- put_page(page);
+ clear_bit(PG_arch_1, &folio->flags);
+ folio_put(folio);
return rc;
}

@@ -170,14 +174,18 @@ int uv_convert_from_secure(unsigned long paddr)
*/
int uv_convert_owned_from_secure(unsigned long paddr)
{
- struct page *page = phys_to_page(paddr);
+ struct folio *folio = phys_to_folio(paddr);
int rc;

- get_page(page);
+ /* See gmap_make_secure(): large folios cannot be secure */
+ if (unlikely(folio_test_large(folio)))
+ return 0;
+
+ folio_get(folio);
rc = uv_convert_from_secure(paddr);
if (!rc)
- clear_bit(PG_arch_1, &page->flags);
- put_page(page);
+ clear_bit(PG_arch_1, &folio->flags);
+ folio_put(folio);
return rc;
}

@@ -479,33 +487,34 @@ EXPORT_SYMBOL_GPL(gmap_destroy_page);
*/
int arch_make_page_accessible(struct page *page)
{
+ struct folio *folio = page_folio(page);
int rc = 0;

- /* Hugepage cannot be protected, so nothing to do */
- if (PageHuge(page))
+ /* See gmap_make_secure(): large folios cannot be secure */
+ if (unlikely(folio_test_large(folio)))
return 0;

/*
* PG_arch_1 is used in 3 places:
* 1. for kernel page tables during early boot
* 2. for storage keys of huge pages and KVM
- * 3. As an indication that this page might be secure. This can
+ * 3. As an indication that this small folio might be secure. This can
* overindicate, e.g. we set the bit before calling
* convert_to_secure.
* As secure pages are never huge, all 3 variants can co-exists.
*/
- if (!test_bit(PG_arch_1, &page->flags))
+ if (!test_bit(PG_arch_1, &folio->flags))
return 0;

- rc = uv_pin_shared(page_to_phys(page));
+ rc = uv_pin_shared(folio_to_phys(folio));
if (!rc) {
- clear_bit(PG_arch_1, &page->flags);
+ clear_bit(PG_arch_1, &folio->flags);
return 0;
}

- rc = uv_convert_from_secure(page_to_phys(page));
+ rc = uv_convert_from_secure(folio_to_phys(folio));
if (!rc) {
- clear_bit(PG_arch_1, &page->flags);
+ clear_bit(PG_arch_1, &folio->flags);
return 0;
}

--
2.45.0


2024-05-08 18:31:52

by David Hildenbrand

Subject: [PATCH v3 06/10] s390/uv: make uv_convert_from_secure() a static function

It's not used outside of uv.c, so let's make it a static function.

Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/include/asm/uv.h | 6 ------
arch/s390/kernel/uv.c | 2 +-
2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 0e7bd3873907..d2205ff97007 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -484,7 +484,6 @@ int uv_pin_shared(unsigned long paddr);
int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);
int uv_destroy_owned_page(unsigned long paddr);
-int uv_convert_from_secure(unsigned long paddr);
int uv_convert_owned_from_secure(unsigned long paddr);
int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);

@@ -503,11 +502,6 @@ static inline int uv_destroy_owned_page(unsigned long paddr)
return 0;
}

-static inline int uv_convert_from_secure(unsigned long paddr)
-{
- return 0;
-}
-
static inline int uv_convert_owned_from_secure(unsigned long paddr)
{
return 0;
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index ecfc08902215..3d3250b406a6 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -156,7 +156,7 @@ int uv_destroy_owned_page(unsigned long paddr)
*
* @paddr: Absolute host address of page to be exported
*/
-int uv_convert_from_secure(unsigned long paddr)
+static int uv_convert_from_secure(unsigned long paddr)
{
struct uv_cb_cfs uvcb = {
.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
--
2.45.0


2024-05-08 18:40:06

by David Hildenbrand

Subject: [PATCH v3 07/10] s390/uv: convert uv_destroy_owned_page() to uv_destroy_(folio|pte)()

Let's have the following variants for destroying pages:

(1) uv_destroy(): Like uv_pin_shared() and uv_convert_from_secure(),
"low level" helper that operates on paddr and doesn't mess with folios.

(2) uv_destroy_folio(): Consumes a folio to which we hold a reference.

(3) uv_destroy_pte(): Consumes a PTE that holds a reference through the
mapping.

Unfortunately we need uv_destroy_pte(), because pfn_folio() and
friends are not available in pgtable.h.
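
An illustrative summary of which variant a caller would pick (not an
exhaustive list of call sites):

    uv_destroy(paddr);          /* "low level", operates on a paddr only */
    uv_destroy_folio(folio);    /* caller holds a folio reference */
    uv_destroy_pte(pte);        /* present PTE, reference held via the mapping */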

Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/include/asm/pgtable.h | 2 +-
arch/s390/include/asm/uv.h | 10 ++++++++--
arch/s390/kernel/uv.c | 24 +++++++++++++++++-------
arch/s390/mm/gmap.c | 6 ++++--
4 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 6f11d063d545..f65db37917f2 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1215,7 +1215,7 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
* The notifier should have destroyed all protected vCPUs at this
* point, so the destroy should be successful.
*/
- if (full && !uv_destroy_owned_page(pte_val(res) & PAGE_MASK))
+ if (full && !uv_destroy_pte(res))
return res;
/*
* If something went wrong and the page could not be destroyed, or
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index d2205ff97007..a1bef30066ef 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -483,7 +483,8 @@ static inline int is_prot_virt_host(void)
int uv_pin_shared(unsigned long paddr);
int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);
-int uv_destroy_owned_page(unsigned long paddr);
+int uv_destroy_folio(struct folio *folio);
+int uv_destroy_pte(pte_t pte);
int uv_convert_owned_from_secure(unsigned long paddr);
int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);

@@ -497,7 +498,12 @@ static inline int uv_pin_shared(unsigned long paddr)
return 0;
}

-static inline int uv_destroy_owned_page(unsigned long paddr)
+static inline int uv_destroy_folio(struct folio *folio)
+{
+ return 0;
+}
+
+static inline int uv_destroy_pte(pte_t pte)
{
return 0;
}
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 3d3250b406a6..61c1ce51c883 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -110,7 +110,7 @@ EXPORT_SYMBOL_GPL(uv_pin_shared);
*
* @paddr: Absolute host address of page to be destroyed
*/
-static int uv_destroy_page(unsigned long paddr)
+static int uv_destroy(unsigned long paddr)
{
struct uv_cb_cfs uvcb = {
.header.cmd = UVC_CMD_DESTR_SEC_STOR,
@@ -131,11 +131,10 @@ static int uv_destroy_page(unsigned long paddr)
}

/*
- * The caller must already hold a reference to the page
+ * The caller must already hold a reference to the folio
*/
-int uv_destroy_owned_page(unsigned long paddr)
+int uv_destroy_folio(struct folio *folio)
{
- struct folio *folio = phys_to_folio(paddr);
int rc;

/* See gmap_make_secure(): large folios cannot be secure */
@@ -143,13 +142,22 @@ int uv_destroy_owned_page(unsigned long paddr)
return 0;

folio_get(folio);
- rc = uv_destroy_page(paddr);
+ rc = uv_destroy(folio_to_phys(folio));
if (!rc)
clear_bit(PG_arch_1, &folio->flags);
folio_put(folio);
return rc;
}

+/*
+ * The present PTE still indirectly holds a folio reference through the mapping.
+ */
+int uv_destroy_pte(pte_t pte)
+{
+ VM_WARN_ON(!pte_present(pte));
+ return uv_destroy_folio(pfn_folio(pte_pfn(pte)));
+}
+
/*
* Requests the Ultravisor to encrypt a guest page and make it
* accessible to the host for paging (export).
@@ -437,6 +445,7 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
{
struct vm_area_struct *vma;
unsigned long uaddr;
+ struct folio *folio;
struct page *page;
int rc;

@@ -460,7 +469,8 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
page = follow_page(vma, uaddr, FOLL_WRITE | FOLL_GET);
if (IS_ERR_OR_NULL(page))
goto out;
- rc = uv_destroy_owned_page(page_to_phys(page));
+ folio = page_folio(page);
+ rc = uv_destroy_folio(folio);
/*
* Fault handlers can race; it is possible that two CPUs will fault
* on the same secure page. One CPU can destroy the page, reboot,
@@ -472,7 +482,7 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
*/
if (rc)
rc = uv_convert_owned_from_secure(page_to_phys(page));
- put_page(page);
+ folio_put(folio);
out:
mmap_read_unlock(gmap->mm);
return rc;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index f2988bbcebbe..797068dccb73 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2841,13 +2841,15 @@ static const struct mm_walk_ops gather_pages_ops = {
*/
void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns)
{
+ struct folio *folio;
unsigned long i;

for (i = 0; i < count; i++) {
+ folio = pfn_folio(pfns[i]);
/* we always have an extra reference */
- uv_destroy_owned_page(pfn_to_phys(pfns[i]));
+ uv_destroy_folio(folio);
/* get rid of the extra reference */
- put_page(pfn_to_page(pfns[i]));
+ folio_put(folio);
cond_resched();
}
}
--
2.45.0


2024-05-08 18:42:38

by David Hildenbrand

Subject: [PATCH v3 09/10] s390/uv: implement HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE

Let's also implement HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE, so we can convert
arch_make_page_accessible() to be a simple wrapper around
arch_make_folio_accessible(). Unfortunately, we cannot do that in the
header.
header.

There are only two arch_make_page_accessible() calls remaining in gup.c.
We can now drop HAVE_ARCH_MAKE_PAGE_ACCESSIBLE completely from core-MM.
We'll handle that separately, once the s390x part has landed.

Suggested-by: Matthew Wilcox <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/include/asm/page.h | 3 +++
arch/s390/kernel/uv.c | 18 +++++++++++-------
arch/s390/mm/fault.c | 14 ++++++++------
3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index ecbf4b626f46..5ec41ec3d761 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -162,6 +162,7 @@ static inline int page_reset_referenced(unsigned long addr)
#define _PAGE_ACC_BITS 0xf0 /* HW access control bits */

struct page;
+struct folio;
void arch_free_page(struct page *page, int order);
void arch_alloc_page(struct page *page, int order);

@@ -174,6 +175,8 @@ static inline int devmem_is_allowed(unsigned long pfn)
#define HAVE_ARCH_ALLOC_PAGE

#if IS_ENABLED(CONFIG_PGSTE)
+int arch_make_folio_accessible(struct folio *folio);
+#define HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
int arch_make_page_accessible(struct page *page);
#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
#endif
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index b456066d72da..fa62fa0e369f 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -498,14 +498,13 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
EXPORT_SYMBOL_GPL(gmap_destroy_page);

/*
- * To be called with the page locked or with an extra reference! This will
- * prevent gmap_make_secure from touching the page concurrently. Having 2
- * parallel make_page_accessible is fine, as the UV calls will become a
- * no-op if the page is already exported.
+ * To be called with the folio locked or with an extra reference! This will
+ * prevent gmap_make_secure from touching the folio concurrently. Having 2
+ * parallel arch_make_folio_accessible is fine, as the UV calls will become a
+ * no-op if the folio is already exported.
*/
-int arch_make_page_accessible(struct page *page)
+int arch_make_folio_accessible(struct folio *folio)
{
- struct folio *folio = page_folio(page);
int rc = 0;

/* See gmap_make_secure(): large folios cannot be secure */
@@ -537,8 +536,13 @@ int arch_make_page_accessible(struct page *page)

return rc;
}
-EXPORT_SYMBOL_GPL(arch_make_page_accessible);
+EXPORT_SYMBOL_GPL(arch_make_folio_accessible);

+int arch_make_page_accessible(struct page *page)
+{
+ return arch_make_folio_accessible(page_folio(page));
+}
+EXPORT_SYMBOL_GPL(arch_make_page_accessible);
#endif

#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) || IS_ENABLED(CONFIG_KVM)
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index c421dd44ffbe..a1ba58460593 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -491,6 +491,7 @@ void do_secure_storage_access(struct pt_regs *regs)
unsigned long addr = get_fault_address(regs);
struct vm_area_struct *vma;
struct mm_struct *mm;
+ struct folio *folio;
struct page *page;
struct gmap *gmap;
int rc;
@@ -538,17 +539,18 @@ void do_secure_storage_access(struct pt_regs *regs)
mmap_read_unlock(mm);
break;
}
- if (arch_make_page_accessible(page))
+ folio = page_folio(page);
+ if (arch_make_folio_accessible(folio))
send_sig(SIGSEGV, current, 0);
- put_page(page);
+ folio_put(folio);
mmap_read_unlock(mm);
break;
case KERNEL_FAULT:
- page = phys_to_page(addr);
- if (unlikely(!try_get_page(page)))
+ folio = phys_to_folio(addr);
+ if (unlikely(!folio_try_get(folio)))
break;
- rc = arch_make_page_accessible(page);
- put_page(page);
+ rc = arch_make_folio_accessible(folio);
+ folio_put(folio);
if (rc)
BUG();
break;
--
2.45.0


2024-05-08 18:47:11

by David Hildenbrand

Subject: [PATCH v3 05/10] s390/uv: update PG_arch_1 comment

We removed the usage of PG_arch_1 for page tables in commit
a51324c430db ("s390/cmma: rework no-dat handling").

Let's update the comment in UV to reflect that.

Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/kernel/uv.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 914dcec27329..ecfc08902215 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -495,13 +495,12 @@ int arch_make_page_accessible(struct page *page)
return 0;

/*
- * PG_arch_1 is used in 3 places:
- * 1. for kernel page tables during early boot
- * 2. for storage keys of huge pages and KVM
- * 3. As an indication that this small folio might be secure. This can
+ * PG_arch_1 is used in 2 places:
+ * 1. for storage keys of hugetlb folios and KVM
+ * 2. As an indication that this small folio might be secure. This can
* overindicate, e.g. we set the bit before calling
* convert_to_secure.
- * As secure pages are never huge, all 3 variants can co-exists.
+ * As secure pages are never large folios, both variants can co-exist.
*/
if (!test_bit(PG_arch_1, &folio->flags))
return 0;
--
2.45.0


2024-05-08 18:48:11

by David Hildenbrand

Subject: [PATCH v3 08/10] s390/uv: convert uv_convert_owned_from_secure() to uv_convert_from_secure_(folio|pte)()

Let's do the same as we did for uv_destroy_(folio|pte)() and
have the following variants:

(1) uv_convert_from_secure(): "low level" helper that operates on paddr
and does not mess with folios.

(2) uv_convert_from_secure_folio(): Consumes a folio to which we hold a
reference.

(3) uv_convert_from_secure_pte(): Consumes a PTE that holds a reference
through the mapping.

Unfortunately we need uv_convert_from_secure_pte(), because pfn_folio()
and friends are not available in pgtable.h.
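
A sketch of a pgtable.h call site (simplified from the
ptep_get_and_clear() hunk below) shows why the pte variant is needed
there:

    res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
    /* At this point the reference through the mapping is still present */
    if (mm_is_protected(mm) && pte_present(res))
            uv_convert_from_secure_pte(res);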

Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/include/asm/pgtable.h | 6 +++---
arch/s390/include/asm/uv.h | 4 ++--
arch/s390/kernel/uv.c | 18 +++++++++++++-----
3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index f65db37917f2..928282bacfc7 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1165,7 +1165,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
/* At this point the reference through the mapping is still present */
if (mm_is_protected(mm) && pte_present(res))
- uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
+ uv_convert_from_secure_pte(res);
return res;
}

@@ -1183,7 +1183,7 @@ static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
/* At this point the reference through the mapping is still present */
if (mm_is_protected(vma->vm_mm) && pte_present(res))
- uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
+ uv_convert_from_secure_pte(res);
return res;
}

@@ -1222,7 +1222,7 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
* if this is not a mm teardown, the slower export is used as
* fallback instead.
*/
- uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
+ uv_convert_from_secure_pte(res);
return res;
}

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index a1bef30066ef..0679445cac0b 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -485,7 +485,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);
int uv_destroy_folio(struct folio *folio);
int uv_destroy_pte(pte_t pte);
-int uv_convert_owned_from_secure(unsigned long paddr);
+int uv_convert_from_secure_pte(pte_t pte);
int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);

void setup_uv(void);
@@ -508,7 +508,7 @@ static inline int uv_destroy_pte(pte_t pte)
return 0;
}

-static inline int uv_convert_owned_from_secure(unsigned long paddr)
+static inline int uv_convert_from_secure_pte(pte_t pte)
{
return 0;
}
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 61c1ce51c883..b456066d72da 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -178,11 +178,10 @@ static int uv_convert_from_secure(unsigned long paddr)
}

/*
- * The caller must already hold a reference to the page
+ * The caller must already hold a reference to the folio.
*/
-int uv_convert_owned_from_secure(unsigned long paddr)
+static int uv_convert_from_secure_folio(struct folio *folio)
{
- struct folio *folio = phys_to_folio(paddr);
int rc;

/* See gmap_make_secure(): large folios cannot be secure */
@@ -190,13 +189,22 @@ int uv_convert_owned_from_secure(unsigned long paddr)
return 0;

folio_get(folio);
- rc = uv_convert_from_secure(paddr);
+ rc = uv_convert_from_secure(folio_to_phys(folio));
if (!rc)
clear_bit(PG_arch_1, &folio->flags);
folio_put(folio);
return rc;
}

+/*
+ * The present PTE still indirectly holds a folio reference through the mapping.
+ */
+int uv_convert_from_secure_pte(pte_t pte)
+{
+ VM_WARN_ON(!pte_present(pte));
+ return uv_convert_from_secure_folio(pfn_folio(pte_pfn(pte)));
+}
+
/*
* Calculate the expected ref_count for a folio that would otherwise have no
* further pins. This was cribbed from similar functions in other places in
@@ -481,7 +489,7 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
* we instead try to export the page.
*/
if (rc)
- rc = uv_convert_owned_from_secure(page_to_phys(page));
+ rc = uv_convert_from_secure_folio(folio);
folio_put(folio);
out:
mmap_read_unlock(gmap->mm);
--
2.45.0


2024-05-08 18:49:09

by David Hildenbrand

Subject: [PATCH v3 10/10] s390/hugetlb: convert PG_arch_1 code to work on folio->flags

Let's make it clearer that we are always working on folio flags and
never page flags of tail pages by converting remaining PG_arch_1 users
that modify page->flags to modify folio->flags instead.

No functional change intended, because we would always have worked with
the head page (where page->flags corresponds to folio->flags) and never
with tail pages.
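
In other words (illustrative sketch): for a hugetlb mapping, the bit
always lands in the head page's flags word, which folio->flags makes
explicit:

    struct folio *folio = page_folio(pmd_page(*pmd));

    /* folio->flags aliases the head page's flags; tail pages stay untouched */
    set_bit(PG_arch_1, &folio->flags);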

Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/gmap.c | 4 ++--
arch/s390/mm/hugetlbpage.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 797068dccb73..7319be707a98 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2733,7 +2733,7 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
{
pmd_t *pmd = (pmd_t *)pte;
unsigned long start, end;
- struct page *page = pmd_page(*pmd);
+ struct folio *folio = page_folio(pmd_page(*pmd));

/*
* The write check makes sure we do not set a key on shared
@@ -2748,7 +2748,7 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
start = pmd_val(*pmd) & HPAGE_MASK;
end = start + HPAGE_SIZE - 1;
__storage_key_init_range(start, end);
- set_bit(PG_arch_1, &page->flags);
+ set_bit(PG_arch_1, &folio->flags);
cond_resched();
return 0;
}
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index c2e8242bd15d..a32047315f9a 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -121,7 +121,7 @@ static inline pte_t __rste_to_pte(unsigned long rste)

static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
{
- struct page *page;
+ struct folio *folio;
unsigned long size, paddr;

if (!mm_uses_skeys(mm) ||
@@ -129,16 +129,16 @@ static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
return;

if ((rste & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R3) {
- page = pud_page(__pud(rste));
+ folio = page_folio(pud_page(__pud(rste)));
size = PUD_SIZE;
paddr = rste & PUD_MASK;
} else {
- page = pmd_page(__pmd(rste));
+ folio = page_folio(pmd_page(__pmd(rste)));
size = PMD_SIZE;
paddr = rste & PMD_MASK;
}

- if (!test_and_set_bit(PG_arch_1, &page->flags))
+ if (!test_and_set_bit(PG_arch_1, &folio->flags))
__storage_key_init_range(paddr, paddr + size - 1);
}

--
2.45.0


2024-05-09 15:05:36

by Heiko Carstens

Subject: Re: [PATCH v3 00/10] s390: PG_arch_1+folio cleanups for uv+hugetlb

On Wed, May 08, 2024 at 08:29:45PM +0200, David Hildenbrand wrote:
> Rebased on s390x/features. Cleanups around PG_arch_1 and folio handling
> in UV and hugetlb code.
>
> One "easy" fix upfront. Another issue I spotted is documented in [1].
>
> Once this hits upstream, we can remove HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> from core-mm and s390x, so only the folio variant will remain.
>
> Compile tested, but not runtime tested with UV; I'd appreciate some
> testing help from people with UV access and experience.
>
> [1] https://lkml.kernel.org/r/[email protected]
>
> v2 -> v3:
> * "s390/uv: split large folios in gmap_make_secure()"
> -> Spelling fix
> * "s390/hugetlb: convert PG_arch_1 code to work on folio->flags"
> -> Extended patch description

Added Claudio's Reviewed-by from v2 to the third patch, and fixed a
typo in the commit message of patch 9.

Applied, thanks!

2024-05-09 17:27:44

by David Hildenbrand

Subject: Re: [PATCH v3 00/10] s390: PG_arch_1+folio cleanups for uv+hugetlb

On 09.05.24 17:04, Heiko Carstens wrote:
> On Wed, May 08, 2024 at 08:29:45PM +0200, David Hildenbrand wrote:
>> Rebased on s390x/features. Cleanups around PG_arch_1 and folio handling
>> in UV and hugetlb code.
>>
>> One "easy" fix upfront. Another issue I spotted is documented in [1].
>>
>> Once this hits upstream, we can remove HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>> from core-mm and s390x, so only the folio variant will remain.
>>
>> Compile tested, but not runtime tested with UV; I'd appreciate some
>> testing help from people with UV access and experience.
>>
>> [1] https://lkml.kernel.org/r/[email protected]
>>
>> v2 -> v3:
>> * "s390/uv: split large folios in gmap_make_secure()"
>> -> Spelling fix
>> * "s390/hugetlb: convert PG_arch_1 code to work on folio->flags"
>> -> Extended patch description
>
> Added Claudio's Reviewed-by from v2 to the third patch, and fixed a
> typo in the commit message of patch 9.

Ah, I missed one RB, thanks!

--
Cheers,

David / dhildenb