2024-04-04 16:38:10

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1

On my journey to remove page_mapcount(), I got hooked up on other folio
cleanups that Willy most certainly will enjoy.

This series removes the s390x usage of:
* page_mapcount() [patches WIP]
* page_has_private() [have patches to remove it]

.. and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
pages of large folios).

Further, one "easy" fix upfront.

.. unfortunately there is one other issue I spotted that I am not
tackling in this series, because I am not 100% sure what we want to
do: the usage of page_ref_freeze()/folio_ref_freeze() in
make_folio_secure() is unsafe. :(

In make_folio_secure(), we're holding the folio lock, the mmap lock and
the PT lock. So we are protected against concurrent fork(), zap, GUP,
swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
also block concurrent GUP-fast very reliably.

But if the folio is mapped into multiple page tables, we could see
concurrent zapping of the folio, a pagecache folios could get mapped/
accessed concurrent, we could see fork() sharing the page in another
process, GUP ... trying to adjust the folio refcount while we froze it.
Very bad.

For anonymous folios, it would likely be sufficient to check that
folio_mapcount() == 1. For pagecache folios, that's insufficient, likely
we would have to lock the pagecache. To handle folios mapped into
multiple page tables, we would have to do what
split_huge_page_to_list_to_order() does (temporary migration entries).

So it's a bit more involved, and I'll have to leave that to s390x folks to
figure out. There are othe reasonable cleanups I think, but I'll have to
focus on other stuff.

Compile tested, but not runtime tested, I'll appreiate some testing help
from people with UV access and experience.

Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Thomas Huth <[email protected]>

David Hildenbrand (5):
s390/uv: don't call wait_on_page_writeback() without a reference
s390/uv: convert gmap_make_secure() to work on folios
s390/uv: convert PG_arch_1 users to only work on small folios
s390/uv: update PG_arch_1 comment
s390/hugetlb: convert PG_arch_1 code to work on folio->flags

arch/s390/include/asm/page.h | 2 +
arch/s390/kernel/uv.c | 112 ++++++++++++++++++++++-------------
arch/s390/mm/gmap.c | 4 +-
arch/s390/mm/hugetlbpage.c | 8 +--
4 files changed, 79 insertions(+), 47 deletions(-)

--
2.44.0



2024-04-04 16:38:15

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference

wait_on_page_writeback() requires that no spinlocks are held and that
a page reference is held, as documented for folio_wait_writeback(). After
we dropped the PTL, the page could get freed concurrently. So grab a
temporary reference.

Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/kernel/uv.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index fc07bc39e698..7401838b960b 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -314,6 +314,13 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
rc = make_page_secure(page, uvcb);
unlock_page(page);
}
+
+ /*
+ * Once we drop the PTL, the page may get unmapped and
+ * freed immediately. We need a temporary reference.
+ */
+ if (rc == -EAGAIN)
+ get_page(page);
}
pte_unmap_unlock(ptep, ptelock);
out:
@@ -325,6 +332,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
* completion, this is just a useless check, but it is safe.
*/
wait_on_page_writeback(page);
+ put_page(page);
} else if (rc == -EBUSY) {
/*
* If we have tried a local drain and the page refcount
--
2.44.0


2024-04-04 16:38:35

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios

We have various goals that require gmap_make_secure() to only work on
folios. We want to limit the use of page_mapcount() to the places where it
is absolutely necessary, we want to avoid using page flags of tail
pages, and we want to remove page_has_private().

So, let's convert gmap_make_secure() to folios. While s390x makes sure
to never have PMD-mapped THP in processes that use KVM -- by remapping
them using PTEs in thp_split_walk_pmd_entry()->split_huge_pmd() -- we might
still find PTE-mapped THPs and could end up working on tail pages of
such large folios for now.

To handle that cleanly, let's simply split any PTE-mapped large folio,
so we can be sure that we are always working with small folios and never
on tail pages.

There is no real change: splitting will similarly fail on unexpected folio
references, just like it would already when we try to freeze the folio
refcount.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/include/asm/page.h | 1 +
arch/s390/kernel/uv.c | 66 ++++++++++++++++++++++--------------
2 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 9381879f7ecf..54d015bcd8e3 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -215,6 +215,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)

#define phys_to_page(phys) pfn_to_page(phys_to_pfn(phys))
#define page_to_phys(page) pfn_to_phys(page_to_pfn(page))
+#define folio_to_phys(page) pfn_to_phys(folio_pfn(folio))

static inline void *pfn_to_virt(unsigned long pfn)
{
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 7401838b960b..adcbd4b13035 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -181,36 +181,36 @@ int uv_convert_owned_from_secure(unsigned long paddr)
}

/*
- * Calculate the expected ref_count for a page that would otherwise have no
+ * Calculate the expected ref_count for a folio that would otherwise have no
* further pins. This was cribbed from similar functions in other places in
* the kernel, but with some slight modifications. We know that a secure
- * page can not be a huge page for example.
+ * folio can only be a small folio for example.
*/
-static int expected_page_refs(struct page *page)
+static int expected_folio_refs(struct folio *folio)
{
int res;

- res = page_mapcount(page);
- if (PageSwapCache(page)) {
+ res = folio_mapcount(folio);
+ if (folio_test_swapcache(folio)) {
res++;
- } else if (page_mapping(page)) {
+ } else if (folio_mapping(folio)) {
res++;
- if (page_has_private(page))
+ if (folio_has_private(folio))
res++;
}
return res;
}

-static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
+static int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
{
int expected, cc = 0;

- if (PageWriteback(page))
+ if (folio_test_writeback(folio))
return -EAGAIN;
- expected = expected_page_refs(page);
- if (!page_ref_freeze(page, expected))
+ expected = expected_folio_refs(folio);
+ if (!folio_ref_freeze(folio, expected))
return -EBUSY;
- set_bit(PG_arch_1, &page->flags);
+ set_bit(PG_arch_1, &folio->flags);
/*
* If the UVC does not succeed or fail immediately, we don't want to
* loop for long, or we might get stall notifications.
@@ -220,9 +220,9 @@ static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
* -EAGAIN and we let the callers deal with it.
*/
cc = __uv_call(0, (u64)uvcb);
- page_ref_unfreeze(page, expected);
+ folio_ref_unfreeze(folio, expected);
/*
- * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
+ * Return -ENXIO if the folio was not mapped, -EINVAL for other errors.
* If busy or partially completed, return -EAGAIN.
*/
if (cc == UVC_CC_OK)
@@ -277,7 +277,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
bool local_drain = false;
spinlock_t *ptelock;
unsigned long uaddr;
- struct page *page;
+ struct folio *folio;
pte_t *ptep;
int rc;

@@ -306,33 +306,49 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
if (!ptep)
goto out;
if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
- page = pte_page(*ptep);
+ folio = page_folio(pte_page(*ptep));
rc = -EAGAIN;
- if (trylock_page(page)) {
+
+ /* We might get PTE-mapped large folios; split them first. */
+ if (folio_test_large(folio)) {
+ rc = -E2BIG;
+ } else if (folio_trylock(folio)) {
if (should_export_before_import(uvcb, gmap->mm))
- uv_convert_from_secure(page_to_phys(page));
- rc = make_page_secure(page, uvcb);
- unlock_page(page);
+ uv_convert_from_secure(folio_to_phys(folio));
+ rc = make_folio_secure(folio, uvcb);
+ folio_unlock(folio);
}

/*
- * Once we drop the PTL, the page may get unmapped and
+ * Once we drop the PTL, the folio may get unmapped and
* freed immediately. We need a temporary reference.
*/
- if (rc == -EAGAIN)
- get_page(page);
+ if (rc == -EAGAIN || rc == -E2BIG)
+ folio_get(folio);
}
pte_unmap_unlock(ptep, ptelock);
out:
mmap_read_unlock(gmap->mm);

+ if (rc == -E2BIG) {
+ /*
+ * Splitting might fail with -EBUSY due to unexpected folio
+ * references, just like make_folio_secure(). So handle it
+ * ahead of time without the PTL being held.
+ */
+ folio_lock(folio);
+ rc = split_folio(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
if (rc == -EAGAIN) {
/*
* If we are here because the UVC returned busy or partial
* completion, this is just a useless check, but it is safe.
*/
- wait_on_page_writeback(page);
- put_page(page);
+ folio_wait_writeback(folio);
+ folio_put(folio);
} else if (rc == -EBUSY) {
/*
* If we have tried a local drain and the page refcount
--
2.44.0


2024-04-04 16:38:54

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios

Now that make_folio_secure() may only set PG_arch_1 for small folios,
let's convert relevant remaining UV code to only work on (small) folios
and simply reject large folios early. This way, we'll never end up
touching PG_arch_1 on tail pages of a large folio in UV code.

The folio_get()/folio_put() for functions that are documented to already
hold a folio reference look weird and it should probably be removed.
Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
should really consume a folio reference instead. But these are cleanups for
another day.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/include/asm/page.h | 1 +
arch/s390/kernel/uv.c | 39 +++++++++++++++++++++---------------
2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 54d015bcd8e3..b64384872c0f 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -214,6 +214,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
#define pfn_to_phys(pfn) ((pfn) << PAGE_SHIFT)

#define phys_to_page(phys) pfn_to_page(phys_to_pfn(phys))
+#define phys_to_folio(phys) page_folio(phys_to_page(phys))
#define page_to_phys(page) pfn_to_phys(page_to_pfn(page))
#define folio_to_phys(page) pfn_to_phys(folio_pfn(folio))

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index adcbd4b13035..9c0113b26735 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -134,14 +134,17 @@ static int uv_destroy_page(unsigned long paddr)
*/
int uv_destroy_owned_page(unsigned long paddr)
{
- struct page *page = phys_to_page(paddr);
+ struct folio *folio = phys_to_folio(paddr);
int rc;

- get_page(page);
+ if (unlikely(folio_test_large(folio)))
+ return 0;
+
+ folio_get(folio);
rc = uv_destroy_page(paddr);
if (!rc)
- clear_bit(PG_arch_1, &page->flags);
- put_page(page);
+ clear_bit(PG_arch_1, &folio->flags);
+ folio_put(folio);
return rc;
}

@@ -169,14 +172,17 @@ int uv_convert_from_secure(unsigned long paddr)
*/
int uv_convert_owned_from_secure(unsigned long paddr)
{
- struct page *page = phys_to_page(paddr);
+ struct folio *folio = phys_to_folio(paddr);
int rc;

- get_page(page);
+ if (unlikely(folio_test_large(folio)))
+ return 0;
+
+ folio_get(folio);
rc = uv_convert_from_secure(paddr);
if (!rc)
- clear_bit(PG_arch_1, &page->flags);
- put_page(page);
+ clear_bit(PG_arch_1, &folio->flags);
+ folio_put(folio);
return rc;
}

@@ -457,33 +463,34 @@ EXPORT_SYMBOL_GPL(gmap_destroy_page);
*/
int arch_make_page_accessible(struct page *page)
{
+ struct folio *folio = page_folio(page);
int rc = 0;

- /* Hugepage cannot be protected, so nothing to do */
- if (PageHuge(page))
+ /* Large folios cannot be protected, so nothing to do */
+ if (unlikely(folio_test_large(folio)))
return 0;

/*
* PG_arch_1 is used in 3 places:
* 1. for kernel page tables during early boot
* 2. for storage keys of huge pages and KVM
- * 3. As an indication that this page might be secure. This can
+ * 3. As an indication that this small folio might be secure. This can
* overindicate, e.g. we set the bit before calling
* convert_to_secure.
* As secure pages are never huge, all 3 variants can co-exists.
*/
- if (!test_bit(PG_arch_1, &page->flags))
+ if (!test_bit(PG_arch_1, &folio->flags))
return 0;

- rc = uv_pin_shared(page_to_phys(page));
+ rc = uv_pin_shared(folio_to_phys(folio));
if (!rc) {
- clear_bit(PG_arch_1, &page->flags);
+ clear_bit(PG_arch_1, &folio->flags);
return 0;
}

- rc = uv_convert_from_secure(page_to_phys(page));
+ rc = uv_convert_from_secure(folio_to_phys(folio));
if (!rc) {
- clear_bit(PG_arch_1, &page->flags);
+ clear_bit(PG_arch_1, &folio->flags);
return 0;
}

--
2.44.0


2024-04-04 16:39:17

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 4/5] s390/uv: update PG_arch_1 comment

We removed the usage of PG_arch_1 for page tables in commit
a51324c430db ("s390/cmma: rework no-dat handling").

Let's update the comment in UV to reflect that.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/kernel/uv.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 9c0113b26735..76fc61333fae 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -471,13 +471,12 @@ int arch_make_page_accessible(struct page *page)
return 0;

/*
- * PG_arch_1 is used in 3 places:
- * 1. for kernel page tables during early boot
- * 2. for storage keys of huge pages and KVM
- * 3. As an indication that this small folio might be secure. This can
+ * PG_arch_1 is used in 2 places:
+ * 1. for storage keys of hugetlb folios and KVM
+ * 2. As an indication that this small folio might be secure. This can
* overindicate, e.g. we set the bit before calling
* convert_to_secure.
- * As secure pages are never huge, all 3 variants can co-exists.
+ * As secure pages are never large folios, both variants can co-exists.
*/
if (!test_bit(PG_arch_1, &folio->flags))
return 0;
--
2.44.0


2024-04-04 16:39:32

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 5/5] s390/hugetlb: convert PG_arch_1 code to work on folio->flags

Let's make it clearer that we are always working on folio flags and
never page flags of tail pages.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/gmap.c | 4 ++--
arch/s390/mm/hugetlbpage.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 9233b0acac89..ca31f2143bc0 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2731,7 +2731,7 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
{
pmd_t *pmd = (pmd_t *)pte;
unsigned long start, end;
- struct page *page = pmd_page(*pmd);
+ struct folio *folio = pmd_folio(*pmd);

/*
* The write check makes sure we do not set a key on shared
@@ -2746,7 +2746,7 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
start = pmd_val(*pmd) & HPAGE_MASK;
end = start + HPAGE_SIZE - 1;
__storage_key_init_range(start, end);
- set_bit(PG_arch_1, &page->flags);
+ set_bit(PG_arch_1, &folio->flags);
cond_resched();
return 0;
}
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index e1e63dc1b23d..21ed6ac5f1c5 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -121,7 +121,7 @@ static inline pte_t __rste_to_pte(unsigned long rste)

static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
{
- struct page *page;
+ struct folio *folio;
unsigned long size, paddr;

if (!mm_uses_skeys(mm) ||
@@ -129,16 +129,16 @@ static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
return;

if ((rste & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R3) {
- page = pud_page(__pud(rste));
+ folio = page_folio(pud_page(__pud(rste)));
size = PUD_SIZE;
paddr = rste & PUD_MASK;
} else {
- page = pmd_page(__pmd(rste));
+ folio = pmd_folio(__pmd(rste));
size = PMD_SIZE;
paddr = rste & PMD_MASK;
}

- if (!test_and_set_bit(PG_arch_1, &page->flags))
+ if (!test_and_set_bit(PG_arch_1, &folio->flags))
__storage_key_init_range(paddr, paddr + size - 1);
}

--
2.44.0


2024-04-05 06:38:16

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios

On Thu, Apr 04, 2024 at 06:36:39PM +0200, David Hildenbrand wrote:
> + /* We might get PTE-mapped large folios; split them first. */
> + if (folio_test_large(folio)) {
> + rc = -E2BIG;

We agree to this point. I just turned this into -EINVAL.

>
> + if (rc == -E2BIG) {
> + /*
> + * Splitting might fail with -EBUSY due to unexpected folio
> + * references, just like make_folio_secure(). So handle it
> + * ahead of time without the PTL being held.
> + */
> + folio_lock(folio);
> + rc = split_folio(folio);
> + folio_unlock(folio);
> + folio_put(folio);
> + }

Ummm ... if split_folio() succeeds, aren't we going to return 0 from
this function, which will be interpreted as make_folio_secure() having
succeeded?


2024-04-05 07:09:52

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios

On 05.04.24 05:29, Matthew Wilcox wrote:
> On Thu, Apr 04, 2024 at 06:36:39PM +0200, David Hildenbrand wrote:
>> + /* We might get PTE-mapped large folios; split them first. */
>> + if (folio_test_large(folio)) {
>> + rc = -E2BIG;
>
> We agree to this point. I just turned this into -EINVAL.
>
>>
>> + if (rc == -E2BIG) {
>> + /*
>> + * Splitting might fail with -EBUSY due to unexpected folio
>> + * references, just like make_folio_secure(). So handle it
>> + * ahead of time without the PTL being held.
>> + */
>> + folio_lock(folio);
>> + rc = split_folio(folio);
>> + folio_unlock(folio);
>> + folio_put(folio);
>> + }
>
> Ummm ... if split_folio() succeeds, aren't we going to return 0 from
> this function, which will be interpreted as make_folio_secure() having
> succeeded?

I assume the code would have to handle that, because it must deal with
possible races that would try to convert the folio page.

But the right thing to do is

if (!rc)
goto again;

after the put.

--
Cheers,

David / dhildenb


2024-04-05 07:13:57

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios

On Thu, Apr 04, 2024 at 06:36:40PM +0200, David Hildenbrand wrote:
> Now that make_folio_secure() may only set PG_arch_1 for small folios,
> let's convert relevant remaining UV code to only work on (small) folios
> and simply reject large folios early. This way, we'll never end up
> touching PG_arch_1 on tail pages of a large folio in UV code.
>
> The folio_get()/folio_put() for functions that are documented to already
> hold a folio reference look weird and it should probably be removed.
> Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
> should really consume a folio reference instead. But these are cleanups for
> another day.

Yes, and we should convert arch_make_page_accessible() to
arch_make_folio_accessible() ... one of the two callers already has the
folio (and page-writeback already calls arch_make_folio_accessible()

2024-04-05 07:18:18

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1

On 05.04.24 05:42, Matthew Wilcox wrote:
> On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote:
>> On my journey to remove page_mapcount(), I got hooked up on other folio
>> cleanups that Willy most certainly will enjoy.
>>
>> This series removes the s390x usage of:
>> * page_mapcount() [patches WIP]
>> * page_has_private() [have patches to remove it]
>>
>> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
>> pages of large folios).
>>
>> Further, one "easy" fix upfront.
>
> Looks like you didn't see:
>
> https://lore.kernel.org/linux-s390/[email protected]/

Yes, I only skimmed linux-mm.

I think s390x certainly wants to handle PTE-mapped THP in that code, I
think there are ways to trigger that, we're just mostly lucky that it
doesn't happen in the common case.

But thinking about it, the current page_mapcount() based check could not
possibly have worked for them and rejected any PTE-mapped THP.

So I can just base my changes on top of yours (we might want to get the
first fix in ahead of time).

>
>> ... unfortunately there is one other issue I spotted that I am not
>> tackling in this series, because I am not 100% sure what we want to
>> do: the usage of page_ref_freeze()/folio_ref_freeze() in
>> make_folio_secure() is unsafe. :(
>>
>> In make_folio_secure(), we're holding the folio lock, the mmap lock and
>> the PT lock. So we are protected against concurrent fork(), zap, GUP,
>> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
>> also block concurrent GUP-fast very reliably.
>>
>> But if the folio is mapped into multiple page tables, we could see
>> concurrent zapping of the folio, a pagecache folios could get mapped/
>> accessed concurrent, we could see fork() sharing the page in another
>> process, GUP ... trying to adjust the folio refcount while we froze it.
>> Very bad.
>
> Hmmm. Why is that not then a problem for, eg, splitting or migrating?
> Is it because they unmap first and then try to freeze?

Yes, exactly. Using mapcount in combination with ref freezing is
problematic. Except maybe for anonymous folios with mapcount=1, while
holding a bunch of locks to stop anybody from stumbling over that.

--
Cheers,

David / dhildenb


2024-04-05 08:43:17

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1

On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote:
> On my journey to remove page_mapcount(), I got hooked up on other folio
> cleanups that Willy most certainly will enjoy.
>
> This series removes the s390x usage of:
> * page_mapcount() [patches WIP]
> * page_has_private() [have patches to remove it]
>
> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
> pages of large folios).
>
> Further, one "easy" fix upfront.

Looks like you didn't see:

https://lore.kernel.org/linux-s390/[email protected]/

> ... unfortunately there is one other issue I spotted that I am not
> tackling in this series, because I am not 100% sure what we want to
> do: the usage of page_ref_freeze()/folio_ref_freeze() in
> make_folio_secure() is unsafe. :(
>
> In make_folio_secure(), we're holding the folio lock, the mmap lock and
> the PT lock. So we are protected against concurrent fork(), zap, GUP,
> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
> also block concurrent GUP-fast very reliably.
>
> But if the folio is mapped into multiple page tables, we could see
> concurrent zapping of the folio, a pagecache folios could get mapped/
> accessed concurrent, we could see fork() sharing the page in another
> process, GUP ... trying to adjust the folio refcount while we froze it.
> Very bad.

Hmmm. Why is that not then a problem for, eg, splitting or migrating?
Is it because they unmap first and then try to freeze?

> For anonymous folios, it would likely be sufficient to check that
> folio_mapcount() == 1. For pagecache folios, that's insufficient, likely
> we would have to lock the pagecache. To handle folios mapped into
> multiple page tables, we would have to do what
> split_huge_page_to_list_to_order() does (temporary migration entries).
>
> So it's a bit more involved, and I'll have to leave that to s390x folks to
> figure out. There are othe reasonable cleanups I think, but I'll have to
> focus on other stuff.
>
> Compile tested, but not runtime tested, I'll appreiate some testing help
> from people with UV access and experience.
>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Heiko Carstens <[email protected]>
> Cc: Vasily Gorbik <[email protected]>
> Cc: Alexander Gordeev <[email protected]>
> Cc: Christian Borntraeger <[email protected]>
> Cc: Sven Schnelle <[email protected]>
> Cc: Janosch Frank <[email protected]>
> Cc: Claudio Imbrenda <[email protected]>
> Cc: Gerald Schaefer <[email protected]>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Thomas Huth <[email protected]>
>
> David Hildenbrand (5):
> s390/uv: don't call wait_on_page_writeback() without a reference
> s390/uv: convert gmap_make_secure() to work on folios
> s390/uv: convert PG_arch_1 users to only work on small folios
> s390/uv: update PG_arch_1 comment
> s390/hugetlb: convert PG_arch_1 code to work on folio->flags
>
> arch/s390/include/asm/page.h | 2 +
> arch/s390/kernel/uv.c | 112 ++++++++++++++++++++++-------------
> arch/s390/mm/gmap.c | 4 +-
> arch/s390/mm/hugetlbpage.c | 8 +--
> 4 files changed, 79 insertions(+), 47 deletions(-)
>
> --
> 2.44.0
>

2024-04-10 17:45:08

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference

On Thu, 4 Apr 2024 18:36:38 +0200
David Hildenbrand <[email protected]> wrote:

> wait_on_page_writeback() requires that no spinlocks are held and that
> a page reference is held, as documented for folio_wait_writeback(). After

oops

> we dropped the PTL, the page could get freed concurrently. So grab a
> temporary reference.
>
> Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests")
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Claudio Imbrenda <[email protected]>

> ---
> arch/s390/kernel/uv.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index fc07bc39e698..7401838b960b 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -314,6 +314,13 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> rc = make_page_secure(page, uvcb);
> unlock_page(page);
> }
> +
> + /*
> + * Once we drop the PTL, the page may get unmapped and
> + * freed immediately. We need a temporary reference.
> + */
> + if (rc == -EAGAIN)
> + get_page(page);
> }
> pte_unmap_unlock(ptep, ptelock);
> out:
> @@ -325,6 +332,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> * completion, this is just a useless check, but it is safe.
> */
> wait_on_page_writeback(page);
> + put_page(page);
> } else if (rc == -EBUSY) {
> /*
> * If we have tried a local drain and the page refcount


2024-04-10 17:45:40

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios

On Thu, 4 Apr 2024 18:36:39 +0200
David Hildenbrand <[email protected]> wrote:

> We have various goals that require gmap_make_secure() to only work on
> folios. We want to limit the use of page_mapcount() to the places where it
> is absolutely necessary, we want to avoid using page flags of tail
> pages, and we want to remove page_has_private().
>
> So, let's convert gmap_make_secure() to folios. While s390x makes sure
> to never have PMD-mapped THP in processes that use KVM -- by remapping
> them using PTEs in thp_split_walk_pmd_entry()->split_huge_pmd() -- we might
> still find PTE-mapped THPs and could end up working on tail pages of
> such large folios for now.
>
> To handle that cleanly, let's simply split any PTE-mapped large folio,
> so we can be sure that we are always working with small folios and never
> on tail pages.
>
> There is no real change: splitting will similarly fail on unexpected folio
> references, just like it would already when we try to freeze the folio
> refcount.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> arch/s390/include/asm/page.h | 1 +
> arch/s390/kernel/uv.c | 66 ++++++++++++++++++++++--------------
> 2 files changed, 42 insertions(+), 25 deletions(-)
>
> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
> index 9381879f7ecf..54d015bcd8e3 100644
> --- a/arch/s390/include/asm/page.h
> +++ b/arch/s390/include/asm/page.h
> @@ -215,6 +215,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
>
> #define phys_to_page(phys) pfn_to_page(phys_to_pfn(phys))
> #define page_to_phys(page) pfn_to_phys(page_to_pfn(page))
> +#define folio_to_phys(page) pfn_to_phys(folio_pfn(folio))
>
> static inline void *pfn_to_virt(unsigned long pfn)
> {
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 7401838b960b..adcbd4b13035 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -181,36 +181,36 @@ int uv_convert_owned_from_secure(unsigned long paddr)
> }
>
> /*
> - * Calculate the expected ref_count for a page that would otherwise have no
> + * Calculate the expected ref_count for a folio that would otherwise have no
> * further pins. This was cribbed from similar functions in other places in
> * the kernel, but with some slight modifications. We know that a secure
> - * page can not be a huge page for example.
> + * folio can only be a small folio for example.
> */
> -static int expected_page_refs(struct page *page)
> +static int expected_folio_refs(struct folio *folio)
> {
> int res;
>
> - res = page_mapcount(page);
> - if (PageSwapCache(page)) {
> + res = folio_mapcount(folio);
> + if (folio_test_swapcache(folio)) {
> res++;
> - } else if (page_mapping(page)) {
> + } else if (folio_mapping(folio)) {
> res++;
> - if (page_has_private(page))
> + if (folio_has_private(folio))
> res++;
> }
> return res;
> }
>
> -static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
> +static int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
> {
> int expected, cc = 0;
>
> - if (PageWriteback(page))
> + if (folio_test_writeback(folio))
> return -EAGAIN;
> - expected = expected_page_refs(page);
> - if (!page_ref_freeze(page, expected))
> + expected = expected_folio_refs(folio);
> + if (!folio_ref_freeze(folio, expected))
> return -EBUSY;
> - set_bit(PG_arch_1, &page->flags);
> + set_bit(PG_arch_1, &folio->flags);
> /*
> * If the UVC does not succeed or fail immediately, we don't want to
> * loop for long, or we might get stall notifications.
> @@ -220,9 +220,9 @@ static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
> * -EAGAIN and we let the callers deal with it.
> */
> cc = __uv_call(0, (u64)uvcb);
> - page_ref_unfreeze(page, expected);
> + folio_ref_unfreeze(folio, expected);
> /*
> - * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
> + * Return -ENXIO if the folio was not mapped, -EINVAL for other errors.
> * If busy or partially completed, return -EAGAIN.
> */
> if (cc == UVC_CC_OK)
> @@ -277,7 +277,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> bool local_drain = false;
> spinlock_t *ptelock;
> unsigned long uaddr;
> - struct page *page;
> + struct folio *folio;
> pte_t *ptep;
> int rc;
>
> @@ -306,33 +306,49 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> if (!ptep)
> goto out;
> if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
> - page = pte_page(*ptep);
> + folio = page_folio(pte_page(*ptep));
> rc = -EAGAIN;
> - if (trylock_page(page)) {
> +
> + /* We might get PTE-mapped large folios; split them first. */
> + if (folio_test_large(folio)) {
> + rc = -E2BIG;
> + } else if (folio_trylock(folio)) {
> if (should_export_before_import(uvcb, gmap->mm))
> - uv_convert_from_secure(page_to_phys(page));
> - rc = make_page_secure(page, uvcb);
> - unlock_page(page);
> + uv_convert_from_secure(folio_to_phys(folio));
> + rc = make_folio_secure(folio, uvcb);
> + folio_unlock(folio);
> }
>
> /*
> - * Once we drop the PTL, the page may get unmapped and
> + * Once we drop the PTL, the folio may get unmapped and
> * freed immediately. We need a temporary reference.
> */
> - if (rc == -EAGAIN)
> - get_page(page);
> + if (rc == -EAGAIN || rc == -E2BIG)
> + folio_get(folio);
> }
> pte_unmap_unlock(ptep, ptelock);
> out:
> mmap_read_unlock(gmap->mm);
>
> + if (rc == -E2BIG) {
> + /*
> + * Splitting might fail with -EBUSY due to unexpected folio
> + * references, just like make_folio_secure(). So handle it
> + * ahead of time without the PTL being held.
> + */
> + folio_lock(folio);
> + rc = split_folio(folio);

if split_folio returns -EAGAIN...

> + folio_unlock(folio);
> + folio_put(folio);
> + }
> +
> if (rc == -EAGAIN) {

.. we will not skip this ...

> /*
> * If we are here because the UVC returned busy or partial
> * completion, this is just a useless check, but it is safe.
> */
> - wait_on_page_writeback(page);
> - put_page(page);
> + folio_wait_writeback(folio);
> + folio_put(folio);

.. and we will do one folio_put() too many

> } else if (rc == -EBUSY) {
> /*
> * If we have tried a local drain and the page refcount

are we sure that split_folio() can never return -EAGAIN now and in the
future too?

maybe just change it to } else if (... ?


2024-04-10 17:46:20

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v1 4/5] s390/uv: update PG_arch_1 comment

On Thu, 4 Apr 2024 18:36:41 +0200
David Hildenbrand <[email protected]> wrote:

> We removed the usage of PG_arch_1 for page tables in commit
> a51324c430db ("s390/cmma: rework no-dat handling").
>
> Let's update the comment in UV to reflect that.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Claudio Imbrenda <[email protected]>

> ---
> arch/s390/kernel/uv.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 9c0113b26735..76fc61333fae 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -471,13 +471,12 @@ int arch_make_page_accessible(struct page *page)
> return 0;
>
> /*
> - * PG_arch_1 is used in 3 places:
> - * 1. for kernel page tables during early boot
> - * 2. for storage keys of huge pages and KVM
> - * 3. As an indication that this small folio might be secure. This can
> + * PG_arch_1 is used in 2 places:
> + * 1. for storage keys of hugetlb folios and KVM
> + * 2. As an indication that this small folio might be secure. This can
> * overindicate, e.g. we set the bit before calling
> * convert_to_secure.
> - * As secure pages are never huge, all 3 variants can co-exists.
> + * As secure pages are never large folios, both variants can co-exists.
> */
> if (!test_bit(PG_arch_1, &folio->flags))
> return 0;


2024-04-10 17:47:05

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios

On Thu, 4 Apr 2024 18:36:40 +0200
David Hildenbrand <[email protected]> wrote:

> Now that make_folio_secure() may only set PG_arch_1 for small folios,
> let's convert relevant remaining UV code to only work on (small) folios
> and simply reject large folios early. This way, we'll never end up
> touching PG_arch_1 on tail pages of a large folio in UV code.
>
> The folio_get()/folio_put() for functions that are documented to already
> hold a folio reference look weird and it should probably be removed.
> Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
> should really consume a folio reference instead. But these are cleanups for
> another day.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> arch/s390/include/asm/page.h | 1 +
> arch/s390/kernel/uv.c | 39 +++++++++++++++++++++---------------
> 2 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
> index 54d015bcd8e3..b64384872c0f 100644
> --- a/arch/s390/include/asm/page.h
> +++ b/arch/s390/include/asm/page.h
> @@ -214,6 +214,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
> #define pfn_to_phys(pfn) ((pfn) << PAGE_SHIFT)
>
> #define phys_to_page(phys) pfn_to_page(phys_to_pfn(phys))
> +#define phys_to_folio(phys) page_folio(phys_to_page(phys))
> #define page_to_phys(page) pfn_to_phys(page_to_pfn(page))
> #define folio_to_phys(page) pfn_to_phys(folio_pfn(folio))
>
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index adcbd4b13035..9c0113b26735 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -134,14 +134,17 @@ static int uv_destroy_page(unsigned long paddr)
> */
> int uv_destroy_owned_page(unsigned long paddr)
> {
> - struct page *page = phys_to_page(paddr);
> + struct folio *folio = phys_to_folio(paddr);
> int rc;
>
> - get_page(page);
> + if (unlikely(folio_test_large(folio)))
> + return 0;

please add a comment here to explain why it's ok to just return 0
here...

> +
> + folio_get(folio);
> rc = uv_destroy_page(paddr);
> if (!rc)
> - clear_bit(PG_arch_1, &page->flags);
> - put_page(page);
> + clear_bit(PG_arch_1, &folio->flags);
> + folio_put(folio);
> return rc;
> }
>
> @@ -169,14 +172,17 @@ int uv_convert_from_secure(unsigned long paddr)
> */
> int uv_convert_owned_from_secure(unsigned long paddr)
> {
> - struct page *page = phys_to_page(paddr);
> + struct folio *folio = phys_to_folio(paddr);
> int rc;
>
> - get_page(page);
> + if (unlikely(folio_test_large(folio)))
> + return 0;

.. and here

> +
> + folio_get(folio);
> rc = uv_convert_from_secure(paddr);
> if (!rc)
> - clear_bit(PG_arch_1, &page->flags);
> - put_page(page);
> + clear_bit(PG_arch_1, &folio->flags);
> + folio_put(folio);
> return rc;
> }
>
> @@ -457,33 +463,34 @@ EXPORT_SYMBOL_GPL(gmap_destroy_page);
> */
> int arch_make_page_accessible(struct page *page)
> {
> + struct folio *folio = page_folio(page);
> int rc = 0;
>
> - /* Hugepage cannot be protected, so nothing to do */
> - if (PageHuge(page))
> + /* Large folios cannot be protected, so nothing to do */
> + if (unlikely(folio_test_large(folio)))
> return 0;
>
> /*
> * PG_arch_1 is used in 3 places:
> * 1. for kernel page tables during early boot
> * 2. for storage keys of huge pages and KVM
> - * 3. As an indication that this page might be secure. This can
> + * 3. As an indication that this small folio might be secure. This can
> * overindicate, e.g. we set the bit before calling
> * convert_to_secure.
> * As secure pages are never huge, all 3 variants can co-exists.
> */
> - if (!test_bit(PG_arch_1, &page->flags))
> + if (!test_bit(PG_arch_1, &folio->flags))
> return 0;
>
> - rc = uv_pin_shared(page_to_phys(page));
> + rc = uv_pin_shared(folio_to_phys(folio));
> if (!rc) {
> - clear_bit(PG_arch_1, &page->flags);
> + clear_bit(PG_arch_1, &folio->flags);
> return 0;
> }
>
> - rc = uv_convert_from_secure(page_to_phys(page));
> + rc = uv_convert_from_secure(folio_to_phys(folio));
> if (!rc) {
> - clear_bit(PG_arch_1, &page->flags);
> + clear_bit(PG_arch_1, &folio->flags);
> return 0;
> }
>


2024-04-10 17:48:02

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios

On Fri, 5 Apr 2024 09:09:30 +0200
David Hildenbrand <[email protected]> wrote:

> On 05.04.24 05:29, Matthew Wilcox wrote:
> > On Thu, Apr 04, 2024 at 06:36:39PM +0200, David Hildenbrand wrote:
> >> + /* We might get PTE-mapped large folios; split them first. */
> >> + if (folio_test_large(folio)) {
> >> + rc = -E2BIG;
> >
> > We agree to this point. I just turned this into -EINVAL.
> >
> >>
> >> + if (rc == -E2BIG) {
> >> + /*
> >> + * Splitting might fail with -EBUSY due to unexpected folio
> >> + * references, just like make_folio_secure(). So handle it
> >> + * ahead of time without the PTL being held.
> >> + */
> >> + folio_lock(folio);
> >> + rc = split_folio(folio);
> >> + folio_unlock(folio);
> >> + folio_put(folio);
> >> + }
> >
> > Ummm ... if split_folio() succeeds, aren't we going to return 0 from
> > this function, which will be interpreted as make_folio_secure() having
> > succeeded?
>
> I assume the code would have to handle that, because it must deal with
> possible races that would try to convert the folio page.
>
> But the right thing to do is
>
> if (!rc)
> goto again;
>
> after the put.

yes please


2024-04-11 08:24:52

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference

On 10.04.24 19:21, Claudio Imbrenda wrote:
> Reviewed-by: Claudio Imbrenda<[email protected]>

Thanks! I'll rebase this series on top of the s390/features tree for
now, where Willy's cleanups already reside.

If maintainers want to have that fix first, I can send it out with
Willy's patches rebased on this fix. Whatever people prefer.

--
Cheers,

David / dhildenb


2024-04-11 09:29:47

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 2/5] s390/uv: convert gmap_make_secure() to work on folios

[...]

>> + if (rc == -E2BIG) {
>> + /*
>> + * Splitting might fail with -EBUSY due to unexpected folio
>> + * references, just like make_folio_secure(). So handle it
>> + * ahead of time without the PTL being held.
>> + */
>> + folio_lock(folio);
>> + rc = split_folio(folio);
>
> if split_folio returns -EAGAIN...
>
>> + folio_unlock(folio);
>> + folio_put(folio);
>> + }
>> +
>> if (rc == -EAGAIN) {
>
> ... we will not skip this ...
>
>> /*
>> * If we are here because the UVC returned busy or partial
>> * completion, this is just a useless check, but it is safe.
>> */
>> - wait_on_page_writeback(page);
>> - put_page(page);
>> + folio_wait_writeback(folio);
>> + folio_put(folio);
>
> ... and we will do one folio_put() too many
>
>> } else if (rc == -EBUSY) {
>> /*
>> * If we have tried a local drain and the page refcount
>
> are we sure that split_folio() can never return -EAGAIN now and in the
> future too?

Yes, and in contrast to documentation, that can actually happen! The
documentation is even wrong: we return -EAGAIN if there are unexpected
folio references (e.g., pinned), thanks for raising that.

>
> maybe just change it to } else if (... ?


I think I'll make it all clearer by handling split_folio() return values
separately.

--
Cheers,

David / dhildenb


2024-04-11 10:39:06

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios

On 10.04.24 19:42, Claudio Imbrenda wrote:
> On Thu, 4 Apr 2024 18:36:40 +0200
> David Hildenbrand <[email protected]> wrote:
>
>> Now that make_folio_secure() may only set PG_arch_1 for small folios,
>> let's convert relevant remaining UV code to only work on (small) folios
>> and simply reject large folios early. This way, we'll never end up
>> touching PG_arch_1 on tail pages of a large folio in UV code.
>>
>> The folio_get()/folio_put() for functions that are documented to already
>> hold a folio reference look weird and it should probably be removed.
>> Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
>> should really consume a folio reference instead. But these are cleanups for
>> another day.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> arch/s390/include/asm/page.h | 1 +
>> arch/s390/kernel/uv.c | 39 +++++++++++++++++++++---------------
>> 2 files changed, 24 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
>> index 54d015bcd8e3..b64384872c0f 100644
>> --- a/arch/s390/include/asm/page.h
>> +++ b/arch/s390/include/asm/page.h
>> @@ -214,6 +214,7 @@ static inline unsigned long __phys_addr(unsigned long x, bool is_31bit)
>> #define pfn_to_phys(pfn) ((pfn) << PAGE_SHIFT)
>>
>> #define phys_to_page(phys) pfn_to_page(phys_to_pfn(phys))
>> +#define phys_to_folio(phys) page_folio(phys_to_page(phys))
>> #define page_to_phys(page) pfn_to_phys(page_to_pfn(page))
>> #define folio_to_phys(page) pfn_to_phys(folio_pfn(folio))
>>
>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>> index adcbd4b13035..9c0113b26735 100644
>> --- a/arch/s390/kernel/uv.c
>> +++ b/arch/s390/kernel/uv.c
>> @@ -134,14 +134,17 @@ static int uv_destroy_page(unsigned long paddr)
>> */
>> int uv_destroy_owned_page(unsigned long paddr)
>> {
>> - struct page *page = phys_to_page(paddr);
>> + struct folio *folio = phys_to_folio(paddr);
>> int rc;
>>
>> - get_page(page);
>> + if (unlikely(folio_test_large(folio)))
>> + return 0;
>
> please add a comment here to explain why it's ok to just return 0
> here...


Will use

"/* See gmap_make_secure(): large folios cannot be protected */"

>
>> +
>> + folio_get(folio);
>> rc = uv_destroy_page(paddr);
>> if (!rc)
>> - clear_bit(PG_arch_1, &page->flags);
>> - put_page(page);
>> + clear_bit(PG_arch_1, &folio->flags);
>> + folio_put(folio);
>> return rc;
>> }
>>
>> @@ -169,14 +172,17 @@ int uv_convert_from_secure(unsigned long paddr)
>> */
>> int uv_convert_owned_from_secure(unsigned long paddr)
>> {
>> - struct page *page = phys_to_page(paddr);
>> + struct folio *folio = phys_to_folio(paddr);
>> int rc;
>>
>> - get_page(page);
>> + if (unlikely(folio_test_large(folio)))
>> + return 0;
>
> ... and here

here as well and ...

>
>> +
>> + folio_get(folio);
>> rc = uv_convert_from_secure(paddr);
>> if (!rc)
>> - clear_bit(PG_arch_1, &page->flags);
>> - put_page(page);
>> + clear_bit(PG_arch_1, &folio->flags);
>> + folio_put(folio);
>> return rc;
>> }
>>
>> @@ -457,33 +463,34 @@ EXPORT_SYMBOL_GPL(gmap_destroy_page);
>> */
>> int arch_make_page_accessible(struct page *page)
>> {
>> + struct folio *folio = page_folio(page);
>> int rc = 0;
>>
>> - /* Hugepage cannot be protected, so nothing to do */
>> - if (PageHuge(page))
>> + /* Large folios cannot be protected, so nothing to do */

^ here as well.

>> + if (unlikely(folio_test_large(folio)))
>> return 0;
>>

--
Cheers,

David / dhildenb


2024-04-11 10:43:51

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 3/5] s390/uv: convert PG_arch_1 users to only work on small folios

On 05.04.24 05:36, Matthew Wilcox wrote:
> On Thu, Apr 04, 2024 at 06:36:40PM +0200, David Hildenbrand wrote:
>> Now that make_folio_secure() may only set PG_arch_1 for small folios,
>> let's convert relevant remaining UV code to only work on (small) folios
>> and simply reject large folios early. This way, we'll never end up
>> touching PG_arch_1 on tail pages of a large folio in UV code.
>>
>> The folio_get()/folio_put() for functions that are documented to already
>> hold a folio reference look weird and it should probably be removed.
>> Similarly, uv_destroy_owned_page() and uv_convert_owned_from_secure()
>> should really consume a folio reference instead. But these are cleanups for
>> another day.
>
> Yes, and we should convert arch_make_page_accessible() to
> arch_make_folio_accessible() ... one of the two callers already has the
> folio (and page-writeback already calls arch_make_folio_accessible()
>

Yes. We should then get rid of the arch_make_folio_accessible() loop
over pages. Arch code can handle that, if required. Will include that in
this series.

--
Cheers,

David / dhildenb


2024-04-11 13:14:01

by Alexander Gordeev

[permalink] [raw]
Subject: Re: [PATCH v1 1/5] s390/uv: don't call wait_on_page_writeback() without a reference

On Thu, Apr 11, 2024 at 10:24:23AM +0200, David Hildenbrand wrote:
> Thanks! I'll rebase this series on top of the s390/features tree for now,
> where Willy's cleanups already reside.

Yes, rebase it on to of s390/features, please.

> If maintainers want to have that fix first, I can send it out with Willy's
> patches rebased on this fix. Whatever people prefer.

Many thanks!

> --
> Cheers,
>
> David / dhildenb