2020-12-22 07:13:42

by Anshuman Khandual

Subject: [RFC 0/2] arm64/mm: Fix pfn_valid() for ZONE_DEVIE based memory

This series fixes pfn_valid() for ZONE_DEVICE based memory and also improves
its performance for normal hotplug memory. While here, it also reorganizes
pfn_valid() on CONFIG_SPARSEMEM. This series has been lightly tested on the
current mainline tree.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: James Morse <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: [email protected]
Cc: [email protected]

Anshuman Khandual (2):
arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
arm64/mm: Reorganize pfn_valid()

arch/arm64/mm/init.c | 46 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 41 insertions(+), 5 deletions(-)

--
2.20.1


2020-12-22 07:14:10

by Anshuman Khandual

Subject: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

pfn_valid() validates a pfn but basically it checks for a valid struct page
backing for that pfn. It should always return positive for memory ranges
backed with struct page mapping. But currently pfn_valid() fails for all
ZONE_DEVICE based memory types even though they have struct page mapping.

pfn_valid() asserts that there is a memblock entry for a given pfn without
MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
that they do not have memblock entries. Hence memblock_is_map_memory() will
invariably fail via memblock_search() for a ZONE_DEVICE based address. This
eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
into the system via memremap_pages() called from a driver, their respective
memory sections will not have SECTION_IS_EARLY set.

Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
for firmware reserved memory regions. memblock_is_map_memory() can just be
skipped as its always going to be positive and that will be an optimization
for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
normal memory too will not have SECTION_IS_EARLY set for their sections.

Skipping memblock_is_map_memory() for all non early memory sections would
fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
performance for normal hotplug memory as well.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/mm/init.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 75addb36354a..ee23bda00c28 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)

if (!valid_section(__pfn_to_section(pfn)))
return 0;
+
+ /*
+ * ZONE_DEVICE memory does not have the memblock entries.
+ * memblock_is_map_memory() check for ZONE_DEVICE based
+ * addresses will always fail. Even the normal hotplugged
+ * memory will never have MEMBLOCK_NOMAP flag set in their
+ * memblock entries. Skip memblock search for all non early
+ * memory sections covering all of hotplug memory including
+ * both normal and ZONE_DEVIE based.
+ */
+ if (!early_section(__pfn_to_section(pfn)))
+ return 1;
#endif
return memblock_is_map_memory(addr);
}
--
2.20.1
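[Editor's note] The decision flow the patch establishes can be sketched as a small userspace model. Everything here is illustrative: `model_pfn_valid()`, `struct section_state`, and its fields are hypothetical stand-ins for the kernel's `mem_section` flags and `memblock_is_map_memory()`, not real kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the patched arm64 pfn_valid() flow. The three booleans
 * stand in for: a valid section (present + memmap), SECTION_IS_EARLY,
 * and a memblock entry without MEMBLOCK_NOMAP covering the address.
 */
struct section_state {
	bool valid;            /* section is present and has a memmap */
	bool early;            /* SECTION_IS_EARLY: created at boot */
	bool in_memblock_map;  /* memblock entry present, not NOMAP */
};

static int model_pfn_valid(const struct section_state *s)
{
	if (!s->valid)
		return 0;
	/*
	 * Non-early sections (normal hotplug, or ZONE_DEVICE added via
	 * memremap_pages()) have no memblock entries at all, so the
	 * memblock search is skipped and the section map is trusted.
	 */
	if (!s->early)
		return 1;
	/* Early sections may contain firmware NOMAP regions. */
	return s->in_memblock_map ? 1 : 0;
}
```

The model shows why the fix helps ZONE_DEVICE: such memory is a valid, non-early section with no memblock entry, which the pre-patch code would reject.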

2020-12-22 07:14:20

by Anshuman Khandual

Subject: [RFC 2/2] arm64/mm: Reorganize pfn_valid()

There are multiple instances of pfn_to_section_nr() and __pfn_to_section()
when CONFIG_SPARSEMEM is enabled. This can be just optimized if the memory
section is fetched earlier. Hence bifurcate pfn_valid() into two different
definitions depending on whether CONFIG_SPARSEMEM is enabled. Also replace
the open coded pfn <--> addr conversion with __[pfn|phys]_to_[phys|pfn]().
This does not cause any functional change.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/mm/init.c | 36 ++++++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ee23bda00c28..c3387798a3be 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -212,18 +212,25 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
free_area_init(max_zone_pfns);
}

+#ifdef CONFIG_SPARSEMEM
int pfn_valid(unsigned long pfn)
{
- phys_addr_t addr = pfn << PAGE_SHIFT;
+ struct mem_section *ms = __pfn_to_section(pfn);
+ phys_addr_t addr = __pfn_to_phys(pfn);

- if ((addr >> PAGE_SHIFT) != pfn)
+ /*
+ * Ensure the upper PAGE_SHIFT bits are clear in the
+ * pfn. Else it might lead to false positives when
+ * some of the upper bits are set, but the lower bits
+ * match a valid pfn.
+ */
+ if (__phys_to_pfn(addr) != pfn)
return 0;

-#ifdef CONFIG_SPARSEMEM
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;

- if (!valid_section(__pfn_to_section(pfn)))
+ if (!valid_section(ms))
return 0;

/*
@@ -235,11 +242,28 @@ int pfn_valid(unsigned long pfn)
* memory sections covering all of hotplug memory including
* both normal and ZONE_DEVIE based.
*/
- if (!early_section(__pfn_to_section(pfn)))
+ if (!early_section(ms))
return 1;
-#endif
+
return memblock_is_map_memory(addr);
}
+#else
+int pfn_valid(unsigned long pfn)
+{
+ phys_addr_t addr = __pfn_to_phys(pfn);
+
+ /*
+ * Ensure the upper PAGE_SHIFT bits are clear in the
+ * pfn. Else it might lead to false positives when
+ * some of the upper bits are set, but the lower bits
+ * match a valid pfn.
+ */
+ if (__phys_to_pfn(addr) != pfn)
+ return 0;
+
+ return memblock_is_map_memory(addr);
+}
+#endif
EXPORT_SYMBOL(pfn_valid);

static phys_addr_t memory_limit = PHYS_ADDR_MAX;
--
2.20.1
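[Editor's note] The upper-bits round-trip check that patch 2 documents can be demonstrated in isolation. This is a hedged userspace sketch: `MODEL_PAGE_SHIFT` and the `model_` names are assumptions standing in for `PAGE_SHIFT` and the `__pfn_to_phys()`/`__phys_to_pfn()` helpers.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* 4K pages, a common arm64 config */

/*
 * Equivalent of the __phys_to_pfn(__pfn_to_phys(pfn)) != pfn test:
 * if any of the top MODEL_PAGE_SHIFT bits of the pfn are set, they
 * are shifted out of the 64-bit physical address and the round trip
 * no longer matches, so the pfn must be rejected.
 */
static int model_pfn_has_garbage_bits(uint64_t pfn)
{
	uint64_t addr = pfn << MODEL_PAGE_SHIFT;

	return (addr >> MODEL_PAGE_SHIFT) != pfn;
}
```

For example, a pfn with bit 52 set (bit 64 of the would-be address) survives the shift with its low bits intact but fails the round trip, which is exactly the false positive the comment warns about.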

2020-12-22 09:14:44

by David Hildenbrand

Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

On 22.12.20 08:12, Anshuman Khandual wrote:
> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
>
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
>
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> normal memory too will not have SECTION_IS_EARLY set for their sections.
>
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> arch/arm64/mm/init.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 75addb36354a..ee23bda00c28 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>
> if (!valid_section(__pfn_to_section(pfn)))
> return 0;
> +
> + /*
> + * ZONE_DEVICE memory does not have the memblock entries.
> + * memblock_is_map_memory() check for ZONE_DEVICE based
> + * addresses will always fail. Even the normal hotplugged
> + * memory will never have MEMBLOCK_NOMAP flag set in their
> + * memblock entries. Skip memblock search for all non early
> + * memory sections covering all of hotplug memory including
> + * both normal and ZONE_DEVIE based.
> + */
> + if (!early_section(__pfn_to_section(pfn)))
> + return 1;

Actually, I think we want to check for partial present sections.

Maybe we can rather switch to generic pfn_valid() and tweak it to
something like

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..7b1fcce5bd5a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
return 0;
/*
* Traditionally early sections always returned pfn_valid() for
- * the entire section-sized span.
+ * the entire section-sized span. Some archs might have holes in
+ * early sections, so double check with memblock if configured.
*/
- return early_section(ms) || pfn_section_valid(ms, pfn);
+ if (early_section(ms))
+ return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
+ memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
+ return pfn_section_valid(ms, pfn);
}
#endif



Which users are remaining that require us to add/remove memblocks when
hot(un)plugging memory

$ git grep KEEP_MEM | grep memory_hotplug
mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {


I think one user we would have to handle is
arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
does not rely on memblock_is_map_memory.

--
Thanks,

David / dhildenb

2021-01-04 07:01:57

by Anshuman Khandual

Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory


On 12/22/20 2:41 PM, David Hildenbrand wrote:
> On 22.12.20 08:12, Anshuman Khandual wrote:
>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>> backing for that pfn. It should always return positive for memory ranges
>> backed with struct page mapping. But currently pfn_valid() fails for all
>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as its always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Robin Murphy <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> arch/arm64/mm/init.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 75addb36354a..ee23bda00c28 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>
>> if (!valid_section(__pfn_to_section(pfn)))
>> return 0;
>> +
>> + /*
>> + * ZONE_DEVICE memory does not have the memblock entries.
>> + * memblock_is_map_memory() check for ZONE_DEVICE based
>> + * addresses will always fail. Even the normal hotplugged
>> + * memory will never have MEMBLOCK_NOMAP flag set in their
>> + * memblock entries. Skip memblock search for all non early
>> + * memory sections covering all of hotplug memory including
>> + * both normal and ZONE_DEVIE based.
>> + */
>> + if (!early_section(__pfn_to_section(pfn)))
>> + return 1;
>
> Actually, I think we want to check for partial present sections.
>
> Maybe we can rather switch to generic pfn_valid() and tweak it to
> something like
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fb3bf696c05e..7b1fcce5bd5a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
> return 0;
> /*
> * Traditionally early sections always returned pfn_valid() for
> - * the entire section-sized span.
> + * the entire section-sized span. Some archs might have holes in
> + * early sections, so double check with memblock if configured.
> */
> - return early_section(ms) || pfn_section_valid(ms, pfn);
> + if (early_section(ms))
> + return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
> + memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> + return pfn_section_valid(ms, pfn);
> }
> #endif

Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
create this config which could track platform scenarios where all early
sections might not have mmap coverage such as arm64 ?

>
>
>
> Which users are remaining that require us to add/remove memblocks when
> hot(un)plugging memory
>
> $ git grep KEEP_MEM | grep memory_hotplug
> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {

Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
will not be able to track MEMBLOCK_NOMAP memory at runtime.

>
>
> I think one user we would have to handle is
> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
> does not rely on memblock_is_map_memory.

memblock_is_map_memory() is currently used only on arm/arm64 platforms.
Apart from the above example in valid_phys_addr_range(), there are some
other memblock_is_map_memory() call sites as well. But then, we are not
trying to completely drop memblock_is_map_memory() or are we ?

2021-01-04 15:39:14

by Jonathan Cameron

Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

On Tue, 22 Dec 2020 12:42:23 +0530
Anshuman Khandual <[email protected]> wrote:

> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
>
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
>
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged

typo: ZONE_DEVIE

> normal memory too will not have SECTION_IS_EARLY set for their sections.
>
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> arch/arm64/mm/init.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 75addb36354a..ee23bda00c28 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>
> if (!valid_section(__pfn_to_section(pfn)))
> return 0;
> +
> + /*
> + * ZONE_DEVICE memory does not have the memblock entries.
> + * memblock_is_map_memory() check for ZONE_DEVICE based
> + * addresses will always fail. Even the normal hotplugged
> + * memory will never have MEMBLOCK_NOMAP flag set in their
> + * memblock entries. Skip memblock search for all non early
> + * memory sections covering all of hotplug memory including
> + * both normal and ZONE_DEVIE based.

Here as well + the cover letter title.

> + */
> + if (!early_section(__pfn_to_section(pfn)))
> + return 1;
> #endif
> return memblock_is_map_memory(addr);
> }

2021-01-05 03:27:41

by Anshuman Khandual

Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory


On 1/4/21 9:06 PM, Jonathan Cameron wrote:
> On Tue, 22 Dec 2020 12:42:23 +0530
> Anshuman Khandual <[email protected]> wrote:
>
>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>> backing for that pfn. It should always return positive for memory ranges
>> backed with struct page mapping. But currently pfn_valid() fails for all
>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as its always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>
> typo: ZONE_DEVIE
>
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Robin Murphy <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> arch/arm64/mm/init.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 75addb36354a..ee23bda00c28 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>
>> if (!valid_section(__pfn_to_section(pfn)))
>> return 0;
>> +
>> + /*
>> + * ZONE_DEVICE memory does not have the memblock entries.
>> + * memblock_is_map_memory() check for ZONE_DEVICE based
>> + * addresses will always fail. Even the normal hotplugged
>> + * memory will never have MEMBLOCK_NOMAP flag set in their
>> + * memblock entries. Skip memblock search for all non early
>> + * memory sections covering all of hotplug memory including
>> + * both normal and ZONE_DEVIE based.
>
> Here as well + the cover letter title.

My bad, will fix all three instances.

2021-01-11 10:34:26

by David Hildenbrand

Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

On 04.01.21 07:18, Anshuman Khandual wrote:
>
> On 12/22/20 2:41 PM, David Hildenbrand wrote:
>> On 22.12.20 08:12, Anshuman Khandual wrote:
>>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>>> backing for that pfn. It should always return positive for memory ranges
>>> backed with struct page mapping. But currently pfn_valid() fails for all
>>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>>
>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>> into the system via memremap_pages() called from a driver, their respective
>>> memory sections will not have SECTION_IS_EARLY set.
>>>
>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>> skipped as its always going to be positive and that will be an optimization
>>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>
>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>> performance for normal hotplug memory as well.
>>>
>>> Cc: Catalin Marinas <[email protected]>
>>> Cc: Will Deacon <[email protected]>
>>> Cc: Ard Biesheuvel <[email protected]>
>>> Cc: Robin Murphy <[email protected]>
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>> Signed-off-by: Anshuman Khandual <[email protected]>
>>> ---
>>> arch/arm64/mm/init.c | 12 ++++++++++++
>>> 1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 75addb36354a..ee23bda00c28 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>>
>>> if (!valid_section(__pfn_to_section(pfn)))
>>> return 0;
>>> +
>>> + /*
>>> + * ZONE_DEVICE memory does not have the memblock entries.
>>> + * memblock_is_map_memory() check for ZONE_DEVICE based
>>> + * addresses will always fail. Even the normal hotplugged
>>> + * memory will never have MEMBLOCK_NOMAP flag set in their
>>> + * memblock entries. Skip memblock search for all non early
>>> + * memory sections covering all of hotplug memory including
>>> + * both normal and ZONE_DEVIE based.
>>> + */
>>> + if (!early_section(__pfn_to_section(pfn)))
>>> + return 1;
>>
>> Actually, I think we want to check for partial present sections.
>>
>> Maybe we can rather switch to generic pfn_valid() and tweak it to
>> something like
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index fb3bf696c05e..7b1fcce5bd5a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
>> return 0;
>> /*
>> * Traditionally early sections always returned pfn_valid() for
>> - * the entire section-sized span.
>> + * the entire section-sized span. Some archs might have holes in
>> + * early sections, so double check with memblock if configured.
>> */
>> - return early_section(ms) || pfn_section_valid(ms, pfn);
>> + if (early_section(ms))
>> + return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
>> + memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>> + return pfn_section_valid(ms, pfn);
>> }
>> #endif
>
> Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
> create this config which could track platform scenarios where all early
> sections might not have mmap coverage such as arm64 ?

Yes, a new config that states what's actually happening.

>
>>
>>
>>
>> Which users are remaining that require us to add/remove memblocks when
>> hot(un)plugging memory
>>
>> $ git grep KEEP_MEM | grep memory_hotplug
>> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
>> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
>> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
>
> Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
> will not be able to track MEMBLOCK_NOMAP memory at runtime.

I'd only like the hot(un)plug parts gone for now, if possible: I don't
see the need for that handling really that cannot be handled easier, as
in the proposed pfn_valid() changes.

I understand that current handling of memory holes in early sections and
memory marked as MEMBLOCK_NOMAP requires ARCH_KEEP_MEMBLOCK for now.

>
>>
>>
>> I think one user we would have to handle is
>> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
>> does not rely on memblock_is_map_memory.
>
> memblock_is_map_memory() is currently used only on arm/arm64 platforms.
> Apart from the above example in valid_phys_addr_range(), there are some
> other memblock_is_map_memory() call sites as well. But then, we are not
> trying to completely drop memblock_is_map_memory() or are we ?

No, just change the semantics: only relevant for early sections. Imagine
freezing MEMBLOCK state after boot.

Only early sections can have memory holes and might be marked
MEMBLOCK_NOMAP. For hotplugged memory, we don't have to call
memblock_is_map_memory().

--
Thanks,

David / dhildenb

2021-01-11 14:05:07

by Mike Rapoport

Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

On Mon, Jan 11, 2021 at 11:31:02AM +0100, David Hildenbrand wrote:
> On 04.01.21 07:18, Anshuman Khandual wrote:

...

> >> Actually, I think we want to check for partial present sections.
> >>
> >> Maybe we can rather switch to generic pfn_valid() and tweak it to
> >> something like
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index fb3bf696c05e..7b1fcce5bd5a 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
> >> return 0;
> >> /*
> >> * Traditionally early sections always returned pfn_valid() for
> >> - * the entire section-sized span.
> >> + * the entire section-sized span. Some archs might have holes in
> >> + * early sections, so double check with memblock if configured.
> >> */
> >> - return early_section(ms) || pfn_section_valid(ms, pfn);
> >> + if (early_section(ms))
> >> + return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
> >> + memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> >> + return pfn_section_valid(ms, pfn);
> >> }
> >> #endif
> >
> > Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
> > create this config which could track platform scenarios where all early
> > sections might not have mmap coverage such as arm64 ?
>
> Yes, a new config that states what's actually happening.

Just to clarify, there are several architectures that may free parts of
memmap with FLATMEM/SPARSEMEM and this functionality is gated by
CONFIG_HAVE_ARCH_PFN_VALID.

So if arm64 is to use a generic version, this should be also taken into
account.

> >> Which users are remaining that require us to add/remove memblocks when
> >> hot(un)plugging memory
> >>
> >> $ git grep KEEP_MEM | grep memory_hotplug
> >> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> >> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> >> mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> >
> > Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
> > will not be able to track MEMBLOCK_NOMAP memory at runtime.
>
> I'd only like the hot(un)plug parts gone for now, if possible: I don't
> see the need for that handling really that cannot be handled easier, as
> in the proposed pfn_valid() changes.
>
> I understand that current handling of memory holes in early sections and
> memory marked as MEMBLOCK_NOMAP requires ARCH_KEEP_MEMBLOCK for now.

This is mostly about the holes in the memory map. I believe it does not
matter if that memory is NOMAP or not, the holes in the memmap are punched
for anything in memblock.memory.

It seems to me that for arm64's pfn_valid() we can safely replace
memblock_is_map_memory() with memblock_is_memory(), not that it would
change much.
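[Editor's note] The distinction Mike draws between memblock_is_memory() and memblock_is_map_memory() can be sketched with a toy region table. All names, the flag value, and the region layout below are illustrative assumptions, not the kernel's actual memblock implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MEMBLOCK_NOMAP 0x4	/* flag value is made up */

struct model_region {
	uint64_t base, size;
	unsigned int flags;
};

/* Toy memblock.memory: one normal region, one firmware NOMAP region. */
static const struct model_region model_memory[] = {
	{ .base = 0x40000000, .size = 0x40000000, .flags = 0 },
	{ .base = 0x80000000, .size = 0x00010000, .flags = MODEL_MEMBLOCK_NOMAP },
};

static const struct model_region *model_search(uint64_t addr)
{
	unsigned int i;

	for (i = 0; i < sizeof(model_memory) / sizeof(model_memory[0]); i++)
		if (addr >= model_memory[i].base &&
		    addr < model_memory[i].base + model_memory[i].size)
			return &model_memory[i];
	return 0;
}

/* Is the address covered by any memblock.memory region? */
static bool model_is_memory(uint64_t addr)
{
	return model_search(addr) != 0;
}

/* Covered by a region that is not marked NOMAP? */
static bool model_is_map_memory(uint64_t addr)
{
	const struct model_region *r = model_search(addr);

	return r && !(r->flags & MODEL_MEMBLOCK_NOMAP);
}
```

In this model the two predicates differ only for NOMAP regions; hotplugged or ZONE_DEVICE ranges fall outside the table entirely, so neither predicate can describe them, which is David's point about restricting the memblock check to early sections.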

> >> I think one user we would have to handle is
> >> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
> >> does not rely on memblock_is_map_memory.
> >
> > memblock_is_map_memory() is currently used only on arm/arm64 platforms.
> > Apart from the above example in valid_phys_addr_range(), there are some
> > other memblock_is_map_memory() call sites as well. But then, we are not
> > trying to completely drop memblock_is_map_memory() or are we ?
>
> No, just change the semantics: only relevant for early sections. Imagine
> freezing MEMBLOCK state after boot.
>
> Only early sections can have memory holes and might be marked
> MEMBLOCK_NOMAP. For hotplugged memory, we don't have to call
> memblock_is_map_memory().
>
> --
> Thanks,
>
> David / dhildenb

--
Sincerely yours,
Mike.

2021-01-25 06:45:22

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory


On 12/22/20 12:42 PM, Anshuman Khandual wrote:
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
>
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
> regions, because that flag was specifically designed and set for firmware
> reserved memory regions. memblock_is_map_memory() can thus be skipped, as
> it is always going to be positive, and that is an optimization for normal
> hotplug memory. Like ZONE_DEVICE based memory, hotplugged normal memory
> will not have SECTION_IS_EARLY set for its sections either.
>
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <[email protected]>

Hello David/Mike,

Given that we would need to rework early sections and memblock semantics via
a new config, i.e. EARLY_SECTION_MEMMAP_HOLES, along with some possible
changes to ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, I am wondering if
these patches, which fix a problem (and improve performance), can be merged
first. After that, I could start working on the proposed rework. Could you
please let me know your thoughts on this? Thank you.

- Anshuman

2021-01-25 07:37:03

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

On Mon, Jan 25, 2021 at 11:52:32AM +0530, Anshuman Khandual wrote:
>
> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
> > [ original patch description trimmed ]
>
> Hello David/Mike,
>
> Given that we would need to rework early sections, memblock semantics via a
> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
> which fixes a problem (and improves performance) can be merged first. After
> that, I could start working on the proposed rework. Could you please let me
> know your thoughts on this. Thank you.

I didn't object to these patches; I think they are fine.
I agree that we can look into updating arm64's pfn_valid(), maybe right
after the decrease of the section size lands.

> - Anshuman

--
Sincerely yours,
Mike.

2021-01-26 05:03:23

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

On 25.01.21 07:22, Anshuman Khandual wrote:
>
> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>> [ original patch description trimmed ]
>
> Hello David/Mike,
>
> Given that we would need to rework early sections, memblock semantics via a
> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
> which fixes a problem (and improves performance) can be merged first. After
> that, I could start working on the proposed rework. Could you please let me
> know your thoughts on this. Thank you.

As I said, we might have to throw in a pfn_section_valid() check, to
catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
on arm64 as well, no?).

Apart from that, I'm fine with a simple fix upfront, that can be more
easily backported if needed. (Q: do we? is this stable material?)

--
Thanks,

David / dhildenb

2021-01-27 21:00:11

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory



On 1/25/21 2:43 PM, David Hildenbrand wrote:
> On 25.01.21 07:22, Anshuman Khandual wrote:
>>
>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>> [ original patch description trimmed ]
>>
>> Hello David/Mike,
>>
>> Given that we would need to rework early sections, memblock semantics via a
>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>> which fixes a problem (and improves performance) can be merged first. After
>> that, I could start working on the proposed rework. Could you please let me
>> know your thoughts on this. Thank you.
>
> As I said, we might have to throw in a pfn_section_valid() check, to
> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
> on arm64 as well, no?).

pfn_section_valid() should be called only for !early_section(), i.e. normal
hotplug and ZONE_DEVICE memory? Because early boot memory should always
be section aligned.

>
> Apart from that, I'm fine with a simple fix upfront, that can be more
> easily backported if needed. (Q: do we? is this stable material?)
>

Right, an upfront fix here would help with backporting. AFAICS it should be
backported to stable, as pte_devmap and ZONE_DEVICE have been around
for some time now. Do you have a particular stable version which needs to
be tagged in the patch?

2021-01-27 21:21:15

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory



On 1/25/21 1:01 PM, Mike Rapoport wrote:
> On Mon, Jan 25, 2021 at 11:52:32AM +0530, Anshuman Khandual wrote:
>>
>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>> [ original patch description trimmed ]
>>
>> Hello David/Mike,
>>
>> Given that we would need to rework early sections, memblock semantics via a
>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>> which fixes a problem (and improves performance) can be merged first. After
>> that, I could start working on the proposed rework. Could you please let me
>> know your thoughts on this. Thank you.
>
> I didn't object to these patches, I think they are fine.
> I agree that we can look into update of arm64's pfn_valid(), maybe right
> after decrease of section size lands in.

Sure, will drop the RFC tag and prepare these patches.

2021-01-27 21:48:56

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

On 27.01.21 05:06, Anshuman Khandual wrote:
>
>
> On 1/25/21 2:43 PM, David Hildenbrand wrote:
>> On 25.01.21 07:22, Anshuman Khandual wrote:
>>>
>>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>>> [ original patch description trimmed ]
>>>
>>> Hello David/Mike,
>>>
>>> Given that we would need to rework early sections, memblock semantics via a
>>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>>> which fixes a problem (and improves performance) can be merged first. After
>>> that, I could start working on the proposed rework. Could you please let me
>>> know your thoughts on this. Thank you.
>>
>> As I said, we might have to throw in a pfn_section_valid() check, to
>> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
>> on arm64 as well, no?).
>
> pfn_section_valid() should be called only for !early_section() i.e normal
> hotplug and ZONE_DEVICE memory ? Because early boot memory should always
> be section aligned.

Well, at least not on x86-64: there you can have early sections intersect
with ZONE_DEVICE memory.

E.g., have 64MB boot memory in a section. Later, we add ZONE_DEVICE
memory which might cover the remaining 64MB. For pfn_valid() on x86-64,
we always return "true" for such sections, because we always have the
memmap for the whole early section allocated during boot. So, there it's
"simple".

Now, arm64 seems to discard some parts of the vmemmap, so the remaining
64MB in such an early section might not have a memmap anymore? TBH, I
don't know.

Most probably only performing the check for
!early_section() is sufficient on arm64, but I really can't tell as I
don't know what we're actually discarding and if something as described
for x86-64 is even possible on arm64.

We should really try to take the magic out of arm64 vmemmap handling.

>
>>
>> Apart from that, I'm fine with a simple fix upfront, that can be more
>> easily backported if needed. (Q: do we? is this stable material?)
>>
>
> Right, an upfront fix here would help in backporting. AFAICS it should be
> backported to the stable as pte_devmap and ZONE_DEVICE have been around
> for some time now. Do you have a particular stable version which needs to
> be tagged in the patch ?

I haven't looked yet TBH. I guess it is broken since ZONE_DEVICE was
enabled on arm64?

--
Thanks,

David / dhildenb

2021-01-28 07:44:46

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory


On 1/27/21 2:59 PM, David Hildenbrand wrote:
> On 27.01.21 05:06, Anshuman Khandual wrote:
>>
>>
>> On 1/25/21 2:43 PM, David Hildenbrand wrote:
>>> On 25.01.21 07:22, Anshuman Khandual wrote:
>>>>
>>>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>>>> [ original patch description trimmed ]
>>>>
>>>> Hello David/Mike,
>>>>
>>>> Given that we would need to rework early sections, memblock semantics via a
>>>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>>>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>>>> which fixes a problem (and improves performance) can be merged first. After
>>>> that, I could start working on the proposed rework. Could you please let me
>>>> know your thoughts on this. Thank you.
>>>
>>> As I said, we might have to throw in a pfn_section_valid() check, to
>>> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
>>> on arm64 as well, no?).
>>
>> pfn_section_valid() should be called only for !early_section() i.e normal
>> hotplug and ZONE_DEVICE memory ? Because early boot memory should always
>> be section aligned.
>
> Well, at least not on x86-64 you can have early sections intersect with ZONE_DEVICE memory.
>
> E.g., have 64MB boot memory in a section. Later, we add ZONE_DEVICE memory which might cover the remaining 64MB. For pfn_valid() on x86-64, we always return "true" for such sections, because we always have the memmap for the whole early section allocated during boot. So, there it's "simple".

This is the generic pfn_valid() used on x86. As you mentioned, it
does not test pfn_section_valid() if the section is early, assuming
that vmemmap coverage is complete.

#ifndef CONFIG_HAVE_ARCH_PFN_VALID
static inline int pfn_valid(unsigned long pfn)
{
	struct mem_section *ms;

	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	ms = __nr_to_section(pfn_to_section_nr(pfn));
	if (!valid_section(ms))
		return 0;
	/*
	 * Traditionally early sections always returned pfn_valid() for
	 * the entire section-sized span.
	 */
	return early_section(ms) || pfn_section_valid(ms, pfn);
}
#endif

Looking at the code, it seems that early sections get initialized via
sparse_init() only at section granularity, but please correct me if
that is not the case.

>
> Now, arm64 seems to discard some parts of the vmemmap, so the remaining 64MB in such an early section might not have a memmap anymore? TBH, I don't know.

I did not get that. Could you please be more specific about how arm64
discards parts of the vmemmap?

>
> Most probably only performing the check for
> !early_section() is sufficient on arm64, but I really can't tell as I don't know what we're actually discarding and if something as described for x86-64 is even possible on arm64.

It seems that direct users of arch_add_memory() and __add_pages(), like
pagemap_range(), can cause subsection hotplug and vmemmap mapping. So
pfn_section_valid() should be applicable only for !early_section() cases.

A simple test on arm64 shows that both boot memory and traditional
memory hotplug get the entire subsection_map populated, but that might
not always be true for ZONE_DEVICE memory.

>
> We should really try to take the magic out of arm64 vmemmap handling.

I would really like to understand more about this.

>
>>
>>>
>>> Apart from that, I'm fine with a simple fix upfront, that can be more
>>> easily backported if needed. (Q: do we? is this stable material?)
>>>
>>
>> Right, an upfront fix here would help in backporting. AFAICS it should be
>> backported to the stable as pte_devmap and ZONE_DEVICE have been around
>> for some time now. Do you have a particular stable version which needs to
>> be tagged in the patch ?
>
> I haven't looked yet TBH. I guess it is broken since ZONE_DEVICE was enabled on arm64?
>
Sure, will figure this out.