From: Mike Rapoport <[email protected]>
Hi,
These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
pfn_valid_within() to 1.
The idea is to mark NOMAP pages as reserved in the memory map and restore
the intended semantics of pfn_valid() to designate availability of struct
page for a pfn.
With this the core mm will be able to cope with the fact that it cannot use
NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
will be treated correctly even without the need for pfn_valid_within.
The patches are only boot tested on qemu-system-aarch64 so I'd really
appreciate memory stress tests on real hardware.
If this actually works we'll be one step closer to drop custom pfn_valid()
on arm64 altogether.
Mike Rapoport (3):
memblock: update initialization of reserved pages
arm64: decouple check whether pfn is normal memory from pfn_valid()
arm64: drop pfn_valid_within() and simplify pfn_valid()
arch/arm64/Kconfig | 3 ---
arch/arm64/include/asm/memory.h | 2 +-
arch/arm64/include/asm/page.h | 1 +
arch/arm64/kvm/mmu.c | 2 +-
arch/arm64/mm/init.c | 10 ++++++++--
arch/arm64/mm/ioremap.c | 4 ++--
arch/arm64/mm/mmu.c | 2 +-
mm/memblock.c | 23 +++++++++++++++++++++--
8 files changed, 35 insertions(+), 12 deletions(-)
base-commit: e49d033bddf5b565044e2abe4241353959bc9120
--
2.28.0
From: Mike Rapoport <[email protected]>
The struct pages representing a reserved memory region are initialized
using reserve_bootmem_range() function. This function is called for each
reserved region just before the memory is freed from memblock to the buddy
page allocator.
The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().
Split out initialization of the reserved pages to a function with a
meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
reserved regions and mark struct pages for the NOMAP regions as
PageReserved.
Signed-off-by: Mike Rapoport <[email protected]>
---
mm/memblock.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..6b7ea9d86310 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
return end_pfn - start_pfn;
}
+static void __init memmap_init_reserved_pages(void)
+{
+ struct memblock_region *region;
+ phys_addr_t start, end;
+ u64 i;
+
+ /* initialize struct pages for the reserved regions */
+ for_each_reserved_mem_range(i, &start, &end)
+ reserve_bootmem_region(start, end);
+
+ /* and also treat struct pages for the NOMAP regions as PageReserved */
+ for_each_mem_region(region) {
+ if (memblock_is_nomap(region)) {
+ start = region->base;
+ end = start + region->size;
+ reserve_bootmem_region(start, end);
+ }
+ }
+}
+
static unsigned long __init free_low_memory_core_early(void)
{
unsigned long count = 0;
@@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
memblock_clear_hotplug(0, -1);
- for_each_reserved_mem_range(i, &start, &end)
- reserve_bootmem_region(start, end);
+ memmap_init_reserved_pages();
/*
* We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
--
2.28.0
From: Mike Rapoport <[email protected]>
The arm64's version of pfn_valid() differs from the generic because of two
reasons:
* Parts of the memory map are freed during boot. This makes it necessary to
verify that there is actual physical memory that corresponds to a pfn
which is done by querying memblock.
* There are NOMAP memory regions. These regions are not mapped in the
linear map and until the previous commit the struct pages representing
these areas had default values.
As the consequence of absence of the special treatment of NOMAP regions in
the memory map it was necessary to use memblock_is_map_memory() in
pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
generic mm functionality would not treat a NOMAP page as a normal page.
Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
the rest of core mm will treat them as unusable memory and thus
pfn_valid_within() is no longer required at all and can be disabled by
removing CONFIG_HOLES_IN_ZONE on arm64.
pfn_valid() can be slightly simplified by replacing
memblock_is_map_memory() with memblock_is_memory().
Signed-off-by: Mike Rapoport <[email protected]>
---
arch/arm64/Kconfig | 3 ---
arch/arm64/mm/init.c | 4 ++--
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..58e439046d05 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
def_bool y
depends on NUMA
-config HOLES_IN_ZONE
- def_bool y
-
source "kernel/Kconfig.hz"
config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 258b1905ed4a..bb6dd406b1f0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
/*
* ZONE_DEVICE memory does not have the memblock entries.
- * memblock_is_map_memory() check for ZONE_DEVICE based
+ * memblock_is_memory() check for ZONE_DEVICE based
* addresses will always fail. Even the normal hotplugged
* memory will never have MEMBLOCK_NOMAP flag set in their
* memblock entries. Skip memblock search for all non early
@@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
return pfn_section_valid(ms, pfn);
}
#endif
- return memblock_is_map_memory(addr);
+ return memblock_is_memory(addr);
}
EXPORT_SYMBOL(pfn_valid);
--
2.28.0
On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> The arm64's version of pfn_valid() differs from the generic because of two
> reasons:
>
> * Parts of the memory map are freed during boot. This makes it necessary to
> verify that there is actual physical memory that corresponds to a pfn
> which is done by querying memblock.
>
> * There are NOMAP memory regions. These regions are not mapped in the
> linear map and until the previous commit the struct pages representing
> these areas had default values.
>
> As the consequence of absence of the special treatment of NOMAP regions in
> the memory map it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
>
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.
But what about the memory map that are freed during boot (mentioned above).
Would not they still cause CONFIG_HOLES_IN_ZONE to be applicable and hence
pfn_valid_within() ?
>
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
Just to understand this better, pfn_valid() will now return true for all
MEMBLOCK_NOMAP based memory but that is okay as core MM would still ignore
them as unusable memory for being PageReserved().
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/arm64/Kconfig | 3 ---
> arch/arm64/mm/init.c | 4 ++--
> 2 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e4e1b6550115..58e439046d05 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> -config HOLES_IN_ZONE
> - def_bool y
> -
> source "kernel/Kconfig.hz"
>
> config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 258b1905ed4a..bb6dd406b1f0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>
> /*
> * ZONE_DEVICE memory does not have the memblock entries.
> - * memblock_is_map_memory() check for ZONE_DEVICE based
> + * memblock_is_memory() check for ZONE_DEVICE based
> * addresses will always fail. Even the normal hotplugged
> * memory will never have MEMBLOCK_NOMAP flag set in their
> * memblock entries. Skip memblock search for all non early
> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
> return pfn_section_valid(ms, pfn);
> }
> #endif
> - return memblock_is_map_memory(addr);
> + return memblock_is_memory(addr);
> }
> EXPORT_SYMBOL(pfn_valid);
>
>
On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
>
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
>
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
This would definitely need updating the comment for MEMBLOCK_NOMAP definition
in include/linux/memblock.h just to make the semantics is clear, though arm64
is currently the only user for MEMBLOCK_NOMAP.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> mm/memblock.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..6b7ea9d86310 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> return end_pfn - start_pfn;
> }
>
> +static void __init memmap_init_reserved_pages(void)
> +{
> + struct memblock_region *region;
> + phys_addr_t start, end;
> + u64 i;
> +
> + /* initialize struct pages for the reserved regions */
> + for_each_reserved_mem_range(i, &start, &end)
> + reserve_bootmem_region(start, end);
> +
> + /* and also treat struct pages for the NOMAP regions as PageReserved */
> + for_each_mem_region(region) {
> + if (memblock_is_nomap(region)) {
> + start = region->base;
> + end = start + region->size;
> + reserve_bootmem_region(start, end);
> + }
> + }
> +}
> +
> static unsigned long __init free_low_memory_core_early(void)
> {
> unsigned long count = 0;
> @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
>
> memblock_clear_hotplug(0, -1);
>
> - for_each_reserved_mem_range(i, &start, &end)
> - reserve_bootmem_region(start, end);
> + memmap_init_reserved_pages();
>
> /*
> * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
>
Adding James here.
+ James Morse <[email protected]>
On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> Hi,
>
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1.
That would be really great for arm64 platform as it will save CPU cycles on
many generic MM paths, given that our pfn_valid() has been expensive.
>
> The idea is to mark NOMAP pages as reserved in the memory map and restore
Though I am not really sure, would that possibly be problematic for UEFI/EFI
use cases as it might have just treated them as normal struct pages till now.
> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.
Right, that would be better as the current semantics is not ideal.
>
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
>
> The patches are only boot tested on qemu-system-aarch64 so I'd really
> appreciate memory stress tests on real hardware.
Did some preliminary memory stress tests on a guest with portions of memory
marked as MEMBLOCK_NOMAP and did not find any obvious problem. But this might
require some testing on real UEFI environment with firmware using MEMBLOCK_NOMAP
memory to make sure that changing these struct pages to PageReserved() is safe.
>
> If this actually works we'll be one step closer to drop custom pfn_valid()
> on arm64 altogether.
Right, planning to rework and respin the RFC originally sent last month.
https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/
On Thu, Apr 08, 2021 at 10:46:18AM +0530, Anshuman Khandual wrote:
>
>
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> >
> > The struct pages representing a reserved memory region are initialized
> > using reserve_bootmem_range() function. This function is called for each
> > reserved region just before the memory is freed from memblock to the buddy
> > page allocator.
> >
> > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > values set by the memory map initialization which makes it necessary to
> > have a special treatment for such pages in pfn_valid() and
> > pfn_valid_within().
> >
> > Split out initialization of the reserved pages to a function with a
> > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > reserved regions and mark struct pages for the NOMAP regions as
> > PageReserved.
>
> This would definitely need updating the comment for MEMBLOCK_NOMAP definition
> in include/linux/memblock.h just to make the semantics is clear,
Sure
> though arm64 is currently the only user for MEMBLOCK_NOMAP.
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> > mm/memblock.c | 23 +++++++++++++++++++++--
> > 1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..6b7ea9d86310 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> > return end_pfn - start_pfn;
> > }
> >
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > + struct memblock_region *region;
> > + phys_addr_t start, end;
> > + u64 i;
> > +
> > + /* initialize struct pages for the reserved regions */
> > + for_each_reserved_mem_range(i, &start, &end)
> > + reserve_bootmem_region(start, end);
> > +
> > + /* and also treat struct pages for the NOMAP regions as PageReserved */
> > + for_each_mem_region(region) {
> > + if (memblock_is_nomap(region)) {
> > + start = region->base;
> > + end = start + region->size;
> > + reserve_bootmem_region(start, end);
> > + }
> > + }
> > +}
> > +
> > static unsigned long __init free_low_memory_core_early(void)
> > {
> > unsigned long count = 0;
> > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> >
> > memblock_clear_hotplug(0, -1);
> >
> > - for_each_reserved_mem_range(i, &start, &end)
> > - reserve_bootmem_region(start, end);
> > + memmap_init_reserved_pages();
> >
> > /*
> > * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> >
--
Sincerely yours,
Mike.
On Thu, Apr 08, 2021 at 10:42:43AM +0530, Anshuman Khandual wrote:
>
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> >
> > The arm64's version of pfn_valid() differs from the generic because of two
> > reasons:
> >
> > * Parts of the memory map are freed during boot. This makes it necessary to
> > verify that there is actual physical memory that corresponds to a pfn
> > which is done by querying memblock.
> >
> > * There are NOMAP memory regions. These regions are not mapped in the
> > linear map and until the previous commit the struct pages representing
> > these areas had default values.
> >
> > As the consequence of absence of the special treatment of NOMAP regions in
> > the memory map it was necessary to use memblock_is_map_memory() in
> > pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> > generic mm functionality would not treat a NOMAP page as a normal page.
> >
> > Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> > the rest of core mm will treat them as unusable memory and thus
> > pfn_valid_within() is no longer required at all and can be disabled by
> > removing CONFIG_HOLES_IN_ZONE on arm64.
>
> But what about the memory map that are freed during boot (mentioned above).
> Would not they still cause CONFIG_HOLES_IN_ZONE to be applicable and hence
> pfn_valid_within() ?
The CONFIG_HOLES_IN_ZONE name is misleading as actually pfn_valid_within()
is only required for holes within a MAX_ORDER_NR_PAGES blocks (see comment
near pfn_valid_within() definition in mmzone.h). The freeing of the memory
map during boot avoids breaking MAX_ORDER blocks and the holes for which
memory map is freed are always aligned at MAX_ORDER.
AFAIU, the only case when there could be a hole in a MAX_ORDER block is
when EFI/ACPI reserves memory for its use and this memory becomes NOMAP in
the kernel. We still create struct pages for this memory, but they never
get values other than defaults, so core mm has no idea that this memory
should be touched, hence the need for pfn_valid_within() aliased to
pfn_valid() on arm64.
> > pfn_valid() can be slightly simplified by replacing
> > memblock_is_map_memory() with memblock_is_memory().
>
> Just to understand this better, pfn_valid() will now return true for all
> MEMBLOCK_NOMAP based memory but that is okay as core MM would still ignore
> them as unusable memory for being PageReserved().
Right, pfn_valid() will return true for all memory, including
MEMBLOCK_NOMAP. Since core mm deals with PageResrved() for memory used by
the firmware, e.g. on x86, I don't see why it won't work on arm64.
> >
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> > arch/arm64/Kconfig | 3 ---
> > arch/arm64/mm/init.c | 4 ++--
> > 2 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index e4e1b6550115..58e439046d05 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> > def_bool y
> > depends on NUMA
> >
> > -config HOLES_IN_ZONE
> > - def_bool y
> > -
> > source "kernel/Kconfig.hz"
> >
> > config ARCH_SPARSEMEM_ENABLE
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 258b1905ed4a..bb6dd406b1f0 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
> >
> > /*
> > * ZONE_DEVICE memory does not have the memblock entries.
> > - * memblock_is_map_memory() check for ZONE_DEVICE based
> > + * memblock_is_memory() check for ZONE_DEVICE based
> > * addresses will always fail. Even the normal hotplugged
> > * memory will never have MEMBLOCK_NOMAP flag set in their
> > * memblock entries. Skip memblock search for all non early
> > @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
> > return pfn_section_valid(ms, pfn);
> > }
> > #endif
> > - return memblock_is_map_memory(addr);
> > + return memblock_is_memory(addr);
> > }
> > EXPORT_SYMBOL(pfn_valid);
> >
> >
--
Sincerely yours,
Mike.
On Thu, Apr 08, 2021 at 10:49:02AM +0530, Anshuman Khandual wrote:
> Adding James here.
>
> + James Morse <[email protected]>
>
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> >
> > Hi,
> >
> > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > pfn_valid_within() to 1.
>
> That would be really great for arm64 platform as it will save CPU cycles on
> many generic MM paths, given that our pfn_valid() has been expensive.
>
> >
> > The idea is to mark NOMAP pages as reserved in the memory map and restore
>
> Though I am not really sure, would that possibly be problematic for UEFI/EFI
> use cases as it might have just treated them as normal struct pages till now.
I don't think there should be a problem because now the struct pages for
UEFI/ACPI never got to be used by the core mm. They were (rightfully)
skipped by memblock_free_all() from one side and pfn_valid() and
pfn_valid_within() return false for them in various pfn walkers from the
other side.
> > the intended semantics of pfn_valid() to designate availability of struct
> > page for a pfn.
>
> Right, that would be better as the current semantics is not ideal.
>
> >
> > With this the core mm will be able to cope with the fact that it cannot use
> > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > will be treated correctly even without the need for pfn_valid_within.
> >
> > The patches are only boot tested on qemu-system-aarch64 so I'd really
> > appreciate memory stress tests on real hardware.
>
> Did some preliminary memory stress tests on a guest with portions of memory
> marked as MEMBLOCK_NOMAP and did not find any obvious problem. But this might
> require some testing on real UEFI environment with firmware using MEMBLOCK_NOMAP
> memory to make sure that changing these struct pages to PageReserved() is safe.
I surely have no access for such machines :)
> > If this actually works we'll be one step closer to drop custom pfn_valid()
> > on arm64 altogether.
>
> Right, planning to rework and respin the RFC originally sent last month.
>
> https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/
--
Sincerely yours,
Mike.
On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <[email protected]> wrote:
>
> On 07.04.21 19:26, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> >
> > The struct pages representing a reserved memory region are initialized
> > using reserve_bootmem_range() function. This function is called for each
> > reserved region just before the memory is freed from memblock to the buddy
> > page allocator.
> >
> > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > values set by the memory map initialization which makes it necessary to
> > have a special treatment for such pages in pfn_valid() and
> > pfn_valid_within().
>
> I assume these pages are never given to the buddy, because we don't have
> a direct mapping. So to the kernel, it's essentially just like a memory
> hole with benefits.
>
> I can spot that we want to export such memory like any special memory
> thingy/hole in /proc/iomem -- "reserved", which makes sense.
>
> I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
> memory. IOW, that for_each_reserved_mem_range() should already succeed
> on these as well -- we should mark anything that is MEMBLOCK_NOMAP
> implicitly as reserved. Or are there valid reasons not to do so? What
> can anyone do with that memory?
>
> I assume they are pretty much useless for the kernel, right? Like other
> reserved memory ranges.
>
On ARM, we need to know whether any physical regions that do not
contain system memory contain something with device semantics or not.
One of the examples is ACPI tables: these are in reserved memory, and
so they are not covered by the linear region. However, when the ACPI
core ioremap()s an arbitrary memory region, we don't know whether it
is mapping a memory region or a device region unless we keep track of
this in some way. (Device mappings require device attributes, but
firmware tables require memory attributes, as they might be accessed
using misaligned reads)
>
> >
> > Split out initialization of the reserved pages to a function with a
> > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > reserved regions and mark struct pages for the NOMAP regions as
> > PageReserved.
> >
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> > mm/memblock.c | 23 +++++++++++++++++++++--
> > 1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..6b7ea9d86310 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> > return end_pfn - start_pfn;
> > }
> >
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > + struct memblock_region *region;
> > + phys_addr_t start, end;
> > + u64 i;
> > +
> > + /* initialize struct pages for the reserved regions */
> > + for_each_reserved_mem_range(i, &start, &end)
> > + reserve_bootmem_region(start, end);
> > +
> > + /* and also treat struct pages for the NOMAP regions as PageReserved */
> > + for_each_mem_region(region) {
> > + if (memblock_is_nomap(region)) {
> > + start = region->base;
> > + end = start + region->size;
> > + reserve_bootmem_region(start, end);
> > + }
> > + }
> > +}
> > +
> > static unsigned long __init free_low_memory_core_early(void)
> > {
> > unsigned long count = 0;
> > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> >
> > memblock_clear_hotplug(0, -1);
> >
> > - for_each_reserved_mem_range(i, &start, &end)
> > - reserve_bootmem_region(start, end);
> > + memmap_init_reserved_pages();
> >
> > /*
> > * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> >
>
>
> --
> Thanks,
>
> David / dhildenb
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 07.04.21 19:26, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
>
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
I assume these pages are never given to the buddy, because we don't have
a direct mapping. So to the kernel, it's essentially just like a memory
hole with benefits.
I can spot that we want to export such memory like any special memory
thingy/hole in /proc/iomem -- "reserved", which makes sense.
I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
memory. IOW, that for_each_reserved_mem_range() should already succeed
on these as well -- we should mark anything that is MEMBLOCK_NOMAP
implicitly as reserved. Or are there valid reasons not to do so? What
can anyone do with that memory?
I assume they are pretty much useless for the kernel, right? Like other
reserved memory ranges.
>
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> mm/memblock.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..6b7ea9d86310 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> return end_pfn - start_pfn;
> }
>
> +static void __init memmap_init_reserved_pages(void)
> +{
> + struct memblock_region *region;
> + phys_addr_t start, end;
> + u64 i;
> +
> + /* initialize struct pages for the reserved regions */
> + for_each_reserved_mem_range(i, &start, &end)
> + reserve_bootmem_region(start, end);
> +
> + /* and also treat struct pages for the NOMAP regions as PageReserved */
> + for_each_mem_region(region) {
> + if (memblock_is_nomap(region)) {
> + start = region->base;
> + end = start + region->size;
> + reserve_bootmem_region(start, end);
> + }
> + }
> +}
> +
> static unsigned long __init free_low_memory_core_early(void)
> {
> unsigned long count = 0;
> @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
>
> memblock_clear_hotplug(0, -1);
>
> - for_each_reserved_mem_range(i, &start, &end)
> - reserve_bootmem_region(start, end);
> + memmap_init_reserved_pages();
>
> /*
> * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
>
--
Thanks,
David / dhildenb
On 14.04.21 17:27, Ard Biesheuvel wrote:
> On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <[email protected]> wrote:
>>
>> On 07.04.21 19:26, Mike Rapoport wrote:
>>> From: Mike Rapoport <[email protected]>
>>>
>>> The struct pages representing a reserved memory region are initialized
>>> using reserve_bootmem_range() function. This function is called for each
>>> reserved region just before the memory is freed from memblock to the buddy
>>> page allocator.
>>>
>>> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
>>> values set by the memory map initialization which makes it necessary to
>>> have a special treatment for such pages in pfn_valid() and
>>> pfn_valid_within().
>>
>> I assume these pages are never given to the buddy, because we don't have
>> a direct mapping. So to the kernel, it's essentially just like a memory
>> hole with benefits.
>>
>> I can spot that we want to export such memory like any special memory
>> thingy/hole in /proc/iomem -- "reserved", which makes sense.
>>
>> I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
>> memory. IOW, that for_each_reserved_mem_range() should already succeed
>> on these as well -- we should mark anything that is MEMBLOCK_NOMAP
>> implicitly as reserved. Or are there valid reasons not to do so? What
>> can anyone do with that memory?
>>
>> I assume they are pretty much useless for the kernel, right? Like other
>> reserved memory ranges.
>>
>
> On ARM, we need to know whether any physical regions that do not
> contain system memory contain something with device semantics or not.
> One of the examples is ACPI tables: these are in reserved memory, and
> so they are not covered by the linear region. However, when the ACPI
> core ioremap()s an arbitrary memory region, we don't know whether it
> is mapping a memory region or a device region unless we keep track of
> this in some way. (Device mappings require device attributes, but
> firmware tables require memory attributes, as they might be accessed
> using misaligned reads)
Using generically sounding NOMAP ("don't create direct mapping") to
identify device regions feels like a hack. I know, it was introduced
just for that purpose.
Looking at memblock_mark_nomap(), we consider "device regions"
1) ACPI tables
2) VIDEO_TYPE_EFI memory
3) some device-tree regions in of/fdt.c
IIUC, right now we end up creating a memmap for this NOMAP memory, but
hide it away in pfn_valid(). This patch set at least fixes that.
Assuming these pages are never mapped to user space via the struct page
(which better be the case), we could further use a new pagetype to mark
these pages in a special way, such that we can identify them directly
via pfn_to_page().
Then, we could mostly avoid having to query memblock at runtime to
figure out that this is special memory. This would obviously be an
extension to this series. Just a thought.
--
Thanks,
David / dhildenb
On Wed, Apr 14, 2021 at 05:12:11PM +0200, David Hildenbrand wrote:
> On 07.04.21 19:26, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> >
> > The struct pages representing a reserved memory region are initialized
> > using reserve_bootmem_range() function. This function is called for each
> > reserved region just before the memory is freed from memblock to the buddy
> > page allocator.
> >
> > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > values set by the memory map initialization which makes it necessary to
> > have a special treatment for such pages in pfn_valid() and
> > pfn_valid_within().
>
> I assume these pages are never given to the buddy, because we don't have a
> direct mapping. So to the kernel, it's essentially just like a memory hole
> with benefits.
The pages should not be accessed as normal memory so they do not have a
direct (or in ARMish linear) mapping and are never given to buddy.
After looking at ACPI standard I don't see a fundamental reason for this
but they've already made this mess and we need to cope with it.
> I can spot that we want to export such memory like any special memory
> thingy/hole in /proc/iomem -- "reserved", which makes sense.
It does, but let's wait with /proc/iomem changes. We don't really have a
100% consistent view of it on different architectures, so adding yet
another type there does not seem, well, urgent.
> I would assume that MEMBLOCK_NOMAP is a special type of *reserved* memory.
> IOW, that for_each_reserved_mem_range() should already succeed on these as
> well -- we should mark anything that is MEMBLOCK_NOMAP implicitly as
> reserved. Or are there valid reasons not to do so? What can anyone do with
> that memory?
>
> I assume they are pretty much useless for the kernel, right? Like other
> reserved memory ranges.
I agree that there is a lot of commonality between NOMAP and reserved. The
problem is that even semantics for reserved is different between
architectures. Moreover, on the same architecture there could be
E820_TYPE_RESERVED and memblock.reserved with different properties.
I'd really prefer moving in baby steps here because any change in the boot
mm can bear several month of early hangs debugging ;-)
> > Split out initialization of the reserved pages to a function with a
> > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > reserved regions and mark struct pages for the NOMAP regions as
> > PageReserved.
> >
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> > mm/memblock.c | 23 +++++++++++++++++++++--
> > 1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..6b7ea9d86310 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> > return end_pfn - start_pfn;
> > }
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > + struct memblock_region *region;
> > + phys_addr_t start, end;
> > + u64 i;
> > +
> > + /* initialize struct pages for the reserved regions */
> > + for_each_reserved_mem_range(i, &start, &end)
> > + reserve_bootmem_region(start, end);
> > +
> > + /* and also treat struct pages for the NOMAP regions as PageReserved */
> > + for_each_mem_region(region) {
> > + if (memblock_is_nomap(region)) {
> > + start = region->base;
> > + end = start + region->size;
> > + reserve_bootmem_region(start, end);
> > + }
> > + }
> > +}
> > +
> > static unsigned long __init free_low_memory_core_early(void)
> > {
> > unsigned long count = 0;
> > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> > memblock_clear_hotplug(0, -1);
> > - for_each_reserved_mem_range(i, &start, &end)
> > - reserve_bootmem_region(start, end);
> > + memmap_init_reserved_pages();
> > /*
> > * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
--
Sincerely yours,
Mike.
On Wed, Apr 14, 2021 at 05:27:53PM +0200, Ard Biesheuvel wrote:
> On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <[email protected]> wrote:
> >
> > On 07.04.21 19:26, Mike Rapoport wrote:
> > > From: Mike Rapoport <[email protected]>
> > >
> > > The struct pages representing a reserved memory region are initialized
> > > using reserve_bootmem_range() function. This function is called for each
> > > reserved region just before the memory is freed from memblock to the buddy
> > > page allocator.
> > >
> > > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > > values set by the memory map initialization which makes it necessary to
> > > have a special treatment for such pages in pfn_valid() and
> > > pfn_valid_within().
> >
> > I assume these pages are never given to the buddy, because we don't have
> > a direct mapping. So to the kernel, it's essentially just like a memory
> > hole with benefits.
> >
> > I can spot that we want to export such memory like any special memory
> > thingy/hole in /proc/iomem -- "reserved", which makes sense.
> >
> > I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
> > memory. IOW, that for_each_reserved_mem_range() should already succeed
> > on these as well -- we should mark anything that is MEMBLOCK_NOMAP
> > implicitly as reserved. Or are there valid reasons not to do so? What
> > can anyone do with that memory?
> >
> > I assume they are pretty much useless for the kernel, right? Like other
> > reserved memory ranges.
> >
>
> On ARM, we need to know whether any physical regions that do not
> contain system memory contain something with device semantics or not.
> One of the examples is ACPI tables: these are in reserved memory, and
> so they are not covered by the linear region. However, when the ACPI
> core ioremap()s an arbitrary memory region, we don't know whether it
> is mapping a memory region or a device region unless we keep track of
> this in some way. (Device mappings require device attributes, but
> firmware tables require memory attributes, as they might be accessed
> using misaligned reads)
I mostly agree, but my understanding is that regions of *physical* memory
that are occupied by various pieces of EFI/ACPI information require special
treatment because it was defined this way in the APCI spec.
And since ARM cannot tolerate aliased mappings with different caching mode
the whole bunch of firmware memory should be ioremap()ed to access it.
> > > Split out initialization of the reserved pages to a function with a
> > > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > > reserved regions and mark struct pages for the NOMAP regions as
> > > PageReserved.
> > >
> > > Signed-off-by: Mike Rapoport <[email protected]>
> > > ---
> > > mm/memblock.c | 23 +++++++++++++++++++++--
> > > 1 file changed, 21 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > index afaefa8fc6ab..6b7ea9d86310 100644
> > > --- a/mm/memblock.c
> > > +++ b/mm/memblock.c
> > > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> > > return end_pfn - start_pfn;
> > > }
> > >
> > > +static void __init memmap_init_reserved_pages(void)
> > > +{
> > > + struct memblock_region *region;
> > > + phys_addr_t start, end;
> > > + u64 i;
> > > +
> > > + /* initialize struct pages for the reserved regions */
> > > + for_each_reserved_mem_range(i, &start, &end)
> > > + reserve_bootmem_region(start, end);
> > > +
> > > + /* and also treat struct pages for the NOMAP regions as PageReserved */
> > > + for_each_mem_region(region) {
> > > + if (memblock_is_nomap(region)) {
> > > + start = region->base;
> > > + end = start + region->size;
> > > + reserve_bootmem_region(start, end);
> > > + }
> > > + }
> > > +}
> > > +
> > > static unsigned long __init free_low_memory_core_early(void)
> > > {
> > > unsigned long count = 0;
> > > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> > >
> > > memblock_clear_hotplug(0, -1);
> > >
> > > - for_each_reserved_mem_range(i, &start, &end)
> > > - reserve_bootmem_region(start, end);
> > > + memmap_init_reserved_pages();
> > >
> > > /*
> > > * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> > >
> >
> >
> > --
> > Thanks,
> >
> > David / dhildenb
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
Sincerely yours,
Mike.
On Wed, Apr 14, 2021 at 05:52:57PM +0200, David Hildenbrand wrote:
> On 14.04.21 17:27, Ard Biesheuvel wrote:
> > On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <[email protected]> wrote:
> > >
> > > On 07.04.21 19:26, Mike Rapoport wrote:
> > > > From: Mike Rapoport <[email protected]>
> > > >
> > > > The struct pages representing a reserved memory region are initialized
> > > > using reserve_bootmem_range() function. This function is called for each
> > > > reserved region just before the memory is freed from memblock to the buddy
> > > > page allocator.
> > > >
> > > > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > > > values set by the memory map initialization which makes it necessary to
> > > > have a special treatment for such pages in pfn_valid() and
> > > > pfn_valid_within().
> > >
> > > I assume these pages are never given to the buddy, because we don't have
> > > a direct mapping. So to the kernel, it's essentially just like a memory
> > > hole with benefits.
> > >
> > > I can spot that we want to export such memory like any special memory
> > > thingy/hole in /proc/iomem -- "reserved", which makes sense.
> > >
> > > I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
> > > memory. IOW, that for_each_reserved_mem_range() should already succeed
> > > on these as well -- we should mark anything that is MEMBLOCK_NOMAP
> > > implicitly as reserved. Or are there valid reasons not to do so? What
> > > can anyone do with that memory?
> > >
> > > I assume they are pretty much useless for the kernel, right? Like other
> > > reserved memory ranges.
> > >
> >
> > On ARM, we need to know whether any physical regions that do not
> > contain system memory contain something with device semantics or not.
> > One of the examples is ACPI tables: these are in reserved memory, and
> > so they are not covered by the linear region. However, when the ACPI
> > core ioremap()s an arbitrary memory region, we don't know whether it
> > is mapping a memory region or a device region unless we keep track of
> > this in some way. (Device mappings require device attributes, but
> > firmware tables require memory attributes, as they might be accessed
> > using misaligned reads)
>
> Using generically sounding NOMAP ("don't create direct mapping") to identify
> device regions feels like a hack. I know, it was introduced just for that
> purpose.
>
> Looking at memblock_mark_nomap(), we consider "device regions"
>
> 1) ACPI tables
>
> 2) VIDEO_TYPE_EFI memory
>
> 3) some device-tree regions in of/fdt.c
>
>
> IIUC, right now we end up creating a memmap for this NOMAP memory, but hide
> it away in pfn_valid(). This patch set at least fixes that.
Currently we have memmap entries with struct page set to defaults for the
NOMAP memory. AFAIU hiding them in pfn_valid()/pfn_valid_within() was a
solution to failures in pfn walkers that presumed that for a pfn_valid()
there will be a struct page that really reflects the state of that page.
> Assuming these pages are never mapped to user space via the struct page
> (which better be the case), we could further use a new pagetype to mark
> these pages in a special way, such that we can identify them directly via
> pfn_to_page().
Not sure we really need a new pagetype here, PG_Reserved seems to be quite
enough to say "don't touch this". I generally agree that we could make
PG_Reserved a PageType and then have several sub-types for reserved memory.
This definitely will add clarity but I'm not sure that this justifies
amount of churn and effort required to audit uses of PageResrved().
> Then, we could mostly avoid having to query memblock at runtime to figure
> out that this is special memory. This would obviously be an extension to
> this series. Just a thought.
Stop pushing memblock out of kernel! ;-)
Now, seriously, we can minimize memblock involvement in run-time and this
series in yet another step in that direction.
--
Sincerely yours,
Mike.
> Not sure we really need a new pagetype here, PG_Reserved seems to be quite
> enough to say "don't touch this". I generally agree that we could make
> PG_Reserved a PageType and then have several sub-types for reserved memory.
> This definitely will add clarity but I'm not sure that this justifies
> amount of churn and effort required to audit uses of PageResrved().
>
>> Then, we could mostly avoid having to query memblock at runtime to figure
>> out that this is special memory. This would obviously be an extension to
>> this series. Just a thought.
>
> Stop pushing memblock out of kernel! ;-)
Can't stop. Won't stop. :D
It's lovely for booting up a kernel until we have other data-structures
in place ;)
--
Thanks,
David / dhildenb
On 16.04.21 13:44, Mike Rapoport wrote:
> On Thu, Apr 15, 2021 at 11:30:12AM +0200, David Hildenbrand wrote:
>>> Not sure we really need a new pagetype here, PG_Reserved seems to be quite
>>> enough to say "don't touch this". I generally agree that we could make
>>> PG_Reserved a PageType and then have several sub-types for reserved memory.
>>> This definitely will add clarity but I'm not sure that this justifies
>>> amount of churn and effort required to audit uses of PageResrved().
>>>> Then, we could mostly avoid having to query memblock at runtime to figure
>>>> out that this is special memory. This would obviously be an extension to
>>>> this series. Just a thought.
>>>
>>> Stop pushing memblock out of kernel! ;-)
>>
>> Can't stop. Won't stop. :D
>>
>> It's lovely for booting up a kernel until we have other data-structures in
>> place ;)
>
> A bit more seriously, we don't have any data structure that reliably
> represents physical memory layout and arch-independent fashion.
> memblock is probably the best starting point for eventually having one.
We have the (slowish) kernel resource tree after boot and the (faster)
memmap. I really don't see why we really need another slowish variant.
We might be better off to just extend and speed up the kernel resource tree.
Memblock as is is not a reasonable datastructure to keep around after
boot: for example, how we handle boottime allocations and reserve
regions both as reserved.
--
Thanks,
David / dhildenb
On Thu, Apr 15, 2021 at 11:30:12AM +0200, David Hildenbrand wrote:
> > Not sure we really need a new pagetype here, PG_Reserved seems to be quite
> > enough to say "don't touch this". I generally agree that we could make
> > PG_Reserved a PageType and then have several sub-types for reserved memory.
> > This definitely will add clarity but I'm not sure that this justifies
> > amount of churn and effort required to audit uses of PageResrved().
> > > Then, we could mostly avoid having to query memblock at runtime to figure
> > > out that this is special memory. This would obviously be an extension to
> > > this series. Just a thought.
> >
> > Stop pushing memblock out of kernel! ;-)
>
> Can't stop. Won't stop. :D
>
> It's lovely for booting up a kernel until we have other data-structures in
> place ;)
A bit more seriously, we don't have any data structure that reliably
represents physical memory layout and arch-independent fashion.
memblock is probably the best starting point for eventually having one.
--
Sincerely yours,
Mike.