Playing with kdump+virtio-mem I noticed that kexec_file_load() does not
consider System RAM added via dax/kmem and virtio-mem when preparing the
elf header for kdump. Looking into the details, the logic used in
walk_system_ram_res() and walk_mem_res() seems to be outdated.
walk_system_ram_range() already does the right thing, let's change
walk_system_ram_res() and walk_mem_res(), and clean up.
Loading a kdump kernel via "kexec -p -s" ... will result in the kdump
kernel to also dump dax/kmem and virtio-mem added System RAM now.
Note: kexec-tools on x86-64 also have to be updated to consider this
memory in the kexec_load() case when processing /proc/iomem.
Against next-20210322.
David Hildenbrand (3):
kernel/resource: make walk_system_ram_res() find all busy
IORESOURCE_SYSTEM_RAM resources
kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM
resources
kernel/resource: remove first_lvl / siblings_only logic
kernel/resource.c | 45 ++++++++++++---------------------------------
1 file changed, 12 insertions(+), 33 deletions(-)
--
2.29.2
It used to be true that we can have busy system RAM only on the first level
in the resourc tree. However, this is no longer holds for driver-managed
system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
lower levels.
We have two users of walk_system_ram_res(), which currently only
consideres the first level:
a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
locate_mem_hole_callback(), so even after this change, we won't be
placing kexec images onto dax/kmem and virtio-mem added memory. No
change.
b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
not adding relevant ranges to the crash elf info, resulting in them
not getting dumped via kdump.
This change fixes loading a crashkernel via kexec_file_load() and including
dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
that e.g,, arm64 relies on memblock data and, therefore, always considers
all added System RAM already.
Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
behave like walk_system_ram_range().
Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..4efd6e912279 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
arg, func);
}
--
2.29.2
It used to be true that we can have system RAM only on the first level
in the resourc tree. However, this is no longer holds for driver-managed
system RAM (i.e., dax/kmem and virtio-mem).
The function walk_mem_res() only consideres the first level and is
used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
fail to identify System RAM added by dax/kmem and virtio-mem as
"IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
"normal RAM" in __ioremap_caller().
Let's find all busy IORESOURCE_MEM resources, making the function
behave similar to walk_system_ram_res().
Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index 4efd6e912279..16e0c7e8ed24 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -470,7 +470,7 @@ int walk_mem_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
arg, func);
}
--
2.29.2
All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
whole resource tree, not just the first level. Let's drop the unused
first_lvl / siblings_only logic.
All functions properly search the whole tree, so remove documentation
that indicates that some functions behave differently.
Cc: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/resource.c | 45 ++++++++++++---------------------------------
1 file changed, 12 insertions(+), 33 deletions(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index 16e0c7e8ed24..7e00239a023a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
static struct resource *bootmem_resource_free;
static DEFINE_SPINLOCK(bootmem_resource_lock);
-static struct resource *next_resource(struct resource *p, bool sibling_only)
+static struct resource *next_resource(struct resource *p)
{
- /* Caller wants to traverse through siblings only */
- if (sibling_only)
- return p->sibling;
-
if (p->child)
return p->child;
while (!p->sibling && p->parent)
@@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos)
{
struct resource *p = v;
(*pos)++;
- return (void *)next_resource(p, false);
+ return (void *)next_resource(p);
}
#ifdef CONFIG_PROC_FS
@@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
* of the resource that's within [@start..@end]; if none is found, returns
* -ENODEV. Returns -EINVAL for invalid parameters.
*
- * This function walks the whole tree and not just first level children
- * unless @first_lvl is true.
- *
* @start: start address of the resource searched for
* @end: end address of same resource
* @flags: flags which the resource must have
* @desc: descriptor the resource must have
- * @first_lvl: walk only the first level children, if set
* @res: return ptr, if resource found
*
* The caller must specify @start, @end, @flags, and @desc
@@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
*/
static int find_next_iomem_res(resource_size_t start, resource_size_t end,
unsigned long flags, unsigned long desc,
- bool first_lvl, struct resource *res)
+ struct resource *res)
{
- bool siblings_only = true;
struct resource *p;
if (!res)
@@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
read_lock(&resource_lock);
- for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {
+ for (p = iomem_resource.child; p; p = next_resource(p)) {
/* If we passed the resource we are looking for, stop */
if (p->start > end) {
p = NULL;
@@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
if (p->end < start)
continue;
- /*
- * Now that we found a range that matches what we look for,
- * check the flags and the descriptor. If we were not asked to
- * use only the first level, start looking at children as well.
- */
- siblings_only = first_lvl;
-
if ((p->flags & flags) != flags)
continue;
if ((desc != IORES_DESC_NONE) && (desc != p->desc))
@@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
unsigned long flags, unsigned long desc,
- bool first_lvl, void *arg,
+ void *arg,
int (*func)(struct resource *, void *))
{
struct resource res;
int ret = -EINVAL;
while (start < end &&
- !find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
+ !find_next_iomem_res(start, end, flags, desc, &res)) {
ret = (*func)(&res, arg);
if (ret)
break;
@@ -431,7 +415,6 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
* @arg: function argument for the callback @func
* @func: callback function that is called for each qualifying resource area
*
- * This walks through whole tree and not just first level children.
* All the memory ranges which overlap start,end and also match flags and
* desc are valid candidates.
*
@@ -441,7 +424,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
u64 end, void *arg, int (*func)(struct resource *, void *))
{
- return __walk_iomem_res_desc(start, end, flags, desc, false, arg, func);
+ return __walk_iomem_res_desc(start, end, flags, desc, arg, func);
}
EXPORT_SYMBOL_GPL(walk_iomem_res_desc);
@@ -457,8 +440,8 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
- arg, func);
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
+ func);
}
/*
@@ -470,17 +453,14 @@ int walk_mem_res(u64 start, u64 end, void *arg,
{
unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
- arg, func);
+ return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
+ func);
}
/*
* This function calls the @func callback against all memory ranges of type
* System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.
* It is to be used only for System RAM.
- *
- * This will find System RAM ranges that are children of top-level resources
- * in addition to top-level System RAM resources.
*/
int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *))
@@ -495,8 +475,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
while (start < end &&
- !find_next_iomem_res(start, end, flags, IORES_DESC_NONE,
- false, &res)) {
+ !find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res)) {
pfn = PFN_UP(res.start);
end_pfn = PFN_DOWN(res.end + 1);
if (end_pfn > pfn)
--
2.29.2
On Mon, Mar 22, 2021 at 9:02 AM David Hildenbrand <[email protected]> wrote:
>
> It used to be true that we can have system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., dax/kmem and virtio-mem).
>
> The function walk_mem_res() only consideres the first level and is
> used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
> fail to identify System RAM added by dax/kmem and virtio-mem as
> "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
> "normal RAM" in __ioremap_caller().
>
> Let's find all busy IORESOURCE_MEM resources, making the function
> behave similar to walk_system_ram_res().
Looks good,
Reviewed-by: Dan Williams <[email protected]>
On Mon, Mar 22, 2021 at 9:02 AM David Hildenbrand <[email protected]> wrote:
>
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
Looks good,
Reviewed-by: Dan Williams <[email protected]>
On Mon, Mar 22, 2021 at 9:03 AM David Hildenbrand <[email protected]> wrote:
>
> All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
> whole resource tree, not just the first level. Let's drop the unused
> first_lvl / siblings_only logic.
>
> All functions properly search the whole tree, so remove documentation
> that indicates that some functions behave differently.
Looks good, and the staging of the potential regressions as standalone
lead-in patches makes sense.
Reviewed-by: Dan Williams <[email protected]>
On 22.03.21 17:01, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
^ My copy-paste action when creating the cc list slipped in a duplicate
SO in all 3 patches. I can resend if desired.
--
Thanks,
David / dhildenb
On Mon, Mar 22, 2021 at 05:01:57PM +0100, David Hildenbrand wrote:
> Playing with kdump+virtio-mem I noticed that kexec_file_load() does not
> consider System RAM added via dax/kmem and virtio-mem when preparing the
> elf header for kdump. Looking into the details, the logic used in
> walk_system_ram_res() and walk_mem_res() seems to be outdated.
>
> walk_system_ram_range() already does the right thing, let's change
> walk_system_ram_res() and walk_mem_res(), and clean up.
>
> Loading a kdump kernel via "kexec -p -s" ... will result in the kdump
> kernel to also dump dax/kmem and virtio-mem added System RAM now.
>
> Note: kexec-tools on x86-64 also have to be updated to consider this
> memory in the kexec_load() case when processing /proc/iomem.
> Against next-20210322.
Hint: `git format-patch --base ...`
>
> David Hildenbrand (3):
> kernel/resource: make walk_system_ram_res() find all busy
> IORESOURCE_SYSTEM_RAM resources
> kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM
> resources
> kernel/resource: remove first_lvl / siblings_only logic
>
> kernel/resource.c | 45 ++++++++++++---------------------------------
> 1 file changed, 12 insertions(+), 33 deletions(-)
>
> --
> 2.29.2
>
--
With Best Regards,
Andy Shevchenko
On Tue, Mar 23, 2021 at 10:40:33AM +0100, David Hildenbrand wrote:
> On 22.03.21 17:01, David Hildenbrand wrote:
> > It used to be true that we can have busy system RAM only on the first level
> > in the resourc tree. However, this is no longer holds for driver-managed
> > system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> > lower levels.
> >
> > We have two users of walk_system_ram_res(), which currently only
> > consideres the first level:
> > a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> > IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> > locate_mem_hole_callback(), so even after this change, we won't be
> > placing kexec images onto dax/kmem and virtio-mem added memory. No
> > change.
> > b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> > not adding relevant ranges to the crash elf info, resulting in them
> > not getting dumped via kdump.
> >
> > This change fixes loading a crashkernel via kexec_file_load() and including
> > dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> > that e.g,, arm64 relies on memblock data and, therefore, always considers
> > all added System RAM already.
> >
> > Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> > behave like walk_system_ram_range().
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: Greg Kroah-Hartman <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: Andy Shevchenko <[email protected]>
> > Cc: Mauro Carvalho Chehab <[email protected]>
> > Cc: Signed-off-by: David Hildenbrand <[email protected]>
>
> ^ My copy-paste action when creating the cc list slipped in a duplicate SO
> in all 3 patches. I can resend if desired.
I think to address my comments you will need to resend anyway (as v2).
--
With Best Regards,
Andy Shevchenko
On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
"...fixes..." effectively means to me that Fixes tag should be provided.
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
> }
>
> --
> 2.29.2
>
--
With Best Regards,
Andy Shevchenko
On Mon, Mar 22, 2021 at 05:01:59PM +0100, David Hildenbrand wrote:
> It used to be true that we can have system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., dax/kmem and virtio-mem).
>
> The function walk_mem_res() only consideres the first level and is
> used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
> fail to identify System RAM added by dax/kmem and virtio-mem as
> "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
> "normal RAM" in __ioremap_caller().
Here I dunno, but consider to add Fixes tag if it fixes known bad behaviour.
> Let's find all busy IORESOURCE_MEM resources, making the function
> behave similar to walk_system_ram_res().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 4efd6e912279..16e0c7e8ed24 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -470,7 +470,7 @@ int walk_mem_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
> }
>
> --
> 2.29.2
>
--
With Best Regards,
Andy Shevchenko
On Mon, Mar 22, 2021 at 05:02:00PM +0100, David Hildenbrand wrote:
> All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
> whole resource tree, not just the first level. Let's drop the unused
> first_lvl / siblings_only logic.
>
> All functions properly search the whole tree, so remove documentation
> that indicates that some functions behave differently.
Like this clean up!
Reviewed-by: Andy Shevchenko <[email protected]>
Although a few nit-picks below.
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 45 ++++++++++++---------------------------------
> 1 file changed, 12 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 16e0c7e8ed24..7e00239a023a 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
> static struct resource *bootmem_resource_free;
> static DEFINE_SPINLOCK(bootmem_resource_lock);
>
> -static struct resource *next_resource(struct resource *p, bool sibling_only)
> +static struct resource *next_resource(struct resource *p)
> {
> - /* Caller wants to traverse through siblings only */
> - if (sibling_only)
> - return p->sibling;
> -
> if (p->child)
> return p->child;
> while (!p->sibling && p->parent)
> @@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos)
> {
> struct resource *p = v;
> (*pos)++;
> - return (void *)next_resource(p, false);
> + return (void *)next_resource(p);
> }
>
> #ifdef CONFIG_PROC_FS
> @@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
> * of the resource that's within [@start..@end]; if none is found, returns
> * -ENODEV. Returns -EINVAL for invalid parameters.
> *
> - * This function walks the whole tree and not just first level children
> - * unless @first_lvl is true.
> - *
> * @start: start address of the resource searched for
> * @end: end address of same resource
> * @flags: flags which the resource must have
> * @desc: descriptor the resource must have
> - * @first_lvl: walk only the first level children, if set
> * @res: return ptr, if resource found
> *
> * The caller must specify @start, @end, @flags, and @desc
> @@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
> */
> static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> unsigned long flags, unsigned long desc,
> - bool first_lvl, struct resource *res)
> + struct resource *res)
> {
> - bool siblings_only = true;
> struct resource *p;
>
> if (!res)
> @@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>
> read_lock(&resource_lock);
>
> - for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {
> + for (p = iomem_resource.child; p; p = next_resource(p)) {
> /* If we passed the resource we are looking for, stop */
> if (p->start > end) {
> p = NULL;
> @@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> if (p->end < start)
> continue;
>
> - /*
> - * Now that we found a range that matches what we look for,
> - * check the flags and the descriptor. If we were not asked to
> - * use only the first level, start looking at children as well.
> - */
> - siblings_only = first_lvl;
> -
> if ((p->flags & flags) != flags)
> continue;
> if ((desc != IORES_DESC_NONE) && (desc != p->desc))
> @@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>
> static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> unsigned long flags, unsigned long desc,
> - bool first_lvl, void *arg,
> + void *arg,
> int (*func)(struct resource *, void *))
Can it be one line?
> {
> struct resource res;
> int ret = -EINVAL;
>
> while (start < end &&
> - !find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
> + !find_next_iomem_res(start, end, flags, desc, &res)) {
> ret = (*func)(&res, arg);
> if (ret)
> break;
> @@ -431,7 +415,6 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> * @arg: function argument for the callback @func
> * @func: callback function that is called for each qualifying resource area
> *
> - * This walks through whole tree and not just first level children.
> * All the memory ranges which overlap start,end and also match flags and
> * desc are valid candidates.
> *
> @@ -441,7 +424,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
> int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
> u64 end, void *arg, int (*func)(struct resource *, void *))
> {
> - return __walk_iomem_res_desc(start, end, flags, desc, false, arg, func);
> + return __walk_iomem_res_desc(start, end, flags, desc, arg, func);
> }
> EXPORT_SYMBOL_GPL(walk_iomem_res_desc);
>
> @@ -457,8 +440,8 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> - arg, func);
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
> + func);
I guess you may do it on one line.
> }
>
> /*
> @@ -470,17 +453,14 @@ int walk_mem_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> - arg, func);
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
> + func);
Ditto.
> }
>
> /*
> * This function calls the @func callback against all memory ranges of type
> * System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.
> * It is to be used only for System RAM.
> - *
> - * This will find System RAM ranges that are children of top-level resources
> - * in addition to top-level System RAM resources.
> */
> int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> void *arg, int (*func)(unsigned long, unsigned long, void *))
> @@ -495,8 +475,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
> flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> while (start < end &&
> - !find_next_iomem_res(start, end, flags, IORES_DESC_NONE,
> - false, &res)) {
> + !find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res)) {
> pfn = PFN_UP(res.start);
> end_pfn = PFN_DOWN(res.end + 1);
> if (end_pfn > pfn)
> --
> 2.29.2
>
--
With Best Regards,
Andy Shevchenko
On 23.03.21 12:11, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 05:02:00PM +0100, David Hildenbrand wrote:
>> All IORESOURCE_SYSTEM_RAM and IORESOURCE_MEM now properly consider the
>> whole resource tree, not just the first level. Let's drop the unused
>> first_lvl / siblings_only logic.
>>
>> All functions properly search the whole tree, so remove documentation
>> that indicates that some functions behave differently.
>
>
> Like this clean up!
> Reviewed-by: Andy Shevchenko <[email protected]>
>
> Although a few nit-picks below.
>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Greg Kroah-Hartman <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Daniel Vetter <[email protected]>
>> Cc: Andy Shevchenko <[email protected]>
>> Cc: Mauro Carvalho Chehab <[email protected]>
>> Cc: Signed-off-by: David Hildenbrand <[email protected]>
>> Cc: Dave Young <[email protected]>
>> Cc: Baoquan He <[email protected]>
>> Cc: Vivek Goyal <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Keith Busch <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Qian Cai <[email protected]>
>> Cc: Oscar Salvador <[email protected]>
>> Cc: Eric Biederman <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: Tom Lendacky <[email protected]>
>> Cc: Brijesh Singh <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> kernel/resource.c | 45 ++++++++++++---------------------------------
>> 1 file changed, 12 insertions(+), 33 deletions(-)
>>
>> diff --git a/kernel/resource.c b/kernel/resource.c
>> index 16e0c7e8ed24..7e00239a023a 100644
>> --- a/kernel/resource.c
>> +++ b/kernel/resource.c
>> @@ -64,12 +64,8 @@ static DEFINE_RWLOCK(resource_lock);
>> static struct resource *bootmem_resource_free;
>> static DEFINE_SPINLOCK(bootmem_resource_lock);
>>
>> -static struct resource *next_resource(struct resource *p, bool sibling_only)
>> +static struct resource *next_resource(struct resource *p)
>> {
>> - /* Caller wants to traverse through siblings only */
>> - if (sibling_only)
>> - return p->sibling;
>> -
>> if (p->child)
>> return p->child;
>> while (!p->sibling && p->parent)
>> @@ -81,7 +77,7 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos)
>> {
>> struct resource *p = v;
>> (*pos)++;
>> - return (void *)next_resource(p, false);
>> + return (void *)next_resource(p);
>> }
>>
>> #ifdef CONFIG_PROC_FS
>> @@ -330,14 +326,10 @@ EXPORT_SYMBOL(release_resource);
>> * of the resource that's within [@start..@end]; if none is found, returns
>> * -ENODEV. Returns -EINVAL for invalid parameters.
>> *
>> - * This function walks the whole tree and not just first level children
>> - * unless @first_lvl is true.
>> - *
>> * @start: start address of the resource searched for
>> * @end: end address of same resource
>> * @flags: flags which the resource must have
>> * @desc: descriptor the resource must have
>> - * @first_lvl: walk only the first level children, if set
>> * @res: return ptr, if resource found
>> *
>> * The caller must specify @start, @end, @flags, and @desc
>> @@ -345,9 +337,8 @@ EXPORT_SYMBOL(release_resource);
>> */
>> static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>> unsigned long flags, unsigned long desc,
>> - bool first_lvl, struct resource *res)
>> + struct resource *res)
>> {
>> - bool siblings_only = true;
>> struct resource *p;
>>
>> if (!res)
>> @@ -358,7 +349,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>>
>> read_lock(&resource_lock);
>>
>> - for (p = iomem_resource.child; p; p = next_resource(p, siblings_only)) {
>> + for (p = iomem_resource.child; p; p = next_resource(p)) {
>> /* If we passed the resource we are looking for, stop */
>> if (p->start > end) {
>> p = NULL;
>> @@ -369,13 +360,6 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>> if (p->end < start)
>> continue;
>>
>> - /*
>> - * Now that we found a range that matches what we look for,
>> - * check the flags and the descriptor. If we were not asked to
>> - * use only the first level, start looking at children as well.
>> - */
>> - siblings_only = first_lvl;
>> -
>> if ((p->flags & flags) != flags)
>> continue;
>> if ((desc != IORES_DESC_NONE) && (desc != p->desc))
>> @@ -402,14 +386,14 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>>
>> static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>> unsigned long flags, unsigned long desc,
>> - bool first_lvl, void *arg,
>
>> + void *arg,
>> int (*func)(struct resource *, void *))
>
> Can it be one line?
>
>> {
>> struct resource res;
>> int ret = -EINVAL;
>>
>> while (start < end &&
>> - !find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
>> + !find_next_iomem_res(start, end, flags, desc, &res)) {
>> ret = (*func)(&res, arg);
>> if (ret)
>> break;
>> @@ -431,7 +415,6 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>> * @arg: function argument for the callback @func
>> * @func: callback function that is called for each qualifying resource area
>> *
>> - * This walks through whole tree and not just first level children.
>> * All the memory ranges which overlap start,end and also match flags and
>> * desc are valid candidates.
>> *
>> @@ -441,7 +424,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
>> int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
>> u64 end, void *arg, int (*func)(struct resource *, void *))
>> {
>> - return __walk_iomem_res_desc(start, end, flags, desc, false, arg, func);
>> + return __walk_iomem_res_desc(start, end, flags, desc, arg, func);
>> }
>> EXPORT_SYMBOL_GPL(walk_iomem_res_desc);
>>
>> @@ -457,8 +440,8 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>> {
>> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>
>> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>> - arg, func);
>
>> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
>> + func);
>
> I guess you may do it on one line.
>
>> }
>>
>> /*
>> @@ -470,17 +453,14 @@ int walk_mem_res(u64 start, u64 end, void *arg,
>> {
>> unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>
>> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>> - arg, func);
>> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, arg,
>> + func);
>
> Ditto.
To all your comments:
"The preferred limit on the length of a single line is 80 columns."
"Statements longer than 80 columns should be broken into sensible chunks
... unless exceeding 80 columns significantly increases readability"
I don't think it significantly increases readability.
--
Thanks,
David / dhildenb
On 23.03.21 12:06, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have busy system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
>> lower levels.
>>
>> We have two users of walk_system_ram_res(), which currently only
>> consideres the first level:
>> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>> locate_mem_hole_callback(), so even after this change, we won't be
>> placing kexec images onto dax/kmem and virtio-mem added memory. No
>> change.
>> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>> not adding relevant ranges to the crash elf info, resulting in them
>> not getting dumped via kdump.
>>
>> This change fixes loading a crashkernel via kexec_file_load() and including
>
> "...fixes..." effectively means to me that Fixes tag should be provided.
We can certainly add, although it doesn't really affect the running
kernel, but only crashdumps taken in the kdump kernel:
Fixes: ebf71552bb0e ("virtio-mem: Add parent resource for all added
"System RAM"")
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use
like normal RAM")
Thanks
--
Thanks,
David / dhildenb
On 23.03.21 12:08, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 05:01:59PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., dax/kmem and virtio-mem).
>>
>> The function walk_mem_res() only consideres the first level and is
>> used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently
>> fail to identify System RAM added by dax/kmem and virtio-mem as
>> "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
>> "normal RAM" in __ioremap_caller().
>
> Here I dunno, but consider to add Fixes tag if it fixes known bad behaviour.
Haven't observed it in the wild. We can add the same fixes tags as to
patch #1.
--
Thanks,
David / dhildenb
On 03/22/21 at 05:01pm, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
Thanks, David, this is a good fix.
Acked-by: Baoquan He <[email protected]>
> }
>
> --
> 2.29.2
>
On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
Let me ask some rookie questions:
What does "busy" term stand for here?
Why resources coming from virtio-mem are added at a lower levels?
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> locate_mem_hole_callback(), so even after this change, we won't be
> placing kexec images onto dax/kmem and virtio-mem added memory. No
> change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> not adding relevant ranges to the crash elf info, resulting in them
> not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Signed-off-by: David Hildenbrand <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Brijesh Singh <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/resource.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> {
> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> - return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> + return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
> arg, func);
> }
>
> --
> 2.29.2
>
>
--
Oscar Salvador
SUSE L3
On 24.03.21 12:18, Oscar Salvador wrote:
> On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have busy system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
>> lower levels.
>
> Let me ask some rookie questions:
>
> What does "busy" term stand for here?
IORESOURCE_BUSY - here: actually added, not just some reserved range / container.
> Why resources coming from virtio-mem are added at a lower levels?
Some information can be had from ebf71552bb0e690cad523ad175e8c4c89a33c333
commit ebf71552bb0e690cad523ad175e8c4c89a33c333
Author: David Hildenbrand <[email protected]>
Date: Thu May 7 16:01:35 2020 +0200
virtio-mem: Add parent resource for all added "System RAM"
Let's add a parent resource, named after the virtio device (inspired by
drivers/dax/kmem.c). This allows user space to identify which memory
belongs to which virtio-mem device.
With this change and two virtio-mem devices:
:/# cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
[...]
140000000-333ffffff : virtio0
140000000-147ffffff : System RAM
148000000-14fffffff : System RAM
150000000-157ffffff : System RAM
[...]
334000000-3033ffffff : virtio1
338000000-33fffffff : System RAM
340000000-347ffffff : System RAM
348000000-34fffffff : System RAM
[...]
For dax/kmem it comes naturally due to the "Persistent Memory" and
device parent resources like:
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM (kmem)
3280000000-32ffffffff : PCI Bus 0000:00
Thanks
--
Thanks,
David / dhildenb