2008-08-24 22:46:03

by Yinghai Lu

Subject: [PATCH] x86: only put e820 ram entries in resource tree

Users may need a newer kexec-tools that can build the e820 table for the
second kernel from /sys/firmware/memmap instead of /proc/iomem.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Bernhard Walle <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1279,6 +1279,10 @@ void __init e820_reserve_resources(void)

res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
for (i = 0; i < e820.nr_map; i++) {
+ if (e820.map[i].type != E820_RAM) {
+ res++;
+ continue;
+ }
end = e820.map[i].addr + e820.map[i].size - 1;
#ifndef CONFIG_RESOURCES_64BIT
if (end > 0x100000000ULL) {
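
The filtering in the hunk above can be sketched in user space as follows. This is only an illustration under assumptions: register_ram_only, struct range and the used field are hypothetical names, not kernel code. It keeps slot i paired with map entry i, which is what the res++ before continue does in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes follow the e820 convention: 1 = usable RAM, 2 = reserved. */
enum { E820_RAM = 1, E820_RESERVED = 2 };

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Hypothetical stand-in for struct resource: inclusive range plus a marker. */
struct range {
	uint64_t start;
	uint64_t end;
	int used;
};

/*
 * Fill out[] with one slot per map entry and return how many slots were
 * registered. Non-RAM entries leave their slot unused, so out[i] always
 * belongs to map[i].
 */
static size_t register_ram_only(const struct e820_entry *map, size_t n,
				struct range *out)
{
	size_t i, registered = 0;

	assert(map && out);
	for (i = 0; i < n; i++) {
		out[i].start = 0;
		out[i].end = 0;
		out[i].used = 0;
		if (map[i].type != E820_RAM || map[i].size == 0)
			continue;	/* skip, but keep the slot */
		out[i].start = map[i].addr;
		out[i].end = map[i].addr + map[i].size - 1;
		out[i].used = 1;
		registered++;
	}
	return registered;
}
```

For a map with two RAM entries and one reserved entry in between, the function returns 2 and leaves the middle slot unused.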


2008-08-25 03:04:22

by Eric W. Biederman

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

Yinghai Lu <[email protected]> writes:

> may need user to have new kexec tools that could create e820 table
> from /sys/firmware/memmap instead of /proc/iomem for second kernel

Nacked-by: "Eric W. Biederman" <[email protected]>

/proc/iomem is mostly about io resources which you have just removed.
It is totally the wrong thing to only register RAM resource!

The use by kexec was and is just taking advantage of something that
already existed.

Eric

> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: Bernhard Walle <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
>
> Index: linux-2.6/arch/x86/kernel/e820.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/e820.c
> +++ linux-2.6/arch/x86/kernel/e820.c
> @@ -1279,6 +1279,10 @@ void __init e820_reserve_resources(void)
>
> res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
> for (i = 0; i < e820.nr_map; i++) {
> + if (e820.map[i].type != E820_RAM) {
> + res++;
> + continue;
> + }
> end = e820.map[i].addr + e820.map[i].size - 1;
> #ifndef CONFIG_RESOURCES_64BIT
> if (end > 0x100000000ULL) {

2008-08-25 03:43:18

by Yinghai Lu

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

On Sun, Aug 24, 2008 at 7:52 PM, Eric W. Biederman
<[email protected]> wrote:
> Yinghai Lu <[email protected]> writes:
>
>> may need user to have new kexec tools that could create e820 table
>> from /sys/firmware/memmap instead of /proc/iomem for second kernel
>
> Nacked-by: "Eric W. Biederman" <[email protected]>
>
> /proc/iomem is mostly about io resources which you have just removed.
> It is totally the wrong thing to only register RAM resource!
>
> The use by kexec was and is just taking advantage of something that
> already existed.

Background:
Before 2.6.26, the kernel first called insert_resource for the lapic address
and then used request_resource to add every entry from the e820 table.
So an e820 entry that overlapped the lapic address was never added to the
resource tree.

Since 2.6.26, the e820 entries are put into the resource tree first, using
insert_resource, and the lapic address is added later, also with
insert_resource. So all e820 entries now show up in the resource tree.

Problem: some devices on bus 0 have BARs whose addresses fall into an area
that e820 marks as reserved.
When pcibios_allocate_bus_resources checks those resources,
request_resource(pr, res) fails. At this point pr is the resource of the
device's parent bus, and that is iomem_resource. The OS then assigns new
resources to those devices.
That should be OK, but some chipsets put the HPET in BAR1, and after that
change the hpet address is no longer consistent.
The system will hang...
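
A minimal user-space sketch of why the registration order matters, assuming a flat list of sibling ranges instead of the kernel's real resource tree. toy_request, toy_root and toy_res are hypothetical names, and the -1 return value stands in for the kernel's error code. Only the overlap rule mirrors request_resource, which refuses a range that conflicts with an existing sibling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 16

/* Hypothetical, simplified resource: an inclusive address range. */
struct toy_res {
	uint64_t start;
	uint64_t end;
	const char *name;
};

/* Hypothetical root holding a flat list of already registered siblings. */
struct toy_root {
	struct toy_res child[MAX_CHILDREN];
	size_t n;
};

static int overlaps(const struct toy_res *a, const struct toy_res *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Like request_resource() in spirit: refuse the range if any existing
 * sibling overlaps it. Returns 0 on success, -1 on conflict or if full.
 */
static int toy_request(struct toy_root *root, struct toy_res r)
{
	size_t i;

	assert(root && r.start <= r.end);
	for (i = 0; i < root->n; i++)
		if (overlaps(&root->child[i], &r))
			return -1;
	if (root->n == MAX_CHILDREN)
		return -1;
	root->child[root->n++] = r;
	return 0;
}
```

With an illustrative reserved range 0xfec00000-0xffffffff registered first, a later toy_request for a BAR at 0xfed00000-0xfed003ff returns -1, which corresponds to the case where the PCI code goes on to assign a new address. Registered in the opposite order, the BAR is accepted and the reserved range is the one that is refused.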

Possible solutions:
1. Use a quirk to protect the HPET in the BAR

[PATCH] x86: protect hpet in BAR for one ATI chipset v3

This keeps the kernel from allocating a new resource for the BAR when it
cannot claim the old address assigned by the BIOS.

This is handled the same way as IO APIC addresses in BARs.

Signed-off-by: Yinghai Lu <[email protected]>

---
drivers/pci/quirks.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

Index: linux-2.6/drivers/pci/quirks.c
===================================================================
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -1918,6 +1918,22 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_B
PCI_DEVICE_ID_NX2_5709S,
quirk_brcm_570x_limit_vpd);

+static void __init quirk_hpet_in_bar(struct pci_dev *pdev)
+{
+ int i;
+ u64 base, size;
+
+ /* the BAR1 is the location of the HPET...we must
+ * not touch this, so forcibly insert it into the resource tree */
+ base = pci_resource_start(pdev, 1);
+ size = pci_resource_len(pdev, 1);
+ if (base && size) {
+ insert_resource(&iomem_resource, &pdev->resource[1]);
+ dev_info(&pdev->dev, "HPET at %08llx-%08llx\n", base, base + size - 1);
+ }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x4385, quirk_hpet_in_bar);
+
#ifdef CONFIG_PCI_MSI
/* Some chipsets do not support MSI. We cannot easily rely on setting
* PCI_BUS_FLAGS_NO_MSI in its bus flags because there are actually

2. Or, more generically, check for this in pcibios_allocate_bus_resources

[PATCH] x86: check hpet with BAR

Forcibly insert some resources into the resource tree, so that the kernel
does not update the resources of the PCI device.

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/pci/i386.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)

Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -33,6 +33,7 @@
#include <linux/bootmem.h>

#include <asm/pat.h>
+#include <asm/hpet.h>

#include "pci.h"

@@ -77,6 +78,30 @@ pcibios_align_resource(void *data, struc
}
EXPORT_SYMBOL(pcibios_align_resource);

+static int check_res_with_valid(struct pci_dev *dev, struct resource *res)
+{
+ unsigned long base;
+ unsigned long size;
+
+ base = res->start;
+ size = (res->start == 0 && res->end == res->start) ? 0 :
+ (res->end - res->start + 1);
+
+ if (!base || !size)
+ return 0;
+
+#ifdef CONFIG_HPET_TIMER
+ /* for hpet */
+ if (base == hpet_address && (res->flags & IORESOURCE_MEM)) {
+ dev_info(&dev->dev, "BAR has HPET at %08lx-%08lx\n",
+ base, base + size - 1);
+ return 1;
+ }
+#endif
+
+ return 0;
+}
+
/*
* Handle resources of PCI devices. If the world were perfect, we could
* just allocate all the resource regions and do nothing more. It isn't.
@@ -128,6 +153,23 @@ static void __init pcibios_allocate_bus_
pr = pci_find_parent_resource(dev, r);
if (!r->start || !pr ||
request_resource(pr, r) < 0) {
+ if (check_res_with_valid(dev, r)) {
+ struct resource *root = NULL;
+
+ /*
+ * forcibly insert it into the
+ * resource tree
+ */
+ if (r->flags & IORESOURCE_MEM)
+ root = &iomem_resource;
+ else if (r->flags &
+ IORESOURCE_IO)
+ root = &ioport_resource;
+
+ if (root)
+ insert_resource(root, r);
+ continue;
+ }
+
dev_err(&dev->dev, "BAR %d: can't "
"allocate resource\n", idx);
/*

3. Or this patch: just don't put e820 reserved entries in the resource tree.
The PCI code seems to look for a gap in e820 directly (recently some
people have tried to use ACPI for that).
The other user of e820 reserved entries is mmconfig, and that checks
e820 directly as well.

I don't know who uses the e820 reserved entries in the resource tree.
Please remember that some reserved entries were missing from it up to 2.6.25....

YH

2008-08-25 07:18:18

by Ingo Molnar

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree


* Eric W. Biederman <[email protected]> wrote:

> Yinghai Lu <[email protected]> writes:
>
> > may need user to have new kexec tools that could create e820 table
> > from /sys/firmware/memmap instead of /proc/iomem for second kernel
>
> Nacked-by: "Eric W. Biederman" <[email protected]>
>
> /proc/iomem is mostly about io resources which you have just removed.
> It is totally the wrong thing to only register RAM resource!

see the RFC commit below for more details - about the problem and
various solutions we are thinking about. The core problem is that the
problem was hard to find and hard to debug - it took the exceptional
debugging effort of David Witbrodt to track it down.

So we are trying structural fixes to improve the situation. Just
reverting the e820 changes breaks other things and is not the real fix
anyway: the real fix is to increase communication between PC platform
devices/drivers and the PCI code. DMI driven quirks are too limited as
well - more such systems are suspected.

For now we've got the patch below from Yinghai - which hooks directly
into the x86 PCI discovery and reallocation code. While that's already
better than the initial DMI quirk, i think the real fix should go one
level higher, to the resource manager.

i'd rather see the e820 reserved entries show up there (losing system
setup information is almost always a bad idea - and the e820 map is
central enough to be one of the more reliable BIOS-provided data
structures), but with a different resource property: a 'sticky' resource
bit which would cause overlapping PCI devices that already have their
BAR programmed stay there. We already have a certain amount of support
for 'container' resources (bridge resources for example).
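
One way to picture the 'sticky' idea is the toy decision function below. It is a sketch under assumptions, not an existing kernel interface: TOY_STICKY, toy_res, bar_action and resolve_conflict are hypothetical names, and no such IORESOURCE_* bit is implied to exist. A BAR that is already programmed inside a sticky range stays where it is; any other conflict falls back to reassignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag for this sketch only, not a real IORESOURCE_* bit. */
#define TOY_STICKY 0x1u

struct toy_res {
	uint64_t start;
	uint64_t end;	/* inclusive */
	unsigned int flags;
};

enum bar_action {
	BAR_KEEP,	/* leave the BIOS-programmed address alone */
	BAR_REASSIGN	/* let the PCI code pick a new address */
};

static int contains(const struct toy_res *outer, const struct toy_res *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

/*
 * Decide what to do with a BAR that conflicts with an existing range.
 * Only a sticky range that fully contains the BAR protects it.
 */
static enum bar_action resolve_conflict(const struct toy_res *existing,
					const struct toy_res *bar)
{
	assert(existing->start <= existing->end && bar->start <= bar->end);
	if ((existing->flags & TOY_STICKY) && contains(existing, bar))
		return BAR_KEEP;
	return BAR_REASSIGN;
}
```

The containment check is a design choice of this sketch: a BAR that only partly overlaps a sticky range is still reassigned, because it is not clear from the discussion how such a case should be handled.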

That would automatically protect any hpet (or, in theory, ioapic)
platform devices from the PCI code's currently blind resource
reprogramming logic. These platform devices are not PCI enumerated so we
cannot just make the platform drivers themselves be PCI drivers, and
they are special in many regards. (often they are not PCI devices at
all)

Note that this is only about the (BIOS provided) e820 map. The core
problem is, inserting e820 map reserved entries as 'real' resources can
break real devices.

Ingo

---------------->
From 1521c6b7a96e8d79c424216d9118859a017a4e9e Mon Sep 17 00:00:00 2001
From: Yinghai Lu <[email protected]>
Date: Sun, 24 Aug 2008 21:41:28 -0700
Subject: [PATCH] x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR v2

David Witbrodt tracked down (and bisected) a bootup hang on his system
to the following problem: a BIOS bug made the hpet device visible as a
generic PCI device. If e820 reserved entries happen to be registered
first in the resource tree [which v2.6.26 started doing - to fix other
bugs], then the PCI code will reallocate that device's BAR to some other
address - breaking timer IRQs and hanging the system.

( Normally hpet devices are hidden by the BIOS from the OS's PCI discovery
via chipset magic. Sometimes the hpet is not a PCI device at all. )

Solve this fundamental fragility by making the non-PCI platform driver
insert resources into the resource tree even if it overlaps the e820
reserved entry, to keep the resource manager from updating the BAR.

NOTE: this is an RFC for now, there might be other, better approaches
as well:

- introduce a new resource type that is 'sticky': it would keep BARs
that are embedded in it from being reallocated.

or

- update the hpet_address from the PCI code. This is risky though: these
PCI devices are often non-generic and might break if we change their
BAR.

or

- do not insert e820 reserved entries at all. This would have
disadvantages as well: if there's some special non-RAM ACPI or SMM
area known to the system and enumerated in the e820 map, we must
keep the PCI code from allocating a resource into that region.

[ [email protected]: cleanups ]

Bisected-by: David Witbrodt <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Tested-by: David Witbrodt <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/pci/i386.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 5807d1b..57be547 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -28,6 +28,7 @@
#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/init.h>
+#include <linux/hpet.h>
#include <linux/ioport.h>
#include <linux/errno.h>
#include <linux/bootmem.h>
@@ -78,6 +79,47 @@ pcibios_align_resource(void *data, struct resource *res,
EXPORT_SYMBOL(pcibios_align_resource);

/*
+ * Make sure we protect magic platform devices such as hpet,
+ * even if they show up in PCI discovery. (which should really
+ * not happen, but it does on some broken BIOSen)
+ */
+static int check_platform(struct pci_dev *dev, struct resource *res)
+{
+ unsigned long base;
+ unsigned long size;
+
+ base = res->start;
+ size = (res->start == 0 && res->end == res->start) ? 0 :
+ (res->end - res->start + 1);
+
+ if (!base || !size)
+ return 0;
+
+#ifdef CONFIG_HPET_TIMER
+ /* for hpet */
+ if (base == hpet_address && (res->flags & IORESOURCE_MEM)) {
+ struct resource *root = NULL;
+
+ WARN(1, "BAR has HPET at %08lx-%08lx\n", base, base + size - 1);
+ /*
+ * forcibly insert it into the
+ * resource tree
+ */
+ if (res->flags & IORESOURCE_MEM)
+ root = &iomem_resource;
+ else if (res->flags & IORESOURCE_IO)
+ root = &ioport_resource;
+
+ if (root)
+ insert_resource(root, res);
+ return 1;
+ }
+#endif
+
+ return 0;
+}
+
+/*
* Handle resources of PCI devices. If the world were perfect, we could
* just allocate all the resource regions and do nothing more. It isn't.
* On the other hand, we cannot just re-allocate all devices, as it would
@@ -171,6 +213,8 @@ static void __init pcibios_allocate_resources(int pass)
r->flags, disabled, pass);
pr = pci_find_parent_resource(dev, r);
if (!pr || request_resource(pr, r) < 0) {
+ if (check_platform(dev, r))
+ continue;
dev_err(&dev->dev, "BAR %d: can't "
"allocate resource\n", idx);
/* We'll assign a new address later */

2008-08-25 08:40:00

by Bernhard Walle

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

* Yinghai Lu [2008-08-24 15:44]:
>
> may need user to have new kexec tools that could create e820 table
> from /sys/firmware/memmap instead of /proc/iomem for second kernel

2.0.0 has that implemented.



Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development

2008-08-25 13:34:36

by Eric W. Biederman

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

Ingo Molnar <[email protected]> writes:

> * Eric W. Biederman <[email protected]> wrote:
>
>> Yinghai Lu <[email protected]> writes:
>>
>> > may need user to have new kexec tools that could create e820 table
>> > from /sys/firmware/memmap instead of /proc/iomem for second kernel
>>
>> Nacked-by: "Eric W. Biederman" <[email protected]>
>>
>> /proc/iomem is mostly about io resources which you have just removed.
>> It is totally the wrong thing to only register RAM resource!
>
> see the RFC commit below for more details - about the problem and
> various solutions we are thinking about. The core problem is that the
> problem was hard to find and hard to debug - it took the exceptional
> debugging effort of David Witbrodt to track it down.
>
> So we are trying structural fixes to improve the situation. Just
> reverting the e820 changes breaks other things and is not the real fix
> anyway: the real fix is to increase communication between PC platform
> devices/drivers and the PCI code. DMI driven quirks are too limited as
> well - more such systems are suspected.
>
> For now we've got the patch below from Yinghai - which hooks directly
> into the x86 PCI discovery and reallocation code. While that's already
> better than the initial DMI quirk, i think the real fix should go one
> level higher, to the resource manager.
>
> i'd rather see the e820 reserved entries show up there (losing system
> setup information is almost always a bad idea - and the e820 map is
> central enough to be one of the more reliable BIOS-provided data
> structures), but with a different resource property: a 'sticky' resource
> bit which would cause overlapping PCI devices that already have their
> BAR programmed stay there. We already have a certain amount of support
> for 'container' resources (bridge resources for example).

Agreed. And that is why I NAK'd YH's first patch which just yanked
all of the reserved entries out of the resource map.

This really does need to get up to how we deal with resources
and the resource manager.

> That would automatically protect any hpet (or, in theory, ioapic)
> platform devices from the PCI code's currently blind resource
> reprogramming logic. These platform devices are not PCI enumerated so we
> cannot just make the platform drivers themselves be PCI drivers, and
> they are special in many regards. (often they are not PCI devices at
> all)

> Note that this is only about the (BIOS provided) e820 map. The core
> problem is, inserting e820 map reserved entries as 'real' resources can
> break real devices.

The core problem is seeing the e820 reservation as a conflict, not inserting
the resources themselves.

The question: How do we deal more gracefully with BIOS bugs?
The problem: We don't have full system information so we have to guess and
perform other magic to make the system work.

I bet if the HPET driver knew we had changed its BAR it would have worked
but of course that won't work in general.

One of the other problems we have seen in this area, if memory serves, is
that BIOS reserved regions don't always split on the same boundaries
as real hardware.

The last time this class of problem came up we added insert_resource
to the resource allocator. It seems either we are not using it properly
or it is an insufficient fix.

Hmm.

Why does pci_find_parent_resource fail?

Eric
> Ingo
>
> ---------------->
> From 1521c6b7a96e8d79c424216d9118859a017a4e9e Mon Sep 17 00:00:00 2001
> From: Yinghai Lu <[email protected]>
> Date: Sun, 24 Aug 2008 21:41:28 -0700
> Subject: [PATCH] x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet
> against BAR v2
>
> David Witbrodt tracked down (and bisected) a bootup hang on his system
> to the following problem: a BIOS bug made the hpet device visible as a
> generic PCI device. If e820 reserved entries happen to be registered
> first in the resource tree [which v2.6.26 started doing - to fix other
> bugs], then the PCI code will reallocate that device's BAR to some other
> address - breaking timer IRQs and hanging the system.
>
> ( Normally hpet devices are hidden by the BIOS from the OS's PCI discovery
> via chipset magic. Sometimes the hpet is not a PCI device at all. )
>
> Solve this fundamental fragility by making the non-PCI platform driver
> insert resources into the resource tree even if it overlaps the e820
> reserved entry, to keep the resource manager from updating the BAR.
>
> NOTE: this is an RFC for now, there might be other, better approaches
> as well:
>
> - introduce a new resource type that is 'sticky': it would keep BARs
> that are embedded in it from being reallocated.
>
> or
>
> - update the hpet_address from the PCI code. This is risky though: these
> PCI devices are often non-generic and might break if we change their
> BAR.
>
> or
>
> - do not insert e820 reserved entries at all. This would have
> disadvantages as well: if there's some special non-RAM ACPI or SMM
> area known to the system and enumerated in the e820 map, we must
> keep the PCI code from allocating a resource into that region.
>
> [ [email protected]: cleanups ]
>
> Bisected-by: David Witbrodt <[email protected]>
> Signed-off-by: Yinghai Lu <[email protected]>
> Tested-by: David Witbrodt <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/pci/i386.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 44 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
> index 5807d1b..57be547 100644
> --- a/arch/x86/pci/i386.c
> +++ b/arch/x86/pci/i386.c
> @@ -28,6 +28,7 @@
> #include <linux/kernel.h>
> #include <linux/pci.h>
> #include <linux/init.h>
> +#include <linux/hpet.h>
> #include <linux/ioport.h>
> #include <linux/errno.h>
> #include <linux/bootmem.h>
> @@ -78,6 +79,47 @@ pcibios_align_resource(void *data, struct resource *res,
> EXPORT_SYMBOL(pcibios_align_resource);
>
> /*
> + * Make sure we protect magic platform devices such as hpet,
> + * even if they show up in PCI discovery. (which should really
> + * not happen, but it does on some broken BIOSen)
> + */
> +static int check_platform(struct pci_dev *dev, struct resource *res)
> +{
> + unsigned long base;
> + unsigned long size;
> +
> + base = res->start;
> + size = (res->start == 0 && res->end == res->start) ? 0 :
> + (res->end - res->start + 1);
> +
> + if (!base || !size)
> + return 0;
> +
> +#ifdef CONFIG_HPET_TIMER
> + /* for hpet */
> + if (base == hpet_address && (res->flags & IORESOURCE_MEM)) {
> + struct resource *root = NULL;
> +
> + WARN(1, "BAR has HPET at %08lx-%08lx\n", base, base + size - 1);
> + /*
> + * forcibly insert it into the
> + * resource tree
> + */
> + if (res->flags & IORESOURCE_MEM)
> + root = &iomem_resource;
> + else if (res->flags & IORESOURCE_IO)
> + root = &ioport_resource;
> +
> + if (root)
> + insert_resource(root, res);
> + return 1;
> + }
> +#endif
> +
> + return 0;
> +}
> +
> +/*
> * Handle resources of PCI devices. If the world were perfect, we could
> * just allocate all the resource regions and do nothing more. It isn't.
> * On the other hand, we cannot just re-allocate all devices, as it would
> @@ -171,6 +213,8 @@ static void __init pcibios_allocate_resources(int pass)
> r->flags, disabled, pass);
> pr = pci_find_parent_resource(dev, r);
> if (!pr || request_resource(pr, r) < 0) {
> + if (check_platform(dev, r))
> + continue;
> dev_err(&dev->dev, "BAR %d: can't "
> "allocate resource\n", idx);
> /* We'll assign a new address later */

2008-08-25 15:12:26

by Vivek Goyal

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

On Sun, Aug 24, 2008 at 03:44:57PM -0700, Yinghai Lu wrote:
> may need user to have new kexec tools that could create e820 table
> from /sys/firmware/memmap instead of /proc/iomem for second kernel
>
> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: Bernhard Walle <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
>
> Index: linux-2.6/arch/x86/kernel/e820.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/e820.c
> +++ linux-2.6/arch/x86/kernel/e820.c
> @@ -1279,6 +1279,10 @@ void __init e820_reserve_resources(void)
>
> res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
> for (i = 0; i < e820.nr_map; i++) {
> + if (e820.map[i].type != E820_RAM) {
> + res++;
> + continue;
> + }
> end = e820.map[i].addr + e820.map[i].size - 1;
> #ifndef CONFIG_RESOURCES_64BIT
> if (end > 0x100000000ULL) {

I think this will also wipe out the ACPI related entries from /proc/iomem,
and kdump will be broken, as the second kernel needs to know about the ACPI
areas.

Though, if all these entries are available in /sys/firmware/memmap, then
one could probably modify kexec-tools to grep the RAM entries from
/proc/iomem and the rest of the entries from /sys/firmware/memmap.

I would prefer not to do that, as it makes the logic twisted.
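
For illustration, picking the RAM entries out of /proc/iomem could look roughly like the sketch below. It assumes the usual "start-end : name" line layout with hexadecimal addresses and the name "System RAM" for RAM ranges. parse_iomem_line, is_system_ram and struct iomem_line are hypothetical helpers, not code from kexec-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed /proc/iomem line. */
struct iomem_line {
	unsigned long long start;
	unsigned long long end;	/* inclusive */
	char name[64];
};

/*
 * Parse one "start-end : name" line. Leading indentation (used for nested
 * entries) is skipped by the %llx conversion. Returns 1 on success and
 * 0 if the line does not match or the range is inverted.
 */
static int parse_iomem_line(const char *line, struct iomem_line *out)
{
	assert(line && out);
	if (sscanf(line, "%llx-%llx : %63[^\n]",
		   &out->start, &out->end, out->name) != 3)
		return 0;
	return out->start <= out->end;
}

/* True if the parsed entry describes ordinary RAM. */
static int is_system_ram(const struct iomem_line *l)
{
	return strcmp(l->name, "System RAM") == 0;
}
```

A caller would read /proc/iomem line by line with fgets, keep the entries for which is_system_ram returns true, and take the non-RAM ranges from /sys/firmware/memmap instead.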

Thanks
Vivek

2008-08-25 17:09:04

by Yinghai Lu

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

On Mon, Aug 25, 2008 at 6:30 AM, Eric W. Biederman
<[email protected]> wrote:
>
> Why does pci_find_parent_resource fail?

It doesn't fail; it returns [0, -1ULL],

because device 00:14.0 is on bus 0, and this is a system with one HT chain.

YH

2008-08-25 17:11:41

by Yinghai Lu

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

On Mon, Aug 25, 2008 at 7:19 AM, Vivek Goyal <[email protected]> wrote:
> On Sun, Aug 24, 2008 at 03:44:57PM -0700, Yinghai Lu wrote:
>> may need user to have new kexec tools that could create e820 table
>> from /sys/firmware/memmap instead of /proc/iomem for second kernel
>>
>> Signed-off-by: Yinghai Lu <[email protected]>
>> Cc: Bernhard Walle <[email protected]>
>> Cc: Vivek Goyal <[email protected]>
>> Cc: "Eric W. Biederman" <[email protected]>
>>
>> Index: linux-2.6/arch/x86/kernel/e820.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/kernel/e820.c
>> +++ linux-2.6/arch/x86/kernel/e820.c
>> @@ -1279,6 +1279,10 @@ void __init e820_reserve_resources(void)
>>
>> res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
>> for (i = 0; i < e820.nr_map; i++) {
>> + if (e820.map[i].type != E820_RAM) {
>> + res++;
>> + continue;
>> + }
>> end = e820.map[i].addr + e820.map[i].size - 1;
>> #ifndef CONFIG_RESOURCES_64BIT
>> if (end > 0x100000000ULL) {
>
> I think this will wipe out ACPI related entries also from /proc/iomem
> and kdump will be broken as second kernel needs to know about the ACPI
> areas.
>
> Though, if all these entries are available in /sys/firmware/memmap then
> probably one can modify kexec-tools to grep RAM entries from /proc/iomem and
> rest of the entries from /sys/firmware/memmap.

/sys/firmware/memmap has all of them. However, the RAM entries in
/proc/iomem can be smaller than those in /sys/firmware/memmap because
of trimming from the command line.
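
A small sketch of how such trimming makes a RAM range shorter than the firmware-provided one, assuming the trimming is a simple upper limit such as the one a mem= option imposes. trim_to_limit and struct range are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Trim r so that it ends below limit. Returns 1 if part of the range
 * survives and 0 if the whole range lies at or above the limit.
 */
static int trim_to_limit(struct range *r, uint64_t limit)
{
	assert(r && r->start <= r->end);
	if (r->start >= limit)
		return 0;
	if (r->end >= limit)
		r->end = limit - 1;
	return 1;
}
```

With a 1 GiB limit, a firmware range of 0x100000-0xbfffffff is cut down to 0x100000-0x3fffffff, while a range that starts at 4 GiB disappears completely.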

YH

2008-08-25 17:13:18

by Yinghai Lu

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

On Mon, Aug 25, 2008 at 1:39 AM, Bernhard Walle <[email protected]> wrote:
> * Yinghai Lu [2008-08-24 15:44]:
>>
>> may need user to have new kexec tools that could create e820 table
>> from /sys/firmware/memmap instead of /proc/iomem for second kernel
>
> 2.0.0 has that implemented.
>
Yes

Could you add an option to build kexec-tools 2.0.0 as a static binary?

YH

2008-08-26 08:20:38

by Bernhard Walle

Subject: Re: [PATCH] x86: only put e820 ram entries in resource tree

* Yinghai Lu [2008-08-25 10:13]:
>
> On Mon, Aug 25, 2008 at 1:39 AM, Bernhard Walle <[email protected]> wrote:
> > * Yinghai Lu [2008-08-24 15:44]:
> >>
> >> may need user to have new kexec tools that could create e820 table
> >> from /sys/firmware/memmap instead of /proc/iomem for second kernel
> >
> > 2.0.0 has that implemented.
>
> Yes
>
> Could you add an option to build kexec-tools 2.0.0 as a static binary?

See http://article.gmane.org/gmane.linux.kernel.kexec/2223.



Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development