2008-06-27 11:12:26

by Bernhard Walle

[permalink] [raw]
Subject: x86: Add /sys/firmware/memmap

This patch series adds a new interface /sys/firmware/memmap to export
the BIOS (Firmware) provided memory map via sysfs for usage with
kexec.

While the first patch adds the generic interface, the second patch
implements that interface for E820 on x86.

Change compared to previous commit:

1. Comments from Greg KH.
2. Documentation.

The patch has been tested on i386 and x86_64 with success.


2008-06-27 11:12:41

by Bernhard Walle

[permalink] [raw]
Subject: [PATCH 2/2] Use FIRMWARE_MEMMAP on x86/E820

This patch uses the /sys/firmware/memmap interface provided in the last patch
on the x86 architecture when E820 is used. The patch copies the E820
memory map very early, and registers the E820 map afterwards via
firmware_map_add_early().


Signed-off-by: Bernhard Walle <[email protected]>
---
arch/x86/kernel/e820.c | 42 ++++++++++++++++++++++++++++++++++++------
include/asm-x86/e820.h | 2 ++
2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 1dcb665..e48eb10 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -19,6 +19,7 @@
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/suspend.h>
+#include <linux/firmware-map.h>

#include <asm/pgtable.h>
#include <asm/page.h>
@@ -27,7 +28,22 @@
#include <asm/setup.h>
#include <asm/trampoline.h>

+/*
+ * The e820 map is the map that gets modified e.g. with command line parameters
+ * and that is also registered with modifications in the kernel resource tree
+ * with the iomem_resource as parent.
+ *
+ * The e820_saved is directly saved after the BIOS-provided memory map is
+ * copied. It doesn't get modified afterwards. It's registered for the
+ * /sys/firmware/memmap interface.
+ *
+ * That memory map is not modified and is used as base for kexec. The kexec'd
+ * kernel should get the same memory map as the firmware provides. Then the
+ * user can e.g. boot the original kernel with mem=1G while still booting the
+ * next kernel with full memory.
+ */
struct e820map e820;
+struct e820map e820_saved;

/* For PCI or other memory-mapped resources */
unsigned long pci_mem_start = 0xaeedbabe;
@@ -1180,6 +1196,16 @@ void __init finish_e820_parsing(void)
}
}

+static inline const char *e820_type_to_string(int e820_type)
+{
+ switch (e820_type) {
+ case E820_RAM: return "System RAM";
+ case E820_ACPI: return "ACPI Tables";
+ case E820_NVS: return "ACPI Non-volatile Storage";
+ default: return "reserved";
+ }
+}
+
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
@@ -1190,12 +1216,7 @@ void __init e820_reserve_resources(void)

res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
for (i = 0; i < e820.nr_map; i++) {
- switch (e820.map[i].type) {
- case E820_RAM: res->name = "System RAM"; break;
- case E820_ACPI: res->name = "ACPI Tables"; break;
- case E820_NVS: res->name = "ACPI Non-volatile Storage"; break;
- default: res->name = "reserved";
- }
+ res->name = e820_type_to_string(e820.map[i].type);
res->start = e820.map[i].addr;
res->end = res->start + e820.map[i].size - 1;
#ifndef CONFIG_RESOURCES_64BIT
@@ -1208,6 +1229,13 @@ void __init e820_reserve_resources(void)
insert_resource(&iomem_resource, res);
res++;
}
+
+ for (i = 0; i < e820_saved.nr_map; i++) {
+ struct e820entry *entry = &e820_saved.map[i];
+ firmware_map_add_early(entry->addr,
+ entry->addr + entry->size - 1,
+ e820_type_to_string(entry->type));
+ }
}

char *__init default_machine_specific_memory_setup(void)
@@ -1243,6 +1271,8 @@ char *__init default_machine_specific_memory_setup(void)
e820_add_region(HIGH_MEMORY, mem_size << 10, E820_RAM);
}

+ memcpy(&e820_saved, &e820, sizeof(struct e820map));
+
/* In case someone cares... */
return who;
}
diff --git a/include/asm-x86/e820.h b/include/asm-x86/e820.h
index f622685..64ce6f8 100644
--- a/include/asm-x86/e820.h
+++ b/include/asm-x86/e820.h
@@ -56,7 +56,9 @@ struct e820map {
struct e820entry map[E820_X_MAX];
};

+/* see comment in arch/x86/kernel/e820.c */
extern struct e820map e820;
+extern struct e820map e820_saved;

extern int e820_any_mapped(u64 start, u64 end, unsigned type);
extern int e820_all_mapped(u64 start, u64 end, unsigned type);
--
1.5.6

2008-06-27 11:12:54

by Bernhard Walle

[permalink] [raw]
Subject: [PATCH 1/2] Add /sys/firmware/memmap

This patch adds /sys/firmware/memmap interface that represents the BIOS
(or Firmware) provided memory map. The tree looks like:

/sys/firmware/memmap/0/start (hex number)
end (hex number)
type (string)
... /1/start
end
type

With the following shell snippet one can print the memory map in the same form
the kernel prints itself when booting on x86 (the E820 map).

--------- 8< --------------------------
#!/bin/sh
cd /sys/firmware/memmap
for dir in * ; do
start=$(cat $dir/start)
end=$(cat $dir/end)
type=$(cat $dir/type)
printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
done
--------- >8 --------------------------

That patch only provides the needed interface:

1. The sysfs interface.
2. The structure and enumeration definition.
3. The function firmware_map_add() and firmware_map_add_early()
that should be called from architecture code (E820/EFI, for
example) to add the contents to the interface.

If the kernel is compiled without CONFIG_FIRMWARE_MEMMAP, the interface does
nothing without cluttering the architecture-specific code with #ifdef's.

The purpose of the new interface is kexec: While /proc/iomem represents
the *used* memory map (e.g. modified via kernel parameters like 'memmap'
and 'mem'), the /sys/firmware/memmap tree represents the unmodified memory
map provided via the firmware. So kexec can:

- use the original memory map for rebooting,
- use the /proc/iomem for setting up the ELF core headers for kdump
case that should only represent the memory of the system.

The patch has been tested on i386 and x86_64.


Signed-off-by: Bernhard Walle <[email protected]>
---
Documentation/ABI/testing/sysfs-firmware-memmap | 71 ++++++++
drivers/firmware/Kconfig | 10 +
drivers/firmware/Makefile | 1 +
drivers/firmware/memmap.c | 205 +++++++++++++++++++++++
include/linux/firmware-map.h | 74 ++++++++
5 files changed, 361 insertions(+), 0 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-firmware-memmap
create mode 100644 drivers/firmware/memmap.c
create mode 100644 include/linux/firmware-map.h

diff --git a/Documentation/ABI/testing/sysfs-firmware-memmap b/Documentation/ABI/testing/sysfs-firmware-memmap
new file mode 100644
index 0000000..0d99ee6
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-memmap
@@ -0,0 +1,71 @@
+What: /sys/firmware/memmap/
+Date: June 2008
+Contact: Bernhard Walle <[email protected]>
+Description:
+ On all platforms, the firmware provides a memory map which the
+ kernel reads. The resources from that memory map are registered
+ in the kernel resource tree and exposed to userspace via
+ /proc/iomem (together with other resources).
+
+ However, on most architectures that firmware-provided memory
+ map is modified afterwards by the kernel itself, either because
+ the kernel merges that memory map with other information or
+ just because the user overwrites that memory map via command
+ line.
+
+ kexec needs the raw firmware-provided memory map to setup the
+ parameter segment of the kernel that should be booted with
+ kexec. Also, the raw memory map is useful for debugging. For
+ that reason, /sys/firmware/memmap is an interface that provides
+ the raw memory map to userspace.
+
+ The structure is as follows: Under /sys/firmware/memmap there
+ are subdirectories with the number of the entry as their name:
+
+ /sys/firmware/memmap/0
+ /sys/firmware/memmap/1
+ /sys/firmware/memmap/2
+ /sys/firmware/memmap/3
+ ...
+
+ The maximum depends on the number of memory map entries provided
+ by the firmware. The order is just the order that the firmware
+ provides.
+
+ Each directory contains three files:
+
+ start : The start address (as hexadecimal number with the
+ '0x' prefix).
+ end : The end address, inclusive (regardless whether the
+ firmware provides inclusive or exclusive ranges).
+ type : Type of the entry as string. See below for a list of
+ valid types.
+
+ So, for example:
+
+ /sys/firmware/memmap/0/start
+ /sys/firmware/memmap/0/end
+ /sys/firmware/memmap/0/type
+ /sys/firmware/memmap/1/start
+ ...
+
+ Currently following types exist:
+
+ - System RAM
+ - ACPI Tables
+ - ACPI Non-volatile Storage
+ - reserved
+
+ Following shell snippet can be used to display that memory
+ map in a human-readable format:
+
+ -------------------- 8< ----------------------------------------
+ #!/bin/bash
+ cd /sys/firmware/memmap
+ for dir in * ; do
+ start=$(cat $dir/start)
+ end=$(cat $dir/end)
+ type=$(cat $dir/type)
+ printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
+ done
+ -------------------- >8 ----------------------------------------
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index dc2cec6..ebb9e51 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -26,6 +26,16 @@ config EDD_OFF
kernel. Say N if you want EDD enabled by default. EDD can be dynamically set
using the kernel parameter 'edd={on|skipmbr|off}'.

+config FIRMWARE_MEMMAP
+ bool "Add firmware-provided memory map to sysfs" if EMBEDDED
+ default (X86_64 || X86_32)
+ help
+ Add the firmware-provided (unmodified) memory map to /sys/firmware/memmap.
+ That memory map is used for example by kexec to set up parameter area
+ for the next kernel, but can also be used for debugging purposes.
+
+ See also Documentation/ABI/testing/sysfs-firmware-memmap.
+
config EFI_VARS
tristate "EFI Variable Support via sysfs"
depends on EFI
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 4c91471..1c3c173 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_DCDBAS) += dcdbas.o
obj-$(CONFIG_DMIID) += dmi-id.o
obj-$(CONFIG_ISCSI_IBFT_FIND) += iscsi_ibft_find.o
obj-$(CONFIG_ISCSI_IBFT) += iscsi_ibft.o
+obj-$(CONFIG_FIRMWARE_MEMMAP) += memmap.o
diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
new file mode 100644
index 0000000..e23399c
--- /dev/null
+++ b/drivers/firmware/memmap.c
@@ -0,0 +1,205 @@
+/*
+ * linux/drivers/firmware/memmap.c
+ * Copyright (C) 2008 SUSE LINUX Products GmbH
+ * by Bernhard Walle <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License v2.0 as published by
+ * the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/string.h>
+#include <linux/firmware-map.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/bootmem.h>
+
+/*
+ * Data types ------------------------------------------------------------------
+ */
+
+/*
+ * Firmware map entry. Because firmware memory maps are flat and not
+ * hierarchical, it's ok to organise them in a linked list. No parent
+ * information is necessary as for the resource tree.
+ */
+struct firmware_map_entry {
+ resource_size_t start; /* start of the memory range */
+ resource_size_t end; /* end of the memory range (incl.) */
+ const char *type; /* type of the memory range */
+ struct list_head list; /* entry for the linked list */
+ struct kobject kobj; /* kobject for each entry */
+};
+
+/*
+ * Forward declarations --------------------------------------------------------
+ */
+static ssize_t memmap_attr_show(struct kobject *kobj,
+ struct attribute *attr, char *buf);
+static ssize_t start_show(struct firmware_map_entry *entry, char *buf);
+static ssize_t end_show(struct firmware_map_entry *entry, char *buf);
+static ssize_t type_show(struct firmware_map_entry *entry, char *buf);
+
+/*
+ * Static data -----------------------------------------------------------------
+ */
+
+struct memmap_attribute {
+ struct attribute attr;
+ ssize_t (*show)(struct firmware_map_entry *entry, char *buf);
+};
+
+struct memmap_attribute memmap_start_attr = __ATTR_RO(start);
+struct memmap_attribute memmap_end_attr = __ATTR_RO(end);
+struct memmap_attribute memmap_type_attr = __ATTR_RO(type);
+
+/*
+ * These are default attributes that are added for every memmap entry.
+ */
+static struct attribute *def_attrs[] = {
+ &memmap_start_attr.attr,
+ &memmap_end_attr.attr,
+ &memmap_type_attr.attr,
+ NULL
+};
+
+static struct sysfs_ops memmap_attr_ops = {
+ .show = memmap_attr_show,
+};
+
+static struct kobj_type memmap_ktype = {
+ .sysfs_ops = &memmap_attr_ops,
+ .default_attrs = def_attrs,
+};
+
+/*
+ * Registration functions ------------------------------------------------------
+ */
+
+/*
+ * Firmware memory map entries
+ */
+static LIST_HEAD(map_entries);
+
+/**
+ * Common implementation of firmware_map_add() and firmware_map_add_early()
+ * which expects a pre-allocated struct firmware_map_entry.
+ *
+ * @start: Start of the memory range.
+ * @end: End of the memory range (inclusive).
+ * @type: Type of the memory range.
+ * @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
+ * entry.
+ */
+static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
+ const char *type,
+ struct firmware_map_entry *entry)
+{
+ BUG_ON(start > end);
+
+ entry->start = start;
+ entry->end = end;
+ entry->type = type;
+ INIT_LIST_HEAD(&entry->list);
+ kobject_init(&entry->kobj, &memmap_ktype);
+
+ list_add_tail(&entry->list, &map_entries);
+
+ return 0;
+}
+
+/*
+ * See <linux/firmware-map.h> for documentation.
+ */
+int firmware_map_add(resource_size_t start, resource_size_t end,
+ const char *type)
+{
+ struct firmware_map_entry *entry;
+
+ entry = kmalloc(sizeof(struct firmware_map_entry), GFP_ATOMIC);
+ WARN_ON(!entry);
+ if (!entry)
+ return -ENOMEM;
+
+ return firmware_map_add_entry(start, end, type, entry);
+}
+
+/*
+ * See <linux/firmware-map.h> for documentation.
+ */
+int __init firmware_map_add_early(resource_size_t start, resource_size_t end,
+ const char *type)
+{
+ struct firmware_map_entry *entry;
+
+ entry = alloc_bootmem_low(sizeof(struct firmware_map_entry));
+ WARN_ON(!entry);
+ if (!entry)
+ return -ENOMEM;
+
+ return firmware_map_add_entry(start, end, type, entry);
+}
+
+/*
+ * Sysfs functions -------------------------------------------------------------
+ */
+
+static ssize_t start_show(struct firmware_map_entry *entry, char *buf)
+{
+ return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->start);
+}
+
+static ssize_t end_show(struct firmware_map_entry *entry, char *buf)
+{
+ return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->end);
+}
+
+static ssize_t type_show(struct firmware_map_entry *entry, char *buf)
+{
+ return snprintf(buf, PAGE_SIZE, "%s\n", entry->type);
+}
+
+#define to_memmap_attr(_attr) container_of(_attr, struct memmap_attribute, attr)
+#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+
+static ssize_t memmap_attr_show(struct kobject *kobj,
+ struct attribute *attr, char *buf)
+{
+ struct firmware_map_entry *entry = to_memmap_entry(kobj);
+ struct memmap_attribute *memmap_attr = to_memmap_attr(attr);
+
+ return memmap_attr->show(entry, buf);
+}
+
+/*
+ * Initialises stuff and adds the entries in the map_entries list to
+ * sysfs. Important is that firmware_map_add() and firmware_map_add_early()
+ * must be called before late_initcall.
+ */
+static int __init memmap_init(void)
+{
+ int i = 0;
+ struct firmware_map_entry *entry;
+ struct kset *memmap_kset;
+
+ memmap_kset = kset_create_and_add("memmap", NULL, firmware_kobj);
+ WARN_ON(!memmap_kset);
+ if (!memmap_kset)
+ return -ENOMEM;
+
+ list_for_each_entry(entry, &map_entries, list) {
+ entry->kobj.kset = memmap_kset;
+ kobject_add(&entry->kobj, NULL, "%d", i++);
+ }
+
+ return 0;
+}
+late_initcall(memmap_init);
+
diff --git a/include/linux/firmware-map.h b/include/linux/firmware-map.h
new file mode 100644
index 0000000..acbdbcc
--- /dev/null
+++ b/include/linux/firmware-map.h
@@ -0,0 +1,74 @@
+/*
+ * include/linux/firmware-map.h:
+ * Copyright (C) 2008 SUSE LINUX Products GmbH
+ * by Bernhard Walle <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License v2.0 as published by
+ * the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+#ifndef _LINUX_FIRMWARE_MAP_H
+#define _LINUX_FIRMWARE_MAP_H
+
+#include <linux/list.h>
+#include <linux/kobject.h>
+
+/*
+ * provide a dummy interface if CONFIG_FIRMWARE_MEMMAP is disabled
+ */
+#ifdef CONFIG_FIRMWARE_MEMMAP
+
+/**
+ * Adds a firmware mapping entry. This function uses kmalloc() for memory
+ * allocation. Use firmware_map_add_early() if you want to use the bootmem
+ * allocator.
+ *
+ * That function must be called before late_initcall.
+ *
+ * @start: Start of the memory range.
+ * @end: End of the memory range (inclusive).
+ * @type: Type of the memory range.
+ *
+ * Returns 0 on success, or -ENOMEM if no memory could be allocated.
+ */
+int firmware_map_add(resource_size_t start, resource_size_t end,
+ const char *type);
+
+/**
+ * Adds a firmware mapping entry. This function uses the bootmem allocator
+ * for memory allocation. Use firmware_map_add() if you want to use kmalloc().
+ *
+ * That function must be called before late_initcall.
+ *
+ * @start: Start of the memory range.
+ * @end: End of the memory range (inclusive).
+ * @type: Type of the memory range.
+ *
+ * Returns 0 on success, or -ENOMEM if no memory could be allocated.
+ */
+int firmware_map_add_early(resource_size_t start, resource_size_t end,
+ const char *type);
+
+#else /* CONFIG_FIRMWARE_MEMMAP */
+
+static inline int firmware_map_add(resource_size_t start, resource_size_t end,
+ const char *type)
+{
+ return 0;
+}
+
+static inline int firmware_map_add_early(resource_size_t start,
+ resource_size_t end, const char *type)
+{
+ return 0;
+}
+
+#endif /* CONFIG_FIRMWARE_MEMMAP */
+
+#endif /* _LINUX_FIRMWARE_MAP_H */
--
1.5.6

2008-06-27 13:02:30

by Vivek Goyal

[permalink] [raw]
Subject: Re: x86: Add /sys/firmware/memmap

On Fri, Jun 27, 2008 at 01:12:53PM +0200, Bernhard Walle wrote:
> This patch series adds a new interface /sys/firmware/memmap to export
> the BIOS (Firmware) provided memory map via sysfs for usage with
> kexec.
>
> While the first patch adds the generic interface, the second patch
> implements that interface for E820 on x86.
>
> Change compared to previous commit:
>
> 1. Comments from Greg KH.
> 2. Documentation.
>
> The patch has been tested on i386 and x86_64 with success.

This patchset looks good to me. We do need this functionality for doing
the right thing for kexec and kdump users.

Acked-by: Vivek Goyal <[email protected]>

Thanks
Vivek

2008-06-27 18:22:45

by Greg KH

[permalink] [raw]
Subject: Re: x86: Add /sys/firmware/memmap

On Fri, Jun 27, 2008 at 01:12:53PM +0200, Bernhard Walle wrote:
> This patch series adds a new interface /sys/firmware/memmap to export
> the BIOS (Firmware) provided memory map via sysfs for usage with
> kexec.
>
> While the first patch adds the generic interface, the second patch
> implements that interface for E820 on x86.
>
> Change compared to previous commit:
>
> 1. Comments from Greg KH.
> 2. Documentation.
>
> The patch has been tested on i386 and x86_64 with success.
>

Looks good, want me to take this through my tree (due to the firmware
core change)? If so, can you get an ack on the x86 portion from the
respective people as well?

thanks,

greg k-h

2008-06-27 18:34:21

by Bernhard Walle

[permalink] [raw]
Subject: Re: x86: Add /sys/firmware/memmap

* Greg KH <[email protected]> [2008-06-27 11:18]:
>
> Looks good, want me to take this through my tree (due to the firmware
> core change)? If so, can you get an ack on the x86 portion from the
> respective people as well?

Well, the x86 part only applies on tip (because your tree doesn't have
the e820_{32,64} -> e820 merge). So, wouldn't it easier to ack that
patch from your side and merge via tip?



Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development

2008-06-27 18:51:20

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 1/2] Add /sys/firmware/memmap

On Fri, 2008-06-27 at 13:12 +0200, Bernhard Walle wrote:
>
> This patch adds /sys/firmware/memmap interface that represents the
> BIOS
> (or Firmware) provided memory map. The tree looks like:
>
> /sys/firmware/memmap/0/start (hex number)
> end (hex number)
> type (string)
> ... /1/start
> end
> type

Could we give this a slightly different name? We have various 'mem_map'
variables in the kernel, and this confused me on first glance.

Could we just spell out 'memory_map'?

-- Dave

2008-06-27 21:00:57

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 1/2] Add /sys/firmware/memmap

On Fri, Jun 27, 2008 at 01:12:54PM +0200, Bernhard Walle wrote:
> This patch adds /sys/firmware/memmap interface that represents the BIOS
> (or Firmware) provided memory map. The tree looks like:
>
> /sys/firmware/memmap/0/start (hex number)
> end (hex number)
> type (string)
> ... /1/start
> end
> type
>
> With the following shell snippet one can print the memory map in the same form
> the kernel prints itself when booting on x86 (the E820 map).
>
> --------- 8< --------------------------
> #!/bin/sh
> cd /sys/firmware/memmap
> for dir in * ; do
> start=$(cat $dir/start)
> end=$(cat $dir/end)
> type=$(cat $dir/type)
> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
> done
> --------- >8 --------------------------
>
> That patch only provides the needed interface:
>
> 1. The sysfs interface.
> 2. The structure and enumeration definition.
> 3. The function firmware_map_add() and firmware_map_add_early()
> that should be called from architecture code (E820/EFI, for
> example) to add the contents to the interface.
>
> If the kernel is compiled without CONFIG_FIRMWARE_MEMMAP, the interface does
> nothing without cluttering the architecture-specific code with #ifdef's.
>
> The purpose of the new interface is kexec: While /proc/iomem represents
> the *used* memory map (e.g. modified via kernel parameters like 'memmap'
> and 'mem'), the /sys/firmware/memmap tree represents the unmodified memory
> map provided via the firmware. So kexec can:
>
> - use the original memory map for rebooting,
> - use the /proc/iomem for setting up the ELF core headers for kdump
> case that should only represent the memory of the system.
>
> The patch has been tested on i386 and x86_64.
>
>
> Signed-off-by: Bernhard Walle <[email protected]>

Acked-by: Greg Kroah-Hartman <[email protected]>

x86 developers, feel free to add this to your tree, it looks fine to me.

thanks,

greg k-h

2008-07-01 20:30:59

by Bernhard Walle

[permalink] [raw]
Subject: Re: [PATCH 1/2] Add /sys/firmware/memmap

* Greg KH <[email protected]> [2008-06-27 13:56]:
>
> Acked-by: Greg Kroah-Hartman <[email protected]>
>
> x86 developers, feel free to add this to your tree, it looks fine to me.

Ingo, any pending issues from your side that prevents merging in 'tip'?



Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development

2008-07-01 20:33:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] Add /sys/firmware/memmap


* Bernhard Walle <[email protected]> wrote:

> * Greg KH <[email protected]> [2008-06-27 13:56]:
> >
> > Acked-by: Greg Kroah-Hartman <[email protected]>
> >
> > x86 developers, feel free to add this to your tree, it looks fine to me.
>
> Ingo, any pending issues from your side that prevents merging in
> 'tip'?

nope, no pending issues now that Greg has acked it - will create a topic
for it tomorrow.

Ingo

2008-07-03 14:16:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: x86: Add /sys/firmware/memmap


* Bernhard Walle <[email protected]> wrote:

> * Greg KH <[email protected]> [2008-06-27 11:18]:
> >
> > Looks good, want me to take this through my tree (due to the firmware
> > core change)? If so, can you get an ack on the x86 portion from the
> > respective people as well?
>
> Well, the x86 part only applies on tip (because your tree doesn't have
> the e820_{32,64} -> e820 merge). So, wouldn't it easier to ack that
> patch from your side and merge via tip?

Bernhard, i've created a new -tip topic for your changes:
tip/sysfs/firmware-mmap. It is available at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git sysfs/firmware-mmap

This will become a 2-commit topic branch once the dependent x86 bits are
upstream. At that point Greg will be able to pull it from -tip and push
it out via the sysfs tree.

I've added it to the tip/master integration rules to give it testing and
to make sure any subsequent x86 changes stay in sync with the e820 API
changes your patches required.

Ingo

2008-07-12 00:36:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] Add /sys/firmware/memmap

On Fri, 27 Jun 2008 13:12:54 +0200 Bernhard Walle <[email protected]> wrote:

> This patch adds /sys/firmware/memmap interface that represents the BIOS
> (or Firmware) provided memory map. The tree looks like:
>
> /sys/firmware/memmap/0/start (hex number)
> end (hex number)
> type (string)
> ... /1/start
> end
> type
>
> With the following shell snippet one can print the memory map in the same form
> the kernel prints itself when booting on x86 (the E820 map).
>
> --------- 8< --------------------------
> #!/bin/sh
> cd /sys/firmware/memmap
> for dir in * ; do
> start=$(cat $dir/start)
> end=$(cat $dir/end)
> type=$(cat $dir/type)
> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
> done
> --------- >8 --------------------------
>
> That patch only provides the needed interface:
>
> 1. The sysfs interface.
> 2. The structure and enumeration definition.
> 3. The function firmware_map_add() and firmware_map_add_early()
> that should be called from architecture code (E820/EFI, for
> example) to add the contents to the interface.
>
> If the kernel is compiled without CONFIG_FIRMWARE_MEMMAP, the interface does
> nothing without cluttering the architecture-specific code with #ifdef's.
>
> The purpose of the new interface is kexec: While /proc/iomem represents
> the *used* memory map (e.g. modified via kernel parameters like 'memmap'
> and 'mem'), the /sys/firmware/memmap tree represents the unmodified memory
> map provided via the firmware. So kexec can:
>
> - use the original memory map for rebooting,
> - use the /proc/iomem for setting up the ELF core headers for kdump
> case that should only represent the memory of the system.
>
> The patch has been tested on i386 and x86_64.
>
> ...
>
> +/*
> + * Firmware memory map entries
> + */
> +static LIST_HEAD(map_entries);

Alarm bells. Please add a comment explaining why no locking is needed
for this.

> +/**
> + * Common implementation of firmware_map_add() and firmware_map_add_early()
> + * which expects a pre-allocated struct firmware_map_entry.
> + *
> + * @start: Start of the memory range.
> + * @end: End of the memory range (inclusive).
> + * @type: Type of the memory range.
> + * @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
> + * entry.
> + */

Busted kerneldoc. It should start with

/**
* firmware_map_add_entry - <stuff>

> +static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
> + const char *type,
> + struct firmware_map_entry *entry)
> +{
> + BUG_ON(start > end);
> +
> + entry->start = start;
> + entry->end = end;
> + entry->type = type;
> + INIT_LIST_HEAD(&entry->list);
> + kobject_init(&entry->kobj, &memmap_ktype);
> +
> + list_add_tail(&entry->list, &map_entries);
> +
> + return 0;
> +}
> +
> +/*
> + * See <linux/firmware-map.h> for documentation.
> + */
> +int firmware_map_add(resource_size_t start, resource_size_t end,
> + const char *type)
> +{
> + struct firmware_map_entry *entry;
> +
> + entry = kmalloc(sizeof(struct firmware_map_entry), GFP_ATOMIC);
> + WARN_ON(!entry);

Actually, kmalloc() would have warned already.

> + if (!entry)
> + return -ENOMEM;
> +
> + return firmware_map_add_entry(start, end, type, entry);
> +}
> +
>
> ...
>
> +static ssize_t start_show(struct firmware_map_entry *entry, char *buf)
> +{
> + return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->start);
> +}
> +
> +static ssize_t end_show(struct firmware_map_entry *entry, char *buf)
> +{
> + return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->end);
> +}

Printing a resource_size_t needs a cast to `unsigned long long' to
avoid printk warnings.

> +static ssize_t type_show(struct firmware_map_entry *entry, char *buf)
> +{
> + return snprintf(buf, PAGE_SIZE, "%s\n", entry->type);
> +}
> +
> +#define to_memmap_attr(_attr) container_of(_attr, struct memmap_attribute, attr)
> +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
> +
> +static ssize_t memmap_attr_show(struct kobject *kobj,
> + struct attribute *attr, char *buf)
> +{
> + struct firmware_map_entry *entry = to_memmap_entry(kobj);
> + struct memmap_attribute *memmap_attr = to_memmap_attr(attr);
> +
> + return memmap_attr->show(entry, buf);
> +}
> +
> +/*
> + * Initialises stuff and adds the entries in the map_entries list to
> + * sysfs. Important is that firmware_map_add() and firmware_map_add_early()
> + * must be called before late_initcall.

Please update the comment to provide the reason why this is important.

> + */
> +static int __init memmap_init(void)
> +{
> + int i = 0;
> + struct firmware_map_entry *entry;
> + struct kset *memmap_kset;
> +
> + memmap_kset = kset_create_and_add("memmap", NULL, firmware_kobj);
> + WARN_ON(!memmap_kset);
> + if (!memmap_kset)
> + return -ENOMEM;

Actually, you can do

if (WARN_ON(!memmap_kset))
return -ENOMEM;

(multiple instances)

> + list_for_each_entry(entry, &map_entries, list) {
> + entry->kobj.kset = memmap_kset;
> + kobject_add(&entry->kobj, NULL, "%d", i++);

kobject_add() can fail.

I'd suggest that you enable CONFIG_ENABLE_MUST_CHECK in your usualconfig.

> + }
> +
> + return 0;
> +}
> +late_initcall(memmap_init);
> +
>
> ...
>
> +/**
> + * Adds a firmware mapping entry. This function uses kmalloc() for memory
> + * allocation. Use firmware_map_add_early() if you want to use the bootmem
> + * allocator.
> + *
> + * That function must be called before late_initcall.
> + *
> + * @start: Start of the memory range.
> + * @end: End of the memory range (inclusive).
> + * @type: Type of the memory range.
> + *
> + * Returns 0 on success, or -ENOMEM if no memory could be allocated.
> + */
> +int firmware_map_add(resource_size_t start, resource_size_t end,
> + const char *type);

More busted kernedoc (multiple instances)

Please document C functions at the definition site, not in the header file.

a) because otherwise the documentation gets splattered across
different files (eg - static functions?)

b) principle of least surprise: that's where people expect to find it.

> +/**
> + * Adds a firmware mapping entry. This function uses the bootmem allocator
> + * for memory allocation. Use firmware_map_add() if you want to use kmalloc().
> + *
> + * That function must be called before late_initcall.
> + *
> + * @start: Start of the memory range.
> + * @end: End of the memory range (inclusive).
> + * @type: Type of the memory range.
> + *
> + * Returns 0 on success, or -ENOMEM if no memory could be allocated.
> + */
> +int firmware_map_add_early(resource_size_t start, resource_size_t end,
> + const char *type);
> +
> +#else /* CONFIG_FIRMWARE_MEMMAP */
> +
> +static inline int firmware_map_add(resource_size_t start, resource_size_t end,
> + const char *type)
> +{
> + return 0;
> +}
> +
> +static inline int firmware_map_add_early(resource_size_t start,
> + resource_size_t end, const char *type)
> +{
> + return 0;
> +}
> +
> +#endif /* CONFIG_FIRMWARE_MEMMAP */
> +
> +#endif /* _LINUX_FIRMWARE_MAP_H */

2008-07-12 22:40:25

by Bernhard Walle

[permalink] [raw]
Subject: [PATCH 2/2] firmware/memmap: Move kobject initialisation before kobject_add()

Because kobject_init() call could be done from firmware_map_add_entry()
which is called before kmalloc() can be used (we use the early bootmem allocator
here), move that call to memmap_init() which is a late_initcall.

This fixes following regression in linux-next tree:

BUG: Int 14: CR2 b004254c
EDI 00000000 ESI 00000082 EBP c0725ed8 ESP c0725eb4
EBX f0002800 EDX 0000000e ECX c071e070 EAX f000ff53
err 00000000 EIP c0278e3f CS 00000060 flg 00010086
Stack: 000080d0 c071e070 c000e1dc c000e1c0 c06e1bf8 c0725eec c0313d59 c000e1d4
c000e1c0 00000000 c0725f08 c074c7b8 00000000 0009cbff 00000000 c077b400
00000001 c0725f34 c0732e06 0009cbff 00000000 c063af02 01000000 00000000
Pid: 0, comm: swapper Not tainted 2.6.26-rc9-next-20080711 #4
[<c0234266>] ? up+0x2b/0x2f
[<c0278e3f>] ? kmem_cache_alloc+0x2e/0x7d
[<c0313d59>] ? kobject_init+0x46/0xd0
[<c074c7b8>] ? firmware_map_add_early+0x83/0xa3
[<c0732e06>] ? e820_reserve_resources+0x10b/0x11e
[<c07314d5>] ? setup_arch+0x871/0x8d7
[<c0220f63>] ? release_console_sem+0x177/0x17f
[<c07331ca>] ? __reserve_early+0xe4/0xf8
[<c055e14e>] ? printk+0xf/0x11
[<c072b67a>] ? start_kernel+0x5b/0x2d1
[<c072b080>] ? __init_begin+0x80/0x88
=======================


Signed-off-by: Bernhard Walle <[email protected]>
Signed-off-by: Bernhard Walle <[email protected]>
---
drivers/firmware/memmap.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index 73b7c1d..4788476 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -111,7 +111,12 @@ static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
entry->end = end;
entry->type = type;
INIT_LIST_HEAD(&entry->list);
- kobject_init(&entry->kobj, &memmap_ktype);
+ /*
+ * don't init the kobject here since it calls kmalloc() internally
+ * which we are not ready to do in firmware_map_add_early() case
+ * Instead, do that before kobject_add() in memmap_init()
+ */
+ memset(&entry->kobj, 0, sizeof(struct kobject));

list_add_tail(&entry->list, &map_entries);

@@ -219,6 +224,7 @@ static int __init memmap_init(void)
return -ENOMEM;

list_for_each_entry(entry, &map_entries, list) {
+ kobject_init(&entry->kobj, &memmap_ktype);
entry->kobj.kset = memmap_kset;
if (kobject_add(&entry->kobj, NULL, "%d", i++))
kobject_put(&entry->kobj);
--
1.5.6.2

2008-07-12 22:40:38

by Bernhard Walle

[permalink] [raw]
Subject: [PATCH 1/2] firmware/memmap: Cleanup

Various cleanup the drivers/firmware/memmap (after review by AKPM):

- fix kdoc to conform to the standard
- move kdoc from header to implementation files
- remove superfluous WARN_ON() after kmalloc()
- WARN_ON(x); if (!x) -> if(!WARN_ON(x))
- improve some comments


Signed-off-by: Bernhard Walle <[email protected]>
Signed-off-by: Bernhard Walle <[email protected]>
---
drivers/firmware/memmap.c | 61 ++++++++++++++++++++++++++-----------
include/linux/firmware-map.h | 26 ----------------
2 files changed, 43 insertions(+), 44 deletions(-)
create mode 100644 include/config/firmware/memmap.h

diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index e23399c..73b7c1d 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -84,20 +84,23 @@ static struct kobj_type memmap_ktype = {
*/

/*
- * Firmware memory map entries
+ * Firmware memory map entries. No locking is needed because the
+ * firmware_map_add() and firmware_map_add_early() functions are called
+ * in firmware initialisation code in one single thread of execution.
*/
static LIST_HEAD(map_entries);

/**
- * Common implementation of firmware_map_add() and firmware_map_add_early()
- * which expects a pre-allocated struct firmware_map_entry.
- *
+ * firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
* @start: Start of the memory range.
* @end: End of the memory range (inclusive).
* @type: Type of the memory range.
* @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
* entry.
- */
+ *
+ * Common implementation of firmware_map_add() and firmware_map_add_early()
+ * which expects a pre-allocated struct firmware_map_entry.
+ **/
static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
const char *type,
struct firmware_map_entry *entry)
@@ -115,33 +118,52 @@ static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
return 0;
}

-/*
- * See <linux/firmware-map.h> for documentation.
- */
+/**
+ * firmware_map_add() - Adds a firmware mapping entry.
+ * @start: Start of the memory range.
+ * @end: End of the memory range (inclusive).
+ * @type: Type of the memory range.
+ *
+ * This function uses kmalloc() for memory
+ * allocation. Use firmware_map_add_early() if you want to use the bootmem
+ * allocator.
+ *
+ * That function must be called before late_initcall.
+ *
+ * Returns 0 on success, or -ENOMEM if no memory could be allocated.
+ **/
int firmware_map_add(resource_size_t start, resource_size_t end,
const char *type)
{
struct firmware_map_entry *entry;

entry = kmalloc(sizeof(struct firmware_map_entry), GFP_ATOMIC);
- WARN_ON(!entry);
if (!entry)
return -ENOMEM;

return firmware_map_add_entry(start, end, type, entry);
}

-/*
- * See <linux/firmware-map.h> for documentation.
- */
+/**
+ * firmware_map_add_early() - Adds a firmware mapping entry.
+ * @start: Start of the memory range.
+ * @end: End of the memory range (inclusive).
+ * @type: Type of the memory range.
+ *
+ * Adds a firmware mapping entry. This function uses the bootmem allocator
+ * for memory allocation. Use firmware_map_add() if you want to use kmalloc().
+ *
+ * That function must be called before late_initcall.
+ *
+ * Returns 0 on success, or -ENOMEM if no memory could be allocated.
+ **/
int __init firmware_map_add_early(resource_size_t start, resource_size_t end,
const char *type)
{
struct firmware_map_entry *entry;

entry = alloc_bootmem_low(sizeof(struct firmware_map_entry));
- WARN_ON(!entry);
- if (!entry)
+ if (WARN_ON(!entry))
return -ENOMEM;

return firmware_map_add_entry(start, end, type, entry);
@@ -181,7 +203,10 @@ static ssize_t memmap_attr_show(struct kobject *kobj,
/*
* Initialises stuff and adds the entries in the map_entries list to
* sysfs. Important is that firmware_map_add() and firmware_map_add_early()
- * must be called before late_initcall.
+ * must be called before late_initcall. That's just because that function
+ * is called as late_initcall() function, which means that if you call
+ * firmware_map_add() or firmware_map_add_early() afterwards, the entries
+ * are not added to sysfs.
*/
static int __init memmap_init(void)
{
@@ -190,13 +215,13 @@ static int __init memmap_init(void)
struct kset *memmap_kset;

memmap_kset = kset_create_and_add("memmap", NULL, firmware_kobj);
- WARN_ON(!memmap_kset);
- if (!memmap_kset)
+ if (WARN_ON(!memmap_kset))
return -ENOMEM;

list_for_each_entry(entry, &map_entries, list) {
entry->kobj.kset = memmap_kset;
- kobject_add(&entry->kobj, NULL, "%d", i++);
+ if (kobject_add(&entry->kobj, NULL, "%d", i++))
+ kobject_put(&entry->kobj);
}

return 0;
diff --git a/include/config/firmware/memmap.h b/include/config/firmware/memmap.h
new file mode 100644
index 0000000..e69de29
diff --git a/include/linux/firmware-map.h b/include/linux/firmware-map.h
index acbdbcc..6e199c8 100644
--- a/include/linux/firmware-map.h
+++ b/include/linux/firmware-map.h
@@ -24,34 +24,8 @@
*/
#ifdef CONFIG_FIRMWARE_MEMMAP

-/**
- * Adds a firmware mapping entry. This function uses kmalloc() for memory
- * allocation. Use firmware_map_add_early() if you want to use the bootmem
- * allocator.
- *
- * That function must be called before late_initcall.
- *
- * @start: Start of the memory range.
- * @end: End of the memory range (inclusive).
- * @type: Type of the memory range.
- *
- * Returns 0 on success, or -ENOMEM if no memory could be allocated.
- */
int firmware_map_add(resource_size_t start, resource_size_t end,
const char *type);
-
-/**
- * Adds a firmware mapping entry. This function uses the bootmem allocator
- * for memory allocation. Use firmware_map_add() if you want to use kmalloc().
- *
- * That function must be called before late_initcall.
- *
- * @start: Start of the memory range.
- * @end: End of the memory range (inclusive).
- * @type: Type of the memory range.
- *
- * Returns 0 on success, or -ENOMEM if no memory could be allocated.
- */
int firmware_map_add_early(resource_size_t start, resource_size_t end,
const char *type);

--
1.5.6.2

2008-07-12 22:41:16

by Bernhard Walle

[permalink] [raw]
Subject: firmware/memmap fixes

This patch fixes the problems Andrew encountered in his review and also fixes a
boot problem on i386 in linux-next. That was not present while testing the
patch in tip tree.

However, I don't know how to merge that patches properly. Directly to
linux-next or via -mm? Or via x86?

The patches are against linux-next tree and were tested on i386.


Signed-off-by: Bernhard Walle <[email protected]>