Reserve unspecified location of physical memory from kernel command line
Background:
In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract
dmesg output and some other information when a crash happens in the field.
(This is only done when the user selects "Allow Google to collect data for
improving the system"). But there are cases when there's a bug that
requires more data to be retrieved to figure out what is happening. We would
like to increase the pstore size, either temporarily, or maybe even
permanently. The pstore on these devices is at a fixed location in RAM (as
the RAM is not cleared on soft reboots nor crashes). The location is chosen
by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86.
There's a driver that queries for this to initialize the pstore for
ChromeOS:
See drivers/platform/chrome/chromeos_pstore.c
Problem:
The problem is that, even though there's a process to change the kernel on
these systems, and it is done regularly to install updates, the firmware is
updated much less frequently. Choosing the place in RAM also takes special
care, and the address may differ between boards. Updating the
size via firmware is a large effort and not something that many are willing
to do for a temporary pstore size change.
Requirement:
Need a way to reserve memory that will be at a consistent location for
every boot, if the kernel and system are the same. Does not need to work
if rebooting to a different kernel, or if the system can change the
memory layout between boots.
The reserved memory cannot be a hard-coded address, as the same kernel /
command line needs to run on several different machines. The picked memory
reservation just needs to be the same for a given machine, but may be
different for different machines.
Solution:
The solution I have come up with is to introduce a new "reserve_mem=" kernel
command line parameter. This parameter takes the following format:
reserve_mem=nn:align:label
Where nn is the size of memory to reserve, the align is the alignment of
that memory, and label is the way for other sub-systems to find that memory.
This way the kernel command line could have:
reserve_mem=12M:4096:oops ramoops.mem_name=oops
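To make the format concrete, here is a minimal userspace sketch of parsing one
"nn:align:label" argument. It is not the code from the patch: parse_size(),
parse_reserve_mem(), struct reserve_req and the MIN_ALIGN value (a stand-in
for SMP_CACHE_BYTES) are names and values assumed for illustration. The checks
follow the changelog below: size must be non-zero, the label must have text,
and a too-small alignment is raised to the minimum.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LABEL_SIZE 16   /* label buffer size mentioned for the prototype */
#define MIN_ALIGN  64   /* assumed stand-in for SMP_CACHE_BYTES */

struct reserve_req {
	uint64_t size;
	uint64_t align;
	char label[LABEL_SIZE];
};

/* Parse a number with an optional K/M/G suffix (loosely like memparse()). */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t val = strtoull(s, end, 0);

	if (*end == s)		/* no digits consumed */
		return 0;

	switch (**end) {
	case 'G': case 'g': val <<= 10; /* fall through */
	case 'M': case 'm': val <<= 10; /* fall through */
	case 'K': case 'k': val <<= 10; (*end)++; break;
	default: break;
	}
	return val;
}

/* Parse "nn:align:label" into *req. Returns 0 or a negative errno. */
int parse_reserve_mem(const char *arg, struct reserve_req *req)
{
	const char *a;
	char *p;
	size_t len;

	req->size = parse_size(arg, &p);
	if (p == arg || req->size == 0 || *p != ':')
		return -EINVAL;

	a = p + 1;
	req->align = parse_size(a, &p);
	if (p == a || *p != ':')
		return -EINVAL;
	if (req->align < MIN_ALIGN)	/* raise small alignments to the minimum */
		req->align = MIN_ALIGN;

	p++;
	len = strlen(p);
	if (len == 0 || len >= LABEL_SIZE)	/* label must fit with its NUL */
		return -EINVAL;
	memcpy(req->label, p, len + 1);
	return 0;
}
```

Calling parse_reserve_mem("12M:4096:oops", &req) fills in a 12 MB size, an
alignment of 4096 and the label "oops"; an empty label or a zero size is
rejected with -EINVAL.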
At boot up, the kernel will search for 12 megabytes in usable memory regions
with an alignment of 4096. It will start at the highest regions and work its
way down (for those old devices that want access to lower address DMA). When
it finds a region, it will save it off in a small table and mark it with the
"oops" label. Then the pstore ramoops sub-system could ask for that memory
and location, and it will map itself there.
This prototype allows for 8 different mappings (which may be overkill, 4 is
probably plenty) with 16 byte size to store the label.
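The search and the label table described above can be modeled in userspace
roughly as follows. This is only a sketch under stated assumptions: struct
region, reserve_named() and find_by_name() are hypothetical names, the region
list stands in for the usable memory ranges, and the real allocation in the
patch goes through memblock rather than this loop. It shows the two ideas:
walk the regions from the highest address down, rounding the candidate start
down to the alignment, then record the result under its label.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 8	/* number of mappings in the prototype */
#define LABEL_SIZE  16	/* bytes available to store a label */

struct region { uint64_t start, size; };	/* one usable RAM range */

struct reserved_entry {
	char label[LABEL_SIZE];
	uint64_t start, size;
};

static struct reserved_entry table[MAX_ENTRIES];
static int nr_entries;

/* Look up a label. Returns 1 and fills the outputs if found, else 0. */
int find_by_name(const char *label, uint64_t *start, uint64_t *size)
{
	for (int i = 0; i < nr_entries; i++) {
		if (strcmp(table[i].label, label) == 0) {
			if (start)
				*start = table[i].start;
			if (size)
				*size = table[i].size;
			return 1;
		}
	}
	return 0;
}

/*
 * Reserve `size` bytes, aligned to `align` (a power of two), from the
 * highest region that can hold it, and record it under `label`.
 * `regions` must be sorted by ascending start address.
 * Returns 0 on success, -1 on failure.
 */
int reserve_named(struct region *regions, int n, const char *label,
		  uint64_t size, uint64_t align)
{
	if (nr_entries >= MAX_ENTRIES || strlen(label) >= LABEL_SIZE)
		return -1;
	if (find_by_name(label, NULL, NULL))	/* duplicate label */
		return -1;

	for (int i = n - 1; i >= 0; i--) {
		uint64_t end = regions[i].start + regions[i].size;
		uint64_t cand;

		if (regions[i].size < size)
			continue;
		cand = (end - size) & ~(align - 1);	/* round down */
		if (cand < regions[i].start)
			continue;

		/* Shrink the region so later requests cannot overlap. */
		regions[i].size = cand - regions[i].start;

		strcpy(table[nr_entries].label, label);
		table[nr_entries].start = cand;
		table[nr_entries].size = size;
		nr_entries++;
		return 0;
	}
	return -1;
}
```

Because the search is deterministic for a given region list, the same kernel
on the same machine lands on the same address each boot, which is the property
the requirement asks for. Anything that changes the region list between boots
changes the result.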
I have tested this and it works for us to solve the above problem. We can
update the kernel and command line and increase the size of pstore without
needing to update the firmware, or knowing every memory layout of each
board. I only tested this locally; it has not been tested in the field.
Changes since v5: https://lore.kernel.org/all/[email protected]/
[ patch at bottom showing differences ]
- Stressed more that this is a best effort use case
- Updated ramoops.rst to document this new feature
- Used a new variable "tmp" to use in reserve_mem_find_by_name() instead
of using "size" and possibly corrupting it.
Changes since v4: https://lore.kernel.org/all/[email protected]/
- Add all checks about reserve_mem before allocation.
This means reserved_mem_add() is now a void function.
- Check for name duplications.
- Fix compare of align to SMP_CACHE_BYTES ("<" instead of "<=")
Changes since v3: https://lore.kernel.org/all/[email protected]/
- Changed table type of start and size from unsigned long to phys_addr_t
(as well as the parameters to the functions that use them)
- Changed old reference to "early_reserve_mem" to "reserve_mem"
- Check before reserving memory:
o Size is non-zero
o name has text in it
- If align is less than SMP_CACHE_BYTES, make it SMP_CACHE_BYTES
- Remove the silly check of testing *p == '\0' after a p += strlen(p)
Changes since v2: https://lore.kernel.org/all/[email protected]/
- Fixed typo of "reserver"
- Added EXPORT_SYMBOL_GPL() for reserve_mem_find_by_name()
- Removed "built-in" from module description that was changed from v1.
Changes since v1: https://lore.kernel.org/all/[email protected]/
- Updated the change log of the first patch as well as added an entry
into kernel-parameters.txt about how reserve_mem is for soft reboots
and may not be reliable.
Steven Rostedt (Google) (2):
mm/memblock: Add "reserve_mem" to reserved named memory at boot up
pstore/ramoops: Add ramoops.mem_name= command line option
----
Documentation/admin-guide/kernel-parameters.txt | 22 +++++
Documentation/admin-guide/ramoops.rst | 13 +++
fs/pstore/ram.c | 14 +++
include/linux/mm.h | 2 +
mm/memblock.c | 117 ++++++++++++++++++++++++
5 files changed, 168 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ce7de8136f2f..56e18b1a520d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5717,9 +5717,11 @@
used for systems that do not wipe the RAM, and this command
line will try to reserve the same physical memory on
soft reboots. Note, it is not guaranteed to be the same
- location. For example, if KASLR places the kernel at the
- location of where the RAM reservation was from a previous
- boot, the new reservation will be at a different location.
+ location. For example, if anything about the system changes
+ or if booting a different kernel. It can also fail if KASLR
+ places the kernel at the location of where the RAM reservation
+ was from a previous boot, the new reservation will be at a
+ different location.
Any subsystem using this feature must add a way to verify
that the contents of the physical memory is from a previous
boot, as there may be cases where the memory will not be
diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
index e9f85142182d..6f534a707b2a 100644
--- a/Documentation/admin-guide/ramoops.rst
+++ b/Documentation/admin-guide/ramoops.rst
@@ -23,6 +23,8 @@ and type of the memory area are set using three variables:
* ``mem_size`` for the size. The memory size will be rounded down to a
power of two.
* ``mem_type`` to specify if the memory type (default is pgprot_writecombine).
+ * ``mem_name`` to specify a memory region defined by ``reserve_mem`` command
+ line parameter.
Typically the default value of ``mem_type=0`` should be used as that sets the pstore
mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
@@ -118,6 +120,17 @@ Setting the ramoops parameters can be done in several different manners:
return ret;
}
+ D. Using a region of memory reserved via ``reserve_mem`` command line
+ parameter. The address and size will be defined by the ``reserve_mem``
+ parameter. Note, that ``reserve_mem`` may not always allocate memory
+ in the same location, and cannot be relied upon. Testing will need
+ to be done, and it may not work on every machine, nor every kernel.
+ Consider this a "best effort" approach. The ``reserve_mem`` option
+ takes a size, alignment and name as arguments. The name is used
+ to map the memory to a label that can be retrieved by ramoops.
+
+ reserve_mem=2M:4096:oops ramoops.mem_name=oops
+
You can specify either RAM memory or peripheral devices' memory. However, when
specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
very early in the architecture code, e.g.::
diff --git a/mm/memblock.c b/mm/memblock.c
index 739d106a9165..b7b0e8c3868d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2301,7 +2301,7 @@ EXPORT_SYMBOL_GPL(reserve_mem_find_by_name);
*/
static int __init reserve_mem(char *p)
{
- phys_addr_t start, size, align;
+ phys_addr_t start, size, align, tmp;
char *name;
char *oldp;
int len;
@@ -2347,8 +2347,8 @@ static int __init reserve_mem(char *p)
if (!*p)
return -EINVAL;
- /* Make sure the name is not already used (size is only updated if found) */
- if (reserve_mem_find_by_name(name, &start, &size))
+ /* Make sure the name is not already used */
+ if (reserve_mem_find_by_name(name, &start, &tmp))
return -EBUSY;
start = memblock_phys_alloc(size, align);
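The kernel-parameters.txt text in this series says that any subsystem using
reserve_mem must be able to verify that the memory contents come from a
previous boot. One way a user of the region might do that is sketched below in
userspace C. Everything here is hypothetical and not part of the patches:
the header layout, PERSIST_MAGIC, the hash, and the persist_write() and
persist_valid() helpers are assumptions chosen only to illustrate the idea of
stamping the region and checking the stamp before trusting the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PERSIST_MAGIC 0x50455253u	/* hypothetical marker ("PERS") */

struct persist_hdr {
	uint32_t magic;
	uint32_t len;		/* payload bytes following the header */
	uint32_t checksum;	/* simple hash over the payload */
};

/* Simple multiplicative hash; a real user would likely pick a CRC. */
static uint32_t hash32(const uint8_t *p, size_t n)
{
	uint32_t h = 0;

	while (n--)
		h = h * 31 + *p++;
	return h;
}

/* Stamp the region so that a later boot can recognise its contents. */
int persist_write(void *region, size_t region_size,
		  const void *data, uint32_t len)
{
	struct persist_hdr hdr = { PERSIST_MAGIC, len, 0 };

	if (region_size < sizeof(hdr) || len > region_size - sizeof(hdr))
		return -1;
	hdr.checksum = hash32(data, len);
	memcpy((uint8_t *)region + sizeof(hdr), data, len);
	memcpy(region, &hdr, sizeof(hdr));
	return 0;
}

/* Returns 1 if the region holds an intact payload from persist_write(). */
int persist_valid(const void *region, size_t region_size)
{
	struct persist_hdr hdr;

	if (region_size < sizeof(hdr))
		return 0;
	memcpy(&hdr, region, sizeof(hdr));
	if (hdr.magic != PERSIST_MAGIC)
		return 0;
	if (hdr.len > region_size - sizeof(hdr))
		return 0;
	return hdr.checksum ==
	       hash32((const uint8_t *)region + sizeof(hdr), hdr.len);
}
```

If the reservation lands somewhere else on the next boot, the magic or the
hash will almost certainly not match, and the user of the region can start
from a clean state instead of reading garbage.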
Hey Steve,
On 13.06.24 17:55, Steven Rostedt wrote:
> Reserve unspecified location of physical memory from kernel command line
>
> Background:
>
> In ChromeOS, we have 1 MB of pstore ramoops reserved so that we can extract
> dmesg output and some other information when a crash happens in the field.
> (This is only done when the user selects "Allow Google to collect data for
> improving the system"). But there are cases when there's a bug that
> requires more data to be retrieved to figure out what is happening. We would
> like to increase the pstore size, either temporarily, or maybe even
> permanently. The pstore on these devices is at a fixed location in RAM (as
> the RAM is not cleared on soft reboots nor crashes). The location is chosen
> by the BIOS (coreboot) and passed to the kernel via ACPI tables on x86.
> There's a driver that queries for this to initialize the pstore for
> ChromeOS:
>
> See drivers/platform/chrome/chromeos_pstore.c
>
> Problem:
>
> The problem is that, even though there's a process to change the kernel on
> these systems, and it is done regularly to install updates, the firmware is
> updated much less frequently. Choosing the place in RAM also takes special
> care, and the address may differ between boards. Updating the
> size via firmware is a large effort and not something that many are willing
> to do for a temporary pstore size change.
(sorry for not commenting on earlier versions, I didn't see v1-v5 in my
inbox)
Do you have a "real" pstore on these systems that you could store
non-volatile variables in, such as persistent UEFI variables? If so, you
could create an actually persistent mapping for your trace pstore even
across kernel version updates as a general mechanism to create reserved
memblocks at fixed offsets.
> Requirement:
>
> Need a way to reserve memory that will be at a consistent location for
> every boot, if the kernel and system are the same. Does not need to work
> if rebooting to a different kernel, or if the system can change the
> memory layout between boots.
>
> The reserved memory cannot be a hard-coded address, as the same kernel /
> command line needs to run on several different machines. The picked memory
> reservation just needs to be the same for a given machine, but may be
With KASLR enabled, doesn't this approach break too often to be
reliable enough for the data you want to extract?
Picking up the idea above, with a persistent variable we could even make
KASLR avoid that reserved pstore region in its search for a viable KASLR
offset.
Alex
On Thu, 13 Jun 2024 18:54:12 +0200
Alexander Graf <[email protected]> wrote:
>
> Do you have a "real" pstore on these systems that you could store
> non-volatile variables in, such as persistent UEFI variables? If so, you
> could create an actually persistent mapping for your trace pstore even
> across kernel version updates as a general mechanism to create reserved
> memblocks at fixed offsets.
After implementing all this, I don't think I can use pstore for my
purpose. pstore is a generic interface for persistent storage, and
requires an interface to access it. From what I understand, it's not
the place to just ask for an area of RAM.
For this, I have a single patch that allows the tracing instance to use
an area reserved by reserve_mem.
reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace
I've already tested this on qemu and a couple of chromebooks. It works
well.
>
>
> > Requirement:
> >
> > Need a way to reserve memory that will be at a consistent location for
> > every boot, if the kernel and system are the same. Does not need to work
> > if rebooting to a different kernel, or if the system can change the
> > memory layout between boots.
> >
> > The reserved memory cannot be a hard-coded address, as the same kernel /
> > command line needs to run on several different machines. The picked memory
> > reservation just needs to be the same for a given machine, but may be
>
>
> With KASLR enabled, doesn't this approach break too often to be
> reliable enough for the data you want to extract?
>
> Picking up the idea above, with a persistent variable we could even make
> KASLR avoid that reserved pstore region in its search for a viable KASLR
> offset.
I think I was hit by it once in all my testing. For our use case, the
few times it fails to map are not going to affect what we need this for
at all.
-- Steve