On ppc64 the crashkernel region almost always overlaps an area of firmware.
This works fine except when using the sysfs interface to reduce the kdump
region. If we free the firmware area we are guaranteed to crash.
Rename free_reserved_phys_range to crash_free_reserved_phys_range and make
it a weak function so we can override it.
Signed-off-by: Anton Blanchard <[email protected]>
---
Index: powerpc.git/kernel/kexec.c
===================================================================
--- powerpc.git.orig/kernel/kexec.c 2010-08-25 10:12:11.493863566 +1000
+++ powerpc.git/kernel/kexec.c 2010-08-25 10:12:35.907264327 +1000
@@ -1097,7 +1097,8 @@ size_t crash_get_memory_size(void)
return size;
}
-static void free_reserved_phys_range(unsigned long begin, unsigned long end)
+void __weak crash_free_reserved_phys_range(unsigned long begin,
+ unsigned long end)
{
unsigned long addr;
@@ -1133,7 +1134,7 @@ int crash_shrink_memory(unsigned long ne
start = roundup(start, PAGE_SIZE);
end = roundup(start + new_size, PAGE_SIZE);
- free_reserved_phys_range(end, crashk_res.end);
+ crash_free_reserved_phys_range(end, crashk_res.end);
if ((start == end) && (crashk_res.parent != NULL))
release_resource(&crashk_res);
Index: powerpc.git/include/linux/kexec.h
===================================================================
--- powerpc.git.orig/include/linux/kexec.h 2010-08-25 10:12:11.483862172 +1000
+++ powerpc.git/include/linux/kexec.h 2010-08-25 10:12:13.664166570 +1000
@@ -208,6 +208,7 @@ int __init parse_crashkernel(char *cmdli
unsigned long long *crash_size, unsigned long long *crash_base);
int crash_shrink_memory(unsigned long new_size);
size_t crash_get_memory_size(void);
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
#else /* !CONFIG_KEXEC */
struct pt_regs;
The crashkernel region will almost always overlap RTAS. If we free the
crashkernel region via "echo 0 > /sys/kernel/kexec_crash_size" then we will
free RTAS and the machine will crash in confusing and exciting ways.
Override crash_free_reserved_phys_range and check for overlap with RTAS.
Signed-off-by: Anton Blanchard <[email protected]>
---
Index: powerpc.git/arch/powerpc/kernel/crash_dump.c
===================================================================
--- powerpc.git.orig/arch/powerpc/kernel/crash_dump.c 2010-08-24 09:24:32.219643256 +1000
+++ powerpc.git/arch/powerpc/kernel/crash_dump.c 2010-08-24 09:46:22.826868330 +1000
@@ -19,6 +19,7 @@
#include <asm/prom.h>
#include <asm/firmware.h>
#include <asm/uaccess.h>
+#include <asm/rtas.h>
#ifdef DEBUG
#include <asm/udbg.h>
@@ -141,3 +142,35 @@ ssize_t copy_oldmem_page(unsigned long p
return csize;
}
+
+#ifdef CONFIG_PPC_RTAS
+/*
+ * The crashkernel region will almost always overlap the RTAS region, so
+ * we have to be careful when shrinking the crashkernel region.
+ */
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
+{
+ unsigned long addr;
+ const u32 *basep, *sizep;
+ unsigned int rtas_start = 0, rtas_end = 0;
+
+ basep = of_get_property(rtas.dev, "linux,rtas-base", NULL);
+ sizep = of_get_property(rtas.dev, "rtas-size", NULL);
+
+ if (basep && sizep) {
+ rtas_start = *basep;
+ rtas_end = *basep + *sizep;
+ }
+
+ for (addr = begin; addr < end; addr += PAGE_SIZE) {
+ /* Does this page overlap with the RTAS region? */
+ if (addr <= rtas_end && ((addr + PAGE_SIZE) > rtas_start))
+ continue;
+
+ ClearPageReserved(pfn_to_page(addr >> PAGE_SHIFT));
+ init_page_count(pfn_to_page(addr >> PAGE_SHIFT));
+ free_page((unsigned long)__va(addr));
+ totalram_pages++;
+ }
+}
+#endif
Anton Blanchard <[email protected]> writes:
> On ppc64 the crashkernel region almost always overlaps an area of firmware.
> This works fine except when using the sysfs interface to reduce the kdump
> region. If we free the firmware area we are guaranteed to crash.
That is ppc64 bug. firmware should not be in the reserved region. Any
random kernel like thing can be put in to that region at any valid
address and the fact that shrinking the region frees your firmware means
that using that region could also stomp your firmware (which I assume
would be a bad thing).
So please fix the ppc64 reservation.
Eric
On 08/25/2010 06:07 AM, Eric W. Biederman wrote:
> Anton Blanchard <[email protected]> writes:
>
>> On ppc64 the crashkernel region almost always overlaps an area of firmware.
>> This works fine except when using the sysfs interface to reduce the kdump
>> region. If we free the firmware area we are guaranteed to crash.
>
> That is ppc64 bug. firmware should not be in the reserved region. Any
> random kernel like thing can be put in to that region at any valid
> address and the fact that shrinking the region frees your firmware means
> that using that region could also stomp your firmware (which I assume
> would be a bad thing).
The issue only happens while shrinking the region using sysfs interface.
We already have checks in kexec for not to stomp over on the firmware
overlap area while loading capture kernel. Currently we do a top-down
allocation for the firmware region which means it sits at the top of the
RMO, right in the middle of the crashdump region. We can not move the
crashkernel region beyond firmware region because kernel needs its some
of memory in RMO region.
>
> So please fix the ppc64 reservation.
>
> Eric
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev
On Wed, Mar 09, 2011 at 12:02:06PM +0530, Mahesh Jagannath Salgaonkar wrote:
>On 08/25/2010 06:07 AM, Eric W. Biederman wrote:
>> Anton Blanchard <[email protected]> writes:
>>
>>> On ppc64 the crashkernel region almost always overlaps an area of firmware.
>>> This works fine except when using the sysfs interface to reduce the kdump
>>> region. If we free the firmware area we are guaranteed to crash.
>>
>> That is ppc64 bug. firmware should not be in the reserved region. Any
>> random kernel like thing can be put in to that region at any valid
>> address and the fact that shrinking the region frees your firmware means
>> that using that region could also stomp your firmware (which I assume
>> would be a bad thing).
>The issue only happens while shrinking the region using sysfs interface.
>We already have checks in kexec for not to stomp over on the firmware
>overlap area while loading capture kernel. Currently we do a top-down
>allocation for the firmware region which means it sits at the top of the
>RMO, right in the middle of the crashdump region. We can not move the
>crashkernel region beyond firmware region because kernel needs its some
>of memory in RMO region.
The crashkernel region is specified via kernel cmdline, so why
not just drop a failure when it overlaps with RMO region?
Am I missing something?
Thanks.
Hi,
> The crashkernel region is specified via kernel cmdline, so why
> not just drop a failure when it overlaps with RMO region?
> Am I missing something?
Unfortunately a ppc64 kernel requires a chunk of RMO memory. We would
need the ability to specify multiple crashkernel regions - about 32MB
in the RMO and the rest can be anywhere. That sounds pretty fragile for
a user to configure successfully on the cmdline.
Thats why the ppc64 crashkernel region begins mid way through the RMO
region. It means both kernels get a chunk of RMO and we only have to
deal with one crashkernel reservation in all the tools and
documentation.
Anton
On Wed, Mar 09, 2011 at 11:46:57PM +1100, Anton Blanchard wrote:
>
>Hi,
>
>> The crashkernel region is specified via kernel cmdline, so why
>> not just drop a failure when it overlaps with RMO region?
>> Am I missing something?
>
>Unfortunately a ppc64 kernel requires a chunk of RMO memory. We would
>need the ability to specify multiple crashkernel regions - about 32MB
>in the RMO and the rest can be anywhere. That sounds pretty fragile for
>a user to configure successfully on the cmdline.
>
>Thats why the ppc64 crashkernel region begins mid way through the RMO
>region. It means both kernels get a chunk of RMO and we only have to
>deal with one crashkernel reservation in all the tools and
>documentation.
>
So, when I specify 128M in cmdline, 32M of them are RMO, and the
rest 96M are normal memory? And when I want to free all of them,
actually the 32M RMO will never be freed?
On Wed, Mar 09, 2011 at 10:21:08PM +0800, Am?rico Wang wrote:
> On Wed, Mar 09, 2011 at 11:46:57PM +1100, Anton Blanchard wrote:
> >
> >Hi,
> >
> >> The crashkernel region is specified via kernel cmdline, so why
> >> not just drop a failure when it overlaps with RMO region?
> >> Am I missing something?
> >
> >Unfortunately a ppc64 kernel requires a chunk of RMO memory. We would
> >need the ability to specify multiple crashkernel regions - about 32MB
> >in the RMO and the rest can be anywhere. That sounds pretty fragile for
> >a user to configure successfully on the cmdline.
> >
> >Thats why the ppc64 crashkernel region begins mid way through the RMO
> >region. It means both kernels get a chunk of RMO and we only have to
> >deal with one crashkernel reservation in all the tools and
> >documentation.
> >
>
> So, when I specify 128M in cmdline, 32M of them are RMO, and the
> rest 96M are normal memory? And when I want to free all of them,
> actually the 32M RMO will never be freed?
In ppc64 implementation the crashkernel region begins mid way through
the RMO and spans across into the normal memory. For e.g. on power system
with 128M RMO size, when user specifies 128M in cmdline, the crashkernel
starts from default offset 64M through 192M. Which means 64M in RMO region
and rest beyond RMO.
<-------------------RMO--------------->
<---rtas-->
0 64M 128M End
|=====================|=================|===========|============|
Crash_start|<------------128M----------->| Crash End
During free we do free all of them including RMO region. But since the rtas
region is always on top of RMO, crashkernel memory overlaps rtas region and
we endup freeing that even, which is causing the crash.
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
--
Signed-off-by: Mahesh Salgaonkar <[email protected]>
On Tue, Mar 15, 2011 at 2:13 AM, Mahesh J Salgaonkar
<[email protected]> wrote:
>
> During free we do free all of them including RMO region. But since the rtas
> region is always on top of RMO, crashkernel memory overlaps rtas region and
> we endup freeing that even, which is causing the crash.
>
Okay, but with this patch applied, we will just ignore rtas region, right?
Thus, when I echo 0 to free all the 128M crashkernel memory, the final
result will be 32M left, which means crash_size will still show 32M.
This looks odd.
How about skipping the 32M as a whole? I mean once the region being freed
has overlap with this rtas region, skip the whole rtas region, and let
crash_size
show 0?
Thanks.
On Tue, Mar 15, 2011 at 03:52:38PM +0800, Am?rico Wang wrote:
> On Tue, Mar 15, 2011 at 2:13 AM, Mahesh J Salgaonkar
> <[email protected]> wrote:
> >
> > During free we do free all of them including RMO region. But since the rtas
> > region is always on top of RMO, crashkernel memory overlaps rtas region and
> > we endup freeing that even, which is causing the crash.
> >
>
> Okay, but with this patch applied, we will just ignore rtas region, right?
Correct.
> Thus, when I echo 0 to free all the 128M crashkernel memory, the final
> result will be 32M left, which means crash_size will still show 32M.
> This looks odd.
>
> How about skipping the 32M as a whole? I mean once the region being freed
> has overlap with this rtas region, skip the whole rtas region, and let
> crash_size
> show 0?
The existing code from crash_shrink_memory() function reduces the crash
size to 0 when echo'ed 0. I did test this patchset and verified that
/sys/kernel/kexec_crash_size show 0 value.
Thanks,
-Mahesh.
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
--
Signed-off-by: Mahesh Salgaonkar <[email protected]>
On Tue, 15 Mar 2011 22:22:19 +0530, Mahesh J Salgaonkar wrote:
> On Tue, Mar 15, 2011 at 03:52:38PM +0800, Américo Wang wrote:
>> On Tue, Mar 15, 2011 at 2:13 AM, Mahesh J Salgaonkar
>> <[email protected]> wrote:
>> >
>> > During free we do free all of them including RMO region. But since
>> > the rtas region is always on top of RMO, crashkernel memory overlaps
>> > rtas region and we endup freeing that even, which is causing the
>> > crash.
>> >
>> >
>> Okay, but with this patch applied, we will just ignore rtas region,
>> right?
> Correct.
>> Thus, when I echo 0 to free all the 128M crashkernel memory, the final
>> result will be 32M left, which means crash_size will still show 32M.
>> This looks odd.
>>
>> How about skipping the 32M as a whole? I mean once the region being
>> freed has overlap with this rtas region, skip the whole rtas region,
>> and let crash_size
>> show 0?
> The existing code from crash_shrink_memory() function reduces the crash
> size to 0 when echo'ed 0. I did test this patchset and verified that
> /sys/kernel/kexec_crash_size show 0 value.
Oh, ok.
Acked-by: WANG Cong <[email protected]>
Thanks.
On Mon, 2011-03-21 at 03:10 +0000, WANG Cong wrote:
> On Tue, 15 Mar 2011 22:22:19 +0530, Mahesh J Salgaonkar wrote:
> >> Okay, but with this patch applied, we will just ignore rtas region,
> >> right?
> > Correct.
> >> Thus, when I echo 0 to free all the 128M crashkernel memory, the final
> >> result will be 32M left, which means crash_size will still show 32M.
> >> This looks odd.
> >>
> >> How about skipping the 32M as a whole? I mean once the region being
> >> freed has overlap with this rtas region, skip the whole rtas region,
> >> and let crash_size
> >> show 0?
> > The existing code from crash_shrink_memory() function reduces the crash
> > size to 0 when echo'ed 0. I did test this patchset and verified that
> > /sys/kernel/kexec_crash_size show 0 value.
>
> Oh, ok.
>
> Acked-by: WANG Cong <[email protected]>
So Eric, what's the right approach to get that merged ? This is a bug
gating an important delivery for us, and the patch doesn't appear
terribly invasive ? :-)
I can send it to Linus myself if you prefer and give me your Ack.
Cheers,
Ben.