People reported a bug on a high end server with many pcie devices, where
kernel bootup with crashkernel=384M, and kaslr is enabled. Even
though we still see much memory under 896 MB, the finding still failed
intermittently. Because currently we can only find region under 896 MB,
if without ',high' specified. Then KASLR breaks 896 MB into several parts
randomly, and crashkernel reservation need be aligned to 128 MB, that's
why failure is found. It raises confusion to the end user that sometimes
crashkernel=X works while sometimes fails.
If want to make it succeed, customer can change kernel option to
"crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
limited space to behave even though its grammar looks more generic.
And we can't answer questions raised from customer that confidently:
1) why it doesn't succeed to reserve 896 MB;
2) what's wrong with memory region under 4G;
3) why I have to add ',high', I only require 384 MB, not 3840 MB.
This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
finally above 4G.
Dave Young sent the original post, and I just re-post it with commit log
improvement as his requirement.
http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
There was an old discussion below (previously posted by Chao Wang):
https://lkml.org/lkml/2013/10/15/601
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: [email protected],
Cc: [email protected]
Cc: Randy Dunlap <[email protected]>
---
v6 -> v7: fix spelling mistake pointed out by Randy
arch/x86/kernel/setup.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..fa62c81 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
high ? CRASH_ADDR_HIGH_MAX
: CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+ /*
+ * crashkernel=X reserve below 896M fails? Try below 4G
+ */
+ if (!high && !crash_base)
+ crash_base = memblock_find_in_range(CRASH_ALIGN,
+ (1ULL << 32),
+ crash_size, CRASH_ALIGN);
+ /*
+ * crashkernel=X reserve below 4G fails? Try MAXMEM
+ */
+ if (!high && !crash_base)
+ crash_base = memblock_find_in_range(CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX,
+ crash_size, CRASH_ALIGN);
+#endif
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
--
2.7.4
Pingfan, thanks for the post.
On 01/15/19 at 04:07pm, Pingfan Liu wrote:
> People reported a bug on a high end server with many pcie devices, where
> kernel bootup with crashkernel=384M, and kaslr is enabled. Even
> though we still see much memory under 896 MB, the finding still failed
> intermittently. Because currently we can only find region under 896 MB,
> if without ',high' specified. Then KASLR breaks 896 MB into several parts
> randomly, and crashkernel reservation need be aligned to 128 MB, that's
> why failure is found. It raises confusion to the end user that sometimes
> crashkernel=X works while sometimes fails.
> If want to make it succeed, customer can change kernel option to
> "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
> limited space to behave even though its grammar looks more generic.
> And we can't answer questions raised from customer that confidently:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> finally above 4G.
The patch log still looks not very good. It needs some cleanup like
paragraph line breaks to make it more readable.
For example you can take like below:
--
People reported crashkernel=384M reservation failed on a high end server
with KASLR enabled. In that case there is enough free memory under 896M
but crashkernel reservation still fails intermittently.
The situation is crashkernel reservation code only finds free region under
896 MB with 128M aligned in case no ',high' being used. And KASLR could
break the first 896M into several parts randomly thus the failure happens.
User has no way to predict and make sure crashkernel=xM working unless
he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most
common use case this issue is a serious bug.
And we can't answer questions raised from customer:
1) why it doesn't succeed to reserve 896 MB;
2) what's wrong with memory region under 4G;
3) why I have to add ',high', I only require 384 MB, not 3840 MB.
This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
finally above 4G.
> Dave Young sent the original post, and I just re-post it with commit log
> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
I hope someone else can provide review because I posted it previously.
But I think previously when I posted it is a good to have improvement,
but now it is a real serious bug which need to be fixed. I can review
and ack if you can repost with a better log.
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: [email protected],
> Cc: [email protected]
> Cc: Randy Dunlap <[email protected]>
> ---
> v6 -> v7: fix spelling mistake pointed out by Randy
> arch/x86/kernel/setup.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> high ? CRASH_ADDR_HIGH_MAX
> : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> + * crashkernel=X reserve below 896M fails? Try below 4G
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> + * crashkernel=X reserve below 4G fails? Try MAXMEM
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
> if (!crash_base) {
> pr_info("crashkernel reservation failed - No suitable area found.\n");
> return;
> --
> 2.7.4
>
Thanks
Dave
On Tue, Jan 15, 2019 at 04:07:03PM +0800, Pingfan Liu wrote:
> People reported a bug on a high end server with many pcie devices, where
> kernel bootup with crashkernel=384M, and kaslr is enabled. Even
> though we still see much memory under 896 MB, the finding still failed
> intermittently. Because currently we can only find region under 896 MB,
> if without ',high' specified. Then KASLR breaks 896 MB into several parts
> randomly, and crashkernel reservation need be aligned to 128 MB, that's
> why failure is found. It raises confusion to the end user that sometimes
> crashkernel=X works while sometimes fails.
> If want to make it succeed, customer can change kernel option to
> "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
> limited space to behave even though its grammar looks more generic.
> And we can't answer questions raised from customer that confidently:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> finally above 4G.
While allocating crashkernel from below 4G seems fine, won't we have
problems if the crash kernel gets allocated above 4G because of the SWIOTLB?
thanks
> Dave Young sent the original post, and I just re-post it with commit log
> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: [email protected],
> Cc: [email protected]
> Cc: Randy Dunlap <[email protected]>
> ---
> v6 -> v7: fix spelling mistake pointed out by Randy
> arch/x86/kernel/setup.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> high ? CRASH_ADDR_HIGH_MAX
> : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> + * crashkernel=X reserve below 896M fails? Try below 4G
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> + * crashkernel=X reserve below 4G fails? Try MAXMEM
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
> if (!crash_base) {
> pr_info("crashkernel reservation failed - No suitable area found.\n");
> return;
> --
> 2.7.4
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
--
-----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett Packard Enterprise
-----------------------------------------------------------------------------
On 01/21/19 at 01:16pm, Pingfan Liu wrote:
> People reported crashkernel=384M reservation failed on a high end server
> with KASLR enabled. In that case there is enough free memory under 896M
> but crashkernel reservation still fails intermittently.
>
> The situation is crashkernel reservation code only finds free region under
> 896 MB with 128M aligned in case no ',high' being used. And KASLR could
> break the first 896M into several parts randomly thus the failure happens.
> User has no way to predict and make sure crashkernel=xM working unless
> he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most
> common use case this issue is a serious bug.
>
> And we can't answer questions raised from customer:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
>
> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> finally above 4G.
>
> Dave Young sent the original post, and I just re-post it with commit log
> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: [email protected],
> Cc: [email protected]
> Cc: Randy Dunlap <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
Looks good, ack.
Acked-by: Baoquan He <[email protected]>
Thanks
Baoquan
> ---
> v6 -> v7: commit log improvement
> arch/x86/kernel/setup.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> high ? CRASH_ADDR_HIGH_MAX
> : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> + * crashkernel=X reserve below 896M fails? Try below 4G
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> + * crashkernel=X reserve below 4G fails? Try MAXMEM
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
> if (!crash_base) {
> pr_info("crashkernel reservation failed - No suitable area found.\n");
> return;
> --
> 2.7.4
>
On Sat, Jan 19, 2019 at 9:25 AM Jerry Hoemann <[email protected]> wrote:
>
> On Tue, Jan 15, 2019 at 04:07:03PM +0800, Pingfan Liu wrote:
> > People reported a bug on a high end server with many pcie devices, where
> > kernel bootup with crashkernel=384M, and kaslr is enabled. Even
> > though we still see much memory under 896 MB, the finding still failed
> > intermittently. Because currently we can only find region under 896 MB,
> > if without ',high' specified. Then KASLR breaks 896 MB into several parts
> > randomly, and crashkernel reservation need be aligned to 128 MB, that's
> > why failure is found. It raises confusion to the end user that sometimes
> > crashkernel=X works while sometimes fails.
> > If want to make it succeed, customer can change kernel option to
> > "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
> > limited space to behave even though its grammar looks more generic.
> > And we can't answer questions raised from customer that confidently:
> > 1) why it doesn't succeed to reserve 896 MB;
> > 2) what's wrong with memory region under 4G;
> > 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
> > This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> > finally above 4G.
>
> While allocating crashkernel from below 4G seems fine, won't we have
> problems if the crash kernel gets allocated above 4G because of the SWIOTLB?
>
It will reserve extra memory below 4G for the swiotlb purpose. You can
find the logic in reserve_crashkernel_low()
And testing with crashkernel=512M@4G, we will get:
cat /proc/iomem | grep Crash
aa000000-b9ffffff : Crash kernel
100000000-11fffffff : Crash kernel
Thanks,
Pingfan
> thanks
>
>
> > Dave Young sent the original post, and I just re-post it with commit log
> > improvement as his requirement.
> > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > There was an old discussion below (previously posted by Chao Wang):
> > https://lkml.org/lkml/2013/10/15/601
> >
> > Signed-off-by: Pingfan Liu <[email protected]>
> > Cc: Dave Young <[email protected]>
> > Cc: Baoquan He <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Mike Rapoport <[email protected]>
> > Cc: [email protected],
> > Cc: [email protected]
> > Cc: Randy Dunlap <[email protected]>
> > ---
> > v6 -> v7: fix spelling mistake pointed out by Randy
> > arch/x86/kernel/setup.c | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3d872a5..fa62c81 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> > high ? CRASH_ADDR_HIGH_MAX
> > : CRASH_ADDR_LOW_MAX,
> > crash_size, CRASH_ALIGN);
> > +#ifdef CONFIG_X86_64
> > + /*
> > + * crashkernel=X reserve below 896M fails? Try below 4G
> > + */
> > + if (!high && !crash_base)
> > + crash_base = memblock_find_in_range(CRASH_ALIGN,
> > + (1ULL << 32),
> > + crash_size, CRASH_ALIGN);
> > + /*
> > + * crashkernel=X reserve below 4G fails? Try MAXMEM
> > + */
> > + if (!high && !crash_base)
> > + crash_base = memblock_find_in_range(CRASH_ALIGN,
> > + CRASH_ADDR_HIGH_MAX,
> > + crash_size, CRASH_ALIGN);
> > +#endif
> > if (!crash_base) {
> > pr_info("crashkernel reservation failed - No suitable area found.\n");
> > return;
> > --
> > 2.7.4
> >
> >
> > _______________________________________________
> > kexec mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/kexec
>
> --
>
> -----------------------------------------------------------------------------
> Jerry Hoemann Software Engineer Hewlett Packard Enterprise
> -----------------------------------------------------------------------------