People reported crashkernel=384M reservation failed on a high end server
with KASLR enabled. In that case there is enough free memory under 896M
but crashkernel reservation still fails intermittently.
The situation is crashkernel reservation code only finds free region under
896 MB with 128M aligned in case no ',high' being used. And KASLR could
break the first 896M into several parts randomly thus the failure happens.
User has no way to predict and make sure crashkernel=xM working unless
he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most
common use case this issue is a serious bug.
And we can't answer questions raised from customer:
1) why it doesn't succeed to reserve 896 MB;
2) what's wrong with memory region under 4G;
3) why I have to add ',high', I only require 384 MB, not 3840 MB.
This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
finally above 4G.
Dave Young sent the original post, and I just re-post it with commit log
improvement as his requirement.
http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
There was an old discussion below (previously posted by Chao Wang):
https://lkml.org/lkml/2013/10/15/601
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: [email protected],
Cc: [email protected]
Cc: Randy Dunlap <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
v6 -> v7: commit log improvement
arch/x86/kernel/setup.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..fa62c81 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
high ? CRASH_ADDR_HIGH_MAX
: CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+ /*
+ * crashkernel=X reserve below 896M fails? Try below 4G
+ */
+ if (!high && !crash_base)
+ crash_base = memblock_find_in_range(CRASH_ALIGN,
+ (1ULL << 32),
+ crash_size, CRASH_ALIGN);
+ /*
+ * crashkernel=X reserve below 4G fails? Try MAXMEM
+ */
+ if (!high && !crash_base)
+ crash_base = memblock_find_in_range(CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX,
+ crash_size, CRASH_ALIGN);
+#endif
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
--
2.7.4
> Subject: Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X
s/bugfix, //
On Mon, Jan 21, 2019 at 01:16:08PM +0800, Pingfan Liu wrote:
> People reported crashkernel=384M reservation failed on a high end server
> with KASLR enabled. In that case there is enough free memory under 896M
> but crashkernel reservation still fails intermittently.
>
> The situation is crashkernel reservation code only finds free region under
> 896 MB with 128M aligned in case no ',high' being used. And KASLR could
> break the first 896M into several parts randomly thus the failure happens.
This reads very strange.
> User has no way to predict and make sure crashkernel=xM working unless
> he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most
> common use case this issue is a serious bug.
>
> And we can't answer questions raised from customer:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
Errr, this looks like communication issue. Sounds to me like the text
around crashkernel= in
Documentation/admin-guide/kernel-parameters.txt
needs improving?
> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.
Also, do
$ git grep 'This patch' Documentation/process
for more details.
> finally above 4G.
>
> Dave Young sent the original post, and I just re-post it with commit log
If he sent it, he should be the author I guess.
> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
All that changelog info doesn't belong in the commit message ...
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: [email protected],
> Cc: [email protected]
> Cc: Randy Dunlap <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
.... but here.
> v6 -> v7: commit log improvement
> arch/x86/kernel/setup.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> high ? CRASH_ADDR_HIGH_MAX
> : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> + * crashkernel=X reserve below 896M fails? Try below 4G
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> + * crashkernel=X reserve below 4G fails? Try MAXMEM
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
Ok, so this is silly: we know at which physical address KASLR allocated
the kernel so why aren't we querying that and seeing if there's enough
room before it or after it to call memblock_find_in_range() on the
bigger range?
Also, why is "high" dealt with separately and why isn't the code
enforcing "high" if the normal reservation fails?
The presence of high is requiring from our users to pay attention what
to use when the kernel can do all that automatically. Looks like a UI
fail to me.
And look at all the flavors of crashkernel= :
crashkernel=size[KMG][@offset[KMG]]
crashkernel=range1:size1[,range2:size2,...][@offset]
crashkernel=size[KMG],high
crashkernel=size[KMG],low
We couldn't do one so we made 4?!?!
What for?
Nowhere in that help text does it explain why a user would care about
high or low or range or offset or the planets alignment...
So what's up?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
> >
> > Dave Young sent the original post, and I just re-post it with commit log
>
> If he sent it, he should be the author I guess.
Just drop the line, but can change the credit to Chao Wang since he did
it initially.
But Chao does not work on kexec/kdump any more, and the email address is
outdated as well.
>
> > improvement as his requirement.
> > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > There was an old discussion below (previously posted by Chao Wang):
> > https://lkml.org/lkml/2013/10/15/601
>
> All that changelog info doesn't belong in the commit message ...
>
> > Signed-off-by: Pingfan Liu <[email protected]>
> > Cc: Dave Young <[email protected]>
> > Cc: Baoquan He <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Mike Rapoport <[email protected]>
> > Cc: [email protected],
> > Cc: [email protected]
> > Cc: Randy Dunlap <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
>
> .... but here.
>
> > v6 -> v7: commit log improvement
> > arch/x86/kernel/setup.c | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3d872a5..fa62c81 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> > high ? CRASH_ADDR_HIGH_MAX
> > : CRASH_ADDR_LOW_MAX,
> > crash_size, CRASH_ALIGN);
> > +#ifdef CONFIG_X86_64
> > + /*
> > + * crashkernel=X reserve below 896M fails? Try below 4G
> > + */
> > + if (!high && !crash_base)
> > + crash_base = memblock_find_in_range(CRASH_ALIGN,
> > + (1ULL << 32),
> > + crash_size, CRASH_ALIGN);
> > + /*
> > + * crashkernel=X reserve below 4G fails? Try MAXMEM
> > + */
> > + if (!high && !crash_base)
> > + crash_base = memblock_find_in_range(CRASH_ALIGN,
> > + CRASH_ADDR_HIGH_MAX,
> > + crash_size, CRASH_ALIGN);
> > +#endif
>
> Ok, so this is silly: we know at which physical address KASLR allocated
> the kernel so why aren't we querying that and seeing if there's enough
> room before it or after it to call memblock_find_in_range() on the
> bigger range?
Baoquan may have comments?
>
> Also, why is "high" dealt with separately and why isn't the code
> enforcing "high" if the normal reservation fails?
>
AFAIK, some people prefer to explictly reserve crash memory at high
region even if it is possible to reserve at low area. May because
<4G memory is limited on large server, they want to leave this for other
use.
Yinghai or Vivek should know more about the history, probably they can
recall some initial reason.
> The presence of high is requiring from our users to pay attention what
> to use when the kernel can do all that automatically. Looks like a UI
> fail to me.
>
> And look at all the flavors of crashkernel= :
>
> crashkernel=size[KMG][@offset[KMG]]
> crashkernel=range1:size1[,range2:size2,...][@offset]
> crashkernel=size[KMG],high
> crashkernel=size[KMG],low
>
> We couldn't do one so we made 4?!?!
>
> What for?
>
> Nowhere in that help text does it explain why a user would care about
> high or low or range or offset or the planets alignment...
>
> So what's up?
Good question, still it may be some historical reason, but it is good to
make them clear and rethink about it after long time.
I also want to understand, need dig the log more.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
Thanks
Dave
On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> AFAIK, some people prefer to explictly reserve crash memory at high
> region even if it is possible to reserve at low area. May because
> <4G memory is limited on large server, they want to leave this for other
> use.
>
> Yinghai or Vivek should know more about the history, probably they can
> recall some initial reason.
Yes, just "prefer" is not good enough. There should be a technical
reason why that's there.
Also, if the user doesn't care, then the code should be free to force
"high" and thus probe a different range for allocation.
> Good question, still it may be some historical reason, but it is good to
> make them clear and rethink about it after long time.
>
> I also want to understand, need dig the log more.
Good idea. That would be a very nice cleanup. :-)
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 01/25/19 at 03:08pm, Borislav Petkov wrote:
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area. May because
> > <4G memory is limited on large server, they want to leave this for other
> > use.
> >
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
>
> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.
>
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.
Another reason is in case ,high we will need automatically reserve a
region in low area for swiotlb. So for example one use
crashkernel=256M,high, actual reserved memory is 256M above 4G and
another 256M under 4G for swiotlb. Normally it is not necessary for
most people. Thus we can not make ,high as default.
Thanks
Dave
On Mon, Jan 28, 2019 at 05:58:09PM +0800, Dave Young wrote:
> Another reason is in case ,high we will need automatically reserve a
> region in low area for swiotlb. So for example one use
> crashkernel=256M,high, actual reserved memory is 256M above 4G and
> another 256M under 4G for swiotlb. Normally it is not necessary for
> most people. Thus we can not make ,high as default.
And how is the poor user to figure out that we decided for her/him that
swiotlb reservation is something not necessary for most people and thus
we fail the crashkernel= reservation?
IOW, that "logic" above doesn't make a whole lot of sense to me from
user friendliness perspective.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Jan 25, 2019 at 10:08 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area. May because
> > <4G memory is limited on large server, they want to leave this for other
> > use.
> >
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
>
Go through the git log, and I found the initial introduction of
crashkernel_high option. Refer to
commit 55a20ee7804ab64ac90bcdd4e2868a42829e2784
Author: Yinghai Lu <[email protected]>
Date: Mon Apr 15 22:23:47 2013 -0700
x86, kdump: Retore crashkernel= to allocate under 896M
Vivek found old kexec-tools does not work new kernel anymore.
So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new
kexec-tools will
be needed.
But kexec-tools-2.0.3, released at 2012, can run 4.20 kernel with
crashkernel=256M@5G, so I think only very old kexec-tools requires
memory under 896M. Due to -1.few people running latest kernel with
very old kexec-tools to date, -2. crashkernel=X is more popular than
crashkernel=X.high, it should be time to eliminate this limit of
crashkernel=X parameter, otherwise we will run into this bug.
As for crashkernel=,high, I think it is a more professional option for
who cares about the DMA32. On high-end machine, big reserved region is
used for crashkernel(e.g. in this case 384M), which make the crowed
situation under under 4GB memory worse.
> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.
>
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.
>
Do you suggest to remove crashkernel=X,high parameter?
Thanks,
Pingfan
> > Good question, still it may be some historical reason, but it is good to
> > make them clear and rethink about it after long time.
> >
> > I also want to understand, need dig the log more.
>
> Good idea. That would be a very nice cleanup. :-)
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Jan 25, 2019 at 6:39 PM Borislav Petkov <[email protected]> wrote:
>
>
> > Subject: Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X
>
> s/bugfix, //
>
OK.
> On Mon, Jan 21, 2019 at 01:16:08PM +0800, Pingfan Liu wrote:
> > People reported crashkernel=384M reservation failed on a high end server
> > with KASLR enabled. In that case there is enough free memory under 896M
> > but crashkernel reservation still fails intermittently.
> >
> > The situation is crashkernel reservation code only finds free region under
> > 896 MB with 128M aligned in case no ',high' being used. And KASLR could
> > break the first 896M into several parts randomly thus the failure happens.
>
> This reads very strange.
>
What about " It turns out that crashkernel reservation code only
tries to find a region under 896 MB, aligned on 128M. But KASLR
randomly breaks big region inside [0,896M] into smaller pieces, not
big enough as demanded in the "crashkernel=X" parameter."
> > User has no way to predict and make sure crashkernel=xM working unless
> > he/she use 'crashkernel=xM,high'. Since 'crashkernel=xM' is the most
> > common use case this issue is a serious bug.
> >
> > And we can't answer questions raised from customer:
> > 1) why it doesn't succeed to reserve 896 MB;
> > 2) what's wrong with memory region under 4G;
> > 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
>
> Errr, this looks like communication issue. Sounds to me like the text
> around crashkernel= in
>
What about dropping this section in commit log and another patch to
fix the document?
> Documentation/admin-guide/kernel-parameters.txt
>
> needs improving?
>
> > This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
>
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
>
OK
> Also, do
>
> $ git grep 'This patch' Documentation/process
>
> for more details.
>
> > finally above 4G.
> >
> > Dave Young sent the original post, and I just re-post it with commit log
>
> If he sent it, he should be the author I guess.
>
> > improvement as his requirement.
> > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > There was an old discussion below (previously posted by Chao Wang):
> > https://lkml.org/lkml/2013/10/15/601
>
> All that changelog info doesn't belong in the commit message ...
>
> > Signed-off-by: Pingfan Liu <[email protected]>
> > Cc: Dave Young <[email protected]>
> > Cc: Baoquan He <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Mike Rapoport <[email protected]>
> > Cc: [email protected],
> > Cc: [email protected]
> > Cc: Randy Dunlap <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
>
> .... but here.
>
> > v6 -> v7: commit log improvement
> > arch/x86/kernel/setup.c | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3d872a5..fa62c81 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> > high ? CRASH_ADDR_HIGH_MAX
> > : CRASH_ADDR_LOW_MAX,
> > crash_size, CRASH_ALIGN);
> > +#ifdef CONFIG_X86_64
> > + /*
> > + * crashkernel=X reserve below 896M fails? Try below 4G
> > + */
> > + if (!high && !crash_base)
> > + crash_base = memblock_find_in_range(CRASH_ALIGN,
> > + (1ULL << 32),
> > + crash_size, CRASH_ALIGN);
> > + /*
> > + * crashkernel=X reserve below 4G fails? Try MAXMEM
> > + */
> > + if (!high && !crash_base)
> > + crash_base = memblock_find_in_range(CRASH_ALIGN,
> > + CRASH_ADDR_HIGH_MAX,
> > + crash_size, CRASH_ALIGN);
> > +#endif
>
> Ok, so this is silly: we know at which physical address KASLR allocated
> the kernel so why aren't we querying that and seeing if there's enough
> room before it or after it to call memblock_find_in_range() on the
> bigger range?
>
Sorry, can not catch up with you. Do you suggestion
memblock_find_in_range(0, kernel_start) and
memblock_find_in_range(kernel_end, mem_end)? But the memory is
truncated into fraction by many component which call
memblock_reserve(), besides kernel.
For the left question, Dave has follow the discussion in another
email, will follow there.
Thanks and regards,
Pingfan
> Also, why is "high" dealt with separately and why isn't the code
> enforcing "high" if the normal reservation fails?
>
> The presence of high is requiring from our users to pay attention what
> to use when the kernel can do all that automatically. Looks like a UI
> fail to me.
>
> And look at all the flavors of crashkernel= :
>
> crashkernel=size[KMG][@offset[KMG]]
> crashkernel=range1:size1[,range2:size2,...][@offset]
> crashkernel=size[KMG],high
> crashkernel=size[KMG],low
>
> We couldn't do one so we made 4?!?!
>
> What for?
>
> Nowhere in that help text does it explain why a user would care about
> high or low or range or offset or the planets alignment...
>
> So what's up?
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
On 01/29/19 at 01:25pm, Pingfan Liu wrote:
> On Fri, Jan 25, 2019 at 10:08 PM Borislav Petkov <[email protected]> wrote:
> >
> > On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > > AFAIK, some people prefer to explictly reserve crash memory at high
> > > region even if it is possible to reserve at low area. May because
> > > <4G memory is limited on large server, they want to leave this for other
> > > use.
> > >
> > > Yinghai or Vivek should know more about the history, probably they can
> > > recall some initial reason.
> >
> Go through the git log, and I found the initial introduction of
> crashkernel_high option. Refer to
> commit 55a20ee7804ab64ac90bcdd4e2868a42829e2784
> Author: Yinghai Lu <[email protected]>
> Date: Mon Apr 15 22:23:47 2013 -0700
>
> x86, kdump: Retore crashkernel= to allocate under 896M
>
> Vivek found old kexec-tools does not work new kernel anymore.
>
> So change back crashkernel= back to old behavoir, and add crashkernel_high=
> to let user decide if buffer could be above 4G, and also new
> kexec-tools will
> be needed.
>
> But kexec-tools-2.0.3, released at 2012, can run 4.20 kernel with
> crashkernel=256M@5G, so I think only very old kexec-tools requires
> memory under 896M. Due to -1.few people running latest kernel with
> very old kexec-tools to date, -2. crashkernel=X is more popular than
> crashkernel=X.high, it should be time to eliminate this limit of
> crashkernel=X parameter, otherwise we will run into this bug.
> As for crashkernel=,high, I think it is a more professional option for
> who cares about the DMA32. On high-end machine, big reserved region is
> used for crashkernel(e.g. in this case 384M), which make the crowed
> situation under under 4GB memory worse.
This seems answers the question why ,high is needed and why
crashkernel=X can not allocate from high by default.
Another reason for this question is for crashkernel=,high, a separate
region under 4G is still needed. I searched some history bug/emails,
previously the initial commit does not reserve low memory by default if
,high is used, but later Yinghai saw a regression bug in case iommu
disabled. see below commit:
commit c729de8fcea37a1c444e81857eace12494c804a9
Author: Yinghai Lu <[email protected]>
Date: Mon Apr 15 22:23:45 2013 -0700
x86, kdump: Set crashkernel_low automatically
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.
We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y. This causes regression
if iommu is not enabled. Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.
[snip]
>
> > Yes, just "prefer" is not good enough. There should be a technical
> > reason why that's there.
> >
> > Also, if the user doesn't care, then the code should be free to force
> > "high" and thus probe a different range for allocation.
> >
> Do you suggest to remove crashkernel=X,high parameter?
I do not think it can be removed, because for ,high we also need extra
low memory, it will cause confusion, most people are just fine without
this. People can choose to use ,high if they really need it.
Thanks
Dave
On 01/25/19 at 03:08pm, Borislav Petkov wrote:
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area. May because
> > <4G memory is limited on large server, they want to leave this for other
> > use.
> >
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
>
> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.
Got more about this, actually the thing is for large server with very
huge memory and also possible a lot of io devices. The UEFI IO drivers
could allocate a lot memory under 896M, so it will leave small room for
crashkernel=X
Also if the memory is huge, then copying the vmcore will be very slow,
people want to use big makedumpfile bitmap buffer, and multithread, then
kdump kernel needs more memory, they can choose ,high to get more
memory.
But for common use cases, if one does not need so much kdump memory
reserved he/she can just use crashkernel=X.
>
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.
As Pingfan/me mentioned in another reply, there are two reasons:
1. old kexec-tools can not load kernel to high memory
2. ,high will not work on some systems without some amounts of low
memory so it nees reserve extra low memory, it is bad for people who do
not want it.
>
> > Good question, still it may be some historical reason, but it is good to
> > make them clear and rethink about it after long time.
> >
> > I also want to understand, need dig the log more.
>
> Good idea. That would be a very nice cleanup. :-)
Let's review these different parameters:
> crashkernel=size[KMG][@offset[KMG]]
Did not get the initial commit for this, it should be the very old
format, and kernel did not find the base address automatically for the
size
Other arches still use this, for example mips code seems needs explict
@offset, see the function mips_parse_crashkernel in
arch/mips/kernel/setup.c
Probably there are other arches as well which only support this format.
> crashkernel=range1:size1[,range2:size2,...][@offset]
This is nice param which can help to dynamically reserve different size
depends on system memory.
> crashkernel=size[KMG],high
> crashkernel=size[KMG],low
We have talked about these two. It is not graceful, but seems no way to
improve it due to the swiotlb issue.
Thanks
Dave
On Tue, Jan 29, 2019 at 01:51:45PM +0800, Pingfan Liu wrote:
> Sorry, can not catch up with you. Do you suggestion
> memblock_find_in_range(0, kernel_start) and
> memblock_find_in_range(kernel_end, mem_end)? But the memory is
> truncated into fraction by many component which call
> memblock_reserve(), besides kernel.
extract_kernel() selects the physical address where the place the
kernel. You can use that physical address to figure out where to
allocate the crash kernel range.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Jan 31, 2019 at 03:59:07PM +0800, Dave Young wrote:
> As Pingfan/me mentioned in another reply, there are two reasons:
> 1. old kexec-tools can not load kernel to high memory
> 2. ,high will not work on some systems without some amounts of low
> memory so it nees reserve extra low memory, it is bad for people who do
> not want it.
Let's see: we don't enable high by default because of old kexec-tools
and some systems which do some funky reservations.
But this patch kinda enables high by trying a couple more regions.
So we don't really need this - we simply need to tell people to use high
if it fails with KASLR, AFAICT.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Jan 31, 2019 at 11:57:32AM +0100, Borislav Petkov wrote:
> On Thu, Jan 31, 2019 at 03:59:07PM +0800, Dave Young wrote:
> > As Pingfan/me mentioned in another reply, there are two reasons:
> > 1. old kexec-tools can not load kernel to high memory
> > 2. ,high will not work on some systems without some amounts of low
> > memory so it nees reserve extra low memory, it is bad for people who do
> > not want it.
>
> Let's see: we don't enable high by default because of old kexec-tools
> and some systems which do some funky reservations.
>
> But this patch kinda enables high by trying a couple more regions.
>
> So we don't really need this - we simply need to tell people to use high
> if it fails with KASLR, AFAICT.
Borris,
The testing I've done shows that the allocation failure caused by KASLR
is intermittent. On one SUT that I've seen this issue on, the crash
kernel allocation fails less than 10% of the time.
So even if a system administrator is diligent and tests
that a chosen kdump configuration works, that configuration
might not work on some random reboot 7 months in the future.
Unfortunately, that will be the time that the system crashes
and we won't be able to collect the crash dump due to the
crashkernel allocation failure.
Jerry
--
-----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett Packard Enterprise
-----------------------------------------------------------------------------
On Thu, Jan 31, 2019 at 03:27:32PM -0700, Jerry Hoemann wrote:
> So even if a system administrator is diligent and tests
> that a chosen kdump configuration works, that configuration
> might not work on some random reboot 7 months in the future.
Jerry, did you read the rest of the thread where I'm *actually*
suggesting to make the allocation code more robust against such
failures?
Doesn't look like it...
Now let's look at the code:
The "high" allocation does:
crash_base = memblock_find_in_range(CRASH_ALIGN,
high ? CRASH_ADDR_HIGH_MAX
: CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
where high=true and CRASH_ADDR_HIGH_MAX on 64-bit is MAXMEM:
# define CRASH_ADDR_HIGH_MAX MAXMEM
The second fallback in the suggested patch does the same:
+ /*
+ * crashkernel=X reserve below 4G fails? Try MAXMEM
+ */
+ if (!high && !crash_base)
+ crash_base = memblock_find_in_range(CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX,
+ crash_size, CRASH_ALIGN);
and yet I get back that falling back to "high" if the first allocation
doesn't succeed is not something we should do by default because of
reasons. But this patch *practically* *does* it.
So no, until this hasn't been done cleanly and properly explained, we'll
be in a holding pattern.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Feb 01, 2019 at 12:47:40AM +0100, Borislav Petkov wrote:
> On Thu, Jan 31, 2019 at 03:27:32PM -0700, Jerry Hoemann wrote:
> > So even if a system administrator is diligent and tests
> > that a chosen kdump configuration works, that configuration
> > might not work on some random reboot 7 months in the future.
>
> Jerry, did you read the rest of the thread where I'm *actually*
> suggesting to make the allocation code more robust against such
> failures?
Boris,
I may have misunderstood your earlier comment:
So we don't really need this - we simply need to tell people to use high
if it fails with KASLR, AFAICT
To imply an iterative approach to crashkernel size discovery. Whereas you
may simply have ment: Always use ,high as the old way is broken.
> Now let's look at the code:
>
> The "high" allocation does:
>
> crash_base = memblock_find_in_range(CRASH_ALIGN,
> high ? CRASH_ADDR_HIGH_MAX
> : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
>
> where high=true and CRASH_ADDR_HIGH_MAX on 64-bit is MAXMEM:
>
> # define CRASH_ADDR_HIGH_MAX MAXMEM
>
> The second fallback in the suggested patch does the same:
>
> + /*
> + * crashkernel=X reserve below 4G fails? Try MAXMEM
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
>
> and yet I get back that falling back to "high" if the first allocation
> doesn't succeed is not something we should do by default because of
> reasons. But this patch *practically* *does* it.
Is your objection only to the second fallback of allocating
memory above >= 4GB? Or are you objecting to allocating from
(896 .. 4GB) as well?
Falling back to allocating < 4GB probably satisfes most of the cases
where the original allocation fails.
thanks
--
-----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett Packard Enterprise
-----------------------------------------------------------------------------
On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> Is your objection only to the second fallback of allocating
> memory above >= 4GB? Or are you objecting to allocating from
> (896 .. 4GB) as well?
My problem is why should the user need to specify high or low allocation
explicitly when we can handle all that in the kernel automatically.
The presence of crashkernel= on the cmdline sure means that the user
wants to allocate memory for a second kernel.
Now, if the requested allocation fails, we say:
Error reserving crashkernel
So, instead of saying that, we can *try* *again* and say
Error reserving requested crashkernel at @..., attempting a high range.
and run memblock_find_in_range() on the other regions which we deemed
are ok to allocate from.
Why aren't we doing that by default instead of placing all those
different options in front of the user and expecting her/him to know
something about all those magic ranges?
I don't think most of the users care about where the kernel gets
allocated - all they want is a working kdump setup.
> Falling back to allocating < 4GB probably satisfes most of the cases
> where the original allocation fails.
Yes. Now make that automatic.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 02/05/19 at 09:15am, Borislav Petkov wrote:
> On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > Is your objection only to the second fallback of allocating
> > memory above >= 4GB? Or are you objecting to allocating from
> > (896 .. 4GB) as well?
>
> My problem is why should the user need to specify high or low allocation
> explicitly when we can handle all that in the kernel automatically.
>
> The presence of crashkernel= on the cmdline sure means that the user
> wants to allocate memory for a second kernel.
>
> Now, if the requested allocation fails, we say:
>
> Error reserving crashkernel
>
> So, instead of saying that, we can *try* *again* and say
>
> Error reserving requested crashkernel at @..., attempting a high range.
>
> and run memblock_find_in_range() on the other regions which we deemed
> are ok to allocate from.
>
> Why aren't we doing that by default instead of placing all those
> different options in front of the user and expecting her/him to know
> something about all those magic ranges?
As we talked in another reply, for the >4G allocation we can not avoid
the swiotlb issue, but if one request for 256M in high region and we
allocate the low part automatically, it will eat more memory eg. 512M.
But probably in case allacation failed in low region ,high is a must for kdump
reservation, since no other choices perhaps we can make that as you said
>
> I don't think most of the users care about where the kernel gets
> allocated - all they want is a working kdump setup.
>
> > Falling back to allocating < 4GB probably satisfes most of the cases
> > where the original allocation fails.
>
> Yes. Now make that automatic.
For the time being, this should be good enough.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
Thanks
Dave
On 02/06/19 at 08:08pm, Dave Young wrote:
> On 02/05/19 at 09:15am, Borislav Petkov wrote:
> > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > > Is your objection only to the second fallback of allocating
> > > memory above >= 4GB? Or are you objecting to allocating from
> > > (896 .. 4GB) as well?
> >
> > My problem is why should the user need to specify high or low allocation
> > explicitly when we can handle all that in the kernel automatically.
> >
> > The presence of crashkernel= on the cmdline sure means that the user
> > wants to allocate memory for a second kernel.
> >
> > Now, if the requested allocation fails, we say:
> >
> > Error reserving crashkernel
> >
> > So, instead of saying that, we can *try* *again* and say
> >
> > Error reserving requested crashkernel at @..., attempting a high range.
> >
> > and run memblock_find_in_range() on the other regions which we deemed
> > are ok to allocate from.
> >
> > Why aren't we doing that by default instead of placing all those
> > different options in front of the user and expecting her/him to know
> > something about all those magic ranges?
>
> As we talked in another reply, for the >4G allocation we can not avoid
> the swiotlb issue, but if one request for 256M in high region and we
> allocate the low part automatically, it will eat more memory eg. 512M.
>
> But probably in case allacation failed in low region ,high is a must for kdump
> reservation, since no other choices perhaps we can make that as you said
That is exactly what Pingfan is doing in this patch.
Even we make it automatic in kernel, but we have to have some default
value for swiotlb in case crashkernel can not find a free region under 4G.
So this default value can not work for every use cases, people need
manually use crashkernel=,low and crashkernel=,high in case
crashkernel=X does not work. One can tune it for their use:
1) crashkernel=X reservation fails, likely the ,low default value is
still too big, one can shrink the value and manually try other value
2) crashkernel=X reserve successfully on high memory and along with some
default low memory region. But the low region is not enough. In this
case one can increase the
This should answer the question why ,high and ,low is still needed.
But for above consumption 1), KASLR can still cause default ,low memory
failed to reserve. So I wonder if KASLR can skip the 0-896M if the
system memory is big enough.
Thanks
Dave
On Tue, Feb 12, 2019 at 4:48 AM Dave Young <[email protected]> wrote:
>
> On 02/06/19 at 08:08pm, Dave Young wrote:
> > On 02/05/19 at 09:15am, Borislav Petkov wrote:
> > > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > > > Is your objection only to the second fallback of allocating
> > > > memory above >= 4GB? Or are you objecting to allocating from
> > > > (896 .. 4GB) as well?
> > >
> > > My problem is why should the user need to specify high or low allocation
> > > explicitly when we can handle all that in the kernel automatically.
> > >
> > > The presence of crashkernel= on the cmdline sure means that the user
> > > wants to allocate memory for a second kernel.
> > >
> > > Now, if the requested allocation fails, we say:
> > >
> > > Error reserving crashkernel
> > >
> > > So, instead of saying that, we can *try* *again* and say
> > >
> > > Error reserving requested crashkernel at @..., attempting a high range.
> > >
> > > and run memblock_find_in_range() on the other regions which we deemed
> > > are ok to allocate from.
> > >
> > > Why aren't we doing that by default instead of placing all those
> > > different options in front of the user and expecting her/him to know
> > > something about all those magic ranges?
> >
> > As we talked in another reply, for the >4G allocation we can not avoid
> > the swiotlb issue, but if one request for 256M in high region and we
> > allocate the low part automatically, it will eat more memory eg. 512M.
> >
> > But probably in case allacation failed in low region ,high is a must for kdump
> > reservation, since no other choices perhaps we can make that as you said
>
> That is exactly what Pingfan is doing in this patch.
>
> Even we make it automatic in kernel, but we have to have some default
> value for swiotlb in case crashkernel can not find a free region under 4G.
> So this default value can not work for every use cases, people need
> manually use crashkernel=,low and crashkernel=,high in case
> crashkernel=X does not work. One can tune it for their use:
>
> 1) crashkernel=X reservation fails, likely the ,low default value is
> still too big, one can shrink the value and manually try other value
> 2) crashkernel=X reserve successfully on high memory and along with some
> default low memory region. But the low region is not enough. In this
> case one can increase the
>
> This should answer the question why ,high and ,low is still needed.
>
> But for above consumption 1), KASLR can still cause default ,low memory
> failed to reserve. So I wonder if KASLR can skip the 0-896M if the
> system memory is big enough.
>
A little fix about the comment. Refer to reserve_crashkernel_low(),
low_base = memblock_find_in_range(0, 1ULL << 32, low_size,
CRASH_ALIGN); So it should try 0~4G for ",low". And the default size
for low is 256M. Given the limited memory region reserved by other
component before crashkernel, we always can find a continuous chunk of
256M inside the fragmented [0,4G], which is split by initrd, KASLR.
Thanks,
Pingfan
On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> Even we make it automatic in kernel, but we have to have some default
> value for swiotlb in case crashkernel can not find a free region under 4G.
> So this default value can not work for every use cases, people need
> manually use crashkernel=,low and crashkernel=,high in case
> crashkernel=X does not work.
Why would the user need to find swiotlb range? The kernel has all the
information it requires at its finger tips in order to decide properly.
The user wants a crashkernel range, the kernel tries the low range =>
no workie, then it tries the next range => workie but needs to allocate
swiotlb range so that DMA can happen too. Doh, then the kernel does
allocate that too.
Why would the user need to do anything here?!
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 02/15/19 at 11:24am, Borislav Petkov wrote:
> On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > Even we make it automatic in kernel, but we have to have some default
> > value for swiotlb in case crashkernel can not find a free region under 4G.
> > So this default value can not work for every use cases, people need
> > manually use crashkernel=,low and crashkernel=,high in case
> > crashkernel=X does not work.
>
> Why would the user need to find swiotlb range? The kernel has all the
> information it requires at its finger tips in order to decide properly.
>
> The user wants a crashkernel range, the kernel tries the low range =>
> no workie, then it tries the next range => workie but needs to allocate
> swiotlb range so that DMA can happen too. Doh, then the kernel does
> allocate that too.
It is ideal if kernel can do it automatically, but I'm not sure if
kernel can predict the swiotlb reserved size automatically.
Let's add more people to seek for comments.
>
> Why would the user need to do anything here?!
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
On Mon, Feb 18, 2019 at 9:48 AM Dave Young <[email protected]> wrote:
>
> On 02/15/19 at 11:24am, Borislav Petkov wrote:
> > On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > > Even we make it automatic in kernel, but we have to have some default
> > > value for swiotlb in case crashkernel can not find a free region under 4G.
> > > So this default value can not work for every use cases, people need
> > > manually use crashkernel=,low and crashkernel=,high in case
> > > crashkernel=X does not work.
> >
> > Why would the user need to find swiotlb range? The kernel has all the
> > information it requires at its finger tips in order to decide properly.
> >
> > The user wants a crashkernel range, the kernel tries the low range =>
> > no workie, then it tries the next range => workie but needs to allocate
> > swiotlb range so that DMA can happen too. Doh, then the kernel does
> > allocate that too.
>
> It is ideal if kernel can do it automatically, but I'm not sure if
> kernel can predict the swiotlb reserved size automatically.
>
Agreed, I think it is hard to decide the reserved size automatically.
We do not know the requirement for memory of ZONE_DMA32 at boot time.
The requirement depends on how many DMA32 devices, and the dynamic
payload of them.
> Let's add more people to seek for comments.
>
> >
> > Why would the user need to do anything here?!
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.
On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> It is ideal if kernel can do it automatically, but I'm not sure if
> kernel can predict the swiotlb reserved size automatically.
Do you see how even more absurd this gets?
If the kernel cannot know the swiotlb reserved size automatically, how
is the normal user even supposed to know?!
I see swiotlb_size_or_default() so we have a sane default which we fall
back to. Now where's the problem with that?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 02/20/19 at 09:32am, Borislav Petkov wrote:
> On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > It is ideal if kernel can do it automatically, but I'm not sure if
> > kernel can predict the swiotlb reserved size automatically.
>
> Do you see how even more absurd this gets?
>
> If the kernel cannot know the swiotlb reserved size automatically, how
> is the normal user even supposed to know?!
>
> I see swiotlb_size_or_default() so we have a sane default which we fall
> back to. Now where's the problem with that?
Good question, I expect some answer from people who know more about the
background. It would be good to have some actual test results, Pingfan
is trying to do some tests.
Previously Joerg posted below patch, maybe he has some idea. Joerg?
commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4
Author: Joerg Roedel <[email protected]>
Date: Wed Jun 10 17:49:42 2015 +0200
x86/crash: Allocate enough low memory when crashkernel=high
Thanks
Dave
On Wed, Feb 20, 2019 at 5:41 PM Dave Young <[email protected]> wrote:
>
> On 02/20/19 at 09:32am, Borislav Petkov wrote:
> > On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > > It is ideal if kernel can do it automatically, but I'm not sure if
> > > kernel can predict the swiotlb reserved size automatically.
> >
> > Do you see how even more absurd this gets?
> >
> > If the kernel cannot know the swiotlb reserved size automatically, how
> > is the normal user even supposed to know?!
> >
I think swiotlb is bounce-buffer, if we enlarge it, we can get better
performance. Default size should be enough for platform to work. But
in case of reserving low memory for crashkernel, things are different.
The reserve low memory = swiotlb_size_or_default() + DMA32 memory for
devices. And the 2nd item in the right of the equation varies, based
on machine type and dynamic payload
> > I see swiotlb_size_or_default() so we have a sane default which we fall
> > back to. Now where's the problem with that?
>
> Good question, I expect some answer from people who know more about the
> background. It would be good to have some actual test results, Pingfan
> is trying to do some tests.
>
Not following the idea, I do not think the following test result can
tell much. (We need various type of machine to get a final result.)
I do a quick test on "HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10",
command line "crashkernel=180M,high crashkernel=64M,low" can work for
the 2nd kernel. Although it complained some memory shortage issue:
[ 7.655591] fbcon: mgadrmfb (fb0) is primary device
[ 7.655639] Console: switching to colour frame buffer device 128x48
[ 7.660609] systemd-udevd: page allocation failure: order:0, mode:0x280d4
[ 7.660611] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted
3.10.0-957.el7.x86_64 #1
[ 7.660612] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
Gen10, BIOS U30 06/20/2018
[ 7.660612] Call Trace:
[ 7.660621] [<ffffffff81761dc1>] dump_stack+0x19/0x1b
[ 7.660625] [<ffffffff811bc830>] warn_alloc_failed+0x110/0x180
[ 7.660628] [<ffffffff8175d3ce>] __alloc_pages_slowpath+0x6b6/0x724
[ 7.660631] [<ffffffff811c0e95>] __alloc_pages_nodemask+0x405/0x420
[ 7.660633] [<ffffffff8120dcf8>] alloc_pages_current+0x98/0x110
[ 7.660638] [<ffffffffc00c8622>] ttm_pool_populate+0x3d2/0x4b0 [ttm]
[ 7.660641] [<ffffffffc00bf1cd>] ttm_tt_populate+0x7d/0x90 [ttm]
[ 7.660644] [<ffffffffc00c3c74>] ttm_bo_kmap+0x124/0x240 [ttm]
[ 7.660648] [<ffffffff810cecbf>] ? __wake_up_sync_key+0x4f/0x60
[ 7.660650] [<ffffffffc012677e>] mga_dirty_update+0x25e/0x310 [mgag200]
[ 7.660653] [<ffffffffc012685f>] mga_imageblit+0x2f/0x40 [mgag200]
[ 7.660657] [<ffffffff813f97ca>] soft_cursor+0x1ba/0x260
[ 7.660659] [<ffffffff813f8f53>] bit_cursor+0x663/0x6a0
[ 7.660662] [<ffffffff81098739>] ? console_trylock+0x19/0x70
[ 7.660664] [<ffffffff813f514d>] fbcon_cursor+0x13d/0x1c0
[ 7.660665] [<ffffffff813f88f0>] ? bit_clear+0x120/0x120
[ 7.660668] [<ffffffff8146af2e>] hide_cursor+0x2e/0xa0
[ 7.660669] [<ffffffff8146d4e8>] redraw_screen+0x188/0x270
[ 7.660671] [<ffffffff8146e086>] do_bind_con_driver+0x316/0x340
[ 7.660672] [<ffffffff8146e5e9>] do_take_over_console+0x49/0x60
[ 7.660674] [<ffffffff813f24c3>] do_fbcon_takeover+0x63/0xd0
[ 7.660675] [<ffffffff813f808d>] fbcon_event_notify+0x61d/0x730
[ 7.660678] [<ffffffff8176fb0f>] notifier_call_chain+0x4f/0x70
[ 7.660681] [<ffffffff810c7f6d>] __blocking_notifier_call_chain+0x4d/0x70
[ 7.660683] [<ffffffff810c7fa6>] blocking_notifier_call_chain+0x16/0x20
[ 7.660684] [<ffffffff813e8b9b>] fb_notifier_call_chain+0x1b/0x20
[ 7.660686] [<ffffffff813e9e46>] register_framebuffer+0x1f6/0x340
[ 7.660690] [<ffffffffc01027e2>]
__drm_fb_helper_initial_config_and_unlock+0x252/0x3e0 [drm_kms_helper]
[ 7.660694] [<ffffffffc01029ae>]
drm_fb_helper_initial_config+0x3e/0x50 [drm_kms_helper]
[ 7.660697] [<ffffffffc01269d3>] mgag200_fbdev_init+0xe3/0x100 [mgag200]
[ 7.660699] [<ffffffffc01254f4>] mgag200_modeset_init+0x154/0x1d0 [mgag200]
[ 7.660701] [<ffffffffc012157d>] mgag200_driver_load+0x41d/0x5b0 [mgag200]
[ 7.660708] [<ffffffffc005ba4f>] drm_dev_register+0x15f/0x1f0 [drm]
[ 7.660711] [<ffffffff813c3518>] ? pci_enable_device_flags+0xe8/0x140
[ 7.660718] [<ffffffffc005d0da>] drm_get_pci_dev+0x8a/0x1a0 [drm]
[ 7.660720] [<ffffffffc012626b>] mga_pci_probe+0x9b/0xc0 [mgag200]
[ 7.660722] [<ffffffff813c4aca>] local_pci_probe+0x4a/0xb0
[ 7.660723] [<ffffffff813c6209>] pci_device_probe+0x109/0x160
[ 7.660726] [<ffffffff814a8285>] driver_probe_device+0xc5/0x3e0
[ 7.660727] [<ffffffff814a8683>] __driver_attach+0x93/0xa0
[ 7.660728] [<ffffffff814a85f0>] ? __device_attach+0x50/0x50
[ 7.660730] [<ffffffff814a5e25>] bus_for_each_dev+0x75/0xc0
[ 7.660731] [<ffffffff814a7bfe>] driver_attach+0x1e/0x20
[ 7.660733] [<ffffffff814a76a0>] bus_add_driver+0x200/0x2d0
[ 7.660734] [<ffffffff814a8d14>] driver_register+0x64/0xf0
[ 7.660735] [<ffffffff813c5a45>] __pci_register_driver+0xa5/0xc0
[ 7.660737] [<ffffffffc012d000>] ? 0xffffffffc012cfff
[ 7.660739] [<ffffffffc012d039>] mgag200_init+0x39/0x1000 [mgag200]
[ 7.660742] [<ffffffff8100210a>] do_one_initcall+0xba/0x240
[ 7.660745] [<ffffffff81118f8c>] load_module+0x272c/0x2bc0
[ 7.660748] [<ffffffff813a3030>] ? ddebug_proc_write+0x100/0x100
[ 7.660750] [<ffffffff8111950f>] SyS_init_module+0xef/0x140
[ 7.660752] [<ffffffff81774ddb>] system_call_fastpath+0x22/0x27
[ 7.660753] Mem-Info:
[ 7.660756] active_anon:3364 inactive_anon:6661 isolated_anon:0
[ 7.660756] active_file:0 inactive_file:0 isolated_file:0
[ 7.660756] unevictable:0 dirty:0 writeback:0 unstable:0
[ 7.660756] slab_reclaimable:1492 slab_unreclaimable:3116
[ 7.660756] mapped:1223 shmem:8449 pagetables:179 bounce:0
[ 7.660756] free:20626 free_pcp:0 free_cma:0
[ 7.660761] Node 0 DMA free:0kB min:4kB low:4kB high:4kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:564kB
managed:448kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 7.660762] lowmem_reserve[]: 0 0 152 152
[ 7.660766] Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65536kB
managed:0kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 7.660767] lowmem_reserve[]: 0 0 152 152
[ 7.660771] Node 0 Normal free:82504kB min:1572kB low:1964kB
high:2356kB active_anon:13456kB inactive_anon:26644kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:183740kB managed:158716kB mlocked:0kB
dirty:0kB writeback:0kB mapped:4892kB shmem:33796kB
slab_reclaimable:5968kB slab_unreclaimable:12464kB kernel_stack:784kB
pagetables:716kB unstable:0kB bounce:0kB free_pcp:[ 8.722693]
Microsemi PQI Driver (v1.1.4-115)
> Previously Joerg posted below patch, maybe he has some idea. Joerg?
>
> commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4
> Author: Joerg Roedel <[email protected]>
> Date: Wed Jun 10 17:49:42 2015 +0200
>
> x86/crash: Allocate enough low memory when crashkernel=high
>
> Thanks
> Dave
On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote:
> Previously Joerg posted below patch, maybe he has some idea. Joerg?
Isn't it clear from the commit message?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 02/21/19 at 06:13pm, Borislav Petkov wrote:
> On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote:
> > Previously Joerg posted below patch, maybe he has some idea. Joerg?
>
> Isn't it clear from the commit message?
Then, does it answered your question?
256M is set as a default value in the patch, but it is not a predict to
satisfy all use cases, from the description it is also possible that some
people run out of the 256M and the ,low and ,high format is still
necessary to exist even if we make crashkernel=X do the allocation
automatically in high in case failed in low area.
crashkernel=X: allocate in low first, if not possible, then allocate in
high
In case people have a lot of devices need more swiotlb, then he manually
set the ,high with ,low together.
What's your suggestion then? remove ,low and ,high and increase default
256M in case we get failure bug report?
On Fri, Feb 22, 2019 at 10:11:01AM +0800, Dave Young wrote:
> In case people have a lot of devices need more swiotlb, then he manually
> set the ,high with ,low together.
The option to specify the high and low values for the crashkernel are
important for certain machines. The point is that swiotlb already
allocates 64MB of low memory by default. But that memory is only used
for 32bit DMA-mask devices that want to DMA into high memory. There are
drivers just allocating GFP_DMA32 memory, which also ends up in the low
region (but not swiotlb), that is why the previous default of 72MB low
memory was not enough, it only left 8MB of GFP_DMA32 memory. The current
default of 256MB was found by experiments on a bigger number of
machines, to create a reasonable default that is at least likely to be
sufficient of an average machine.
There is no way today for the kernel to find an optimum value for the
amount of low memory required to successfully create a crash dump. It
depends on the amount of devices in the system and how the drivers
for them are written. The drivers have no way to report back their
requirements, and even if they had, at the time the allocation happens
no driver is loaded yet.
So it is up to the system administrator to find workable values for the
high and low memory requirements, even using experiments as a last
resort.
Regards,
Joerg
On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> The current default of 256MB was found by experiments on a bigger
> number of machines, to create a reasonable default that is at least
> likely to be sufficient of an average machine.
Exactly, and this is what makes sense.
The code should try the requested reservation and if it fails, it should
try high allocation with default swiotlb size because we need to reserve
*some* range.
If that reservation succeeds, we should say something along the lines of
"... requested range failed, reserved <X> range instead."
And then in Documentation/admin-guide/kernel-parameters.txt above the
crashkernel= explanations, the allocation strategy of best effort should
be explained in short. That the kernel will try to allocate high if the
requested allocation didn't succeed and that the user can tweak the
allocation with the below options.
Bottom line is: the kernel should assist the user and try harder to
allocate *some* range for a crash kernel when there's no detailed
specification what that range should be.
*If* the user adds ,low, high, then the kernel should try only that
specified range because the assumption is that the user knows what she's
doing.
But if the user simply wants a range for a crash kernel without stating
where that range should be in particular and it's placement is a don't
care - as long as there is a range - then the kernel should simply try
high, etc.
Makes sense?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > The current default of 256MB was found by experiments on a bigger
> > number of machines, to create a reasonable default that is at least
> > likely to be sufficient of an average machine.
>
> Exactly, and this is what makes sense.
>
> The code should try the requested reservation and if it fails, it should
> try high allocation with default swiotlb size because we need to reserve
> *some* range.
>
> If that reservation succeeds, we should say something along the lines of
>
> "... requested range failed, reserved <X> range instead."
>
Maybe I misunderstood you, but does "requested range failed" mean that
user specify the range? If yes, then it should be the duty of user as
you said later, not the duty of kernel"
> And then in Documentation/admin-guide/kernel-parameters.txt above the
> crashkernel= explanations, the allocation strategy of best effort should
> be explained in short. That the kernel will try to allocate high if the
> requested allocation didn't succeed and that the user can tweak the
> allocation with the below options.
>
Yes, it should be improved.
> Bottom line is: the kernel should assist the user and try harder to
> allocate *some* range for a crash kernel when there's no detailed
> specification what that range should be.
>
> *If* the user adds ,low, high, then the kernel should try only that
> specified range because the assumption is that the user knows what she's
> doing.
>
> But if the user simply wants a range for a crash kernel without stating
> where that range should be in particular and it's placement is a don't
> care - as long as there is a range - then the kernel should simply try
> high, etc.
>
We do not know the memory layout of a system, maybe a system with
memory less than 4GB. So it is better to try all the range of system
memory
Thanks,
Pingfan
> Makes sense?
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
On 02/24/19 at 09:25pm, Pingfan Liu wrote:
> On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov <[email protected]> wrote:
> >
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> >
> > Exactly, and this is what makes sense.
> >
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
> >
> > If that reservation succeeds, we should say something along the lines of
> >
> > "... requested range failed, reserved <X> range instead."
> >
> Maybe I misunderstood you, but does "requested range failed" mean that
> user specify the range? If yes, then it should be the duty of user as
> you said later, not the duty of kernel"
If you go with the changes in your current patch it is needed to say
something like:
"crashkernel: can not find free memory under 4G, reserve XM@.. instead"
Also need to print the reserved low memory area in case ,high being used.
But for 896M -> 4G, the 896M faulure is not necessary to show in dmesg,
it is some in kernel logic.
Thanks
Dave
On Sun, Feb 24, 2019 at 09:25:18PM +0800, Pingfan Liu wrote:
> Maybe I misunderstood you, but does "requested range failed" mean that
> user specify the range? If yes, then it should be the duty of user as
> you said later, not the duty of kernel"
No, it should say that it selected a different range only when the user
didn't specify it. Which would mean that the user didn't care about the
range - she/he only wanted to have *any* crashkernel range reserved.
I.e., crashkernel=X invocation.
> We do not know the memory layout of a system, maybe a system with
> memory less than 4GB. So it is better to try all the range of system
> memory.
Ok. If 4G fails, you set high and then try again.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote:
> On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > The current default of 256MB was found by experiments on a bigger
> > number of machines, to create a reasonable default that is at least
> > likely to be sufficient of an average machine.
>
> Exactly, and this is what makes sense.
>
> The code should try the requested reservation and if it fails, it should
> try high allocation with default swiotlb size because we need to reserve
> *some* range.
Right, makes sense. While at it, maybe it is time to move the default
allocation policy to 'high' again. The change was reverted six years ago
because it broke old kexec tools, but those are probably out-of-service
now. I think this change would make the whole crashdump allocation
process less fragile.
Regards,
Joerg
On 02/25/19 at 12:00pm, Joerg Roedel wrote:
> On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote:
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> >
> > Exactly, and this is what makes sense.
> >
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
>
> Right, makes sense. While at it, maybe it is time to move the default
> allocation policy to 'high' again. The change was reverted six years ago
> because it broke old kexec tools, but those are probably out-of-service
> now. I think this change would make the whole crashdump allocation
> process less fragile.
One concern about this is for average cases, one do not need so much
memory for kdump. For example in RHEL we use crashkernel=auto to
automatically reserve kdump kernel memory, and for x86 the reserved size
is like below now:
1G-64G:160M,64G-1T:256M,1T-:512M
That means for a machine with less than 64G memory we only allocate
160M, it works for most machines in our lab.
If we move to high as default, it will allocate 160M high + 256M low. It
is too much for people who is good with the default 160M. Especially
for virtual machine with less memory (but > 4G)
To make the process less fragile maybe we can remove the 896M limitation
and only try <4G then go to high.
Thanks
Dave
On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> If we move to high as default, it will allocate 160M high + 256M low. It
We won't move to high by default - we will *fall* back to high if the
default allocation fails.
> To make the process less fragile maybe we can remove the 896M limitation
> and only try <4G then go to high.
Sure, the more robust for the user, the better.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Hi Borislav,
Do you think the following patch is good at present?
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81f9d23..9213073 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -460,7 +460,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
# define CRASH_ADDR_LOW_MAX (512 << 20)
# define CRASH_ADDR_HIGH_MAX (512 << 20)
#else
-# define CRASH_ADDR_LOW_MAX (896UL << 20)
+# define CRASH_ADDR_LOW_MAX (1 << 32)
# define CRASH_ADDR_HIGH_MAX MAXMEM
#endif
For documentation, I will send another patch to improve the description.
Thanks,
Pingfan
On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov <[email protected]> wrote:
>
> On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > If we move to high as default, it will allocate 160M high + 256M low. It
>
> We won't move to high by default - we will *fall* back to high if the
> default allocation fails.
>
> > To make the process less fragile maybe we can remove the 896M limitation
> > and only try <4G then go to high.
>
> Sure, the more robust for the user, the better.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu <[email protected]> wrote:
>
> Hi Borislav,
>
> Do you think the following patch is good at present?
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81f9d23..9213073 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -460,7 +460,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
> # define CRASH_ADDR_LOW_MAX (512 << 20)
> # define CRASH_ADDR_HIGH_MAX (512 << 20)
> #else
> -# define CRASH_ADDR_LOW_MAX (896UL << 20)
> +# define CRASH_ADDR_LOW_MAX (1 << 32)
> # define CRASH_ADDR_HIGH_MAX MAXMEM
> #endif
>
Or patch lools like:
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..ed0def5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -459,7 +459,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
# define CRASH_ADDR_LOW_MAX (512 << 20)
# define CRASH_ADDR_HIGH_MAX (512 << 20)
#else
-# define CRASH_ADDR_LOW_MAX (896UL << 20)
+# define CRASH_ADDR_LOW_MAX (1 << 32)
# define CRASH_ADDR_HIGH_MAX MAXMEM
#endif
@@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
high ? CRASH_ADDR_HIGH_MAX
: CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+ /*
+ * crashkernel=X reserve below 4G fails? Try MAXMEM
+ */
+ if (!high && !crash_base)
+ crash_base = memblock_find_in_range(CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX,
+ crash_size, CRASH_ALIGN);
+#endif
which tries 0-4G, the fall back to 4G above
> For documentation, I will send another patch to improve the description.
>
> Thanks,
> Pingfan
>
> On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov <[email protected]> wrote:
> >
> > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > > If we move to high as default, it will allocate 160M high + 256M low. It
> >
> > We won't move to high by default - we will *fall* back to high if the
> > default allocation fails.
> >
> > > To make the process less fragile maybe we can remove the 896M limitation
> > > and only try <4G then go to high.
> >
> > Sure, the more robust for the user, the better.
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.
Hi Pingfan,
Thanks for the effort,
On 03/01/19 at 11:19am, Pingfan Liu wrote:
> On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu <[email protected]> wrote:
> >
> > Hi Borislav,
> >
> > Do you think the following patch is good at present?
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 81f9d23..9213073 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -460,7 +460,7 @@ static void __init
> > memblock_x86_reserve_range_setup_data(void)
> > # define CRASH_ADDR_LOW_MAX (512 << 20)
> > # define CRASH_ADDR_HIGH_MAX (512 << 20)
> > #else
> > -# define CRASH_ADDR_LOW_MAX (896UL << 20)
> > +# define CRASH_ADDR_LOW_MAX (1 << 32)
> > # define CRASH_ADDR_HIGH_MAX MAXMEM
> > #endif
> >
> Or patch lools like:
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..ed0def5 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -459,7 +459,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
> # define CRASH_ADDR_LOW_MAX (512 << 20)
> # define CRASH_ADDR_HIGH_MAX (512 << 20)
> #else
> -# define CRASH_ADDR_LOW_MAX (896UL << 20)
> +# define CRASH_ADDR_LOW_MAX (1 << 32)
> # define CRASH_ADDR_HIGH_MAX MAXMEM
> #endif
>
> @@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
> high ? CRASH_ADDR_HIGH_MAX
> : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> + * crashkernel=X reserve below 4G fails? Try MAXMEM
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
>
> which tries 0-4G, the fall back to 4G above
This way looks good to me, I will do some testing with old kexec-tools,
Once testing done I can take up this again and repost later with some documentation
update. Also will split to 2 patches one to drop the old limitation,
another for the fallback.
Thanks
Dave
On Mon, Jan 28, 2019 at 11:18:31AM +0100, Borislav Petkov wrote:
> On Mon, Jan 28, 2019 at 05:58:09PM +0800, Dave Young wrote:
> > Another reason is in case ,high we will need automatically reserve a
> > region in low area for swiotlb. So for example one use
> > crashkernel=256M,high, actual reserved memory is 256M above 4G and
> > another 256M under 4G for swiotlb. Normally it is not necessary for
> > most people. Thus we can not make ,high as default.
>
> And how is the poor user to figure out that we decided for her/him that
> swiotlb reservation is something not necessary for most people and thus
> we fail the crashkernel= reservation?
>
> IOW, that "logic" above doesn't make a whole lot of sense to me from
> user friendliness perspective.
So to show what I mean: I'm trying to reserve a crash kernel region on a
box here. I tried:
crashkernel=64M@16M
as it is stated in Documentation/kdump/kdump.txt.
Box said:
[ 0.000000] crashkernel reservation failed - memory is in use.
Oh great.
Then I tried:
crashkernel=64M@64M
Box said:
[ 0.000000] crashkernel reservation failed - memory is in use.
So I simply did:
crashkernel=64M
and the box said:
[ 0.000000] Reserving 64MB of memory at 3392MB for crashkernel (System RAM: 16271MB)
So I could've gone a long time poking at the memory to find a suitable
address.
So do you see what I mean with making this as user-friendly and as
robust as possible?
In this case I don't care about *where* my crash kernel is - I only want
to have one loaded *somewhere*.
And the same strategy should be applied to other reservation attempts
- we should try hard to reserve and if we cannot reserve, then try an
alternating range.
I even think that
crashkernel=X@Y
should not simply fail if Y is occupied but keep trying and say
[ 0.000000] Reserving 64MB of memory at alternative address 3392MB for crashkernel (System RAM: 16271MB)
and only fail when the user doesn't really want the kernel to try hard
by booting with
crashkernel=X@Y,strict
But that's for another day.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 06/07/19 at 07:30pm, Borislav Petkov wrote:
> On Mon, Jan 28, 2019 at 11:18:31AM +0100, Borislav Petkov wrote:
> > On Mon, Jan 28, 2019 at 05:58:09PM +0800, Dave Young wrote:
> > > Another reason is in case ,high we will need automatically reserve a
> > > region in low area for swiotlb. So for example one use
> > > crashkernel=256M,high, actual reserved memory is 256M above 4G and
> > > another 256M under 4G for swiotlb. Normally it is not necessary for
> > > most people. Thus we can not make ,high as default.
> >
> > And how is the poor user to figure out that we decided for her/him that
> > swiotlb reservation is something not necessary for most people and thus
> > we fail the crashkernel= reservation?
> >
> > IOW, that "logic" above doesn't make a whole lot of sense to me from
> > user friendliness perspective.
>
> So to show what I mean: I'm trying to reserve a crash kernel region on a
> box here. I tried:
>
> crashkernel=64M@16M
>
> as it is stated in Documentation/kdump/kdump.txt.
>
> Box said:
>
> [ 0.000000] crashkernel reservation failed - memory is in use.
>
> Oh great.
>
> Then I tried:
>
> crashkernel=64M@64M
>
> Box said:
>
> [ 0.000000] crashkernel reservation failed - memory is in use.
>
> So I simply did:
>
> crashkernel=64M
>
> and the box said:
>
> [ 0.000000] Reserving 64MB of memory at 3392MB for crashkernel (System RAM: 16271MB)
>
> So I could've gone a long time poking at the memory to find a suitable
> address.
>
> So do you see what I mean with making this as user-friendly and as
> robust as possible?
Yes, it is clear to me, I absolutely agree that is not friendly :)
Previously without KASLR, one can check /proc/iomem to find a possible
free area and use it for next and future boot. But in case KASLR
enabled nowadays it become harder to predict the persistent free areas.
>
> In this case I don't care about *where* my crash kernel is - I only want
> to have one loaded *somewhere*.
We would suggest people to use crashkernel=X instead. for the X@Y
I believe it is some historic thing, it *should* be able to be obsolete
at least on X86, (not sure other arches).
I expect people can comment if they have some use cases requiring this
X@Y way.
We have modified the crashkernel=X to search 0 - 4G memory instead
of old 0 - 896M for low memory areas, so a possible case is people who
uses very old kexec-tools which can only load kernel to memory under
896M.
Another way is we just obsolete X@Y, but introduce another interface
like crahskernel=X,max= (max will be used like the CRASH_ADDR_HIGH_MAX
in arch/x86/kernel/setup.c)
>
> And the same strategy should be applied to other reservation attempts
> - we should try hard to reserve and if we cannot reserve, then try an
> alternating range.
>
> I even think that
>
> crashkernel=X@Y
>
> should not simply fail if Y is occupied but keep trying and say
>
> [ 0.000000] Reserving 64MB of memory at alternative address 3392MB for crashkernel (System RAM: 16271MB)
>
> and only fail when the user doesn't really want the kernel to try hard
> by booting with
>
> crashkernel=X@Y,strict
>
> But that's for another day.
Maybe X@Y,max=.. Then kernel will search begin with Y, and stop until
max - 1;
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
Thanks
Dave