This is a RESEND of the v2 post. It adds Mike's Acked-by to patch 2/2, as he
suggested in private mail, and slightly updates the patch logs.
This patchset fixes a bug where SGI UV systems occasionally hang
during boot with KASLR enabled. The root cause is that mm KASLR adapts the
size of the direct mapping section based only on the system RAM size.
Later, when the SGI UV MMIOH region is mapped into the direct mapping
during the rest_init() invocation, it might go beyond the direct mapping
section and step into the VMALLOC or VMEMMAP area, triggering a BUG_ON().
The fix adds a helper function, is_early_uv_system(), to detect a UV system
early, then calls it in kernel_randomize_memory(). If the system is an SGI
UV system, the size of the direct mapping section is kept at 64TB, just as
with nokaslr.
With this fix, an SGI UV system always has a 64TB direct mapping, while
the starting addresses of the direct mapping, vmalloc and vmemmap regions and
the padding between them can still be randomized to enhance system security.
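As a rough illustration of the intended behaviour, the sizing decision can be
modelled in user-space C. This is only a sketch: direct_map_size_tb() and its
is_uv parameter are hypothetical stand-ins for the logic in
kernel_randomize_memory() and the result of is_early_uv_system(), and the 64TB
default is taken from the description above.

```c
#include <stdbool.h>

/* Size of the direct mapping section without mm KASLR, in TB (per the text above). */
#define DIRECT_MAP_DEFAULT_TB 64UL

/*
 * Hypothetical user-space model of the sizing decision; not kernel code.
 * memory_tb: RAM size rounded up to whole TB, plus the configured padding.
 * is_uv:     stand-in for the result of is_early_uv_system().
 */
static unsigned long direct_map_size_tb(unsigned long memory_tb, bool is_uv)
{
	unsigned long size_tb = DIRECT_MAP_DEFAULT_TB;

	/* Shrink the section to fit RAM, except on UV systems. */
	if (memory_tb < size_tb && !is_uv)
		size_tb = memory_tb;

	return size_tb;
}
```

With memory_tb = 11, the model returns 11 on a non-UV system and 64 on a UV
system, which is the behaviour the series aims for.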
v1->v2:
1. Mike suggested making is_early_uv_system() an inline function and
putting it in include/asm/uv/uv.h so that they can adjust it more easily in
the future.
2. Split the v1 code into a UV part and an mm KASLR part, as Mike suggested.
Baoquan He (2):
x86/UV: Introduce a helper function to check UV system at earlier
stage
x86/mm/KASLR: Do not adapt the size of the direct mapping section for
SGI UV system
arch/x86/include/asm/uv/uv.h | 6 ++++++
arch/x86/mm/kaslr.c | 3 ++-
2 files changed, 8 insertions(+), 1 deletion(-)
--
2.5.5
The BIOS on an SGI UV system reports a UV system table which describes
specific firmware capabilities available to the Linux kernel at runtime.
This UV system table only exists on SGI UV systems, and it is detected
in efi_init(), which runs at a very early stage.
So introduce a new helper function, is_early_uv_system(), to identify whether
a system is a UV system. Later it will be used in the mm KASLR code to check
whether the running system is a UV system.
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Mike Travis <[email protected]>
---
arch/x86/include/asm/uv/uv.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
index b5a32231abd8..93d7ad8763ba 100644
--- a/arch/x86/include/asm/uv/uv.h
+++ b/arch/x86/include/asm/uv/uv.h
@@ -18,6 +18,11 @@ extern void uv_nmi_init(void);
extern void uv_system_init(void);
extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
const struct flush_tlb_info *info);
+#include <linux/efi.h>
+static inline int is_early_uv_system(void)
+{
+ return !((efi.uv_systab == EFI_INVALID_TABLE_ADDR) || !efi.uv_systab);
+}
#else /* X86_UV */
@@ -30,6 +35,7 @@ static inline const struct cpumask *
uv_flush_tlb_others(const struct cpumask *cpumask,
const struct flush_tlb_info *info)
{ return cpumask; }
+static inline int is_early_uv_system(void) { return 0; }
#endif /* X86_UV */
--
2.5.5
On SGI UV systems, the kernel often hangs when KASLR is enabled. Disabling
KASLR makes the kernel work well.
The back trace is:
kernel BUG at arch/x86/mm/init_64.c:311!
invalid opcode: 0000 [#1] SMP
[...]
RIP: 0010:__init_extra_mapping+0x188/0x196
[...]
Call Trace:
init_extra_mapping_uc+0x13/0x15
map_high+0x67/0x75
map_mmioh_high_uv3+0x20a/0x219
uv_system_init_hub+0x12d9/0x1496
uv_system_init+0x27/0x29
native_smp_prepare_cpus+0x28d/0x2d8
kernel_init_freeable+0xdd/0x253
? rest_init+0x80/0x80
kernel_init+0xe/0x110
ret_from_fork+0x2c/0x40
This is because the SGI UV system needs to map its MMIOH region into the direct
mapping section, and the mapping happens in rest_init(), which is much
later than the call to kernel_randomize_memory() that does mm KASLR. So
mm KASLR can't account for the size of the MMIOH region when calculating the
size of address space needed for the direct mapping section.
When KASLR is disabled, there is 64TB of address space for system RAM
and the MMIOH regions to share. When KASLR is enabled, the current mm KASLR
code only reserves the actual size of system RAM plus an extra 10TB
for the direct mapping. Thus the later MMIOH mapping can go beyond
the upper bound of the direct mapping and step into the VMALLOC or VMEMMAP area.
Then the BUG_ON() in __init_extra_mapping() is triggered.
E.g. on the SGI UV3 machine where this bug was reported, there are two MMIOH
regions:
[ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
[ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G of RAM is
spread out across 1TB regions. The two SGI MMIOH regions above will also be
mapped into the direct mapping section.
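The arithmetic behind the overflow can be checked with a small user-space
sketch. The helper names end_to_tb() and fits_in_direct_map() are hypothetical,
and the 11TB figure assumes, for illustration only, that RAM ends at or below
the 1TB mark and that the padding is the 10TB mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define TB_SHIFT 40

/* Round a physical end address up to whole terabytes. */
static uint64_t end_to_tb(uint64_t phys_end)
{
	return (phys_end + (1ULL << TB_SHIFT) - 1) >> TB_SHIFT;
}

/* True if a region ending at phys_end lies inside a direct mapping of size_tb. */
static bool fits_in_direct_map(uint64_t phys_end, uint64_t size_tb)
{
	return end_to_tb(phys_end) <= size_tb;
}
```

MMIOH1_HI ends at 0x200000000000, i.e. 32TB. Under the assumptions above the
adapted direct mapping is 1TB + 10TB = 11TB, so the region does not fit; with
the unadapted 64TB it does.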
To fix it, check whether it's an SGI UV system by calling is_early_uv_system()
in kernel_randomize_memory(). If so, do not adapt the size of the direct
mapping section; just keep it at 64TB.
Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Thomas Garnier <[email protected]>
Acked-by: Mike Travis <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Thomas Garnier <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Masahiro Yamada <[email protected]>
---
arch/x86/mm/kaslr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index af599167fe3c..4d68c08df82d 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -27,6 +27,7 @@
#include <asm/pgtable.h>
#include <asm/setup.h>
#include <asm/kaslr.h>
+#include <asm/uv/uv.h>
#include "mm_internal.h"
@@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
/* Adapt phyiscal memory region size based on available memory */
- if (memory_tb < kaslr_regions[0].size_tb)
+ if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
kaslr_regions[0].size_tb = memory_tb;
/* Calculate entropy available between regions */
--
2.5.5
Hi all,
PING!
Does anyone have any suggestions about this issue? This bug blocks SGI
systems from booting; KASLR has to be disabled on SGI systems if people want to
run tests.
Thanks
Baoquan
On 09/07/17 at 03:42pm, Baoquan He wrote:
> This is v2 post RESEND. Add Mike's Acked-by to patch 2/2 as he suggested
> in private mail. And update patches log slightly.
>
> This patchset is trying to fix a bug that SGI UV system casually hang
> during boot with KASLR enabled. The root cause is that mm KASLR adapts
> size of the direct mapping section only based on the system RAM size.
> Then later when map SGI UV MMIOH region into the direct mapping during
> rest_init() invocation, it might go beyond of the directing mapping
> section and step into VMALLOC or VMEMMAP area, then BUG_ON triggered.
>
> The fix is adding a helper function is_early_uv_system to check UV system
> earlier, then call the helper function in kernel_randomize_memory() to
> check if it's a SGI UV system, if yes, we keep the size of direct mapping
> section to be 64TB just as nokslr.
>
> With this fix, SGI UV system can have 64TB direct mapping size always,
> and the starting address of direct mapping/vmalloc/vmemmap and the padding
> between them can still be randomized to enhance the system security.
>
> v1->v2:
> 1. Mike suggested making is_early_uv_system() an inline function and be
> put in include/asm/uv/uv.h so that they can adjust them easier in the
> future.
>
> 2. Split the v1 code into uv part and mm KASLR part as Mike suggested.
>
> Baoquan He (2):
> x86/UV: Introduce a helper function to check UV system at earlier
> stage
> x86/mm/KASLR: Do not adapt the size of the direct mapping section for
> SGI UV system
>
> arch/x86/include/asm/uv/uv.h | 6 ++++++
> arch/x86/mm/kaslr.c | 3 ++-
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> --
> 2.5.5
>
Adding Dave to the CC list; he may have concerns about the code change.
On 09/07/17 at 03:42pm, Baoquan He wrote:
> The BIOS on SGI UV system will report a UV system table which describes
> specific firmware capabilities available to the Linux kernel at runtime.
> This UV system table only exists on SGI UV system. And it's detected
> in efi_init() which is at very early stage.
>
> So introduce a new helper function is_early_uv_system() to identify if
> a system is UV system. Later we will use it to check if the running
> system is UV system in mm KASLR code.
>
> Signed-off-by: Baoquan He <[email protected]>
> Acked-by: Mike Travis <[email protected]>
> ---
> arch/x86/include/asm/uv/uv.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
> index b5a32231abd8..93d7ad8763ba 100644
> --- a/arch/x86/include/asm/uv/uv.h
> +++ b/arch/x86/include/asm/uv/uv.h
> @@ -18,6 +18,11 @@ extern void uv_nmi_init(void);
> extern void uv_system_init(void);
> extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
> const struct flush_tlb_info *info);
> +#include <linux/efi.h>
> +static inline int is_early_uv_system(void)
> +{
> + return !((efi.uv_systab == EFI_INVALID_TABLE_ADDR) || !efi.uv_systab);
> +}
>
> #else /* X86_UV */
>
> @@ -30,6 +35,7 @@ static inline const struct cpumask *
> uv_flush_tlb_others(const struct cpumask *cpumask,
> const struct flush_tlb_info *info)
> { return cpumask; }
> +static inline int is_early_uv_system(void) { return 0; }
>
> #endif /* X86_UV */
>
> --
> 2.5.5
>
On 09/14/17 at 03:29pm, Baoquan He wrote:
> Add Dave to the CC list, he may have concerns about the code change.
Baoquan, thanks for cc'ing me.
>
> On 09/07/17 at 03:42pm, Baoquan He wrote:
> > The BIOS on SGI UV system will report a UV system table which describes
> > specific firmware capabilities available to the Linux kernel at runtime.
> > This UV system table only exists on SGI UV system. And it's detected
> > in efi_init() which is at very early stage.
> >
> > So introduce a new helper function is_early_uv_system() to identify if
> > a system is UV system. Later we will use it to check if the running
> > system is UV system in mm KASLR code.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > Acked-by: Mike Travis <[email protected]>
> > ---
> > arch/x86/include/asm/uv/uv.h | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
> > index b5a32231abd8..93d7ad8763ba 100644
> > --- a/arch/x86/include/asm/uv/uv.h
> > +++ b/arch/x86/include/asm/uv/uv.h
> > @@ -18,6 +18,11 @@ extern void uv_nmi_init(void);
> > extern void uv_system_init(void);
> > extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
> > const struct flush_tlb_info *info);
> > +#include <linux/efi.h>
> > +static inline int is_early_uv_system(void)
> > +{
> > + return !((efi.uv_systab == EFI_INVALID_TABLE_ADDR) || !efi.uv_systab);
> > +}
Sorry for jumping in late. I have two questions about the patch:
1) For EFI tables, the only invalid value is EFI_INVALID_TABLE_ADDR, and
the efi struct is initialized to EFI_INVALID_TABLE_ADDR by default, so there
is no need to check "|| !efi.uv_systab". Is there any UV firmware specific
assumption that "0" may also be assigned?
2) It seems this function was added to uv.h to keep it separate, for UV
systems only. But I feel it is better to put it in efi.h instead.
uv_systab is already a member of struct efi, which is in efi.h, so it is
natural to check there whether the table exists. Then just include efi.h in
kaslr.c and use the function.
Something like esrt_table_exists() in drivers/firmware/efi/esrt.c.
Anyway, I have no strong opinion; it just looks more natural to me.
> >
> > #else /* X86_UV */
> >
> > @@ -30,6 +35,7 @@ static inline const struct cpumask *
> > uv_flush_tlb_others(const struct cpumask *cpumask,
> > const struct flush_tlb_info *info)
> > { return cpumask; }
> > +static inline int is_early_uv_system(void) { return 0; }
> >
> > #endif /* X86_UV */
> >
> > --
> > 2.5.5
> >
Thanks
Dave
On 09/14/17 at 03:49pm, Dave Young wrote:
> > > diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
> > > index b5a32231abd8..93d7ad8763ba 100644
> > > --- a/arch/x86/include/asm/uv/uv.h
> > > +++ b/arch/x86/include/asm/uv/uv.h
> > > @@ -18,6 +18,11 @@ extern void uv_nmi_init(void);
> > > extern void uv_system_init(void);
> > > extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
> > > const struct flush_tlb_info *info);
> > > +#include <linux/efi.h>
> > > +static inline int is_early_uv_system(void)
> > > +{
> > > + return !((efi.uv_systab == EFI_INVALID_TABLE_ADDR) || !efi.uv_systab);
> > > +}
>
Thanks for looking into this, Dave!
>
> Sorry for jumping in late, I have two questions about the patch:
>
> 1) For efi tables, the only invalid value is EFI_INVALID_TABLE_ADDR, and
> efi struct is initialized as EFI_INVALID_TABLE_ADDR by default so no
> need to check "|| !efi.uv_systab". Do we have any UV firmware specific
> assumption that "0" is also possible to be assigned?
Hmm, uv_bios_init() also checks the !efi.uv_systab case. The
EFI_INVALID_TABLE_ADDR check comes first, so the extra check won't affect the
result if it's EFI_INVALID_TABLE_ADDR. And !efi.uv_systab makes it safer,
since it doesn't work either if efi.uv_systab is 0. Mainly, it's not
harmful.
Mike, what are your thoughts? Should I only check the (efi.uv_systab ==
EFI_INVALID_TABLE_ADDR) case?
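To make the difference between the two candidate checks concrete, here is a
small user-space sketch. The function names uv_check_patch() and
uv_check_invalid_only() are hypothetical, the table address is passed in as a
parameter instead of being read from the efi struct, and
EFI_INVALID_TABLE_ADDR is assumed to be (~0UL) as in the kernel headers.

```c
#include <stdbool.h>

/* Assumed to match the kernel's definition of EFI_INVALID_TABLE_ADDR. */
#define EFI_INVALID_TABLE_ADDR (~0UL)

/* The check as written in the patch, with the table address passed in. */
static bool uv_check_patch(unsigned long uv_systab)
{
	return !((uv_systab == EFI_INVALID_TABLE_ADDR) || !uv_systab);
}

/* Variant that only compares against EFI_INVALID_TABLE_ADDR. */
static bool uv_check_invalid_only(unsigned long uv_systab)
{
	return uv_systab != EFI_INVALID_TABLE_ADDR;
}
```

The two variants agree for every input except 0: the patch version treats a
zero address as "no UV table", while the simpler variant would accept it.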
>
> 2) It seems adding this function in uv.h for separating this for uv
> system only purpose. But I feel it is better to put it in efi.h instead.
At the beginning I put it in efi.c; later Mike suggested putting it in
asm/uv/uv.h. You can find the discussion at the link below.
https://patchwork.kernel.org/patch/9732787/
Thanks
Baoquan
>
> uv_systab is already a member of struct efi, it is in efi.h so it is
> natural to check the table exist or not. Then just include efi.h in
> kaslr.c and use the function.
>
> something like drivers/firmware/efi/esrt.c: esrt_table_exists()
>
> Anyway I have no strong opinon, it looks more natural to me though.
>
> > >
> > > #else /* X86_UV */
> > >
> > > @@ -30,6 +35,7 @@ static inline const struct cpumask *
> > > uv_flush_tlb_others(const struct cpumask *cpumask,
> > > const struct flush_tlb_info *info)
> > > { return cpumask; }
> > > +static inline int is_early_uv_system(void) { return 0; }
> > >
> > > #endif /* X86_UV */
> > >
> > > --
> > > 2.5.5
> > >
>
> Thanks
> Dave
Cc linux-efi list
Missed a comma in the cc list in my last reply; re-adding the linux-efi list to cc.
On 09/14/17 at 04:08pm, Baoquan He wrote:
> On 09/14/17 at 03:49pm, Dave Young wrote:
> > > > diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
> > > > index b5a32231abd8..93d7ad8763ba 100644
> > > > --- a/arch/x86/include/asm/uv/uv.h
> > > > +++ b/arch/x86/include/asm/uv/uv.h
> > > > @@ -18,6 +18,11 @@ extern void uv_nmi_init(void);
> > > > extern void uv_system_init(void);
> > > > extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
> > > > const struct flush_tlb_info *info);
> > > > +#include <linux/efi.h>
> > > > +static inline int is_early_uv_system(void)
> > > > +{
> > > > + return !((efi.uv_systab == EFI_INVALID_TABLE_ADDR) || !efi.uv_systab);
> > > > +}
> >
>
> Thanks for looking into this, Dave!
>
> >
> > Sorry for jumping in late, I have two questions about the patch:
> >
> > 1) For efi tables, the only invalid value is EFI_INVALID_TABLE_ADDR, and
> > efi struct is initialized as EFI_INVALID_TABLE_ADDR by default so no
> > need to check "|| !efi.uv_systab". Do we have any UV firmware specific
> > assumption that "0" is also possible to be assigned?
>
> Hmm, in uv_bios_init() it also checks the !efi.uv_systab case. And
> EFI_INVALID_TABLE_ADDR checking is earlier, it won't affect the result
> if it's EFI_INVALID_TABLE_ADDR. And !efi.uv_systab can make it safer
> since it doesn't work either if efi.uv_systab is 0. Mainly it's not
> harmful.
>
> Mike, what's your thought? Should I only check the (efi.uv_systab ==
> EFI_INVALID_TABLE_ADDR) case?
>
> >
> > 2) It seems adding this function in uv.h for separating this for uv
> > system only purpose. But I feel it is better to put it in efi.h instead.
>
> At the beginning I put it in efi.c, later Mike suggested putting it in
> asm/uv/uv.h. You can also find the discussion in below link.
> https://patchwork.kernel.org/patch/9732787/
>
> Thanks
> Baoquan
>
> >
> > uv_systab is already a member of struct efi, it is in efi.h so it is
> > natural to check the table exist or not. Then just include efi.h in
> > kaslr.c and use the function.
> >
> > something like drivers/firmware/efi/esrt.c: esrt_table_exists()
> >
> > Anyway I have no strong opinon, it looks more natural to me though.
> >
> > > >
> > > > #else /* X86_UV */
> > > >
> > > > @@ -30,6 +35,7 @@ static inline const struct cpumask *
> > > > uv_flush_tlb_others(const struct cpumask *cpumask,
> > > > const struct flush_tlb_info *info)
> > > > { return cpumask; }
> > > > +static inline int is_early_uv_system(void) { return 0; }
> > > >
> > > > #endif /* X86_UV */
> > > >
> > > > --
> > > > 2.5.5
> > > >
> >
> > Thanks
> > Dave
* Baoquan He <[email protected]> wrote:
> On SGI UV system, kernel often hangs when KASLR is enabled. Disabling
> KASLR makes kernel work well.
>
> The back trace is:
>
> kernel BUG at arch/x86/mm/init_64.c:311!
> invalid opcode: 0000 [#1] SMP
> [...]
> RIP: 0010:__init_extra_mapping+0x188/0x196
> [...]
> Call Trace:
> init_extra_mapping_uc+0x13/0x15
> map_high+0x67/0x75
> map_mmioh_high_uv3+0x20a/0x219
> uv_system_init_hub+0x12d9/0x1496
> uv_system_init+0x27/0x29
> native_smp_prepare_cpus+0x28d/0x2d8
> kernel_init_freeable+0xdd/0x253
> ? rest_init+0x80/0x80
> kernel_init+0xe/0x110
> ret_from_fork+0x2c/0x40
>
> This is because the SGI UV system need map its MMIOH region to the direct
> mapping section, and the mapping happens in rest_init() which is much
> later than the calling of kernel_randomize_memory() to do mm KASLR. So
> mm KASLR can't count in the size of the MMIOH region when caculate the
> needed size of address space for the direct mapping section.
>
> When KASLR is disabled, there are 64TB address space for both system RAM
> and the MMIOH regions to share. When KASLR is enabled, the current code
> of mm KASLR only reserves the actual size of system RAM plus extra 10TB
> for the direct mapping. Thus later the MMIOH mapping could go beyond
> the upper bound of the direct mapping to step into VMALLOC or VMEMMAP area.
> Then BUG_ON() in __init_extra_mapping() will be triggered.
>
> E.g on the SGI UV3 machine where this bug is reported , there are two MMIOH
> regions:
>
> [ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
> [ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
>
> They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G RAM are
> spread out to 1TB regions. Then above two SGI MMIOH regions also will be
> mapped into the direct mapping section.
>
> To fix it, we need check if it's SGI UV system by calling is_early_uv_system()
> in kernel_randomize_memory(). If yes, do not adapt the size of the direct
> mapping section, just keep it as 64TB.
>
> Signed-off-by: Baoquan He <[email protected]>
> Reviewed-by: Thomas Garnier <[email protected]>
> Acked-by: Mike Travis <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: Thomas Garnier <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Masahiro Yamada <[email protected]>
> ---
> arch/x86/mm/kaslr.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index af599167fe3c..4d68c08df82d 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -27,6 +27,7 @@
> #include <asm/pgtable.h>
> #include <asm/setup.h>
> #include <asm/kaslr.h>
> +#include <asm/uv/uv.h>
>
> #include "mm_internal.h"
>
> @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>
> /* Adapt phyiscal memory region size based on available memory */
> - if (memory_tb < kaslr_regions[0].size_tb)
> + if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
> kaslr_regions[0].size_tb = memory_tb;
This is really an ugly hack. Is kaslr_regions[] incorrect? If so then it should be
corrected instead of uglifying the code that uses it...
Thanks,
Ingo
Hi Ingo,
On 09/28/17 at 09:56am, Ingo Molnar wrote:
> > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > index af599167fe3c..4d68c08df82d 100644
> > --- a/arch/x86/mm/kaslr.c
> > +++ b/arch/x86/mm/kaslr.c
> > @@ -27,6 +27,7 @@
> > #include <asm/pgtable.h>
> > #include <asm/setup.h>
> > #include <asm/kaslr.h>
> > +#include <asm/uv/uv.h>
> >
> > #include "mm_internal.h"
> >
> > @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
> > CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> >
> > /* Adapt phyiscal memory region size based on available memory */
> > - if (memory_tb < kaslr_regions[0].size_tb)
> > + if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
> > kaslr_regions[0].size_tb = memory_tb;
> This is really an ugly hack. Is kaslr_regions[] incorrect? If so then it should be
> corrected instead of uglifying the code that uses it...
Thanks for looking into this!
On SGI UV systems, kaslr_regions[0].size_tb, namely the size of
the direct mapping section, is incorrect.
Its direct mapping size includes two parts:
#1 RAM size of system
#2 MMIOH region size which only SGI UV system has.
However, #2 can't be obtained until uv_system_init() is called in
native_smp_prepare_cpus(). That is too late for the mm KASLR calculation.
That's why I made this hack.
I checked the uv_system_init() code, and it seems not easy to know the size of
the MMIOH region before or inside kernel_randomize_memory(). I have CCed the UV
devel experts; not sure if they have any ideas about this. Otherwise,
this patch could be the only way I can think of.
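For illustration only, this is roughly what the sizing could look like if the
end of the MMIOH region were known early. It is a hypothetical user-space
sketch, not a proposal for kaslr.c: direct_map_needed_tb() and its parameters
are invented names, and the 64TB cap is the default direct mapping size
mentioned in the patch description.

```c
/*
 * Hypothetical model: size the direct mapping so it covers both parts,
 * #1 RAM (plus padding) and #2 the highest MMIOH address, capped at the
 * 64TB default. All values are in TB.
 */
static unsigned long direct_map_needed_tb(unsigned long ram_tb,
					  unsigned long padding_tb,
					  unsigned long mmioh_end_tb)
{
	unsigned long need_tb = ram_tb + padding_tb;

	/* Part #2: extend the section if MMIOH reaches further than RAM. */
	if (mmioh_end_tb > need_tb)
		need_tb = mmioh_end_tb;

	/* Never exceed the default size of the direct mapping section. */
	if (need_tb > 64UL)
		need_tb = 64UL;

	return need_tb;
}
```

The difficulty described above is that mmioh_end_tb is not available at the
point where kernel_randomize_memory() runs.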
Hi Mike and Russ,
Is there any chance we can get the size of the MMIOH region before the mm KASLR
code runs, namely before we call kernel_randomize_memory()?
Thanks
Baoquan
* Baoquan He <[email protected]> wrote:
> Hi Ingo,
>
> On 09/28/17 at 09:56am, Ingo Molnar wrote:
> > > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > > index af599167fe3c..4d68c08df82d 100644
> > > --- a/arch/x86/mm/kaslr.c
> > > +++ b/arch/x86/mm/kaslr.c
> > > @@ -27,6 +27,7 @@
> > > #include <asm/pgtable.h>
> > > #include <asm/setup.h>
> > > #include <asm/kaslr.h>
> > > +#include <asm/uv/uv.h>
> > >
> > > #include "mm_internal.h"
> > >
> > > @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
> > > CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > >
> > > /* Adapt phyiscal memory region size based on available memory */
> > > - if (memory_tb < kaslr_regions[0].size_tb)
> > > + if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
> > > kaslr_regions[0].size_tb = memory_tb;
> > This is really an ugly hack. Is kaslr_regions[] incorrect? If so then it should be
> > corrected instead of uglifying the code that uses it...
>
> Thanks for looking into this!
>
> If on SGI UV system, the kaslr_regions[0].size_tb, namely the size of
> the direct mapping section, is incorrect.
>
> Its direct mapping size includes two parts:
> #1 RAM size of system
> #2 MMIOH region size which only SGI UV system has.
>
> However, the #2 can only be got till uv_system_init() is called in
> native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
> That's why I made this hack.
>
> I checked uv_system_init() code, seems not easy to know the size of
> MMIOH region before or inside kernel_randomize_memory(). I have CCed UV
> devel experts, not sure if they have any idea about this. Otherwise,
> this patch could be the only way I can think of.
>
> Hi Mike and Russ,
>
> Is there any chance we can get the size of MMIOH region before mm KASLR
> code, namely before we call kernel_randomize_memory()?
I don't mind system specific quirks to hardware enumeration details, as long as
they don't pollute generic code with such special hacks.
I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be wrong. Any
other code that relies on it in the future will be wrong as well on UV systems.
The right quirk would be to fix that up where it gets introduced, or something
like that.
Thanks,
Ingo
On 9/28/2017 2:01 AM, Ingo Molnar wrote:
>
> * Baoquan He <[email protected]> wrote:
>
>> Hi Ingo,
>>
>> On 09/28/17 at 09:56am, Ingo Molnar wrote:
>>>> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
>>>> index af599167fe3c..4d68c08df82d 100644
>>>> --- a/arch/x86/mm/kaslr.c
>>>> +++ b/arch/x86/mm/kaslr.c
>>>> @@ -27,6 +27,7 @@
>>>> #include <asm/pgtable.h>
>>>> #include <asm/setup.h>
>>>> #include <asm/kaslr.h>
>>>> +#include <asm/uv/uv.h>
>>>>
>>>> #include "mm_internal.h"
>>>>
>>>> @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
>>>> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>>>>
>>>> /* Adapt phyiscal memory region size based on available memory */
>>>> - if (memory_tb < kaslr_regions[0].size_tb)
>>>> + if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
>>>> kaslr_regions[0].size_tb = memory_tb;
>>> This is really an ugly hack. Is kaslr_regions[] incorrect? If so then it should be
>>> corrected instead of uglifying the code that uses it...
>>
>> Thanks for looking into this!
>>
>> If on SGI UV system, the kaslr_regions[0].size_tb, namely the size of
>> the direct mapping section, is incorrect.
>>
>> Its direct mapping size includes two parts:
>> #1 RAM size of system
>> #2 MMIOH region size which only SGI UV system has.
>>
>> However, the #2 can only be got till uv_system_init() is called in
>> native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
>> That's why I made this hack.
>>
>> I checked uv_system_init() code, seems not easy to know the size of
>> MMIOH region before or inside kernel_randomize_memory(). I have CCed UV
>> devel experts, not sure if they have any idea about this. Otherwise,
>> this patch could be the only way I can think of.
>>
>> Hi Mike and Russ,
>>
>> Is there any chance we can get the size of MMIOH region before mm KASLR
>> code, namely before we call kernel_randomize_memory()?
The sizes of the MMIOL and MMIOH areas are tied into the HUB design and
how it is communicated to BIOS and the kernel. This is via some of the
config MMR's found in the HUB and it would be impossible to provide any
access to these registers as they change with each new UV architecture.
The kernel does reserve the memory in the EFI memmap. I can send you a
console log of the full startup that includes the MMIOH reservations.
Note that it is dependent on what I/O devices are actually present as UV
does not map empty slots unless forced (because we'd quickly run out of
resources.) Also, the EFI memmap entries do not specify the exact usage
of the contained areas.
>
> I don't mind system specific quirks to hardware enumeration details, as long as
> they don't pollute generic code with such special hacks.
>
> I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be wrong. Any
> other code that relies on it in the future will be wrong as well on UV systems.
Which may come into play on other arches with the new upcoming memory
technologies.
>
> The right quirk would be to fix that up where it gets introduced, or something
> like that.
Yes, does make sense.
>
> Thanks,
>
> Ingo
>
Hi Mike,
On 09/28/17 at 07:10am, Mike Travis wrote:
>
>
> On 9/28/2017 2:01 AM, Ingo Molnar wrote:
> >
> > > If on SGI UV system, the kaslr_regions[0].size_tb, namely the size of
> > > the direct mapping section, is incorrect.
> > >
> > > Its direct mapping size includes two parts:
> > > #1 RAM size of system
> > > #2 MMIOH region size which only SGI UV system has.
> > >
> > > However, the #2 can only be got till uv_system_init() is called in
> > > native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
> > > That's why I made this hack.
> > >
> > > I checked uv_system_init() code, seems not easy to know the size of
> > > MMIOH region before or inside kernel_randomize_memory(). I have CCed UV
> > > devel experts, not sure if they have any idea about this. Otherwise,
> > > this patch could be the only way I can think of.
> > >
> > > Hi Mike and Russ,
> > >
> > > Is there any chance we can get the size of MMIOH region before mm KASLR
> > > code, namely before we call kernel_randomize_memory()?
>
> The sizes of the MMIOL and MMIOH areas are tied into the HUB design and how
> it is communicated to BIOS and the kernel. This is via some of the config
> MMR's found in the HUB and it would be impossible to provide any access to
> these registers as they change with each new UV architecture.
>
> The kernel does reserve the memory in the EFI memmap. I can send you a
> console log of the full startup that includes the MMIOH reservations. Note
> that it is dependent on what I/O devices are actually present as UV does not
> map empty slots unless forced (because we'd quickly run out of resources.)
> Also, the EFI memmap entries do not specify the exact usage of the contained
> areas.
Does that mean we can get the size of MMIOH from the EFI entries? If yes,
please help provide a console log including those. If we can get the size from
EFI, it will be more acceptable.
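As a sketch of what "getting the size from EFI" might involve, the following
user-space code scans a simplified memory map for the highest MMIO end address.
Everything here is an assumption for illustration: struct mock_mem_entry is a
made-up stand-in and not the kernel's EFI memory descriptor, and, as Mike notes,
real EFI memmap entries do not specify the exact usage of the areas, so an
is_mmio flag like this may not be enough to identify MMIOH.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory-map entry; not the kernel's EFI descriptor. */
struct mock_mem_entry {
	uint64_t phys_addr;	/* start of the range */
	uint64_t num_bytes;	/* length of the range */
	int is_mmio;		/* nonzero if the entry is marked as MMIO */
};

/* Return the highest end address among MMIO entries, or 0 if there are none. */
static uint64_t highest_mmio_end(const struct mock_mem_entry *map, size_t n)
{
	uint64_t max_end = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = map[i].phys_addr + map[i].num_bytes;

		if (map[i].is_mmio && end > max_end)
			max_end = end;
	}

	return max_end;
}
```

Fed with the two MMIOH ranges from the UV3 log in the patch description, such
a scan would report 0x200000000000 (32TB) as the highest address to cover.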
Or I can ask Frank to loan his UV system to me; not sure if he is doing
testing with it.
Thanks
Baoquan
>
> >
> > I don't mind system specific quirks to hardware enumeration details, as long as
> > they don't pollute generic code with such special hacks.
> >
> > I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be wrong. Any
> > other code that relies on it in the future will be wrong as well on UV systems.
>
> Which may come into play on other arches with the new upcoming memory
> technologies.
> >
> > The right quirk would be to fix that up where it gets introduced, or something
> > like that.
>
> Yes, does make sense.
> >
> > Thanks,
> >
> > Ingo
> >
Hi Mike, Russ and Frank,
On 09/28/17 at 07:10am, Mike Travis wrote:
>
>
> On 9/28/2017 2:01 AM, Ingo Molnar wrote:
> >
> > * Baoquan He <[email protected]> wrote:
> >
> > > > > @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
> > > > > CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > > > > /* Adapt phyiscal memory region size based on available memory */
> > > > > - if (memory_tb < kaslr_regions[0].size_tb)
> > > > > + if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
> > > > > kaslr_regions[0].size_tb = memory_tb;
> > > > This is really an ugly hack. Is kaslr_regions[] incorrect? If so then it should be
> > > > corrected instead of uglifying the code that uses it...
> > >
> > > Thanks for looking into this!
> > >
> > > If on SGI UV system, the kaslr_regions[0].size_tb, namely the size of
> > > the direct mapping section, is incorrect.
> > >
> > > Its direct mapping size includes two parts:
> > > #1 RAM size of system
> > > #2 MMIOH region size which only SGI UV system has.
> > >
> > > However, the #2 can only be got till uv_system_init() is called in
> > > native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
> > > That's why I made this hack.
> > >
> > > I checked uv_system_init() code, seems not easy to know the size of
> > > MMIOH region before or inside kernel_randomize_memory(). I have CCed UV
> > > devel experts, not sure if they have any idea about this. Otherwise,
> > > this patch could be the only way I can think of.
> > >
> > > Hi Mike and Russ,
> > >
> > > Is there any chance we can get the size of MMIOH region before mm KASLR
> > > code, namely before we call kernel_randomize_memory()?
>
> The sizes of the MMIOL and MMIOH areas are tied into the HUB design and how
> it is communicated to BIOS and the kernel. This is via some of the config
> MMR's found in the HUB and it would be impossible to provide any access to
> these registers as they change with each new UV architecture.
>
> The kernel does reserve the memory in the EFI memmap. I can send you a
> console log of the full startup that includes the MMIOH reservations. Note
> that it is dependent on what I/O devices are actually present as UV does not
> map empty slots unless forced (because we'd quickly run out of resources.)
> Also, the EFI memmap entries do not specify the exact usage of the contained
> areas.
This is still a regression in our newer RHEL, since I have only fixed
it there with a RHEL-only patch. I still need the console log which
includes the MMIOH reservations.
Could you provide a console log with the MMIOH info, or do I need to request
one from Red Hat's lab?
Or could an expert from the HPE UV team make a patch based on the findings and
analysis?
Thanks
Baoquan
>
> >
> > I don't mind system specific quirks to hardware enumeration details, as long as
> > they don't pollute generic code with such special hacks.
> >
> > I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be wrong. Any
> > other code that relies on it in the future will be wrong as well on UV systems.
>
> Which may come into play on other arches with the new upcoming memory
> technologies.
> >
> > The right quirk would be to fix that up where it gets introduced, or something
> > like that.
>
> Yes, does make sense.
> >
> > Thanks,
> >
> > Ingo
> >
> -----Original Message-----
> From: Baoquan He [mailto:[email protected]]
> Sent: Wednesday, May 16, 2018 11:18 PM
> To: Travis, Mike <[email protected]>; Anderson, Russ
> <[email protected]>; Ramsay, Frank <[email protected]>
> Cc: Ingo Molnar <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; akpm@linux-
> foundation.org; [email protected]; Sivanich, Dimitri
> <[email protected]>; [email protected]
> Subject: Re: [PATCH v2 RESEND 2/2] x86/mm/KASLR: Do not adapt the size of
> the direct mapping section for SGI UV system
>
> Hi Mike, Russ and Frank,
>
> On 09/28/17 at 07:10am, Mike Travis wrote:
> >
> >
> > On 9/28/2017 2:01 AM, Ingo Molnar wrote:
> > >
> > > * Baoquan He <[email protected]> wrote:
> > >
> > > > > > @@ -123,7 +124,7 @@ void __init
> kernel_randomize_memory(void)
> > > > > >
> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > > > > > /* Adapt phyiscal memory region size based on available
> memory */
> > > > > > - if (memory_tb < kaslr_regions[0].size_tb)
> > > > > > + if (memory_tb < kaslr_regions[0].size_tb &&
> > > > > > +!is_early_uv_system())
> > > > > > kaslr_regions[0].size_tb = memory_tb;
> > > > > This is really an ugly hack. Is kaslr_regions[] incorrect? If so
> > > > > then it should be corrected instead of uglifying the code that uses it...
> > > >
> > > > Thanks for looking into this!
> > > >
> > > > If on SGI UV system, the kaslr_regions[0].size_tb, namely the size
> > > > of the direct mapping section, is incorrect.
> > > >
> > > > Its direct mapping size includes two parts:
> > > > #1 RAM size of system
> > > > #2 MMIOH region size which only SGI UV system has.
> > > >
> > > > However, the #2 can only be got till uv_system_init() is called in
> > > > native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
> > > > That's why I made this hack.
> > > >
> > > > I checked uv_system_init() code, seems not easy to know the size
> > > > of MMIOH region before or inside kernel_randomize_memory(). I have
> > > > CCed UV devel experts, not sure if they have any idea about this.
> > > > Otherwise, this patch could be the only way I can think of.
> > > >
> > > > Hi Mike and Russ,
> > > >
> > > > Is there any chance we can get the size of MMIOH region before mm
> > > > KASLR code, namely before we call kernel_randomize_memory()?
> >
> > The sizes of the MMIOL and MMIOH areas are tied into the HUB design
> > and how it is communicated to BIOS and the kernel. This is via some
> > of the config MMR's found in the HUB and it would be impossible to
> > provide any access to these registers as they change with each new UV
> architecture.
> >
> > The kernel does reserve the memory in the EFI memmap. I can send you
> > a console log of the full startup that includes the MMIOH
> > reservations. Note that it is dependent on what I/O devices are
> > actually present as UV does not map empty slots unless forced (because
> > we'd quickly run out of resources.) Also, the EFI memmap entries do
> > not specify the exact usage of the contained areas.
>
> This one is still a regression bug in our newer rhel since I just fixed them with
> rhel-only patch. Now I still need the console log which includes the MMIOH
> reservations.
>
Does the system need to have an external I/O device for this? If not, you
should be able to just boot one of the SGI UV systems in the beaker lab
(possibly also the HPE Superdome Flex that is in beaker:
hpe-flex-01.rhts.eng.bos.redhat.com).
> Could you help provide a console log with MMIOH info, or I need request
> one from redhat's lab?
>
> Or expert from HPE UV team can make a patch based on the finding and
> analysis?
>
> Thanks
> Baoquan
> >
> > >
> > > I don't mind system specific quirks to hardware enumeration details,
> > > as long as they don't pollute generic code with such special hacks.
> > >
> > > I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be
> > > wrong. Any other code that relies on it in the future will be wrong as well
> on UV systems.
> >
> > Which may come into play on other arches with the new upcoming
> memory
> > technologies.
> > >
> > > The right quirk would be to fix that up where it gets introduced, or
> > > something like that.
> >
> > Yes, does make sense.
> > >
> > > Thanks,
> > >
> > > Ingo
> > >
On 5/17/2018 8:06 AM, Ramsay, Frank wrote:
>
>
>> -----Original Message-----
>> From: Baoquan He [mailto:[email protected]]
>> Sent: Wednesday, May 16, 2018 11:18 PM
>> To: Travis, Mike <[email protected]>; Anderson, Russ
>> <[email protected]>; Ramsay, Frank <[email protected]>
>> Cc: Ingo Molnar <[email protected]>; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; akpm@linux-
>> foundation.org; [email protected]; Sivanich, Dimitri
>> <[email protected]>; [email protected]
>> Subject: Re: [PATCH v2 RESEND 2/2] x86/mm/KASLR: Do not adapt the size of
>> the direct mapping section for SGI UV system
>>
>> Hi Mike, Russ and Frank,
>>
>> On 09/28/17 at 07:10am, Mike Travis wrote:
>>>
>>>
>>> On 9/28/2017 2:01 AM, Ingo Molnar wrote:
>>>>
>>>> * Baoquan He <[email protected]> wrote:
>>>>
>>>>>>> @@ -123,7 +124,7 @@ void __init
>> kernel_randomize_memory(void)
>>>>>>>
>> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>>>>>>> /* Adapt phyiscal memory region size based on available
>> memory */
>>>>>>> - if (memory_tb < kaslr_regions[0].size_tb)
>>>>>>> + if (memory_tb < kaslr_regions[0].size_tb &&
>>>>>>> +!is_early_uv_system())
>>>>>>> kaslr_regions[0].size_tb = memory_tb;
>>>>>> This is really an ugly hack. Is kaslr_regions[] incorrect? If so
>>>>>> then it should be corrected instead of uglifying the code that uses it...
>>>>>
>>>>> Thanks for looking into this!
>>>>>
>>>>> If on SGI UV system, the kaslr_regions[0].size_tb, namely the size
>>>>> of the direct mapping section, is incorrect.
>>>>>
>>>>> Its direct mapping size includes two parts:
>>>>> #1 RAM size of system
>>>>> #2 MMIOH region size which only SGI UV system has.
>>>>>
>>>>> However, the #2 can only be got till uv_system_init() is called in
>>>>> native_smp_prepare_cpus(). That is too late for mm KASLR calculation.
>>>>> That's why I made this hack.
>>>>>
>>>>> I checked uv_system_init() code, seems not easy to know the size
>>>>> of MMIOH region before or inside kernel_randomize_memory(). I have
>>>>> CCed UV devel experts, not sure if they have any idea about this.
>>>>> Otherwise, this patch could be the only way I can think of.
>>>>>
>>>>> Hi Mike and Russ,
>>>>>
>>>>> Is there any chance we can get the size of MMIOH region before mm
>>>>> KASLR code, namely before we call kernel_randomize_memory()?
>>>
>>> The sizes of the MMIOL and MMIOH areas are tied into the HUB design
>>> and how it is communicated to BIOS and the kernel. This is via some
>>> of the config MMR's found in the HUB and it would be impossible to
>>> provide any access to these registers as they change with each new UV
>> architecture.
>>>
>>> The kernel does reserve the memory in the EFI memmap. I can send you
>>> a console log of the full startup that includes the MMIOH
>>> reservations. Note that it is dependent on what I/O devices are
>>> actually present as UV does not map empty slots unless forced (because
>>> we'd quickly run out of resources.) Also, the EFI memmap entries do
>>> not specify the exact usage of the contained areas.
>>
>> This one is still a regression bug in our newer rhel since I just fixed them with
>> rhel-only patch. Now I still need the console log which includes the MMIOH
>> reservations.
>>
>
> Does the system need to have an external IO device for this? If not you should just be able to boot one of the SGI UV systems in the beaker lab (possibly also the HPE Superdome Flex that is in beaker; hpe-flex-01.rhts.eng.bos.redhat.com)
If you have a hawks2 (UV4), you would have 4 10G ethernet devices on the
base I/O. But these would only have smaller MMIOH0 regions. This would
not cause MMIOH1 regions to be allocated and assigned. (MC990X/UV3 has
only single-sized MMIOH regions, all of which are big enough for the
largest MMIOH region found on any I/O device.)
>
>> Could you help provide a console log with MMIOH info, or I need request
>> one from redhat's lab?
>>
>> Or expert from HPE UV team can make a patch based on the finding and
>> analysis?
>>
>> Thanks
>> Baoquan
>>>
>>>>
>>>> I don't mind system specific quirks to hardware enumeration details,
>>>> as long as they don't pollute generic code with such special hacks.
>>>>
>>>> I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be
>>>> wrong. Any other code that relies on it in the future will be wrong as well
>> on UV systems.
>>>
>>> Which may come into play on other arches with the new upcoming
>> memory
>>> technologies.
>>>>
>>>> The right quirk would be to fix that up where it gets introduced, or
>>>> something like that.
>>>
>>> Yes, does make sense.
>>>>
>>>> Thanks,
>>>>
>>>> Ingo
>>>>
On 05/24/18 at 01:50pm, Mike Travis wrote:
> Hi Baoquan,
>
> My apologies for my delay, we are going through a network reconfig so mail
> to me was not available for a bit. Comments below...
Not at all.
> > > > > > > Is there any chance we can get the size of MMIOH region before mm KASLR
> > > > > > > code, namely before we call kernel_randomize_memory()?
> > > > >
> > > > > The sizes of the MMIOL and MMIOH areas are tied into the HUB design and how
> > > > > it is communicated to BIOS and the kernel. This is via some of the config
> > > > > MMR's found in the HUB and it would be impossible to provide any access to
> > > > > these registers as they change with each new UV architecture.
> > > > >
> > > > > The kernel does reserve the memory in the EFI memmap. I can send you a
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > console log of the full startup that includes the MMIOH reservations. Note
> >
> > What I want is if we can get the MMIOH region from EFI memmap before
> > kernel_randomize_memory() in setup_kernel(), if yes how we can get it.
>
> The problem is that EFI memmap only shows "reserved memory" and not what it
> is reserved for. Most reservations are for things like BIOS reserved
> memory, and exchanged info from EFI to the kernel.
OK, then we might not be able to do what Ingo suggested if we cannot
get the size that UV reserves for the MMIOH region.
>
> > Because Ingo doesn't like hacking UV inside kernel_randomize_memory(),
> > seems I have to get the MMIOH region specifically before
> > kernel_randomize_memory(), then count it in when doing mm region
> > randomization.
>
> Perhaps calling a function prior, to see if memory is "eligible" for
> inclusion into your randomize memory scheme? Adding UV to the list of
> systems to support this would be a very good thing, I'm just not sure how to
> help you do this.
Do you mean adding a function to check whether the size of the direct mapping
is allowed to adapt, with every ineligible system checked there, and the UV
system being the first one for now?
I am not sure what such a list would look like, e.g. like the DMI table we
are using?
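One possible reading of the suggestion, as a rough user-space sketch only:
keep a small table of detection callbacks, one per platform whose direct
mapping must not be shrunk, and consult it before adapting the size. The
names struct no_adapt_quirk, no_adapt_quirks[] and direct_mapping_may_adapt()
are hypothetical and not existing kernel API. is_early_uv_system() is the
helper proposed in patch 1/2, stubbed here with a plain flag so the example
can run outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stub for the helper from patch 1/2. In the kernel it would report
 * whether the EFI UV system table was found; here it is just a flag.
 */
static bool fake_uv_systab_present;

static bool is_early_uv_system(void)
{
	return fake_uv_systab_present;
}

/* Hypothetical entry: a platform that needs the full direct mapping. */
struct no_adapt_quirk {
	const char *name;
	bool (*detect)(void);
};

/* UV would be the first (and for now only) entry in the list. */
static const struct no_adapt_quirk no_adapt_quirks[] = {
	{ "SGI UV", is_early_uv_system },
};

/* Return true if no listed platform was detected. */
static bool direct_mapping_may_adapt(void)
{
	size_t i;

	for (i = 0; i < sizeof(no_adapt_quirks) / sizeof(no_adapt_quirks[0]); i++) {
		assert(no_adapt_quirks[i].detect != NULL);
		if (no_adapt_quirks[i].detect())
			return false;
	}
	return true;
}
```

With something like this, kernel_randomize_memory() would only test
direct_mapping_may_adapt() and would not need to know about UV itself.
Whether that is cleaner than the current patch is of course up for
discussion.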
>
> >
> > > > > that it is dependent on what I/O devices are actually present as UV does not
> > > > > map empty slots unless forced (because we'd quickly run out of resources.)
> > > > > Also, the EFI memmap entries do not specify the exact usage of the contained
> > > > > areas.
> > > >
> > > > This one is still a regression bug in our newer rhel since I just fixed
> > > > them with rhel-only patch. Now I still need the console log which
> > > > includes the MMIOH reservations.
> > > >
> > > > Could you help provide a console log with MMIOH info, or I need request
> > > > one from redhat's lab?
> > >
> > > Hi, I've forgotten exactly what info you need? I have attached a gzipped
> > > console log (private email since attachments are frowned upon in LKML). You
> > > can see the MMIOH0/1 areas reserved, though because there are no "large" MMIOH
> > > devices, no specific memory has been assigned. (See the MMIOH1 base == NULL
> > > line.)
> >
> > Yes, I checked the console log you provided, seems you have enabled the
> > pr_debug printing, and I saw the lines telling it's NULL.
> >
> > 00:01:17 00:00.0 [ 2.196015] UV: MMIOH0 base:0xfff00000000 shift:52 M_IO:26 MAX_IO:63
> > 00:01:17 00:00.0 [ 2.200000] UV: Map MMIOH0_HI base address NULL
> > ......
> > 00:01:17 00:00.0 [ 2.344001] UV: MMIOH1 base:0x100000000000 shift:52 M_IO:37 MAX_IO:127
> > 00:01:17 00:00.0 [ 2.348000] UV: Map MMIOH1_HI base address NULL
> >
>
> Right. Because there were no devices in these regions, none of them needed
> to be mapped. This is handled by the UV BIOS.
>
> > >
> > > You can grep UV: to get UV specific messages. I also looked through the efi
> > > memmap entries and they don't have MMIO areas distinctively mentioned.
> > >
> > > I'm looking now for a lab system that has at least a single large MMIOH
> > > device (a GPU has a large MMIO aperture). I'll let you know. The GPU
> > > system we had was shipped to the HPE GPU support group down in Houston and I
> > > haven't heard from them yet. I don't think the UV's at Redhat have any I/O
> > > except for the Base I/O (required) devices.
> > >
> > > >
> > > > Or expert from HPE UV team can make a patch based on the finding and
> > > > analysis?
> > >
> > > Again, I'm not exactly sure what you need. Is it only the physical
> > > addresses reserved for MMIOH areas? (MMIOL is in the 2nd 2GB half in the
> > > lower 32 bits.) As I mentioned, we don't have fixed MMIOH addresses and
> > > BIOS sets up all MMIO areas in (I believe) the ACPI tables. So that should
> > > have the authoritative answers to your questions. (Sorry, I don't know
> > > which table has that specific info.)
> >
> > I am not very clear on the difference between MMIOH and
> > MMIOL. From the code flow, the bug is reported on the MMIOH mapping. I
> > haven't found where the MMIOL region needs to be mapped. Could you point it
> > out so that I can check the code where MMIOL is being handled, if it
> > needs to be handled?
>
> The only difference is MMIOL is 32 bit based addressing, while MMIOH is 64
> bit addressing.
> >
> > Let me list thoughts I had about MMIOH region and the bug, please help
> > check if I am right, and anything I missed:
> >
> > Now what I found from code:
> > 1) There's a UVsystab in EFI
>
> True. There are many "EFI" pointers declared to pass info from BIOS to the
> kernel via EFI.
>
> > 2) MMIOH region need be mapped to the direct mapping region which is
> > 64TB, surely here I mean nokaslr case.
>
> Yes, but these regions are in the ACPI tables, and I print the regions in
> the early startup messages strictly as informational. But this is well
> within the "start_kernel()" called functions. Much before you need the
(I think 'before' should be 'after' here.)
> info.
> >
> > ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
> >
> > 3) With kaslr, we may shrink the size of the direct mapping region because
> > system RAM is usually very small: we reserve enough area for the system
> > RAM mapping, then use what is left over for better randomization. For a UV
> > system, we need to find out the MMIOH region size (possibly MMIOL too if
> > it needs to be mapped) before kernel_randomize_memory() and add it to the
> > size of system RAM when doing the mm region randomization.
>
> The MMIOL addresses are already mapped as they are fixed in the 2-4GB 32-bit
> range. The MMIOH mapped regions can be placed anywhere within the 64 bit
> address space.
> >
> > If my above understanding is right, the only thing would be finding the
> > MMIOH region size from efi/acpi table, sorry I really don't know where
> > it should be, as Ingo suggested. If we have no way to find it out at
> > right time, then the old post will be the only choice.
>
> The ACPI tables should have any and all info. How are you getting them now?
> Certainly even whitebox PC's (what we call "non-UV" boxes) would have that
> info in the ACPI tables? I have not had an occasion to find this info in
> the myriad of ACPI tables, so I'm not sure which specific ones to look at.
It seems we can't get the info from the ACPI tables before
kernel_randomize_memory().
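To make the problem concrete, here is a small user-space model of the size
adaptation discussed in this thread. It is a sketch and not the kernel code:
the direct mapping is sized as RAM rounded up to whole TB plus a padding
value, capped at the full 64 TB. The helper names and the sample RAM size,
padding and region length used with it are assumptions for illustration.
Only the direct mapping range and the MMIOH1 base come from the text and
console log quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TB_SHIFT		40
/* Direct mapping range quoted above for the nokaslr case. */
#define DIRECT_MAP_START	0xffff880000000000ULL
#define DIRECT_MAP_END		0xffffc7ffffffffffULL
#define DIRECT_MAP_MAX_TB	((DIRECT_MAP_END - DIRECT_MAP_START + 1) >> TB_SHIFT)

/* Round a byte count up to whole terabytes. */
static uint64_t bytes_to_tb_roundup(uint64_t bytes)
{
	return (bytes + (1ULL << TB_SHIFT) - 1) >> TB_SHIFT;
}

/* Model of the adaptation: RAM plus padding, capped at the full size. */
static uint64_t adapted_size_tb(uint64_t ram_bytes, uint64_t padding_tb)
{
	uint64_t memory_tb = bytes_to_tb_roundup(ram_bytes) + padding_tb;

	return memory_tb < DIRECT_MAP_MAX_TB ? memory_tb : DIRECT_MAP_MAX_TB;
}

/* Does the physical range [base, base + len) fit below size_tb? */
static bool phys_range_fits(uint64_t base, uint64_t len, uint64_t size_tb)
{
	uint64_t limit;

	assert(size_tb <= DIRECT_MAP_MAX_TB);
	limit = size_tb << TB_SHIFT;
	return base < limit && len <= limit - base;
}
```

For example, with an assumed 1 TB of RAM and 10 TB of padding the model
gives an 11 TB direct mapping, so the MMIOH1 base 0x100000000000 (16 TB)
from the console log falls outside it, while it fits in the full 64 TB.
Since the MMIOH size is not known this early, it cannot simply be added to
the RAM size here.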
>
> >
> > (I noticed you always mention I/O devices; what is their relationship with
> > the MMIOH/L regions? I am a little confused. A UV system could have
> > MMIOH/L regions whose size and address are written into the efi/acpi table,
> > while later they are actually not mapped, e.g. the address is NULL case.)
>
> As I mentioned, the UV BIOS scans the PCI buses for devices for a lot of
> reasons. One is, if there are no devices needing MMIOH regions on a PCI
> host controller, it does not ask for memory to be reserved for that.
> >
> > Thanks a lot for your help!
> >
> > Thanks
> > Baoquan
>
> Btw, I'm going on a vacation soon so my replies may be even more delayed.
That's OK. Reply whenever it is convenient for you, or after your vacation.
Thanks a lot!
>
>
> > > > >
> > > > > >
> > > > > > I don't mind system specific quirks to hardware enumeration details, as long as
> > > > > > they don't pollute generic code with such special hacks.
> > > > > >
> > > > > > I.e. in this case it's wrong to allow kaslr_regions[0].size_tb to be wrong. Any
> > > > > > other code that relies on it in the future will be wrong as well on UV systems.
> > > > >
> > > > > Which may come into play on other arches with the new upcoming memory
> > > > > technologies.
> > > > > >
> > > > > > The right quirk would be to fix that up where it gets introduced, or something
> > > > > > like that.
> > > > >
> > > > > Yes, does make sense.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Ingo
> > > > > >
> >
> >