This is an attempt to implement KASLR for Freescale BookE64, based on
my earlier implementation for Freescale BookE32:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
The implementation for Freescale BookE64 is similar to BookE32. One
difference is that Freescale BookE64 sets up a 1G TLB mapping during
boot. Another difference is that ppc64 needs the kernel to be
64K-aligned. So we can randomize the kernel within this 1G mapping and
keep it 64K-aligned. This saves the code needed to create another TLB
mapping at early boot. The disadvantage is that we only have about
1G/64K = 16384 slots to put the kernel in.
                          KERNELBASE

        64K |--> kernel <--|
         |  |              |
        +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
        |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
        +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
        |                         |                        1G
        |----->   offset    <-----|

                    kernstart_virt_addr
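
For reference, the offset selection boils down to the following (a
minimal standalone sketch of the logic in kaslr_legal_offset(); the
seed and the image size below are made-up stand-ins):

	#include <stdio.h>

	#define SZ_64K	0x10000UL
	#define SZ_1G	0x40000000UL

	int main(void)
	{
		unsigned long kernel_size = 0x2000000UL; /* e.g. a 32M image */
		unsigned long random = 0x12345678UL;	 /* stand-in for the seed */
		unsigned long offset;

		/* Decide the kernel offset inside the 1G mapping ... */
		offset = random % (SZ_1G - kernel_size);
		/* ... and round it down to the 64K alignment ppc64 needs. */
		offset &= ~(SZ_64K - 1);

		printf("slots: %lu, offset: 0x%lx\n", SZ_1G / SZ_64K, offset);
		return 0;
	}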
I'm not sure whether the number of slots is enough or whether the design
has any defects. If you have better ideas, I would be happy to hear them.
Thank you all.
v2->v3:
Fix build error when KASLR is disabled.
v1->v2:
Add __kaslr_offset for the secondary cpu boot up.
Jason Yan (6):
powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
kaslr_early_init()
powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
powerpc/fsl_booke/64: implement KASLR for fsl_booke64
powerpc/fsl_booke/64: do not clear the BSS for the second pass
powerpc/fsl_booke/64: clear the original kernel if randomized
powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
and add 64bit part
.../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++--
arch/powerpc/Kconfig | 2 +-
arch/powerpc/kernel/exceptions-64e.S | 23 ++++++
arch/powerpc/kernel/head_64.S | 14 ++++
arch/powerpc/kernel/setup_64.c | 4 +-
arch/powerpc/mm/mmu_decl.h | 19 ++---
arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------
7 files changed, 132 insertions(+), 36 deletions(-)
rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
--
2.17.2
Now that we support both 32-bit and 64-bit KASLR for fsl booke, add
documentation for the 64-bit part and rename kaslr-booke32.rst to
kaslr-booke.rst.
Signed-off-by: Jason Yan <[email protected]>
Cc: Scott Wood <[email protected]>
Cc: Diana Craciun <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Kees Cook <[email protected]>
---
.../{kaslr-booke32.rst => kaslr-booke.rst} | 35 ++++++++++++++++---
1 file changed, 31 insertions(+), 4 deletions(-)
rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
diff --git a/Documentation/powerpc/kaslr-booke32.rst b/Documentation/powerpc/kaslr-booke.rst
similarity index 59%
rename from Documentation/powerpc/kaslr-booke32.rst
rename to Documentation/powerpc/kaslr-booke.rst
index 8b259fdfdf03..42121fed8249 100644
--- a/Documentation/powerpc/kaslr-booke32.rst
+++ b/Documentation/powerpc/kaslr-booke.rst
@@ -1,15 +1,18 @@
.. SPDX-License-Identifier: GPL-2.0
-===========================
-KASLR for Freescale BookE32
-===========================
+=========================
+KASLR for Freescale BookE
+=========================
The word KASLR stands for Kernel Address Space Layout Randomization.
This document tries to explain the implementation of the KASLR for
-Freescale BookE32. KASLR is a security feature that deters exploit
+Freescale BookE. KASLR is a security feature that deters exploit
attempts relying on knowledge of the location of kernel internals.
+KASLR for Freescale BookE32
+---------------------------
+
Since CONFIG_RELOCATABLE has already supported, what we need to do is
map or copy kernel to a proper place and relocate. Freescale Book-E
parts expect lowmem to be mapped by fixed TLB entries(TLB1). The TLB1
@@ -38,5 +41,29 @@ bit of the entropy to decide the index of the 64M zone. Then we chose a
kernstart_virt_addr
+
+KASLR for Freescale BookE64
+---------------------------
+
+The implementation for Freescale BookE64 is similar to BookE32. One
+difference is that Freescale BookE64 sets up a 1G TLB mapping during
+boot. Another difference is that ppc64 needs the kernel to be
+64K-aligned. So we can randomize the kernel within this 1G mapping and
+keep it 64K-aligned. This saves the code needed to create another TLB
+mapping at early boot. The disadvantage is that we only have about
+1G/64K = 16384 slots to put the kernel in::
+
+ KERNELBASE
+
+ 64K |--> kernel <--|
+ | | |
+ +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
+ | | | |....| | | | | | | | | |....| | |
+ +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
+ | | 1G
+ |-----> offset <-----|
+
+ kernstart_virt_addr
+
To enable KASLR, set CONFIG_RANDOMIZE_BASE = y. If KASLR is enable and you
want to disable it at runtime, add "nokaslr" to the kernel cmdline.
--
2.17.2
The implementation for Freescale BookE64 is similar to BookE32. One
difference is that Freescale BookE64 sets up a 1G TLB mapping during
boot. Another difference is that ppc64 needs the kernel to be
64K-aligned. So we can randomize the kernel within this 1G mapping and
keep it 64K-aligned. This saves the code needed to create another TLB
mapping at early boot. The disadvantage is that we only have about
1G/64K = 16384 slots to put the kernel in.
To support secondary CPU boot-up, a variable __kaslr_offset was added in
the first_256B section. This helps a secondary CPU get the kaslr offset
before the 1:1 mapping has been set up.
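
In rough C terms, what the secondary-core path does is the following (a
sketch only, not the actual code; the real logic is the assembly added
to exceptions-64e.S below, where rlwinm r19,r19,0,0,5 keeps the top six
bits of the offset, i.e. its 64M-aligned part):

	/* __kaslr_offset lives at a fixed location in the first 256 bytes */
	unsigned int offset = *(unsigned int *)(KERNELBASE + 0x58);

	/* keep only the 64M-aligned part for the initial TLB setup */
	offset &= 0xFC000000;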
Signed-off-by: Jason Yan <[email protected]>
Cc: Scott Wood <[email protected]>
Cc: Diana Craciun <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Kees Cook <[email protected]>
---
arch/powerpc/Kconfig | 2 +-
arch/powerpc/kernel/exceptions-64e.S | 10 +++++++++
arch/powerpc/kernel/head_64.S | 7 ++++++
arch/powerpc/kernel/setup_64.c | 4 +++-
arch/powerpc/mm/mmu_decl.h | 16 +++++++-------
arch/powerpc/mm/nohash/kaslr_booke.c | 33 +++++++++++++++++++++++++---
6 files changed, 59 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c150a9d49343..754aeb96bb1c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -568,7 +568,7 @@ config RELOCATABLE
config RANDOMIZE_BASE
bool "Randomize the address of the kernel image"
- depends on (FSL_BOOKE && FLATMEM && PPC32)
+ depends on (PPC_FSL_BOOK3E && FLATMEM)
depends on RELOCATABLE
help
Randomizes the virtual address at which the kernel image is
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 1b9b174bee86..c1c05b8684ca 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1378,6 +1378,7 @@ skpinv: addi r6,r6,1 /* Increment */
1: mflr r6
addi r6,r6,(2f - 1b)
tovirt(r6,r6)
+ add r6,r6,r19
lis r7,MSR_KERNEL@h
ori r7,r7,MSR_KERNEL@l
mtspr SPRN_SRR0,r6
@@ -1400,6 +1401,7 @@ skpinv: addi r6,r6,1 /* Increment */
/* We translate LR and return */
tovirt(r8,r8)
+ add r8,r8,r19
mtlr r8
blr
@@ -1528,6 +1530,7 @@ a2_tlbinit_code_end:
*/
_GLOBAL(start_initialization_book3e)
mflr r28
+ li r19, 0
/* First, we need to setup some initial TLBs to map the kernel
* text, data and bss at PAGE_OFFSET. We don't have a real mode
@@ -1570,6 +1573,12 @@ _GLOBAL(book3e_secondary_core_init)
cmplwi r4,0
bne 2f
+ li r19, 0
+#ifdef CONFIG_RANDOMIZE_BASE
+ LOAD_REG_ADDR_PIC(r19, __kaslr_offset)
+ lwz r19,0(r19)
+ rlwinm r19,r19,0,0,5
+#endif
/* Setup TLB for this core */
bl initial_tlb_book3e
@@ -1602,6 +1611,7 @@ _GLOBAL(book3e_secondary_core_init)
lis r3,PAGE_OFFSET@highest
sldi r3,r3,32
or r28,r28,r3
+ add r28,r28,r19
1: mtlr r28
blr
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index ad79fddb974d..744624140fb8 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -104,6 +104,13 @@ __secondary_hold_acknowledge:
.8byte 0x0
#ifdef CONFIG_RELOCATABLE
+#ifdef CONFIG_RANDOMIZE_BASE
+ . = 0x58
+ .globl __kaslr_offset
+__kaslr_offset:
+DEFINE_FIXED_SYMBOL(__kaslr_offset)
+ .long 0
+#endif
/* This flag is set to 1 by a loader if the kernel should run
* at the loaded address instead of the linked address. This
* is used by kexec-tools to keep the the kdump kernel in the
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6104917a282d..a16b970a8d1a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -66,7 +66,7 @@
#include <asm/feature-fixups.h>
#include <asm/kup.h>
#include <asm/early_ioremap.h>
-
+#include <mm/mmu_decl.h>
#include "setup.h"
int spinning_secondaries;
@@ -300,6 +300,8 @@ void __init early_setup(unsigned long dt_ptr)
/* Enable early debugging if any specified (see udbg.h) */
udbg_early_init();
+ kaslr_early_init(__va(dt_ptr), 0);
+
udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr);
/*
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 3e1c85c7d10b..bbd721d1e3d7 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -147,14 +147,6 @@ void reloc_kernel_entry(void *fdt, long addr);
extern void loadcam_entry(unsigned int index);
extern void loadcam_multi(int first_idx, int num, int tmp_idx);
-#ifdef CONFIG_RANDOMIZE_BASE
-void kaslr_early_init(void *dt_ptr, phys_addr_t size);
-void kaslr_late_init(void);
-#else
-static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
-static inline void kaslr_late_init(void) {}
-#endif
-
struct tlbcam {
u32 MAS0;
u32 MAS1;
@@ -164,6 +156,14 @@ struct tlbcam {
};
#endif
+#ifdef CONFIG_RANDOMIZE_BASE
+void kaslr_early_init(void *dt_ptr, phys_addr_t size);
+void kaslr_late_init(void);
+#else
+static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
+static inline void kaslr_late_init(void) {}
+#endif
+
#if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_FSL_BOOKE) || defined(CONFIG_PPC_8xx)
/* 6xx have BATS */
/* FSL_BOOKE have TLBCAM */
diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
index 07b036e98353..c6f5c1db1394 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -231,7 +231,7 @@ static __init unsigned long get_usable_address(const void *fdt,
unsigned long pa;
unsigned long pa_end;
- for (pa = offset; (long)pa > (long)start; pa -= SZ_16K) {
+ for (pa = offset; (long)pa > (long)start; pa -= SZ_64K) {
pa_end = pa + regions.kernel_size;
if (overlaps_region(fdt, pa, pa_end))
continue;
@@ -265,14 +265,14 @@ static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long rando
{
unsigned long koffset = 0;
unsigned long start;
- unsigned long index;
unsigned long offset;
+#ifdef CONFIG_PPC32
/*
* Decide which 64M we want to start
* Only use the low 8 bits of the random seed
*/
- index = random & 0xFF;
+ unsigned long index = random & 0xFF;
index %= regions.linear_sz / SZ_64M;
/* Decide offset inside 64M */
@@ -287,6 +287,15 @@ static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long rando
break;
index--;
}
+#else
+ /* Decide kernel offset inside 1G */
+ offset = random % (SZ_1G - regions.kernel_size);
+ offset = round_down(offset, SZ_64K);
+
+ start = memstart_addr;
+ offset = memstart_addr + offset;
+ koffset = get_usable_address(dt_ptr, start, offset);
+#endif
if (koffset != 0)
koffset -= memstart_addr;
@@ -325,6 +334,7 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size
else
pr_warn("KASLR: No safe seed for randomizing the kernel base.\n");
+#ifdef CONFIG_PPC32
ram = min_t(phys_addr_t, __max_low_memory, size);
ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
linear_sz = min_t(unsigned long, ram, SZ_512M);
@@ -332,6 +342,7 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size
/* If the linear size is smaller than 64M, do not randmize */
if (linear_sz < SZ_64M)
return 0;
+#endif
/* check for a reserved-memory node and record its cell sizes */
regions.reserved_mem = fdt_path_offset(dt_ptr, "/reserved-memory");
@@ -363,6 +374,17 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
unsigned long offset;
unsigned long kernel_sz;
+#ifdef CONFIG_PPC64
+ unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
+ unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
+
+ if (*__run_at_load == 1)
+ return;
+
+ /* Setup flat device-tree pointer */
+ initial_boot_params = dt_ptr;
+#endif
+
kernel_sz = (unsigned long)_end - (unsigned long)_stext;
offset = kaslr_choose_location(dt_ptr, size, kernel_sz);
@@ -372,6 +394,7 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
kernstart_virt_addr += offset;
kernstart_addr += offset;
+#ifdef CONFIG_PPC32
is_second_reloc = 1;
if (offset >= SZ_64M) {
@@ -381,6 +404,10 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
/* Create kernel map to relocate in */
create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
}
+#else
+ *__kaslr_offset = kernstart_virt_addr - KERNELBASE;
+ *__run_at_load = 1;
+#endif
/* Copy the kernel to it's new location and run */
memcpy((void *)kernstart_virt_addr, (void *)_stext, kernel_sz);
--
2.17.2
The BSS section has already been cleared out in the first pass, so there
is no need to clear it again. This saves some time when booting with
KASLR enabled.
Signed-off-by: Jason Yan <[email protected]>
Cc: Scott Wood <[email protected]>
Cc: Diana Craciun <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Kees Cook <[email protected]>
---
arch/powerpc/kernel/head_64.S | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 744624140fb8..8c644e7c3eaf 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -914,6 +914,13 @@ start_here_multiplatform:
bl relative_toc
tovirt(r2,r2)
+ /* Do not clear the BSS for the second pass if randomized */
+ LOAD_REG_ADDR(r3, kernstart_virt_addr)
+ lwz r3,0(r3)
+ LOAD_REG_IMMEDIATE(r4, KERNELBASE)
+ cmpw r3,r4
+ bne 4f
+
/* Clear out the BSS. It may have been done in prom_init,
* already but that's irrelevant since prom_init will soon
* be detached from the kernel completely. Besides, we need
--
2.17.2
The original kernel image still exists in memory; clear it now.
Signed-off-by: Jason Yan <[email protected]>
Cc: Scott Wood <[email protected]>
Cc: Diana Craciun <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Kees Cook <[email protected]>
---
arch/powerpc/mm/nohash/kaslr_booke.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
index c6f5c1db1394..ed1277059368 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -378,8 +378,10 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
- if (*__run_at_load == 1)
+ if (*__run_at_load == 1) {
+ kaslr_late_init();
return;
+ }
/* Setup flat device-tree pointer */
initial_boot_params = dt_ptr;
--
2.17.2
Hi everyone, any comments or suggestions?
Thanks,
Jason
On 2020/2/6 10:58, Jason Yan wrote:
> This is an attempt to implement KASLR for Freescale BookE64, based on
> my earlier implementation for Freescale BookE32:
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>
> The implementation for Freescale BookE64 is similar to BookE32. One
> difference is that Freescale BookE64 sets up a 1G TLB mapping during
> boot. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel within this 1G mapping and
> keep it 64K-aligned. This saves the code needed to create another TLB
> mapping at early boot. The disadvantage is that we only have about
> 1G/64K = 16384 slots to put the kernel in.
>
>                   KERNELBASE
>
>     64K |--> kernel <--|
>      |  |              |
>     +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>     |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>     +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>     |                         |                        1G
>     |----->   offset    <-----|
>
>             kernstart_virt_addr
>
> I'm not sure whether the number of slots is enough or whether the design
> has any defects. If you have better ideas, I would be happy to hear them.
>
> Thank you all.
>
> v2->v3:
> Fix build error when KASLR is disabled.
> v1->v2:
> Add __kaslr_offset for the secondary cpu boot up.
>
> Jason Yan (6):
> powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
> kaslr_early_init()
> powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
> powerpc/fsl_booke/64: implement KASLR for fsl_booke64
> powerpc/fsl_booke/64: do not clear the BSS for the second pass
> powerpc/fsl_booke/64: clear the original kernel if randomized
> powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
> and add 64bit part
>
> .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++--
> arch/powerpc/Kconfig | 2 +-
> arch/powerpc/kernel/exceptions-64e.S | 23 ++++++
> arch/powerpc/kernel/head_64.S | 14 ++++
> arch/powerpc/kernel/setup_64.c | 4 +-
> arch/powerpc/mm/mmu_decl.h | 19 ++---
> arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------
> 7 files changed, 132 insertions(+), 36 deletions(-)
> rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
>
ping...
On 2020/2/13 11:00, Jason Yan wrote:
> Hi everyone, any comments or suggestions?
>
> Thanks,
> Jason
>
> On 2020/2/6 10:58, Jason Yan wrote:
>> This is an attempt to implement KASLR for Freescale BookE64, based on
>> my earlier implementation for Freescale BookE32:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>
>> The implementation for Freescale BookE64 is similar to BookE32. One
>> difference is that Freescale BookE64 sets up a 1G TLB mapping during
>> boot. Another difference is that ppc64 needs the kernel to be
>> 64K-aligned. So we can randomize the kernel within this 1G mapping and
>> keep it 64K-aligned. This saves the code needed to create another TLB
>> mapping at early boot. The disadvantage is that we only have about
>> 1G/64K = 16384 slots to put the kernel in.
>>
>>                   KERNELBASE
>>
>>     64K |--> kernel <--|
>>      |  |              |
>>     +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>     |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>>     +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>     |                         |                        1G
>>     |----->   offset    <-----|
>>
>>             kernstart_virt_addr
>>
>> I'm not sure whether the number of slots is enough or whether the design
>> has any defects. If you have better ideas, I would be happy to hear them.
>>
>> Thank you all.
>>
>> v2->v3:
>> Fix build error when KASLR is disabled.
>> v1->v2:
>> Add __kaslr_offset for the secondary cpu boot up.
>>
>> Jason Yan (6):
>> powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
>> kaslr_early_init()
>> powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>> powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>> powerpc/fsl_booke/64: do not clear the BSS for the second pass
>> powerpc/fsl_booke/64: clear the original kernel if randomized
>> powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
>> and add 64bit part
>>
>> .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++--
>> arch/powerpc/Kconfig | 2 +-
>> arch/powerpc/kernel/exceptions-64e.S | 23 ++++++
>> arch/powerpc/kernel/head_64.S | 14 ++++
>> arch/powerpc/kernel/setup_64.c | 4 +-
>> arch/powerpc/mm/mmu_decl.h | 19 ++---
>> arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------
>> 7 files changed, 132 insertions(+), 36 deletions(-)
>> rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst}
>> (59%)
>>
>
>
> .
On 06/02/2020 at 03:58, Jason Yan wrote:
> The implementation for Freescale BookE64 is similar to BookE32. One
> difference is that Freescale BookE64 sets up a 1G TLB mapping during
> boot. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel within this 1G mapping and
> keep it 64K-aligned. This saves the code needed to create another TLB
> mapping at early boot. The disadvantage is that we only have about
> 1G/64K = 16384 slots to put the kernel in.
>
> To support secondary CPU boot-up, a variable __kaslr_offset was added in
> the first_256B section. This helps a secondary CPU get the kaslr offset
> before the 1:1 mapping has been set up.
>
> Signed-off-by: Jason Yan <[email protected]>
> Cc: Scott Wood <[email protected]>
> Cc: Diana Craciun <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Kees Cook <[email protected]>
> ---
> arch/powerpc/Kconfig | 2 +-
> arch/powerpc/kernel/exceptions-64e.S | 10 +++++++++
> arch/powerpc/kernel/head_64.S | 7 ++++++
> arch/powerpc/kernel/setup_64.c | 4 +++-
> arch/powerpc/mm/mmu_decl.h | 16 +++++++-------
> arch/powerpc/mm/nohash/kaslr_booke.c | 33 +++++++++++++++++++++++++---
> 6 files changed, 59 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c150a9d49343..754aeb96bb1c 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -568,7 +568,7 @@ config RELOCATABLE
>
> config RANDOMIZE_BASE
> bool "Randomize the address of the kernel image"
> - depends on (FSL_BOOKE && FLATMEM && PPC32)
> + depends on (PPC_FSL_BOOK3E && FLATMEM)
> depends on RELOCATABLE
> help
> Randomizes the virtual address at which the kernel image is
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 1b9b174bee86..c1c05b8684ca 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -1378,6 +1378,7 @@ skpinv: addi r6,r6,1 /* Increment */
> 1: mflr r6
> addi r6,r6,(2f - 1b)
> tovirt(r6,r6)
> + add r6,r6,r19
> lis r7,MSR_KERNEL@h
> ori r7,r7,MSR_KERNEL@l
> mtspr SPRN_SRR0,r6
> @@ -1400,6 +1401,7 @@ skpinv: addi r6,r6,1 /* Increment */
>
> /* We translate LR and return */
> tovirt(r8,r8)
> + add r8,r8,r19
> mtlr r8
> blr
>
> @@ -1528,6 +1530,7 @@ a2_tlbinit_code_end:
> */
> _GLOBAL(start_initialization_book3e)
> mflr r28
> + li r19, 0
>
> /* First, we need to setup some initial TLBs to map the kernel
> * text, data and bss at PAGE_OFFSET. We don't have a real mode
> @@ -1570,6 +1573,12 @@ _GLOBAL(book3e_secondary_core_init)
> cmplwi r4,0
> bne 2f
>
> + li r19, 0
> +#ifdef CONFIG_RANDOMIZE_BASE
> + LOAD_REG_ADDR_PIC(r19, __kaslr_offset)
> + lwz r19,0(r19)
> + rlwinm r19,r19,0,0,5
> +#endif
> /* Setup TLB for this core */
> bl initial_tlb_book3e
>
> @@ -1602,6 +1611,7 @@ _GLOBAL(book3e_secondary_core_init)
> lis r3,PAGE_OFFSET@highest
> sldi r3,r3,32
> or r28,r28,r3
> + add r28,r28,r19
> 1: mtlr r28
> blr
>
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index ad79fddb974d..744624140fb8 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -104,6 +104,13 @@ __secondary_hold_acknowledge:
> .8byte 0x0
>
> #ifdef CONFIG_RELOCATABLE
> +#ifdef CONFIG_RANDOMIZE_BASE
> + . = 0x58
> + .globl __kaslr_offset
> +__kaslr_offset:
> +DEFINE_FIXED_SYMBOL(__kaslr_offset)
> + .long 0
> +#endif
> /* This flag is set to 1 by a loader if the kernel should run
> * at the loaded address instead of the linked address. This
> * is used by kexec-tools to keep the the kdump kernel in the
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 6104917a282d..a16b970a8d1a 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -66,7 +66,7 @@
> #include <asm/feature-fixups.h>
> #include <asm/kup.h>
> #include <asm/early_ioremap.h>
> -
Why remove this blank line, which clearly separates things in asm/ from
things in the local dir?
> +#include <mm/mmu_decl.h>
> #include "setup.h"
>
> int spinning_secondaries;
> @@ -300,6 +300,8 @@ void __init early_setup(unsigned long dt_ptr)
> /* Enable early debugging if any specified (see udbg.h) */
> udbg_early_init();
>
> + kaslr_early_init(__va(dt_ptr), 0);
> +
> udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr);
>
> /*
> diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
> index 3e1c85c7d10b..bbd721d1e3d7 100644
> --- a/arch/powerpc/mm/mmu_decl.h
> +++ b/arch/powerpc/mm/mmu_decl.h
> @@ -147,14 +147,6 @@ void reloc_kernel_entry(void *fdt, long addr);
> extern void loadcam_entry(unsigned int index);
> extern void loadcam_multi(int first_idx, int num, int tmp_idx);
>
> -#ifdef CONFIG_RANDOMIZE_BASE
> -void kaslr_early_init(void *dt_ptr, phys_addr_t size);
> -void kaslr_late_init(void);
> -#else
> -static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
> -static inline void kaslr_late_init(void) {}
> -#endif
> -
> struct tlbcam {
> u32 MAS0;
> u32 MAS1;
> @@ -164,6 +156,14 @@ struct tlbcam {
> };
> #endif
>
> +#ifdef CONFIG_RANDOMIZE_BASE
> +void kaslr_early_init(void *dt_ptr, phys_addr_t size);
> +void kaslr_late_init(void);
> +#else
> +static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
> +static inline void kaslr_late_init(void) {}
> +#endif
> +
> #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_FSL_BOOKE) || defined(CONFIG_PPC_8xx)
> /* 6xx have BATS */
> /* FSL_BOOKE have TLBCAM */
> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
> index 07b036e98353..c6f5c1db1394 100644
> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
> @@ -231,7 +231,7 @@ static __init unsigned long get_usable_address(const void *fdt,
> unsigned long pa;
> unsigned long pa_end;
>
> - for (pa = offset; (long)pa > (long)start; pa -= SZ_16K) {
> + for (pa = offset; (long)pa > (long)start; pa -= SZ_64K) {
Doesn't this modify the behaviour for PPC32 too?
> pa_end = pa + regions.kernel_size;
> if (overlaps_region(fdt, pa, pa_end))
> continue;
> @@ -265,14 +265,14 @@ static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long rando
> {
> unsigned long koffset = 0;
> unsigned long start;
> - unsigned long index;
> unsigned long offset;
>
> +#ifdef CONFIG_PPC32
Can we use
if (IS_ENABLED(CONFIG_PPC32)) {
/* 32 bits stuff */
} else {
/* 64 bits stuff */
}
> /*
> * Decide which 64M we want to start
> * Only use the low 8 bits of the random seed
> */
> - index = random & 0xFF;
> + unsigned long index = random & 0xFF;
That's not good in terms of readability; the index declaration should remain
at the top of the function, which should be possible when using IS_ENABLED()
instead.
> index %= regions.linear_sz / SZ_64M;
>
> /* Decide offset inside 64M */
> @@ -287,6 +287,15 @@ static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long rando
> break;
> index--;
> }
> +#else
> + /* Decide kernel offset inside 1G */
> + offset = random % (SZ_1G - regions.kernel_size);
> + offset = round_down(offset, SZ_64K);
> +
> + start = memstart_addr;
> + offset = memstart_addr + offset;
> + koffset = get_usable_address(dt_ptr, start, offset);
> +#endif
>
> if (koffset != 0)
> koffset -= memstart_addr;
> @@ -325,6 +334,7 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size
> else
> pr_warn("KASLR: No safe seed for randomizing the kernel base.\n");
>
> +#ifdef CONFIG_PPC32
> ram = min_t(phys_addr_t, __max_low_memory, size);
> ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
> linear_sz = min_t(unsigned long, ram, SZ_512M);
> @@ -332,6 +342,7 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size
> /* If the linear size is smaller than 64M, do not randmize */
> if (linear_sz < SZ_64M)
> return 0;
> +#endif
>
> /* check for a reserved-memory node and record its cell sizes */
> regions.reserved_mem = fdt_path_offset(dt_ptr, "/reserved-memory");
> @@ -363,6 +374,17 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
> unsigned long offset;
> unsigned long kernel_sz;
>
> +#ifdef CONFIG_PPC64
Same, can we use a standard C if/else sequence with
IS_ENABLED(CONFIG_PPC64)?
> + unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
> + unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
> +
> + if (*__run_at_load == 1)
> + return;
> +
> + /* Setup flat device-tree pointer */
> + initial_boot_params = dt_ptr;
> +#endif
> +
> kernel_sz = (unsigned long)_end - (unsigned long)_stext;
>
> offset = kaslr_choose_location(dt_ptr, size, kernel_sz);
> @@ -372,6 +394,7 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
> kernstart_virt_addr += offset;
> kernstart_addr += offset;
>
> +#ifdef CONFIG_PPC32
> is_second_reloc = 1;
>
> if (offset >= SZ_64M) {
> @@ -381,6 +404,10 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
> /* Create kernel map to relocate in */
> create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
> }
> +#else
> + *__kaslr_offset = kernstart_virt_addr - KERNELBASE;
> + *__run_at_load = 1;
> +#endif
>
> /* Copy the kernel to it's new location and run */
> memcpy((void *)kernstart_virt_addr, (void *)_stext, kernel_sz);
>
Christophe
On 06/02/2020 at 03:58, Jason Yan wrote:
> Now that we support both 32-bit and 64-bit KASLR for fsl booke, add
> documentation for the 64-bit part and rename kaslr-booke32.rst to
> kaslr-booke.rst.
>
> Signed-off-by: Jason Yan <[email protected]>
> Cc: Scott Wood <[email protected]>
> Cc: Diana Craciun <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Kees Cook <[email protected]>
> ---
> .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 ++++++++++++++++---
> 1 file changed, 31 insertions(+), 4 deletions(-)
> rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
Also update Documentation/powerpc/index.rst?
Christophe
On 06/02/2020 at 03:58, Jason Yan wrote:
> The original kernel image still exists in memory; clear it now.
No such problem with PPC32? Or is that common?
Christophe
>
> Signed-off-by: Jason Yan <[email protected]>
> Cc: Scott Wood <[email protected]>
> Cc: Diana Craciun <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Kees Cook <[email protected]>
> ---
> arch/powerpc/mm/nohash/kaslr_booke.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
> index c6f5c1db1394..ed1277059368 100644
> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
> @@ -378,8 +378,10 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
> unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
> unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
>
> - if (*__run_at_load == 1)
> + if (*__run_at_load == 1) {
> + kaslr_late_init();
> return;
> + }
>
> /* Setup flat device-tree pointer */
> initial_boot_params = dt_ptr;
>
On 2020/2/20 21:48, Christophe Leroy wrote:
>
>
> On 06/02/2020 at 03:58, Jason Yan wrote:
>> The implementation for Freescale BookE64 is similar to BookE32. One
>> difference is that Freescale BookE64 sets up a 1G TLB mapping during
>> boot. Another difference is that ppc64 needs the kernel to be
>> 64K-aligned. So we can randomize the kernel within this 1G mapping and
>> keep it 64K-aligned. This saves the code needed to create another TLB
>> mapping at early boot. The disadvantage is that we only have about
>> 1G/64K = 16384 slots to put the kernel in.
>>
>> To support secondary CPU boot-up, a variable __kaslr_offset was added in
>> the first_256B section. This helps a secondary CPU get the kaslr offset
>> before the 1:1 mapping has been set up.
>>
>> Signed-off-by: Jason Yan <[email protected]>
>> Cc: Scott Wood <[email protected]>
>> Cc: Diana Craciun <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Christophe Leroy <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Nicholas Piggin <[email protected]>
>> Cc: Kees Cook <[email protected]>
>> ---
>> arch/powerpc/Kconfig | 2 +-
>> arch/powerpc/kernel/exceptions-64e.S | 10 +++++++++
>> arch/powerpc/kernel/head_64.S | 7 ++++++
>> arch/powerpc/kernel/setup_64.c | 4 +++-
>> arch/powerpc/mm/mmu_decl.h | 16 +++++++-------
>> arch/powerpc/mm/nohash/kaslr_booke.c | 33 +++++++++++++++++++++++++---
>> 6 files changed, 59 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index c150a9d49343..754aeb96bb1c 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -568,7 +568,7 @@ config RELOCATABLE
>> config RANDOMIZE_BASE
>> bool "Randomize the address of the kernel image"
>> - depends on (FSL_BOOKE && FLATMEM && PPC32)
>> + depends on (PPC_FSL_BOOK3E && FLATMEM)
>> depends on RELOCATABLE
>> help
>> Randomizes the virtual address at which the kernel image is
>> diff --git a/arch/powerpc/kernel/exceptions-64e.S
>> b/arch/powerpc/kernel/exceptions-64e.S
>> index 1b9b174bee86..c1c05b8684ca 100644
>> --- a/arch/powerpc/kernel/exceptions-64e.S
>> +++ b/arch/powerpc/kernel/exceptions-64e.S
>> @@ -1378,6 +1378,7 @@ skpinv: addi r6,r6,1 /*
>> Increment */
>> 1: mflr r6
>> addi r6,r6,(2f - 1b)
>> tovirt(r6,r6)
>> + add r6,r6,r19
>> lis r7,MSR_KERNEL@h
>> ori r7,r7,MSR_KERNEL@l
>> mtspr SPRN_SRR0,r6
>> @@ -1400,6 +1401,7 @@ skpinv: addi r6,r6,1 /*
>> Increment */
>> /* We translate LR and return */
>> tovirt(r8,r8)
>> + add r8,r8,r19
>> mtlr r8
>> blr
>> @@ -1528,6 +1530,7 @@ a2_tlbinit_code_end:
>> */
>> _GLOBAL(start_initialization_book3e)
>> mflr r28
>> + li r19, 0
>> /* First, we need to setup some initial TLBs to map the kernel
>> * text, data and bss at PAGE_OFFSET. We don't have a real mode
>> @@ -1570,6 +1573,12 @@ _GLOBAL(book3e_secondary_core_init)
>> cmplwi r4,0
>> bne 2f
>> + li r19, 0
>> +#ifdef CONFIG_RANDOMIZE_BASE
>> + LOAD_REG_ADDR_PIC(r19, __kaslr_offset)
>> + lwz r19,0(r19)
>> + rlwinm r19,r19,0,0,5
>> +#endif
>> /* Setup TLB for this core */
>> bl initial_tlb_book3e
>> @@ -1602,6 +1611,7 @@ _GLOBAL(book3e_secondary_core_init)
>> lis r3,PAGE_OFFSET@highest
>> sldi r3,r3,32
>> or r28,r28,r3
>> + add r28,r28,r19
>> 1: mtlr r28
>> blr
>> diff --git a/arch/powerpc/kernel/head_64.S
>> b/arch/powerpc/kernel/head_64.S
>> index ad79fddb974d..744624140fb8 100644
>> --- a/arch/powerpc/kernel/head_64.S
>> +++ b/arch/powerpc/kernel/head_64.S
>> @@ -104,6 +104,13 @@ __secondary_hold_acknowledge:
>> .8byte 0x0
>> #ifdef CONFIG_RELOCATABLE
>> +#ifdef CONFIG_RANDOMIZE_BASE
>> + . = 0x58
>> + .globl __kaslr_offset
>> +__kaslr_offset:
>> +DEFINE_FIXED_SYMBOL(__kaslr_offset)
>> + .long 0
>> +#endif
>> /* This flag is set to 1 by a loader if the kernel should run
>> * at the loaded address instead of the linked address. This
>> * is used by kexec-tools to keep the the kdump kernel in the
>> diff --git a/arch/powerpc/kernel/setup_64.c
>> b/arch/powerpc/kernel/setup_64.c
>> index 6104917a282d..a16b970a8d1a 100644
>> --- a/arch/powerpc/kernel/setup_64.c
>> +++ b/arch/powerpc/kernel/setup_64.c
>> @@ -66,7 +66,7 @@
>> #include <asm/feature-fixups.h>
>> #include <asm/kup.h>
>> #include <asm/early_ioremap.h>
>> -
>
> Why remove this blank line, which clearly separates things in asm/ from
> things in the local dir?
Sorry for breaking this. I will add the blank line back.
>
>> +#include <mm/mmu_decl.h>
>> #include "setup.h"
>> int spinning_secondaries;
>> @@ -300,6 +300,8 @@ void __init early_setup(unsigned long dt_ptr)
>> /* Enable early debugging if any specified (see udbg.h) */
>> udbg_early_init();
>> + kaslr_early_init(__va(dt_ptr), 0);
>> +
>> udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr);
>> /*
>> diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
>> index 3e1c85c7d10b..bbd721d1e3d7 100644
>> --- a/arch/powerpc/mm/mmu_decl.h
>> +++ b/arch/powerpc/mm/mmu_decl.h
>> @@ -147,14 +147,6 @@ void reloc_kernel_entry(void *fdt, long addr);
>> extern void loadcam_entry(unsigned int index);
>> extern void loadcam_multi(int first_idx, int num, int tmp_idx);
>> -#ifdef CONFIG_RANDOMIZE_BASE
>> -void kaslr_early_init(void *dt_ptr, phys_addr_t size);
>> -void kaslr_late_init(void);
>> -#else
>> -static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
>> -static inline void kaslr_late_init(void) {}
>> -#endif
>> -
>> struct tlbcam {
>> u32 MAS0;
>> u32 MAS1;
>> @@ -164,6 +156,14 @@ struct tlbcam {
>> };
>> #endif
>> +#ifdef CONFIG_RANDOMIZE_BASE
>> +void kaslr_early_init(void *dt_ptr, phys_addr_t size);
>> +void kaslr_late_init(void);
>> +#else
>> +static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
>> +static inline void kaslr_late_init(void) {}
>> +#endif
>> +
>> #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_FSL_BOOKE) ||
>> defined(CONFIG_PPC_8xx)
>> /* 6xx have BATS */
>> /* FSL_BOOKE have TLBCAM */
>> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c
>> b/arch/powerpc/mm/nohash/kaslr_booke.c
>> index 07b036e98353..c6f5c1db1394 100644
>> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
>> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
>> @@ -231,7 +231,7 @@ static __init unsigned long
>> get_usable_address(const void *fdt,
>> unsigned long pa;
>> unsigned long pa_end;
>> - for (pa = offset; (long)pa > (long)start; pa -= SZ_16K) {
>> + for (pa = offset; (long)pa > (long)start; pa -= SZ_64K) {
>
> Doesn't this modify the behaviour for PPC32 too?
Oh, yes. I will fix this.
>
>> pa_end = pa + regions.kernel_size;
>> if (overlaps_region(fdt, pa, pa_end))
>> continue;
>> @@ -265,14 +265,14 @@ static unsigned long __init
>> kaslr_legal_offset(void *dt_ptr, unsigned long rando
>> {
>> unsigned long koffset = 0;
>> unsigned long start;
>> - unsigned long index;
>> unsigned long offset;
>> +#ifdef CONFIG_PPC32
>
> Can we use
>
> if (IS_ENABLED(CONFIG_PPC32)) {
> /* 32 bits stuff */
> } else {
> /* 64 bits stuff */
> }
Thanks for the suggestion. I will consider using IS_ENABLED() instead.
>
>> /*
>> * Decide which 64M we want to start
>> * Only use the low 8 bits of the random seed
>> */
>> - index = random & 0xFF;
>> + unsigned long index = random & 0xFF;
>
> That's not good in terms of readability; the index declaration should remain
> at the top of the function, which should be possible when using IS_ENABLED()
> instead.
I'm wondering how to declare a variable inside a code block such as if
(IS_ENABLED(CONFIG_PPC32)) at the top of the function and then use that
variable in another if (IS_ENABLED(CONFIG_PPC32)) block. Is there a good
way to do this?
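
Maybe something like the following rough sketch, declaring the variable
unconditionally at the top of the function and only assigning it inside
the IS_ENABLED() branches (the dead branch is still parsed, but gets
eliminated at compile time)?

	unsigned long index = 0;	/* declared once, at the top */
	unsigned long offset;

	if (IS_ENABLED(CONFIG_PPC32)) {
		index = random & 0xFF;
		index %= regions.linear_sz / SZ_64M;
	}

	/* ... common code ... */

	if (IS_ENABLED(CONFIG_PPC32))
		offset = index * SZ_64M;	/* index is still in scope */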
>
>> index %= regions.linear_sz / SZ_64M;
>> /* Decide offset inside 64M */
>> @@ -287,6 +287,15 @@ static unsigned long __init
>> kaslr_legal_offset(void *dt_ptr, unsigned long rando
>> break;
>> index--;
>> }
>> +#else
>> + /* Decide kernel offset inside 1G */
>> + offset = random % (SZ_1G - regions.kernel_size);
>> + offset = round_down(offset, SZ_64K);
>> +
>> + start = memstart_addr;
>> + offset = memstart_addr + offset;
>> + koffset = get_usable_address(dt_ptr, start, offset);
>> +#endif
>> if (koffset != 0)
>> koffset -= memstart_addr;
>> @@ -325,6 +334,7 @@ static unsigned long __init
>> kaslr_choose_location(void *dt_ptr, phys_addr_t size
>> else
>> pr_warn("KASLR: No safe seed for randomizing the kernel
>> base.\n");
>> +#ifdef CONFIG_PPC32
>> ram = min_t(phys_addr_t, __max_low_memory, size);
>> ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
>> linear_sz = min_t(unsigned long, ram, SZ_512M);
>> @@ -332,6 +342,7 @@ static unsigned long __init
>> kaslr_choose_location(void *dt_ptr, phys_addr_t size
>> /* If the linear size is smaller than 64M, do not randmize */
>> if (linear_sz < SZ_64M)
>> return 0;
>> +#endif
>> /* check for a reserved-memory node and record its cell sizes */
>> regions.reserved_mem = fdt_path_offset(dt_ptr, "/reserved-memory");
>> @@ -363,6 +374,17 @@ notrace void __init kaslr_early_init(void
>> *dt_ptr, phys_addr_t size)
>> unsigned long offset;
>> unsigned long kernel_sz;
>> +#ifdef CONFIG_PPC64
>
> Same, can we use a standard C if/else sequence with
> IS_ENABLED(CONFIG_PPC64)?
OK, I will try to do this if I can deal with the declaration of
variables across the different if/else sequences.
Thanks,
Jason
>
>> + unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
>> + unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
>> +
>> + if (*__run_at_load == 1)
>> + return;
>> +
>> + /* Setup flat device-tree pointer */
>> + initial_boot_params = dt_ptr;
>> +#endif
>> +
>> kernel_sz = (unsigned long)_end - (unsigned long)_stext;
>> offset = kaslr_choose_location(dt_ptr, size, kernel_sz);
>> @@ -372,6 +394,7 @@ notrace void __init kaslr_early_init(void *dt_ptr,
>> phys_addr_t size)
>> kernstart_virt_addr += offset;
>> kernstart_addr += offset;
>> +#ifdef CONFIG_PPC32
>> is_second_reloc = 1;
>> if (offset >= SZ_64M) {
>> @@ -381,6 +404,10 @@ notrace void __init kaslr_early_init(void
>> *dt_ptr, phys_addr_t size)
>> /* Create kernel map to relocate in */
>> create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
>> }
>> +#else
>> + *__kaslr_offset = kernstart_virt_addr - KERNELBASE;
>> + *__run_at_load = 1;
>> +#endif
>> /* Copy the kernel to it's new location and run */
>> memcpy((void *)kernstart_virt_addr, (void *)_stext, kernel_sz);
>>
>
> Christophe
>
> .
On 2020/2/20 21:49, Christophe Leroy wrote:
>
>
> On 06/02/2020 at 03:58, Jason Yan wrote:
>> The original kernel image still exists in memory; clear it now.
>
> No such problem with PPC32? Or is that common?
>
PPC32 does this in relocate_init() in fsl_booke.c, because PPC32 does not
reach kaslr_early_init() for the second pass after relocation.
Thanks,
Jason
> Christophe
>
>>
>> Signed-off-by: Jason Yan <[email protected]>
>> Cc: Scott Wood <[email protected]>
>> Cc: Diana Craciun <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Christophe Leroy <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Nicholas Piggin <[email protected]>
>> Cc: Kees Cook <[email protected]>
>> ---
>> arch/powerpc/mm/nohash/kaslr_booke.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c
>> b/arch/powerpc/mm/nohash/kaslr_booke.c
>> index c6f5c1db1394..ed1277059368 100644
>> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
>> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
>> @@ -378,8 +378,10 @@ notrace void __init kaslr_early_init(void
>> *dt_ptr, phys_addr_t size)
>> unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
>> unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
>> - if (*__run_at_load == 1)
>> + if (*__run_at_load == 1) {
>> + kaslr_late_init();
>> return;
>> + }
>> /* Setup flat device-tree pointer */
>> initial_boot_params = dt_ptr;
>>
>
> .
On 2020/2/20 21:50, Christophe Leroy wrote:
>
>
> On 06/02/2020 at 03:58, Jason Yan wrote:
>> Now that we support both 32-bit and 64-bit KASLR for fsl booke, add
>> documentation for the 64-bit part and rename kaslr-booke32.rst to
>> kaslr-booke.rst.
>>
>> Signed-off-by: Jason Yan <[email protected]>
>> Cc: Scott Wood <[email protected]>
>> Cc: Diana Craciun <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Christophe Leroy <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Nicholas Piggin <[email protected]>
>> Cc: Kees Cook <[email protected]>
>> ---
>> .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 ++++++++++++++++---
>> 1 file changed, 31 insertions(+), 4 deletions(-)
>> rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst}
>> (59%)
>
> Also update Documentation/powerpc/index.rst?
>
Oh yes, thanks for reminding me of this.
Thanks,
Jason
> Christophe
>
> .
On 2020/2/26 10:40, Jason Yan wrote:
>
>
> On 2020/2/20 21:48, Christophe Leroy wrote:
>>
>>
>> On 06/02/2020 at 03:58, Jason Yan wrote:
>>> The implementation for Freescale BookE64 is similar to BookE32. One
>>> difference is that Freescale BookE64 sets up a 1G TLB mapping during
>>> boot. Another difference is that ppc64 needs the kernel to be
>>> 64K-aligned. So we can randomize the kernel within this 1G mapping and
>>> keep it 64K-aligned. This saves the code needed to create another TLB
>>> mapping at early boot. The disadvantage is that we only have about
>>> 1G/64K = 16384 slots to put the kernel in.
>>>
>>> To support secondary CPU boot-up, a variable __kaslr_offset was added in
>>> the first_256B section. This helps a secondary CPU get the kaslr offset
>>> before the 1:1 mapping has been set up.
>>>
>>> Signed-off-by: Jason Yan <[email protected]>
>>> Cc: Scott Wood <[email protected]>
>>> Cc: Diana Craciun <[email protected]>
>>> Cc: Michael Ellerman <[email protected]>
>>> Cc: Christophe Leroy <[email protected]>
>>> Cc: Benjamin Herrenschmidt <[email protected]>
>>> Cc: Paul Mackerras <[email protected]>
>>> Cc: Nicholas Piggin <[email protected]>
>>> Cc: Kees Cook <[email protected]>
>>> ---
>>> arch/powerpc/Kconfig | 2 +-
>>> arch/powerpc/kernel/exceptions-64e.S | 10 +++++++++
>>> arch/powerpc/kernel/head_64.S | 7 ++++++
>>> arch/powerpc/kernel/setup_64.c | 4 +++-
>>> arch/powerpc/mm/mmu_decl.h | 16 +++++++-------
>>> arch/powerpc/mm/nohash/kaslr_booke.c | 33 +++++++++++++++++++++++++---
>>> 6 files changed, 59 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index c150a9d49343..754aeb96bb1c 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -568,7 +568,7 @@ config RELOCATABLE
>>> config RANDOMIZE_BASE
>>> bool "Randomize the address of the kernel image"
>>> - depends on (FSL_BOOKE && FLATMEM && PPC32)
>>> + depends on (PPC_FSL_BOOK3E && FLATMEM)
>>> depends on RELOCATABLE
>>> help
>>> Randomizes the virtual address at which the kernel image is
>>> diff --git a/arch/powerpc/kernel/exceptions-64e.S
>>> b/arch/powerpc/kernel/exceptions-64e.S
>>> index 1b9b174bee86..c1c05b8684ca 100644
>>> --- a/arch/powerpc/kernel/exceptions-64e.S
>>> +++ b/arch/powerpc/kernel/exceptions-64e.S
>>> @@ -1378,6 +1378,7 @@ skpinv: addi r6,r6,1 /*
>>> Increment */
>>> 1: mflr r6
>>> addi r6,r6,(2f - 1b)
>>> tovirt(r6,r6)
>>> + add r6,r6,r19
>>> lis r7,MSR_KERNEL@h
>>> ori r7,r7,MSR_KERNEL@l
>>> mtspr SPRN_SRR0,r6
>>> @@ -1400,6 +1401,7 @@ skpinv: addi r6,r6,1 /*
>>> Increment */
>>> /* We translate LR and return */
>>> tovirt(r8,r8)
>>> + add r8,r8,r19
>>> mtlr r8
>>> blr
>>> @@ -1528,6 +1530,7 @@ a2_tlbinit_code_end:
>>> */
>>> _GLOBAL(start_initialization_book3e)
>>> mflr r28
>>> + li r19, 0
>>> /* First, we need to setup some initial TLBs to map the kernel
>>> * text, data and bss at PAGE_OFFSET. We don't have a real mode
>>> @@ -1570,6 +1573,12 @@ _GLOBAL(book3e_secondary_core_init)
>>> cmplwi r4,0
>>> bne 2f
>>> + li r19, 0
>>> +#ifdef CONFIG_RANDOMIZE_BASE
>>> + LOAD_REG_ADDR_PIC(r19, __kaslr_offset)
>>> + lwz r19,0(r19)
>>> + rlwinm r19,r19,0,0,5
>>> +#endif
>>> /* Setup TLB for this core */
>>> bl initial_tlb_book3e
>>> @@ -1602,6 +1611,7 @@ _GLOBAL(book3e_secondary_core_init)
>>> lis r3,PAGE_OFFSET@highest
>>> sldi r3,r3,32
>>> or r28,r28,r3
>>> + add r28,r28,r19
>>> 1: mtlr r28
>>> blr
>>> diff --git a/arch/powerpc/kernel/head_64.S
>>> b/arch/powerpc/kernel/head_64.S
>>> index ad79fddb974d..744624140fb8 100644
>>> --- a/arch/powerpc/kernel/head_64.S
>>> +++ b/arch/powerpc/kernel/head_64.S
>>> @@ -104,6 +104,13 @@ __secondary_hold_acknowledge:
>>> .8byte 0x0
>>> #ifdef CONFIG_RELOCATABLE
>>> +#ifdef CONFIG_RANDOMIZE_BASE
>>> + . = 0x58
>>> + .globl __kaslr_offset
>>> +__kaslr_offset:
>>> +DEFINE_FIXED_SYMBOL(__kaslr_offset)
>>> + .long 0
>>> +#endif
>>> /* This flag is set to 1 by a loader if the kernel should run
>>> * at the loaded address instead of the linked address. This
>>> * is used by kexec-tools to keep the the kdump kernel in the
>>> diff --git a/arch/powerpc/kernel/setup_64.c
>>> b/arch/powerpc/kernel/setup_64.c
>>> index 6104917a282d..a16b970a8d1a 100644
>>> --- a/arch/powerpc/kernel/setup_64.c
>>> +++ b/arch/powerpc/kernel/setup_64.c
>>> @@ -66,7 +66,7 @@
>>> #include <asm/feature-fixups.h>
>>> #include <asm/kup.h>
>>> #include <asm/early_ioremap.h>
>>> -
>>
>> Why remove this blank line, which clearly separates things in asm/ from
>> things in the local dir?
>
> Sorry for breaking this. I will add the blank line back.
>
>>
>>> +#include <mm/mmu_decl.h>
>>> #include "setup.h"
>>> int spinning_secondaries;
>>> @@ -300,6 +300,8 @@ void __init early_setup(unsigned long dt_ptr)
>>> /* Enable early debugging if any specified (see udbg.h) */
>>> udbg_early_init();
>>> + kaslr_early_init(__va(dt_ptr), 0);
>>> +
>>> udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr);
>>> /*
>>> diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
>>> index 3e1c85c7d10b..bbd721d1e3d7 100644
>>> --- a/arch/powerpc/mm/mmu_decl.h
>>> +++ b/arch/powerpc/mm/mmu_decl.h
>>> @@ -147,14 +147,6 @@ void reloc_kernel_entry(void *fdt, long addr);
>>> extern void loadcam_entry(unsigned int index);
>>> extern void loadcam_multi(int first_idx, int num, int tmp_idx);
>>> -#ifdef CONFIG_RANDOMIZE_BASE
>>> -void kaslr_early_init(void *dt_ptr, phys_addr_t size);
>>> -void kaslr_late_init(void);
>>> -#else
>>> -static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
>>> -static inline void kaslr_late_init(void) {}
>>> -#endif
>>> -
>>> struct tlbcam {
>>> u32 MAS0;
>>> u32 MAS1;
>>> @@ -164,6 +156,14 @@ struct tlbcam {
>>> };
>>> #endif
>>> +#ifdef CONFIG_RANDOMIZE_BASE
>>> +void kaslr_early_init(void *dt_ptr, phys_addr_t size);
>>> +void kaslr_late_init(void);
>>> +#else
>>> +static inline void kaslr_early_init(void *dt_ptr, phys_addr_t size) {}
>>> +static inline void kaslr_late_init(void) {}
>>> +#endif
>>> +
>>> #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_FSL_BOOKE) ||
>>> defined(CONFIG_PPC_8xx)
>>> /* 6xx have BATS */
>>> /* FSL_BOOKE have TLBCAM */
>>> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c
>>> b/arch/powerpc/mm/nohash/kaslr_booke.c
>>> index 07b036e98353..c6f5c1db1394 100644
>>> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
>>> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
>>> @@ -231,7 +231,7 @@ static __init unsigned long
>>> get_usable_address(const void *fdt,
>>> unsigned long pa;
>>> unsigned long pa_end;
>>> - for (pa = offset; (long)pa > (long)start; pa -= SZ_16K) {
>>> + for (pa = offset; (long)pa > (long)start; pa -= SZ_64K) {
>>
>> Doesn't this modify the behaviour for PPC32 too?
>
> Oh, yes. I will fix this.
>
>>
>>> pa_end = pa + regions.kernel_size;
>>> if (overlaps_region(fdt, pa, pa_end))
>>> continue;
>>> @@ -265,14 +265,14 @@ static unsigned long __init
>>> kaslr_legal_offset(void *dt_ptr, unsigned long rando
>>> {
>>> unsigned long koffset = 0;
>>> unsigned long start;
>>> - unsigned long index;
>>> unsigned long offset;
>>> +#ifdef CONFIG_PPC32
>>
>> Can we use
>>
>> if (IS_ENABLED(CONFIG_PPC32)) {
>> /* 32 bits stuff */
>> } else {
>> /* 64 bits stuff */
>> }
>
> Thanks for the suggestion. I will consider using IS_ENABLED() instead.
>
>>
>>> /*
>>> * Decide which 64M we want to start
>>> * Only use the low 8 bits of the random seed
>>> */
>>> - index = random & 0xFF;
>>> + unsigned long index = random & 0xFF;
>>
>> That's not good in terms of readability; the index declaration should
>> remain at the top of the function, which should be possible when using
>> IS_ENABLED() instead.
>
> I'm wondering how to declare a variable inside a code block such as if
> (IS_ENABLED(CONFIG_PPC32)) at the top of the function and then use that
> variable in another if (IS_ENABLED(CONFIG_PPC32)) block. Is there a good
> way to do this?
>
Hi Christophe,
When using a standard C if/else, all of the code is compiled for both
PPC32 and PPC64, but this causes build errors because not all of the
variables are defined for both PPC32 and PPC64.
[yanaijie@138 linux]$ sh ppc64build.sh
CALL scripts/atomic/check-atomics.sh
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
CC arch/powerpc/mm/nohash/kaslr_booke.o
arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_choose_location':
arch/powerpc/mm/nohash/kaslr_booke.c:341:30: error:
'CONFIG_LOWMEM_CAM_NUM' undeclared (first use in this function); did you
mean 'CONFIG_FLATMEM_MANUAL'?
ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
^~~~~~~~~~~~~~~~~~~~~
CONFIG_FLATMEM_MANUAL
arch/powerpc/mm/nohash/kaslr_booke.c:341:30: note: each undeclared
identifier is reported only once for each function it appears in
arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
arch/powerpc/mm/nohash/kaslr_booke.c:404:3: error: 'is_second_reloc'
undeclared (first use in this function); did you mean '__cond_lock'?
is_second_reloc = 1;
^~~~~~~~~~~~~~~
__cond_lock
arch/powerpc/mm/nohash/kaslr_booke.c:411:4: error: implicit declaration
of function 'create_kaslr_tlb_entry'; did you mean 'reloc_kernel_entry'?
[-Werror=implicit-function-declaration]
create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
^~~~~~~~~~~~~~~~~~~~~~
reloc_kernel_entry
cc1: all warnings being treated as errors
make[3]: *** [scripts/Makefile.build:268:
arch/powerpc/mm/nohash/kaslr_booke.o] Error 1
make[2]: *** [scripts/Makefile.build:505: arch/powerpc/mm/nohash] Error 2
make[1]: *** [scripts/Makefile.build:505: arch/powerpc/mm] Error 2
make: *** [Makefile:1681: arch/powerpc] Error 2
Thanks,
Jason
>>
>>> index %= regions.linear_sz / SZ_64M;
>>> /* Decide offset inside 64M */
>>> @@ -287,6 +287,15 @@ static unsigned long __init
>>> kaslr_legal_offset(void *dt_ptr, unsigned long rando
>>> break;
>>> index--;
>>> }
>>> +#else
>>> + /* Decide kernel offset inside 1G */
>>> + offset = random % (SZ_1G - regions.kernel_size);
>>> + offset = round_down(offset, SZ_64K);
>>> +
>>> + start = memstart_addr;
>>> + offset = memstart_addr + offset;
>>> + koffset = get_usable_address(dt_ptr, start, offset);
>>> +#endif
>>> if (koffset != 0)
>>> koffset -= memstart_addr;
>>> @@ -325,6 +334,7 @@ static unsigned long __init
>>> kaslr_choose_location(void *dt_ptr, phys_addr_t size
>>> else
>>> pr_warn("KASLR: No safe seed for randomizing the kernel
>>> base.\n");
>>> +#ifdef CONFIG_PPC32
>>> ram = min_t(phys_addr_t, __max_low_memory, size);
>>> ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
>>> linear_sz = min_t(unsigned long, ram, SZ_512M);
>>> @@ -332,6 +342,7 @@ static unsigned long __init
>>> kaslr_choose_location(void *dt_ptr, phys_addr_t size
>>> /* If the linear size is smaller than 64M, do not randmize */
>>> if (linear_sz < SZ_64M)
>>> return 0;
>>> +#endif
>>> /* check for a reserved-memory node and record its cell sizes */
>>> regions.reserved_mem = fdt_path_offset(dt_ptr,
>>> "/reserved-memory");
>>> @@ -363,6 +374,17 @@ notrace void __init kaslr_early_init(void
>>> *dt_ptr, phys_addr_t size)
>>> unsigned long offset;
>>> unsigned long kernel_sz;
>>> +#ifdef CONFIG_PPC64
>>
>> Same, can we use a standard C if/else sequence with
>> IS_ENABLED(CONFIG_PPC64)?
>
> OK, I will try to do this if I can deal with the declaration of
> variables across the different if/else sequences.
>
> Thanks,
> Jason
>
>
>>
>>> + unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
>>> + unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
>>> +
>>> + if (*__run_at_load == 1)
>>> + return;
>>> +
>>> + /* Setup flat device-tree pointer */
>>> + initial_boot_params = dt_ptr;
>>> +#endif
>>> +
>>> kernel_sz = (unsigned long)_end - (unsigned long)_stext;
>>> offset = kaslr_choose_location(dt_ptr, size, kernel_sz);
>>> @@ -372,6 +394,7 @@ notrace void __init kaslr_early_init(void
>>> *dt_ptr, phys_addr_t size)
>>> kernstart_virt_addr += offset;
>>> kernstart_addr += offset;
>>> +#ifdef CONFIG_PPC32
>>> is_second_reloc = 1;
>>> if (offset >= SZ_64M) {
>>> @@ -381,6 +404,10 @@ notrace void __init kaslr_early_init(void
>>> *dt_ptr, phys_addr_t size)
>>> /* Create kernel map to relocate in */
>>> create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
>>> }
>>> +#else
>>> + *__kaslr_offset = kernstart_virt_addr - KERNELBASE;
>>> + *__run_at_load = 1;
>>> +#endif
>>> /* Copy the kernel to it's new location and run */
>>> memcpy((void *)kernstart_virt_addr, (void *)_stext, kernel_sz);
>>>
>>
>> Christophe
>>
>> .
>
>
> .
---
This works for me. I only had to leave the #ifdef around the map_mem_in_cams()
call. I also had to set linear_sz and ram for the alternative case, otherwise I get:
arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
arch/powerpc/mm/nohash/kaslr_booke.c:355:33: error: 'linear_sz' may be used uninitialized in this function [-Werror=maybe-uninitialized]
regions.pa_end = memstart_addr + linear_sz;
~~~~~~~~~~~~~~^~~~~~~~~~~
arch/powerpc/mm/nohash/kaslr_booke.c:315:21: note: 'linear_sz' was declared here
unsigned long ram, linear_sz;
^~~~~~~~~
arch/powerpc/mm/nohash/kaslr_booke.c:187:8: error: 'ram' may be used uninitialized in this function [-Werror=maybe-uninitialized]
ret = parse_crashkernel(boot_command_line, size, &crash_size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
&crash_base);
~~~~~~~~~~~~
arch/powerpc/mm/nohash/kaslr_booke.c:315:16: note: 'ram' was declared here
unsigned long ram, linear_sz;
---
arch/powerpc/mm/mmu_decl.h | 2 +-
arch/powerpc/mm/nohash/kaslr_booke.c | 97 +++++++++++++++-------------
2 files changed, 52 insertions(+), 47 deletions(-)
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index b869ea893301..3700e7c04e51 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -139,9 +139,9 @@ extern unsigned long calc_cam_sz(unsigned long ram, unsigned long virt,
extern void adjust_total_lowmem(void);
extern int switch_to_as1(void);
extern void restore_to_as0(int esel, int offset, void *dt_ptr, int bootcpu);
+#endif
void create_kaslr_tlb_entry(int entry, unsigned long virt, phys_addr_t phys);
extern int is_second_reloc;
-#endif
void reloc_kernel_entry(void *fdt, long addr);
extern void loadcam_entry(unsigned int index);
diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
index c6f5c1db1394..bf69cece9b8c 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -267,35 +267,37 @@ static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long rando
unsigned long start;
unsigned long offset;
-#ifdef CONFIG_PPC32
- /*
- * Decide which 64M we want to start
- * Only use the low 8 bits of the random seed
- */
- unsigned long index = random & 0xFF;
- index %= regions.linear_sz / SZ_64M;
-
- /* Decide offset inside 64M */
- offset = random % (SZ_64M - regions.kernel_size);
- offset = round_down(offset, SZ_16K);
+ if (IS_ENABLED(CONFIG_PPC32)) {
+ unsigned long index;
+
+ /*
+ * Decide which 64M we want to start
+ * Only use the low 8 bits of the random seed
+ */
+ index = random & 0xFF;
+ index %= regions.linear_sz / SZ_64M;
+
+ /* Decide offset inside 64M */
+ offset = random % (SZ_64M - regions.kernel_size);
+ offset = round_down(offset, SZ_16K);
+
+ while ((long)index >= 0) {
+ offset = memstart_addr + index * SZ_64M + offset;
+ start = memstart_addr + index * SZ_64M;
+ koffset = get_usable_address(dt_ptr, start, offset);
+ if (koffset)
+ break;
+ index--;
+ }
+ } else {
+ /* Decide kernel offset inside 1G */
+ offset = random % (SZ_1G - regions.kernel_size);
+ offset = round_down(offset, SZ_64K);
- while ((long)index >= 0) {
- offset = memstart_addr + index * SZ_64M + offset;
- start = memstart_addr + index * SZ_64M;
+ start = memstart_addr;
+ offset = memstart_addr + offset;
koffset = get_usable_address(dt_ptr, start, offset);
- if (koffset)
- break;
- index--;
}
-#else
- /* Decide kernel offset inside 1G */
- offset = random % (SZ_1G - regions.kernel_size);
- offset = round_down(offset, SZ_64K);
-
- start = memstart_addr;
- offset = memstart_addr + offset;
- koffset = get_usable_address(dt_ptr, start, offset);
-#endif
if (koffset != 0)
koffset -= memstart_addr;
@@ -342,6 +344,8 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size
/* If the linear size is smaller than 64M, do not randmize */
if (linear_sz < SZ_64M)
return 0;
+#else
+ linear_sz = ram = size;
#endif
/* check for a reserved-memory node and record its cell sizes */
@@ -373,17 +377,19 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
{
unsigned long offset;
unsigned long kernel_sz;
+ unsigned int *__kaslr_offset;
+ unsigned int *__run_at_load;
-#ifdef CONFIG_PPC64
- unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
- unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
+ if (IS_ENABLED(CONFIG_PPC64)) {
+ __kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
+ __run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
- if (*__run_at_load == 1)
- return;
+ if (*__run_at_load == 1)
+ return;
- /* Setup flat device-tree pointer */
- initial_boot_params = dt_ptr;
-#endif
+ /* Setup flat device-tree pointer */
+ initial_boot_params = dt_ptr;
+ }
kernel_sz = (unsigned long)_end - (unsigned long)_stext;
@@ -394,20 +400,19 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
kernstart_virt_addr += offset;
kernstart_addr += offset;
-#ifdef CONFIG_PPC32
- is_second_reloc = 1;
-
- if (offset >= SZ_64M) {
- unsigned long tlb_virt = round_down(kernstart_virt_addr, SZ_64M);
- phys_addr_t tlb_phys = round_down(kernstart_addr, SZ_64M);
+ if (IS_ENABLED(CONFIG_PPC32)) {
+ is_second_reloc = 1;
+ if (offset >= SZ_64M) {
+ unsigned long tlb_virt = round_down(kernstart_virt_addr, SZ_64M);
+ phys_addr_t tlb_phys = round_down(kernstart_addr, SZ_64M);
- /* Create kernel map to relocate in */
- create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
+ /* Create kernel map to relocate in */
+ create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
+ }
+ } else {
+ *__kaslr_offset = kernstart_virt_addr - KERNELBASE;
+ *__run_at_load = 1;
}
-#else
- *__kaslr_offset = kernstart_virt_addr - KERNELBASE;
- *__run_at_load = 1;
-#endif
/* Copy the kernel to it's new location and run */
memcpy((void *)kernstart_virt_addr, (void *)_stext, kernel_sz);
--
2.25.0
Le 26/02/2020 à 03:40, Jason Yan a écrit :
>
>
> 在 2020/2/20 21:48, Christophe Leroy 写道:
>>
>>
>> Le 06/02/2020 à 03:58, Jason Yan a écrit :
>>> /*
>>> * Decide which 64M we want to start
>>> * Only use the low 8 bits of the random seed
>>> */
>>> - index = random & 0xFF;
>>> + unsigned long index = random & 0xFF;
>>
>> That's not good in terms of readability, index declaration should
>> remain at the top of the function, should be possible if using
>> IS_ENABLED() instead
>
> I'm wondering how to declare a variable inside one if
> (IS_ENABLED(CONFIG_PPC32)) block at the top of the function and then use
> the variable in another if (IS_ENABLED(CONFIG_PPC32)) block. Do you have
> any good ideas?
You declare it outside the block as usual:
unsigned long some_var;

if (condition) {
	some_var = something;
}

do_many_things();
do_other_things();

if (condition)
	return some_var;
else
	return 0;
Christophe
Le 26/02/2020 à 04:33, Jason Yan a écrit :
>
>
> 在 2020/2/26 10:40, Jason Yan 写道:
>>
>>
>> 在 2020/2/20 21:48, Christophe Leroy 写道:
>>>
>>>
>>> Le 06/02/2020 à 03:58, Jason Yan a écrit :
> Hi Christophe,
>
> When using a standard C if/else, all code is compiled for both PPC32 and
> PPC64, but this brings build errors because not all variables are
> defined for both PPC32 and PPC64.
>
> [yanaijie@138 linux]$ sh ppc64build.sh
> CALL scripts/atomic/check-atomics.sh
> CALL scripts/checksyscalls.sh
> CHK include/generated/compile.h
> CC arch/powerpc/mm/nohash/kaslr_booke.o
> arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_choose_location':
> arch/powerpc/mm/nohash/kaslr_booke.c:341:30: error:
> 'CONFIG_LOWMEM_CAM_NUM' undeclared (first use in this function); did you
> mean 'CONFIG_FLATMEM_MANUAL'?
> ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
> ^~~~~~~~~~~~~~~~~~~~~
> CONFIG_FLATMEM_MANUAL
This one has to remain inside an #ifdef. That's the only one that has to
remain.
> arch/powerpc/mm/nohash/kaslr_booke.c:341:30: note: each undeclared
> identifier is reported only once for each function it appears in
> arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
> arch/powerpc/mm/nohash/kaslr_booke.c:404:3: error: 'is_second_reloc'
In mmu_decl.h, put the declaration outside the #ifdef CONFIG_PPC32
> undeclared (first use in this function); did you mean '__cond_lock'?
> is_second_reloc = 1;
> ^~~~~~~~~~~~~~~
> __cond_lock
> arch/powerpc/mm/nohash/kaslr_booke.c:411:4: error: implicit declaration
> of function 'create_kaslr_tlb_entry'; did you mean 'reloc_kernel_entry'?
Same, put the declaration outside of the #ifdef
> [-Werror=implicit-function-declaration]
> create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
> ^~~~~~~~~~~~~~~~~~~~~~
> reloc_kernel_entry
> cc1: all warnings being treated as errors
> make[3]: *** [scripts/Makefile.build:268:
> arch/powerpc/mm/nohash/kaslr_booke.o] Error 1
> make[2]: *** [scripts/Makefile.build:505: arch/powerpc/mm/nohash] Error 2
> make[1]: *** [scripts/Makefile.build:505: arch/powerpc/mm] Error 2
> make: *** [Makefile:1681: arch/powerpc] Error 2
See the patch I sent you. It builds ok for me.
Christophe
在 2020/2/26 13:04, Christophe Leroy 写道:
> ---
> This works for me. I only had to leave the #ifdef around the map_mem_in_cams() call.
> I also had to set linear_sz and ram for the alternative case, otherwise I get
Great. Thank you for the illustration.
Jason
Hi Jason,
> This is a try to implement KASLR for Freescale BookE64 which is based on
> my earlier implementation for Freescale BookE32:
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>
> The implementation for Freescale BookE64 is similar as BookE32. One
> difference is that Freescale BookE64 set up a TLB mapping of 1G during
> booting. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> it 64K-aligned. This can save some code to creat another TLB map at
> early boot. The disadvantage is that we only have about 1G/64K = 16384
> slots to put the kernel in.
>
> KERNELBASE
>
> 64K |--> kernel <--|
> | | |
> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
> | | | |....| | | | | | | | | |....| | |
> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
> | | 1G
> |-----> offset <-----|
>
> kernstart_virt_addr
>
> I'm not sure if the slot numbers is enough or the design has any
> defects. If you have some better ideas, I would be happy to hear that.
>
> Thank you all.
>
Are you making any attempt to hide kernel address leaks in this series?
I've just been looking at the stackdump code just now, and it directly
prints link registers and stack pointers, which is probably enough to
determine the kernel base address:
SPs: LRs: %pS pointer
[ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable)
[ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
[ 0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0
[ 0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c
[ 0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880
git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
in process.c or in xmon.
Maybe replacing the REG format string in KASLR mode would be sufficient?
Regards,
Daniel
Hi Daniel,
在 2020/2/26 15:16, Daniel Axtens 写道:
> Hi Jason,
>
>> This is a try to implement KASLR for Freescale BookE64 which is based on
>> my earlier implementation for Freescale BookE32:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>
>> The implementation for Freescale BookE64 is similar as BookE32. One
>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>> booting. Another difference is that ppc64 needs the kernel to be
>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>> it 64K-aligned. This can save some code to creat another TLB map at
>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>> slots to put the kernel in.
>>
>> KERNELBASE
>>
>> 64K |--> kernel <--|
>> | | |
>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>> | | | |....| | | | | | | | | |....| | |
>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>> | | 1G
>> |-----> offset <-----|
>>
>> kernstart_virt_addr
>>
>> I'm not sure if the slot numbers is enough or the design has any
>> defects. If you have some better ideas, I would be happy to hear that.
>>
>> Thank you all.
>>
>
> Are you making any attempt to hide kernel address leaks in this series?
Yes.
> I've just been looking at the stackdump code just now, and it directly
> prints link registers and stack pointers, which is probably enough to
> determine the kernel base address:
>
> SPs: LRs: %pS pointer
> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable)
> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
> [ 0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0
> [ 0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c
> [ 0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880
>
> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
> in process.c or in xmon.
>
Thanks for reminding this.
> Maybe replacing the REG format string in KASLR mode would be sufficient?
>
Most archs have removed the address printing when dumping stack. Do we
really have to print this?
If we have to do this, maybe we can use "%pK" so that they will be
hidden from unprivileged users.
Thanks,
Jason
> Regards,
> Daniel
Jason Yan <[email protected]> writes:
> Hi Daniel,
>
> 在 2020/2/26 15:16, Daniel Axtens 写道:
>> Hi Jason,
>>
>>> This is a try to implement KASLR for Freescale BookE64 which is based on
>>> my earlier implementation for Freescale BookE32:
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>>
>>> The implementation for Freescale BookE64 is similar as BookE32. One
>>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>>> booting. Another difference is that ppc64 needs the kernel to be
>>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>>> it 64K-aligned. This can save some code to creat another TLB map at
>>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>>> slots to put the kernel in.
>>>
>>> KERNELBASE
>>>
>>> 64K |--> kernel <--|
>>> | | |
>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>>> | | | |....| | | | | | | | | |....| | |
>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>>> | | 1G
>>> |-----> offset <-----|
>>>
>>> kernstart_virt_addr
>>>
>>> I'm not sure if the slot numbers is enough or the design has any
>>> defects. If you have some better ideas, I would be happy to hear that.
>>>
>>> Thank you all.
>>>
>>
>> Are you making any attempt to hide kernel address leaks in this series?
>
> Yes.
>
>> I've just been looking at the stackdump code just now, and it directly
>> prints link registers and stack pointers, which is probably enough to
>> determine the kernel base address:
>>
>> SPs: LRs: %pS pointer
>> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable)
>> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
>> [ 0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0
>> [ 0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c
>> [ 0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880
>>
>> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
>> in process.c or in xmon.
>>
>
> Thanks for reminding this.
>
>> Maybe replacing the REG format string in KASLR mode would be sufficient?
>>
>
> Most archs have removed the address printing when dumping stack. Do we
> really have to print this?
>
> If we have to do this, maybe we can use "%pK" so that they will be
> hidden from unprivileged users.
I suspect that you will find it easier to convince people to accept a
change to %pK than removal :)
BTW, I have a T4240RDB so I might be able to test this series at some
point - do I need an updated bootloader to pass in a random seed, or is
the kernel able to get enough randomness by itself? (Sorry if this is
explained elsewhere in the series, I have only skimmed it lightly!)
Regards,
Daniel
>
> Thanks,
> Jason
>
>> Regards,
>> Daniel
在 2020/2/26 19:41, Daniel Axtens 写道:
> I suspect that you will find it easier to convince people to accept a
> change to %pK than removal :)
>
> BTW, I have a T4240RDB so I might be able to test this series at some
> point - do I need an updated bootloader to pass in a random seed, or is
> the kernel able to get enough randomness by itself? (Sorry if this is
> explained elsewhere in the series, I have only skimmed it lightly!)
>
Thanks. It will be great if you have time to test this series.
You do not need an updated bootloader because the kernel uses the
timebase as a simple source of entropy. This is enough for testing
the KASLR code itself.
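Roughly, the mixing can look like this — a simplified, self-contained
sketch rather than the exact code from this series (rotate_xor() is
reimplemented here so the snippet stands alone, and mixing in the FDT is
an assumption of the sketch):

#include <linux/types.h>
#include <linux/init.h>
#include <linux/libfdt.h>
#include <asm/reg.h>		/* mftb() */

/* xor-and-rotate mixer, in the style the kaslr code uses */
static unsigned long rotate_xor(unsigned long hash, const void *area,
				size_t size)
{
	const unsigned long *ptr = area;
	size_t i;

	for (i = 0; i < size / sizeof(hash); i++) {
		/* rotate by an odd number of bits and xor */
		hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7);
		hash ^= ptr[i];
	}
	return hash;
}

/* weak boot-time seed: the timebase mixed with the flat device tree */
static unsigned long __init simple_boot_seed(void *fdt)
{
	unsigned long seed = mftb();	/* timebase counter */

	return rotate_xor(seed, fdt, fdt_totalsize(fdt));
}

A seed like this only deters casual probing; a bootloader-provided
seed would be stronger, but it is enough to exercise the randomization
paths.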
Jason
> Regards,
> Daniel
On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
> Hi Daniel,
>
> 在 2020/2/26 15:16, Daniel Axtens 写道:
> > Hi Jason,
> >
> > > This is a try to implement KASLR for Freescale BookE64 which is based on
> > > my earlier implementation for Freescale BookE32:
> > > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
> > >
> > > The implementation for Freescale BookE64 is similar as BookE32. One
> > > difference is that Freescale BookE64 set up a TLB mapping of 1G during
> > > booting. Another difference is that ppc64 needs the kernel to be
> > > 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> > > it 64K-aligned. This can save some code to creat another TLB map at
> > > early boot. The disadvantage is that we only have about 1G/64K = 16384
> > > slots to put the kernel in.
> > >
> > > KERNELBASE
> > >
> > > 64K |--> kernel <--|
> > > | | |
> > > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
> > > | | | |....| | | | | | | | | |....| | |
> > > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
> > > | | 1G
> > > |-----> offset <-----|
> > >
> > > kernstart_virt_addr
> > >
> > > I'm not sure if the slot numbers is enough or the design has any
> > > defects. If you have some better ideas, I would be happy to hear that.
> > >
> > > Thank you all.
> > >
> >
> > Are you making any attempt to hide kernel address leaks in this series?
>
> Yes.
>
> > I've just been looking at the stackdump code just now, and it directly
> > prints link registers and stack pointers, which is probably enough to
> > determine the kernel base address:
> >
> > SPs: LRs: %pS pointer
> > [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154
> > (unreliable)
> > [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
> > [ 0.424659] [c0000000de403a60] [c0000000024d7a00]
> > mount_block_root+0x634/0x7c0
> > [ 0.424734] [c0000000de403be0] [c0000000024d8100]
> > prepare_namespace+0x1ec/0x23c
> > [ 0.424811] [c0000000de403c60] [c0000000024d7010]
> > kernel_init_freeable+0x804/0x880
> >
> > git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
> > in process.c or in xmon.
> >
>
> Thanks for reminding this.
>
> > Maybe replacing the REG format string in KASLR mode would be sufficient?
> >
>
> Most archs have removed the address printing when dumping stack. Do we
> really have to print this?
>
> If we have to do this, maybe we can use "%pK" so that they will be
> hidden from unprivileged users.
I've found the addresses to be useful, especially if I had a way to dump the
stack data itself. Wouldn't the register dump also be likely to give away the
addresses?
I don't see any debug setting for %pK (or %p) to always print the actual
address (closest is kptr_restrict=1 but that only works in certain
contexts)... from looking at the code it seems it hashes even if kaslr is
entirely disabled? Or am I missing something?
-Scott
在 2020/2/28 13:53, Scott Wood 写道:
> On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
>> Hi Daniel,
>>
>> 在 2020/2/26 15:16, Daniel Axtens 写道:
>>> Hi Jason,
>>>
>>>> This is a try to implement KASLR for Freescale BookE64 which is based on
>>>> my earlier implementation for Freescale BookE32:
>>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>>>
>>>> The implementation for Freescale BookE64 is similar as BookE32. One
>>>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>>>> booting. Another difference is that ppc64 needs the kernel to be
>>>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>>>> it 64K-aligned. This can save some code to creat another TLB map at
>>>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>>>> slots to put the kernel in.
>>>>
>>>> KERNELBASE
>>>>
>>>> 64K |--> kernel <--|
>>>> | | |
>>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>>>> | | | |....| | | | | | | | | |....| | |
>>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>>>> | | 1G
>>>> |-----> offset <-----|
>>>>
>>>> kernstart_virt_addr
>>>>
>>>> I'm not sure if the slot numbers is enough or the design has any
>>>> defects. If you have some better ideas, I would be happy to hear that.
>>>>
>>>> Thank you all.
>>>>
>>>
>>> Are you making any attempt to hide kernel address leaks in this series?
>>
>> Yes.
>>
>>> I've just been looking at the stackdump code just now, and it directly
>>> prints link registers and stack pointers, which is probably enough to
>>> determine the kernel base address:
>>>
>>> SPs: LRs: %pS pointer
>>> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154
>>> (unreliable)
>>> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
>>> [ 0.424659] [c0000000de403a60] [c0000000024d7a00]
>>> mount_block_root+0x634/0x7c0
>>> [ 0.424734] [c0000000de403be0] [c0000000024d8100]
>>> prepare_namespace+0x1ec/0x23c
>>> [ 0.424811] [c0000000de403c60] [c0000000024d7010]
>>> kernel_init_freeable+0x804/0x880
>>>
>>> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
>>> in process.c or in xmon.
>>>
>>
>> Thanks for reminding this.
>>
>>> Maybe replacing the REG format string in KASLR mode would be sufficient?
>>>
>>
>> Most archs have removed the address printing when dumping stack. Do we
>> really have to print this?
>>
>> If we have to do this, maybe we can use "%pK" so that they will be
>> hidden from unprivileged users.
>
> I've found the addresses to be useful, especially if I had a way to dump the
> stack data itself. Wouldn't the register dump also be likely to give away the
> addresses?
If we have to print the address, then kptr_restrict and dmesg_restrict
must be set properly so that unprivileged users cannot see them.
>
> I don't see any debug setting for %pK (or %p) to always print the actual
> address (closest is kptr_restrict=1 but that only works in certain
> contexts)... from looking at the code it seems it hashes even if kaslr is
> entirely disabled? Or am I missing something?
>
Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
we want the real value of the address, we cannot use it. But if you only
want to distinguish if two pointers are the same, it's ok.
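For illustration, a sketch of the standard printk behavior being
discussed (show_ptr_specifiers() is just a demo function):

#include <linux/printk.h>

static void show_ptr_specifiers(const void *p)
{
	pr_info("%p\n", p);	/* always hashed, stable within one boot */
	pr_info("%px\n", p);	/* raw address; the caller accepts the leak */
	pr_info("%pK\n", p);	/* raw, hashed or zeroed, per kptr_restrict */
}

%px is the documented escape hatch for callers who have decided the raw
value is worth the leak.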
> -Scott
On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
>
> 在 2020/2/28 13:53, Scott Wood 写道:
> > On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
> > > Hi Daniel,
> > >
> > > 在 2020/2/26 15:16, Daniel Axtens 写道:
> > > > Maybe replacing the REG format string in KASLR mode would be
> > > > sufficient?
> > > >
> > >
> > > Most archs have removed the address printing when dumping stack. Do we
> > > really have to print this?
> > >
> > > If we have to do this, maybe we can use "%pK" so that they will be
> > > hidden from unprivileged users.
> >
> > I've found the addresses to be useful, especially if I had a way to dump
> > the
> > stack data itself. Wouldn't the register dump also be likely to give away
> > the
> > addresses?
>
> If we have to print the address, then kptr_restrict and dmesg_restrict
> must be set properly so that unprivileged users cannot see them.
And how does that work with crash dumps that could be from any context?
dmesg_restrict is irrelevant as it just controls who can see the dmesg, not
what goes into it. kptr_restrict=1 will only get the value if you're not in
any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG.
No other value of kptr_restrict will ever get you the raw value.
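From memory, the %pK path in lib/vsprintf.c goes roughly like this — a
paraphrase of restricted_pointer(), not the verbatim source:

#include <linux/preempt.h>	/* in_irq() and friends */
#include <linux/sched.h>	/* current */
#include <linux/capability.h>

/* paraphrase sketch: decide whether %pK may show the raw pointer */
static const void *pK_filter(const void *ptr)
{
	switch (kptr_restrict) {
	case 0:
		break;			/* falls back to %p hashing */
	case 1:
		/* raw only from process context with CAP_SYSLOG; the
		 * capability check cannot run from IRQ/softirq/NMI */
		if (in_irq() || in_serving_softirq() || in_nmi())
			return NULL;	/* real code prints an error marker */
		if (!has_capability_noaudit(current, CAP_SYSLOG))
			return NULL;	/* prints as zeros */
		break;
	case 2:
		return NULL;		/* always zeros */
	}
	return ptr;
}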
> >
> > I don't see any debug setting for %pK (or %p) to always print the actual
> > address (closest is kptr_restrict=1 but that only works in certain
> > contexts)... from looking at the code it seems it hashes even if kaslr is
> > entirely disabled? Or am I missing something?
> >
>
> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
> we want the real value of the address, we cannot use it. But if you only
> want to distinguish if two pointers are the same, it's ok.
Am I the only one that finds this a bit crazy? If you want to lock a system
down then fine, but why wage war on debugging even when there's no
randomization going on? Comparing two pointers for equality is not always
adequate.
-Scott
在 2020/2/29 12:28, Scott Wood 写道:
> On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
>>
>> 在 2020/2/28 13:53, Scott Wood 写道:
>>> On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
>>>> Hi Daniel,
>>>>
>>>> 在 2020/2/26 15:16, Daniel Axtens 写道:
>>>>> Maybe replacing the REG format string in KASLR mode would be
>>>>> sufficient?
>>>>>
>>>>
>>>> Most archs have removed the address printing when dumping stack. Do we
>>>> really have to print this?
>>>>
>>>> If we have to do this, maybe we can use "%pK" so that they will be
>>>> hidden from unprivileged users.
>>>
>>> I've found the addresses to be useful, especially if I had a way to dump
>>> the
>>> stack data itself. Wouldn't the register dump also be likely to give away
>>> the
>>> addresses?
>>
>> If we have to print the address, then kptr_restrict and dmesg_restrict
>> must be set properly so that unprivileged users cannot see them.
>
> And how does that work with crash dumps that could be from any context?
>
> dmesg_restrict is irrelevant as it just controls who can see the dmesg, not
> what goes into it. kptr_restrict=1 will only get the value if you're not in
> any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG.
> No other value of kptr_restrict will ever get you the raw value.
>
>>>
>>> I don't see any debug setting for %pK (or %p) to always print the actual
>>> address (closest is kptr_restrict=1 but that only works in certain
>>> contexts)... from looking at the code it seems it hashes even if kaslr is
>>> entirely disabled? Or am I missing something?
>>>
>>
>> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
>> we want the real value of the address, we cannot use it. But if you only
>> want to distinguish if two pointers are the same, it's ok.
>
> Am I the only one that finds this a bit crazy? If you want to lock a system
> down then fine, but why wage war on debugging even when there's no
> randomization going on? Comparing two pointers for equality is not always
> adequate.
>
AFAIK, %p hashing only exists because of the many legacy address
printings, and to force those who really want the raw values to switch
to %px or even %lx. It's not a war on debugging. Raw address printing
is not forbidden; people just need to weigh the risk of address leaks.

Turning to %p may not be a good idea in this situation. So for the REG
logs printed when dumping the stack, we can disable them when KASLR is
enabled. The REG logs in other places like show_regs() can only be
triggered by privileged users, and they are not combined with a symbol,
so I think it's ok to keep them.
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db9dcf2..659c51f0739a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 		newsp = stack[0];
 		ip = stack[STACK_FRAME_LR_SAVE];
 		if (!firstframe || ip != lr) {
-			printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+			if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+				printk("%pS", (void *)ip);
+			else
+				printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 			ret_addr = ftrace_graph_ret_addr(current, &ftrace_idx, ip, stack);
Thanks,
Jason
> -Scott
On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
>
> 在 2020/2/29 12:28, Scott Wood 写道:
> > On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
> > >
> > > 在 2020/2/28 13:53, Scott Wood 写道:
> > > >
> > > > I don't see any debug setting for %pK (or %p) to always print the
> > > > actual
> > > > address (closest is kptr_restrict=1 but that only works in certain
> > > > contexts)... from looking at the code it seems it hashes even if kaslr
> > > > is
> > > > entirely disabled? Or am I missing something?
> > > >
> > >
> > > Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
> > > we want the real value of the address, we cannot use it. But if you only
> > > want to distinguish if two pointers are the same, it's ok.
> >
> > Am I the only one that finds this a bit crazy? If you want to lock a
> > system
> > down then fine, but why wage war on debugging even when there's no
> > randomization going on? Comparing two pointers for equality is not always
> > adequate.
> >
>
> AFAIK, %p hashing only exists because of the many legacy address
> printings, and to force those who really want the raw values to switch
> to %px or even %lx. It's not a war on debugging. Raw address printing
> is not forbidden; people just need to weigh the risk of address leaks.
Yes, but I don't see any format specifier to switch to that will hash in a
randomized production environment, but not in a debug or other non-randomized
environment which seems like the ideal default for most debug output.
>
> Turning to %p may not be a good idea in this situation. So for the
> REG logs printed when dumping the stack, we can disable them when
> KASLR is enabled. The REG logs in other places like show_regs() can
> only be triggered by privileged users, and they are not combined with
> a symbol, so I think it's ok to keep them.
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fad50db9dcf2..659c51f0739a 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned
> long *stack)
> newsp = stack[0];
> ip = stack[STACK_FRAME_LR_SAVE];
> if (!firstframe || ip != lr) {
> - printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> + printk("%pS", (void *)ip);
> + else
> + printk("["REG"] ["REG"] %pS", sp, ip,
> (void *)ip);
This doesn't deal with "nokaslr" on the kernel command line. It also doesn't
seem like something that every callsite should have to opencode, versus having
an appropriate format specifier behaves as I described above (and I still
don't see why that format specifier should not be "%p").
-Scott
在 2020/3/1 6:54, Scott Wood 写道:
> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
>>
>> 在 2020/2/29 12:28, Scott Wood 写道:
>>> On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
>>>>
>>>> 在 2020/2/28 13:53, Scott Wood 写道:
>>>>>
>>>>> I don't see any debug setting for %pK (or %p) to always print the
>>>>> actual
>>>>> address (closest is kptr_restrict=1 but that only works in certain
>>>>> contexts)... from looking at the code it seems it hashes even if kaslr
>>>>> is
>>>>> entirely disabled? Or am I missing something?
>>>>>
>>>>
>>>> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
>>>> we want the real value of the address, we cannot use it. But if you only
>>>> want to distinguish if two pointers are the same, it's ok.
>>>
>>> Am I the only one that finds this a bit crazy? If you want to lock a
>>> system
>>> down then fine, but why wage war on debugging even when there's no
>>> randomization going on? Comparing two pointers for equality is not always
>>> adequate.
>>>
>>
>> AFAIK, %p hashing only exists because of the many legacy address
>> printings, and to force those who really want the raw values to switch
>> to %px or even %lx. It's not a war on debugging. Raw address printing
>> is not forbidden; people just need to weigh the risk of address leaks.
>
> Yes, but I don't see any format specifier to switch to that will hash in a
> randomized production environment, but not in a debug or other non-randomized
> environment which seems like the ideal default for most debug output.
>
Sorry, I have no idea why no format specifier was considered for
switching between randomized and non-randomized environments. Maybe
they think that raw addresses should not leak in a non-randomized
environment either. Maybe Kees or Tobin can answer this question.
Kees? Tobin?
>>
>> Turning to %p may not be a good idea in this situation. So for the
>> REG logs printed when dumping the stack, we can disable them when
>> KASLR is enabled. The REG logs in other places like show_regs() can
>> only be triggered by privileged users, and they are not combined with
>> a symbol, so I think it's ok to keep them.
>>
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index fad50db9dcf2..659c51f0739a 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned
>> long *stack)
>> newsp = stack[0];
>> ip = stack[STACK_FRAME_LR_SAVE];
>> if (!firstframe || ip != lr) {
>> - printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
>> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>> + printk("%pS", (void *)ip);
>> + else
>> + printk("["REG"] ["REG"] %pS", sp, ip,
>> (void *)ip);
>
> This doesn't deal with "nokaslr" on the kernel command line. It also doesn't
> seem like something that every callsite should have to opencode, versus having
> an appropriate format specifier behaves as I described above (and I still
> don't see why that format specifier should not be "%p").
>
Actually I still do not understand why we should print the raw value
here. When KALLSYMS is enabled we have symbol name and offset like
put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
address.
> -Scott
On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
>
> 在 2020/3/1 6:54, Scott Wood 写道:
> > On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
> > >
> > > Turning to %p may not be a good idea in this situation. So for the
> > > REG logs printed when dumping the stack, we can disable them when
> > > KASLR is enabled. The REG logs in other places like show_regs() can
> > > only be triggered by privileged users, and they are not combined with
> > > a symbol, so I think it's ok to keep them.
> > >
> > > diff --git a/arch/powerpc/kernel/process.c
> > > b/arch/powerpc/kernel/process.c
> > > index fad50db9dcf2..659c51f0739a 100644
> > > --- a/arch/powerpc/kernel/process.c
> > > +++ b/arch/powerpc/kernel/process.c
> > > @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned
> > > long *stack)
> > > newsp = stack[0];
> > > ip = stack[STACK_FRAME_LR_SAVE];
> > > if (!firstframe || ip != lr) {
> > > - printk("["REG"] ["REG"] %pS", sp, ip, (void
> > > *)ip);
> > > + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> > > + printk("%pS", (void *)ip);
> > > + else
> > > + printk("["REG"] ["REG"] %pS", sp, ip,
> > > (void *)ip);
> >
> > This doesn't deal with "nokaslr" on the kernel command line. It also
> > doesn't
> > seem like something that every callsite should have to opencode, versus
> > having
> > an appropriate format specifier behaves as I described above (and I still
> > don't see why that format specifier should not be "%p").
> >
>
> Actually I still do not understand why we should print the raw value
> here. When KALLSYMS is enabled we have symbol name and offset like
> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
> address.
I'm more concerned about the stack address for wading through a raw stack dump
(to find function call arguments, etc). The return address does help confirm
that I'm on the right stack frame though, and also makes looking up a line
number slightly easier than having to look up a symbol address and then add
the offset (at least for non-module addresses).
As a random aside, the mismatch between Linux printing a hex offset and GDB
using decimal in disassembly is annoying...
-Scott
在 2020/3/2 11:24, Scott Wood 写道:
> On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
>>
>> 在 2020/3/1 6:54, Scott Wood 写道:
>>> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
>>>>
>>>> Turning to %p may not be a good idea in this situation. So for the
>>>> REG logs printed when dumping the stack, we can disable them when
>>>> KASLR is enabled. The REG logs in other places like show_regs() can
>>>> only be triggered by privileged users, and they are not combined with
>>>> a symbol, so I think it's ok to keep them.
>>>>
>>>> diff --git a/arch/powerpc/kernel/process.c
>>>> b/arch/powerpc/kernel/process.c
>>>> index fad50db9dcf2..659c51f0739a 100644
>>>> --- a/arch/powerpc/kernel/process.c
>>>> +++ b/arch/powerpc/kernel/process.c
>>>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned
>>>> long *stack)
>>>> newsp = stack[0];
>>>> ip = stack[STACK_FRAME_LR_SAVE];
>>>> if (!firstframe || ip != lr) {
>>>> - printk("["REG"] ["REG"] %pS", sp, ip, (void
>>>> *)ip);
>>>> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>>>> + printk("%pS", (void *)ip);
>>>> + else
>>>> + printk("["REG"] ["REG"] %pS", sp, ip,
>>>> (void *)ip);
>>>
>>> This doesn't deal with "nokaslr" on the kernel command line. It also
>>> doesn't
>>> seem like something that every callsite should have to opencode, versus
>>> having
>>> an appropriate format specifier behaves as I described above (and I still
>>> don't see why that format specifier should not be "%p").
>>>
>>
>> Actually I still do not understand why we should print the raw value
>> here. When KALLSYMS is enabled we have symbol name and offset like
>> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
>> address.
>
> I'm more concerned about the stack address for wading through a raw stack dump
> (to find function call arguments, etc). The return address does help confirm
> that I'm on the right stack frame though, and also makes looking up a line
> number slightly easier than having to look up a symbol address and then add
> the offset (at least for non-module addresses).
>
> As a random aside, the mismatch between Linux printing a hex offset and GDB
> using decimal in disassembly is annoying...
>
OK, I will send an RFC patch to add a new format specifier such as "%pk",
or change the existing "%pK", to print the raw value of addresses when
KASLR is disabled and the hashed value when KASLR is enabled.
Let's see what the printk guys would say :)
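The behavior I have in mind would be roughly the following sketch
(kaslr_active() and print_ip() are placeholders here, not existing
helpers):

/* print a code address: raw when the base is not randomized
 * (including "nokaslr" on the command line), hashed when it is */
static void print_ip(unsigned long ip)
{
	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_active())
		pr_cont("%p", (void *)ip);	/* hashed */
	else
		pr_cont("%px", (void *)ip);	/* raw */
}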
> -Scott
On Mon, 2020-03-02 at 15:12 +0800, Jason Yan wrote:
>
> 在 2020/3/2 11:24, Scott Wood 写道:
> > On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
> > >
> > > 在 2020/3/1 6:54, Scott Wood 写道:
> > > > On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
> > > > >
> > > > > Turning to %p may not be a good idea in this situation. So for the
> > > > > REG logs printed when dumping the stack, we can disable them when
> > > > > KASLR is enabled. The REG logs in other places like show_regs() can
> > > > > only be triggered by privileged users, and they are not combined
> > > > > with a symbol, so I think it's ok to keep them.
> > > > >
> > > > > diff --git a/arch/powerpc/kernel/process.c
> > > > > b/arch/powerpc/kernel/process.c
> > > > > index fad50db9dcf2..659c51f0739a 100644
> > > > > --- a/arch/powerpc/kernel/process.c
> > > > > +++ b/arch/powerpc/kernel/process.c
> > > > > @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk,
> > > > > unsigned
> > > > > long *stack)
> > > > > newsp = stack[0];
> > > > > ip = stack[STACK_FRAME_LR_SAVE];
> > > > > if (!firstframe || ip != lr) {
> > > > > - printk("["REG"] ["REG"] %pS", sp, ip, (void
> > > > > *)ip);
> > > > > + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> > > > > + printk("%pS", (void *)ip);
> > > > > + else
> > > > > + printk("["REG"] ["REG"] %pS", sp,
> > > > > ip,
> > > > > (void *)ip);
> > > >
> > > > This doesn't deal with "nokaslr" on the kernel command line. It also
> > > > doesn't
> > > > seem like something that every callsite should have to opencode,
> > > > versus
> > > > having
> > > > an appropriate format specifier behaves as I described above (and I
> > > > still
> > > > don't see why that format specifier should not be "%p").
> > > >
> > >
> > > Actually I still do not understand why we should print the raw value
> > > here. When KALLSYMS is enabled we have symbol name and offset like
> > > put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
> > > address.
> >
> > I'm more concerned about the stack address for wading through a raw stack
> > dump
> > (to find function call arguments, etc). The return address does help
> > confirm
> > that I'm on the right stack frame though, and also makes looking up a line
> > number slightly easier than having to look up a symbol address and then
> > add
> > the offset (at least for non-module addresses).
> >
> > As a random aside, the mismatch between Linux printing a hex offset and
> > GDB
> > using decimal in disassembly is annoying...
> >
>
> OK, I will send an RFC patch to add a new format specifier such as "%pk",
> or change the existing "%pK", to print the raw value of addresses when
> KASLR is disabled and the hashed value when KASLR is enabled.
> Let's see what the printk guys would say :)
I'm not sure that a new format specifier is needed versus changing the
behavior of "%p", and "%pK" definitely doesn't seem suitable given that it's
intended to be more restricted than "%p" (see commit ef0010a30935de4). The
question is whether there is a legitimate reason to hash in the absence of
kaslr.
-Scott
在 2020/3/2 16:47, Scott Wood 写道:
> On Mon, 2020-03-02 at 15:12 +0800, Jason Yan wrote:
>>
>> 在 2020/3/2 11:24, Scott Wood 写道:
>>> On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
>>>>
>>>> 在 2020/3/1 6:54, Scott Wood 写道:
>>>>> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
>>>>>>
>>>>>> Turning to %p may not be a good idea in this situation. So for the
>>>>>> REG logs printed when dumping the stack, we can disable them when
>>>>>> KASLR is enabled. The REG logs in other places like show_regs() can
>>>>>> only be triggered by privileged users, and they are not combined
>>>>>> with a symbol, so I think it's ok to keep them.
>>>>>>
>>>>>> diff --git a/arch/powerpc/kernel/process.c
>>>>>> b/arch/powerpc/kernel/process.c
>>>>>> index fad50db9dcf2..659c51f0739a 100644
>>>>>> --- a/arch/powerpc/kernel/process.c
>>>>>> +++ b/arch/powerpc/kernel/process.c
>>>>>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk,
>>>>>> unsigned
>>>>>> long *stack)
>>>>>> newsp = stack[0];
>>>>>> ip = stack[STACK_FRAME_LR_SAVE];
>>>>>> if (!firstframe || ip != lr) {
>>>>>> - printk("["REG"] ["REG"] %pS", sp, ip, (void
>>>>>> *)ip);
>>>>>> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>>>>>> + printk("%pS", (void *)ip);
>>>>>> + else
>>>>>> + printk("["REG"] ["REG"] %pS", sp,
>>>>>> ip,
>>>>>> (void *)ip);
>>>>>
>>>>> This doesn't deal with "nokaslr" on the kernel command line. It also
>>>>> doesn't
>>>>> seem like something that every callsite should have to opencode,
>>>>> versus
>>>>> having
>>>>> an appropriate format specifier behaves as I described above (and I
>>>>> still
>>>>> don't see why that format specifier should not be "%p").
>>>>>
>>>>
>>>> Actually I still do not understand why we should print the raw value
>>>> here. When KALLSYMS is enabled we have symbol name and offset like
>>>> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
>>>> address.
>>>
>>> I'm more concerned about the stack address for wading through a raw stack
>>> dump
>>> (to find function call arguments, etc). The return address does help
>>> confirm
>>> that I'm on the right stack frame though, and also makes looking up a line
>>> number slightly easier than having to look up a symbol address and then
>>> add
>>> the offset (at least for non-module addresses).
>>>
>>> As a random aside, the mismatch between Linux printing a hex offset and
>>> GDB
>>> using decimal in disassembly is annoying...
>>>
>>
>> OK, I will send an RFC patch to add a new format specifier such as "%pk",
>> or change the existing "%pK", to print the raw value of addresses when
>> KASLR is disabled and the hashed value when KASLR is enabled.
>> Let's see what the printk guys would say :)
>
> I'm not sure that a new format specifier is needed versus changing the
> behavior of "%p", and "%pK" definitely doesn't seem suitable given that it's
> intended to be more restricted than "%p" (see commit ef0010a30935de4). The
> question is whether there is a legitimate reason to hash in the absence of
> kaslr.
>
The problem is that if we change the behavior of "%p", we have to turn
all existing "%p" into "%pK". Hashing is still reasonable when there is
no kaslr because some architectures support randomization at build time,
such as arm64.
> -Scott
On Wed, 2020-02-26 at 18:16 +1100, Daniel Axtens wrote:
> Hi Jason,
>
> > This is a try to implement KASLR for Freescale BookE64 which is based on
> > my earlier implementation for Freescale BookE32:
> > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
> >
> > The implementation for Freescale BookE64 is similar as BookE32. One
> > difference is that Freescale BookE64 set up a TLB mapping of 1G during
> > booting. Another difference is that ppc64 needs the kernel to be
> > 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> > it 64K-aligned. This can save some code to creat another TLB map at
> > early boot. The disadvantage is that we only have about 1G/64K = 16384
> > slots to put the kernel in.
> >
> > KERNELBASE
> >
> > 64K |--> kernel <--|
> > | | |
> > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
> > | | | |....| | | | | | | | | |....| | |
> > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
> > | | 1G
> > |-----> offset <-----|
> >
> > kernstart_virt_addr
> >
> > I'm not sure if the slot numbers is enough or the design has any
> > defects. If you have some better ideas, I would be happy to hear that.
> >
> > Thank you all.
> >
>
> Are you making any attempt to hide kernel address leaks in this series?
> I've just been looking at the stackdump code just now, and it directly
> prints link registers and stack pointers, which is probably enough to
> determine the kernel base address:
>
> SPs: LRs: %pS pointer
> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154
> (unreliable)
> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
> [ 0.424659] [c0000000de403a60] [c0000000024d7a00]
> mount_block_root+0x634/0x7c0
> [ 0.424734] [c0000000de403be0] [c0000000024d8100]
> prepare_namespace+0x1ec/0x23c
> [ 0.424811] [c0000000de403c60] [c0000000024d7010]
> kernel_init_freeable+0x804/0x880
>
> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
> in process.c or in xmon.
>
> Maybe replacing the REG format string in KASLR mode would be sufficient?
Whatever we decide to do here, it's not book3e-specific so it should be
considered separately from these patches.
-Scott
On Thu, 2020-02-06 at 10:58 +0800, Jason Yan wrote:
> The implementation for Freescale BookE64 is similar as BookE32. One
> difference is that Freescale BookE64 set up a TLB mapping of 1G during
> booting. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> it 64K-aligned. This can save some code to creat another TLB map at
> early boot. The disadvantage is that we only have about 1G/64K = 16384
> slots to put the kernel in.
>
> To support secondary cpu boot up, a variable __kaslr_offset was added in
> first_256B section. This can help secondary cpu get the kaslr offset
> before the 1:1 mapping has been setup.
What specifically requires __kaslr_offset instead of using kernstart_virt_addr
like 32-bit does?
>
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index ad79fddb974d..744624140fb8 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -104,6 +104,13 @@ __secondary_hold_acknowledge:
> .8byte 0x0
>
> #ifdef CONFIG_RELOCATABLE
> +#ifdef CONFIG_RANDOMIZE_BASE
> + . = 0x58
> + .globl __kaslr_offset
> +__kaslr_offset:
> +DEFINE_FIXED_SYMBOL(__kaslr_offset)
> + .long 0
> +#endif
> /* This flag is set to 1 by a loader if the kernel should run
> * at the loaded address instead of the linked address. This
> * is used by kexec-tools to keep the the kdump kernel in the
Why does it need to go here at a fixed address?
>
> /* check for a reserved-memory node and record its cell sizes */
> regions.reserved_mem = fdt_path_offset(dt_ptr, "/reserved-memory");
> @@ -363,6 +374,17 @@ notrace void __init kaslr_early_init(void *dt_ptr,
> phys_addr_t size)
> unsigned long offset;
> unsigned long kernel_sz;
>
> +#ifdef CONFIG_PPC64
> + unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
> + unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
Why are you referencing these by magic offset rather than by symbol?
> + /* Setup flat device-tree pointer */
> + initial_boot_params = dt_ptr;
> +#endif
Why does 64-bit need this but 32-bit doesn't?
-Scott
On Thu, 2020-02-06 at 10:58 +0800, Jason Yan wrote:
> The BSS section has already cleared out in the first pass. No need to
> clear it again. This can save some time when booting with KASLR
> enabled.
>
> Signed-off-by: Jason Yan <[email protected]>
> Cc: Scott Wood <[email protected]>
> Cc: Diana Craciun <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Kees Cook <[email protected]>
> ---
> arch/powerpc/kernel/head_64.S | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 744624140fb8..8c644e7c3eaf 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -914,6 +914,13 @@ start_here_multiplatform:
> bl relative_toc
> tovirt(r2,r2)
>
> + /* Do not clear the BSS for the second pass if randomized */
> + LOAD_REG_ADDR(r3, kernstart_virt_addr)
> + lwz r3,0(r3)
> + LOAD_REG_IMMEDIATE(r4, KERNELBASE)
> + cmpw r3,r4
> + bne 4f
These are 64-bit values.
-Scott
On Thu, 2020-02-06 at 10:58 +0800, Jason Yan wrote:
> The original kernel still exists in the memory, clear it now.
>
> Signed-off-by: Jason Yan <[email protected]>
> Cc: Scott Wood <[email protected]>
> Cc: Diana Craciun <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Kees Cook <[email protected]>
> ---
> arch/powerpc/mm/nohash/kaslr_booke.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c
> b/arch/powerpc/mm/nohash/kaslr_booke.c
> index c6f5c1db1394..ed1277059368 100644
> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
> @@ -378,8 +378,10 @@ notrace void __init kaslr_early_init(void *dt_ptr,
> phys_addr_t size)
> unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
> unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
>
> - if (*__run_at_load == 1)
> + if (*__run_at_load == 1) {
> + kaslr_late_init();
> return;
> + }
What if you're here because kexec set __run_at_load (or
CONFIG_RELOCATABLE_TEST is enabled), not because kaslr happened?
-Scott
在 2020/3/5 5:44, Scott Wood 写道:
> On Thu, 2020-02-06 at 10:58 +0800, Jason Yan wrote:
>> The implementation for Freescale BookE64 is similar as BookE32. One
>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>> booting. Another difference is that ppc64 needs the kernel to be
>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>> it 64K-aligned. This can save some code to creat another TLB map at
>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>> slots to put the kernel in.
>>
>> To support secondary cpu boot up, a variable __kaslr_offset was added in
>> first_256B section. This can help secondary cpu get the kaslr offset
>> before the 1:1 mapping has been setup.
>
> What specifically requires __kaslr_offset instead of using kernstart_virt_addr
> like 32-bit does?
>
kernstart_virt_addr is in the data section. At early boot we only have
a 64M TLB mapping. For 32-bit I limited the kernel to a 64M-aligned
region so that we can always reach kernstart_virt_addr. But for 64-bit
the kernel is bigger, so it is not suitable to limit it to a
64M-aligned region.

So if we use kernstart_virt_addr and the kernel is randomized like
below, the secondary cpus will not boot up:
+------------+------------+
|    64M     |    64M     |
+------------+------------+
      ^            ^
      |   kernel   |
      ^
kernstart_virt_addr
So I have to put the kernel offset in the first 64K along with the init
text.
>>
>> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
>> index ad79fddb974d..744624140fb8 100644
>> --- a/arch/powerpc/kernel/head_64.S
>> +++ b/arch/powerpc/kernel/head_64.S
>> @@ -104,6 +104,13 @@ __secondary_hold_acknowledge:
>> .8byte 0x0
>>
>> #ifdef CONFIG_RELOCATABLE
>> +#ifdef CONFIG_RANDOMIZE_BASE
>> + . = 0x58
>> + .globl __kaslr_offset
>> +__kaslr_offset:
>> +DEFINE_FIXED_SYMBOL(__kaslr_offset)
>> + .long 0
>> +#endif
>> /* This flag is set to 1 by a loader if the kernel should run
>> * at the loaded address instead of the linked address. This
>> * is used by kexec-tools to keep the kdump kernel in the
>
> Why does it need to go here at a fixed address?
>
It does not need to be at a fixed address. I just want to keep things
consistent and place it next to __run_at_load.
>
>>
>> /* check for a reserved-memory node and record its cell sizes */
>> regions.reserved_mem = fdt_path_offset(dt_ptr, "/reserved-memory");
>> @@ -363,6 +374,17 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
>> unsigned long offset;
>> unsigned long kernel_sz;
>>
>> +#ifdef CONFIG_PPC64
>> + unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
>> + unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
>
> Why are you referencing these by magic offset rather than by symbol?
>
I'm not sure if relocation works for fixed symbols. I will test it and
switch to referencing them by symbol if that works fine.
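
Something like the following, assuming the head_64.S labels are visible
to C (booted_at_load_addr() is just an illustrative helper; whether the
externs resolve correctly before relocations are applied is exactly what
I need to test):

extern unsigned int __kaslr_offset;     /* label at 0x58 in head_64.S */
extern unsigned int __run_at_load;      /* label at 0x5c in head_64.S */

static bool __init booted_at_load_addr(void)
{
        /* Set to 1 by a loader (e.g. kexec-tools) */
        return __run_at_load == 1;
}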
>
>> + /* Setup flat device-tree pointer */
>> + initial_boot_params = dt_ptr;
>> +#endif
>
> Why does 64-bit need this but 32-bit doesn't?
32-bit calls early_get_first_memblock_info() very early, which
implicitly sets up the device-tree pointer.
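
For illustration, the 64-bit side boils down to this (kaslr_setup_fdt()
is an illustrative name; the patch does the assignment inline):

static void __init kaslr_setup_fdt(void *dt_ptr)
{
        /*
         * No early FDT walk sets this implicitly on 64-bit, so set the
         * flat device-tree pointer by hand before any fdt_*() calls.
         */
        initial_boot_params = dt_ptr;
}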
>
> -Scott
On 2020/3/5 5:49, Scott Wood wrote:
> On Thu, 2020-02-06 at 10:58 +0800, Jason Yan wrote:
>> The BSS section has already been cleared out in the first pass, so
>> there is no need to clear it again. This saves some time when booting
>> with KASLR enabled.
>>
>> Signed-off-by: Jason Yan <[email protected]>
>> Cc: Scott Wood <[email protected]>
>> Cc: Diana Craciun <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Christophe Leroy <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Nicholas Piggin <[email protected]>
>> Cc: Kees Cook <[email protected]>
>> ---
>> arch/powerpc/kernel/head_64.S | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
>> index 744624140fb8..8c644e7c3eaf 100644
>> --- a/arch/powerpc/kernel/head_64.S
>> +++ b/arch/powerpc/kernel/head_64.S
>> @@ -914,6 +914,13 @@ start_here_multiplatform:
>> bl relative_toc
>> tovirt(r2,r2)
>>
>> + /* Do not clear the BSS for the second pass if randomized */
>> + LOAD_REG_ADDR(r3, kernstart_virt_addr)
>> + lwz r3,0(r3)
>> + LOAD_REG_IMMEDIATE(r4, KERNELBASE)
>> + cmpw r3,r4
>> + bne 4f
>
> These are 64-bit values.
>
Oh yes, will fix. Thanks.
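
A sketch of the fix, assuming kernstart_virt_addr stays an unsigned
long: use a 64-bit load and a doubleword compare:

        /* Do not clear the BSS for the second pass if randomized */
        LOAD_REG_ADDR(r3, kernstart_virt_addr)
        ld      r3,0(r3)
        LOAD_REG_IMMEDIATE(r4, KERNELBASE)
        cmpld   r3,r4
        bne     4f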
> -Scott
On 2020/3/5 5:53, Scott Wood wrote:
> On Thu, 2020-02-06 at 10:58 +0800, Jason Yan wrote:
>> The original kernel still exists in memory; clear it now.
>>
>> Signed-off-by: Jason Yan <[email protected]>
>> Cc: Scott Wood <[email protected]>
>> Cc: Diana Craciun <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Christophe Leroy <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Nicholas Piggin <[email protected]>
>> Cc: Kees Cook <[email protected]>
>> ---
>> arch/powerpc/mm/nohash/kaslr_booke.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
>> index c6f5c1db1394..ed1277059368 100644
>> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
>> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
>> @@ -378,8 +378,10 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
>> unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
>> unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
>>
>> - if (*__run_at_load == 1)
>> + if (*__run_at_load == 1) {
>> + kaslr_late_init();
>> return;
>> + }
>
> What if you're here because kexec set __run_at_load (or
> CONFIG_RELOCATABLE_TEST is enabled), not because kaslr happened?
>
Nothing will happen, because kaslr_late_init() only clears memory when
kernstart_virt_addr is not KERNELBASE. When __run_at_load is set, KASLR
does not take effect.
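
For reference, a sketch of the guard just described, assuming
kaslr_late_init() keeps the shape of the existing 32-bit code:

void __init kaslr_late_init(void)
{
        /*
         * Only wipe the original image if the kernel actually moved, so
         * a kexec-set __run_at_load (or CONFIG_RELOCATABLE_TEST) makes
         * this a no-op.
         */
        if (kernstart_virt_addr != KERNELBASE) {
                unsigned long kernel_sz;

                kernel_sz = (unsigned long)_end - kernstart_virt_addr;
                memzero_explicit((void *)KERNELBASE, kernel_sz);
        }
}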
> -Scott
On 2020/3/5 5:21, Scott Wood wrote:
> On Wed, 2020-02-26 at 18:16 +1100, Daniel Axtens wrote:
>> Hi Jason,
>>
>>> This is a try to implement KASLR for Freescale BookE64 which is based on
>>> my earlier implementation for Freescale BookE32:
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>>
>>> The implementation for Freescale BookE64 is similar to BookE32. One
>>> difference is that Freescale BookE64 sets up a 1G TLB mapping during
>>> boot. Another difference is that ppc64 needs the kernel to be
>>> 64K-aligned. So we can randomize the kernel within this 1G mapping and
>>> make it 64K-aligned. This saves the code to create another TLB mapping
>>> at early boot. The disadvantage is that we only have about 1G/64K =
>>> 16384 slots to put the kernel in.
>>>
>>> KERNELBASE
>>>
>>> 64K |--> kernel <--|
>>> | | |
>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>>> | | | |....| | | | | | | | | |....| | |
>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+
>>> | | 1G
>>> |-----> offset <-----|
>>>
>>> kernstart_virt_addr
>>>
>>> I'm not sure if the number of slots is enough or whether the design
>>> has any defects. If you have some better ideas, I would be happy to
>>> hear them.
>>>
>>> Thank you all.
>>>
>>
>> Are you making any attempt to hide kernel address leaks in this series?
>> I've been looking at the stackdump code just now, and it directly
>> prints link registers and stack pointers, which is probably enough to
>> determine the kernel base address:
>>
>> SPs: LRs: %pS pointer
>> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable)
>> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
>> [ 0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0
>> [ 0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c
>> [ 0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880
>>
>> git grep '"REG"' arch/powerpc shows a few other uses like this, all
>> in process.c or in xmon.
>>
>> Maybe replacing the REG format string in KASLR mode would be sufficient?
>
> Whatever we decide to do here, it's not book3e-specific so it should be
> considered separately from these patches.
>
OK, I will continue working on this series.
> -Scott