LinuxLists.cc - [RFC][PATCH] Randomize kernel base address on boot

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg wrote:

> This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> which the kernel is decompressed at boot as a security feature that
> deters exploit attempts relying on knowledge of the location of kernel
> internals. The default values of the kptr_restrict and dmesg_restrict
> sysctls are set to (1) when this is enabled, since hiding kernel
> pointers is necessary to preserve the secrecy of the randomized base
> address.

That was quick! :-)

> This feature also uses a fixed mapping to move the IDT (if not already
> done as a fix for the F00F bug), to avoid exposing the location of
> kernel internals relative to the original IDT. This has the additional
> security benefit of marking the new virtual address of the IDT
> read-only.

Btw., as i suggested before the IDT should be made percpu, that way we could
split out and evaluate the IDT change independently of any security
considerations, as a potential scalability improvement. Makes the decision
easier because right now moving the IDT to a 4K TLB increases the kernel's TLB
footprint a tiny bit.

> Entropy is generated using the RDRAND instruction if it is supported. If not,
> then RDTSC is used, if supported. If neither RDRAND nor RDTSC are supported,
> then no randomness is introduced. Support for the CPUID instruction is
> required to check for the availability of these two instructions.

Btw., i'd suggest to fall back not to zero but to something system specific
like RAM size or a BIOS signature such as the contents of 0xf0000 or so. This,
while clearly not random, will at least *somewhat* randomize the kernel against
remote attackers who do not know the RAM size or the system type.
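The source selection described above, plus Ingo's suggested non-zero fallback, can be sketched in C as below. This is an illustration under stated assumptions, not the patch's code. The CPUID leaf 1 bit positions (ECX bit 30 for RDRAND, i.e. 0x40000000, and EDX bit 4 for TSC) follow Intel's documentation. The function names, and the use of FNV-1a to fold the RAM size and a BIOS word into a seed, are hypothetical choices.

```c
#include <stdint.h>

/* CPUID leaf 1 feature bits: ECX bit 30 = RDRAND, EDX bit 4 = TSC. */
#define CPUID1_ECX_RDRAND (UINT32_C(1) << 30) /* 0x40000000 */
#define CPUID1_EDX_TSC    (UINT32_C(1) << 4)  /* 0x00000010 */

enum entropy_source { ENTROPY_RDRAND, ENTROPY_RDTSC, ENTROPY_FALLBACK };

/* Pick the best available source from the CPUID.1 ECX/EDX values. */
static enum entropy_source pick_entropy_source(uint32_t ecx, uint32_t edx)
{
    if (ecx & CPUID1_ECX_RDRAND)
        return ENTROPY_RDRAND;
    if (edx & CPUID1_EDX_TSC)
        return ENTROPY_RDTSC;
    return ENTROPY_FALLBACK;
}

/* Hypothetical fallback: hash system-specific (not random) values such as
 * the RAM size and a word from the BIOS area into a seed instead of
 * using zero. Bytes are consumed little-endian, independent of host order. */
static uint32_t fallback_seed(uint32_t ram_size_kb, uint32_t bios_word)
{
    uint32_t h = 2166136261u;               /* FNV-1a offset basis */
    uint32_t in[2] = { ram_size_kb, bios_word };

    for (int i = 0; i < 2; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (in[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;                 /* FNV-1a prime */
        }
    }
    return h;
}
```

The fallback value is deterministic per machine, so it only helps against attackers who cannot guess the system configuration, which is the limited benefit Ingo describes.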

Thanks,

Ingo

2011-05-24 21:16:58

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg wrote:

> Comments/Questions:
>
> * Since RDRAND is relatively new, only the most recent version of
> binutils supports assembling it. To avoid breaking builds for people
> who use older toolchains but want this feature, I hardcoded the opcodes.
> If anyone has a better approach, please let me know.

This is generally the best approach. Maybe mention it here:

> + /* rdrand %eax */
> + .byte 0x0f, 0xc7, 0xf0

... that this is done to work on older GAS as well. Putting that into
changelogs is good, putting it into comments is better.
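As a hedged illustration of the kind of explanation Ingo asks for, the sketch below derives the three hardcoded bytes from the instruction format. RDRAND r32 is encoded as 0F C7 /6. With a register operand, the ModRM byte is mod=11b, reg=6 (the opcode extension), and rm=register number, which gives 0xF0 for %eax. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "rdrand <r32>" for register numbers 0..7
 * (eax, ecx, edx, ebx, esp, ebp, esi, edi); no REX prefix is emitted.
 * Returns the number of bytes written, or 0 for an unsupported register. */
static size_t encode_rdrand_r32(unsigned reg, uint8_t out[3])
{
    if (reg > 7)
        return 0;
    out[0] = 0x0f;                                     /* two-byte opcode escape */
    out[1] = 0xc7;                                     /* opcode group 9 */
    out[2] = (uint8_t)((3u << 6) | (6u << 3) | reg);   /* mod=11, reg=/6, rm=reg */
    return 3;
}
```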

> * I chose to mimic the F00F bugfix behavior for moving the IDT, since it
> required very little code and has the additional benefit of making the
> IDT read-only. Ingo Molnar's suggestion of allocating per-cpu IDTs
> instead is still on the table, and I'd like to get feedback on this.

ok, good for an RFC patch.

> * In order to increase the entropy for the randomized base, I changed
> the default value of CONFIG_PHYSICAL_ALIGN back to 2mb. It had
> previously been raised to 16mb as a hack so that relocatable kernels
> wouldn't load below that minimum. I address this by changing the
> meaning of CONFIG_PHYSICAL_START such that it now represents a minimum
> address that relocatable kernels can be loaded at (rather than being
> ignored by relocatable kernels). So, if a relocatable kernel determines
> it should be loaded at an address below CONFIG_PHYSICAL_START (which
> defaults to 16mb), I just bump it up.

This would need a real fix, right? The PHYSICAL_ALIGN hack looks worth fixing
in its own right.

> * I would appreciate guidance on safe values for the highest addresses
> we can safely load the kernel at, on both 32-bit and 64-bit. This
> version uses 64mb (0x4000000) for 32-bit, and worked well in testing.

This depends on the memory map. In practice most x86 systems start with a big
chunk of RAM up to end of RAM or 3GB, whichever comes first. Holes typically
start at 3GB or higher.

On some systems holes can be pretty low as well - you'd have to research e820
maps submitted to lkml to see how common this is - but it's not terribly
common.

Some really old systems might have a hole between 15MB-16MB - but that's not an
issue if we load at 16 MB or higher.

> * CONFIG_RANDOMIZE_BASE automatically sets the default value of kptr_restrict
> and dmesg_restrict to 1, since it's nonsensical to use this without the other
> two. I considered removing CONFIG_SECURITY_DMESG_RESTRICT altogether (it
> currently sets the default value for dmesg_restrict), but just in case
> distros want to keep the CONFIG as a toggle switch but don't want to use
> CONFIG_RANDOMIZE_BASE, I kept it around. So, now CONFIG_RANDOMIZE_BASE sets
> the default value for CONFIG_SECURITY_DMESG_RESTRICT.

No, the right solution is what i suggested a few mails ago: /proc/kallsyms (and
other RIP printing places) should report the non-randomized RIP.

That way we do not have to change the kptr_restrict default and tools will
continue to work ...

> * x86-64 is still "to-do". Because it calculates the kernel text address
> twice, this may be a little trickier.

Note that 64-bit is obviously a must-have condition for the eventual acceptance
of this patch.

> * Finding a middle ground instead of the current "all-or-nothing" behavior of
> kptr_restrict that allows perf users to use this feature is future work.

Well, for perf we need to transform back the RIPs that get passed along in the
stack-dump/call-chain code, see:

arch/x86/kernel/dumpstack_64.c
arch/x86/kernel/dumpstack.c
arch/x86/kernel/dumpstack_32.c

That, combined with /proc/kallsyms unrandomization, means 'perf top' will just
work and produce non-randomized RIPs.

The canonical RIP to report is the one that the kernel would have if it was
loaded non-randomized.
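Here is a minimal sketch of the "canonical RIP" conversion Ingo describes. It assumes that the boot code records the randomization offset and the randomized bounds of kernel text; the struct and function names are hypothetical. Only addresses inside the relocated kernel image are shifted, and anything else is passed through unchanged.

```c
#include <stdint.h>

/* Hypothetical record of where the randomized kernel text ended up. */
struct kaslr_info {
    uintptr_t text_start; /* randomized start of kernel text */
    uintptr_t text_end;   /* randomized end of kernel text (exclusive) */
    uintptr_t offset;     /* randomized start minus link-time start */
};

/* Map a runtime address back to the address the kernel would have had
 * if it had been loaded non-randomized. */
static uintptr_t canonical_rip(const struct kaslr_info *k, uintptr_t rip)
{
    if (rip >= k->text_start && rip < k->text_end)
        return rip - k->offset;
    return rip; /* not kernel text: leave as-is */
}
```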

> * Tested by repeatedly booting and observing kallsyms output on i386.
> Passed the "looks random to me" test, and saw no bad behavior. Tested that
> changing CONFIG_PHYSICAL_ALIGN to 2mb still boots and runs fine on amd64.

Please run it over rngtest to measure how much true randomness is in it, on
your testbox.
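rngtest applies FIPS 140-2 statistical tests to a raw bitstream. A cruder but more direct check for this feature is to record the load address from many boots and estimate how predictable it is. The sketch below uses a hypothetical helper that is not part of rngtest. It returns floor(log2(n / c_max)), where c_max is the count of the most frequent address. This is the empirical min-entropy of the sample, rounded down to whole bits. With few boots the result mostly reflects the sample size, so treat it as a rough indicator only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Floor of the empirical min-entropy, in bits, of the observed load
 * addresses: floor(log2(n / count_of_most_common_value)).
 * Sorts the samples in place. */
static unsigned min_entropy_bits_floor(uint64_t *samples, size_t n)
{
    if (n == 0)
        return 0;
    qsort(samples, n, sizeof *samples, cmp_u64);

    size_t best = 1, run = 1;
    for (size_t i = 1; i < n; i++) {
        run = (samples[i] == samples[i - 1]) ? run + 1 : 1;
        if (run > best)
            best = run;
    }

    /* Largest k with best * 2^k <= n. */
    unsigned bits = 0;
    while ((best << (bits + 1)) <= n)
        bits++;
    return bits;
}
```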

> * Is it worth bothering to look for alternate sources of entropy if
> RDTSC isn't available?

No, if you do the system-specific BIOS signature trick i think it's adequate.

> * Could use testing of CPU hotplugging and suspend/resume.

and kexec/crashdump. and perf ;-)

Thanks,

Ingo

2011-05-24 21:46:53

by Brian Gerst

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, May 24, 2011 at 4:31 PM, Dan Rosenberg wrote:
> This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> which the kernel is decompressed at boot as a security feature that
> deters exploit attempts relying on knowledge of the location of kernel
> internals. The default values of the kptr_restrict and dmesg_restrict
> sysctls are set to (1) when this is enabled, since hiding kernel
> pointers is necessary to preserve the secrecy of the randomized base
> address.
>
> This feature also uses a fixed mapping to move the IDT (if not already
> done as a fix for the F00F bug), to avoid exposing the location of
> kernel internals relative to the original IDT. This has the additional
> security benefit of marking the new virtual address of the IDT
> read-only.
>
> Entropy is generated using the RDRAND instruction if it is supported. If
> not, then RDTSC is used, if supported. If neither RDRAND nor RDTSC are
> supported, then no randomness is introduced. Support for the CPUID
> instruction is required to check for the availability of these two
> instructions.
>
> Thanks to everyone who contributed helpful suggestions and feedback so
> far.
>
> Comments/Questions:
>
> * Since RDRAND is relatively new, only the most recent version of
> binutils supports assembling it. To avoid breaking builds for people
> who use older toolchains but want this feature, I hardcoded the opcodes.
> If anyone has a better approach, please let me know.
>
> * I chose to mimic the F00F bugfix behavior for moving the IDT, since it
> required very little code and has the additional benefit of making the
> IDT read-only. Ingo Molnar's suggestion of allocating per-cpu IDTs
> instead is still on the table, and I'd like to get feedback on this.
>
> * In order to increase the entropy for the randomized base, I changed
> the default value of CONFIG_PHYSICAL_ALIGN back to 2mb. It had
> previously been raised to 16mb as a hack so that relocatable kernels
> wouldn't load below that minimum. I address this by changing the
> meaning of CONFIG_PHYSICAL_START such that it now represents a minimum
> address that relocatable kernels can be loaded at (rather than being
> ignored by relocatable kernels). So, if a relocatable kernel determines
> it should be loaded at an address below CONFIG_PHYSICAL_START (which
> defaults to 16mb), I just bump it up.
>
> * I would appreciate guidance on safe values for the highest addresses
> we can safely load the kernel at, on both 32-bit and 64-bit. This
> version uses 64mb (0x4000000) for 32-bit, and worked well in testing.
>
> * CONFIG_RANDOMIZE_BASE automatically sets the default value of
> kptr_restrict and dmesg_restrict to 1, since it's nonsensical to use
> this without the other two. I considered removing
> CONFIG_SECURITY_DMESG_RESTRICT altogether (it currently sets the default
> value for dmesg_restrict), but just in case distros want to keep the
> CONFIG as a toggle switch but don't want to use CONFIG_RANDOMIZE_BASE, I
> kept it around. So, now CONFIG_RANDOMIZE_BASE sets the default value
> for CONFIG_SECURITY_DMESG_RESTRICT.
>
> * x86-64 is still "to-do". Because it calculates the kernel text address
> twice, this may be a little trickier.

This trick doesn't work as you may expect on 64-bit. You are
relocating the physical image of the kernel, but the kernel actually
runs from a fixed virtual mapping. This would require adding the
relocation code that 32-bit uses, so the virtual address can be
changed.

--
Brian Gerst
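Brian's point is that on 64-bit the kernel runs at a fixed virtual address, so moving the physical image alone does not move it. The kernel's absolute address references would also have to be fixed up, as the 32-bit relocation code does. The sketch below shows the general idea on a simplified model: a table of byte offsets, each naming a 32-bit field that holds a link-time address, with every field shifted by the same delta. The function name and table layout are assumptions, and the kernel's actual relocation table format differs in detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shift every 32-bit absolute address listed in relocs[] by delta.
 * Entries that would read past the end of the image are ignored. */
static void apply_relocs32(uint8_t *image, size_t image_len,
                           const uint32_t *relocs, size_t nrelocs,
                           uint32_t delta)
{
    for (size_t i = 0; i < nrelocs; i++) {
        size_t off = relocs[i];
        uint32_t v;

        if (image_len < sizeof v || off > image_len - sizeof v)
            continue;                       /* out of range: skip */
        memcpy(&v, image + off, sizeof v);  /* unaligned-safe load */
        v += delta;
        memcpy(image + off, &v, sizeof v);  /* unaligned-safe store */
    }
}
```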

2011-05-24 22:33:16

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/24/2011 01:31 PM, Dan Rosenberg wrote:
> This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> which the kernel is decompressed at boot as a security feature that
> deters exploit attempts relying on knowledge of the location of kernel
> internals. The default values of the kptr_restrict and dmesg_restrict
> sysctls are set to (1) when this is enabled, since hiding kernel
> pointers is necessary to preserve the secrecy of the randomized base
> address.
>
> This feature also uses a fixed mapping to move the IDT (if not already
> done as a fix for the F00F bug), to avoid exposing the location of
> kernel internals relative to the original IDT. This has the additional
> security benefit of marking the new virtual address of the IDT
> read-only.

As written, I think this is unsafe, simply because the kernel has no
idea what memory is actually safe to relocate into, and your code
doesn't actually make any attempt at doing so.

The fact that you change CONFIG_PHYSICAL_ALIGN is particularly
devastating, and will introduce boot failures on real systems.

For this to be acceptable, you need to at the very least:

1. Verify, in the address map passed to the kernel, where it is safe
to locate the kernel;
2. Not introduce a performance regression (we avoid locating in the
bottom 16 MiB for performance reasons, except on very small systems);
3. Make sure not to break kdump.
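Point 1 amounts to checking a candidate range against the firmware memory map before using it. The sketch below uses a simplified, hypothetical E820-style entry, where type 1 denotes usable RAM. It accepts a range only if the range lies entirely inside a single usable-RAM entry, which conservatively rejects ranges that span two adjacent RAM entries.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_TYPE_RAM 1 /* usable RAM in an E820 map */

/* Simplified E820-style entry (hypothetical layout for illustration). */
struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Return 1 if [start, start + len) lies entirely inside one usable-RAM entry. */
static int range_in_usable_ram(const struct e820_entry_sketch *map, size_t n,
                               uint64_t start, uint64_t len)
{
    if (len == 0 || start + len < start) /* empty or wrapping range */
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (map[i].type != E820_TYPE_RAM)
            continue;
        if (start >= map[i].addr && start + len <= map[i].addr + map[i].size)
            return 1;
    }
    return 0;
}
```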

Arguably this is really something that would be *much* better done in
the bootloader, but given that the dominant boot loader for Linux is
Grub, I don't expect that anything will ever happen until the cows come
home :(

-hpa

2011-05-24 22:56:10

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 23:02 +0200, Ingo Molnar wrote:
> * Dan Rosenberg wrote:
>
> > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > which the kernel is decompressed at boot as a security feature that
> > deters exploit attempts relying on knowledge of the location of kernel
> > internals. The default values of the kptr_restrict and dmesg_restrict
> > sysctls are set to (1) when this is enabled, since hiding kernel
> > pointers is necessary to preserve the secrecy of the randomized base
> > address.
>
> That was quick! :-)
>
> > This feature also uses a fixed mapping to move the IDT (if not already
> > done as a fix for the F00F bug), to avoid exposing the location of
> > kernel internals relative to the original IDT. This has the additional
> > security benefit of marking the new virtual address of the IDT
> > read-only.
>
> Btw., as i suggested before the IDT should be made percpu, that way we could
> split out and evaluate the IDT change independently of any security
> considerations, as a potential scalability improvement. Makes the decision
> easier because right now moving the IDT to a 4K TLB increases the kernel's TLB
> footprint a tiny bit.
>

Alright, I'll start working on this.

> > Entropy is generated using the RDRAND instruction if it is supported. If not,
> > then RDTSC is used, if supported. If neither RDRAND nor RDTSC are supported,
> > then no randomness is introduced. Support for the CPUID instruction is
> > required to check for the availability of these two instructions.
>
> Btw., i'd suggest to fall back not to zero but to something system specific
> like RAM size or a BIOS signature such as the contents of 0xf0000 or so. This,
> while clearly not random, will at least *somewhat* randomize the kernel against
> remote attackers who do not know the RAM size or the system type.
>

Good idea, will do.

-Dan

2011-05-24 23:00:48

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 23:16 +0200, Ingo Molnar wrote:
> * Dan Rosenberg wrote:
>
> > Comments/Questions:
> >
> > * Since RDRAND is relatively new, only the most recent version of
> > binutils supports assembling it. To avoid breaking builds for people
> > who use older toolchains but want this feature, I hardcoded the opcodes.
> > If anyone has a better approach, please let me know.
>
> This is generally the best approach. Maybe mention it here:
>
> > + /* rdrand %eax */
> > + .byte 0x0f, 0xc7, 0xf0
>
> ... that this is done to work on older GAS as well. Putting that into
> changelogs is good, putting it into comments is better.
>

Will do.

> > * In order to increase the entropy for the randomized base, I changed
> > the default value of CONFIG_PHYSICAL_ALIGN back to 2mb. It had
> > previously been raised to 16mb as a hack so that relocatable kernels
> > wouldn't load below that minimum. I address this by changing the
> > meaning of CONFIG_PHYSICAL_START such that it now represents a minimum
> > address that relocatable kernels can be loaded at (rather than being
> > ignored by relocatable kernels). So, if a relocatable kernel determines
> > it should be loaded at an address below CONFIG_PHYSICAL_START (which
> > defaults to 16mb), I just bump it up.
>
> This would need a real fix, right? The PHYSICAL_ALIGN hack looks worth fixing
> in its own right.
>

I'm not sure of a better way to do this than what I've done, which is
essentially introduce a lower bound on the start location rather than
restricting the alignment. Suggestions welcome.

> > * CONFIG_RANDOMIZE_BASE automatically sets the default value of kptr_restrict
> > and dmesg_restrict to 1, since it's nonsensical to use this without the other
> > two. I considered removing CONFIG_SECURITY_DMESG_RESTRICT altogether (it
> > currently sets the default value for dmesg_restrict), but just in case
> > distros want to keep the CONFIG as a toggle switch but don't want to use
> > CONFIG_RANDOMIZE_BASE, I kept it around. So, now CONFIG_RANDOMIZE_BASE sets
> > the default value for CONFIG_SECURITY_DMESG_RESTRICT.
>
> No, the right solution is what i suggested a few mails ago: /proc/kallsyms (and
> other RIP printing places) should report the non-randomized RIP.
>
> That way we do not have to change the kptr_restrict default and tools will
> continue to work ...
>

Ok, I'll do it this way, and leave the kptr_restrict default to 0. But
I still think having the dmesg_restrict default depend on randomization
makes sense, since kernel .text is explicitly revealed in the syslog.

> > * x86-64 is still "to-do". Because it calculates the kernel text address
> > twice, this may be a little trickier.
>
> Note that 64-bit is obviously a must-have condition for the eventual acceptance
> of this patch.

Of course, just wanted early feedback.

>
> > * Finding a middle ground instead of the current "all-or-nothing" behavior of
> > kptr_restrict that allows perf users to use this feature is future work.
>
> Well, for perf we need to transform back the RIPs that get passed along in the
> stack-dump/call-chain code, see:
>
> arch/x86/kernel/dumpstack_64.c
> arch/x86/kernel/dumpstack.c
> arch/x86/kernel/dumpstack_32.c
>
> That, combined with /proc/kallsyms unrandomization, means 'perf top' will just
> work and produce non-randomized RIPs.
>
> The canonical RIP to report is the one that the kernel would have if it was
> loaded non-randomized.
>

Will do.

> > * Tested by repeatedly booting and observing kallsyms output on i386.
> > Passed the "looks random to me" test, and saw no bad behavior. Tested that
> > changing CONFIG_PHYSICAL_ALIGN to 2mb still boots and runs fine on amd64.
>
> Please run it over rngtest to measure how much true randomness is in it, on
> your testbox.
>

Will do.

> > * Could use testing of CPU hotplugging and suspend/resume.
>
> and kexec/crashdump. and perf ;-)
>

Will do.

Thanks very much for the feedback.

2011-05-24 23:01:32

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

>
> This trick doesn't work as you may expect on 64-bit. You are
> relocating the physical image of the kernel, but the kernel actually
> runs from a fixed virtual mapping. This would require adding the
> relocation code that 32-bit uses, so the virtual address can be
> changed.
>

Noted, thanks, I'll be sure to not waste my time when I start working on
64-bit.

-Dan

> --
> Brian Gerst

2011-05-24 23:05:07

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 15:31 -0700, H. Peter Anvin wrote:
> On 05/24/2011 01:31 PM, Dan Rosenberg wrote:
> > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > which the kernel is decompressed at boot as a security feature that
> > deters exploit attempts relying on knowledge of the location of kernel
> > internals. The default values of the kptr_restrict and dmesg_restrict
> > sysctls are set to (1) when this is enabled, since hiding kernel
> > pointers is necessary to preserve the secrecy of the randomized base
> > address.
> >
> > This feature also uses a fixed mapping to move the IDT (if not already
> > done as a fix for the F00F bug), to avoid exposing the location of
> > kernel internals relative to the original IDT. This has the additional
> > security benefit of marking the new virtual address of the IDT
> > read-only.
>
> As written, I think this is unsafe, simply because the kernel has no
> idea what memory is actually safe to relocate into, and your code
> doesn't actually make any attempt at doing so.
>
> The fact that you change CONFIG_PHYSICAL_ALIGN is particularly
> devastating, and will introduce boot failures on real systems.
>
> For this to be acceptable, you need to at the very least:
>
> 1. Verify, in the address map passed to the kernel, where it is safe
> to locate the kernel;

I'll do this, thanks.

> 2. Not introduce a performance regression (we avoid locating in the
> bottom 16 MiB for performance reasons, except on very small systems);

I altered the boot code so that it uses CONFIG_PHYSICAL_START, which
defaults to 16 MiB, as a lower bound on location. So nothing will ever
get loaded below there, and I still can take advantage of higher
alignment granularity. Are there other problems I'm not anticipating?

> 3. Make sure not to break kdump.
>

Ok, I'll be sure to add this to the list of things to test.

Thanks for the feedback.

-Dan

2011-05-24 23:07:27

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/24/2011 02:16 PM, Ingo Molnar wrote:
>
> On some systems holes can be pretty low as well - you'd have to research e820
> maps submitted to lkml to see how common this is - but it's not terribly
> common.
>
> Some really old systems might have a hole between 15MB-16MB - but that's not an
> issue if we load at 16 MB or higher.
>

It definitely happens, and not just at 15-16 MiB either.

Doing this without actually consulting the memory map is dangerous as
hell; plus you have to verify that you're not clobbering anything else,
like the command line, the initramfs, or the setup_data linked list.

-hpa
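Besides consulting the memory map, the boot code must ensure that a candidate location avoids data the boot loader has already placed in RAM. The following is a minimal sketch of a half-open interval overlap check. The names are hypothetical, and the addresses used in any example are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

struct mem_range { uint64_t start, end; }; /* half-open [start, end) */

static int ranges_overlap(struct mem_range a, struct mem_range b)
{
    return a.start < b.end && b.start < a.end;
}

/* Return 1 if the candidate kernel range avoids every range that must be
 * preserved (command line, initramfs, setup_data entries, ...). */
static int candidate_is_clear(struct mem_range cand,
                              const struct mem_range *avoid, size_t navoid)
{
    for (size_t i = 0; i < navoid; i++)
        if (ranges_overlap(cand, avoid[i]))
            return 0;
    return 1;
}
```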

2011-05-24 23:08:05

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/24/2011 04:04 PM, Dan Rosenberg wrote:
>
>> 2. Not introduce a performance regression (we avoid locating in the
>> bottom 16 MiB for performance reasons, except on very small systems);
>
> I altered the boot code so that it uses CONFIG_PHYSICAL_START, which
> defaults to 16 MiB, as a lower bound on location. So nothing will ever
> get loaded below there, and I still can take advantage of higher
> alignment granularity. Are there other problems I'm not anticipating?
>

Please look at the discussion as to what led us to do things this way.

-hpa

2011-05-24 23:08:57

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 16:31 -0400, Dan Rosenberg wrote:
> This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> which the kernel is decompressed at boot as a security feature that
> deters exploit attempts relying on knowledge of the location of kernel
> internals. The default values of the kptr_restrict and dmesg_restrict
> sysctls are set to (1) when this is enabled, since hiding kernel
> pointers is necessary to preserve the secrecy of the randomized base
> address.

> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index 67a655a..2680db0 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -69,12 +69,75 @@ ENTRY(startup_32)
> */
>
> #ifdef CONFIG_RELOCATABLE
> +#ifdef CONFIG_RANDOMIZE_BASE
> +
> + /* Standard check for cpuid */
> + pushfl
> + popl %eax
> + movl %eax, %ebx
> + xorl $0x200000, %eax
> + pushl %eax
> + popfl
> + pushfl
> + popl %eax
> + cmpl %eax, %ebx
> + jz 4f
> +
> + /* Check for cpuid 1 */
> + movl $0x0, %eax
> + cpuid
> + cmpl $0x1, %eax
> + jb 4f
> +
> + movl $0x1, %eax
> + cpuid
> + xor %eax, %eax
> +
> + /* RDRAND is bit 30 */
> + testl $0x40000000, %ecx
> + jnz 1f
> +
> + /* RDTSC is bit 4 */
> + testl $0x10, %edx
> + jnz 3f
> +
> + /* Nothing is supported */
> + jmp 4f
> +1:
> + /* RDRAND sets carry bit on success, otherwise we should try
> + * again. */
> + movl $0x10, %ecx
> +2:
> + /* rdrand %eax */
> + .byte 0x0f, 0xc7, 0xf0
> + jc 4f
> + loop 2b
> +
> + /* Fall through: if RDRAND is supported but fails, use RDTSC,
> + * which is guaranteed to be supported. */
> +3:
> + rdtsc
> + shll $0xc, %eax
> +4:
> + /* Maximum offset at 64mb to be safe */
> + andl $0x3ffffff, %eax
> + movl %ebp, %ebx
> + addl %eax, %ebx
> +#else
> movl %ebp, %ebx
> +#endif
> movl BP_kernel_alignment(%esi), %eax
> decl %eax
> addl %eax, %ebx
> notl %eax
> andl %eax, %ebx
> +
> + /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
> + * decompress at. */
> + cmpl $LOAD_PHYSICAL_ADDR, %ebx
> + jae 1f
> + movl $LOAD_PHYSICAL_ADDR, %ebx
> +1:
> #else
> movl $LOAD_PHYSICAL_ADDR, %ebx
> #endif
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 35af09d..6a05219 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -90,6 +90,13 @@ ENTRY(startup_32)
> addl %eax, %ebx
> notl %eax
> andl %eax, %ebx
> +
> + /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
> + * decompress at. */
> + cmpl $LOAD_PHYSICAL_ADDR, %ebx
> + jae 1f
> + movl $LOAD_PHYSICAL_ADDR, %ebx
> +1:
> #else
> movl $LOAD_PHYSICAL_ADDR, %ebx
> #endif
> @@ -191,7 +198,7 @@ no_longmode:
> * it may change in the future.
> */
> .code64
> - .org 0x200
> + .org 0x300
> ENTRY(startup_64)
> /*
> * We come here either from startup_32 or directly from a
> @@ -232,6 +239,13 @@ ENTRY(startup_64)
> addq %rax, %rbp
> notq %rax
> andq %rax, %rbp
> +
> + /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
> + * decompress at. */
> + cmpq $LOAD_PHYSICAL_ADDR, %rbp
> + jae 1f
> + movq $LOAD_PHYSICAL_ADDR, %rbp
> +1:
> #else
> movq $LOAD_PHYSICAL_ADDR, %rbp
> #endif

Thanks to Kees Cook for noticing that I didn't clear %eax before jumping
to my "nothing supported" (4) label. This would have just used the
flags as "randomness", but it's still wrong and I'll fix it. Next
version will have a fallback of using the BIOS signature instead anyway.

-Dan

2011-05-24 23:14:33

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/24/2011 03:31 PM, H. Peter Anvin wrote:
>
> Arguably this is really something that would be *much* better done in
> the bootloader, but given that the dominant boot loader for Linux is
> Grub, I don't expect that anything will ever happen until the cows come
> home :(
>

This pretty much means we need an opt-out for this. I think we need
this both in the form of a boot protocol flag bit (for the case where
the boot loader knows what it's doing, and wants the kernel to stay put;
perhaps it has already randomized) and a kernel command-line option
(which can be parsed early and set the above flag.)

-hpa
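The sketch below illustrates the two-part opt-out hpa proposes: a boot-protocol flag that the loader can set, plus an early command-line check that sets the same flag. Both the flag bit and the option name `norandbase` are hypothetical placeholders. The sketch only shows whole-word matching and how the two sources combine.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: "loader already placed the kernel; do not move it".
 * The bit position is chosen for illustration only. */
#define BOOTFLAG_KEEP_BASE (1u << 7)

/* Return true if opt appears as a whole space-separated word in cmdline. */
static bool cmdline_has_word(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    if (len == 0)
        return false;
    while ((p = strstr(p, opt)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
        p++; /* keep searching after this partial match */
    }
    return false;
}

/* Parse the command line early and fold the option into the flag word. */
static uint32_t effective_flags(uint32_t loader_flags, const char *cmdline)
{
    if (cmdline_has_word(cmdline, "norandbase")) /* hypothetical option */
        loader_flags |= BOOTFLAG_KEEP_BASE;
    return loader_flags;
}

static bool should_randomize(uint32_t flags)
{
    return !(flags & BOOTFLAG_KEEP_BASE);
}
```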

2011-05-24 23:34:59

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 16:07 -0700, H. Peter Anvin wrote:
> On 05/24/2011 04:04 PM, Dan Rosenberg wrote:
> >
> >> 2. Not introduce a performance regression (we avoid locating in the
> >> bottom 16 MiB for performance reasons, except on very small systems);
> >
> > I altered the boot code so that it uses CONFIG_PHYSICAL_START, which
> > defaults to 16 MiB, as a lower bound on location. So nothing will ever
> > get loaded below there, and I still can take advantage of higher
> > alignment granularity. Are there other problems I'm not anticipating?
> >
>
> Please look at the discussion as to what led us to do things this way.
>

Would you be able to point me to said discussion? The only thing I can
find is this:

http://marc.info/?l=linux-kernel&m=124173552516435&w=2

This set PHYSICAL_START at 16 MB and alignment at 2/4 MB. Then, three
days later, this was committed:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ceefccc93932b920a8ec6f35f596db05202a12fe

This sets the alignment to 16 MB, with the only justification being that
relocatable kernels also need to start above 16 MB.

Thanks,
Dan

2011-05-24 23:37:13

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/24/2011 04:34 PM, Dan Rosenberg wrote:
> On Tue, 2011-05-24 at 16:07 -0700, H. Peter Anvin wrote:
>> On 05/24/2011 04:04 PM, Dan Rosenberg wrote:
>>>
>>>> 2. Not introduce a performance regression (we avoid locating in the
>>>> bottom 16 MiB for performance reasons, except on very small systems);
>>>
>>> I altered the boot code so that it uses CONFIG_PHYSICAL_START, which
>>> defaults to 16 MiB, as a lower bound on location. So nothing will ever
>>> get loaded below there, and I still can take advantage of higher
>>> alignment granularity. Are there other problems I'm not anticipating?
>>>
>>
>> Please look at the discussion as to what led us to do things this way.
>>
>
> Would you be able to point me to said discussion? The only thing I can
> find is this:
>
> http://marc.info/?l=linux-kernel&m=124173552516435&w=2
>
> This set PHYSICAL_START at 16 MB and alignment at 2/4 MB. Then, three
> days later, this was committed:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ceefccc93932b920a8ec6f35f596db05202a12fe
>
> This sets the alignment to 16 MB, with the only justification being that
> relocatable kernels also need to start above 16 MB.
>

I think those patches came after the discussion was already over. I'll
try to look for it.

-hpa

2011-05-25 02:05:43

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 19:08 -0400, Dan Rosenberg wrote:
> On Tue, 2011-05-24 at 16:31 -0400, Dan Rosenberg wrote:
> > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > which the kernel is decompressed at boot as a security feature that
> > deters exploit attempts relying on knowledge of the location of kernel
> > internals. The default values of the kptr_restrict and dmesg_restrict
> > sysctls are set to (1) when this is enabled, since hiding kernel
> > pointers is necessary to preserve the secrecy of the randomized base
> > address.
>
> > diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> > index 67a655a..2680db0 100644
> > --- a/arch/x86/boot/compressed/head_32.S
> > +++ b/arch/x86/boot/compressed/head_32.S
> > @@ -69,12 +69,75 @@ ENTRY(startup_32)
> > */
> >
> > #ifdef CONFIG_RELOCATABLE
> > +#ifdef CONFIG_RANDOMIZE_BASE
> > +
> > + /* Standard check for cpuid */
> > + pushfl
> > + popl %eax
> > + movl %eax, %ebx
> > + xorl $0x200000, %eax
> > + pushl %eax
> > + popfl
> > + pushfl
> > + popl %eax
> > + cmpl %eax, %ebx
> > + jz 4f
> > +
> > + /* Check for cpuid 1 */
> > + movl $0x0, %eax
> > + cpuid
> > + cmpl $0x1, %eax
> > + jb 4f
> > +
> > + movl $0x1, %eax
> > + cpuid
> > + xor %eax, %eax
> > +
> > + /* RDRAND is bit 30 */
> > + testl $0x40000000, %ecx
> > + jnz 1f
> > +
> > + /* RDTSC is bit 4 */
> > + testl $0x10, %edx
> > + jnz 3f
> > +
> > + /* Nothing is supported */
> > + jmp 4f
> > +1:
> > + /* RDRAND sets carry bit on success, otherwise we should try
> > + * again. */
> > + movl $0x10, %ecx
> > +2:
> > + /* rdrand %eax */
> > + .byte 0x0f, 0xc7, 0xf0
> > + jc 4f
> > + loop 2b
> > +
> > + /* Fall through: if RDRAND is supported but fails, use RDTSC,
> > + * which is guaranteed to be supported. */
> > +3:
> > + rdtsc
> > + shll $0xc, %eax
> > +4:
> > + /* Maximum offset at 64mb to be safe */
> > + andl $0x3ffffff, %eax
> > + movl %ebp, %ebx
> > + addl %eax, %ebx
> > +#else
> > movl %ebp, %ebx
> > +#endif
> > movl BP_kernel_alignment(%esi), %eax
> > decl %eax
> > addl %eax, %ebx
> > notl %eax
> > andl %eax, %ebx
> > +
> > + /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
> > + * decompress at. */
> > + cmpl $LOAD_PHYSICAL_ADDR, %ebx
> > + jae 1f
> > + movl $LOAD_PHYSICAL_ADDR, %ebx
> > +1:
> > #else
> > movl $LOAD_PHYSICAL_ADDR, %ebx
> > #endif
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index 35af09d..6a05219 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -90,6 +90,13 @@ ENTRY(startup_32)
> > addl %eax, %ebx
> > notl %eax
> > andl %eax, %ebx
> > +
> > + /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
> > + * decompress at. */
> > + cmpl $LOAD_PHYSICAL_ADDR, %ebx
> > + jae 1f
> > + movl $LOAD_PHYSICAL_ADDR, %ebx
> > +1:
> > #else
> > movl $LOAD_PHYSICAL_ADDR, %ebx
> > #endif
> > @@ -191,7 +198,7 @@ no_longmode:
> > * it may change in the future.
> > */
> > .code64
> > - .org 0x200
> > + .org 0x300
> > ENTRY(startup_64)
> > /*
> > * We come here either from startup_32 or directly from a
> > @@ -232,6 +239,13 @@ ENTRY(startup_64)
> > addq %rax, %rbp
> > notq %rax
> > andq %rax, %rbp
> > +
> > + /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
> > + * decompress at. */
> > + cmpq $LOAD_PHYSICAL_ADDR, %rbp
> > + jae 1f
> > + movq $LOAD_PHYSICAL_ADDR, %rbp
> > +1:
> > #else
> > movq $LOAD_PHYSICAL_ADDR, %rbp
> > #endif
>
> Thanks to Kees Cook for noticing that I didn't clear %eax before jumping
> to my "nothing supported" (4) label. This would have just used the
> flags as "randomness", but it's still wrong and I'll fix it. Next
> version will have a fallback of using the BIOS signature instead anyway.
>
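For readers following the assembly, the CPUID feature tests above reduce to two bit masks. The following is a minimal C sketch, not code from the patch: the enum, macro and function names are hypothetical, and it takes the CPUID leaf 1 ECX and EDX values as arguments so the selection logic can be checked without executing CPUID. RDRAND support is reported in ECX bit 30 (mask 0x40000000) and TSC support in EDX bit 4 (mask 0x10).

```c
#include <stdint.h>

/* Entropy sources, in the order of preference used by the patch. */
enum entropy_src { SRC_NONE, SRC_RDRAND, SRC_RDTSC };

#define CPUID1_ECX_RDRAND (1u << 30) /* 0x40000000 */
#define CPUID1_EDX_TSC    (1u << 4)  /* 0x00000010 */

/* Pick an entropy source from the CPUID leaf 1 ECX/EDX registers. */
static enum entropy_src pick_entropy_source(uint32_t ecx, uint32_t edx)
{
	if (ecx & CPUID1_ECX_RDRAND)
		return SRC_RDRAND;
	if (edx & CPUID1_EDX_TSC)
		return SRC_RDTSC;
	return SRC_NONE; /* the patch then introduces no randomness */
}
```

On a real machine the two registers could be obtained with the `__get_cpuid()` helper from GCC's `<cpuid.h>`.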

Also thanks to someone who prefers to remain nameless for pointing out
that this logic also results in the kernel being loaded at
LOAD_PHYSICAL_ADDR about one in four times (because it rounds up). This
will be fixed as well.
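The "one in four" figure can be checked with a small model of the round-up-then-clamp logic. This is a sketch under stated assumptions, all of them illustrative rather than taken from the patch: the boot loader placed the image at 1 MiB, BP_kernel_alignment and LOAD_PHYSICAL_ADDR are both 16 MiB, and the offset is the 4 KiB-granular value left by `shll $0xc` and `andl $0x3ffffff`, with every offset equally likely. The function names are hypothetical.

```c
#include <stdint.h>

/* Round x up to a power-of-two alignment, like the decl/addl/notl/andl sequence. */
static uint32_t align_up(uint32_t x, uint32_t align)
{
	return (x + (align - 1)) & ~(align - 1);
}

/* Load address for one offset: round up, then clamp to the minimum. */
static uint32_t pick_base(uint32_t load, uint32_t offset, uint32_t align, uint32_t min)
{
	uint32_t base = align_up(load + offset, align);

	return base < min ? min : base;
}

/* Count the 4 KiB-granular offsets below `span` that land exactly on `min`. */
static uint32_t count_at_min(uint32_t load, uint32_t span, uint32_t align, uint32_t min)
{
	uint32_t hits = 0;

	for (uint32_t off = 0; off < span; off += 0x1000)
		if (pick_base(load, off, align, min) == min)
			hits++;
	return hits;
}
```

Under these assumptions 3841 of the 16384 possible offsets (about 23%) round up to exactly 16 MiB, which matches the rough one-in-four estimate; other load addresses or alignments change the ratio.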

-Dan

2011-05-25 11:23:59

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> > No, the right solution is what i suggested a few mails ago:
> > /proc/kallsyms (and other RIP printing places) should report the
> > non-randomized RIP.
> >
> > That way we do not have to change the kptr_restrict default and
> > tools will continue to work ...
>
> Ok, I'll do it this way, and leave the kptr_restrict default to 0.
> But I still think having the dmesg_restrict default depend on
> randomization makes sense, since kernel .text is explicitly
> revealed in the syslog.

Hm, where is it revealed beyond initcall addresses, which ought to be
handled if they are printed via %pK?

All such information leaks need to be fixed. (This will be the
slowest part of the process i suspect - there are many channels.)

In the syslog we obviously want any RIPs converted to the canonical
'unrandomized' address, so that it can be matched against
/proc/kallsyms, etc. Their randomized value isn't very useful. That
will also protect the randomization secret as a side effect.

The only thorny issue AFAICS are oopses. There's real value in having
'raw' data from a crash (interpreting crashes is hard enough even
without randomization!), OTOH we could keep most of the value of them
by converting them back to canonical addresses.

This would be more or less easy to do for the RIP and the registers,
but less obvious for the stack: a kernel pointer can lie on the stack
at arbitrary alignment. On 64-bit we could probably detect them
rather reliably based on the randomized prefix of kernel addresses:

[ 32.946003] Stack:
[ 32.946003] 0000000000000202 0000000000000002 0000000000000001 0000000000000000
[ 32.946003] 0000000000000198 0000000000000002 0000000000000000 00000000002ca5b0
[ 32.946003] 0000000000000000 ffff88003e5533e0 ffff88003f977c00 ffffffff802225e3

the ffffffff8 prefix (assuming we end up randomizing the address
within the 2GB window available to a RIP-relative addressed kernel)
would be easy to detect even if it's not word aligned. There *would*
be false positives (a 32-bit value of -7 is common), but as long as
we marked any unrandomization clearly with an asterisk:

[ 32.946003] Stack:
[ 32.946003] 0000000000000202 0000000000000002 0000000000000001 0000000000000000
[ 32.946003] 0000000000000198 0000000000000002 0000000000000000 00000000002ca5b0
[ 32.946003] 0000000000000000*ffff88003e5533e0*ffff88003f977c00*ffffffff802225e3

we'd be informed that the stack content was slightly different. If we
fixed up register values, say the raw value is:

[ 32.946003] RDX: 0000000000000000 RSI: ffffffff80ce0100 RDI: 0000000000000000

and randomization is -0x100000 then we'd print the normalized value
for 'RSI':

[ 32.946003] RDX: 0000000000000000 RSI:*ffffffff80de0100 RDI: 0000000000000000

And the '*' tells us that this value got normalized.
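The register example above can be expressed as a small formatting helper. This is a hypothetical sketch, not kernel code: the function name is made up, the "looks like kernel text" test assumes the top 2GB window mentioned above (0xffffffff80000000 and up), and the sign convention follows the example, where a randomization of -0x100000 is undone by subtracting it.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a register value for an oops. Values in the
 * assumed kernel-text window have the randomization offset removed and are
 * prefixed with '*'; everything else is printed raw with a leading space.
 */
static void format_reg(char *out, size_t len, uint64_t v, int64_t rand_off)
{
	const uint64_t ktext_floor = 0xffffffff80000000ull; /* top 2GB (assumption) */

	if (v >= ktext_floor)
		snprintf(out, len, "*%016llx",
			 (unsigned long long)(v - (uint64_t)rand_off));
	else
		snprintf(out, len, " %016llx", (unsigned long long)v);
}
```

With the values from the example, RSI ffffffff80ce0100 and a randomization of -0x100000 format as *ffffffff80de0100. Any non-pointer register value that happens to fall in the window would be rewritten too, which is why the marker matters.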

On 32-bit systems the rate of false positives is probably higher; the
'0xc0' byte pattern is pretty common.
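Detecting kernel pointers at arbitrary alignment, as suggested above, means testing every byte offset of a stack dump rather than every 8-byte slot. A minimal sketch, assuming a little-endian x86-64 dump and the same hypothetical top-2GB test (0xffffffff80000000 and up); the function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scanner: report every byte offset in a raw stack dump at
 * which an unaligned little-endian 64-bit value falls in the top 2GB,
 * i.e. looks like a kernel-text pointer. Returns the number of hits;
 * the first `max` offsets are stored in `hits`.
 */
static size_t scan_ktext_ptrs(const uint8_t *buf, size_t len,
			      size_t *hits, size_t max)
{
	size_t n = 0;

	for (size_t off = 0; off + 8 <= len; off++) {
		uint64_t v = 0;

		/* Assemble the bytes explicitly so host endianness is irrelevant. */
		for (int b = 7; b >= 0; b--)
			v = (v << 8) | buf[off + b];
		if (v >= 0xffffffff80000000ull) {
			if (n < max)
				hits[n] = off;
			n++;
		}
	}
	return n;
}
```

A slot holding -7 as a 64-bit value (0xfffffffffffffff9) also matches, one example of the false positives discussed above.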

Now, theoretically there's still a tiny information hole here: if an
attacker can crash a kernel in a non-fatal way that puts some known
data on the kernel stack, then the unrandomization will reveal the
secret ...

I guess we'll have to live with that: really paranoid places will
disable dmesg access to unprivileged users.

[ They might also want to have a knob to not log kernel crashes at
all - best protection is if *no one* (not even root) has a way to
figure out the secret. That needs to go hand in hand with forced
use of signed modules, sanitized /dev/mem, no root-controllable DMA
access to any device, no ioperm() and iopl(), etc. - so a very
locked down kernel that protects even root from being able to
execute kernel code. Such systems are still useful btw even if root
otherwise has access to all disks and has access to the kernel
image and can install its own image: a reboot will generally set
off an alarm. ]

> Thanks very much for the feedback.

Hey, thanks for taking on implementing this rather non-trivial
security feature!

Thanks,

Ingo

2011-05-25 14:03:57

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 16:06 -0700, H. Peter Anvin wrote:
> On 05/24/2011 02:16 PM, Ingo Molnar wrote:
> >
> > On some systems holes can be pretty low as well - you'd have to research e820
> > maps submitted to lkml to see how common this is - but it's not terribly
> > common.
> >
> > Some really old systems might have a hole between 15MB-16MB - but that's not an
> > issue if we load at 16 MB or higher.
> >
>
> It definitely happens, and not just at 15-16 MiB either.
>
> Doing this without actually consulting the memory map is dangerous as
> hell; plus you have to verify that you're not clobbering anything else,
> like the command line, initramfs or the linked list of data.
>

My current idea is to use int 0x15, eax = 0xe801 (which seems to be
nearly universally supported) and use bx/dx to determine the amount of
contiguous, usable memory above 16 MB, which seems to be exactly what we
want to know. If the BIOS does not support this function I'll be sure
to catch that and skip the randomization. Likewise, if the amount of
returned memory seems insufficient or otherwise confusing, I'll skip the
randomization.

Given this information, do you have a conservative guess for how close
to the top of available memory we can put the kernel? As in, let's say
we have an XYZ MB chunk of contiguous, free memory, how should I
calculate the highest, safe place to put the kernel in that region?

I'm going to continue to enforce the requirement that 16 MB is the
lowest address we can safely load the kernel, and I'd still appreciate
any information on why 2/4 MB default alignment might cause problems.

Thanks,
Dan

> -hpa

2011-05-25 14:14:46

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> I'm going to continue to enforce the requirement that 16 MB is the
> lowest address we can safely load the kernel, and I'd still
> appreciate any information on why 2/4 MB default alignment might
> cause problems.

The 16 MB limit is more about preserving 24-bit addressable memory
than about safety: it is a useful resource to certain physical
devices and we do not want to reduce that resource by ~12.5% by
putting a ~2MB kernel image into it.

But yes, we want to load above 16 MB, if RAM size makes it possible.

Thanks,

Ingo

2011-05-25 14:21:04

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Wed, 2011-05-25 at 13:23 +0200, Ingo Molnar wrote:
> * Dan Rosenberg <[email protected]> wrote:
>
> > > No, the right solution is what i suggested a few mails ago:
> > > /proc/kallsyms (and other RIP printing places) should report the
> > > non-randomized RIP.
> > >
> > > That way we do not have to change the kptr_restrict default and
> > > tools will continue to work ...
> >
> > Ok, I'll do it this way, and leave the kptr_restrict default to 0.
> > But I still think having the dmesg_restrict default depend on
> > randomization makes sense, since kernel .text is explicitly
> > revealed in the syslog.
>
> Hm, where is it revealed beyond initcall addresses, which ought to be
> handled if they are printed via %pK?
>
> All such information leaks need to be fixed. (This will be the
> slowest part of the process i suspect - there are many channels.)
>
> In the syslog we obviously want any RIPs converted to the canonical
> 'unrandomized' address, so that it can be matched against
> /proc/kallsyms, etc. Their randomized value isn't very useful. That
> will also protect the randomization secret as a side effect.
>

%pK doesn't seem like the right thing to do in many cases, since the
capability check doesn't have proper meaning if the caller isn't in
process context. If I'm understanding you right (correct me if I'm wrong),
you're looking for kptr_restrict to be completely separate from this
randomization, and when randomization is enabled, all pointers are
unconditionally de-randomized. It seems like the right way to do this
is to include code in vsprintf.c for all %p-type specifiers that would
normally print the actual pointer (as opposed to some of the specialized
cases that print other data) that does something like this:

	if ((unsigned long)ptr >= (unsigned long)_stext &&
	    (unsigned long)ptr < (unsigned long)_end)
		ptr = (void *)((unsigned long)ptr - ((unsigned long)_text -
			(CONFIG_PHYSICAL_START + PAGE_OFFSET)));

This way, we don't have to go tracking down every printk caller and
convert them to %pK, which isn't usable anyway in some cases.
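The de-randomization step in the fragment above can be tried outside the kernel by passing the linker symbols in as plain integers. A sketch, not the eventual vsprintf.c change: `derandomize()` and its parameters are hypothetical stand-ins for `_stext`, `_end`, `_text` and `CONFIG_PHYSICAL_START + PAGE_OFFSET`, the end bound is treated as exclusive, and the example values assume a 32-bit layout with PAGE_OFFSET at 0xc0000000.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the fragment above. linked_text corresponds to
 * CONFIG_PHYSICAL_START + PAGE_OFFSET (where _text was linked to run), and
 * text is where _text actually ended up after relocation.
 */
static uintptr_t derandomize(uintptr_t p, uintptr_t stext, uintptr_t end,
			     uintptr_t text, uintptr_t linked_text)
{
	/* Only addresses inside the kernel image are rewritten; end is exclusive. */
	if (p >= stext && p < end)
		return p - (text - linked_text);
	return p;
}
```

With the image linked for 16 MB but loaded at 48 MB, a text pointer of 0xc3012345 is reported as 0xc1012345, while addresses outside the image pass through unchanged.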

> The only thorny issue AFAICS are oopses. There's real value in having
> 'raw' data from a crash (interpreting crashes is hard enough even
> without randomization!), OTOH we could keep most of the value of them
> by converting them back to canonical addresses.
>
> This would be more or less easy to do for the RIP and the registers,
> but less obvious for the stack: a kernel pointer can lie on the stack
> at arbitrary alignment. On 64-bit we could probably detect them
> rather reliably based on the randomized prefix of kernel addresses:
>
> [ 32.946003] Stack:
> [ 32.946003] 0000000000000202 0000000000000002 0000000000000001 0000000000000000
> [ 32.946003] 0000000000000198 0000000000000002 0000000000000000 00000000002ca5b0
> [ 32.946003] 0000000000000000 ffff88003e5533e0 ffff88003f977c00 ffffffff802225e3
>
> the ffffffff8 prefix (assuming we end up randomizing the address
> within the 2GB window available to a RIP-relative addressed kernel)
> would be easy to detect even if it's not word aligned. There *would*
> be false positives (a 32-bit value of -7 is common), but as long as
> we marked any unrandomization clearly with an asterisk:
>
> [ 32.946003] Stack:
> [ 32.946003] 0000000000000202 0000000000000002 0000000000000001 0000000000000000
> [ 32.946003] 0000000000000198 0000000000000002 0000000000000000 00000000002ca5b0
> [ 32.946003] 0000000000000000*ffff88003e5533e0*ffff88003f977c00*ffffffff802225e3
>
> we'd be informed that the stack content was slightly different. If we
> fixed up register values, say the raw value is:
>
> [ 32.946003] RDX: 0000000000000000 RSI: ffffffff80ce0100 RDI: 0000000000000000
>
> and randomization is -0x100000 then we'd print the normalized value
> for 'RSI':
>
> [ 32.946003] RDX: 0000000000000000 RSI:*ffffffff80de0100 RDI: 0000000000000000
>
> And the '*' tells us that this value got normalized.
>
> On 32-bit systems the rate of false positives is probably higher; the
> '0xc0' byte pattern is pretty common.
>
> Now, theoretically there's still a tiny information hole here: if an
> attacker can crash a kernel in a non-fatal way that puts some known
> data on the kernel stack, then the unrandomization will reveal the
> secret ...
>
> I guess we'll have to live with that: really paranoid places will
> disable dmesg access to unprivileged users.

I'm tempted to just say "leave OOPS alone", and if you want to preserve
secrecy past an OOPS, you should be disabling dmesg access anyway. But
I'll think more about this.

>
> [ They might also want to have a knob to not log kernel crashes at
> all - best protection is if *no one* (not even root) has a way to
> figure out the secret. That needs to go hand in hand with forced
> use of signed modules, sanitized /dev/mem, no root-controllable DMA
> access to any device, no ioperm() and iopl(), etc. - so a very
> locked down kernel that protects even root from being able to
> execute kernel code. Such systems are still useful btw even if root
> otherwise has access to all disks and has access to the kernel
> image and can install its own image: a reboot will generally set
> off an alarm. ]
>
> > Thanks very much for the feedback.
>
> Hey, thanks for taking on implementing this rather non-trivial
> security feature!
>

What can I say, I like a challenge. :)

-Dan

> Thanks,
>
> Ingo

2011-05-25 14:29:49

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> On Wed, 2011-05-25 at 13:23 +0200, Ingo Molnar wrote:
> > * Dan Rosenberg <[email protected]> wrote:
> >
> > > > No, the right solution is what i suggested a few mails ago:
> > > > /proc/kallsyms (and other RIP printing places) should report the
> > > > non-randomized RIP.
> > > >
> > > > That way we do not have to change the kptr_restrict default and
> > > > tools will continue to work ...
> > >
> > > Ok, I'll do it this way, and leave the kptr_restrict default to 0.
> > > But I still think having the dmesg_restrict default depend on
> > > randomization makes sense, since kernel .text is explicitly
> > > revealed in the syslog.
> >
> > Hm, where is it revealed beyond initcall addresses, which ought to be
> > handled if they are printed via %pK?
> >
> > All such information leaks need to be fixed. (This will be the
> > slowest part of the process i suspect - there are many channels.)
> >
> > In the syslog we obviously want any RIPs converted to the canonical
> > 'unrandomized' address, so that it can be matched against
> > /proc/kallsyms, etc. Their randomized value isn't very useful. That
> > will also protect the randomization secret as a side effect.
> >
>
> %pK doesn't seem like the right thing to do in many cases, since
> the capability check doesn't have proper meaning if the caller
> isn't in process context. [...]

Oh, ok, i see what you mean.

I was not thinking of %pK as a way to restrict access really. I am
thinking of it as a nicely central way to create constant RIPs out of
random RIPs.

In that sense if %pK cannot be called everywhere please introduce a
%pk variant that just prints a raw kernel address value and does no
access check, just the unrandomization.

> [...] If I'm understanding you right (correct me if I'm wrong),
> you're looking for kptr_restrict to be completely separate from
> this randomization, and when randomization is enabled, all pointers
> are unconditionally de-randomized. It seems like the right way to
> do this is to include code in vsprintf.c for all %p-type specifiers
> that would normally print the actual pointer (as opposed to some of
> the specialized cases that print other data) that does something
> like this:
>
> 	if ((unsigned long)ptr >= (unsigned long)_stext &&
> 	    (unsigned long)ptr < (unsigned long)_end)
> 		ptr = (void *)((unsigned long)ptr - ((unsigned long)_text -
> 			(CONFIG_PHYSICAL_START + PAGE_OFFSET)));
>
> This way, we don't have to go tracking down every printk caller and
> convert them to %pK, which isn't usable anyway in some cases.

Yeah, but please also provide %pk to not have to hunt down every
single place that might print a kernel address via a "%016Lx" or "%p"
and thus leak the randomization secret.

That way you can convert *every* known kernel-address-printing format
string to one of the %p variants and thus have the above
unrandomization step done automatically.

Perhaps, as a debugging help, also try to flag %p printouts that are
suspiciously within kernel image boundaries. (Note: you don't want to
printk from that place though, as you could already be executing
within printk.) Maybe even %x/%X printouts that are in that range. It
would only be a debugging help, as there could easily be false positives.

> I'm tempted to just say "leave OOPS alone", and if you want to preserve
> secrecy past an OOPS, you should be disabling dmesg access anyway. But
> I'll think more about this.

It's definitely a good first-approximation answer. We only do perfect
kernels anyway, so they won't oops.

Please convert RIPs in oops decoding nevertheless, so that they can be
correlated with the symbol table...

Thanks,

Ingo

2011-05-25 15:49:29

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/25/2011 07:03 AM, Dan Rosenberg wrote:
>
> My current idea is to use int 0x15, eax = 0xe801 (which seems to be
> nearly universally supported) and use bx/dx to determine the amount of
> contiguous, usable memory above 16 MB, which seems to be exactly what we
> want to know. If the BIOS does not support this function I'll be sure
> to catch that and skip the randomization. Likewise, if the amount of
> returned memory seems insufficient or otherwise confusing, I'll skip the
> randomization.
>

No, sorry. This has been wrong for over 10 years; there is no
substitute for the full (e820) memory map. *Furthermore*, based on
where in the bootup sequence you are doing this, you also have to
consider any other memory structures that the kernel needs to be aware
of (initramfs, any chunks in the linked list, the command line, EFI
handover structures, etc.) This is in fact an arbitrarily complex
operation... we have *finally* gotten the kernel to the point where (a)
the boot loader can actually do the right thing in all cases and (b) the
kernel will reserve or copy all the auxiliary memory chunks it needs at
a very early point.

Sorry, this cannot be short-circuited.

> Given this information, do you have a conservative guess for how close
> to the top of available memory we can put the kernel? As in, let's say
> we have an XYZ MB chunk of contiguous, free memory, how should I
> calculate the highest, safe place to put the kernel in that region?
>
> I'm going to continue to enforce the requirement that 16 MB is the
> lowest address we can safely load the kernel, and I'd still appreciate
> any information on why 2/4 MB default alignment might cause problems.

The problem with all of that was backwards compatibility with existing
relocating bootloaders.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2011-05-25 16:15:55

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Wed, 2011-05-25 at 08:48 -0700, H. Peter Anvin wrote:
> On 05/25/2011 07:03 AM, Dan Rosenberg wrote:
> >
> > My current idea is to use int 0x15, eax = 0xe801 (which seems to be
> > nearly universally supported) and use bx/dx to determine the amount of
> > contiguous, usable memory above 16 MB, which seems to be exactly what we
> > want to know. If the BIOS does not support this function I'll be sure
> > to catch that and skip the randomization. Likewise, if the amount of
> > returned memory seems insufficient or otherwise confusing, I'll skip the
> > randomization.
> >
>
> No, sorry. This has been wrong for over 10 years; there is no
> substitute for the full (e820) memory map. *Furthermore*, based on
> where in the bootup sequence you are doing this, you also have to
> consider any other memory structures that the kernel needs to be aware
> of (initramfs, any chunks in the linked list, the command line, EFI
> handover structures, etc.) This is in fact an arbitrarily complex
> operation... we have *finally* gotten the kernel to the point where (a)
> the boot loader can actually do the right thing in all cases and (b) the
> kernel will reserve or copy all the auxiliary memory chunks it needs at
> a very early point.
>
> Sorry, this cannot be short-circuited.
>

Ok, checking the e820 memory map seems like the way to go then. As a
first attempt, I'd assume that if I find a contiguous free chunk that
begins before (or at) 16 MB and continues beyond 16 MB, then that
represents space where it's safe to load the kernel (up to a certain
point before the end of that chunk), assuming the chunk has enough space
and I do some degree of checking that I'm not decompressing on top of
something else (I'll start to gather a list of what to watch out for).
Is this a fair assumption?

> > Given this information, do you have a conservative guess for how close
> > to the top of available memory we can put the kernel? As in, let's say
> > we have an XYZ MB chunk of contiguous, free memory, how should I
> > calculate the highest, safe place to put the kernel in that region?
> >
> > I'm going to continue to enforce the requirement that 16 MB is the
> > lowest address we can safely load the kernel, and I'd still appreciate
> > any information on why 2/4 MB default alignment might cause problems.
>
> The problem with all of that was backwards compatibility with existing
> relocating bootloaders.
>

Do you have any alternatives that allow maintaining compatibility while
giving us finer-grained alignment? It seems it should be possible,
since alignment was lower than 16 MB for years before this change was
introduced...

Thanks,
Dan

> -hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.

2011-05-25 16:25:15

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/25/2011 09:15 AM, Dan Rosenberg wrote:
>
> Ok, checking the e820 memory map seems like the way to go then. As a
> first attempt, I'd assume that if I find a contiguous free chunk that
> begins before (or at) 16 MB and continues beyond 16 MB, then that
> represents space where it's safe to load the kernel (up to a certain
> point before the end of that chunk), assuming the chunk has enough space
> and I do some degree of checking that I'm not decompressing on top of
> something else (I'll start to gather a list of what to watch out for).
> Is this a fair assumption?
>

There is already code that calculates exactly how much space is needed,
so that part is good -- you should have a tight bound available to you.

The important and messy part, though, is that you get the "raw" e820 map
at that point (including not even having had the e801 and 88 fallback
information merged into it.) This information has to be sanitized (to
deal with overlaps and broken-up chunks) and reserved areas merged in.
This is done in the kernel proper, and bootloaders have some equivalent
code, but you don't have it in that particular boot stage.
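To make the sanitizing step described above concrete, here is a simplified sketch along the lines of the plan discussed in this thread: split the raw, possibly overlapping e820 entries at every boundary, treat a piece as free only if it is covered and every entry covering it is usable RAM (type 1), and measure how far free memory extends upward from 16 MB. Everything here is hypothetical and illustrative: it uses malloc() and qsort(), which the decompressor stage does not have, it does not merge in the e801/88 fallback data, and it ignores the initramfs, command line and other structures that must also be avoided.

```c
#include <stdint.h>
#include <stdlib.h>

#define E820_RAM 1 /* usable RAM; every other type counts as not free */

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* [lo, hi) is free only if some entry covers it and none of those is non-RAM. */
static int span_is_free(const struct e820_entry *map, int n, uint64_t lo, uint64_t hi)
{
	int covered = 0;

	for (int i = 0; i < n; i++) {
		uint64_t s = map[i].addr, e = map[i].addr + map[i].size;

		if (s < hi && e > lo) {
			if (map[i].type != E820_RAM)
				return 0; /* conflicts resolve in favor of the non-free type */
			covered = 1;
		}
	}
	return covered;
}

/* Length of the contiguous free RAM starting at `floor` (e.g. 16 MB). */
static uint64_t free_run_from(const struct e820_entry *map, int n, uint64_t floor)
{
	uint64_t *pts = malloc(2 * (size_t)n * sizeof(*pts));
	uint64_t cursor = floor;
	int npts = 0;

	if (!pts)
		return 0;
	for (int i = 0; i < n; i++) {
		pts[npts++] = map[i].addr;
		pts[npts++] = map[i].addr + map[i].size;
	}
	qsort(pts, npts, sizeof(*pts), cmp_u64);

	/* Walk the elementary pieces between consecutive boundaries. */
	for (int i = 0; i + 1 < npts; i++) {
		uint64_t lo = pts[i], hi = pts[i + 1];

		if (lo == hi || hi <= cursor)
			continue;
		if (lo > cursor || !span_is_free(map, n, lo, hi))
			break; /* a gap in the map or a reserved piece ends the run */
		cursor = hi;
	}
	free(pts);
	return cursor - floor;
}
```

In the test map, a reserved 1 MB region at 32 MB that overlaps a RAM entry cuts the usable run above 16 MB from 240 MB down to 16 MB.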

>
> Do you have any alternatives that allow maintaining compatibility while
> giving us finer-grained alignment? It seems it should be possible,
> since alignment was lower than 16 MB for years before this change was
> introduced...
>

Basically, you end up having to have a "real alignment" that is internal
to the kernel. We already expose a "minimum alignment" field in the
header (the legacy field is now "recommended alignment"); however, the
"minimum alignment" is really too aggressive.

Since this can be buried in the kernel itself the key is to not change
the existing header fields.

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2011-05-26 20:02:21

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:

[..]
> ==============================================================
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 880fcb6..999ea82 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1548,8 +1548,8 @@ config PHYSICAL_START
> If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
> bzImage will decompress itself to above physical address and
> run from there. Otherwise, bzImage will run from the address where
> - it has been loaded by the boot loader and will ignore above physical
> - address.
> + it has been loaded by the boot loader, using the above physical
> + address as a lower bound.
>
> In normal kdump cases one does not have to set/change this option
> as now bzImage can be compiled as a completely relocatable image
> @@ -1595,7 +1595,31 @@ config RELOCATABLE
>
> Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
> it has been loaded at and the compile time physical address
> - (CONFIG_PHYSICAL_START) is ignored.
> + (CONFIG_PHYSICAL_START) is solely used as a lower bound.
> +

This does not sound too good: it overloads the definition of
PHYSICAL_START with a minimum address. The very definition of a
relocatable kernel is that it should be able to run from the physical
address it has been loaded at (subject to alignment constraints).

So I don't think overloading the CONFIG_PHYSICAL_START definition is a
good idea. In fact there is no reason why kdump kernels should not run
and boot below 16MB, so preventing those kernels from loading and
running below 16MB does not sound like a good option to me.

Also, randomizing the kernel load address at run time will probably
cause some issues with the crashkernel=X@Y address syntax. So far the
user knew what address the first kernel boots from and could specify
where to reserve memory. Now it might happen that the user asks for
some memory to be reserved and the kernel decides to occupy that
space, resulting in a failed memory reservation for the crash kernel.

Thanks
Vivek

2011-05-26 20:07:10

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, 2011-05-26 at 16:01 -0400, Vivek Goyal wrote:
> On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
>
> [..]
> > ==============================================================
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 880fcb6..999ea82 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1548,8 +1548,8 @@ config PHYSICAL_START
> > If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
> > bzImage will decompress itself to above physical address and
> > run from there. Otherwise, bzImage will run from the address where
> > - it has been loaded by the boot loader and will ignore above physical
> > - address.
> > + it has been loaded by the boot loader, using the above physical
> > + address as a lower bound.
> >
> > In normal kdump cases one does not have to set/change this option
> > as now bzImage can be compiled as a completely relocatable image
> > @@ -1595,7 +1595,31 @@ config RELOCATABLE
> >
> > Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
> > it has been loaded at and the compile time physical address
> > - (CONFIG_PHYSICAL_START) is ignored.
> > + (CONFIG_PHYSICAL_START) is solely used as a lower bound.
> > +
>
> This does not sound too good: it overloads the definition of
> PHYSICAL_START with a minimum address. The very definition of a
> relocatable kernel is that it should be able to run from the physical
> address it has been loaded at (subject to alignment constraints).
>
> So I don't think overloading the CONFIG_PHYSICAL_START definition is a
> good idea. In fact there is no reason why kdump kernels should not run
> and boot below 16MB, so preventing those kernels from loading and
> running below 16MB does not sound like a good option to me.
>

I'm going to revisit this part of the patch and think of a better way to
do this.

> Also, randomizing the kernel load address at run time will probably
> cause some issues with the crashkernel=X@Y address syntax. So far the
> user knew what address the first kernel boots from and could specify
> where to reserve memory. Now it might happen that the user asks for
> some memory to be reserved and the kernel decides to occupy that
> space, resulting in a failed memory reservation for the crash kernel.
>

Ok, added to the list of things to figure out. Thanks.

> Thanks
> Vivek

2011-05-26 20:17:10

by Valdis Klētnieks

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, 26 May 2011 16:01:21 EDT, Vivek Goyal said:

> Also, randomizing the kernel load address at run time will probably
> cause some issues with the crashkernel=X@Y address syntax. So far the
> user knew what address the first kernel boots from and could specify
> where to reserve memory. Now it might happen that the user asks for
> some memory to be reserved and the kernel decides to occupy that
> space, resulting in a failed memory reservation for the crash kernel.

That is however fixable - the randomizer just needs to make sure it doesn't
overlay the crashkernel= space, and the crashkernel needs to be started with a
'norandomize' parameter. If your threat model includes attacks on the
crashkernel that randomizing will help with, you got bigger problems. ;)
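The fix outlined above amounts to excluding the crashkernel reservation from the set of candidate load addresses before choosing one. A minimal sketch, assuming the crashkernel=X@Y range has already been parsed into [rlo, rhi) and that the caller supplies the entropy; `pick_slot()` is a hypothetical name, and the two linear passes are chosen for clarity rather than speed.

```c
#include <stdint.h>

/*
 * Hypothetical slot picker: enumerate `align`-aligned bases for a
 * `size`-byte image inside [lo, hi), skip any that overlap the reserved
 * crashkernel range [rlo, rhi), and choose one using `entropy`.
 * Returns 0 when no slot fits (lo is assumed nonzero, e.g. 16 MB).
 */
static uint64_t pick_slot(uint64_t lo, uint64_t hi, uint64_t size, uint64_t align,
			  uint64_t rlo, uint64_t rhi, uint64_t entropy)
{
	uint64_t first = (lo + align - 1) & ~(align - 1);
	uint64_t count = 0, k;

	/* First pass: count the slots that do not touch the reservation. */
	for (uint64_t b = first; b + size <= hi; b += align)
		if (!(b < rhi && b + size > rlo))
			count++;
	if (count == 0)
		return 0;

	/* Second pass: return the k-th usable slot. */
	k = entropy % count;
	for (uint64_t b = first; b + size <= hi; b += align)
		if (!(b < rhi && b + size > rlo) && k-- == 0)
			return b;
	return 0; /* not reached */
}
```

With a 16-64 MB window, an 8 MB image, 2 MB alignment and crashkernel=16M@32M, ten candidate bases remain (16-24 MB and 48-56 MB). Taking the entropy modulo the slot count introduces a slight bias, which a real implementation would want to consider.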

2011-05-26 20:32:26

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, May 26, 2011 at 04:16:05PM -0400, [email protected] wrote:
> On Thu, 26 May 2011 16:01:21 EDT, Vivek Goyal said:
>
> > Also, randomizing the kernel load address at run time will probably
> > cause some issues with the crashkernel=X@Y address syntax. So far the
> > user knew what address the first kernel boots from and could specify
> > where to reserve memory. Now it might happen that the user asks for
> > some memory to be reserved and the kernel decides to occupy that
> > space, resulting in a failed memory reservation for the crash kernel.
>
> That is however fixable - the randomizer just needs to make sure it doesn't
> overlay the crashkernel= space, and the crashkernel needs to be started with a
> 'norandomize' parameter.

That can be done, but at the same time, if the kernel does not find
any suitable range to boot from, it should override the crashkernel=X@Y
setting and let the crash memory reservation fail.

I guess with randomized placement, a more suitable crash kernel command
line will be crashkernel=X, where the kernel decides the base address
for the second kernel depending on availability.

> If your threat model includes attacks on the
> crashkernel that randomizing will help with, you got bigger problems. ;)
>

:-) I think norandomize for kdump kernel should be just fine.

Thanks
Vivek

2011-05-26 20:35:23

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
> This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> which the kernel is decompressed at boot as a security feature that
> deters exploit attempts relying on knowledge of the location of kernel
> internals. The default values of the kptr_restrict and dmesg_restrict
> sysctls are set to (1) when this is enabled, since hiding kernel
> pointers is necessary to preserve the secrecy of the randomized base
> address.

What happens to the /proc/iomem interface, which gives us the physical
memory location where the kernel is loaded? kexec-tools relies on that
interface heavily, so we cannot take it away. And if we cannot take it
away, then I think somebody will easily be able to calculate this
randomized base address.
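The concern is easy to see in practice: each /proc/iomem line has the form "start-end : name", and the kernel's own text is listed under "Kernel code". A minimal sketch of extracting that start address from a single line; the function name and the sample addresses in the test are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line such as "  01000000-015fffff : Kernel code".
 * Returns 1 and stores the start address if the line names the kernel text.
 */
static int kernel_code_start(const char *line, unsigned long long *start)
{
	unsigned long long s, e;
	char name[64];

	if (sscanf(line, " %llx-%llx : %63[^\n]", &s, &e, name) != 3)
		return 0;
	if (strcmp(name, "Kernel code") != 0)
		return 0;
	*start = s;
	return 1;
}
```

Reading the file line by line and applying this check is all it would take to recover the physical load address, which is why this interface needs to be considered alongside the other leaks.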

Thanks
Vivek

2011-05-26 20:39:40

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-24 at 16:31 -0400, Dan Rosenberg wrote:
> This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> which the kernel is decompressed at boot as a security feature that
> deters exploit attempts relying on knowledge of the location of kernel
> internals. The default values of the kptr_restrict and dmesg_restrict
> sysctls are set to (1) when this is enabled, since hiding kernel
> pointers is necessary to preserve the secrecy of the randomized base
> address.
>
> This feature also uses a fixed mapping to move the IDT (if not already
> done as a fix for the F00F bug), to avoid exposing the location of
> kernel internals relative to the original IDT. This has the additional
> security benefit of marking the new virtual address of the IDT
> read-only.
>
> Entropy is generated using the RDRAND instruction if it is supported. If
> not, then RDTSC is used, if supported. If neither RDRAND nor RDTSC are
> supported, then no randomness is introduced. Support for the CPUID
> instruction is required to check for the availability of these two
> instructions.
>
> Thanks to everyone who contributed helpful suggestions and feedback so
> far.
>

I wanted to send out an update email that consolidated the feedback and
suggestions I've received so far:

1. I'm nearly finished with a first draft of code to parse the BIOS E820
memory map to determine where it's safe to place the randomized kernel.
This code accounts for overlapping regions, as well as potential
conflicts in region types (free vs. reserved, etc.), in favor of
non-free types. The end result is that I'll have a reasonable upper bound.

2. I'll parse the kernel command line for crashkernel arguments and
avoid placing a randomized kernel in any regions marked as reserved. A
new command line argument for kdump might be a good idea as well
(discussion on-going).

3. I'll be introducing a new format specifier (perhaps %pk) that
unconditionally de-randomizes kernel pointers, and switch callers where
appropriate.

4. The perf call chains that rely on kernel pointers will account for
the randomization.

5. I'll be switching to per-cpu IDTs, basing my work on the following
patch:

http://marc.info/?l=linux-kernel&m=112767117501231&w=2

Any review or comments on the above patch would be helpful. I'm
considering submitting this portion separately, as it may provide
performance and scalability benefits regardless of randomization.

6. As per H. Peter Anvin's suggestion, it seems there's some demand for
a way to opt-out at the boot-loader level, possibly via a command-line
option and boot protocol flag.

7. Still need to figure out exactly what's ok and what's not regarding
altering alignment and PHYSICAL_START. It seems there's some consensus
on "don't do it", but perhaps it's ok to partially ignore the alignment
config at runtime in favor of hard-coded, known-safe, finer-grained
alternatives.

8. Other pieces of feedback, such as comment suggestions, changes to
kptr_restrict/dmesg_restrict defaults, etc. have been incorporated.

9. x86-64 will present its own set of challenges. One thing at a time.

Thanks for all the feedback and guidance so far. Let me know if
anything above is objectionable, or if you have any more suggestions.
There's lots to do, but I haven't given up yet. :)

Regards,
Dan
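Dan's items 1 and 2 boil down to a placement check: a candidate range must sit in usable RAM, must not touch any region of another type, and must stay clear of the crashkernel reservation. The following is a minimal user-space C sketch of that check, written so it can be tested outside the kernel. Everything in it is hypothetical: the struct e820_entry layout, range_is_safe(), pick_load_addr(), and the assumption that the crashkernel range has already been parsed into ck_start/ck_size. Only type 1 (usable RAM) counts as free; any overlap with another type makes the range unsafe, mirroring the "favor non-free types" rule.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1 /* usable RAM; every other type is treated as off-limits */

/* Hypothetical, simplified memory-map entry. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Do the half-open ranges [a, a_end) and [b, b_end) overlap? */
static int overlaps(uint64_t a, uint64_t a_end, uint64_t b, uint64_t b_end)
{
    return a < b_end && b < a_end;
}

/* Safe if [start, start + size) lies inside one RAM entry, overlaps no
 * non-RAM entry (conflicts resolve in favor of the non-free type), and
 * avoids the crashkernel reservation. Hypothetical helper. */
int range_is_safe(const struct e820_entry *map, size_t n,
                  uint64_t start, uint64_t size,
                  uint64_t ck_start, uint64_t ck_size)
{
    uint64_t end = start + size;
    int in_ram = 0;

    if (ck_size && overlaps(start, end, ck_start, ck_start + ck_size))
        return 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t e_end = map[i].addr + map[i].size;

        if (map[i].type != E820_RAM) {
            if (overlaps(start, end, map[i].addr, e_end))
                return 0;
        } else if (start >= map[i].addr && end <= e_end) {
            in_ram = 1;
        }
    }
    return in_ram;
}

/* Walk the align-sized slots in [min, max) starting at a random index and
 * wrapping around; return the first safe one, or 0 if none fits.
 * min is assumed to be aligned already. Hypothetical helper. */
uint64_t pick_load_addr(const struct e820_entry *map, size_t n,
                        uint64_t image_size, uint64_t align,
                        uint64_t min, uint64_t max,
                        uint64_t ck_start, uint64_t ck_size, uint64_t rnd)
{
    if (align == 0 || max <= min || max - min < image_size)
        return 0;

    uint64_t slots = (max - min - image_size) / align + 1;

    for (uint64_t i = 0; i < slots; i++) {
        uint64_t addr = min + ((rnd % slots + i) % slots) * align;

        if (range_is_safe(map, n, addr, image_size, ck_start, ck_size))
            return addr;
    }
    return 0;
}
```

With the numbers used in this thread (2mb alignment, 16mb minimum, 64mb ceiling) a caller would pass align = 0x200000, min = 0x1000000 and max = 0x4000000. Stepping forward from a random slot to the first safe one is simple, but it slightly favors slots that sit right after an unusable run; counting the safe slots first and choosing one uniformly would avoid that bias. Requiring the image to fit inside a single RAM entry is also conservative when firmware splits RAM into adjacent entries.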

2011-05-26 20:41:16

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, May 26, 2011 at 04:35:02PM -0400, Vivek Goyal wrote:
> On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
> > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > which the kernel is decompressed at boot as a security feature that
> > deters exploit attempts relying on knowledge of the location of kernel
> > internals. The default values of the kptr_restrict and dmesg_restrict
> > sysctls are set to (1) when this is enabled, since hiding kernel
> > pointers is necessary to preserve the secrecy of the randomized base
> > address.
>
> What happens to /proc/iomem interface which gives us the physical memory
> location where kernel is loaded. kexec-tools relies on that interface
> heavily so we can not take it away. And if we can not take it away then
> I think somebody should easily be able to calculate this randomized
> base address.

Resending this mail as in last message I got the email address of Dan
wrong and mail bounced. Sorry about that.

Thanks
Vivek

2011-05-26 20:44:45

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, 2011-05-26 at 16:40 -0400, Vivek Goyal wrote:
> On Thu, May 26, 2011 at 04:35:02PM -0400, Vivek Goyal wrote:
> > On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
> > > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > > which the kernel is decompressed at boot as a security feature that
> > > deters exploit attempts relying on knowledge of the location of kernel
> > > internals. The default values of the kptr_restrict and dmesg_restrict
> > > sysctls are set to (1) when this is enabled, since hiding kernel
> > > pointers is necessary to preserve the secrecy of the randomized base
> > > address.
> >
> > What happens to /proc/iomem interface which gives us the physical memory
> > location where kernel is loaded. kexec-tools relies on that interface
> > heavily so we can not take it away. And if we can not take it away then
> > I think somebody should easily be able to calculate this randomized
> > base address.

Is it common to run kexec-tools as non-root? It may be necessary to
restrict this interface to root when randomization is used (keep in mind
nobody's going to force you to turn this on by default, at least for the
foreseeable future).

-Dan

>
> Thanks
> Vivek

2011-05-26 20:56:18

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, May 26, 2011 at 04:44:34PM -0400, Dan Rosenberg wrote:
> On Thu, 2011-05-26 at 16:40 -0400, Vivek Goyal wrote:
> > On Thu, May 26, 2011 at 04:35:02PM -0400, Vivek Goyal wrote:
> > > On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
> > > > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > > > which the kernel is decompressed at boot as a security feature that
> > > > deters exploit attempts relying on knowledge of the location of kernel
> > > > internals. The default values of the kptr_restrict and dmesg_restrict
> > > > sysctls are set to (1) when this is enabled, since hiding kernel
> > > > pointers is necessary to preserve the secrecy of the randomized base
> > > > address.
> > >
> > > What happens to /proc/iomem interface which gives us the physical memory
> > > location where kernel is loaded. kexec-tools relies on that interface
> > > heavily so we can not take it away. And if we can not take it away then
> > > I think somebody should easily be able to calculate this randomized
> > > base address.
>
> Is it common to run kexec-tools as non-root? It may be necessary to
> restrict this interface to root when randomization is used (keep in mind
> nobody's going to force you to turn this on by default, at least for the
> foreseeable future).

kexec-tools runs as root. And I see that /proc/iomem permissions are
also for root only. So it probably is a non-issue.

Thanks
Vivek

2011-05-26 22:18:12

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tuesday, May 24, 2011, Dan Rosenberg wrote:
> This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> which the kernel is decompressed at boot as a security feature that
> deters exploit attempts relying on knowledge of the location of kernel
> internals. The default values of the kptr_restrict and dmesg_restrict
> sysctls are set to (1) when this is enabled, since hiding kernel
> pointers is necessary to preserve the secrecy of the randomized base
> address.
>
> This feature also uses a fixed mapping to move the IDT (if not already
> done as a fix for the F00F bug), to avoid exposing the location of
> kernel internals relative to the original IDT. This has the additional
> security benefit of marking the new virtual address of the IDT
> read-only.
>
> Entropy is generated using the RDRAND instruction if it is supported. If
> not, then RDTSC is used, if supported. If neither RDRAND nor RDTSC are
> supported, then no randomness is introduced. Support for the CPUID
> instruction is required to check for the availability of these two
> instructions.
>
> Thanks to everyone who contributed helpful suggestions and feedback so
> far.
>
> Comments/Questions:
>
> * Since RDRAND is relatively new, only the most recent version of
> binutils supports assembling it. To avoid breaking builds for people
> who use older toolchains but want this feature, I hardcoded the opcodes.
> If anyone has a better approach, please let me know.
>
> * I chose to mimic the F00F bugfix behavior for moving the IDT, since it
> required very little code and has the additional benefit of making the
> IDT read-only. Ingo Molnar's suggestion of allocating per-cpu IDTs
> instead is still on the table, and I'd like to get feedback on this.
>
> * In order to increase the entropy for the randomized base, I changed
> the default value of CONFIG_PHYSICAL_ALIGN back to 2mb. It had
> previously been raised to 16mb as a hack so that relocatable kernels
> wouldn't load below that minimum. I address this by changing the
> meaning of CONFIG_PHYSICAL_START such that it now represents a minimum
> address that relocatable kernels can be loaded at (rather than being
> ignored by relocatable kernels). So, if a relocatable kernel determines
> it should be loaded at an address below CONFIG_PHYSICAL_START (which
> defaults to 16mb), I just bump it up.
>
> * I would appreciate guidance on safe values for the highest addresses
> we can safely load the kernel at, on both 32-bit and 64-bit. This
> version uses 64mb (0x4000000) for 32-bit, and worked well in testing.
>
> * CONFIG_RANDOMIZE_BASE automatically sets the default value of
> kptr_restrict and dmesg_restrict to 1, since it's nonsensical to use
> this without the other two. I considered removing
> CONFIG_SECURITY_DMESG_RESTRICT altogether (it currently sets the default
> value for dmesg_restrict), but just in case distros want to keep the
> CONFIG as a toggle switch but don't want to use CONFIG_RANDOMIZE_BASE, I
> kept it around. So, now CONFIG_RANDOMIZE_BASE sets the default value
> for CONFIG_SECURITY_DMESG_RESTRICT.
>
> * x86-64 is still "to-do". Because it calculates the kernel text address
> twice, this may be a little trickier.
>
> * Finding a middle ground instead of the current "all-or-nothing"
> behavior of kptr_restrict that allows perf users to use this feature is
> future work.
>
> * Tested by repeatedly booting and observing kallsyms output on both
> i386. Passed the "looks random to me" test, and saw no bad behavior.
> Tested that changing CONFIG_PHYSICAL_ALIGN to 2mb still boots and runs
> fine on amd64.
>
> * Is it worth bothering to look for alternate sources of entropy if
> RDTSC isn't available?
>
> * Could use testing of CPU hotplugging and suspend/resume.

Well, as far as I can tell, this feature is going to break hibernation on
both x86_32 and x86_64 at the moment, unless you can guarantee that the
randomized kernel location will be the same for both the boot and the target
kernels.

It may be worked around on x86_64 relatively easily, I think, but other
architectures (including the 32-bit x86) would require much more intrusive
modifications to work with that feature.

Thanks,
Rafael
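The patch notes quoted above describe three mechanical pieces: the RDRAND, then RDTSC, then nothing fallback; the hardcoded RDRAND opcode for older binutils; and rounding to CONFIG_PHYSICAL_ALIGN with a bump up to CONFIG_PHYSICAL_START. The following is a rough user-space C sketch of those pieces, not the patch's code. The enum and function names are hypothetical, the CPUID leaf 1 register values are passed in rather than queried, and max_addr is assumed to be an exclusive bound on the load address itself (the notes leave open whether 64mb bounds the start or the end of the image).

```c
#include <stdint.h>

/* CPUID leaf 1 feature bits: ECX bit 30 = RDRAND, EDX bit 4 = TSC. */
#define CPUID1_ECX_RDRAND (1u << 30)
#define CPUID1_EDX_TSC    (1u << 4)

/* Hypothetical names for the possible entropy sources. */
enum entropy_src { ENTROPY_NONE, ENTROPY_RDTSC, ENTROPY_RDRAND };

/* Fallback order from the patch notes: RDRAND, then RDTSC, else nothing.
 * Without CPUID neither instruction can be detected. */
enum entropy_src pick_entropy_source(int has_cpuid, uint32_t ecx, uint32_t edx)
{
    if (!has_cpuid)
        return ENTROPY_NONE;
    if (ecx & CPUID1_ECX_RDRAND)
        return ENTROPY_RDRAND;
    if (edx & CPUID1_EDX_TSC)
        return ENTROPY_RDTSC;
    return ENTROPY_NONE;
}

#if defined(__i386__) || defined(__x86_64__)
/* RDRAND %eax emitted as raw bytes (0F C7 F0) so that assemblers without
 * RDRAND support can still build it; CF=1 means a valid value came back.
 * Only call this after pick_entropy_source() reported ENTROPY_RDRAND. */
int rdrand32(uint32_t *out)
{
    uint32_t val;
    unsigned char ok;

    __asm__ volatile(".byte 0x0f, 0xc7, 0xf0\n\t"
                     "setc %1"
                     : "=a"(val), "=qm"(ok)
                     :
                     : "cc");
    *out = val;
    return ok;
}
#endif

/* Turn raw entropy into a load address: reduce below max_addr (non-zero),
 * round down to align (a power of two, e.g. CONFIG_PHYSICAL_ALIGN), and
 * bump anything below phys_start (CONFIG_PHYSICAL_START) up to it. */
uint64_t randomized_base(uint64_t rnd, uint64_t align,
                         uint64_t phys_start, uint64_t max_addr)
{
    uint64_t addr = (rnd % max_addr) & ~(align - 1);

    if (addr < phys_start)
        addr = phys_start;
    return addr;
}
```

One consequence of "just bump it up": with 2mb alignment, a 16mb minimum and a 64mb ceiling, 8 of the 32 slots lie below the minimum, so with uniform input the 16mb slot comes out about nine times as often as any other. Mapping the entropy directly into [phys_start, max_addr) would keep the distribution flat.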

2011-05-26 22:33:39

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/26/2011 03:18 PM, Rafael J. Wysocki wrote:
>
> Well, as far as I can tell, this feature is going to break hibernation on
> both x86_32 and x86_64 at the moment, unless you can guarantee that the
> randomized kernel location will be the same for both the boot and the target
> kernels.
>

Obviously we can't and we don't. I'm a bit surprised at that
constraint... how can that constraint not break things like kernels of
slightly different size?

-hpa

2011-05-27 00:26:14

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, 2011-05-26 at 15:32 -0700, H. Peter Anvin wrote:
> On 05/26/2011 03:18 PM, Rafael J. Wysocki wrote:
> >
> > Well, as far as I can tell, this feature is going to break hibernation on
> > both x86_32 and x86_64 at the moment, unless you can guarantee that the
> > randomized kernel location will be the same for both the boot and the target
> > kernels.
> >
>
> Obviously we can't and we don't. I'm a bit surprised at that
> constraint... how can that constraint not break things like kernels of
> slightly different size?
>
> -hpa

Am I understanding it correctly that hibernation is currently operating
under a possibly false assumption? If it's the case that hibernation
should be saving the physical address at which the kernel was previously
loaded and restoring it there regardless of randomization, it would
certainly help me out if someone familiar with the code could take a
stab at that.

Otherwise, any thoughts on a potential solution?

Thanks,
Dan

2011-05-27 02:46:48

by Dave Jones

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, May 26, 2011 at 03:32:13PM -0700, H. Peter Anvin wrote:
> On 05/26/2011 03:18 PM, Rafael J. Wysocki wrote:
> >
> > Well, as far as I can tell, this feature is going to break hibernation on
> > both x86_32 and x86_64 at the moment, unless you can guarantee that the
> > randomized kernel location will be the same for both the boot and the target
> > kernels.
> >
>
> Obviously we can't and we don't. I'm a bit surprised at that
> constraint... how can that constraint not break things like kernels of
> slightly different size?

In Fedora at least, we make sure the kernel you thaw from is the same one
you booted by diddling with grub to force the right kernel to be booted.
By default, you won't see a boot menu, so it'll just do the right thing. You can still
interrupt the boot process, force a boot menu and pick another kernel
of course, and we used to at least have safeguards in place that would
refuse to thaw an image from a different kernel. (This may or may not
be still true since we rewrote the initramfs tools)

Dave

2011-05-27 07:15:46

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> 5. I'll be switching to per-cpu IDTs, basing my work on the
> following patch:
>
> http://marc.info/?l=linux-kernel&m=112767117501231&w=2
>
> Any review or comments on the above patch would be helpful. I'm
> considering submitting this portion separately, as it may provide
> performance and scalability benefits regardless of randomization.

Yeah.

Note that you do not have to do the MSI thing in Zwane's patch, nor
do i think you need to touch the boot IDT; instead, go for the
easiest route:

There are two main places that set up the IDT:

    trap_init();
    init_IRQ();

The IDT is fully set up at this point and i don't think we change it
later on. So all the fancy changes to set_intr_gate() et al in
Zwane's patch seem unnecessary to me.

Most of the complexity Zwane's patch has comes from the fact that he
tries to use per CPU IDTs to create *asymmetric* IDTs between CPUs -
but we do not want nor need to do that with your patch, which
simplifies things enormously.

Note that both of the above init functions execute only on the boot
CPU, well before SMP is initialized. So it is an easy environment to
work in from an IDT switcheroo POV and we should be able to switch to
the percpu IDT there without much fuss.

Note that setup_per_cpu_areas() is called well before trap_init(), so
at the end of init_IRQ() you can rely on percpu facilities such as
percpu_alloc() as well.

I'd suggest these rough steps to implement it:

- turn off CONFIG_SMP in the .config

- first add the new init function call to the end of arch/x86/'s
init_IRQ(), put the percpu_alloc() into that function, copy
the old IDT into the new IDT (but do not load it!) and boot test
the patch.

At this point you won't have any change to the IDT yet, but you
have tested all the boot CPU init order assumptions: is
percpu_alloc() really available, did you do the copying right,
etc. You might want to print-dump the new IDT in hexdump format
and check whether it looks like an IDT you'd like the CPU to load.

- then add the one extra line that loads the new IDT into the CPU.

If the kernel does not crash then you will have a randomized UP
kernel that does not leak the randomization secret to user-space
via the SIDT instruction. Test this in user-space, marvel at the
non-kernel-image address you get! :-)

- turn on CONFIG_SMP=y and boot the kernel.

The kernel should not crash: you will have the boot CPU with
the percpu IDT, and all secondary CPUs with the bootup IDT
still referenced. Check via your user-space SIDT test-code
and:

    taskset -c 0 ./test-sidt
    taskset -c 1 ./test-sidt
    taskset -c 2 ./test-sidt

that CPU#0 indeed has a different IDT address from all the other
CPUs. Marvel at the incomplete but still fully working IDT setup!
:-)

- Figure out where a new secondary CPU loads the boot IDT. Figure
out where it sets up its percpu area. Find the spot where both
facilities are available already and add the percpu_alloc()+copy
routine to it. Do the hex printout and boot the kernel - do the
dumped IDTs look sane visually?

- If they looked fine then add the one extra line that loads the
new IDT into the secondary CPU(s). Boot and check the IDTs:

    taskset -c 0 ./test-sidt
    taskset -c 1 ./test-sidt
    taskset -c 2 ./test-sidt

Now you should have different results on all different CPUs!
Marvel at having completed the patch!

- Please check whether the IDT has alignment requirements: we could
actually benefit from coloring the percpu IDTs a bit, as each
hyperthread (and core) has a separate IDT so we can spread out any
cache and RAM accesses a bit better amongst the cache/memory
ports.

- Please check how fast SIDT is, how many cycles does it take? If
it's faster than CPUID then you have also created another nice
scalability feature: a user-space instruction that emits the
current CPU ID! [we could encode the CPU ID in the address - this
will also give us the cache coloring.]

Note that using the percpu area will also avoid the 4K mapping TLB
problem Linus referred to: the percpu area is mapped in a 2MB data
TLB.

What this stage won't allow yet is a read-only IDT. That should be yet
another patch on top of this: the percpu IDT will already allow the
protection of the kernel image randomization secret.

The read-only IDT will bring in the 4K TLB cost but maybe that's
acceptable (because the security advantages of a read-only IDT are
real). It will be a relatively easy patch on top of the percpu IDT
patch: where you load the percpu IDT into the CPU with the LIDT
instruction, you'd first fixmap it into a readonly page:

__set_fixmap(FIX_IDT, __pa(percpu_idt_ptr), PAGE_KERNEL_RO);

And use __fix_to_virt(FIX_IDT) as the load_IDT() address.

If you do it as two patches on top of each other i'll try to figure
out a way to measure the performance impact of the readonly IDT via
perf. It won't be easy as the expected effect is very, very small.

Thanks,

Ingo
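The steps above rely on a ./test-sidt helper that the thread does not show. One possible sketch in C: read_idt_base() executes SIDT and returns the IDT base of the CPU it runs on, so a tiny program that prints this value in hex, run under taskset -c N for each CPU, gives the per-CPU comparison described above. The idt_addr_for_cpu()/idt_cpu_from_base() pair illustrates the "encode the CPU ID in the address" idea under an assumed layout, with per-CPU IDTs packed stride bytes apart in one region; both names and the layout are hypothetical. On CPUs where the kernel enables UMIP, a user-mode SIDT may fault or return a kernel-emulated value instead of the real base.

```c
#include <stdint.h>

/* What SIDT stores: a 16-bit limit followed by the IDT's linear base
 * address (4 bytes on i386, 8 bytes on x86-64), with no padding. */
struct idt_desc {
    uint16_t limit;
    uintptr_t base;
} __attribute__((packed));

/* IDT base as seen by the CPU this thread is running on; 0 on non-x86. */
uintptr_t read_idt_base(void)
{
#if defined(__i386__) || defined(__x86_64__)
    struct idt_desc d = { 0, 0 };

    __asm__ volatile("sidt %0" : "=m"(d));
    return d.base;
#else
    return 0;
#endif
}

/* Hypothetical layout: CPU n's IDT lives at region + n * stride, where
 * stride is the IDT size plus a cache-coloring offset. */
uintptr_t idt_addr_for_cpu(uintptr_t region, unsigned int cpu, uintptr_t stride)
{
    return region + (uintptr_t)cpu * stride;
}

/* Inverse of idt_addr_for_cpu(): recover the CPU number from an SIDT result. */
unsigned int idt_cpu_from_base(uintptr_t base, uintptr_t region, uintptr_t stride)
{
    return (unsigned int)((base - region) / stride);
}
```

A stride of 0x1040 would correspond to a 4096-byte x86-64 IDT (256 gates of 16 bytes each) plus one 64-byte cache line of coloring; that figure is an assumption for illustration, pending the alignment check Ingo asks for.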

2011-05-27 09:37:00

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Vivek Goyal <[email protected]> wrote:

> On Thu, May 26, 2011 at 04:16:05PM -0400, [email protected] wrote:
> > On Thu, 26 May 2011 16:01:21 EDT, Vivek Goyal said:
> >
> > > Also randomization of kernel load address at run time will probably have
> > > some issues with crashkernel=X@Y address syntax. So far user knew what
> > > address first kernel is booting from and user could specify where to
> > > reserve memory. Now it might happen that user specified some memory
> > > to reserve and kernel decided to occupy that space resulting in failed
> > > memory reservation for crash kernel.
> >
> > That is however fixable - the randomizer just needs to make sure it doesn't
> > overlay the crashkernel= space, and the crashkernel needs to be started with a
> > 'norandomize' parameter.
>
> That can be done but at the same time if kernel does not find any suitable
> range to boot from, it should override crashkernel=X@Y settings and fail
> crash memory reservation.
>
> I guess with randomize space thing a more suitable crash kernel command
> line will be crashkernel=X where kernel decides the base address for
> second kernel depending on availability.
>
> > If your threat model includes attacks on the
> > crashkernel that randomizing will help with, you got bigger problems. ;)
> >
>
> :-) I think norandomize for kdump kernel should be just fine.

Dan, please always generate a very clear printk when randomization is
off - if we implement everything correctly then it will be impossible
for even the admin to determine whether there's kernel image
randomization going on on a system! :-)

Btw., systems with signed modules and with an inability for even root
to break into the kernel probably want to disable the pagetable
dumper in debugfs, that will show the exact location of the kernel
image.

(Btw., please also check that unprivileged users cannot read that
file.)

Thanks,

Ingo

2011-05-27 09:39:12

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Vivek Goyal <[email protected]> wrote:

> > Is it common to run kexec-tools as non-root? It may be necessary
> > to restrict this interface to root when randomization is used
> > (keep in mind nobody's going to force you to turn this on by
> > default, at least for the foreseeable future).
>
> kexec-tools runs as root. And I see that /proc/iomem permissions
> are also for root only. So it probably is a non-issue.

it might be an issue to keep in mind for later projects that try to
lock down root itself from being able to patch the kernel (other than
rebooting the box), using signed modules, disabled direct-ioport
access, and other hardened facilities.

Thanks,

Ingo

2011-05-27 09:41:03

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dave Jones <[email protected]> wrote:

> On Thu, May 26, 2011 at 03:32:13PM -0700, H. Peter Anvin wrote:
> > On 05/26/2011 03:18 PM, Rafael J. Wysocki wrote:
> > >
> > > Well, as far as I can tell, this feature is going to break hibernation on
> > > both x86_32 and x86_64 at the moment, unless you can guarantee that the
> > > randomized kernel location will be the same for both the boot and the target
> > > kernels.
> > >
> >
> > Obviously we can't and we don't. I'm a bit surprised at that
> > constraint... how can that constraint not break things like kernels of
> > slightly different size?
>
> In Fedora at least, we make sure the kernel you thaw from is the
> same one you booted by diddling with grub to force the right kernel
> to be booted.

Btw., the hibernation code should save a signature and make sure that
the two kernels match! It's really broken if the code allows blind
thawing ...

Thanks,

Ingo
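Ingo's suggestion can be sketched as a signature written into the hibernation image header and checked before thawing. With randomization, the physical load address is a natural second field, because the thread has not yet settled whether restore should refuse or relocate when it differs. The struct, the hib_* function names and the choice of FNV-1a as the hash are all hypothetical; any stable identifier of the kernel build would serve.

```c
#include <stdint.h>

/* Hypothetical signature stored in the hibernation image header. */
struct hib_sig {
    uint64_t kernel_hash; /* hash of a build identifier, e.g. the version string */
    uint64_t phys_base;   /* physical address the kernel ran at */
};

/* 64-bit FNV-1a over a NUL-terminated string. */
uint64_t fnv1a64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 0x100000001b3ULL;
    }
    return h;
}

struct hib_sig hib_make_sig(const char *build_id, uint64_t phys_base)
{
    struct hib_sig sig = { fnv1a64(build_id), phys_base };
    return sig;
}

/*  0: same kernel at the same base, safe to thaw.
 * -1: different kernel, refuse to thaw.
 * -2: same kernel but loaded at a different (randomized) base. */
int hib_check_sig(const struct hib_sig *saved, const struct hib_sig *now)
{
    if (saved->kernel_hash != now->kernel_hash)
        return -1;
    if (saved->phys_base != now->phys_base)
        return -2;
    return 0;
}
```

Keeping the two failure cases distinct leaves room for either policy on -2: refuse, or (if the restore path learns to cope) relocate.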

2011-05-27 13:08:45

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 11:38:53AM +0200, Ingo Molnar wrote:
>
> * Vivek Goyal <[email protected]> wrote:
>
> > > Is it common to run kexec-tools as non-root? It may be necessary
> > > to restrict this interface to root when randomization is used
> > > (keep in mind nobody's going to force you to turn this on by
> > > default, at least for the foreseeable future).
> >
> > kexec-tools runs as root. And I see that /proc/iomem permissions
> > are also for root only. So it probably is a non-issue.
>
> it might be an issue to keep in mind for later projects that try to
> lock down root itself from being able to patch the kernel (other than
> rebooting the box), using signed modules, disabled direct-ioport
> access, and other hardened facilities.

For such environments, Eric Paris had posted a patch to be able to
disable loading of kexec/kdump kernel, similar to disabling module loading.

https://lkml.org/lkml/2011/1/19/412

I don't see that in Linus's tree. So looks like it never got committed.

Thanks
Vivek

2011-05-27 13:13:41

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, May 26, 2011 at 04:44:34PM -0400, Dan Rosenberg wrote:
> On Thu, 2011-05-26 at 16:40 -0400, Vivek Goyal wrote:
> > On Thu, May 26, 2011 at 04:35:02PM -0400, Vivek Goyal wrote:
> > > On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
> > > > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > > > which the kernel is decompressed at boot as a security feature that
> > > > deters exploit attempts relying on knowledge of the location of kernel
> > > > internals. The default values of the kptr_restrict and dmesg_restrict
> > > > sysctls are set to (1) when this is enabled, since hiding kernel
> > > > pointers is necessary to preserve the secrecy of the randomized base
> > > > address.
> > >
> > > What happens to /proc/iomem interface which gives us the physical memory
> > > location where kernel is loaded. kexec-tools relies on that interface
> > > heavily so we can not take it away. And if we can not take it away then
> > > I think somebody should easily be able to calculate this randomized
> > > base address.
>
> Is it common to run kexec-tools as non-root? It may be necessary to
> restrict this interface to root when randomization is used (keep in mind
> nobody's going to force you to turn this on by default, at least for the
> foreseeable future).

Dan,

I had a stupid question. /proc/kallsyms is also readable by root only. So
if we are doing this so that non-root user can not know kernel virtual and
physical address that should be already covered as non-root users can't
read /proc/kallsyms or /boot/System.map.

And if this randomization is also to protect information from root user
then /proc/iomem exporting the physical address of kernel is still a
valid question in that context.

Thanks
Vivek

2011-05-27 13:21:46

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, 2011-05-27 at 09:13 -0400, Vivek Goyal wrote:
> On Thu, May 26, 2011 at 04:44:34PM -0400, Dan Rosenberg wrote:
> > On Thu, 2011-05-26 at 16:40 -0400, Vivek Goyal wrote:
> > > On Thu, May 26, 2011 at 04:35:02PM -0400, Vivek Goyal wrote:
> > > > On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
> > > > > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > > > > which the kernel is decompressed at boot as a security feature that
> > > > > deters exploit attempts relying on knowledge of the location of kernel
> > > > > internals. The default values of the kptr_restrict and dmesg_restrict
> > > > > sysctls are set to (1) when this is enabled, since hiding kernel
> > > > > pointers is necessary to preserve the secrecy of the randomized base
> > > > > address.
> > > >
> > > > What happens to /proc/iomem interface which gives us the physical memory
> > > > location where kernel is loaded. kexec-tools relies on that interface
> > > > heavily so we can not take it away. And if we can not take it away then
> > > > I think somebody should easily be able to calculate this randomized
> > > > base address.
> >
> > Is it common to run kexec-tools as non-root? It may be necessary to
> > restrict this interface to root when randomization is used (keep in mind
> > nobody's going to force you to turn this on by default, at least for the
> > foreseeable future).
>
> Dan,
>
> I had a stupid question. /proc/kallsyms is also readable by root only. So
> if we are doing this so that non-root user can not know kernel virtual and
> physical address that should be already covered as non-root users can't
> read /proc/kallsyms or /boot/System.map.
>

Not sure what system you're running, but /proc/kallsyms is 0444 on my
machine (and in mainline, afaik). Likewise for /proc/iomem.

The problem is mainly with distribution kernels - it's trivial to just
grab an identical vmlinux to a target machine and then you instantly
know exactly where everything is.

> And if this randomization is also to protect information from root user
> then /proc/iomem exporting the physical address of kernel is still a
> valid question in that context.
>

I think we can deal with unprivileged users first, and if we want to
truly prevent root from finding this out, we can introduce a separate
toggle that locks things down further.

-Dan

> Thanks
> Vivek

2011-05-27 13:38:20

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Vivek Goyal <[email protected]> wrote:

> On Fri, May 27, 2011 at 11:38:53AM +0200, Ingo Molnar wrote:
> >
> > * Vivek Goyal <[email protected]> wrote:
> >
> > > > Is it common to run kexec-tools as non-root? It may be necessary
> > > > to restrict this interface to root when randomization is used
> > > > (keep in mind nobody's going to force you to turn this on by
> > > > default, at least for the foreseeable future).
> > >
> > > kexec-tools runs as root. And I see that /proc/iomem permissions
> > > are also for root only. So it probably is a non-issue.
> >
> > it might be an issue to keep in mind for later projects that try to
> > lock down root itself from being able to patch the kernel (other than
> > rebooting the box), using signed modules, disabled direct-ioport
> > access, and other hardened facilities.
>
> For such environments, Eric Paris had posted a patch to be able to
> disable loading of kexec/kdump kernel, similar to disabling module
> loading.
>
> https://lkml.org/lkml/2011/1/19/412
>
> I don't see that in Linus's tree. So looks like it never got
> committed.

That patch looks sane enough. Ping akpm about it please?

Thanks,

Ingo

2011-05-27 13:46:35

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> > And if this randomization is also to protect information from
> > root user then /proc/iomem exporting the physical address of
> > kernel is still a valid question in that context.
>
> I think we can deal with unprivileged users first, and if we want
> to truly prevent root from finding this out, we can introduce a
> separate toggle that locks things down further.

Correct, the case of unprivileged users should be handled first and
it should be handled separately from any root-restrictions.

I only raised this to have a rough record of what would have to
happen there.

Once all is said, done, committed and tested (the last two not
necessarily in that order), we can look at any open root-restrict
questions. It's a lot less clear-cut from a system usability POV.

If we do it we probably want one central one-shot 'restrict root from
now on' toggle, not the separate switches that kill kexec and module
loading separately.

Some shops might even want to disable root from being able to reboot
the system and restrict reboots to physically performed (and
crash/panic/hang induced) reboots only.

Some shops might want to make reboots dependent on the provision of a
secret key. That key would not be stored on that system.

So there are lots of details to sort out in the "keep root from being
able to break into the kernel and hide a rootkit out and disappear"
area.

Thanks,

Ingo
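Ingo's "one central one-shot toggle" can be illustrated as a one-way latch that every restricted operation consults. This is a user-space C sketch with hypothetical names (struct root_lockdown, lockdown_write(), lockdown_permits(), enum restricted_op); it shows only the latch semantics, not how a kernel would expose the switch (a sysctl, for instance) or enforce it.

```c
#include <errno.h>

/* Hypothetical operations that a single root-lockdown switch would gate. */
enum restricted_op { OP_MODULE_LOAD, OP_KEXEC_LOAD };

struct root_lockdown {
    int locked; /* 0 = root unrestricted, 1 = restricted until reboot */
};

/* One-shot toggle: writing 1 locks, and once locked it can't be cleared.
 * Returns 0, -EINVAL for values other than 0/1, or -EPERM on an unlock
 * attempt. */
int lockdown_write(struct root_lockdown *ld, int val)
{
    if (val != 0 && val != 1)
        return -EINVAL;
    if (ld->locked && val == 0)
        return -EPERM;
    ld->locked = val;
    return 0;
}

/* Every restricted operation consults the same switch, rather than one
 * knob per facility (kexec, modules, ...). */
int lockdown_permits(const struct root_lockdown *ld, enum restricted_op op)
{
    (void)op;
    return !ld->locked;
}
```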

2011-05-27 13:51:10

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 09:21:32AM -0400, Dan Rosenberg wrote:
> On Fri, 2011-05-27 at 09:13 -0400, Vivek Goyal wrote:
> > On Thu, May 26, 2011 at 04:44:34PM -0400, Dan Rosenberg wrote:
> > > On Thu, 2011-05-26 at 16:40 -0400, Vivek Goyal wrote:
> > > > On Thu, May 26, 2011 at 04:35:02PM -0400, Vivek Goyal wrote:
> > > > > On Tue, May 24, 2011 at 04:31:45PM -0400, Dan Rosenberg wrote:
> > > > > > This introduces CONFIG_RANDOMIZE_BASE, which randomizes the address at
> > > > > > which the kernel is decompressed at boot as a security feature that
> > > > > > deters exploit attempts relying on knowledge of the location of kernel
> > > > > > internals. The default values of the kptr_restrict and dmesg_restrict
> > > > > > sysctls are set to (1) when this is enabled, since hiding kernel
> > > > > > pointers is necessary to preserve the secrecy of the randomized base
> > > > > > address.
> > > > >
> > > > > What happens to /proc/iomem interface which gives us the physical memory
> > > > > location where kernel is loaded. kexec-tools relies on that interface
> > > > > heavily so we can not take it away. And if we can not take it away then
> > > > > I think somebody should easily be able to calculate this randomized
> > > > > base address.
> > >
> > > Is it common to run kexec-tools as non-root? It may be necessary to
> > > restrict this interface to root when randomization is used (keep in mind
> > > nobody's going to force you to turn this on by default, at least for the
> > > foreseeable future).
> >
> > Dan,
> >
> > I had a stupid question. /proc/kallsyms is also readable by root only. So
> > if we are doing this so that non-root user can not know kernel virtual and
> > physical address that should be already covered as non-root users can't
> > read /proc/kallsyms or /boot/System.map.
> >
>
> Not sure what system you're running, but /proc/kallsyms is 0444 on my
> machine (and in mainline, afaik). Likewise for /proc/iomem.

Sorry. I read it wrong. Yes /proc/iomem and /proc/kallsyms are 0444.

>
> The problem is mainly with distribution kernels - it's trivial to just
> grab an identical vmlinux to a target machine and then you instantly
> know exactly where everything is.
>
> > And if this randomization is also to protect information from root user
> > then /proc/iomem exporting the physical address of kernel is still a
> > valid question in that context.
> >
>
> I think we can deal with unprivileged users first, and if we want to
> truly prevent root from finding this out, we can introduce a separate
> toggle that locks things down further.

Ok, given the fact that /proc/iomem is 0444 and it carries the physical
address of kernel, I think it should be easy to calculate the randomized
offset. So I guess we shall have to do something about that too.
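Vivek's point can be made concrete: if the "Kernel code" line in /proc/iomem is world-readable, the randomized offset is simply its start address minus the default load address. A minimal sketch, assuming the x86 default `CONFIG_PHYSICAL_START` of 16 MiB (0x1000000) and assuming the virtual shift follows the physical one; the sample listing is hypothetical.

```python
import re

# Assumption: x86 default CONFIG_PHYSICAL_START of 16 MiB.
DEFAULT_PHYSICAL_START = 0x1000000

_KERNEL_CODE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*Kernel code\s*$")


def kernel_code_range(iomem_text: str) -> tuple[int, int]:
    """Return (start, end) of the 'Kernel code' resource in /proc/iomem-style text."""
    for line in iomem_text.splitlines():
        m = _KERNEL_CODE.match(line)
        if m:
            return int(m.group(1), 16), int(m.group(2), 16)
    raise LookupError("no 'Kernel code' entry found")


def randomization_offset(iomem_text: str,
                         default_start: int = DEFAULT_PHYSICAL_START) -> int:
    """Physical shift of the kernel relative to its default load address."""
    start, _ = kernel_code_range(iomem_text)
    return start - default_start


# Hypothetical listing from a kernel that was shifted up at load time.
SAMPLE = """\
00100000-bdc6ffff : System RAM
  03400000-03abdced : Kernel code
  03abdcee-040a8b7f : Kernel data
"""
print(hex(randomization_offset(SAMPLE)))  # prints 0x2400000
```

An unrandomized kernel whose code starts at 01000000 yields an offset of 0, so the whole secret is exposed by a single world-readable line.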

Thanks
Vivek

2011-05-27 15:43:38

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, May 26, 2011 at 3:18 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> Well, as far as I can tell, this feature is going to break hibernation on
> both x86_32 and x86_64 at the moment, unless you can guarantee that the
> randomized kernel location will be the same for both the boot and the target
> kernels.

You know what? Maybe that guarantee is actually the *right* thing to do..

In other words, maybe we really really shouldn't randomize the kernel
load address at boot time at all.

Instead, what would be much better, is if we just had some way to
re-link distro kernels with some random text offset. Sure, the load
address wouldn't be "random" in any local sense any more, but I think
the real effort here was to avoid having the common distro kernels
having known text addresses.

If you compile your own kernel version, you're already home free, and
load-time randomization is pointless.

And load-time randomization has all these nasty problems with memory
maps etc, because we obviously have to shift the whole kernel around
by some fixed offset. But if there was some way to just re-link the
distro kernel easily, then it could be done by the kernel install
scripts, and it could potentially do more than just "shift up load
address by some random number".

Hmm?

Linus

2011-05-27 16:07:04

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Friday, May 27, 2011, H. Peter Anvin wrote:
> On 05/26/2011 03:18 PM, Rafael J. Wysocki wrote:
> >
> > Well, as far as I can tell, this feature is going to break hibernation on
> > both x86_32 and x86_64 at the moment, unless you can guarantee that the
> > randomized kernel location will be the same for both the boot and the target
> > kernels.
> >
>
> Obviously we can't and we don't. I'm a bit surprised at that
> constraint... how can that constraint not break things like kernels of
> slightly different size?

Our hibernation code generally requires that the kernel used for loading
the image be the same as the hibernated one. This requirement is slightly
lifted for x86_64, but still we don't have a mechanism for passing the
jump address into the hibernated kernel in the image header.

I planned to add that, but then didn't have the time to work on it.

Thanks,
Rafael

2011-05-27 16:11:25

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, 2011-05-27 at 08:42 -0700, Linus Torvalds wrote:
> On Thu, May 26, 2011 at 3:18 PM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > Well, as far as I can tell, this feature is going to break hibernation on
> > both x86_32 and x86_64 at the moment, unless you can guarantee that the
> > randomized kernel location will be the same for both the boot and the target
> > kernels.
>
> You know what? Maybe that guarantee is actually the *right* thing to do..
>
> In other words, maybe we really really shouldn't randomize the kernel
> load address at boot time at all.
>
> Instead, what would be much better, is if we just had some way to
> re-link distro kernels with some random text offset. Sure, the load
> address wouldn't be "random" in any local sense any more, but I think
> the real effort here was to avoid having the common distro kernels
> having known text addresses.
>
> If you compile your own kernel version, you're already home free, and
> load-time randomization is pointless.
>
> And load-time randomization has all these nasty problems with memory
> maps etc, because we obviously have to shift the whole kernel around
> by some fixed offset. But if there was some way to just re-link the
> distro kernel easily, then it could be done by the kernel install
> scripts, and it could potentially do more than just "shift up load
> address by some random number".
>
> Hmm?
>
> Linus

You know what...I'm surprised that I'm saying this, but given the number
of non-trivial challenges that still need to be solved in order to
implement load-time randomization, maybe this would be a better way
forward.

We'd still need to go through the same effort to hide information about
kernel text offsets, and we'd still need to do per-cpu IDTs, but neither
of those items are as challenging as some of the other problems.

I'm not ready to take load-time randomization off the table, but I'd
certainly like to hear more discussion on this. There are clearly
advantages to load-time randomization that this new option wouldn't
have, but the question is really "is what we gain worth the effort?".

Thanks,
Dan

2011-05-27 16:10:36

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Friday, May 27, 2011, Ingo Molnar wrote:
>
> * Dave Jones <[email protected]> wrote:
>
> > On Thu, May 26, 2011 at 03:32:13PM -0700, H. Peter Anvin wrote:
> > > On 05/26/2011 03:18 PM, Rafael J. Wysocki wrote:
> > > >
> > > > Well, as far as I can tell, this feature is going to break hibernation on
> > > > both x86_32 and x86_64 at the moment, unless you can guarantee that the
> > > > randomized kernel location will be the same for both the boot and the target
> > > > kernels.
> > > >
> > >
> > > Obviously we can't and we don't. I'm a bit surprised at that
> > > constraint... how can that constraint not break things like kernels of
> > > slightly different size?
> >
> > In Fedora at least, we make sure the kernel you thaw from is the
> > same one you booted by diddling with grub to force the right kernel
> > to be booted.
>
> Btw., the hibernation code should save a signature and make sure that
> the two kernels match! It's really broken if the code allows blind
> thawing ...

It uses signatures, but on x86_64 you actually can use a different kernel
for loading the image, with some limitations.

I'd like to add a mechanism for passing the jump address into the hibernated
kernel in the kernel image, but that part is still missing.
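Both suggestions, Ingo's signature check and Rafael's jump address carried in the image header, can be sketched together. The layout, magic value and field names below are hypothetical and do not describe the real swsusp image format. The idea is that the boot kernel no longer needs to sit at the same address as the image kernel: it only needs to read where to jump.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical layout: 8-byte magic, 32-byte kernel signature (e.g. a build hash),
# and the 64-bit little-endian address to jump to in the image kernel.
HEADER_FMT = "<8s32sQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 48 bytes, no padding with "<"
MAGIC = b"HIBIMG01"


@dataclass
class ImageHeader:
    signature: bytes   # identifies the kernel that wrote the image
    jump_address: int  # entry point in the image kernel, wherever it was loaded

    def pack(self) -> bytes:
        if len(self.signature) != 32:
            raise ValueError("signature must be exactly 32 bytes")
        return struct.pack(HEADER_FMT, MAGIC, self.signature, self.jump_address)

    @classmethod
    def unpack(cls, raw: bytes) -> "ImageHeader":
        magic, sig, jump = struct.unpack(HEADER_FMT, raw[:HEADER_SIZE])
        if magic != MAGIC:
            raise ValueError("not a hibernation image")
        return cls(sig, jump)


def restore_target(raw: bytes, expected_signature: Optional[bytes]) -> int:
    """Return the jump address; with a required signature, refuse a mismatched image."""
    hdr = ImageHeader.unpack(raw)
    if expected_signature is not None and hdr.signature != expected_signature:
        raise RuntimeError("image was written by a different kernel")
    return hdr.jump_address
```

Passing `expected_signature=None` models the x86_64 case where a different boot kernel is allowed to load the image.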

Thanks,
Rafael

2011-05-27 16:21:01

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Friday, May 27, 2011, Dan Rosenberg wrote:
> On Thu, 2011-05-26 at 15:32 -0700, H. Peter Anvin wrote:
> > On 05/26/2011 03:18 PM, Rafael J. Wysocki wrote:
> > >
> > > Well, as far as I can tell, this feature is going to break hibernation on
> > > both x86_32 and x86_64 at the moment, unless you can guarantee that the
> > > randomized kernel location will be the same for both the boot and the target
> > > kernels.
> > >
> >
> > Obviously we can't and we don't. I'm a bit surprised at that
> > constraint... how can that constraint not break things like kernels of
> > slightly different size?
> >
> > -hpa
>
> Am I understanding it correctly that hibernation is currently operating
> under a possibly false assumption? If it's the case that hibernation
> should be saving the physical address at which the kernel was previously
> loaded and restoring it there regardless of randomization, it would
> certainly help me out if someone familiar with the code could take a
> stab at that.

It rather has to save the address where to jump into the image kernel from
the boot kernel, but ISTR that's not straightforward. I thought about
implementing something like this some time ago, but finally I didn't have
the time to finish that work.

At the moment I'm preparing for a trip to Japan, so I'll be able to work on
this with you when I get back home (some time next weekend). In the
meantime, please have a look at arch/x86/power/hibernate_64.c and
arch/x86/power/hibernate_asm_64.S.

Thanks,
Rafael

2011-05-27 17:01:12

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Linus Torvalds <[email protected]> wrote:

> If you compile your own kernel version, you're already home free,
> and load-time randomization is pointless.

Most successful exploits work in two steps: first a local exploit
(weak password with a user, stupid script escaping bug, or a buffer
overflow somewhere), then a local kernel exploit to gain root and
kernel access. (for a rootkit and what not)

Straight remote root exploits are pretty rare - and per system
relinking only protects against that.

The problem with your relinking solution is that a local attacker can
easily figure out where the kernel is. So this does not protect
against the more common break-in scenario.

Kernel image randomization makes this last step really
indeterministic and thus dangerous to attackers.

Thanks,

Ingo

2011-05-27 17:07:34

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/27/2011 10:00 AM, Ingo Molnar wrote:
>
> The problem with your relinking solution is that a local attacker can
> easily figure out where the kernel is. So this does not protect
> against the more common break-in scenario.
>

There is another issue with it: it doesn't actually solve the real
problem other than suspend/resume, which is that the relocation agent
needs to understand what the memory space looks like at the time of boot.

I think something else we will need for this to be possible is initramfs
decoding directly from highmem, since the hack we're currently using to
deal with an initramfs/initrd located partly in highmem will break.

-hpa

2011-05-27 17:10:44

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, 2011-05-27 at 19:00 +0200, Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:
>
> > If you compile your own kernel version, you're already home free,
> > and load-time randomization is pointless.

> The problem with your relinking solution is that a local attacker can
> easily figure out where the kernel is. So this does not protect
> against the more common break-in scenario.
>
> Kernel image randomization makes this last step really
> indeterministic and thus dangerous to attackers.
>

Just to play devil's advocate, how is it easier for a local attacker to
figure out where kernel internals are if it's been relinked vs.
randomized at load time, assuming we follow through on fixing the info
leaks?

It seems to me that the only functional difference is that subsequent
reboots will yield the same memory layout, which is a real drawback
worth considering.

-Dan

> Thanks,
>
> Ingo

2011-05-27 17:14:58

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/27/2011 10:10 AM, Dan Rosenberg wrote:
>
> Just to play devil's advocate, how is it easier for a local attacker to
> figure out where kernel internals are if it's been relinked vs.
> randomized at load time, assuming we follow through on fixing the info
> leaks?
>

You can read the on-disk kernel file and find out.

-hpa

2011-05-27 17:16:40

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> Just to play devil's advocate, how is it easier for a local
> attacker to figure out where kernel internals are if it's been
> relinked vs. randomized at load time, assuming we follow through on
> fixing the info leaks?

Well, 'fixing the info leaks' will obfuscate previously useful files
such as /proc/kallsyms ...

That's one of the advantages of randomization: it allows us to expose
RIPs without them being an instant information leak.

Thanks,

Ingo

2011-05-27 17:17:31

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 10:13 AM, H. Peter Anvin <[email protected]> wrote:
>
> You can read the on-disk kernel file and find out.

So? Make it root-readable-only. Problem solved.

That's the _only_ difference, and it's trivial and irrelevant. Come up
with something more real, please.

Linus

2011-05-27 17:21:17

by Kees Cook

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 10:13:54AM -0700, H. Peter Anvin wrote:
> On 05/27/2011 10:10 AM, Dan Rosenberg wrote:
> >
> > Just to play devil's advocate, how is it easier for a local attacker to
> > figure out where kernel internals are if it's been relinked vs.
> > randomized at load time, assuming we follow through on fixing the info
> > leaks?
> >
>
> You can read the on-disk kernel file and find out.

If we're still operating under the assumption of "defend against non-root",
distros can trivially make the on-disk kernels 0400.
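Kees's suggestion amounts to a simple permissions audit. A minimal sketch: the `vmlinuz-*` glob is an assumption about distro naming, and the same check applies to System.map files by changing the pattern.

```python
import os
import stat
from pathlib import Path


def readable_by_others(path: Path) -> bool:
    """True if the group or other users have read permission on path."""
    return bool(path.stat().st_mode & (stat.S_IRGRP | stat.S_IROTH))


def exposed_images(boot_dir: Path, pattern: str = "vmlinuz-*") -> list[Path]:
    """Kernel images (by an assumed naming pattern) that non-owners can read."""
    return sorted(p for p in boot_dir.glob(pattern)
                  if p.is_file() and readable_by_others(p))


def restrict_to_owner(path: Path) -> None:
    """Apply the 0400 mode suggested in the thread: owner (root) read-only."""
    os.chmod(path, 0o400)
```

This only addresses unprivileged readers; root can read the file regardless of its mode.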

-Kees

--
Kees Cook
Ubuntu Security Team

2011-05-27 17:21:55

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 10:16 AM, Ingo Molnar <[email protected]> wrote:
>
> Well, 'fixing the info leaks' will obfuscate previously useful files
> such as /proc/kallsyms ...

Guys, stop with the crazy already.

YOU HAVE TO DO THAT FOR THE LINK-TIME-OBFUSCATION TOO!

> That's one of the advantages of randomization: it allows us to expose
> RIPs without them being an instant information leak.

Except you clearly aren't thinking that through AT ALL.

The obfuscation of things like /proc/kallsyms is *exactly*the*same*
whether you do the randomization at boot-time or install-time.

For chrissake - you're doing the same thing. The only question is
"when" (and the fact that if you do it at install-time, you can do a
fancier job of it)

Stop wasting people's time with idiocies, please.

Linus

2011-05-27 17:38:57

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Linus Torvalds <[email protected]> wrote:

> That's the _only_ difference, and it's trivial and irrelevant. Come
> up with something more real, please.

The advantages of dynamic per boot kernel randomization, over static
per system randomization, as i see them, in order of descending
importance:

- A root exploit will still not give away the location of the
kernel (assuming module loading has been disabled after bootup),
so a rootkit cannot be installed 'silently' on the system, into
RAM only, evading most offline-storage-checking tools.

With static linking this is not possible: reading the kernel image
as root trivially exposes the kernel's location.

- We can expose RIPs to unprivileged tools. Certain users could
still kernel-profile a busy server box while neither being root,
nor having access to the real location of the kernel.

With static linking this is not possible.

- Crash & reboot & retry brute force exploits get harder: if one
attempt at an exploit causes a crash and a reboot, the kernel
addresses are different after the reboot so the attempt has to be
retried without the advantage of any prior history.

With static linking this kind of exploit is somewhat easier: every
crash gives a permanent proof that the guessed RIP offset was
wrong, so history can be used on subsequent retries.

- It gives a way to go one step further in secure server lockdown:
where even root with full access to all storage has no way to
break into the kernel. Reboots, module loading and kexec can be
controlled, ioperm() and iopl() can be restricted. If those are
taken away then even if a root exploit allows the attacker to
overwrite the kernel image, a reboot has to be waited for and if
reboots do sanity checks [based on immutable storage] of the
system then the exploit can be found.

With static linking this is not possible: reading the kernel image
as root trivially exposes the kernel's location.
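The crash-and-retry point can be quantified with a simple model: N equally likely kernel positions, and every wrong guess crashes the box. With a static layout each crash rules out one candidate, so the attacker samples without replacement and needs (N + 1) / 2 attempts on average. With per-boot randomization every attempt is an independent 1/N trial, a geometric distribution with mean N. This is an idealized sketch, not a model of any particular exploit.

```python
import random


def expected_attempts_static(n_slots: int) -> float:
    # Fixed layout: a sweep that never repeats a disproven guess finds a
    # uniformly placed kernel after (N + 1) / 2 attempts on average.
    return (n_slots + 1) / 2


def expected_attempts_per_boot(n_slots: int) -> float:
    # Fresh layout every boot: each guess succeeds with p = 1/N independently,
    # so the attempt count is geometric with mean 1/p = N.
    return float(n_slots)


def simulate(n_slots: int, per_boot: bool, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of attempts (crashes + 1)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        secret = rng.randrange(n_slots)
        attempts = 0
        while True:
            attempts += 1
            if per_boot:
                # The crash rebooted into a new layout: history is worthless.
                secret = rng.randrange(n_slots)
                guess = rng.randrange(n_slots)
            else:
                # Same layout every boot: sweep slots in order.
                guess = attempts - 1
            if guess == secret:
                break
        total += attempts
    return total / trials
```

Under this model, per-boot randomization roughly doubles the expected number of crashes (1024 versus 512.5 for ten bits of entropy), which matches the "somewhat easier" wording above rather than a dramatic difference.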

It's in order of importance: you probably stopped caring at item 2 or
3 but there's definitely people who'd like to go all the way to 4. So
if we can do dynamic randomization sanely then why not offer it as an
option?

Thanks,

Ingo

2011-05-27 17:46:59

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Linus Torvalds <[email protected]> wrote:

> On Fri, May 27, 2011 at 10:16 AM, Ingo Molnar <[email protected]> wrote:
> >
> > Well, 'fixing the info leaks' will obfuscate previously useful files
> > such as /proc/kallsyms ...
>
> Guys, stop with the crazy already.
>
> YOU HAVE TO DO THAT FOR THE LINK-TIME-OBFUSCATION TOO!
>
> > That's one of the advantages of randomization: it allows us to
> > expose RIPs without them being an instant information leak.
>
> Except you clearly aren't thinking that through AT ALL.
>
> The obfuscation of things like /proc/kallsyms is *exactly*the*same*
> whether you do the randomization at boot-time or install-time.

Well, but two mails ago you said:

> And load-time randomization has all these nasty problems with
> memory maps etc, because we obviously have to shift the whole
> kernel around by some fixed offset. But if there was some way to
> just re-link the distro kernel easily, then it could be done by the
> kernel install scripts, and it could potentially do more than just
> "shift up load address by some random number".

If i understood you correctly you suggest randomizing the image by
shifting the symbols in it around. The boot loader would still load
an 'image' where it always loads it - just that image itself is
randomized internally somewhat, right?

( because that's the only way we can avoid the problems with e820
memory maps which you referred to, if don't actually change the
load address. )

Have i understood you correctly?

Thanks,

Ingo

2011-05-27 17:54:51

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/27/2011 10:46 AM, Ingo Molnar wrote:
>
> If i understood you correctly you suggest randomizing the image by
> shifting the symbols in it around. The boot loader would still load
> an 'image' where it always loads it - just that image itself is
> randomized internally somewhat, right?
>
> ( because that's the only way we can avoid the problems with e820
> memory maps which you referred to, if don't actually change the
> load address. )
>

That doesn't solve any problems with the memory map.

-hpa

2011-05-27 18:05:37

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 10:46 AM, Ingo Molnar <[email protected]> wrote:
>
> If i understood you correctly you suggest randomizing the image by
> shifting the symbols in it around. The boot loader would still load
> an 'image' where it always loads it - just that image itself is
> randomized internally somewhat, right?

You snipped the other part of my email you responded to:

For chrissake - you're doing the same thing. The only question is
"when" (and the fact that if you do it at install-time, you can do a
fancier job of it)

ie the fact that if you do it at install-time, you have the option of
being much more fancy about it.

So sure, the install time option *can* do more. It doesn't *have* to do more.

But being able to do a better job of randomization is *better*. Ok? It
doesn't mean you have to, but you have more options to do things if
you want to.

IOW, there is absolutely zero difference between doing it at
install-time or run-time, but the install-time one is (a) likely
easier and (b) certainly more flexible. But both of them do the exact
same thing, and require the exact same support in things like
/proc/kallsyms.

Of course, if we end up doing something really fancy (which the
install-time option allows), that obviously does mean that the
remapping by %pK thing for kallsyms needs to be much smarter too.

But at %pK time, you can *afford* to do that kind of things. At
boot-time, before you're even loaded and have a hard time even parsing
the e820 maps? Yeah, you're not going to do anything smart there, I
can tell you.
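A smarter %pK could, instead of printing zeros, translate a randomized pointer back to its un-randomized address, so tools keep working without learning the real layout. The sketch below models the existing `kptr_restrict` rules (0 shows pointers, 1 hides them from readers without CAP_SYSLOG, 2 hides them from everyone). The `to_canonical` remapping hook is hypothetical.

```python
from typing import Callable, Optional


def pk_hidden(kptr_restrict: int, has_cap_syslog: bool) -> bool:
    """kptr_restrict semantics: 0 shows pointers, 1 hides them from readers
    without CAP_SYSLOG, 2 hides them from everyone."""
    return kptr_restrict >= 2 or (kptr_restrict == 1 and not has_cap_syslog)


def format_pk(addr: int, kptr_restrict: int, has_cap_syslog: bool,
              to_canonical: Optional[Callable[[int], int]] = None) -> str:
    """Render a kernel pointer the way a %pK-style formatter might.

    Without to_canonical, hidden pointers print as zeros. With the
    (hypothetical) to_canonical hook, hidden pointers are translated back to
    their un-randomized address instead.
    """
    if pk_hidden(kptr_restrict, has_cap_syslog):
        value = to_canonical(addr) if to_canonical else 0
    else:
        value = addr
    return f"{value:016x}"
```

For a plain shift, `to_canonical` is just `addr - offset`; for an install-time relink it would need the full relink map, which is where the extra smarts come in.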

Linus

2011-05-27 18:06:03

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 10:53 AM, H. Peter Anvin <[email protected]> wrote:
>
> That doesn't solve any problems with the memory map.

Actually, it does.

You can load the kernel at the same virtual address we always load it,
and/or perhaps shift it up by just small amounts (ie "single pages"
rather than "ten bits worth of pages")

And then rely on the fact that you mixed up symbols in other ways.

"Look ma, no need to worry about memory map". At least no more than we do now.

Put another way: think about our /proc/iomem right now:

00100000-bdc6ffff : System RAM
01000000-016bdced : Kernel code
016bdcee-01ca8b7f : Kernel data
01d36000-01de2fff : Kernel bss

with the "shift kernel up at load-time", the above information is
suddenly very scary, because the "Kernel code" part is magically
important.

In contrast, if your randomization depends on just relinking things a
bit differently, you don't really give out any of the random
information in /proc/iomem. Nor does it affect the load address and
the e820 memory map.

And, in fact, it does give you way more bits of randomness to play
around with the text addresses.

With something like function-sections, it should be possible to do
quite a serious job of relinking (and then keep some "function section
to actual relinked address" mapping around so that you can do the
/proc/kallsyms mappings).
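The "fancy" function-sections model can be sketched as shuffling per-function sections at install time and keeping the resulting layout so kallsyms-style lookups still work. This is a toy model: the alignment value and the section list are assumptions, and a real relink must also rewrite every reference between sections.

```python
import random
from bisect import bisect_right

ALIGN = 16  # assumed per-function alignment in bytes; illustrative only


def relink(sections: list[tuple[str, int]], base: int, seed: int) -> dict[str, int]:
    """Shuffle (name, size) function sections and lay them out upward from base."""
    order = list(sections)
    random.Random(seed).shuffle(order)
    layout: dict[str, int] = {}
    addr = base
    for name, size in order:
        addr = (addr + ALIGN - 1) // ALIGN * ALIGN  # round up to the alignment
        layout[name] = addr
        addr += size
    return layout


class Symbolizer:
    """Map a relinked address back to (function name, offset), kallsyms-style."""

    def __init__(self, layout: dict[str, int], sizes: dict[str, int]) -> None:
        entries = sorted((addr, name) for name, addr in layout.items())
        self._starts = [addr for addr, _ in entries]
        self._names = [name for _, name in entries]
        self._sizes = sizes

    def lookup(self, addr: int) -> tuple[str, int]:
        i = bisect_right(self._starts, addr) - 1
        if i < 0:
            raise KeyError(hex(addr))
        name = self._names[i]
        offset = addr - self._starts[i]
        if offset >= self._sizes[name]:  # falls in alignment padding
            raise KeyError(hex(addr))
        return name, offset
```

Because the order of functions changes rather than just the base, leaking one function's address no longer reveals the others, unlike a uniform shift.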

But that's actually the "fancy" model. I don't think we should aim at
that to begin with. Start off with something much less ambitious, like
just shifting the kernel by a few pages. People have argued that even
just a 50% chance of an oops is preferable to nothing. So we can start
small and stupid.
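The "start small" arithmetic is easy to check: with `bits` of entropy and page-granular shifts, one hard-coded guess succeeds with probability 2^-bits. A single bit already gives the 50% oops rate mentioned above, and "ten bits worth of pages" means a 4 MiB window. This is a sketch; `secrets.randbelow` is a userspace stand-in for whatever entropy the boot code actually has.

```python
import secrets
from typing import Callable

PAGE_SIZE = 4096  # x86 base page size


def random_page_shift(bits: int, randbelow: Callable[[int], int] = secrets.randbelow) -> int:
    """Pick a page-aligned shift uniformly from 2**bits candidate pages."""
    return randbelow(1 << bits) * PAGE_SIZE


def shift_range_bytes(bits: int) -> int:
    """Size of the window the kernel can be shifted across."""
    return (1 << bits) * PAGE_SIZE


def guess_success_probability(bits: int) -> float:
    """Chance that one hard-coded guess of the shift is right."""
    return 1 / (1 << bits)


for bits in (1, 10):
    print(bits, shift_range_bytes(bits) // 1024, "KiB", guess_success_probability(bits))
```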

See?

Linus

2011-05-27 18:17:46

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Linus Torvalds <[email protected]> wrote:

> On Fri, May 27, 2011 at 10:46 AM, Ingo Molnar <[email protected]> wrote:
> >
> > If i understood you correctly you suggest randomizing the image
> > by shifting the symbols in it around. The boot loader would still
> > load an 'image' where it always loads it - just that image itself
> > is randomized internally somewhat, right?
>
> You snipped the other part of my email you responded to:
>
> For chrissake - you're doing the same thing. The only question is
> "when" (and the fact that if you do it at install-time, you can do a
> fancier job of it)
>
> ie the fact that if you do it at install-time, you have the option of
> being much more fancy about it.
>
> So sure, the install time option *can* do more. It doesn't *have*
> to do more.
>
> But being able to do a better job of randomization is *better*. Ok?
> It doesn't mean you have to, but you have more options to do things
> if you want to.
>
> IOW, there is absolutely zero difference between doing it at
> install-time or run-time, but the install-time one is (a) likely
> easier and (b) certainly more flexible. But both of them do the
> exact same thing, and require the exact same support in things like
> /proc/kallsyms.
>
> Of course, if we end up doing something really fancy (which the
> install-time option allows), that obviously does mean that the
> remapping by %pK thing for kallsyms needs to be much smarter too.
>
> But at %pK time, you can *afford* to do that kind of things. At
> boot-time, before you're even loaded and have a hard time even
> parsing the e820 maps? Yeah, you're not going to do anything smart
> there, I can tell you.

Ok, you are right, we could patch in all the things into the image at
install time to be able to 'derandomize' symbols and still be able to
provide them.

[ One worry i have is that distro logic is to go for the simplest
route: which is to randomize the symbols by padding the beginning
or the end of the kernel image a bit, but don't bother making %pK
smart or fancy. This means that /proc/kallsyms will be restricted
(maybe even turned off completely, because it's now broken) and a
'real' System.map put, only readable to root. This still 'allows'
tooling, in a full SystemTap and Oprofile usability fashion. ]

Anyway, this strikes off the second item from my list. Meanwhile i
also found two other usecases which i added to the head of the list:

- Boot time dynamic randomization allows randomization of 'mass
install' systems, where the same image is used, to still be
randomized: for example a million phones all with the same Flash
ROM image and no 'install' performed at all on them.

With static randomization these systems will all have the same
kernel addresses.

- Boot time dynamic randomization allows read-only systems to still
be randomized: for example internet cafes that use some popular
pre-packaged kiosk-mode live-DVD. They probably won't bother
randomizing and relinking the ISOs per machine and burning per
machine DVDs ...

- A root exploit will still not give away the location of the
kernel (assuming module loading has been disabled after bootup),
so a rootkit cannot be installed 'silently' on the system, into
RAM only, evading most offline-storage-checking tools.

With static linking this is not possible: reading the kernel image
as root trivially exposes the kernel's location.

- Crash & reboot & retry brute force exploits get harder: if one
attempt at an exploit causes a crash and a reboot, the kernel
addresses are different after the reboot so the attempt has to be
retried without the advantage of any prior history.

With static linking this kind of exploit is somewhat easier: every
crash gives a permanent proof that the guessed RIP offset was
wrong, so history can be used on subsequent retries.

- It gives a way to go one step further in secure server lockdown:
where even root with full access to all storage has no way to
break into the kernel. Reboots, module loading and kexec can be
controlled, ioperm() and iopl() can be restricted. If those are
taken away then even if a root exploit allows the attacker to
overwrite the kernel image, a reboot has to be waited for and if
reboots do sanity checks [based on immutable storage] of the
system then the exploit can be found.

With static linking this is not possible: reading the kernel image
as root trivially exposes the kernel's location.

Thanks,

Ingo

2011-05-27 18:44:48

by Kees Cook

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 08:17:24PM +0200, Ingo Molnar wrote:
> - Boot time dynamic randomization allows randomization of 'mass
> install' systems, where the same image is used, to still be
> randomized: for example a million phones all with the same Flash
> ROM image and no 'install' performed at all on them.
>
> With static randomization these systems will all have the same
> kernel addresses.
>
> - Boot time dynamic randomization allows read-only systems to still
> be randomized: for example internet cafes that use some popular
> pre-packaged kiosk-mode live-DVD. They probably won't bother
> randomizing and relinking the ISOs per machine and burning per
> machine DVDs ...

These 2 points are pretty significant, IMO.

And frankly, distros almost fall into these categories already. IIUC,
a distro would need to ship all of the .o files from each config of the
kernel they ship so each system could do the relinking. That's not a
small foot print to suddenly add to base installs.

-Kees

--
Kees Cook
Ubuntu Security Team

2011-05-27 18:48:58

by David Lang

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, 27 May 2011, Ingo Molnar wrote:

I don't think these two new items are as important as you are tagging
them. I would put them down with the 'protect the system from root' type
of issues.

> - Boot time dynamic randomization allows randomization of 'mass
> install' systems, where the same image is used, to still be
> randomized: for example a million phones all with the same Flash
> ROM image and no 'install' performed at all on them.
>
> With static randomization these systems will all have the same
> kernel addresses.

there is already a need to be able to customize these systems on an
individual system basis (think SSL certs or ssh keys for example)

yes, this makes it a little more difficult than just 'drop this image bit
for bit on the system', but it's not that hard to setup a 'the first time
you boot do this stuff then reboot' step, and that step can do the
'install time' stuff.

> - Boot time dynamic randomization allows read-only systems to still
> be randomized: for example internet cafes that use some popular
> pre-packaged kiosk-mode live-DVD. They probably won't bother
> randomizing and relinking the ISOs per machine and burning per
> machine DVDs ...

this matters a little bit more because a script to create a custom DVD
image on the fly is more difficult.

however, I think this is a significantly less important target,
specifically because these are read-only system images.

but if someone really cares about this, they just need to create a stack
of slightly different DVDs. if this can be batched up and automated it's
not that big a deal. the DVDs don't really need to be per-machine, just a
variety of them.

David Lang

2011-05-27 19:16:12

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 11:05:07AM -0700, Linus Torvalds wrote:
> On Fri, May 27, 2011 at 10:53 AM, H. Peter Anvin <[email protected]> wrote:
> >
> > That doesn't solve any problems with the memory map.
>
> Actually, it does.
>
> You can load the kernel at the same virtual address we always load it,
> and/or perhaps shift it up by just small amounts (ie "single pages"
> rather than "ten bits worth of pages")
>
> And then rely on the fact that you mixed up symbols in other ways.
>
> "Look ma, no need to worry about memory map". At least no more than we do now.
>
> Put another way: think about our /proc/iomem right now:
>
> 00100000-bdc6ffff : System RAM
> 01000000-016bdced : Kernel code
> 016bdcee-01ca8b7f : Kernel data
> 01d36000-01de2fff : Kernel bss
>
> with the "shift kernel up at load-time", the above information is
> suddenly very scary, because the "Kernel code" part is magically
> important.
>
> In contrast, if your randomization depends on just relinking things a
> bit differently, you don't really give out any of the random
> information in /proc/iomem. Nor does it affect the load address and
> the e820 memory map.
>
> And, in fact, it does give you way more bits of randomness to play
> around with the text addresses.

I am wondering what happens to crash analysis tools if virtual
addresses are shifted by a per-system offset. I guess tools like
"crash" can adjust to this by looking at the vmcore ELF headers, but
I think gdb does not expect virtual addresses to change.

That would essentially mean that, apart from the vmcore, one would also
have to save the vmlinux file from the crashed system. Currently we don't
have to save vmlinux: for analysis we can install the distro-provided
debug vmlinux later, and just need to get the vmcore file from the
crashed system to do the analysis.

So IIUC, with above model, I guess "crash" should be able to adjust
to it quickly but gdb will have issues.

Thanks
Vivek
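Linus's /proc/iomem point and Vivek's debugger concern both come down to one number: the distance between where the kernel was linked to run and where it actually landed. A minimal C sketch, assuming a link-time physical start of 0x1000000 (which matches the "Kernel code" line in the listing quoted above) and assuming, as in Vivek's scenario, that symbol addresses move by the same offset. The helper names are hypothetical, and the parser only handles the simple `start-end : name` line format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed link-time physical start; matches the "Kernel code" line above. */
#define ASSUMED_PHYS_START 0x1000000ULL

/*
 * Find the "Kernel code" line in /proc/iomem-style text and store its start
 * address in *start. Returns 0 on success, -1 if no such line exists.
 */
static int find_kernel_code(const char *iomem, uint64_t *start)
{
    const char *line = iomem;

    while (line && *line) {
        unsigned long long lo, hi;
        char name[64];

        /* e.g. "  01000000-016bdced : Kernel code" (child lines are indented) */
        if (sscanf(line, " %llx-%llx : %63[^\n]", &lo, &hi, name) == 3 &&
            strcmp(name, "Kernel code") == 0) {
            *start = (uint64_t)lo;
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Shift a link-time address (e.g. from System.map) by the load offset. */
static uint64_t relocate_symbol(uint64_t link_addr, int64_t offset)
{
    return link_addr + (uint64_t)offset;
}
```

Subtracting `ASSUMED_PHYS_START` from the recovered start gives the offset; `relocate_symbol()` is then the adjustment a crash-analysis tool would have to apply to every address taken from a distro-provided vmlinux.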

2011-05-27 21:38:13

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/27/2011 11:05 AM, Linus Torvalds wrote:
>
> You can load the kernel at the same virtual address we always load it,
> and/or perhaps shift it up by just small amounts (ie "single pages"
> rather than "ten bits worth of pages")
>
> And then rely on the fact that you mixed up symbols in other ways.
>

OK, here is a bat-shit-crazy idea... an all-module kernel where nothing
except init code is prelinked at all.

If we could modularize the core code we could have init code load the
modules at all kinds of random addresses; they wouldn't even need to be
contiguous in memory, and since we'd have full access to the memory
layout at that point, we can randomize the **** out of *everything*.

-hpa

2011-05-27 22:01:54

by Olivier Galibert

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, May 27, 2011 at 08:17:24PM +0200, Ingo Molnar wrote:
> - A root exploit will still not give away the location of the
> kernel (assuming module loading has been disabled after bootup),
> so a rootkit cannot be installed 'silently' on the system, into
> RAM only, evading most offline-storage-checking tools.
>
> With static linking this is not possible: reading the kernel image
> as root trivially exposes the kernel's location.

There's something I don't get there. If you managed to escalate your
privileges enough that you have physical ram access, there's a
billion things you can do to find the kernel, including vector
tracing, pattern matching, looking at the page tables, etc.

What am I missing?

OG.

2011-05-27 22:13:34

by Valdis Klētnieks

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Fri, 27 May 2011 23:51:23 +0200, Olivier Galibert said:
> On Fri, May 27, 2011 at 08:17:24PM +0200, Ingo Molnar wrote:
> > - A root exploit will still not give away the location of the
> > kernel (assuming module loading has been disabled after bootup),
> > so a rootkit cannot be installed 'silently' on the system, into
> > RAM only, evading most offline-storage-checking tools.
> >
> > With static linking this is not possible: reading the kernel image
> > as root trivially exposes the kernel's location.
>
> There's something I don't get there. If you managed to escalate your
> privileges enough that you have physical ram access, there's a
> billion things you can do to find the kernel, including vector
> tracing, pattern matching, looking at the page tables, etc.

Oh, you mean all the tricks that people do now to patch the syscall table
once we hid it so they couldn't patch it? :)

2011-05-27 23:52:35

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/27/2011 02:37 PM, H. Peter Anvin wrote:
> On 05/27/2011 11:05 AM, Linus Torvalds wrote:
>>
>> You can load the kernel at the same virtual address we always load it,
>> and/or perhaps shift it up by just small amounts (ie "single pages"
>> rather than "ten bits worth of pages")
>>
>> And then rely on the fact that you mixed up symbols in other ways.
>>
>
> OK, here is a bat-shit-crazy idea... an all-module kernel where nothing
> except init code is prelinked at all.
>
> If we could modularize the core code we could have init code load the
> modules at all kinds of random addresses; they wouldn't even need to be
> contiguous in memory, and since we'd have full access to the memory
> layout at that point, we can randomize the **** out of *everything*.
>

Thinking about it some more, it might not be that crazy. Consider the
following notion: the kernel payload, as delivered by the decompressor,
contains the init code, plus a set of modules, which can be ELF modules,
but don't have to be (though since we already have code to load and link
ELF modules, that is probably the best choice).

After we initialize the system enough to have a memory map, we can pick
a random place for each module, copy it in place, fix up the
relocations, and free the original location.

If we are exceptionally clever, which of course we are, we could even
have these modules linked to their initial location and fix up
references in running code, that way init code could still call module
code, as long as it doesn't stash away pointers to module data.

-hpa
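For the simplest kind of relocation, the "copy it in place, fix up the relocations" step hpa describes amounts to adding the load delta to every absolute pointer slot. A toy C sketch, assuming a hypothetical image format that is just a byte blob linked at a known base plus a list of offsets of 64-bit absolute pointers into itself. Real ELF modules carry typed relocation records that the existing module loader already handles, and choosing the random destination and freeing the original copy are left out here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy image: raw bytes linked at link_base, plus the offsets of
 * every 64-bit absolute pointer that refers back into the image. */
struct toy_image {
    const unsigned char *bytes;
    size_t size;
    uint64_t link_base;
    const size_t *reloc_offsets;
    size_t nrelocs;
};

/*
 * Copy the image to dest (treated as the new load address) and add the load
 * delta to each absolute pointer slot. memcpy is used for slot access so
 * unaligned offsets are safe.
 */
static void relocate_copy(const struct toy_image *img, unsigned char *dest)
{
    uint64_t new_base = (uint64_t)(uintptr_t)dest;
    uint64_t delta = new_base - img->link_base; /* wraps modulo 2^64 */

    memcpy(dest, img->bytes, img->size);
    for (size_t i = 0; i < img->nrelocs; i++) {
        uint64_t v;

        memcpy(&v, dest + img->reloc_offsets[i], sizeof(v));
        v += delta;
        memcpy(dest + img->reloc_offsets[i], &v, sizeof(v));
    }
}
```

Because every slot moves by the same delta, the relocated pointers still refer to the same places inside the image; that invariant is what lets each module land anywhere.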

2011-05-28 00:51:41

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/27/2011 02:51 PM, Olivier Galibert wrote:
> On Fri, May 27, 2011 at 08:17:24PM +0200, Ingo Molnar wrote:
>> - A root exploit will still not give away the location of the
>> kernel (assuming module loading has been disabled after bootup),
>> so a rootkit cannot be installed 'silently' on the system, into
>> RAM only, evading most offline-storage-checking tools.
>>
>> With static linking this is not possible: reading the kernel image
>> as root trivially exposes the kernel's location.
>
> There's something I don't get there. If you managed to escalate your
> privileges enough that you have physical ram access, there's a
> billion things you can do to find the kernel, including vector
> tracing, pattern matching, looking at the page tables, etc.
>
> What am I missing?
>

Just makes it harder to automate an attack, and more likely that it will
fail. It's an arms race, of course.

-hpa

2011-05-28 06:33:30

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Olivier Galibert <[email protected]> wrote:

> On Fri, May 27, 2011 at 08:17:24PM +0200, Ingo Molnar wrote:
> > - A root exploit will still not give away the location of the
> > kernel (assuming module loading has been disabled after bootup),
> > so a rootkit cannot be installed 'silently' on the system, into
> > RAM only, evading most offline-storage-checking tools.
> >
> > With static linking this is not possible: reading the kernel image
> > as root trivially exposes the kernel's location.
>
> There's something I don't get there. If you managed to escalate your
> privileges enough that you have physical ram access, there's a
> billion things you can do to find the kernel, including vector
> tracing, pattern matching, looking at the page tables, etc.
>
> What am I missing?

You are missing that it's not unrealistic to make the
"root does not have physical RAM access" condition true
on a system.

CONFIG_STRICT_DEVMEM=y will go a long way already, enabled
on most distros these days:

$ grep DEVMEM $(rpm -ql kernel-2.6.38-0.rc7.git2.3.fc16.x86_64 | grep boot/config)
CONFIG_STRICT_DEVMEM=y

Combined with:

echo 1 > /proc/sys/kernel/modules_disabled

( Module loading cannot be turned back on once it has been turned
off this way, after the essential modules have loaded. )

Admins do not actually need access to physical RAM, nor do they need
the ability to binary patch kernel code, so it's not unrealistic to
do this in distros.
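Verifying that such a lockdown is in effect is just a matter of reading the relevant sysctl files. A small C helper, assuming only that the file holds a single decimal integer, as /proc/sys/kernel/modules_disabled and /proc/sys/kernel/kptr_restrict do; the function name is made up for illustration.

```c
#include <stdio.h>

/* Read a single integer from a sysctl-style file such as
 * /proc/sys/kernel/modules_disabled. Returns 0 on success, -1 on error. */
static int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, `read_sysctl_int("/proc/sys/kernel/modules_disabled", &v) == 0 && v == 1` holds once module loading has been switched off.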

There can be a few more vectors to access physical RAM but they can
be controlled as well.

This will already force a reboot (or a wait for a regular reboot) by
the attacker to install rootkit level code.

But yes, if root controls RAM then it's obviously game over: even
with randomization RAM can be scanned for kernel image signatures,
kernel code can be inserted or system call table patched - q.e.d.

Thanks,

Ingo

2011-05-28 12:18:44

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Linus Torvalds <[email protected]> wrote:

> On Fri, May 27, 2011 at 10:53 AM, H. Peter Anvin <[email protected]> wrote:
> >
> > That doesn't solve any problems with the memory map.
>
> Actually, it does.
>
> You can load the kernel at the same virtual address we always load
> it, and/or perhaps shift it up by just small amounts (ie "single
> pages" rather than "ten bits worth of pages")

Note that if we do not limit it to just 'a few pages' then padding
the randomization space into the kernel image:

*also solves the memory map problem in the dynamic randomization case*

Having half a megabyte of '__init buffer' at the beginning or end of
the kernel image is no big deal, it's more than enough for good
randomization and makes the whole thing image-loader invariant: we
can freely shift the 'real' kernel image within this larger boundary
without consulting RAM maps.
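The arithmetic behind the padding idea is simple: the slack divided by the page size gives the number of possible positions. A C sketch, assuming 4 KiB pages and 512 KiB of slack (Ingo later quotes this as 0-128 pages). The entropy source is left to the caller and the function name is hypothetical; note that the modulo reduction is slightly biased unless the entropy range is a multiple of the slot count.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* assumed 4 KiB pages */

/*
 * Pick a page-granular shift for the kernel image within `slack` bytes of
 * padding. Shifts range over 0..slack/PAGE_SIZE_BYTES pages inclusive, so the
 * shifted image never runs past the end of the padded region.
 */
static uint64_t pick_shift(uint64_t slack, uint64_t entropy)
{
    uint64_t slots = slack / PAGE_SIZE_BYTES + 1;

    return (entropy % slots) * PAGE_SIZE_BYTES;
}
```

Since the shift never exceeds the padding, the loader only ever sees one fixed-size image, which is why no RAM map needs to be consulted.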

And yes, you are right that smarter randomization like reordering of
functions is probably more feasible with a static method - but i'm
not sure we'd like to reorder functions: they are often ordered by
importance within .c files, hence they are often ordered by cache
hotness, so keeping them together makes sense to optimize icache
footprint.

Further note that should anyone want to randomize the kernel position
within a larger range, memory maps can still be consulted - but
that's an optional enhancement, not a design requirement.

Note that such a larger range of randomization is not possible with
the static install-time randomization method, as it needs to consult
the memory maps on bootup.

So while i agree with you that install-time randomization has unique
properties, i do not agree that all of those unique properties are
advantages and thus i do not think that the case for static
randomization is nearly as clear-cut as you made it appear.

Furthermore, the two main complications of dynamic randomization
that you highlighted are not really fundamental complications IMO:

- the memory map consultation complexity can be completely eliminated
in the dynamic randomization case as well

- the hibernation complication is overstated i think: if on
hibernation we save the randomization offset then the thawed
kernel can load at the very same address. [ We have no other
choice anyway, pointers to the kernel image are stored all
over the frozen image. ]

This skips re-randomization across hibernation but that's ok:
it's the functional equivalent of suspend-to-RAM.

Btw., there's another advantage of kernel image randomization in
general that i have not mentioned before:

- in addition to randomizing the physical load address of the kernel image,
on 64-bit x86 we could independently randomize the *virtual*
address of the kernel as well: within a rather large, 2GB address
space.

This makes the very first step of buffer overflow (and pointer
overwrite) attacks very hard: they'd have to find the right
executable needle within a 2GB haystack.

Combined with SMEP this needle is the *only* place where a kernel
mode exploit can execute. [*]

This kind of large-scale virtual address randomization could be
performed both dynamically (boot time) and statically (install time).
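To put a rough number on the 2GB haystack: the entropy is log2 of the number of distinct aligned positions. A C sketch, assuming the whole 2GB window is usable and ignoring the size of the image itself; the alignment values used below are illustrative, not taken from any patch.

```c
#include <stdint.h>

/* Bits of entropy from placing an image at `align`-aligned positions within
 * a window of `range` bytes (both assumed to be powers of two). */
static unsigned entropy_bits(uint64_t range, uint64_t align)
{
    uint64_t slots = range / align;
    unsigned bits = 0;

    while (slots > 1) {
        slots >>= 1;
        bits++;
    }
    return bits;
}
```

With 2 MiB alignment a 2GB window yields 10 bits; at 4 KiB page granularity it yields 19 bits.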

Thanks,

Ingo

[*] Assuming we get around to sorting out the first 1MB compatibility
constraints that force us to turn off NX there currently, and
review the pagetables for all remaining system mode mapped
executable pages.

2011-05-29 01:15:07

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/28/2011 05:18 AM, Ingo Molnar wrote:
>
> Having half a megabyte of '__init buffer' at the beginning or end of
> the kernel image is no big deal, it's more than enough for good
> randomization and makes the whole thing image-loader invariant: we
> can freely shift the 'real' kernel image within this larger boundary
> without consulting RAM maps.
>

Sure, but you're also blowing any attempt at PMD alignment to kingdom come.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2011-05-29 12:47:29

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* H. Peter Anvin <[email protected]> wrote:

> On 05/28/2011 05:18 AM, Ingo Molnar wrote:
> >
> > Having half a megabyte of '__init buffer' at the beginning or end
> > of the kernel image is no big deal, it's more than enough for
> > good randomization and makes the whole thing image-loader
> > invariant: we can freely shift the 'real' kernel image within
> > this larger boundary without consulting RAM maps.
>
> Sure, but you're also blowing any attempt at PMD alignment to
> kingdom come.

Do you mean we'd not start at a 2MB boundary and thus would waste, on
average, about 0.125 of a huge-TLB cache entry?

That does not look like a very big issue to me - but maybe i'm
missing something and you mean something else.

Thanks,

Ingo
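One way to arrive at the 0.125 figure, assuming the shift is uniformly distributed over the 512 KiB of padding and measured against a single 2 MiB large page:

```latex
\mathbb{E}[\text{shift}] = \frac{0 + 512\,\text{KiB}}{2} = 256\,\text{KiB},
\qquad
\frac{256\,\text{KiB}}{2\,\text{MiB}} = \frac{256}{2048} = 0.125
```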

2011-05-29 18:20:44

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/29/2011 05:47 AM, Ingo Molnar wrote:
>
> Do you mean we'd not start at a 2MB boundary and thus would waste, on
> average, about 0.125 of a huge-TLB cache entry?
>
> That does not look like a very big issue to me - but maybe i'm
> missing something and you mean something else.
>

The problem is that, because of the misalignment and whatever falls on
the other side of that memory boundary, we might end up having to
fracture the 2 MB page into 4K pages.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2011-05-29 18:44:45

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* H. Peter Anvin <[email protected]> wrote:

> On 05/29/2011 05:47 AM, Ingo Molnar wrote:
> >
> > Do you mean we'd not start at a 2MB boundary and thus would waste, on
> > average, about 0.125 of a huge-TLB cache entry?
> >
> > That does not look like a very big issue to me - but maybe i'm
> > missing something and you mean something else.
> >
>
> The problem is that, because of the misalignment and whatever falls
> on the other side of that memory boundary, we might end up having to
> fracture the 2 MB page into 4K pages.

We already have that kind of fragmentation anyway, due to NX and due
to the readonly area. Randomization does not really make that
situation much worse.

But the thing is, we could fully eliminate all those disadvantages on
64-bit x86:

We could put a 2MB hole between end of text (end of X) and start of
readonly data (start of NX), and another 2MB hole between end of
readonly and start of data.

That way we'd have:

- the low alias is mapped NX as well, so the whole area and
surrounding pages are 2MB aligned. The 'holes' are freed up as
__initmem so not wasted.

- the high alias will have three areas:

- the text area, which is 2MB mapped as X
- the ro-data area, which is 2MB mapped as NX-RO
- the data area, which is 2MB mapped as NX-RW

because there's at least 2MB of distance between end of text and
start of data there's a guarantee that both will be fully 2MB mapped.
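The guarantee in the last paragraph can be checked mechanically: two adjacent ranges can be covered by separate 2MB mappings only if the last byte of the first and the first byte of the second fall in different 2MB frames, and any gap of at least 2MB ensures that regardless of alignment. A C sketch; the test addresses are the "Kernel code"/"Kernel data" boundary from the /proc/iomem listing quoted earlier in the thread, and the model ignores the low/high alias distinction.

```c
#include <stdbool.h>
#include <stdint.h>

#define LARGE_PAGE (2ULL << 20) /* 2 MiB */

/*
 * True if the last byte of range A (exclusive end `end_a`) and the first byte
 * of range B (`start_b`) lie in the same 2 MiB frame, i.e. one large page
 * would need both sets of permissions and would have to be split into 4K pages.
 */
static bool shares_large_page(uint64_t end_a, uint64_t start_b)
{
    return (end_a - 1) / LARGE_PAGE == start_b / LARGE_PAGE;
}
```

Without a gap the text/data boundary shares a frame; inserting a 2MB hole moves the next section into the following frame.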

Btw., we might want to do this regardless of randomization, for
performance reasons: right now the NX and read-only areas already fragment
the 2MB mapping around the kernel text into 4K mappings.

Hm?

Thanks,

Ingo

2011-05-29 18:53:50

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/29/2011 11:44 AM, Ingo Molnar wrote:
>
> We could put a 2MB hole between end of text (end of X) and start of
> readonly data (start of NX), and another 2MB hole between end of
> readonly and start of data.
>

It still means you have memory which is X-mapped when it doesn't need to
be, since there will be RAM in that region.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2011-05-29 19:57:22

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* H. Peter Anvin <[email protected]> wrote:

> On 05/29/2011 11:44 AM, Ingo Molnar wrote:
> >
> > We could put a 2MB hole between end of text (end of X) and start of
> > readonly data (start of NX), and another 2MB hole between end of
> > readonly and start of data.
> >
>
> It still means you have memory which is X-mapped when it doesn't need to
> be, since there will be RAM in that region.

But it ought to be rather harmless in this particular case, because
the high alias addresses are all randomized!

Thanks,

Ingo

2011-05-31 16:53:16

by Matthew Garrett

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Thu, May 26, 2011 at 04:39:27PM -0400, Dan Rosenberg wrote:

> 1. I'm nearly finished a first draft of code to parse the BIOS E820
> memory map to determine where it's safe to place the randomized kernel.
> This code accounts for overlapping regions, as well as potential
> conflicts in region types (free vs. reserved, etc.), in favor of
> non-free types. The end result is, I'll have a reasonable upper bound.

The BIOS E820 map, or the kernel representation? In either case, this
isn't going to work well with EFI. There are regions that will be marked
as available in the E820 map that we *mustn't* touch until we've entered
EFI virtual mode.

(This is, clearly, insane).

One other thing is that when we've entered EFI virtual mode we'll have
remapped various parts of the EFI memory map into virtual address space.
There's no way to update these mappings later. If we want kexec to work
then there has to be a mechanism for ensuring that these mappings can be
provided to the second kernel and for it to preserve them.

--
Matthew Garrett | [email protected]
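The check Dan describes (accept a candidate range only if usable entries cover it fully, and let any overlapping non-usable entry win) might look like the following C sketch. The struct is a simplified, hypothetical stand-in for a firmware map entry, using the conventional E820 type value 1 for usable RAM and 2 for reserved. As Matthew points out, on EFI systems such a check is not sufficient by itself, since some regions marked available must still be left alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_TYPE_RAM 1 /* E820 "usable RAM"; every other type is off-limits */

struct map_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* True if [start, end) is covered by RAM entries and overlaps no other entry. */
static bool range_is_safe(const struct map_entry *map, size_t n,
                          uint64_t start, uint64_t end)
{
    uint64_t pos = start;

    /* Conflicts resolve in favor of non-free types: any overlap rejects. */
    for (size_t i = 0; i < n; i++) {
        uint64_t e_end = map[i].addr + map[i].size;

        if (map[i].type != MAP_TYPE_RAM && map[i].addr < end && start < e_end)
            return false;
    }

    /* Walk forward through possibly overlapping RAM entries until covered. */
    while (pos < end) {
        uint64_t best = pos;

        for (size_t i = 0; i < n; i++) {
            uint64_t e_end = map[i].addr + map[i].size;

            if (map[i].type == MAP_TYPE_RAM && map[i].addr <= pos && e_end > best)
                best = e_end;
        }
        if (best == pos)
            return false; /* hole in RAM coverage */
        pos = best;
    }
    return true;
}
```

Taking the furthest-reaching entry at each step is what makes overlapping or adjacent RAM entries count as continuous coverage.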

2011-05-31 18:40:58

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/31/2011 09:52 AM, Matthew Garrett wrote:
> On Thu, May 26, 2011 at 04:39:27PM -0400, Dan Rosenberg wrote:
>
>> 1. I'm nearly finished a first draft of code to parse the BIOS E820
>> memory map to determine where it's safe to place the randomized kernel.
>> This code accounts for overlapping regions, as well as potential
>> conflicts in region types (free vs. reserved, etc.), in favor of
>> non-free types. The end result is, I'll have a reasonable upper bound.
>
> The BIOS E820 map, or the kernel representation? In either case, this
> isn't going to work well with EFI. There are regions that will be marked
> as available in the E820 map that we *mustn't* touch until we've entered
> EFI virtual mode.
>
> (This is, clearly, insane).
>

I believe we could (should!) mark them reserved, not available, in the
E820 map and free them later.

-hpa

2011-05-31 18:51:45

by Matthew Garrett

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, May 31, 2011 at 11:40:13AM -0700, H. Peter Anvin wrote:
> On 05/31/2011 09:52 AM, Matthew Garrett wrote:
> > The BIOS E820 map, or the kernel representation? In either case, this
> > isn't going to work well with EFI. There are regions that will be marked
> > as available in the E820 map that we *mustn't* touch until we've entered
> > EFI virtual mode.
> >
> > (This is, clearly, insane).
> >
>
> I believe we could (should!) mark them reserved, not available, in the
> E820 map and free them later.

That was my original approach, but it requires that the bootloader be
modified and it turns out that it's a lot harder to hand reserved
regions back to the OS than it is to just reserve it in-kernel. The
complete inflexibility of e820 is massively unhelpful here. It's just
not possible to represent all of the EFI memory map data in it.

--
Matthew Garrett | [email protected]

2011-05-31 19:03:44

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-31 at 19:51 +0100, Matthew Garrett wrote:
> On Tue, May 31, 2011 at 11:40:13AM -0700, H. Peter Anvin wrote:
> > On 05/31/2011 09:52 AM, Matthew Garrett wrote:
> > > The BIOS E820 map, or the kernel representation? In either case, this
> > > isn't going to work well with EFI. There are regions that will be marked
> > > as available in the E820 map that we *mustn't* touch until we've entered
> > > EFI virtual mode.
> > >
> > > (This is, clearly, insane).
> > >
> >
> > I believe we could (should!) mark them reserved, not available, in the
> > E820 map and free them later.
>
> That was my original approach, but it requires that the bootloader be
> modified and it turns out that it's a lot harder to hand reserved
> regions back to the OS than it is to just reserve it in-kernel. The
> complete inflexibility of e820 is massively unhelpful here. It's just
> not possible to represent all of the EFI memory map data in it.
>

Just for the record, I've put this patch on hold until there's some more
consensus about whether boot-time randomization of the physical kernel
address is the best approach. There are some other potential issues
that haven't been brought up yet publicly, such as the possibility of
local attackers performing cache timing attacks to find the kernel image
location at runtime, which may make traditional ASLR somewhat pointless
regardless (except in the case of remote attackers, I suppose). Perhaps
HPA's suggestion of further modularizing the kernel would have some
advantages in this regard.

-Dan

> --
> Matthew Garrett | [email protected]

2011-05-31 19:08:46

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/31/2011 12:03 PM, Dan Rosenberg wrote:
>
> Just for the record, I've put this patch on hold until there's some more
> consensus about whether boot-time randomization of the physical kernel
> address is the best approach. There are some other potential issues
> that haven't been brought up yet publicly, such as the possibility of
> local attackers performing cache timing attacks to find the kernel image
> location at runtime, which may make traditional ASLR somewhat pointless
> regardless (except in the case of remote attackers, I suppose). Perhaps
> HPA's suggestion of further modularizing the kernel would have some
> advantages in this regard.
>

I'm probably going to implement the whole-image randomization as an
option in the Syslinux bootloader; it is a *lot* easier to do this
correctly in the bootloader.

-hpa

2011-05-31 19:51:11

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> [...] the possibility of local attackers performing cache timing
> attacks to find the kernel image location at runtime, [...]

How would these work, roughly?

Thanks,

Ingo

2011-05-31 19:56:12

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* Dan Rosenberg <[email protected]> wrote:

> Just for the record, I've put this patch on hold until there's some
> more consensus about whether boot-time randomization of the
> physical kernel address is the best approach. [...]

Well, if you use the suggestion i made: to skip the e820 map fiddling
altogether and just allocate half a megabyte of 'hole' at the end of
the kernel image - which would allow the kernel to be randomized
freely upwards by 0-128 pages - then the 'dynamic' versus 'static'
solution could be used at once!

The 'static' method would use the same hole, just at install time,
while the 'dynamic' method would use it during bootup.

Also, if this method is used then most of the controversy about the
dynamic approach (the fragility of interpreting the memory maps) goes
away.

Your last patch would need only minor modifications to get the hole
added: you'd need to add the tail-hole in the linker map:

arch/x86/kernel/vmlinux.lds.S

So ... could you *please* not shelve this idea just because people
used lkml for what it was invented: argued with each other rather
forcefully? :-)

Thanks,

Ingo

2011-05-31 20:16:30

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/31/2011 12:55 PM, Ingo Molnar wrote:
>
> So ... could you *please* not shelve this idea just because people
> used lkml for what it was invented: argued with each other rather
> forcefully? :-)
>

The real issue is that if it can be (semi)trivially bypassed, then there
may not be much reason to do it.

Other than that, Ingo's idea at least has the merit that it would break
only older bootloaders doing things wrong.

-hpa

2011-05-31 20:17:17

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On Tue, 2011-05-31 at 21:55 +0200, Ingo Molnar wrote:
> * Dan Rosenberg <[email protected]> wrote:
>
> > Just for the record, I've put this patch on hold until there's some
> > more consensus about whether boot-time randomization of the
> > physical kernel address is the best approach. [...]
>
> Well, if you use the suggestion i made: to skip the e820 map fiddling
> altogether and just allocate half a megabyte of 'hole' at the end of
> the kernel image - which would allow the kernel to be randomized
> freely upwards by 0-128 pages - then the 'dynamic' versus 'static'
> solution could be used at once!
>
> The 'static' method would use the same hole, just at install time,
> while the 'dynamic' method would use it during bootup.
>
> Also, if this method is used then most of the controversy about the
> dynamic approach (the fragility of interpreting the memory maps) goes
> away.
>
> Your last patch would need only minor modifications to get the hole
> added: you'd need to add the tail-hole in the linker map:
>
> arch/x86/kernel/vmlinux.lds.S
>
> So ... could you *please* not shelve this idea just because people
> used lkml for what it was invented: argued with each other rather
> forcefully? :-)
>

Don't worry, I haven't shelved the idea...I just wanted to see more of
the on-going conversation before investing a substantial amount of time
on a potentially infeasible solution. I'll give this approach a shot.

-Dan

> Thanks,
>
> Ingo

2011-05-31 20:27:27

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* H. Peter Anvin <[email protected]> wrote:

> On 05/31/2011 12:55 PM, Ingo Molnar wrote:
> >
> > So ... could you *please* not shelve this idea just because people
> > used lkml for what it was invented: argued with each other rather
> > forcefully? :-)
>
> The real issue is that if it can be (semi)trivially bypassed, then
> there may not be much reason to do it.

Sure.

> Other than that, Ingo's idea at least has the merit that it would
> break only older bootloaders doing things wrong.

I'm wondering, why would it break older bootloaders? It's just a
slightly larger than usual kernel image, nothing is visible to the
bootloader.

Thanks,

Ingo

2011-05-31 20:31:25

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

On 05/31/2011 01:27 PM, Ingo Molnar wrote:
>
>> Other than that, Ingo's idea at least has the merit that it would
>> break only older bootloaders doing things wrong.
>
> I'm wondering, why would it break older bootloaders? It's just a
> slightly larger than usual kernel image, nothing is visible to the
> bootloader.
>

Older boot loaders did not know how big the kernel image was, therefore
had no way to avoid memory space collision. That is fixed in boot
protocol 2.10.

-hpa
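What protocol 2.10 gives a bootloader is the amount of memory the kernel needs starting at its load address (the `init_size` field of the setup header, added together with `pref_address`), so a collision can be tested before loading. A C sketch of that test; the function, enum, and parameter names are hypothetical, and for older protocols it simply reports that the answer is unknown, which is the situation hpa describes.

```c
#include <stdint.h>

#define BOOT_PROTO_2_10 0x020a /* protocol 2.10 as encoded in the setup header */

enum fit_result { FIT_OK, FIT_COLLIDES, FIT_UNKNOWN };

/*
 * Decide whether a kernel loaded at load_addr stays below `limit`, the first
 * occupied byte above it. Kernels older than protocol 2.10 do not report
 * init_size, so the loader has no way to tell.
 */
static enum fit_result kernel_fits(uint16_t version, uint64_t load_addr,
                                   uint32_t init_size, uint64_t limit)
{
    if (version < BOOT_PROTO_2_10)
        return FIT_UNKNOWN;
    return load_addr + init_size <= limit ? FIT_OK : FIT_COLLIDES;
}
```

Under the padding proposal, the extra slack would presumably just show up as a larger `init_size`, which a 2.10-aware loader accounts for automatically.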

2011-06-01 06:19:08

Subject: Re: [RFC][PATCH] Randomize kernel base address on boot

* H. Peter Anvin <[email protected]> wrote:

> On 05/31/2011 01:27 PM, Ingo Molnar wrote:
> >
> >> Other than that, Ingo's idea at least has the merit that it would
> >> break only older bootloaders doing things wrong.
> >
> > I'm wondering, why would it break older bootloaders? It's just a
> > slightly larger than usual kernel image, nothing is visible to the
> > bootloader.
> >
>
> Older boot loaders did not know how big the kernel image was,
> therefore had no way to avoid memory space collision. That is
> fixed in boot protocol 2.10.

But i loaded really large kernel images way back 10 years ago on
various systems and never had any problems until the default
allyesconfig hit a ~40 MB kernel image size limit ;-)

(which limit was in the kernel, not in the bootloader)

So yes, a large kernel image "can" be an issue with old bootloaders
in some situations on weird machines but we don't really "break" them
via randomization, they were broken and fragile in some situations to
begin with.

It's fixed in any distro that cares and which would use our (not even
released) kernel that might one day have randomization.

Is that a fair summary of the bootloader situation?

Thanks,

Ingo

2011-06-01 15:45:31