2013-10-17 09:00:57

by Hatayama, Daisuke

Subject: kexec cannot find text map area if kaslr is enabled

Hello,

I tried the x86/kaslr branch to check how it works with the kdump framework.

I found that kexec doesn't work. According to the message, it looks like kexec is
failing to find the kernel text map area in kcore.

$ sudo /sbin/kexec -p --command-line="ro root=UUID=cdd5e357-d223-47ee-9d6e-d1fa78b3f8a4 rd_NO_LUKS nodmraid rd_NO_MD KEYBOARDTYPE=pc KEYTABLE=jp106 LANG=ja_JP.UTF-8 rd_NO_LVM rd_NO_DM console=ttyS0,19200n8r trace_event=block:*,irq:*,mce:*,sched:*,signal:*,workqueue:*,scsi:* trace_buf_size=25165824 irqpoll nr_cpus=2 reset_devices cgroup_disable=memory mce=off enable_lazy_purge " --initrd=/boot/initrd-3.12.0-rc4-kaslrkdump.img /boot/vmlinuz-3.12.0-rc4-kaslr
Can't find kernel text map area from kcore
Cannot load /boot/vmlinuz-3.12.0-rc4-kaslr

From the source code, it looks like kexec is trying to find the text map area using
the hard-coded __START_KERNEL_map address. But this address is being altered by kaslr.

static int get_kernel_vaddr_and_size(struct kexec_info *UNUSED(info),
                                     struct crash_elf_info *elf_info)
<cut>
    /* Traverse through the Elf headers and find the region where
     * kernel is mapped. */
    end_phdr = &ehdr.e_phdr[ehdr.e_phnum];
    for(phdr = ehdr.e_phdr; phdr != end_phdr; phdr++) {
        if (phdr->p_type == PT_LOAD) {
            unsigned long long saddr = phdr->p_vaddr;
            unsigned long long eaddr = phdr->p_vaddr + phdr->p_memsz;
            unsigned long long size;

            /* Look for kernel text mapping header. */
            if ((saddr >= X86_64__START_KERNEL_map) &&
                (eaddr <= X86_64__START_KERNEL_map + X86_64_KERNEL_TEXT_SIZE)) {
                saddr = _ALIGN_DOWN(saddr, X86_64_KERN_VADDR_ALIGN);
                elf_info->kern_vaddr_start = saddr;
                size = eaddr - saddr;
                /* Align size to page size boundary. */
                size = _ALIGN(size, align);
                elf_info->kern_size = size;
                dbgprintf("kernel vaddr = 0x%llx size = 0x%llx\n",
                          saddr, size);
                return 0;
            }
        }
    }
    fprintf(stderr, "Can't find kernel text map area from kcore\n");
    return -1;

It seems to me that kexec needs to get runtime relocation information, for example
from /proc/kallsyms.
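
As a rough illustration of that idea, here is a minimal sketch of looking up the runtime address of a symbol in /proc/kallsyms. The function names kallsyms_lookup() and kernel_stext_vaddr() are hypothetical and are not part of kexec-tools, and the choice of _stext as the symbol to look for is only an assumption about what would be useful here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a /proc/kallsyms-style stream, whose lines look like
 * "<hex address> <type> <name> [module]", for the symbol `sym`.
 * Stores the address in *addr and returns 0, or returns -1 if absent.
 * (Hypothetical helper, not taken from kexec-tools.) */
int kallsyms_lookup(FILE *fp, const char *sym, unsigned long long *addr)
{
    char line[512], name[256], type;
    unsigned long long a;

    assert(fp != NULL && sym != NULL && addr != NULL);
    while (fgets(line, sizeof(line), fp)) {
        /* Skip lines that do not have the three expected fields. */
        if (sscanf(line, "%llx %c %255s", &a, &type, name) != 3)
            continue;
        if (strcmp(name, sym) == 0) {
            *addr = a;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical wrapper: runtime address of _stext on the running kernel.
 * Returns -1 if the file cannot be read, the symbol is missing, or the
 * address reads back as 0 (the kernel hides addresses from
 * unprivileged readers). */
int kernel_stext_vaddr(unsigned long long *addr)
{
    FILE *fp = fopen("/proc/kallsyms", "r");
    int ret;

    if (!fp)
        return -1;
    ret = kallsyms_lookup(fp, "_stext", addr);
    fclose(fp);
    if (ret == 0 && *addr == 0)
        ret = -1;
    return ret;
}
```

A caller would compare the address returned by kernel_stext_vaddr() against the p_vaddr ranges of the PT_LOAD headers instead of against a fixed constant. Whether this is the right source of relocation information is an open question.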

I think there are other parts that won't work well because of this kind of hard-coded address.

FYI, here is part of the /proc/iomem and /proc/kcore information from my environment:

$ readelf -l /proc/kcore
Elf file type is CORE (Core file)
Entry point 0x0
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x00000000000002a8 0x0000000000000000 0x0000000000000000
                 0x0000000000000c74 0x0000000000000000         0
  LOAD           0x00007fffff601000 0xffffffffff600000 0x0000000000000000
                 0x0000000000800000 0x0000000000800000  RWE    1000
  LOAD           0x00007fffa3001000 0xffffffffa3000000 0x0000000000000000
                 0x0000000000ed4000 0x0000000000ed4000  RWE    1000
  LOAD           0x0000490000001000 0xffffc90000000000 0x0000000000000000
                 0x00001fffffffffff 0x00001fffffffffff  RWE    1000
  LOAD           0x00007fffc0001000 0xffffffffc0000000 0x0000000000000000
                 0x000000003f000000 0x000000003f000000  RWE    1000
  LOAD           0x0000080000002000 0xffff880000001000 0x0000000000000000
                 0x000000000009a000 0x000000000009a000  RWE    1000
  LOAD           0x00006a0000001000 0xffffea0000000000 0x0000000000000000
                 0x0000000000003000 0x0000000000003000  RWE    1000
  LOAD           0x0000080000101000 0xffff880000100000 0x0000000000000000
                 0x000000007af0d000 0x000000007af0d000  RWE    1000
  LOAD           0x00006a0000004000 0xffffea0000003000 0x0000000000000000
                 0x0000000001ae6000 0x0000000001ae6000  RWE    1000
  LOAD           0x0000080100001000 0xffff880100000000 0x0000000000000000
                 0x0000000780000000 0x0000000780000000  RWE    1000
  LOAD           0x00006a0003801000 0xffffea0003800000 0x0000000000000000
                 0x000000001a400000 0x000000001a400000  RWE    1000

00000000-00000fff : reserved
00001000-0009afff : System RAM
0009b000-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000c8000-000c8fff : Adapter ROM
000c9000-000cefff : Adapter ROM
000e0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-7b00cfff : System RAM
  03000000-22ffffff : Crash kernel
  23000000-2355118e : Kernel code
  2355118f-23af95ff : Kernel data
  23cb2000-23eadfff : Kernel bss
7b00d000-7b00ffff : reserved
7b010000-7b65efff : ACPI Non-volatile Storage
7b65f000-7b681fff : ACPI Tables
7b682000-7b7bffff : reserved
7b7c0000-7ba3ffff : ACPI Non-volatile Storage
7ba40000-7baaafff : reserved
7baab000-7bcfffff : ACPI Tables
7bd00000-7bd12fff : reserved
7bd13000-7bd15fff : ACPI Tables
7bd16000-7bd45fff : reserved
7bd46000-7bd5efff : ACPI Tables
7bd5f000-7bdfefff : reserved
7bdff000-7bdfffff : ACPI Tables
7be00000-7be4efff : reserved
  7be1b018-7be1b067 : APEI ERST
  7be1b070-7be1b077 : APEI ERST
  7be1b078-7be1d017 : APEI ERST
7be4f000-7bf83fff : ACPI Tables
7bf84000-7bfcefff : ACPI Non-volatile Storage
7bfcf000-7bffefff : ACPI Tables
7bfff000-8fffffff : reserved
  80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
90000000-afffffff : PCI Bus 0000:00
<cut>

--
Thanks.
HATAYAMA, Daisuke


2013-10-17 09:11:38

by Hatayama, Daisuke

Subject: Re: kexec cannot find text map area if kaslr is enabled

(2013/10/17 17:59), HATAYAMA Daisuke wrote:
> Hello,
>
> I tried the x86/kaslr branch to check how it works with the kdump framework.
>

Sorry, I should have said that it's a branch of the tip repository, here:

http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/log/?h=x86/kaslr

--
Thanks.
HATAYAMA, Daisuke

2013-10-17 19:58:53

by Eric W. Biederman

Subject: Re: kexec cannot find text map area if kaslr is enabled

HATAYAMA Daisuke writes:

> Hello,
>
> I tried the x86/kaslr branch to check how it works with the kdump
> framework.

As far as I can tell x86/kaslr is a pretty silly idea. There don't seem
to be enough bits to make it hard to brute force, much less hard to
guess. And it is a lot of pain to get there... Sigh.

> I found that kexec doesn't work. According to the message, it looks like kexec is
> failing to find the kernel text map area in kcore.

Well, kexec -p doesn't work.

> $ sudo /sbin/kexec -p --command-line="ro root=UUID=cdd5e357-d223-47ee-9d6e-d1fa78b3f8a4 rd_NO_LUKS nodmraid rd_NO_MD KEYBOARDTYPE=pc KEYTABLE=jp106 LANG=ja_JP.UTF-8 rd_NO_LVM rd_NO_DM console=ttyS0,19200n8r trace_event=block:*,irq:*,mce:*,sched:*,signal:*,workqueue:*,scsi:* trace_buf_size=25165824 irqpoll nr_cpus=2 reset_devices cgroup_disable=memory mce=off enable_lazy_purge " --initrd=/boot/initrd-3.12.0-rc4-kaslrkdump.img /boot/vmlinuz-3.12.0-rc4-kaslr
> Can't find kernel text map area from kcore
> Cannot load /boot/vmlinuz-3.12.0-rc4-kaslr
>
> From the source code, it looks like kexec is trying to find the text map area using
> the hard-coded __START_KERNEL_map address. But this address is being altered by kaslr.

Looking at the code you found, the hard-coded address of -2G is
fine, and is actually required by the compiler. The actual problem
appears to be that the structure of the kernel mapping has changed.
There are now two mappings in the -2GB range, one of 10MiB and one
of 1024MiB, whereas the code was looking for a mapping of 512MiB.

The entire bit of code is just for pretty-printing the core, and I
suspect it could be done more robustly, possibly by reporting all of the
kernel vaddrs of the mappings.
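
To make "reporting all of the kernel vaddrs of the mappings" concrete, here is a minimal sketch that collects every PT_LOAD entry from an ELF64 file such as /proc/kcore rather than stopping at the first one inside a fixed window. The names struct load_seg, read_load_segments() and print_load_segments() are hypothetical. The sketch assumes a 64-bit ELF file in host byte order and is not what kexec-tools does today.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* One PT_LOAD mapping: start virtual address and size in memory. */
struct load_seg {
    unsigned long long vaddr;
    unsigned long long memsz;
};

/* Read the ELF64 program headers from fp and copy up to `max` PT_LOAD
 * entries into `out`. Returns the number stored, or -1 on a read or
 * format error. (Hypothetical helper, for illustration only.) */
int read_load_segments(FILE *fp, struct load_seg *out, int max)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    int n = 0;

    assert(fp != NULL && out != NULL);
    if (fread(&eh, sizeof(eh), 1, fp) != 1)
        return -1;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_phentsize != sizeof(ph))
        return -1;
    if (fseek(fp, (long)eh.e_phoff, SEEK_SET) != 0)
        return -1;

    for (int i = 0; i < eh.e_phnum && n < max; i++) {
        if (fread(&ph, sizeof(ph), 1, fp) != 1)
            return -1;
        if (ph.p_type != PT_LOAD)
            continue;           /* skip PT_NOTE and anything else */
        out[n].vaddr = ph.p_vaddr;
        out[n].memsz = ph.p_memsz;
        n++;
    }
    return n;
}

/* Print each mapping in the style of the dbgprintf() in the quoted code. */
void print_load_segments(const struct load_seg *s, int n)
{
    for (int i = 0; i < n; i++)
        printf("vaddr = 0x%llx size = 0x%llx\n", s[i].vaddr, s[i].memsz);
}
```

Opening /proc/kcore with fopen() as root and passing the stream to read_load_segments() should yield the LOAD entries that readelf -l shows, which can then be printed or filtered as needed.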

I expect you could increase X86_64_KERNEL_TEXT_SIZE to 2GiB - 1, aka
0x7fffffff, and the code would work. I don't know if you would have a
recognizable text segment in the core dump.
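
As a way to check this against the numbers in the report, here is a small sketch that replays the range test from the quoted function on the three PT_LOAD entries of the readelf output that lie at or above -2G, in the order they are listed. It assumes that X86_64__START_KERNEL_map is 0xffffffff80000000 (the -2G address) and that the current X86_64_KERNEL_TEXT_SIZE is 512MiB. The names START_KERNEL_MAP, struct seg, kcore_segs and first_match() are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of X86_64__START_KERNEL_map (the -2G address). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

struct seg {
    unsigned long long vaddr;
    unsigned long long memsz;
};

/* PT_LOAD entries at or above -2G, copied from the readelf output
 * in the report, in the order they appear there. */
const struct seg kcore_segs[] = {
    { 0xffffffffff600000ULL, 0x0000000000800000ULL },
    { 0xffffffffa3000000ULL, 0x0000000000ed4000ULL },
    { 0xffffffffc0000000ULL, 0x000000003f000000ULL },
};

/* Same test as in get_kernel_vaddr_and_size(): return the index of the
 * first segment that lies entirely inside
 * [START_KERNEL_MAP, START_KERNEL_MAP + text_size], or -1 if none does. */
int first_match(const struct seg *s, size_t n, unsigned long long text_size)
{
    assert(s != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long long saddr = s[i].vaddr;
        unsigned long long eaddr = s[i].vaddr + s[i].memsz;

        if (saddr >= START_KERNEL_MAP &&
            eaddr <= START_KERNEL_MAP + text_size)
            return (int)i;
    }
    return -1;
}
```

With a text_size of 512MiB (512ULL << 20), first_match(kcore_segs, 3, ...) returns -1, which is consistent with the "Can't find kernel text map area from kcore" message. With 0x7fffffff it returns 0, the entry at 0xffffffffff600000, because that entry is listed first. So on this particular listing the larger constant makes the check pass, but the segment it picks is not the one at 0xffffffffa3000000.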

I believe ultimately what we want is to have an ELF image with all of
the same PT_LOAD segments as /proc/kcore, and the current implementation
is not general enough to do that. So this probably makes a good
opportunity to rewrite it.

It may also make sense to have some information from /proc/kallsyms. We
aren't doing that on i386 and have something that works, so I suspect
the same logic will work on x86_64. At least until it is decided that
the best way to load the kernel is to randomly reorder and relink all of
the .o's in the kernel at boot time.

Eric