Hi,
Running:
$ sudo x86info -a
on this HP ZBook 15 G3 laptop kills the x86info process with a segfault
and produces the following kernel BUG.
$ git describe
v4.11-rc4-40-gfe82203
It is also reproducible with the Fedora kernel: 4.9.14-200.fc25.x86_64
Full dmesg output here: https://pastebin.com/raw/Kur2mpZq
[ 51.418954] usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes)
[ 51.418959] ------------[ cut here ]------------
[ 51.418968] kernel BUG at /home/tomranta/git/linux/mm/usercopy.c:78!
[ 51.418970] invalid opcode: 0000 [#1] SMP
[ 51.418972] Modules linked in: fuse ccm ipt_REJECT nf_reject_ipv4 xt_tcpudp tun af_packet xt_conntrack nf_conntrack libcrc32c ebtable_nat ebtable_broute bridge ip6table_mangle ip6table_raw iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables nls_iso8859_1 nls_cp437 vfat fat dm_mirror dm_region_hash dm_log arc4 hp_wmi sparse_keymap coretemp kvm_intel snd_hda_codec_hdmi kvm irqbypass pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_uncore intel_rapl_perf iwlmvm mac80211 snd_usb_audio mousedev snd_usbmidi_lib snd_rawmidi input_leds snd_hda_codec_conexant snd_hda_codec_generic efivars iwlwifi uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_intel videobuf2_v4l2 cfg80211 videobuf2_core snd_hda_codec snd_seq snd_hwdep
[ 51.419010] snd_seq_device snd_hda_core snd_pcm thermal hp_accel lis3lv02d input_polldev ac acpi_pad battery led_class evdev hp_wireless nfsd lockd grace sunrpc tg3 libphy crc32_pclmul crc32c_intel e1000e sd_mod 8021q garp stp llc mrp unix autofs4
[ 51.419025] CPU: 7 PID: 2406 Comm: x86info Not tainted 4.11.0-rc4-tommi+ #14
[ 51.419027] Hardware name: HP HP ZBook 15 G3/80D5, BIOS N81 Ver. 01.12 11/01/2016
[ 51.419030] task: ffff88026ce84100 task.stack: ffffc90003b94000
[ 51.419035] RIP: 0010:__check_object_size+0xfd/0x195
[ 51.419037] RSP: 0018:ffffc90003b97de0 EFLAGS: 00010282
[ 51.419039] RAX: 0000000000000066 RBX: ffff880000090000 RCX: 0000000000000000
[ 51.419042] RDX: ffff8802bddd33e8 RSI: ffff8802bddcc9e8 RDI: ffff8802bddcc9e8
[ 51.419044] RBP: ffffc90003b97e00 R08: 000000000006648a R09: 000000000000048b
[ 51.419046] R10: 0000000000000100 R11: ffffffff81e9a86d R12: 0000000000001000
[ 51.419049] R13: 0000000000000001 R14: ffff880000091000 R15: ffff880000090000
[ 51.419051] FS: 00007f8323436b40(0000) GS:ffff8802bddc0000(0000) knlGS:0000000000000000
[ 51.419054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.419056] CR2: 00007ffcbec21000 CR3: 000000026c8e8000 CR4: 00000000003406a0
[ 51.419058] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 51.419061] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 51.419063] Call Trace:
[ 51.419066] read_mem+0x70/0x120
[ 51.419069] __vfs_read+0x28/0x130
[ 51.419072] ? security_file_permission+0x9b/0xb0
[ 51.419075] ? rw_verify_area+0x4e/0xb0
[ 51.419077] vfs_read+0x96/0x130
[ 51.419079] SyS_read+0x46/0xb0
[ 51.419082] ? SyS_lseek+0x87/0xb0
[ 51.419085] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 51.419087] RIP: 0033:0x7f8322d56bd0
[ 51.419089] RSP: 002b:00007ffcbec11c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 51.419091] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f8322d56bd0
[ 51.419094] RDX: 0000000000010000 RSI: 00007ffcbec11ca0 RDI: 0000000000000003
[ 51.419096] RBP: 0000000000000008 R08: 0000000000000005 R09: 0000000000000050
[ 51.419098] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002231c00
[ 51.419100] R13: 00007ffcbec11c9e R14: 00007ffcbec51cf8 R15: 0000000000000000
[ 51.419103] Code: a8 81 48 c7 c2 29 69 a4 81 48 c7 c6 82 89 a5 81 48 0f 45 d0 48 c7 c0 1a 1e a6 81 48 c7 c7 d0 ed a5 81 48 0f 45 f0 e8 7f 74 f8 ff <0f> 0b 48 89 df e8 29 98 e8 ff 84 c0 0f 84 3a ff ff ff b8 00 00
[ 51.419123] RIP: __check_object_size+0xfd/0x195 RSP: ffffc90003b97de0
[ 51.421565] ---[ end trace 441f7992ca25e39d ]---
On Wed, Mar 29, 2017 at 11:44 PM, Tommi Rantala
<[email protected]> wrote:
> Hi,
>
> Running:
>
> $ sudo x86info -a
>
> On this HP ZBook 15 G3 laptop kills the x86info process with segfault and
> produces the following kernel BUG.
>
> $ git describe
> v4.11-rc4-40-gfe82203
>
> It is also reproducible with the fedora kernel: 4.9.14-200.fc25.x86_64
>
> Full dmesg output here: https://pastebin.com/raw/Kur2mpZq
>
> [ 51.418954] usercopy: kernel memory exposure attempt detected from
> ffff880000090000 (dma-kmalloc-256) (4096 bytes)
This seems like a real exposure: the copy is attempting to read 4096
bytes from a 256 byte object.
> [...]
> [ 51.419063] Call Trace:
> [ 51.419066] read_mem+0x70/0x120
> [ 51.419069] __vfs_read+0x28/0x130
> [ 51.419072] ? security_file_permission+0x9b/0xb0
> [ 51.419075] ? rw_verify_area+0x4e/0xb0
> [ 51.419077] vfs_read+0x96/0x130
> [ 51.419079] SyS_read+0x46/0xb0
> [ 51.419082] ? SyS_lseek+0x87/0xb0
> [ 51.419085] entry_SYSCALL_64_fastpath+0x1a/0xa9
I can't reproduce this myself, so I assume it's some specific /proc or
/sys file that I don't have. Are you able to get a strace of x86info
as it runs to see which file it is attempting to read here?
Thanks!
-Kees
--
Kees Cook
Pixel Security
On Thu, Mar 30, 2017 at 09:45:26AM -0700, Kees Cook wrote:
> I can't reproduce this myself, so I assume it's some specific /proc or
> /sys file that I don't have. Are you able to get a strace of x86info
> as it runs to see which file it is attempting to read here?
Presumably this is /dev/mem, with read_mem in drivers/char/mem.c.
I guess you may have locked that down on your system anyhow. ;)
Thanks,
Mark.
On 03/30/2017 09:45 AM, Kees Cook wrote:
> I can't reproduce this myself, so I assume it's some specific /proc or
> /sys file that I don't have. Are you able to get a strace of x86info
> as it runs to see which file it is attempting to read here?
>
> Thanks!
>
> -Kees
>
I can't see this on any of my Fedora systems. It looks like this
is trying to read /dev/mem so I suspect your BIOS is putting out
unexpected values. If you turn off hardened usercopy does x86info
give you reasonable values? I'd also echo getting an strace.
Thanks,
Laura
On Thu, Mar 30, 2017 at 10:27 AM, Laura Abbott <[email protected]> wrote:
> I can't see this on any of my Fedora systems. It looks like this
> is trying to read /dev/mem so I suspect your BIOS is putting out
> unexpected values. If you turn off hardened usercopy does x86info
> give you reasonable values? I'd also echo getting an strace.
Reads out of /dev/mem should be restricted to non-RAM on Fedora, yes?
Tommi, do your kernels have CONFIG_STRICT_DEVMEM=y ?
-Kees
--
Kees Cook
Pixel Security
On 03/30/2017 10:37 AM, Kees Cook wrote:
> Reads out of /dev/mem should be restricted to non-RAM on Fedora, yes?
>
> Tommi, do your kernels have CONFIG_STRICT_DEVMEM=y ?
>
> -Kees
>
CONFIG_STRICT_DEVMEM should be on in all Fedora kernels.
Thanks,
Laura
On Thu, Mar 30, 2017 at 09:45:26AM -0700, Kees Cook wrote:
> On Wed, Mar 29, 2017 at 11:44 PM, Tommi Rantala
> <[email protected]> wrote:
> > [ 51.418954] usercopy: kernel memory exposure attempt detected from
> > ffff880000090000 (dma-kmalloc-256) (4096 bytes)
>
> This seems like a real exposure: the copy is attempting to read 4096
> bytes from a 256 byte object.
The code[1] is doing a 4k read from /dev/mem in the range 0x90000 -> 0xa0000
According to arch/x86/mm/init.c:devmem_is_allowed, that's still valid..
Note that the printk is using the direct mapping address. Is that what's
being passed down to devmem_is_allowed now ? If so, that's probably what broke.
Dave
[1] https://github.com/kernelslacker/x86info/blob/master/mptable.c
On Thu, Mar 30, 2017 at 12:41 PM, Dave Jones <[email protected]> wrote:
> The code[1] is doing a 4k read from /dev/mem in the range 0x90000 -> 0xa0000
> According to arch/x86/mm/init.c:devmem_is_allowed, that's still valid..
>
> Note that the printk is using the direct mapping address. Is that what's
> being passed down to devmem_is_allowed now ? If so, that's probably what broke.
So this is attempting to read physical memory 0x90000 -> 0xa0000, but
that's somehow resolving to a virtual address that is claimed by
dma-kmalloc?? I'm confused how that's happening...
-Kees
>
> Dave
>
> [1] https://github.com/kernelslacker/x86info/blob/master/mptable.c
--
Kees Cook
Pixel Security
On Thu, Mar 30, 2017 at 12:52:31PM -0700, Kees Cook wrote:
> So this is attempting to read physical memory 0x90000 -> 0xa0000, but
> that's somehow resolving to a virtual address that is claimed by
> dma-kmalloc?? I'm confused how that's happening...
The only thing that I can think of would be a rogue ptr in the bios
table, but that seems unlikely. Tommi, can you put strace of x86info -mp somewhere?
That will confirm/deny whether we're at least asking the kernel to do sane things.
Dave
On 30.03.2017 23:01, Dave Jones wrote:
> The only thing that I can think of would be a rogue ptr in the bios
> table, but that seems unlikely. Tommi, can you put strace of x86info -mp somewhere?
> That will confirm/deny whether we're at least asking the kernel to do sane things.
Indeed the bug happens when reading from /dev/mem:
https://pastebin.com/raw/ZEJGQP1X
# strace -f -y x86info -mp
[...]
open("/dev/mem", O_RDONLY) = 3</dev/mem>
lseek(3</dev/mem>, 1038, SEEK_SET) = 1038
read(3</dev/mem>, "\300\235", 2) = 2
lseek(3</dev/mem>, 646144, SEEK_SET) = 646144
read(3</dev/mem>,
"\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1024) = 1024
lseek(3</dev/mem>, 1043, SEEK_SET) = 1043
read(3</dev/mem>, "w\2", 2) = 2
lseek(3</dev/mem>, 645120, SEEK_SET) = 645120
read(3</dev/mem>,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1024) = 1024
lseek(3</dev/mem>, 654336, SEEK_SET) = 654336
read(3</dev/mem>,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1024) = 1024
lseek(3</dev/mem>, 983040, SEEK_SET) = 983040
read(3</dev/mem>,
"IFE$\245S\0\0\1\0\0\0\0\360y\0\0\360\220\260\30\237{=\23\10\17\0000\276\17\0"...,
65536) = 65536
lseek(3</dev/mem>, 917504, SEEK_SET) = 917504
read(3</dev/mem>,
"\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...,
65536) = 65536
lseek(3</dev/mem>, 524288, SEEK_SET) = 524288
read(3</dev/mem>, <unfinished ...>) = ?
+++ killed by SIGSEGV +++
On 30.03.2017 20:44, Laura Abbott wrote:
> On 03/30/2017 10:37 AM, Kees Cook wrote:
>>
>> Reads out of /dev/mem should be restricted to non-RAM on Fedora, yes?
>>
>> Tommi, do your kernels have CONFIG_STRICT_DEVMEM=y ?
>>
>> -Kees
>>
>
> CONFIG_STRICT_DEVMEM should be on in all Fedora kernels.
Yes, the Fedora kernels do have it enabled:
$ grep STRICT_DEVMEM /boot/config-4.9.14-200.fc25.x86_64
CONFIG_STRICT_DEVMEM=y
CONFIG_IO_STRICT_DEVMEM=y
But I do not have it in my own build:
$ grep STRICT_DEVMEM .config
# CONFIG_STRICT_DEVMEM is not set
-Tommi
On 31.03.2017 08:40, Tommi Rantala wrote:
>> The only thing that I can think of would be a rogue ptr in the bios
>> table, but that seems unlikely. Tommi, can you put strace of x86info
>> -mp somewhere?
>> That will confirm/deny whether we're at least asking the kernel to do
>> sane things.
>
> Indeed the bug happens when reading from /dev/mem:
>
> https://pastebin.com/raw/ZEJGQP1X
>
> # strace -f -y x86info -mp
> [...]
> open("/dev/mem", O_RDONLY) = 3</dev/mem>
> lseek(3</dev/mem>, 1038, SEEK_SET) = 1038
> read(3</dev/mem>, "\300\235", 2) = 2
> lseek(3</dev/mem>, 646144, SEEK_SET) = 646144
> read(3</dev/mem>,
> "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(3</dev/mem>, 1043, SEEK_SET) = 1043
> read(3</dev/mem>, "w\2", 2) = 2
> lseek(3</dev/mem>, 645120, SEEK_SET) = 645120
> read(3</dev/mem>,
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(3</dev/mem>, 654336, SEEK_SET) = 654336
> read(3</dev/mem>,
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(3</dev/mem>, 983040, SEEK_SET) = 983040
> read(3</dev/mem>,
> "IFE$\245S\0\0\1\0\0\0\0\360y\0\0\360\220\260\30\237{=\23\10\17\0000\276\17\0"...,
> 65536) = 65536
> lseek(3</dev/mem>, 917504, SEEK_SET) = 917504
> read(3</dev/mem>,
> "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...,
> 65536) = 65536
> lseek(3</dev/mem>, 524288, SEEK_SET) = 524288
> read(3</dev/mem>, <unfinished ...>) = ?
> +++ killed by SIGSEGV +++
That last read is done in mptable.c:347, trying to read GROPE_AREA1.
# ./x86info --debug
x86info v1.31pre
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 0
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 1
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 2
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 3
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 0
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 1
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 2
get_intel_topology:
Siblings: 2
Physical Processor ID: 0
Processor Core ID: 3
Found 8 identical CPUs
EBDA points to: 9dc0
EBDA segment ptr: 9dc00
Segmentation fault
If I comment out the GROPE_AREA1 read, the same kernel bug still happens
with the GROPE_AREA2 read.
Removing both GROPE_AREA1 and GROPE_AREA2 reads avoids the crash:
$ git diff
diff --git a/mptable.c b/mptable.c
index 480f19b..00fff35 100644
--- a/mptable.c
+++ b/mptable.c
@@ -342,6 +342,7 @@ static int apic_probe(unsigned long* paddr)
}
/* search additional memory */
+ /*
target = GROPE_AREA1;
seekEntry(target);
if (readEntry(buffer, GROPE_SIZE)) {
@@ -371,6 +372,7 @@ static int apic_probe(unsigned long* paddr)
return 6;
}
}
+ */
*paddr = (unsigned long)0;
return 0;
# ./x86info -mp
x86info v1.31pre
Found 8 identical CPUs
Extended Family: 0 Extended Model: 5 Family: 6 Model: 94 Stepping: 3
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i7-6820HQ CPU
@ 2.70GHz
Total processor threads: 8
This system has 1 quad-core processor with hyper-threading (2 threads
per core) running at an estimated 2.70GHz
#
-Tommi
On Thu, Mar 30, 2017 at 12:52:31PM -0700, Kees Cook wrote:
> So this is attempting to read physical memory 0x90000 -> 0xa0000, but
> that's somehow resolving to a virtual address that is claimed by
> dma-kmalloc?? I'm confused how that's happening...
/dev/mem is using physical addresses that the kernel translates through the
direct mapping. __check_object_size seems to think that anything passed
into it is always allocated by the kernel, but in this case, I think read_mem()
is just passing through the direct mapping to copy_to_user.
Dave
On Fri, Mar 31, 2017 at 10:17 AM, Dave Jones <[email protected]> wrote:
> /dev/mem is using physical addresses that the kernel translates through the
> direct mapping. __check_object_size seems to think that anything passed
> into it is always allocated by the kernel, but in this case, I think read_mem()
> is just passing through the direct mapping to copy_to_user.
How is ffff880000090000 both in the direct mapping and a slab object?
It would need to pass all of these checks, and be marked as PageSlab
before it could be evaluated by __check_heap_object:
        if (is_vmalloc_or_module_addr(ptr))
                return NULL;
        if (!virt_addr_valid(ptr))
                return NULL;
        page = virt_to_head_page(ptr);
        /* Check slab allocator for flags and size. */
        if (PageSlab(page))
                return __check_heap_object(ptr, n, page);
-Kees
--
Kees Cook
Pixel Security
On Fri, Mar 31, 2017 at 10:32:04AM -0700, Kees Cook wrote:
> How is ffff880000090000 both in the direct mapping and a slab object?
>
> It would need to pass all of these checks, and be marked as PageSlab
> before it could be evaluated by __check_heap_object:
>
> if (is_vmalloc_or_module_addr(ptr))
> return NULL;
>
> if (!virt_addr_valid(ptr))
> return NULL;
>
> page = virt_to_head_page(ptr);
>
> /* Check slab allocator for flags and size. */
> if (PageSlab(page))
> return __check_heap_object(ptr, n, page);
Looking at Tommi's dmesg output closer, it appears that he's booting in
EFI mode (which isn't unusual these days). I'm not sure that the EBDA
(that x86info is trying to read) even exists under EFI, which is
probably why the memory range is showing up as usable, and then ending
up as a slab page, rather than being reserved by the BIOS.
...
reserve setup_data: [mem 0x0000000000059000-0x000000000009dfff] usable
...
If EBDA under EFI isn't a valid thing, the puzzling part is why there's
still an EBDA pointer in lowmem. x86 people ?
Longterm, I think I'm just going to gut all the ebda code from x86info,
as it isn't really necessary. Whether we still need to change /dev/mem
to cope with this situation depends on whether there are other valid
usecases.
Dave
On Fri, Mar 31, 2017 at 10:32 AM, Kees Cook <[email protected]> wrote:
>
> How is ffff880000090000 both in the direct mapping and a slab object?
I think this is just very regular /dev/mem behavior, that is hidden by
the fact that the *normal* case for /dev/mem is access to reserved RAM,
which will never be a slab object.
And this is all hidden with STRICT_DEVMEM, which pretty much everybody
has enabled, but Tommi for some reason did not.
> It would need to pass all of these checks, and be marked as PageSlab
> before it could be evaluated by __check_heap_object:
It trivially passes those checks, because it's a normal kernel address
for a page that is just used for kernel stuff.
I think we have two options:
- just get rid of STRICT_DEVMEM and make that unconditional
- make the read_mem/write_mem code use some non-checking copy
routines, since they are obviously designed to access any memory
location (including kernel memory) unless STRICT_DEVMEM is set.
Hmm. Thinking more about this, we do allow access to the first 1MB of
physical memory unconditionally (see devmem_is_allowed() in
arch/x86/mm/init.c). And I think we only _reserve_ the first 64kB or
something. So I guess even STRICT_DEVMEM isn't actually all that
strict.
So this should be visible even *with* STRICT_DEVMEM.
Does a simple
    sudo dd if=/dev/mem of=/dev/null bs=4096 count=256
also show the same issue? Maybe regardless of STRICT_DEVMEM?
Maybe we should change devmem_is_allowed() to return a ternary value,
and then have it be "allow access" (for reserved pages), "disallow
access" (for various random stuff), and "just read zero" (for pages in
the low 1M that aren't marked reserved).
That way things that read the low 1M (like x86info) will
hopefully not be unhappy, but also won't be reading random kernel
data.
Linus
On Fri, Mar 31, 2017 at 11:03 AM, Dave Jones <[email protected]> wrote:
> On Fri, Mar 31, 2017 at 10:32:04AM -0700, Kees Cook wrote:
>
> > > > > > > Full dmesg output here: https://pastebin.com/raw/Kur2mpZq
> > > > > > >
> > > > > > > [ 51.418954] usercopy: kernel memory exposure attempt detected from
> > > > > > > ffff880000090000 (dma-kmalloc-256) (4096 bytes)
> > > > > >
> > > > > > This seems like a real exposure: the copy is attempting to read 4096
> > > > > > bytes from a 256 byte object.
> > > > >
> > > > > The code[1] is doing a 4k read from /dev/mem in the range 0x90000 -> 0xa0000
> > > > > According to arch/x86/mm/init.c:devmem_is_allowed, that's still valid..
> > > > >
> > > > > Note that the printk is using the direct mapping address. Is that what's
> > > > > being passed down to devmem_is_allowed now ? If so, that's probably what broke.
> > > >
> > > > So this is attempting to read physical memory 0x90000 -> 0xa0000, but
> > > > that's somehow resolving to a virtual address that is claimed by
> > > > dma-kmalloc?? I'm confused how that's happening...
> > >
> > > /dev/mem is using physical addresses that the kernel translates through the
> > > direct mapping. __check_object_size seems to think that anything passed
> > > into it is always allocated by the kernel, but in this case, I think read_mem()
> > > is just passing through the direct mapping to copy_to_user.
> >
> > How is ffff880000090000 both in the direct mapping and a slab object?
> >
> > It would need to pass all of these checks, and be marked as PageSlab
> > before it could be evaluated by __check_heap_object:
> >
> >         if (is_vmalloc_or_module_addr(ptr))
> >                 return NULL;
> >
> >         if (!virt_addr_valid(ptr))
> >                 return NULL;
> >
> >         page = virt_to_head_page(ptr);
> >
> >         /* Check slab allocator for flags and size. */
> >         if (PageSlab(page))
> >                 return __check_heap_object(ptr, n, page);
>
> Looking at Tommi's dmesg output closer, it appears that he's booting in
> EFI mode (which isn't unusual these days). I'm not sure that the EBDA
> (that x86info is trying to read) even exists under EFI, which is
> probably why the memory range is showing up as usable, and then ending
> up as a slab page, rather than being reserved by the BIOS.
>
This stuff all sucks. Presumably the only reason that we pay
attention to the EBDA at all in EFI mode is that no one has the guts
to change it: maybe there's a firmware out there that puts something
important in the EBDA and fails to properly reserve it in the EFI
memory map.
> ...
> reserve setup_data: [mem 0x0000000000059000-0x000000000009dfff] usable
> ...
>
> If EBDA under EFI isn't a valid thing, the puzzling part is why there's
> still an EBDA pointer in lowmem. x86 people ?
>
> Longterm, I think I'm just going to gut all the ebda code from x86info,
> as it isn't really necessary. Whether we still need to change /dev/mem
> to cope with this situation depends on whether there are other valid
> usecases.
I would like to at least consider a stricter alternative: make
/dev/mem a real whitelist. The rules would be that, by default,
/dev/mem access is always rejected. Kernel code could explicitly
register resources that would be permitted via /dev/mem -- each
resource would be tagged with a bit saying "devmem okay" along with
some indication of caching mode. For example, on very recent kernels,
some crappy HP tools are busted because they try to access SMBIOS
using explicit uncached devmem accesses, but that's verboten because
the kernel accesses it with ioremap_cache().
There are really very few cases where /dev/mem is okay at all, I
think. Maybe the EBDA is one of them. And we could make up some hack
where devmem access to certain ranges just gets all zeros regardless
of what's actually there.
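As a rough illustration of the whitelist idea, here is a minimal userspace model. Everything in it is hypothetical (the names, the fixed-size table, the two caching modes); it only shows the shape of "reject by default, permit registered ranges with a matching caching mode":

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caching modes a registered range may be tagged with. */
enum devmem_cache { DEVMEM_CACHED, DEVMEM_UNCACHED };

struct devmem_range {
	uint64_t start, end;	/* inclusive physical range */
	enum devmem_cache cache;
};

#define DEVMEM_MAX_RANGES 16
static struct devmem_range devmem_ranges[DEVMEM_MAX_RANGES];
static size_t devmem_nr_ranges;

/* Register a range as "devmem okay"; returns 0 on success, -1 on error. */
int devmem_register(uint64_t start, uint64_t end, enum devmem_cache cache)
{
	if (start > end || devmem_nr_ranges == DEVMEM_MAX_RANGES)
		return -1;
	devmem_ranges[devmem_nr_ranges++] =
		(struct devmem_range){ start, end, cache };
	return 0;
}

/* Default deny: permit only accesses fully inside one registered range. */
bool devmem_permitted(uint64_t start, uint64_t len, enum devmem_cache cache)
{
	if (len == 0 || start + len - 1 < start)	/* empty or wraps */
		return false;
	uint64_t end = start + len - 1;
	for (size_t i = 0; i < devmem_nr_ranges; i++) {
		const struct devmem_range *r = &devmem_ranges[i];
		if (r->start <= start && end <= r->end && r->cache == cache)
			return true;
	}
	return false;
}
```

In this model an uncached access to a range registered as cached is refused, which is the SMBIOS situation described above.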
--Andy
On 31.03.2017 21:26, Linus Torvalds wrote:
> Hmm. Thinking more about this, we do allow access to the first 1MB of
> physical memory unconditionally (see devmem_is_allowed() in
> arch/x86/mm/init.c). And I think we only _reserve_ the first 64kB or
> something. So I guess even STRICT_DEVMEM isn't actually all that
> strict.
>
> So this should be visible even *with* STRICT_DEVMEM.
>
> Does a simple
>
> sudo dd if=/dev/mem of=/dev/null bs=4096 count=256
>
> also show the same issue? Maybe regardless of STRICT_DEVMEM?
Yep, it is enough to trigger the bug.
Also crashes with the fedora kernel that has STRICT_DEVMEM:
$ sudo dd if=/dev/mem of=/dev/null bs=4096 count=256
Segmentation fault
[ 73.224025] usercopy: kernel memory exposure attempt detected from
ffff893a80059000 (dma-kmalloc-16) (4096 bytes)
[ 73.224049] ------------[ cut here ]------------
[ 73.224056] kernel BUG at mm/usercopy.c:75!
[ 73.224060] invalid opcode: 0000 [#1] SMP
[ 73.224237] CPU: 5 PID: 2860 Comm: dd Not tainted
4.9.14-200.fc25.x86_64 #1
> Maybe we should change devmem_is_allowed() to return a ternary value,
> and then have it be "allow access" (for reserved pages), "disallow
> access" (for various random stuff), and "just read zero" (for pages in
> the low 1M that aren't marked reserved).
>
> That way things that read the low 1M (like x86info) will
> hopefully not be unhappy, but also won't be reading random kernel
> data.
>
> Linus
>
On Fri, Mar 31, 2017 at 11:26 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Mar 31, 2017 at 10:32 AM, Kees Cook <[email protected]> wrote:
>>
>> How is ffff880000090000 both in the direct mapping and a slab object?
>
> I think this is just very regular /dev/mem behavior, that is hidden by
> the fact that the *normal* case for /dev/mem is all to reserved RAM,
> which will never be a slab object.
>
> And this is all hidden with STRICT_DEVMEM, which pretty much everybody
> has enabled, but Tommi for some reason did not.
(It tripped under Fedora (with STRICT_DEVMEM) too, but I see below you
isolated it...)
>
>> It would need to pass all of these checks, and be marked as PageSlab
>> before it could be evaluated by __check_heap_object:
>
> It trivially passes those checks, because it's a normal kernel address
> for a page that is just used for kernel stuff.
>
> I think we have two options:
>
> - just get rid of STRICT_DEVMEM and make that unconditional
I'm a fan of this whatever the case; have all the video drivers moved
away from crazy userspace direct memory access? (Or am I
misremembering the reason for allowing /dev/mem to read RAM?)
> - make the read_mem/write_mem code use some non-checking copy
> routines, since they are obviously designed to access any memory
> location (including kernel memory) unless STRICT_DEVMEM is set.
I don't think this is a problem with the usercopy code: it is
attempting to read RAM which should be blocked. It just _happens_ that
this RAM got used for slab cache.
> Hmm. Thinking more about this, we do allow access to the first 1MB of
> physical memory unconditionally (see devmem_is_allowed() in
Oooh, yes, that's the issue here. If the location is bypassing
devmem_is_allowed(), oops.
> arch/x86/mm/init.c). And I think we only _reserve_ the first 64kB or
> something. So I guess even STRICT_DEVMEM isn't actually all that
> strict.
>
> So this should be visible even *with* STRICT_DEVMEM.
>
> Does a simple
>
> sudo dd if=/dev/mem of=/dev/null bs=4096 count=256
>
> also show the same issue? Maybe regardless of STRICT_DEVMEM?
>
> Maybe we should change devmem_is_allowed() to return a ternary value,
> and then have it be "allow access" (for reserved pages), "disallow
> access" (for various random stuff), and "just read zero" (for pages in
> the low 1M that aren't marked reserved).
If that doesn't break x86info, that would be nice too.
> That way things that read the low 1M (like x86info) will
> hopefully not be unhappy, but also won't be reading random kernel
> data.
So, this seems like an uncommon situation where <1M memory ended up
as regular RAM. It seems like this exception is the problem?
-Kees
--
Kees Cook
Pixel Security
On Fri, Mar 31, 2017 at 12:32 PM, Tommi Rantala
<[email protected]> wrote:
> On 31.03.2017 21:26, Linus Torvalds wrote:
>>
>> Hmm. Thinking more about this, we do allow access to the first 1MB of
>> physical memory unconditionally (see devmem_is_allowed() in
>> arch/x86/mm/init.c). And I think we only _reserve_ the first 64kB or
>> something. So I guess even STRICT_DEVMEM isn't actually all that
>> strict.
>>
>> So this should be visible even *with* STRICT_DEVMEM.
>>
>> Does a simple
>>
>> sudo dd if=/dev/mem of=/dev/null bs=4096 count=256
>>
>> also show the same issue? Maybe regardless of STRICT_DEVMEM?
>
>
> Yep, it is enough to trigger the bug.
>
> Also crashes with the fedora kernel that has STRICT_DEVMEM:
>
> $ sudo dd if=/dev/mem of=/dev/null bs=4096 count=256
> Segmentation fault
>
> [ 73.224025] usercopy: kernel memory exposure attempt detected from
> ffff893a80059000 (dma-kmalloc-16) (4096 bytes)
> [ 73.224049] ------------[ cut here ]------------
> [ 73.224056] kernel BUG at mm/usercopy.c:75!
> [ 73.224060] invalid opcode: 0000 [#1] SMP
> [ 73.224237] CPU: 5 PID: 2860 Comm: dd Not tainted 4.9.14-200.fc25.x86_64
> #1
As root, what does dumping /proc/iomem show you?
For one of my systems, I see something like this:
00000000-00000fff : reserved
00001000-0008efff : System RAM
0008f000-0008ffff : reserved
00090000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000e0000-000fffff : reserved
  000e0000-000effff : PCI Bus 0000:00
  000f0000-000fffff : System ROM
00100000-cdee6fff : System RAM
  cbc00000-cc49a653 : Kernel code
  cc49a654-ccb661bf : Kernel data
  cccf3000-cce30fff : Kernel bss
...
I note that there are two "System RAM" areas below 0x100000. In
arch/x86/mm/init.c, devmem_is_allowed() says:
/*
 * devmem_is_allowed() checks to see if /dev/mem access to a certain address
 * is valid. The argument is a physical page number.
 *
 *
 * On x86, access has to be given to the first megabyte of ram because that area
 * contains BIOS code and data regions used by X and dosemu and similar apps.
 * Access has to be given to non-kernel-ram areas as well, these contain the PCI
 * mmio resources as well as potential bios/acpi data regions.
 */
int devmem_is_allowed(unsigned long pagenr)
{
	if (pagenr < 256)
		return 1;
	if (iomem_is_exclusive(pagenr << PAGE_SHIFT))
		return 0;
	if (!page_is_ram(pagenr))
		return 1;
	return 0;
}
This means that it allows reads into even System RAM below 0x100000,
but I think that's a mistake. Shouldn't BIOS code and data regions
already be marked as "reserved", as seen in my /proc/iomem output? I
feel like the "pagenr < 256" exception should be dropped, but I don't
know all the minor details on the history here.
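To spell out the arithmetic behind that exception: with 4 KiB pages, 256 pages is exactly the first 1 MiB, so the 0x90000 read from the original report is page 144 and is let through by the first test. A small sketch, assuming PAGE_SHIFT is 12 and using made-up helper names:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12		/* assumption: 4 KiB pages on x86 */
#define LOW_1M_PAGES 256UL	/* 256 << 12 == 0x100000 == 1 MiB */

/* Hypothetical helper: physical address -> physical page number. */
unsigned long phys_to_pagenr(uint64_t phys)
{
	return (unsigned long)(phys >> PAGE_SHIFT);
}

/* True if the "pagenr < 256" test would allow this address. */
bool in_low_1m_exception(uint64_t phys)
{
	return phys_to_pagenr(phys) < LOW_1M_PAGES;
}
```

Both 0x90000 and 0x59000 (the two addresses seen in the reports) fall under the exception; 0x100000 is the first address that does not.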
When I remove this exception, x86info blows up for me ("error reading
EBDA pointer").
So, my question is: are there actually BIOS code/data in memory areas
marked as System RAM? If so, what normally keeps them from being used
for kernel memory? If not, then I assume x86info is wrong?
Dave, you implied the latter, but I wanted to make sure this is
actually true? (And if so, we need to do something like what Linus
suggested to return zeros to keep old x86info "happy" -- would that
keep it happy?)
-Kees
On Tue, Apr 4, 2017 at 3:37 PM, Kees Cook <[email protected]> wrote:
>
> For one of my systems, I see something like this:
>
> 00000000-00000fff : reserved
> 00001000-0008efff : System RAM
> 0008f000-0008ffff : reserved
> 00090000-0009f7ff : System RAM
> 0009f800-0009ffff : reserved
That's fairly normal.
> I note that there are two "System RAM" areas below 0x100000.
Yes. Traditionally the area from about 4k to 640kB is RAM. With a
random smattering of BIOS areas.
> * On x86, access has to be given to the first megabyte of ram because that area
> * contains BIOS code and data regions used by X and dosemu and similar apps.
Right. Traditionally, dosemu did one big mmap of the 1MB area to just
get all the BIOS data in one go.
> This means that it allows reads into even System RAM below 0x100000,
> but I think that's a mistake.
What you think is a "mistake" is how /dev/mem has always worked.
/dev/mem gave access to all the memory of the system. That's LITERALLY
the whole point of it. There was no "BIOS area" or anything else. It
was access to physical memory.
We've added limits to it, but those limits came later, and they came
with the caveat that lots of programs used /dev/mem in various ways.
Nobody was crazy enough to read /dev/mem one byte at a time trying to
follow BIOS tables. No, the traditional way was to just map (or read)
large chunks of it, and then follow the tables in the result. The
easiest way was to just do the whole low 1MB.
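As an illustration of that pattern, here is roughly what "grab the low 1MB, then follow the tables" looks like for the EBDA pointer. This is only a sketch operating on a buffer that the caller has already filled from /dev/mem; the function name is made up, and it relies on the conventional BIOS data area layout where the 16-bit EBDA segment lives at physical 0x40E:

```c
#include <stdint.h>

#define EBDA_SEG_PTR 0x40Eu	/* BIOS data area: 16-bit EBDA segment */

/*
 * Hypothetical helper: given a copy of low physical memory starting at
 * address 0 (at least 0x410 bytes), return the EBDA physical address.
 * The stored value is a real-mode segment, so shift left by 4.
 */
uint32_t ebda_base_from_lowmem(const uint8_t *lowmem)
{
	uint16_t seg = (uint16_t)(lowmem[EBDA_SEG_PTR] |
				  (lowmem[EBDA_SEG_PTR + 1] << 8));
	return (uint32_t)seg << 4;
}
```

If the pages involved read back as zeros, the result is simply 0, so a tool would need to treat that as "no EBDA" rather than as an error.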
There's no "mistake" here. The only thing that is mistaken is you
thinking that we can redefine reality and change history.
I already explained what the likely fix is: make devmem_is_allowed()
return a ternary value, so that those things that *do* read the BIOS
area can just continue to do so, but they see zeroes for the parts
that the kernel has taken over.
Linus
On Tue, Apr 4, 2017 at 3:55 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 4, 2017 at 3:37 PM, Kees Cook <[email protected]> wrote:
>>
>> For one of my systems, I see something like this:
>>
>> 00000000-00000fff : reserved
>> 00001000-0008efff : System RAM
>> 0008f000-0008ffff : reserved
>> 00090000-0009f7ff : System RAM
>> 0009f800-0009ffff : reserved
>
> That's fairly normal.
>
>> I note that there are two "System RAM" areas below 0x100000.
>
> Yes. Traditionally the area from about 4k to 640kB is RAM. With a
> random smattering of BIOS areas.
>
>> * On x86, access has to be given to the first megabyte of ram because that area
>> * contains BIOS code and data regions used by X and dosemu and similar apps.
>
> Right. Traditionally, dosemu did one big mmap of the 1MB area to just
> get all the BIOS data in one go.
>
>> This means that it allows reads into even System RAM below 0x100000,
>> but I think that's a mistake.
>
> What you think is a "mistake" is how /dev/mem has always worked.
>
> /dev/mem gave access to all the memory of the system. That's LITERALLY
> the whole point of it. There was no "BIOS area" or anything else. It
> was access to physical memory.
>
> We've added limits to it, but those limits came later, and they came
> with the caveat that lots of programs used /dev/mem in various ways.
>
> Nobody was crazy enough to read /dev/mem one byte at a time trying to
> follow BIOS tables. No, the traditional way was to just map (or read)
> large chunks of it, and then follow the tables in the result. The
> easiest way was to just do the whole low 1MB.
>
> There's no "mistake" here. The only thing that is mistaken is you
> thinking that we can redefine reality and change history.
I'm not trying to rewrite history. :) I'm trying to understand the
requirements for how the 1MB area was used, which you've explained the
history of now. (Thank you!)
> I already explained what the likely fix is: make devmem_is_allowed()
> return a ternary value, so that those things that *do* read the BIOS
> area can just continue to do so, but they see zeroes for the parts
> that the kernel has taken over.
Sounds good to me. I'll go work on that.
-Kees
On Tue, Apr 4, 2017 at 3:55 PM, Linus Torvalds
<[email protected]> wrote:
>
> I already explained what the likely fix is: make devmem_is_allowed()
> return a ternary value, so that those things that *do* read the BIOS
> area can just continue to do so, but they see zeroes for the parts
> that the kernel has taken over.
Actually, a simpler solution might be to
(a) keep the binary value
(b) remove the test for the low 1M
(c) to avoid breakage, don't return _error_, but just always read zero
that also removes (or at least makes it much more expensive) a signal
of which pages are kernel allocated vs BIOS allocated.
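A rough userspace model of (a)-(c), just to show the shape of the read path: the check stays binary, and a page that fails it is filled with zeros instead of producing an error. The names are hypothetical, and demo_allowed() is a toy policy, not the real check:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Binary policy, as in (a): true means the page may be read. */
typedef bool (*page_allowed_fn)(unsigned long pagenr);

/* Toy policy for illustration: page 1 stands in for a kernel-owned page. */
bool demo_allowed(unsigned long pagenr)
{
	return pagenr != 1;
}

/*
 * Copy len bytes at offset pos of the model memory "mem" into "out",
 * one page-bounded chunk at a time. Disallowed pages read as zeros (c).
 */
size_t read_mem_model(const uint8_t *mem, uint64_t pos, uint8_t *out,
		      size_t len, page_allowed_fn allowed)
{
	size_t done = 0;

	while (done < len) {
		uint64_t cur = pos + done;
		size_t chunk = PAGE_SIZE - (size_t)(cur % PAGE_SIZE);

		if (chunk > len - done)
			chunk = len - done;
		if (allowed((unsigned long)(cur / PAGE_SIZE)))
			memcpy(out + done, mem + cur, chunk);
		else
			memset(out + done, 0, chunk);
		done += chunk;
	}
	return done;
}
```

The caller always gets the full length back, so it cannot tell a zero-filled page from a refused one by the return value.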
Linus
On Tue, Apr 4, 2017 at 5:22 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 4, 2017 at 3:55 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> I already explained what the likely fix is: make devmem_is_allowed()
>> return a ternary value, so that those things that *do* read the BIOS
>> area can just continue to do so, but they see zeroes for the parts
>> that the kernel has taken over.
>
> Actually, a simpler solution might be to
>
> (a) keep the binary value
>
> (b) remove the test for the low 1M
>
> (c) to avoid breakage, don't return _error_, but just always read zero
>
> that also removes (or at least makes it much more expensive) a signal
> of which pages are kernel allocated vs BIOS allocated.
This last part (reading zero) is what I'm poking at now. It's not
obvious to me yet how to make the mmap interface hand back zero-mapped
pages. I'll keep digging...
-Kees