Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755209AbdDDWhL (ORCPT ); Tue, 4 Apr 2017 18:37:11 -0400 Received: from mail-io0-f173.google.com ([209.85.223.173]:34150 "EHLO mail-io0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752346AbdDDWhJ (ORCPT ); Tue, 4 Apr 2017 18:37:09 -0400 MIME-Version: 1.0 In-Reply-To: References: <20170330194143.cbracica3w3ijrcx@codemonkey.org.uk> <20170331171724.nm22iqiellfsvj5z@codemonkey.org.uk> From: Kees Cook Date: Tue, 4 Apr 2017 15:37:07 -0700 X-Google-Sender-Auth: gJDKaLsmms-UMPZ4FEf8tA42AS0 Message-ID: Subject: Re: sudo x86info -a => kernel BUG at mm/usercopy.c:78! To: Tommi Rantala Cc: Linus Torvalds , Dave Jones , Linux-MM , LKML , Laura Abbott , Ingo Molnar , Josh Poimboeuf , Mark Rutland , Eric Biggers Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3401 Lines: 101 On Fri, Mar 31, 2017 at 12:32 PM, Tommi Rantala wrote: > On 31.03.2017 21:26, Linus Torvalds wrote: >> >> Hmm. Thinking more about this, we do allow access to the first 1MB of >> physical memory unconditionally (see devmem_is_allowed() in >> arch/x86/mm/init.c). And I think we only _reserve_ the first 64kB or >> something. So I guess even STRICT_DEVMEM isn't actually all that >> strict. >> >> So this should be visible even *with* STRICT_DEVMEM. >> >> Does a simple >> >> sudo dd if=/dev/mem of=/dev/null bs=4096 count=256 >> >> also show the same issue? Maybe regardless of STRICT_DEVMEM? > > > Yep, it is enough to trigger the bug. > > Also crashes with the fedora kernel that has STRICT_DEVMEM: > > $ sudo dd if=/dev/mem of=/dev/null bs=4096 count=256 > Segmentation fault > > [ 73.224025] usercopy: kernel memory exposure attempt detected from > ffff893a80059000 (dma-kmalloc-16) (4096 bytes) > [ 73.224049] ------------[ cut here ]------------ > [ 73.224056] kernel BUG at mm/usercopy.c:75! > [ 73.224060] invalid opcode: 0000 [#1] SMP > [ 73.224237] CPU: 5 PID: 2860 Comm: dd Not tainted 4.9.14-200.fc25.x86_64 > #1 As root, what does dumping /proc/iomem show you? For one of my systems, I see something like this: 00000000-00000fff : reserved 00001000-0008efff : System RAM 0008f000-0008ffff : reserved 00090000-0009f7ff : System RAM 0009f800-0009ffff : reserved 000a0000-000bffff : PCI Bus 0000:00 000c0000-000c7fff : Video ROM 000e0000-000fffff : reserved 000e0000-000effff : PCI Bus 0000:00 000f0000-000fffff : System ROM 00100000-cdee6fff : System RAM cbc00000-cc49a653 : Kernel code cc49a654-ccb661bf : Kernel data cccf3000-cce30fff : Kernel bss ... I note that there are two "System RAM" areas below 0x100000. In arch/x86/mm/init.c, devmem_is_allowed() says: /* * devmem_is_allowed() checks to see if /dev/mem access to a certain address * is valid. The argument is a physical page number. * * * On x86, access has to be given to the first megabyte of ram because that area * contains BIOS code and data regions used by X and dosemu and similar apps. * Access has to be given to non-kernel-ram areas as well, these contain the PCI * mmio resources as well as potential bios/acpi data regions. */ int devmem_is_allowed(unsigned long pagenr) { if (pagenr < 256) return 1; if (iomem_is_exclusive(pagenr << PAGE_SHIFT)) return 0; if (!page_is_ram(pagenr)) return 1; return 0; } This means that it allows reads into even System RAM below 0x100000, but I think that's a mistake. Shouldn't BIOS code and data regions already be marked as "reserved", as seen in my /proc/iomem output? I feel like the "pagenr < 256" exception should be dropped, but I don't know all the minor details on the history here. When I remove this exception, x86info blows up for me ("error reading EBDA pointer"). So, my question is: are there actually BIOS code/data in memory areas marked as System RAM? If so, what normally keeps them from being used for kernel memory? If not, then I assume x86info is wrong? Dave, you implied the latter, but I wanted to make sure this is actually true? (And if so, we need to do something like what Linus suggested to return zeros to keep old x86info "happy" -- would that keep it happy?) -Kees -- Kees Cook Pixel Security