From: Linus Torvalds
Date: Tue, 4 Apr 2017 15:55:21 -0700
Subject: Re: sudo x86info -a => kernel BUG at mm/usercopy.c:78!
To: Kees Cook
Cc: Tommi Rantala, Dave Jones, Linux-MM, LKML, Laura Abbott, Ingo Molnar, Josh Poimboeuf, Mark Rutland, Eric Biggers
References: <20170330194143.cbracica3w3ijrcx@codemonkey.org.uk> <20170331171724.nm22iqiellfsvj5z@codemonkey.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 4, 2017 at 3:37 PM, Kees Cook wrote:
>
> For one of my systems, I see something like this:
>
> 00000000-00000fff : reserved
> 00001000-0008efff : System RAM
> 0008f000-0008ffff : reserved
> 00090000-0009f7ff : System RAM
> 0009f800-0009ffff : reserved

That's fairly normal.

> I note that there are two "System RAM" areas below 0x100000.

Yes. Traditionally the area from about 4k to 640kB is RAM. With a
random smattering of BIOS areas.

>  * On x86, access has to be given to the first megabyte of ram because that area
>  * contains BIOS code and data regions used by X and dosemu and similar apps.

Right. Traditionally, dosemu did one big mmap of the 1MB area to just
get all the BIOS data in one go.

> This means that it allows reads into even System RAM below 0x100000,
> but I think that's a mistake.

What you think is a "mistake" is how /dev/mem has always worked.

/dev/mem gave access to all the memory of the system. That's LITERALLY
the whole point of it.
There was no "BIOS area" or anything else. It was access to physical
memory. We've added limits to it, but those limits came later, and
they came with the caveat that lots of programs used /dev/mem in
various ways.

Nobody was crazy enough to read /dev/mem one byte at a time trying to
follow BIOS tables. No, the traditional way was to just map (or read)
large chunks of it, and then follow the tables in the result. The
easiest way was to just do the whole low 1MB.

There's no "mistake" here. The only thing that is mistaken is you
thinking that we can redefine reality and change history.

I already explained what the likely fix is: make devmem_is_allowed()
return a ternary value, so that those things that *do* read the BIOS
area can just continue to do so, but they see zeroes for the parts
that the kernel has taken over.

              Linus
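
The ternary devmem_is_allowed() idea above can be illustrated with a small
userspace sketch. This is not kernel code and not necessarily how the real
fix looks: the enum, the region table (copied from the /proc/iomem excerpt
quoted in this message), and every *_sketch name are assumptions made up for
the example. Pages are classified by their base address only, and everything
at or above 1MB is simply treated as RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_LOW_PAGES  256UL   /* 0x100000 >> SKETCH_PAGE_SHIFT */

/* Hypothetical three-way answer instead of a yes/no. */
enum devmem_access {
    DEVMEM_DENY  = 0,   /* fail the access */
    DEVMEM_ALLOW = 1,   /* hand out the real contents */
    DEVMEM_ZERO  = 2    /* succeed, but show zeroes */
};

struct region { unsigned long start, end; int is_ram; };

/* Toy memory map, taken from the quoted example system. */
static const struct region low_map[] = {
    { 0x00000000, 0x00000fff, 0 },  /* reserved   */
    { 0x00001000, 0x0008efff, 1 },  /* System RAM */
    { 0x0008f000, 0x0008ffff, 0 },  /* reserved   */
    { 0x00090000, 0x0009f7ff, 1 },  /* System RAM */
    { 0x0009f800, 0x0009ffff, 0 },  /* reserved   */
};

/* Stand-in for the kernel's RAM check; looks only at the page base. */
int page_is_ram_sketch(unsigned long pagenr)
{
    unsigned long addr = pagenr << SKETCH_PAGE_SHIFT;
    size_t i;

    if (pagenr >= SKETCH_LOW_PAGES)
        return 1;   /* assumption: all memory above 1MB is RAM here */
    for (i = 0; i < sizeof(low_map) / sizeof(low_map[0]); i++)
        if (addr >= low_map[i].start && addr <= low_map[i].end)
            return low_map[i].is_ram;
    return 0;       /* unlisted low areas (e.g. ROM) are not RAM */
}

/* RAM below 1MB reads as zeroes, RAM above is denied, the rest is allowed. */
enum devmem_access devmem_is_allowed_sketch(unsigned long pagenr)
{
    if (page_is_ram_sketch(pagenr))
        return pagenr < SKETCH_LOW_PAGES ? DEVMEM_ZERO : DEVMEM_DENY;
    return DEVMEM_ALLOW;
}

/* Copy one page out of a fake "physical memory" buffer, honouring the policy. */
long read_page_sketch(const uint8_t *phys, unsigned long pagenr, uint8_t *out)
{
    assert(phys != NULL && out != NULL);

    switch (devmem_is_allowed_sketch(pagenr)) {
    case DEVMEM_ALLOW:
        memcpy(out, phys + (pagenr << SKETCH_PAGE_SHIFT), SKETCH_PAGE_SIZE);
        return (long)SKETCH_PAGE_SIZE;
    case DEVMEM_ZERO:
        memset(out, 0, SKETCH_PAGE_SIZE);
        return (long)SKETCH_PAGE_SIZE;
    default:
        return -1;  /* denied: the source buffer is never touched */
    }
}
```

With this shape, a program that reads the whole low 1MB in one go still gets
a full-length result: the BIOS pages carry real data, and the pages the
kernel uses as RAM come back zero-filled instead of failing the read.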