From: Andy Lutomirski
Date: Thu, 6 Dec 2018 12:17:13 -0800
Subject: Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages
To: Nadav Amit
Cc: Andrew Lutomirski, Tycho Andersen, Ard Biesheuvel, Will Deacon, Rick Edgecombe, LKML, Daniel Borkmann, Jessica Yu, Steven Rostedt, Alexei Starovoitov, Linux-MM, Jann Horn, "Dock, Deneen T", Peter Zijlstra, Kristen Carlson Accardi, Andrew Morton, Ingo Molnar, Anil S Keshavamurthy, Kernel Hardening, Masami Hiramatsu, "Naveen N. Rao", "David S. Miller", Network Development, Dave Hansen
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 6, 2018 at 11:39 AM Nadav Amit wrote:
>
> On Dec 6, 2018, at 11:19 AM, Andy Lutomirski wrote:
> >
> > On Thu, Dec 6, 2018 at 11:01 AM Tycho Andersen wrote:
> >> On Thu, Dec 06, 2018 at 10:53:50AM -0800, Andy Lutomirski wrote:
> >>>> If we are going to unmap the linear alias, why not do it at vmalloc()
> >>>> time rather than vfree() time?
> >>>
> >>> That’s not totally nuts. Do we ever have code that expects __va() to
> >>> work on module data? Perhaps crypto code trying to encrypt static
> >>> data because our APIs don’t understand virtual addresses. I guess if
> >>> highmem is ever used for modules, then we should be fine.
> >>>
> >>> RO instead of not present might be safer.
> >>> But I do like the idea of renaming Rick's flag to something like
> >>> VM_XPFO or VM_NO_DIRECT_MAP and making it do all of this.
> >>
> >> Yeah, doing it for everything automatically seemed like it was/is
> >> going to be a lot of work to debug all the corner cases where things
> >> expect memory to be mapped but don't explicitly say it. And in
> >> particular, the XPFO series only does it for user memory, whereas an
> >> additional flag like this would work for extra paranoid allocations
> >> of kernel memory too.
> >
> > I just read the code, and it looks like vmalloc() is already using
> > highmem (__GFP_HIGHMEM) if available, so, on big x86_32 systems, for
> > example, we already don't have modules in the direct map.
> >
> > So I say we go for it. This should be quite simple to implement --
> > the pageattr code already has almost all the needed logic on x86. The
> > only arch support we should need is a pair of functions: one to remove
> > a vmalloc address range from the direct map (if it was present in the
> > first place) and one to put it back. On x86, this should only be a few
> > lines of code.
> >
> > What do you all think? This should solve most of the problems we have.
> >
> > If we really wanted to optimize this, we'd make it so that
> > module_alloc() allocates memory the normal way, then, later on, we
> > call some function that, all at once, removes the memory from the
> > direct map and applies the right permissions to the vmalloc alias (or
> > just makes the vmalloc alias not-present so we can add permissions
> > later without flushing), and flushes the TLB. And we arrange for
> > vunmap to zap the vmalloc range, then put the memory back into the
> > direct map, then free the pages back to the page allocator, with the
> > flush in the appropriate place.
> >
> > I don't see why the page allocator needs to know about any of this.
> > It's already okay with the permissions being changed out from under it
> > on x86, and it seems fine.
> > Rick, do you want to give some variant of this a try?
>
> Setting it as read-only may work (and already happens for the read-only
> module data). I am not sure about setting it as non-present.
>
> At some point, a discussion about a threat-model, as Rick indicated, would
> be required. I presume ROP attacks can easily call set_all_modules_text_rw()
> and override all the protections.
>

I am far from an expert on exploit techniques, but here's a potentially
useful model: let's assume there's an attacker who can write controlled
data to a controlled kernel address but cannot directly modify control
flow. It would be nice for such an attacker to have a very difficult
time modifying kernel text or compromising control flow. So we're
assuming a feature like kernel CET, or that the attacker finds it very
difficult to do something like modifying some thread's IRET frame.

Admittedly, for the kernel, this is an odd threat model, since an
attacker can presumably quite easily learn the kernel stack address of
one of their tasks, do some syscall, and then modify their kernel
thread's stack such that it will IRET right back to a fully controlled
register state with RSP pointing at an attacker-supplied kernel stack.
So an attacker with this write primitive effectively has very strong
ROP powers unless we have either CET or some software technique to
harden all the RET instructions in the kernel.

I wonder if there's a better model to use. Maybe with stack-protector
we get some degree of protection? Or is all of this rather weak until
we have CET or a RAP-like feature?
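To make the proposed ordering and the write-only threat model concrete, here is a toy userspace model in C. It is a sketch under loud assumptions, not kernel code: `enum prot`, `struct toy_area`, `toy_alloc()`, `toy_seal()`, `toy_free()` and `attacker_write()` are hypothetical names invented for illustration, a "page" holds one byte, and the flush counter stands in for the real TLB shootdown that the arch pageattr code would perform. It models the sequence described above: allocate normally; then, in one step, make the direct-map alias not-present (or read-only) and apply final permissions to the vmalloc alias with a single flush; on free, zap the vmalloc range, flush, restore the direct map, and only then release the pages. `attacker_write()` models an attacker who can write chosen data to a chosen address but cannot redirect control flow.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. Each page of a hypothetical
 * module allocation is reachable through two aliases: the kernel direct
 * map and the vmalloc mapping. All names here are invented. */
enum prot { P_NONE, P_RO, P_RW, P_RX };

#define TOY_NPAGES 4

struct toy_page {
	enum prot direct;	/* permission of the direct-map alias */
	enum prot vmap;		/* permission of the vmalloc alias */
	unsigned char data;	/* page contents; one byte is enough here */
};

struct toy_area {
	struct toy_page pages[TOY_NPAGES];
	int tlb_flushes;	/* number of simulated TLB flushes */
};

/* Allocate "the normal way": both aliases start out writable. */
static void toy_alloc(struct toy_area *a)
{
	for (int i = 0; i < TOY_NPAGES; i++) {
		a->pages[i].direct = P_RW;
		a->pages[i].vmap = P_RW;
		a->pages[i].data = 0;
	}
	a->tlb_flushes = 0;
}

/* Hypothetical one-shot "seal": demote the direct-map alias (not-present
 * or read-only), apply the final permission to the vmalloc alias, and
 * pay for a single flush covering the whole range. */
static void toy_seal(struct toy_area *a, enum prot direct_after,
		     enum prot vmap_final)
{
	/* After sealing, neither alias should be writable. */
	assert(direct_after != P_RW && vmap_final != P_RW);
	for (int i = 0; i < TOY_NPAGES; i++) {
		a->pages[i].direct = direct_after;
		a->pages[i].vmap = vmap_final;
	}
	a->tlb_flushes++;
}

/* Hypothetical free path: zap the vmalloc alias, flush so that no stale
 * TLB entry for it survives, restore the direct map, and only then hand
 * the pages back to the page allocator. */
static void toy_free(struct toy_area *a)
{
	for (int i = 0; i < TOY_NPAGES; i++)
		a->pages[i].vmap = P_NONE;
	a->tlb_flushes++;
	for (int i = 0; i < TOY_NPAGES; i++)
		a->pages[i].direct = P_RW;
	/* The pages would be returned to the page allocator here. */
}

/* Write-what-where attacker without control-flow hijack: the write lands
 * only through a writable alias; otherwise a real kernel would fault. */
static bool attacker_write(struct toy_area *a, int page, bool via_direct_map,
			   unsigned char val)
{
	enum prot p = via_direct_map ? a->pages[page].direct
				     : a->pages[page].vmap;

	if (p != P_RW)
		return false;
	a->pages[page].data = val;
	return true;
}
```

In this model, after `toy_alloc()` a write through the direct-map alias succeeds; after `toy_seal(&a, P_NONE, P_RX)` writes through either alias are rejected, and sealing costs one simulated flush. Passing `P_RO` instead of `P_NONE` as `direct_after` models the read-only variant discussed in the thread. The model says nothing about an attacker with ROP powers, who could simply call set_all_modules_text_rw() as Nadav notes.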