Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp8671848imu; Tue, 4 Dec 2018 12:11:01 -0800 (PST) X-Google-Smtp-Source: AFSGD/VzFrKCF9LJtAPRc8ypGfkkjjusDwt5ch6eGR4ksNh/uJv2TGUrSS1Zr8WBjpR3DAl4cAke X-Received: by 2002:a17:902:b48b:: with SMTP id y11mr20559243plr.200.1543954261092; Tue, 04 Dec 2018 12:11:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543954261; cv=none; d=google.com; s=arc-20160816; b=iHoXQkvtB9oN6Kci5fqsAdW2PzqB7pKcC2y6pB/z7YZzblGPd2Qsnq7l/cIfqzRmqG f9CfDvvPHorAgfBnkKmNADmey2m2iQU5LcQGkOzeDFP3WQ8QBqHHZV79ESV2i7z+Pqfh BDl3l+uBr7MjRw/7R1Dl7U1RcyDXWASC9WDmJ47EIF2itai0a/db2ZvsfrTyZkN50iji 4bLJRGKA7nfLHLBeIsj32MRviyI2qjusqcVsr2aGbQBnMwO/9G+REwohP91Ctq+fHwYO Z2jWkJ+iVesJs+tE9vNwMzA6aIcb26BHWdZaN1WKJfxnvEQElJLkgFy/YXgnZYP9k09z 90rw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=LkOjYkNTcd4R+psOWsTz0ULcjgvi3Z1YAyN+AJ64sZE=; b=TYsroK/Xu2Mg0PAAyet0BE463+DxkeSIfDFcWJAgdSA2n6feYyj+cJq9DVZPnk5P5F O+9Ez99cd4x8Dmhj4T64MzJI7VB+61tiSuTcNdtLi3w6iBe7ZgBcm2XidHDMfVtnTrNE h+m7V6qhauVwnkVNnOClNK8II7L4LTH07HIa9kl7vN8fthKxdPVcCoJPMxQimXLUSvvX vG5tDQrO4GqMdw1ZIlOl7cFlo3ul/dCly5+llaBI5x6vWz2kr8LnWGB0Gx2dEx1MJqAq S+sPt8pd3unnQZwiFFZcJUHofrRNnNoFHfASRO8ZO9ElZFEV8EqN7Z640fDTZ1+z3bSX WqfA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=lzH3Ij0k; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 6si8187862plc.241.2018.12.04.12.10.44; Tue, 04 Dec 2018 12:11:01 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=lzH3Ij0k; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725957AbeLDUKF (ORCPT + 99 others); Tue, 4 Dec 2018 15:10:05 -0500 Received: from mail.kernel.org ([198.145.29.99]:57466 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725797AbeLDUKF (ORCPT ); Tue, 4 Dec 2018 15:10:05 -0500 Received: from mail-wr1-f42.google.com (mail-wr1-f42.google.com [209.85.221.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 45E01214DA for ; Tue, 4 Dec 2018 20:10:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1543954203; bh=rHbnZG4MYjKT/ezKzLuM6i0ZWebai6pXr4qhMdGvoEA=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=lzH3Ij0k20L5VngphdWVlfN1ppK3uuKJjiq0kbIswYAiN49urw81aL0/nAEc1Aex9 lrFDxLbzUYtM4Qxo48SALl6foXo4LLWAcSAVqN3G4WThcswqSjpC9UpiG6GIZZB4Fk 0AErTsvykrEBuRj95n1I2F26b65RyxAyEIJGoq5Q= Received: by mail-wr1-f42.google.com with SMTP id j10so17293737wru.4 for ; Tue, 04 Dec 2018 12:10:03 -0800 (PST) X-Gm-Message-State: AA+aEWZbbb6rhDB3GpYQlejMNuAiyF5PmdLtL9GTPxY+J3Iqt2A72Bjf Wgg7l3XUc/Q6gkZeFeo2cCO5CBknvx7nuRphv86VAw== X-Received: by 2002:adf:f0c5:: with SMTP id x5mr18519605wro.77.1543954201627; Tue, 04 Dec 2018 12:10:01 -0800 (PST) MIME-Version: 1.0 References: <20181128000754.18056-1-rick.p.edgecombe@intel.com> <20181128000754.18056-2-rick.p.edgecombe@intel.com> <4883FED1-D0EC-41B0-A90F-1A697756D41D@gmail.com> <20181204160304.GB7195@arm.com> <51281e69a3722014f718a6840f43b2e6773eed90.camel@intel.com> In-Reply-To: <51281e69a3722014f718a6840f43b2e6773eed90.camel@intel.com> From: Andy Lutomirski Date: Tue, 4 Dec 2018 12:09:49 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages To: Rick Edgecombe Cc: Will Deacon , Nadav Amit , LKML , Daniel Borkmann , jeyu@kernel.org, Steven Rostedt , Alexei Starovoitov , Ard Biesheuvel , Linux-MM , Jann Horn , "Dock, Deneen T" , Peter Zijlstra , Kristen Carlson Accardi , Andrew Morton , Ingo Molnar , Andrew Lutomirski , Anil S Keshavamurthy , Kernel Hardening , Masami Hiramatsu , "Naveen N . Rao" , "David S. Miller" , Network Development , Dave Hansen Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Dec 4, 2018 at 12:02 PM Edgecombe, Rick P wrote: > > On Tue, 2018-12-04 at 16:03 +0000, Will Deacon wrote: > > On Mon, Dec 03, 2018 at 05:43:11PM -0800, Nadav Amit wrote: > > > > On Nov 27, 2018, at 4:07 PM, Rick Edgecombe > > > > wrote: > > > > > > > > Since vfree will lazily flush the TLB, but not lazily free the unde= rlying > > > > pages, > > > > it often leaves stale TLB entries to freed pages that could get re-= used. > > > > This is > > > > undesirable for cases where the memory being freed has special perm= issions > > > > such > > > > as executable. > > > > > > So I am trying to finish my patch-set for preventing transient W+X ma= ppings > > > from taking space, by handling kprobes & ftrace that I missed (thanks= again > > > for > > > pointing it out). > > > > > > But all of the sudden, I don=E2=80=99t understand why we have the pro= blem that this > > > (your) patch-set deals with at all. We already change the mappings to= make > > > the memory writable before freeing the memory, so why can=E2=80=99t w= e make it > > > non-executable at the same time? Actually, why do we make the module = memory, > > > including its data executable before freeing it??? > > > > Yeah, this is really confusing, but I have a suspicion it's a combinati= on > > of the various different configurations and hysterical raisins. We can'= t > > rely on module_alloc() allocating from the vmalloc area (see nios2) nor > > can we rely on disable_ro_nx() being available at build time. > > > > If we *could* rely on module allocations always using vmalloc(), then > > we could pass in Rick's new flag and drop disable_ro_nx() altogether > > afaict -- who cares about the memory attributes of a mapping that's abo= ut > > to disappear anyway? > > > > Is it just nios2 that does something different? > > > > Will > > Yea it is really intertwined. I think for x86, set_memory_nx everywhere w= ould > solve it as well, in fact that was what I first thought the solution shou= ld be > until this was suggested. It's interesting that from the other thread Mas= ami > Hiramatsu referenced, set_memory_nx was suggested last year and would hav= e > inadvertently blocked this on x86. But, on the other architectures I have= since > learned it is a bit different. > > It looks like actually most arch's don't re-define set_memory_*, and so a= ll of > the frob_* functions are actually just noops. In which case allocating RW= X is > needed to make it work at all, because that is what the allocation is goi= ng to > stay at. So in these archs, set_memory_nx won't solve it because it will = do > nothing. > > On x86 I think you cannot get rid of disable_ro_nx fully because there is= the > changing of the permissions on the directmap as well. You don't want some= other > caller getting a page that was left RO when freed and then trying to writ= e to > it, if I understand this. > Exactly. After slightly more thought, I suggest renaming VM_IMMEDIATE_UNMAP to VM_MAY_ADJUST_PERMS or similar. It would have the semantics you want, but it would also call some arch hooks to put back the direct map permissions before the flush. Does that seem reasonable? It would need to be hooked up that implement set_memory_ro(), but that should be quite easy. If nothing else, it could fall back to set_memory_ro() in the absence of a better implementation.