Date: Tue, 15 May 2018 13:07:51 +0100
From: Mark Rutland
To: Boaz Harrosh
Cc: Matthew Wilcox, Jeff Moyer, Andrew Morton, "Kirill A. Shutemov",
 linux-kernel, linux-fsdevel, "linux-mm@kvack.org", Thomas Gleixner,
 Ingo Molnar, "H.
 Peter Anvin", x86@kernel.org, Peter Zijlstra, Dave Hansen, Rik van Riel,
 Jan Kara, Matthew Wilcox, Amit Golander
Subject: Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU
Message-ID: <20180515120750.lro2qbskw5cptc5o@lakrids.cambridge.arm.com>
References: <0efb5547-9250-6b6c-fe8e-cf4f44aaa5eb@netapp.com>
 <20180514191551.GA27939@bombadil.infradead.org>
 <7ec6fa37-8529-183d-d467-df3642bcbfd2@netapp.com>
 <20180515004137.GA5168@bombadil.infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 15, 2018 at 01:43:23PM +0300, Boaz Harrosh wrote:
> On 15/05/18 03:41, Matthew Wilcox wrote:
> > On Mon, May 14, 2018 at 10:37:38PM +0300, Boaz Harrosh wrote:
> >> On 14/05/18 22:15, Matthew Wilcox wrote:
> >>> On Mon, May 14, 2018 at 08:28:01PM +0300, Boaz Harrosh wrote:
> >>>> On a call to mmap an mmap provider (like an FS) can put
> >>>> this flag on vma->vm_flags.
> >>>>
> >>>> The VM_LOCAL_CPU flag tells the Kernel that the vma will be used
> >>>> from a single-core only, and therefore invalidation (flush_tlb) of
> >>>> PTE(s) need not be a wide CPU scheduling.
> >>>
> >>> I still don't get this. You're opening the kernel up to being exploited
> >>> by any application which can persuade it to set this flag on a VMA.
> >>>
> >>
> >> No No this is not an application accessible flag this can only be set
> >> by the mmap implementor at ->mmap() time (Say same as VM_VM_MIXEDMAP).
> >>
> >> Please see the zuf patches for usage (Again apologise for pushing before
> >> a user)
> >>
> >> The mmap provider has all the facilities to know that this can not be
> >> abused, not even by a trusted Server.
> >
> > I don't think page tables work the way you think they work.
> >
> > +    err = vm_insert_pfn_prot(zt->vma, zt_addr, pfn, prot);
> >
> > That doesn't just insert it into the local CPU's page table. Any CPU
> > which directly accesses or even prefetches that address will also get
> > the translation into its cache.
> >
>
> Yes I know, but that is exactly the point of this flag. I know that this
> address is only ever accessed from a single core. Because it is an mmap (vma)
> of an O_TMPFILE-exclusive file created in a core-pinned thread and I allow
> only that thread any kind of access to this vma. Both the filehandle and the
> mmaped pointer are kept on the thread stack and have no access from outside.

Even if (in the specific context of your application) software on other
cores might not explicitly access this area, that does not prevent
allocations into TLBs, and TLB maintenance *cannot* be elided.

Even assuming that software *never* explicitly accesses an address which
it has not mapped is insufficient.

For example, imagine you have two threads, each pinned to a CPU, and
some local_cpu_{mmap,munmap} which uses your new flag:

    CPU0                            CPU1
    x = local_cpu_mmap(...);
    do_things_with(x);
                                    // speculatively allocates TLB
                                    // entries for X.

    // only invalidates local TLBs
    local_cpu_munmap(x);

                                    // TLB entries for X still live

                                    y = local_cpu_mmap(...);

                                    // if y == x, we can hit the
                                    // stale TLB entry, and access
                                    // the wrong page
                                    do_things_with(y);

Consider that after we free x, the kernel could reuse the page for any
purpose (e.g. kernel page tables), so this is a major risk.

This flag simply is not safe, unless the *entire* mm is only ever
accessed from a single CPU. In that case, we don't need the flag anyway,
as the mm already has a cpumask.

Thanks,
Mark.