Date: Wed, 23 May 2018 19:10:06 +0100
From: Mark Rutland
To: Boaz Harrosh
Cc: Christopher Lameter, Jeff Moyer, Matthew Wilcox, Andrew Morton,
    "Kirill A. Shutemov", linux-kernel, linux-fsdevel,
    "linux-mm@kvack.org", Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    x86@kernel.org, Peter Zijlstra, Dave Hansen, Rik van Riel, Jan Kara,
    Matthew Wilcox, Amit Golander
Subject: Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU
Message-ID: <20180523181004.txe4x6rx52wtcvjx@lakrids.cambridge.arm.com>
References: <0efb5547-9250-6b6c-fe8e-cf4f44aaa5eb@netapp.com>
    <20180514191551.GA27939@bombadil.infradead.org>
    <7ec6fa37-8529-183d-d467-df3642bcbfd2@netapp.com>
    <20180515004137.GA5168@bombadil.infradead.org>
    <010001637399f796-3ffe3ed2-2fb1-4d43-84f0-6a65b6320d66-000000@email.amazonses.com>
    <5aea6aa0-88cc-be7a-7012-7845499ced2c@netapp.com>
In-Reply-To: <5aea6aa0-88cc-be7a-7012-7845499ced2c@netapp.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 22, 2018 at 07:05:48PM +0300, Boaz Harrosh wrote:
> On 18/05/18 17:14, Christopher Lameter wrote:
> > On Tue, 15 May 2018, Boaz Harrosh wrote:
> >
> >>> I don't think page tables work the way you think they work.
> >>>
> >>> +       err = vm_insert_pfn_prot(zt->vma, zt_addr, pfn, prot);
> >>>
> >>> That doesn't just insert it into the local CPU's page table. Any CPU
> >>> which directly accesses or even prefetches that address will also get
> >>> the translation into its cache.
> >>>
> >>
> >> Yes I know, but that is exactly the point of this flag. I know that this
> >> address is only ever accessed from a single core. Because it is an mmap (vma)
> >> of an O_TMPFILE-exclusive file created in a core-pinned thread and I allow
> >> only that thread any kind of access to this vma. Both the filehandle and the
> >> mmapped pointer are kept on the thread stack and have no access from outside.
> >>
> >> So the whole point of this flag is the kernel driver telling mm that this
> >> address is enforced to only be accessed from one core-pinned thread.
> >
> > But there are no provisions for prohibiting accesses from other cores?
> >
> > This means that a casual accidental write from a thread executing on
> > another core can lead to arbitrary memory corruption because the cache
> > flushing has been bypassed.
>
> No this is not accurate. A "casual accidental write" will not do any harm.
> Only a well concerted malicious server can exploit this. A different thread
> on a different core will need to hit the exact time to read from the exact
> pointer at the narrow window while the IO is going on. fault-in a TLB at the
> time of the valid mapping.

TLB entries can be allocated at any time, for any reason. Even if a
program doesn't explicitly read from the exact pointer at that time,
that doesn't guarantee that a TLB entry won't be allocated.

> Then later after the IO has ended and before any
> of the threads were scheduled out, maliciously write.

... or, regardless of the application's wishes, the core mm code decides
it needs to swap this page out (only doing local TLB invalidation), and
later pages it back in.

Several things can happen, e.g.

* a casual write can corrupt the original page, which is now in use for
  something else.

* a CPU might re-allocate a TLB entry for that page, finding it
  conflicts with an existing entry. This is *fatal* on some
  architectures.

> All the while the App has freed its buffers and the buffer was used
> for something else. Please bear in mind that this is only As root, in
> an /sbin/ executable signed by the Kernel's key.

That isn't enforced by the core API additions, and regardless, root does
not necessarily imply access to kernel-internal stuff (e.g. if the
lockdown stuff goes in).

Claiming that root access means we don't need to care about robustness
is not a good argument.

[...]

> So let's start from the Beginning.
>
> How can we implement "Private memory"?

Use separate processes rather than threads. Each will have a separate
mm, so the arch can get away with local TLB invalidation.

If you wish to share portions of memory between these processes, we have
shared memory APIs to do so.

Thanks,
Mark.
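
For illustration only, here is a minimal userspace sketch of the
"separate processes plus explicit shared memory" arrangement suggested
above. It is not code from this thread: the helper name
private_vs_shared_demo() and REGION_SIZE are hypothetical, and it uses an
anonymous MAP_SHARED mapping as one of several possible shared memory
APIs. It only shows which writes are visible across processes; it does
not demonstrate anything about TLB invalidation itself.

```c
/* Sketch: after fork(), parent and child each have their own mm.
 * Only the MAP_SHARED region is visible to both. */
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE 4096 /* hypothetical size, one typical page */

/*
 * Hypothetical helper: returns 0 if the child's write to the shared
 * region is seen by the parent while its write to the private region
 * is not, and -1 otherwise.
 */
int private_vs_shared_demo(void)
{
    /* The strings written below must fit in each region. */
    assert(sizeof("parent") <= REGION_SIZE);

    /* Shared mapping: both processes keep referring to the same pages. */
    char *shared = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    /* Private mapping: copy-on-write after fork(), one copy per process. */
    char *priv = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (priv == MAP_FAILED) {
        munmap(shared, REGION_SIZE);
        return -1;
    }

    strcpy(shared, "parent");
    strcpy(priv, "parent");

    int ok = 0;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: writes go to the shared pages and to its own private copy. */
        strcpy(shared, "child");
        strcpy(priv, "child");
        _exit(0);
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            /* Parent sees the child's shared write, not its private one. */
            ok = strcmp(shared, "child") == 0 &&
                 strcmp(priv, "parent") == 0;
        }
    }

    munmap(shared, REGION_SIZE);
    munmap(priv, REGION_SIZE);
    return ok ? 0 : -1;
}
```

Calling private_vs_shared_demo() from a small main() and checking for a
zero return is enough to try it. The design point is that sharing
becomes opt-in per region, rather than every thread implicitly seeing
every mapping in one mm.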