Date: Wed, 16 Oct 2019 14:22:33 +0100
From: Mark Rutland
To: Andrey Ryabinin
Cc: Daniel Axtens, kasan-dev@googlegroups.com, linux-mm@kvack.org,
	x86@kernel.org, glider@google.com, luto@kernel.org,
	linux-kernel@vger.kernel.org, dvyukov@google.com,
	christophe.leroy@c-s.fr, linuxppc-dev@lists.ozlabs.org,
	gor@linux.ibm.com
Subject: Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory
Message-ID: <20191016132233.GA46264@lakrids.cambridge.arm.com>
References: <20191001065834.8880-1-dja@axtens.net>
	<20191001065834.8880-2-dja@axtens.net>
	<352cb4fa-2e57-7e3b-23af-898e113bbe22@virtuozzo.com>
	<87ftjvtoo7.fsf@dja-thinkpad.axtens.net>
	<8f573b40-3a5a-ed36-dffb-4a54faf3c4e1@virtuozzo.com>
In-Reply-To: <8f573b40-3a5a-ed36-dffb-4a54faf3c4e1@virtuozzo.com>

Hi Andrey,

On Wed, Oct 16, 2019 at 03:19:50PM +0300, Andrey Ryabinin wrote:
> On 10/14/19 4:57 PM, Daniel Axtens wrote:
> >>> +	/*
> >>> +	 * Ensure poisoning is visible before the shadow is made visible
> >>> +	 * to other CPUs.
> >>> +	 */
> >>> +	smp_wmb();
> >>
> >> I don't quite understand what this barrier does and why it is
> >> needed. And if it's really needed, there should be a pairing
> >> barrier on the other side, which I don't see.
> >
> > Mark might be better able to answer this, but my understanding is
> > that we want to make sure that we never have a situation where the
> > writes are reordered so that the PTE is installed before all the
> > poisoning is written out. I think it follows the logic in
> > __pte_alloc() in mm/memory.c:
> >
> > 	/*
> > 	 * Ensure all pte setup (eg. pte page lock and page clearing) are
> > 	 * visible before the pte is made visible to other CPUs by being
> > 	 * put into page tables.
> > 	 *
> > 	 * The other side of the story is the pointer chasing in the page
> > 	 * table walking code (when walking the page table without locking;
> > 	 * ie. most of the time). Fortunately, these data accesses consist
> > 	 * of a chain of data-dependent loads, meaning most CPUs (alpha
> > 	 * being the notable exception) will already guarantee loads are
> > 	 * seen in-order. See the alpha page table accessors for the
> > 	 * smp_read_barrier_depends() barriers in page table walking code.
> > 	 */
> > 	smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
> >
> > I can clarify the comment.
>
> I don't see how this is relevant here.

The problem isn't quite the same, but it's a similar shape. See below
for more details.

> The barrier in __pte_alloc() is there for the following case:
>
> CPU 0					CPU 1
> __pte_alloc():			pte_offset_kernel(pmd_t *dir, unsigned long address):
>   pgtable_t new = pte_alloc_one(mm);	  pte_t *new = (pte_t *)pmd_page_vaddr(*dir) +
> 					      ((address >> PAGE_SHIFT) & (PTRS_PER_PAGE - 1));
>   smp_wmb();				  smp_read_barrier_depends();
>   pmd_populate(mm, pmd, new);
> 					  /* do something with the pte, e.g. check if (pte_none(*new)) */
>
> It's needed to ensure that if CPU 1 sees pmd_populate() it also sees
> the initialized contents of 'new'.
>
> In our case the barrier would have been needed if we had the other
> side like this:
>
> if (!pte_none(*vmalloc_shadow_pte)) {
> 	shadow_addr = (unsigned long)__va(pte_pfn(*vmalloc_shadow_pte) << PAGE_SHIFT);
> 	smp_read_barrier_depends();
> 	*shadow_addr; /* read the shadow; the barrier ensures that if we see the installed pte, we also see the initialized shadow memory */
> }
>
> Without such a counterpart on the other side, the barrier is pointless.

The barrier isn't pointless, but we are relying on a subtlety that is
not captured in LKMM, as one of the observers involved is the TLB (and
associated page table walkers) of the CPU.

Until the PTE written by CPU 0 has been observed by the TLB of CPU 1,
it is not possible for CPU 1 to satisfy loads from the memory that the
PTE maps, as it doesn't yet know which memory that is. Once the PTE
written by CPU 0 has been observed by the TLB of CPU 1, it is possible
for CPU 1 to satisfy those loads.
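To make that concrete for the case at hand, the shape is roughly the
following (a simplified sketch with illustrative variable names, not
the exact code from the patch):

	/* CPU 0: populating the shadow for a new vmalloc region */
	memset(shadow_page, 0, PAGE_SIZE);	/* write the shadow data */
	smp_wmb();				/* order shadow data before the PTE */
	set_pte_at(&init_mm, shadow_addr, shadow_ptep,
		   pfn_pte(PFN_DOWN(__pa(shadow_page)), PAGE_KERNEL));

	/* CPU 1: a KASAN check of an address in that region */
	s8 *shadow = kasan_mem_to_shadow(addr);
	/*
	 * This load cannot be satisfied until CPU 1's TLB / page table
	 * walker has observed the PTE written by CPU 0; the walk itself
	 * stands in for the read side here.
	 */
	s8 val = READ_ONCE(*shadow);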
At this instant, CPU 1 must respect the smp_wmb() that preceded the
write of the PTE, and hence sees the zeroes written before it. Note
that if this were not true, we could not safely swap userspace memory.

There is the risk (as laid out in [1]) that CPU 1 attempts to hoist the
loads of the shadow memory above the load of the PTE, samples a stale
(faulting) status from the TLB, then performs the load of the PTE and
sees a valid value. In this case (on arm64) a spurious fault could be
taken when the access is architecturally performed.

It is possible on arm64 to use a barrier here to prevent the spurious
fault, but this is not smp_read_barrier_depends(), as that is a no-op
on everything except alpha. On arm64 we instead have a spurious fault
handler to fix this up.

Thanks,
Mark.

[1] https://lore.kernel.org/linux-arm-kernel/20190827131818.14724-1-will@kernel.org/
[2] https://lore.kernel.org/linux-mm/20191014152717.GA20438@lakrids.cambridge.arm.com/
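P.S. For completeness, the hoisting risk above, seen from the reader's
side, is roughly the following (again only a sketch with illustrative
names; the actual arm64 handling is in the series at [1]):

	s8 *shadow = kasan_mem_to_shadow(vmalloc_addr);

	/*
	 * The CPU may begin the translation for 'shadow' early and latch
	 * a stale (faulting) result from the TLB ...
	 */
	s8 val = READ_ONCE(*shadow);
	/*
	 * ... and only then load the (now valid) PTE, so the access can
	 * still report a spurious translation fault when it is
	 * architecturally performed. smp_read_barrier_depends() would not
	 * help here; on arm64 the spurious fault is tolerated and fixed
	 * up in the fault handler instead.
	 */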