Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp3315191pxb; Fri, 5 Nov 2021 13:30:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxdaWNpm9zrS4/+Ptd7RtZn0SWGagtSEfu9N9XcCA4qBmyJC9YcZLLmxLCesoXVcD0eG+vb X-Received: by 2002:a05:6e02:190f:: with SMTP id w15mr26581834ilu.56.1636144211829; Fri, 05 Nov 2021 13:30:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1636144211; cv=none; d=google.com; s=arc-20160816; b=EX1uhMqu51DV+nmxbmu9TJMnlfdnrs5Ln0meXhIM/w+jyVHMlCEJOoATq2wjwTP56G WH1SAQwYMEcQ+CHcgP/X0CIAmGYfL7teJTKw1gh5v65YdEAidnRVmoYb/VTpIeJOFB+m oOjpascrz5TSVaGV1xOZO4MTycqkRt/lBI8b7Bx5QXtzRMesDZJr1jAkKE4+4Tg9Y5wl vGUsQ+xRmoD/gAlRVuMBQCmFS9Ce4hKyi1xk9QcWTz1PhldgCEgo8DB9Bc4Xo6VrbPPX 7PjS7V5xYZf5E3WSi5WFpGejE8EN5uqtpqNIJHyAVwROI37MSEcPSfF7FqZwVjS9MW0L FuZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=p9HBcunW9BvJAyc4kGGbFsaCC3WfFrg7qhiT/0zaay8=; b=fYQUTLAOiaZdvHref9umGGc6CFSm2Anvm4S1SeEVTtswD51V20/iFjrfOqlkVXaxl9 znId5ojYSPU5/sv7KySKZGi1ywVptBH7mva5pqDATZu32dRtT9DtCb5oHwXJZN6M2goF 8nMEybfs7ea/dor6eASoXFkVu3GOiDVzfq9OjJ8o8ghU3L2M+pEfQSGOAZ5459HSOnn+ 3p/qsySzKq+1GnInOoyYWheQraLaM6T7X6pvAE29IIqGTtZ94sDwIXHMVb1zn9Q783FT w0Co0v0EvZXwecTKOvIXDaMFi97CyIcbW/b7fX7lXSdcd5VRXlbCoVfoZNYRSL0Y4x8S UWSQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b13si2644611jat.38.2021.11.05.13.29.56; Fri, 05 Nov 2021 13:30:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232454AbhKETnf (ORCPT + 99 others); Fri, 5 Nov 2021 15:43:35 -0400 Received: from mail.kernel.org ([198.145.29.99]:41376 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231938AbhKETne (ORCPT ); Fri, 5 Nov 2021 15:43:34 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id C62F360FC3; Fri, 5 Nov 2021 19:40:52 +0000 (UTC) Date: Fri, 5 Nov 2021 19:40:49 +0000 From: Catalin Marinas To: Linus Torvalds Cc: Matthew Wilcox , Christoph Hellwig , linux-arch , Ira Weiny , Andrew Morton , Linux Kernel Mailing List , Thomas Gleixner , Russell King Subject: Re: flush_dcache_page vs kunmap_local Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 04, 2021 at 11:23:56AM -0700, Linus Torvalds wrote: > On Thu, Nov 4, 2021 at 11:04 AM Catalin Marinas wrote: > > We still have VIVT processors supported in the kernel and a few where > > the VIPT cache is aliasing (some ARMv6 CPUs). On these, > > flush_dcache_page() is still used to ensure the user aliases are > > coherent with the kernel one, so it's not just about the I/D-cache > > coherency. > > Maybe we could try to split it up and make each function have more > well-defined rules? One of the issues with the flush_dcache thing is > that it's always been so ad-hoc and it's not been hugely clear. [...] > So VIVT arm (and whoever else) would continue to do the cache flushing > at kunmap_local time (or kmap - I don't think it matters which one you > do, as long as you make sure there are no stale contents from the > previous use of that address). > > And then we'd relegate flush_dcache_page() purely for uses where > somebody modifies the data and wants to make sure it ends up being > coherent with subsequent uses (whether kmap and VIVT or I$/D$ > coherency issues)? For PIPT hardware (I suppose most newish architectures), flush_dcache_page() only matters with separate I$ and D$. So we can indeed redefine it as only meaningful when a user page has been written by the kernel (and maybe we can give it a better name). For VIVT, kmap/kunmap() can take care of synchronising the aliases. If the kmap'ed page has a user mapping, kmap() would also need to flush the aliases, not just kunmap() (currently relying on flush_dcache_page() for the read side as well). I suspect this is going to make kmap() more expensive for those highmem pages only used by the kernel, or for pages not yet mapped in user space (say during a page fault on mmap'ed file). Long time ago on arm32 we used to do a check with page_mapping() and mapping_mapped() but I think the latter disappeared from the kernel. > > The cachetlb.rst doc states the two cases where flush_dcache_page() > > should be called: > > > > 1. After writing to a page cache page (that's what we need on arm64 for > > the I-cache). > > > > 2. Before reading from a page cache page and user mappings potentially > > exist. I think arm32 ensures the D-cache user aliases are coherent > > with the kernel one (added rmk to confirm). > > I think the "kernel cache coherency" matters too. The PTE contents > thing seems relevant if we use kmap for that... > > So I do think that the "page cache or user mapping" is not necessarily > the only case. At least the arm32 set_pte() for VIVT caches does its own D$ flushing on the kmap() address. So kunmap() flushing is not strictly necessary for this specific case (I think). But there may be other cases where it matters. > But personally I consider these situations so broken at a hardware > level that I can't find it in myself to care too deeply. We've had them supported in mainline for so many years, and working (mostly, there was the odd driver that did not call the right API). But I'm fine with deprecating them, making them slower in favour of cleaner semantics of kmap, flush_dcache_page etc. > Because user space with non-coherent I$/D$ should do its own cache > flushing if it does "read()" to modify an executable range - exactly > the same way it has to do it for just doing regular stores to that > range. Yes, if the user did a read(), it should flush the caches it cares about. I don't think we even have a flush_dcache_page() call in the kernel in these cases, just copy_to_user(). Basically the kernel should only flush if it wrote via its own mapping (linear, kmap). However, with mmap(PROT_EXEC), the user expects the I$/D$ to be coherent without explicit user cache maintenance, even with PIPT hardware. That's where flush_dcache_page() matters. We also had some weird bugs with a dynamic loader mapping a page initially as PROT_READ|PROT_EXEC, doing an mprotect(PROT_READ|PROT_WRITE) just to write some data (not text, so it never thought explicit cache flushing by user was needed) and back to mprotect(PROT_READ|PROT_EXEC). Because of the CoW, the new page did not have the I$/D$ synchronised, leading to the occasional SIGILL. Again, flush_dcache_page() after CoW is need, though hidden in the arch code (we do this on arm64 in copy_user_highpage()). -- Catalin