Date: Wed, 14 Mar 2018 11:48:15 +0000
From: Mark Rutland
To: Chintan Pandya
Cc: catalin.marinas@arm.com, will.deacon@arm.com, arnd@arndb.de,
	ard.biesheuvel@linaro.org, marc.zyngier@arm.com, james.morse@arm.com,
	kristina.martsenko@arm.com, takahiro.akashi@linaro.org,
	gregkh@linuxfoundation.org, tglx@linutronix.de,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, akpm@linux-foundation.org,
	toshi.kani@hpe.com
Subject: Re: [PATCH v1 2/4] ioremap: Invalidate TLB after huge mappings
Message-ID: <20180314114814.mzgwlzkwzxijiv54@lakrids.cambridge.arm.com>
References: <1521017305-28518-1-git-send-email-cpandya@codeaurora.org>
	<1521017305-28518-3-git-send-email-cpandya@codeaurora.org>
	<20180314104823.yumqomzmbu3cj442@lakrids.cambridge.arm.com>

On Wed, Mar 14, 2018 at 04:50:35PM +0530, Chintan Pandya wrote:
> On 3/14/2018 4:18 PM, Mark Rutland wrote:
> > On Wed, Mar 14, 2018 at 02:18:23PM +0530, Chintan Pandya wrote:
> > As has been noted in previous threads, the ARM architecture requires a
> > Break-Before-Make sequence when changing an entry from a table to a
> > block, as is the case here.
> >
> > The means the necessary sequence is:
> >
> > 1. Make the entry invalid
> > 2. Invalidate relevant TLB entries
> > 3. Write the new entry
> >
> We do this for PTEs. I don't see this applicable to PMDs.

The architecture requires this for *all* levels of page table, when
certain changes are made. Switching an entry from a table to a block (or
vice versa) is one of those changes, and this definitely applies to PMDs.

> Because,
>
> 1) To mark any PMD invalid, we need to be sure that next level page
>    table (I mean all the 512 PTEs) should be zero. That requires us
>    to scan entire last level page. A big perf hit !

This is in ioremap code. Under what workload does this constitute a perf
hit?

Regardless, so long as we mark the pmd entry invalid before the TLB
invalidation, we don't need to touch the next level table at all. We
just require a sequence like:

	pmd_clear(pmdp);
	flush_tlb_kernel_range(pmd_start_addr, pmd_end_addr);
	pmd_set_huge(pmdp, phys, prot);

> 2) We need to perform step 1 for every unmap as we never know which
>    unmap will make last level page table empty.

Sorry, I don't follow. Could you elaborate on the problem?

> Moreover, problem comes only when 4K mapping was followed by 2M
> mapping. In all other cases, retaining valid PMD has obvious perf
> gain.
> That's what walk-cache is supposed to be introduced for.

Retaining a valid PMD in the TLB that *differs* from a valid PMD in the
page tables is a big problem.

The architecture requires BBM, as this permits CPUs to allocate PMDs
into TLBs at *any* time, even if there's already a PMD in the TLB for a
given address.

Thus, CPUs can allocate *both* valid PMDs into the TLBs. When this
happens, a TLB lookup can:

1) return either of the PMDs.

2) raise a TLB conflict abort.

3) return an amalgamation of the two entries (e.g. provide an erroneous
   address).

Note that (3) is particularly scary:

* The CPU could raise an SError if the amalgamated entry is junk.

* If a memory access hits an amalgamated entry, it may use the wrong
  physical address, attributes, or permissions, resulting in a number of
  potential problems.

* If the amalgamated entry looks like a partial walk, the TLB might try
  to perform a walk starting at the physical address in the amalgamated
  entry. This would cause page table walks to access bogus addresses,
  allocating junk into TLBs, and may result in SErrors or other aborts.

> > Whereas above, the sequence is
> >
> > 1. Write the new entry
> > 2. invalidate relevant TLB entries
> >
> > Which is insufficient, and will lead to a number of problems.

> I couldn't think of new problems with this approach. Could you share
> any problematic scenarios ?

Please see above.

> Also, my test-case runs fine with these patches for 10+ hours.

While this may happen to work on particular platforms, it is not
guaranteed per the architecture, and will fail on others.

Thanks,
Mark.
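
The ordering argument in this message can be illustrated with a small
user-space C model. This is only a sketch under stated assumptions: every
`toy_*` name, the single-entry "page table", and the rule that hardware may
cache the current valid entry between any two software steps are hypothetical
simplifications, not kernel or architecture definitions. It shows only why
writing the new entry before invalidating leaves a window in which two
different valid entries can be cached at once, whereas the break-before-make
order does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types: not the kernel's pmd_t or any real TLB format. */
enum toy_kind { TOY_INVALID, TOY_TABLE, TOY_BLOCK };

struct toy_entry {
	enum toy_kind kind;
	uint64_t out_addr;
};

#define TOY_TLB_SLOTS 4

struct toy_mmu {
	struct toy_entry pmd;                 /* the entry in memory */
	struct toy_entry tlb[TOY_TLB_SLOTS];  /* cached copies */
	size_t tlb_used;
	bool conflict_seen;                   /* two different valid copies coexisted */
};

static bool toy_same(struct toy_entry a, struct toy_entry b)
{
	return a.kind == b.kind && a.out_addr == b.out_addr;
}

/* Model assumption: hardware may cache the in-memory entry at any time. */
static void toy_speculative_fill(struct toy_mmu *m)
{
	bool already_cached = false;

	assert(m->tlb_used <= TOY_TLB_SLOTS);
	if (m->pmd.kind == TOY_INVALID)
		return; /* invalid entries are never cached in this model */

	for (size_t i = 0; i < m->tlb_used; i++) {
		if (toy_same(m->tlb[i], m->pmd))
			already_cached = true;
		else
			m->conflict_seen = true; /* stale and new entry both cached */
	}
	if (!already_cached && m->tlb_used < TOY_TLB_SLOTS)
		m->tlb[m->tlb_used++] = m->pmd;
}

static void toy_tlb_flush(struct toy_mmu *m)
{
	m->tlb_used = 0;
}

/* Start with a valid entry that the TLB has already cached. */
static void toy_init(struct toy_mmu *m, struct toy_entry initial)
{
	m->pmd = initial;
	m->tlb_used = 0;
	m->conflict_seen = false;
	toy_speculative_fill(m);
}

/* Break-before-make: invalidate entry, flush TLB, then write new entry. */
static bool toy_remap_bbm(struct toy_mmu *m, struct toy_entry new_entry)
{
	m->pmd.kind = TOY_INVALID;
	toy_speculative_fill(m);
	toy_tlb_flush(m);
	toy_speculative_fill(m);
	m->pmd = new_entry;
	toy_speculative_fill(m);
	return m->conflict_seen;
}

/* Write the new entry first and flush afterwards (the insufficient order). */
static bool toy_remap_no_bbm(struct toy_mmu *m, struct toy_entry new_entry)
{
	m->pmd = new_entry;
	toy_speculative_fill(m); /* the old entry is still cached here */
	toy_tlb_flush(m);
	toy_speculative_fill(m);
	return m->conflict_seen;
}
```

In this model, calling `toy_init()` with a table entry and then
`toy_remap_bbm()` with a block entry reports no conflict, while the same setup
followed by `toy_remap_no_bbm()` does. The model says nothing about what real
hardware does once a conflict exists; that is the subject of the numbered
outcomes listed in the message.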