Date: Wed, 19 Sep 2018 18:15:14 +0200
From: Peter Zijlstra
To: Martin Schwidefsky
Cc: will.deacon@arm.com, aneesh.kumar@linux.vnet.ibm.com, akpm@linux-foundation.org, npiggin@gmail.com, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux@armlinux.org.uk, heiko.carstens@de.ibm.com, Linus Torvalds
Subject: Re: [PATCH 2/2] s390/tlb: convert to generic mmu_gather
Message-ID: <20180919161514.GK24124@hirez.programming.kicks-ass.net>
References: <20180918125151.31744-1-schwidefsky@de.ibm.com> <20180918125151.31744-3-schwidefsky@de.ibm.com> <20180919123849.GF24124@hirez.programming.kicks-ass.net> <20180919162809.30b5c416@mschwideX1>
In-Reply-To: <20180919162809.30b5c416@mschwideX1>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 19, 2018 at 04:28:09PM +0200, Martin Schwidefsky wrote:
> On Wed, 19 Sep 2018 14:38:49 +0200
> Peter Zijlstra wrote:
>
> > On Tue, Sep 18, 2018 at 02:51:51PM +0200, Martin Schwidefsky wrote:
> > > +	page_table_free_rcu(tlb, (unsigned long *) pte, address);
> >
> > (whitespace damage, fixed)
> >
> > Also, could you perhaps explain the need for that
> > page_table_alloc/page_table_free code? That is, I get the comment about
> > using 2K page-table fragments out of a 4K physical page, but why this
> > custom allocator instead of kmem_cache? It feels like there's a little
> > extra complication, but it's not immediately obvious what.
>
> The kmem_cache code uses the fields of struct page for its tracking.
> pgtable_page_ctor uses the same fields, e.g. for the ptl. Last time
> I tried to convert the page_table_alloc/page_table_free to kmem_cache
> it just crashed. Plus the split of 4K pages into two 2K fragments is
> done on a per-mm basis, which should help a little bit with fragmentation.

Fair enough, thanks for the information.

> > It's that ASCE limit that makes it impossible to use the generic
> > helpers, right?
>
> There are two problems, one of them is related to the ASCE limit:
>
> 1) s390 supports 4 different page table layouts: 2 levels (2^31 bytes) for
>    31-bit compat, 3 levels (2^42 bytes) as the default for 64-bit, 4 levels
>    (2^53 bytes) if 4 terabytes are not enough, and 5 levels (2^64 bytes) for
>    the bragging rights.
>    The pxd_free_tlb() functions turn into nops if the number of page table
>    levels requires it.

Shiny, I think we (x86) have to choose at boot time which paging mode we
want and have to stick to it.

> 2) The mm->context.flush_mm indication.
>    That goes back to this beauty in the architecture:
>
>     * "A valid table entry must not be changed while it is attached
>     * to any CPU and may be used for translation by that CPU except to
>     * (1) invalidate the entry by using INVALIDATE PAGE TABLE ENTRY,
>     * or INVALIDATE DAT TABLE ENTRY, (2) alter bits 56-63 of a page
>     * table entry, or (3) make a change by means of a COMPARE AND SWAP
>     * AND PURGE instruction that purges the TLB."
>
>    If one CPU is doing a mmu_gather page table operation on the only active
>    thread in the system, the individual page table updates are done in a lazy
>    fashion with simple stores. If a second CPU picks up another thread for
>    execution, the attach_count is increased and the page table updates are
>    done with IPTE/IDTE from now on. But there might be TLB entries around that
>    are not flushed yet. We may *not* let the second CPU see these TLB entries,
>    otherwise the CPU may start an instruction, then lose the TLB entry without
>    being able to recreate it. Due to that, the CPU can end up with a
>    half-finished instruction it can neither roll back nor complete, ending in
>    a check-stop. The simplest example is MVC with a length of e.g. 256 bytes.
>    The instruction has to complete with all 256 bytes moved, or no bytes may
>    have moved at all.
>    That is where the mm->context.flush_mm indication comes into play: if the
>    second CPU finds the bit set at the time it attaches a thread, it will do
>    an IDTE to flush all TLB entries for the mm.

Oh man.. what fun.

Still, this bit could easily be set in the __*_free_tlb() functions
afaict.

Still 1) above is enough. Thanks!
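
As a rough illustration of the flush_mm scheme described in 2) above, here
is a toy model in C. It is only a sketch and not kernel code: struct toy_mm,
toy_pte_invalidate() and toy_attach() are made-up names, and plain counters
stand in for the real stores, the IPTE/IDTE instructions and the TLB flush.
The only parts taken from the thread are the attach_count, the flush_mm
indication, and the rule that a newly attaching CPU must flush before it can
see entries left behind by lazy updates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are the s390 kernel's real API. */
struct toy_mm {
	int  attach_count;  /* CPUs currently attached to this mm */
	bool flush_mm;      /* a lazy update left entries that still need a flush */
	int  lazy_stores;   /* updates done with simple stores */
	int  ipte_updates;  /* updates done with an invalidating instruction */
	int  full_flushes;  /* whole-mm flushes done at attach time */
};

/* Invalidate one entry: lazily if only one CPU is attached, eagerly otherwise. */
static void toy_pte_invalidate(struct toy_mm *mm)
{
	assert(mm->attach_count >= 1);
	if (mm->attach_count == 1) {
		mm->lazy_stores++;
		mm->flush_mm = true;    /* remember that a flush is still owed */
	} else {
		mm->ipte_updates++;     /* stands in for IPTE/IDTE */
	}
}

/* A further CPU attaches: flush first if a lazy update is still pending. */
static void toy_attach(struct toy_mm *mm)
{
	mm->attach_count++;
	if (mm->flush_mm) {
		mm->full_flushes++;     /* stands in for the IDTE flush of the mm */
		mm->flush_mm = false;
	}
}
```

With one CPU attached, toy_pte_invalidate() takes the lazy path and sets
flush_mm. A following toy_attach() then performs exactly one flush and
clears the bit, and later invalidations take the eager path.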