Date: Tue, 18 Aug 2020 13:26:25 +0100
From: Jonathan Cameron
To: Anshuman Khandual
CC: Mark Rutland, Marc Zyngier, Suzuki Poulose
Subject: Re: [PATCH 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
Message-ID: <20200818132625.00003d05@Huawei.com>
In-Reply-To: <8db455b6-8fe5-b552-119f-4abab0cc8501@arm.com>
References: <1597655984-15428-1-git-send-email-anshuman.khandual@arm.com>
 <1597655984-15428-2-git-send-email-anshuman.khandual@arm.com>
 <20200818101301.000027ef@Huawei.com>
 <8db455b6-8fe5-b552-119f-4abab0cc8501@arm.com>
Organization: Huawei Technologies Research and Development (UK) Ltd.

On Tue, 18 Aug 2020 15:11:58 +0530
Anshuman Khandual wrote:

> On 08/18/2020 02:43 PM, Jonathan Cameron wrote:
> > On Mon, 17 Aug 2020 14:49:43 +0530
> > Anshuman Khandual wrote:
> >
> >> pmd_present() and pmd_trans_huge() are expected to behave in the following
> >> manner during the various phases of a given PMD. This is derived from a
> >> previous detailed discussion on the topic [1] and the present THP
> >> documentation [2].
> >>
> >> pmd_present(pmd):
> >>
> >> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
> >> - Returns false if pmd does not refer to system RAM - invalid pmd_page(pmd)
> >>
> >> pmd_trans_huge(pmd):
> >>
> >> - Returns true if pmd refers to system RAM and is a trans huge mapping
> >>
> >> -------------------------------------------------------------------------
> >> | PMD states      | pmd_present      | pmd_trans_huge      |
> >> -------------------------------------------------------------------------
> >> | Mapped          | Yes              | Yes                 |
> >> -------------------------------------------------------------------------
> >> | Splitting       | Yes              | Yes                 |
> >> -------------------------------------------------------------------------
> >> | Migration/Swap  | No               | No                  |
> >> -------------------------------------------------------------------------
> >>
> >> The problem:
> >>
> >> The PMD is first invalidated with pmdp_invalidate() before it is split.
> >> This invalidation clears PMD_SECT_VALID as below.
> >>
> >> PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID
> >>
> >> Once PMD_SECT_VALID gets cleared, pmd_present() returns false for the PMD
> >> entry. Another bit apart from PMD_SECT_VALID is needed to re-affirm
> >> pmd_present() as true during the THP split process. To comply with the
> >> above semantics, pmd_trans_huge() should also check pmd_present() first,
> >> before testing for the presence of an actual transparent huge mapping.
> >>
> >> The solution:
> >>
> >> Ideally PMD_TYPE_SECT should have been used here instead. But it shares
> >> its bit position with PMD_SECT_VALID, which is used for THP invalidation,
> >> and hence will not be there for the pmd_present() check after
> >> pmdp_invalidate().
> >>
> >> A new software-defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD
> >> entry during invalidation, which helps pmd_present() return true and
> >> signals that the entry still points to memory.
> >>
> >> This bit is transient. During the split process it will be overridden by
> >> a page table page representing normal pages in place of the erstwhile
> >> huge page. Other pmdp_invalidate() callers always write a fresh PMD value
> >> on the entry, overriding this transient PMD_PRESENT_INVALID bit, which
> >> makes it safe.
> >>
> >> [1]: https://lkml.org/lkml/2018/10/17/231
> >> [2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt
> >
> > Hi Anshuman,
> >
> > One query on this. From my reading of the ARM ARM, bit 59 is not
> > an ignored bit. The exact requirements for hardware to be using
> > it are a bit complex though.
> > It 'might' be safe to use it for this, but if so can we have a comment
> > explaining why? Also, it is more than possible I'm misunderstanding things!
>
> We are using this bit 59 only when the entry is not active from the MMU
> perspective, i.e. PMD_SECT_VALID is clear.
>
Understood. I guess we ran out of bits that were always ignored, so we had
to start using ones that are ignored only in this particular state.

Jonathan
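
P.S. For anyone following along, here is a minimal, self-contained sketch
(plain userspace C, not the actual kernel patch) of the scheme being
discussed. It assumes PMD_SECT_VALID at bit 0, the block/table distinction
at bit 1, and the proposed software PMD_PRESENT_INVALID bit at bit 59; it
only models how the three PMD states in the table above map onto
pmd_present() and pmd_trans_huge():

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pmdval_t;

#define PMD_SECT_VALID        ((pmdval_t)1 << 0)   /* hardware valid bit */
#define PMD_TABLE_BIT         ((pmdval_t)1 << 1)   /* clear => block (huge) mapping */
#define PMD_PRESENT_INVALID   ((pmdval_t)1 << 59)  /* software bit, assumed position */

static bool pmd_present(pmdval_t pmd)
{
        /* Present if valid to the MMU, or marked invalid-but-present in software. */
        return (pmd & (PMD_SECT_VALID | PMD_PRESENT_INVALID)) != 0;
}

static bool pmd_trans_huge(pmdval_t pmd)
{
        /* Check pmd_present() first, then that the entry is a block (huge) mapping. */
        return pmd_present(pmd) && !(pmd & PMD_TABLE_BIT);
}

static pmdval_t pmd_mkinvalid(pmdval_t pmd)
{
        /*
         * What pmdp_invalidate() would do during a THP split under this
         * scheme: hide the entry from the MMU but keep it "present" for
         * the generic MM code.
         */
        pmd |= PMD_PRESENT_INVALID;
        pmd &= ~PMD_SECT_VALID;
        return pmd;
}

int main(void)
{
        pmdval_t pmd = PMD_SECT_VALID;          /* mapped THP: valid block entry */
        printf("mapped:    present=%d trans_huge=%d\n",
               pmd_present(pmd), pmd_trans_huge(pmd));

        pmd = pmd_mkinvalid(pmd);               /* splitting: after pmdp_invalidate() */
        printf("splitting: present=%d trans_huge=%d\n",
               pmd_present(pmd), pmd_trans_huge(pmd));

        pmd = 0;                                /* migration/swap: neither bit set */
        printf("migration: present=%d trans_huge=%d\n",
               pmd_present(pmd), pmd_trans_huge(pmd));
        return 0;
}

The point the sketch illustrates is that the splitting state keeps
pmd_present() true even though the hardware valid bit has been cleared,
and that bit 59 only carries this meaning while PMD_SECT_VALID is clear,
which is why its use is confined to invalidated entries.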