Date: Fri, 28 Oct 2022 16:53:06 +0100
From: Catalin Marinas
To: Muchun Song
Cc: Anshuman Khandual, Wupeng Ma, Andrew Morton, Mike Kravetz,
 Muchun Song, Michal Hocko, Oscar Salvador,
 Linux Memory Management List, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -next 1/1] mm: hugetlb_vmemmap: Fix WARN_ON in vmemmap_remap_pte
References: <20221025014215.3466904-1-mawupeng1@huawei.com>
 <614E3E83-1EAB-4C39-AF9C-83C0CCF26218@linux.dev>
 <35dd51eb-c266-f221-298a-21309c17971a@arm.com>
 <3D6FDA43-A812-4907-B9C8-C2B25567DBBC@linux.dev>
 <3c545133-71aa-9a8d-8a13-09186c4fa767@arm.com>

On Fri, Oct 28, 2022 at 10:45:09AM +0800, Muchun Song wrote:
> On Oct 27, 2022, at 18:50, Catalin Marinas wrote:
> > On Wed, Oct 26, 2022 at 02:06:00PM +0530, Anshuman Khandual wrote:
> >> On 10/26/22 12:31, Muchun Song wrote:
> >>>> On 10/25/22 12:06, Muchun Song wrote:
> >>>>>> On Oct 25, 2022, at 09:42, Wupeng Ma wrote:
> >>>>>> From: Ma Wupeng
> >>>>>>
> >>>>>> Commit f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with
> >>>>>> each HugeTLB page") added vmemmap_remap_pte to remap the tail pages as
> >>>>>> read-only to catch illegal write operations to the tail page.
> >>>>>>
> >>>>>> However this will lead to a WARN_ON on arm64 in __check_racy_pte_update()
> >>>>>
> >>>>> Thanks for finding this issue.
> >>>>>
> >>>>>> since this may lead to the dirty state being cleared. This check was
> >>>>>> introduced by commit 2f4b829c625e ("arm64: Add support for hardware
> >>>>>> updates of the access and dirty pte bits") and the initial check was
> >>>>>> as follows:
> >>>>>>
> >>>>>> BUG_ON(pte_write(*ptep) && !pte_dirty(pte));
> >>>>>>
> >>>>>> Since we do need to mark this pte as read-only to catch illegal write
> >>>>>> operations to the tail pages, use set_pte to replace set_pte_at to
> >>>>>> bypass this check.
> >>>>>
> >>>>> In theory, the warning does not affect anything since the tail vmemmap
> >>>>> pages are supposed to be read-only.
> >>>>> So, skipping this check for vmemmap
> >>>>
> >>>> Tail vmemmap pages are supposed to be read-only in practice, but their
> >>>> backing ptes do have pte_write() enabled. Otherwise the VM_WARN_ONCE()
> >>>> warning would not have triggered.
> >>>
> >>> Right.
> >>>
> >>>>
> >>>> VM_WARN_ONCE(pte_write(old_pte) && !pte_dirty(pte),
> >>>>              "%s: racy dirty state clearing: 0x%016llx -> 0x%016llx",
> >>>>              __func__, pte_val(old_pte), pte_val(pte));
> >>>>
> >>>> Also, is it not true that the pte is being remapped, as read-only, to a
> >>>> different page than the one it mapped originally (which will be freed),
> >>>> i.e. the PFN in 'old_pte' and 'pte' will be different? Hence is there
> >>>> still
> >>>
> >>> Right.
> >>>
> >>>> a possibility for a race condition even when the PFN changes?
> >>>
> >>> Sorry, I didn't get this question. Did you mean the PTE is changed from
> >>> the new one (pte) to the old one (old_pte) by the hardware because of
> >>> the dirty bit being set on a concurrent write to the tail vmemmap page?
> >>
> >> No, but doesn't vmemmap_remap_pte() reuse walk->reuse_page for all the
> >> remaining tail pages? Isn't there a PFN change, along with an access
> >> permission change, involved in this remapping process?
> >
> > For the record, as we discussed offline, changing the output address
> > (pfn) of a pte is not safe without break-before-make if at least one of
> > the mappings was writeable. The caller (vmemmap_remap_pte()) would need
> > to be fixed to first invalidate the pte and then write the new pte. I
>
> Could you give more details about what issue it would cause? I am not
> familiar with arm64.

Well, it's not allowed by the architecture, so some CPU implementations
may do weird things, like accessing incorrect memory or triggering TLB
conflict aborts, if for some reason they end up with two entries in the
TLB for the same VA but pointing to different pfns. The hardware expects
an invalid PTE and a TLB invalidation between such changes.
In practice, most likely nothing happens and this works fine, but we
need to stick to the architecture requirements in case some CPU
implementation does rely on them.

> > assume no other CPU accesses this part of the vmemmap while the pte is
> > being remapped.
>
> However, there is no guarantee that no other CPU accesses this pte.
> E.g. memory failure or memory compaction can both obtain the head page
> from any tail struct page (read-only) at any time.

Oh, so we cannot safely go through a break-before-make sequence here
(zero the PTE, flush the TLB, write the new PTE), as some CPU may access
this pte.

-- 
Catalin
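[Editorial note: the break-before-make sequence discussed in this thread
(invalidate the entry, flush the TLB, then install the new entry) could be
sketched roughly as below. This is an illustrative, untested fragment in
kernel style, not a proposed patch; the helper name vmemmap_remap_pte_bbm
is made up for the example.]

```c
/*
 * Sketch only: break-before-make when changing the pfn of a live
 * kernel pte. This is only safe if no other CPU can access the VA
 * while the pte is invalid -- which, as Muchun notes above, does
 * not hold for the vmemmap range.
 */
static void vmemmap_remap_pte_bbm(struct mm_struct *mm, unsigned long addr,
				  pte_t *ptep, pte_t newpte)
{
	/* 1. Break: invalidate the old entry so no new TLB fills occur. */
	pte_clear(mm, addr, ptep);

	/* 2. Flush: remove any cached translation for this VA. */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

	/* 3. Make: install the new read-only mapping with the new pfn. */
	set_pte_at(mm, addr, ptep, newpte);
}
```

Overwriting the pte in place, without steps 1 and 2, is what can leave two
TLB entries for the same VA with different pfns, which the architecture
forbids; and, as the thread concludes, even the full sequence is not safe
here because lockless readers (e.g. memory failure or compaction walking
struct pages) may access the vmemmap while the pte is invalid.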