Date: Tue, 23 Feb 2021 23:31:57 +0100
From: Oscar Salvador
To: Muchun Song
Cc: Mike Kravetz, Jonathan Corbet, Thomas Gleixner, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
 paulmck@kernel.org, mchehab+huawei@kernel.org,
 pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
 anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
 Matthew Wilcox, Michal Hocko, "Song Bao Hua (Barry Song)",
 David Hildenbrand, HORIGUCHI NAOYA(堀口 直也), Joao Martins,
 Xiongchun duan, linux-doc@vger.kernel.org, LKML,
 Linux Memory Management List, linux-fsdevel
Subject: Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Message-ID: <20210223223157.GA2740@localhost.localdomain>
References: <20210219104954.67390-1-songmuchun@bytedance.com>
 <20210219104954.67390-5-songmuchun@bytedance.com>
 <13a5363c-6af4-1e1f-9a18-972ca18278b5@oracle.com>
 <20210223092740.GA1998@linux>
 <20210223104957.GA3844@linux>
 <20210223154128.GA21082@localhost.localdomain>
In-Reply-To: <20210223154128.GA21082@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 23, 2021 at 04:41:28PM +0100, Oscar Salvador wrote:
> On Tue, Feb 23, 2021 at 11:50:05AM +0100, Oscar Salvador wrote:
> > > CPU0:                           CPU1:
> > >                                 set_compound_page_dtor(HUGETLB_PAGE_DTOR);
> > > memory_failure_hugetlb
> > >   get_hwpoison_page
> > >     __get_hwpoison_page
> > >       get_page_unless_zero
> > >                                 put_page_testzero()
> > >
> > > Maybe this can happen. But it is very much a corner case. If we want
> > > to deal with this, we can put_page_testzero() first and then
> > > set_compound_page_dtor(HUGETLB_PAGE_DTOR).
> >
> > I have to check further, but it looks like this could actually happen.
> > Handling this with VM_BUG_ON is wrong, because memory_failure/soft_offline are
> > entitled to increase the refcount of the page.
> >
> > AFAICS,
> >
> > CPU0:                           CPU1:
> >                                 set_compound_page_dtor(HUGETLB_PAGE_DTOR);
> > memory_failure_hugetlb
> >   get_hwpoison_page
> >     __get_hwpoison_page
> >       get_page_unless_zero
> >                                 put_page_testzero()
> >   identify_page_state
> >     me_huge_page
> >
> > I think we can reach me_huge_page with either refcount = 1 or refcount = 2,
> > depending on whether put_page_testzero has been issued.
> >
> > For now, I would not re-enqueue the page if put_page_testzero == false.
> > I have to see how this can be handled gracefully.
>
> I took a brief look.
> It is not really your patch's fault. Hugetlb <-> memory-failure synchronization is
> a bit odd; it definitely needs improvement.
>
> The thing is, we can have different scenarios here.
> E.g: by the time we return from put_page_testzero, we might have refcount ==
> 0 and PageHWPoison, or refcount == 1 and PageHWPoison.
>
> The former will let a user get a page from the pool and get a SIGBUS
> when it faults in the page, and the latter will be even more odd as we
> will have a self-refcounted page in the free pool (and hwpoisoned).
>
> As I said, it is not this patchset's fault. It just made me realize this
> problem.
>
> I have to think some more about this.

I have been thinking more about this.
Memory failure events can occur at any time, and we might not be in a
position to handle the error gracefully, meaning that the page might end
up in an undesirable state.
E.g: we could flag the page right before enqueuing it.

I still think that the VM_BUG_ON should go, as the refcount can perfectly
well be increased by the memory-failure/soft_offline handlers, so BUGing
there does not make much sense.

One thing we could do is to check the state of the page we want to
retrieve from the free hugepage pool.
We should discard any HWpoisoned ones, and dissolve them.

The thing is, memory-failure/soft_offline should allocate a new hugepage
for the free pool, so as to keep the pool stable.
Something like [1].

Anyway, this is orthogonal to this patch, and something I will work on
soon.

[1] https://lore.kernel.org/linux-mm/20210222135137.25717-2-osalvador@suse.de/T/#u

-- 
Oscar Salvador
SUSE L3
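
To make the "check the page state on dequeue" idea concrete, here is a
rough userspace sketch in C. It is only a model under stated assumptions,
not kernel code: struct model_page, struct model_pool, pool_enqueue() and
pool_dequeue() are hypothetical names invented for illustration, the
hwpoison flag stands in for PageHWPoison(), and "dissolving" is reduced to
dropping the page from the list and counting it. It does not model the
replacement allocation that would keep the pool size stable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugepage; not the kernel's struct page. */
struct model_page {
	int id;
	bool hwpoison;			/* models PageHWPoison() */
	struct model_page *next;	/* free-list linkage */
};

/* Hypothetical free pool: a simple LIFO list plus two counters. */
struct model_pool {
	struct model_page *free_list;
	size_t nr_free;
	size_t nr_dissolved;
};

/* Put a page at the head of the free list. */
static void pool_enqueue(struct model_pool *pool, struct model_page *page)
{
	page->next = pool->free_list;
	pool->free_list = page;
	pool->nr_free++;
}

/*
 * Pop pages until a healthy one is found. Poisoned pages are removed
 * from the pool ("dissolved" in this model) instead of being handed
 * out. Returns NULL if no healthy page is left.
 */
static struct model_page *pool_dequeue(struct model_pool *pool)
{
	while (pool->free_list) {
		struct model_page *page = pool->free_list;

		pool->free_list = page->next;
		page->next = NULL;
		pool->nr_free--;

		if (!page->hwpoison)
			return page;

		pool->nr_dissolved++;
	}
	return NULL;
}
```

In this model the poison check happens at dequeue time, so a page that was
flagged right before (or after) being enqueued is still caught before a
user can fault it in.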