Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp1025119ybb; Wed, 8 Apr 2020 14:55:47 -0700 (PDT) X-Google-Smtp-Source: APiQypK1laOuiTCNvGEZEt3vBeeGztfc2o8ZAOD+cd0AUbGJh3jUGkaHJD0QFMrS7xS9DprA2A6R X-Received: by 2002:a05:6830:19ee:: with SMTP id t14mr1678022ott.287.1586382947756; Wed, 08 Apr 2020 14:55:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1586382947; cv=none; d=google.com; s=arc-20160816; b=ShAxd2mvvzqv46XZxpXM074ZavGuBhJWxU1RbGqi8R2I2buPMJQbbFpzzymrRG+pgS QGYc5AqKyodWSp0g8PgDzcoEh2fAAb/+1bMYsJPJPTF4ZtGXmEl8bufOPP13DR6q+IRa LI/AZROqtIZfU1C4lNy47jxTe+O/LmZrkhJRK42HKEwLWbZ3As1BUy1LpOlrjcXS33Zy 7plF+uibEknCON9UbgAEbDhsy+7fwwuATFEfOnSj1C4xrLJwwTqPXYNypRY8QpL76CUx Ti7Qi7CXsj3RAS9qL6p/GSZTSfFc7R6eTPXk5N6UqCydbmaiSVfR4YjDKuyyU8KhGnjM cSXw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=OhT/eI5Q+eN3ah+drc/aBXK9Pe1TThj8eomAgANwIBI=; b=1GanAi3IcfVSGqQMMGBTvnd6PVw0coZ7UngLdNZjUud8BAw3z5uBFuwmSs8oexwE0w v3dQzHYvchEa74F/APOt0mSeOvfm5Kiba/4tK+3RYIIHV7hoyudHHO7UCBsGyGdBh03h tfcXX0gYZuren/J8fD94djwrJX7Mo91dGQDhd6SBMHW55WUeuWttAbtBXMXrylgSxAhr dGmtZc52IcjgIYxwhHqI8RJy7e6g2GczXAQb1aqDdCEdMju/i3yAWbFqVTQzWt66KLtl 5EoK1XYnVTkxTxNXMmGovLEN+eoOT6bTZ/s05ST/rBEqFukFjB09/d+AizOOugMlWbv0 +yqQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h66si2861733oif.53.2020.04.08.14.55.35; Wed, 08 Apr 2020 14:55:47 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728610AbgDHSv3 (ORCPT + 99 others); Wed, 8 Apr 2020 14:51:29 -0400 Received: from out30-56.freemail.mail.aliyun.com ([115.124.30.56]:48719 "EHLO out30-56.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727963AbgDHSv3 (ORCPT ); Wed, 8 Apr 2020 14:51:29 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=7;SR=0;TI=SMTPD_---0Tv.3iEv_1586371884; Received: from US-143344MP.local(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0Tv.3iEv_1586371884) by smtp.aliyun-inc.com(127.0.0.1); Thu, 09 Apr 2020 02:51:26 +0800 Subject: Re: [PATCHv2 5/8] khugepaged: Allow to callapse a page shared across fork To: "Kirill A. Shutemov" Cc: akpm@linux-foundation.org, Andrea Arcangeli , Zi Yan , linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" References: <20200403112928.19742-1-kirill.shutemov@linux.intel.com> <20200403112928.19742-6-kirill.shutemov@linux.intel.com> <20200408131044.xzlheacvslrbwrja@box> From: Yang Shi Message-ID: <107630f5-bbde-3f78-23e9-6f6b3113d709@linux.alibaba.com> Date: Wed, 8 Apr 2020 11:51:22 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <20200408131044.xzlheacvslrbwrja@box> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/8/20 6:10 AM, Kirill A. Shutemov wrote: > On Mon, Apr 06, 2020 at 01:50:56PM -0700, Yang Shi wrote: >> >> On 4/3/20 4:29 AM, Kirill A. Shutemov wrote: >>> The page can be included into collapse as long as it doesn't have extra >>> pins (from GUP or otherwise). >>> >>> Signed-off-by: Kirill A. Shutemov >>> --- >>> mm/khugepaged.c | 25 ++++++++++++++----------- >>> 1 file changed, 14 insertions(+), 11 deletions(-) >>> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>> index 57ff287caf6b..1e7e6543ebca 100644 >>> --- a/mm/khugepaged.c >>> +++ b/mm/khugepaged.c >>> @@ -581,11 +581,18 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, >>> } >>> /* >>> - * cannot use mapcount: can't collapse if there's a gup pin. >>> - * The page must only be referenced by the scanned process >>> - * and page swap cache. >>> + * Check if the page has any GUP (or other external) pins. >>> + * >>> + * The page table that maps the page has been already unlinked >>> + * from the page table tree and this process cannot get >>> + * additinal pin on the page. >>> + * >>> + * New pins can come later if the page is shared across fork, >>> + * but not for the this process. It is fine. The other process >>> + * cannot write to the page, only trigger CoW. >>> */ >>> - if (page_count(page) != 1 + PageSwapCache(page)) { >>> + if (total_mapcount(page) + PageSwapCache(page) != >>> + page_count(page)) { >> This check looks fine for base page, but what if the page is PTE-mapped THP? >> The following patch made this possible. >> >> If it is PTE-mapped THP and the page is in swap cache, the refcount would be >> 512 + the number of PTE-mapped pages. >> >> Shall we do the below change in the following patch? >> >> extra_pins = PageSwapCache(page) ? nr_ccompound(page) - 1 : 0; >> if (total_mapcount(page) + PageSwapCache(page) != page_count(page) - >> extra_pins) { >> ... > Looks like you're right. > > It would be nice to have a test case to demonstrate the issue. > > Is there any way to trigger moving the page to swap cache? I don't see it > immediately. It sounds not easy to trigger since it totally depends on timing, I'm wondering we may have to use MADV_PAGEOUT? Something below off the top of my head may trigger this?     CPU       A                                    CPU    B                         CPU    C In parent: MADV_HUGEPAGE page fault to fill with THP fork                                                     In Child: MADV_NOHUGEPAGE MADV_DONTNEED (split pmd)                                                         MADV_PAGEOUT                                                             -> add_to_swap                                 khugepaged scan parent and try to collapse PTE-mapped                                                             -> try_to_unmap When doing MADV_DONTNEED we need make sure head page is unmapped since MADV_PAGEOUT would call page_mapcount(page) to skip shared mapping. >