Received: by 2002:a25:d783:0:0:0:0:0 with SMTP id o125csp33935ybg; Wed, 18 Mar 2020 16:49:53 -0700 (PDT) X-Google-Smtp-Source: ADFU+vuo9AISl2tfBZVKUItZ4kd2PzbREPCaXW1YzAJbc4DJ4jAgL/DJuateiWDDnGtkbhjgMrdr X-Received: by 2002:a05:6808:491:: with SMTP id z17mr393978oid.78.1584575393476; Wed, 18 Mar 2020 16:49:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584575393; cv=none; d=google.com; s=arc-20160816; b=bVrLvChilzYC74Mem2wFPgtafekGPAaLqE5JD/KT7N4nXQjol84Nq8G8rxm3Q4fcSE NWL2xw4fRdZm9d/qfgFVJwapM6NiSN/30tft/S5ylZyTEu2I0ndLGzzXPMOPon8Ba8g/ GwX5+/+zH2C9wzD2b7PXHKq6n7AoVzYnwVufqtkzTQBoH+/qzVWbBqzwLZIHCWpAWB2b Mvcba2j74tl9SuW1KvyIIq2iIbC6HhWBiB/SyAD900L0rdJL10YVAr0AbI8QKE558nzk PVuBwqe5NHA/iTt0VhdZl86Y7pavObUaQHHOyL2H9QqJg4Kov4Z0JrKRPfYeYqmayLsm sYig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=pnJUaKf1KdrUx7i5h10M46LVwdEtFz9Qqm6jJ5BkohU=; b=p0muW0vhAlX/CGF/r9URDCL6C88xypWOsFfce5cTDJGxhRDm/gQExI8k3xsmqub3yg xxFx4UPmCVJJlrPX/VEV2UARY1qH45jzP8maXIlYt2J7PIliWILRd0AZoW710RnATzFN uvLpQtMSL+qMWLgK++C9QWqVDf/bFnykXOaA/V+fNZi3icbJLi/JmUWaGjkaZXAzu1em 2CizHEDtDLeTfpMQZuqlwLz43XTnuxm6VoPZsAJ+OMa2qeHntDCQiE6eRx9t/fK3OpUg 5noqcMPUu3Nxb+p5rtzm1rgk3c+4Qh0RTvTY237YXG4IG/nUFspRZ7EBhbjMis8MYUXZ R6YQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l63si130250oif.161.2020.03.18.16.49.40; Wed, 18 Mar 2020 16:49:53 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726943AbgCRXrx (ORCPT + 99 others); Wed, 18 Mar 2020 19:47:53 -0400 Received: from out30-54.freemail.mail.aliyun.com ([115.124.30.54]:55879 "EHLO out30-54.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726647AbgCRXrx (ORCPT ); Wed, 18 Mar 2020 19:47:53 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R111e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04455;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=7;SR=0;TI=SMTPD_---0Tt-1t37_1584575263; Received: from US-143344MP.local(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0Tt-1t37_1584575263) by smtp.aliyun-inc.com(127.0.0.1); Thu, 19 Mar 2020 07:47:46 +0800 Subject: Re: [PATCH] mm: khugepaged: fix potential page state corruption To: "Kirill A. Shutemov" Cc: kirill.shutemov@linux.intel.com, hughd@google.com, aarcange@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <1584573582-116702-1-git-send-email-yang.shi@linux.alibaba.com> <20200318234036.aw3awl25gxrjd2jl@box> From: Yang Shi Message-ID: Date: Wed, 18 Mar 2020 16:47:42 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <20200318234036.aw3awl25gxrjd2jl@box> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 3/18/20 4:40 PM, Kirill A. Shutemov wrote: > On Thu, Mar 19, 2020 at 07:19:42AM +0800, Yang Shi wrote: >> When khugepaged collapses anonymous pages, the base pages would be freed >> via pagevec or free_page_and_swap_cache(). But, the anonymous page may >> be added back to LRU, then it might result in the below race: >> >> CPU A CPU B >> khugepaged: >> unlock page >> putback_lru_page >> add to lru >> page reclaim: >> isolate this page >> try_to_unmap >> page_remove_rmap <-- corrupt _mapcount >> >> It looks nothing would prevent the pages from isolating by reclaimer. >> >> The other problem is the page's active or unevictable flag might be >> still set when freeing the page via free_page_and_swap_cache(). The >> putback_lru_page() would not clear those two flags if the pages are >> released via pagevec, it sounds nothing prevents from isolating active >> or unevictable pages. >> >> However I didn't really run into these problems, just in theory by visual >> inspection. >> >> And, it also seems unnecessary to have the pages add back to LRU again since >> they are about to be freed when reaching this point. So, clearing active >> and unevictable flags, unlocking and dropping refcount from isolate >> instead of calling putback_lru_page() as what page cache collapse does. >> >> Cc: Kirill A. Shutemov >> Cc: Hugh Dickins >> Cc: Andrea Arcangeli >> Signed-off-by: Yang Shi >> --- >> mm/khugepaged.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index b679908..f42fa4e 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -673,7 +673,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, >> src_page = pte_page(pteval); >> copy_user_highpage(page, src_page, address, vma); >> VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page); >> - release_pte_page(src_page); >> /* >> * ptl mostly unnecessary, but preempt has to >> * be disabled to update the per-cpu stats >> @@ -687,6 +686,15 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, >> pte_clear(vma->vm_mm, address, _pte); >> page_remove_rmap(src_page, false); >> spin_unlock(ptl); >> + >> + dec_node_page_state(src_page, >> + NR_ISOLATED_ANON + page_is_file_cache(src_page)); >> + ClearPageActive(src_page); >> + ClearPageUnevictable(src_page); >> + unlock_page(src_page); >> + /* Drop refcount from isolate */ >> + put_page(src_page); >> + >> free_page_and_swap_cache(src_page); >> } >> } > I will look at this closer tomorrow, but looks like an easier fix is to > move release_pte_page() under ptl that we take just after it. Thanks, Kirill. However, it looks putting the page back to lru is not necessary. And, it also sounds not very efficient to do this under ptl IMHO since it may spins to wait for lru lock. >