Received: by 2002:a05:6a11:4021:0:0:0:0 with SMTP id ky33csp895042pxb; Wed, 15 Sep 2021 16:11:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzJXfyVoTxdvzDPMmpu3x7LxpeFk5Qo6ibYCkqYxSRbet2TGiqST7Lsbwhr/mdDgxDov2le X-Received: by 2002:a05:6638:d55:: with SMTP id d21mr2049068jak.104.1631747496962; Wed, 15 Sep 2021 16:11:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1631747496; cv=none; d=google.com; s=arc-20160816; b=aOl3qTQTFbomt73AGwsfHCqZTTSJQx7+SDEKtJoHnHW9e9dJ1M/o7RzDkIGFOvr71w nwfJTlOnPMyNUGRUr9Tom+BymeB6ktSLmDZVjLV/WtKU9Xaa9o0nQjU6kO8UnmgyB7P3 5jcbeL1KXzFCZc/hWkk7Z1rDndr2wXtqSz6QnVa6SdBLfHNocLgevSqS5M355rLcHvPd 5m1MUgiIFS7uVz4yb65kxRzvpvfQRMV9SlC8oXmUC34LBD3q/+4eSMMWCsOPYDQK9zUo mVBIYUftInzJ/PRmk+yPYt4XZ2EcPNOWvHQKkXA+16d/nRe4eKYKmdu/+Z5AObllPkyp af5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=9BtEwTezY+qlQ0z6udcY1NI+sWRfBUg4i/PQVpsepp0=; b=Mrvz2s7sOEKAIu38A8SqhUjTIJpUlUbzaxoK8SdTPOudV17otRJbHkvdlc/Jimobo7 n+HoYC2CEwLwZ1pgBmAG/VteV6QPXSCV62bzQQUiKKOBy42Kaz6HzEeIRKU4ptqFb79V HpO/fbHQfwep+sI5/hCnkCVYSTtTPb2yCM3P67qce871W080+wbSzuYDJxjrM9mau3Na 3hYUpWdHnK5qcWi7cjEZQ456rCJnZysg1FkLSS3T0d2sRX2P1BlSqARZQnpOJHieNVBN HoCRtBbeEL+kZb+G4twXIFuPhSOOc4WjGnYvDW2Vsgn6mBB3utIQg9imaMv6sDUJi9/4 tgxA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=mKDdUXh3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id e46si1062223jaf.50.2021.09.15.16.11.20; Wed, 15 Sep 2021 16:11:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=mKDdUXh3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232815AbhIOXLw (ORCPT + 99 others); Wed, 15 Sep 2021 19:11:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231197AbhIOXLv (ORCPT ); Wed, 15 Sep 2021 19:11:51 -0400 Received: from mail-ed1-x529.google.com (mail-ed1-x529.google.com [IPv6:2a00:1450:4864:20::529]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C5B1C061574; Wed, 15 Sep 2021 16:10:32 -0700 (PDT) Received: by mail-ed1-x529.google.com with SMTP id c21so9632539edj.0; Wed, 15 Sep 2021 16:10:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=9BtEwTezY+qlQ0z6udcY1NI+sWRfBUg4i/PQVpsepp0=; b=mKDdUXh3ZD6Q0nk7BTUP3zZAAgNh3CAlmZL8bCeApJLsy2E+8udpR+XxK5SomOti9c wE0fKNqqEg3jngk8STojSXv7igrZNhj0Kx9W749QBu/7Id6QwxsqtSBglOSKKJOC5GDM t6Gj5chRhn5m+m//diJNa25h157qVqj5ZUyYAIwx6A5oBB2Mgl1h8D7WXReI+ojaoJAk QIL3zznbP8S53Iafn6ampKpLeDScxlB9nvKm4PJCkyktfNHU7TsbHoNCJdiD49EK5D17 eqKsot1RJMRXHlx8Z4CnhcIMMIg6mCokUdNnAsa8M2TVVSm/mSVDh2Joouf0QXlyh6j0 MveQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=9BtEwTezY+qlQ0z6udcY1NI+sWRfBUg4i/PQVpsepp0=; b=KISEM3WrO1FribNluRSYAwDzlXym6v49oiADr/ZcYJC+v5+QjYe2Ia6U8TCU3mXyeO +vn+ysjDZdqC3oXU3+LcTn+pVcFPFRH9vyxqkI70FMYgBDh4ihD/g49fBLFPjZmAeaYr xaD2pf/l/R9Mldfdd9SkHQQL1qgOXU+n6RDOeZZYMBKOOhxeAZkzkyXsiEPcCO8y720n 99c8LWYY5N5ImBIhwyzZEt7eD3HxLQMVlwC7djpw4ya7E734q/6IuquN6iwlAKFFa7yc S7yhEcXKsC2hkAA/NNyz5NGY6LrhE46R/ZhLwU+TqguDoL9fp2chKz6XvDWXypWrkJRX bdDQ== X-Gm-Message-State: AOAM532tS+3Isn0B76KulbE8nJvGeQf+jJxoqcFtx8HM2WxMOAOVykO6 tkS8nJeNe1MBdL+MyYHP10SLkLYoYFTZs4qMl+Q= X-Received: by 2002:a05:6402:14c3:: with SMTP id f3mr2841214edx.312.1631747430578; Wed, 15 Sep 2021 16:10:30 -0700 (PDT) MIME-Version: 1.0 References: <20210914183718.4236-1-shy828301@gmail.com> <20210914183718.4236-3-shy828301@gmail.com> <20210915114947.2zh7inouztenth6o@box.shutemov.name> In-Reply-To: From: Yang Shi Date: Wed, 15 Sep 2021 16:10:18 -0700 Message-ID: Subject: Re: [PATCH 2/4] mm: khugepaged: check if file page is on LRU after locking page To: "Kirill A. Shutemov" Cc: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= , Hugh Dickins , "Kirill A. Shutemov" , Matthew Wilcox , Oscar Salvador , Andrew Morton , Linux MM , Linux FS-devel Mailing List , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 15, 2021 at 4:00 PM Yang Shi wrote: > > On Wed, Sep 15, 2021 at 10:48 AM Yang Shi wrote: > > > > On Wed, Sep 15, 2021 at 4:49 AM Kirill A. Shutemov wrote: > > > > > > On Tue, Sep 14, 2021 at 11:37:16AM -0700, Yang Shi wrote: > > > > The khugepaged does check if the page is on LRU or not but it doesn't > > > > hold page lock. And it doesn't check this again after holding page > > > > lock. So it may race with some others, e.g. reclaimer, migration, etc. > > > > All of them isolates page from LRU then lock the page then do something. > > > > > > > > But it could pass the refcount check done by khugepaged to proceed > > > > collapse. Typically such race is not fatal. But if the page has been > > > > isolated from LRU before khugepaged it likely means the page may be not > > > > suitable for collapse for now. > > > > > > > > The other more fatal case is the following patch will keep the poisoned > > > > page in page cache for shmem, so khugepaged may collapse a poisoned page > > > > since the refcount check could pass. 3 refcounts come from: > > > > - hwpoison > > > > - page cache > > > > - khugepaged > > > > > > > > Since it is not on LRU so no refcount is incremented from LRU isolation. > > > > > > > > This is definitely not expected. Checking if it is on LRU or not after > > > > holding page lock could help serialize against hwpoison handler. > > > > > > > > But there is still a small race window between setting hwpoison flag and > > > > bump refcount in hwpoison handler. It could be closed by checking > > > > hwpoison flag in khugepaged, however this race seems unlikely to happen > > > > in real life workload. So just check LRU flag for now to avoid > > > > over-engineering. > > > > > > > > Signed-off-by: Yang Shi > > > > --- > > > > mm/khugepaged.c | 6 ++++++ > > > > 1 file changed, 6 insertions(+) > > > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > index 045cc579f724..bdc161dc27dc 100644 > > > > --- a/mm/khugepaged.c > > > > +++ b/mm/khugepaged.c > > > > @@ -1808,6 +1808,12 @@ static void collapse_file(struct mm_struct *mm, > > > > goto out_unlock; > > > > } > > > > > > > > + /* The hwpoisoned page is off LRU but in page cache */ > > > > + if (!PageLRU(page)) { > > > > + result = SCAN_PAGE_LRU; > > > > + goto out_unlock; > > > > + } > > > > + > > > > if (isolate_lru_page(page)) { > > > > > > isolate_lru_page() should catch the case, no? TestClearPageLRU would fail > > > and we get here. > > > > Hmm... you are definitely right. How could I miss this point. > > > > It might be because of I messed up the page state by some tests which > > may do hole punch then reread the same index. That could drop the > > poisoned page then collapse succeed. But I'm not sure. Anyway I didn't > > figure out how the poisoned page could be collapsed. It seems > > impossible. I will drop this patch. > > I think I figured out the problem. This problem happened after the > page cache split patch and if the hwpoisoned page is not head page. It > is because THP split will unfreeze the refcount of tail pages to 2 > (restore refcount from page cache) then dec refcount to 1. The > refcount pin from hwpoison is gone and it is still on LRU. Then > khugepged locked the page before hwpoison, the refcount is expected to > khugepaged. > > The worse thing is it seems this problem is applicable to anonymous > page too. Once the anonymous THP is split by hwpoison the pin from > hwpoison is gone too the refcount is 1 (comes from PTE map). Then > khugepaged could collapse it to huge page again. It may incur data > corruption. > > And the poisoned page may be freed back to buddy since the lost refcount pin. > > If the poisoned page is head page, the code is fine since hwpoison > doesn't put the refcount for head page after split. > > The fix is simple, just keep the refcount pin for hwpoisoned subpage. Err... wait... I just realized I missed the below code block: if (subpage == page) continue; It skips the subpage passed to split_huge_page() so the refcount pin from the caller for this subpage is kept. And hwpoison doesn't put it. So it seems fine. > > > > > > > > > > result = SCAN_DEL_PAGE_LRU; > > > > goto out_unlock; > > > > -- > > > > 2.26.2 > > > > > > > > > > > > > > -- > > > Kirill A. Shutemov