Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
To: Hugh Dickins
Cc: kirill.shutemov@linux.intel.com, aarcange@redhat.com,
    akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <1575420174-19171-1-git-send-email-yang.shi@linux.alibaba.com>
From: Yang Shi <yang.shi@linux.alibaba.com>
Message-ID: <00f0bb7d-3c25-a65f-ea94-3e2de8e9bcdd@linux.alibaba.com>
Date: Tue, 14 Jan 2020 11:28:41 -0800
On 12/4/19 4:15 PM, Hugh Dickins wrote:
> On Wed, 4 Dec 2019, Yang Shi wrote:
>
>> Currently when truncating a shmem file, if the range covers only part of
>> a THP (start or end falls in the middle of the THP), the pages will just
>> get cleared rather than freed, unless the range covers the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in the page cache. This might be fine for some
>> use cases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granularity if hugetlbfs is not used.
>> So, when using shmem THP as the memory backend, QEMU inflation doesn't
>> work as expected since it doesn't free memory. The inflation use case
>> really needs the memory freed. Anonymous THP is not freed right away
>> either, but it is freed eventually once all subpages are unmapped;
>> shmem THP would just stay in the page cache.
>>
>> Split the THP right away when doing a partial hole punch, and if the
>> split fails just clear the page so that reads from the hole-punched
>> area return zero.
>>
>> Cc: Hugh Dickins
>> Cc: Kirill A. Shutemov
>> Cc: Andrea Arcangeli
>> Signed-off-by: Yang Shi
>> ---
>> v2: * Adopted the comment from Kirill.
>>     * Dropped the fallocate mode flag; THP split is the default behavior.
>>     * Blended Hugh's implementation with my v1 patch. TBH I'm not very keen
>>       on Hugh's find_get_entries() hack (basically neutral), but without that hack
> Thanks for giving it a try. I'm not neutral about my find_get_entries()
> hack: it surely had to go (without it, I'd have just pushed my own patch).
> I've not noticed anything wrong with your patch, and it's in the right
> direction, but I'm still not thrilled with it.
> I also remember that I got the looping wrong in my first internal
> attempt (fixed in what I sent), and need to be very sure of the
> try-again-versus-move-on-to-next conditions before agreeing to anything.
> No rush, I'll come back to this in the days or months ahead: I'll try to
> find a less gotoey blend of yours and mine.

Hi Hugh,

Any update on this one?

Thanks,
Yang

>
> Hugh
>
>> we have to rely on pagevec_release() to release the extra pins and play
>> with goto. This version does it that way. The patch is bigger than Hugh's
>> due to extra comments to make the flow clear.
>>
>>  mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>  1 file changed, 83 insertions(+), 37 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 220be9f..1ae0c7f 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  	long nr_swaps_freed = 0;
>>  	pgoff_t index;
>>  	int i;
>> +	bool split = false;
>> +	struct page *page = NULL;
>>  
>>  	if (lend == -1)
>>  		end = -1;	/* unsigned, so actually very big */
>>  
>>  	pagevec_init(&pvec);
>>  	index = start;
>> +retry:
>>  	while (index < end) {
>>  		pvec.nr = find_get_entries(mapping, index,
>>  			min(end - index, (pgoff_t)PAGEVEC_SIZE),
>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  		if (!pvec.nr)
>>  			break;
>>  		for (i = 0; i < pagevec_count(&pvec); i++) {
>> -			struct page *page = pvec.pages[i];
>> +			split = false;
>> +			page = pvec.pages[i];
>>  
>>  			index = indices[i];
>>  			if (index >= end)
>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  			if (!trylock_page(page))
>>  				continue;
>>  
>> -			if (PageTransTail(page)) {
>> -				/* Middle of THP: zero out the page */
>> -				clear_highpage(page);
>> -				unlock_page(page);
>> -				continue;
>> -			} else if (PageTransHuge(page)) {
>> -				if (index == round_down(end, HPAGE_PMD_NR)) {
>> +			if (PageTransCompound(page) && !unfalloc) {
>> +				if (PageHead(page) &&
>> +				    index != round_down(end, HPAGE_PMD_NR)) {
>>  					/*
>> -					 * Range ends in the middle of THP:
>> -					 * zero out the page
>> +					 * Fall through when punching whole
>> +					 * THP.
>>  					 */
>> -					clear_highpage(page);
>> -					unlock_page(page);
>> -					continue;
>> +					index += HPAGE_PMD_NR - 1;
>> +					i += HPAGE_PMD_NR - 1;
>> +				} else {
>> +					/*
>> +					 * Split THP for any partial hole
>> +					 * punch.
>> +					 */
>> +					get_page(page);
>> +					split = true;
>> +					goto split;
>>  				}
>> -				index += HPAGE_PMD_NR - 1;
>> -				i += HPAGE_PMD_NR - 1;
>>  			}
>>  
>>  			if (!unfalloc || !PageUptodate(page)) {
>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  			}
>>  			unlock_page(page);
>>  		}
>> +split:
>>  		pagevec_remove_exceptionals(&pvec);
>>  		pagevec_release(&pvec);
>>  		cond_resched();
>> +
>> +		if (split) {
>> +			/*
>> +			 * The pagevec_release() released all extra pins
>> +			 * from the pagevec lookup. And we hold an extra pin
>> +			 * and still have the page locked under us.
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-lookup page cache from current index */
>> +				goto retry;
>> +			}
>> +
>> +			/* Failed to split THP, move to next index */
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +
>>  		index++;
>>  	}
>>  
>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  		return;
>>  
>>  	index = start;
>> +again:
>>  	while (index < end) {
>>  		cond_resched();
>>  
>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  			continue;
>>  		}
>>  		for (i = 0; i < pagevec_count(&pvec); i++) {
>> -			struct page *page = pvec.pages[i];
>> +			split = false;
>> +			page = pvec.pages[i];
>>  
>>  			index = indices[i];
>>  			if (index >= end)
>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  
>>  			lock_page(page);
>>  
>> -			if (PageTransTail(page)) {
>> -				/* Middle of THP: zero out the page */
>> -				clear_highpage(page);
>> -				unlock_page(page);
>> -				/*
>> -				 * Partial thp truncate due 'start' in middle
>> -				 * of THP: don't need to look on these pages
>> -				 * again on !pvec.nr restart.
>> -				 */
>> -				if (index != round_down(end, HPAGE_PMD_NR))
>> -					start++;
>> -				continue;
>> -			} else if (PageTransHuge(page)) {
>> -				if (index == round_down(end, HPAGE_PMD_NR)) {
>> +			if (PageTransCompound(page) && !unfalloc) {
>> +				if (PageHead(page) &&
>> +				    index != round_down(end, HPAGE_PMD_NR)) {
>>  					/*
>> -					 * Range ends in the middle of THP:
>> -					 * zero out the page
>> +					 * Fall through when punching whole
>> +					 * THP.
>>  					 */
>> -					clear_highpage(page);
>> -					unlock_page(page);
>> -					continue;
>> +					index += HPAGE_PMD_NR - 1;
>> +					i += HPAGE_PMD_NR - 1;
>> +				} else {
>> +					/*
>> +					 * Split THP for any partial hole
>> +					 * punch.
>> +					 */
>> +					get_page(page);
>> +					split = true;
>> +					goto rescan_split;
>>  				}
>> -				index += HPAGE_PMD_NR - 1;
>> -				i += HPAGE_PMD_NR - 1;
>>  			}
>>  
>>  			if (!unfalloc || !PageUptodate(page)) {
>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>  			}
>>  			unlock_page(page);
>>  		}
>> +rescan_split:
>>  		pagevec_remove_exceptionals(&pvec);
>>  		pagevec_release(&pvec);
>> +
>> +		if (split) {
>> +			/*
>> +			 * The pagevec_release() released all extra pins
>> +			 * from the pagevec lookup. And we hold an extra pin
>> +			 * and still have the page locked under us.
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-lookup page cache from current index */
>> +				goto again;
>> +			}
>> +
>> +			/*
>> +			 * Split failed, clear the page then move to next
>> +			 * index.
>> +			 */
>> +			clear_highpage(page);
>> +
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +
>>  		index++;
>>  	}
>>  
>> -- 
>> 1.8.3.1
>> 
>> 