Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp10862572ybi; Thu, 11 Jul 2019 12:17:46 -0700 (PDT) X-Google-Smtp-Source: APXvYqxqjIsjDlXu2NsZXjyazk1u7Y/FnEAL6ivE/340UOasN3Nj+wIhCI+4ZZN12o4Fa4ulRxoF X-Received: by 2002:a63:607:: with SMTP id 7mr6032954pgg.240.1562872666061; Thu, 11 Jul 2019 12:17:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1562872666; cv=none; d=google.com; s=arc-20160816; b=lLKS9O1fu19oDzMzXzaku87dq1obkzSR1IytgZrohg8JzHFrKx7uoG0foHe4+mM+N9 6alaWQcUdGyj4Z/j+T9xlHkKzm71Rv5TDQArJVlJQ21HDtsF5DofBHpiQRc5fsfTH1Nc dDSlPFd0cYKNb0nNn9hlUE0jQPWScNmkwx9R7xpbc1C+nk8lfEpL45LXRozn/xvzUMUA pPLSQtmsp42raPp8jM/tLGLHmaD9N5Zeub9VJq9g1+DivgvRfF7Y66eeL9gY6uIt2Khs 3bHYkwJSSm28o7Tt9vh1rQi8YpQsIuaVnSc85Yucu6PtGr97nBxFPy+/2GPzkGxiuBPG iDwQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Ye/ueaLSHeuUbF2UmwC3qi+L8ppesjqbY1zG1QqAhZE=; b=TTWkFrdqnmbaTyFLBBD6igF9uUhkbxRHLnraeAGPDWT3Lem/2KAspikalVfrslTufA Nhny3Tdk0zPTfiVn5lQ/aIqS0+Cu8DRRnzi34NZdHLFEli0t+TyuFo6IFCeWmuw0D2Wv lcXT3rfnc4Qeb0+qq/q0hBwmOuUAJ/rEP+kyjxdnkeOBbfsRUWq9v6A2Qa47E/CtEhQf mHBhNDTfC90Tc4Hc5p0p2y74GZmtnLLqp4Bh/yGb5nGRbJja5+x/vv2w/Twt4kvtW7/v BhCs60RySfGnpW+tRkp8u1RVrD/nVC4/7zQ1QSKKsbtup7KxWIwSnEgFCBPd80cUqGeh U5Tg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=LczJZj1L; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id be3si5382339plb.383.2019.07.11.12.17.29; Thu, 11 Jul 2019 12:17:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=LczJZj1L; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730380AbfGKSm2 (ORCPT + 99 others); Thu, 11 Jul 2019 14:42:28 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:37195 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730370AbfGKSm1 (ORCPT ); Thu, 11 Jul 2019 14:42:27 -0400 Received: by mail-pl1-f194.google.com with SMTP id b3so3476547plr.4 for ; Thu, 11 Jul 2019 11:42:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=Ye/ueaLSHeuUbF2UmwC3qi+L8ppesjqbY1zG1QqAhZE=; b=LczJZj1LMI38wtffmfcB6+DCfMF5aokwVwquBj7rOFWJQgRco8bf3jn7Y7DRGU5AKV eyjU2kDty+AJg5gmH7znkDUWdSkh2RLvTYYxEUHfUOHIABMuT/zxz26VO2fWPMLpbI42 Q0nKfVbhqRFEaZHRqg01t1RLn3rSDiadjybENTkR+xzCP5NkdrHC11Iih4QtXXJJEJ+2 wAnVn52naxG/hzDpgjqUVpVG8nvWNFb9Nm9+54eKFTwbHQZW+h21HGKmNMQL9ThFb9Vb wu6ijN+pWhsut8bcnEO8hAL8JYSaTWFrXUY1xJwZBT5S2jLMFydUdVKuRZGN4kxI/tkg glJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=Ye/ueaLSHeuUbF2UmwC3qi+L8ppesjqbY1zG1QqAhZE=; b=CYDOHOSoC199BDKPRftCcZyPp/W3liep4mPk2RQkfSAUC/GXB0vaBylgrm+6sBCO01 7boOfDkSS59+Q6nS5NDfFVDBnEE7Gx24VkSMjyP6/HKfIEQxBFHTroIegN/ATyB7VNA6 REmE2216L68NcLIUDyRulc3DFBzG/MEXmlD+rFA5NbtHu102T3/EANZXc4Ls/FGg3/a3 rbtfHn4eZdcLFQAE4cmMHGPYdHgmkjg7qF7MpTpqV4GlJq/8YDxXELQfvQ9edklqhUay DKdseimSYi07uyr0SRZO5tT+ODf2ekVifYjQEO7FqYqqByYnK7H+1ygkChQKK/SOgIx5 pgvA== X-Gm-Message-State: APjAAAXjk3gmxlYGkdZ+VLedgo0BMUDantKvorHjujg0TxLbfRJJD3e4 dlnhtsAVtvPlW3QZ1gImN5CYpPrM X-Received: by 2002:a17:902:aa88:: with SMTP id d8mr5969327plr.274.1562870546389; Thu, 11 Jul 2019 11:42:26 -0700 (PDT) Received: from localhost ([2620:10d:c091:500::2:5385]) by smtp.gmail.com with ESMTPSA id 85sm6717518pfv.130.2019.07.11.11.42.25 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 11 Jul 2019 11:42:25 -0700 (PDT) Date: Thu, 11 Jul 2019 14:42:23 -0400 From: Johannes Weiner To: Minchan Kim Cc: Andrew Morton , linux-mm , LKML , linux-api@vger.kernel.org, Michal Hocko , Tim Murray , Joel Fernandes , Suren Baghdasaryan , Daniel Colascione , Shakeel Butt , Sonny Rao , oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen , "Kirill A . Shutemov" Subject: Re: [PATCH v4 4/4] mm: introduce MADV_PAGEOUT Message-ID: <20190711184223.GD20341@cmpxchg.org> References: <20190711012528.176050-1-minchan@kernel.org> <20190711012528.176050-5-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190711012528.176050-5-minchan@kernel.org> User-Agent: Mutt/1.12.0 (2019-05-25) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 11, 2019 at 10:25:28AM +0900, Minchan Kim wrote: > @@ -480,6 +482,198 @@ static long madvise_cold(struct vm_area_struct *vma, > return 0; > } > > +static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr, > + unsigned long end, struct mm_walk *walk) > +{ > + struct mmu_gather *tlb = walk->private; > + struct mm_struct *mm = tlb->mm; > + struct vm_area_struct *vma = walk->vma; > + pte_t *orig_pte, *pte, ptent; > + spinlock_t *ptl; > + LIST_HEAD(page_list); > + struct page *page; > + unsigned long next; > + > + if (fatal_signal_pending(current)) > + return -EINTR; > + > + next = pmd_addr_end(addr, end); > + if (pmd_trans_huge(*pmd)) { > + pmd_t orig_pmd; > + > + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); > + ptl = pmd_trans_huge_lock(pmd, vma); > + if (!ptl) > + return 0; > + > + orig_pmd = *pmd; > + if (is_huge_zero_pmd(orig_pmd)) > + goto huge_unlock; > + > + if (unlikely(!pmd_present(orig_pmd))) { > + VM_BUG_ON(thp_migration_supported() && > + !is_pmd_migration_entry(orig_pmd)); > + goto huge_unlock; > + } > + > + page = pmd_page(orig_pmd); > + if (next - addr != HPAGE_PMD_SIZE) { > + int err; > + > + if (page_mapcount(page) != 1) > + goto huge_unlock; > + get_page(page); > + spin_unlock(ptl); > + lock_page(page); > + err = split_huge_page(page); > + unlock_page(page); > + put_page(page); > + if (!err) > + goto regular_page; > + return 0; > + } > + > + if (isolate_lru_page(page)) > + goto huge_unlock; > + > + if (pmd_young(orig_pmd)) { > + pmdp_invalidate(vma, addr, pmd); > + orig_pmd = pmd_mkold(orig_pmd); > + > + set_pmd_at(mm, addr, pmd, orig_pmd); > + tlb_remove_tlb_entry(tlb, pmd, addr); > + } > + > + ClearPageReferenced(page); > + test_and_clear_page_young(page); > + list_add(&page->lru, &page_list); > +huge_unlock: > + spin_unlock(ptl); > + reclaim_pages(&page_list); > + return 0; > + } > + > + if (pmd_trans_unstable(pmd)) > + return 0; > +regular_page: > + tlb_change_page_size(tlb, PAGE_SIZE); > + orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > + flush_tlb_batched_pending(mm); > + arch_enter_lazy_mmu_mode(); > + for (; addr < end; pte++, addr += PAGE_SIZE) { > + ptent = *pte; > + if (!pte_present(ptent)) > + continue; > + > + page = vm_normal_page(vma, addr, ptent); > + if (!page) > + continue; > + > + /* > + * creating a THP page is expensive so split it only if we > + * are sure it's worth. Split it if we are only owner. > + */ > + if (PageTransCompound(page)) { > + if (page_mapcount(page) != 1) > + break; > + get_page(page); > + if (!trylock_page(page)) { > + put_page(page); > + break; > + } > + pte_unmap_unlock(orig_pte, ptl); > + if (split_huge_page(page)) { > + unlock_page(page); > + put_page(page); > + pte_offset_map_lock(mm, pmd, addr, &ptl); > + break; > + } > + unlock_page(page); > + put_page(page); > + pte = pte_offset_map_lock(mm, pmd, addr, &ptl); > + pte--; > + addr -= PAGE_SIZE; > + continue; > + } > + > + VM_BUG_ON_PAGE(PageTransCompound(page), page); > + > + if (isolate_lru_page(page)) > + continue; > + > + if (pte_young(ptent)) { > + ptent = ptep_get_and_clear_full(mm, addr, pte, > + tlb->fullmm); > + ptent = pte_mkold(ptent); > + set_pte_at(mm, addr, pte, ptent); > + tlb_remove_tlb_entry(tlb, pte, addr); > + } > + ClearPageReferenced(page); > + test_and_clear_page_young(page); > + list_add(&page->lru, &page_list); > + } > + > + arch_leave_lazy_mmu_mode(); > + pte_unmap_unlock(orig_pte, ptl); > + reclaim_pages(&page_list); > + cond_resched(); > + > + return 0; > +} I know you have briefly talked about code sharing already. While I agree that sharing with MADV_FREE is maybe a stretch, I applied these patches and compared the pageout and the cold page table functions, and they are line for line the same EXCEPT for 2-3 lines at the very end, where one reclaims and the other deactivates. It would be good to share here, it shouldn't be hard or result in fragile code. Something like int madvise_cold_or_pageout_range(..., bool pageout)?