Date: Thu, 24 Mar 2016 12:17:28 +0300
From: "Kirill A. Shutemov"
To: Hugh Dickins
Cc: "Kirill A. Shutemov", Andrea Arcangeli, Andrew Morton, Dave Hansen,
	Vlastimil Babka, Christoph Lameter, Naoya Horiguchi, Jerome Marchand,
	Yang Shi, Sasha Levin, Ning Qu, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCHv4 00/25] THP-enabled tmpfs/shmem
Message-ID: <20160324091727.GA26796@node.shutemov.name>
References: <1457737157-38573-1-git-send-email-kirill.shutemov@linux.intel.com>

On Wed, Mar 23, 2016 at 01:09:05PM -0700, Hugh Dickins wrote:
> The small files thing formed my first impression. My second
> impression was similar, when I tried mmap(NULL, size_of_RAM,
> PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_SHARED, -1, 0) and
> cycled around the arena touching all the pages (which of
> course has to push a little into swap): that soon OOMed.
> 
> But there I think you probably just have some minor bug to be fixed:
> I spent a little while trying to debug it, but then decided I'd
> better get back to writing to you. I didn't really understand what
> I was seeing, but when I hacked some stats into shrink_page_list(),
> converting !is_page_cache_freeable(page) to page_cache_references(page)
> to return the difference instead of the bool, a large proportion of
> huge tmpfs pages seemed to have count 1 too high to be freeable at
> that point (and one huge tmpfs page had a count of 3477).

I'll reply to your other points later, but first I wanted to address this
obvious bug. I cannot really explain page_count() == 3477, but otherwise:

The root cause is that try_to_unmap() doesn't handle PMD-mapped huge
pages, so we hit 'case SWAP_AGAIN' all the time.

The patch below effectively rewrites 17/25: now we split the huge page
before trying to unmap it. split_huge_page() has its own check similar
to is_page_cache_freeable(), so we wouldn't split pages we cannot free
later on. And split_huge_page() for file pages would unmap the page, so
we wouldn't need to go to try_to_unmap() after that.

The patch looks rather simple, but I haven't done a full validation
cycle for it. Regressions are unlikely, but possible.

At some point we would need to teach try_to_unmap() to handle huge
pages. It would be required for filesystems with backing storage. But I
don't see a need for it to get huge tmpfs/shmem working.
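For reference, before the patch itself: a rough userspace sketch of how
I read the test you describe above. The RAM-size probe via sysconf()
and the number of passes over the arena are my assumptions, not details
from your report.

/*
 * Rough sketch of the reported test, not a verbatim reproducer:
 * map an anonymous shared (shmem-backed) region about the size of
 * RAM and cycle over it touching every page, which should push a
 * little into swap. RAM-size detection and pass count are guesses.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t page_size = sysconf(_SC_PAGESIZE);
	size_t size_of_RAM = page_size * (size_t)sysconf(_SC_PHYS_PAGES);
	char *arena;
	size_t off;
	int pass;

	arena = mmap(NULL, size_of_RAM, PROT_READ | PROT_WRITE,
		     MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	if (arena == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Cycle around the arena touching all the pages. */
	for (pass = 0; pass < 3; pass++)
		for (off = 0; off < size_of_RAM; off += page_size)
			arena[off] = 1;

	munmap(arena, size_of_RAM);
	return 0;
}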
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9fa9e15594e9..86008f8f1f9b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -473,14 +473,12 @@ void drop_slab(void)
 
 static inline int is_page_cache_freeable(struct page *page)
 {
-	int radix_tree_pins = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
-
 	/*
 	 * A freeable page cache page is referenced only by the caller
 	 * that isolated the page, the page cache radix tree and
 	 * optional buffer heads at page->private.
 	 */
-	return page_count(page) - page_has_private(page) == 1 + radix_tree_pins;
+	return page_count(page) - page_has_private(page) == 2;
 }
 
 static int may_write_to_inode(struct inode *inode, struct scan_control *sc)
@@ -550,6 +548,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 	 * swap_backing_dev_info is bust: it doesn't reflect the
 	 * congestion state of the swapdevs.  Easy to fix, if needed.
 	 */
+	if (!is_page_cache_freeable(page))
+		return PAGE_KEEP;
 	if (!mapping) {
 		/*
 		 * Some data journaling orphaned pages can have
@@ -1055,8 +1055,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 			/* Adding to swap updated mapping */
 			mapping = page_mapping(page);
+		} else if (unlikely(PageTransHuge(page))) {
+			/* Split file THP */
+			if (split_huge_page_to_list(page, page_list))
+				goto keep_locked;
 		}
 
+		VM_BUG_ON_PAGE(PageTransHuge(page), page);
+
 		/*
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
@@ -1112,15 +1118,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * starts and then write it out here.
 			 */
 			try_to_unmap_flush_dirty();
-
-			if (!is_page_cache_freeable(page))
-				goto keep_locked;
-
-			if (unlikely(PageTransHuge(page))) {
-				if (split_huge_page_to_list(page, page_list))
-					goto keep_locked;
-			}
-
 			switch (pageout(page, mapping, sc)) {
 			case PAGE_KEEP:
 				goto keep_locked;
-- 
 Kirill A. Shutemov