Date: Tue, 21 Nov 2017 10:06:32 -0500
From: Johannes Weiner
To: Vlastimil Babka
Cc: Shakeel Butt, Huang Ying, Tim Chen, Michal Hocko, Greg Thelen,
    Andrew Morton, Balbir Singh, Minchan Kim, Shaohua Li,
    Jérôme Glisse, Jan Kara, Nicholas Piggin, Dan Williams,
    Mel Gorman, Hugh Dickins, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject:
Re: [PATCH] mm, mlock, vmscan: no more skipping pagevecs
Message-ID: <20171121150632.GA23460@cmpxchg.org>
References: <20171104224312.145616-1-shakeelb@google.com>
 <577ab7e8-b079-125b-80ca-6168dd24720a@suse.cz>
In-Reply-To: <577ab7e8-b079-125b-80ca-6168dd24720a@suse.cz>
User-Agent: Mutt/1.9.1 (2017-09-22)

On Tue, Nov 21, 2017 at 01:39:57PM +0100, Vlastimil Babka wrote:
> On 11/04/2017 11:43 PM, Shakeel Butt wrote:
> > When a thread mlocks an address space backed by a file, a new
> > page is allocated (assuming the file page is not in memory), added
> > to the local pagevec (lru_add_pvec), I/O is triggered, and the
> > thread then sleeps on the page. On I/O completion, the thread
> > can wake on a different CPU; the mlock syscall will then set
> > the PageMlocked() bit of the page but will not be able to put
> > that page on the unevictable LRU, as the page is on the pagevec
> > of a different CPU. Even on drain, that page will go to the
> > evictable LRU, because the PageMlocked() bit is not checked on
> > pagevec drain.
> >
> > The page will eventually go to the right LRU on reclaim, but the
> > LRU stats will remain skewed for a long time.
> >
> > However, this issue does not happen for anon pages on swap
> > because, unlike file pages, anon pages are not added to a pagevec
> > until they have been fully swapped in. Also, the fault handler
> > uses vm_flags to set the PageMlocked() bit of such anon pages
> > even before returning to the mlock() syscall, and mlocked pages
> > will skip pagevecs and be put directly on the unevictable LRU.
> > No such luck for file pages.
> >
> > One way to resolve this issue is to somehow plumb vm_flags from
> > filemap_fault() to add_to_page_cache_lru(), which would then skip
> > the pagevec for pages of a VM_LOCKED vma and put them directly
> > on the unevictable LRU. However, this patch takes a different
> > approach.
> >
> > All pages, even unevictable ones, will be added to the pagevecs,
> > and on drain, the pages will be added to their correct LRUs by
> > checking their evictability. This resolves the issue of mlocked
> > file pages sitting on the pagevecs of other CPUs, because when
> > those pagevecs are drained, the mlocked file pages will go to the
> > unevictable LRU. This also makes the race with munlock easier to
> > resolve, because the pagevec drains happen under the LRU lock.
> >
> > There is one (good) side effect, though. Without this patch, the
> > pages allocated for a System V shared memory segment are added to
> > evictable LRUs even after shmctl(SHM_LOCK) on that segment. This
> > patch will correctly put such pages on the unevictable LRU.
> >
> > Signed-off-by: Shakeel Butt
>
> I like the approach in general, as it seems to make the code simpler,
> and the diffstats support that. I found no bugs, but I can't say with
> certainty that there aren't any, though. This code is rather tricky.
> But it should be enough for an ack, so:
>
> Acked-by: Vlastimil Babka
>
> A question below, though.
>
> ...
>
> > @@ -883,15 +855,41 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
> >  static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> >  				 void *arg)
> >  {
> > -	int file = page_is_file_cache(page);
> > -	int active = PageActive(page);
> > -	enum lru_list lru = page_lru(page);
> > +	enum lru_list lru;
> > +	int was_unevictable = TestClearPageUnevictable(page);
> >
> >  	VM_BUG_ON_PAGE(PageLRU(page), page);
> >
> >  	SetPageLRU(page);
> > +	/*
> > +	 * Page becomes evictable in two ways:
> > +	 * 1) Within LRU lock [munlock_vma_pages() and __munlock_pagevec()].
> > +	 * 2) Before acquiring LRU lock to put the page to correct LRU and then
> > +	 *    a) do PageLRU check with lock [check_move_unevictable_pages]
> > +	 *    b) do PageLRU check before lock [isolate_lru_page]
> > +	 *
> > +	 * (1) & (2a) are ok as LRU lock will serialize them. For (2b), if the
> > +	 * other thread does not observe our setting of PG_lru and fails
> > +	 * isolation, the following page_evictable() check will make us put
> > +	 * the page in correct LRU.
> > +	 */
> > +	smp_mb();
>
> Could you elaborate on the purpose of smp_mb() here? Previously there
> was "The other side is TestClearPageMlocked() or shmem_lock()" in
> putback_lru_page(), which seems rather unclear to me (neither has an
> explicit barrier?).

The TestClearPageMlocked() is an RMW operation with a return value, and
thus an implicit full barrier (see Documentation/atomic_bitops.txt).

The ordering is between putback and munlock:

	#0				#1
	list_add(&page->lru, ...)	if (TestClearPageMlocked())
	SetPageLRU()			    __munlock_isolate_lru_page()
	smp_mb()
	if (page_evictable())
		rescue

The scenario that the barrier prevents from happening is:

	#0				#1
	list_add(&page->lru, ...)
	if (page_evictable())
		rescue
					if (TestClearPageMlocked())
					    __munlock_isolate_lru_page()
					    // FAILS on !PageLRU
	SetPageLRU()

and now an evictable page is stranded on the unevictable LRU.

The barrier guarantees that if #0 doesn't see the page as evictable
yet, #1 WILL see the PageLRU and succeed in isolation and rescue.

Shakeel, please don't drop that "the other side" comment. You mention
the places that make the page evictable - which is great, and please
keep that as well - but for barriers it's always good to know exactly
which operation guarantees the ordering on the other side.

In fact, it would be great if you could add comments to the
TestClearPageMlocked() sites that mention how they order against the
smp_mb() in LRU putback.