Date: Tue, 21 Nov 2017 16:34:00 -0500
From: Johannes Weiner
To: Shakeel Butt
Cc: Vlastimil Babka, Huang Ying, Tim Chen, Michal Hocko, Greg Thelen,
 Andrew Morton, Balbir Singh, Minchan Kim, Shaohua Li, Jérôme Glisse,
 Jan Kara, Nicholas Piggin, Dan Williams, Mel Gorman, Hugh Dickins,
 Linux MM, LKML
Subject: Re: [PATCH] mm, mlock, vmscan:
 no more skipping pagevecs
Message-ID: <20171121213400.GA1503@cmpxchg.org>
References: <20171104224312.145616-1-shakeelb@google.com>
 <577ab7e8-b079-125b-80ca-6168dd24720a@suse.cz>
 <20171121150632.GA23460@cmpxchg.org>

On Tue, Nov 21, 2017 at 10:22:23AM -0800, Shakeel Butt wrote:
> On Tue, Nov 21, 2017 at 7:06 AM, Johannes Weiner wrote:
> > On Tue, Nov 21, 2017 at 01:39:57PM +0100, Vlastimil Babka wrote:
> >> On 11/04/2017 11:43 PM, Shakeel Butt wrote:
> >> > When a thread mlocks an address space backed by file, a new
> >> > page is allocated (assuming file page is not in memory), added
> >> > to the local pagevec (lru_add_pvec), I/O is triggered and the
> >> > thread then sleeps on the page. On I/O completion, the thread
> >> > can wake on a different CPU, the mlock syscall will then set
> >> > the PageMlocked() bit of the page but will not be able to put
> >> > that page in unevictable LRU as the page is on the pagevec of
> >> > a different CPU. Even on drain, that page will go to evictable
> >> > LRU because the PageMlocked() bit is not checked on pagevec
> >> > drain.
> >> >
> >> > The page will eventually go to right LRU on reclaim but the
> >> > LRU stats will remain skewed for a long time.
> >> >
> >> > However, this issue does not happen for anon pages on swap
> >> > because unlike file pages, anon pages are not added to pagevec
> >> > until they have been fully swapped in. Also the fault handler
> >> > uses vm_flags to set the PageMlocked() bit of such anon pages
> >> > even before returning to mlock() syscall and mlocked pages will
> >> > skip pagevecs and directly be put into unevictable LRU. No such
> >> > luck for file pages.
> >> >
> >> > One way to resolve this issue, is to somehow plumb vm_flags from
> >> > filemap_fault() to add_to_page_cache_lru() which will then skip
> >> > the pagevec for pages of VM_LOCKED vma and directly put them to
> >> > unevictable LRU. However this patch took a different approach.
> >> >
> >> > All the pages, even unevictable, will be added to the pagevecs
> >> > and on the drain, the pages will be added on their LRUs correctly
> >> > by checking their evictability. This resolves the mlocked file
> >> > pages on pagevec of other CPUs issue because when those pagevecs
> >> > will be drained, the mlocked file pages will go to unevictable
> >> > LRU. Also this makes the race with munlock easier to resolve
> >> > because the pagevec drains happen in LRU lock.
> >> >
> >> > There is one (good) side effect though. Without this patch, the
> >> > pages allocated for System V shared memory segment are added to
> >> > evictable LRUs even after shmctl(SHM_LOCK) on that segment. This
> >> > patch will correctly put such pages to unevictable LRU.
> >> >
> >> > Signed-off-by: Shakeel Butt
> >>
> >> I like the approach in general, as it seems to make the code simpler,
> >> and the diffstats support that. I found no bugs, but I can't say with
> >> certainty that there aren't any, though. This code is rather
> >> tricky. But it should be enough for an ack, so.
> >>
> >> Acked-by: Vlastimil Babka
> >>
> >> A question below, though.
> >>
> >> ...
> >>
> >> > @@ -883,15 +855,41 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
> >> >  static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> >> >                                   void *arg)
> >> >  {
> >> > -        int file = page_is_file_cache(page);
> >> > -        int active = PageActive(page);
> >> > -        enum lru_list lru = page_lru(page);
> >> > +        enum lru_list lru;
> >> > +        int was_unevictable = TestClearPageUnevictable(page);
> >> >
> >> >          VM_BUG_ON_PAGE(PageLRU(page), page);
> >> >
> >> >          SetPageLRU(page);
> >> > +        /*
> >> > +         * Page becomes evictable in two ways:
> >> > +         * 1) Within LRU lock [munlock_vma_pages() and __munlock_pagevec()].
> >> > +         * 2) Before acquiring LRU lock to put the page to correct LRU and then
> >> > +         *   a) do PageLRU check with lock [check_move_unevictable_pages]
> >> > +         *   b) do PageLRU check before lock [isolate_lru_page]
> >> > +         *
> >> > +         * (1) & (2a) are ok as LRU lock will serialize them. For (2b), if the
> >> > +         * other thread does not observe our setting of PG_lru and fails
> >> > +         * isolation, the following page_evictable() check will make us put
> >> > +         * the page in correct LRU.
> >> > +         */
> >> > +        smp_mb();
> >>
> >> Could you elaborate on the purpose of smp_mb() here? Previously there
> >> was "The other side is TestClearPageMlocked() or shmem_lock()" in
> >> putback_lru_page(), which seems rather unclear to me (neither has an
> >> explicit barrier?).
> >
> > The TestClearPageMlocked() is an RMW operation with return value, and
> > thus an implicit full barrier (see Documentation/atomic_bitops.txt).
> >
> > The ordering is between putback and munlock:
> >
> > #0                             #1
> > list_add(&page->lru,...)       if (TestClearPageMlocked())
> > SetPageLRU()                     __munlock_isolate_lru_page()
> > smp_mb()
> > if (page_evictable())
> >   rescue
> >
> > The scenario that the barrier prevents from happening is:
> >
> > list_add(&page->lru,...)
> > if (page_evictable())
> >   rescue
> >                                if (TestClearPageMlocked())
> >                                  __munlock_isolate_lru_page() // FAILS on !PageLRU
> > SetPageLRU()
> >
> > and now an evictable page is stranded on the unevictable LRU.
> >
> > The barrier guarantees that if #0 doesn't see the page evictable yet,
> > #1 WILL see the PageLRU and succeed in isolation and rescue.
> >
> > Shakeel, please don't drop that "the other side" comment. You mention
> > the places that make the page evictable - which is great, and please
> > keep that as well - but for barriers it's always good to know exactly
> > which operation guarantees the ordering on the other side. In fact, it
> > would be great if you could add comments to the TestClearPageMlocked()
> > sites that mention how they order against the smp_mb() in LRU putback.
>
> Johannes, I have a question. The example you presented is valid before
> this patch as '#0' was happening outside LRU lock. This patch moves
> '#0' inside LRU lock and '#1' was already in LRU lock therefore no
> issue for this particular scenario. However there is still a
> TestClearPageMlocked() in clear_page_mlock() which happens outside LRU
> lock and same issue which you have explained can happen even with this
> patch (but without smp_mb()).
>
> So, "the other side" for smp_mb() after this patch will only be the
> TestClearPageMlocked() in clear_page_mlock() because all other
> TestClearPageMlocked() instances are serialized by LRU lock. Please
> let me know if I missed something.

You are right, I overlooked the lru lock in __munlock_pagevec(). It's
really only clear_page_mlock() that needs the ordering.
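
To make the remaining ordering concrete, here is a small userspace model
of the race between LRU putback and clear_page_mlock(). It is only a
sketch under simplifying assumptions, not kernel code: struct page_model,
the op names and count_stranded() are made up for illustration, and a
missing barrier is modelled simply as allowing #0's evictability check to
run before its SetPageLRU() in an otherwise sequentially consistent
interleaving. It enumerates every interleaving of the four steps and
counts those in which neither side rescues the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two page flags involved; not kernel code. */
struct page_model {
	bool lru;	/* stands in for PG_lru */
	bool mlocked;	/* stands in for PG_mlocked */
};

enum op {
	OP_SET_LRU,		/* #0: SetPageLRU() */
	OP_CHECK_EVICT,		/* #0: page_evictable() check, rescues if evictable */
	OP_CLEAR_MLOCKED,	/* #1: TestClearPageMlocked() */
	OP_ISOLATE,		/* #1: isolation, succeeds (rescues) only if PG_lru is set */
	NR_OPS
};

/* Replay one interleaving; return true if nobody rescued the page. */
static bool stranded(const enum op order[NR_OPS])
{
	struct page_model page = { .lru = false, .mlocked = true };
	bool rescued = false;

	for (int i = 0; i < NR_OPS; i++) {
		switch (order[i]) {
		case OP_SET_LRU:
			page.lru = true;
			break;
		case OP_CHECK_EVICT:
			if (!page.mlocked)
				rescued = true;
			break;
		case OP_CLEAR_MLOCKED:
			page.mlocked = false;
			break;
		case OP_ISOLATE:
			if (page.lru)
				rescued = true;
			break;
		default:
			break;
		}
	}
	return !rescued;
}

/* Position of one step inside an interleaving. */
static int pos(const enum op order[NR_OPS], enum op what)
{
	for (int i = 0; i < NR_OPS; i++)
		if (order[i] == what)
			return i;
	assert(!"op missing from order");
	return -1;
}

/*
 * Count interleavings that strand the page. #1 always runs in program
 * order (the RMW is a full barrier). #0 runs in program order only when
 * cpu0_ordered is true, which models the smp_mb() being present.
 */
int count_stranded(bool cpu0_ordered)
{
	int count = 0;

	for (int a = 0; a < NR_OPS; a++)
	for (int b = 0; b < NR_OPS; b++)
	for (int c = 0; c < NR_OPS; c++)
	for (int d = 0; d < NR_OPS; d++) {
		enum op order[NR_OPS] = {
			(enum op)a, (enum op)b, (enum op)c, (enum op)d
		};

		/* Keep only true permutations of the four steps. */
		if (a == b || a == c || a == d || b == c || b == d || c == d)
			continue;
		if (pos(order, OP_CLEAR_MLOCKED) > pos(order, OP_ISOLATE))
			continue;
		if (cpu0_ordered &&
		    pos(order, OP_SET_LRU) > pos(order, OP_CHECK_EVICT))
			continue;
		if (stranded(order))
			count++;
	}
	return count;
}
```

In this model, count_stranded(true) finds no stranded interleaving, while
count_stranded(false) finds exactly one: check, clear, isolate, set, which
is the stranding scenario from the discussion.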