Date: Mon, 4 May 2015 12:17:22 +0900
From: Minchan Kim
To: Vladimir Davydov
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov,
	Cyrill Gorcunov, Jonathan Corbet, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/3] proc: add kpageidle file

On Thu, Apr 30, 2015 at 05:50:55PM +0300, Vladimir Davydov wrote:
> On Thu, Apr 30, 2015 at 05:25:31PM +0900, Minchan Kim wrote:
> > On Wed, Apr 29, 2015 at 12:12:48PM +0300, Vladimir Davydov wrote:
> > > On Wed, Apr 29, 2015 at 01:35:36PM +0900, Minchan Kim wrote:
> > > > On Tue, Apr 28, 2015 at 03:24:42PM +0300, Vladimir Davydov wrote:
> > > > > +#ifdef CONFIG_IDLE_PAGE_TRACKING
> > > > > +static struct page *kpageidle_get_page(unsigned long pfn)
> > > > > +{
> > > > > +	struct page *page;
> > > > > +
> > > > > +	if (!pfn_valid(pfn))
> > > > > +		return NULL;
> > > > > +	page = pfn_to_page(pfn);
> > > > > +	/*
> > > > > +	 * We are only interested in user memory pages, i.e. pages that are
> > > > > +	 * allocated and on an LRU list.
> > > > > +	 */
> > > > > +	if (!page || page_count(page) == 0 || !PageLRU(page))
> > > > > +		return NULL;
> > > > > +	if (!get_page_unless_zero(page))
> > > > > +		return NULL;
> > > > > +	if (unlikely(!PageLRU(page))) {
> > > >
> > > > What lock protects the PageLRU check here?
> > > > If it races with ClearPageLRU, what happens?
> > >
> > > If we hold a reference to a page and see that it's on an LRU list, it
> > > will surely remain a user memory page at least until we release the
> > > reference to it, so it must be safe to play with idle/young flags. If we
> >
> > The problem is that you pass the page into the rmap reverse logic
> > (i.e. page_referenced) once you judge it's an LRU page, so what
> > happens if that judgment is a false positive?
> > The question is whether SetPageLRU, PageLRU and ClearPageLRU
> > guarantee memory ordering. IOW, all of the struct page fields that
> > the rmap walk accesses must be completely set up before the LRU
> > check. Otherwise, something will be broken.
>
> So, basically you are concerned about the case when we encounter a
> freshly allocated page, which has the PG_lru bit set and is going to
> become anonymous, but is still in the process of rmap initialization,
> i.e. its ->mapping or ->mapcount may still be uninitialized, right?
>
> AFAICS, page_referenced should handle such pages fine. Look, it only
> needs ->index, ->mapping, and ->mapcount.
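[ For reference, the bail-outs described below live in mm/rmap.c. This
  is a simplified sketch of page_lock_anon_vma_read from memory, not
  verbatim kernel source, with the anon_vma refcounting/locking
  elided: ]

	struct anon_vma *page_lock_anon_vma_read(struct page *page)
	{
		struct anon_vma *anon_vma = NULL;
		unsigned long anon_mapping;

		rcu_read_lock();
		anon_mapping = (unsigned long) READ_ONCE(page->mapping);
		/* NULL ->mapping, or a file mapping: not an anon page (yet) */
		if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
			goto out;
		/* ->_mapcount still -1: rmap not initialized, nothing to walk */
		if (!page_mapped(page))
			goto out;
		anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
	out:
		rcu_read_unlock();
		return anon_vma;
	}
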
> If ->mapping is unset, then it is NULL, and rmap_walk_anon_lock ->
> page_lock_anon_vma_read will return NULL, so that rmap_walk will be a
> no-op.
>
> If ->index is not initialized, then at worst we will run
> anon_vma_interval_tree_foreach over a wrong interval, in which case we
> will see that the page is actually not mapped in page_referenced_one ->
> page_check_address and again do nothing.
>
> If ->mapcount is not initialized, it is -1, and page_lock_anon_vma_read
> will return NULL, just as it does when ->mapping is NULL.
>
> For file pages, we always take PG_locked before checking ->mapping, so
> it must be valid.
>
> Thanks,
> Vladimir

do_anonymous_page
  page_add_new_anon_rmap
    atomic_set(&page->_mapcount, 0);
    __page_set_anon_rmap
      anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
      page->mapping = (struct address_space *) anon_vma;
      page->index = linear_page_index(vma, address);
  lru_cache_add
    __pagevec_lru_add_fn
      SetPageLRU(page);

During this sequence, there is no lock to prevent a race. So, at the
least, we need a write memory barrier to guarantee that the other
fields are set up before SetPageLRU. (And, of course, the PageLRU side
needs a matching read memory barrier to work well.) But I can't find
any such barrier, either. IOW, any of the fields you mentioned could be
stored out of order, with no lock or memory barrier to prevent it.

You might argue that an atomic op is a barrier on x86, but that doesn't
guarantee other arches behave the same, so we need an explicit memory
barrier or a lock.

Let's have a theoretical example.

        CPU 0                           CPU 1

do_anonymous_page
  __page_set_anon_rmap
  /* stores were reordered, so SetPageLRU is done ahead */
  SetPageLRU(page)
  /* the compiler split the ->mapping store like below */
  page->mapping = (struct address_space *) anon_vma;
  /* big stall happens */
                                        /* idle tracking judges it an LRU
                                         * page, so it passes the page to
                                         * page_referenced */
                                        page_referenced
                                        /* page_rmapping returns non-NULL
                                         * because page->mapping has some
                                         * value, but PAGE_MAPPING_ANON is
                                         * not yet set, so it calls
                                         * rmap_walk_file. Is it okay to
                                         * pass a half-initialized anon
                                         * page to rmap_walk_file? */
  page->mapping = (struct address_space *)
                  ((void *) anon_vma + PAGE_MAPPING_ANON);

It's too theoretical, so it might be hard to hit in practice. My point
is that there is nothing that explicitly prevents the race. Even if it
happens to be safe thanks to other locks, it's fragile.

Do I miss something?

I think the general ways to handle PageLRU are isolating the page
beforehand or holding zone->lru_lock.

-- 
Kind regards,
Minchan Kim
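P.S. To make the barrier pairing I am asking about concrete, here is a
pseudocode fragment. It is only my own sketch of what a fix could look
like, not a proposed patch -- whether the barriers belong inside
SetPageLRU/PageLRU themselves or in the callers is exactly the open
question:

	/* writer side, e.g. the do_anonymous_page path above */
	atomic_set(&page->_mapcount, 0);
	page->mapping = (struct address_space *)
			((void *) anon_vma + PAGE_MAPPING_ANON);
	page->index = linear_page_index(vma, address);
	smp_wmb();	/* publish the rmap fields before PG_lru */
	SetPageLRU(page);

	/* reader side, e.g. the kpageidle_get_page path */
	if (PageLRU(page) && get_page_unless_zero(page)) {
		smp_rmb();	/* pairs with the smp_wmb() above */
		/*
		 * ->mapping, ->index and ->_mapcount are now guaranteed
		 * to be visible, so the page can be passed to
		 * page_referenced safely.
		 */
	}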