Date: Fri, 22 Jan 2010 20:03:48 -0800
From: Chris Frost
To: Wu Fengguang
Cc: Andrew Morton, Steve Dickson, David Howells, Xu Chenfeng,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Steve VanDeBogart
Subject: Re: [PATCH] mm/readahead.c: update the LRU positions of in-core pages, too
Message-ID: <20100123040348.GC30844@frostnet.net>
References: <20100120215536.GN27212@frostnet.net> <20100121054734.GC24236@localhost>
In-Reply-To: <20100121054734.GC24236@localhost>

On Thu, Jan 21, 2010 at 01:47:34PM +0800, Wu Fengguang wrote:
> On Wed, Jan 20, 2010 at 01:55:36PM -0800, Chris Frost wrote:
> > This patch changes readahead to move pages that are already in memory
> > and in the inactive list to the top of the list. This mirrors the
> > behavior of non-in-core pages. The position of pages already in the
> > active list remains unchanged.
>
> This is good in general.

Great!

> > @@ -170,19 +201,24 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
> >  		rcu_read_lock();
> >  		page = radix_tree_lookup(&mapping->page_tree, page_offset);
> >  		rcu_read_unlock();
> > -		if (page)
> > -			continue;
> > -
> > -		page = page_cache_alloc_cold(mapping);
> > -		if (!page)
> > -			break;
> > -		page->index = page_offset;
> > -		list_add(&page->lru, &page_pool);
> > -		if (page_idx == nr_to_read - lookahead_size)
> > -			SetPageReadahead(page);
> > -		ret++;
> > +		if (page) {
> > +			page_cache_get(page);
>
> This is racy - the page may have already been freed and possibly reused
> by others in the meantime.
>
> If you do page_cache_get() on a random page, it may trigger bad_page()
> in the buddy page allocator, or the VM_BUG_ON() in put_page_testzero().

Thanks for catching these.
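To close the race, I'll drop the open-coded radix_tree_lookup() and use
find_get_page() instead; it takes the reference while still inside the
RCU read-side section (via page_cache_get_speculative()) and rechecks
that the page was not removed or reused in the meantime, so there is no
window in which the page can be freed under us. Roughly (a sketch,
untested):

		/*
		 * find_get_page() returns the page with an elevated
		 * reference count, or NULL if nothing is cached at
		 * page_offset.
		 */
		page = find_get_page(mapping, page_offset);
		if (page) {
			if (!pagevec_add(&retain_vec, page))
				retain_pages(&retain_vec);
		} else {
			/* allocate and read the page, as before */
			page = page_cache_alloc_cold(mapping);
			...
		}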
> > +		if (!pagevec_add(&retain_vec, page))
> > +			retain_pages(&retain_vec);
> > +	} else {
> > +		page = page_cache_alloc_cold(mapping);
> > +		if (!page)
> > +			break;
> > +		page->index = page_offset;
> > +		list_add(&page->lru, &page_pool);
> > +		if (page_idx == nr_to_read - lookahead_size)
> > +			SetPageReadahead(page);
> > +		ret++;
> > +	}
>
> Years ago I wrote a similar function, which can be called for both
> in-kernel readahead (when it decides not to bring in new pages, but
> only to retain existing pages) and fadvise readahead (where it wants
> to read new pages as well as retain existing pages).
>
> For a better chance of code reuse, would you rebase the patch on it?
> (You'll have to do some cleanups first.)

This sounds good; thanks. I've rebased my change on the below. FWIW,
performance is unchanged. A few questions below.

> +/*
> + * Move pages in danger (of thrashing) to the head of inactive_list.
> + * Not expected to happen frequently.
> + */
> +static unsigned long rescue_pages(struct address_space *mapping,
> +				  struct file_ra_state *ra,
> +				  pgoff_t index, unsigned long nr_pages)
> +{
> +	struct page *grabbed_page;
> +	struct page *page;
> +	struct zone *zone;
> +	int pgrescue = 0;
> +
> +	dprintk("rescue_pages(ino=%lu, index=%lu, nr=%lu)\n",
> +		mapping->host->i_ino, index, nr_pages);
> +
> +	for (; nr_pages;) {
> +		grabbed_page = page = find_get_page(mapping, index);
> +		if (!page) {
> +			index++;
> +			nr_pages--;
> +			continue;
> +		}
> +
> +		zone = page_zone(page);
> +		spin_lock_irq(&zone->lru_lock);
> +
> +		if (!PageLRU(page)) {
> +			index++;
> +			nr_pages--;
> +			goto next_unlock;
> +		}
> +
> +		do {
> +			struct page *the_page = page;
> +			page = list_entry((page)->lru.prev, struct page, lru);
> +			index++;
> +			nr_pages--;
> +			ClearPageReadahead(the_page);
> +			if (!PageActive(the_page) &&
> +			    !PageLocked(the_page) &&
> +			    page_count(the_page) == 1) {

Why require the page count to be 1?

> +				list_move(&the_page->lru, &zone->inactive_list);

The LRU list manipulation interface has changed since this patch. I
believe we should replace the list_move() call with:

	del_page_from_lru_list(zone, the_page, LRU_INACTIVE_FILE);
	add_page_to_lru_list(zone, the_page, LRU_INACTIVE_FILE);

This moves the page to the top of the list, but also notifies
mem_cgroup. It also, I believe needlessly, decrements and then
increments the zone state for each move.

> +				pgrescue++;
> +			}
> +		} while (nr_pages &&
> +			 page_mapping(page) == mapping &&
> +			 page_index(page) == index);

Is it OK to not lock each page in this while loop? (Does the zone lock
protect all the reads and writes?) Will the zone be the same for all
pages seen inside a given run of this while loop?

Do you think performance would be better if the code used a pagevec
and a call to find_get_pages_contig(), instead of the above
find_get_page() and this loop over the LRU list?

> +
> +next_unlock:
> +		spin_unlock_irq(&zone->lru_lock);
> +		page_cache_release(grabbed_page);
> +		cond_resched();
> +	}
> +
> +	ra_account(ra, RA_EVENT_READAHEAD_RESCUE, pgrescue);

I don't see ra_account() or the relevant fields in struct file_ra_state
in the current kernel. Shall I drop the ra_account() call?

> +	return pgrescue;
> +}

-- 
Chris Frost
http://www.frostnet.net/chris/
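P.S. To make the find_get_pages_contig() question above concrete, here
is the rough shape I had in mind for the body of rescue_pages() (a
sketch only, untested, using the del_page_from_lru_list()/
add_page_to_lru_list() pair discussed above in place of list_move()).
It drops the page_count() == 1 check, since the batched pages carry
the reference find_get_pages_contig() took, and it assumes each batch
sits in a single zone, which is not guaranteed:

	struct pagevec pvec;
	unsigned int i, n;

	pagevec_init(&pvec, 0);
	while (nr_pages) {
		n = find_get_pages_contig(mapping, index,
					  min_t(unsigned long, nr_pages,
						PAGEVEC_SIZE),
					  pvec.pages);
		if (!n) {
			/* no page cached at index; skip the hole */
			index++;
			nr_pages--;
			continue;
		}
		pvec.nr = n;

		/*
		 * Assumes the whole batch is in one zone; a real
		 * version would recheck page_zone() for each page
		 * and retake the lock when it changes.
		 */
		zone = page_zone(pvec.pages[0]);
		spin_lock_irq(&zone->lru_lock);
		for (i = 0; i < n; i++) {
			struct page *page = pvec.pages[i];

			ClearPageReadahead(page);
			if (PageLRU(page) && !PageActive(page)) {
				/* move to the head of the inactive list */
				del_page_from_lru_list(zone, page,
						       LRU_INACTIVE_FILE);
				add_page_to_lru_list(zone, page,
						     LRU_INACTIVE_FILE);
				pgrescue++;
			}
		}
		spin_unlock_irq(&zone->lru_lock);

		index += n;
		nr_pages -= n;
		pagevec_release(&pvec);
		cond_resched();
	}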