Date: Thu, 16 Apr 2009 21:12:19 +0400
From: Vladislav Bolkhovitin
To: Wu Fengguang
CC: Ingo Molnar, Andrew Morton, Jens Axboe, Jeff Moyer, LKML,
    Peter Zijlstra, Nick Piggin, Rik van Riel, Linus Torvalds,
    Chenfeng Xu, linux-mm@kvack.org
Subject: Re: [PATCH 3/3] readahead: introduce context readahead algorithm
Message-ID: <49E766F3.60800@vlnb.net>
In-Reply-To: <20090412123518.GA5599@localhost>

Wu Fengguang, on 04/12/2009 04:35 PM wrote:
> On Sun, Apr 12, 2009 at 04:48:19PM +0800, Ingo Molnar wrote:
>> * Wu Fengguang wrote:
>>
>>> Introduce page cache context based readahead algorithm.
>>> This is to better support concurrent read streams in general.
>>>
>>> +/*
>>> + * Count contiguously cached pages from @offset-1 to @offset-@max,
>>> + * this count is a conservative estimation of
>>> + *   - length of the sequential read sequence, or
>>> + *   - thrashing threshold in memory tight systems
>>> + */
>>> +static pgoff_t count_history_pages(struct address_space *mapping,
>>> +                                   struct file_ra_state *ra,
>>> +                                   pgoff_t offset, unsigned long max)
>>> +{
>>> +        pgoff_t head;
>>> +
>>> +        rcu_read_lock();
>>> +        head = radix_tree_prev_hole(&mapping->page_tree, offset - 1, max);
>>> +        rcu_read_unlock();
>>> +
>>> +        return offset - 1 - head;
>>> +}
>>
>> Very elegant method! I suspect this will work far better than adding
>> various increasingly more complex heuristics.
>>
>> Emphatically-Acked-by: Ingo Molnar
>
> Thank you Ingo!
>
> The only pity is that this heuristic can be defeated by a user space
> program that does aggressive drop-behind via fadvise(DONTNEED) calls.
> But as long as the drop-behind algorithm is a bit lazy and does not
> try to squeeze out the last page at @offset-1, this patch will work
> just fine.
>
> The context readahead idea is so fundamental that a slightly modified
> algorithm can be used for all kinds of sequential patterns, and it can
> automatically adapt to the thrashing threshold:
>
> 1  if (probe_page(index - 1)) {
> 2          begin = next_hole(index, max);
> 3          H = index - prev_hole(index, 2*max);
> 4          end = index + H;
> 5          update_window(begin, end);
> 6          submit_io();
> 7  }
>
>   [=] history  [#] current  [_] readahead  [.] new readahead
>   ==========================#____________..............
> 1                          ^index-1
> 2                           |----------->[begin
> 3 |<----------- H ----------|
> 4                           |----------- H ---------->]end
> 5                                        [ new window ]
>
> We didn't do that because we want to
> - avoid unnecessary page cache lookups for normal cases
> - be more aggressive when thrashing is not a concern
>
> However, readahead thrashing is far more prevalent than one would
> expect in an FTP/HTTP file streaming server. It can happen in a modern
> server with 16GB memory, 1Gbps outbound bandwidth and 1MB readahead
> size, due to the existence of slow streams.
>
> Let's do a coarse calculation. The 8GB of inactive_list pages (roughly
> half of the 16GB of memory) will be cycled through in 64Gb / 1Gbps =
> 64 seconds. This means an async readahead window must be consumed
> within 64s, or it will be thrashed. That's a speed of 2048KB / 64s =
> 32KB/s. Any client below this speed will create thrashing in the
> server. In practice, those poor slow clients can amount to half of the
> total connections (partly because it takes them more time to download
> anything). The frequent thrashing will in turn speed up the LRU
> cycling/aging...
>
> We need a thrashing-safe mode which does
> - the above modified context readahead algorithm
> - conservative ramp-up of the readahead size
> - conservative async readahead size
>
> The main problem is: when shall we switch into that mode?
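
For concreteness, the quoted pseudocode could be rendered as a
self-contained userspace toy along these lines (a sketch only: the
bitmap and the next_hole()/prev_hole() helpers stand in for the
kernel's radix tree and its hole-scanning primitives, and all the
numbers are arbitrary):

/*
 * Toy model of the quoted context readahead pseudocode -- NOT the
 * kernel implementation.  The page cache is a plain bitmap; in the
 * kernel, next_hole()/prev_hole() would be radix tree hole scans.
 */
#include <stdbool.h>
#include <stdio.h>

#define CACHE_PAGES 64

static bool cached[CACHE_PAGES];   /* cached[i]: page i is resident */

/* First non-resident page in [index, index + max_scan). */
static long next_hole(long index, long max_scan)
{
        long i;

        for (i = index; i < index + max_scan && i < CACHE_PAGES; i++)
                if (!cached[i])
                        break;
        return i;
}

/* Scan backwards from index-1 for the first missing page, looking at
 * most max_scan pages back; may return -1 if everything is cached. */
static long prev_hole(long index, long max_scan)
{
        long i;

        for (i = index - 1; i >= index - max_scan && i >= 0; i--)
                if (!cached[i])
                        return i;
        return i;   /* no hole within max_scan: caps the estimate */
}

int main(void)
{
        const long index = 27;   /* page currently being read */
        const long max = 8;      /* nominal readahead max, in pages */
        long i, begin, end, H;

        for (i = 0; i < index; i++)   /* 27 pages of cached history */
                cached[i] = true;

        if (cached[index - 1]) {                       /* probe_page(index-1) */
                begin = next_hole(index, max);         /* first hole at/after index */
                H = index - prev_hole(index, 2 * max); /* history run, capped by scan */
                end = index + H;                       /* window sized by history */
                printf("new window: [%ld, %ld), H = %ld\n", begin, end, H);
        }
        return 0;
}

With 27 pages of cached history and a hole at index, this prints
"new window: [27, 44), H = 17": the new window is sized by the length
of the cached history run, which is how the algorithm automatically
adapts to the thrashing threshold.
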
The more I think about ideal readahead, the more it looks like the
page cache should also maintain fairness between its users, similar to
what is done for CPU time (CFS) and disk bandwidth (CFQ). A slow user
should have a chance to use its share of the cache in the face of a
too-fast thrasher.

The main problems are how to define what a "user" is and how to
implement the fairness in an acceptably simple way. Maybe something
like this: a "user" is an IO context (or IO stream?), and if some
pages need to be reclaimed for "user" A, pages belonging to other
"users" would only be evicted with an additional weight W, so A's own
pages would be preferred as eviction victims.

Just my 0.05c.

> We can start with aggressive readahead and try to detect readahead
> thrashing and switch into the thrashing-safe mode automatically. This
> will work for non-interleaved reads. However, the big file streamer -
> lighttpd - does interleaved reads, and the current data structure is
> not able to detect most readahead thrashing for lighttpd.
>
> Luckily, the non-resident page tracking facility could help this
> case. There the thrashed pages, with their timing info, can be found,
> based on which we can have some extended context readahead algorithm
> that could even overcome the drop-behind problem :)
>
> Thanks,
> Fengguang
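
Returning to the fairness idea above, a toy victim-selection loop
might look like this (entirely hypothetical: W, the per-page owner
field and pick_victim() are invented for illustration, not existing
kernel code):

#include <stdio.h>

#define NR_PAGES 8
#define W        100   /* extra protection for other users' pages */

struct page {
        int owner;     /* owning "user", e.g. an IO context id */
        int last_use;  /* logical timestamp of last access */
};

static struct page cache[NR_PAGES];
static int now;

/* Pick an eviction victim when "user" needs a free page. */
static int pick_victim(int user)
{
        int i, victim = 0, best = -1;

        for (i = 0; i < NR_PAGES; i++) {
                /* older pages score higher; the allocator's own pages
                 * get +W, so a foreign page must be W ticks older
                 * before it is chosen instead */
                int score = (now - cache[i].last_use)
                          + (cache[i].owner == user ? W : 0);

                if (score > best) {
                        best = score;
                        victim = i;
                }
        }
        return victim;
}

int main(void)
{
        int i;

        for (i = 0; i < NR_PAGES; i++) {   /* user 1: old, idle pages */
                cache[i].owner = 1;
                cache[i].last_use = i;
        }
        for (i = 4; i < NR_PAGES; i++) {   /* user 0: newer, hot pages */
                cache[i].owner = 0;
                cache[i].last_use = 50 + i;
        }
        now = 60;

        /* Plain LRU (W = 0) would pick page 0, user 1's oldest page.
         * With W = 100, user 0 cannibalizes its own page 4 instead. */
        printf("victim for user 0: page %d\n", pick_victim(0));
        return 0;
}

With a sufficiently large W, a fast thrasher mostly cannibalizes its
own pages, which is exactly the protection a slow reader would need.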