Date: Tue, 20 Nov 2012 22:28:56 +0800
From: Fengguang Wu
To: Jaegeuk Hanse
Cc: Claudio Freire, Andrew Morton, linux-kernel@vger.kernel.org, Linux Memory Management List
Subject: Re: fadvise interferes with readahead
Message-ID: <20121120142856.GA19467@localhost>
References: <20121120080427.GA11019@localhost> <50AB8396.4040504@gmail.com>
In-Reply-To: <50AB8396.4040504@gmail.com>

> >Yes. The kernel readahead code by design will outperform simple
> >fadvise in the case of clustered random reads. Imagine the access
> >pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While
>
> You mean it will trigger 6 IOs in the POSIX_FADV_RANDOM case or
> POSIX_FADV_WILLNEED case?

Yes. However, note that I'm assuming fadvise(POSIX_FADV_WILLNEED) calls
of 1-page size and prefetch depth in this example. Given more prefetch
depth or good timing, it is possible for IO requests (e.g. 3 and 2) to
be merged at the block layer.

> >kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on
> >the page miss for 2, it will detect the existence of history page 1
> >and do readahead properly.
> >For hard disks, it's mainly the number of
>
> If the first IO read 1, it will call page_cache_sync_read() since
> cache miss,
> if (offset - (ra->prev_pos) >> PAGE_CACHE_SHIFT) <= 1UL)
>         goto initial_readahead;
> If the initial_readahead will be called? Because offset is equal to
> 1 and ra->prev_pos is equal to 0. If my assume is true, 2 also will
> be readahead.

ra->prev_pos is initialized to -1 in file_ra_state_init(), so that if
the very first read is on page 0, it will trigger readahead.

Sorry I gave a confusing example. We may as well use 1001, 1003, 1002,
1006, 1004, 1009 as the example numbers.

Thanks,
Fengguang
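
P.S. To make the prev_pos point concrete, here is a minimal user-space
sketch of the quoted check. It is only a model, not the kernel code: the
function name takes_initial_readahead() and the SKETCH_PAGE_SHIFT value
(4 KiB pages) are assumptions made for illustration, and it covers just
this one comparison, not the rest of the readahead logic.

```c
/* Assumption for illustration: 4 KiB pages (shift of 12). */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical user-space model of the quoted check; this is not the
 * kernel's code. prev_pos is a byte position, with -1 meaning
 * "nothing has been read yet". Returns 1 if the comparison would take
 * the initial_readahead branch, 0 otherwise.
 */
static int takes_initial_readahead(unsigned long offset, long long prev_pos)
{
    /* Map a negative prev_pos to page index -1 explicitly, so that no
     * negative value is ever shifted. */
    unsigned long prev_page = prev_pos < 0
        ? (unsigned long)-1
        : (unsigned long)(prev_pos >> SKETCH_PAGE_SHIFT);

    /* Unsigned subtraction wraps around, so page 0 minus "page -1"
     * yields 1, which passes the <= 1 test. */
    return offset - prev_page <= 1UL;
}
```

With prev_pos == -1, a first read on page 0 passes the comparison, while
a first read on page 1 or page 1001 does not. With prev_pos == 0 (the
value assumed in the question), page 1 would pass, which is why the
original example numbers were confusing.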