Message-ID: <50889A7E.8010104@gmail.com>
Date: Thu, 25 Oct 2012 09:48:46 +0800
From: Ni zhan Chen
To: YingHang Zhu, Fengguang Wu
CC: Dave Chinner, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
References: <1350996411-5425-1-git-send-email-casualfisher@gmail.com> <20121023224706.GR4291@dastard> <20121024201921.GX4291@dastard>

On 10/25/2012 08:17 AM, YingHang Zhu wrote:
> On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner wrote:
>> On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
>>> Hi Dave,
>>> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner wrote:
>>>> On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
>>>>> Hi,
>>>>> Recently we ran into the bug that an opened file's ra_pages does not
>>>>> synchronize with its backing device's when the latter is changed
>>>>> with blockdev --setra, the application needs to reopen the file
>>>>> to know the change,
>>>> or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readahead
>>>> window to the (new) bdi default.
>>>>
>>>>> which is inappropriate under our circumstances.
>>>> Which are?
>>>> We don't know your circumstances, so you need to tell us
>>>> why you need this and why existing methods of handling such changes
>>>> are insufficient...
>>>>
>>>> Optimal readahead windows tend to be a physical property of the
>>>> storage and that does not tend to change dynamically. Hence block
>>>> device readahead should only need to be set up once, and generally
>>>> that can be done before the filesystem is mounted and files are
>>>> opened (e.g. via udev rules). Hence you need to explain why you need
>>>> to change the default block device readahead on the fly, and why
>>>> fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
>>>> windows to the new defaults.
>>> Our system is a fuse-based file system, fuse creates a
>>> pseudo backing device for the user space file systems, the default readahead
>>> size is 128KB and it can't fully utilize the backing storage's read ability,
>>> so we should tune it.
>> Sure, but that doesn't tell me anything about why you can't do this
>> at mount time before the application opens any files. i.e. you've
>> simply stated the reason why readahead is tunable, not why you need
>> to be fully dynamic.....
> We store our file system's data on different disks so we need to change ra_pages
> dynamically according to where the data resides, it can't be fixed at mount time
> or when we open files.
> The abstract bdi of fuse and btrfs provides some dynamically changing
> bdi.ra_pages based on the real backing device. IMHO this should not be ignored.

And how should ra_pages be tuned if one big file is distributed across
different disks? I think Fengguang Wu can answer these questions.

Hi Fengguang,

>>> The above third-party application using our file system maintains
>>> some long-opened files, we do not have any chance
>>> to force them to call fadvise(POSIX_FADV_NORMAL). :(
>> So raise a bug/feature request with the third party.
>> Modifying kernel code because you can't directly modify the
>> application isn't the best solution for anyone. This really is an
>> application problem - the kernel already provides the mechanisms to
>> solve this problem... :/
> Thanks for the advice, I will consult the above application's developers
> for more information.
> Now, from the code itself, should we close the gap between the real
> device's ra_pages and the file's?
> Obviously the ra_pages is duplicated, otherwise each time we run into this
> problem, someone will do the same work as I have done here.
>
> Thanks,
> Ying Zhu
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com