Date: Tue, 26 Mar 2019 09:30:58 +0800
From: Fengguang Wu
To: Mark Salyzyn
Cc: Martin Liu, akpm@linux-foundation.org, axboe@kernel.dk,
 dchinner@redhat.com, jenhaochen@google.com, salyzyn@google.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org
Subject: Re: [RFC PATCH] mm: readahead: add readahead_shift into backing device
Message-ID: <20190326013058.ykdwxbfkk3x3pvtu@wfg-t540p.sh.intel.com>
References: <20190322154610.164564-1-liumartin@google.com>
 <20190325121628.zxlogz52go6k36on@wfg-t540p.sh.intel.com>
 <9b194e61-f2d0-82cb-30ac-95afb493b894@android.com>
In-Reply-To: <9b194e61-f2d0-82cb-30ac-95afb493b894@android.com>

On Mon, Mar 25, 2019 at 09:59:31AM -0700, Mark Salyzyn wrote:
>On 03/25/2019 05:16 AM, Fengguang Wu wrote:
>> Martin,
>>
>> On Fri, Mar 22, 2019 at 11:46:11PM +0800, Martin Liu wrote:
>>> As discussed in https://lore.kernel.org/patchwork/patch/334982/,
>>> an open file's ra_pages can fall out of sync with bdi.ra_pages
>>> after sequential, random, or error reads. In the current design
>>> we have to ask users to reopen the file, or to call fadvise, to
>>> resync it. However, there are cases where changing the system-wide
>>> ra_pages can improve performance, such as increasing it to speed
>>> up boot, or decreasing it to
>>
>> Do you have examples of a distro making use of larger ra_pages
>> for boot time optimization?
>
>Android (if you are willing to squint and look at android-common AOSP
>kernels as a Distro).

OK. I wonder how exactly Android makes use of it. Since phones do not
use hard disks, they should benefit less from large ra_pages. Would
you kindly point me to the code?

>> Suppose N read streams with equal read speed. The thrash-free memory
>> requirement would be (N * 2 * ra_pages).
>>
>> If N=1000 and ra_pages=1MB, it'd require 2GB of memory, which looks
>> affordable in mainstream servers.
>
>That is 50% of the memory on a high end Android device ...

Yeah, but I'm obviously not talking about Android devices here. Will a
phone serve 1000 concurrent read streams?

>> Sorry, but it sounds like introducing an unnecessarily twisted new
>> interface.
>> I'm afraid it fixes the pain for 0.001% of users while bringing
>> more puzzlement to the majority.
>
>2B Android devices on the planet is 0.001%?

Nope. Sorry, I didn't know about the Android usage. Nobody mentioned
it in the past discussions.

>I am not defending the proposed interface though; if there is
>something better that can be used, then looking into:
>>
>> Then let fadvise() and shrink_readahead_size_eio() adjust that
>> per-file ra_pages_shift.
>
>Sounds like this would require a lot from init to globally audit and
>reduce the read-ahead for all open files?

It depends. In theory it should be possible to create a standalone
kernel module that dumps the page cache and gets a one-shot snapshot
of all cached file pages, with no need for continuous auditing:

  [RFC] kernel facilities for cache prefetching
  https://lwn.net/Articles/182128

This tool may also work. It is quick to get the list of open files by
walking /proc/*/fd/, however not as easy to get the list of cached
file names:

  https://github.com/tobert/pcstat

Perhaps we can add a simplified /proc/filecache that only dumps the
list of cached file names, then let mincore() based tools take care
of the rest.

Regards,
Fengguang