Date: Mon, 22 Nov 2010 23:42:57 -0800
From: Andrew Morton
To: Minchan Kim
Cc: KOSAKI Motohiro , linux-mm , LKML , Peter Zijlstra , Rik van Riel , Johannes Weiner , Nick Piggin
Subject: Re: [RFC 1/2] deactive invalidated pages
Message-Id: <20101122234257.f14bad44.akpm@linux-foundation.org>
In-Reply-To:
References: <20101122143817.E242.A69D9226@jp.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 23 Nov 2010 16:40:03 +0900 Minchan Kim wrote:

> Hi KOSAKI,
>
> 2010/11/23 KOSAKI Motohiro :
> >> By Other approach, app developer uses POSIX_FADV_DONTNEED.
> >> But it has a problem. If kernel meets page is writing
> >> during invalidate_mapping_pages, it can't work.
> >> It is very hard for application programmer to use it.
> >> Because they always have to sync data before calling
> >> fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could
> >> be discardable. At last, they can't use deferred write of kernel
> >> so that they could see performance loss.
> >> (http://insights.oetiker.ch/linux/fadvise.html)
> >
> > If rsync use the above url patch, we don't need your patch.
> > fdatasync() + POSIX_FADV_DONTNEED should work fine.
>
> It works well. But it needs always fdatasync before calling fadvise.
> For small file, it hurt performance since we can't use the deferred write.

fdatasync() is (much) better than nothing, but a userspace application
which is carefully managing its IO scheduling should use
sync_file_range(SYNC_FILE_RANGE_WRITE) to push data at the disk and
should then run posix_fadvise(POSIX_FADV_DONTNEED) against the same
data a few seconds later, after the IO has completed.

That way, the application won't block against the write IO at all,
unless of course someone else is thrashing the disk as well, etc.

If the app is doing a lot of file IO (e.g., rsync) then this shouldn't
be too hard to arrange, although the payback will be pretty small
unless the IO-intensive process is also compute-intensive at times.
And such applications are a) fairly rare and b) poorly designed: they
shouldn't be doing heavy IO and heavy compute in the same thread!
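
For illustration only, here is a rough sketch of the two approaches
discussed in this thread: the blocking fdatasync() + DONTNEED variant,
and the non-blocking sync_file_range() variant with a delayed DONTNEED.
The helper names, the scratch file path and the sleep(2) standing in
for "a few seconds later" are all assumptions made for the example, not
anything rsync or the kernel prescribes. Note that
SYNC_FILE_RANGE_WRITE can still block if the device's request queue is
full.

```c
#define _GNU_SOURCE            /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: start writeback of the dirty pages in
 * [off, off + len) and return without waiting for the IO to finish.
 * Returns 0 on success, -1 with errno set on failure. */
static int start_writeback(int fd, off_t off, off_t len)
{
    return sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
}

/* Hypothetical helper: ask the kernel to drop the cached pages in the
 * range. Pages still dirty or under writeback may be skipped, so this
 * is meant to run only after the writeback has had time to complete.
 * posix_fadvise() returns 0 or an error number; it does not set errno. */
static int drop_cached_range(int fd, off_t off, off_t len)
{
    return posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}

/* The simpler, blocking variant: flush the whole file, then drop it
 * from the page cache (len 0 means "to end of file"). */
static int sync_then_drop(int fd)
{
    if (fdatasync(fd) != 0)
        return -1;
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0 ? 0 : -1;
}

#ifdef DEMO_MAIN
int main(void)
{
    char path[] = "/tmp/writeback-demo-XXXXXX";  /* assumed scratch file */
    char buf[4096];
    off_t total = 0;
    int fd = mkstemp(path);

    assert(fd >= 0);
    unlink(path);                /* file disappears once fd is closed */

    memset(buf, 'x', sizeof buf);
    for (int i = 0; i < 256; i++) {
        ssize_t n = write(fd, buf, sizeof buf);
        assert(n == (ssize_t)sizeof buf);
        total += n;
    }

    /* Non-blocking variant: kick off the IO, do other work, drop later. */
    assert(start_writeback(fd, 0, total) == 0);
    sleep(2);                    /* stand-in for other useful work */
    assert(drop_cached_range(fd, 0, total) == 0);

    /* Blocking variant, shown for comparison. */
    assert(sync_then_drop(fd) == 0);

    close(fd);
    puts("done");
    return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a small standalone program. In a real
application the sleep() would be replaced by whatever work the program
has to do anyway, with the DONTNEED call issued for each range once its
writeback is likely to have finished.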