Date: Wed, 3 Jul 2013 20:31:03 -0400
From: Mathieu Desnoyers
To: Dave Chinner
Cc: Jeff Moyer, Mel Gorman, Rob van der Heij, Andrew Morton,
    Yannick Brosseau, stable@vger.kernel.org, LKML,
    "lttng-dev@lists.lttng.org"
Subject: Re: [-stable 3.8.1 performance regression] madvise POSIX_FADV_DONTNEED

* Dave Chinner (david@fromorbit.com) wrote:
> On Wed, Jul 03, 2013 at 10:53:08AM -0400, Jeff Moyer wrote:
> > Mel Gorman writes:
> >
> > >> > I just tried replacing my sync_file_range()+fadvise() calls and instead
> > >> > pass the O_DIRECT flag to open(). Unfortunately, I must be doing
> > >> > something very wrong, because I get only 1/3rd of the throughput, and
> > >> > the page cache fills up. Any idea why ?
> > >>
> > >> Since O_DIRECT does not seem to provide acceptable throughput, it may be
> > >> interesting to investigate other ways to lessen the latency impact of
> > >> the fadvise DONTNEED hint.
> > >>
> > >
> > > There are cases where O_DIRECT falls back to buffered IO which is why you
> > > might have found that page cache was still filling up. There are a few
> > > reasons why this can happen but I would guess the common cause is that
> > > the range of pages being written was in the page cache already and could
> > > not be invalidated for some reason. I'm guessing this is the common case
> > > for page cache filling even with O_DIRECT but would not bet money on it
> > > as it's not a problem I investigated before.
> >
> > Even when O_DIRECT falls back to buffered I/O for writes, it will
> > invalidate the page cache range described by the buffered I/O once it
> > completes. For reads, the range is written out synchronously before the
> > direct I/O is issued. Either way, you shouldn't see the page cache
> > filling up.
>
> I keep forgetting that filesystems other than XFS have sub-optimal
> direct IO implementations. I wish that "silent fallback to buffered
> IO" idea had never seen the light of day, and that filesystems
> implemented direct IO properly.
>
> > Switching to O_DIRECT often incurs a performance hit, especially if the
> > application does not submit more than one I/O at a time. Remember,
> > you're not getting readahead, and you're not getting the benefit of the
> > writeback code submitting batches of I/O.
>
> With the way IO is being done, there won't be any readahead (write
> only workload) and they are directly controlling writeback one chunk
> at a time, so there's not writeback caching to do batching, either.
> There's no obvious reason that direct IO should be any slower
> assuming that the application is actually doing 1MB sized and
> aligned IOs like was mentioned, because both methods are directly
> dispatching and then waiting for IO completion.

As a clarification, I use 256kB "chunks" (sub-buffers) in my tests, not 1MB.
Also, please note that since I'm using splice(), each individual splice
call is internally limited to 16 pages' worth of data transfer (64kB).

> What filesystem is in use here?

My test was performed on an ext3 filesystem, itself sitting on a
software RAID-1 array.

Thanks,

Mathieu

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
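
For readers following the thread, the per-chunk write-out pattern under
discussion (write a chunk, flush it, then issue the DONTNEED hint for
that range) can be sketched roughly as below. This is only an
illustration, not the code used in the tests: the function names are
made up for the example, the data is written with pwrite() rather than
splice(), and os.fsync() stands in for sync_file_range(), which Python's
os module does not expose. It therefore flushes the whole file rather
than a single range. It assumes a Linux system where os.posix_fadvise()
is available.

```python
import os

CHUNK_SIZE = 256 * 1024  # 256kB sub-buffers, the chunk size mentioned in the thread


def write_chunk_and_drop_cache(fd, offset, data):
    """Write one chunk at `offset`, flush it, then hint its pages are not needed."""
    written = 0
    while written < len(data):
        # pwrite() may write fewer bytes than requested, so loop until done.
        written += os.pwrite(fd, data[written:], offset + written)
    # Assumption: fsync() is a coarser stand-in for sync_file_range();
    # it flushes the whole file, not just this chunk.
    os.fsync(fd)
    # Advisory hint only: the kernel may or may not drop the now-clean pages.
    os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
    return written


def write_stream(path, chunks):
    """Write an iterable of byte chunks back to back; return total bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        for chunk in chunks:
            offset += write_chunk_and_drop_cache(fd, offset, chunk)
        return offset
    finally:
        os.close(fd)
```

Calling write_stream("out.bin", chunks) with 256kB byte strings mimics
the "one chunk at a time" behaviour described above. The O_DIRECT
variant is not shown, since it additionally requires aligned buffers and
sizes.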