Date: Sat, 20 Jun 2015 09:07:20 +1000
From: Dave Chinner
To: Len Brown
Cc: NeilBrown, One Thousand Gnomes, "Rafael J. Wysocki", Ming Lei, "Rafael J. Wysocki", Linux PM List, Linux Kernel Mailing List, Len Brown
Subject: Re: [PATCH 1/1] suspend: delete sys_sync()
Message-ID: <20150619230720.GB16870@dastard>

On Fri, Jun 19, 2015 at 02:34:37AM -0400, Len Brown wrote:
> > Can you repeat this test on your system, so that we can determine if
> > the 5ms "sync time" is actually just the overhead of inode cache
> > traversal? If that is the case, the speed of sync on a clean
> > filesystem is already a solved problem - the patchset should be
> > merged in the 4.2 cycle....
>
> Yes, drop_caches does seem to help repeated sync on this system:
> Exactly what patch series does this? I'm running ext4 (the default,
> not btrfs)

None.
It's the current behaviour of sync that ends up walking the inode
cache in its entirety to find dirty inodes that need to be waited
on. That's what the sync scalability patch series I pointed you at
fixes - sync then keeps a "dirty inodes that need to be waited on"
list instead of doing a cache traversal to find them. i.e. the "no
cache" results you see will soon be the behaviour sync has
regardless of the size of the inode cache.

> [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
> ext4_inode_cache 3536 3536 1008 16 4 : tunables 0 0 0 : slabdata 221 221 0

That's actually a really small cache to begin with.

> > This is the problem we really need to reproduce and track down.
>
> Putting a function trace on sys_sync and executing sync manually,
> I was able to see it take 100ms,
> though function trace itself could be contributing to that...

It would seem that way - you need to get the traces to dump to
something that has no sync overhead....

> running analyze_suspend.py after the slab tweak above didn't change much.
> in one run sync was 20ms (out of a total suspend time of 60ms).

Which may be because the inode cache was larger?

> Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms.
> I've seen that in some other results. Sometimes sync is fast, but sd
> then more than makes up for it by being slow:-(

Oh, I see that too. Normally that's because the filesystem hasn't
been told to enter an idle state and so is doing metadata writeback
IO after the sync. When that happens the sd suspend has to wait for
request queues to drain, IO to complete and device caches to flush.
This simply cannot be avoided because suspend never tells the
filesystems to enter an idle state....

i.e. remember what I said initially in this thread about suspend
actually needing to freeze filesystems, not just sync them?
> FYI,
> I ran analyze_suspend.py -x2
> from current directory /tmp, which is mounted on tmpfs,
> but still found the 2nd sync was very slow -- 200ms
> vs 6 - 20 ms for the sync preceding the 1st suspend.

So where did that time go? As I pointed out previously, function
trace will only tell us if the delay is data writeback or not. We
seem to have confirmed that the delay is, indeed, writeback of dirty
data. Now we need to identify what the dirty data belongs to: we
need to trace individual writeback events to see what dirty inodes
are actually being written.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com