Date: Thu, 14 Jun 2012 22:00:06 +0800
From: Fengguang Wu
To: Dave Chinner
Cc: Wanpeng Li, linux-kernel@vger.kernel.org, Gavin Shan
Subject: Re: [PATCH v2] writeback: avoid race when update bandwidth
Message-ID: <20120614140006.GB15553@localhost>
References: <1339501561-4570-1-git-send-email-liwp.linux@gmail.com>
 <20120612115219.GA17348@localhost> <20120613035920.GV22848@dastard>
 <20120613121434.GA937@localhost> <20120614020559.GB7339@dastard>
In-Reply-To: <20120614020559.GB7339@dastard>

On Thu, Jun 14, 2012 at 12:05:59PM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 08:14:34PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:59:20PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:52:19PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li
> > > > >
> > > > > "V1 -> V2"
> > > > > * remove dirty_lock
> > > > >
> > > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > > the flushers that call wb_writeback to write back pages will
> > > > > get stuck while the bandwidth update policy holds this lock.
> > > > > In order to avoid this race we can introduce a new
> > > > > bandwidth_lock which is responsible for protecting the
> > > > > bandwidth update policy.
> > > > >
> > > > > Signed-off-by: Wanpeng Li
> > > >
> > > > Applied with a new title "writeback: use a standalone lock for
> > > > updating write bandwidth". "race" is sensitive because it often
> > > > refers to some locking error.
> > >
> > > Fengguang - can we get some evidence that this is a contended lock
> > > before changing the scope of it? All of the previous "breaking up
> > > global locks" work has been done based on lock contention data, so
> > > moving back to a global lock for this needs to have the same
> > > analysis provided...
> >
> > Good point. Attached is the lockstat for the case "10 disks each runs
> > 100 dd dirtier tasks":
> >
> > lkp-ne02/JBOD-10HDD-thresh=4G/xfs-100dd-1-3.2.0-rc5
> (nothing attached)
> > The wb->list_lock contention is much better than I expected, which is
> > good. What stands out are:
> >
> >                                                   waittime-total
> > - &rq->lock by double_rq_lock()                       6738952.13
> > - clockevents_lock by clockevents_notify()            2155554.37
> > - mapping->tree_lock by test_clear_page_writeback()    931550.13
> > - sb_lock by grab_super_passive()                      918815.87
> > - &zone->lru_lock by pagevec_lru_move_fn()             912681.05
> >
> > - sysfs_mutex by sysfs_permission()                  24029975.20  # mutex
> > - ip->i_lock by xfs_ilock()                          18428284.10  # mrlock
>
> The wait time is not really an indication of contention problems.
> Large wait time is usually an indication that the lock is being used
> a lot.

Right.

> What matters is the number of contentions vs the number of
> acquisitions, and the number of those contentions that bounced the
> lock.
> If the number of contentions is >= 0.5% of the acquisitions,
> then the lock can be considered hot and needing some work. If I look
> here:

I wonder if anyone has a simple script for sorting lock_stat output
based on that (and perhaps other selectable) criterion? It should be
possible to write one myself, but still.. ;-)

Default lock_stat output is sorted by the absolute number of contentions.

> http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.large_file_creates_threads=192/profiling/iteration.1/lock_stat
>
> Which is a 192 thread concurrent write on a 48-core machine; the
> wb.list_lock shows 5,532 acquisitions for the entire test, while the
> mapping tree lock took 440 million! So your test isn't really one
> that shows wb.list_lock contention. The 192-thread mailserver
> workload from the same machine:
>
> http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.mail_server_threads=192/profiling/iteration.1/lock_stat
>
> Shows about 7.1m acquisitions of the wb.list_lock, but only 28,000
> contentions. So it isn't really contended enough to justify
> replacing it with a global lock.

Right.

> FWIW, the third most contended lock on that workload is the XFS
> delayed write queue lock - 25M acquisitions for 600k contentions - a
> rate of about 2%, which means quite severe contention. That lock no
> longer exists in 3.5 - Christoph completely reworked the delayed
> write buffer support to remove the global list and lock because it
> was showing up in profiles like this...
>
> Indeed, that profile shows that XFS owns 7 of the 10 most contended
> locks, and 3 of them have had significant work done to reduce the
> contention since 3.2 as a result of recent profile results like this.

Nice work!

Thanks,
Fengguang
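
PS: in case it is useful, below is a rough, untested sketch of the kind
of sorting script I had in mind above. It reads /proc/lock_stat (or a
saved copy given on the command line), picks out the per-class stat
lines, and orders them by contentions/acquisitions ratio. The column
positions are an assumption based on the lock_stat version 0.3 layout
found in 3.2 kernels (contentions is the 2nd numeric field,
acquisitions the 7th); other lock_stat versions will need the two
indexes adjusted, and the script name is just a placeholder.

#!/usr/bin/env python
# sort-lock-stat.py: order lock_stat entries by contention ratio
# Usage: sort-lock-stat.py [lock_stat file]   (default: /proc/lock_stat)
#
# Assumed per-class line layout (lock_stat version 0.3):
#   <class name>:  con-bounces contentions waittime-min waittime-max
#                  waittime-total acq-bounces acquisitions holdtime-...
# Adjust the two indexes below for other lock_stat versions.

import sys

CONTENTIONS = 1    # 2nd numeric field after the class name
ACQUISITIONS = 6   # 7th numeric field after the class name

def stat_lines(fh):
    for line in fh:
        if ':' not in line:
            continue                    # header, separators, stack traces
        name, _, rest = line.rpartition(':')
        fields = rest.split()
        if len(fields) <= ACQUISITIONS:
            continue
        try:
            cont = float(fields[CONTENTIONS])
            acq = float(fields[ACQUISITIONS])
        except ValueError:
            continue                    # not a per-class stat line
        if acq > 0:
            yield name.strip(), cont, acq

def main():
    path = sys.argv[1] if len(sys.argv) > 1 else '/proc/lock_stat'
    with open(path) as fh:
        entries = sorted(stat_lines(fh),
                         key=lambda e: e[1] / e[2], reverse=True)
    print('%-44s %12s %14s %8s' %
          ('class name', 'contentions', 'acquisitions', 'ratio'))
    for name, cont, acq in entries:
        print('%-44s %12d %14d %7.2f%%' %
              (name, cont, acq, 100 * cont / acq))

if __name__ == '__main__':
    main()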