Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261540AbVCRLIQ (ORCPT ); Fri, 18 Mar 2005 06:08:16 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261575AbVCRLIQ (ORCPT ); Fri, 18 Mar 2005 06:08:16 -0500 Received: from fire.osdl.org ([65.172.181.4]:59060 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S261540AbVCRLIG (ORCPT ); Fri, 18 Mar 2005 06:08:06 -0500 Date: Fri, 18 Mar 2005 03:07:31 -0800 From: Andrew Morton To: rostedt@goodmis.org Cc: mingo@elte.hu, linux-kernel@vger.kernel.org Subject: Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) Message-Id: <20050318030731.224aa95f.akpm@osdl.org> In-Reply-To: References: <20050315133540.GB4686@elte.hu> <20050316085029.GA11414@elte.hu> <20050316011510.2a3bdfdb.akpm@osdl.org> <20050316095155.GA15080@elte.hu> <20050316020408.434cc620.akpm@osdl.org> <20050316101906.GA17328@elte.hu> <20050316024022.6d5c4706.akpm@osdl.org> <20050316031909.08e6cab7.akpm@osdl.org> <20050316131521.48b1354e.akpm@osdl.org> <20050318013251.330e4d78.akpm@osdl.org> X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.10; i386-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3113 Lines: 82 Steven Rostedt wrote: > > > > > I really should knock up a script to send out an email when I add a patch > > to -mm. > > > > I thought you might have had something like that already, which was > another reason I thought you might have skipped this. > I do now.. > > > > I'd figure that you may have > > > missed this, since the subject didn't change. So I changed the subject to > > > get your attention, and I've resent this. Here's the patch to get rid of > > > the the lame schedule that was in fs/jbd/commit.c. Let me know if this > > > patch is appropriate. > > > > I'm rather aghast at all the ifdeffery and complexity in this one. But I > > haven't looked at it closely yet. > > > > I wanted to keep the wait logic out when it wasn't a problem. Basically, > the problem only occurs when bit_spin_trylock is defined as an actual > trylock. So I put in a define there to enable the wait queues. I didn't > want to waste cycles checking the wait queue in jbd_unlock_bh_state when > there would never be anything on it. Heck, I figured why even have the > wait queue wasting memory if it wasn't needed. So that added the > ifdeffery complexity. No, that code's just a problem. For ranking reasons it's essentially doing this: repeat: cond_resched(); spin_lock(j_list_lock); .... if (!bit_spin_trylock(bh)) { spin_unlock(j_list_lock); schedule(); goto repeat; } Now imagine that some other CPU holds the bit_spin_lock and is spinning, trying to get the spin_lock(). The above code assumes that the schedule() and cond_resched() will take "long enough" for the other CPU to get the spinlock, do its business then release the locks. So all the schedule() is really doing is "blow a few cycles so the other CPU can get in and grab the spinlock". That'll work OK on normal SMP but I suspect that on NUMA setups with really big latencies we could end up starving the other CPU: this CPU would keep on grabbing the lock. It depends on how the interconnect cache and all that goop works. So what to do? One approach would be to spin on the bit_spin_trylock after having dropped j_list_lock. That'll tell us when the other CPU has moved on. Another approach would be to sleep on a waitqueue somewhere. But that means that jbd_unlock_bh_state() needs to do wakeups all the time - costly. Another approach would be to simply whack an msleep(1) in there. That might be OK - it should be very rare. Probably the first approach would be the one to use. That's for mainline. I don't know what the super-duper-RT fix would be. Why did we start discussing this anyway? Oh, SCHED_FIFO. kjournald doesn't run SCHED_FIFO, but someone may decide to make it do so. But even then I don't see a problem for the mainline kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from running. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/