Date: Fri, 19 Mar 2004 12:37:24 +0100
From: Takashi Iwai
To: Chris Mason
Cc: Andrew Morton, andrea@suse.de, mjy@geizhals.at, linux-kernel@vger.kernel.org
Subject: Re: CONFIG_PREEMPT and server workloads
In-Reply-To: <1079639286.4187.2113.camel@watt.suse.com>
References: <40591EC1.1060204@geizhals.at> <20040318060358.GC29530@dualathlon.random> <20040318110159.321754d8.akpm@osdl.org> <20040318112941.0221c6ac.akpm@osdl.org> <1079639286.4187.2113.camel@watt.suse.com>

At Thu, 18 Mar 2004 14:48:07 -0500,
Chris Mason wrote:
>
> On Thu, 2004-03-18 at 14:29, Andrew Morton wrote:
> > > yep, i see a similar problem also in reiserfs's do_journal_end().
> > > it's in lock_kernel().
> >
> > I have a scheduling point in journal_end() in 2.4.  But I added bugs to
> > reiserfs a couple of times doing this - it's pretty delicate.  Beat up on
> > Chris ;)
>
> ;-) Not sure if Takashi is talking about -suse or -mm, the data=ordered
> patches change things around.  He sent me suggestions for the
> data=ordered latencies already, but it shouldn't be against the BKL
> there, since I drop it before calling write_ordered_buffers().

I have tested only the SUSE kernels recently; I'll try the -mm kernel
again later.

OK, let me explain the nasty points I found through the disk I/O load
tests:

- In the loop in do_journal_end(); this one triggers periodically from
  pdflush (a sketch of a possible reschedule point is appended below):

    /* first data block is j_start + 1, so add one to cur_write_start
     * wherever you use it */
    cur_write_start = SB_JOURNAL(p_s_sb)->j_start ;
    cn = SB_JOURNAL(p_s_sb)->j_first ;
    jindex = 1 ; /* start at one so we don't get the desc again */
    while (cn) {
        clear_bit(BH_JNew, &(cn->bh->b_state)) ;
        ....
        next = cn->next ;
        free_cnode(p_s_sb, cn) ;
        cn = next ;
    }

- In write_ordered_buffers(). I still haven't figured out exactly
  where, since we already have cond_resched() checks in the loops.
  This one triggers when I write bulk data in parallel (a 1GB write
  with 20 threads at the same time) and results in latencies of up to
  2ms.  A typical stack trace looks like this (a second, more generic
  sketch is appended below as well):

    T=36.569 diff=3.64275 comm=reiserfs/0
        rtc_interrupt (+cd/e0)
        handle_IRQ_event (+2f/60)
        do_IRQ (+76/170)
        common_interrupt (+18/20)
        kfree (+36/50)
        reiserfs_free_jh (+34/60)
        write_ordered_buffers (+11f/1d0)
        flush_commit_list (+3e6/480)
        flush_async_commits (+5d/70)
        worker_thread (+164/1d0)
        flush_async_commits (+0/70)
        default_wake_function (+0/10)
        default_wake_function (+0/10)
        worker_thread (+0/1d0)
        kthread (+77/9f)
        kthread (+0/9f)
        kernel_thread_helper (+5/10)


Takashi
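
Sketch 1: a minimal illustration, not the actual fix, of how a
reschedule point could be added to the do_journal_end() loop quoted
above.  It assumes the 2.6 BKL semantics (lock_kernel() is released
inside schedule() and retaken on wakeup, so calling cond_resched()
while holding the BKL is safe); the RESCHED_EVERY constant and the
"done" counter are invented for this example:

    #define RESCHED_EVERY 128   /* illustrative batch size */

    unsigned int done = 0;

    cur_write_start = SB_JOURNAL(p_s_sb)->j_start ;
    cn = SB_JOURNAL(p_s_sb)->j_first ;
    jindex = 1 ; /* start at one so we don't get the desc again */
    while (cn) {
        clear_bit(BH_JNew, &(cn->bh->b_state)) ;
        /* ... per-buffer work elided, as in the quote above ... */
        next = cn->next ;
        free_cnode(p_s_sb, cn) ;
        cn = next ;
        /* yield periodically to bound the BKL hold time */
        if (++done % RESCHED_EVERY == 0)
            cond_resched();
    }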
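
Sketch 2: for the write_ordered_buffers() case, a generic illustration
of the same batching pattern.  Every name here (free_jh_list, BATCH,
struct jh_entry, the list layout) is hypothetical and not the real
reiserfs code; the point is only that a cond_resched() check at the
top of an outer loop does not help when a single inner run frees
thousands of journal heads back to back, as in the kfree()-heavy
trace above:

    #include <linux/list.h>
    #include <linux/sched.h>
    #include <linux/slab.h>

    #define BATCH 64            /* illustrative: yield after this many frees */

    struct jh_entry {           /* hypothetical journal-head wrapper */
        struct list_head list;
    };

    static void free_jh_list(struct list_head *head)
    {
        struct jh_entry *jh;
        unsigned int n = 0;

        while (!list_empty(head)) {
            jh = list_entry(head->next, struct jh_entry, list);
            list_del_init(&jh->list);
            kfree(jh);
            /* reschedule between batches instead of once per walk */
            if (++n % BATCH == 0)
                cond_resched();
        }
    }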