Date: Thu, 18 Mar 2004 16:28:16 +0100
From: Takashi Iwai
To: Andrea Arcangeli
Cc: "Marinos J. Yannikos", Andrew Morton, linux-kernel@vger.kernel.org
Subject: Re: CONFIG_PREEMPT and server workloads

At Thu, 18 Mar 2004 07:03:58 +0100,
Andrea Arcangeli wrote:
>
> On Thu, Mar 18, 2004 at 05:00:01AM +0100, Marinos J. Yannikos wrote:
> > Hi,
> >
> > we upgraded a few production boxes from 2.4.x to 2.6.4 recently and the
> > default .config setting was CONFIG_PREEMPT=y. To get straight to the
> > point: according to our measurements, this results in severe performance
> > degradation with our typical and some artificial workload. By "severe" I
> > mean this:
>
> this is expected (see the below email, I predicted it on Mar 2000), keep
> preempt turned off always, it's useless. Worst of all we're now taking
> spinlocks earlier than needed, and the preempt_count stuff isn't
> optimized away by PREEMPT=n, once those bits will be fixed too it'll go
> even faster.
>
> preempt just wastes cpu with tons of branches in fast paths that should
> take one cycle instead.
>
> Takashi Iwai did lots of research on the preempt vs lowlatency and
> he found that preempt buys nothing and he confirmed my old theories

Well, I personally am not against the current preempt mechanism from the
viewpoint of audio processing :)  The implementation is relatively clean
and easy.

But I agree with Andrea that we can surely achieve almost the same
RT-performance without preemption, i.e. with less preempt overhead.
It doesn't need to be the default.

(snip)

> These fixes from Takashi Iwai bring 2.6 back in line with 2.4, I
> suggested to use EIP dumps from interrupts to get the hotspots, he
> promptly used the RTC for that and he could fixup all the spots, great
> job he did since now we've a very low worst case sched latency in 2.6
> too:
>
> --- linux/fs/mpage.c-dist	2004-03-10 16:26:54.293647478 +0100
> +++ linux/fs/mpage.c	2004-03-10 16:27:07.405673634 +0100
> @@ -695,6 +695,7 @@ mpage_writepages(struct address_space *m
>  			unlock_page(page);
>  		}
>  		page_cache_release(page);
> +		cond_resched();
>  		spin_lock(&mapping->page_lock);
>  	}
>  	/*

The above one is the major source of RT-latency.  This one-liner alone
cuts more than 90% of the RT-latency.  In my case, with reiserfs, I got
0.4ms RT-latency with my test suite (on an Athlon 2200+).

There is one more spot to be fixed, in the reiserfs journal transaction.
With that fixed as well, you'll get 0.1ms RT-latency without preemption.

For ext3, these two spots are relevant.

--- linux-2.6.4-8/fs/jbd/commit.c-dist	2004-03-16 23:00:40.000000000 +0100
+++ linux-2.6.4-8/fs/jbd/commit.c	2004-03-18 02:42:41.043448624 +0100
@@ -290,6 +290,9 @@ write_out_data_locked:
 				commit_transaction->t_sync_datalist = jh;
 				break;
 			}
+
+			if (need_resched())
+				break;
 		} while (jh != last_jh);
 
 		if (bufs || need_resched()) {
--- linux-2.6.4-8/fs/ext3/inode.c-dist	2004-03-18 02:33:38.000000000 +0100
+++ linux-2.6.4-8/fs/ext3/inode.c	2004-03-18 02:33:40.000000000 +0100
@@ -1987,6 +1987,7 @@ static void ext3_free_branches(handle_t
 	if (is_handle_aborted(handle))
 		return;
 
+	cond_resched();
 	if (depth--) {
 		struct buffer_head *bh;
 		int addr_per_block = EXT3_ADDR_PER_BLOCK(inode->i_sb);

I think the first one is needed for the preemptive kernel, too.
With these patches, an RT-latency of 0.1-0.2ms is achieved on ext3 as well.

BTW, my measurement tool can be found at
	http://www.alsa-project.org/~iwai/latencytest-0.5.2.tar.gz

--
Takashi Iwai
ALSA Developer - www.alsa-project.org