Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261569AbVDEGKo (ORCPT ); Tue, 5 Apr 2005 02:10:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261570AbVDEGKo (ORCPT ); Tue, 5 Apr 2005 02:10:44 -0400 Received: from e32.co.us.ibm.com ([32.97.110.130]:19097 "EHLO e32.co.us.ibm.com") by vger.kernel.org with ESMTP id S261569AbVDEGK2 (ORCPT ); Tue, 5 Apr 2005 02:10:28 -0400 Subject: Re: ext3 allocate-with-reservation latencies From: Mingming Cao Reply-To: cmm@us.ibm.com To: Ingo Molnar Cc: Lee Revell , linux-kernel , Andrew Morton In-Reply-To: <20050405041359.GA17265@elte.hu> References: <1112673094.14322.10.camel@mindpipe> <20050405041359.GA17265@elte.hu> Content-Type: text/plain Organization: IBM LTC Date: Mon, 04 Apr 2005 23:10:24 -0700 Message-Id: <1112681424.3811.42.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3687 Lines: 83 On Tue, 2005-04-05 at 06:13 +0200, Ingo Molnar wrote: > * Lee Revell wrote: > > > I can trigger latencies up to ~1.1 ms with a CVS checkout. It looks > > like inside ext3_try_to_allocate_with_rsv, we spend a long time in this > > loop: > > We have not modify the reservation create algorithm for a long time. Sorry, I missed the background here, on which kernel did you see this latency? And what tests you were running? > > ext3_test_allocatable (bitmap_search_next_usable_block) > > find_next_zero_bit (bitmap_search_next_usable_block) > > find_next_zero_bit (bitmap_search_next_usable_block) > > > > ext3_test_allocatable (bitmap_search_next_usable_block) > > find_next_zero_bit (bitmap_search_next_usable_block) > > find_next_zero_bit (bitmap_search_next_usable_block) > The loop in bitmap_search_next_usable_block tries to find the next zero bit on the bitmap, if it find, then it need to check whether this zero bit is also be zero on the journalling last copy the the bitmap. The double check is there prevent re-use the just freed block on disk before it is committed from journalling. If it's not marked as a free block on the journalling copy, then it will find the next free bit on that copy first, then loop back etc.etc. The bitmap_search_next_usable_block() code is quite simple: static int bitmap_search_next_usable_block(int start, struct buffer_head *bh, int maxblocks) { int next; struct journal_head *jh = bh2jh(bh); /* * The bitmap search --- search forward alternately through the actual * bitmap and the last-committed copy until we find a bit free in * both */ while (start < maxblocks) { next = ext3_find_next_zero_bit(bh->b_data, maxblocks, start); if (next >= maxblocks) return -1; if (ext3_test_allocatable(next, bh)) return next; jbd_lock_bh_state(bh); if (jh->b_committed_data) start = ext3_find_next_zero_bit(jh->b_committed_data, maxblocks, next); jbd_unlock_bh_state(bh); } return -1; } The latency report shows the pattern of two calls of ext3_find_next_zero_bit () after a call to ext3_test_allocatable(), matches what I suspect above. I wonder if re-run the test with -0 noreservation mount option make any difference? > Breaking the lock is not really possible at that point, and it doesnt > look too easy to make that path preemptable either. (To make it > preemptable rsv_lock would need to become a semaphore (this could be > fine, as it's only used when a new reservation window is created).) > > The hard part is the seqlock - the read side is performance-critical, > maybe it could be solved via a preemptable but still scalable seqlock > variant that uses a semaphore for the write side? It all depends on what > the scalability impact of using a semaphore for the new-window code > would be. > The seqlock was there to prevent a smp race, but actually turns out to be unnecessary, and it has been removed recently. While even if it is there, it will not cause much latency here: the whole ext3_try_to_allocate_with_rsv is guard by ei->truncate_sem, thus no multiple readers/writers at the same time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/