Date: Wed, 8 Jul 2009 16:26:13 -0400
From: Valerie Aurora
To: Josef Bacik
Cc: linux-ext4@vger.kernel.org, emcnabb@redhat.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] fix softlockups in ext2/3 when trying to allocate blocks
Message-ID: <20090708202612.GC16893@shell>
In-Reply-To: <20090706194739.GB19798@dhcp231-156.rdu.redhat.com>
References: <20090706194739.GB19798@dhcp231-156.rdu.redhat.com>

On Mon, Jul 06, 2009 at 03:47:39PM -0400, Josef Bacik wrote:
> This isn't a huge deal, but using a big beefy box with more CPUs than
> what is sane, you can get a nice flood of softlockup messages when
> running heavy multi-threaded io tests on ext2/3.  The processors
> compete for blocks from the allocator, so they will loop quite a bit
> trying to get their allocation.  This patch simply makes sure that we
> reschedule if need be.  This made the softlockup messages disappear
> whereas before they happened almost immediately.  Thanks,
>
> Tested-by: Evan McNabb
> Signed-off-by: Josef Bacik
> ---
>  fs/ext2/balloc.c |    1 +
>  fs/ext3/balloc.c |    2 ++
>  2 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
> index 7f8d2e5..17dd55f 100644
> --- a/fs/ext2/balloc.c
> +++ b/fs/ext2/balloc.c
> @@ -1176,6 +1176,7 @@ ext2_try_to_allocate_with_rsv(struct super_block *sb, unsigned int group,
>  			break;			/* succeed */
>  		}
>  		num = *count;
> +		cond_resched();
>  	}
>  	return ret;
>  }
> diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> index 27967f9..cffc8cd 100644
> --- a/fs/ext3/balloc.c
> +++ b/fs/ext3/balloc.c
> @@ -735,6 +735,7 @@ bitmap_search_next_usable_block(ext3_grpblk_t start, struct buffer_head *bh,
>  	struct journal_head *jh = bh2jh(bh);
>
>  	while (start < maxblocks) {
> +		cond_resched();
>  		next = ext3_find_next_zero_bit(bh->b_data, maxblocks, start);
>  		if (next >= maxblocks)
>  			return -1;

I'm curious: Why schedule at the beginning of the while() loop rather
than at the end?

> @@ -1391,6 +1392,7 @@ ext3_try_to_allocate_with_rsv(struct super_block *sb, handle_t *handle,
>  			break;			/* succeed */
>  		}
>  		num = *count;
> +		cond_resched();
>  	}
>  out:
>  	if (ret >= 0) {
> --
> 1.6.2.2

I like this patch in general, but I worry about introducing new
performance problems in other cases.  Have you guys tested on single
CPU systems?  Maybe with a file system close to ENOSPC or badly
fragmented?

-VAL
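
For background, the idiom the patch applies is the standard cure for this
class of softlockup: a long-running kernel loop calls cond_resched() once
per iteration, giving the scheduler a voluntary preemption point on kernels
built without CONFIG_PREEMPT.  A minimal sketch of the pattern follows; the
scan_for_free_block() and try_to_claim() names are invented for
illustration, not the real ext2/3 allocator code:

#include <linux/bitops.h>	/* find_next_zero_bit(), test_and_set_bit() */
#include <linux/sched.h>	/* cond_resched() */
#include <linux/types.h>	/* bool */

/* Hypothetical helper: atomically claim a bit, true on success. */
static bool try_to_claim(unsigned long *bitmap, unsigned long bit)
{
	return !test_and_set_bit(bit, bitmap);
}

/*
 * Illustrative long-running bitmap scan -- not the actual ext2/3
 * allocator.  Without the cond_resched() call, a task spinning here
 * on a non-preemptible kernel never yields the CPU; once it holds
 * the CPU past the watchdog threshold, the softlockup detector
 * starts flooding the log.
 */
static long scan_for_free_block(unsigned long *bitmap, unsigned long nbits)
{
	unsigned long bit = 0;

	while (bit < nbits) {
		/*
		 * Voluntary preemption point: if another task needs
		 * this CPU, yield and resume where we left off.
		 */
		cond_resched();

		bit = find_next_zero_bit(bitmap, nbits, bit);
		if (bit >= nbits)
			return -1;	/* nothing free */
		if (try_to_claim(bitmap, bit))
			return bit;
		bit++;	/* lost the race to another CPU; rescan */
	}
	return -1;
}

One plausible answer to the placement question above: at the top of the
loop the check is guaranteed to run on every pass, before the potentially
expensive bitmap search, whereas a bottom placement can be bypassed by
continue-style paths; either way, a cond_resched() that finds nothing
runnable costs only a few instructions.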