Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754361AbYFJDNi (ORCPT ); Mon, 9 Jun 2008 23:13:38 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752585AbYFJDNV (ORCPT ); Mon, 9 Jun 2008 23:13:21 -0400 Received: from cantor2.suse.de ([195.135.220.15]:43652 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754687AbYFJDNU (ORCPT ); Mon, 9 Jun 2008 23:13:20 -0400 Date: Mon, 9 Jun 2008 20:09:59 -0700 From: Greg KH To: Linus Torvalds Cc: Ingo Molnar , Cornelia Huck , Vegard Nossum , Adrian Bunk , Andrew Morton , Linux Kernel Mailing List , Jens Axboe , "Rafael J. Wysocki" , Kay Sievers , Neil Brown , Mariusz Kozlowski , Dave Young Subject: Re: [bug, 2.6.26-rc4/rc5] sporadic bootup crashes in blk_lookup_devt()/prepare_namespace() Message-ID: <20080610030959.GD6796@suse.de> References: <20080609080312.GA32458@elte.hu> <20080609020623.b6727f2b.akpm@linux-foundation.org> <19f34abd0806090209l541d93c6jaba2704314b34418@mail.gmail.com> <20080609133426.GB20194@cs181133002.pp.htv.fi> <19f34abd0806090658v54f3a912n2ed30ad6cc20d00@mail.gmail.com> <19f34abd0806090728s3b3fdbeq7dd3d31d02c8f28e@mail.gmail.com> <20080609165757.184724ff@gondolin.boeblingen.de.ibm.com> <20080609153856.GA5149@elte.hu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.16 (2007-06-09) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2883 Lines: 74 On Mon, Jun 09, 2008 at 09:15:40AM -0700, Linus Torvalds wrote: > > > On Mon, 9 Jun 2008, Ingo Molnar wrote: > > > > ah. I suspect that explains the sporadic nature as well: normally there > > is 'some' object at the list address, just with an invalid type. > > Yes. It could cause two kinds of problems: > > - it might end up returning the wrong 'dev_t'. This is unlikely, since we > only have two cases: the working whole-disk case, and the case where we > find a partition. > > But if we find a partition, we'd still get the right dev_t *most* of > the time, because we'd first get called with "part=0", and then we have > > if (part < disk->minors) > devt = MKDEV(MAJOR(dev->devt), > MINOR(dev->devt) + part); > break; > > where we would only fail if that conditional statement would be untrue > (and then we'd incorrectly return MKDEV(0,0)). Otherwise, 'devt' ends > up being correct anyway. > > So one effect of this bug would be that it would use the random > "disk->minors" value to either return the right devt, or return one > that is all zeroes. But if we return the all-zeroes case, then > init/do_mounts.c will just try again, this time with the numbers > removed, and now it wouldn't hit the "strcmp()" on any partition, and > the next time around it would find a disk and work again. > > So this is a bug, but it's one that essentially is hidden by the > caller. > > - The other alternative is that the bogus "disk->minors" thing would > cause a page fault. This would only happen if the partition allocation > was the first thing in a page, and the previous page was unused, and > you had DEBUG_PAGEALLOC enabled. > > This is obviously the case you saw. > > My trivial fix makes it ignore partitions entirely. > > We *could* (and perhaps should) do something slightly more involved > instead, which actually uses a partition if it's there). Like this. That > would avoid my one nagging worry (that some clever usage makes partitions > with a different numbering or without a base block device). > > And this is all still ignoring the locking issue, of course. It would be > trivial to just remove the block_class_lock, and change > > mutex_[un]lock(&block_class_lock); > > into > > down|up(&block_class.sem); The locking for struct class has turned into a mutex in the -next tree already, but I have left the block_class_lock alone for the moment. Now that I have also cleaned up the places in the /proc files where we grabbed it, I think it might be safe to remove, I'll poke at that tomorrow. thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/