Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755153AbYLGW0z (ORCPT ); Sun, 7 Dec 2008 17:26:55 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753121AbYLGW0r (ORCPT ); Sun, 7 Dec 2008 17:26:47 -0500 Received: from ey-out-2122.google.com ([74.125.78.24]:24141 "EHLO ey-out-2122.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753024AbYLGW0q (ORCPT ); Sun, 7 Dec 2008 17:26:46 -0500 Message-ID: Date: Sun, 7 Dec 2008 23:26:43 +0100 From: "Kay Sievers" To: "Alan Cox" Subject: Re: Runaway loop with the current git. Cc: "Evgeniy Polyakov" , "Herbert Xu" , linux-kernel@vger.kernel.org, "Linux Crypto Mailing List" In-Reply-To: <20081207200008.5af2d1e9@lxorguk.ukuu.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20081207175245.2d074299@lxorguk.ukuu.org.uk> <20081207175438.GA3181@ioremap.net> <20081207180302.339bf58d@lxorguk.ukuu.org.uk> <20081207181559.63549cfc@lxorguk.ukuu.org.uk> <20081207183112.3ef31caa@lxorguk.ukuu.org.uk> <20081207200008.5af2d1e9@lxorguk.ukuu.org.uk> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3735 Lines: 83 On Sun, Dec 7, 2008 at 21:00, Alan Cox wrote: >> > max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT); >> > atomic_inc(&kmod_concurrent); >> > if (atomic_read(&kmod_concurrent) > max_modprobes) { >> > /* We may be blaming an innocent here, but unlikely */ >> > if (kmod_loop_msg++ < 5) >> > printk(KERN_ERR >> > "request_module: runaway loop modprobe >> > %s\n", module_name); >> > atomic_dec(&kmod_concurrent); >> > return -ENOMEM; >> > } >> > >> > Happy now. Print it out, share it with friends, find someone who can read >> > C if you are stuck. >> >> It does not work, that's all. > > It works for me. It runs in a loop here, just like the $subject says, and bko#12153 says. >> Reproduce the bug and look at it for yourself. > > Well since you've got a reproducer and this code works for me (I've tested > it just fine), why don't you go and reproduce the problem then post a fix > to that code I quoted instead of all this reordering rubbish. If you fix > this code not only won't you risk all the mess from re-ordering > initialisations around the kernel but you'll fix non console related > looping which you imply is also broken as you claim that code doesn't > work for you. > > If I deliberately break my module utils I see a sequence of modprobes > which then hits kmod_concurrent limit then causes a -ENOMEM back to > userspace which then fails the file open. The bug report also shows the > printk is displayed so the runaway loop *was* detected and the code paths > taken which stopped the loop. Sure, I can try to find out why the limiter does does not work here. > I get open -> modprobe -> open -> modprobe -> open -> modprobe ... -> > open fail, then open fail, open fail, open fail, open fail back to the > first modprobe exiting. > > Your proposal to keep the current recent modprobe parameter strings would > shorten the amount of recursion but it wouldn't change the result that I > can see. If I open /dev/console early and wrongly from a modprobe then I > ultimately get a failing open just as I should do. No, my proposal would prevent the kernel to call any modprobe for the special built-in device 5:1, it shortens the recursion to zero, which sounds like the right fix. It's really weird to fix the symptoms, instead of the real problem. The kernel forks a binary with a broken environment. Even the in-kernel-tree cpio generator creates /dev/console, and accessing it from a kernel forked binary makes it crash. The kernel must provide a sane environment for stuff that it calls, I think, that is pretty obvious. Device 5:1 is a core device, which never makes sense to call modprobe for it. No other later driver will ever register that dev_t, so we should do it before calling out to other random userspace stuff which triggers the kernel to go crazy with its own devices. My proposal was to connect 5:1 to the kobject map at the same time as we register the whole tty class. We allocate the class, create sysfs entries, run /sbin/modprobe at that time, why shouldn't we just register the dev_t that time too? It properly prevents the needless userspace "driver searching" for a well known, already created, but not properly registered core device. It makes the process environment sane, and prevents the root cause of the bug, and does not just limit the damage. Thanks, Kay -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/