Date: Sat, 5 Aug 2006 12:54:11 +0200
From: "Michal Piotrowski"
To: "Dave Jones", "Linus Torvalds", "Michal Piotrowski", LKML
Subject: Re: 2.6.18-rc3-g3b445eea BUG: warning at /usr/src/linux-git/kernel/cpu.c:51/unlock_cpu_hotplug()
Message-ID: <6bffcb0e0608050354k4dd0bb0ep337216e984ce41d7@mail.gmail.com>
In-Reply-To: <20060805064727.GF13393@redhat.com>

On 05/08/06, Dave Jones wrote:
> On Fri, Aug 04, 2006 at 10:49:47PM -0400, Dave Jones wrote:
> > This trace now makes a lot more sense to me.
>
> > CPU1 called lock_cpu_hotplug() for app cpuspeed.
recursive_depth=0
> > [] show_trace_log_lvl+0x58/0x152
> > [] show_trace+0xd/0x10
> > [] dump_stack+0x19/0x1b
> > [] lock_cpu_hotplug+0x39/0xbf
> > [] store_scaling_governor+0x142/0x1a3
> > [] store+0x37/0x48
> > [] sysfs_write_file+0xab/0xd1
> > [] vfs_write+0xab/0x157
> > [] sys_write+0x3b/0x60
> > [] sysenter_past_esp+0x56/0x8d
> > cpuspeed acquired cpu_bitmask_lock
> >
> > CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
> > [] show_trace_log_lvl+0x58/0x152
> > [] show_trace+0xd/0x10
> > [] dump_stack+0x19/0x1b
> > [] lock_cpu_hotplug+0x39/0xbf
> > [] __create_workqueue+0x52/0x122
> > [] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
> > [] __cpufreq_governor+0x57/0xd8
> > [] __cpufreq_set_policy+0x14e/0x1bc
> > [] store_scaling_governor+0x159/0x1a3
> > [] store+0x37/0x48
> > [] sysfs_write_file+0xab/0xd1
> > [] vfs_write+0xab/0x157
> > [] sys_write+0x3b/0x60
> > [] sysenter_past_esp+0x56/0x8d
> > Lukewarm IQ detected in hotplug locking
> > BUG: warning at kernel/cpu.c:46/lock_cpu_hotplug()
>
> So when we write to sysfs to set the governor, we end up in store_scaling_governor()
> which takes the hotplug lock, and then calls off into the governor to let it
> do its thing. Part of ondemand's "thing" is to create a workqueue.
> Unfortunately, __create_workqueue also takes the hotplug lock.
>
> Creating a variant of __create_workqueue that doesn't take the lock
> seems really nasty.
>
> We could remove the locking from store_scaling_governor() and make the governors
> themselves have to do the locking, but I'm not sure that's entirely safe.
>
> We could do something really disgusting like ...
>
>	unlock_cpu_hotplug()
>	...
>	create_workqueue()
>	...
>	lock_cpu_hotplug()
>
> in ondemand, which opens up a tiny race window, but as ugly as it is,
> looks to be the best solution of the bunch right now.
>
> Comments?
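
To make the double acquisition concrete, here is a minimal userspace sketch. It is not kernel code: it assumes a POSIX error-checking mutex is a fair stand-in for cpu_bitmask_lock, and every `_model` name is a hypothetical helper invented for illustration. It only shows the call shape described above (the sysfs store path takes the lock, then the workqueue creation path takes it again) next to the drop-and-retake workaround.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for cpu_bitmask_lock. */
static pthread_mutex_t hotplug_lock_model;

/* Set up an error-checking mutex so a second lock by the owner
 * returns EDEADLK instead of hanging. */
static void init_model(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	assert(err == 0);
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(err == 0);
	err = pthread_mutex_init(&hotplug_lock_model, &attr);
	assert(err == 0);
	(void)err;
	pthread_mutexattr_destroy(&attr);
}

/* Models the workqueue creation path, which takes the lock itself. */
static int create_workqueue_model(void)
{
	int err = pthread_mutex_lock(&hotplug_lock_model);

	if (err)
		return err;	/* EDEADLK if the caller already holds it */
	pthread_mutex_unlock(&hotplug_lock_model);
	return 0;
}

/* Models the sysfs store path: lock held across the governor call. */
static int set_governor_nested(void)
{
	int err;

	pthread_mutex_lock(&hotplug_lock_model);
	err = create_workqueue_model();	/* second acquisition attempt */
	pthread_mutex_unlock(&hotplug_lock_model);
	return err;
}

/* Models the workaround: drop the lock, create, then retake it. */
static int set_governor_drop_and_retake(void)
{
	int err;

	pthread_mutex_lock(&hotplug_lock_model);
	pthread_mutex_unlock(&hotplug_lock_model);	/* race window opens */
	err = create_workqueue_model();
	pthread_mutex_lock(&hotplug_lock_model);	/* race window closes */
	pthread_mutex_unlock(&hotplug_lock_model);
	return err;
}
```

After a single call to init_model(), set_governor_nested() returns EDEADLK and set_governor_drop_and_retake() returns 0. Between the unlock and the relock in the second variant nothing protects the caller's state, which is the race window mentioned above.
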
>
> The really sad part is this is completely unrelated to the original bug reported
> in this thread, which shows just how widespread this braindamage is.
> Michal's traces really don't scream anything obvious to me.
> (Though given it took me 4 hours to decode my own traces above, this is no
> real sign of how big a problem this might be).
>
> Michal, could you apply this diff.. http://lkml.org/lkml/diff/2006/8/4/381/1
> (change the '120' to '60' first), and send me the debug spew that you get?
> You'll have to wait until a minute of uptime has passed. Oh, and edit
> include/linux/jiffies.h to change INITIAL_JIFFIES to '0'.

p4-clockmod: P4/Xeon(TM) CPU On-Demand Clock Modulation available
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
BUG: warning at /usr/src/linux-git/kernel/cpu.c:69/unlock_cpu_hotplug()
 [] unlock_cpu_hotplug+0x63/0xb2
 [] stop_machine_run+0x2e/0x34
 [] sys_init_module+0x15a0/0x178a
 [] do_sync_read+0xb6/0xf1
 [] sysenter_past_esp+0x56/0x79
ip_conntrack version 2.4 (8192 buckets, 65536 max) - 224 bytes per conntrack
NET: Registered protocol family 17
skge eth0: enabling interface
skge eth0: Link is up at 100 Mbps, full duplex, flow control tx and rx
w83627hf 9191-0290: Reading VID from GPIO5
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
audit(1154774854.770:5): avc: denied { write } for pid=1360 comm="cpuspeed" name="cpufreq" dev=sysfs ino=4863 scontext=system_u:system_r:cpuspeed_t:s0 tcontext=system_u:object_r:sysfs_t:s0 tclass=dir
eth0: no IPv6 routers present
CPU0 called lock_cpu_hotplug() for app amarokapp. recursive_depth=0
 [] lock_cpu_hotplug+0x36/0xb9
 [] sched_getaffinity+0xf/0x83
 [] sys_sched_getaffinity+0x1f/0x41
 [] sysenter_past_esp+0x56/0x79
amarokapp acquired cpu_bitmask_lock
CPU0 called unlock_cpu_hotplug() for app amarokapp.
recursive_depth=0
 [] unlock_cpu_hotplug+0x34/0xb2
 [] sched_getaffinity+0x64/0x83
 [] sys_sched_getaffinity+0x1f/0x41
 [] sysenter_past_esp+0x56/0x79
amarokapp released cpu_bitmask_lock
CPU0 called lock_cpu_hotplug() for app amarokapp. recursive_depth=0
 [] lock_cpu_hotplug+0x36/0xb9
 [] sched_setaffinity+0xf/0xd5
 [] sys_sched_setaffinity+0x3b/0x41
 [] sysenter_past_esp+0x56/0x79
amarokapp acquired cpu_bitmask_lock
CPU0 called unlock_cpu_hotplug() for app amarokapp. recursive_depth=0
 [] unlock_cpu_hotplug+0x34/0xb2
 [] sched_setaffinity+0xce/0xd5
 [] sys_sched_setaffinity+0x3b/0x41
 [] sysenter_past_esp+0x56/0x79
amarokapp released cpu_bitmask_lock

>
> Dave
>
> --
> http://www.codemonkey.org.uk
>

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)