Date: Thu, 31 Jul 2008 21:52:33 +0200
From: "Dmitry Adamushko"
To: "Ingo Molnar"
Subject: Re: Oops in microcode sysfs registration,
Cc: "Alistair John Strachan", "Pekka Paalanen", "Linus Torvalds",
    "Linux Kernel Mailing List", shaohua.li@intel.com,
    tigran@aivazian.fsnet.co.uk, "Thomas Gleixner", "Steven Rostedt",
    "Max Krasnyansky", "Peter Zijlstra"
In-Reply-To: <20080731165650.GJ26393@elte.hu>
References: <200807291457.58408.alistair@devzero.co.uk>
    <20080729192214.2d3a4ca5@daedalus.pq.iki.fi>
    <200807291750.41169.alistair@devzero.co.uk>
    <20080731165650.GJ26393@elte.hu>

2008/7/31 Ingo Molnar:
>
> * Dmitry Adamushko wrote:
>
>> 2008/7/30 Dmitry Adamushko:
>> > 2008/7/29 Alistair John Strachan:
>> >> On Tuesday 29 July 2008 17:22:14 Pekka Paalanen wrote:
>> >>> > Also, I'm sure this is reproducible without the NVIDIA garbage, but I was
>> >>> > too lazy to test it. If you want me to repeat the experiment without the
>> >>> > driver I would be more than happy to do so.
>> >>>
>> >>> I'm not sure people are willing to look into this without a clean report,
>> >>> so this would be cool. There's even a test module for mmiotrace in the
>> >>> kernel, but I doubt it would make difference to use it or not, when trying
>> >>> to reproduce the crash without the blob.
>> >>
>> >> Of course, and I should have attempted to reproduce without the driver.
>> >> Fortunately that was easy: it is not an NVIDIA driver bug.
>> >>
>> >> Steps to reproduce: have CONFIG_MICROCODE=y and a suitable Intel
>> >> processor, then do:
>> >>
>> >> echo mmiotrace >/debug/tracing/current_tracer
>> >> echo none >/debug/tracing/current_tracer
>> >>
>> >> And you get this (snipped) oops:
>> >>
>> >> in mmio_trace_init
>> >> mmiotrace: Disabling non-boot CPUs...
>> >> kvm: disabling virtualization on CPU1
>> >> CPU 1 is now offline
>> >> SMP alternatives: switching to UP code
>> >> CPU0 attaching NULL sched-domain.
>> >> CPU1 attaching NULL sched-domain.
>> >> CPU0 attaching NULL sched-domain.
>> >> mmiotrace: CPU1 is down.
>> >> mmiotrace: enabled.
>> >> in mmio_trace_reset
>> >> mmiotrace: Re-enabling CPUs...
>> >> SMP alternatives: switching to SMP code
>> >> Booting processor 1/1 ip 6000
>> >> Initializing CPU#1
>> >> Calibrating delay using timer specific routine.. <6>7204.76 BogoMIPS (lpj=3602381)
>> >> CPU: L1 I cache: 32K, L1 D cache: 32K
>> >> CPU: L2 cache: 4096K
>> >> CPU: Physical Processor ID: 0
>> >> CPU: Processor Core ID: 1
>> >> x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
>> >> CPU1: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz stepping 06
>> >> checking TSC synchronization [CPU#0 -> CPU#1]: passed.
>> >> kvm: enabling virtualization on CPU1
>> >> CPU0 attaching NULL sched-domain.
>> >> Switched to high resolution mode on CPU 1
>> >> CPU0 attaching sched-domain:
>> >> domain 0: span 0-1 level MC
>> >> groups: 0 1
>> >> CPU1 attaching sched-domain:
>> >> domain 0: span 0-1 level MC
>> >> groups: 1 0
>> >> ------------[ cut here ]------------
>> >> Kernel BUG at ffffffff8021a31d [verbose debug info unavailable]
>> >> invalid opcode: 0000 [1] PREEMPT SMP
>> >> CPU 0
>> >> Modules linked in: rfcomm l2cap kvm_intel kvm ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables bridge stp llc acpi_cpufreq freq_table coretemp hwmon
>> >> snd_pcm_oss snd_mixer_oss firewire_sbp2 hci_usb bluetooth arc4 ecb crypto_blkcipher cryptomgr crypto_algapi usbhid zd1211rw mac80211 crypto cfg80211 snd_emu10k1 snd_rawmidi
>> >> snd_ac97_codec ac97_bus sg snd_seq_device snd_hda_intel snd_pcm snd_util_mem snd_timer sr_mod snd_hwdep i2c_i801 ehci_hcd firewire_ohci uhci_hcd snd snd_page_alloc firewire_core
>> >> soundcore r8169 cdrom usbcore i2c_core crc_itu_t
>> >> Pid: 2757, comm: bash Tainted: G A 2.6.27-rc1-damocles #3
>> >> RIP: 0010:[] [] __mc_sysdev_add+0xc3/0x1f1
>> >> RSP: 0018:ffff8800b8905ce8 EFLAGS: 00010297
>> >> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff880080a04000
>> >> RDX: ffffffff8062c680 RSI: 0000000000000003 RDI: ffffffff8059e830
>> >> RBP: ffff8800b8905d48 R08: ffff8800b8904000 R09: ffffffff80229ca4
>> >> R10: ffff8800010247b0 R11: ffff8800bf879de0 R12: 0000000000000018
>> >> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
>> >> FS: 00007f8ddc78f6e0(0000) GS:ffffffff805da200(0000) knlGS:0000000000000000
>> >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> >> CR2: 00007f57cb9b2098 CR3: 00000000b8985000 CR4: 00000000000026e0
>> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> >> Process bash (pid: 2757, threadinfo ffff8800b8904000, task ffff8800bd125640)
>> >> Stack: ffffffff80627040 0000000000000000 0000000000000008 ffffffff8048bb28
>> >> 0000000000000003 ffffffff802ce910 ffff8800b8905d28 0000000000000002
>> >> 00000000ffffffe8 0000000000000001 0000000000000001 ffff880001028418
>> >> Call Trace:
>> >> [] ? sysfs_add_file+0xc/0xe
>> >> [] mc_sysdev_add+0xb/0xd
>> >> [] mc_cpu_callback+0x4b/0x208
>> >> [] ? mce_cpu_callback+0x3e/0xbc
>> >> [] notifier_call_chain+0x33/0x5b
>> >> [] raw_notifier_call_chain+0xf/0x11
>> >> [] _cpu_up+0xce/0x119
>> >> [] cpu_up+0x5e/0x8a
>> >> [] disable_mmiotrace+0xfe/0x173
>> >> [] mmio_trace_reset+0x2d/0x44
>> >> [] tracing_set_trace_write+0xd3/0x10f
>> >> [] ? filp_close+0x67/0x72
>> >> [] vfs_write+0xa7/0xe1
>> >> [] sys_write+0x47/0x6f
>> >> [] system_call_fastpath+0x16/0x1b
>> >> [ 68.405002]
>> >> [ 68.405002]
>> >> Code: e8 59 80 e8 fd 69 26 00 48 c7 c2 80 c6 62 80 48 8b 05 c0 00 3c 00 48 8b 04 d8 48 8b 48 08 65 8b 04 25 24 00 00 00 44 39 e8 74 04 <0f> 0b eb fe 4c 8d 04 0a 41 c7 84 24 7c 36 64 80 00
>> >> 00 00 00 41
>> >> RIP [] __mc_sysdev_add+0xc3/0x1f1
>> >> RSP
>> >> ---[ end trace ee9c9240024cb48c ]---
>> >>
>> >> I've replaced the originally tainted dmesg with this new clean one, so
>> >> there's no proprietary smell about it :-)
>> >
>> > Yes, it's kind of a known issue. Take a look at this explanation:
>> > http://lkml.org/lkml/2008/7/24/260
>> >
>> > There were a few related discussions in other threads (mainly, Max
>> > Krasnyansky and I were asking for additional info on possible
>> > requirements from the 'microcode' driver...) heh, I think, we'd be
>> > better off just fixing it one way or another.
>>
>> does a patch below fix it for you?
>> [ not really what we wanted ]
>>
>> (non-white-space-damaged version is enclosed)
>
> could you please send this patch with a changelog, explanation, etc.?

Now having thought a bit more on that issue, I tend to think that
this patch is not all that nice (so I agree with Max here).
The root problem is the way set_cpus_allowed_ptr() is used in
microcode's cpu-hotplug handler. With cpu_active_map in place,
set_cpus_allowed_ptr() can't migrate a task to the soon-to-be-online
cpu from within a CPU_ONLINE handler (more details here:
http://lkml.org/lkml/2008/7/24/260).

Basically, this patch marks a 'cpu' as available for other tasks to be
migrated to it before sending the CPU_ONLINE notification to
subscribers...

[ now, there can be CPU_ONLINE handlers that have something to do with
enabling migration/load-balancing, e.g. migration_call(), although it
has the highest prio and is supposed to run first in the chain ]

In another thread, I've asked whether doing the 'microcode update' in
start_secondary() (or even at the beginning of idle_cpu()) would be
better:

pros:
- it's done as early as possible (no other tasks have started running
  on the cpu yet);
- no actions in cpu-hotplug;

cons:
- the microcode sub-system becomes visible outside of microcode.c
  _but_ it's an arch-specific part anyway + with the object-oriented
  re-work (which is in -tip), I think it wouldn't be that bad.

Alternatives:
- delayed 'microcode' update -> scheduled to a 'workqueue' (cons: it's
  not as early as possible);
- Max suggested a combination of IPI + some work (request_firmware())
  from the cpu-hotplug handler itself. But I think it's quite a complex
  scheme (and maybe prone to other problems).

What do you think?

>
> Ingo
>

--
Best regards,
Dmitry Adamushko