Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754272Ab1CYQZ7 (ORCPT ); Fri, 25 Mar 2011 12:25:59 -0400 Received: from mail-qy0-f174.google.com ([209.85.216.174]:32772 "EHLO mail-qy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753998Ab1CYQZ6 convert rfc822-to-8bit (ORCPT ); Fri, 25 Mar 2011 12:25:58 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; b=BPnFy0SID4/PrDmMiw/WjWBV67nynl+VwpjDc2yRzmuY+ku3eQmgOtrIopWo3+RFpU DmHFd/CCXLkG5bjWGgSVwVBS93j/4XL2KaGrJLR4HTbhlgmbic3LdLlu9F0xPnbJkj4B yPt5t4j0K1XVz9r3Z546U002cwpguF5NrlpYg= MIME-Version: 1.0 Reply-To: sedat.dilek@gmail.com In-Reply-To: <20110325155516.GA27484@feather> References: <20110325155516.GA27484@feather> Date: Fri, 25 Mar 2011 17:25:56 +0100 Message-ID: Subject: Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4 related?) From: Sedat Dilek To: Josh Triplett Cc: linux-next , LKML , Stephen Rothwell , Randy Dunlap , "Theodore Ts'o" , Jens Axboe , Tejun Heo , Al Viro , "Paul E. McKenney" , Nick Piggin Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8459 Lines: 154 On Fri, Mar 25, 2011 at 4:55 PM, Josh Triplett wrote: > On Fri, Mar 25, 2011 at 02:05:33PM +0100, Sedat Dilek wrote: >> On Fri, Mar 25, 2011 at 11:16 AM, Sedat Dilek >> wrote: >> > right after I have finished building a new linux-next kernel, booting >> > into desktop and archiving my build-tree (ext4) as tarball to an >> > external USB harddisk (partition there is ext3). >> > ( Yesterday, I have seen similiar call-traces in my logs, but it was >> > hard to reproduce [1]. ) >> > I am unsure from where the problem aroses, if you have a hint, let me know. >> > >> > Regards, >> > - Sedat - >> > >> > [1] http://lkml.org/lkml/2011/3/24/268 >> > >> > P.S.: Attached are the dmesg outputs and my kernel-config >> > >> >> I turned off the notebook for about 2hrs to avoid thermal problems and >> hoax reports. >> Jumped into desktop and started an archive job as 1st job while doing daily job. >> Yeah, it is reproducible. > [...] >> [  212.453822] EXT3-fs (sdb5): mounted filesystem with ordered data mode >> [  273.224044] INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies) > > 15000 jiffies matches this 60-second gap, assuming you use HZ=250. > Yeah, CONFIG_HZ_250=y. Change HZ value? Might be worth to change (new) default CONFIG_RCU_CPU_STALL_TIMEOUT=60 ? Increase/Decrease? Can I change the value from user-space? If yes, how? Any other hints? $ egrep -i '_HZ|_RCU' /boot/config-2.6.38-next20110325-2-686-iniza CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set # CONFIG_RCU_TRACE is not set CONFIG_RCU_FANOUT=32 # CONFIG_RCU_FANOUT_EXACT is not set CONFIG_RCU_FAST_NO_HZ=y # CONFIG_TREE_RCU_TRACE is not set CONFIG_NO_HZ=y # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_300 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=250 # CONFIG_SPARSE_RCU_POINTER is not set # CONFIG_RCU_TORTURE_TEST is not set CONFIG_RCU_CPU_STALL_TIMEOUT=60 >> [  273.224059] sending NMI to all CPUs: >> [  273.224074] NMI backtrace for cpu 0 >> [  273.224081] Modules linked in: ext3 jbd bnep rfcomm bluetooth aes_i586 aes_generic binfmt_misc ppdev acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace lp cpufreq_stats cpufreq_conservative fuse snd_intel8x0 snd_intel8x0m snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm radeon thinkpad_acpi snd_seq_midi pcmcia ttm snd_rawmidi snd_seq_midi_event drm_kms_helper yenta_socket snd_seq pcmcia_rsrc drm pcmcia_core joydev snd_timer snd_seq_device snd i2c_algo_bit tpm_tis shpchp i2c_i801 tpm nsc_ircc irda snd_page_alloc soundcore pci_hotplug rng_core i2c_core tpm_bios psmouse crc_ccitt nvram parport_pc pcspkr parport evdev battery video ac processor power_supply serio_raw button arc4 ecb ath5k ath mac80211 cfg80211 rfkill autofs4 ext4 mbcache jbd2 crc16 dm_mod usbhid hid usb_storage uas sg sd_mod sr_mod crc_t10dif cdrom ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore scsi_mod thermal e1000 thermal_sys floppy [last unloaded: scsi_wait_scan] >> [  273.224367] >> [  273.224377] Pid: 0, comm: swapper Not tainted 2.6.38-next20110325-2-686-iniza #1 IBM 2374SG6/2374SG6 >> [  273.224397] EIP: 0060:[] EFLAGS: 00000807 CPU: 0 >> [  273.224414] EIP is at delay_tsc+0x16/0x5e >> [  273.224424] EAX: 00090d42 EBX: 00002710 ECX: c133faf5 EDX: 00090d41 >> [  273.224435] ESI: 00000000 EDI: 00090d42 EBP: f5819e9c ESP: f5819e8c >> [  273.224445]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >> [  273.224458] Process swapper (pid: 0, ti=f5818000 task=c13e3fa0 task.ti=c13b6000) >> [  273.224466] Stack: >> [  273.224472]  00090d41 00002710 c13ee580 c13ee600 f5819ea4 c115149f f5819eac c11514bb >> [  273.224497]  f5819eb8 c1016532 c13ee580 f5819ed4 c1078dc1 c134e61e c134e6c2 00000000 >> [  273.224520]  00003a98 f5c03488 f5819ee8 c1078e36 00000000 00000000 c13e3fa0 f5819ef4 >> [  273.224544] Call Trace: >> [  273.224559]  [] __delay+0x9/0xb >> [  273.224571]  [] __const_udelay+0x1a/0x1c >> [  273.224590]  [] arch_trigger_all_cpu_backtrace+0x50/0x62 >> [  273.224608]  [] check_cpu_stall+0x58/0xb8 >> [  273.224622]  [] __rcu_pending+0x15/0xc4 >> [  273.224637]  [] rcu_check_callbacks+0x6d/0x93 >> [  273.224652]  [] update_process_times+0x2d/0x58 >> [  273.224666]  [] tick_sched_timer+0x6b/0x9a >> [  273.224682]  [] __run_hrtimer+0x9c/0x111 >> [  273.224694]  [] ? tick_sched_timer+0x0/0x9a >> [  273.224708]  [] hrtimer_interrupt+0xd6/0x1bb >> [  273.224727]  [] tick_do_broadcast.constprop.4+0x38/0x6a >> [  273.224741]  [] tick_handle_oneshot_broadcast+0xad/0xe1 >> [  273.224757]  [] ? handle_level_irq+0x0/0x63 >> [  273.224772]  [] timer_interrupt+0x15/0x1c >> [  273.224785]  [] handle_irq_event_percpu+0x4e/0x164 >> [  273.224799]  [] ? handle_level_irq+0x0/0x63 >> [  273.224811]  [] handle_irq_event+0x36/0x51 >> [  273.224824]  [] ? handle_level_irq+0x0/0x63 >> [  273.224837]  [] handle_level_irq+0x4d/0x63 >> [  273.224845]   >> [  273.224857]  [] ? do_IRQ+0x35/0x80 >> [  273.224871]  [] ? common_interrupt+0x30/0x38 >> [  273.224886]  [] ? destroy_worker+0x52/0x6c >> [  273.224922]  [] ? arch_local_irq_enable+0x5/0xb [processor] >> [  273.224947]  [] ? acpi_idle_enter_simple+0x100/0x138 [processor] >> [  273.224964]  [] ? cpuidle_idle_call+0xc2/0x137 >> [  273.224978]  [] ? cpu_idle+0x89/0xa3 >> [  273.224995]  [] ? rest_init+0x58/0x5a >> [  273.225008]  [] ? start_kernel+0x315/0x31a >> [  273.225022]  [] ? i386_start_kernel+0xa2/0xaa >> [  273.225029] Code: e5 e8 d6 ff ff ff 5d c3 55 89 e5 8d 04 80 e8 c9 ff ff ff 5d c3 55 89 e5 57 89 c7 56 53 52 64 8b 35 04 20 47 c1 8d 76 00 0f ae e8 6b ff ff ff 89 c3 8d 76 00 0f ae e8 e8 5e ff ff ff 89 c2 29 >> [  273.225154] Call Trace: >> [  273.225166]  [] __delay+0x9/0xb >> [  273.225178]  [] __const_udelay+0x1a/0x1c >> [  273.225192]  [] arch_trigger_all_cpu_backtrace+0x50/0x62 >> [  273.225207]  [] check_cpu_stall+0x58/0xb8 >> [  273.225220]  [] __rcu_pending+0x15/0xc4 >> [  273.225234]  [] rcu_check_callbacks+0x6d/0x93 >> [  273.225247]  [] update_process_times+0x2d/0x58 >> [  273.225260]  [] tick_sched_timer+0x6b/0x9a >> [  273.225274]  [] __run_hrtimer+0x9c/0x111 >> [  273.225286]  [] ? tick_sched_timer+0x0/0x9a >> [  273.225300]  [] hrtimer_interrupt+0xd6/0x1bb >> [  273.225316]  [] tick_do_broadcast.constprop.4+0x38/0x6a >> [  273.225330]  [] tick_handle_oneshot_broadcast+0xad/0xe1 >> [  273.225345]  [] ? handle_level_irq+0x0/0x63 >> [  273.225358]  [] timer_interrupt+0x15/0x1c >> [  273.225370]  [] handle_irq_event_percpu+0x4e/0x164 >> [  273.225384]  [] ? handle_level_irq+0x0/0x63 >> [  273.225396]  [] handle_irq_event+0x36/0x51 >> [  273.225409]  [] ? handle_level_irq+0x0/0x63 >> [  273.225421]  [] handle_level_irq+0x4d/0x63 >> [  273.225429]    [] ? do_IRQ+0x35/0x80 >> [  273.225450]  [] ? common_interrupt+0x30/0x38 >> [  273.225464]  [] ? destroy_worker+0x52/0x6c >> [  273.225493]  [] ? arch_local_irq_enable+0x5/0xb [processor] >> [  273.225517]  [] ? acpi_idle_enter_simple+0x100/0x138 [processor] >> [  273.225532]  [] ? cpuidle_idle_call+0xc2/0x137 >> [  273.225545]  [] ? cpu_idle+0x89/0xa3 >> [  273.225559]  [] ? rest_init+0x58/0x5a >> [  273.225571]  [] ? start_kernel+0x315/0x31a >> [  273.225584]  [] ? i386_start_kernel+0xa2/0xaa > > Interesting.  Looks like RCU detected a stall while the CPU sits in > cpu_idle.  That *shouldn't* happen... > The 2nd dmesg showed me (as there are no more VFS/EXT4 call traces) the problem could occur from RCU stuff. > - Josh Triplett > Regards, - Sedat - -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/