LinuxLists.cc - Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4 related?)

2011-03-25 10:16:41

Subject: Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4 related?)

Hi,

I got the call traces right after I finished building a new linux-next
kernel, booted into the desktop and started archiving my build tree (ext4)
as a tarball to an external USB hard disk (the partition there is ext3).
(Yesterday I saw similar call traces in my logs, but they were hard to
reproduce [1].)
I am unsure where the problem comes from; if you have a hint, let me know.

Regards,
- Sedat -

[1] http://lkml.org/lkml/2011/3/24/268

P.S.: Attached are the dmesg outputs and my kernel-config

Attachments:

dmesg.txt (92.88 kB)
config-2.6.38-next20110325-2-686-iniza (125.34 kB)

2011-03-25 13:05:36

On Fri, Mar 25, 2011 at 08:42:14PM +0100, Sedat Dilek wrote:
> On Fri, Mar 25, 2011 at 6:48 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Fri, Mar 25, 2011 at 06:40:38PM +0100, Sedat Dilek wrote:
> >> On Fri, Mar 25, 2011 at 5:51 PM, Sedat Dilek <[email protected]> wrote:
> >> > On Fri, Mar 25, 2011 at 5:42 PM, Paul E. McKenney
> >> > <[email protected]> wrote:
> >> >> On Fri, Mar 25, 2011 at 08:55:16AM -0700, Josh Triplett wrote:
> >> >>> On Fri, Mar 25, 2011 at 02:05:33PM +0100, Sedat Dilek wrote:
> >> >>> > On Fri, Mar 25, 2011 at 11:16 AM, Sedat Dilek
> >> >>> > <[email protected]> wrote:
> >> >>> > > I got the call traces right after I finished building a new linux-next
> >> >>> > > kernel, booted into the desktop and started archiving my build tree (ext4)
> >> >>> > > as a tarball to an external USB hard disk (the partition there is ext3).
> >> >>> > > (Yesterday I saw similar call traces in my logs, but they were hard to
> >> >>> > > reproduce [1].)
> >> >>> > > I am unsure where the problem comes from; if you have a hint, let me know.
> >> >>> > >
> >> >>> > > Regards,
> >> >>> > > - Sedat -
> >> >>> > >
> >> >>> > > [1] http://lkml.org/lkml/2011/3/24/268
> >> >>> > >
> >> >>> > > P.S.: Attached are the dmesg outputs and my kernel-config
> >> >>> > >
> >> >>> >
> >> >>> > I turned off the notebook for about 2 hours to rule out thermal problems
> >> >>> > and false reports.
> >> >>> > Then I booted into the desktop and started an archive job first thing, while doing my daily work.
> >> >>> > Yeah, it is reproducible.
> >> >>> [...]
> >> >>> > [  212.453822] EXT3-fs (sdb5): mounted filesystem with ordered data mode
> >> >>> > [  273.224044] INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies)
> >> >>>
> >> >>> 15000 jiffies matches this 60-second gap, assuming you use HZ=250.
> >> >>>
> >> >>> > [  273.224059] sending NMI to all CPUs:
> >> >>> > [  273.224074] NMI backtrace for cpu 0
> >> >>> > [  273.224081] Modules linked in: ext3 jbd bnep rfcomm bluetooth aes_i586 aes_generic binfmt_misc ppdev acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace lp cpufreq_stats cpufreq_conservative fuse snd_intel8x0 snd_intel8x0m snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm radeon thinkpad_acpi snd_seq_midi pcmcia ttm snd_rawmidi snd_seq_midi_event drm_kms_helper yenta_socket snd_seq pcmcia_rsrc drm pcmcia_core joydev snd_timer snd_seq_device snd i2c_algo_bit tpm_tis shpchp i2c_i801 tpm nsc_ircc irda snd_page_alloc soundcore pci_hotplug rng_core i2c_core tpm_bios psmouse crc_ccitt nvram parport_pc pcspkr parport evdev battery video ac processor power_supply serio_raw button arc4 ecb ath5k ath mac80211 cfg80211 rfkill autofs4 ext4 mbcache jbd2 crc16 dm_mod usbhid hid usb_storage uas sg sd_mod sr_mod crc_t10dif cdrom ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore scsi_mod thermal e1000 thermal_sys floppy [last unloaded: scsi_wait_scan]
> >> >>> > [  273.224367]
> >> >>> > [  273.224377] Pid: 0, comm: swapper Not tainted 2.6.38-next20110325-2-686-iniza #1 IBM 2374SG6/2374SG6
> >> >>> > [  273.224397] EIP: 0060:[<c11514f0>] EFLAGS: 00000807 CPU: 0
> >> >>> > [  273.224414] EIP is at delay_tsc+0x16/0x5e
> >> >>> > [  273.224424] EAX: 00090d42 EBX: 00002710 ECX: c133faf5 EDX: 00090d41
> >> >>> > [  273.224435] ESI: 00000000 EDI: 00090d42 EBP: f5819e9c ESP: f5819e8c
> >> >>> > [  273.224445]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> >> >>> > [  273.224458] Process swapper (pid: 0, ti=f5818000 task=c13e3fa0 task.ti=c13b6000)
> >> >>> > [  273.224466] Stack:
> >> >>> > [  273.224472]  00090d41 00002710 c13ee580 c13ee600 f5819ea4 c115149f f5819eac c11514bb
> >> >>> > [  273.224497]  f5819eb8 c1016532 c13ee580 f5819ed4 c1078dc1 c134e61e c134e6c2 00000000
> >> >>> > [  273.224520]  00003a98 f5c03488 f5819ee8 c1078e36 00000000 00000000 c13e3fa0 f5819ef4
> >> >>> > [  273.224544] Call Trace:
> >> >>> > [  273.224559]  [<c115149f>] __delay+0x9/0xb
> >> >>> > [  273.224571]  [<c11514bb>] __const_udelay+0x1a/0x1c
> >> >>> > [  273.224590]  [<c1016532>] arch_trigger_all_cpu_backtrace+0x50/0x62
> >> >>> > [  273.224608]  [<c1078dc1>] check_cpu_stall+0x58/0xb8
> >> >>> > [  273.224622]  [<c1078e36>] __rcu_pending+0x15/0xc4
> >> >>> > [  273.224637]  [<c10791df>] rcu_check_callbacks+0x6d/0x93
> >> >>> > [  273.224652]  [<c1039c6c>] update_process_times+0x2d/0x58
> >> >>> > [  273.224666]  [<c10509e9>] tick_sched_timer+0x6b/0x9a
> >> >>> > [  273.224682]  [<c1047196>] __run_hrtimer+0x9c/0x111
> >> >>> > [  273.224694]  [<c105097e>] ? tick_sched_timer+0x0/0x9a
> >> >>> > [  273.224708]  [<c1047b38>] hrtimer_interrupt+0xd6/0x1bb
> >> >>> > [  273.224727]  [<c104fca1>] tick_do_broadcast.constprop.4+0x38/0x6a
> >> >>> > [  273.224741]  [<c104fd80>] tick_handle_oneshot_broadcast+0xad/0xe1
> >> >>> > [  273.224757]  [<c1076cc2>] ? handle_level_irq+0x0/0x63
> >> >>> > [  273.224772]  [<c1004215>] timer_interrupt+0x15/0x1c
> >> >>> > [  273.224785]  [<c107536d>] handle_irq_event_percpu+0x4e/0x164
> >> >>> > [  273.224799]  [<c1076cc2>] ? handle_level_irq+0x0/0x63
> >> >>> > [  273.224811]  [<c10754b9>] handle_irq_event+0x36/0x51
> >> >>> > [  273.224824]  [<c1076cc2>] ? handle_level_irq+0x0/0x63
> >> >>> > [  273.224837]  [<c1076d0f>] handle_level_irq+0x4d/0x63
> >> >>> > [  273.224845]  <IRQ>
> >> >>> > [  273.224857]  [<c1003b8d>] ? do_IRQ+0x35/0x80
> >> >>> > [  273.224871]  [<c12ac0f0>] ? common_interrupt+0x30/0x38
> >> >>> > [  273.224886]  [<c10400d8>] ? destroy_worker+0x52/0x6c
> >> >>> > [  273.224922]  [<f87b730f>] ? arch_local_irq_enable+0x5/0xb [processor]
> >> >>> > [  273.224947]  [<f87b7ef5>] ? acpi_idle_enter_simple+0x100/0x138 [processor]
> >> >>> > [  273.224964]  [<c11ebd92>] ? cpuidle_idle_call+0xc2/0x137
> >> >>> > [  273.224978]  [<c1001da3>] ? cpu_idle+0x89/0xa3
> >> >>> > [  273.224995]  [<c128c26c>] ? rest_init+0x58/0x5a
> >> >>> > [  273.225008]  [<c1418722>] ? start_kernel+0x315/0x31a
> >> >>> > [  273.225022]  [<c14180a2>] ? i386_start_kernel+0xa2/0xaa
> >> >>> > [  273.225029] Code: e5 e8 d6 ff ff ff 5d c3 55 89 e5 8d 04 80 e8 c9 ff ff ff 5d c3 55 89 e5 57 89 c7 56 53 52 64 8b 35 04 20 47 c1 8d 76 00 0f ae e8 <e8> 6b ff ff ff 89 c3 8d 76 00 0f ae e8 e8 5e ff ff ff 89 c2 29
> >> >>> > [  273.225154] Call Trace:
> >> >>> > [  273.225166]  [<c115149f>] __delay+0x9/0xb
> >> >>> > [  273.225178]  [<c11514bb>] __const_udelay+0x1a/0x1c
> >> >>> > [  273.225192]  [<c1016532>] arch_trigger_all_cpu_backtrace+0x50/0x62
> >> >>> > [  273.225207]  [<c1078dc1>] check_cpu_stall+0x58/0xb8
> >> >>> > [  273.225220]  [<c1078e36>] __rcu_pending+0x15/0xc4
> >> >>> > [  273.225234]  [<c10791df>] rcu_check_callbacks+0x6d/0x93
> >> >>> > [  273.225247]  [<c1039c6c>] update_process_times+0x2d/0x58
> >> >>> > [  273.225260]  [<c10509e9>] tick_sched_timer+0x6b/0x9a
> >> >>> > [  273.225274]  [<c1047196>] __run_hrtimer+0x9c/0x111
> >> >>> > [  273.225286]  [<c105097e>] ? tick_sched_timer+0x0/0x9a
> >> >>> > [  273.225300]  [<c1047b38>] hrtimer_interrupt+0xd6/0x1bb
> >> >>> > [  273.225316]  [<c104fca1>] tick_do_broadcast.constprop.4+0x38/0x6a
> >> >>> > [  273.225330]  [<c104fd80>] tick_handle_oneshot_broadcast+0xad/0xe1
> >> >>> > [  273.225345]  [<c1076cc2>] ? handle_level_irq+0x0/0x63
> >> >>> > [  273.225358]  [<c1004215>] timer_interrupt+0x15/0x1c
> >> >>> > [  273.225370]  [<c107536d>] handle_irq_event_percpu+0x4e/0x164
> >> >>> > [  273.225384]  [<c1076cc2>] ? handle_level_irq+0x0/0x63
> >> >>> > [  273.225396]  [<c10754b9>] handle_irq_event+0x36/0x51
> >> >>> > [  273.225409]  [<c1076cc2>] ? handle_level_irq+0x0/0x63
> >> >>> > [  273.225421]  [<c1076d0f>] handle_level_irq+0x4d/0x63
> >> >>> > [  273.225429]  <IRQ>  [<c1003b8d>] ? do_IRQ+0x35/0x80
> >> >>> > [  273.225450]  [<c12ac0f0>] ? common_interrupt+0x30/0x38
> >> >>> > [  273.225464]  [<c10400d8>] ? destroy_worker+0x52/0x6c
> >> >>> > [  273.225493]  [<f87b730f>] ? arch_local_irq_enable+0x5/0xb [processor]
> >> >>> > [  273.225517]  [<f87b7ef5>] ? acpi_idle_enter_simple+0x100/0x138 [processor]
> >> >>> > [  273.225532]  [<c11ebd92>] ? cpuidle_idle_call+0xc2/0x137
> >> >>> > [  273.225545]  [<c1001da3>] ? cpu_idle+0x89/0xa3
> >> >>> > [  273.225559]  [<c128c26c>] ? rest_init+0x58/0x5a
> >> >>> > [  273.225571]  [<c1418722>] ? start_kernel+0x315/0x31a
> >> >>> > [  273.225584]  [<c14180a2>] ? i386_start_kernel+0xa2/0xaa
> >> >>>
> >> >>> Interesting. Looks like RCU detected a stall while the CPU sits in
> >> >>> cpu_idle. That *shouldn't* happen...
> >> >>
> >> >> There have been a few of these things recently that turned out to
> >> >> be BIOS misconfigurations, though that would not be the first thing
> >> >> I would suspect if the system had run other versions successfully.
> >> >> Another possibility is that the CPU spent the full time in interrupt.
> >> >> Get an interrupt from the idle loop, stay in interrupt for 60 seconds,
> >> >> get an RCU CPU stall warning.
> >> >>
> >> >> Or I could have somehow inserted a bug in RCU. But I am not seeing
> >> >> this in my testing.
> >> >>
> >> >>                                                 Thanx, Paul
> >> >>
> >> >
> >> > The problems started when I first saw CONFIG_RCU_CPU_STALL_TIMEOUT=60
> >> > in my configs.
> >> >
> >> > This is an old IBM T40p notebook with a Pentium-M (Banias) UP processor.
> >> > IIRC I have flashed the latest BIOS available for this notebook.
> >> >
> >> > [   11.786073] thinkpad_acpi: ThinkPad BIOS 1RETDRWW (3.23 ), EC 1RHT71WW-3.04
> >> > [   11.786111] thinkpad_acpi: IBM ThinkPad T40p, model 2374SG6
> >> >
> >> > As I am still sitting in the dark, it would be very helpful to know if
> >> > I can play with HZ or RCU kernel-config parameters.
> >> > Can I change RCU behaviour from user-space?
> >> >
> >> > - Sedat -
> >> >
> >> > P.S.: Note to myself: Read Documentation/RCU/stallwarn.txt & check
> >> > possible values in lib/Kconfig.debug
> >> >
> >>
> >> OK, I had a deeper look at the RCU (STALL) kernel-configs.
> >>
> >> $ grep RCU /boot/config-2.6.38-next20110323-3-686-iniza | grep STALL
> >> # CONFIG_RCU_CPU_STALL_DETECTOR is not set
> >>
> >> $ grep RCU /boot/config-2.6.38-next20110324-2-686-iniza | grep STALL
> >> # CONFIG_RCU_CPU_STALL_DETECTOR is not set
> >>
> >> $ grep RCU /boot/config-2.6.38-next20110325-2-686-iniza | grep STALL
> >> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> >
> > Yep, you moved from a kernel version that had the stall detector disabled
> > by default to one that enables it by default.
> >
> > But -next has had stall detection enabled by default for a good
> > long time now.
> >
> >> With today's linux-next kernel (next-20110325) I cannot work!
> >> Yesterday's call traces could indeed be a different issue (I am
> >> currently testing with the 2 patches from block-tree [1]).
> >>
> >> Now, I am building a new linux-next kernel with CONFIG_TREE_RCU=y as
> >> recommended in Documentation/RCU/stallwarn.txt file.
> >
> > You had CONFIG_TREE_PREEMPT_RCU=y earlier? Tiny RCU does not have
> > a stall detector.
> >
> >                                                 Thanx, Paul
> >
> >> - Sedat -
> >>
> >> [1] http://lkml.org/lkml/2011/3/25/326
> >
>
> No, and I have here SMP configured.
> Yesterday's RCU and SMP kernel config settings:
>
> # egrep '_RCU|RCU_|_SMP' /boot/config-2.6.38-next20110324-2-686-iniza
> CONFIG_X86_32_SMP=y
> CONFIG_TREE_RCU=y
> # CONFIG_PREEMPT_RCU is not set
> # CONFIG_RCU_TRACE is not set
> CONFIG_RCU_FANOUT=32
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_RCU_FAST_NO_HZ=y
> # CONFIG_TREE_RCU_TRACE is not set
> CONFIG_USE_GENERIC_SMP_HELPERS=y
> CONFIG_SMP=y
> CONFIG_PM_SLEEP_SMP=y
> CONFIG_HAVE_TEXT_POKE_SMP=y
> CONFIG_SCSI_SAS_HOST_SMP=y
> # CONFIG_SPARSE_RCU_POINTER is not set
> # CONFIG_RCU_TORTURE_TEST is not set
> # CONFIG_RCU_CPU_STALL_DETECTOR is not set
>
> IIRC Tiny RCU and SMP are mutually exclusive?
> So, what do you recommend for a UP processor machine?

If you want RCU stall warnings, or if you are building an SMP kernel, it
has to be either TREE_RCU or TREE_PREEMPT_RCU. If you are on UP and don't
care about RCU stall warnings, then either TINY_RCU or TINY_PREEMPT_RCU
will work fine.
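
The rule above can be written down as a small lookup. This is only a sketch of
the advice in this message, not kernel Kconfig logic; the function name
suggested_rcu_flavors and its two boolean parameters are made up for
illustration.

```python
# Sketch of the flavor-selection advice above (hypothetical helper, not kernel code).

TREE_FLAVORS = ("TREE_RCU", "TREE_PREEMPT_RCU")
TINY_FLAVORS = ("TINY_RCU", "TINY_PREEMPT_RCU")


def suggested_rcu_flavors(smp, want_stall_warnings):
    """Return the RCU flavors the advice above points to.

    smp:                 True when building an SMP kernel.
    want_stall_warnings: True when RCU CPU stall warnings are wanted.
    """
    if smp or want_stall_warnings:
        # SMP kernels and stall warnings both need one of the TREE variants.
        return TREE_FLAVORS
    # UP kernel without stall warnings: the TINY variants will work fine.
    return TINY_FLAVORS


if __name__ == "__main__":
    # The configuration quoted in this thread has CONFIG_SMP=y.
    print(suggested_rcu_flavors(smp=True, want_stall_warnings=True))
```

For the SMP configuration quoted in this thread, the helper returns the two
TREE variants regardless of whether stall warnings are wanted.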

I just saw your "Now, I am building a new linux-next kernel with
CONFIG_TREE_RCU=y" and thought that you were hinting that you had
been running with something other than TREE_RCU.

> - Sedat -
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
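
The stall report quoted in this message says t=15000 jiffies, and the remark
that this matches a 60-second gap at HZ=250 is plain arithmetic. A minimal
sketch, with helper names invented for illustration, that also checks the
value against CONFIG_RCU_CPU_STALL_TIMEOUT=60 and the two dmesg timestamps:

```python
# Sketch: relate the stall report's jiffies count to wall-clock seconds.
# The helper names are hypothetical; the numbers come from the quoted report.


def jiffies_to_seconds(jiffies, hz):
    """Convert a jiffies count to seconds for a kernel built with CONFIG_HZ=hz."""
    return jiffies / hz


def stall_timeout_jiffies(timeout_seconds, hz):
    """Jiffies that elapse during a stall timeout given in seconds."""
    return timeout_seconds * hz


if __name__ == "__main__":
    print(jiffies_to_seconds(15000, 250))     # 60.0
    print(stall_timeout_jiffies(60, 250))     # 15000
    # Gap between the EXT3 mount message and the stall message in dmesg.
    print(round(273.224044 - 212.453822, 2))  # 60.77
```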

2011-03-26 08:11:39

On Mon, Mar 28, 2011 at 06:24:36AM -0700, Paul E. McKenney wrote:
> On Mon, Mar 28, 2011 at 02:33:36PM +0200, Sedat Dilek wrote:
> > On Mon, Mar 28, 2011 at 6:08 AM, Paul E. McKenney
> > <[email protected]> wrote:
> > > On Sun, Mar 27, 2011 at 11:48:30PM +0200, Sedat Dilek wrote:
> > >> On Sun, Mar 27, 2011 at 11:32 PM, Paul E. McKenney
> > >> <[email protected]> wrote:
> > >> > On Sun, Mar 27, 2011 at 02:26:15PM +0200, Sedat Dilek wrote:
> > >> >> On Sun, Mar 27, 2011 at 7:07 AM, Paul E. McKenney
> > >> >> <[email protected]> wrote:
> > >> >> > On Sat, Mar 26, 2011 at 08:25:29PM -0700, Paul E. McKenney wrote:
> > >> >> >> On Sun, Mar 27, 2011 at 03:30:34AM +0200, Sedat Dilek wrote:
> > >> >> >> > On Sun, Mar 27, 2011 at 1:09 AM, Paul E. McKenney
> > >> >> >> > <[email protected]> wrote:
> > >> >> >> > > On Sat, Mar 26, 2011 at 11:15:22PM +0100, Sedat Dilek wrote:
> > >> >> >
> > >> >> > [ . . . ]
> > >> >> >
> > >> >> >> > >> But then came RCU :-(.
> > >> >> >> > >
> > >> >> >> > > Well, if it turns out to be a problem in RCU I will certainly apologize.
> > >> >> >> > >
> > >> >> >> >
> > >> >> >> > No, that's not so dramatic.
> > >> >> >> > Dealing with this RCU issue has nice side-effects: I remembered (and
> > >> >> >> > finally did) to use a reduced kernel-config set.
> > >> >> >> > The base for it I created with 'make localmodconfig' and did some
> > >> >> >> > manual fine-tuning afterwards (throw out media, rc, dvd, unneeded FSs,
> > >> >> >> > etc.).
> > >> >> >> > Also, I can use fresh gcc-4.6 (4.6.0-1) from the official Debian repos.
> > >> >> >> >
> > >> >> >> > So, I started building with
> > >> >> >> > "revert-rcu-patches/0001-Revert-rcu-introduce-kfree_rcu.patch".
> > >> >> >> > I will let you know.
> > >> >> >>
> > >> >> >> And please also check for tasks consuming all available CPU.
> > >> >> >
> > >> >> > And I still cannot reproduce with the full RCU stack (but based off of
> > >> >> > 2.6.38 rather than -next). Nevertheless, if you would like to try a
> > >> >> > speculative patch, here you go.
> > >> >>
> > >> >> You are right and my strategy on handling the (possible RCU?) issue is wrong.
> > >> >> Surely, you tested your RCU stuff in your own repo and everything
> > >> >> might be OK on top of stable 2.6.38.
> > >> >> Linux-next gets daily updates from a lot of different trees, so there
> > >> >> might be interferences with other stuff.
> > >> >> Please, understand I am interested in finding out what is the cause
> > >> >> for my issues, my aim is not to blame you.
> > >> >
> > >> > I am not worried about blame, but rather getting the bug fixed. The
> > >> > bug might be in RCU, it might be elsewhere, or it might be a combination
> > >> > of problems in RCU and elsewhere.
> > >> >
> > >> > So the first priority is locating the bug.
> > >> >
> > >> > And that is why I have been asking you over and over to PLEASE take
> > >> > a look at what tasks are consuming CPU while the problem is occurring.
> > >> > The reason that I have been asking over and over is that the symptoms
> > >> > you describe are likely caused by a loop in some kernel code. Yes,
> > >> > there might be other causes, but this is the most likely. Given that
> > >> > TREE_PREEMPT_RCU behaves better than TREE_RCU, it is likely that this
> > >> > loop is in preemptible code with irqs enabled. Therefore, the process
> > >> > accounting code is likely to be able to see the CPU consumption, and
> > >> > you should be able to see it via the "top" or "ps" commands -- or via
> > >> > any number of other tools.
> > >> >
> > >> > For example, if the problem is confined to RCU, you would likely see
> > >> > the "rcuc0" or "rcun0" tasks consuming lots of CPU. This would narrow
> > >> > the problem down to a few tens of lines of code. If the problem was
> > >> > in some other kthread, then identifying the kthread would very likely
> > >> > narrow things down as well.
> > >> >
> > >> > So, please do take a look to see what tasks are consuming CPU.
> > >> >
> > >> >> As I was wrong and want to be 99.9% sure it is RCU stuff, I reverted
> > >> >> all (18) RCU patches from linux-next (next-20110325) by keeping the
> > >> >> RCU|PREEMPT|HZ settings from last working next-20110323.
> > >> >
> > >> > Makes sense.
> > >> >
> > >> >> $ egrep 'RCU|PREEMPT|_HZ' /boot/config-2.6.38-next20110325-7-686-iniza
> > >> >> # RCU Subsystem
> > >> >> CONFIG_TREE_RCU=y
> > >> >> # CONFIG_PREEMPT_RCU is not set
> > >> >> # CONFIG_RCU_TRACE is not set
> > >> >> CONFIG_RCU_FANOUT=32
> > >> >> # CONFIG_RCU_FANOUT_EXACT is not set
> > >> >> CONFIG_RCU_FAST_NO_HZ=y
> > >> >> # CONFIG_TREE_RCU_TRACE is not set
> > >> >> CONFIG_PREEMPT_NOTIFIERS=y
> > >> >> CONFIG_NO_HZ=y
> > >> >> # CONFIG_PREEMPT_NONE is not set
> > >> >> CONFIG_PREEMPT_VOLUNTARY=y
> > >> >> # CONFIG_PREEMPT is not set
> > >> >> # CONFIG_HZ_100 is not set
> > >> >> CONFIG_HZ_250=y
> > >> >> # CONFIG_HZ_300 is not set
> > >> >> # CONFIG_HZ_1000 is not set
> > >> >> CONFIG_HZ=250
> > >> >> # CONFIG_SPARSE_RCU_POINTER is not set
> > >> >> # CONFIG_RCU_TORTURE_TEST is not set
> > >> >> # CONFIG_RCU_CPU_STALL_DETECTOR is not set
> > >> >>
> > >> >> I will work and stress this kernel before doing any step-by-step
> > >> >> revert of RCU stuff.
> > >> >>
> > >> >> Thanks for your patch, I applied it on top of "naked" next-20110325,
> > >> >> but I still see call-traces.
> > >> >
> > >> > Thank you very much for testing it!
> > >> >
> > >> > I intend to keep that patch, as it should increase robustness in other
> > >> > situations.
> > >> >
> > >> >                                                 Thanx, Paul
> > >> >
> > >> >> - Sedat -
> > >> >>
> > >> >>
> > >> >>
> > >> >> >                                                 Thanx, Paul
> > >> >> >
> > >> >> > ------------------------------------------------------------------------
> > >> >> >
> > >> >> > rcu: further lower priority in rcu_yield()
> > >> >> >
> > >> >> > Although rcu_yield() dropped from real-time to normal priority, there
> > >> >> > is always the possibility that the competing tasks have been niced.
> > >> >> > So nice to 19 in rcu_yield() to help ensure that other tasks have a
> > >> >> > better chance of running.
> > >> >> >
> > >> >> > Signed-off-by: Paul E. McKenney <[email protected]>
> > >> >> >
> > >> >> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > >> >> > index 759f54b..5477764 100644
> > >> >> > --- a/kernel/rcutree.c
> > >> >> > +++ b/kernel/rcutree.c
> > >> >> > @@ -1492,6 +1492,7 @@ static void rcu_yield(void (*f)(unsigned long), unsigned long arg)
> > >> >> >         mod_timer(&yield_timer, jiffies + 2);
> > >> >> >         sp.sched_priority = 0;
> > >> >> >         sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
> > >> >> > +       set_user_nice(current, 19);
> > >> >> >         schedule();
> > >> >> >         sp.sched_priority = RCU_KTHREAD_PRIO;
> > >> >> >         sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
> > >>
> > >> Sorry, my attempt was to identify and isolate the culprit commit.
> > >>
> > >> Reverting all RCU patches resulted in a stable system; the following 8
> > >> kernels with the reduced kernel-config setup were all built using this kernel.
> > >>
> > >> All kernels used TREE_RCU (see above), I did not change it (no
> > >> mixing/switching to PREEMPT and TREE_PREEMPT_RCU).
> > >> ( I doubt that TREE_PREEMPT_RCU was any more stable here. )
> > >>
> > >> The culprit commit is bc56163ebd4580199ac7e63f5e160bf139ba0dd6 (from
> > >> rcu/next GIT tree):
> > >> "rcu: move TREE_RCU from softirq to kthread"
> > >
> >
> > Hi Paul,
> >
> > > OK, please accept my apologies for your lost weekend. And thank you for
> > > testing this.
> > >
> >
> > No worries, it was mostly a rainy day.
> > The only thing I did @ 16:30 was to go to regional election (the new
> > (regional) prime minister will be the 1st from The German Green
> > party).
>
> ;-)
>
> > But back to RCU :-):
> > The reduced kernel-config setup decreased the build-time from approx.
> > 2hrs (full, generic build) down to approx. 35mins.
>
> Very good!
>
> > >> I can run a tar job, open 20 tabs in firefox and play a flash video
> > >> in one of them, all in parallel (I did this several times).
> > >
> > > How many files in the tar job? Is this creating a tar archive, expanding
> > > it, or both?
> >
> > I am doing a simple tar (filesize: 1.6G for full and 1.0G for reduced build):
> >
> > $ tar -cf $archivedir-on-external-usbhdd/$tarfile $kernel-build-dir
>
> OK, I was extracting a tarball and then removing the resulting source
> tree. I will try this. Though it does seem strange -- I can understand
> how removing a file tree would stress RCU, but not creating a tarball.
> Ah, well, if I fully understood it, there would not be a bug.
>
> > ...plus opening 20 tabs in firefox in parallel.
> > That's normally enough to get my system freaky and see RCU related
> > messages in the logs.
>
> Hmmm... My normal test systems don't have X -- I will need to set
> this up.
>
> > > Do you have a script for this? Are all of these running at normal
> > > priority, or are some of them running at real-time priority?
> > >
> >
> > Nothing special.
>
> OK.
>
> > >> [ setup.log ]
> > >> ...
> > >>   (+) OK   revert-rcu-patches/0001-Revert-rcu-introduce-kfree_rcu.patch
> > >>   (+) OK   revert-rcu-patches/0002-Revert-rcu-fix-spelling.patch
> > >>   (+) OK   revert-rcu-patches/0003-Revert-rcu-fix-rcu_cpu_kthread_task-synchronization.patch
> > >>   (+) OK   revert-rcu-patches/0004-Revert-rcu-call-__rcu_read_unlock-in-exit_rcu-for-tr.patch
> > >>   (+) OK   revert-rcu-patches/0005-Revert-rcu-Converge-TINY_RCU-expedited-and-normal-bo.patch
> > >>   (+) OK   revert-rcu-patches/0006-Revert-rcu-remove-useless-boosted_this_gp-field.patch
> > >>   (+) OK   revert-rcu-patches/0007-Revert-rcu-code-cleanups-in-TINY_RCU-priority-boosti.patch
> > >>   (+) OK   revert-rcu-patches/0008-Revert-rcu-Switch-to-this_cpu-primitives.patch
> > >>   (+) OK   revert-rcu-patches/0009-Revert-rcu-Use-WARN_ON_ONCE-for-DEBUG_OBJECTS_RCU_HE.patch
> > >>   (+) OK   revert-rcu-patches/0010-Revert-rcu-Enable-DEBUG_OBJECTS_RCU_HEAD-from-PREEMP.patch
> > >>   (+) OK   revert-rcu-patches/0011-Revert-rcu-Add-boosting-to-TREE_PREEMPT_RCU-tracing.patch
> > >>   (+) OK   revert-rcu-patches/0012-Revert-rcu-eliminate-unused-boosting-statistics.patch
> > >>   (+) OK   revert-rcu-patches/0013-Revert-rcu-priority-boosting-for-TREE_PREEMPT_RCU.patch
> > >>   (+) OK   revert-rcu-patches/0014-Revert-rcu-move-TREE_RCU-from-softirq-to-kthread.patch
> > >> ...
> > >>
> > >> Hope this helps to narrow down the problem.
> > >>
> > >> As I kept all kernels I can have a look at the tasks consuming high
> > >> CPU usage tomorrow.
> > >
> > > Could you please?
> >
> > As I recall (and as you say, you asked me over and over again :-)), I
> > looked with top, htop and 'ps axu', but there was nothing special.
> > Sometimes the system froze - at that point (or shortly before) I
> > did not see anything suspicious in top.
>
> OK, thank you for the info.
>
> > > Also, could you please mount debugfs and list out the files in the
> > > "rcu" directory? The "ql=" value from the "rcu/rcudata" file is of
> > > particular interest.
> > >
> >
> > Ah, before I forget...
> >
> > I used TREE_RCU (was the default before noticing RCU issue) for
> > finding the culprit commit.
> > If it is from your POV more helpful to switch to PREEMPT + PREEMPT_RCU
> > + RCU_BOOST, please let me *now* know.
> > ( Both RCU setups make the system freak out. )
>
> If TREE_RCU hits problems faster, it is probably best to stay with
> TREE_RCU.

And of course, one exception to this advice is if TREE_RCU hangs so hard
and fast that you don't have time to get any diagnostics. If this is the
case, then TREE_PREEMPT_RCU might be more productive.

Thanx, Paul
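
One diagnostic requested earlier in this thread is the list of tasks consuming
CPU while the problem occurs. A minimal sketch of filtering such a list,
assuming two-column "%CPU COMMAND" text such as `ps -eo pcpu,comm` prints on a
procps-based system; the helper name busy_tasks, the 50% threshold and the
sample data are invented for illustration and are not taken from the reports.

```python
# Sketch: pick out CPU-heavy tasks from "ps"-style output.
# The sample below is invented; it is not output from the affected machine.


def busy_tasks(ps_output, min_cpu=50.0):
    """Return (command, cpu_percent) pairs at or above min_cpu, busiest first.

    Expects one header line, then two whitespace-separated columns per line:
    the CPU percentage followed by the command name.
    """
    tasks = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        try:
            cpu = float(fields[0])
        except ValueError:
            continue  # not a data row
        if cpu >= min_cpu:
            tasks.append((fields[1].strip(), cpu))
    return sorted(tasks, key=lambda task: task[1], reverse=True)


SAMPLE = """\
%CPU COMMAND
 0.3 Xorg
97.5 rcuc0
 1.2 tar
"""

if __name__ == "__main__":
    for command, cpu in busy_tasks(SAMPLE):
        print(f"{command}: {cpu}%")
```

A kthread name such as "rcuc0" or "rcun0" showing up in the result would point
at RCU itself; any other name would narrow the search to that task.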

> > I think top & Co. are not enough to track the problem down.
> > I have seen tracing and debugging facilities for RCU.
> >
> > Some questions to debug and trace setup:
> >
> > Case #1: TREE_RCU
> >
> > CONFIG_RCU_TRACE=y
> > CONFIG_TREE_RCU_TRACE=y
>
> Yep.
>
> > Case #2: PREEMPT + PREEMPT_RCU + RCU_BOOST
> >
> > CONFIG_RCU_TRACE=y
> > CONFIG_TREE_RCU_TRACE=y
> > CONFIG_DEBUG_PREEMPT=y <--- Helpful?
> > CONFIG_PREEMPT_TRACER=y <--- Helpful?
> >
> > Any other recommendations for useful/helpful trace and/or debug options?
> >
> > Any other instructions for debugging/tracing?
>
> Not at the moment. I will be looking at diagnostics while going
> through the code, so I might have something later.
>
> > BTW, today's linux-next (next-20110328) is still freaky, I applied the
> > revert-rcu-patches patchset and all is fine.
>
> I reverted back to the commit preceding the one you pointed out last night
> my time, so the upcoming -next should be less freaky.
>
> > - Sedat -
> >
> > P.S.: Note to myself
> >
> > # mount -t debugfs none /sys/kernel/debug/
> > # ln -s /sys/kernel/debug /debug
> >
> > # find /debug -name rcu
>
> Or:
>
> # cd /debug/rcu
>
> then dump out everything except for the .csv file (which is the same
> as the non-.csv equivalent, but in spreadsheet format -- intended
> for systems with 100s or 1000s of CPUs).
>
> Thanx, Paul
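
The debugfs steps quoted above (dump everything in the rcu directory except
the .csv file, then look at the "ql=" values) could be scripted roughly as
follows. This is a sketch under assumptions: the /debug/rcu path is the
symlink from the note above, the helper names are invented, and the only
thing assumed about the rcudata format is that each per-CPU line carries a
"ql=<number>" field.

```python
# Sketch: collect the RCU debugfs files (skipping *.csv) and extract "ql=" values.
import os
import re


def dump_rcu_debugfs(rcu_dir="/debug/rcu"):
    """Return {filename: contents} for every regular file except *.csv."""
    dumps = {}
    for name in sorted(os.listdir(rcu_dir)):
        path = os.path.join(rcu_dir, name)
        if name.endswith(".csv") or not os.path.isfile(path):
            continue  # the .csv file repeats the same data in spreadsheet form
        with open(path) as handle:
            dumps[name] = handle.read()
    return dumps


def queue_lengths(rcudata_text):
    """Return the integer following each "ql=" field, in order of appearance."""
    return [int(value) for value in re.findall(r"\bql=(\d+)", rcudata_text)]


if __name__ == "__main__":
    # Only runs against a real system where debugfs is mounted and linked.
    if os.path.isdir("/debug/rcu"):
        dumps = dump_rcu_debugfs()
        for name, text in dumps.items():
            print(f"==> {name} <==")
            print(text)
        print("ql values:", queue_lengths(dumps.get("rcudata", "")))
```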

2011-03-28 16:38:08

by Sedat Dilek


Subject: Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4 related?)

On Mon, Mar 28, 2011 at 5:11 PM, Paul E. McKenney
<[email protected]> wrote:
> On Mon, Mar 28, 2011 at 06:24:36AM -0700, Paul E. McKenney wrote:
>> On Mon, Mar 28, 2011 at 02:33:36PM +0200, Sedat Dilek wrote:
>> > On Mon, Mar 28, 2011 at 6:08 AM, Paul E. McKenney
>> > <[email protected]> wrote:
>> > > On Sun, Mar 27, 2011 at 11:48:30PM +0200, Sedat Dilek wrote:
>> > >> On Sun, Mar 27, 2011 at 11:32 PM, Paul E. McKenney
>> > >> <[email protected]> wrote:
>> > >> > On Sun, Mar 27, 2011 at 02:26:15PM +0200, Sedat Dilek wrote:
>> > >> >> On Sun, Mar 27, 2011 at 7:07 AM, Paul E. McKenney
>> > >> >> <[email protected]> wrote:
>> > >> >> > On Sat, Mar 26, 2011 at 08:25:29PM -0700, Paul E. McKenney wrote:
>> > >> >> >> On Sun, Mar 27, 2011 at 03:30:34AM +0200, Sedat Dilek wrote:
>> > >> >> >> > On Sun, Mar 27, 2011 at 1:09 AM, Paul E. McKenney
>> > >> >> >> > <[email protected]> wrote:
>> > >> >> >> > > On Sat, Mar 26, 2011 at 11:15:22PM +0100, Sedat Dilek wrote:
>> > >> >> >
>> > >> >> > [ . . . ]
>> > >> >> >
>> > >> >> >> > >> But then came RCU :-(.
>> > >> >> >> > >
>> > >> >> >> > > Well, if it turns out to be a problem in RCU I will certainly apologize.
>> > >> >> >> > >
>> > >> >> >> >
>> > >> >> >> > No, that's not so dramatic.
>> > >> >> >> > Dealing with this RCU issue has nice side-effects: I remembered (and
>> > >> >> >> > finally did) to use a reduced kernel-config set.
>> > >> >> >> > The base for it I created with 'make localmodconfig' and did some
>> > >> >> >> > manual fine-tuning afterwards (throw out media, rc, dvd, unneeded FSs,
>> > >> >> >> > etc.).
>> > >> >> >> > Also, I can use fresh gcc-4.6 (4.6.0-1) from the official Debian repos.
>> > >> >> >> >
>> > >> >> >> > So, I started building with
>> > >> >> >> > "revert-rcu-patches/0001-Revert-rcu-introduce-kfree_rcu.patch".
>> > >> >> >> > I will let you know.
>> > >> >> >>
>> > >> >> >> And please also check for tasks consuming all available CPU.
>> > >> >> >
>> > >> >> > And I still cannot reproduce with the full RCU stack (but based off of
>> > >> >> > 2.6.38 rather than -next). Nevertheless, if you would like to try a
>> > >> >> > speculative patch, here you go.
>> > >> >>
>> > >> >> You are right and my strategy on handling the (possible RCU?) issue is wrong.
>> > >> >> Surely, you tested your RCU stuff in your own repo and everything
>> > >> >> might be OK on top of stable 2.6.38.
>> > >> >> Linux-next gets daily updates from a lot of different trees, so there
>> > >> >> might be interferences with other stuff.
>> > >> >> Please, understand I am interested in finding out what is the cause
>> > >> >> for my issues, my aim is not to blame you.
>> > >> >
>> > >> > I am not worried about blame, but rather getting the bug fixed. The
>> > >> > bug might be in RCU, it might be elsewhere, or it might be a combination
>> > >> > of problems in RCU and elsewhere.
>> > >> >
>> > >> > So the first priority is locating the bug.
>> > >> >
>> > >> > And that is why I have been asking you over and over to PLEASE take
>> > >> > a look at what tasks are consuming CPU while the problem is occurring.
>> > >> > The reason that I have been asking over and over is that the symptoms
>> > >> > you describe are likely caused by a loop in some kernel code. Yes,
>> > >> > there might be other causes, but this is the most likely. Given that
>> > >> > TREE_PREEMPT_RCU behaves better than TREE_RCU, it is likely that this
>> > >> > loop is in preemptible code with irqs enabled. Therefore, the process
>> > >> > accounting code is likely to be able to see the CPU consumption, and
>> > >> > you should be able to see it via the "top" or "ps" commands -- or via
>> > >> > any number of other tools.
>> > >> >
>> > >> > For example, if the problem is confined to RCU, you would likely see
>> > >> > the "rcuc0" or "rcun0" tasks consuming lots of CPU. This would narrow
>> > >> > the problem down to a few tens of lines of code. If the problem was
>> > >> > in some other kthread, then identifying the kthread would very likely
>> > >> > narrow things down as well.
>> > >> >
>> > >> > So, please do take a look to see what tasks are consuming CPU.
>> > >> >
>> > >> >> As I was wrong and want to be 99.9% sure it is RCU stuff, I reverted
>> > >> >> all (18) RCU patches from linux-next (next-20110325) by keeping the
>> > >> >> RCU|PREEMPT|HZ settings from last working next-20110323.
>> > >> >
>> > >> > Makes sense.
>> > >> >
>> > >> >> $ egrep 'RCU|PREEMPT|_HZ' /boot/config-2.6.38-next20110325-7-686-iniza
>> > >> >> # RCU Subsystem
>> > >> >> CONFIG_TREE_RCU=y
>> > >> >> # CONFIG_PREEMPT_RCU is not set
>> > >> >> # CONFIG_RCU_TRACE is not set
>> > >> >> CONFIG_RCU_FANOUT=32
>> > >> >> # CONFIG_RCU_FANOUT_EXACT is not set
>> > >> >> CONFIG_RCU_FAST_NO_HZ=y
>> > >> >> # CONFIG_TREE_RCU_TRACE is not set
>> > >> >> CONFIG_PREEMPT_NOTIFIERS=y
>> > >> >> CONFIG_NO_HZ=y
>> > >> >> # CONFIG_PREEMPT_NONE is not set
>> > >> >> CONFIG_PREEMPT_VOLUNTARY=y
>> > >> >> # CONFIG_PREEMPT is not set
>> > >> >> # CONFIG_HZ_100 is not set
>> > >> >> CONFIG_HZ_250=y
>> > >> >> # CONFIG_HZ_300 is not set
>> > >> >> # CONFIG_HZ_1000 is not set
>> > >> >> CONFIG_HZ=250
>> > >> >> # CONFIG_SPARSE_RCU_POINTER is not set
>> > >> >> # CONFIG_RCU_TORTURE_TEST is not set
>> > >> >> # CONFIG_RCU_CPU_STALL_DETECTOR is not set
>> > >> >>
>> > >> >> I will work and stress this kernel before doing any step-by-step
>> > >> >> revert of RCU stuff.
>> > >> >>
>> > >> >> Thanks for your patch, I applied it on top of "naked" next-20110325,
>> > >> >> but I still see call-traces.
>> > >> >
>> > >> > Thank you very much for testing it!
>> > >> >
>> > >> > I intend to keep that patch, as it should increase robustness in other
>> > >> > situations.
>> > >> >
>> > >> > Thanx, Paul
>> > >> >
>> > >> >> - Sedat -
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> > Thanx, Paul
>> > >> >> >
>> > >> >> > ------------------------------------------------------------------------
>> > >> >> >
>> > >> >> > rcu: further lower priority in rcu_yield()
>> > >> >> >
>> > >> >> > Although rcu_yield() dropped from real-time to normal priority, there
>> > >> >> > is always the possibility that the competing tasks have been niced.
>> > >> >> > So nice to 19 in rcu_yield() to help ensure that other tasks have a
>> > >> >> > better chance of running.
>> > >> >> >
>> > >> >> > Signed-off-by: Paul E. McKenney <[email protected]>
>> > >> >> >
>> > >> >> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>> > >> >> > index 759f54b..5477764 100644
>> > >> >> > --- a/kernel/rcutree.c
>> > >> >> > +++ b/kernel/rcutree.c
>> > >> >> > @@ -1492,6 +1492,7 @@ static void rcu_yield(void (*f)(unsigned long), unsigned long arg)
>> > >> >> > mod_timer(&yield_timer, jiffies + 2);
>> > >> >> > sp.sched_priority = 0;
>> > >> >> > sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
>> > >> >> > + set_user_nice(current, 19);
>> > >> >> > schedule();
>> > >> >> > sp.sched_priority = RCU_KTHREAD_PRIO;
>> > >> >> > sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
>> > >>
>> > >> Sorry, my attempt was to identify and isolate the culprit commit.
>> > >>
>> > >> Reverting all RCU patches resulted in a stable system; the following 8
>> > >> kernels with a reduced kernel-config setup were all built using this kernel.
>> > >>
>> > >> All kernels used TREE_RCU (see above), I did not change it (no
>> > >> mixing/switching to PREEMPT and TREE_PREEMPT_RCU).
>> > >> ( I doubt that TREE_PREEMPT_RCU was some kind of more stable here. )
>> > >>
>> > >> The culprit commit is bc56163ebd4580199ac7e63f5e160bf139ba0dd6 (from
>> > >> rcu/next GIT tree):
>> > >> "rcu: move TREE_RCU from softirq to kthread"
>> > >
>> >
>> > Hi Paul,
>> >
>> > > OK, please accept my apologies for your lost weekend. And thank you for
>> > > testing this.
>> > >
>> >
>> > No worries, it was mostly a rainy day.
>> > The only thing I did @ 16:30 was to go to regional election (the new
>> > (regional) prime minister will be the 1st from The German Green
>> > party).
>>
>> ;-)
>>
>> > But back to RCU :-):
>> > The reduced kernel-config setup decreased the build-time from approx.
>> > 2hrs (full, generic build) down to approx. 35mins.
>>
>> Very good!
>>
>> > >> I can run a tar job, open 20 tabs in firefox and play a flash
>> > >> video in one of them, all in parallel (I did this several times).
>> > >
>> > > How many files in the tar job? Is this creating a tar archive, expanding
>> > > it, or both?
>> >
>> > I am doing a simple tar (filesize: 1.6G for full and 1.0G for reduced build):
>> >
>> > $ tar -cf $archivedir-on-external-usbhdd/$tarfile $kernel-build-dir
>>
>> OK, I was extracting a tarball and then removing the resulting source
>> tree. I will try this. Though it does seem strange -- I can understand
>> how removing a file tree would stress RCU, but not creating a tarball.
>> Ah, well, if I fully understood it, there would not be a bug.
>>
>> > ...plus opening 20 tabs in firefox in parallel.
>> > That's normally enough to make my system go freaky and to see RCU-related
>> > messages in the logs.
>>
>> Hmmm... My normal test systems don't have X -- I will need to set
>> this up.
>>
>> > > Do you have a script for this? Are all of these running at normal
>> > > priority, or are some of them running at real-time priority?
>> > >
>> >
>> > Nothing special.
>>
>> OK.
>>
>> > >> [ setup.log ]
>> > >> ...
>> > >> (+) OK revert-rcu-patches/0001-Revert-rcu-introduce-kfree_rcu.patch
>> > >> (+) OK revert-rcu-patches/0002-Revert-rcu-fix-spelling.patch
>> > >> (+) OK revert-rcu-patches/0003-Revert-rcu-fix-rcu_cpu_kthread_task-synchronization.patch
>> > >> (+) OK revert-rcu-patches/0004-Revert-rcu-call-__rcu_read_unlock-in-exit_rcu-for-tr.patch
>> > >> (+) OK revert-rcu-patches/0005-Revert-rcu-Converge-TINY_RCU-expedited-and-normal-bo.patch
>> > >> (+) OK revert-rcu-patches/0006-Revert-rcu-remove-useless-boosted_this_gp-field.patch
>> > >> (+) OK revert-rcu-patches/0007-Revert-rcu-code-cleanups-in-TINY_RCU-priority-boosti.patch
>> > >> (+) OK revert-rcu-patches/0008-Revert-rcu-Switch-to-this_cpu-primitives.patch
>> > >> (+) OK revert-rcu-patches/0009-Revert-rcu-Use-WARN_ON_ONCE-for-DEBUG_OBJECTS_RCU_HE.patch
>> > >> (+) OK revert-rcu-patches/0010-Revert-rcu-Enable-DEBUG_OBJECTS_RCU_HEAD-from-PREEMP.patch
>> > >> (+) OK revert-rcu-patches/0011-Revert-rcu-Add-boosting-to-TREE_PREEMPT_RCU-tracing.patch
>> > >> (+) OK revert-rcu-patches/0012-Revert-rcu-eliminate-unused-boosting-statistics.patch
>> > >> (+) OK revert-rcu-patches/0013-Revert-rcu-priority-boosting-for-TREE_PREEMPT_RCU.patch
>> > >> (+) OK revert-rcu-patches/0014-Revert-rcu-move-TREE_RCU-from-softirq-to-kthread.patch
>> > >> ...
>> > >>
>> > >> Hope this helps to narrow down the problem.
>> > >>
>> > >> As I kept all kernels I can have a look at the tasks consuming high
>> > >> CPU usage tomorrow.
>> > >
>> > > Could you please?
>> >
>> > I recall that (as you requested over and over again :-)) I
>> > looked with top, htop and 'ps axu', but there was nothing special.
>> > Sometimes the system froze - at that point (or shortly before) I
>> > did not see anything suspicious with top.
>>
>> OK, thank you for the info.
>>
>> > > Also, could you please mount debugfs and list out the files in the
>> > > "rcu" directory? The "ql=" value from the "rcu/rcudata" file is of
>> > > particular interest.
>> > >
>> >
>> > Ah, before I forget...
>> >
>> > I used TREE_RCU (was the default before noticing RCU issue) for
>> > finding the culprit commit.
>> > If it is from your POV more helpful to switch to PREEMPT + PREEMPT_RCU
>> > + RCU_BOOST, please let me *now* know.
>> > ( Both RCU setups freaks up the system. )
>>
>> If TREE_RCU hits problems faster, it is probably best to stay with
>> TREE_RCU.
>
> And of course, one exception to this advice is if TREE_RCU hangs so hard
> and fast that you don't have time to get any diagnostics. If this is the
> case, then TREE_PREEMPT_RCU might be more productive.
>

OK, that would explain why I could not really get any debug
info when running "my stress-test" and checking via:

$ LC_ALL=C tail -f /sys/kernel/debug/rcu/rcudata

Then I remembered seeing a snippet for an RCU torture script mentioned
in the kernel docs (see Documentation/RCU/torture.txt).

The following script may be used to torture RCU:

        #!/bin/sh

        modprobe rcutorture
        sleep 100
        rmmod rcutorture
        dmesg | grep torture:
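
The last step of that script only greps the log, so reading the verdict is left to the reader. A minimal sketch of a helper that classifies the grepped lines follows. It assumes that rcutorture logs an "End of test: SUCCESS" or "End of test: FAILURE" line when the module is unloaded, mirroring the "Start of test" line seen in the logs; the function name torture_verdict is made up for this illustration.

```sh
#!/bin/sh
# Sketch only: classify rcutorture log lines read from stdin.
# Assumption: the module prints "End of test: SUCCESS" or
# "End of test: FAILURE" on unload; verify against your kernel version.
torture_verdict() {
    awk '
        /torture:.*End of test: FAILURE/ { fail = 1 }
        /torture:.*End of test: SUCCESS/ { ok = 1 }
        END {
            if (fail)    print "failure"
            else if (ok) print "success"
            else         print "no end-of-test line"
        }'
}

# Usage:
#   dmesg | torture_verdict
```

If the unload hangs, as described in this message, no end-of-test line is logged at all, and the helper reports exactly that.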

So, I compiled a new TREE_RCU-based kernel, built with
CONFIG_RCU_TORTURE_TEST=m.

Unfortunately, the rmmod (I prefer modprobe -r -v) hangs... the
messages in the logs look promising.

- Sedat -

> Thanx, Paul
>
>> > I think top & Co. are not enough to track the problem down.
>> > I have seen tracing and debugging facilities for RCU.
>> >
>> > Some questions to debug and trace setup:
>> >
>> > Case #1: TREE_RCU
>> >
>> > CONFIG_RCU_TRACE=y
>> > CONFIG_TREE_RCU_TRACE=y
>>
>> Yep.
>>
>> > Case #2: PREEMPT + PREEMPT_RCU + RCU_BOOST
>> >
>> > CONFIG_RCU_TRACE=y
>> > CONFIG_TREE_RCU_TRACE=y
>> > CONFIG_DEBUG_PREEMPT=y <--- Helpful?
>> > CONFIG_PREEMPT_TRACER=y <--- Helpful?
>> >
>> > Any other recommendations for useful/helpful trace and/or debug options?
>> >
>> > Any other instructions for debugging/tracing?
>>
>> Not at the moment. I will be looking at diagnostics while going
>> through the code, so might have something later.
>>
>> > BTW, today's linux-next (next-20110328) is still freaky, I applied the
>> > revert-rcu-patches patchset and all is fine.
>>
>> I reverted back to the commit preceding the one you pointed out last night
>> my time, so the upcoming -next should be less freaky.
>>
>> > - Sedat -
>> >
>> > P.S.: Note to myself
>> >
>> > # mount -t debugfs none /sys/kernel/debug/
>> > # ln -s /sys/kernel/debug /debug
>> >
>> > # find /debug -name rcu
>>
>> Or:
>>
>> # cd /debug/rcu
>>
>> then dump out everything except for the .csv file (which is the same
>> as the non-.csv equivalent, but in spreadsheet format -- intended
>> for systems with 100s or 1000s of CPUs).
>>
>> Thanx, Paul
>

Attachments:

msg_rcu-torture.txt (31.82 kB)

2011-03-28 16:46:51

On Tue, Mar 29, 2011 at 2:10 AM, Paul E. McKenney
<[email protected]> wrote:
> On Mon, Mar 28, 2011 at 06:46:48PM +0200, Sedat Dilek wrote:
>> On Mon, Mar 28, 2011 at 6:38 PM, Sedat Dilek <[email protected]> wrote:
>> > On Mon, Mar 28, 2011 at 5:11 PM, Paul E. McKenney
>> > <[email protected]> wrote:
>> >> On Mon, Mar 28, 2011 at 06:24:36AM -0700, Paul E. McKenney wrote:
>> >>> On Mon, Mar 28, 2011 at 02:33:36PM +0200, Sedat Dilek wrote:
>
> [ . . . ]
>
>> >>> > Ah, before I forget...
>> >>> >
>> >>> > I used TREE_RCU (was the default before noticing RCU issue) for
>> >>> > finding the culprit commit.
>> >>> > If it is from your POV more helpful to switch to PREEMPT + PREEMPT_RCU
>> >>> > + RCU_BOOST, please let me *now* know.
>> >>> > ( Both RCU setups freaks up the system. )
>> >>>
>> >>> If TREE_RCU hits problems faster, it is probably best to stay with
>> >>> TREE_RCU.
>> >>
>> >> And of course, one exception to this advice is if TREE_RCU hangs so hard
>> >> and fast that you don't have time to get any diagnostics. If this is the
>> >> case, then TREE_PREEMPT_RCU might be more productive.
>> >>
>> >
>> > OK, that would explain why I could not really get any debug
>> > info when running "my stress-test" and checking via:
>> >
>> > $ LC_ALL=C tail -f /sys/kernel/debug/rcu/rcudata
>> >
>> > Then I remembered seeing a snippet for an RCU torture script mentioned
>> > in the kernel docs (see Documentation/RCU/torture.txt).
>> >
>> > The following script may be used to torture RCU:
>> >
>> >         #!/bin/sh
>> >
>> >         modprobe rcutorture
>> >         sleep 100
>> >         rmmod rcutorture
>> >         dmesg | grep torture:
>> >
>> > So, I compiled a new TREE_RCU-based kernel, built with
>> > CONFIG_RCU_TORTURE_TEST=m.
>> >
>> > Unfortunately, the rmmod (I prefer modprobe -r -v) hangs... the
>> > messages in the logs look promising.
>> >
>> > - Sedat -
>> >
>>
>> Wrong attachment, correct attached.
>
> And one stupid problem located thus far. I can make a (tortured) case
> for it resulting in the symptoms you see, but it does seem unlikely to
> happen repeatedly, as it would require a burst of CPU just at the wrong
> time. But who knows?
>
> In any case, I am still looking.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> Fix stupid typo.
>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 5477764..f311228 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1618,7 +1618,7 @@ static int rcu_node_kthread(void *arg)
>                  rnp->wakemask = 0;
>                  raw_spin_unlock_irqrestore(&rnp->lock, flags);
>                  rcu_initiate_boost(rnp);
> -                for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> +                for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
>                          if ((mask & 0x1) == 0)
>                                  continue;
>                          preempt_disable();
>
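
The effect of the one-character change can be illustrated outside the kernel. The sketch below copies only the loop shape from the diff: walk CPUs from grplo to grphi, test bit 0 of the mask, and shift the mask right after each CPU. It assumes bit 0 corresponds to grplo, as the loop suggests. The function name cpus_in_mask and its arguments are made up for this illustration and are not kernel code.

```sh
#!/bin/sh
# Sketch only: print the CPUs selected by a bitmask, using the loop
# shape from the patch. Arguments: first CPU, last CPU, mask (decimal).
cpus_in_mask() {
    cpu=$1; grphi=$2; mask=$3
    out=""
    while [ "$cpu" -le "$grphi" ]; do
        if [ $((mask & 1)) -ne 0 ]; then
            out="$out${out:+ }$cpu"
        fi
        cpu=$((cpu + 1))
        # A right shift moves the next CPU's bit into position 0.
        # With a left shift, bit 0 is zero after the first iteration,
        # so at most the first CPU would ever be selected.
        mask=$((mask >> 1))
    done
    printf '%s\n' "$out"
}

# Usage:
#   cpus_in_mask 0 3 5    # mask 0b0101 selects CPUs 0 and 2
```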

I have tested this patch and the previous one you sent:

(+) OK rcu-fix/rcu-further-lower-priority-in-rcu_yield.patch
(+) OK rcu-fix/Fix-stupid-typo.patch

As you suggested I switched to PREEMPT and RCU with rcu-boost:

# egrep 'RCU|PREEMPT|_HZ' /boot/config-2.6.38-next20110328-5-686-iniza
# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_NO_HZ=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_DEBUG_PREEMPT=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_PREEMPT_TRACER=y
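
When comparing two such config listings, it can help to query single options rather than scanning the egrep output by eye. A minimal sketch follows; the function name config_state is made up, and it relies only on the two line formats visible in the listing above ("CONFIG_X=value" and "# CONFIG_X is not set").

```sh
#!/bin/sh
# Sketch only: report the state of one option in a kernel config file.
# Prints the value ("y", "m", "250", ...), "not set", or "absent".
# Arguments: config file path, option name without the CONFIG_ prefix.
config_state() {
    file=$1; opt=$2
    if line=$(grep "^CONFIG_${opt}=" "$file"); then
        printf '%s\n' "${line#*=}"
    elif grep -q "^# CONFIG_${opt} is not set\$" "$file"; then
        echo "not set"
    else
        echo "absent"
    fi
}

# Usage:
#   config_state /boot/config-2.6.38-next20110328-5-686-iniza RCU_BOOST
```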

Unfortunately, the rcu-torture script hangs again on unloading the
rcu-torture-test module.
Attached are RCU-related messages in my logs.
( I tailed for rcudata changes - no logs. )

- Sedat -

Attachments:

msg_rcu-torture_20110329.txt (26.21 kB)

2011-03-29 04:17:41

by Paul E. McKenney

On Tue, Mar 29, 2011 at 04:42:31AM +0200, Sedat Dilek wrote:
> On Tue, Mar 29, 2011 at 2:10 AM, Paul E. McKenney
> <[email protected]> wrote:
> > On Mon, Mar 28, 2011 at 06:46:48PM +0200, Sedat Dilek wrote:
> >> On Mon, Mar 28, 2011 at 6:38 PM, Sedat Dilek <[email protected]> wrote:
> >> > On Mon, Mar 28, 2011 at 5:11 PM, Paul E. McKenney
> >> > <[email protected]> wrote:
> >> >> On Mon, Mar 28, 2011 at 06:24:36AM -0700, Paul E. McKenney wrote:
> >> >>> On Mon, Mar 28, 2011 at 02:33:36PM +0200, Sedat Dilek wrote:
> >
> > [ . . . ]
> >
> >> >>> > Ah, before I forget...
> >> >>> >
> >> >>> > I used TREE_RCU (was the default before noticing RCU issue) for
> >> >>> > finding the culprit commit.
> >> >>> > If it is from your POV more helpful to switch to PREEMPT + PREEMPT_RCU
> >> >>> > + RCU_BOOST, please let me *now* know.
> >> >>> > ( Both RCU setups freaks up the system. )
> >> >>>
> >> >>> If TREE_RCU hits problems faster, it is probably best to stay with
> >> >>> TREE_RCU.
> >> >>
> >> >> And of course, one exception to this advice is if TREE_RCU hangs so hard
> >> >> and fast that you don't have time to get any diagnostics. If this is the
> >> >> case, then TREE_PREEMPT_RCU might be more productive.
> >> >>
> >> >
> >> > OK, that would explain why I could not really get any debug
> >> > info when running "my stress-test" and checking via:
> >> >
> >> > $ LC_ALL=C tail -f /sys/kernel/debug/rcu/rcudata
> >> >
> >> > Then I remembered seeing a snippet for an RCU torture script mentioned
> >> > in the kernel docs (see Documentation/RCU/torture.txt).
> >> >
> >> > The following script may be used to torture RCU:
> >> >
> >> >         #!/bin/sh
> >> >
> >> >         modprobe rcutorture
> >> >         sleep 100
> >> >         rmmod rcutorture
> >> >         dmesg | grep torture:
> >> >
> >> > So, I compiled a new TREE_RCU-based kernel, built with
> >> > CONFIG_RCU_TORTURE_TEST=m.
> >> >
> >> > Unfortunately, the rmmod (I prefer modprobe -r -v) hangs... the
> >> > messages in the logs look promising.
> >> >
> >> > - Sedat -
> >> >
> >>
> >> Wrong attachment, correct attached.
> >
> > And one stupid problem located thus far. I can make a (tortured) case
> > for it resulting in the symptoms you see, but it does seem unlikely to
> > happen repeatedly, as it would require a burst of CPU just at the wrong
> > time. But who knows?
> >
> > In any case, I am still looking.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > Fix stupid typo.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 5477764..f311228 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1618,7 +1618,7 @@ static int rcu_node_kthread(void *arg)
> >                  rnp->wakemask = 0;
> >                  raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >                  rcu_initiate_boost(rnp);
> > -                for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> > +                for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
> >                          if ((mask & 0x1) == 0)
> >                                  continue;
> >                          preempt_disable();
> >
>
> I have tested this patch and the previous one you sent:
>
> (+) OK rcu-fix/rcu-further-lower-priority-in-rcu_yield.patch
> (+) OK rcu-fix/Fix-stupid-typo.patch
>
> As you suggested I switched to PREEMPT and RCU with rcu-boost:
>
> # egrep 'RCU|PREEMPT|_HZ' /boot/config-2.6.38-next20110328-5-686-iniza
> # RCU Subsystem
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_FANOUT=32
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_TREE_RCU_TRACE=y
> CONFIG_RCU_BOOST=y
> CONFIG_RCU_BOOST_PRIO=1
> CONFIG_RCU_BOOST_DELAY=500
> CONFIG_NO_HZ=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
> CONFIG_DEBUG_PREEMPT=y
> # CONFIG_SPARSE_RCU_POINTER is not set
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> CONFIG_RCU_CPU_STALL_VERBOSE=y
> CONFIG_PREEMPT_TRACER=y
>
> Unfortunately, the rcu-torture script hangs again on unloading the
> rcu-torture-test module.
> Attached are RCU-related messages in my logs.
> ( I tailed for rcudata changes - no logs. )

OK, still looks like grace periods are being stalled, which would
explain the hang at module-unload time.
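
The hung-task reports quoted further down repeat the same few tasks, so a tally makes it easier to see which ones are stuck. The following is a minimal sketch; the function name hung_task_summary is made up, and it relies only on the "INFO: task NAME:PID blocked for more than ..." line format visible in those logs.

```sh
#!/bin/sh
# Sketch only: count hung-task reports per task from kernel log text
# on stdin. Output columns: task name, PID, number of reports.
# Task names containing spaces are not handled.
hung_task_summary() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p' |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Usage:
#   dmesg | hung_task_summary
```

In the log below, the rcu_torture_fak writer threads sit in synchronize_rcu() and modprobe waits in kthread_stop(), which fits the stalled grace period reading.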

On /debug/rcu/rcudata, you need to "cat" it each time. It just
dumps some internal RCU state once; it is not a running log.
So could you please "cat" it out once the problem has occurred?
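
Since each read yields a one-time dump, a snapshot helper can collect all of the files in one go. The following is a minimal sketch, not something from the RCU documentation. The function name rcu_snapshot is made up, the default path assumes debugfs is mounted at /sys/kernel/debug as earlier in this thread, and .csv files are skipped because they carry the same data in spreadsheet format.

```sh
#!/bin/sh
# Sketch only: dump every regular file in an RCU debugfs directory
# once, with a header line per file. Argument: directory (optional).
rcu_snapshot() {
    dir=${1:-/sys/kernel/debug/rcu}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.csv) continue ;;   # same data as the non-.csv file
        esac
        printf '=== %s ===\n' "${f##*/}"
        cat "$f"
    done
}

# Usage (hypothetical output file name), run once the problem has occurred:
#   rcu_snapshot /sys/kernel/debug/rcu > rcu-state.txt
```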

Thanx, Paul

> - Sedat -

> Mar 29 04:21:20 tbox kernel: [ 132.160108] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/1 test_boost_interval=7 test_boost_duration=4
> Mar 29 04:25:08 tbox kernel: [ 360.352072] INFO: task rcu_torture_fak:1750 blocked for more than 120 seconds.
> Mar 29 04:25:08 tbox kernel: [ 360.364505] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:25:08 tbox kernel: [ 360.377380] rcu_torture_fak D 00000036 0 1750 2 0x00000000
> Mar 29 04:25:08 tbox kernel: [ 360.389888] f5169f08 00000046 10c2d7ee 00000036 c1467400 000036da 00000000 c1467400
> Mar 29 04:25:08 tbox kernel: [ 360.405450] 00000000 00000036 f5971ce0 c108395c c129f22b 00000001 f5169ed0 c12a313a
> Mar 29 04:25:08 tbox kernel: [ 360.417251] f6406400 f5169ed8 c1025e91 f5169f54 c129f230 1a975683 00000021 c1467400
> Mar 29 04:25:08 tbox kernel: [ 360.429130] Call Trace:
> Mar 29 04:25:08 tbox kernel: [ 360.440754] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:08 tbox kernel: [ 360.452377] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:25:08 tbox kernel: [ 360.463904] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:08 tbox kernel: [ 360.475636] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:25:08 tbox kernel: [ 360.487202] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:25:08 tbox kernel: [ 360.498801] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:25:08 tbox kernel: [ 360.510305] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:25:08 tbox kernel: [ 360.521903] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:25:08 tbox kernel: [ 360.533342] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:08 tbox kernel: [ 360.544920] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:08 tbox kernel: [ 360.556383] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:25:08 tbox kernel: [ 360.567690] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:08 tbox kernel: [ 360.578785] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:25:08 tbox kernel: [ 360.589832] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:25:08 tbox kernel: [ 360.600831] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:25:08 tbox kernel: [ 360.611948] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:25:08 tbox kernel: [ 360.623083] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:25:08 tbox kernel: [ 360.634254] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:25:08 tbox kernel: [ 360.645301] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:25:08 tbox kernel: [ 360.656373] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:25:08 tbox kernel: [ 360.667452] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:25:08 tbox kernel: [ 360.678588] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:25:08 tbox kernel: [ 360.689635] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:25:08 tbox kernel: [ 360.700677] INFO: task rcu_torture_fak:1751 blocked for more than 120 seconds.
> Mar 29 04:25:08 tbox kernel: [ 360.711751] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:25:08 tbox kernel: [ 360.723000] rcu_torture_fak D 00000021 0 1751 2 0x00000000
> Mar 29 04:25:08 tbox kernel: [ 360.734284] f5807f08 00000046 1b8afabc 00000021 c1467400 21ab5ecd 00000000 c1467400
> Mar 29 04:25:08 tbox kernel: [ 360.745904] 00000000 00000021 f5970840 c108395c c129f22b 00000001 f5807ed0 c12a313a
> Mar 29 04:25:08 tbox kernel: [ 360.757524] f6406400 f5807ed8 c1025e91 f5807f54 c129f230 1a975e7e 00000021 c1467400
> Mar 29 04:25:08 tbox kernel: [ 360.769310] Call Trace:
> Mar 29 04:25:08 tbox kernel: [ 360.780844] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:08 tbox kernel: [ 360.792587] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:25:08 tbox kernel: [ 360.804259] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:08 tbox kernel: [ 360.816103] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:25:08 tbox kernel: [ 360.827845] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:25:08 tbox kernel: [ 360.839621] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:25:08 tbox kernel: [ 360.851319] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:25:08 tbox kernel: [ 360.863052] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:25:08 tbox kernel: [ 360.874618] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:08 tbox kernel: [ 360.886321] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:08 tbox kernel: [ 360.897896] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:25:08 tbox kernel: [ 360.909449] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:08 tbox kernel: [ 360.920917] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:25:08 tbox kernel: [ 360.932305] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:25:08 tbox kernel: [ 360.943525] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:25:08 tbox kernel: [ 360.954839] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:25:09 tbox kernel: [ 360.966148] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:25:09 tbox kernel: [ 360.977617] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:25:09 tbox kernel: [ 360.989076] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:25:09 tbox kernel: [ 361.000603] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:25:09 tbox kernel: [ 361.012020] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:25:09 tbox kernel: [ 361.023329] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:25:09 tbox kernel: [ 361.034680] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:25:09 tbox kernel: [ 361.046110] INFO: task rcu_torture_fak:1752 blocked for more than 120 seconds.
> Mar 29 04:25:09 tbox kernel: [ 361.057508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:25:09 tbox kernel: [ 361.068706] rcu_torture_fak D 00000021 0 1752 2 0x00000000
> Mar 29 04:25:09 tbox kernel: [ 361.079547] f59aff08 00000046 1c05091a 00000021 c1467400 2144d81f 00000000 c1467400
> Mar 29 04:25:09 tbox kernel: [ 361.090601] 00000000 00000021 f5b2cc60 c108395c c129f22b 00000001 f59afed0 c12a313a
> Mar 29 04:25:09 tbox kernel: [ 361.101522] f6406400 f59afed8 c1025e91 f59aff54 c129f230 1a973f63 00000021 c1467400
> Mar 29 04:25:09 tbox kernel: [ 361.112488] Call Trace:
> Mar 29 04:25:09 tbox kernel: [ 361.123082] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:09 tbox kernel: [ 361.133760] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:25:09 tbox kernel: [ 361.144275] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:09 tbox kernel: [ 361.154858] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:25:09 tbox kernel: [ 361.165231] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:25:09 tbox kernel: [ 361.175636] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:25:09 tbox kernel: [ 361.185990] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:25:09 tbox kernel: [ 361.196475] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:25:09 tbox kernel: [ 361.206819] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:09 tbox kernel: [ 361.217327] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:09 tbox kernel: [ 361.227750] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:25:09 tbox kernel: [ 361.238216] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:09 tbox kernel: [ 361.248644] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:25:09 tbox kernel: [ 361.259051] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:25:09 tbox kernel: [ 361.269341] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:25:09 tbox kernel: [ 361.279737] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:25:09 tbox kernel: [ 361.290095] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:25:09 tbox kernel: [ 361.300570] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:25:09 tbox kernel: [ 361.310963] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:25:09 tbox kernel: [ 361.321416] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:25:09 tbox kernel: [ 361.331890] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:25:09 tbox kernel: [ 361.342365] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:25:09 tbox kernel: [ 361.352806] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:25:09 tbox kernel: [ 361.363332] INFO: task rcu_torture_fak:1753 blocked for more than 120 seconds.
> Mar 29 04:25:09 tbox kernel: [ 361.373899] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:25:09 tbox kernel: [ 361.384605] rcu_torture_fak D 00000021 0 1753 2 0x00000000
> Mar 29 04:25:09 tbox kernel: [ 361.395327] f591bf08 00000046 3d298aff 00000021 c1467400 002590a2 00000000 c1467400
> Mar 29 04:25:09 tbox kernel: [ 361.406361] 00000000 00000021 f5844840 c108395c c129f22b 00000001 f591bed0 c12a313a
> Mar 29 04:25:09 tbox kernel: [ 361.417290] f6406400 f591bed8 c1025e91 f591bf54 c129f230 1a974f36 00000021 c1467400
> Mar 29 04:25:09 tbox kernel: [ 361.428261] Call Trace:
> Mar 29 04:25:09 tbox kernel: [ 361.438867] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:09 tbox kernel: [ 361.449542] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:25:09 tbox kernel: [ 361.460061] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:09 tbox kernel: [ 361.470675] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:25:09 tbox kernel: [ 361.481046] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:25:09 tbox kernel: [ 361.491448] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:25:09 tbox kernel: [ 361.501801] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:25:09 tbox kernel: [ 361.512251] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:25:09 tbox kernel: [ 361.522588] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:09 tbox kernel: [ 361.533084] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:09 tbox kernel: [ 361.543495] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:25:09 tbox kernel: [ 361.553946] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:09 tbox kernel: [ 361.564368] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:25:09 tbox kernel: [ 361.574778] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:25:09 tbox kernel: [ 361.585055] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:25:09 tbox kernel: [ 361.595446] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:25:09 tbox kernel: [ 361.605792] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:25:09 tbox kernel: [ 361.616270] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:25:09 tbox kernel: [ 361.626652] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:25:09 tbox kernel: [ 361.637107] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:25:09 tbox kernel: [ 361.647577] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:25:09 tbox kernel: [ 361.658037] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:25:09 tbox kernel: [ 361.668459] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:25:09 tbox kernel: [ 361.678968] INFO: task modprobe:1759 blocked for more than 120 seconds.
> Mar 29 04:25:09 tbox kernel: [ 361.689455] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:25:09 tbox kernel: [ 361.700107] modprobe D 00000036 0 1759 1747 0x00000000
> Mar 29 04:25:09 tbox kernel: [ 361.710781] f5995ea4 00000082 10c2f37d 00000036 c1467400 00000c21 00000000 c1467400
> Mar 29 04:25:09 tbox kernel: [ 361.721757] 00000000 00000036 f5844420 ffffffff ffffffff c1027d65 10c2d7ee f6406444
> Mar 29 04:25:09 tbox kernel: [ 361.732657] 002dab31 f5995e84 3d090000 00000000 f584444c 00001b8f 00000000 f5995e94
> Mar 29 04:25:09 tbox kernel: [ 361.743604] Call Trace:
> Mar 29 04:25:09 tbox kernel: [ 361.754185] [<c1027d65>] ? finish_task_switch+0x67/0x6d
> Mar 29 04:25:09 tbox kernel: [ 361.764863] [<c10219ba>] ? wakeup_preempt_entity+0x36/0x53
> Mar 29 04:25:09 tbox kernel: [ 361.775397] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:25:09 tbox kernel: [ 361.785917] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:25:09 tbox kernel: [ 361.796240] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:25:09 tbox kernel: [ 361.806599] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:25:09 tbox kernel: [ 361.816960] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:25:09 tbox kernel: [ 361.827308] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:25:09 tbox kernel: [ 361.837555] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:25:09 tbox kernel: [ 361.847902] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:25:09 tbox kernel: [ 361.858181] [<c1045576>] kthread_stop+0x60/0xaf
> Mar 29 04:25:09 tbox kernel: [ 361.868503] [<f8760592>] rcu_torture_cleanup+0x1d5/0x318 [rcutorture]
> Mar 29 04:25:09 tbox kernel: [ 361.878857] [<c10592bc>] sys_delete_module+0x198/0x1f5
> Mar 29 04:25:09 tbox kernel: [ 361.889253] [<c10c1612>] ? vfs_write+0xa4/0xd7
> Mar 29 04:25:09 tbox kernel: [ 361.899525] [<c11aade0>] ? tty_write_lock+0x3d/0x3d
> Mar 29 04:25:09 tbox kernel: [ 361.909876] [<c10c17ca>] ? sys_write+0x53/0x5d
> Mar 29 04:25:09 tbox kernel: [ 361.920083] [<c12a575f>] sysenter_do_call+0x12/0x28
> Mar 29 04:27:09 tbox kernel: [ 481.928074] INFO: task rcu_torture_fak:1750 blocked for more than 120 seconds.
> Mar 29 04:27:09 tbox kernel: [ 481.940037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:27:09 tbox kernel: [ 481.953070] rcu_torture_fak D 00000036 0 1750 2 0x00000000
> Mar 29 04:27:10 tbox kernel: [ 481.972056] f5169f08 00000046 10c2d7ee 00000036 c1467400 000036da 00000000 c1467400
> Mar 29 04:27:10 tbox kernel: [ 481.984708] 00000000 00000036 f5971ce0 c108395c c129f22b 00000001 f5169ed0 c12a313a
> Mar 29 04:27:10 tbox kernel: [ 481.996536] f6406400 f5169ed8 c1025e91 f5169f54 c129f230 1a975683 00000021 c1467400
> Mar 29 04:27:10 tbox kernel: [ 482.008580] Call Trace:
> Mar 29 04:27:10 tbox kernel: [ 482.020181] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:10 tbox kernel: [ 482.031882] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:27:10 tbox kernel: [ 482.043546] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.055367] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:27:10 tbox kernel: [ 482.067076] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:27:10 tbox kernel: [ 482.078839] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:27:10 tbox kernel: [ 482.090503] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:27:10 tbox kernel: [ 482.102239] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:27:10 tbox kernel: [ 482.113842] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.125558] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:10 tbox kernel: [ 482.137128] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:27:10 tbox kernel: [ 482.148707] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.160239] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:27:10 tbox kernel: [ 482.171766] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:27:10 tbox kernel: [ 482.183180] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:27:10 tbox kernel: [ 482.194676] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:27:10 tbox kernel: [ 482.206123] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:27:10 tbox kernel: [ 482.217623] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:27:10 tbox kernel: [ 482.228876] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:27:10 tbox kernel: [ 482.240065] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:27:10 tbox kernel: [ 482.251134] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:27:10 tbox kernel: [ 482.262103] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:27:10 tbox kernel: [ 482.272901] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:27:10 tbox kernel: [ 482.283635] INFO: task rcu_torture_fak:1751 blocked for more than 120 seconds.
> Mar 29 04:27:10 tbox kernel: [ 482.294244] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:27:10 tbox kernel: [ 482.304837] rcu_torture_fak D 00000021 0 1751 2 0x00000000
> Mar 29 04:27:10 tbox kernel: [ 482.315409] f5807f08 00000046 1b8afabc 00000021 c1467400 21ab5ecd 00000000 c1467400
> Mar 29 04:27:10 tbox kernel: [ 482.326258] 00000000 00000021 f5970840 c108395c c129f22b 00000001 f5807ed0 c12a313a
> Mar 29 04:27:10 tbox kernel: [ 482.337071] f6406400 f5807ed8 c1025e91 f5807f54 c129f230 1a975e7e 00000021 c1467400
> Mar 29 04:27:10 tbox kernel: [ 482.348016] Call Trace:
> Mar 29 04:27:10 tbox kernel: [ 482.358653] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:10 tbox kernel: [ 482.369453] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:27:10 tbox kernel: [ 482.380116] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.390934] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:27:10 tbox kernel: [ 482.401650] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:27:10 tbox kernel: [ 482.412396] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:27:10 tbox kernel: [ 482.423046] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:27:10 tbox kernel: [ 482.433759] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:27:10 tbox kernel: [ 482.444321] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.455086] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:10 tbox kernel: [ 482.465806] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:27:10 tbox kernel: [ 482.476454] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.486994] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:27:10 tbox kernel: [ 482.497590] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:27:10 tbox kernel: [ 482.508130] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:27:10 tbox kernel: [ 482.518661] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:27:10 tbox kernel: [ 482.529095] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:27:10 tbox kernel: [ 482.539659] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:27:10 tbox kernel: [ 482.550091] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:27:10 tbox kernel: [ 482.560540] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:27:10 tbox kernel: [ 482.571002] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:27:10 tbox kernel: [ 482.581466] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:27:10 tbox kernel: [ 482.591845] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:27:10 tbox kernel: [ 482.602246] INFO: task rcu_torture_fak:1752 blocked for more than 120 seconds.
> Mar 29 04:27:10 tbox kernel: [ 482.612710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:27:10 tbox kernel: [ 482.623331] rcu_torture_fak D 00000021 0 1752 2 0x00000000
> Mar 29 04:27:10 tbox kernel: [ 482.634036] f59aff08 00000046 1c05091a 00000021 c1467400 2144d81f 00000000 c1467400
> Mar 29 04:27:10 tbox kernel: [ 482.645198] 00000000 00000021 f5b2cc60 c108395c c129f22b 00000001 f59afed0 c12a313a
> Mar 29 04:27:10 tbox kernel: [ 482.656175] f6406400 f59afed8 c1025e91 f59aff54 c129f230 1a973f63 00000021 c1467400
> Mar 29 04:27:10 tbox kernel: [ 482.667124] Call Trace:
> Mar 29 04:27:10 tbox kernel: [ 482.677703] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:10 tbox kernel: [ 482.688365] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:27:10 tbox kernel: [ 482.698868] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.709440] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:27:10 tbox kernel: [ 482.719793] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:27:10 tbox kernel: [ 482.730182] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:27:10 tbox kernel: [ 482.740519] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:27:10 tbox kernel: [ 482.750947] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:27:10 tbox kernel: [ 482.761271] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.771756] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:10 tbox kernel: [ 482.782160] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:27:10 tbox kernel: [ 482.792602] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:10 tbox kernel: [ 482.803006] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:27:10 tbox kernel: [ 482.813393] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:27:10 tbox kernel: [ 482.823666] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:27:10 tbox kernel: [ 482.834058] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:27:10 tbox kernel: [ 482.844417] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:27:10 tbox kernel: [ 482.854887] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:27:10 tbox kernel: [ 482.865264] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:27:10 tbox kernel: [ 482.875699] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:27:10 tbox kernel: [ 482.886157] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:27:10 tbox kernel: [ 482.896608] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:27:10 tbox kernel: [ 482.907033] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:27:10 tbox kernel: [ 482.917579] INFO: task rcu_torture_fak:1753 blocked for more than 120 seconds.
> Mar 29 04:27:10 tbox kernel: [ 482.928135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:27:10 tbox kernel: [ 482.938831] rcu_torture_fak D 00000021 0 1753 2 0x00000000
> Mar 29 04:27:10 tbox kernel: [ 482.949547] f591bf08 00000046 3d298aff 00000021 c1467400 002590a2 00000000 c1467400
> Mar 29 04:27:11 tbox kernel: [ 482.960568] 00000000 00000021 f5844840 c108395c c129f22b 00000001 f591bed0 c12a313a
> Mar 29 04:27:11 tbox kernel: [ 482.971488] f6406400 f591bed8 c1025e91 f591bf54 c129f230 1a974f36 00000021 c1467400
> Mar 29 04:27:11 tbox kernel: [ 482.982445] Call Trace:
> Mar 29 04:27:11 tbox kernel: [ 482.993023] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:11 tbox kernel: [ 483.003687] [<c129f22b>] ? schedule+0x3e5/0x3fa
> Mar 29 04:27:11 tbox kernel: [ 483.014180] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:11 tbox kernel: [ 483.024748] [<c1025e91>] ? need_resched+0x14/0x1e
> Mar 29 04:27:11 tbox kernel: [ 483.035095] [<c129f230>] ? schedule+0x3ea/0x3fa
> Mar 29 04:27:11 tbox kernel: [ 483.045485] [<c108397d>] ? trace_preempt_off+0xf/0x22
> Mar 29 04:27:11 tbox kernel: [ 483.055817] [<c103a53b>] ? lock_timer_base.isra.29+0x1e/0x3c
> Mar 29 04:27:11 tbox kernel: [ 483.066241] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:27:11 tbox kernel: [ 483.076562] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:11 tbox kernel: [ 483.087036] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:11 tbox kernel: [ 483.097426] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:27:11 tbox kernel: [ 483.107856] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:11 tbox kernel: [ 483.118256] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:27:11 tbox kernel: [ 483.128627] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:27:11 tbox kernel: [ 483.138884] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:27:11 tbox kernel: [ 483.149275] [<f875f93e>] ? rcu_torture_fqs+0xc6/0xc6 [rcutorture]
> Mar 29 04:27:11 tbox kernel: [ 483.159611] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:27:11 tbox kernel: [ 483.170072] [<c1077004>] synchronize_rcu+0x38/0x3a
> Mar 29 04:27:11 tbox kernel: [ 483.180444] [<c1043868>] ? find_ge_pid+0x2f/0x2f
> Mar 29 04:27:11 tbox kernel: [ 483.190875] [<f875f9b1>] rcu_torture_fakewriter+0x73/0xd0 [rcutorture]
> Mar 29 04:27:11 tbox kernel: [ 483.201327] [<c1045511>] kthread+0x62/0x67
> Mar 29 04:27:11 tbox kernel: [ 483.211784] [<c10454af>] ? kthread_worker_fn+0x111/0x111
> Mar 29 04:27:11 tbox kernel: [ 483.222207] [<c12a5cfe>] kernel_thread_helper+0x6/0xd
> Mar 29 04:27:11 tbox kernel: [ 483.232717] INFO: task modprobe:1759 blocked for more than 120 seconds.
> Mar 29 04:27:11 tbox kernel: [ 483.243213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 29 04:27:11 tbox kernel: [ 483.253882] modprobe D 00000036 0 1759 1747 0x00000000
> Mar 29 04:27:11 tbox kernel: [ 483.264562] f5995ea4 00000082 10c2f37d 00000036 c1467400 00000c21 00000000 c1467400
> Mar 29 04:27:11 tbox kernel: [ 483.275540] 00000000 00000036 f5844420 ffffffff ffffffff c1027d65 10c2d7ee f6406444
> Mar 29 04:27:11 tbox kernel: [ 483.286424] 002dab31 f5995e84 3d090000 00000000 f584444c 00001b8f 00000000 f5995e94
> Mar 29 04:27:11 tbox kernel: [ 483.297363] Call Trace:
> Mar 29 04:27:11 tbox kernel: [ 483.307928] [<c1027d65>] ? finish_task_switch+0x67/0x6d
> Mar 29 04:27:11 tbox kernel: [ 483.318648] [<c10219ba>] ? wakeup_preempt_entity+0x36/0x53
> Mar 29 04:27:11 tbox kernel: [ 483.329172] [<c129f514>] schedule_timeout+0x21/0xaa
> Mar 29 04:27:11 tbox kernel: [ 483.339689] [<c108395c>] ? trace_preempt_on+0xf/0x21
> Mar 29 04:27:11 tbox kernel: [ 483.350012] [<c129ed68>] ? wait_for_common+0x6f/0xca
> Mar 29 04:27:11 tbox kernel: [ 483.360371] [<c12a313a>] ? sub_preempt_count.part.167+0x67/0x74
> Mar 29 04:27:11 tbox kernel: [ 483.370729] [<c12a3188>] ? sub_preempt_count+0x41/0x43
> Mar 29 04:27:11 tbox kernel: [ 483.381108] [<c129ed6f>] wait_for_common+0x76/0xca
> Mar 29 04:27:11 tbox kernel: [ 483.391338] [<c102ca4a>] ? try_to_wake_up+0x181/0x181
> Mar 29 04:27:11 tbox kernel: [ 483.401670] [<c129ee44>] wait_for_completion+0x12/0x14
> Mar 29 04:27:11 tbox kernel: [ 483.411936] [<c1045576>] kthread_stop+0x60/0xaf
> Mar 29 04:27:11 tbox kernel: [ 483.422248] [<f8760592>] rcu_torture_cleanup+0x1d5/0x318 [rcutorture]
> Mar 29 04:27:11 tbox kernel: [ 483.432594] [<c10592bc>] sys_delete_module+0x198/0x1f5
> Mar 29 04:27:11 tbox kernel: [ 483.442984] [<c10c1612>] ? vfs_write+0xa4/0xd7
> Mar 29 04:27:11 tbox kernel: [ 483.453255] [<c11aade0>] ? tty_write_lock+0x3d/0x3d
> Mar 29 04:27:11 tbox kernel: [ 483.463598] [<c10c17ca>] ? sys_write+0x53/0x5d
> Mar 29 04:27:11 tbox kernel: [ 483.473787] [<c12a575f>] sysenter_do_call+0x12/0x28
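
The hung-task reports above all share one shape: an "INFO: task NAME:PID blocked" line followed by a call trace, in which frames marked with "?" are addresses the unwinder is not sure about. The following is a minimal Python sketch, not part of the original thread, that condenses such a log into one entry per blocked task and keeps only the frames without a "?". The function name `hung_tasks` and the regular expressions are assumptions made for illustration, written against the log format quoted above.

```python
import re

# "INFO: task rcu_torture_fak:1750 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than")
# "[<c129f514>] schedule_timeout+0x21/0xaa", optionally with "? " before the symbol
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?(\w[\w.]*)\+0x")


def hung_tasks(log_text):
    """Return one dict per hung-task report: task name, pid, reliable frames."""
    reports = []
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m:
            # A new report starts; frames that follow belong to it.
            reports.append({"task": m.group(1), "pid": int(m.group(2)), "frames": []})
            continue
        m = FRAME_RE.search(line)
        # Keep only frames without the "?" marker.
        if m and reports and m.group(1) is None:
            reports[-1]["frames"].append(m.group(2))
    return reports


if __name__ == "__main__":
    import sys
    for report in hung_tasks(sys.stdin.read()):
        print(report["task"], report["pid"], " <- ".join(report["frames"]))
```

Fed the log above on standard input, this prints one line per blocked task, which makes it easier to see that the fakewriter threads are waiting in synchronize_rcu() while modprobe is waiting in kthread_stop().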

2011-03-29 04:39:49

by Sedat Dilek

Subject: Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4 related?)

[...]
>> BTW, today's linux-next (next-20110328) is still freaky, I applied the
>> revert-rcu-patches patchset and all is fine.
>
> I reverted back to the commit preceding the one you pointed out last night
> my time, so the upcoming -next should be less freaky.
>

Makes sense.
Looks like the "old" commits are now in the boost.2011.03.25b Git branch [1]?

I have attached the rcutorture results for linux-next (next-20110329),
as I wanted to see what gets logged (I loaded the rcutorture module three
times normally and once with verbose=1).

- Sedat -

[1] http://git.us.kernel.org/?p=linux/kernel/git/paulmck/linux-2.6-rcu.git;a=shortlog;h=refs/heads/boost.2011.03.25b

Attachments:

msg_rcu-torture_next-20110329.txt (5.58 kB)

2011-03-29 05:48:47

by Paul E. McKenney

Subject: Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4 related?)

On Tue, Mar 29, 2011 at 06:39:45AM +0200, Sedat Dilek wrote:
> [...]
> >> BTW, today's linux-next (next-20110328) is still freaky, I applied the
> >> revert-rcu-patches patchset and all is fine.
> >
> > I reverted back to the commit preceding the one you pointed out last night
> > my time, so the upcoming -next should be less freaky.
> >
>
> Makes sense.
> Looks like the "old" commits are now in boost.2011.03.25b GIT branch [1]?

Yep. I keep them always, digital packrat that I am.

I will be posting updates.

> I have attached the result from rcu-torture for linux-next (20110329)
> as I wanted to see what is logged (loaded rcutorture module 3x normal,
> 1x with verbose=1).

Yep, that is expected output. Now I just need to work out what else
is messed up.

Thanx, Paul

> - Sedat -
>
> [1] http://git.us.kernel.org/?p=linux/kernel/git/paulmck/linux-2.6-rcu.git;a=shortlog;h=refs/heads/boost.2011.03.25b

> [ 688.623078] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
> [ 788.672108] rcu-torture: rtc: (null) ver: 12030 tfle: 0 rta: 12030 rtaf: 0 rtf: 12025 rtmbe: 0 rtbke: 0 rtbre: 0 rtbae: 0 rtbafe: 0 rtbf: 0 rtb: 0 nt: 20281
> [ 788.672118] rcu-torture: Reader Pipe: 30198896 782 0 0 0 0 0 0 0 0 0
> [ 788.672123] rcu-torture: Reader Batch: 30196680 2998 0 0 0 0 0 0 0 0 0
> [ 788.672128] rcu-torture: Free-Block Circulation: 12029 12027 12025 12025 12025 12025 12025 12025 12025 12025 0
> [ 788.672158] rcu-torture:--- End of test: SUCCESS: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
> [ 923.777108] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
> [ 1023.820120] rcu-torture: rtc: (null) ver: 12206 tfle: 0 rta: 12206 rtaf: 0 rtf: 12203 rtmbe: 0 rtbke: 0 rtbre: 0 rtbae: 0 rtbafe: 0 rtbf: 0 rtb: 0 nt: 20509
> [ 1023.820130] rcu-torture: Reader Pipe: 31024643 786 0 0 0 0 0 0 0 0 0
> [ 1023.820135] rcu-torture: Reader Batch: 31022307 3122 0 0 0 0 0 0 0 0 0
> [ 1023.820140] rcu-torture: Free-Block Circulation: 12205 12203 12203 12203 12203 12203 12203 12203 12203 12203 0
> [ 1023.820171] rcu-torture:--- End of test: SUCCESS: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
> [ 1292.474530] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
> [ 1392.504192] rcu-torture: rtc: (null) ver: 12248 tfle: 0 rta: 12248 rtaf: 0 rtf: 12245 rtmbe: 0 rtbke: 0 rtbre: 0 rtbae: 0 rtbafe: 0 rtbf: 0 rtb: 0 nt: 21087
> [ 1392.504201] rcu-torture: Reader Pipe: 31674422 790 0 0 0 0 0 0 0 0 0
> [ 1392.504207] rcu-torture: Reader Batch: 31672026 3186 0 0 0 0 0 0 0 0 0
> [ 1392.504212] rcu-torture: Free-Block Circulation: 12247 12246 12245 12245 12245 12245 12245 12245 12245 12245 0
> [ 1392.504242] rcu-torture:--- End of test: SUCCESS: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
> [ 1601.082101] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=1 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
> [ 1601.082114] rcu-torture:Creating rcu_torture_writer task
> [ 1601.082239] rcu-torture:Creating rcu_torture_fakewriter task
> [ 1601.082261] rcu-torture:rcu_torture_writer task started
> [ 1601.087257] rcu-torture:Creating rcu_torture_fakewriter task
> [ 1601.087300] rcu-torture:rcu_torture_fakewriter task started
> [ 1601.090700] rcu-torture:Creating rcu_torture_fakewriter task
> [ 1601.090742] rcu-torture:rcu_torture_fakewriter task started
> [ 1601.090754] rcu-torture:Creating rcu_torture_fakewriter task
> [ 1601.090773] rcu-torture:rcu_torture_fakewriter task started
> [ 1601.090784] rcu-torture:Creating rcu_torture_reader task
> [ 1601.090802] rcu-torture:rcu_torture_fakewriter task started
> [ 1601.090812] rcu-torture:Creating rcu_torture_reader task
> [ 1601.090830] rcu-torture:rcu_torture_reader task started
> [ 1601.090883] rcu-torture:rcu_torture_reader task started
> [ 1601.091303] rcu-torture:rcu_torture_stutter task started
> [ 1701.099323] rcu-torture:Stopping rcu_torture_stutter task
> [ 1701.099347] rcu-torture:rcu_torture_stutter task stopping
> [ 1701.099395] rcu-torture:Stopping rcu_torture_writer task
> [ 1701.099409] rcu-torture:rcu_torture_reader task stopping
> [ 1701.099422] rcu-torture:rcu_torture_reader task stopping
> [ 1701.099437] rcu-torture:rcu_torture_writer task stopping
> [ 1701.099456] rcu-torture:Stopping rcu_torture_reader task
> [ 1701.099497] rcu-torture:Stopping rcu_torture_reader task
> [ 1701.099535] rcu-torture:Stopping rcu_torture_fakewriter task
> [ 1701.100726] rcu-torture:rcu_torture_fakewriter task stopping
> [ 1701.104291] rcu-torture:rcu_torture_fakewriter task stopping
> [ 1701.104315] rcu-torture:Stopping rcu_torture_fakewriter task
> [ 1701.104359] rcu-torture:Stopping rcu_torture_fakewriter task
> [ 1701.131318] rcu-torture:rcu_torture_fakewriter task stopping
> [ 1701.139632] rcu-torture:rcu_torture_fakewriter task stopping
> [ 1701.139668] rcu-torture:Stopping rcu_torture_fakewriter task
> [ 1701.144211] rcu-torture: rtc: (null) ver: 12179 tfle: 0 rta: 12179 rtaf: 0 rtf: 12177 rtmbe: 0 rtbke: 0 rtbre: 0 rtbae: 0 rtbafe: 0 rtbf: 0 rtb: 0 nt: 20951
> [ 1701.144220] rcu-torture: Reader Pipe: 31547701 757 0 0 0 0 0 0 0 0 0
> [ 1701.144226] rcu-torture: Reader Batch: 31545363 3095 0 0 0 0 0 0 0 0 0
> [ 1701.144231] rcu-torture: Free-Block Circulation: 12178 12177 12177 12177 12177 12177 12177 12177 12177 12177 0
> [ 1701.144259] rcu-torture:--- End of test: SUCCESS: nreaders=2 nfakewriters=4 stat_interval=0 verbose=1 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
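
For readers who want to check a log like the one above without reading every line, here is a minimal Python sketch, not part of the original thread. It pairs each "End of test" verdict with the counters from the statistics line printed just before it. The function name `summarize` and the regular expressions are assumptions based only on the output format quoted above, and the counters are not interpreted.

```python
import re

# "rcu-torture:--- End of test: SUCCESS: nreaders=2 ..."
END_RE = re.compile(r"rcu-torture:--- End of test: (\w+):")
# "name: value" pairs on the statistics line, e.g. "ver: 12030" or "rtaf: 0"
STAT_RE = re.compile(r"(\w+): (\d+)")


def summarize(log_text):
    """Return a list of (verdict, counters) tuples, one per finished run."""
    runs = []
    counters = {}
    for line in log_text.splitlines():
        if "rcu-torture: rtc:" in line:
            # Statistics line: remember its numeric counters for this run.
            counters = {name: int(value) for name, value in STAT_RE.findall(line)}
        m = END_RE.search(line)
        if m:
            runs.append((m.group(1), counters))
            counters = {}
    return runs


if __name__ == "__main__":
    import sys
    for verdict, counters in summarize(sys.stdin.read()):
        print(verdict, counters)
```

Run against the attachment above, this should report four runs, matching the three normal loads and the one load with verbose=1.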