2013-05-21 17:32:09

by Dave Jones

[permalink] [raw]
Subject: rcu_read_lock() used illegally while idle!

Back from vacation, and breaking stuff already :)

-rc2, plus some small local mods that shouldn't be a cause.


[ 839.278967] ===============================
[ 839.279023] [ INFO: suspicious RCU usage. ]
[ 839.279080] 3.10.0-rc2+ #1 Not tainted
[ 839.279129] -------------------------------
[ 839.279185] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
[ 839.279273]
other info that might help us debug this:

[ 839.279370]
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
[ 839.279497] RCU used illegally from extended quiescent state!
[ 839.279566] 2 locks held by cc1/63645:
[ 839.279614] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
[ 839.279754] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
[ 839.279895]
stack backtrace:
[ 839.279954] CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
[ 839.280074] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
[ 839.280182] 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
[ 839.280301] ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
[ 839.280419] 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
[ 839.280537] Call Trace:
[ 839.280578] [<ffffffff816ae383>] dump_stack+0x19/0x1b
[ 839.280645] [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
[ 839.280723] [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
[ 839.280793] [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
[ 839.280865] [<ffffffff8108dffc>] update_curr+0xec/0x240
[ 839.283157] [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
[ 839.285453] [<ffffffff816b3a71>] __schedule+0x161/0x9b0
[ 839.287753] [<ffffffff816b4721>] preempt_schedule+0x51/0x80
[ 839.290056] [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
[ 839.292347] [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[ 839.294620] [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
[ 839.296880] [<ffffffff816be280>] ftrace_call+0x5/0x2f
[ 839.299107] [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
[ 839.301319] [<ffffffff816b4805>] ? schedule_user+0x5/0x70
[ 839.303524] [<ffffffff816b4805>] ? schedule_user+0x5/0x70
[ 839.305709] [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[ 839.959586] ------------[ cut here ]------------
[ 839.962342] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
[ 839.965471] Modules linked in: 8021q garp stp mrp rfcomm cmtp kernelcapi hidp fuse phonet af_key rose caif_socket af_rxrpc caif netrom llc2 bnep can_bcm can_raw can ipt_ULOG nfnetlink l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc scsi_transport_iscsi ipx x25 p8023 appletalk psnap p8022 llc irda nfc crc_ccitt rds ax25 decnet af_802154 atm kvm_amd kvm microcode edac_core btusb bluetooth serio_raw pcspkr snd_hda_codec_realtek snd_hda_intel rfkill snd_hda_codec snd_pcm r8169 mii snd_page_alloc snd_timer snd soundcore sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
[ 839.983088] CPU: 1 PID: 275 Comm: jbd2/sda5-8 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 41/277 64411]
[ 839.986926] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
[ 839.990687] ffffffff819ffcdc ffff88011cf6d6e8 ffffffff816ae383 ffff88011cf6d720
[ 839.994267] ffffffff81043280 0000000000000000 ffff88011cf6dfd8 00000000ffffffff
[ 839.997861] ffff88012a8046c0 ffff88012b1d7320 ffff88011cf6d730 ffffffff8104336a
[ 840.001431] Call Trace:
[ 840.004284] [<ffffffff816ae383>] dump_stack+0x19/0x1b
[ 840.007486] [<ffffffff81043280>] warn_slowpath_common+0x70/0xa0
[ 840.010807] [<ffffffff8104336a>] warn_slowpath_null+0x1a/0x20
[ 840.014079] [<ffffffff8113acc5>] user_enter+0xb5/0xd0
[ 840.017255] [<ffffffff816b496d>] preempt_schedule_irq+0x7d/0x90
[ 840.020548] [<ffffffff816b68f6>] retint_kernel+0x26/0x30
[ 840.023760] [<ffffffff81190bf7>] ? deactivate_slab+0x627/0x740
[ 840.027035] [<ffffffff810b83fd>] ? trace_hardirqs_on+0xd/0x10
[ 840.030303] [<ffffffff816ac6ca>] ? __slab_alloc+0x433/0x4dd
[ 840.033547] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
[ 840.036917] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
[ 840.040147] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
[ 840.043497] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
[ 840.046854] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
[ 840.050171] [<ffffffff81192172>] __kmalloc+0x302/0x400
[ 840.053339] [<ffffffff8126bc98>] ext4_ext_find_extent+0x438/0x520
[ 840.056660] [<ffffffff812701ea>] ext4_ext_map_blocks+0x6a/0x12b0
[ 840.059942] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
[ 840.063114] [<ffffffff81243a4d>] ext4_map_blocks+0x23d/0x4a0
[ 840.066325] [<ffffffff81243e40>] ? _ext4_get_block+0x190/0x190
[ 840.069563] [<ffffffff81243d33>] _ext4_get_block+0x83/0x190
[ 840.072754] [<ffffffff81243e56>] ext4_get_block+0x16/0x20
[ 840.075939] [<ffffffff811e609f>] generic_block_bmap+0x3f/0x50
[ 840.079189] [<ffffffff811e6065>] ? generic_block_bmap+0x5/0x50
[ 840.082433] [<ffffffff81241610>] ? ext4_num_dirty_pages.isra.46+0x210/0x210
[ 840.085817] [<ffffffff81147302>] ? mapping_tagged+0x12/0x20
[ 840.088981] [<ffffffff8124169c>] ext4_bmap+0x8c/0xf0
[ 840.092065] [<ffffffff811cb021>] bmap+0x21/0x30
[ 840.095078] [<ffffffff8129b073>] jbd2_journal_bmap+0x33/0xb0
[ 840.098238] [<ffffffff8129b175>] jbd2_journal_next_log_block+0x85/0xa0
[ 840.101479] [<ffffffff81291728>] jbd2_journal_commit_transaction+0x8f8/0x25e0
[ 840.104763] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
[ 840.107774] [<ffffffff816b5b52>] ? _raw_spin_unlock_irqrestore+0x42/0x80
[ 840.110983] [<ffffffff81297f61>] kjournald2+0xe1/0x400
[ 840.114003] [<ffffffff81072d30>] ? wake_up_bit+0x40/0x40
[ 840.117017] [<ffffffff81297e80>] ? commit_timeout+0x10/0x10
[ 840.120065] [<ffffffff81071c3d>] kthread+0xed/0x100
[ 840.123001] [<ffffffff81071b50>] ? insert_kthread_work+0x80/0x80
[ 840.126086] [<ffffffff816be59c>] ret_from_fork+0x7c/0xb0
[ 840.129092] [<ffffffff81071b50>] ? insert_kthread_work+0x80/0x80
[ 840.132199] ---[ end trace 407275e2f13d1407 ]---



$ grep RCU .config
# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_USER_QS=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
# RCU Debugging
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_REPEATEDLY is not set
CONFIG_PROVE_RCU_DELAY=y
CONFIG_SPARSE_RCU_POINTER=y
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_RCU_CPU_STALL_INFO=y
# CONFIG_RCU_TRACE is not set


2013-05-21 23:58:21

by Dave Jones

[permalink] [raw]
Subject: WARNING: at kernel/rcutree.c:388 rcu_eqs_enter

On Tue, May 21, 2013 at 01:31:00PM -0400, Dave Jones wrote:

I'm also seeing this quite often..

[ 576.156848] WARNING: at kernel/rcutree.c:388 rcu_eqs_enter+0x9b/0xb0()
[ 576.165962] Modules linked in: snd_seq_dummy tun fuse bnep hidp can_raw phonet rfcomm rose bluetooth netrom af_rxrpc scsi_transport_iscsi caif_socket caif ipt_ULOG af_key pppoe pppox af_802154 ppp_generic nfnetlink slhc llc2 can_bcm can irda atm ax25 ipx rds p8023 p8022 crc_ccitt x25 nfc rfkill decnet appletalk psnap llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs libcrc32c microcode snd_hda_intel snd_hda_codec pcspkr i915 snd_hwdep snd_seq snd_seq_device e1000e snd_pcm i2c_algo_bit drm_kms_helper drm ptp snd_page_alloc snd_timer pps_core snd i2c_core soundcore video
[ 576.235012] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-rc2+ #2
[ 576.264999] ffffffff819e8d1d ffffffff81c01e90 ffffffff816599f1 ffffffff81c01ec8
[ 576.265002] ffffffff8104a171 ffff8802449cdb60 0000000000000000 0000000000000000
[ 576.265005] ffffffff81c01fd8 ffffffff81c01fd8 ffffffff81c01ed8 ffffffff8104a24a
[ 576.265006] Call Trace:
[ 576.265011] [<ffffffff816599f1>] dump_stack+0x19/0x1b
[ 576.265015] [<ffffffff8104a171>] warn_slowpath_common+0x61/0x80
[ 576.265018] [<ffffffff8104a24a>] warn_slowpath_null+0x1a/0x20
[ 576.265022] [<ffffffff81100b7b>] rcu_eqs_enter+0x9b/0xb0
[ 576.265025] [<ffffffff81100ba9>] rcu_idle_enter+0x19/0x40
[ 576.265029] [<ffffffff810a66e2>] cpu_startup_entry+0x92/0x420
[ 576.265034] [<ffffffff81645287>] rest_init+0x137/0x140
[ 576.265037] [<ffffffff81645155>] ? rest_init+0x5/0x140
[ 576.265041] [<ffffffff81ed3e1a>] start_kernel+0x3ec/0x3f9
[ 576.265044] [<ffffffff81ed3836>] ? repair_env_string+0x5c/0x5c
[ 576.265048] [<ffffffff81ed356f>] x86_64_start_reservations+0x2a/0x2c
[ 576.265050] [<ffffffff81ed363d>] x86_64_start_kernel+0xcc/0xcf
[ 576.265052] ---[ end trace d18cda23dfca05cc ]---

WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);

[ 576.265054] WARNING: at kernel/rcutree.c:363 rcu_eqs_enter_common+0x44d/0x470()
[ 576.265096] Modules linked in: snd_seq_dummy tun fuse bnep hidp can_raw phonet rfcomm rose bluetooth netrom af_rxrpc scsi_transport_iscsi caif_socket caif ipt_ULOG af_key pppoe pppox af_802154 ppp_generic nfnetlink slhc llc2 can_bcm can irda atm ax25 ipx rds p8023 p8022 crc_ccitt x25 nfc rfkill decnet appletalk psnap llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs libcrc32c microcode snd_hda_intel snd_hda_codec pcspkr i915 snd_hwdep snd_seq snd_seq_device e1000e snd_pcm i2c_algo_bit drm_kms_helper drm ptp snd_page_alloc snd_timer pps_core snd i2c_core soundcore video
[ 576.265099] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.10.0-rc2+ #2
[ 576.265103] ffffffff819e8d1d ffffffff81c01e50 ffffffff816599f1 ffffffff81c01e88
[ 576.265106] ffffffff8104a171 ffff8802449cdb60 0000000000000000 ffff8802449cdb60
[ 576.265109] ff00000000000000 0000000000000000 ffffffff81c01e98 ffffffff8104a24a
[ 576.265110] Call Trace:
[ 576.265113] [<ffffffff816599f1>] dump_stack+0x19/0x1b
[ 576.265115] [<ffffffff8104a171>] warn_slowpath_common+0x61/0x80
[ 576.265118] [<ffffffff8104a24a>] warn_slowpath_null+0x1a/0x20
[ 576.265121] [<ffffffff81100abd>] rcu_eqs_enter_common+0x44d/0x470
[ 576.265124] [<ffffffff81100b4a>] rcu_eqs_enter+0x6a/0xb0
[ 576.265127] [<ffffffff81100ba9>] rcu_idle_enter+0x19/0x40
[ 576.265130] [<ffffffff810a66e2>] cpu_startup_entry+0x92/0x420
[ 576.265133] [<ffffffff81645287>] rest_init+0x137/0x140
[ 576.265136] [<ffffffff81645155>] ? rest_init+0x5/0x140
[ 576.265139] [<ffffffff81ed3e1a>] start_kernel+0x3ec/0x3f9
[ 576.265141] [<ffffffff81ed3836>] ? repair_env_string+0x5c/0x5c
[ 576.265145] [<ffffffff81ed356f>] x86_64_start_reservations+0x2a/0x2c
[ 576.265147] [<ffffffff81ed363d>] x86_64_start_kernel+0xcc/0xcf
[ 576.265149] ---[ end trace d18cda23dfca05cd ]---

WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);

[ 576.265362] WARNING: at kernel/rcutree.c:480 rcu_irq_exit+0x1da/0x1f0()
[ 576.265403] Modules linked in: snd_seq_dummy tun fuse bnep hidp can_raw phonet rfcomm rose bluetooth netrom af_rxrpc scsi_transport_iscsi caif_socket caif ipt_ULOG af_key pppoe pppox af_802154 ppp_generic nfnetlink slhc llc2 can_bcm can irda atm ax25 ipx rds p8023 p8022 crc_ccitt x25 nfc rfkill decnet appletalk psnap llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs libcrc32c microcode snd_hda_intel snd_hda_codec pcspkr i915 snd_hwdep snd_seq snd_seq_device e1000e snd_pcm i2c_algo_bit drm_kms_helper drm ptp snd_page_alloc snd_timer pps_core snd i2c_core soundcore video
[ 576.265406] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.10.0-rc2+ #2
[ 576.265410] ffffffff819e8d1d ffff880244803ef8 ffffffff816599f1 ffff880244803f30
[ 576.265414] ffffffff8104a171 ffff8802449cdb60 ff00000000000000 ff00000000000001
[ 576.265417] 0000000000000092 ffffffff81c01fd8 ffff880244803f40 ffffffff8104a24a
[ 576.265417] Call Trace:
[ 576.265421] <IRQ> [<ffffffff816599f1>] dump_stack+0x19/0x1b
[ 576.265424] [<ffffffff8104a171>] warn_slowpath_common+0x61/0x80
[ 576.265427] [<ffffffff8104a24a>] warn_slowpath_null+0x1a/0x20
[ 576.265429] [<ffffffff8110292a>] rcu_irq_exit+0x1da/0x1f0
[ 576.265433] [<ffffffff81054679>] irq_exit+0x89/0xc0
[ 576.265438] [<ffffffff8166bf4b>] smp_apic_timer_interrupt+0x6b/0x9b
[ 576.265442] [<ffffffff8166b0af>] apic_timer_interrupt+0x6f/0x80
[ 576.265448] <EOI> [<ffffffff816625a0>] ? retint_restore_args+0xe/0xe
[ 576.265455] [<ffffffff814e3dfb>] ? cpuidle_idle_call+0x3cb/0x3f0
[ 576.265459] [<ffffffff814e3dff>] ? cpuidle_idle_call+0x3cf/0x3f0
[ 576.265463] [<ffffffff814e3dfb>] ? cpuidle_idle_call+0x3cb/0x3f0
[ 576.265470] [<ffffffff8100bcde>] arch_cpu_idle+0xe/0x30
[ 576.265474] [<ffffffff810a66e7>] cpu_startup_entry+0x97/0x420
[ 576.265479] [<ffffffff81645287>] rest_init+0x137/0x140
[ 576.265483] [<ffffffff81645155>] ? rest_init+0x5/0x140
[ 576.265487] [<ffffffff81ed3e1a>] start_kernel+0x3ec/0x3f9
[ 576.265490] [<ffffffff81ed3836>] ? repair_env_string+0x5c/0x5c
[ 576.265494] [<ffffffff81ed356f>] x86_64_start_reservations+0x2a/0x2c
[ 576.265496] [<ffffffff81ed363d>] x86_64_start_kernel+0xcc/0xcf
[ 576.265498] ---[ end trace d18cda23dfca05ce ]---

WARN_ON_ONCE(rdtp->dynticks_nesting < 0);


[ 576.265501] WARNING: at kernel/rcutree.c:528 rcu_eqs_exit+0x91/0xa0()
[ 576.265542] Modules linked in: snd_seq_dummy tun fuse bnep hidp can_raw phonet rfcomm rose bluetooth netrom af_rxrpc scsi_transport_iscsi caif_socket caif ipt_ULOG af_key pppoe pppox af_802154 ppp_generic nfnetlink slhc llc2 can_bcm can irda atm ax25 ipx rds p8023 p8022 crc_ccitt x25 nfc rfkill decnet appletalk psnap llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs libcrc32c microcode snd_hda_intel snd_hda_codec pcspkr i915 snd_hwdep snd_seq snd_seq_device e1000e snd_pcm i2c_algo_bit drm_kms_helper drm ptp snd_page_alloc snd_timer pps_core snd i2c_core soundcore video
[ 576.265545] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.10.0-rc2+ #2
[ 576.265551] ffffffff819e8d1d ffffffff81c01e90 ffffffff816599f1 ffffffff81c01ec8
[ 576.265559] ffffffff8104a171 ffff8802449cdb60 ff00000000000000 0000000000000000
[ 576.265574] ffffffff81c01fd8 ffffffff81c01fd8 ffffffff81c01ed8 ffffffff8104a24a
[ 576.265575] Call Trace:
[ 576.265578] [<ffffffff816599f1>] dump_stack+0x19/0x1b
[ 576.265580] [<ffffffff8104a171>] warn_slowpath_common+0x61/0x80
[ 576.265583] [<ffffffff8104a24a>] warn_slowpath_null+0x1a/0x20
[ 576.265586] [<ffffffff810fd681>] rcu_eqs_exit+0x91/0xa0
[ 576.265588] [<ffffffff810fd6a9>] rcu_idle_exit+0x19/0x40
[ 576.265591] [<ffffffff810a66f7>] cpu_startup_entry+0xa7/0x420
[ 576.265595] [<ffffffff81645287>] rest_init+0x137/0x140
[ 576.265597] [<ffffffff81645155>] ? rest_init+0x5/0x140
[ 576.265600] [<ffffffff81ed3e1a>] start_kernel+0x3ec/0x3f9
[ 576.265602] [<ffffffff81ed3836>] ? repair_env_string+0x5c/0x5c
[ 576.265605] [<ffffffff81ed356f>] x86_64_start_reservations+0x2a/0x2c
[ 576.265608] [<ffffffff81ed363d>] x86_64_start_kernel+0xcc/0xcf
[ 576.265609] ---[ end trace d18cda23dfca05cf ]---

WARN_ON_ONCE(oldval < 0);

[ 576.265612] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
[ 576.265663] Modules linked in: snd_seq_dummy tun fuse bnep hidp can_raw phonet rfcomm rose bluetooth netrom af_rxrpc scsi_transport_iscsi caif_socket caif ipt_ULOG af_key pppoe pppox af_802154 ppp_generic nfnetlink slhc llc2 can_bcm can irda atm ax25 ipx rds p8023 p8022 crc_ccitt x25 nfc rfkill decnet appletalk psnap llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs libcrc32c microcode snd_hda_intel snd_hda_codec pcspkr i915 snd_hwdep snd_seq snd_seq_device e1000e snd_pcm i2c_algo_bit drm_kms_helper drm ptp snd_page_alloc snd_timer pps_core snd i2c_core soundcore video
[ 576.265665] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.10.0-rc2+ #2
[ 576.265669] ffffffff819e8d1d ffffffff81c01e50 ffffffff816599f1 ffffffff81c01e88
[ 576.265672] ffffffff8104a171 ffff8802449cdb60 ff00000000000000 ffff8802449cdb60
[ 576.265675] ffffffff81c01fd8 0000000000000000 ffffffff81c01e98 ffffffff8104a24a
[ 576.265676] Call Trace:
[ 576.265678] [<ffffffff816599f1>] dump_stack+0x19/0x1b
[ 576.265680] [<ffffffff8104a171>] warn_slowpath_common+0x61/0x80
[ 576.265683] [<ffffffff8104a24a>] warn_slowpath_null+0x1a/0x20
[ 576.265686] [<ffffffff810fd5c7>] rcu_eqs_exit_common+0x357/0x380
[ 576.265689] [<ffffffff810fd64a>] rcu_eqs_exit+0x5a/0xa0
[ 576.265691] [<ffffffff810fd6a9>] rcu_idle_exit+0x19/0x40
[ 576.265694] [<ffffffff810a66f7>] cpu_startup_entry+0xa7/0x420
[ 576.265698] [<ffffffff81645287>] rest_init+0x137/0x140
[ 576.265700] [<ffffffff81645155>] ? rest_init+0x5/0x140
[ 576.265704] [<ffffffff81ed3e1a>] start_kernel+0x3ec/0x3f9
[ 576.265708] [<ffffffff81ed3836>] ? repair_env_string+0x5c/0x5c
[ 576.265712] [<ffffffff81ed356f>] x86_64_start_reservations+0x2a/0x2c
[ 576.265716] [<ffffffff81ed363d>] x86_64_start_kernel+0xcc/0xcf
[ 576.265717] ---[ end trace d18cda23dfca05d0 ]---

WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));


I'm trying to collect all the WARN's in rcutree. 5/29 isn't a bad start :)

Dave

2013-05-22 10:24:12

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Tue, May 21, 2013 at 01:31:00PM -0400, Dave Jones wrote:
> Back from vacation, and breaking stuff already :)
>
> -rc2, plus some small local mods that shouldn't be a cause.

Hmmm... I thought that ftrace already took care to let RCU know that
the CPU should be considered non-idle before using RCU read-side critical
sections.

Adding Steven and Frederic to CC for their insight.

Thanx, Paul

> [ 839.278967] ===============================
> [ 839.279023] [ INFO: suspicious RCU usage. ]
> [ 839.279080] 3.10.0-rc2+ #1 Not tainted
> [ 839.279129] -------------------------------
> [ 839.279185] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> [ 839.279273]
> other info that might help us debug this:
>
> [ 839.279370]
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> [ 839.279497] RCU used illegally from extended quiescent state!
> [ 839.279566] 2 locks held by cc1/63645:
> [ 839.279614] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
> [ 839.279754] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
> [ 839.279895]
> stack backtrace:
> [ 839.279954] CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
> [ 839.280074] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> [ 839.280182] 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
> [ 839.280301] ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
> [ 839.280419] 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
> [ 839.280537] Call Trace:
> [ 839.280578] [<ffffffff816ae383>] dump_stack+0x19/0x1b
> [ 839.280645] [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
> [ 839.280723] [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
> [ 839.280793] [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
> [ 839.280865] [<ffffffff8108dffc>] update_curr+0xec/0x240
> [ 839.283157] [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
> [ 839.285453] [<ffffffff816b3a71>] __schedule+0x161/0x9b0
> [ 839.287753] [<ffffffff816b4721>] preempt_schedule+0x51/0x80
> [ 839.290056] [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
> [ 839.292347] [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> [ 839.294620] [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
> [ 839.296880] [<ffffffff816be280>] ftrace_call+0x5/0x2f
> [ 839.299107] [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
> [ 839.301319] [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> [ 839.303524] [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> [ 839.305709] [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> [ 839.959586] ------------[ cut here ]------------
> [ 839.962342] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> [ 839.965471] Modules linked in: 8021q garp stp mrp rfcomm cmtp kernelcapi hidp fuse phonet af_key rose caif_socket af_rxrpc caif netrom llc2 bnep can_bcm can_raw can ipt_ULOG nfnetlink l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc scsi_transport_iscsi ipx x25 p8023 appletalk psnap p8022 llc irda nfc crc_ccitt rds ax25 decnet af_802154 atm kvm_amd kvm microcode edac_core btusb bluetooth serio_raw pcspkr snd_hda_codec_realtek snd_hda_intel rfkill snd_hda_codec snd_pcm r8169 mii snd_page_alloc snd_timer snd soundcore sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
> [ 839.983088] CPU: 1 PID: 275 Comm: jbd2/sda5-8 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 41/277 64411]
> [ 839.986926] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> [ 839.990687] ffffffff819ffcdc ffff88011cf6d6e8 ffffffff816ae383 ffff88011cf6d720
> [ 839.994267] ffffffff81043280 0000000000000000 ffff88011cf6dfd8 00000000ffffffff
> [ 839.997861] ffff88012a8046c0 ffff88012b1d7320 ffff88011cf6d730 ffffffff8104336a
> [ 840.001431] Call Trace:
> [ 840.004284] [<ffffffff816ae383>] dump_stack+0x19/0x1b
> [ 840.007486] [<ffffffff81043280>] warn_slowpath_common+0x70/0xa0
> [ 840.010807] [<ffffffff8104336a>] warn_slowpath_null+0x1a/0x20
> [ 840.014079] [<ffffffff8113acc5>] user_enter+0xb5/0xd0
> [ 840.017255] [<ffffffff816b496d>] preempt_schedule_irq+0x7d/0x90
> [ 840.020548] [<ffffffff816b68f6>] retint_kernel+0x26/0x30
> [ 840.023760] [<ffffffff81190bf7>] ? deactivate_slab+0x627/0x740
> [ 840.027035] [<ffffffff810b83fd>] ? trace_hardirqs_on+0xd/0x10
> [ 840.030303] [<ffffffff816ac6ca>] ? __slab_alloc+0x433/0x4dd
> [ 840.033547] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> [ 840.036917] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
> [ 840.040147] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> [ 840.043497] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> [ 840.046854] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> [ 840.050171] [<ffffffff81192172>] __kmalloc+0x302/0x400
> [ 840.053339] [<ffffffff8126bc98>] ext4_ext_find_extent+0x438/0x520
> [ 840.056660] [<ffffffff812701ea>] ext4_ext_map_blocks+0x6a/0x12b0
> [ 840.059942] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
> [ 840.063114] [<ffffffff81243a4d>] ext4_map_blocks+0x23d/0x4a0
> [ 840.066325] [<ffffffff81243e40>] ? _ext4_get_block+0x190/0x190
> [ 840.069563] [<ffffffff81243d33>] _ext4_get_block+0x83/0x190
> [ 840.072754] [<ffffffff81243e56>] ext4_get_block+0x16/0x20
> [ 840.075939] [<ffffffff811e609f>] generic_block_bmap+0x3f/0x50
> [ 840.079189] [<ffffffff811e6065>] ? generic_block_bmap+0x5/0x50
> [ 840.082433] [<ffffffff81241610>] ? ext4_num_dirty_pages.isra.46+0x210/0x210
> [ 840.085817] [<ffffffff81147302>] ? mapping_tagged+0x12/0x20
> [ 840.088981] [<ffffffff8124169c>] ext4_bmap+0x8c/0xf0
> [ 840.092065] [<ffffffff811cb021>] bmap+0x21/0x30
> [ 840.095078] [<ffffffff8129b073>] jbd2_journal_bmap+0x33/0xb0
> [ 840.098238] [<ffffffff8129b175>] jbd2_journal_next_log_block+0x85/0xa0
> [ 840.101479] [<ffffffff81291728>] jbd2_journal_commit_transaction+0x8f8/0x25e0
> [ 840.104763] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
> [ 840.107774] [<ffffffff816b5b52>] ? _raw_spin_unlock_irqrestore+0x42/0x80
> [ 840.110983] [<ffffffff81297f61>] kjournald2+0xe1/0x400
> [ 840.114003] [<ffffffff81072d30>] ? wake_up_bit+0x40/0x40
> [ 840.117017] [<ffffffff81297e80>] ? commit_timeout+0x10/0x10
> [ 840.120065] [<ffffffff81071c3d>] kthread+0xed/0x100
> [ 840.123001] [<ffffffff81071b50>] ? insert_kthread_work+0x80/0x80
> [ 840.126086] [<ffffffff816be59c>] ret_from_fork+0x7c/0xb0
> [ 840.129092] [<ffffffff81071b50>] ? insert_kthread_work+0x80/0x80
> [ 840.132199] ---[ end trace 407275e2f13d1407 ]---
>
>
>
> $ grep RCU .config
> # RCU Subsystem
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_USER_QS=y
> CONFIG_RCU_FANOUT=64
> CONFIG_RCU_FANOUT_LEAF=16
> # CONFIG_RCU_FANOUT_EXACT is not set
> # CONFIG_RCU_FAST_NO_HZ is not set
> # CONFIG_TREE_RCU_TRACE is not set
> CONFIG_RCU_BOOST=y
> CONFIG_RCU_BOOST_PRIO=1
> CONFIG_RCU_BOOST_DELAY=500
> CONFIG_RCU_NOCB_CPU=y
> CONFIG_RCU_NOCB_CPU_ALL=y
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> # RCU Debugging
> CONFIG_PROVE_RCU=y
> # CONFIG_PROVE_RCU_REPEATEDLY is not set
> CONFIG_PROVE_RCU_DELAY=y
> CONFIG_SPARSE_RCU_POINTER=y
> # CONFIG_RCU_TORTURE_TEST is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> CONFIG_RCU_CPU_STALL_VERBOSE=y
> CONFIG_RCU_CPU_STALL_INFO=y
> # CONFIG_RCU_TRACE is not set
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2013-05-22 10:28:57

by Paul E. McKenney

[permalink] [raw]
Subject: Re: WARNING: at kernel/rcutree.c:388 rcu_eqs_enter

On Tue, May 21, 2013 at 07:58:13PM -0400, Dave Jones wrote:
> On Tue, May 21, 2013 at 01:31:00PM -0400, Dave Jones wrote:
>
> I'm also seeing this quite often..
>
> [ 576.156848] WARNING: at kernel/rcutree.c:388 rcu_eqs_enter+0x9b/0xb0()
> [ 576.165962] Modules linked in: snd_seq_dummy tun fuse bnep hidp can_raw phonet rfcomm rose bluetooth netrom af_rxrpc scsi_transport_iscsi caif_socket caif ipt_ULOG af_key pppoe pppox af_802154 ppp_generic nfnetlink slhc llc2 can_bcm can irda atm ax25 ipx rds p8023 p8022 crc_ccitt x25 nfc rfkill decnet appletalk psnap llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs libcrc32c microcode snd_hda_intel snd_hda_codec pcspkr i915 snd_hwdep snd_seq snd_seq_device e1000e snd_pcm i2c_algo_bit drm_kms_helper drm ptp snd_page_alloc snd_timer pps_core snd i2c_core soundcore video
> [ 576.235012] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-rc2+ #2
> [ 576.264999] ffffffff819e8d1d ffffffff81c01e90 ffffffff816599f1 ffffffff81c01ec8
> [ 576.265002] ffffffff8104a171 ffff8802449cdb60 0000000000000000 0000000000000000
> [ 576.265005] ffffffff81c01fd8 ffffffff81c01fd8 ffffffff81c01ed8 ffffffff8104a24a
> [ 576.265006] Call Trace:
> [ 576.265011] [<ffffffff816599f1>] dump_stack+0x19/0x1b
> [ 576.265015] [<ffffffff8104a171>] warn_slowpath_common+0x61/0x80
> [ 576.265018] [<ffffffff8104a24a>] warn_slowpath_null+0x1a/0x20
> [ 576.265022] [<ffffffff81100b7b>] rcu_eqs_enter+0x9b/0xb0
> [ 576.265025] [<ffffffff81100ba9>] rcu_idle_enter+0x19/0x40
> [ 576.265029] [<ffffffff810a66e2>] cpu_startup_entry+0x92/0x420
> [ 576.265034] [<ffffffff81645287>] rest_init+0x137/0x140
> [ 576.265037] [<ffffffff81645155>] ? rest_init+0x5/0x140
> [ 576.265041] [<ffffffff81ed3e1a>] start_kernel+0x3ec/0x3f9
> [ 576.265044] [<ffffffff81ed3836>] ? repair_env_string+0x5c/0x5c
> [ 576.265048] [<ffffffff81ed356f>] x86_64_start_reservations+0x2a/0x2c
> [ 576.265050] [<ffffffff81ed363d>] x86_64_start_kernel+0xcc/0xcf
> [ 576.265052] ---[ end trace d18cda23dfca05cc ]---
>
> WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);

Hmmm... Could you please try the patch below? Untested, probably does
not even compile.

[ . . . ]

> I'm trying to collect all the WARN's in rcutree. 5/29 isn't a bad start :)

;-) ;-) ;-)

Just to make your testing more exiting... I will be getting on a plane
in a few hours, and will be on planes and in airports for about 30 hours.
I will check in as connectivity permits.

Thanx, Paul

------------------------------------------------------------------------

trace: Allow idle-safe tracepoints to be called from irq

__DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
may safely be invoked from what RCU considers to be an idle CPU.
However, these _rcuidle() tracepoints may -not- be invoked from the
handler of an irq taken from idle, because rcu_idle_enter() zeroes
RCU's nesting-level counter, so that the rcu_irq_exit() returning to
idle will trigger a WARN_ON_ONCE().

This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
tracepoints usable from irq handlers as well as from process context.

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 2f322c3..f8e084d 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -145,8 +145,8 @@ static inline void tracepoint_synchronize_unregister(void)
TP_PROTO(data_proto), \
TP_ARGS(data_args), \
TP_CONDITION(cond), \
- rcu_idle_exit(), \
- rcu_idle_enter()); \
+ rcu_irq_enter(), \
+ rcu_irq_exit()); \
}
#else
#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)

2013-05-22 12:49:21

by Steven Rostedt

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Wed, 2013-05-22 at 02:36 -0700, Paul E. McKenney wrote:
> On Tue, May 21, 2013 at 01:31:00PM -0400, Dave Jones wrote:
> > Back from vacation, and breaking stuff already :)
> >
> > -rc2, plus some small local mods that shouldn't be a cause.
>
> Hmmm... I thought that ftrace already took care to let RCU know that
> the CPU should be considered non-idle before using RCU read-side critical
> sections.
>
> Adding Steven and Frederic to CC for their insight.
>
> Thanx, Paul
>
> > [ 839.278967] ===============================
> > [ 839.279023] [ INFO: suspicious RCU usage. ]
> > [ 839.279080] 3.10.0-rc2+ #1 Not tainted
> > [ 839.279129] -------------------------------
> > [ 839.279185] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> > [ 839.279273]
> > other info that might help us debug this:
> >
> > [ 839.279370]
> > RCU used illegally from idle CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > [ 839.279497] RCU used illegally from extended quiescent state!
> > [ 839.279566] 2 locks held by cc1/63645:
> > [ 839.279614] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
> > [ 839.279754] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
> > [ 839.279895]
> > stack backtrace:
> > [ 839.279954] CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
> > [ 839.280074] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> > [ 839.280182] 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
> > [ 839.280301] ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
> > [ 839.280419] 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
> > [ 839.280537] Call Trace:
> > [ 839.280578] [<ffffffff816ae383>] dump_stack+0x19/0x1b
> > [ 839.280645] [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
> > [ 839.280723] [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
> > [ 839.280793] [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
> > [ 839.280865] [<ffffffff8108dffc>] update_curr+0xec/0x240
> > [ 839.283157] [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
> > [ 839.285453] [<ffffffff816b3a71>] __schedule+0x161/0x9b0
> > [ 839.287753] [<ffffffff816b4721>] preempt_schedule+0x51/0x80
> > [ 839.290056] [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
> > [ 839.292347] [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> > [ 839.294620] [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
> > [ 839.296880] [<ffffffff816be280>] ftrace_call+0x5/0x2f
> > [ 839.299107] [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
> > [ 839.301319] [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> > [ 839.303524] [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> > [ 839.305709] [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> > [ 839.959586] ------------[ cut here ]------------

This looks to be the caused by the same issue that this patch fixes:

https://lkml.org/lkml/2013/5/10/537

The schedule_user() was traced, and the preempt_enable_no_trace() that
the function tracer does caused for a schedule to occur. As the
scheduler uses rcu, and it was called before schedule_user() could tell
the kernel that the context is changing from user to kernel.

-- Steve



> > [ 839.962342] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> > [ 839.965471] Modules linked in: 8021q garp stp mrp rfcomm cmtp kernelcapi hidp fuse phonet af_key rose caif_socket af_rxrpc caif netrom llc2 bnep can_bcm can_raw can ipt_ULOG nfnetlink l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc scsi_transport_iscsi ipx x25 p8023 appletalk psnap p8022 llc irda nfc crc_ccitt rds ax25 decnet af_802154 atm kvm_amd kvm microcode edac_core btusb bluetooth serio_raw pcspkr snd_hda_codec_realtek snd_hda_intel rfkill snd_hda_codec snd_pcm r8169 mii snd_page_alloc snd_timer snd soundcore sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
> > [ 839.983088] CPU: 1 PID: 275 Comm: jbd2/sda5-8 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 41/277 64411]
> > [ 839.986926] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> > [ 839.990687] ffffffff819ffcdc ffff88011cf6d6e8 ffffffff816ae383 ffff88011cf6d720
> > [ 839.994267] ffffffff81043280 0000000000000000 ffff88011cf6dfd8 00000000ffffffff
> > [ 839.997861] ffff88012a8046c0 ffff88012b1d7320 ffff88011cf6d730 ffffffff8104336a
> > [ 840.001431] Call Trace:
> > [ 840.004284] [<ffffffff816ae383>] dump_stack+0x19/0x1b
> > [ 840.007486] [<ffffffff81043280>] warn_slowpath_common+0x70/0xa0
> > [ 840.010807] [<ffffffff8104336a>] warn_slowpath_null+0x1a/0x20
> > [ 840.014079] [<ffffffff8113acc5>] user_enter+0xb5/0xd0
> > [ 840.017255] [<ffffffff816b496d>] preempt_schedule_irq+0x7d/0x90
> > [ 840.020548] [<ffffffff816b68f6>] retint_kernel+0x26/0x30
> > [ 840.023760] [<ffffffff81190bf7>] ? deactivate_slab+0x627/0x740
> > [ 840.027035] [<ffffffff810b83fd>] ? trace_hardirqs_on+0xd/0x10
> > [ 840.030303] [<ffffffff816ac6ca>] ? __slab_alloc+0x433/0x4dd
> > [ 840.033547] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> > [ 840.036917] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
> > [ 840.040147] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> > [ 840.043497] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> > [ 840.046854] [<ffffffff8126bc98>] ? ext4_ext_find_extent+0x438/0x520
> > [ 840.050171] [<ffffffff81192172>] __kmalloc+0x302/0x400
> > [ 840.053339] [<ffffffff8126bc98>] ext4_ext_find_extent+0x438/0x520
> > [ 840.056660] [<ffffffff812701ea>] ext4_ext_map_blocks+0x6a/0x12b0
> > [ 840.059942] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
> > [ 840.063114] [<ffffffff81243a4d>] ext4_map_blocks+0x23d/0x4a0
> > [ 840.066325] [<ffffffff81243e40>] ? _ext4_get_block+0x190/0x190
> > [ 840.069563] [<ffffffff81243d33>] _ext4_get_block+0x83/0x190
> > [ 840.072754] [<ffffffff81243e56>] ext4_get_block+0x16/0x20
> > [ 840.075939] [<ffffffff811e609f>] generic_block_bmap+0x3f/0x50
> > [ 840.079189] [<ffffffff811e6065>] ? generic_block_bmap+0x5/0x50
> > [ 840.082433] [<ffffffff81241610>] ? ext4_num_dirty_pages.isra.46+0x210/0x210
> > [ 840.085817] [<ffffffff81147302>] ? mapping_tagged+0x12/0x20
> > [ 840.088981] [<ffffffff8124169c>] ext4_bmap+0x8c/0xf0
> > [ 840.092065] [<ffffffff811cb021>] bmap+0x21/0x30
> > [ 840.095078] [<ffffffff8129b073>] jbd2_journal_bmap+0x33/0xb0
> > [ 840.098238] [<ffffffff8129b175>] jbd2_journal_next_log_block+0x85/0xa0
> > [ 840.101479] [<ffffffff81291728>] jbd2_journal_commit_transaction+0x8f8/0x25e0
> > [ 840.104763] [<ffffffff816be280>] ? ftrace_call+0x5/0x2f
> > [ 840.107774] [<ffffffff816b5b52>] ? _raw_spin_unlock_irqrestore+0x42/0x80
> > [ 840.110983] [<ffffffff81297f61>] kjournald2+0xe1/0x400
> > [ 840.114003] [<ffffffff81072d30>] ? wake_up_bit+0x40/0x40
> > [ 840.117017] [<ffffffff81297e80>] ? commit_timeout+0x10/0x10
> > [ 840.120065] [<ffffffff81071c3d>] kthread+0xed/0x100
> > [ 840.123001] [<ffffffff81071b50>] ? insert_kthread_work+0x80/0x80
> > [ 840.126086] [<ffffffff816be59c>] ret_from_fork+0x7c/0xb0
> > [ 840.129092] [<ffffffff81071b50>] ? insert_kthread_work+0x80/0x80
> > [ 840.132199] ---[ end trace 407275e2f13d1407 ]---
> >
> >
> >
> > $ grep RCU .config
> > # RCU Subsystem
> > CONFIG_TREE_PREEMPT_RCU=y
> > CONFIG_PREEMPT_RCU=y
> > CONFIG_RCU_STALL_COMMON=y
> > CONFIG_RCU_USER_QS=y
> > CONFIG_RCU_FANOUT=64
> > CONFIG_RCU_FANOUT_LEAF=16
> > # CONFIG_RCU_FANOUT_EXACT is not set
> > # CONFIG_RCU_FAST_NO_HZ is not set
> > # CONFIG_TREE_RCU_TRACE is not set
> > CONFIG_RCU_BOOST=y
> > CONFIG_RCU_BOOST_PRIO=1
> > CONFIG_RCU_BOOST_DELAY=500
> > CONFIG_RCU_NOCB_CPU=y
> > CONFIG_RCU_NOCB_CPU_ALL=y
> > CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> > # RCU Debugging
> > CONFIG_PROVE_RCU=y
> > # CONFIG_PROVE_RCU_REPEATEDLY is not set
> > CONFIG_PROVE_RCU_DELAY=y
> > CONFIG_SPARSE_RCU_POINTER=y
> > # CONFIG_RCU_TORTURE_TEST is not set
> > CONFIG_RCU_CPU_STALL_TIMEOUT=60
> > CONFIG_RCU_CPU_STALL_VERBOSE=y
> > CONFIG_RCU_CPU_STALL_INFO=y
> > # CONFIG_RCU_TRACE is not set
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >

2013-05-23 15:52:17

by Dave Jones

[permalink] [raw]
Subject: Re: WARNING: at kernel/rcutree.c:388 rcu_eqs_enter

On Wed, May 22, 2013 at 02:51:29AM -0700, Paul E. McKenney wrote:

> trace: Allow idle-safe tracepoints to be called from irq
>
> __DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
> may safely be invoked from what RCU considers to be an idle CPU.
> However, these _rcuidle() tracepoints may -not- be invoked from the
> handler of an irq taken from idle, because rcu_idle_enter() zeroes
> RCU's nesting-level counter, so that the rcu_irq_exit() returning to
> idle will trigger a WARN_ON_ONCE().
>
> This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
> and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
> tracepoints usable from irq handlers as well as from process context.
>
> Reported-by: Dave Jones <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Cc: Steven Rostedt <[email protected]>
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 2f322c3..f8e084d 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -145,8 +145,8 @@ static inline void tracepoint_synchronize_unregister(void)
> TP_PROTO(data_proto), \
> TP_ARGS(data_args), \
> TP_CONDITION(cond), \
> - rcu_idle_exit(), \
> - rcu_idle_enter()); \
> + rcu_irq_enter(), \
> + rcu_irq_exit()); \
> }
> #else
> #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)

This stops the "WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()"
but I still see..


[ 363.399348] ===============================
[ 363.400617] [ INFO: suspicious RCU usage. ]
[ 363.401880] 3.10.0-rc2+ #31 Not tainted
[ 363.403210] -------------------------------
[ 363.403787] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
[ 363.403831]
other info that might help us debug this:

[ 363.403880]
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
[ 363.403942] RCU used illegally from extended quiescent state!
[ 363.403976] 2 locks held by trinity-child0/27319:
[ 363.404006] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816e91bf>] __schedule+0xef/0x9c0
[ 363.404076] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5605>] cpuacct_charge+0x5/0x1f0
[ 363.404146]
stack backtrace:
[ 363.404176] CPU: 1 PID: 27319 Comm: trinity-child0 Not tainted 3.10.0-rc2+ #31
[ 363.405536] 0000000000000000 ffff880208ddfcf8 ffffffff816e2a5b ffff880208ddfd28
[ 363.406835] ffffffff810b5917 ffff880243362568 00000000003c3c18 ffff880243362520
[ 363.408140] 0000000000000001 ffff880208ddfd60 ffffffff810a5785 ffffffff810a5605
[ 363.409451] Call Trace:
[ 363.410726] [<ffffffff816e2a5b>] dump_stack+0x19/0x1b
[ 363.412005] [<ffffffff810b5917>] lockdep_rcu_suspicious+0xe7/0x120
[ 363.413281] [<ffffffff810a5785>] cpuacct_charge+0x185/0x1f0
[ 363.414543] [<ffffffff810a5605>] ? cpuacct_charge+0x5/0x1f0
[ 363.415793] [<ffffffff8109607c>] update_curr+0xec/0x250
[ 363.417035] [<ffffffff810975a8>] put_prev_task_fair+0x228/0x480
[ 363.418273] [<ffffffff816e9236>] __schedule+0x166/0x9c0
[ 363.419497] [<ffffffff816e9ee4>] preempt_schedule+0x44/0x60
[ 363.420717] [<ffffffff816e9fb0>] ? __cond_resched_softirq+0x60/0x60
[ 363.421940] [<ffffffff816ebfa4>] ? retint_careful+0x12/0x2e
[ 363.423155] [<ffffffff8110e2b3>] ftrace_ops_control_func+0x1d3/0x210
[ 363.424373] [<ffffffff816f3a40>] ftrace_call+0x5/0x2f
[ 363.425577] [<ffffffff816ebf9d>] ? retint_careful+0xb/0x2e
[ 363.426784] [<ffffffff816e9fb5>] ? schedule_user+0x5/0x70
[ 363.427988] [<ffffffff816e9fb5>] ? schedule_user+0x5/0x70
[ 363.429184] [<ffffffff816ebfa4>] ? retint_careful+0x12/0x2e

I'll try Steve's patch on top next.

Dave

2013-05-23 16:40:19

by Dave Jones

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Wed, May 22, 2013 at 08:49:18AM -0400, Steven Rostedt wrote:

> > ===============================
> > [ INFO: suspicious RCU usage. ]
> > 3.10.0-rc2+ #1 Not tainted
> > -------------------------------
> > include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> > other info that might help us debug this:
> > RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
> > RCU used illegally from extended quiescent state!
> > 2 locks held by cc1/63645:
> > #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
> > #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
> >
> > CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
> > Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> > 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
> > ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
> > 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
> > Call Trace:
> > [<ffffffff816ae383>] dump_stack+0x19/0x1b
> > [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
> > [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
> > [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
> > [<ffffffff8108dffc>] update_curr+0xec/0x240
> > [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
> > [<ffffffff816b3a71>] __schedule+0x161/0x9b0
> > [<ffffffff816b4721>] preempt_schedule+0x51/0x80
> > [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
> > [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> > [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
> > [<ffffffff816be280>] ftrace_call+0x5/0x2f
> > [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
> > [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> > [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> > [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> > ------------[ cut here ]------------
>
> This looks to be the caused by the same issue that this patch fixes:
>
> https://lkml.org/lkml/2013/5/10/537
>
> The schedule_user() was traced, and the preempt_enable_no_trace() that
> the function tracer does caused for a schedule to occur. As the
> scheduler uses rcu, and it was called before schedule_user() could tell
> the kernel that the context is changing from user to kernel.

That patch doesn't help unfortunatly.

Dave

2013-05-24 13:46:23

by Steven Rostedt

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Thu, 2013-05-23 at 12:40 -0400, Dave Jones wrote:
> On Wed, May 22, 2013 at 08:49:18AM -0400, Steven Rostedt wrote:
>
> > > ===============================
> > > [ INFO: suspicious RCU usage. ]
> > > 3.10.0-rc2+ #1 Not tainted
> > > -------------------------------
> > > include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> > > other info that might help us debug this:
> > > RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
> > > RCU used illegally from extended quiescent state!
> > > 2 locks held by cc1/63645:
> > > #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
> > > #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
> > >
> > > CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
> > > Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
> > > 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
> > > ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
> > > 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
> > > Call Trace:
> > > [<ffffffff816ae383>] dump_stack+0x19/0x1b
> > > [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
> > > [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
> > > [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
> > > [<ffffffff8108dffc>] update_curr+0xec/0x240
> > > [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
> > > [<ffffffff816b3a71>] __schedule+0x161/0x9b0
> > > [<ffffffff816b4721>] preempt_schedule+0x51/0x80
> > > [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
> > > [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> > > [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
> > > [<ffffffff816be280>] ftrace_call+0x5/0x2f
> > > [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
> > > [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> > > [<ffffffff816b4805>] ? schedule_user+0x5/0x70
> > > [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
> > > ------------[ cut here ]------------
> >
> > This looks to be the caused by the same issue that this patch fixes:
> >
> > https://lkml.org/lkml/2013/5/10/537
> >
> > The schedule_user() was traced, and the preempt_enable_no_trace() that
> > the function tracer does caused for a schedule to occur. As the
> > scheduler uses rcu, and it was called before schedule_user() could tell
> > the kernel that the context is changing from user to kernel.
>
> That patch doesn't help unfortunatly.
>

Dave,

Can you send me your full .config, and also what you did to trigger
this? As I see ftrace_call is there, I'm assuming you have function
tracing enabled somewhere. Probably via perf, as ftrace_ops_control_func
is used by perf.

Also, if you added anything special on the kernel command line, that
info will be useful too.

Thanks,

-- Steve

2013-05-24 14:23:48

by Dave Jones

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Fri, May 24, 2013 at 09:46:20AM -0400, Steven Rostedt wrote:

> > > This looks to be the caused by the same issue that this patch fixes:
> > >
> > > https://lkml.org/lkml/2013/5/10/537
> > >
> > > The schedule_user() was traced, and the preempt_enable_no_trace() that
> > > the function tracer does caused for a schedule to occur. As the
> > > scheduler uses rcu, and it was called before schedule_user() could tell
> > > the kernel that the context is changing from user to kernel.
> >
> > That patch doesn't help unfortunatly.
> >
>
> Dave,
>
> Can you send me your full .config, and also what you did to trigger
> this? As I see ftrace_call is there, I'm assuming you have function
> tracing enabled somewhere. Probably via perf, as ftrace_ops_control_func
> is used by perf.
>
> Also, if you added anything special on the kernel command line, that
> info will be useful too.

.config: http://paste.fedoraproject.org/14281/94052971/raw/

trace shows the problem process was 'cc1', so I was likely building a kernel
at the time. There was also a trinity run going on in the background.

cmdline: nothing special..

BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0

Dave

2013-05-24 19:23:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:

> .config: http://paste.fedoraproject.org/14281/94052971/raw/
>
> trace shows the problem process was 'cc1', so I was likely building a kernel
> at the time. There was also a trinity run going on in the background.
>
> cmdline: nothing special..
>
> BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
>

OK, I'm able to reproduce this. Looks like the same issue happens from
other ways from userspace into the kernel (the next way was tracing
system calls).

Forget the other patch. Here's a new patch that is more specific to
tracing and context tracking.

Can you try this one out, please.

-- Steve

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 87a03c7..f5d4723 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -33,9 +33,25 @@ do { \
preempt_schedule(); \
} while (0)

+#ifdef CONFIG_CONTEXT_TRACKING
+
+void preempt_schedule_context(void);
+
+#define preempt_check_resched_context() \
+do { \
+ if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ preempt_schedule_context(); \
+} while (0)
+#else
+
+#define preempt_check_resched_context() preempt_check_resched()
+
+#endif /* CONFIG_CONTEXT_TRACKING */
+
#else /* !CONFIG_PREEMPT */

#define preempt_check_resched() do { } while (0)
+#define preempt_check_resched_context() do { } while (0)

#endif /* CONFIG_PREEMPT */

@@ -88,7 +104,7 @@ do { \
do { \
preempt_enable_no_resched_notrace(); \
barrier(); \
- preempt_check_resched(); \
+ preempt_check_resched_context(); \
} while (0)

#else /* !CONFIG_PREEMPT_COUNT */
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 65349f0..ac3a312 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -71,6 +71,46 @@ void user_enter(void)
local_irq_restore(flags);
}

+/**
+ * preempt_schedule_context - preempt_schedule called by tracing
+ *
+ * The tracing infrastructure uses preempt_enable_notrace to prevent
+ * recursion and tracing preempt enabling caused by the tracing
+ * infrastructure itself. But as tracing can happen in areas coming
+ * from userspace or just about to enter userspace, a preempt enable
+ * can occur before user_exit() is called. This will cause the scheduler
+ * to be called when the system is still in usermode.
+ *
+ * To prevent this, the preempt_enable_notrace will use this function
+ * instead of preempt_schedule() to exit user context if needed before
+ * calling the scheduler.
+ */
+void __sched notrace preempt_schedule_context(void)
+{
+ struct thread_info *ti = current_thread_info();
+ enum ctx_state prev_ctx;
+
+ if (likely(ti->preempt_count || irqs_disabled()))
+ return;
+
+ /*
+ * Need to disable preemption in case user_exit() is traced
+ * and the tracer calls preempt_enable_notrace() causing
+ * an infinite recursion.
+ */
+ preempt_disable_notrace();
+ prev_ctx = this_cpu_read(context_tracking.state);
+ user_exit();
+ preempt_enable_no_resched_notrace();
+
+ preempt_schedule();
+
+ preempt_disable_notrace();
+ if (prev_ctx == IN_USER)
+ user_enter();
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL(preempt_schedule_context);

/**
* user_exit - Inform the context tracking that the CPU is

2013-05-24 22:43:36

by Dave Jones

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Fri, May 24, 2013 at 03:23:40PM -0400, Steven Rostedt wrote:
> On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:
>
> > .config: http://paste.fedoraproject.org/14281/94052971/raw/
> >
> > trace shows the problem process was 'cc1', so I was likely building a kernel
> > at the time. There was also a trinity run going on in the background.
> >
> > cmdline: nothing special..
> >
> > BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
> >
>
> OK, I'm able to reproduce this. Looks like the same issue happens from
> other ways from userspace into the kernel (the next way was tracing
> system calls).
>
> Forget the other patch. Here's a new patch that is more specific to
> tracing and context tracking.
>
> Can you try this one out, please.

It's looking good so far.

Dave

2013-05-25 14:22:55

by Dave Jones

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Fri, May 24, 2013 at 03:23:40PM -0400, Steven Rostedt wrote:
> On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:
>
> > .config: http://paste.fedoraproject.org/14281/94052971/raw/
> >
> > trace shows the problem process was 'cc1', so I was likely building a kernel
> > at the time. There was also a trinity run going on in the background.
> >
> > cmdline: nothing special..
> >
> > BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
> >
>
> OK, I'm able to reproduce this. Looks like the same issue happens from
> other ways from userspace into the kernel (the next way was tracing
> system calls).
>
> Forget the other patch. Here's a new patch that is more specific to
> tracing and context tracking.
>
> Can you try this one out, please.

Took a lot longer to hit this..

[53693.297516] ===============================
[53693.298109] [ INFO: suspicious RCU usage. ]
[53693.298562] 3.10.0-rc2+ #38 Not tainted
[53693.299017] -------------------------------
[53693.299474] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
[53693.299959]
other info that might help us debug this:

[53693.301420]
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
[53693.302918] RCU used illegally from extended quiescent state!
[53693.303462] 1 lock held by trinity-child1/18786:
[53693.303966] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
[53693.304557]
stack backtrace:
[53693.305608] CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
[53693.306790] 0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
[53693.307408] ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
[53693.308035] 0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
[53693.308671] Call Trace:
[53693.309301] [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
[53693.309943] [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
[53693.310596] [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
[53693.311256] [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
[53693.311923] [<ffffffff81309289>] ? __const_udelay+0x29/0x30
[53693.312596] [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
[53693.313275] [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
[53693.313958] [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
[53693.314650] [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
[53693.315347] [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
[53693.316059] [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
[53693.316773] [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
[53693.317485] [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
[53693.318207] [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
[53693.318921] [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
[53693.319621] [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
[53693.320330] [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
[53693.321046] [<ffffffff816f4000>] ftrace_call+0x5/0x2f
[53693.321763] [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
[53693.322490] [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
[53693.323221] [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
[53693.323988] [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
[53693.324728] [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
[53693.325489] [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
[53693.326273] [<ffffffff81103673>] rcu_user_enter+0x13/0x20
[53693.327025] [<ffffffff8114541a>] user_enter+0x6a/0xd0
[53693.327783] [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
[53693.328551] [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d

2013-05-25 18:20:13

by Steven Rostedt

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Sat, 2013-05-25 at 09:59 -0400, Dave Jones wrote:
> On Fri, May 24, 2013 at 03:23:40PM -0400, Steven Rostedt wrote:
> > On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:
> >
> > > .config: http://paste.fedoraproject.org/14281/94052971/raw/
> > >
> > > trace shows the problem process was 'cc1', so I was likely building a kernel
> > > at the time. There was also a trinity run going on in the background.
> > >
> > > cmdline: nothing special..
> > >
> > > BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
> > >
> >
> > OK, I'm able to reproduce this. Looks like the same issue happens from
> > other ways from userspace into the kernel (the next way was tracing
> > system calls).
> >
> > Forget the other patch. Here's a new patch that is more specific to
> > tracing and context tracking.
> >
> > Can you try this one out, please.
>
> Took a lot longer to hit this..

This is a same but different bug ;-)

Looks like we fixed all the function tracing infrastructure problems,
but this is a function tracer user problem. Namely perf.

>
> [53693.297516] ===============================
> [53693.298109] [ INFO: suspicious RCU usage. ]
> [53693.298562] 3.10.0-rc2+ #38 Not tainted
> [53693.299017] -------------------------------
> [53693.299474] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> [53693.299959]
> other info that might help us debug this:
>
> [53693.301420]
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> [53693.302918] RCU used illegally from extended quiescent state!
> [53693.303462] 1 lock held by trinity-child1/18786:
> [53693.303966] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
> [53693.304557]
> stack backtrace:
> [53693.305608] CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
> [53693.306790] 0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
> [53693.307408] ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
> [53693.308035] 0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
> [53693.308671] Call Trace:
> [53693.309301] [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
> [53693.309943] [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
> [53693.310596] [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
> [53693.311256] [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
> [53693.311923] [<ffffffff81309289>] ? __const_udelay+0x29/0x30
> [53693.312596] [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
> [53693.313275] [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
> [53693.313958] [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
> [53693.314650] [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
> [53693.315347] [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
> [53693.316059] [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
> [53693.316773] [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20

Ouch! perf function tracing uses rcu_lock(). This should be avoided.
Believe it or not, rcu_lock() is too invasive for function tracing, not
to mention that the function tracer traces rcu_lock().

Jiri,

Is there a way you can make perf not use rcu here? Or we need to
blacklist what perf can trace. It can not trace any rcu calls. Here it
looks like it traced rcu_eqs_enter(). It also wont be able to trace
anything before user_exit() or after user_enter().

-- Steve

> [53693.317485] [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
> [53693.318207] [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
> [53693.318921] [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
> [53693.319621] [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
> [53693.320330] [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
> [53693.321046] [<ffffffff816f4000>] ftrace_call+0x5/0x2f
> [53693.321763] [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
> [53693.322490] [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
> [53693.323221] [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
> [53693.323988] [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
> [53693.324728] [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
> [53693.325489] [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
> [53693.326273] [<ffffffff81103673>] rcu_user_enter+0x13/0x20
> [53693.327025] [<ffffffff8114541a>] user_enter+0x6a/0xd0
> [53693.327783] [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
> [53693.328551] [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
>

2013-05-26 18:05:16

by Jiri Olsa

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Sat, May 25, 2013 at 02:20:04PM -0400, Steven Rostedt wrote:
> On Sat, 2013-05-25 at 09:59 -0400, Dave Jones wrote:
> > On Fri, May 24, 2013 at 03:23:40PM -0400, Steven Rostedt wrote:
> > > On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:
> > >
> > > > .config: http://paste.fedoraproject.org/14281/94052971/raw/
> > > >
> > > > trace shows the problem process was 'cc1', so I was likely building a kernel
> > > > at the time. There was also a trinity run going on in the background.
> > > >
> > > > cmdline: nothing special..
> > > >
> > > > BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
> > > >
> > >
> > > OK, I'm able to reproduce this. Looks like the same issue happens from
> > > other ways from userspace into the kernel (the next way was tracing
> > > system calls).
> > >
> > > Forget the other patch. Here's a new patch that is more specific to
> > > tracing and context tracking.
> > >
> > > Can you try this one out, please.
> >
> > Took a lot longer to hit this..
>
> This is a same but different bug ;-)
>
> Looks like we fixed all the function tracing infrastructure problems,
> but this is a function tracer user problem. Namely perf.
>
> >
> > [53693.297516] ===============================
> > [53693.298109] [ INFO: suspicious RCU usage. ]
> > [53693.298562] 3.10.0-rc2+ #38 Not tainted
> > [53693.299017] -------------------------------
> > [53693.299474] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> > [53693.299959]
> > other info that might help us debug this:
> >
> > [53693.301420]
> > RCU used illegally from idle CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > [53693.302918] RCU used illegally from extended quiescent state!
> > [53693.303462] 1 lock held by trinity-child1/18786:
> > [53693.303966] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
> > [53693.304557]
> > stack backtrace:
> > [53693.305608] CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
> > [53693.306790] 0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
> > [53693.307408] ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
> > [53693.308035] 0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
> > [53693.308671] Call Trace:
> > [53693.309301] [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
> > [53693.309943] [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
> > [53693.310596] [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
> > [53693.311256] [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
> > [53693.311923] [<ffffffff81309289>] ? __const_udelay+0x29/0x30
> > [53693.312596] [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
> > [53693.313275] [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
> > [53693.313958] [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
> > [53693.314650] [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
> > [53693.315347] [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
> > [53693.316059] [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
> > [53693.316773] [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
>
> Ouch! perf function tracing uses rcu_lock(). This should be avoided.
> Believe it or not, rcu_lock() is too invasive for function tracing, not
> to mention that the function tracer traces rcu_lock().
>
> Jiri,
>
> Is there a way you can make perf not use rcu here? Or we need to
> blacklist what perf can trace. It can not trace any rcu calls. Here it
> looks like it traced rcu_eqs_enter(). It also wont be able to trace
> anything before user_exit() or after user_enter().

ok, I'll check

jirka

2013-05-26 18:57:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: WARNING: at kernel/rcutree.c:388 rcu_eqs_enter

On Thu, May 23, 2013 at 11:51:52AM -0400, Dave Jones wrote:
> On Wed, May 22, 2013 at 02:51:29AM -0700, Paul E. McKenney wrote:
>
> > trace: Allow idle-safe tracepoints to be called from irq
> >
> > __DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
> > may safely be invoked from what RCU considers to be an idle CPU.
> > However, these _rcuidle() tracepoints may -not- be invoked from the
> > handler of an irq taken from idle, because rcu_idle_enter() zeroes
> > RCU's nesting-level counter, so that the rcu_irq_exit() returning to
> > idle will trigger a WARN_ON_ONCE().
> >
> > This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
> > and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
> > tracepoints usable from irq handlers as well as from process context.
> >
> > Reported-by: Dave Jones <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> >
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index 2f322c3..f8e084d 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -145,8 +145,8 @@ static inline void tracepoint_synchronize_unregister(void)
> > TP_PROTO(data_proto), \
> > TP_ARGS(data_args), \
> > TP_CONDITION(cond), \
> > - rcu_idle_exit(), \
> > - rcu_idle_enter()); \
> > + rcu_irq_enter(), \
> > + rcu_irq_exit()); \
> > }
> > #else
> > #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)
>
> This stops the "WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()"
> but I still see..

Thank you for testing this!

This is the same one that you and Steven Rostedt are discussing, correct?

Thanx, Paul

> [ 363.399348] ===============================
> [ 363.400617] [ INFO: suspicious RCU usage. ]
> [ 363.401880] 3.10.0-rc2+ #31 Not tainted
> [ 363.403210] -------------------------------
> [ 363.403787] include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
> [ 363.403831]
> other info that might help us debug this:
>
> [ 363.403880]
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> [ 363.403942] RCU used illegally from extended quiescent state!
> [ 363.403976] 2 locks held by trinity-child0/27319:
> [ 363.404006] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816e91bf>] __schedule+0xef/0x9c0
> [ 363.404076] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5605>] cpuacct_charge+0x5/0x1f0
> [ 363.404146]
> stack backtrace:
> [ 363.404176] CPU: 1 PID: 27319 Comm: trinity-child0 Not tainted 3.10.0-rc2+ #31
> [ 363.405536] 0000000000000000 ffff880208ddfcf8 ffffffff816e2a5b ffff880208ddfd28
> [ 363.406835] ffffffff810b5917 ffff880243362568 00000000003c3c18 ffff880243362520
> [ 363.408140] 0000000000000001 ffff880208ddfd60 ffffffff810a5785 ffffffff810a5605
> [ 363.409451] Call Trace:
> [ 363.410726] [<ffffffff816e2a5b>] dump_stack+0x19/0x1b
> [ 363.412005] [<ffffffff810b5917>] lockdep_rcu_suspicious+0xe7/0x120
> [ 363.413281] [<ffffffff810a5785>] cpuacct_charge+0x185/0x1f0
> [ 363.414543] [<ffffffff810a5605>] ? cpuacct_charge+0x5/0x1f0
> [ 363.415793] [<ffffffff8109607c>] update_curr+0xec/0x250
> [ 363.417035] [<ffffffff810975a8>] put_prev_task_fair+0x228/0x480
> [ 363.418273] [<ffffffff816e9236>] __schedule+0x166/0x9c0
> [ 363.419497] [<ffffffff816e9ee4>] preempt_schedule+0x44/0x60
> [ 363.420717] [<ffffffff816e9fb0>] ? __cond_resched_softirq+0x60/0x60
> [ 363.421940] [<ffffffff816ebfa4>] ? retint_careful+0x12/0x2e
> [ 363.423155] [<ffffffff8110e2b3>] ftrace_ops_control_func+0x1d3/0x210
> [ 363.424373] [<ffffffff816f3a40>] ftrace_call+0x5/0x2f
> [ 363.425577] [<ffffffff816ebf9d>] ? retint_careful+0xb/0x2e
> [ 363.426784] [<ffffffff816e9fb5>] ? schedule_user+0x5/0x70
> [ 363.427988] [<ffffffff816e9fb5>] ? schedule_user+0x5/0x70
> [ 363.429184] [<ffffffff816ebfa4>] ? retint_careful+0x12/0x2e
>
> I'll try Steve's patch on top next.
>
> Dave
>

2013-05-28 20:14:59

by Dave Jones

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Sat, May 25, 2013 at 02:20:04PM -0400, Steven Rostedt wrote:
> On Sat, 2013-05-25 at 09:59 -0400, Dave Jones wrote:
> > On Fri, May 24, 2013 at 03:23:40PM -0400, Steven Rostedt wrote:
> > > On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:
> > >
> > > > .config: http://paste.fedoraproject.org/14281/94052971/raw/
> > > >
> > > > trace shows the problem process was 'cc1', so I was likely building a kernel
> > > > at the time. There was also a trinity run going on in the background.
> > > >
> > > > cmdline: nothing special..
> > > >
> > > > BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
> > > >
> > >
> > > OK, I'm able to reproduce this. Looks like the same issue happens from
> > > other ways from userspace into the kernel (the next way was tracing
> > > system calls).
> > >
> > > Forget the other patch. Here's a new patch that is more specific to
> > > tracing and context tracking.
> > >
> > > Can you try this one out, please.
> >
> > Took a lot longer to hit this..
>
> This is a same but different bug ;-)
>
> Looks like we fixed all the function tracing infrastructure problems,
> but this is a function tracer user problem. Namely perf.

another variant of the same ? or different different ?

[12572.705832] ======================================================
[12572.750317] [ INFO: possible circular locking dependency detected ]
[12572.796978] 3.10.0-rc3+ #39 Not tainted
[12572.833381] -------------------------------------------------------
[12572.862233] trinity-child17/31341 is trying to acquire lock:
[12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
[12572.878859]
but task is already holding lock:
[12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
[12572.903381]
which lock already depends on the new lock.

[12572.927541]
the existing dependency chain (in reverse order) is:
[12572.943736]
-> #4 (&ctx->lock){-.-...}:
[12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
[12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
[12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
[12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
[12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
[12573.018321]
-> #3 (&rq->lock){-.-.-.}:
[12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
[12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
[12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
[12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
[12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
[12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
[12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
[12573.109528]
-> #2 (&p->pi_lock){-.-.-.}:
[12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
[12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
[12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
[12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
[12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
[12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
[12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
[12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
[12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
[12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
[12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
[12573.221379]
-> #1 (&rsp->gp_wq){..-.-.}:
[12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
[12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
[12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
[12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
[12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
[12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
[12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
[12573.295067]
-> #0 (rcu_node_0){..-.-.}:
[12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
[12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
[12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
[12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
[12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
[12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
[12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
[12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
[12573.382068]
other info that might help us debug this:

[12573.403229] Chain exists of:
rcu_node_0 --> &rq->lock --> &ctx->lock

[12573.424471] Possible unsafe locking scenario:

[12573.438499] CPU0 CPU1
[12573.445599] ---- ----
[12573.452691] lock(&ctx->lock);
[12573.459799] lock(&rq->lock);
[12573.467010] lock(&ctx->lock);
[12573.474192] lock(rcu_node_0);
[12573.481262]
*** DEADLOCK ***

[12573.501931] 1 lock held by trinity-child17/31341:
[12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
[12573.516475]
stack backtrace:
[12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
[12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
[12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
[12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
[12573.567856] Call Trace:
[12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
[12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
[12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
[12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
[12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
[12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
[12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
[12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
[12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
[12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
[12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
[12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
[12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
[12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
[12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
[12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
[12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
[12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

2013-05-28 20:27:23

by Steven Rostedt

[permalink] [raw]
Subject: Re: rcu_read_lock() used illegally while idle!

On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
> On Sat, May 25, 2013 at 02:20:04PM -0400, Steven Rostedt wrote:
> > On Sat, 2013-05-25 at 09:59 -0400, Dave Jones wrote:
> > > On Fri, May 24, 2013 at 03:23:40PM -0400, Steven Rostedt wrote:
> > > > On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:
> > > >
> > > > > .config: http://paste.fedoraproject.org/14281/94052971/raw/
> > > > >
> > > > > trace shows the problem process was 'cc1', so I was likely building a kernel
> > > > > at the time. There was also a trinity run going on in the background.
> > > > >
> > > > > cmdline: nothing special..
> > > > >
> > > > > BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
> > > > >
> > > >
> > > > OK, I'm able to reproduce this. Looks like the same issue happens from
> > > > other ways from userspace into the kernel (the next way was tracing
> > > > system calls).
> > > >
> > > > Forget the other patch. Here's a new patch that is more specific to
> > > > tracing and context tracking.
> > > >
> > > > Can you try this one out, please.
> > >
> > > Took a lot longer to hit this..
> >
> > This is a same but different bug ;-)
> >
> > Looks like we fixed all the function tracing infrastructure problems,
> > but this is a function tracer user problem. Namely perf.
>
> another variant of the same ? or different different ?

No this is something entirely different. The previous reports were
mostly caused by rcu dereference being used when rcu was shut off (going
into user land or idle).

This looks like a more serious possible deadlock. I'll look into this a
bit.

Thanks,

-- Steve

>
> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ #39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> #4 (&ctx->lock){-.-...}:
> [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> [12573.018321]
> -> #3 (&rq->lock){-.-.-.}:
> [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
> [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
> [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
> [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> #2 (&p->pi_lock){-.-.-.}:
> [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
> [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
> [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> #1 (&rsp->gp_wq){..-.-.}:
> [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
> [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
> rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471] Possible unsafe locking scenario:
>
> [12573.438499] CPU0 CPU1
> [12573.445599] ---- ----
> [12573.452691] lock(&ctx->lock);
> [12573.459799] lock(&rq->lock);
> [12573.467010] lock(&ctx->lock);
> [12573.474192] lock(rcu_node_0);
> [12573.481262]
> *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
> [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

2013-05-28 21:32:59

by Steven Rostedt

[permalink] [raw]
Subject: [BUG] with rcu nocb, don't call wake up holding rnp->lock (was: rcu_read_lock() used illegally while idle!)

Paul, this looks to be a nocb rcu bug.


On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ #39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> #4 (&ctx->lock){-.-...}:
> [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> [12573.018321]
> -> #3 (&rq->lock){-.-.-.}:
> [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
> [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
> [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
> [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> #2 (&p->pi_lock){-.-.-.}:
> [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
> [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
> [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> #1 (&rsp->gp_wq){..-.-.}:
> [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
> [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0

Notice the above call chain.

rcu_start_future_gp() is called with the rnp->lock held. Then it calls
rcu_start_gp_advance, which does a wakeup.

You can't do wakeups while holding the rnp->lock, as that would mean
that you could not do a rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).


> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

Notice the above trace.

perf took its own ctx->lock, which can be taken while holding the rq
lock. While holding this lock, it did a rcu_read_unlock(). The
perf_lock_task_context() basically looks like:

rcu_read_lock();
raw_spin_lock(ctx->lock);
rcu_read_unlock();

Now, what looks to have happened, is that we scheduled after taking that
first rcu_read_lock() but before taking the spin lock. When we scheduled
back in and took the ctx->lock, the following rcu_read_unlock()
triggered the "special" code.

The rcu_read_unlock_special() takes the rnp->lock, which gives us a
possible deadlock scenario.

CPU0 CPU1 CPU2
---- ---- ----

rcu_nocb_kthread()
lock(rq->lock);
lock(ctx->lock);
lock(rnp->lock);

wake_up();

lock(rq->lock);

rcu_read_unlock();

rcu_read_unlock_special();

lock(rnp->lock);
lock(ctx->lock);

**** DEADLOCK ****


> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
> rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471] Possible unsafe locking scenario:
>
> [12573.438499] CPU0 CPU1
> [12573.445599] ---- ----
> [12573.452691] lock(&ctx->lock);
> [12573.459799] lock(&rq->lock);
> [12573.467010] lock(&ctx->lock);
> [12573.474192] lock(rcu_node_0);
> [12573.481262]
> *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
> [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

Here's one solution, to delay the wakeup via a irq_work: This is what
perf and ftrace use to perform wakeups in critical sections.

-- Steve

Signed-off-by: Steven Rostedt <[email protected]>

Index: linux-trace.git/init/Kconfig
===================================================================
--- linux-trace.git.orig/init/Kconfig
+++ linux-trace.git/init/Kconfig
@@ -431,6 +431,7 @@ choice
config TREE_RCU
bool "Tree-based hierarchical RCU"
depends on !PREEMPT && SMP
+ select IRQ_WORK
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
Index: linux-trace.git/kernel/rcutree.c
===================================================================
--- linux-trace.git.orig/kernel/rcutree.c
+++ linux-trace.git/kernel/rcutree.c
@@ -1613,6 +1613,14 @@ static int __noreturn rcu_gp_kthread(voi
}
}

+static void rsp_wakeup(struct irq_work *work)
+{
+ struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);
+
+ /* Wake up rcu_gp_kthread() to start the grace period. */
+ wake_up(&rsp->gp_wq);
+}
+
/*
* Start a new RCU grace period if warranted, re-initializing the hierarchy
* in preparation for detecting the next grace period. The caller must hold
@@ -1637,8 +1645,12 @@ rcu_start_gp_advanced(struct rcu_state *
}
rsp->gp_flags = RCU_GP_FLAG_INIT;

- /* Wake up rcu_gp_kthread() to start the grace period. */
- wake_up(&rsp->gp_wq);
+ /*
+ * We can't do wakeups while holding the rnp->lock, as that
+ * could cause possible deadlocks with the rq->lock. Deter
+ * the wakeup to interrupt context.
+ */
+ irq_work_queue(&rsp->wakeup_work);
}

/*
@@ -3235,6 +3247,7 @@ static void __init rcu_init_one(struct r

rsp->rda = rda;
init_waitqueue_head(&rsp->gp_wq);
+ init_irq_work(&rsp->wakeup_work, rsp_wakeup);
rnp = rsp->level[rcu_num_lvls - 1];
for_each_possible_cpu(i) {
while (i > rnp->grphi)
Index: linux-trace.git/kernel/rcutree.h
===================================================================
--- linux-trace.git.orig/kernel/rcutree.h
+++ linux-trace.git/kernel/rcutree.h
@@ -27,6 +27,7 @@
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
+#include <linux/irq_work.h>

/*
* Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
@@ -442,6 +443,7 @@ struct rcu_state {
char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */
struct list_head flavors; /* List of RCU flavors. */
+ struct irq_work wakeup_work; /* Postponed wakeups */
};

/* Values for rcu_state structure's gp_flags field. */

2013-05-29 03:29:25

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] with rcu nocb, don't call wake up holding rnp->lock (was: rcu_read_lock() used illegally while idle!)

On Tue, May 28, 2013 at 05:32:53PM -0400, Steven Rostedt wrote:
> Paul, this looks to be a nocb rcu bug.

Excellent work tracing this down! I have queued your fix, and if it
passes testing, I will push for 3.10.

Thanx, Paul

> On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
>
> > [12572.705832] ======================================================
> > [12572.750317] [ INFO: possible circular locking dependency detected ]
> > [12572.796978] 3.10.0-rc3+ #39 Not tainted
> > [12572.833381] -------------------------------------------------------
> > [12572.862233] trinity-child17/31341 is trying to acquire lock:
> > [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> > [12572.878859]
> > but task is already holding lock:
> > [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> > [12572.903381]
> > which lock already depends on the new lock.
> >
> > [12572.927541]
> > the existing dependency chain (in reverse order) is:
> > [12572.943736]
> > -> #4 (&ctx->lock){-.-...}:
> > [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> > [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> > [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> > [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> > [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> > [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> > [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> > [12573.018321]
> > -> #3 (&rq->lock){-.-.-.}:
> > [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> > [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> > [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> > [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
> > [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
> > [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
> > [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> > [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> > [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> > [12573.109528]
> > -> #2 (&p->pi_lock){-.-.-.}:
> > [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> > [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> > [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> > [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> > [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> > [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
> > [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
> > [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> > [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> > [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> > [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
> > [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> > [12573.221379]
> > -> #1 (&rsp->gp_wq){..-.-.}:
> > [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> > [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> > [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
> > [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> > [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> > [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> > [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
> > [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
>
> Notice the above call chain.
>
> rcu_start_future_gp() is called with the rnp->lock held. Then it calls
> rcu_start_gp_advance, which does a wakeup.
>
> You can't do wakeups while holding the rnp->lock, as that would mean
> that you could not do a rcu_read_unlock() while holding the rq lock, or
> any lock that was taken while holding the rq lock. This is because...
> (See below).
>
>
> > [12573.295067]
> > -> #0 (rcu_node_0){..-.-.}:
> > [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> > [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> > [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> > [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> > [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> > [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> > [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> > [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> > [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> > [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
>
> Notice the above trace.
>
> perf took its own ctx->lock, which can be taken while holding the rq
> lock. While holding this lock, it did a rcu_read_unlock(). The
> perf_lock_task_context() basically looks like:
>
> rcu_read_lock();
> raw_spin_lock(ctx->lock);
> rcu_read_unlock();
>
> Now, what looks to have happened, is that we scheduled after taking that
> first rcu_read_lock() but before taking the spin lock. When we scheduled
> back in and took the ctx->lock, the following rcu_read_unlock()
> triggered the "special" code.
>
> The rcu_read_unlock_special() takes the rnp->lock, which gives us a
> possible deadlock scenario.
>
> CPU0 CPU1 CPU2
> ---- ---- ----
>
> rcu_nocb_kthread()
> lock(rq->lock);
> lock(ctx->lock);
> lock(rnp->lock);
>
> wake_up();
>
> lock(rq->lock);
>
> rcu_read_unlock();
>
> rcu_read_unlock_special();
>
> lock(rnp->lock);
> lock(ctx->lock);
>
> **** DEADLOCK ****
>
>
> > [12573.382068]
> > other info that might help us debug this:
> >
> > [12573.403229] Chain exists of:
> > rcu_node_0 --> &rq->lock --> &ctx->lock
> >
> > [12573.424471] Possible unsafe locking scenario:
> >
> > [12573.438499] CPU0 CPU1
> > [12573.445599] ---- ----
> > [12573.452691] lock(&ctx->lock);
> > [12573.459799] lock(&rq->lock);
> > [12573.467010] lock(&ctx->lock);
> > [12573.474192] lock(rcu_node_0);
> > [12573.481262]
> > *** DEADLOCK ***
> >
> > [12573.501931] 1 lock held by trinity-child17/31341:
> > [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> > [12573.516475]
> > stack backtrace:
> > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> > [12573.567856] Call Trace:
> > [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
> > [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> > [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> > [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> > [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> > [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> > [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> > [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> > [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> > [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> > [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> > [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> > [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> > [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> > [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> > [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> > [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> > [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> > [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> > [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> > [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
>
> Here's one solution, to delay the wakeup via a irq_work: This is what
> perf and ftrace use to perform wakeups in critical sections.
>
> -- Steve
>
> Signed-off-by: Steven Rostedt <[email protected]>
>
> Index: linux-trace.git/init/Kconfig
> ===================================================================
> --- linux-trace.git.orig/init/Kconfig
> +++ linux-trace.git/init/Kconfig
> @@ -431,6 +431,7 @@ choice
> config TREE_RCU
> bool "Tree-based hierarchical RCU"
> depends on !PREEMPT && SMP
> + select IRQ_WORK
> help
> This option selects the RCU implementation that is
> designed for very large SMP system with hundreds or
> Index: linux-trace.git/kernel/rcutree.c
> ===================================================================
> --- linux-trace.git.orig/kernel/rcutree.c
> +++ linux-trace.git/kernel/rcutree.c
> @@ -1613,6 +1613,14 @@ static int __noreturn rcu_gp_kthread(voi
> }
> }
>
> +static void rsp_wakeup(struct irq_work *work)
> +{
> + struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);
> +
> + /* Wake up rcu_gp_kthread() to start the grace period. */
> + wake_up(&rsp->gp_wq);
> +}
> +
> /*
> * Start a new RCU grace period if warranted, re-initializing the hierarchy
> * in preparation for detecting the next grace period. The caller must hold
> @@ -1637,8 +1645,12 @@ rcu_start_gp_advanced(struct rcu_state *
> }
> rsp->gp_flags = RCU_GP_FLAG_INIT;
>
> - /* Wake up rcu_gp_kthread() to start the grace period. */
> - wake_up(&rsp->gp_wq);
> + /*
> + * We can't do wakeups while holding the rnp->lock, as that
> + * could cause possible deadlocks with the rq->lock. Deter
> + * the wakeup to interrupt context.
> + */
> + irq_work_queue(&rsp->wakeup_work);
> }
>
> /*
> @@ -3235,6 +3247,7 @@ static void __init rcu_init_one(struct r
>
> rsp->rda = rda;
> init_waitqueue_head(&rsp->gp_wq);
> + init_irq_work(&rsp->wakeup_work, rsp_wakeup);
> rnp = rsp->level[rcu_num_lvls - 1];
> for_each_possible_cpu(i) {
> while (i > rnp->grphi)
> Index: linux-trace.git/kernel/rcutree.h
> ===================================================================
> --- linux-trace.git.orig/kernel/rcutree.h
> +++ linux-trace.git/kernel/rcutree.h
> @@ -27,6 +27,7 @@
> #include <linux/threads.h>
> #include <linux/cpumask.h>
> #include <linux/seqlock.h>
> +#include <linux/irq_work.h>
>
> /*
> * Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
> @@ -442,6 +443,7 @@ struct rcu_state {
> char *name; /* Name of structure. */
> char abbr; /* Abbreviated name. */
> struct list_head flavors; /* List of RCU flavors. */
> + struct irq_work wakeup_work; /* Postponed wakeups */
> };
>
> /* Values for rcu_state structure's gp_flags field. */
>
>

Subject: [tip:sched/core] tracing/context-tracking: Add preempt_schedule_context() for tracing

Commit-ID: 29bb9e5a75684106a37593ad75ec75ff8312731b
Gitweb: http://git.kernel.org/tip/29bb9e5a75684106a37593ad75ec75ff8312731b
Author: Steven Rostedt <[email protected]>
AuthorDate: Fri, 24 May 2013 15:23:40 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 19 Jun 2013 12:55:10 +0200

tracing/context-tracking: Add preempt_schedule_context() for tracing

Dave Jones hit the following bug report:

===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #1 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by cc1/63645:
#0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0

CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
Call Trace:
[<ffffffff816ae383>] dump_stack+0x19/0x1b
[<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
[<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
[<ffffffff8108dffc>] update_curr+0xec/0x240
[<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
[<ffffffff816b3a71>] __schedule+0x161/0x9b0
[<ffffffff816b4721>] preempt_schedule+0x51/0x80
[<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
[<ffffffff816be280>] ftrace_call+0x5/0x2f
[<ffffffff816b681d>] ? retint_careful+0xb/0x2e
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.

Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.

The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.

Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/preempt.h | 18 +++++++++++++++++-
kernel/context_tracking.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 87a03c7..f5d4723 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -33,9 +33,25 @@ do { \
preempt_schedule(); \
} while (0)

+#ifdef CONFIG_CONTEXT_TRACKING
+
+void preempt_schedule_context(void);
+
+#define preempt_check_resched_context() \
+do { \
+ if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ preempt_schedule_context(); \
+} while (0)
+#else
+
+#define preempt_check_resched_context() preempt_check_resched()
+
+#endif /* CONFIG_CONTEXT_TRACKING */
+
#else /* !CONFIG_PREEMPT */

#define preempt_check_resched() do { } while (0)
+#define preempt_check_resched_context() do { } while (0)

#endif /* CONFIG_PREEMPT */

@@ -88,7 +104,7 @@ do { \
do { \
preempt_enable_no_resched_notrace(); \
barrier(); \
- preempt_check_resched(); \
+ preempt_check_resched_context(); \
} while (0)

#else /* !CONFIG_PREEMPT_COUNT */
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 65349f0..6667700 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -71,6 +71,46 @@ void user_enter(void)
local_irq_restore(flags);
}

+#ifdef CONFIG_PREEMPT
+/**
+ * preempt_schedule_context - preempt_schedule called by tracing
+ *
+ * The tracing infrastructure uses preempt_enable_notrace to prevent
+ * recursion and tracing preempt enabling caused by the tracing
+ * infrastructure itself. But as tracing can happen in areas coming
+ * from userspace or just about to enter userspace, a preempt enable
+ * can occur before user_exit() is called. This will cause the scheduler
+ * to be called when the system is still in usermode.
+ *
+ * To prevent this, the preempt_enable_notrace will use this function
+ * instead of preempt_schedule() to exit user context if needed before
+ * calling the scheduler.
+ */
+void __sched notrace preempt_schedule_context(void)
+{
+ struct thread_info *ti = current_thread_info();
+ enum ctx_state prev_ctx;
+
+ if (likely(ti->preempt_count || irqs_disabled()))
+ return;
+
+ /*
+ * Need to disable preemption in case user_exit() is traced
+ * and the tracer calls preempt_enable_notrace() causing
+ * an infinite recursion.
+ */
+ preempt_disable_notrace();
+ prev_ctx = exception_enter();
+ preempt_enable_no_resched_notrace();
+
+ preempt_schedule();
+
+ preempt_disable_notrace();
+ exception_exit(prev_ctx);
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL_GPL(preempt_schedule_context);
+#endif /* CONFIG_PREEMPT */

/**
* user_exit - Inform the context tracking that the CPU is