2013-06-10 19:51:58

by Dave Jones

[permalink] [raw]
Subject: Another RCU trace. (3.10-rc5)

Something in this code still isn't right in rc5.
I got a pretty mangled trace (and then total lockup) pretty quickly..

[ 1300.705212] ===============================
[ 1300.707514] [ INFO: suspicious RCU usage. ]
[ 1300.709794] 3.10.0-rc5+ #6 Not tainted
[ 1300.712064] -------------------------------
[ 1300.714363] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 1300.717516]
[ 1300.717516] other info that might help us debug this:
[ 1300.717516]
[ 1300.723450]
[ 1300.723450] RCU used illegally from idle CPU!
[ 1300.723450] rcu_scheduler_active = 1, debug_locks = 0
[ 1300.730124] RCU used illegally from extended quiescent state!
[ 1300.732971] 2 locks held by kworker/0:0/12904:
[ 1300.735395] #0: (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8109203e>] vtime_account_system.part.1+0x1e/0x50
[ 1300.740229] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5815>] cpuacct_account_field+0x5/0x200
[ 1300.744305]
[ 1300.744305] stack backtrace:
[ 1300.747817] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.756486] ------------[ cut here ]------------
[ 1300.759463] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
[ 1300.780868] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.783521] ------------[ cut here ]------------
[ 1300.784848] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
[ 1300.792216] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.795211] ------------[ cut here ]------------
[ 1300.796734] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
[ 1300.804965] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.808270] ------------[ cut here ]------------
[ 1300.809921] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
[ 1300.811591] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
[ 1300.818924] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.822581] ------------[ cut here ]------------
[ 1300.824419] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
[ 1300.826271] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
[ 1300.834252] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.838244] ------------[ cut here ]------------
[ 1300.840231] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
[ 1300.895079] WARNING: at kernel/sched/cputime.c:652 get_v[ 1300.8 af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_h snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e sn
[ 1300.907729] CPU: 0 PID: 12904 Comm: kworker/0:0 Not
[ 1300.914988] ffffffff8104a1b1 000000000b89a53f ffff
[ 1300.917385] 0000000000000008 ffffffff81101b57 ffff
[ 1300.919760] Call Trace:
[ 1300.922120] [<ffffffff816e39db>] dump_stack+0x19/0x1b
[ 1300.924429] [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
[ 1300.926721] [<ffffffff81101b57>] ? rcu_eqs_exit_common+0x357/0x380
[ 1300.928993] [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
[ 1300.931243] [<ffffffff81091b94>] get_vtime_delta+0x64/0x80
[ 1300.933487] [<ffffffff810922f9>] vtime_account_user+0x29/0x[ 1300.935702] [<ffffffff81145285>] user_exit+0[ 1300.937882] [<ffffffff816f0591>] do_page_fau[ 1300.940037] [<ffffffff816ed162>] page_fault+0x22/0x3[ 1300.942158] [<ffffffff81101b57>] ? rcu_eqs_e[ 1300.944270] [<ffffffff81308fb9>] ? copy_user[ 1300.946365] [<ffffffff8114b46c>] ? __probe_k[ 1300.948437] [<ffffffff81072598>] print_worke[ 1300.950502] [<ffffffff8104de32>] dump_stack_[ 1300.952543] [<ffffffff816e39d2>] dum[ 1300.954546] [<ffffffff8104a1b1>] warn_slowpa[ 1300.956525] [<ffffffff8104a28a>] warn_slowpa[ 1300.958471] [<ffffffff81101b57>] rcu[ 1300.960401] [<ffffffff81091b94>] ? get_vtime[ 1300.962309] [<ffffffff81101bda>] rcu[ 1300.964196] [<ffffffff811039e3>] rcu_user_ex[ 1300.966071] [<ffffffff81145275>] use[ 1300.967911] [<ffffffff816f0591>] do_[ 1300.969729] [<ffffffff816ed162>] pag[ 1300.971510] [<ffffffff81091b94>] ? get_vtime[ 1300.973270] [<ffffffff81308fb9>] ? c[ 1300.975014] [ 1300.989735] [ 1300.991250] [<ffffffff816f05[ 1300.992759] [<ffffffff816ed1[ 1300.994234] [<ffffffff81101b57>] ? r[ 1300.995698] [<ffffffff81308f[ 1300.997130] [<ffffffff8114b4[ 1300.998534] [<ffffffff810725[ 1300.999918] [<ffffffff8104de[ 1301.001284] [<ffffffff816e39[ 1301.002620] [<ffffffff8104a1[ 1301.003955] [<ffffff[ 1301.005255] [<ffffffff81101b[ 1301.006533] [<ffffffff81091b[ 1301.007780] [<ffffff[ 1301.008997] [<ffffffff811039[ 1301.010190] [<ffffff[ 1301.011355] [<ffffff[ 1301.012504] [<ffffffff816ed1[ 1301.013642] [<ffffff[ 1301.014758] [<ffffff[ 1301.015852] [<ffffff[ 1301.016913] [<ffffff[ 1301.017948] [<ffffff[ 1301.018965] [<ffffff[ 1301.019951] [<ffffff[ 1301.020924] [ 1301.021895] [<ffffff[ 1301.022861] [<ffffff[ 1301.023820] [<ffffff[ 1301.024776] [<ffffff[ 1301.025726] [ 1301.026679] [<ffffff[ 1301.027626] [<ffffff[ 1301.028585] [<ffffff[ 1301.029544] [ 1301.030502] [<ffffff[ 1301.031463] [<ffffff[ 1301.032430] [<ffffff[ 1301.033391] [<ffffff[ 1301.034351] [ 1301.035305] [<ffffff[ 1301.036264] [<ffffff[ 1301.037218] [ 1301.038171] [<ffffff[ 1301.039122] [<ffffff[ 1301.040069] [<ffffff[ 1301.041018] [ 1301.041965] [<ffffff[ 1301.042916] [<ffffff[ 1301.043880] [<ffffff[ 1301.044842] [<ffffff[ 1301.045808] [ 1301.046780] [<ffffff[ 1301.047737] [<ffffff[ 1301.048705] [<ffffff[ 1301.049669] [ 1301.050632] [<ffffff[ 1301.051594] [<ffffff[ 1301.052557] [<ffffff[ 1301.053517] [<ffffff[ 1301.054476] [ 1301.055430] [<ffffff[ 1301.056392] [<ffffff[ 1301.057361] [<ffffff[ 1301.058329] [ 1301.059303] [<ffffff[ 1301.060281] [<ffffff[ 1301.061247] [<ffffff[ 1301.062217] [<ffffff[ 1301.063180] [ 1301.064146] [<ffffff[ 1301.065110] [<ffffff[ 1301.066069] [<ffffff[ 1301.067028] [<ffffff[ 1301.067982] [ 1301.068944] [<ffffff[ 1301.069899] [<ffffff[ 1301.070860] [<ffffff[ 1301.071834] [ 1301.072808] [<ffffff[ 1301.073781] [<ffffff[ 1301.074761] [<ffffff[ 1301.075732] [<ffffff[ 1301.0[ 1301.0[ 1301.092183] [<ffffff[ 1301.093150] [ 1301.094113] [<ffffff[ 1301.095081] [<ffffff[ 1301.096050] [<ffffff[ 1301.097011] [<ffffff[ 1301.097975] [ 1301.098944] [<ffffff[ 1301.099917] [<ffffff[ 1301.100891] [<ffffff[ 1301.101859] [<ffffff[ 1301.102834] -[ 1301.103835] Thread ov

and then total lockup.


I saw some of Steven's patches get merged on Friday, is there anything else
outstanding that didn't make it in yet that I could test ?
Or is this another new bug ?

Dave


2013-06-10 20:34:24

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, Jun 10, 2013 at 03:51:45PM -0400, Dave Jones wrote:
> Something in this code still isn't right in rc5.
> I got a pretty mangled trace (and then total lockup) pretty quickly..
>
> [ 1300.705212] ===============================
> [ 1300.707514] [ INFO: suspicious RCU usage. ]
> [ 1300.709794] 3.10.0-rc5+ #6 Not tainted
> [ 1300.712064] -------------------------------
> [ 1300.714363] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 1300.717516]
> [ 1300.717516] other info that might help us debug this:
> [ 1300.717516]
> [ 1300.723450]
> [ 1300.723450] RCU used illegally from idle CPU!
> [ 1300.723450] rcu_scheduler_active = 1, debug_locks = 0
> [ 1300.730124] RCU used illegally from extended quiescent state!
> [ 1300.732971] 2 locks held by kworker/0:0/12904:
> [ 1300.735395] #0: (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8109203e>] vtime_account_system.part.1+0x1e/0x50
> [ 1300.740229] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5815>] cpuacct_account_field+0x5/0x200
> [ 1300.744305]
> [ 1300.744305] stack backtrace:
> [ 1300.747817] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.756486] ------------[ cut here ]------------
> [ 1300.759463] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.780868] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.783521] ------------[ cut here ]------------
> [ 1300.784848] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.792216] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.795211] ------------[ cut here ]------------
> [ 1300.796734] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.804965] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.808270] ------------[ cut here ]------------
> [ 1300.809921] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.811591] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.818924] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.822581] ------------[ cut here ]------------
> [ 1300.824419] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.826271] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.834252] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.838244] ------------[ cut here ]------------
> [ 1300.840231] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.895079] WARNING: at kernel/sched/cputime.c:652 get_v[ 1300.8 af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_h snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e sn
> [ 1300.907729] CPU: 0 PID: 12904 Comm: kworker/0:0 Not
> [ 1300.914988] ffffffff8104a1b1 000000000b89a53f ffff
> [ 1300.917385] 0000000000000008 ffffffff81101b57 ffff
> [ 1300.919760] Call Trace:
> [ 1300.922120] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 1300.924429] [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 1300.926721] [<ffffffff81101b57>] ? rcu_eqs_exit_common+0x357/0x380
> [ 1300.928993] [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 1300.931243] [<ffffffff81091b94>] get_vtime_delta+0x64/0x80
> [ 1300.933487] [<ffffffff810922f9>] vtime_account_user+0x29/0x[ 1300.935702] [<ffffffff81145285>] user_exit+0[ 1300.937882] [<ffffffff816f0591>] do_page_fau[ 1300.940037] [<ffffffff816ed162>] page_fault+0x22/0x3[ 1300.942158] [<ffffffff81101b57>] ? rcu_eqs_e[ 1300.944270] [<ffffffff81308fb9>] ? copy_user[ 1300.946365] [<ffffffff8114b46c>] ? __probe_k[ 1300.948437] [<ffffffff81072598>] print_worke[ 1300.950502] [<ffffffff8104de32>] dump_stack_[ 1300.952543] [<ffffffff816e39d2>] dum[ 1300.954546] [<ffffffff8104a1b1>] warn_slowpa[ 1300.956525] [<ffffffff8104a28a>] warn_slowpa[ 1300.958471] [<ffffffff81101b57>] rcu[ 1300.960401] [<ffffffff81091b94>] ? get_vtime[ 1300.962309] [<ffffffff81101bda>] rcu[ 1300.964196] [<ffffffff811039e3>] rcu_user_ex[ 1300.966071] [<ffffffff81145275>] use[ 1300.967911] [<ffffffff816f0591>] do_[ 1300.969729] [<ffffffff816ed162>] pag[ 1300.971510] [<ffffffff81091b94>] ? get_vtime[ 1300.973270] [<ffffffff81308fb9>] ? c[ 1300.975014] [ 1300.989735] [ 1300.991250] [<ffffffff816f05[ 1300.992759] [<ffffffff816ed1[ 1300.994234] [<ffffffff81101b57>] ? r[ 1300.995698] [<ffffffff81308f[ 1300.997130] [<ffffffff8114b4[ 1300.998534] [<ffffffff810725[ 1300.999918] [<ffffffff8104de[ 1301.001284] [<ffffffff816e39[ 1301.002620] [<ffffffff8104a1[ 1301.003955] [<ffffff[ 1301.005255] [<ffffffff81101b[ 1301.006533] [<ffffffff81091b[ 1301.007780] [<ffffff[ 1301.008997] [<ffffffff811039[ 1301.010190] [<ffffff[ 1301.011355] [<ffffff[ 1301.012504] [<ffffffff816ed1[ 1301.013642] [<ffffff[ 1301.014758] [<ffffff[ 1301.015852] [<ffffff[ 1301.016913] [<ffffff[ 1301.017948] [<ffffff[ 1301.018965] [<ffffff[ 1301.019951] [<ffffff[ 1301.020924] [ 1301.021895] [<ffffff[ 1301.022861] [<ffffff[ 1301.023820] [<ffffff[ 1301.024776] [<ffffff[ 1301.025726] [ 1301.026679] [<ffffff[ 1301.027626] [<ffffff[ 1301.028585] [<ffffff[ 1301.029544] [ 1301.030502] [<ffffff[ 1301.031463] [<ffffff[ 1301.032430] [<ffffff[ 1301.033391] [<ffffff[ 1301.034351] [ 1301.035305] [<ffffff[ 1301.036264] [<ffffff[ 1301.037218] [ 1301.038171] [<ffffff[ 1301.039122] [<ffffff[ 1301.040069] [<ffffff[ 1301.041018] [ 1301.041965] [<ffffff[ 1301.042916] [<ffffff[ 1301.043880] [<ffffff[ 1301.044842] [<ffffff[ 1301.045808] [ 1301.046780] [<ffffff[ 1301.047737] [<ffffff[ 1301.048705] [<ffffff[ 1301.049669] [ 1301.050632] [<ffffff[ 1301.051594] [<ffffff[ 1301.052557] [<ffffff[ 1301.053517] [<ffffff[ 1301.054476] [ 1301.055430] [<ffffff[ 1301.056392] [<ffffff[ 1301.057361] [<ffffff[ 1301.058329] [ 1301.059303] [<ffffff[ 1301.060281] [<ffffff[ 1301.061247] [<ffffff[ 1301.062217] [<ffffff[ 1301.063180] [ 1301.064146] [<ffffff[ 1301.065110] [<ffffff[ 1301.066069] [<ffffff[ 1301.067028] [<ffffff[ 1301.067982] [ 1301.068944] [<ffffff[ 1301.069899] [<ffffff[ 1301.070860] [<ffffff[ 1301.071834] [ 1301.072808] [<ffffff[ 1301.073781] [<ffffff[ 1301.074761] [<ffffff[ 1301.075732] [<ffffff[ 1301.0[ 1301.0[ 1301.092183] [<ffffff[ 1301.093150] [ 1301.094113] [<ffffff[ 1301.095081] [<ffffff[ 1301.096050] [<ffffff[ 1301.097011] [<ffffff[ 1301.097975] [ 1301.098944] [<ffffff[ 1301.099917] [<ffffff[ 1301.100891] [<ffffff[ 1301.101859] [<ffffff[ 1301.102834] -[ 1301.103835] Thread ov
>
> and then total lockup.
>
>
> I saw some of Steven's patches get merged on Friday, is there anything else
> outstanding that didn't make it in yet that I could test ?
> Or is this another new bug ?

I have three fixes queued up at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent

Kind of hard to tell whether they are relevant given the interleaved
stack traces, but can't hurt to try them out.

Thanx, Paul

2013-06-10 20:38:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, 2013-06-10 at 15:51 -0400, Dave Jones wrote:
> Something in this code still isn't right in rc5.
> I got a pretty mangled trace (and then total lockup) pretty quickly..
>
> [ 1300.705212] ===============================
> [ 1300.707514] [ INFO: suspicious RCU usage. ]
> [ 1300.709794] 3.10.0-rc5+ #6 Not tainted
> [ 1300.712064] -------------------------------
> [ 1300.714363] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 1300.717516]
> [ 1300.717516] other info that might help us debug this:
> [ 1300.717516]
> [ 1300.723450]
> [ 1300.723450] RCU used illegally from idle CPU!
> [ 1300.723450] rcu_scheduler_active = 1, debug_locks = 0
> [ 1300.730124] RCU used illegally from extended quiescent state!
> [ 1300.732971] 2 locks held by kworker/0:0/12904:
> [ 1300.735395] #0: (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8109203e>] vtime_account_system.part.1+0x1e/0x50
> [ 1300.740229] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5815>] cpuacct_account_field+0x5/0x200
> [ 1300.744305]
> [ 1300.744305] stack backtrace:
> [ 1300.747817] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.756486] ------------[ cut here ]------------
> [ 1300.759463] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.780868] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.783521] ------------[ cut here ]------------
> [ 1300.784848] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.792216] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.795211] ------------[ cut here ]------------
> [ 1300.796734] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.804965] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.808270] ------------[ cut here ]------------
> [ 1300.809921] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.811591] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.818924] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.822581] ------------[ cut here ]------------
> [ 1300.824419] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.826271] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.834252] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.838244] ------------[ cut here ]------------
> [ 1300.840231] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.895079] WARNING: at kernel/sched/cputime.c:652 get_v[ 1300.8 af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_h snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e sn
> [ 1300.907729] CPU: 0 PID: 12904 Comm: kworker/0:0 Not
> [ 1300.914988] ffffffff8104a1b1 000000000b89a53f ffff
> [ 1300.917385] 0000000000000008 ffffffff81101b57 ffff
> [ 1300.919760] Call Trace:
> [ 1300.922120] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 1300.924429] [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 1300.926721] [<ffffffff81101b57>] ? rcu_eqs_exit_common+0x357/0x380
> [ 1300.928993] [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 1300.931243] [<ffffffff81091b94>] get_vtime_delta+0x64/0x80
> [ 1300.933487] [<ffffffff810922f9>] vtime_account_user+0x29/0x[ 1300.935702] [<ffffffff81145285>] user_exit+0[ 1300.937882] [<ffffffff816f0591>] do_page_fau[ 1300.940037] [<ffffffff816ed162>] page_fault+0x22/0x3[ 1300.942158] [<ffffffff81101b57>] ? rcu_eqs_e[ 1300.944270] [<ffffffff81308fb9>] ? copy_user[ 1300.946365] [<ffffffff8114b46c>] ? __probe_k[ 1300.948437] [<ffffffff81072598>] print_worke[ 1300.950502] [<ffffffff8104de32>] dump_stack_[ 1300.952543] [<ffffffff816e39d2>] dum[ 1300.954546] [<ffffffff8104a1b1>] warn_slowpa[ 1300.956525] [<ffffffff8104a28a>] warn_slowpa[ 1300.958471] [<ffffffff81101b57>] rcu[ 1300.960401] [<ffffffff81091b94>] ? get_vtime[ 1300.962309] [<ffffffff81101bda>] rcu[ 1300.964196] [<ffffffff811039e3>] rcu_user_ex[ 1300.966071] [<ffffffff81145275>] use[ 1300.967911] [<ffffffff816f0591>] do_[ 1300.969729] [<ffffffff816ed162>] pag[ 1300.971510] [<ffffffff81091b94>] ? get_vtime[ 1300.973270] [<ffffffff81308fb9>] ? c[ 1300.975014] [ 1300.989735] [ 1300.991250] [<ffffffff816f05[ 1300.992759] [<ffffffff816ed1[ 1300.994234] [<ffffffff81101b57>] ? r[ 1300.995698] [<ffffffff81308f[ 1300.997130] [<ffffffff8114b4[ 1300.998534] [<ffffffff810725[ 1300.999918] [<ffffffff8104de[ 1301.001284] [<ffffffff816e39[ 1301.002620] [<ffffffff8104a1[ 1301.003955] [<ffffff[ 1301.005255] [<ffffffff81101b[ 1301.006533] [<ffffffff81091b[ 1301.007780] [<ffffff[ 1301.008997] [<ffffffff811039[ 1301.010190] [<ffffff[ 1301.011355] [<ffffff[ 1301.012504] [<ffffffff816ed1[ 1301.013642] [<ffffff[ 1301.014758] [<ffffff[ 1301.015852] [<ffffff[ 1301.016913] [<ffffff[ 1301.017948] [<ffffff[ 1301.018965] [<ffffff[ 1301.019951] [<ffffff[ 1301.020924] [ 1301.021895] [<ffffff[ 1301.022861] [<ffffff[ 1301.023820] [<ffffff[ 1301.024776] [<ffffff[ 1301.025726] [ 1301.026679] [<ffffff[ 1301.027626] [<ffffff[ 1301.028585] [<ffffff[ 1301.029544] [ 1301.030502] [<ffffff[ 1301.031463] [<ffffff[ 1301.032430] [<ffffff[ 1301.033391] [<ffffff[ 1301.034351] [ 1301.035305] [<ffffff[ 1301.036264] [<ffffff[ 1301.037218] [ 1301.038171] [<ffffff[ 1301.039122] [<ffffff[ 1301.040069] [<ffffff[ 1301.041018] [ 1301.041965] [<ffffff[ 1301.042916] [<ffffff[ 1301.043880] [<ffffff[ 1301.044842] [<ffffff[ 1301.045808] [ 1301.046780] [<ffffff[ 1301.047737] [<ffffff[ 1301.048705] [<ffffff[ 1301.049669] [ 1301.050632] [<ffffff[ 1301.051594] [<ffffff[ 1301.052557] [<ffffff[ 1301.053517] [<ffffff[ 1301.054476] [ 1301.055430] [<ffffff[ 1301.056392] [<ffffff[ 1301.057361] [<ffffff[ 1301.058329] [ 1301.059303] [<ffffff[ 1301.060281] [<ffffff[ 1301.061247] [<ffffff[ 1301.062217] [<ffffff[ 1301.063180] [ 1301.064146] [<ffffff[ 1301.065110] [<ffffff[ 1301.066069] [<ffffff[ 1301.067028] [<ffffff[ 1301.067982] [ 1301.068944] [<ffffff[ 1301.069899] [<ffffff[ 1301.070860] [<ffffff[ 1301.071834] [ 1301.072808] [<ffffff[ 1301.073781] [<ffffff[ 1301.074761] [<ffffff[ 1301.075732] [<ffffff[ 1301.0[ 1301.0[ 1301.092183] [<ffffff[ 1301.093150] [ 1301.094113] [<ffffff[ 1301.095081] [<ffffff[ 1301.096050] [<ffffff[ 1301.097011] [<ffffff[ 1301.097975] [ 1301.098944] [<ffffff[ 1301.099917] [<ffffff[ 1301.100891] [<ffffff[ 1301.101859] [<ffffff[ 1301.102834] -[ 1301.103835] Thread ov
>
> and then total lockup.
>
>
> I saw some of Steven's patches get merged on Friday, is there anything else
> outstanding that didn't make it in yet that I could test ?
> Or is this another new bug ?

Looks to be another bug. Can you post the config you were using. Or is
this the same config as before?

Also, I don't see any tracing going on, so this could just be a context
tracking bug.

-- Steve

2013-06-10 21:01:39

by Dave Jones

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:

> > I saw some of Steven's patches get merged on Friday, is there anything else
> > outstanding that didn't make it in yet that I could test ?
> > Or is this another new bug ?
>
> I have three fixes queued up at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
>
> Kind of hard to tell whether they are relevant given the interleaved
> stack traces, but can't hurt to try them out.

Here's another. Looks different.

[ 2739.921649] ===============================
[ 2739.923894] [ INFO: suspicious RCU usage. ]
[ 2739.926144] 3.10.0-rc5+ #6 Not tainted
[ 2739.928397] -------------------------------
[ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 2739.933826]
other info that might help us debug this:

[ 2739.939663]
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
[ 2739.946345] RCU used illegally from extended quiescent state!
[ 2739.949123] 2 locks held by trinity-child1/4385:
[ 2739.951537] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
[ 2739.955316] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
[ 2739.959101]
stack backtrace:
[ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
[ 2739.970870] 0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
[ 2739.974556] ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
[ 2739.978353] 0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
[ 2739.982052] Call Trace:
[ 2739.984098] [<ffffffff816e39db>] dump_stack+0x19/0x1b
[ 2739.986996] [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
[ 2739.990080] [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
[ 2739.992971] [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
[ 2739.994716] [<ffffffff8109609c>] update_curr+0xec/0x250
[ 2739.995873] [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
[ 2739.997036] [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
[ 2739.998192] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
[ 2739.999344] [<ffffffff816eae94>] preempt_schedule+0x44/0x60
[ 2740.000479] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
[ 2740.001622] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
[ 2740.002760] [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
[ 2740.003908] [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
[ 2740.005046] [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
[ 2740.006188] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
[ 2740.007325] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
[ 2740.008454] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
[ 2757.542667] ------------[ cut here ]------------
[ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
[ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
[ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
[ 2757.564356] ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
[ 2757.566581] ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
[ 2757.568813] ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
[ 2757.571053] Call Trace:
[ 2757.572639] [<ffffffff816e39db>] dump_stack+0x19/0x1b
[ 2757.574545] [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
[ 2757.576548] [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
[ 2757.578526] [<ffffffff811451f5>] user_enter+0xb5/0xd0
[ 2757.580424] [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
[ 2757.582424] [<ffffffff816ed036>] retint_kernel+0x26/0x30
[ 2757.584344] [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
[ 2757.586330] [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
[ 2757.588279] [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
[ 2757.590237] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
[ 2757.592145] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
[ 2757.594156] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
[ 2757.596166] [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
[ 2757.598129] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
[ 2757.600138] [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
[ 2757.602127] [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
[ 2757.604144] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
[ 2757.606242] [<ffffffff810492ab>] do_fork+0xbb/0x470
[ 2757.608098] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
[ 2757.609994] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
[ 2757.612099] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
[ 2757.614196] [<ffffffff81049686>] kernel_thread+0x26/0x30
[ 2757.616104] [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
[ 2757.618037] [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
[ 2757.619961] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
[ 2757.621959] [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
[ 2757.623846] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
[ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---

2013-06-10 21:16:51

by Steven Rostedt

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, 2013-06-10 at 17:01 -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
>
> > > I saw some of Steven's patches get merged on Friday, is there anything else
> > > outstanding that didn't make it in yet that I could test ?
> > > Or is this another new bug ?
> >
> > I have three fixes queued up at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> >
> > Kind of hard to tell whether they are relevant given the interleaved
> > stack traces, but can't hurt to try them out.
>
> Here's another. Looks different.
>
> [ 2739.921649] ===============================
> [ 2739.923894] [ INFO: suspicious RCU usage. ]
> [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> [ 2739.928397] -------------------------------
> [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 2739.933826]
> other info that might help us debug this:
>
> [ 2739.939663]
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> [ 2739.946345] RCU used illegally from extended quiescent state!
> [ 2739.949123] 2 locks held by trinity-child1/4385:
> [ 2739.951537] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> [ 2739.955316] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> [ 2739.959101]
> stack backtrace:
> [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> [ 2739.970870] 0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> [ 2739.974556] ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> [ 2739.978353] 0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> [ 2739.982052] Call Trace:
> [ 2739.984098] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2739.986996] [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> [ 2739.990080] [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> [ 2739.992971] [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> [ 2739.994716] [<ffffffff8109609c>] update_curr+0xec/0x250
> [ 2739.995873] [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> [ 2739.997036] [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> [ 2739.998192] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2739.999344] [<ffffffff816eae94>] preempt_schedule+0x44/0x60

Yeah, this one is fixed by a patch I sent out earlier, and I believe
Peter Zijlstra is going to push it. It wasn't part of my queue.

Peter, Are you going to take the preempt_schedule_context() patch?

-- Steve

> [ 2740.000479] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2740.001622] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2740.002760] [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
> [ 2740.003908] [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
> [ 2740.005046] [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
> [ 2740.006188] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.007325] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.008454] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2757.542667] ------------[ cut here ]------------
> [ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> [ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> [ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
> [ 2757.564356] ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
> [ 2757.566581] ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
> [ 2757.568813] ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
> [ 2757.571053] Call Trace:
> [ 2757.572639] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2757.574545] [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 2757.576548] [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 2757.578526] [<ffffffff811451f5>] user_enter+0xb5/0xd0
> [ 2757.580424] [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
> [ 2757.582424] [<ffffffff816ed036>] retint_kernel+0x26/0x30
> [ 2757.584344] [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
> [ 2757.586330] [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
> [ 2757.588279] [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
> [ 2757.590237] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.592145] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.594156] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.596166] [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
> [ 2757.598129] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.600138] [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
> [ 2757.602127] [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
> [ 2757.604144] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.606242] [<ffffffff810492ab>] do_fork+0xbb/0x470
> [ 2757.608098] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.609994] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.612099] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.614196] [<ffffffff81049686>] kernel_thread+0x26/0x30
> [ 2757.616104] [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
> [ 2757.618037] [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
> [ 2757.619961] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.621959] [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
> [ 2757.623846] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---
>

2013-06-10 21:18:35

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
>
> > > I saw some of Steven's patches get merged on Friday, is there anything else
> > > outstanding that didn't make it in yet that I could test ?
> > > Or is this another new bug ?
> >
> > I have three fixes queued up at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> >
> > Kind of hard to tell whether they are relevant given the interleaved
> > stack traces, but can't hurt to try them out.
>
> Here's another. Looks different.

I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
called from irq) from the above git archive fixes this one. Just don't
ask how much I am willing to bet. ;-)

Thanx, Paul

> [ 2739.921649] ===============================
> [ 2739.923894] [ INFO: suspicious RCU usage. ]
> [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> [ 2739.928397] -------------------------------
> [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 2739.933826]
> other info that might help us debug this:
>
> [ 2739.939663]
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> [ 2739.946345] RCU used illegally from extended quiescent state!
> [ 2739.949123] 2 locks held by trinity-child1/4385:
> [ 2739.951537] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> [ 2739.955316] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> [ 2739.959101]
> stack backtrace:
> [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> [ 2739.970870] 0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> [ 2739.974556] ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> [ 2739.978353] 0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> [ 2739.982052] Call Trace:
> [ 2739.984098] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2739.986996] [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> [ 2739.990080] [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> [ 2739.992971] [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> [ 2739.994716] [<ffffffff8109609c>] update_curr+0xec/0x250
> [ 2739.995873] [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> [ 2739.997036] [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> [ 2739.998192] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2739.999344] [<ffffffff816eae94>] preempt_schedule+0x44/0x60
> [ 2740.000479] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2740.001622] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2740.002760] [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
> [ 2740.003908] [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
> [ 2740.005046] [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
> [ 2740.006188] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.007325] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.008454] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2757.542667] ------------[ cut here ]------------
> [ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> [ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> [ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
> [ 2757.564356] ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
> [ 2757.566581] ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
> [ 2757.568813] ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
> [ 2757.571053] Call Trace:
> [ 2757.572639] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2757.574545] [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 2757.576548] [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 2757.578526] [<ffffffff811451f5>] user_enter+0xb5/0xd0
> [ 2757.580424] [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
> [ 2757.582424] [<ffffffff816ed036>] retint_kernel+0x26/0x30
> [ 2757.584344] [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
> [ 2757.586330] [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
> [ 2757.588279] [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
> [ 2757.590237] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.592145] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.594156] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.596166] [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
> [ 2757.598129] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.600138] [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
> [ 2757.602127] [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
> [ 2757.604144] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.606242] [<ffffffff810492ab>] do_fork+0xbb/0x470
> [ 2757.608098] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.609994] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.612099] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.614196] [<ffffffff81049686>] kernel_thread+0x26/0x30
> [ 2757.616104] [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
> [ 2757.618037] [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
> [ 2757.619961] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.621959] [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
> [ 2757.623846] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---
>
>

2013-06-10 21:37:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
> On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
> > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> >
> > > > I saw some of Steven's patches get merged on Friday, is there anything else
> > > > outstanding that didn't make it in yet that I could test ?
> > > > Or is this another new bug ?
> > >
> > > I have three fixes queued up at:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> > >
> > > Kind of hard to tell whether they are relevant given the interleaved
> > > stack traces, but can't hurt to try them out.
> >
> > Here's another. Looks different.
>
> I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
> called from irq) from the above git archive fixes this one. Just don't
> ask how much I am willing to bet. ;-)

Don't bet much ;-) This has nothing to do with tracepoints. It's due to
the function tracer. And the perf function tracer at that, as that's the
only thing that currently uses "ftrace_ops_control_func" that's in the
backtrace below.

-- Steve

>
> Thanx, Paul
>
> > [ 2739.921649] ===============================
> > [ 2739.923894] [ INFO: suspicious RCU usage. ]
> > [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> > [ 2739.928397] -------------------------------
> > [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> > [ 2739.933826]
> > other info that might help us debug this:
> >
> > [ 2739.939663]
> > RCU used illegally from idle CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > [ 2739.946345] RCU used illegally from extended quiescent state!
> > [ 2739.949123] 2 locks held by trinity-child1/4385:
> > [ 2739.951537] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> > [ 2739.955316] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> > [ 2739.959101]
> > stack backtrace:
> > [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> > [ 2739.970870] 0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> > [ 2739.974556] ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> > [ 2739.978353] 0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> > [ 2739.982052] Call Trace:
> > [ 2739.984098] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> > [ 2739.986996] [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> > [ 2739.990080] [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> > [ 2739.992971] [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> > [ 2739.994716] [<ffffffff8109609c>] update_curr+0xec/0x250
> > [ 2739.995873] [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> > [ 2739.997036] [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> > [ 2739.998192] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> > [ 2739.999344] [<ffffffff816eae94>] preempt_schedule+0x44/0x60
> > [ 2740.000479] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> > [ 2740.001622] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> > [ 2740.002760] [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
> > [ 2740.003908] [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
> > [ 2740.005046] [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
> > [ 2740.006188] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> > [ 2740.007325] [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> > [ 2740.008454] [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> > [ 2757.542667] ------------[ cut here ]------------
> > [ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> > [ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> > [ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
> > [ 2757.564356] ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
> > [ 2757.566581] ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
> > [ 2757.568813] ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
> > [ 2757.571053] Call Trace:
> > [ 2757.572639] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> > [ 2757.574545] [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> > [ 2757.576548] [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> > [ 2757.578526] [<ffffffff811451f5>] user_enter+0xb5/0xd0
> > [ 2757.580424] [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
> > [ 2757.582424] [<ffffffff816ed036>] retint_kernel+0x26/0x30
> > [ 2757.584344] [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
> > [ 2757.586330] [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
> > [ 2757.588279] [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
> > [ 2757.590237] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> > [ 2757.592145] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> > [ 2757.594156] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> > [ 2757.596166] [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
> > [ 2757.598129] [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> > [ 2757.600138] [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
> > [ 2757.602127] [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
> > [ 2757.604144] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> > [ 2757.606242] [<ffffffff810492ab>] do_fork+0xbb/0x470
> > [ 2757.608098] [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> > [ 2757.609994] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> > [ 2757.612099] [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> > [ 2757.614196] [<ffffffff81049686>] kernel_thread+0x26/0x30
> > [ 2757.616104] [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
> > [ 2757.618037] [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
> > [ 2757.619961] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> > [ 2757.621959] [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
> > [ 2757.623846] [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> > [ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---
> >
> >

2013-06-10 21:54:32

by Dave Jones

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, Jun 10, 2013 at 05:37:36PM -0400, Steven Rostedt wrote:
> On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
> > On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
> > > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> > >
> > > > > I saw some of Steven's patches get merged on Friday, is there anything else
> > > > > outstanding that didn't make it in yet that I could test ?
> > > > > Or is this another new bug ?
> > > >
> > > > I have three fixes queued up at:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> > > >
> > > > Kind of hard to tell whether they are relevant given the interleaved
> > > > stack traces, but can't hurt to try them out.
> > >
> > > Here's another. Looks different.
> >
> > I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
> > called from irq) from the above git archive fixes this one. Just don't
> > ask how much I am willing to bet. ;-)
>
> Don't bet much ;-) This has nothing to do with tracepoints. It's due to
> the function tracer.

dammit, 20 minutes after I finally cloned the repo.
Can we go back to posting diffs instead of hashes please ?

So while updating my list of bugs I've found this cycle, I noticed
I'd already posted this one a month ago on -rc2.
https://lkml.org/lkml/2013/5/21/327
which led us to this patch https://lkml.org/lkml/2013/5/24/379
After which I hit a bunch of what seem to be other RCU related bugs.

So maybe that patch was a winner after all and got dropped ?

Dave

2013-06-10 22:10:36

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, Jun 10, 2013 at 05:54:21PM -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 05:37:36PM -0400, Steven Rostedt wrote:
> > On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
> > > > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> > > >
> > > > > > I saw some of Steven's patches get merged on Friday, is there anything else
> > > > > > outstanding that didn't make it in yet that I could test ?
> > > > > > Or is this another new bug ?
> > > > >
> > > > > I have three fixes queued up at:
> > > > >
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> > > > >
> > > > > Kind of hard to tell whether they are relevant given the interleaved
> > > > > stack traces, but can't hurt to try them out.
> > > >
> > > > Here's another. Looks different.
> > >
> > > I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
> > > called from irq) from the above git archive fixes this one. Just don't
> > > ask how much I am willing to bet. ;-)
> >
> > Don't bet much ;-) This has nothing to do with tracepoints. It's due to
> > the function tracer.
>
> dammit, 20 minutes after I finally cloned the repo.
> Can we go back to posting diffs instead of hashes please ?
>
> So while updating my list of bugs I've found this cycle, I noticed
> I'd already posted this one a month ago on -rc2.
> https://lkml.org/lkml/2013/5/21/327
> which led us to this patch https://lkml.org/lkml/2013/5/24/379
> After which I hit a bunch of what seem to be other RCU related bugs.
>
> So maybe that patch was a winner after all and got dropped ?

Please see below for the patch that I was thinking of.

It has not been dropped, I was on travel and a bit slow about pushing
things.

Thanx, Paul

------------------------------------------------------------------------

trace: Allow idle-safe tracepoints to be called from irq

__DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
may safely be invoked from what RCU considers to be an idle CPU.
However, these _rcuidle() tracepoints may -not- be invoked from the
handler of an irq taken from idle, because rcu_idle_enter() zeroes
RCU's nesting-level counter, so that the rcu_irq_exit() returning to
idle will trigger a WARN_ON_ONCE().

This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
tracepoints usable from irq handlers as well as from process context.

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 2f322c3..f8e084d 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -145,8 +145,8 @@ static inline void tracepoint_synchronize_unregister(void)
TP_PROTO(data_proto), \
TP_ARGS(data_args), \
TP_CONDITION(cond), \
- rcu_idle_exit(), \
- rcu_idle_enter()); \
+ rcu_irq_enter(), \
+ rcu_irq_exit()); \
}
#else
#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)

2013-06-10 22:10:59

by Steven Rostedt

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, 2013-06-10 at 17:54 -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 05:37:36PM -0400, Steven Rostedt wrote:
> > On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
> > > > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> > > >
> > > > > > I saw some of Steven's patches get merged on Friday, is there anything else
> > > > > > outstanding that didn't make it in yet that I could test ?
> > > > > > Or is this another new bug ?
> > > > >
> > > > > I have three fixes queued up at:
> > > > >
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> > > > >
> > > > > Kind of hard to tell whether they are relevant given the interleaved
> > > > > stack traces, but can't hurt to try them out.
> > > >
> > > > Here's another. Looks different.
> > >
> > > I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
> > > called from irq) from the above git archive fixes this one. Just don't
> > > ask how much I am willing to bet. ;-)
> >
> > Don't bet much ;-) This has nothing to do with tracepoints. It's due to
> > the function tracer.
>
> dammit, 20 minutes after I finally cloned the repo.
> Can we go back to posting diffs instead of hashes please ?
>
> So while updating my list of bugs I've found this cycle, I noticed
> I'd already posted this one a month ago on -rc2.
> https://lkml.org/lkml/2013/5/21/327
> which led us to this patch https://lkml.org/lkml/2013/5/24/379
> After which I hit a bunch of what seem to be other RCU related bugs.
>
> So maybe that patch was a winner after all and got dropped ?

I don't think it got dropped, it just seemed that it would go through
Peter:

https://lkml.org/lkml/2013/6/6/456

-- Steve

>
> Dave
>

2013-06-10 22:25:40

by Steven Rostedt

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, 2013-06-10 at 15:10 -0700, Paul E. McKenney wrote:

> > So maybe that patch was a winner after all and got dropped ?
>
> Please see below for the patch that I was thinking of.
>
> It has not been dropped, I was on travel and a bit slow about pushing
> things.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> trace: Allow idle-safe tracepoints to be called from irq
>
> __DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
> may safely be invoked from what RCU considers to be an idle CPU.
> However, these _rcuidle() tracepoints may -not- be invoked from the
> handler of an irq taken from idle, because rcu_idle_enter() zeroes
> RCU's nesting-level counter, so that the rcu_irq_exit() returning to
> idle will trigger a WARN_ON_ONCE().
>
> This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
> and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
> tracepoints usable from irq handlers as well as from process context.
>
> Reported-by: Dave Jones <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Cc: Steven Rostedt <[email protected]>
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 2f322c3..f8e084d 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -145,8 +145,8 @@ static inline void tracepoint_synchronize_unregister(void)
> TP_PROTO(data_proto), \
> TP_ARGS(data_args), \
> TP_CONDITION(cond), \
> - rcu_idle_exit(), \
> - rcu_idle_enter()); \
> + rcu_irq_enter(), \
> + rcu_irq_exit()); \

Still wont fix the splat that Dave saw. As the splat has nothing to do
with tracepoints, but has to do with function tracing, which is a whole
'nother beast.

-- Steve

> }
> #else
> #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)

2013-06-18 09:58:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Mon, Jun 10, 2013 at 05:16:48PM -0400, Steven Rostedt wrote:
> On Mon, 2013-06-10 at 17:01 -0400, Dave Jones wrote:
> > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> >
> > > > I saw some of Steven's patches get merged on Friday, is there anything else
> > > > outstanding that didn't make it in yet that I could test ?
> > > > Or is this another new bug ?
> > >
> > > I have three fixes queued up at:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> > >
> > > Kind of hard to tell whether they are relevant given the interleaved
> > > stack traces, but can't hurt to try them out.
> >
> > Here's another. Looks different.
> >
> > [ 2739.921649] ===============================
> > [ 2739.923894] [ INFO: suspicious RCU usage. ]
> > [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> > [ 2739.928397] -------------------------------
> > [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> > [ 2739.933826]
> > other info that might help us debug this:
> >
> > [ 2739.939663]
> > RCU used illegally from idle CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > [ 2739.946345] RCU used illegally from extended quiescent state!
> > [ 2739.949123] 2 locks held by trinity-child1/4385:
> > [ 2739.951537] #0: (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> > [ 2739.955316] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> > [ 2739.959101]
> > stack backtrace:
> > [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> > [ 2739.970870] 0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> > [ 2739.974556] ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> > [ 2739.978353] 0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> > [ 2739.982052] Call Trace:
> > [ 2739.984098] [<ffffffff816e39db>] dump_stack+0x19/0x1b
> > [ 2739.986996] [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> > [ 2739.990080] [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> > [ 2739.992971] [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> > [ 2739.994716] [<ffffffff8109609c>] update_curr+0xec/0x250
> > [ 2739.995873] [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> > [ 2739.997036] [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> > [ 2739.998192] [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> > [ 2739.999344] [<ffffffff816eae94>] preempt_schedule+0x44/0x60
>
> Yeah, this one is fixed by a patch I sent out earlier, and I believe
> Peter Zijlstra is going to push it. It wasn't part of my queue.
>
> Peter, Are you going to take the preempt_schedule_context() patch?

I have it queued, I just seem to have some problems locating Ingo to
stuff patches into -tip :/

Will continue prodding.. Ingo if you're reading! :-)

2013-06-18 14:41:54

by Dave Jones

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Tue, Jun 18, 2013 at 11:58:21AM +0200, Peter Zijlstra wrote:

> > Peter, Are you going to take the preempt_schedule_context() patch?
>
> I have it queued, I just seem to have some problems locating Ingo to
> stuff patches into -tip :/
>
> Will continue prodding.. Ingo if you're reading! :-)

There's a bunch of other stuff in tip pending pushing to Linus too,
including a bunch of fixes in timers/urgent that have been there
over a month.

Ingo hasn't posted since the 3rd, maybe he's sick/on vacation.
Does anyone else do pushes from tip->Linus ? tglx ?

Dave

2013-06-18 17:32:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Another RCU trace. (3.10-rc5)

On Tue, Jun 18, 2013 at 10:40:20AM -0400, Dave Jones wrote:
> On Tue, Jun 18, 2013 at 11:58:21AM +0200, Peter Zijlstra wrote:
>
> > > Peter, Are you going to take the preempt_schedule_context() patch?
> >
> > I have it queued, I just seem to have some problems locating Ingo to
> > stuff patches into -tip :/
> >
> > Will continue prodding.. Ingo if you're reading! :-)
>
> There's a bunch of other stuff in tip pending pushing to Linus too,
> including a bunch of fixes in timers/urgent that have been there
> over a month.
>
> Ingo hasn't posted since the 3rd, maybe he's sick/on vacation.
> Does anyone else do pushes from tip->Linus ? tglx ?
>

hpa and tglx are the other people with tip commit access. I'll try and
get a hold of tglx; he is currently at a conference but he should be
back the day after tomorrow.