From: Pratyush Anand
Subject: Query: rcu_sched detected stalls with function_graph tracer
To: Steven Rostedt
Cc: open list
Date: Wed, 2 Aug 2017 12:02:21 +0530
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steve,

I am using a 3.10-based kernel, and when I enable function_graph on one
particular x86_64 machine, I encounter an rcu_sched stall.

echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on

With 4.13-rc2 it is better, but the stall still shows up after a bit of
stressing with stress-ng.

It looks like this behavior should be expected on a heavily loaded system
(such as one running function_graph tracing for all available functions),
which can cause an RCU stall warning. But I am not an expert. So, what is
your opinion on it?

I tried to bisect, and there was no particular function that causes this.
It looked like we start getting such warnings once set_ftrace_filter holds
more than 5000 functions. I also tried putting some of the most frequently
hit functions, such as spin_lock*, in set_ftrace_notrace. Still no help.

Do you have any suggestions or pointers for further debugging?

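For reference, the filter bisection described above can be sketched roughly as follows. This is only an illustration, not the exact procedure that was used: the helper names filter_slice and trace_slice and the TRACING_DIR variable are hypothetical, and the sketch assumes the usual debugfs tracing files (available_filter_functions, set_ftrace_filter, current_tracer, tracing_on) are present and writable as root.

```shell
#!/bin/sh
# Hypothetical helpers for bisecting the set of traced functions.
# TRACING_DIR is an assumed variable; it defaults to the debugfs path
# used in the commands above.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Print lines FIRST..LAST of a function list, keeping only the function
# name (a list entry may carry a trailing "[module]" tag).
filter_slice() {
    awk -v first="$2" -v last="$3" \
        'NR >= first && NR <= last { print $1 }' "$1"
}

# Restrict function_graph tracing to the given slice of the list.
trace_slice() {
    echo 0 > "$TRACING_DIR/tracing_on"
    filter_slice "$1" "$2" "$3" > "$TRACING_DIR/set_ftrace_filter"
    echo function_graph > "$TRACING_DIR/current_tracer"
    echo 1 > "$TRACING_DIR/tracing_on"
}
```

For example, `trace_slice /sys/kernel/debug/tracing/available_filter_functions 1 5000` would trace only the first 5000 listed functions; halving or shifting the range between runs narrows down which slice still triggers the warning.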
log with 3.10 based kernel
===============================
[ 358.764071] perf: interrupt took too long (4503 > 4488), lowering kernel.perf_event_max_sample_rate to 44000
[ 408.705034] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 7, t=60036 jiffies, g=21144, c=21143, q=21651)
[ 408.706011] Task dump for CPU 0:
[ 408.706011] swapper/0 R running task 0 0 0 0x00000008
[ 408.706011] 0000000000000003 0000000000000003 ffffffff81aaf920 ffffffff819e4000
[ 408.706011] ffffffff819e7ed0 ffffffff816b6ef5 0000000000000000 ffffffff81b1c820
[ 408.706011] ffffffff819e4000 ffffffff819e4000 ffffffff819e4000 ffffffff819e4000
[ 408.706011] Call Trace:
[ 408.706011] [] ? ftrace_graph_caller+0x85/0x85
[ 408.706011] [] handle_irq_event_percpu+0x55/0x80
[ 408.706011] [] ? ftrace_graph_caller+0x85/0x85
[ 408.706011] [] handle_irq_event+0x3c/0x60
[ 408.706011] [] ? ftrace_graph_caller+0x85/0x85
[ 408.706011] [] pci_bus_read_config_dword+0x86/0xb0
[ 408.706011] [] ? rest_init+0x77/0x80
[ 408.706011] [] ? start_kernel+0x439/0x45a
[ 408.706011] [] ? repair_env_string+0x5c/0x5c
[ 408.706011] [] ? early_idt_handler_array+0x120/0x120
[ 408.706011] [] ? x86_64_start_reservations+0x24/0x26
[ 408.706011] [] ? x86_64_start_kernel+0x14f/0x172
[ 377.243167] perf: interrupt took too long (5741 > 5628), lowering kernel.perf_event_max_sample_rate to 34000
[ 437.522697] perf: interrupt took too long (8551 > 7176), lowering kernel.perf_event_max_sample_rate to 23000
[ 463.908115] perf: interrupt took too long (11374 > 10688), lowering kernel.perf_event_max_sample_rate to 17000
[ 480.954277] INFO: task yum:6726 blocked for more than 120 seconds.
[ 480.991431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 481.038470] yum D ffff880467bebf40 0 6726 6714 0x00000080
[ 481.081174] ffff88006489b850 0000000000000082 ffff880467bebf40 ffff88006489bfd8
[ 481.126085] ffff88006489bfd8 ffff88006489bfd8 ffff880467bebf40 ffff8802772656b0
[ 481.171087] 7fffffffffffffff 0000000000000002 0000000000000000 ffff880467bebf40
[ 481.216034] Call Trace:
[ 481.230832] [] ftrace_graph_caller+0x85/0x85
[ 481.266435] [] schedule+0x29/0x70
[ 481.296290] [] ftrace_graph_caller+0x85/0x85
[ 481.331966] [] schedule_timeout+0x239/0x2c0
[ 481.367082] [] ? prepare_ftrace_return+0xb6/0xe0
[ 481.404850] [] ? __down_common+0x104/0x104
[ 481.439492] [] ftrace_graph_caller+0x85/0x85
[ 481.475091] [] __down_common+0xaa/0x104
[ 481.508098] [] ? ftrace_graph_caller+0x85/0x85
[ 481.544778] [] down+0x41/0x50
[ 481.577943] [] __down+0x1d/0x1f
[ 481.611734] [] ftrace_graph_caller+0x85/0x85
[ 481.652576] [] xfs_buf_lock+0x3c/0xd0 [xfs]
[ 481.693178] [] ftrace_graph_caller+0x85/0x85
[ 481.734216] [] _xfs_buf_find+0x170/0x330 [xfs]
[ 481.775910] [] ftrace_graph_caller+0x85/0x85
[ 481.816266] [] xfs_buf_get_map+0x2a/0x240 [xfs]
[ 481.859119] [] ftrace_graph_caller+0x85/0x85
[ 481.899431] [] xfs_buf_read_map+0x30/0x160 [xfs]
[ 481.941875] [] ftrace_graph_caller+0x85/0x85
[ 481.982192] [] xfs_trans_read_buf_map+0x211/0x400 [xfs]
[ 482.027938] [] ftrace_graph_caller+0x85/0x85
[ 482.068655] [] xfs_imap_to_bp+0x6e/0xf0 [xfs]
[ 482.109275] [] ftrace_graph_caller+0x85/0x85
[ 482.149375] [] xfs_iunlink_remove+0x295/0x3c0 [xfs]
[ 482.193561] [] ftrace_graph_caller+0x85/0x85
[ 482.233577] [] xfs_ifree+0x47/0x100 [xfs]
[ 482.272198] [] ftrace_graph_caller+0x85/0x85
[ 482.312544] [] xfs_inactive_ifree+0xd1/0x230 [xfs]
[ 482.355816] [] ftrace_graph_caller+0x85/0x85
[ 482.395936] [] xfs_inactive+0x8b/0x130 [xfs]
[ 482.436204] [] ftrace_graph_caller+0x85/0x85
[ 482.477110] [] xfs_fs_destroy_inode+0x95/0x180 [xfs]
[ 482.521397] [] ftrace_graph_caller+0x85/0x85
[ 482.562184] [] destroy_inode+0x38/0x60
[ 482.599372] [] ftrace_graph_caller+0x85/0x85
[ 482.639831] [] evict+0x10a/0x180
[ 482.673827] [] ftrace_graph_caller+0x85/0x85
[ 482.714694] [] iput+0xf9/0x190
[ 482.748414] [] ftrace_graph_caller+0x85/0x85
[ 482.789294] [] dentry_kill+0x168/0x1b0
[ 482.827374] [] ftrace_graph_caller+0x85/0x85
[ 482.868003] [] SYSC_renameat2+0x518/0x5a0
[ 482.906672] [] dput+0x5e/0xd0
[ 482.938986] [] ftrace_graph_caller+0x85/0x85
[ 482.979490] [] SyS_rename+0x1e/0x20
[ 483.016292] [] ? ftrace_push_return_trace+0x4f/0xe0
[ 483.060909] [] ? SyS_link+0x20/0x20
[ 483.096075] [] ? SyS_rename+0x1e/0x20
[ 483.132925] [] ? prepare_ftrace_return+0xb6/0xe0
[ 483.174945] [] ? SyS_link+0x20/0x20
[ 483.210469] [] ? __trace_graph_return+0x9b/0xb0
[ 483.252601] [] SyS_renameat2+0xe/0x10
[ 483.288881] [] ftrace_graph_caller+0x85/0x85
[ 483.328689] [] system_call_fastpath+0x16/0x1b
[ 483.369480] [] ftrace_graph_caller+0x85/0x85
[ 507.786060] perf: interrupt took too long (14339 > 14217), lowering kernel.perf_event_max_sample_rate to 13000
[ 546.740390] perf: interrupt took too long (18151 > 17923), lowering kernel.perf_event_max_sample_rate to 11000
[ 588.710897] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 7, t=240042 jiffies, g=21144, c=21143, q=23204)
[ 588.711872] Task dump for CPU 0:
[ 588.711872] swapper/0 R running task 0 0 0 0x00000008
[ 588.711872] 0000000000000003 0000000000000003 ffffffff81aaf920 ffffffff819e4000
[ 588.711872] ffffffff819e7ed0 ffffffff816b6ef5 0000000000000000 ffffffff81b1c820
[ 588.711872] ffffffff819e4000 ffffffff819e4000 ffffffff819e4000 ffffffff819e4000
[ 588.711872] Call Trace:
[ 588.711872] [] ? ftrace_graph_caller+0x85/0x85
[ 588.711872] [] handle_irq_event_percpu+0x55/0x80
[ 588.711872] [] ? ftrace_graph_caller+0x85/0x85
[ 588.711872] [] handle_irq_event+0x3c/0x60
[ 588.711872] [] ? ftrace_graph_caller+0x85/0x85
[ 588.711872] [] handle_irq_event+0x3c/0x60
[ 588.711872] [] ? rest_init+0x77/0x80
[ 588.711872] [] ? start_kernel+0x439/0x45a
[ 588.711872] [] ? repair_env_string+0x5c/0x5c
[ 588.711872] [] ? early_idt_handler_array+0x120/0x120
[ 588.711872] [] ? x86_64_start_reservations+0x24/0x26
[ 588.711872] [] ? x86_64_start_kernel+0x14f/0x172
[-- MARK -- Wed Jul 12 02:45:00 2017]

log with 4.13-rc2 kernel
=============================
[ 876.272025] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 876.272027] 0-...: (1 GPs behind) idle=992/140000000000000/0 softirq=66236/66237 fqs=11905
[ 876.272027] (detected by 12, t=60002 jiffies, g=20604, c=20603, q=23819)
[ 876.272027] Sending NMI from CPU 12 to CPUs 0:
[ 876.272027] NMI backtrace for cpu 0
[ 876.272027] CPU: 0 PID: 2106 Comm: stress Not tainted 4.13.0-rc2 #1
[ 876.272027] Hardware name: XXXXXXXXXXXXXXXXX
[ 876.272027] task: ffff9484e3a52c00 task.stack: ffffaa1c08888000
[ 876.272027] RIP: 0010:delay_tsc+0x23/0x60
[ 876.272027] RSP: 0000:ffffaa1c0888b768 EFLAGS: 00000083
[ 876.272027] RAX: 00000000000007a4 RBX: ffffffffb9f48d40 RCX: 000001eec7b47589
[ 876.272027] RDX: 000001eec7b47d2d RSI: 0000000000000000 RDI: 00000000000007d1
[ 876.272027] RBP: ffffaa1c0888b768 R08: ffff94820fc08890 R09: 000000002ce120d1
[ 876.272027] R10: 0000000497700000 R11: ffff9482f6c7c580 R12: 00000000000026f2
[ 876.272027] R13: 0000000000000020 R14: ffffffffb9ce4ad1 R15: ffffffffb9f48d40
[ 876.272027] FS: 00007fdb22cf9740(0000) GS:ffff9482f6200000(0000) knlGS:0000000000000000
[ 876.272027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 876.272027] CR2: 00007f4dfe401010 CR3: 00000004656ef000 CR4: 00000000000006f0
[ 876.272027] Call Trace:
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] __const_udelay+0x32/0x40
[ 876.272027] wait_for_xmitr+0x2c/0xa0
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] serial8250_console_putchar+0x1c/0x30
[ 876.272027] ? wait_for_xmitr+0xa0/0xa0
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] uart_console_write+0x31/0x70
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] serial8250_console_write+0xd7/0x290
[ 876.272027] ? serial8250_console_write+0x5/0x290
[ 876.272027] ? ftrace_return_to_handler+0x9d/0xf0
[ 876.272027] ? univ8250_console_write+0x5/0x30
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] univ8250_console_write+0x26/0x30
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] console_unlock+0x3e6/0x4c0
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] vprintk_emit+0x2f5/0x3a0
[ 876.272027] ? vprintk_emit+0x5/0x3a0
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] vprintk_default+0x29/0x50
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] vprintk_func+0x27/0x60
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] printk+0x58/0x6f
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] show_free_areas+0x255/0xa40
[ 876.272027] ? ftrace_graph_caller+0x78/0xa8
[ 876.272027] ? __trace_graph_return+0x88/0x90
[ 876.272027] ? ftrace_return_to_handler+0x9d/0xf0
[ 876.272027] ? show_free_areas+0x5/0xa40
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] show_mem+0x2b/0xf0
[ 876.272027] warn_alloc+0x198/0x1c0
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] __alloc_pages_slowpath+0x353/0xb90
[ 876.272027] ? __trace_graph_return+0x88/0x90
[ 876.272027] ? __alloc_pages_nodemask+0x102/0x290
[ 876.272027] ? ftrace_return_to_handler+0x9d/0xf0
[ 876.272027] __alloc_pages_nodemask+0x267/0x290
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] alloc_pages_vma+0x7f/0x180
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] __handle_mm_fault+0xbeb/0x1150
[ 876.272027] ? __trace_graph_return+0x88/0x90
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] handle_mm_fault+0xd1/0x230
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] __do_page_fault+0x22a/0x4a0
[ 876.272027] ? ftrace_graph_caller+0xa8/0xa8
[ 876.272027] do_page_fault+0x30/0x80
[ 876.272027] ? return_to_handler+0x15/0x27
[ 876.272027] page_fault+0x28/0x30
[ 876.272027] RIP: 0033:0x403072
[ 876.272027] RSP: 002b:00007fff358a2590 EFLAGS: 00010202
[ 876.272027] RAX: 00007fdb19e8e010 RBX: 0000000000000000 RCX: 00007fdb12420010
[ 876.272027] RDX: 0000000007a6e000 RSI: 0000000010001000 RDI: 0000000000000000
[ 876.272027] RBP: 00007fff358a25d0 R08: ffffffffffffffff R09: 0000000010000000
[ 876.272027] R10: 0000000000000022 R11: 0000000000001000 R12: 0000000000400dc0
[ 876.272027] R13: 00007fff358a27e0 R14: 0000000000000000 R15: 0000000000000000
[ 876.272027] Code: c3 0f 1f 80 00 00 00 00 e8 7b 1c 01 00 55 48 89 e5 65 8b 35 d0 be ca 46 0f ae e8 0f 31 48 89 d1 48 c1 e1 20 48 09 c1 eb 0d f3 90 <65> 8b 05 b6 be ca 46 39 c6 75 19 0f ae e8 0f 31 48 c1 e2 20 48
[ 876.319103] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.808 msecs
[ 877.920915] stress: page allocation stalls for 11490ms, order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
[ 877.920971] stress cpuset=/ mems_allowed=0-1
[ 877.921094] CPU: 18 PID: 2085 Comm: stress Not tainted 4.13.0-rc2 #1
[ 877.921108] Hardware name: XXXXXXXXXX
[ 877.921122] Call Trace:
[ 877.921170] ? ftrace_graph_caller+0xa8/0xa8
[ 877.921193] dump_stack+0x63/0x89
[ 877.921237] warn_alloc+0x114/0x1c0
[ 877.921352] ? ftrace_graph_caller+0xa8/0xa8
[ 877.921372] __alloc_pages_slowpath+0x353/0xb90
[ 877.921444] ? __trace_graph_return+0x88/0x90
[ 877.921489] ? __alloc_pages_nodemask+0x102/0x290
[ 877.921520] ? ftrace_return_to_handler+0x9d/0xf0
[ 877.921599] __alloc_pages_nodemask+0x267/0x290
[ 877.921701] ? ftrace_graph_caller+0xa8/0xa8
[ 877.921726] alloc_pages_vma+0x7f/0x180
[ 877.921790] ? ftrace_graph_caller+0xa8/0xa8
[ 877.921812] __handle_mm_fault+0xbeb/0x1150
[ 877.921839] ? __trace_graph_return+0x88/0x90
[ 877.921968] ? ftrace_graph_caller+0xa8/0xa8
[ 877.921986] handle_mm_fault+0xd1/0x230
[ 877.922028] ? ftrace_graph_caller+0xa8/0xa8
[ 877.922028] __do_page_fault+0x22a/0x4a0
[ 877.922028] ? ftrace_graph_caller+0xa8/0xa8
[ 877.922028] do_page_fault+0x30/0x80
[ 877.922028] ? return_to_handler+0x15/0x27
[ 877.922028] page_fault+0x28/0x30
[ 877.922028] RIP: 0033:0x403072
[ 877.922028] RSP: 002b:00007fff358a2590 EFLAGS: 00010202
[ 877.922028] RAX: 00007fdb1f99a010 RBX: 0000000000000000 RCX: 00007fdb12420010
[ 877.922028] RDX: 000000000d57a000 RSI: 0000000010001000 RDI: 0000000000000000
[ 877.922028] RBP: 00007fff358a25d0 R08: ffffffffffffffff R09: 0000000010000000
[ 877.922028] R10: 0000000000000022 R11: 0000000000001000 R12: 0000000000400dc0
[ 877.922028] R13: 00007fff358a27e0 R14: 0000000000000000 R15: 0000000000000000
[ 877.922541] warn_alloc_show_mem: 1 callbacks suppressed
[ 877.922554] Mem-Info:
[ 877.922594] active_anon:3380619 inactive_anon:424654 isolated_anon:27399

--
Regards
Pratyush