From: Romain Izard
Date: Thu, 11 Jan 2018 15:36:34 +0100
Subject: "perf record" leads to a watchdog reset on a SAMA5D2 SoC
To: linux-arm-kernel, LKML
Cc: Nicolas Ferre

Hello,

While trying to use perf on a device based on a SAMA5D2 SoC, I
encountered a systematic watchdog reboot. Simply calling
"perf record /bin/true" is enough to trigger this condition.

I have reproduced the issue on a SAMA5D2 Xplained demo board with a
4.14.11 kernel, which led to "rcu_preempt self-detected stall on CPU"
kernel messages once the watchdog was disabled. But this depends on
the kernel configuration: the plain sama5_defconfig does not show the
issue, while adding the following options to this defconfig is enough:

+CONFIG_PERF_EVENTS=y
+CONFIG_PREEMPT=y
+CONFIG_FUNCTION_TRACER=y
+# CONFIG_DYNAMIC_FTRACE is not set

There are complex interactions between these options, and on the
original device DYNAMIC_FTRACE was set. But while trying to reduce the
number of flags changed between the default configuration and one that
reproduces the issue, I ended up in a situation where enabling or
disabling DYNAMIC_FTRACE was the difference that triggered the problem.
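For reference, here is roughly how I build a configuration that
reproduces the problem. The scripts/config invocation is only one way
of applying the fragment above, and the ARCH/CROSS_COMPILE values are
assumptions about my build environment rather than anything
significant:

  # Start from the Atmel defconfig and fold in the fragment above.
  # (The toolchain prefix is whatever ARM cross-compiler you use.)
  make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- sama5_defconfig
  ./scripts/config --enable PERF_EVENTS \
                   --enable PREEMPT \
                   --enable FUNCTION_TRACER \
                   --disable DYNAMIC_FTRACE
  make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- olddefconfig
  make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- zImage dtbs

  # On the target, with the watchdog disabled, a single short
  # profiling run is enough to trigger the stall:
  perf record /bin/true

With the watchdog left enabled, the same command ends in a watchdog
reset instead of the stall messages below.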
Here is an example of the error (repeated every 30s) in the kernel log:

8<----------------------------------------------------------------------
INFO: rcu_preempt self-detected stall on CPU
	0-...: (1413 ticks this GP) idle=7fe/140000000000002/0 softirq=1011/1011 fqs=0
	 (t=2100 jiffies g=86 c=85 q=63)
rcu_preempt kthread starved for 2100 jiffies! g86 c85 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
rcu_preempt     I    0     8      2 0x00000000
[] (__schedule) from [] (schedule+0x58/0xb8)
[] (schedule) from [] (schedule_timeout+0x1d8/0x3fc)
[] (schedule_timeout) from [] (rcu_gp_kthread+0x628/0xab4)
[] (rcu_gp_kthread) from [] (kthread+0x134/0x164)
[] (kthread) from [] (ret_from_fork+0x14/0x2c)
Task dump for CPU 0:
perf            R  running task        0   842    836 0x00000002
[] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[] (show_stack) from [] (sched_show_task.part.22+0xc4/0xe0)
[] (sched_show_task.part.22) from [] (dump_cpu_task+0x40/0x44)
[] (dump_cpu_task) from [] (rcu_dump_cpu_stacks+0x8c/0xcc)
[] (rcu_dump_cpu_stacks) from [] (rcu_check_callbacks+0x628/0xa5c)
[] (rcu_check_callbacks) from [] (update_process_times+0x40/0x6c)
[] (update_process_times) from [] (tick_sched_handle+0x64/0x68)
[] (tick_sched_handle) from [] (tick_sched_timer+0x58/0xb4)
[] (tick_sched_timer) from [] (__hrtimer_run_queues.constprop.5+0x174/0x330)
[] (__hrtimer_run_queues.constprop.5) from [] (hrtimer_interrupt+0xa4/0x260)
[] (hrtimer_interrupt) from [] (ch2_irq+0x38/0x40)
[] (ch2_irq) from [] (__handle_irq_event_percpu+0x6c/0x258)
[] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x2c/0x68)
[] (handle_irq_event_percpu) from [] (handle_irq_event+0x5c/0x94)
[] (handle_irq_event) from [] (handle_fasteoi_irq+0xbc/0x198)
[] (handle_fasteoi_irq) from [] (generic_handle_irq+0x34/0x44)
[] (generic_handle_irq) from [] (__handle_domain_irq+0x80/0xec)
[] (__handle_domain_irq) from [] (aic5_handle+0xd0/0xf8)
[] (aic5_handle) from [] (__irq_svc+0x6c/0xa8)
Exception stack(0xde63fd18 to 0xde63fd60)
fd00:                                                       00000000 c0a077dc
fd20: c0d86200 00000000 c0d86200 00000000 00000000 00000202 de63fe20 df007600
fd40: c0d7c3cc de63fdbc de63fd58 de63fd68 c03d3cb4 c0101648 200b0113 ffffffff
[] (__irq_svc) from [] (__do_softirq+0xa0/0x350)
[] (__do_softirq) from [] (irq_exit+0xfc/0x138)
[] (irq_exit) from [] (__handle_domain_irq+0x84/0xec)
[] (__handle_domain_irq) from [] (aic5_handle+0xd0/0xf8)
[] (aic5_handle) from [] (__irq_svc+0x6c/0xa8)
Exception stack(0xde63fe20 to 0xde63fe68)
fe20: 00000000 c01c265c 00000000 00000000 600b0013 c01c612c de57f600 de63e000
fe40: df284c00 00000000 00000000 de63fe8c de63fe08 de63fe70 c0145178 c0187214
fe60: 600b0013 ffffffff
[] (__irq_svc) from [] (smp_call_function_single+0x38/0x60)
[] (smp_call_function_single) from [] (task_function_call+0x3c/0x54)
[] (task_function_call) from [] (perf_install_in_context+0x5c/0x19c)
[] (perf_install_in_context) from [] (SyS_perf_event_open+0xb88/0xdcc)
[] (SyS_perf_event_open) from [] (ret_fast_syscall+0x0/0x54)
8<----------------------------------------------------------------------

I'm not sure how to interpret the issue. Is it because the CPU is too
busy servicing the trace code during ticks to do anything else?

Best regards,
-- 
Romain Izard