From: patrick wang
Date: Wed, 20 Apr 2022 18:34:34 +0800
Subject: Re: [PATCH] rcu: ftrace: avoid tracing a few functions executed in multi_cpu_stop()
To: Steven Rostedt
Cc: paulmck@kernel.org, frederic@kernel.org, quic_neeraju@quicinc.com, josh@joshtriplett.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, joel@joelfernandes.org, rcu@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20220418043735.11441-1-patrick.wang.shcn@gmail.com> <20220418143404.55c8fcab@gandalf.local.home>
X-Mailing-List:
linux-kernel@vger.kernel.org

On Tue, Apr 19, 2022 at 12:06 PM patrick wang wrote:
>
> On Tue, Apr 19, 2022 at 2:34 AM Steven Rostedt wrote:
> >
> > On Mon, 18 Apr 2022 12:37:35 +0800
> > Patrick Wang wrote:
> >
> > > A few functions are in the call chain of rcu_momentary_dyntick_idle()
> > > which is executed in multi_cpu_stop() and marked notrace. They are running
> > > in traced when ftrace modify code. This may cause non-ftrace_modify_code
> > > CPUs stall:
> >
> > I'm confused by this. How is traced functions causing this exactly? Is this
> > on RISC-V?
>
> During ftrace modify code, these functions are running and their
> instructions will be modified by ftrace (I see the nop instructions in
> these functions from the compiler).
> When instructions are being modified, they shouldn't be executed. Or
> the executor may behave unpredictably.
>

Sorry for the formatting. I need to get used to Gmail.

These functions run within stop machine, and on some architectures (e.g. RISC-V) ftrace modifies code under stop machine to ensure safety. The instructions of these functions are modified while ftrace is modifying code. Instructions that are being modified typically should not be executed; otherwise the CPU executing them may behave unpredictably.
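For readers less familiar with the annotation, here is a minimal userspace sketch of what marking a function notrace does. It assumes GCC or Clang and the kernel's fallback definition of notrace as __attribute__((__no_instrument_function__)); some kernel configurations define it differently (for example via patchable_function_entry(0, 0)). It is not the kernel's ftrace mechanism: it relies on the compiler's -finstrument-functions hooks rather than patched nops, and the names hook_hits, traced_fn, untraced_fn and hits_during are made up for illustration.

```c
/* Assumption: mirrors the kernel's fallback definition of notrace. */
#define notrace __attribute__((__no_instrument_function__))

/* Hypothetical counter, bumped every time the entry hook fires. */
int hook_hits;

/*
 * Hooks the compiler calls on function entry/exit when the file is built
 * with -finstrument-functions. They must not be instrumented themselves.
 */
notrace void __cyg_profile_func_enter(void *fn, void *call_site)
{
	(void)fn;
	(void)call_site;
	hook_hits++;
}

notrace void __cyg_profile_func_exit(void *fn, void *call_site)
{
	(void)fn;
	(void)call_site;
}

/* Eligible for instrumentation when built with -finstrument-functions. */
int traced_fn(int x)
{
	return x + 1;
}

/* Never instrumented, whatever the build flags are. */
notrace int untraced_fn(int x)
{
	return x + 1;
}

/* Returns how many times the entry hook fired while fn was running. */
notrace int hits_during(int (*fn)(int))
{
	int before = hook_hits;

	fn(1);
	return hook_hits - before;
}
```

Built with something like gcc -finstrument-functions, hits_during(untraced_fn) should report zero hits because the compiler emits no hook call for that function. The kernel case is analogous as far as I understand it: a notrace function gets no patch site at its entry, so ftrace has nothing to rewrite in it while it runs inside multi_cpu_stop().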
>
> >
> > >
> > > [ 72.686113] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [ 72.687344] rcu: 1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
> > > [ 72.687800] rcu: 3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
> > > [ 72.688280] (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
> > > [ 72.688739] Task dump for CPU 1:
> > > [ 72.688991] task:migration/1 state:R running task stack: 0 pid: 19 ppid: 2 flags:0x00000000
> > > [ 72.689594] Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > [ 72.690242] Call Trace:
> > > [ 72.690603] Task dump for CPU 3:
> > > [ 72.690761] task:migration/3 state:R running task stack: 0 pid: 29 ppid: 2 flags:0x00000000
> > > [ 72.691135] Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > [ 72.691474] Call Trace:
> > > [ 72.691733] rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> > > [ 72.692180] rcu: Possible timer handling issue on cpu=2 timer-softirq=594
> > > [ 72.692485] rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> > > [ 72.692876] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > [ 72.693232] rcu: RCU grace-period kthread stack dump:
> > > [ 72.693433] task:rcu_preempt state:I stack: 0 pid: 14 ppid: 2 flags:0x00000000
> > > [ 72.693788] Call Trace:
> > > [ 72.694018] [] schedule+0x56/0xc2
> > > [ 72.694306] [] schedule_timeout+0x82/0x184
> > > [ 72.694539] [] rcu_gp_fqs_loop+0x19a/0x318
> > > [ 72.694809] [] rcu_gp_kthread+0x11a/0x140
> > > [ 72.695325] [] kthread+0xee/0x118
> > > [ 72.695657] [] ret_from_exception+0x0/0x14
> > > [ 72.696089] rcu: Stack dump where RCU GP kthread last ran:
> > > [ 72.696383] Task dump for CPU 2:
> > > [ 72.696562] task:migration/2 state:R running task stack: 0 pid: 24 ppid: 2 flags:0x00000000
> > > [ 72.697059] Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > > [ 72.697471] Call Trace:
> > >
> > > Mark rcu_preempt_deferred_qs(), rcu_preempt_need_deferred_qs() and
> > > rcu_preempt_deferred_qs_irqrestore() notrace to avoid this.
> > >
> >
> > The rcu_momentary_dyntick_idle() was marked notrace because of RISC-V not
> > being able to call functions from within stop machine. If that's what is
> > being prevented,
>

Yes, that is what is being prevented. Commit 4230e2deaa48 ("stop_machine, rcu: Mark functions as notrace") marked rcu_momentary_dyntick_idle() notrace, but the issue still exists to some extent.

Thanks,
Patrick

>
> > then I'm fine with this (although I'm thinking we need
> > different kinds of "notrace" for different architectures as one arch's
> > limitation should not be cause for another's).
> >
>
> Totally agree with this. The "notrace" currently is heavy, can effect all archs.
>
> Thanks
> Patrick
> >
> > But before I ack this patch, I want to understand the real issues here.
> >
> > -- Steve