From: Rob Clark
Date: Fri, 10 Jun 2022 08:06:26 -0700
Subject: Re: [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
To: Douglas Anderson
Cc: Akhil P Oommen, Jordan Crouse, Abhinav Kumar, Bjorn Andersson,
    Chia-I Wu, Dan Carpenter, Daniel Vetter, David Airlie,
    Dmitry Baryshkov, Eric Anholt, Jonathan Marek, Sean Paul,
    Viresh Kumar, Vladimir Lypak, Yangtao Li, dri-devel, freedreno,
    linux-arm-msm, Linux Kernel Mailing List
In-Reply-To: <20220609170859.v3.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 9, 2022 at 5:10 PM Douglas Anderson wrote:
>
> From testing on sc7180-trogdor devices, reading the
> GMU registers needs the GMU clocks to be enabled. Those clocks get
> turned on in a6xx_gmu_resume(). Confusingly enough, that function is
> called as a result of the runtime_pm of the GPU "struct device", not
> the GMU "struct device". Unfortunately the current a6xx_gpu_busy()
> grabs a reference to the GMU's "struct device".
>
> The fact that we were grabbing the wrong reference was easily seen to
> cause crashes that happen if we change the GPU's pm_runtime usage to
> not use autosuspend. It's also believed to cause some long tail GPU
> crashes even with autosuspend.
>
> We could look at changing it so that we do pm_runtime_get_if_in_use()
> on the GPU's "struct device", but then we run into a different
> problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
> "struct device" the whole time when we're in the "autosuspend
> delay". That is, when we drop the last reference to the GPU but we're
> waiting a period before actually suspending then we'll think the GPU
> is off. One reason that's bad is that if the GPU didn't actually turn
> off then the cycle counter doesn't lose state and that throws off all
> of our calculations.
>
> Let's change the code to keep track of the suspend state of
> devfreq. msm_devfreq_suspend() is always called before we actually
> suspend the GPU and msm_devfreq_resume() after we resume it. This
> means we can use the suspended state to know if we're powered or not.
>
> NOTE: one might wonder when exactly our status function is called when
> devfreq is supposed to be disabled.
> The stack crawl I captured was:
>   msm_devfreq_get_dev_status
>   devfreq_simple_ondemand_func
>   devfreq_update_target
>   qos_notifier_call
>   qos_max_notifier_call
>   blocking_notifier_call_chain
>   pm_qos_update_target
>   freq_qos_apply
>   apply_constraint
>   __dev_pm_qos_update_request
>   dev_pm_qos_update_request
>   msm_devfreq_idle_work
>
> Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
> Signed-off-by: Douglas Anderson
> ---
>
> Changes in v3:
> - Totally rewrote to not use the pm_runtime functions.
> - Moved the code to be common for all adreno GPUs.
>
> Changes in v2:
> - Move the set_freq runtime pm grab to the GPU file.
> - Use <= for the pm_runtime test, not ==.
>
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
>  drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
>  6 files changed, 51 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c424e9a37669..3dcec7acb384 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>  {
>  	u64 busy_cycles;
>
> -	/* Only read the gpu busy if the hardware is already active */
> -	if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
> -		*out_sample_rate = 1;
> -		return 0;
> -	}
> -
>  	busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
>  			REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
>  	*out_sample_rate = clk_get_rate(gpu->core_clk);
>
> -	pm_runtime_put(&gpu->pdev->dev);
> -
>  	return busy_cycles;
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 9f76f5b15759..dc715d88ff21 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
>  		A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
>  }
>
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +		       bool suspended)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>
>  	/*
>  	 * This can get called from devfreq while the hardware is idle. Don't
> -	 * bring up the power if it isn't already active
> +	 * bring up the power if it isn't already active. All we're doing here
> +	 * is updating the frequency so that when we come back online we're at
> +	 * the right rate.
>  	 */
> -	if (pm_runtime_get_if_in_use(gmu->dev) == 0)
> +	if (suspended)
>  		return;
>
>  	if (!gmu->legacy) {
>  		a6xx_hfi_set_freq(gmu, perf_index);
>  		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -		pm_runtime_put(gmu->dev);
>  		return;
>  	}
>
> @@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>  		dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
>
>  	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -	pm_runtime_put(gmu->dev);
>  }
>
>  unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
> @@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
>  		return;
>
>  	gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
> -	a6xx_gmu_set_freq(gpu, gpu_opp);
> +	a6xx_gmu_set_freq(gpu, gpu_opp, false);
>  	dev_pm_opp_put(gpu_opp);
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 42ed9a3c4905..8c02a67f29f2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>  	/* 19.2MHz */
>  	*out_sample_rate = 19200000;
>
> -	/* Only read the gpu busy if the hardware is already active */
> -	if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
> -		return 0;
> -
>  	busy_cycles = gmu_read64(&a6xx_gpu->gmu,
>  			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
>  			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
>
> -
> -	pm_runtime_put(a6xx_gpu->gmu.dev);
> -
>  	return busy_cycles;
>  }
>
> -static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +			      bool suspended)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>
>  	mutex_lock(&a6xx_gpu->gmu.lock);
> -	a6xx_gmu_set_freq(gpu, opp);
> +	a6xx_gmu_set_freq(gpu, opp, suspended);
>  	mutex_unlock(&a6xx_gpu->gmu.lock);
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 86e0a7c3fe6d..ab853f61db63 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +		       bool suspended);
>  unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
>
>  void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 6def00883046..7ced1a30d4e8 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -68,7 +68,8 @@ struct msm_gpu_funcs {
>  	struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
>  	int (*gpu_state_put)(struct msm_gpu_state *state);
>  	unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
> -	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +			     bool suspended);

nit, I suppose we should add a comment that gpu_set_freq()/gpu_busy()
should *not* call runpm get/put as they are called with lock held

Otherwise looks good to me.. kinda sad that we have to track the
suspended state behind runpm's back, but I don't really see any better
way.

Reviewed-by: Rob Clark

>  	struct msm_gem_address_space *(*create_address_space)
>  		(struct msm_gpu *gpu, struct platform_device *pdev);
>  	struct msm_gem_address_space *(*create_private_address_space)
> @@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
>  	/** devfreq: devfreq instance */
>  	struct devfreq *devfreq;
>
> +	/** lock: lock for "suspended", "busy_cycles", and "time" */
> +	struct mutex lock;
> +
>  	/**
>  	 * idle_constraint:
>  	 *
> @@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
>  	 * elapsed
>  	 */
>  	struct msm_hrtimer_work boost_work;
> +
> +	/** suspended: tracks if we're suspended */
> +	bool suspended;
>  };
>
>  struct msm_gpu {
> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> index d2539ca78c29..ea94bc18e72e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> @@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>  		u32 flags)
>  {
>  	struct msm_gpu *gpu = dev_to_gpu(dev);
> +	struct msm_gpu_devfreq *df = &gpu->devfreq;
>  	struct dev_pm_opp *opp;
>
>  	/*
> @@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>
>  	trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
>
> -	if (gpu->funcs->gpu_set_freq)
> -		gpu->funcs->gpu_set_freq(gpu, opp);
> -	else
> +	if (gpu->funcs->gpu_set_freq) {
> +		mutex_lock(&df->lock);
> +		gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
> +		mutex_unlock(&df->lock);
> +	} else {
>  		clk_set_rate(gpu->core_clk, *freq);
> +	}
>
>  	dev_pm_opp_put(opp);
>
> @@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
>  	unsigned long sample_rate;
>  	ktime_t time;
>
> +	mutex_lock(&df->lock);
> +
>  	status->current_frequency = get_freq(gpu);
> -	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>  	time = ktime_get();
> -
> -	busy_time = busy_cycles - df->busy_cycles;
>  	status->total_time = ktime_us_delta(time, df->time);
> +	df->time = time;
>
> +	if (df->suspended) {
> +		mutex_unlock(&df->lock);
> +		status->busy_time = 0;
> +		return;
> +	}
> +
> +	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> +	busy_time = busy_cycles - df->busy_cycles;
>  	df->busy_cycles = busy_cycles;
> -	df->time = time;
> +
> +	mutex_unlock(&df->lock);
>
>  	busy_time *= USEC_PER_SEC;
>  	do_div(busy_time, sample_rate);
> @@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
>  	if (!gpu->funcs->gpu_busy)
>  		return;
>
> +	mutex_init(&df->lock);
> +
>  	dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
>  			       DEV_PM_QOS_MAX_FREQUENCY,
>  			       PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
> @@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
>  void msm_devfreq_resume(struct msm_gpu *gpu)
>  {
>  	struct msm_gpu_devfreq *df = &gpu->devfreq;
> +	unsigned long sample_rate;
>
>  	if (!has_devfreq(gpu))
>  		return;
>
> -	df->busy_cycles = 0;
> +	mutex_lock(&df->lock);
> +	df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>  	df->time = ktime_get();
> +	df->suspended = false;
> +	mutex_unlock(&df->lock);
>
>  	devfreq_resume_device(df->devfreq);
>  }
> @@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
>  	if (!has_devfreq(gpu))
>  		return;
>
> +	mutex_lock(&df->lock);
> +	df->suspended = true;
> +	mutex_unlock(&df->lock);
> +
>  	devfreq_suspend_device(df->devfreq);
>
>  	cancel_idle_work(df);
> --
> 2.36.1.476.g0c4daa206d-goog
>
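
For anyone following along, the locking pattern the patch introduces can
be sketched in userspace roughly as below. This is only an illustration,
not the kernel code: struct df_sketch, fake_hw_counter, and the df_*()
helpers are all hypothetical names, a pthread mutex stands in for the
kernel's struct mutex, and starting out in the suspended state is a
choice made for this sketch only. The idea it shows is that the sampling
path checks a "suspended" flag under the lock and never touches the
(possibly unclocked) counter while the flag is set, and that resume takes
a fresh baseline since the counter may or may not have lost state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct msm_gpu_devfreq; not the kernel type. */
struct df_sketch {
	pthread_mutex_t lock;	/* protects "suspended" and "busy_cycles" */
	bool suspended;
	uint64_t busy_cycles;	/* counter value at the previous sample */
};

/* Hypothetical stand-in for the hardware busy-cycle counter. */
static uint64_t fake_hw_counter;

static uint64_t read_busy_counter(void)
{
	return fake_hw_counter;
}

static void df_init(struct df_sketch *df)
{
	pthread_mutex_init(&df->lock, NULL);
	df->suspended = true;	/* assumption: this sketch starts powered off */
	df->busy_cycles = 0;
}

/* Called after the device is powered: take a new baseline, clear the flag. */
static void df_resume(struct df_sketch *df)
{
	pthread_mutex_lock(&df->lock);
	df->busy_cycles = read_busy_counter();
	df->suspended = false;
	pthread_mutex_unlock(&df->lock);
}

/* Called before the device is powered down. */
static void df_suspend(struct df_sketch *df)
{
	pthread_mutex_lock(&df->lock);
	df->suspended = true;
	pthread_mutex_unlock(&df->lock);
}

/* Busy cycles since the last sample; 0 (and no counter read) if suspended. */
static uint64_t df_sample(struct df_sketch *df)
{
	uint64_t now, delta;

	pthread_mutex_lock(&df->lock);
	if (df->suspended) {
		pthread_mutex_unlock(&df->lock);
		return 0;
	}
	now = read_busy_counter();
	delta = now - df->busy_cycles;
	df->busy_cycles = now;
	pthread_mutex_unlock(&df->lock);

	return delta;
}

/* Walks through one suspend/resume cycle; call it from a main(). */
static void df_demo(void)
{
	struct df_sketch df;

	df_init(&df);
	fake_hw_counter = 100;
	assert(df_sample(&df) == 0);	/* suspended: counter not read */

	df_resume(&df);			/* baseline becomes 100 */
	fake_hw_counter = 250;
	assert(df_sample(&df) == 150);

	df_suspend(&df);
	fake_hw_counter = 400;
	assert(df_sample(&df) == 0);	/* suspended again */
}
```

Because the flag is only read and written under the lock, the callbacks
reached from the sampling path have no need to take runtime PM references
themselves, which is the property the nit above asks to have documented.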