Received: by 2002:a05:6520:2586:b029:fa:41f3:c225 with SMTP id u6csp22774lky; Wed, 9 Jun 2021 14:45:56 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx8EbhF9dTzY5ho8GjjntIe/Dr6nvVuXi+fq5eutv/JpB//YQ5vzcwMdSgNAtBEtEBE5cpl X-Received: by 2002:a17:906:5488:: with SMTP id r8mr1687375ejo.374.1623275156475; Wed, 09 Jun 2021 14:45:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623275156; cv=none; d=google.com; s=arc-20160816; b=Lnwcq5kRuKkWoakfDBY8WIlfsrbfjrEnR7uppmTMq7KJsi8MfagVLmowypzJWwoxB4 NlcT7kKfmIJZKTWFRxsi7Wq4OiKW2jD+r42NPpUJCnJiID6LQqQJfC8PGcgbsvFA3Kjn pR0uq3ZmrfNT02KRFBODB0+aNB07EOBkWSHKrmy2tDqMlIYriHxby60yPh2VvNzTf0Aw 5n6762Ya5LdjnuFfNQGZsNsaifv5Go9GCi5AK0zQJ5fZ5Es9stvHG7Ph+crpXetK4H91 Vu5UgsboXh8NUVqWm7mqQ6wU8TvOpNESBVu7uwy8zcr6SDTBuqMbvM/TKnSQJxs3hsX9 poKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=de4woTGItgJ6l8fxDhQxfYqmRWgiFJZJ8CF4CJ5w94k=; b=wfKfhxewcQ2rkAK4La0DKmGmGaWGRJ0+JvBv+n7NZLMs00C+7iJHYRjYTCKX9STLVJ cYIlH56JDSNCl5296lFLhlrvEbJ8G9yWeROlAK980pGqwtc1bi4whPAEBrOqoeUgz7Gr oyVHl4D+DnLOKev9QVZKUahzTbeiqccwhTrdkBWIiRFW87ggUYZR6fYKMVTXHHS2DbYN mHY3YgWelBFrr96mxMKvr4PK51gUCIVQcxAynhBUZft9QeEcbbwhRihnSrmZUZ7wS4Z6 c3q5mrtcb4BEDlHuy5f64ygGBST23lAdSSUXhS6vWJuD30Ttv9vJEJI2XpqrmSK7C7PZ ZcsQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=CYQOUqp4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id m9si767895ejj.467.2021.06.09.14.45.32; Wed, 09 Jun 2021 14:45:56 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=CYQOUqp4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229685AbhFIVpU (ORCPT + 99 others); Wed, 9 Jun 2021 17:45:20 -0400 Received: from mail-wm1-f50.google.com ([209.85.128.50]:33599 "EHLO mail-wm1-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229517AbhFIVpT (ORCPT ); Wed, 9 Jun 2021 17:45:19 -0400 Received: by mail-wm1-f50.google.com with SMTP id s70-20020a1ca9490000b02901a589651424so3832737wme.0; Wed, 09 Jun 2021 14:43:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=de4woTGItgJ6l8fxDhQxfYqmRWgiFJZJ8CF4CJ5w94k=; b=CYQOUqp4fw985r+ki5qABT6gg+FP4T6wg7BuFBudnLu9MnZM7/R3Zt4yRW3+yS7pJR 4YzlnScDSx2Vh5V8Vh6Cbtc3bLFi6qa5MuclaZnV9/awuaZbe/4MngyEqRZsCnqZafv7 cx+9UvUlv9FWU265l/6PEuL6Iac6j9JJ/tN0FL5o/CVN6JwbSoIyKOA+1x18mw0pND8m r2GHC5w+1GM327NU+zV2YGna7+G/NO5DFCcDTab468YTQ3bC4PnOEP0jhKVfF4fzufs2 2yldUcOzrY1EVjuLUGfzW8RvKTd9lwzgloCOZG7lFxAGqjfr2Px1G7y54jX2dZLYR93b TR/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=de4woTGItgJ6l8fxDhQxfYqmRWgiFJZJ8CF4CJ5w94k=; b=SAM5wOM0+pLz3I8+ZHRB3URzphBQ8sIz2bd/otd0ch7TRDu64pNBdrOKQHn1Rp71K6 W8WbN27vIZId3CnFnc0xrr/jpvCORiwg3trbLVnW1sv0w5GuLVn6COq2bdyL6hdW7Uy1 Qf+Y4NbHQTtGolZXiQIWZxHOlokVO0+w0yZB4SY71DMrcn1w+A5XFvc62UDcst4FFP3N z6CtYb1dyLSIZndAIjC/si5nBoUaiVJbUtt3tB4hUsyvU6bXcgo0nASYLeKpRUI0yrjc 0V5cgaf1tKAVHtGLb6gqJpqWsoWH8V2UYUyZl43cfXudlUB0XFBFHNimvH7SY9l4F54O yaaQ== X-Gm-Message-State: AOAM5325tbG0B5ZYqcX3DySJSietOLU9+ZlvPnh1eekwFFulWokNnT3U LhS45QVLNMsml6KFr6ixtH+USEwnbZo6xslTISU= X-Received: by 2002:a05:600c:1ca6:: with SMTP id k38mr12305339wms.49.1623274943425; Wed, 09 Jun 2021 14:42:23 -0700 (PDT) MIME-Version: 1.0 References: <20210601224750.513996-1-robdclark@gmail.com> <20210601224750.513996-7-robdclark@gmail.com> <20210608151247.u3uxznfoek7trxiv@cosmicpenguin.net> In-Reply-To: <20210608151247.u3uxznfoek7trxiv@cosmicpenguin.net> From: Rob Clark Date: Wed, 9 Jun 2021 14:46:17 -0700 Message-ID: Subject: Re: [PATCH v4 5/6] drm/msm: Add crashdump support for stalled SMMU To: Rob Clark , dri-devel , Rob Clark , Sean Paul , David Airlie , Daniel Vetter , Iskren Chernev , Akhil P Oommen , AngeloGioacchino Del Regno , Konrad Dybcio , "Kristian H. Kristensen" , Marijn Suijten , Sai Prakash Ranjan , Sharat Masetty , Jonathan Marek , Zhenzhong Duan , Lee Jones , "open list:DRM DRIVER FOR MSM ADRENO GPU" , "open list:DRM DRIVER FOR MSM ADRENO GPU" , open list Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jun 8, 2021 at 8:12 AM Jordan Crouse wrote: > > On Tue, Jun 01, 2021 at 03:47:24PM -0700, Rob Clark wrote: > > From: Rob Clark > > > > For collecting devcoredumps with the SMMU stalled after an iova fault, > > we need to skip the parts of the GPU state which are normally collected > > with the hw crashdumper, since with the SMMU stalled the hw would be > > unable to write out the requested state to memory. > > On a5xx and a6xx you can query RBBM_STATUS3 bit 24 to see if the IOMMU is > stalled. That could be an alternative option to adding the "stalled" > infrastructure across all targets. Hmm, I suppose it is really only a5xx/a6xx that needs to do something differently in this case, because of crashdumper, so maybe this would be a reasonable approach BR, -R > Jordan > > > > Signed-off-by: Rob Clark > > --- > > drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 2 +- > > drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 2 +- > > drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 2 +- > > drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 5 ++- > > drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 2 +- > > drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 43 ++++++++++++++++----- > > drivers/gpu/drm/msm/msm_debugfs.c | 2 +- > > drivers/gpu/drm/msm/msm_gpu.c | 7 ++-- > > drivers/gpu/drm/msm/msm_gpu.h | 2 +- > > 9 files changed, 47 insertions(+), 20 deletions(-) > > > > diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c > > index bdc989183c64..d2c31fae64fd 100644 > > --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c > > +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c > > @@ -434,7 +434,7 @@ static void a2xx_dump(struct msm_gpu *gpu) > > adreno_dump(gpu); > > } > > > > -static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu) > > +static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu, bool stalled) > > { > > struct msm_gpu_state *state = kzalloc(sizeof(*state), GFP_KERNEL); > > > > diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c > > index 4534633fe7cd..b1a6f87d74ef 100644 > > --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c > > +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c > > @@ -464,7 +464,7 @@ static void a3xx_dump(struct msm_gpu *gpu) > > adreno_dump(gpu); > > } > > > > -static struct msm_gpu_state *a3xx_gpu_state_get(struct msm_gpu *gpu) > > +static struct msm_gpu_state *a3xx_gpu_state_get(struct msm_gpu *gpu, bool stalled) > > { > > struct msm_gpu_state *state = kzalloc(sizeof(*state), GFP_KERNEL); > > > > diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c > > index 82bebb40234d..22780a594d6f 100644 > > --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c > > +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c > > @@ -549,7 +549,7 @@ static const unsigned int a405_registers[] = { > > ~0 /* sentinel */ > > }; > > > > -static struct msm_gpu_state *a4xx_gpu_state_get(struct msm_gpu *gpu) > > +static struct msm_gpu_state *a4xx_gpu_state_get(struct msm_gpu *gpu, bool stalled) > > { > > struct msm_gpu_state *state = kzalloc(sizeof(*state), GFP_KERNEL); > > > > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c > > index a0eef5d9b89b..2e7714b1a17f 100644 > > --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c > > +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c > > @@ -1519,7 +1519,7 @@ static void a5xx_gpu_state_get_hlsq_regs(struct msm_gpu *gpu, > > msm_gem_kernel_put(dumper.bo, gpu->aspace, true); > > } > > > > -static struct msm_gpu_state *a5xx_gpu_state_get(struct msm_gpu *gpu) > > +static struct msm_gpu_state *a5xx_gpu_state_get(struct msm_gpu *gpu, bool stalled) > > { > > struct a5xx_gpu_state *a5xx_state = kzalloc(sizeof(*a5xx_state), > > GFP_KERNEL); > > @@ -1536,7 +1536,8 @@ static struct msm_gpu_state *a5xx_gpu_state_get(struct msm_gpu *gpu) > > a5xx_state->base.rbbm_status = gpu_read(gpu, REG_A5XX_RBBM_STATUS); > > > > /* Get the HLSQ regs with the help of the crashdumper */ > > - a5xx_gpu_state_get_hlsq_regs(gpu, a5xx_state); > > + if (!stalled) > > + a5xx_gpu_state_get_hlsq_regs(gpu, a5xx_state); > > > > a5xx_set_hwcg(gpu, true); > > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h > > index ce0610c5256f..e0f06ce4e1a9 100644 > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h > > @@ -86,7 +86,7 @@ unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu); > > void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state, > > struct drm_printer *p); > > > > -struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu); > > +struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu, bool stalled); > > int a6xx_gpu_state_put(struct msm_gpu_state *state); > > > > #endif /* __A6XX_GPU_H__ */ > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c > > index c1699b4f9a89..d0af68a76c4f 100644 > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c > > @@ -833,6 +833,21 @@ static void a6xx_get_registers(struct msm_gpu *gpu, > > a6xx_state, &a6xx_vbif_reglist, > > &a6xx_state->registers[index++]); > > > > + if (!dumper) { > > + /* > > + * We can't use the crashdumper when the SMMU is stalled, > > + * because the GPU has no memory access until we resume > > + * translation (but we don't want to do that until after > > + * we have captured as much useful GPU state as possible). > > + * So instead collect registers via the CPU: > > + */ > > + for (i = 0; i < ARRAY_SIZE(a6xx_reglist); i++) > > + a6xx_get_ahb_gpu_registers(gpu, > > + a6xx_state, &a6xx_reglist[i], > > + &a6xx_state->registers[index++]); > > + return; > > + } > > + > > for (i = 0; i < ARRAY_SIZE(a6xx_reglist); i++) > > a6xx_get_crashdumper_registers(gpu, > > a6xx_state, &a6xx_reglist[i], > > @@ -903,9 +918,9 @@ static void a6xx_get_indexed_registers(struct msm_gpu *gpu, > > a6xx_state->nr_indexed_regs = count; > > } > > > > -struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu) > > +struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu, bool stalled) > > { > > - struct a6xx_crashdumper dumper = { 0 }; > > + struct a6xx_crashdumper _dumper = { 0 }, *dumper = NULL; > > struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); > > struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); > > struct a6xx_gpu_state *a6xx_state = kzalloc(sizeof(*a6xx_state), > > @@ -928,14 +943,24 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu) > > /* Get the banks of indexed registers */ > > a6xx_get_indexed_registers(gpu, a6xx_state); > > > > - /* Try to initialize the crashdumper */ > > - if (!a6xx_crashdumper_init(gpu, &dumper)) { > > - a6xx_get_registers(gpu, a6xx_state, &dumper); > > - a6xx_get_shaders(gpu, a6xx_state, &dumper); > > - a6xx_get_clusters(gpu, a6xx_state, &dumper); > > - a6xx_get_dbgahb_clusters(gpu, a6xx_state, &dumper); > > + /* > > + * Try to initialize the crashdumper, if we are not dumping state > > + * with the SMMU stalled. The crashdumper needs memory access to > > + * write out GPU state, so we need to skip this when the SMMU is > > + * stalled in response to an iova fault > > + */ > > + if (!stalled && !a6xx_crashdumper_init(gpu, &_dumper)) { > > + dumper = &_dumper; > > + } > > + > > + a6xx_get_registers(gpu, a6xx_state, dumper); > > + > > + if (dumper) { > > + a6xx_get_shaders(gpu, a6xx_state, dumper); > > + a6xx_get_clusters(gpu, a6xx_state, dumper); > > + a6xx_get_dbgahb_clusters(gpu, a6xx_state, dumper); > > > > - msm_gem_kernel_put(dumper.bo, gpu->aspace, true); > > + msm_gem_kernel_put(dumper->bo, gpu->aspace, true); > > } > > > > if (snapshot_debugbus) > > diff --git a/drivers/gpu/drm/msm/msm_debugfs.c b/drivers/gpu/drm/msm/msm_debugfs.c > > index 7a2b53d35e6b..90558e826934 100644 > > --- a/drivers/gpu/drm/msm/msm_debugfs.c > > +++ b/drivers/gpu/drm/msm/msm_debugfs.c > > @@ -77,7 +77,7 @@ static int msm_gpu_open(struct inode *inode, struct file *file) > > goto free_priv; > > > > pm_runtime_get_sync(&gpu->pdev->dev); > > - show_priv->state = gpu->funcs->gpu_state_get(gpu); > > + show_priv->state = gpu->funcs->gpu_state_get(gpu, false); > > pm_runtime_put_sync(&gpu->pdev->dev); > > > > mutex_unlock(&dev->struct_mutex); > > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c > > index fa7691cb4614..4d280bf446e6 100644 > > --- a/drivers/gpu/drm/msm/msm_gpu.c > > +++ b/drivers/gpu/drm/msm/msm_gpu.c > > @@ -381,7 +381,8 @@ static void msm_gpu_crashstate_get_bo(struct msm_gpu_state *state, > > } > > > > static void msm_gpu_crashstate_capture(struct msm_gpu *gpu, > > - struct msm_gem_submit *submit, char *comm, char *cmd) > > + struct msm_gem_submit *submit, char *comm, char *cmd, > > + bool stalled) > > { > > struct msm_gpu_state *state; > > > > @@ -393,7 +394,7 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu, > > if (gpu->crashstate) > > return; > > > > - state = gpu->funcs->gpu_state_get(gpu); > > + state = gpu->funcs->gpu_state_get(gpu, stalled); > > if (IS_ERR_OR_NULL(state)) > > return; > > > > @@ -519,7 +520,7 @@ static void recover_worker(struct kthread_work *work) > > > > /* Record the crash state */ > > pm_runtime_get_sync(&gpu->pdev->dev); > > - msm_gpu_crashstate_capture(gpu, submit, comm, cmd); > > + msm_gpu_crashstate_capture(gpu, submit, comm, cmd, false); > > pm_runtime_put_sync(&gpu->pdev->dev); > > > > kfree(cmd); > > diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h > > index 7a082a12d98f..c15e5fd675d2 100644 > > --- a/drivers/gpu/drm/msm/msm_gpu.h > > +++ b/drivers/gpu/drm/msm/msm_gpu.h > > @@ -60,7 +60,7 @@ struct msm_gpu_funcs { > > void (*debugfs_init)(struct msm_gpu *gpu, struct drm_minor *minor); > > #endif > > unsigned long (*gpu_busy)(struct msm_gpu *gpu); > > - struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu); > > + struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu, bool stalled); > > int (*gpu_state_put)(struct msm_gpu_state *state); > > unsigned long (*gpu_get_freq)(struct msm_gpu *gpu); > > void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp); > > -- > > 2.31.1 > >