Message-ID: <10fe31b114732ff47bb072dd1e3c6e3928654310.camel@pengutronix.de>
Subject: Re: [PATCH] drm/etnaviv: print offender task information on hangcheck recovery
From: Lucas Stach
To: Christian Gmeiner, linux-kernel@vger.kernel.org
Cc: David Airlie, Daniel Vetter, "moderated list:DRM DRIVERS FOR VIVANTE GPU IP", "open list:DRM DRIVERS FOR VIVANTE GPU IP", Russell King
Date: Fri, 19 Aug 2022 18:32:46 +0200
In-Reply-To:
References: <20220603123706.678320-1-christian.gmeiner@gmail.com>

On Wednesday, 22.06.2022 at 10:52 +0200, Lucas Stach wrote:
> Hi Christian,
>
> On Friday, 03.06.2022 at 14:37 +0200, Christian Gmeiner wrote:
> > Track the pid per submit, so we can print the name and cmdline of
> > the task which submitted the batch that caused the gpu to hang.
> >
> I really like the idea. I think the pid handling could be integrated
> into the scheduler, so we don't have to carry it on each submit, but
> not requesting any changes right now. I'm leaning toward taking this
> patch as-is and doing the scheduler integration as a second step.
>
Applied to etnaviv/next.

Regards,
Lucas

>
> > Signed-off-by: Christian Gmeiner
> > ---
> >  drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  1 +
> >  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  6 ++++++
> >  drivers/gpu/drm/etnaviv/etnaviv_gpu.c        | 18 +++++++++++++++++-
> >  drivers/gpu/drm/etnaviv/etnaviv_gpu.h        |  2 +-
> >  drivers/gpu/drm/etnaviv/etnaviv_sched.c      |  2 +-
> >  5 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > index 63688e6e4580..baa81cbf701a 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > @@ -96,6 +96,7 @@ struct etnaviv_gem_submit {
> >  	int out_fence_id;
> >  	struct list_head node; /* GPU active submit list */
> >  	struct etnaviv_cmdbuf cmdbuf;
> > +	struct pid *pid;       /* submitting process */
> >  	bool runtime_resumed;
> >  	u32 exec_state;
> >  	u32 flags;
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > index 1ac916b24891..1491159d0d20 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > @@ -399,6 +399,9 @@ static void submit_cleanup(struct kref *kref)
> >  		mutex_unlock(&submit->gpu->fence_lock);
> >  		dma_fence_put(submit->out_fence);
> >  	}
> > +
> > +	put_pid(submit->pid);
> > +
> >  	kfree(submit->pmrs);
> >  	kfree(submit);
> >  }
> > @@ -422,6 +425,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> >  	struct sync_file *sync_file = NULL;
> >  	struct ww_acquire_ctx ticket;
> >  	int out_fence_fd = -1;
> > +	struct pid *pid = get_pid(task_pid(current));
> >  	void *stream;
> >  	int ret;
> >
> > @@ -519,6 +523,8 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> >  		goto err_submit_ww_acquire;
> >  	}
> >
> > +	submit->pid = pid;
> > +
> >  	ret = etnaviv_cmdbuf_init(priv->cmdbuf_suballoc, &submit->cmdbuf,
> >  				  ALIGN(args->stream_size, 8) + 8);
> >  	if (ret)
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > index 37018bc55810..7d9bf4673e2d 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > @@ -1045,12 +1045,28 @@ int etnaviv_gpu_debugfs(struct etnaviv_gpu *gpu, struct seq_file *m)
> >  }
> >  #endif
> >
> > -void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
> > +void etnaviv_gpu_recover_hang(struct etnaviv_gem_submit *submit)
> >  {
> > +	struct etnaviv_gpu *gpu = submit->gpu;
> > +	char *comm = NULL, *cmd = NULL;
> > +	struct task_struct *task;
> >  	unsigned int i;
> >
> >  	dev_err(gpu->dev, "recover hung GPU!\n");
> >
> > +	task = get_pid_task(submit->pid, PIDTYPE_PID);
> > +	if (task) {
> > +		comm = kstrdup(task->comm, GFP_KERNEL);
> > +		cmd = kstrdup_quotable_cmdline(task, GFP_KERNEL);
> > +		put_task_struct(task);
> > +	}
> > +
> > +	if (comm && cmd)
> > +		dev_err(gpu->dev, "offending task: %s (%s)\n", comm, cmd);
> > +
> > +	kfree(cmd);
> > +	kfree(comm);
> > +
> >  	if (pm_runtime_get_sync(gpu->dev) < 0)
> >  		goto pm_put;
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
> > index 85eddd492774..b3a0941d56fd 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
> > @@ -168,7 +168,7 @@ bool etnaviv_fill_identity_from_hwdb(struct etnaviv_gpu *gpu);
> >  int etnaviv_gpu_debugfs(struct etnaviv_gpu *gpu, struct seq_file *m);
> >  #endif
> >
> > -void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu);
> > +void etnaviv_gpu_recover_hang(struct etnaviv_gem_submit *submit);
> >  void etnaviv_gpu_retire(struct etnaviv_gpu *gpu);
> >  int etnaviv_gpu_wait_fence_interruptible(struct etnaviv_gpu *gpu,
> >  	u32 fence, struct drm_etnaviv_timespec *timeout);
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index 72e2553fbc98..d29f467eee13 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -67,7 +67,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
> >
> >  	/* get the GPU back into the init state */
> >  	etnaviv_core_dump(submit);
> > -	etnaviv_gpu_recover_hang(gpu);
> > +	etnaviv_gpu_recover_hang(submit);
> >
> >  	drm_sched_resubmit_jobs(&gpu->sched);
> >
> >
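
The lifetime rule the quoted patch relies on (take a reference to the submitter when the submit is created, drop it in submit_cleanup(), and only resolve it to a name at hang time) can be illustrated with a rough userspace sketch. This is not kernel code and not part of the patch: struct owner, owner_get(), owner_put() and describe_offender() are hypothetical stand-ins for struct pid, get_pid(), put_pid() and the dev_err() reporting, and the counter is a plain int rather than an atomic refcount.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's refcounted struct pid. */
struct owner {
	int refs;        /* plain counter; the kernel uses an atomic refcount */
	char comm[16];   /* task name, analogous to task->comm */
};

static struct owner *owner_create(const char *comm)
{
	struct owner *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->refs = 1;
	/* calloc() zeroed the buffer, so the copy stays NUL-terminated */
	strncpy(o->comm, comm, sizeof(o->comm) - 1);
	return o;
}

/* Stand-in for get_pid(): take an extra reference. */
static struct owner *owner_get(struct owner *o)
{
	if (o)
		o->refs++;
	return o;
}

/* Stand-in for put_pid(): drop a reference, free on the last one.
 * Returns the number of references left. */
static int owner_put(struct owner *o)
{
	int left;

	if (!o)
		return 0;
	assert(o->refs > 0);
	left = --o->refs;
	if (left == 0)
		free(o);
	return left;
}

/* Hypothetical stand-in for struct etnaviv_gem_submit. */
struct submit {
	struct owner *owner;   /* submitting process, pinned for the submit's lifetime */
};

static struct submit *submit_create(struct owner *current_task)
{
	struct submit *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->owner = owner_get(current_task);
	return s;
}

/* Formats the message that hang recovery would log. */
static int describe_offender(const struct submit *s, char *buf, size_t len)
{
	if (!s->owner)
		return -1;
	snprintf(buf, len, "offending task: %s", s->owner->comm);
	return 0;
}

/* Mirrors submit_cleanup(): the submit drops its reference when it dies. */
static void submit_cleanup(struct submit *s)
{
	owner_put(s->owner);
	free(s);
}
```

The point of the pattern is that the submit can outlive the ioctl call that created it, so it has to hold its own reference instead of borrowing the caller's.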