Message-ID: <1530788347.15725.2.camel@pengutronix.de>
Subject: Re: [PATCH 1/4] drm/v3d: Delay the scheduler timeout if we're still making progress.
From: Lucas Stach
To: Eric Anholt, dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Date: Thu, 05 Jul 2018 12:59:07 +0200
In-Reply-To: <20180703170515.6298-1-eric@anholt.net>
References: <20180703170515.6298-1-eric@anholt.net>

On Tuesday, 2018-07-03 at 10:05 -0700, Eric Anholt wrote:
> GTF-GLES2.gtf.GL.acos.acos_float_vert_xvary submits jobs that take 4
> seconds at maximum resolution, but we still want to reset quickly if a
> job is really hung.  Sample the CL's current address and the return
> address (since we call into tile lists repeatedly) and if either has
> changed then assume we've made progress.

So this means you are doubling your timeout? AFAICS the first time you
hit the timeout handler, the cached ctca and ctra values will probably
always differ from the current values. Maybe this warrants a mention in
the commit message, as it changes the behavior of the scheduler timeout.

Also, how easy is it for userspace to construct such an infinite loop in
the CL? Thinking about a rogue client DoSing the GPU while exploiting
this check in the timeout handler to stay under the radar...
Regards,
Lucas

> Signed-off-by: Eric Anholt
>
> Cc: Lucas Stach
> ---
>  drivers/gpu/drm/v3d/v3d_drv.h   |  2 ++
>  drivers/gpu/drm/v3d/v3d_regs.h  |  1 +
>  drivers/gpu/drm/v3d/v3d_sched.c | 18 ++++++++++++++++++
>  3 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index f546e0ab9562..a5d96d823416 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -189,6 +189,8 @@ struct v3d_job {
>
>  	/* GPU virtual addresses of the start/end of the CL job. */
>  	u32 start, end;
> +
> +	u32 timedout_ctca, timedout_ctra;
>  };
>
>  struct v3d_exec_info {
> diff --git a/drivers/gpu/drm/v3d/v3d_regs.h b/drivers/gpu/drm/v3d/v3d_regs.h
> index fc13282dfc2f..854046565989 100644
> --- a/drivers/gpu/drm/v3d/v3d_regs.h
> +++ b/drivers/gpu/drm/v3d/v3d_regs.h
> @@ -222,6 +222,7 @@
>  #define V3D_CLE_CTNCA(n) (V3D_CLE_CT0CA + 4 * n)
>  #define V3D_CLE_CT0RA                                  0x00118
>  #define V3D_CLE_CT1RA                                  0x0011c
> +#define V3D_CLE_CTNRA(n) (V3D_CLE_CT0RA + 4 * n)
>  #define V3D_CLE_CT0LC                                  0x00120
>  #define V3D_CLE_CT1LC                                  0x00124
>  #define V3D_CLE_CT0PC                                  0x00128
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 808bc901f567..00667c733dca 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -153,7 +153,25 @@ v3d_job_timedout(struct drm_sched_job *sched_job)
>  	struct v3d_job *job = to_v3d_job(sched_job);
>  	struct v3d_exec_info *exec = job->exec;
>  	struct v3d_dev *v3d = exec->v3d;
> +	enum v3d_queue job_q = job == &exec->bin ? V3D_BIN : V3D_RENDER;
>  	enum v3d_queue q;
> +	u32 ctca = V3D_CORE_READ(0, V3D_CLE_CTNCA(job_q));
> +	u32 ctra = V3D_CORE_READ(0, V3D_CLE_CTNRA(job_q));
> +
> +	/* If the current address or return address have changed, then
> +	 * the GPU has probably made progress and we should delay the
> +	 * reset.  This could fail if the GPU got in an infinite loop
> +	 * in the CL, but that is pretty unlikely outside of an i-g-t
> +	 * testcase.
> +	 */
> +	if (job->timedout_ctca != ctca || job->timedout_ctra != ctra) {
> +		job->timedout_ctca = ctca;
> +		job->timedout_ctra = ctra;
> +
> +		schedule_delayed_work(&job->base.work_tdr,
> +				      job->base.sched->timeout);
> +		return;
> +	}
>
>  	mutex_lock(&v3d->reset_lock);