Received: by 2002:a05:7412:e794:b0:fa:551:50a7 with SMTP id o20csp1675793rdd; Thu, 11 Jan 2024 06:19:25 -0800 (PST) X-Google-Smtp-Source: AGHT+IGhV0SJ3avEjK6LSq0GAcAxtFL8I6Px0I4Ps3sk0awB+5WF695O0S+a3uj9C/Dqz7MCIRFD X-Received: by 2002:a05:620a:8320:b0:783:3310:ecef with SMTP id pa32-20020a05620a832000b007833310ecefmr1453637qkn.64.1704982765226; Thu, 11 Jan 2024 06:19:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704982765; cv=none; d=google.com; s=arc-20160816; b=MtZfzMTuc1oo1+ZDae4HW5M5OT1r3zrnOFL+F/NO43odNlSsmviWZCCGsRuBotM6Wy eJpChXB+35hyRnB9+cWCv3gxdpLUCGbw0PEuawVFvwAa0wj03XgpgvkE9n7iOUQvVcdc FgS+htubN1xyTuJIAc4kedzj5q8zpV0ttM9lzf3P7EggktLm4kVUTLJJLQeR53+M+ri5 THNNSTvd04/gyM1l6+muxG/2EhXk82+Q7DotEkC7HQoaRz0e8OfQyGwp2LvOyLqT/C7i zYfKt9DxnNY4uzzlapGg1DuLsnfM0lmuIYX5emU1p2OFaaQGmzu0ONROuBqMsWzhdNdo E3hQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :references:mail-followup-to:message-id:subject:cc:to:from:date :dkim-signature; bh=xSSxh+RPIUMZKMpTsr13zKkvmGosyi4C4sKFagTsfmM=; fh=2JlMUvki4ZSO22afnchNUVu/YcwTSFFqtyp7r69J8g8=; b=rhfUd3vyByZF5b5absxVDJ4lu3nk8pCq73IULbiMUxl8V+tgoGAPzVSvEZGCUk32LD V6zX9wqnwKu5fN/AB4YyNB5AkLdn4iGXw0MI7u3W+TbexoOSBYr+n1QVDBIXbb/yoTZL M8CRKnB38Au7zC2aNjqtNhpYMW1iBjc/7L2eERxxMb5te1HWQYbtob9K9Lxw49t6/Ti7 wyzn1c5t8mJR/DMZEo/53FavLfOtM9sU+OfljoM5qRZ/Qpj5wALE4EXxeTcahGCaA3v3 x2y+oIbA5QVyS3d/g0U6h5SiT338zstESlEG1rpBYKkthfYEoTURfVfnGdXLeG+ItC6F qJ1g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=PBCkNEvl; spf=pass (google.com: domain of linux-kernel+bounces-23705-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-23705-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id y4-20020a37e304000000b0078334092c60si979852qki.70.2024.01.11.06.19.25 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Jan 2024 06:19:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-23705-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=PBCkNEvl; spf=pass (google.com: domain of linux-kernel+bounces-23705-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-23705-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id E65751C2374E for ; Thu, 11 Jan 2024 14:19:24 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 27B5A3D384; Thu, 11 Jan 2024 14:19:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="PBCkNEvl" Received: from mail-ej1-f43.google.com (mail-ej1-f43.google.com [209.85.218.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A86943C097 for ; Thu, 11 Jan 2024 14:19:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ffwll.ch Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ffwll.ch Received: by mail-ej1-f43.google.com with SMTP id a640c23a62f3a-a2b9e2c9858so61371066b.1 for ; Thu, 11 Jan 2024 06:19:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; t=1704982751; x=1705587551; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:mail-followup-to:message-id:subject:cc:to :from:date:from:to:cc:subject:date:message-id:reply-to; bh=xSSxh+RPIUMZKMpTsr13zKkvmGosyi4C4sKFagTsfmM=; b=PBCkNEvlaoggykFyzmmiFHVFhWFA1u5d7W66bzdVAY1r8vxSruFM1TKGmFY3wlCxMD /+mQbPUBpMtla6UIHzdqPWUfnJTNYfC5kOrsZXGga8U7bLK1m0Iu/XFpj/gezGLyG1gz o0b7NN5h+ttpKpx5D9tGKPUm8/+G1D9S6+TvM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704982751; x=1705587551; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:mail-followup-to:message-id:subject:cc:to :from:date:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=xSSxh+RPIUMZKMpTsr13zKkvmGosyi4C4sKFagTsfmM=; b=ozanygOgO8cVjkadrSBnl4Fk1Gevy0IG/Af5aQl5TdNcja8ri5SMYdHYjrOkc6Drrb qhf2vf39GAA71RSZDR4t9ou0fpdwQ87ghRKjagJCoY9XYFgmn74pOD8owZ2v8PvnnzEr hoeZJmFkoGLXEH8HGfumAuidu4she9NJXROrCcDNoJ722omatFzyRC4Vly9BUulSr3am w6kxiucDzH2OWAeCYLLTGoz3eA34cl1sV+3uXsRQ70xO9Vn5Ff8jT3xrSt2p79bvCyz3 UctSLYHOT0wazzvxDIKblLGjF19F5k45cp28IFzKZnFQTpmHNPGbEgM64jA7VgsRbf6g Q2Wg== X-Gm-Message-State: AOJu0YzXjOEsxJxfM/EVhzmNjx6ohzO+UDTed2cia4CT+yQu/fsgQRAa I/KBWve9pr5s2/Fgf42Rs7FE5ro62R6yvw== X-Received: by 2002:a17:907:96a1:b0:a26:d233:80b0 with SMTP id hd33-20020a17090796a100b00a26d23380b0mr1600615ejc.0.1704982750749; Thu, 11 Jan 2024 06:19:10 -0800 (PST) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id mb3-20020a170906eb0300b00a28956cf75esm629228ejb.130.2024.01.11.06.19.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Jan 2024 06:19:10 -0800 (PST) Date: Thu, 11 Jan 2024 15:19:08 +0100 From: Daniel Vetter To: Rob Clark Cc: dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org, freedreno@lists.freedesktop.org, Rob Clark , Abhinav Kumar , Dmitry Baryshkov , Sean Paul , Marijn Suijten , David Airlie , open list , Daniel Vetter Subject: Re: [PATCH] Revert "drm/msm/gpu: Push gpu lock down past runpm" Message-ID: Mail-Followup-To: Rob Clark , dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org, freedreno@lists.freedesktop.org, Rob Clark , Abhinav Kumar , Dmitry Baryshkov , Sean Paul , Marijn Suijten , David Airlie , open list References: <20240109182218.193804-1-robdclark@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Operating-System: Linux phenom 6.5.0-4-amd64 On Wed, Jan 10, 2024 at 06:54:53AM -0800, Rob Clark wrote: > On Wed, Jan 10, 2024 at 2:50 AM Daniel Vetter wrote: > > > > On Tue, Jan 09, 2024 at 10:22:17AM -0800, Rob Clark wrote: > > > From: Rob Clark > > > > > > This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0. > > > > > > Changing the locking order means that scheduler/msm_job_run() can race > > > with the recovery kthread worker, with the result that the GPU gets an > > > extra runpm get when we are trying to power it off. Leaving the GPU in > > > an unrecovered state. > > > > The recovery kthread is supposed to stop all the relevant schedulers, > > which should remove any possible race conditions. So unless there's more > > going on, or you have your own recovery kthread (don't, reuse the one from > > the scheduler with your own work items, that's why you can provide that) > > this looks like an incomplete/incorrect explanation ... ? > > > > Slightly confused > > msm still uses it's own recovery, which pre-dates the scheduler > conversion. At one point (a yr or two back?) I started looking at > integrating recovery w/ scheduler.. at the time I think you talked me > out of it, but I don't remember the reason hm ... most scheduler discussions I remember was around the "allocate your own workqueue and hand that to scheduler to avoid races/deadlocks". Which iirc Boris implemented a while ago. Once you have that workqueue you can then also process any other error condition on there with the exact same locking design (like hw error or page faults or whatever), not just drm/sched tdr. I don't remember anything else that ever came up at least at a fundamental level ... So if that discussion was older than 78efe21b6f8e ("drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr") you should be covered. Fingers crossed :-) Meanwhile if you do not use drm/sched tdr at all then doing the exact same design but just on your own workqueue should also work. The critical thing is really only: - have one single-thread workqueue for all gpu recover - bracket each handler in there with drm_sched_stop/start for all affected engines No more races! Cheers, Sima > > BR, > -R > > > -Sima > > > > > > > > I'll need to come up with a different scheme for appeasing lockdep. > > > > > > Signed-off-by: Rob Clark > > > --- > > > drivers/gpu/drm/msm/msm_gpu.c | 11 +++++------ > > > drivers/gpu/drm/msm/msm_ringbuffer.c | 7 +++++-- > > > 2 files changed, 10 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c > > > index 095390774f22..655002b21b0d 100644 > > > --- a/drivers/gpu/drm/msm/msm_gpu.c > > > +++ b/drivers/gpu/drm/msm/msm_gpu.c > > > @@ -751,12 +751,14 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit) > > > struct msm_ringbuffer *ring = submit->ring; > > > unsigned long flags; > > > > > > - pm_runtime_get_sync(&gpu->pdev->dev); > > > + WARN_ON(!mutex_is_locked(&gpu->lock)); > > > > > > - mutex_lock(&gpu->lock); > > > + pm_runtime_get_sync(&gpu->pdev->dev); > > > > > > msm_gpu_hw_init(gpu); > > > > > > + submit->seqno = submit->hw_fence->seqno; > > > + > > > update_sw_cntrs(gpu); > > > > > > /* > > > @@ -781,11 +783,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit) > > > gpu->funcs->submit(gpu, submit); > > > gpu->cur_ctx_seqno = submit->queue->ctx->seqno; > > > > > > - hangcheck_timer_reset(gpu); > > > - > > > - mutex_unlock(&gpu->lock); > > > - > > > pm_runtime_put(&gpu->pdev->dev); > > > + hangcheck_timer_reset(gpu); > > > } > > > > > > /* > > > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c > > > index e0ed27739449..548f5266a7d3 100644 > > > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c > > > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c > > > @@ -21,8 +21,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job) > > > > > > msm_fence_init(submit->hw_fence, fctx); > > > > > > - submit->seqno = submit->hw_fence->seqno; > > > - > > > mutex_lock(&priv->lru.lock); > > > > > > for (i = 0; i < submit->nr_bos; i++) { > > > @@ -35,8 +33,13 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job) > > > > > > mutex_unlock(&priv->lru.lock); > > > > > > + /* TODO move submit path over to using a per-ring lock.. */ > > > + mutex_lock(&gpu->lock); > > > + > > > msm_gpu_submit(gpu, submit); > > > > > > + mutex_unlock(&gpu->lock); > > > + > > > return dma_fence_get(submit->hw_fence); > > > } > > > > > > -- > > > 2.43.0 > > > > > > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch