Date: Thu, 17 Mar 2022 07:58:21 -0700
From: Matthew Brost
To: Christian König
Cc: Rob Clark, Andrey Grodzovsky, dri-devel@lists.freedesktop.org, freedreno@lists.freedesktop.org, linux-arm-msm@vger.kernel.org, Rob Clark, Sean Paul, Abhinav Kumar, David Airlie, Akhil P Oommen, Jonathan Marek, AngeloGioacchino Del Regno, Bjorn Andersson, Vladimir Lypak, open list
Subject: Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend
Message-ID: <20220317145821.GA331@jons-linux-dev-box>
References: <20220310234611.424743-1-robdclark@gmail.com> <20220310234611.424743-3-robdclark@gmail.com> <3945551d-47d2-1974-f637-1dbc61e14702@amd.com>
In-Reply-To: <3945551d-47d2-1974-f637-1dbc61e14702@amd.com>

On Thu, Mar 17, 2022 at 03:06:18AM -0700, Christian König wrote:
> Am 17.03.22 um 10:59 schrieb Daniel Vetter:
> > On Thu, Mar 10, 2022 at 03:46:05PM -0800, Rob Clark wrote:
> >> From: Rob Clark
> >>
> >> In the system suspend path, we don't want to be racing with the
> >> scheduler kthreads pushing additional queued-up jobs to the hw
> >> queue (ringbuffer), so park them first. While we are at it,
> >> move the wait for active jobs to complete into the new
> >> system-suspend path.
> >>
> >> Signed-off-by: Rob Clark
> >> ---
> >>  drivers/gpu/drm/msm/adreno/adreno_device.c | 68 ++++++++++++++++++++--
> >>  1 file changed, 64 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> >> index 8859834b51b8..0440a98988fc 100644
> >> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> >> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> >> @@ -619,22 +619,82 @@ static int active_submits(struct msm_gpu *gpu)
> >>  static int adreno_runtime_suspend(struct device *dev)
> >>  {
> >>  	struct msm_gpu *gpu = dev_to_gpu(dev);
> >> -	int remaining;
> >> +
> >> +	/*
> >> +	 * We should be holding a runpm ref, which will prevent
> >> +	 * runtime suspend. In the system suspend path, we've
> >> +	 * already waited for active jobs to complete.
> >> +	 */
> >> +	WARN_ON_ONCE(gpu->active_submits);
> >> +
> >> +	return gpu->funcs->pm_suspend(gpu);
> >> +}
> >> +
> >> +static void suspend_scheduler(struct msm_gpu *gpu)
> >> +{
> >> +	int i;
> >> +
> >> +	/*
> >> +	 * Shut down the scheduler before we force suspend, so that
> >> +	 * suspend isn't racing with the scheduler kthread feeding us
> >> +	 * more work.
> >> +	 *
> >> +	 * Note, we just want to park the thread, and let any jobs
> >> +	 * that are already on the hw queue complete normally, as
> >> +	 * opposed to the drm_sched_stop() path used for handling
> >> +	 * faulting/timed-out jobs. We can't really cancel any jobs
> >> +	 * already on the hw queue without racing with the GPU.
> >> +	 */
> >> +	for (i = 0; i < gpu->nr_rings; i++) {
> >> +		struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
> >> +		kthread_park(sched->thread);
> >
> > Shouldn't we have some proper interfaces for this?
>
> If I'm not completely mistaken, we already should have one, yes.
>
> > Also I'm kinda wondering how other drivers do this; feels like we
> > should have a standard way.
> >
> > Finally, not flushing out all in-flight requests sounds a bit like a
> > bad idea for system suspend/resume, since that's also the hibernation
> > path, and that would mean your shrinker/page reclaim stops working.
> > At least in full generality. Which isn't good for hibernation.
>
> Completely agree, that looks like an incorrect workaround to me.
>
> During suspend all userspace applications should be frozen, and all of
> their hardware activity flushed out and waited on for completion.
>

Isn't that what Rob is doing? He parks the scheduler threads, preventing
any new jobs from being submitted, then waits for the outstanding jobs
to complete naturally (see the wait_event_timeout() below). If the jobs
don't complete in time, the suspend seems to be aborted?
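
Roughly, compressing the patch's suspend_scheduler() and
adreno_system_suspend() into a single sketch as I read it (not the
literal patch; the error handling is simplified, and active_submits()
and gpu->retire_event come from the existing msm code):

static int system_suspend_flow(struct device *dev)
{
	struct msm_gpu *gpu = dev_to_gpu(dev);
	int i, remaining;

	/* 1) Park each ring's scheduler kthread so nothing new gets
	 *    pushed to the ringbuffer while we suspend. */
	for (i = 0; i < gpu->nr_rings; i++)
		kthread_park(gpu->rb[i]->sched.thread);

	/* 2) Let jobs already on the hw queue retire naturally. */
	remaining = wait_event_timeout(gpu->retire_event,
				       active_submits(gpu) == 0,
				       msecs_to_jiffies(1000));

	/* 3) On timeout, unpark the schedulers and abort the suspend. */
	if (remaining == 0) {
		for (i = 0; i < gpu->nr_rings; i++)
			kthread_unpark(gpu->rb[i]->sched.thread);
		return -EBUSY;
	}

	/* 4) Nothing is in flight now; force runtime suspend. */
	return pm_runtime_force_suspend(dev);
}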
That flow makes sense to me and seems like a novel way to avoid races.

Matt

> I do remember that our internal guys came up with pretty much the same
> idea, and it sounded broken to me back then as well.
>
> Regards,
> Christian.
>
> >
> > Adding Christian and Andrey.
> > -Daniel
> >
> >> +	}
> >> +}
> >> +
> >> +static void resume_scheduler(struct msm_gpu *gpu)
> >> +{
> >> +	int i;
> >> +
> >> +	for (i = 0; i < gpu->nr_rings; i++) {
> >> +		struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
> >> +		kthread_unpark(sched->thread);
> >> +	}
> >> +}
> >> +
> >> +static int adreno_system_suspend(struct device *dev)
> >> +{
> >> +	struct msm_gpu *gpu = dev_to_gpu(dev);
> >> +	int remaining, ret;
> >> +
> >> +	suspend_scheduler(gpu);
> >>
> >>  	remaining = wait_event_timeout(gpu->retire_event,
> >>  				       active_submits(gpu) == 0,
> >>  				       msecs_to_jiffies(1000));
> >>  	if (remaining == 0) {
> >>  		dev_err(dev, "Timeout waiting for GPU to suspend\n");
> >> -		return -EBUSY;
> >> +		ret = -EBUSY;
> >> +		goto out;
> >>  	}
> >>
> >> -	return gpu->funcs->pm_suspend(gpu);
> >> +	ret = pm_runtime_force_suspend(dev);
> >> +out:
> >> +	if (ret)
> >> +		resume_scheduler(gpu);
> >> +
> >> +	return ret;
> >>  }
> >> +
> >> +static int adreno_system_resume(struct device *dev)
> >> +{
> >> +	resume_scheduler(dev_to_gpu(dev));
> >> +	return pm_runtime_force_resume(dev);
> >> +}
> >> +
> >>  #endif
> >>
> >>  static const struct dev_pm_ops adreno_pm_ops = {
> >> -	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> >> +	SET_SYSTEM_SLEEP_PM_OPS(adreno_system_suspend, adreno_system_resume)
> >>  	SET_RUNTIME_PM_OPS(adreno_runtime_suspend, adreno_runtime_resume, NULL)
> >>  };
> >>
> >> --
> >> 2.35.1
> >>
>
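
PS: On Daniel's question about proper interfaces, I'd imagine something
like the below in drm_sched. To be clear, these are hypothetical names
(no such helpers exist upstream today); they would just wrap the
kthread calls the driver currently open-codes:

/* Hypothetical drm_sched helpers -- sketching what a common interface
 * for this use case might look like. */
static inline int drm_sched_park(struct drm_gpu_scheduler *sched)
{
	/* Stop the scheduler's main kthread at its parking point; jobs
	 * already handed to the hw ring keep running to completion. */
	return kthread_park(sched->thread);
}

static inline void drm_sched_unpark(struct drm_gpu_scheduler *sched)
{
	/* Resume pulling queued jobs and pushing them to the hw ring. */
	kthread_unpark(sched->thread);
}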