Subject: Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()
To: "Grodzovsky, Andrey", Daniel Vetter, David Airlie, "Koenig, Christian"
Cc: "Deucher, Alexander", Nayan Deshmukh, Sharat Masetty,
    "linux-kernel@vger.kernel.org", "dri-devel@lists.freedesktop.org"
References: <20190926141630.14258-1-steven.price@arm.com>
From: Steven Price
Message-ID: <48214f02-e853-015f-55cc-397c7b06cb5d@arm.com>
Date: Thu, 26 Sep 2019 16:23:17 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

On 26/09/2019 16:14, Grodzovsky, Andrey wrote:
>
> On 9/26/19 10:16 AM, Steven Price wrote:
>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> Instead let's rename drm_sched_cleanup_jobs() to
>> drm_sched_get_cleanup_job() and simply return a job for processing if
>> there is one. The caller can then call the free_job() callback outside
>> the wait_event_interruptible() where sleeping is possible before
>> re-checking and returning to sleep if necessary.
>>
>> Signed-off-by: Steven Price
>> ---
>> Changes from v3:
>>  * drm_sched_main() re-arms the timeout for the next job after calling
>>    free_job()
>>
>>  drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>>  1 file changed, 26 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 9a0ee74d82dc..148468447ba9 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>  }
>>
>>  /**
>> - * drm_sched_cleanup_jobs - destroy finished jobs
>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>   *
>>   * @sched: scheduler instance
>>   *
>> - * Remove all finished jobs from the mirror list and destroy them.
>> + * Returns the next finished job from the mirror list (if there is one)
>> + * ready for it to be destroyed.
>>   */
>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>> +static struct drm_sched_job *
>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>  {
>> +	struct drm_sched_job *job = NULL;
>>  	unsigned long flags;
>>
>>  	/* Don't destroy jobs while the timeout worker is running */
>>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>  	    !cancel_delayed_work(&sched->work_tdr))
>> -		return;
>> -
>> +		return NULL;
>>
>> -	while (!list_empty(&sched->ring_mirror_list)) {
>> -		struct drm_sched_job *job;
>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>
>> -		job = list_first_entry(&sched->ring_mirror_list,
>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>  				       struct drm_sched_job, node);
>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>> -			break;
>>
>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>  		/* remove job from ring_mirror_list */
>>  		list_del_init(&job->node);
>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> -
>> -		sched->ops->free_job(job);
>> +	} else {
>> +		job = NULL;
>> +		/* queue timeout for next job */
>> +		drm_sched_start_timeout(sched);
>>  	}
>>
>> -	/* queue timeout for next job */
>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>> -	drm_sched_start_timeout(sched);
>>  	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>
>> +	return job;
>>  }
>>
>>  /**
>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>>  		struct drm_sched_fence *s_fence;
>>  		struct drm_sched_job *sched_job;
>>  		struct dma_fence *fence;
>> +		struct drm_sched_job *cleanup_job = NULL;
>>
>>  		wait_event_interruptible(sched->wake_up_worker,
>> -					 (drm_sched_cleanup_jobs(sched),
>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>  					 (!drm_sched_blocked(sched) &&
>>  					  (entity = drm_sched_select_entity(sched))) ||
>> -					 kthread_should_stop()));
>> +					 kthread_should_stop());
>> +
>> +		while (cleanup_job) {
>> +			sched->ops->free_job(cleanup_job);
>> +			/* queue timeout for next job */
>> +			drm_sched_start_timeout(sched);
>> +
>> +			cleanup_job = drm_sched_get_cleanup_job(sched);
>> +		}
>
>
> Why drm_sched_start_timeout is called both here and inside
> drm_sched_get_cleanup_job? And also why call it multiple times in the
> loop instead of only once after the loop is done?

Christian pointed out to me that the first thing drm_sched_get_cleanup_job()
does is call cancel_delayed_work(), and if that returns false then it bails
out with a NULL return. So to actually get another job (if one exists) the
timeout has to be restarted. It's also necessary to restart the timeout in
the case where the return is NULL, which is handled in the function itself.

TBH I'm not sure whether this while loop is worth it - it may be better to
replace it with simply:

	if (cleanup_job) {
		sched->ops->free_job(cleanup_job);
		/* queue timeout for next job */
		drm_sched_start_timeout(sched);
	}

The outer loop would then handle the next call to
drm_sched_get_cleanup_job() as necessary.

Steve