From: Steven Price
To: Daniel Vetter, David Airlie, Rob Herring, Tomeu Vizoso
Cc: Alyssa Rosenzweig, Steven Price, dri-devel@lists.freedesktop.org,
    linux-kernel@vger.kernel.org, Neil Armstrong
Subject: [PATCH] drm/panfrost: Handle resetting on timeout better
Date: Mon, 7 Oct 2019 13:50:14 +0100
Message-Id: <20191007125014.12595-1-steven.price@arm.com>
X-Mailer: git-send-email 2.20.1

Panfrost uses multiple schedulers (one for each slot, so 2 in reality),
and on a timeout has to stop all the schedulers to safely perform a
reset.
However, more than one scheduler can trigger a timeout at the same
time. This race condition results in jobs being freed while they are
still in use.

When stopping other slots use cancel_delayed_work_sync() to ensure that
any timeout started for that slot has completed. Also use
mutex_trylock() to obtain reset_lock. This means that only one thread
attempts the reset; the other threads will simply complete without doing
anything (the first thread will wait for this in the call to
cancel_delayed_work_sync()).

While we're here and since the function is already dependent on
sched_job not being NULL, let's remove the unnecessary checks, along
with a commented out call to panfrost_core_dump() which has never
existed in mainline.

Signed-off-by: Steven Price
---
This is a tidied up version of the patch originally posted here:
http://lkml.kernel.org/r/26ae2a4d-8df1-e8db-3060-41638ed63e2a%40arm.com

 drivers/gpu/drm/panfrost/panfrost_job.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index a58551668d9a..dcc9a7603685 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -381,13 +381,19 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
 		job_read(pfdev, JS_TAIL_LO(js)),
 		sched_job);
 
-	mutex_lock(&pfdev->reset_lock);
+	if (!mutex_trylock(&pfdev->reset_lock))
+		return;
 
-	for (i = 0; i < NUM_JOB_SLOTS; i++)
-		drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
+	for (i = 0; i < NUM_JOB_SLOTS; i++) {
+		struct drm_gpu_scheduler *sched = &pfdev->js->queue[i].sched;
+
+		drm_sched_stop(sched, sched_job);
+		if (js != i)
+			/* Ensure any timeouts on other slots have finished */
+			cancel_delayed_work_sync(&sched->work_tdr);
+	}
 
-	if (sched_job)
-		drm_sched_increase_karma(sched_job);
+	drm_sched_increase_karma(sched_job);
 
 	spin_lock_irqsave(&pfdev->js->job_lock, flags);
 	for (i = 0; i < NUM_JOB_SLOTS; i++) {
@@ -398,7 +404,6 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
 	}
 	spin_unlock_irqrestore(&pfdev->js->job_lock, flags);
 
-	/* panfrost_core_dump(pfdev); */
 
 	panfrost_devfreq_record_transition(pfdev, js);
 	panfrost_device_reset(pfdev);
-- 
2.20.1
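
For readers who want to see the locking idea outside the driver, here is a
rough userspace sketch, not the driver code. It assumes POSIX threads, and
every name in it (timeout_handler, run_demo, reset_lock as a pthread mutex,
the done[] flags) is made up for illustration. Two handlers fire at once;
the one that wins the trylock waits for the other to finish, standing in
for cancel_delayed_work_sync(), and only then performs the "reset".

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_SLOTS 2	/* hypothetical: one timeout handler per job slot */

static pthread_mutex_t reset_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static int started;		/* set once every handler thread exists */
static int done[NUM_SLOTS];	/* handler for this slot has finished */
static int reset_count;		/* how many handlers performed a reset */

static void mark_done(int slot)
{
	pthread_mutex_lock(&state_lock);
	done[slot] = 1;
	pthread_cond_broadcast(&state_cond);
	pthread_mutex_unlock(&state_lock);
}

static void *timeout_handler(void *arg)
{
	int slot = *(int *)arg;
	int i;

	/* Hold every handler back so that all timeouts fire together. */
	pthread_mutex_lock(&state_lock);
	while (!started)
		pthread_cond_wait(&state_cond, &state_lock);
	pthread_mutex_unlock(&state_lock);

	/*
	 * pthread_mutex_trylock() returns 0 on success, unlike the kernel's
	 * mutex_trylock(), which returns 1 on success. A loser just returns.
	 */
	if (pthread_mutex_trylock(&reset_lock) != 0) {
		mark_done(slot);
		return NULL;
	}

	/* Winner: wait for the other handlers, then reset exactly once. */
	pthread_mutex_lock(&state_lock);
	for (i = 0; i < NUM_SLOTS; i++)
		while (i != slot && !done[i])
			pthread_cond_wait(&state_cond, &state_lock);
	reset_count++;
	pthread_mutex_unlock(&state_lock);

	pthread_mutex_unlock(&reset_lock);
	mark_done(slot);
	return NULL;
}

/* Fire all handlers at once and return how many resets were performed. */
int run_demo(void)
{
	pthread_t tid[NUM_SLOTS];
	int slot[NUM_SLOTS];
	int i;

	started = 0;
	reset_count = 0;
	for (i = 0; i < NUM_SLOTS; i++) {
		slot[i] = i;
		done[i] = 0;
		pthread_create(&tid[i], NULL, timeout_handler, &slot[i]);
	}

	pthread_mutex_lock(&state_lock);
	started = 1;
	pthread_cond_broadcast(&state_cond);
	pthread_mutex_unlock(&state_lock);

	for (i = 0; i < NUM_SLOTS; i++)
		pthread_join(tid[i], NULL);
	return reset_count;
}
```

Because the winner keeps reset_lock held while it waits, the other handler
can only ever see the lock as taken, so run_demo() reports a single reset
regardless of which thread gets there first. With a blocking lock, both
handlers would eventually run the reset path one after the other.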