From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Andrey Grodzovsky, Christian König, Emily Deng, Sasha Levin, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.4 085/330] drm/scheduler: Avoid accessing freed bad job.
Date: Thu, 17 Sep 2020 21:57:05 -0400
Message-Id: <20200918020110.2063155-85-sashal@kernel.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>
References: <20200918020110.2063155-1-sashal@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-stable: review

From: Andrey Grodzovsky

[ Upstream commit 135517d3565b48f4def3b1b82008bc17eb5d1c90 ]

Problem: Due to a race between drm_sched_cleanup_jobs in the scheduler
thread and drm_sched_job_timedout in the timeout work, there is a
possibility that the bad job has already been freed while it is still
being accessed from the timeout thread.

Fix: Instead of just peeking at the bad job in the mirror list, remove
it from the list under the lock and put it back later, once we are
guaranteed that no race with the main scheduler thread is possible,
which is after the thread has been parked.

v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.

v3: Rebase on top of drm-misc-next. v2 is no longer needed, as
drm_sched_get_cleanup_job already takes the lock there.

v4: Fix comments to reflect the latest code in drm-misc.
Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
Reviewed-by: Emily Deng
Tested-by: Emily Deng
Signed-off-by: Christian König
Link: https://patchwork.freedesktop.org/patch/342356
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/scheduler/sched_main.c | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 30c5ddd6d081c..134e9106ebac1 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	unsigned long flags;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+
+	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	spin_lock_irqsave(&sched->job_list_lock, flags);
 	job = list_first_entry_or_null(&sched->ring_mirror_list,
 				       struct drm_sched_job, node);
 
 	if (job) {
+		/*
+		 * Remove the bad job so it cannot be freed by concurrent
+		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
+		 * is parked at which point it's safe.
+		 */
+		list_del_init(&job->node);
+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
 		job->sched->ops->timedout_job(job);
 
 		/*
@@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
 			job->sched->ops->free_job(job);
 			sched->free_guilty = false;
 		}
+	} else {
+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
 	}
 
 	spin_lock_irqsave(&sched->job_list_lock, flags);
@@ -369,6 +382,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 
 	kthread_park(sched->thread);
 
+	/*
+	 * Reinsert back the bad job here - now it's safe as
+	 * drm_sched_get_cleanup_job cannot race against us and release the
+	 * bad job at this point - we parked (waited for) any in progress
+	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
+	 * now until the scheduler thread is unparked.
+	 */
+	if (bad && bad->sched == sched)
+		/*
+		 * Add at the head of the queue to reflect it was the earliest
+		 * job extracted.
+		 */
+		list_add(&bad->node, &sched->ring_mirror_list);
+
 	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from mirror list if they already
-- 
2.25.1