From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Lang Yu, Christian König, Andrey Grodzovsky, Alex Deucher, Sasha Levin
Subject: [PATCH 5.4 134/177] drm/amd/amdgpu: fix a potential deadlock in gpu reset
Date: Mon, 31 May 2021 15:14:51 +0200
Message-Id: <20210531130652.570205443@linuxfoundation.org>
In-Reply-To: <20210531130647.887605866@linuxfoundation.org>
References: <20210531130647.887605866@linuxfoundation.org>

From: Lang Yu

[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]

When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, which led to a deadlock.

Deadlock log:

[ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[ 806.290952] [drm] free PSP TMR buffer
[ 806.319406] ============================================
[ 806.320315] WARNING: possible recursive locking detected
[ 806.321225] 5.11.0-custom #1 Tainted: G W OEL
[ 806.322135] --------------------------------------------
[ 806.323043] cat/2593 is trying to acquire lock:
[ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.325668] but task is already holding lock:
[ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.328430] other info that might help us debug this:
[ 806.329539] Possible unsafe locking scenario:
[ 806.330549] CPU0
[ 806.330983] ----
[ 806.331416] lock(&adev->dm.dc_lock);
[ 806.332086] lock(&adev->dm.dc_lock);
[ 806.332738] *** DEADLOCK ***
[ 806.333747] May be due to missing lock nesting notation
[ 806.334899] 3 locks held by cat/2593:
[ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.340869] stack backtrace:
[ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1
[ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[ 806.344413] Call Trace:
[ 806.344849] dump_stack+0x93/0xbd
[ 806.345435] __lock_acquire.cold+0x18a/0x2cf
[ 806.346179] lock_acquire+0xca/0x390
[ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.347813] __mutex_lock+0x9b/0x930
[ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50
[ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.354064] mutex_lock_nested+0x1b/0x20
[ 806.354747] ? mutex_lock_nested+0x1b/0x20
[ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]

Signed-off-by: Lang Yu
Acked-by: Christian König
Reviewed-by: Andrey Grodzovsky
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3b3fc9a426e9..765f9a6c4640 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3704,7 +3704,6 @@ out:
 			r = amdgpu_ib_ring_tests(tmp_adev);
 			if (r) {
 				dev_err(tmp_adev->dev, "ib ring test failed (%d).\n", r);
-				r = amdgpu_device_ip_suspend(tmp_adev);
 				need_full_reset = true;
 				r = -EAGAIN;
 				goto end;
-- 
2.30.2