From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Lang Yu, Christian König, Andrey Grodzovsky, Alex Deucher, Sasha Levin
Subject: [PATCH 5.12 215/296] drm/amd/amdgpu: fix a potential deadlock in gpu reset
Date: Mon, 31 May 2021 15:14:30 +0200
Message-Id: <20210531130711.076994090@linuxfoundation.org>
In-Reply-To: <20210531130703.762129381@linuxfoundation.org>
References: <20210531130703.762129381@linuxfoundation.org>

From: Lang Yu

[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]

When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.

Deadlock log:

[ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[ 806.290952] [drm] free PSP TMR buffer
[ 806.319406] ============================================
[ 806.320315] WARNING: possible recursive locking detected
[ 806.321225] 5.11.0-custom #1 Tainted: G W OEL
[ 806.322135] --------------------------------------------
[ 806.323043] cat/2593 is trying to acquire lock:
[ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.325668] but task is already holding lock:
[ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.328430] other info that might help us debug this:
[ 806.329539] Possible unsafe locking scenario:
[ 806.330549] CPU0
[ 806.330983] ----
[ 806.331416] lock(&adev->dm.dc_lock);
[ 806.332086] lock(&adev->dm.dc_lock);
[ 806.332738] *** DEADLOCK ***
[ 806.333747] May be due to missing lock nesting notation
[ 806.334899] 3 locks held by cat/2593:
[ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.340869] stack backtrace:
[ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1
[ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[ 806.344413] Call Trace:
[ 806.344849] dump_stack+0x93/0xbd
[ 806.345435] __lock_acquire.cold+0x18a/0x2cf
[ 806.346179] lock_acquire+0xca/0x390
[ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.347813] __mutex_lock+0x9b/0x930
[ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50
[ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.354064] mutex_lock_nested+0x1b/0x20
[ 806.354747] ? mutex_lock_nested+0x1b/0x20
[ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]

Signed-off-by: Lang Yu
Acked-by: Christian König
Reviewed-by: Andrey Grodzovsky
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5eee251e3335..85d90e857693 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4356,7 +4356,6 @@ out:
 			r = amdgpu_ib_ring_tests(tmp_adev);
 			if (r) {
 				dev_err(tmp_adev->dev, "ib ring test failed (%d).\n", r);
-				r = amdgpu_device_ip_suspend(tmp_adev);
 				need_full_reset = true;
 				r = -EAGAIN;
 				goto end;
-- 
2.30.2
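
For readers who want to see the failure mode outside the driver, here is a minimal user-space sketch of the locking pattern the lockdep report describes. It is not amdgpu code. The names toy_dev, toy_suspend, toy_resume and the two demo functions are hypothetical, and the sketch assumes that the suspend path leaves the lock held until a matching resume, which is what the report suggests (both acquisitions are at dm_suspend+0xb8). A POSIX error-checking mutex stands in for the kernel mutex so that the second acquisition returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a device guarded by a non-recursive lock. */
struct toy_dev {
	pthread_mutex_t dc_lock;
	int suspended;
};

static int toy_dev_init(struct toy_dev *d)
{
	pthread_mutexattr_t attr;
	int r = pthread_mutexattr_init(&attr);

	if (r)
		return r;
	/* ERRORCHECK reports a relock by the owner instead of blocking. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	r = pthread_mutex_init(&d->dc_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	d->suspended = 0;
	return r;
}

/* Assumption: suspend takes the lock and keeps it until resume. */
static int toy_suspend(struct toy_dev *d)
{
	int r = pthread_mutex_lock(&d->dc_lock);

	if (r)
		return r;	/* EDEADLK when this thread already holds it */
	d->suspended = 1;
	return 0;
}

static int toy_resume(struct toy_dev *d)
{
	d->suspended = 0;
	return pthread_mutex_unlock(&d->dc_lock);
}

/* Pre-patch shape: two suspends with no resume in between. */
int demo_double_suspend(void)
{
	struct toy_dev d;
	int r = toy_dev_init(&d);

	assert(r == 0);
	r = toy_suspend(&d);	/* first suspend acquires the lock */
	assert(r == 0);
	r = toy_suspend(&d);	/* second suspend relocks the same mutex */
	toy_resume(&d);		/* release the single held acquisition */
	pthread_mutex_destroy(&d.dc_lock);
	return r;
}

/* Post-patch shape: one suspend, then resume. */
int demo_single_suspend(void)
{
	struct toy_dev d;
	int r = toy_dev_init(&d);

	assert(r == 0);
	r = toy_suspend(&d);
	toy_resume(&d);
	pthread_mutex_destroy(&d.dc_lock);
	return r;
}
```

Calling demo_double_suspend() returns EDEADLK, while demo_single_suspend() returns 0. With a default (non-error-checking) mutex the second toy_suspend() call would typically block forever, which corresponds to the hang the patch avoids by dropping the extra amdgpu_device_ip_suspend call from the failure branch.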