From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Guchun Chen, John Clements, Alex Deucher, Sasha Levin
Subject: [PATCH 5.7 019/112] drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU
Date: Tue, 7 Jul 2020 17:16:24 +0200
Message-Id: <20200707145801.910328468@linuxfoundation.org>
In-Reply-To: <20200707145800.925304888@linuxfoundation.org>
References:
 <20200707145800.925304888@linuxfoundation.org>

From: Guchun Chen

[ Upstream commit 12c17b9d62663c14a5343d6742682b3e67280754 ]

When running RAS uncorrectable error injection and triggering a GPU reset
on sGPU, the issue below is observed. It is caused by accessing the list
before it has been initialized.

[ 80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
[ 80.047300] #PF: supervisor write access in kernel mode
[ 80.047351] #PF: error_code(0x0003) - permissions violation
[ 80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
[ 80.047477] Oops: 0003 [#1] SMP PTI
[ 80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G OE 5.4.0-rc7-guchchen #1
[ 80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[ 80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]

Signed-off-by: Guchun Chen
Reviewed-by: John Clements
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index b0aa4e1ed4df7..cd18596b47d33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1444,9 +1444,10 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
 	struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, false);
 
 	/* Build list of devices to query RAS related errors */
-	if (hive && adev->gmc.xgmi.num_physical_nodes > 1) {
+	if (hive && adev->gmc.xgmi.num_physical_nodes > 1)
 		device_list_handle = &hive->device_list;
-	} else {
+	else {
+		INIT_LIST_HEAD(&device_list);
 		list_add_tail(&adev->gmc.xgmi.head, &device_list);
 		device_list_handle = &device_list;
 	}
-- 
2.25.1