From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Zhigang Luo, Felix Kuehling, Alex Deucher, Sasha Levin, Felix.Kuehling@amd.com, christian.koenig@amd.com, Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.8 33/43]
 amd/amdkfd: sync all devices to wait all processes being evicted
Date: Mon, 22 Apr 2024 19:14:19 -0400
Message-ID: <20240422231521.1592991-33-sashal@kernel.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240422231521.1592991-1-sashal@kernel.org>
References: <20240422231521.1592991-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
X-stable-base: Linux 6.8.7
Content-Transfer-Encoding: 8bit

From: Zhigang Luo

[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]

If more than one device is resetting in parallel, the first device
calls kfd_suspend_all_processes() to evict all processes on all
devices, and this call takes time to finish. The other devices start
reset and recovery without waiting. If a process has not been evicted
before recovery, it is restored and then causes a page fault.

Signed-off-by: Zhigang Luo
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc2243..fcf6558d019e5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -944,7 +944,6 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 {
 	struct kfd_node *node;
 	int i;
-	int count;
 
 	if (!kfd->init_complete)
 		return;
@@ -952,12 +951,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 	/* for runtime suspend, skip locking kfd */
 	if (!run_pm) {
 		mutex_lock(&kfd_processes_mutex);
-		count = ++kfd_locked;
-		mutex_unlock(&kfd_processes_mutex);
-
 		/* For first KFD device suspend all the KFD processes */
-		if (count == 1)
+		if (++kfd_locked == 1)
 			kfd_suspend_all_processes();
+		mutex_unlock(&kfd_processes_mutex);
 	}
 
 	for (i = 0; i < kfd->num_nodes; i++) {
@@ -968,7 +965,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 
 int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 {
-	int ret, count, i;
+	int ret, i;
 
 	if (!kfd->init_complete)
 		return 0;
@@ -982,12 +979,10 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm)
 	/* for runtime resume, skip unlocking kfd */
 	if (!run_pm) {
 		mutex_lock(&kfd_processes_mutex);
-		count = --kfd_locked;
-		mutex_unlock(&kfd_processes_mutex);
-
-		WARN_ONCE(count < 0, "KFD suspend / resume ref. error");
-		if (count == 0)
+		if (--kfd_locked == 0)
 			ret = kfd_resume_all_processes();
+		WARN_ONCE(kfd_locked < 0, "KFD suspend / resume ref. error");
+		mutex_unlock(&kfd_processes_mutex);
 	}
 
 	return ret;
-- 
2.43.0
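
For readers who want to see the locking pattern outside the driver, here is a minimal userspace sketch. It is not the amdkfd code: a pthread mutex stands in for kfd_processes_mutex, and every name in it (processes_mutex, locked, processes_evicted, device_suspend, device_resume and the two helpers) is hypothetical. The point it tries to show is that the count update and the suspend or resume work happen under the same lock, so a second caller cannot return from device_suspend() until the first caller's eviction has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel objects in the patch. */
static pthread_mutex_t processes_mutex = PTHREAD_MUTEX_INITIALIZER;
static int locked;              /* plays the role of kfd_locked */
static bool processes_evicted;  /* true once "eviction" has completed */

/* Placeholders for the slow suspend/resume work. */
static void suspend_all_processes(void) { processes_evicted = true; }
static void resume_all_processes(void)  { processes_evicted = false; }

/*
 * The first caller evicts while still holding the mutex; later callers
 * block on the mutex until that is done. Returns the eviction state the
 * caller observes before it continues.
 */
static bool device_suspend(void)
{
	bool seen;

	pthread_mutex_lock(&processes_mutex);
	if (++locked == 1)
		suspend_all_processes();
	seen = processes_evicted;
	pthread_mutex_unlock(&processes_mutex);
	return seen;
}

/* The last caller restores, again inside the lock. Returns the new count. */
static int device_resume(void)
{
	int remaining;

	pthread_mutex_lock(&processes_mutex);
	if (--locked == 0)
		resume_all_processes();
	assert(locked >= 0);  /* userspace analogue of the WARN_ONCE check */
	remaining = locked;
	pthread_mutex_unlock(&processes_mutex);
	return remaining;
}
```

With the pre-patch ordering (unlock first, then act on a saved copy of the count), a second caller could leave device_suspend() while the first caller was still evicting, which is the window the commit message describes.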