Received: by 2002:a05:6a10:eb17:0:0:0:0 with SMTP id hx23csp508315pxb; Thu, 9 Sep 2021 06:07:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzWPfNcOVweYpts2pGluG15Lw1WG5/2mdGjTlWwfolZJIr0Bs4nzjngzwu35iQIMY080KAO X-Received: by 2002:a5e:a913:: with SMTP id c19mr2547337iod.31.1631192830192; Thu, 09 Sep 2021 06:07:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1631192830; cv=none; d=google.com; s=arc-20160816; b=Tw0BauQ7MLD41ytYWFaOfaDONDqOnAD8g+sKpXy3XPXSyiWOU2p7des0umh5XuNuoO uC5BNzCDhTaftk3sgMSPVc5hZwgVAqS1J8s8P227Y2YkVe13mj/hm2MmVdH+M8mcOB4W B/VhGw/P6kmrD9C+8Akt1EmPcXzTGab+IOySVE6mbTlHgA7sy5DjZwZL7mrjrBMsEabV TD39WDHVpFi5SNxmF8XVexHXf7/L7A9UyOL6GGCMT0fYWWlz4boI80mypt5kP3mmMzt2 zCEiExu+7RYLubXUzwcoqDM3rkqnekYgAq1WZqCVYkCHc6GJh5HRhIPXwWOhqjx0nJ7y gD5A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=OGGLYHvX0vKgvLYKWsbsYAV2P5wZpRRifkR5/vNXK+M=; b=XagIDZEXTzaedbwEZY9niEBeDqcKSSMABnByhiDxHTlBIvEHiRW+7I8dHXpzg0NQlt bbKCK0mBJi1FwP3e3SxS1VqfivjDJWZ2NzqbNV3Ok/pgyhmm916Xah+EzUObZqMyk0yF Qs6JfTFQ8VNakFtbCo6c/b80gKPTQF2YA10+W1MazXl/8JIMKPVUfoRBIevKe0Njg6i6 LCjLEvV53YqWFt18Wm3lMY8RWfMV5DD1OqNF0dOAiNjCdv2a/8NgFE3Gld0EXo93/irp YIEfdvSJBVm/KB+EnkmeqYXhmgZDWIWqEzHZggRE6QD3CaygCrlMoML4+A7adFAptOeU rk2w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=S0QsmXEm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z23si2038631jas.40.2021.09.09.06.06.53; Thu, 09 Sep 2021 06:07:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=S0QsmXEm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355916AbhIINDp (ORCPT + 99 others); Thu, 9 Sep 2021 09:03:45 -0400 Received: from mail.kernel.org ([198.145.29.99]:46220 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354465AbhIIMvK (ORCPT ); Thu, 9 Sep 2021 08:51:10 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 46A9863235; Thu, 9 Sep 2021 11:57:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1631188633; bh=UsaLy+VS9uGznDshld/1XfrmZjQm8g1fmvtDiPhgZ/Y=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=S0QsmXEmCcNPRreD+jynqxqRy8Ir4ec+wZMu1uio+FgJGipWmQF27v3oyOTjKkZpL CcjNBP6M0PZYFDq8KAYNi0kNs1VKr5MZORvhT+u/Lcj1VXSt1lGe9yltMm590VCxPk p/L4fCQMMNXHyox9Hq4qkpO3iibHku++152HB2m0VsWfNhMgT/dRbn5w8kXQ7TwQTg a/mzq7iHTtHVokmhnIcIYGN/pwzB1rBT/rdyg8ibPfp4rvIY9RUwflHO8EZ1wDvdy/ ksVSZRHVt7f3UXk8cqBQBt6HNPBpjHiegViv0K8ySPBQInFTwGSf6rN4Z28V4vu8jz do9BIpEHyd1/g== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Sean Keely , Felix Kuehling , Alex Deucher , Sasha Levin , amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Subject: [PATCH AUTOSEL 5.4 099/109] drm/amdkfd: Account for SH/SE count when setting up cu masks. Date: Thu, 9 Sep 2021 07:54:56 -0400 Message-Id: <20210909115507.147917-99-sashal@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210909115507.147917-1-sashal@kernel.org> References: <20210909115507.147917-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Sean Keely [ Upstream commit 1ec06c2dee679e9f089e78ed20cb74ee90155f61 ] On systems with multiple SH per SE compute_static_thread_mgmt_se# is split into independent masks, one for each SH, in the upper and lower 16 bits. We need to detect this and apply cu masking to each SH. The cu mask bits are assigned first to each SE, then to alternate SHs, then finally to higher CU id. This ensures that the maximum number of SPIs are engaged as early as possible while balancing CU assignment to each SH. v2: Use max SH/SE rather than max SH in cu_per_sh. v3: Fix comment blocks, ensure se_mask is initially zero filled, and correctly assign se.sh.cu positions to unset bits in cu_mask. Signed-off-by: Sean Keely Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 84 +++++++++++++++----- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 1 + 2 files changed, 64 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c index 88813dad731f..c021519af810 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c @@ -98,36 +98,78 @@ void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm, uint32_t *se_mask) { struct kfd_cu_info cu_info; - uint32_t cu_per_se[KFD_MAX_NUM_SE] = {0}; - int i, se, sh, cu = 0; - + uint32_t cu_per_sh[KFD_MAX_NUM_SE][KFD_MAX_NUM_SH_PER_SE] = {0}; + int i, se, sh, cu; amdgpu_amdkfd_get_cu_info(mm->dev->kgd, &cu_info); if (cu_mask_count > cu_info.cu_active_number) cu_mask_count = cu_info.cu_active_number; + /* Exceeding these bounds corrupts the stack and indicates a coding error. + * Returning with no CU's enabled will hang the queue, which should be + * attention grabbing. + */ + if (cu_info.num_shader_engines > KFD_MAX_NUM_SE) { + pr_err("Exceeded KFD_MAX_NUM_SE, chip reports %d\n", cu_info.num_shader_engines); + return; + } + if (cu_info.num_shader_arrays_per_engine > KFD_MAX_NUM_SH_PER_SE) { + pr_err("Exceeded KFD_MAX_NUM_SH, chip reports %d\n", + cu_info.num_shader_arrays_per_engine * cu_info.num_shader_engines); + return; + } + /* Count active CUs per SH. + * + * Some CUs in an SH may be disabled. HW expects disabled CUs to be + * represented in the high bits of each SH's enable mask (the upper and lower + * 16 bits of se_mask) and will take care of the actual distribution of + * disabled CUs within each SH automatically. + * Each half of se_mask must be filled only on bits 0-cu_per_sh[se][sh]-1. + * + * See note on Arcturus cu_bitmap layout in gfx_v9_0_get_cu_info. + */ for (se = 0; se < cu_info.num_shader_engines; se++) for (sh = 0; sh < cu_info.num_shader_arrays_per_engine; sh++) - cu_per_se[se] += hweight32(cu_info.cu_bitmap[se % 4][sh + (se / 4)]); - - /* Symmetrically map cu_mask to all SEs: - * cu_mask[0] bit0 -> se_mask[0] bit0; - * cu_mask[0] bit1 -> se_mask[1] bit0; - * ... (if # SE is 4) - * cu_mask[0] bit4 -> se_mask[0] bit1; + cu_per_sh[se][sh] = hweight32(cu_info.cu_bitmap[se % 4][sh + (se / 4)]); + + /* Symmetrically map cu_mask to all SEs & SHs: + * se_mask programs up to 2 SH in the upper and lower 16 bits. + * + * Examples + * Assuming 1 SH/SE, 4 SEs: + * cu_mask[0] bit0 -> se_mask[0] bit0 + * cu_mask[0] bit1 -> se_mask[1] bit0 + * ... + * cu_mask[0] bit4 -> se_mask[0] bit1 + * ... + * + * Assuming 2 SH/SE, 4 SEs + * cu_mask[0] bit0 -> se_mask[0] bit0 (SE0,SH0,CU0) + * cu_mask[0] bit1 -> se_mask[1] bit0 (SE1,SH0,CU0) + * ... + * cu_mask[0] bit4 -> se_mask[0] bit16 (SE0,SH1,CU0) + * cu_mask[0] bit5 -> se_mask[1] bit16 (SE1,SH1,CU0) + * ... + * cu_mask[0] bit8 -> se_mask[0] bit1 (SE0,SH0,CU1) * ... + * + * First ensure all CUs are disabled, then enable user specified CUs. */ - se = 0; - for (i = 0; i < cu_mask_count; i++) { - if (cu_mask[i / 32] & (1 << (i % 32))) - se_mask[se] |= 1 << cu; - - do { - se++; - if (se == cu_info.num_shader_engines) { - se = 0; - cu++; + for (i = 0; i < cu_info.num_shader_engines; i++) + se_mask[i] = 0; + + i = 0; + for (cu = 0; cu < 16; cu++) { + for (sh = 0; sh < cu_info.num_shader_arrays_per_engine; sh++) { + for (se = 0; se < cu_info.num_shader_engines; se++) { + if (cu_per_sh[se][sh] > cu) { + if (cu_mask[i / 32] & (1 << (i % 32))) + se_mask[se] |= 1 << (cu + sh * 16); + i++; + if (i == cu_mask_count) + return; + } } - } while (cu >= cu_per_se[se] && cu < 32); + } } } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h index fbdb16418847..4edc012e3138 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h @@ -27,6 +27,7 @@ #include "kfd_priv.h" #define KFD_MAX_NUM_SE 8 +#define KFD_MAX_NUM_SH_PER_SE 2 /** * struct mqd_manager -- 2.30.2