Received: by 2002:ab2:6d45:0:b0:1fb:d597:ff75 with SMTP id d5csp255579lqr; Wed, 5 Jun 2024 05:22:09 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVWkyL+qxWYCW4vIO2RuSC5DkTIDrsMnjnXJrq/aaAj2YOXF+I3UilF7mOZzvMwUGFVyvcmiB4UXin/T+h3FbcpSqCQCV6BnDZvFePN8w== X-Google-Smtp-Source: AGHT+IGqseknK9qi9js50m+g7y031VWi8gPSuHBOBRuJHuJLW3TuGWR2kgOHELG9FIerLYXjYw03 X-Received: by 2002:a05:6214:5b86:b0:6ad:716e:e13d with SMTP id 6a1803df08f44-6b030a997b7mr26605666d6.60.1717590129503; Wed, 05 Jun 2024 05:22:09 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717590129; cv=pass; d=google.com; s=arc-20160816; b=jrvFvuhu7+fjclLz2ytI0MD8evgQh26S7nXGiXs91xSwzQ6cT0ryPXWXV8zop0N9DI 8curlCUKO2kN/N0C5+eeiA9xk5K6Gu7w5NEvDnf9eH5EeMqOG1VeDXsS92fGXhj/PJwQ rVjDTpyp4utw62VSXCHrWA9a0zkWm67/CWL26mBYO0gdsMZFpnv8O8bF3+cm6dYQS1O6 EmxQ5CwkQ3PcVEpxdDW5WCg3YSPJzLAK7zqqWCh83yzzbyeFVML3JAFmzIgtg85yyZUW FDC8lPxM1mVY+Ppr3a2CHwwjbr/gYByXDVJjRQvTUic1b9g+RKWEncW93Ns6EbyAi5V2 juyg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=McpRuLaosjxRcTqfNw/hKWY90dZZ0ymDSU/VytVvZyg=; fh=v9wq7dboZ64jWyee73s7vUomL3rRsBGzKDtJ69Ug7KI=; b=ZyZ+hEZv3up9/Vo56fi06+iUH5Qzws7z2XHI+B2soTrcAq0L+YUq0rSUM5D39HzGG8 W5C/El6cL0GXWgvRcgDbIvAigNiselZGD990aSf9ideUbOfzHMCfMGE/QShtjYnJBYpq Ff/CjRaapHhDpWS0mAUdR3kIRwkumESm0DK4iL+XhPeBtX3aNgNFWbzEprNu9AkVkXfP QHf+KJzVxT9iwSYGuw3O/l1ExWStizSdmFE7rIdmMIgBWVvdgerlBy5vQnxLXkjMGkqq ov64kZR1/w3rldwnA7zighFN075XBjw6RrFTdJf6OyL/1REPJS17hzRmyf5Jox40U2Oe fUrg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=hTi+gbRD; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-202427-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-202427-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id 6a1803df08f44-6ae4b4150f7si135151786d6.360.2024.06.05.05.22.09 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Jun 2024 05:22:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-202427-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=hTi+gbRD; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-202427-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-202427-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 2FE8A1C23C8C for ; Wed, 5 Jun 2024 12:22:09 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 13FDA19AA62; Wed, 5 Jun 2024 11:56:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hTi+gbRD" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 008FC19AA52; Wed, 5 Jun 2024 11:56:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717588570; cv=none; b=H9PE+/VOLvfJPXCZzdxqP63/Tfxew+fxva+dSM7eK/NClHvp3mDW9Ss8VgxwWFLI6GsF0hzInCs7hroqREs8KfI8isFh4K9d+wbOVVz90T2LAULsFSIXjKGs2oOJoz6AHspjgzvcEgJqN8GjSjJVtIiJ3gZqVFRw7e5rMOjnexY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717588570; c=relaxed/simple; bh=vcr8GKcwkk9DL0Sw5rElQ5s4ZEn0YTHN5T4849ZYSco=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QmK56ak4dC1Q8VDLeqBsUVgmYu/R/vy/tOsmsq06ZbRqvc5PSiau8tn8kbDAVoxDiozXKsBEh96hWUwOyhZPDmRE5ibEGaXRWrWXyf2jIunCl69/WipBGXEhd3PDcOzFHD0u/ySn2pgMo2uJ1fNuBSbm7xMK6cw0p1/GMBB6ZLQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hTi+gbRD; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 220AEC4AF0A; Wed, 5 Jun 2024 11:56:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1717588569; bh=vcr8GKcwkk9DL0Sw5rElQ5s4ZEn0YTHN5T4849ZYSco=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hTi+gbRDsbhaH32OvnvbRKKKDEtf1AbhBAnvc2bErwiHgcFBVn3wsxNM3jby7PXWj zck84Fl7hYqwGeruBU4/vp3FPPpzLj/L0DgvDH009aBG29nAZPQvGpfiQr512HHkVf Q4ggbEaP33ZHHJOPWU7VEesT+fMdqTaof7esBGiMMBnBo0b80ibzjh1dAKJH3Bm+kS WVm9JaB9hV29oCMttvouURya+oElrV8d/h0nbfX16+CwIjU5i+FHhpJaBfCGJTzTC6 G4QctZfWQplzB/93fcBNx70c1p5PqBf4QbEY3D3gyOSAQYDwLV/Jgi35IuH2IFQuej kGPKlXTl93EHg== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Lang Yu , Felix Kuehling , Alex Deucher , Sasha Levin , Felix.Kuehling@amd.com, christian.koenig@amd.com, Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Subject: [PATCH AUTOSEL 6.8 4/6] drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Date: Wed, 5 Jun 2024 07:55:56 -0400 Message-ID: <20240605115602.2964993-4-sashal@kernel.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240605115602.2964993-1-sashal@kernel.org> References: <20240605115602.2964993-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 6.8.12 Content-Transfer-Encoding: 8bit From: Lang Yu [ Upstream commit eb853413d02c8d9b27942429b261a9eef228f005 ] Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +++++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 20 ++++++++++--------- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 ++++-- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 3 ++- 5 files changed, 23 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 131983ed43465..7bd14a293ee52 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -455,6 +455,9 @@ void amdgpu_amdkfd_get_local_mem_info(struct amdgpu_device *adev, else mem_info->local_mem_size_private = KFD_XCP_MEMORY_SIZE(adev, xcp->id); + } else if (adev->flags & AMD_IS_APU) { + mem_info->local_mem_size_public = (ttm_tt_pages_limit() << PAGE_SHIFT); + mem_info->local_mem_size_private = 0; } else { mem_info->local_mem_size_public = adev->gmc.visible_vram_size; mem_info->local_mem_size_private = adev->gmc.real_vram_size - @@ -803,6 +806,8 @@ u64 amdgpu_amdkfd_xcp_memory_size(struct amdgpu_device *adev, int xcp_id) } do_div(tmp, adev->xcp_mgr->num_xcp_per_mem_partition); return ALIGN_DOWN(tmp, PAGE_SIZE); + } else if (adev->flags & AMD_IS_APU) { + return (ttm_tt_pages_limit() << PAGE_SHIFT); } else { return adev->gmc.real_vram_size; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 36796fbc00315..fd0aef17ab36c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -196,7 +196,7 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, return -EINVAL; vram_size = KFD_XCP_MEMORY_SIZE(adev, xcp_id); - if (adev->gmc.is_app_apu) { + if (adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) { system_mem_needed = size; ttm_mem_needed = size; } @@ -232,7 +232,8 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, "adev reference can't be null when vram is used"); if (adev && xcp_id >= 0) { adev->kfd.vram_used[xcp_id] += vram_needed; - adev->kfd.vram_used_aligned[xcp_id] += adev->gmc.is_app_apu ? + adev->kfd.vram_used_aligned[xcp_id] += + (adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) ? vram_needed : ALIGN(vram_needed, VRAM_AVAILABLITY_ALIGN); } @@ -260,7 +261,7 @@ void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev, if (adev) { adev->kfd.vram_used[xcp_id] -= size; - if (adev->gmc.is_app_apu) { + if (adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) { adev->kfd.vram_used_aligned[xcp_id] -= size; kfd_mem_limit.system_mem_used -= size; kfd_mem_limit.ttm_mem_used -= size; @@ -887,7 +888,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, * if peer device has large BAR. In contrast, access over xGMI is * allowed for both small and large BAR configurations of peer device */ - if ((adev != bo_adev && !adev->gmc.is_app_apu) && + if ((adev != bo_adev && !(adev->gmc.is_app_apu || adev->flags & AMD_IS_APU)) && ((mem->domain == AMDGPU_GEM_DOMAIN_VRAM) || (mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL) || (mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) { @@ -1654,7 +1655,7 @@ size_t amdgpu_amdkfd_get_available_memory(struct amdgpu_device *adev, - atomic64_read(&adev->vram_pin_size) - reserved_for_pt; - if (adev->gmc.is_app_apu) { + if (adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) { system_mem_available = no_system_mem_limit ? kfd_mem_limit.max_system_mem_limit : kfd_mem_limit.max_system_mem_limit - @@ -1702,7 +1703,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) { domain = alloc_domain = AMDGPU_GEM_DOMAIN_VRAM; - if (adev->gmc.is_app_apu) { + if (adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) { domain = AMDGPU_GEM_DOMAIN_GTT; alloc_domain = AMDGPU_GEM_DOMAIN_GTT; alloc_flags = 0; @@ -1949,7 +1950,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( if (size) { if (!is_imported && (mem->bo->preferred_domains == AMDGPU_GEM_DOMAIN_VRAM || - (adev->gmc.is_app_apu && + ((adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) && mem->bo->preferred_domains == AMDGPU_GEM_DOMAIN_GTT))) *size = bo_size; else @@ -2372,8 +2373,9 @@ static int import_obj_create(struct amdgpu_device *adev, (*mem)->dmabuf = dma_buf; (*mem)->bo = bo; (*mem)->va = va; - (*mem)->domain = (bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) && !adev->gmc.is_app_apu ? - AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT; + (*mem)->domain = (bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) && + !(adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) ? + AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT; (*mem)->mapped_to_gpu_memory = 0; (*mem)->process_info = avm->process_info; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index bdc01ca9609a7..00e4032976231 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -1009,7 +1009,7 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev) if (amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(9, 0, 1)) return -EINVAL; - if (adev->gmc.is_app_apu) + if (adev->gmc.is_app_apu || adev->flags & AMD_IS_APU) return 0; pgmap = &kfddev->pgmap; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index c50a0dc9c9c07..94f3614dd6c26 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -2617,7 +2617,8 @@ svm_range_best_restore_location(struct svm_range *prange, return -1; } - if (node->adev->gmc.is_app_apu) + if (node->adev->gmc.is_app_apu || + node->adev->flags & AMD_IS_APU) return 0; if (prange->preferred_loc == gpuid || @@ -3335,7 +3336,8 @@ svm_range_best_prefetch_location(struct svm_range *prange) goto out; } - if (bo_node->adev->gmc.is_app_apu) { + if (bo_node->adev->gmc.is_app_apu || + bo_node->adev->flags & AMD_IS_APU) { best_loc = 0; goto out; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 026863a0abcd3..9c37bd0567efa 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h @@ -201,7 +201,8 @@ void svm_range_list_lock_and_flush_work(struct svm_range_list *svms, struct mm_s * is initialized to not 0 when page migration register device memory. */ #define KFD_IS_SVM_API_SUPPORTED(adev) ((adev)->kfd.pgmap.type != 0 ||\ - (adev)->gmc.is_app_apu) + (adev)->gmc.is_app_apu ||\ + ((adev)->flags & AMD_IS_APU)) void svm_range_bo_unref_async(struct svm_range_bo *svm_bo); -- 2.43.0