Received: by 2002:a05:6a10:af89:0:0:0:0 with SMTP id iu9csp1295116pxb; Fri, 21 Jan 2022 14:37:50 -0800 (PST) X-Google-Smtp-Source: ABdhPJxNIfcsr3RjC3NTxbYiGiJpQFIoOyg59DWd50mNoyrKA+cLrtiQe9ugtqj+FKf78WHwjr70 X-Received: by 2002:a05:6a00:1687:b0:4c7:5fbb:bb13 with SMTP id k7-20020a056a00168700b004c75fbbbb13mr3292290pfc.36.1642804670604; Fri, 21 Jan 2022 14:37:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1642804670; cv=none; d=google.com; s=arc-20160816; b=FpwL6jSGYGQIyXe/e5pSYL3K61nqceJFlIXfCxGt5v5boVSZurmeEdD8ICi/5MpFaI yCvfVewoAbPxLbxaG2jImxRILG0+JyUDpTOWTjDhmH3jfXXAMEwtFHHWXv0xva42sK0c M7t+arkQ17MKe5quGD23Ccy6jNHXeXAGn7L3nstJNs5sakcrqTqBX8GqAkUTFFjWKAIF ICGgq25PkDjjH9tTmG98/MEWr0cE5DIkBKlRTMaaaYpUIuFgWHVFI9pbYbuhgtO0F032 C4cydEm6BqWQvUqxM6J9h+CqxQumWDQm/o0yLm1gYJIpr8kK7+TayfnvCi8suGsG1VMX ulgw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=7W93YVolxlyce73O9rwqKHIOLTmLnabRI7c09kRs1Ic=; b=IRgLzSmaYGCTNyHpE3bYfD1l/ifeIVJaS6qP9jkojw00XmO0r/r7nb5lqUaGww2uFa 8k8zoJuq7t8RTPv8F/0qOl/Ndo4Y7/w5MgHO7fx8ebVhj6a/zgZUZJ1DPNJCeE85lg1q oglMTve9QRRRlCbCl91jZABQn9/bMnk84e4852KizcDc531tjYMam4JWEsBrxmuCT6Jm U0ksEaAL+utUZw8lh/MgaqvYnTLld7yLfGAVWgGaXwIqx7UWkCFlv8Qp35Dq0rlxuPCO KAyhvWezFq9Y1ZDq0oh3U3cA2AeKptMtGaEnStOOw42dci2GH0ICM9xXhinQD5H7l07F /NdQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=oT4VD2+j; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=collabora.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y16si7387645pfl.104.2022.01.21.14.37.38; Fri, 21 Jan 2022 14:37:50 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=oT4VD2+j; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=collabora.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377699AbiATUkv (ORCPT + 99 others); Thu, 20 Jan 2022 15:40:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60782 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377672AbiATUkr (ORCPT ); Thu, 20 Jan 2022 15:40:47 -0500 Received: from bhuna.collabora.co.uk (bhuna.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e3e3]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 636CFC06173F for ; Thu, 20 Jan 2022 12:40:47 -0800 (PST) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: bbeckett) with ESMTPSA id 08C7B1F4192B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1642711246; bh=EZhNbnnux8QGMifa3jABLZvijyiSZHx7tlZohz+hASU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oT4VD2+juQaTlO3LDHSiGnAi/+UgtwAFQNTR67ijCQ4En/WD/1Cz+eMZH7g+03Eee fNbE62VqmRFH5ZUXM6ilDC06bt+zkVN+T3cVJkLzOllPuqFhLM2cnQblu9tDHMilPl Vk4cheBNrtBFO/qer+A1A0t/SvNpY+CjLoFHpGSnmwG49Bl3ovgwdw1VkZqfDfsjTr ZqgTtd2O1nvAJa4iGRs9hejt84N33sTKPB7ZXLEAiifwEEOmnSBd0n1dNzTJi2x8OH HeyJSpKQPyTx9jVaYGJbFILFcMbi4oSghGMASUXzKdNlPTDYjrRifMxIvOhOzE631j QAV+0yL4PHeOA== From: Robert Beckett To: Jani Nikula , Joonas Lahtinen , Rodrigo Vivi , Tvrtko Ursulin , David Airlie , Daniel Vetter Cc: Matthew Auld , Stuart Summers , Ramalingam C , intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 3/5] drm/i915: support 64K GTT pages for discrete cards Date: Thu, 20 Jan 2022 20:37:18 +0000 Message-Id: <20220120203721.632424-4-bob.beckett@collabora.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220120203721.632424-1-bob.beckett@collabora.com> References: <20220120203721.632424-1-bob.beckett@collabora.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Auld discrete cards optimise 64K GTT pages for local-memory, since everything should be allocated at 64K granularity. We say goodbye to sparse entries, and instead get a compact 256B page-table for 64K pages, which should be more cache friendly. 4K pages for local-memory are no longer supported by the HW. Signed-off-by: Matthew Auld Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Cc: Joonas Lahtinen Cc: Rodrigo Vivi --- .../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++++++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 108 +++++++++++++++++- drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + 4 files changed, 169 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 26f997c376a2..7efa6a598b03 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg) return err; } +static int igt_ppgtt_compact(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct drm_i915_gem_object *obj; + int err; + + /* + * Simple test to catch issues with compact 64K pages -- since the pt is + * compacted to 256B that gives us 32 entries per pt, however since the + * backing page for the pt is 4K, any extra entries we might incorrectly + * write out should be ignored by the HW. If ever hit such a case this + * test should catch it since some of our writes would land in scratch. + */ + + if (!HAS_64K_PAGES(i915)) { + pr_info("device lacks compact 64K page support, skipping\n"); + return 0; + } + + if (!HAS_LMEM(i915)) { + pr_info("device lacks LMEM support, skipping\n"); + return 0; + } + + /* We want the range to cover multiple page-table boundaries. */ + obj = i915_gem_object_create_lmem(i915, SZ_4M, 0); + if (IS_ERR(obj)) + return err; + + err = i915_gem_object_pin_pages_unlocked(obj); + if (err) + goto out_put; + + if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) { + pr_info("LMEM compact unable to allocate huge-page(s)\n"); + goto out_unpin; + } + + /* + * Disable 2M GTT pages by forcing the page-size to 64K for the GTT + * insertion. + */ + obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K; + + err = igt_write_huge(i915, obj); + if (err) + pr_err("LMEM compact write-huge failed\n"); + +out_unpin: + i915_gem_object_unpin_pages(obj); +out_put: + i915_gem_object_put(obj); + + if (err == -ENOMEM) + err = 0; + + return err; +} + static int igt_tmpfs_fallback(void *arg) { struct drm_i915_private *i915 = arg; @@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_tmpfs_fallback), SUBTEST(igt_ppgtt_smoke_huge), SUBTEST(igt_ppgtt_sanity_check), + SUBTEST(igt_ppgtt_compact), }; if (!HAS_PPGTT(i915)) { diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index c43e724afa9f..62471730266c 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, start, end, lvl); } else { unsigned int count; + unsigned int pte = gen8_pd_index(start, 0); + unsigned int num_ptes; u64 *vaddr; count = gen8_pt_count(start, end); @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used)); + num_ptes = count; + if (pt->is_compact) { + GEM_BUG_ON(num_ptes % 16); + GEM_BUG_ON(pte % 16); + num_ptes /= 16; + pte /= 16; + } + vaddr = px_vaddr(pt); - memset64(vaddr + gen8_pd_index(start, 0), + memset64(vaddr + pte, vm->scratch[0]->encode, - count); + num_ptes); atomic_sub(count, &pt->used); start += count; @@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; } +static void +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, + struct sgt_dma *iter, + enum i915_cache_level cache_level, + u32 flags) +{ + const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags); + unsigned int rem = sg_dma_len(iter->sg); + u64 start = vma_res->start; + + GEM_BUG_ON(!i915_vm_is_4lvl(vm)); + + do { + struct i915_page_directory * const pdp = + gen8_pdp_for_page_address(vm, start); + struct i915_page_directory * const pd = + i915_pd_entry(pdp, __gen8_pte_index(start, 2)); + struct i915_page_table *pt = + i915_pt_entry(pd, __gen8_pte_index(start, 1)); + gen8_pte_t encode = pte_encode; + unsigned int page_size; + gen8_pte_t *vaddr; + u16 index, max; + + max = I915_PDES; + + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && + IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && + rem >= I915_GTT_PAGE_SIZE_2M && + !__gen8_pte_index(start, 0)) { + index = __gen8_pte_index(start, 1); + encode |= GEN8_PDE_PS_2M; + page_size = I915_GTT_PAGE_SIZE_2M; + + vaddr = px_vaddr(pd); + } else { + if (encode & GEN12_PPGTT_PTE_LM) { + GEM_BUG_ON(__gen8_pte_index(start, 0) % 16); + GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K); + GEM_BUG_ON(!IS_ALIGNED(iter->dma, + I915_GTT_PAGE_SIZE_64K)); + + index = __gen8_pte_index(start, 0) / 16; + page_size = I915_GTT_PAGE_SIZE_64K; + + max /= 16; + + vaddr = px_vaddr(pd); + vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K; + + pt->is_compact = true; + } else { + GEM_BUG_ON(pt->is_compact); + index = __gen8_pte_index(start, 0); + page_size = I915_GTT_PAGE_SIZE; + } + + vaddr = px_vaddr(pt); + } + + do { + GEM_BUG_ON(rem < page_size); + vaddr[index++] = encode | iter->dma; + + start += page_size; + iter->dma += page_size; + rem -= page_size; + if (iter->dma >= iter->max) { + iter->sg = __sg_next(iter->sg); + if (!iter->sg) + break; + + rem = sg_dma_len(iter->sg); + if (!rem) + break; + + iter->dma = sg_dma_address(iter->sg); + iter->max = iter->dma + rem; + + if (unlikely(!IS_ALIGNED(iter->dma, page_size))) + break; + } + } while (rem >= page_size && index < max); + + vma_res->page_sizes_gtt |= page_size; + } while (iter->sg && sg_dma_len(iter->sg)); +} + static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, struct i915_vma_resource *vma_res, struct sgt_dma *iter, @@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, struct sgt_dma iter = sgt_dma(vma_res); if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) { - gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + if (HAS_64K_PAGES(vm->i915)) + xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + else + gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); } else { u64 idx = vma_res->start >> GEN8_PTE_SHIFT; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 1519c6a689b0..c306bfebd998 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t; #define GEN12_GGTT_PTE_LM BIT_ULL(1) +#define GEN12_PDE_64K BIT(6) + /* * Cacheability Control is a 4-bit value. The low three bits are stored in bits * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE. @@ -160,6 +162,7 @@ struct i915_page_table { atomic_t used; struct i915_page_table *stash; }; + bool is_compact; }; struct i915_page_directory { diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 48e6e2f87700..043652dc6892 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) return ERR_PTR(-ENOMEM); } + pt->is_compact = false; atomic_set(&pt->used, 0); return pt; } -- 2.25.1