Received: by 2002:a05:6a10:af89:0:0:0:0 with SMTP id iu9csp281432pxb; Thu, 20 Jan 2022 13:24:56 -0800 (PST) X-Google-Smtp-Source: ABdhPJxc1pFRJZXjjUKCcZdez+skRrW2TojRUwT37LWF0QC3q16sEV7fcydp/7RikUZwyQrOKbHa X-Received: by 2002:a17:90b:3143:: with SMTP id ip3mr1028467pjb.161.1642713896563; Thu, 20 Jan 2022 13:24:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1642713896; cv=none; d=google.com; s=arc-20160816; b=jqjThaW0SmcyW/uViOoo9jvhmMsC7eYMStSE7aX+R77krImZjAoCgaCUXqjrZvzuKw qqtgTo7nVuvVBjqXPs7ZUQ2IyRVhEun1BFXvaJABm3oWF5iLetWBegnOUBHJtNf5VUCe uipMeGSYbJ4NAbXS3xlgYS4EeNH847F71F1MelQ9jCJc150Lyej8Q/sbLYU1fiVnKII1 WvABQnox/p923qeGMW1kWopgTvw0S4ODeD9EEwZF7Ka4pm0UU4B8zirUsI+tl8u+/egv yL8T4JT2FKc0u7gyutz0a+a/VcmeX11hUuUDp29Rq1d6gPI2zWqn9NPSxcH1KsVyrT7K d6Vw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=o2FO2n/hhymtoqf/qyUX91Nr5khfxfnt1sI0etxnAto=; b=Uam1PrVU1kTOcaKouMzx/AS0MBKNUZm2pHXiqnXMrYpgAAnuMF0I+pzqmHYQ5LNmIX LAtZyQuPZTEzCaH9vV2B+Cb2QgrafH/8IqyoauyEOQl9W93/KnqoIWPKKn+ajnpYN47r znaojhAOlgbPQKyipr83tewhJuNgxjA2qaclmumZgOjKFBnyuufmdLtGw+cUjfGjNQ3S V7cdCWUTQwrtb5sLpRv+S/W4HSPB6qndVUl8i0KIulg0a9T5EU45GTSTO8ttBJPZyVy9 Qbh5Ll6kxm1u0LvxJt9qXKTqlvoghk1I+xKEyQLuWkjmNt98FnyjfRAb5KU/OH+P3xu1 Tu9Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=UDoGuuKU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=collabora.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id k9si5363233pfc.320.2022.01.20.13.24.42; Thu, 20 Jan 2022 13:24:56 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=UDoGuuKU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=collabora.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347651AbiARRxl (ORCPT + 99 others); Tue, 18 Jan 2022 12:53:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48092 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347632AbiARRxk (ORCPT ); Tue, 18 Jan 2022 12:53:40 -0500 Received: from bhuna.collabora.co.uk (bhuna.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e3e3]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AF61C06161C for ; Tue, 18 Jan 2022 09:53:39 -0800 (PST) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: bbeckett) with ESMTPSA id AAB091F43F8C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1642528418; bh=NPQ2CknYxLM3KXAKcRXKUCFTUTsy40EPkXZ1YytxHV8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UDoGuuKUGsrB3ROZwXj9mi/44M2d+NMyCy1B0SvOT2ks0Z9yQSwcV9Thiy0Z+Esl8 s3Ue+BtxsLmFWKCKrAalS761XRr+fYAD/+2wxxHGvtx3AMXPaKyyyCvXjJkm/ssbuV DR/brwMmUL26v41PQ61YK64kBx1fxtJBs8mWNod7GaZcblijPCjY3HPqDuWfHIYrtB rfc5cJ5jJh830vGvkUKftL8zTjTAG5sSHQuWSjPInvBKWcopnMLCervEaeG3SzaIcO eh3xfvPFWhnbgV8Ce1uQMisHS1g41tJY/W+bVYy1l/2SAXwpCVOtwlR3hy/xhfRSnR 6lQWG+lIn2drw== From: Robert Beckett To: Jani Nikula , Joonas Lahtinen , Rodrigo Vivi , Tvrtko Ursulin , David Airlie , Daniel Vetter Cc: Matthew Auld , Stuart Summers , Ramalingam C , intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 2/4] drm/i915: support 64K GTT pages for discrete cards Date: Tue, 18 Jan 2022 17:50:35 +0000 Message-Id: <20220118175036.3840934-3-bob.beckett@collabora.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220118175036.3840934-1-bob.beckett@collabora.com> References: <20220118175036.3840934-1-bob.beckett@collabora.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Auld discrete cards optimise 64K GTT pages for local-memory, since everything should be allocated at 64K granularity. We say goodbye to sparse entries, and instead get a compact 256B page-table for 64K pages, which should be more cache friendly. 4K pages for local-memory are no longer supported by the HW. Signed-off-by: Matthew Auld Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Cc: Joonas Lahtinen Cc: Rodrigo Vivi --- .../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++++++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 108 +++++++++++++++++- drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + 4 files changed, 169 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 26f997c376a2..7efa6a598b03 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg) return err; } +static int igt_ppgtt_compact(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct drm_i915_gem_object *obj; + int err; + + /* + * Simple test to catch issues with compact 64K pages -- since the pt is + * compacted to 256B that gives us 32 entries per pt, however since the + * backing page for the pt is 4K, any extra entries we might incorrectly + * write out should be ignored by the HW. If ever hit such a case this + * test should catch it since some of our writes would land in scratch. + */ + + if (!HAS_64K_PAGES(i915)) { + pr_info("device lacks compact 64K page support, skipping\n"); + return 0; + } + + if (!HAS_LMEM(i915)) { + pr_info("device lacks LMEM support, skipping\n"); + return 0; + } + + /* We want the range to cover multiple page-table boundaries. */ + obj = i915_gem_object_create_lmem(i915, SZ_4M, 0); + if (IS_ERR(obj)) + return err; + + err = i915_gem_object_pin_pages_unlocked(obj); + if (err) + goto out_put; + + if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) { + pr_info("LMEM compact unable to allocate huge-page(s)\n"); + goto out_unpin; + } + + /* + * Disable 2M GTT pages by forcing the page-size to 64K for the GTT + * insertion. + */ + obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K; + + err = igt_write_huge(i915, obj); + if (err) + pr_err("LMEM compact write-huge failed\n"); + +out_unpin: + i915_gem_object_unpin_pages(obj); +out_put: + i915_gem_object_put(obj); + + if (err == -ENOMEM) + err = 0; + + return err; +} + static int igt_tmpfs_fallback(void *arg) { struct drm_i915_private *i915 = arg; @@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_tmpfs_fallback), SUBTEST(igt_ppgtt_smoke_huge), SUBTEST(igt_ppgtt_sanity_check), + SUBTEST(igt_ppgtt_compact), }; if (!HAS_PPGTT(i915)) { diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index c43e724afa9f..62471730266c 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, start, end, lvl); } else { unsigned int count; + unsigned int pte = gen8_pd_index(start, 0); + unsigned int num_ptes; u64 *vaddr; count = gen8_pt_count(start, end); @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used)); + num_ptes = count; + if (pt->is_compact) { + GEM_BUG_ON(num_ptes % 16); + GEM_BUG_ON(pte % 16); + num_ptes /= 16; + pte /= 16; + } + vaddr = px_vaddr(pt); - memset64(vaddr + gen8_pd_index(start, 0), + memset64(vaddr + pte, vm->scratch[0]->encode, - count); + num_ptes); atomic_sub(count, &pt->used); start += count; @@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; } +static void +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, + struct sgt_dma *iter, + enum i915_cache_level cache_level, + u32 flags) +{ + const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags); + unsigned int rem = sg_dma_len(iter->sg); + u64 start = vma_res->start; + + GEM_BUG_ON(!i915_vm_is_4lvl(vm)); + + do { + struct i915_page_directory * const pdp = + gen8_pdp_for_page_address(vm, start); + struct i915_page_directory * const pd = + i915_pd_entry(pdp, __gen8_pte_index(start, 2)); + struct i915_page_table *pt = + i915_pt_entry(pd, __gen8_pte_index(start, 1)); + gen8_pte_t encode = pte_encode; + unsigned int page_size; + gen8_pte_t *vaddr; + u16 index, max; + + max = I915_PDES; + + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && + IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && + rem >= I915_GTT_PAGE_SIZE_2M && + !__gen8_pte_index(start, 0)) { + index = __gen8_pte_index(start, 1); + encode |= GEN8_PDE_PS_2M; + page_size = I915_GTT_PAGE_SIZE_2M; + + vaddr = px_vaddr(pd); + } else { + if (encode & GEN12_PPGTT_PTE_LM) { + GEM_BUG_ON(__gen8_pte_index(start, 0) % 16); + GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K); + GEM_BUG_ON(!IS_ALIGNED(iter->dma, + I915_GTT_PAGE_SIZE_64K)); + + index = __gen8_pte_index(start, 0) / 16; + page_size = I915_GTT_PAGE_SIZE_64K; + + max /= 16; + + vaddr = px_vaddr(pd); + vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K; + + pt->is_compact = true; + } else { + GEM_BUG_ON(pt->is_compact); + index = __gen8_pte_index(start, 0); + page_size = I915_GTT_PAGE_SIZE; + } + + vaddr = px_vaddr(pt); + } + + do { + GEM_BUG_ON(rem < page_size); + vaddr[index++] = encode | iter->dma; + + start += page_size; + iter->dma += page_size; + rem -= page_size; + if (iter->dma >= iter->max) { + iter->sg = __sg_next(iter->sg); + if (!iter->sg) + break; + + rem = sg_dma_len(iter->sg); + if (!rem) + break; + + iter->dma = sg_dma_address(iter->sg); + iter->max = iter->dma + rem; + + if (unlikely(!IS_ALIGNED(iter->dma, page_size))) + break; + } + } while (rem >= page_size && index < max); + + vma_res->page_sizes_gtt |= page_size; + } while (iter->sg && sg_dma_len(iter->sg)); +} + static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, struct i915_vma_resource *vma_res, struct sgt_dma *iter, @@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, struct sgt_dma iter = sgt_dma(vma_res); if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) { - gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + if (HAS_64K_PAGES(vm->i915)) + xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + else + gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); } else { u64 idx = vma_res->start >> GEN8_PTE_SHIFT; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index b8da2514d601..0ab1fee9c587 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t; #define GEN12_GGTT_PTE_LM BIT_ULL(1) +#define GEN12_PDE_64K BIT(6) + /* * Cacheability Control is a 4-bit value. The low three bits are stored in bits * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE. @@ -160,6 +162,7 @@ struct i915_page_table { atomic_t used; struct i915_page_table *stash; }; + bool is_compact; }; struct i915_page_directory { diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 48e6e2f87700..043652dc6892 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) return ERR_PTR(-ENOMEM); } + pt->is_compact = false; atomic_set(&pt->used, 0); return pt; } -- 2.25.1