From: Thomas Hellström
Date: Mon, 31 Jan 2022 12:19:55 +0100
Subject: Re: [Intel-gfx] [PATCH v5 3/5] drm/i915: support 64K GTT pages for discrete cards
To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter
Cc: intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Matthew Auld
Message-ID: <326070cc-2e8c-0196-940e-7beb0704cd4f@linux.intel.com>
In-Reply-To: <20220125193530.3272386-4-bob.beckett@collabora.com>
References: <20220125193530.3272386-1-bob.beckett@collabora.com> <20220125193530.3272386-4-bob.beckett@collabora.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/25/22 20:35, Robert Beckett wrote:
> From: Matthew Auld
>
> discrete cards optimise 64K GTT pages for local-memory, since everything
> should be allocated at 64K granularity. We say goodbye to sparse
> entries, and instead get a compact 256B page-table for 64K pages,
> which should be more cache friendly. 4K pages for local-memory
> are no longer supported by the HW.
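
For anyone following the sizes quoted above: the 256B figure falls out of simple arithmetic, assuming 8-byte PTEs and a page-table that spans 2M of address space (512 entries at 4K). The sketch below is only an illustration of that arithmetic; the `sketch_*` names and constants are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

enum {
	SKETCH_PTE_BYTES   = 8,        /* assumption: one 64-bit PTE */
	SKETCH_PT_COVERAGE = 2 << 20,  /* assumption: one page-table maps 2M */
};

/* Number of entries needed to map one page-table's range at a given page size. */
static inline uint32_t sketch_entries_per_pt(uint32_t page_size)
{
	return SKETCH_PT_COVERAGE / page_size;
}

/* Bytes of page-table actually consumed by those entries. */
static inline uint32_t sketch_pt_bytes(uint32_t page_size)
{
	return sketch_entries_per_pt(page_size) * SKETCH_PTE_BYTES;
}
```

With 4K pages this gives 512 entries filling a 4K page-table; with 64K pages it gives 32 entries in 256B, which matches the "32 entries per pt" wording in the selftest comment.
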
>
> v4: don't return uninitialized err in igt_ppgtt_compact
> Reported-by: kernel test robot

Reviewed-by: Thomas Hellström

>
> Signed-off-by: Matthew Auld
> Signed-off-by: Stuart Summers
> Signed-off-by: Ramalingam C
> Signed-off-by: Robert Beckett
> Cc: Joonas Lahtinen
> Cc: Rodrigo Vivi
> ---
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++++++++
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 +++++++++++++++++-
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
>  4 files changed, 169 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index f36191ebf964..a7d9bdb85d70 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg)
>  	return err;
>  }
>
> +static int igt_ppgtt_compact(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct drm_i915_gem_object *obj;
> +	int err;
> +
> +	/*
> +	 * Simple test to catch issues with compact 64K pages -- since the pt is
> +	 * compacted to 256B that gives us 32 entries per pt, however since the
> +	 * backing page for the pt is 4K, any extra entries we might incorrectly
> +	 * write out should be ignored by the HW. If ever hit such a case this
> +	 * test should catch it since some of our writes would land in scratch.
> +	 */
> +
> +	if (!HAS_64K_PAGES(i915)) {
> +		pr_info("device lacks compact 64K page support, skipping\n");
> +		return 0;
> +	}
> +
> +	if (!HAS_LMEM(i915)) {
> +		pr_info("device lacks LMEM support, skipping\n");
> +		return 0;
> +	}
> +
> +	/* We want the range to cover multiple page-table boundaries. */
> +	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
> +	if (IS_ERR(obj))
> +		return PTR_ERR(obj);
> +
> +	err = i915_gem_object_pin_pages_unlocked(obj);
> +	if (err)
> +		goto out_put;
> +
> +	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
> +		pr_info("LMEM compact unable to allocate huge-page(s)\n");
> +		goto out_unpin;
> +	}
> +
> +	/*
> +	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
> +	 * insertion.
> +	 */
> +	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
> +
> +	err = igt_write_huge(i915, obj);
> +	if (err)
> +		pr_err("LMEM compact write-huge failed\n");
> +
> +out_unpin:
> +	i915_gem_object_unpin_pages(obj);
> +out_put:
> +	i915_gem_object_put(obj);
> +
> +	if (err == -ENOMEM)
> +		err = 0;
> +
> +	return err;
> +}
> +
>  static int igt_tmpfs_fallback(void *arg)
>  {
>  	struct drm_i915_private *i915 = arg;
> @@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
>  		SUBTEST(igt_tmpfs_fallback),
>  		SUBTEST(igt_ppgtt_smoke_huge),
>  		SUBTEST(igt_ppgtt_sanity_check),
> +		SUBTEST(igt_ppgtt_compact),
>  	};
>
>  	if (!HAS_PPGTT(i915)) {
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index c43e724afa9f..62471730266c 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>  					   start, end, lvl);
>  		} else {
>  			unsigned int count;
> +			unsigned int pte = gen8_pd_index(start, 0);
> +			unsigned int num_ptes;
>  			u64 *vaddr;
>
>  			count = gen8_pt_count(start, end);
> @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>  			    atomic_read(&pt->used));
>  			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
>
> +			num_ptes = count;
> +			if (pt->is_compact) {
> +				GEM_BUG_ON(num_ptes % 16);
> +				GEM_BUG_ON(pte % 16);
> +				num_ptes /= 16;
> +				pte /= 16;
> +			}
> +
>  			vaddr = px_vaddr(pt);
> -			memset64(vaddr + gen8_pd_index(start, 0),
> +			memset64(vaddr + pte,
>  				 vm->scratch[0]->encode,
> -				 count);
> +				 num_ptes);
>
>  			atomic_sub(count, &pt->used);
>  			start += count;
> @@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>  	return idx;
>  }
>
> +static void
> +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
> +			  struct i915_vma_resource *vma_res,
> +			  struct sgt_dma *iter,
> +			  enum i915_cache_level cache_level,
> +			  u32 flags)
> +{
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
> +	unsigned int rem = sg_dma_len(iter->sg);
> +	u64 start = vma_res->start;
> +
> +	GEM_BUG_ON(!i915_vm_is_4lvl(vm));
> +
> +	do {
> +		struct i915_page_directory * const pdp =
> +			gen8_pdp_for_page_address(vm, start);
> +		struct i915_page_directory * const pd =
> +			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
> +		struct i915_page_table *pt =
> +			i915_pt_entry(pd, __gen8_pte_index(start, 1));
> +		gen8_pte_t encode = pte_encode;
> +		unsigned int page_size;
> +		gen8_pte_t *vaddr;
> +		u16 index, max;
> +
> +		max = I915_PDES;
> +
> +		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
> +		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
> +		    rem >= I915_GTT_PAGE_SIZE_2M &&
> +		    !__gen8_pte_index(start, 0)) {
> +			index = __gen8_pte_index(start, 1);
> +			encode |= GEN8_PDE_PS_2M;
> +			page_size = I915_GTT_PAGE_SIZE_2M;
> +
> +			vaddr = px_vaddr(pd);
> +		} else {
> +			if (encode & GEN12_PPGTT_PTE_LM) {
> +				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
> +				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
> +				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
> +						       I915_GTT_PAGE_SIZE_64K));
> +
> +				index = __gen8_pte_index(start, 0) / 16;
> +				page_size = I915_GTT_PAGE_SIZE_64K;
> +
> +				max /= 16;
> +
> +				vaddr = px_vaddr(pd);
> +				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
> +
> +				pt->is_compact = true;
> +			} else {
> +				GEM_BUG_ON(pt->is_compact);
> +				index = __gen8_pte_index(start, 0);
> +				page_size = I915_GTT_PAGE_SIZE;
> +			}
> +
> +			vaddr = px_vaddr(pt);
> +		}
> +
> +		do {
> +			GEM_BUG_ON(rem < page_size);
> +			vaddr[index++] = encode | iter->dma;
> +
> +			start += page_size;
> +			iter->dma += page_size;
> +			rem -= page_size;
> +			if (iter->dma >= iter->max) {
> +				iter->sg = __sg_next(iter->sg);
> +				if (!iter->sg)
> +					break;
> +
> +				rem = sg_dma_len(iter->sg);
> +				if (!rem)
> +					break;
> +
> +				iter->dma = sg_dma_address(iter->sg);
> +				iter->max = iter->dma + rem;
> +
> +				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
> +					break;
> +			}
> +		} while (rem >= page_size && index < max);
> +
> +		vma_res->page_sizes_gtt |= page_size;
> +	} while (iter->sg && sg_dma_len(iter->sg));
> +}
> +
>  static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>  				   struct i915_vma_resource *vma_res,
>  				   struct sgt_dma *iter,
> @@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>  	struct sgt_dma iter = sgt_dma(vma_res);
>
>  	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
> -		gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +		if (HAS_64K_PAGES(vm->i915))
> +			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +		else
> +			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>  	} else {
>  		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index ba9f040f8606..e6ce0be6d484 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
>
>  #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
>
> +#define GEN12_PDE_64K BIT(6)
> +
>  /*
>   * Cacheability Control is a 4-bit value. The low three bits are stored in bits
>   * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> @@ -160,6 +162,7 @@ struct i915_page_table {
>  		atomic_t used;
>  		struct i915_page_table *stash;
>  	};
> +	bool is_compact;
>  };
>
>  struct i915_page_directory {
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 48e6e2f87700..043652dc6892 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
>  		return ERR_PTR(-ENOMEM);
>  	}
>
> +	pt->is_compact = false;
>  	atomic_set(&pt->used, 0);
>  	return pt;
>  }
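
As a reading aid for the page-size selection in xehpsdv_ppgtt_insert_huge() quoted above, here is a rough standalone model of how the entry size, starting index and index limit appear to be chosen. It is a sketch under stated assumptions, not driver code: every `sketch_*` / `SKETCH_*` name is hypothetical, the 512-entry table and 9-bit-per-level index layout are assumed, and the GEM_BUG_ON() alignment checks from the patch are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_4K   (UINT64_C(1) << 12)
#define SKETCH_SZ_64K  (UINT64_C(1) << 16)
#define SKETCH_SZ_2M   (UINT64_C(1) << 21)

struct sketch_slot {
	uint64_t page_size;  /* size mapped by each entry written */
	unsigned int index;  /* first entry index in the table being written */
	unsigned int max;    /* one past the last usable index in that table */
	bool in_pd;          /* true when the entry is a 2M PDE in the page directory */
};

/* Assumed layout: 12-bit page offset, then 9 index bits per level. */
static inline unsigned int sketch_index(uint64_t addr, unsigned int lvl)
{
	return (unsigned int)((addr >> (12 + 9 * lvl)) & 511);
}

/*
 * Model of the three cases in the quoted loop body:
 *   1. 2M entry in the page directory when size, alignment and position allow;
 *   2. otherwise, for local memory, a 64K entry in a compact page-table
 *      (4K slot index and table size both divided by 16);
 *   3. otherwise a plain 4K entry.
 */
static inline struct sketch_slot
sketch_pick_slot(uint64_t sg_sizes, uint64_t dma, uint64_t rem,
		 uint64_t start, bool is_lmem)
{
	struct sketch_slot s = { .max = 512 };

	if ((sg_sizes & SKETCH_SZ_2M) && !(dma % SKETCH_SZ_2M) &&
	    rem >= SKETCH_SZ_2M && !sketch_index(start, 0)) {
		s.page_size = SKETCH_SZ_2M;
		s.index = sketch_index(start, 1);
		s.in_pd = true;
	} else if (is_lmem) {
		s.page_size = SKETCH_SZ_64K;
		s.index = sketch_index(start, 0) / 16;
		s.max /= 16;
	} else {
		s.page_size = SKETCH_SZ_4K;
		s.index = sketch_index(start, 0);
	}
	return s;
}
```

The same divide-by-16 scaling is what the __gen8_ppgtt_clear() hunk applies to `pte` and `num_ptes` when `pt->is_compact` is set, so insert and clear index the compact table consistently.
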