2022-06-27 14:39:41

by Mikko Perttunen

Subject: [PATCH v7/v3 00/22] Host1x context isolation / Tegra234 support

From: Mikko Perttunen <[email protected]>

Integrated the Host1x context isolation series (patches 1 to 8) and
Tegra234 support series (patches 9 to 22) in one email thread for
the benefit of automatic testers.

Changes from previous versions:

Context isolation:
* Improved check to ensure context devices are attached to IOMMU
* Fixed build failure when CONFIG_IOMMU_API=n as reported by bot
* Dropped Thierry's Host1x schema YAML conversion from this series
-- it was accidentally included in the previous version
* Also dropped arm-smmu change for now. It can be merged later if
necessary.

Tegra234:
* Split bracketing fix in DT schema to separate patch
* Added Acked-by

Thanks,
Mikko

Mikko Perttunen (22):
dt-bindings: host1x: Add iommu-map property
gpu: host1x: Add context device management code
gpu: host1x: Program context stream ID on submission
arm64: tegra: Add Host1x context stream IDs on Tegra186+
drm/tegra: falcon: Set DMACTX field on DMA transactions
drm/tegra: nvdec: Fix TRANSCFG register offset
drm/tegra: Support context isolation
drm/tegra: Implement stream ID related callbacks on engines
dt-bindings: Add bindings for Tegra234 Host1x and VIC
dt-bindings: host1x: Fix bracketing in example
dt-bindings: Add headers for Host1x and VIC on Tegra234
arm64: tegra: Add Host1x and VIC on Tegra234
gpu: host1x: Deduplicate hardware headers
gpu: host1x: Simplify register mapping and add common aperture
gpu: host1x: Program virtualization tables
gpu: host1x: Allow reset to be missing
gpu: host1x: Program interrupt destinations on Tegra234
gpu: host1x: Tegra234 device data and headers
gpu: host1x: Rewrite job opcode sequence
gpu: host1x: Add MLOCK release code on Tegra234
gpu: host1x: Use RESTART_W to skip timed out jobs on Tegra186+
drm/tegra: vic: Add Tegra234 support

.../display/tegra/nvidia,tegra124-vic.yaml | 1 +
.../display/tegra/nvidia,tegra20-host1x.yaml | 115 +++++++++--
arch/arm64/boot/dts/nvidia/tegra186.dtsi | 11 ++
arch/arm64/boot/dts/nvidia/tegra194.dtsi | 11 ++
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 46 +++++
drivers/gpu/drm/tegra/drm.c | 1 +
drivers/gpu/drm/tegra/drm.h | 11 ++
drivers/gpu/drm/tegra/falcon.c | 8 +
drivers/gpu/drm/tegra/falcon.h | 1 +
drivers/gpu/drm/tegra/nvdec.c | 13 +-
drivers/gpu/drm/tegra/submit.c | 48 ++++-
drivers/gpu/drm/tegra/uapi.c | 43 ++++-
drivers/gpu/drm/tegra/vic.c | 79 +++++++-
drivers/gpu/host1x/Makefile | 6 +-
drivers/gpu/host1x/cdma.c | 19 +-
drivers/gpu/host1x/context.c | 160 ++++++++++++++++
drivers/gpu/host1x/context.h | 38 ++++
drivers/gpu/host1x/dev.c | 124 ++++++++----
drivers/gpu/host1x/dev.h | 13 ++
drivers/gpu/host1x/hw/cdma_hw.c | 34 ++++
drivers/gpu/host1x/hw/channel_hw.c | 136 +++++++++----
drivers/gpu/host1x/hw/host1x01_hardware.h | 114 +----------
drivers/gpu/host1x/hw/host1x02_hardware.h | 113 +----------
drivers/gpu/host1x/hw/host1x04_hardware.h | 113 +----------
drivers/gpu/host1x/hw/host1x05_hardware.h | 113 +----------
drivers/gpu/host1x/hw/host1x06_hardware.h | 118 +-----------
drivers/gpu/host1x/hw/host1x07_hardware.h | 118 +-----------
drivers/gpu/host1x/hw/host1x08.c | 33 ++++
drivers/gpu/host1x/hw/host1x08.h | 15 ++
drivers/gpu/host1x/hw/host1x08_hardware.h | 21 ++
drivers/gpu/host1x/hw/hw_host1x08_channel.h | 11 ++
drivers/gpu/host1x/hw/hw_host1x08_common.h | 11 ++
.../gpu/host1x/hw/hw_host1x08_hypervisor.h | 9 +
drivers/gpu/host1x/hw/hw_host1x08_uclass.h | 181 ++++++++++++++++++
drivers/gpu/host1x/hw/hw_host1x08_vm.h | 36 ++++
drivers/gpu/host1x/hw/intr_hw.c | 11 ++
drivers/gpu/host1x/hw/opcodes.h | 150 +++++++++++++++
include/dt-bindings/clock/tegra234-clock.h | 4 +
include/dt-bindings/memory/tegra234-mc.h | 5 +
.../dt-bindings/power/tegra234-powergate.h | 1 +
include/dt-bindings/reset/tegra234-reset.h | 1 +
include/linux/host1x.h | 42 ++++
42 files changed, 1357 insertions(+), 781 deletions(-)
create mode 100644 drivers/gpu/host1x/context.c
create mode 100644 drivers/gpu/host1x/context.h
create mode 100644 drivers/gpu/host1x/hw/host1x08.c
create mode 100644 drivers/gpu/host1x/hw/host1x08.h
create mode 100644 drivers/gpu/host1x/hw/host1x08_hardware.h
create mode 100644 drivers/gpu/host1x/hw/hw_host1x08_channel.h
create mode 100644 drivers/gpu/host1x/hw/hw_host1x08_common.h
create mode 100644 drivers/gpu/host1x/hw/hw_host1x08_hypervisor.h
create mode 100644 drivers/gpu/host1x/hw/hw_host1x08_uclass.h
create mode 100644 drivers/gpu/host1x/hw/hw_host1x08_vm.h
create mode 100644 drivers/gpu/host1x/hw/opcodes.h

--
2.36.1


2022-06-27 14:40:22

by Mikko Perttunen

Subject: [PATCH v7/v3 01/22] dt-bindings: host1x: Add iommu-map property

From: Mikko Perttunen <[email protected]>

Add schema information for specifying context stream IDs. This uses
the standard iommu-map property.

Signed-off-by: Mikko Perttunen <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Acked-by: Rob Herring <[email protected]>
---
v3:
* New patch
v4:
* Remove memory-contexts subnode.
---
.../bindings/display/tegra/nvidia,tegra20-host1x.yaml | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
index 4fd513efb0f7..0adeb03b9e3a 100644
--- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
+++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
@@ -144,6 +144,11 @@ allOf:
reset-names:
maxItems: 1

+ iommu-map:
+ description: Specification of stream IDs available for memory context device
+ use. Should be a mapping of IDs 0..n to IOMMU entries corresponding to
+ usable stream IDs.
+
required:
- reg-names

--
2.36.1
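
As an illustration of the "mapping of IDs 0..n to IOMMU entries" described in the binding above, here is a minimal sketch of how a consumer might translate a context ID into a stream ID. It assumes the standard iommu-map entry layout (id-base, IOMMU phandle, out-base, length) and ignores the phandle, so a single IOMMU is assumed. The struct, the function name and the stream ID values in example_map are hypothetical and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One iommu-map entry, following the standard layout
 * (id-base, IOMMU phandle, out-base, length).
 * Assumption: the phandle is dropped because only one IOMMU is modelled.
 */
struct iommu_map_entry {
	uint32_t id_base;   /* first input ID covered by this entry */
	uint32_t out_base;  /* stream ID that id_base maps to */
	uint32_t length;    /* number of consecutive IDs covered */
};

/* Hypothetical table: context IDs 0 and 1, one stream ID each. */
static const struct iommu_map_entry example_map[] = {
	{ 0, 0x35, 1 },
	{ 1, 0x36, 1 },
};

/*
 * Translate a context ID into a stream ID.
 * Returns 0 and stores the result in *sid on success,
 * or -1 if no entry covers the ID.
 */
static int iommu_map_lookup(const struct iommu_map_entry *map, size_t n,
			    uint32_t id, uint32_t *sid)
{
	for (size_t i = 0; i < n; i++) {
		if (id >= map[i].id_base &&
		    id - map[i].id_base < map[i].length) {
			*sid = map[i].out_base + (id - map[i].id_base);
			return 0;
		}
	}

	return -1;
}
```

With this table, looking up context ID 1 yields stream ID 0x36, and an ID outside 0..1 is reported as unmapped. The kernel has its own helpers for parsing this property, so the sketch only illustrates the lookup semantics.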

2022-06-27 14:41:50

by Mikko Perttunen

Subject: [PATCH v7/v3 04/22] arm64: tegra: Add Host1x context stream IDs on Tegra186+

From: Mikko Perttunen <[email protected]>

Add Host1x context stream IDs on systems that support Host1x context
isolation. Host1x and attached engines can use these stream IDs to
allow isolation between memory used by different processes.

The specified stream IDs must match those configured by the hypervisor,
if one is present.

Signed-off-by: Mikko Perttunen <[email protected]>
---
v2:
* Added context devices on T194.
* Use iommu-map instead of custom property.
v4:
* Remove memory-contexts subnode.
---
arch/arm64/boot/dts/nvidia/tegra186.dtsi | 11 +++++++++++
arch/arm64/boot/dts/nvidia/tegra194.dtsi | 11 +++++++++++
2 files changed, 22 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 0e9afc3e2f26..5f560f13ed93 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -1461,6 +1461,17 @@ host1x@13e00000 {

iommus = <&smmu TEGRA186_SID_HOST1X>;

+ /* Context isolation domains */
+ iommu-map = <
+ 0 &smmu TEGRA186_SID_HOST1X_CTX0 1
+ 1 &smmu TEGRA186_SID_HOST1X_CTX1 1
+ 2 &smmu TEGRA186_SID_HOST1X_CTX2 1
+ 3 &smmu TEGRA186_SID_HOST1X_CTX3 1
+ 4 &smmu TEGRA186_SID_HOST1X_CTX4 1
+ 5 &smmu TEGRA186_SID_HOST1X_CTX5 1
+ 6 &smmu TEGRA186_SID_HOST1X_CTX6 1
+ 7 &smmu TEGRA186_SID_HOST1X_CTX7 1>;
+
dpaux1: dpaux@15040000 {
compatible = "nvidia,tegra186-dpaux";
reg = <0x15040000 0x10000>;
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index d1f8248c00f4..613fd71dec25 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1769,6 +1769,17 @@ host1x@13e00000 {
interconnect-names = "dma-mem";
iommus = <&smmu TEGRA194_SID_HOST1X>;

+ /* Context isolation domains */
+ iommu-map = <
+ 0 &smmu TEGRA194_SID_HOST1X_CTX0 1
+ 1 &smmu TEGRA194_SID_HOST1X_CTX1 1
+ 2 &smmu TEGRA194_SID_HOST1X_CTX2 1
+ 3 &smmu TEGRA194_SID_HOST1X_CTX3 1
+ 4 &smmu TEGRA194_SID_HOST1X_CTX4 1
+ 5 &smmu TEGRA194_SID_HOST1X_CTX5 1
+ 6 &smmu TEGRA194_SID_HOST1X_CTX6 1
+ 7 &smmu TEGRA194_SID_HOST1X_CTX7 1>;
+
nvdec@15140000 {
compatible = "nvidia,tegra194-nvdec";
reg = <0x15140000 0x00040000>;
--
2.36.1

2022-06-27 14:42:05

by Mikko Perttunen

Subject: [PATCH v7/v3 12/22] arm64: tegra: Add Host1x and VIC on Tegra234

From: Mikko Perttunen <[email protected]>

Add device tree nodes for Host1x and VIC on Tegra234.

Signed-off-by: Mikko Perttunen <[email protected]>
---
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 46 ++++++++++++++++++++++++
1 file changed, 46 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index cb3af539e477..cae68e59580c 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -454,6 +454,52 @@ misc@100000 {
status = "okay";
};

+ host1x@13e00000 {
+ compatible = "nvidia,tegra234-host1x";
+ reg = <0x13e00000 0x10000>,
+ <0x13e10000 0x10000>,
+ <0x13e40000 0x10000>;
+ reg-names = "common", "hypervisor", "vm";
+ interrupts = <GIC_SPI 448 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 449 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 450 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 451 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 452 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 453 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 454 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 455 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 263 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "syncpt0", "syncpt1", "syncpt2", "syncpt3", "syncpt4",
+ "syncpt5", "syncpt6", "syncpt7", "host1x";
+ clocks = <&bpmp TEGRA234_CLK_HOST1X>;
+ clock-names = "host1x";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ ranges = <0x15000000 0x15000000 0x01000000>;
+ interconnects = <&mc TEGRA234_MEMORY_CLIENT_HOST1XDMAR &emc>;
+ interconnect-names = "dma-mem";
+ iommus = <&smmu_niso1 TEGRA234_SID_HOST1X>;
+
+ vic@15340000 {
+ compatible = "nvidia,tegra234-vic";
+ reg = <0x15340000 0x00040000>;
+ interrupts = <GIC_SPI 206 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&bpmp TEGRA234_CLK_VIC>;
+ clock-names = "vic";
+ resets = <&bpmp TEGRA234_RESET_VIC>;
+ reset-names = "vic";
+
+ power-domains = <&bpmp TEGRA234_POWER_DOMAIN_VIC>;
+ interconnects = <&mc TEGRA234_MEMORY_CLIENT_VICSRD &emc>,
+ <&mc TEGRA234_MEMORY_CLIENT_VICSWR &emc>;
+ interconnect-names = "dma-mem", "write";
+ iommus = <&smmu_niso1 TEGRA234_SID_VIC>;
+ dma-coherent;
+ };
+ };
+
gpio: gpio@2200000 {
compatible = "nvidia,tegra234-gpio";
reg-names = "security", "gpio";
--
2.36.1

2022-06-27 14:48:45

by Mikko Perttunen

Subject: [PATCH v7/v3 21/22] gpu: host1x: Use RESTART_W to skip timed out jobs on Tegra186+

From: Mikko Perttunen <[email protected]>

When MLOCK enforcement is enabled, the 0-word write currently done
is rejected by the hardware outside of an MLOCK region. As such,
on these chips, which also have the newer, more convenient RESTART_W
opcode, use that instead to skip over the timed out job.

Signed-off-by: Mikko Perttunen <[email protected]>
---
drivers/gpu/host1x/cdma.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 765e5aa64eb6..bb1f3c746be4 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -457,9 +457,24 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
* to offset 0xbad. This does nothing but
* has a easily detected signature in debug
* traces.
+ *
+ * On systems with MLOCK enforcement enabled,
+ * the above 0 word writes would fall foul of
+ * the enforcement. As such, in the first slot
+ * put a RESTART_W opcode to the beginning
+ * of the next job. We don't use this for older
+ * chips since those only support the RESTART
+ * opcode with inconvenient alignment requirements.
*/
- mapped[2*slot+0] = 0x1bad0000;
- mapped[2*slot+1] = 0x1bad0000;
+ if (i == 0 && host1x->info->has_wide_gather) {
+ unsigned int next_job = (job->first_get/8 + job->num_slots)
+ % HOST1X_PUSHBUFFER_SLOTS;
+ mapped[2*slot+0] = (0xd << 28) | (next_job * 2);
+ mapped[2*slot+1] = 0x0;
+ } else {
+ mapped[2*slot+0] = 0x1bad0000;
+ mapped[2*slot+1] = 0x1bad0000;
+ }
}

job->cancelled = true;
--
2.36.1
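
The arithmetic in the hunk above can be tried in isolation. The following is a rough standalone sketch, not the driver code: the helper names are hypothetical, and PUSHBUFFER_SLOTS is assumed to be 511 to mirror HOST1X_PUSHBUFFER_SLOTS, which should be checked against cdma.c. The opcode value 0xd and the "slot index times two" operand are copied from the patch. The sketch uses an unsigned constant for the shift so that setting bit 31 is well defined in portable C.

```c
#include <stdint.h>

/* Assumption: mirrors HOST1X_PUSHBUFFER_SLOTS in the driver. */
#define PUSHBUFFER_SLOTS 511u

/*
 * Slot index at which the next job starts. first_get is a byte offset
 * into the push buffer and each slot is 8 bytes (two 32-bit words).
 * The modulo handles wrap-around at the end of the push buffer.
 */
static uint32_t next_job_slot(uint32_t first_get, uint32_t num_slots)
{
	return (first_get / 8 + num_slots) % PUSHBUFFER_SLOTS;
}

/*
 * Build the RESTART_W word as the patch does: opcode 0xd in bits 31:28,
 * with the target given as the slot index multiplied by two.
 */
static uint32_t restart_w_word(uint32_t next_slot)
{
	return (UINT32_C(0xd) << 28) | (next_slot * 2);
}
```

For example, a job with first_get = 64 and num_slots = 4 starts at slot 8, so the next job starts at slot 12 and the word written to the first slot is 0xD0000018. A job starting at slot 509 with four slots wraps around to slot 2.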

2022-06-27 14:48:45

by Mikko Perttunen

Subject: [PATCH v7/v3 13/22] gpu: host1x: Deduplicate hardware headers

From: Mikko Perttunen <[email protected]>

Host1x class information and opcodes are unchanged or backwards
compatible across SoCs so let's not duplicate them for each one
but have them in a shared header file.

At the same time, add opcode functions for acquire/release_mlock.

Signed-off-by: Mikko Perttunen <[email protected]>
---
drivers/gpu/host1x/hw/host1x01_hardware.h | 114 +---------------
drivers/gpu/host1x/hw/host1x02_hardware.h | 113 +---------------
drivers/gpu/host1x/hw/host1x04_hardware.h | 113 +---------------
drivers/gpu/host1x/hw/host1x05_hardware.h | 113 +---------------
drivers/gpu/host1x/hw/host1x06_hardware.h | 128 +-----------------
drivers/gpu/host1x/hw/host1x07_hardware.h | 128 +-----------------
drivers/gpu/host1x/hw/opcodes.h | 150 ++++++++++++++++++++++
7 files changed, 156 insertions(+), 703 deletions(-)
create mode 100644 drivers/gpu/host1x/hw/opcodes.h

diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
index fe59df1d3dc3..cb93d7c1808c 100644
--- a/drivers/gpu/host1x/hw/host1x01_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -15,118 +15,6 @@
#include "hw_host1x01_sync.h"
#include "hw_host1x01_uclass.h"

-static inline u32 host1x_class_host_wait_syncpt(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_wait_syncpt_indx_f(indx)
- | host1x_uclass_wait_syncpt_thresh_f(threshold);
-}
-
-static inline u32 host1x_class_host_load_syncpt_base(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_load_syncpt_base_base_indx_f(indx)
- | host1x_uclass_load_syncpt_base_value_f(threshold);
-}
-
-static inline u32 host1x_class_host_wait_syncpt_base(
- unsigned indx, unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_wait_syncpt_base_indx_f(indx)
- | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_wait_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt_base(
- unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_incr_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt(
- unsigned cond, unsigned indx)
-{
- return host1x_uclass_incr_syncpt_cond_f(cond)
- | host1x_uclass_incr_syncpt_indx_f(indx);
-}
-
-static inline u32 host1x_class_host_indoff_reg_write(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indbe_f(0xf)
- | host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset);
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-static inline u32 host1x_class_host_indoff_reg_read(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset)
- | host1x_uclass_indoff_rwn_read_v();
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-
-/* cdma opcodes */
-static inline u32 host1x_opcode_setclass(
- unsigned class_id, unsigned offset, unsigned mask)
-{
- return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
-}
-
-static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
-{
- return (1 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
-{
- return (2 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
-{
- return (3 << 28) | (offset << 16) | mask;
-}
-
-static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
-{
- return (4 << 28) | (offset << 16) | value;
-}
-
-static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
-{
- return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
- host1x_class_host_incr_syncpt(cond, indx));
-}
-
-static inline u32 host1x_opcode_restart(unsigned address)
-{
- return (5 << 28) | (address >> 4);
-}
-
-static inline u32 host1x_opcode_gather(unsigned count)
-{
- return (6 << 28) | count;
-}
-
-static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | count;
-}
-
-static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
-}
-
-#define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
+#include "opcodes.h"

#endif
diff --git a/drivers/gpu/host1x/hw/host1x02_hardware.h b/drivers/gpu/host1x/hw/host1x02_hardware.h
index af60d7fb016d..2d1282b9bc33 100644
--- a/drivers/gpu/host1x/hw/host1x02_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x02_hardware.h
@@ -15,117 +15,6 @@
#include "hw_host1x02_sync.h"
#include "hw_host1x02_uclass.h"

-static inline u32 host1x_class_host_wait_syncpt(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_wait_syncpt_indx_f(indx)
- | host1x_uclass_wait_syncpt_thresh_f(threshold);
-}
-
-static inline u32 host1x_class_host_load_syncpt_base(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_load_syncpt_base_base_indx_f(indx)
- | host1x_uclass_load_syncpt_base_value_f(threshold);
-}
-
-static inline u32 host1x_class_host_wait_syncpt_base(
- unsigned indx, unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_wait_syncpt_base_indx_f(indx)
- | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_wait_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt_base(
- unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_incr_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt(
- unsigned cond, unsigned indx)
-{
- return host1x_uclass_incr_syncpt_cond_f(cond)
- | host1x_uclass_incr_syncpt_indx_f(indx);
-}
-
-static inline u32 host1x_class_host_indoff_reg_write(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indbe_f(0xf)
- | host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset);
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-static inline u32 host1x_class_host_indoff_reg_read(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset)
- | host1x_uclass_indoff_rwn_read_v();
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-/* cdma opcodes */
-static inline u32 host1x_opcode_setclass(
- unsigned class_id, unsigned offset, unsigned mask)
-{
- return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
-}
-
-static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
-{
- return (1 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
-{
- return (2 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
-{
- return (3 << 28) | (offset << 16) | mask;
-}
-
-static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
-{
- return (4 << 28) | (offset << 16) | value;
-}
-
-static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
-{
- return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
- host1x_class_host_incr_syncpt(cond, indx));
-}
-
-static inline u32 host1x_opcode_restart(unsigned address)
-{
- return (5 << 28) | (address >> 4);
-}
-
-static inline u32 host1x_opcode_gather(unsigned count)
-{
- return (6 << 28) | count;
-}
-
-static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | count;
-}
-
-static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
-}
-
-#define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
+#include "opcodes.h"

#endif
diff --git a/drivers/gpu/host1x/hw/host1x04_hardware.h b/drivers/gpu/host1x/hw/host1x04_hardware.h
index 4f9bcddf27e3..84d244e8af30 100644
--- a/drivers/gpu/host1x/hw/host1x04_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x04_hardware.h
@@ -15,117 +15,6 @@
#include "hw_host1x04_sync.h"
#include "hw_host1x04_uclass.h"

-static inline u32 host1x_class_host_wait_syncpt(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_wait_syncpt_indx_f(indx)
- | host1x_uclass_wait_syncpt_thresh_f(threshold);
-}
-
-static inline u32 host1x_class_host_load_syncpt_base(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_load_syncpt_base_base_indx_f(indx)
- | host1x_uclass_load_syncpt_base_value_f(threshold);
-}
-
-static inline u32 host1x_class_host_wait_syncpt_base(
- unsigned indx, unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_wait_syncpt_base_indx_f(indx)
- | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_wait_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt_base(
- unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_incr_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt(
- unsigned cond, unsigned indx)
-{
- return host1x_uclass_incr_syncpt_cond_f(cond)
- | host1x_uclass_incr_syncpt_indx_f(indx);
-}
-
-static inline u32 host1x_class_host_indoff_reg_write(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indbe_f(0xf)
- | host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset);
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-static inline u32 host1x_class_host_indoff_reg_read(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset)
- | host1x_uclass_indoff_rwn_read_v();
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-/* cdma opcodes */
-static inline u32 host1x_opcode_setclass(
- unsigned class_id, unsigned offset, unsigned mask)
-{
- return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
-}
-
-static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
-{
- return (1 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
-{
- return (2 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
-{
- return (3 << 28) | (offset << 16) | mask;
-}
-
-static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
-{
- return (4 << 28) | (offset << 16) | value;
-}
-
-static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
-{
- return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
- host1x_class_host_incr_syncpt(cond, indx));
-}
-
-static inline u32 host1x_opcode_restart(unsigned address)
-{
- return (5 << 28) | (address >> 4);
-}
-
-static inline u32 host1x_opcode_gather(unsigned count)
-{
- return (6 << 28) | count;
-}
-
-static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | count;
-}
-
-static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
-}
-
-#define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
+#include "opcodes.h"

#endif
diff --git a/drivers/gpu/host1x/hw/host1x05_hardware.h b/drivers/gpu/host1x/hw/host1x05_hardware.h
index af3ab4b7f010..1dcde6ec7909 100644
--- a/drivers/gpu/host1x/hw/host1x05_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x05_hardware.h
@@ -15,117 +15,6 @@
#include "hw_host1x05_sync.h"
#include "hw_host1x05_uclass.h"

-static inline u32 host1x_class_host_wait_syncpt(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_wait_syncpt_indx_f(indx)
- | host1x_uclass_wait_syncpt_thresh_f(threshold);
-}
-
-static inline u32 host1x_class_host_load_syncpt_base(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_load_syncpt_base_base_indx_f(indx)
- | host1x_uclass_load_syncpt_base_value_f(threshold);
-}
-
-static inline u32 host1x_class_host_wait_syncpt_base(
- unsigned indx, unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_wait_syncpt_base_indx_f(indx)
- | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_wait_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt_base(
- unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_incr_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt(
- unsigned cond, unsigned indx)
-{
- return host1x_uclass_incr_syncpt_cond_f(cond)
- | host1x_uclass_incr_syncpt_indx_f(indx);
-}
-
-static inline u32 host1x_class_host_indoff_reg_write(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indbe_f(0xf)
- | host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset);
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-static inline u32 host1x_class_host_indoff_reg_read(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset)
- | host1x_uclass_indoff_rwn_read_v();
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-/* cdma opcodes */
-static inline u32 host1x_opcode_setclass(
- unsigned class_id, unsigned offset, unsigned mask)
-{
- return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
-}
-
-static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
-{
- return (1 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
-{
- return (2 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
-{
- return (3 << 28) | (offset << 16) | mask;
-}
-
-static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
-{
- return (4 << 28) | (offset << 16) | value;
-}
-
-static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
-{
- return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
- host1x_class_host_incr_syncpt(cond, indx));
-}
-
-static inline u32 host1x_opcode_restart(unsigned address)
-{
- return (5 << 28) | (address >> 4);
-}
-
-static inline u32 host1x_opcode_gather(unsigned count)
-{
- return (6 << 28) | count;
-}
-
-static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | count;
-}
-
-static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
-}
-
-#define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
+#include "opcodes.h"

#endif
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index 5d515745eee7..c05cfa7e3090 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -16,132 +16,6 @@
#include "hw_host1x06_vm.h"
#include "hw_host1x06_hypervisor.h"

-static inline u32 host1x_class_host_wait_syncpt(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_wait_syncpt_indx_f(indx)
- | host1x_uclass_wait_syncpt_thresh_f(threshold);
-}
-
-static inline u32 host1x_class_host_load_syncpt_base(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_load_syncpt_base_base_indx_f(indx)
- | host1x_uclass_load_syncpt_base_value_f(threshold);
-}
-
-static inline u32 host1x_class_host_wait_syncpt_base(
- unsigned indx, unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_wait_syncpt_base_indx_f(indx)
- | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_wait_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt_base(
- unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_incr_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt(
- unsigned cond, unsigned indx)
-{
- return host1x_uclass_incr_syncpt_cond_f(cond)
- | host1x_uclass_incr_syncpt_indx_f(indx);
-}
-
-static inline u32 host1x_class_host_indoff_reg_write(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indbe_f(0xf)
- | host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset);
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-static inline u32 host1x_class_host_indoff_reg_read(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset)
- | host1x_uclass_indoff_rwn_read_v();
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-/* cdma opcodes */
-static inline u32 host1x_opcode_setclass(
- unsigned class_id, unsigned offset, unsigned mask)
-{
- return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
-}
-
-static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
-{
- return (1 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
-{
- return (2 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
-{
- return (3 << 28) | (offset << 16) | mask;
-}
-
-static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
-{
- return (4 << 28) | (offset << 16) | value;
-}
-
-static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
-{
- return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
- host1x_class_host_incr_syncpt(cond, indx));
-}
-
-static inline u32 host1x_opcode_restart(unsigned address)
-{
- return (5 << 28) | (address >> 4);
-}
-
-static inline u32 host1x_opcode_gather(unsigned count)
-{
- return (6 << 28) | count;
-}
-
-static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | count;
-}
-
-static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
-}
-
-static inline u32 host1x_opcode_setstreamid(unsigned streamid)
-{
- return (7 << 28) | streamid;
-}
-
-static inline u32 host1x_opcode_setpayload(unsigned payload)
-{
- return (9 << 28) | payload;
-}
-
-static inline u32 host1x_opcode_gather_wide(unsigned count)
-{
- return (12 << 28) | count;
-}
-
-#define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
+#include "opcodes.h"

#endif
diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h b/drivers/gpu/host1x/hw/host1x07_hardware.h
index 82c0cc9bb0b5..d67364e03956 100644
--- a/drivers/gpu/host1x/hw/host1x07_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x07_hardware.h
@@ -16,132 +16,6 @@
#include "hw_host1x07_vm.h"
#include "hw_host1x07_hypervisor.h"

-static inline u32 host1x_class_host_wait_syncpt(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_wait_syncpt_indx_f(indx)
- | host1x_uclass_wait_syncpt_thresh_f(threshold);
-}
-
-static inline u32 host1x_class_host_load_syncpt_base(
- unsigned indx, unsigned threshold)
-{
- return host1x_uclass_load_syncpt_base_base_indx_f(indx)
- | host1x_uclass_load_syncpt_base_value_f(threshold);
-}
-
-static inline u32 host1x_class_host_wait_syncpt_base(
- unsigned indx, unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_wait_syncpt_base_indx_f(indx)
- | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_wait_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt_base(
- unsigned base_indx, unsigned offset)
-{
- return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
- | host1x_uclass_incr_syncpt_base_offset_f(offset);
-}
-
-static inline u32 host1x_class_host_incr_syncpt(
- unsigned cond, unsigned indx)
-{
- return host1x_uclass_incr_syncpt_cond_f(cond)
- | host1x_uclass_incr_syncpt_indx_f(indx);
-}
-
-static inline u32 host1x_class_host_indoff_reg_write(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indbe_f(0xf)
- | host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset);
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-static inline u32 host1x_class_host_indoff_reg_read(
- unsigned mod_id, unsigned offset, bool auto_inc)
-{
- u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
- | host1x_uclass_indoff_indroffset_f(offset)
- | host1x_uclass_indoff_rwn_read_v();
- if (auto_inc)
- v |= host1x_uclass_indoff_autoinc_f(1);
- return v;
-}
-
-/* cdma opcodes */
-static inline u32 host1x_opcode_setclass(
- unsigned class_id, unsigned offset, unsigned mask)
-{
- return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
-}
-
-static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
-{
- return (1 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
-{
- return (2 << 28) | (offset << 16) | count;
-}
-
-static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
-{
- return (3 << 28) | (offset << 16) | mask;
-}
-
-static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
-{
- return (4 << 28) | (offset << 16) | value;
-}
-
-static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
-{
- return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
- host1x_class_host_incr_syncpt(cond, indx));
-}
-
-static inline u32 host1x_opcode_restart(unsigned address)
-{
- return (5 << 28) | (address >> 4);
-}
-
-static inline u32 host1x_opcode_gather(unsigned count)
-{
- return (6 << 28) | count;
-}
-
-static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | count;
-}
-
-static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
-{
- return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
-}
-
-static inline u32 host1x_opcode_setstreamid(unsigned streamid)
-{
- return (7 << 28) | streamid;
-}
-
-static inline u32 host1x_opcode_setpayload(unsigned payload)
-{
- return (9 << 28) | payload;
-}
-
-static inline u32 host1x_opcode_gather_wide(unsigned count)
-{
- return (12 << 28) | count;
-}
-
-#define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
+#include "opcodes.h"

#endif
diff --git a/drivers/gpu/host1x/hw/opcodes.h b/drivers/gpu/host1x/hw/opcodes.h
new file mode 100644
index 000000000000..649614499b04
--- /dev/null
+++ b/drivers/gpu/host1x/hw/opcodes.h
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Tegra host1x opcodes
+ *
+ * Copyright (c) 2022 NVIDIA Corporation.
+ */
+
+#ifndef __HOST1X_OPCODES_H
+#define __HOST1X_OPCODES_H
+
+#include <linux/types.h>
+
+static inline u32 host1x_class_host_wait_syncpt(
+ unsigned indx, unsigned threshold)
+{
+ return host1x_uclass_wait_syncpt_indx_f(indx)
+ | host1x_uclass_wait_syncpt_thresh_f(threshold);
+}
+
+static inline u32 host1x_class_host_load_syncpt_base(
+ unsigned indx, unsigned threshold)
+{
+ return host1x_uclass_load_syncpt_base_base_indx_f(indx)
+ | host1x_uclass_load_syncpt_base_value_f(threshold);
+}
+
+static inline u32 host1x_class_host_wait_syncpt_base(
+ unsigned indx, unsigned base_indx, unsigned offset)
+{
+ return host1x_uclass_wait_syncpt_base_indx_f(indx)
+ | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
+ | host1x_uclass_wait_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt_base(
+ unsigned base_indx, unsigned offset)
+{
+ return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
+ | host1x_uclass_incr_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt(
+ unsigned cond, unsigned indx)
+{
+ return host1x_uclass_incr_syncpt_cond_f(cond)
+ | host1x_uclass_incr_syncpt_indx_f(indx);
+}
+
+static inline u32 host1x_class_host_indoff_reg_write(
+ unsigned mod_id, unsigned offset, bool auto_inc)
+{
+ u32 v = host1x_uclass_indoff_indbe_f(0xf)
+ | host1x_uclass_indoff_indmodid_f(mod_id)
+ | host1x_uclass_indoff_indroffset_f(offset);
+ if (auto_inc)
+ v |= host1x_uclass_indoff_autoinc_f(1);
+ return v;
+}
+
+static inline u32 host1x_class_host_indoff_reg_read(
+ unsigned mod_id, unsigned offset, bool auto_inc)
+{
+ u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
+ | host1x_uclass_indoff_indroffset_f(offset)
+ | host1x_uclass_indoff_rwn_read_v();
+ if (auto_inc)
+ v |= host1x_uclass_indoff_autoinc_f(1);
+ return v;
+}
+
+static inline u32 host1x_opcode_setclass(
+ unsigned class_id, unsigned offset, unsigned mask)
+{
+ return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
+}
+
+static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
+{
+ return (1 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
+{
+ return (2 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
+{
+ return (3 << 28) | (offset << 16) | mask;
+}
+
+static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
+{
+ return (4 << 28) | (offset << 16) | value;
+}
+
+static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
+{
+ return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
+ host1x_class_host_incr_syncpt(cond, indx));
+}
+
+static inline u32 host1x_opcode_restart(unsigned address)
+{
+ return (5 << 28) | (address >> 4);
+}
+
+static inline u32 host1x_opcode_gather(unsigned count)
+{
+ return (6 << 28) | count;
+}
+
+static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count)
+{
+ return (6 << 28) | (offset << 16) | BIT(15) | count;
+}
+
+static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
+{
+ return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
+}
+
+static inline u32 host1x_opcode_setstreamid(unsigned streamid)
+{
+ return (7 << 28) | streamid;
+}
+
+static inline u32 host1x_opcode_setpayload(unsigned payload)
+{
+ return (9 << 28) | payload;
+}
+
+static inline u32 host1x_opcode_gather_wide(unsigned count)
+{
+ return (12 << 28) | count;
+}
+
+static inline u32 host1x_opcode_acquire_mlock(unsigned mlock)
+{
+ return (14 << 28) | (0 << 24) | mlock;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned mlock)
+{
+ return (14 << 28) | (1 << 24) | mlock;
+}
+
+#define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
+
+#endif
--
2.36.1
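
For readers following the new opcodes.h, here is a minimal userspace sketch of the two MLOCK encoders the patch adds. It only mirrors the bit layout visible in the diff (opcode 14 in bits 31:28, bit 24 selecting acquire or release, the MLOCK ID in the low bits). The 24-bit bound on the ID and the helper opcode_field() are assumptions made for illustration and are not part of the patch. The sketch uses unsigned literals so that the shifts into bit 31 stay well-defined in standard C.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of host1x_opcode_acquire_mlock() from the patch. */
static inline uint32_t opcode_acquire_mlock(uint32_t mlock)
{
	/* Assumption: the ID must stay below bit 24, where the sub-op lives. */
	assert(mlock < (1u << 24));
	return (14u << 28) | (0u << 24) | mlock;
}

/* Userspace copy of host1x_opcode_release_mlock() from the patch. */
static inline uint32_t opcode_release_mlock(uint32_t mlock)
{
	assert(mlock < (1u << 24));
	return (14u << 28) | (1u << 24) | mlock;
}

/* Hypothetical helper (not in the patch): extract the 4-bit opcode field. */
static inline uint32_t opcode_field(uint32_t word)
{
	return word >> 28;
}
```

With an example ID of 0x5d, the acquire word is 0xE000005D and the release word is 0xE100005D; both decode to opcode 14.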

2022-06-27 14:49:39

by Mikko Perttunen

Subject: [PATCH v7/v3 19/22] gpu: host1x: Rewrite job opcode sequence

From: Mikko Perttunen <[email protected]>

For new (Tegra186+) SoCs, use a new ('full-featured') job opcode
sequence that is compatible with virtualization. In particular,
the Host1x hardware in Tegra234 is stricter about the sequence: it
requires the ACQUIRE_MLOCK, SETCLASS and SETSTREAMID opcodes to occur
in that order without gaps (except for SETPAYLOAD), so let's do it
properly in one go now.

Signed-off-by: Mikko Perttunen <[email protected]>
---
drivers/gpu/host1x/hw/channel_hw.c | 144 +++++++++++++++++------------
1 file changed, 85 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index f84caf06621a..4eb7fb2e4f0a 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -47,10 +47,41 @@ static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,
}
}

-static void submit_wait(struct host1x_cdma *cdma, u32 id, u32 threshold,
+static void submit_wait(struct host1x_job *job, u32 id, u32 threshold,
u32 next_class)
{
-#if HOST1X_HW >= 2
+ struct host1x_cdma *cdma = &job->channel->cdma;
+
+#if HOST1X_HW >= 6
+ u32 stream_id;
+
+ /*
+ * If a memory context has been set, use it. Otherwise
+ * (if context isolation is disabled) use the engine's
+ * firmware stream ID.
+ */
+ if (job->memory_context)
+ stream_id = job->memory_context->stream_id;
+ else
+ stream_id = job->engine_fallback_streamid;
+
+ host1x_cdma_push_wide(cdma,
+ host1x_opcode_setclass(
+ HOST1X_CLASS_HOST1X,
+ HOST1X_UCLASS_LOAD_SYNCPT_PAYLOAD_32,
+ /* WAIT_SYNCPT_32 is at SYNCPT_PAYLOAD_32+2 */
+ BIT(0) | BIT(2)
+ ),
+ threshold,
+ id,
+ HOST1X_OPCODE_NOP
+ );
+ host1x_cdma_push_wide(&job->channel->cdma,
+ host1x_opcode_setclass(job->class, 0, 0),
+ host1x_opcode_setpayload(stream_id),
+ host1x_opcode_setstreamid(job->engine_streamid_offset / 4),
+ HOST1X_OPCODE_NOP);
+#elif HOST1X_HW >= 2
host1x_cdma_push_wide(cdma,
host1x_opcode_setclass(
HOST1X_CLASS_HOST1X,
@@ -97,7 +128,7 @@ static void submit_gathers(struct host1x_job *job, u32 job_syncpt_base)
else
threshold = cmd->wait.threshold;

- submit_wait(cdma, cmd->wait.id, threshold, cmd->wait.next_class);
+ submit_wait(job, cmd->wait.id, threshold, cmd->wait.next_class);
} else {
struct host1x_job_gather *g = &cmd->gather;

@@ -180,42 +211,70 @@ static void host1x_enable_gather_filter(struct host1x_channel *ch)
#endif
}

-static void host1x_channel_program_engine_streamid(struct host1x_job *job)
+static void channel_program_cdma(struct host1x_job *job)
{
+ struct host1x_cdma *cdma = &job->channel->cdma;
+ struct host1x_syncpt *sp = job->syncpt;
+
#if HOST1X_HW >= 6
u32 fence;

- if (!job->memory_context)
- return;
+ /* Enter engine class with invalid stream ID. */
+ host1x_cdma_push_wide(cdma,
+ host1x_opcode_acquire_mlock(job->class),
+ host1x_opcode_setclass(job->class, 0, 0),
+ host1x_opcode_setpayload(0),
+ host1x_opcode_setstreamid(job->engine_streamid_offset / 4));

- fence = host1x_syncpt_incr_max(job->syncpt, 1);
+ /* Before switching stream ID to real stream ID, ensure engine is idle. */
+ fence = host1x_syncpt_incr_max(sp, 1);
+ host1x_cdma_push(&job->channel->cdma,
+ host1x_opcode_nonincr(HOST1X_UCLASS_INCR_SYNCPT, 1),
+ HOST1X_UCLASS_INCR_SYNCPT_INDX_F(job->syncpt->id) |
+ HOST1X_UCLASS_INCR_SYNCPT_COND_F(4));
+ submit_wait(job, job->syncpt->id, fence, job->class);

- /* First, increment a syncpoint on OP_DONE condition.. */
+ /* Submit work. */
+ job->syncpt_end = host1x_syncpt_incr_max(sp, job->syncpt_incrs);
+ submit_gathers(job, job->syncpt_end - job->syncpt_incrs);

+ /* Before releasing MLOCK, ensure engine is idle again. */
+ fence = host1x_syncpt_incr_max(sp, 1);
host1x_cdma_push(&job->channel->cdma,
host1x_opcode_nonincr(HOST1X_UCLASS_INCR_SYNCPT, 1),
HOST1X_UCLASS_INCR_SYNCPT_INDX_F(job->syncpt->id) |
- HOST1X_UCLASS_INCR_SYNCPT_COND_F(1));
+ HOST1X_UCLASS_INCR_SYNCPT_COND_F(4));
+ submit_wait(job, job->syncpt->id, fence, job->class);

- /* Wait for syncpoint to increment */
+ /* Release MLOCK. */
+ host1x_cdma_push(cdma,
+ HOST1X_OPCODE_NOP, host1x_opcode_release_mlock(job->class));
+#else
+ if (job->serialize) {
+ /*
+ * Force serialization by inserting a host wait for the
+ * previous job to finish before this one can commence.
+ */
+ host1x_cdma_push(cdma,
+ host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+ host1x_uclass_wait_syncpt_r(), 1),
+ host1x_class_host_wait_syncpt(job->syncpt->id,
+ host1x_syncpt_read_max(sp)));
+ }

- host1x_cdma_push(&job->channel->cdma,
- host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
- host1x_uclass_wait_syncpt_r(), 1),
- host1x_class_host_wait_syncpt(job->syncpt->id, fence));
+ /* Synchronize base register to allow using it for relative waiting */
+ if (sp->base)
+ synchronize_syncpt_base(job);

- /*
- * Now that we know the engine is idle, return to class and
- * change stream ID.
- */
+ /* add a setclass for modules that require it */
+ if (job->class)
+ host1x_cdma_push(cdma,
+ host1x_opcode_setclass(job->class, 0, 0),
+ HOST1X_OPCODE_NOP);

- host1x_cdma_push(&job->channel->cdma,
- host1x_opcode_setclass(job->class, 0, 0),
- HOST1X_OPCODE_NOP);
+ job->syncpt_end = host1x_syncpt_incr_max(sp, job->syncpt_incrs);

- host1x_cdma_push(&job->channel->cdma,
- host1x_opcode_setpayload(job->memory_context->stream_id),
- host1x_opcode_setstreamid(job->engine_streamid_offset / 4));
+ submit_gathers(job, job->syncpt_end - job->syncpt_incrs);
#endif
}

@@ -223,7 +282,6 @@ static int channel_submit(struct host1x_job *job)
{
struct host1x_channel *ch = job->channel;
struct host1x_syncpt *sp = job->syncpt;
- u32 user_syncpt_incrs = job->syncpt_incrs;
u32 prev_max = 0;
u32 syncval;
int err;
@@ -251,6 +309,7 @@ static int channel_submit(struct host1x_job *job)

host1x_channel_set_streamid(ch);
host1x_enable_gather_filter(ch);
+ host1x_hw_syncpt_assign_to_channel(host, sp, ch);

/* begin a CDMA submit */
err = host1x_cdma_begin(&ch->cdma, job);
@@ -259,40 +318,7 @@ static int channel_submit(struct host1x_job *job)
goto error;
}

- if (job->serialize) {
- /*
- * Force serialization by inserting a host wait for the
- * previous job to finish before this one can commence.
- */
- host1x_cdma_push(&ch->cdma,
- host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
- host1x_uclass_wait_syncpt_r(), 1),
- host1x_class_host_wait_syncpt(job->syncpt->id,
- host1x_syncpt_read_max(sp)));
- }
-
- /* Synchronize base register to allow using it for relative waiting */
- if (sp->base)
- synchronize_syncpt_base(job);
-
- host1x_hw_syncpt_assign_to_channel(host, sp, ch);
-
- /* add a setclass for modules that require it */
- if (job->class)
- host1x_cdma_push(&ch->cdma,
- host1x_opcode_setclass(job->class, 0, 0),
- HOST1X_OPCODE_NOP);
-
- /*
- * Ensure engine DMA is idle and set new stream ID. May increment
- * syncpt max.
- */
- host1x_channel_program_engine_streamid(job);
-
- syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
- job->syncpt_end = syncval;
-
- submit_gathers(job, syncval - user_syncpt_incrs);
+ channel_program_cdma(job);

/* end CDMA submit & stash pinned hMems into sync queue */
host1x_cdma_end(&ch->cdma, job);
--
2.36.1
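
The ordering rule from the commit message above can be sketched as a small userspace program. build_preamble() reproduces the four words that channel_program_cdma() pushes first, using the encodings from opcodes.h. preamble_is_valid() is a hypothetical checker that encodes one reading of the rule (ACQUIRE_MLOCK, SETCLASS, SETSTREAMID in order, with only SETPAYLOAD allowed in between). It is not the hardware's actual validation logic, and the class ID and stream ID offset used in the note below are example values only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode numbers as they appear in opcodes.h (bits 31:28 of each word). */
enum { OP_SETCLASS = 0, OP_SETSTREAMID = 7, OP_SETPAYLOAD = 9, OP_MLOCK = 14 };

static uint32_t op(uint32_t word)
{
	return word >> 28;
}

/* Build the four-word job preamble in the order the patch pushes it. */
static void build_preamble(uint32_t out[4], uint32_t class_id,
			   uint32_t streamid_offset)
{
	/* The patch divides the register byte offset by 4. */
	assert(streamid_offset % 4 == 0);

	out[0] = (14u << 28) | (0u << 24) | class_id;  /* ACQUIRE_MLOCK */
	out[1] = (0u << 28) | (class_id << 6);         /* SETCLASS, offset 0, mask 0 */
	out[2] = (9u << 28) | 0u;                      /* SETPAYLOAD(0): invalid stream ID */
	out[3] = (7u << 28) | (streamid_offset / 4);   /* SETSTREAMID */
}

/*
 * Hypothetical checker: the words must contain ACQUIRE_MLOCK, SETCLASS and
 * SETSTREAMID in that order; only SETPAYLOAD may appear between them.
 */
static bool preamble_is_valid(const uint32_t *w, size_t n)
{
	static const uint32_t expected[] = { OP_MLOCK, OP_SETCLASS, OP_SETSTREAMID };
	size_t next = 0;

	for (size_t i = 0; i < n && next < 3; i++) {
		if (next > 0 && op(w[i]) == OP_SETPAYLOAD)
			continue;
		if (op(w[i]) != expected[next])
			return false;
		/* Bit 24 set would mean RELEASE_MLOCK rather than ACQUIRE_MLOCK. */
		if (next == 0 && (w[i] & (1u << 24)))
			return false;
		next++;
	}
	return next == 3;
}
```

Calling build_preamble(w, 0x5d, 0x30) yields a sequence the checker accepts; swapping the SETCLASS and SETSTREAMID words makes it reject the sequence.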

2022-06-27 14:50:31

by Mikko Perttunen

Subject: [PATCH v7/v3 20/22] gpu: host1x: Add MLOCK release code on Tegra234

From: Mikko Perttunen <[email protected]>

With the full-featured opcode sequence using MLOCKs, we also need to
release those MLOCKs in the event of a timeout. However, Tegra186 and
Tegra194 do not enforce MLOCKs by default, so this is not needed
there, and on Tegra234 the release is much simpler to do. So only
implement this on Tegra234 for the time being.

Signed-off-by: Mikko Perttunen <[email protected]>
---
drivers/gpu/host1x/hw/cdma_hw.c | 34 ++++++++++++++++++++++
drivers/gpu/host1x/hw/hw_host1x08_common.h | 7 +++++
2 files changed, 41 insertions(+)

diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index e49cd5b8f735..1b65a10b9dfc 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -238,6 +238,37 @@ static void cdma_resume(struct host1x_cdma *cdma, u32 getptr)
cdma_timeout_restart(cdma, getptr);
}

+static void timeout_release_mlock(struct host1x_cdma *cdma)
+{
+#if HOST1X_HW >= 8
+ /* Tegra186 and Tegra194 require a more complicated MLOCK release
+ * sequence. Furthermore, those chips by default don't enforce MLOCKs,
+ * so it turns out that if we don't /actually/ need MLOCKs, we can just
+ * ignore them.
+ *
+ * As such, for now just implement this on Tegra234 where things are
+ * stricter but also easy to implement.
+ */
+ struct host1x_channel *ch = cdma_to_channel(cdma);
+ struct host1x *host1x = cdma_to_host1x(cdma);
+ u32 offset;
+
+ switch (ch->client->class) {
+ case HOST1X_CLASS_VIC:
+ offset = HOST1X_COMMON_VIC_MLOCK;
+ break;
+ case HOST1X_CLASS_NVDEC:
+ offset = HOST1X_COMMON_NVDEC_MLOCK;
+ break;
+ default:
+ WARN(1, "%s was not updated for class %u", __func__, ch->client->class);
+ return;
+ }
+
+ host1x_common_writel(host1x, 0x0, offset);
+#endif
+}
+
/*
* If this timeout fires, it indicates the current sync_queue entry has
* exceeded its TTL and the userctx should be timed out and remaining
@@ -288,6 +319,9 @@ static void cdma_timeout_handler(struct work_struct *work)
/* stop HW, resetting channel/module */
host1x_hw_cdma_freeze(host1x, cdma);

+ /* release any held MLOCK */
+ timeout_release_mlock(cdma);
+
host1x_cdma_update_sync_queue(cdma, ch->dev);
mutex_unlock(&cdma->lock);
}
diff --git a/drivers/gpu/host1x/hw/hw_host1x08_common.h b/drivers/gpu/host1x/hw/hw_host1x08_common.h
index 4df28440b86b..8e0c99150ec2 100644
--- a/drivers/gpu/host1x/hw/hw_host1x08_common.h
+++ b/drivers/gpu/host1x/hw/hw_host1x08_common.h
@@ -2,3 +2,10 @@
/*
* Copyright (c) 2022 NVIDIA Corporation.
*/
+
+#define HOST1X_COMMON_OFA_MLOCK 0x4050
+#define HOST1X_COMMON_NVJPG1_MLOCK 0x4070
+#define HOST1X_COMMON_VIC_MLOCK 0x4078
+#define HOST1X_COMMON_NVENC_MLOCK 0x407c
+#define HOST1X_COMMON_NVDEC_MLOCK 0x4080
+#define HOST1X_COMMON_NVJPG_MLOCK 0x4084
--
2.36.1
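
The class-to-register mapping in timeout_release_mlock() can also be written as a lookup table, which may be easier to extend once more engines are supported. This is only a userspace sketch, not what the patch does: the two register offsets are copied from hw_host1x08_common.h above, the class ID values are assumed to match include/linux/host1x.h, and mlock_offset_for_class() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets copied from hw_host1x08_common.h in the patch. */
#define HOST1X_COMMON_VIC_MLOCK   0x4078
#define HOST1X_COMMON_NVDEC_MLOCK 0x4080

/* Assumption: class IDs as defined in include/linux/host1x.h. */
#define HOST1X_CLASS_VIC   0x5d
#define HOST1X_CLASS_NVDEC 0xf0

struct mlock_entry {
	uint32_t class_id;
	uint32_t offset;
};

static const struct mlock_entry mlock_table[] = {
	{ HOST1X_CLASS_VIC,   HOST1X_COMMON_VIC_MLOCK },
	{ HOST1X_CLASS_NVDEC, HOST1X_COMMON_NVDEC_MLOCK },
};

/*
 * Hypothetical helper: store the MLOCK register offset for a class and
 * return true, or return false for a class the table does not know about
 * (the case where the patch issues a WARN).
 */
static bool mlock_offset_for_class(uint32_t class_id, uint32_t *offset)
{
	assert(offset != NULL);

	for (size_t i = 0; i < sizeof(mlock_table) / sizeof(mlock_table[0]); i++) {
		if (mlock_table[i].class_id == class_id) {
			*offset = mlock_table[i].offset;
			return true;
		}
	}
	return false;
}
```

In the driver, a successful lookup would be followed by the same host1x_common_writel(host1x, 0x0, offset) call the patch uses to release the lock.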

2022-06-27 14:54:48

by Mikko Perttunen

Subject: [PATCH v7/v3 11/22] dt-bindings: Add headers for Host1x and VIC on Tegra234

From: Mikko Perttunen <[email protected]>

Add clock, memory controller, powergate and reset dt-binding headers
for Host1x and VIC on Tegra234.

Signed-off-by: Mikko Perttunen <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
---
include/dt-bindings/clock/tegra234-clock.h | 4 ++++
include/dt-bindings/memory/tegra234-mc.h | 5 +++++
include/dt-bindings/power/tegra234-powergate.h | 1 +
include/dt-bindings/reset/tegra234-reset.h | 1 +
4 files changed, 11 insertions(+)

diff --git a/include/dt-bindings/clock/tegra234-clock.h b/include/dt-bindings/clock/tegra234-clock.h
index bd4c3086a2da..6e4e5cc75631 100644
--- a/include/dt-bindings/clock/tegra234-clock.h
+++ b/include/dt-bindings/clock/tegra234-clock.h
@@ -38,6 +38,8 @@
* throughput and memory controller power.
*/
#define TEGRA234_CLK_EMC 31U
+/** @brief output of mux controlled by CLK_RST_CONTROLLER_CLK_SOURCE_HOST1X */
+#define TEGRA234_CLK_HOST1X 46U
/** @brief output of gate CLK_ENB_FUSE */
#define TEGRA234_CLK_FUSE 40U
/** @brief output of mux controlled by CLK_RST_CONTROLLER_CLK_SOURCE_I2C1 */
@@ -132,6 +134,8 @@
#define TEGRA234_CLK_UARTA 155U
/** @brief output of gate CLK_ENB_PEX1_CORE_6 */
#define TEGRA234_CLK_PEX1_C6_CORE 161U
+/** @brief output of mux controlled by CLK_RST_CONTROLLER_CLK_SOURCE_VIC */
+#define TEGRA234_CLK_VIC 167U
/** @brief output of gate CLK_ENB_PEX2_CORE_7 */
#define TEGRA234_CLK_PEX2_C7_CORE 171U
/** @brief output of gate CLK_ENB_PEX2_CORE_8 */
diff --git a/include/dt-bindings/memory/tegra234-mc.h b/include/dt-bindings/memory/tegra234-mc.h
index e3b0e9da295d..73fdd18523a9 100644
--- a/include/dt-bindings/memory/tegra234-mc.h
+++ b/include/dt-bindings/memory/tegra234-mc.h
@@ -26,6 +26,8 @@
#define TEGRA234_SID_PCIE8 0x09
#define TEGRA234_SID_PCIE10 0x0b
#define TEGRA234_SID_BPMP 0x10
+#define TEGRA234_SID_HOST1X 0x27
+#define TEGRA234_SID_VIC 0x34

/*
* memory client IDs
@@ -33,6 +35,7 @@

/* High-definition audio (HDA) read clients */
#define TEGRA234_MEMORY_CLIENT_HDAR 0x15
+#define TEGRA234_MEMORY_CLIENT_HOST1XDMAR 0x16
/* PCIE6 read clients */
#define TEGRA234_MEMORY_CLIENT_PCIE6AR 0x28
/* PCIE6 write clients */
@@ -65,6 +68,8 @@
#define TEGRA234_MEMORY_CLIENT_SDMMCRAB 0x63
/* sdmmcd memory write client */
#define TEGRA234_MEMORY_CLIENT_SDMMCWAB 0x67
+#define TEGRA234_MEMORY_CLIENT_VICSRD 0x6c
+#define TEGRA234_MEMORY_CLIENT_VICSWR 0x6d
/* BPMP read client */
#define TEGRA234_MEMORY_CLIENT_BPMPR 0x93
/* BPMP write client */
diff --git a/include/dt-bindings/power/tegra234-powergate.h b/include/dt-bindings/power/tegra234-powergate.h
index f610eee9bce8..c3f7e380d2c6 100644
--- a/include/dt-bindings/power/tegra234-powergate.h
+++ b/include/dt-bindings/power/tegra234-powergate.h
@@ -18,5 +18,6 @@
#define TEGRA234_POWER_DOMAIN_MGBEA 17U
#define TEGRA234_POWER_DOMAIN_MGBEB 18U
#define TEGRA234_POWER_DOMAIN_MGBEC 19U
+#define TEGRA234_POWER_DOMAIN_VIC 29U

#endif
diff --git a/include/dt-bindings/reset/tegra234-reset.h b/include/dt-bindings/reset/tegra234-reset.h
index 547ca3b60caa..1971400bf360 100644
--- a/include/dt-bindings/reset/tegra234-reset.h
+++ b/include/dt-bindings/reset/tegra234-reset.h
@@ -44,6 +44,7 @@
#define TEGRA234_RESET_QSPI1 77U
#define TEGRA234_RESET_SDMMC4 85U
#define TEGRA234_RESET_UARTA 100U
+#define TEGRA234_RESET_VIC 113U
#define TEGRA234_RESET_PEX0_CORE_0 116U
#define TEGRA234_RESET_PEX0_CORE_1 117U
#define TEGRA234_RESET_PEX0_CORE_2 118U
--
2.36.1

2022-06-30 15:45:28

by Rob Herring

Subject: Re: [PATCH v7/v3 00/22] Host1x context isolation / Tegra234 support

On Mon, Jun 27, 2022 at 05:19:46PM +0300, Mikko Perttunen wrote:
> From: Mikko Perttunen <[email protected]>
>
> Integrated the Host1x context isolation series (patches 1 to 8) and
> Tegra234 support series (patches 9 to 22) in one email thread for
> the benefit of automatic testers.

And probably to the detriment of tools looking at the version number
like b4 with the double version. Don't get creative like this.

Rob