2023-02-23 10:52:10

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 0/7] OPP and devfreq for all Adrenos

v2 -> v3:

- Add [2/7], x-ref with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21484
- De-magic-ify the remaining BIT(6) in a2xx_busy (thanks Dmitry)
- Drop unnecessary else{} level in [3/7]
- Pick up tags

v2: https://lore.kernel.org/linux-arm-msm/[email protected]/

v1 -> v2:

- Move a2xx #defines to XML
- Use dev_pm_opp_find_freq_floor in the common path in [2/6]
- Clarify a comment in [2/6]
- Move voting from a5xx to Adreno-wide [6/6]
- Pick up tags

v1: https://lore.kernel.org/linux-arm-msm/[email protected]

This series is a combination of [1] and a subset of [2] and some new
stuff.

With it, devfreq is used on all a2xx-a6xx (including gmu and
gmu-wrapper) and all clk_set_rate(core clock) calls are dropped in
favour of dev_pm_opp_set_rate, which - drumroll - lets us scale
the voltage domain. DT patches making use of that will be sent
separately.

On top of that, a5xx gets a call to enable icc scaling from the OPP
tables. No SoCs implementing a2xx have icc support yet and a3/4xx
SoCs have separate logic for that, which will be updated at a later
time.

Getting this in for 6.4 early would be appreciated, as that would
allow for getting GMU wrapper GPUs up (without VDD&icc scaling they
can only run at lower freqs, which is.. ehhh..)

Changes:
- a3xx busy: use the _1 counter as per msm-3.x instead of _0
- a6xx-series-opp: basically rewrite, ensure compat with all gens
- a2/4xx busy: new patch
- a5xx icc: new patch

[1] https://lore.kernel.org/linux-arm-msm/[email protected]/
[2] https://lore.kernel.org/linux-arm-msm/[email protected]/

Signed-off-by: Konrad Dybcio <[email protected]>
---
Konrad Dybcio (7):
drm/msm/a2xx: Include perf counter reg values in XML
drm/msm/a2xx: Add REG_A2XX_RBBM_PM_OVERRIDE2 to XML
drm/msm/adreno: Use OPP for every GPU generation
drm/msm/a2xx: Implement .gpu_busy
drm/msm/a3xx: Implement .gpu_busy
drm/msm/a4xx: Implement .gpu_busy
drm/msm/adreno: Enable optional icc voting from OPP tables

drivers/gpu/drm/msm/adreno/a2xx.xml.h | 18 ++++++
drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 26 ++++++++
drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 ++++
drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 ++++
drivers/gpu/drm/msm/adreno/adreno_device.c | 4 ++
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +++++++++++++-----------------
drivers/gpu/drm/msm/msm_gpu.c | 4 +-
drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +-
8 files changed, 117 insertions(+), 58 deletions(-)
---
base-commit: aaf70d5ad5e2b06a8050c51e278b0c3a14fabef5
change-id: 20230223-topic-opp-01e7112b867d

Best regards,
--
Konrad Dybcio <[email protected]>



2023-02-23 10:52:14

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 1/7] drm/msm/a2xx: Include perf counter reg values in XML

This is a partial merge of [1], subject to be dropped if a header
update is executed.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21480/

Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a2xx.xml.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx.xml.h b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
index afa6023346c4..b85fdc082bc1 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
@@ -1060,6 +1060,12 @@ enum a2xx_mh_perfcnt_select {
AXI_TOTAL_READ_REQUEST_DATA_BEATS = 181,
};

+enum perf_mode_cnt {
+ PERF_STATE_RESET = 0,
+ PERF_STATE_ENABLE = 1,
+ PERF_STATE_FREEZE = 2,
+};
+
enum adreno_mmu_clnt_beh {
BEH_NEVR = 0,
BEH_TRAN_RNG = 1,

--
2.39.2


2023-02-23 10:52:18

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 2/7] drm/msm/a2xx: Add REG_A2XX_RBBM_PM_OVERRIDE2 to XML

This is a partial merge of [1], subject to be dropped if a header
update is executed.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21484

Suggested-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a2xx.xml.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx.xml.h b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
index b85fdc082bc1..fbac25f66c67 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a2xx.xml.h
@@ -1313,6 +1313,18 @@ static inline uint32_t A2XX_MH_MMU_VA_RANGE_VA_BASE(uint32_t val)
#define A2XX_RBBM_PM_OVERRIDE1_MH_TCROQ_SCLK_PM_OVERRIDE 0x80000000

#define REG_A2XX_RBBM_PM_OVERRIDE2 0x0000039d
+#define A2XX_RBBM_PM_OVERRIDE2_PA_REG_SCLK_PM_OVERRIDE 0x00000001
+#define A2XX_RBBM_PM_OVERRIDE2_PA_PA_SCLK_PM_OVERRIDE 0x00000002
+#define A2XX_RBBM_PM_OVERRIDE2_PA_AG_SCLK_PM_OVERRIDE 0x00000004
+#define A2XX_RBBM_PM_OVERRIDE2_VGT_REG_SCLK_PM_OVERRIDE 0x00000008
+#define A2XX_RBBM_PM_OVERRIDE2_VGT_FIFOS_SCLK_PM_OVERRIDE 0x00000010
+#define A2XX_RBBM_PM_OVERRIDE2_VGT_VGT_SCLK_PM_OVERRIDE 0x00000020
+#define A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE 0x00000040
+#define A2XX_RBBM_PM_OVERRIDE2_PERM_SCLK_PM_OVERRIDE 0x00000080
+#define A2XX_RBBM_PM_OVERRIDE2_GC_GA_GMEM0_PM_OVERRIDE 0x00000100
+#define A2XX_RBBM_PM_OVERRIDE2_GC_GA_GMEM1_PM_OVERRIDE 0x00000200
+#define A2XX_RBBM_PM_OVERRIDE2_GC_GA_GMEM2_PM_OVERRIDE 0x00000400
+#define A2XX_RBBM_PM_OVERRIDE2_GC_GA_GMEM3_PM_OVERRIDE 0x00000800

#define REG_A2XX_RBBM_DEBUG_OUT 0x000003a0


--
2.39.2


2023-02-23 10:52:24

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 3/7] drm/msm/adreno: Use OPP for every GPU generation

Some older GPUs (namely a2xx with no opp tables at all and a320 with
downstream-remnants gpu pwrlevels) used not to have OPP tables. They
both however had just one frequency defined, making it extremely easy
to construct such an OPP table from within the driver if need be.

Do so and switch all clk_set_rate calls on core_clk to their OPP
counterparts.

Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +++++++++++++++------------------
drivers/gpu/drm/msm/msm_gpu.c | 4 +-
drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +-
3 files changed, 47 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index ce6b76c45b6f..d12f2f314022 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -922,73 +922,46 @@ void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
ring->id);
}

-/* Get legacy powerlevels from qcom,gpu-pwrlevels and populate the opp table */
-static int adreno_get_legacy_pwrlevels(struct device *dev)
-{
- struct device_node *child, *node;
- int ret;
-
- node = of_get_compatible_child(dev->of_node, "qcom,gpu-pwrlevels");
- if (!node) {
- DRM_DEV_DEBUG(dev, "Could not find the GPU powerlevels\n");
- return -ENXIO;
- }
-
- for_each_child_of_node(node, child) {
- unsigned int val;
-
- ret = of_property_read_u32(child, "qcom,gpu-freq", &val);
- if (ret)
- continue;
-
- /*
- * Skip the intentionally bogus clock value found at the bottom
- * of most legacy frequency tables
- */
- if (val != 27000000)
- dev_pm_opp_add(dev, val, 0);
- }
-
- of_node_put(node);
-
- return 0;
-}
-
-static void adreno_get_pwrlevels(struct device *dev,
+static int adreno_get_pwrlevels(struct device *dev,
struct msm_gpu *gpu)
{
+ struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
unsigned long freq = ULONG_MAX;
struct dev_pm_opp *opp;
int ret;

gpu->fast_rate = 0;

- /* You down with OPP? */
- if (!of_find_property(dev->of_node, "operating-points-v2", NULL))
- ret = adreno_get_legacy_pwrlevels(dev);
- else {
- ret = devm_pm_opp_of_add_table(dev);
- if (ret)
- DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
- }
-
- if (!ret) {
- /* Find the fastest defined rate */
- opp = dev_pm_opp_find_freq_floor(dev, &freq);
- if (!IS_ERR(opp)) {
- gpu->fast_rate = freq;
- dev_pm_opp_put(opp);
+ /* devm_pm_opp_of_add_table may error out but will still create an OPP table */
+ ret = devm_pm_opp_of_add_table(dev);
+ if (ret == -ENODEV) {
+ /* Special cases for ancient hw with ancient DT bindings */
+ if (adreno_is_a2xx(adreno_gpu)) {
+ dev_warn(dev, "Unable to find the OPP table. Falling back to 200 MHz.\n");
+ dev_pm_opp_add(dev, 200000000, 0);
+ } else if (adreno_is_a320(adreno_gpu)) {
+ dev_warn(dev, "Unable to find the OPP table. Falling back to 450 MHz.\n");
+ dev_pm_opp_add(dev, 450000000, 0);
+ } else {
+ DRM_DEV_ERROR(dev, "Unable to find the OPP table\n");
+ return -ENODEV;
}
+ } else if (ret) {
+ DRM_DEV_ERROR(dev, "Unable to set the OPP table\n");
+ return ret;
}

- if (!gpu->fast_rate) {
- dev_warn(dev,
- "Could not find a clock rate. Using a reasonable default\n");
- /* Pick a suitably safe clock speed for any target */
- gpu->fast_rate = 200000000;
- }
+ /* Find the fastest defined rate */
+ opp = dev_pm_opp_find_freq_floor(dev, &freq);
+ if (IS_ERR(opp))
+ return PTR_ERR(opp);
+
+ gpu->fast_rate = freq;
+ dev_pm_opp_put(opp);

DBG("fast_rate=%u, slow_rate=27000000", gpu->fast_rate);
+
+ return 0;
}

int adreno_gpu_ocmem_init(struct device *dev, struct adreno_gpu *adreno_gpu,
@@ -1046,6 +1019,20 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
struct adreno_rev *rev = &config->rev;
const char *gpu_name;
u32 speedbin;
+ int ret;
+
+ /*
+ * This can only be done before devm_pm_opp_of_add_table(), or
+ * dev_pm_opp_set_config() will WARN_ON()
+ */
+ if (IS_ERR(devm_clk_get(dev, "core"))) {
+ /*
+ * If "core" is absent, go for the legacy clock name.
+ * If we got this far in probing, it's a given one of them exists.
+ */
+ devm_pm_opp_set_clkname(dev, "core_clk");
+ } else
+ devm_pm_opp_set_clkname(dev, "core");

adreno_gpu->funcs = funcs;
adreno_gpu->info = adreno_info(config->rev);
@@ -1070,7 +1057,9 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,

adreno_gpu_config.nr_rings = nr_rings;

- adreno_get_pwrlevels(dev, gpu);
+ ret = adreno_get_pwrlevels(dev, gpu);
+ if (ret)
+ return ret;

pm_runtime_set_autosuspend_delay(dev,
adreno_gpu->info->inactive_period);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 380249500325..cdcb00df3f25 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -59,7 +59,7 @@ static int disable_pwrrail(struct msm_gpu *gpu)
static int enable_clk(struct msm_gpu *gpu)
{
if (gpu->core_clk && gpu->fast_rate)
- clk_set_rate(gpu->core_clk, gpu->fast_rate);
+ dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);

/* Set the RBBM timer rate to 19.2Mhz */
if (gpu->rbbmtimer_clk)
@@ -78,7 +78,7 @@ static int disable_clk(struct msm_gpu *gpu)
* will be rounded down to zero anyway so it all works out.
*/
if (gpu->core_clk)
- clk_set_rate(gpu->core_clk, 27000000);
+ dev_pm_opp_set_rate(&gpu->pdev->dev, 27000000);

if (gpu->rbbmtimer_clk)
clk_set_rate(gpu->rbbmtimer_clk, 0);
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index e27dbf12b5e8..ea70c1c32d94 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -48,7 +48,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
mutex_unlock(&df->lock);
} else {
- clk_set_rate(gpu->core_clk, *freq);
+ dev_pm_opp_set_rate(dev, *freq);
}

dev_pm_opp_put(opp);

--
2.39.2


2023-02-23 10:52:27

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 5/7] drm/msm/a3xx: Implement .gpu_busy

Add support for gpu_busy on a3xx, which is required for devfreq
support.

Tested-by: Dmitry Baryshkov <[email protected]> #ifc6410
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 948785ed07bb..c86b377f6f0d 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -477,6 +477,16 @@ static struct msm_gpu_state *a3xx_gpu_state_get(struct msm_gpu *gpu)
return state;
}

+static u64 a3xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+ u64 busy_cycles;
+
+ busy_cycles = gpu_read64(gpu, REG_A3XX_RBBM_PERFCTR_RBBM_1_LO);
+ *out_sample_rate = clk_get_rate(gpu->core_clk);
+
+ return busy_cycles;
+}
+
static u32 a3xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
{
ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
@@ -498,6 +508,7 @@ static const struct adreno_gpu_funcs funcs = {
#if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
.show = adreno_show,
#endif
+ .gpu_busy = a3xx_gpu_busy,
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
.create_address_space = adreno_create_address_space,

--
2.39.2


2023-02-23 10:52:29

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 4/7] drm/msm/a2xx: Implement .gpu_busy

Implement gpu_busy based on the downstream msm-3.4 code [1]. This
allows us to use devfreq on this old old old hardware!

[1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975

Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index c67089a7ebc1..104bdf28cdaf 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -481,6 +481,31 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
return aspace;
}

+/* While the precise size of this field is unknown, it holds at least these three values.. */
+static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+ u64 busy_cycles;
+
+ /* Freeze the counter */
+ gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
+
+ busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
+
+ /* Reset the counter */
+ gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
+
+ /* Re-enable the performance monitors */
+ gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2,
+ A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE,
+ A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE);
+ gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
+ gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
+
+ *out_sample_rate = clk_get_rate(gpu->core_clk);
+
+ return busy_cycles;
+}
+
static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
{
ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
@@ -502,6 +527,7 @@ static const struct adreno_gpu_funcs funcs = {
#if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
.show = adreno_show,
#endif
+ .gpu_busy = a2xx_gpu_busy,
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
.create_address_space = a2xx_create_address_space,

--
2.39.2


2023-02-23 10:52:31

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 6/7] drm/msm/a4xx: Implement .gpu_busy

Add support for gpu_busy on a4xx, which is required for devfreq
support.

Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 3e09d3a7a0ac..715436cb3996 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -611,6 +611,16 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
return 0;
}

+static u64 a4xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
+{
+ u64 busy_cycles;
+
+ busy_cycles = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_RBBM_1_LO);
+ *out_sample_rate = clk_get_rate(gpu->core_clk);
+
+ return busy_cycles;
+}
+
static u32 a4xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
{
ring->memptrs->rptr = gpu_read(gpu, REG_A4XX_CP_RB_RPTR);
@@ -632,6 +642,7 @@ static const struct adreno_gpu_funcs funcs = {
#if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
.show = adreno_show,
#endif
+ .gpu_busy = a4xx_gpu_busy,
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
.create_address_space = adreno_create_address_space,

--
2.39.2


2023-02-23 10:52:38

by Konrad Dybcio

[permalink] [raw]
Subject: [PATCH v3 7/7] drm/msm/adreno: Enable optional icc voting from OPP tables

Add the dev_pm_opp_of_find_icc_paths() call to let the OPP framework
handle bus voting as part of power level setting.

Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/adreno_device.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 36f062c7582f..5142a4c72cfc 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -548,6 +548,10 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
return PTR_ERR(gpu);
}

+ ret = dev_pm_opp_of_find_icc_paths(dev, NULL);
+ if (ret)
+ return ret;
+
return 0;
}


--
2.39.2


2023-02-23 15:32:46

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] drm/msm/a2xx: Include perf counter reg values in XML

On 23/02/2023 12:51, Konrad Dybcio wrote:
> This is a partial merge of [1], subject to be dropped if a header
> update is executed.
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21480/
>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a2xx.xml.h | 6 ++++++
> 1 file changed, 6 insertions(+)

Reviewed-by: Dmitry Baryshkov <[email protected]>

--
With best wishes
Dmitry


2023-02-23 15:32:59

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] drm/msm/a2xx: Add REG_A2XX_RBBM_PM_OVERRIDE2 to XML

On 23/02/2023 12:51, Konrad Dybcio wrote:
> This is a partial merge of [1], subject to be dropped if a header
> update is executed.
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21484
>
> Suggested-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a2xx.xml.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)

Reviewed-by: Dmitry Baryshkov <[email protected]>

--
With best wishes
Dmitry


2023-02-24 15:05:21

by Jonathan Marek

[permalink] [raw]
Subject: Re: [Freedreno] [PATCH v3 4/7] drm/msm/a2xx: Implement .gpu_busy

This won't work because a2xx freedreno userspace expects to own all the
perfcounters.

This will break perfcounters for userspace, and when userspace isn't
using perfcounters, this won't count correctly because userspace writes
0 to CP_PERFMON_CNTL at the start of every submit.

On 2/23/23 5:52 AM, Konrad Dybcio wrote:
> Implement gpu_busy based on the downstream msm-3.4 code [1]. This
> allows us to use devfreq on this old old old hardware!
>
> [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index c67089a7ebc1..104bdf28cdaf 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -481,6 +481,31 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
> return aspace;
> }
>
> +/* While the precise size of this field is unknown, it holds at least these three values.. */
> +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> +{
> + u64 busy_cycles;
> +
> + /* Freeze the counter */
> + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
> +
> + busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
> +
> + /* Reset the counter */
> + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
> +
> + /* Re-enable the performance monitors */
> + gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2,
> + A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE,
> + A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE);
> + gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
> + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
> +
> + *out_sample_rate = clk_get_rate(gpu->core_clk);
> +
> + return busy_cycles;
> +}
> +
> static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
> {
> ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
> @@ -502,6 +527,7 @@ static const struct adreno_gpu_funcs funcs = {
> #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
> .show = adreno_show,
> #endif
> + .gpu_busy = a2xx_gpu_busy,
> .gpu_state_get = a2xx_gpu_state_get,
> .gpu_state_put = adreno_gpu_state_put,
> .create_address_space = a2xx_create_address_space,
>

2023-03-13 16:55:04

by Konrad Dybcio

[permalink] [raw]
Subject: Re: [Freedreno] [PATCH v3 4/7] drm/msm/a2xx: Implement .gpu_busy



On 24.02.2023 16:04, Jonathan Marek wrote:
> This won't work because a2xx freedreno userspace expects to own all the perfcounters.
>
> This will break perfcounters for userspace, and when userspace isn't using perfcounters, this won't count correctly because userspace writes 0 to CP_PERFMON_CNTL at the start of every submit.

Rob, would you be willing to take this without the a2xx bits? It
should still be fine, except without devfreq. Not that we had
any significant sort of scaling on a2xx before.

Konrad
>
> On 2/23/23 5:52 AM, Konrad Dybcio wrote:
>> Implement gpu_busy based on the downstream msm-3.4 code [1]. This
>> allows us to use devfreq on this old old old hardware!
>>
>> [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975
>>
>> Reviewed-by: Dmitry Baryshkov <[email protected]>
>> Signed-off-by: Konrad Dybcio <[email protected]>
>> ---
>>   drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
>> index c67089a7ebc1..104bdf28cdaf 100644
>> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
>> @@ -481,6 +481,31 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
>>       return aspace;
>>   }
>>   +/* While the precise size of this field is unknown, it holds at least these three values.. */
>> +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>> +{
>> +    u64 busy_cycles;
>> +
>> +    /* Freeze the counter */
>> +    gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
>> +
>> +    busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
>> +
>> +    /* Reset the counter */
>> +    gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
>> +
>> +    /* Re-enable the performance monitors */
>> +    gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2,
>> +        A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE,
>> +        A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE);
>> +    gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
>> +    gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
>> +
>> +    *out_sample_rate = clk_get_rate(gpu->core_clk);
>> +
>> +    return busy_cycles;
>> +}
>> +
>>   static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>>   {
>>       ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
>> @@ -502,6 +527,7 @@ static const struct adreno_gpu_funcs funcs = {
>>   #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
>>           .show = adreno_show,
>>   #endif
>> +        .gpu_busy = a2xx_gpu_busy,
>>           .gpu_state_get = a2xx_gpu_state_get,
>>           .gpu_state_put = adreno_gpu_state_put,
>>           .create_address_space = a2xx_create_address_space,
>>

2023-03-20 18:09:17

by Rob Clark

[permalink] [raw]
Subject: Re: [Freedreno] [PATCH v3 4/7] drm/msm/a2xx: Implement .gpu_busy

On Mon, Mar 13, 2023 at 9:54 AM Konrad Dybcio <[email protected]> wrote:
>
>
>
> On 24.02.2023 16:04, Jonathan Marek wrote:
> > This won't work because a2xx freedreno userspace expects to own all the perfcounters.
> >
> > This will break perfcounters for userspace, and when userspace isn't using perfcounters, this won't count correctly because userspace writes 0 to CP_PERFMON_CNTL at the start of every submit.
>
> Rob, would you be willing to take this without the a2xx bits? It
> should still be fine, except without devfreq. Not that we had
> any significant sort of scaling on a2xx before.

Yup, sounds like a plan

BR,
-R

> Konrad
> >
> > On 2/23/23 5:52 AM, Konrad Dybcio wrote:
> >> Implement gpu_busy based on the downstream msm-3.4 code [1]. This
> >> allows us to use devfreq on this old old old hardware!
> >>
> >> [1] https://github.com/LineageOS/android_kernel_sony_apq8064/blob/lineage-16.0/drivers/gpu/msm/adreno_a2xx.c#L1975
> >>
> >> Reviewed-by: Dmitry Baryshkov <[email protected]>
> >> Signed-off-by: Konrad Dybcio <[email protected]>
> >> ---
> >> drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 26 ++++++++++++++++++++++++++
> >> 1 file changed, 26 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> >> index c67089a7ebc1..104bdf28cdaf 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> >> @@ -481,6 +481,31 @@ a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
> >> return aspace;
> >> }
> >> +/* While the precise size of this field is unknown, it holds at least these three values.. */
> >> +static u64 a2xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> >> +{
> >> + u64 busy_cycles;
> >> +
> >> + /* Freeze the counter */
> >> + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_FREEZE);
> >> +
> >> + busy_cycles = gpu_read64(gpu, REG_A2XX_RBBM_PERFCOUNTER1_LO);
> >> +
> >> + /* Reset the counter */
> >> + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_RESET);
> >> +
> >> + /* Re-enable the performance monitors */
> >> + gpu_rmw(gpu, REG_A2XX_RBBM_PM_OVERRIDE2,
> >> + A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE,
> >> + A2XX_RBBM_PM_OVERRIDE2_DEBUG_PERF_SCLK_PM_OVERRIDE);
> >> + gpu_write(gpu, REG_A2XX_RBBM_PERFCOUNTER1_SELECT, 1);
> >> + gpu_write(gpu, REG_A2XX_CP_PERFMON_CNTL, PERF_STATE_ENABLE);
> >> +
> >> + *out_sample_rate = clk_get_rate(gpu->core_clk);
> >> +
> >> + return busy_cycles;
> >> +}
> >> +
> >> static u32 a2xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
> >> {
> >> ring->memptrs->rptr = gpu_read(gpu, REG_AXXX_CP_RB_RPTR);
> >> @@ -502,6 +527,7 @@ static const struct adreno_gpu_funcs funcs = {
> >> #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
> >> .show = adreno_show,
> >> #endif
> >> + .gpu_busy = a2xx_gpu_busy,
> >> .gpu_state_get = a2xx_gpu_state_get,
> >> .gpu_state_put = adreno_gpu_state_put,
> >> .create_address_space = a2xx_create_address_space,
> >>