v7 -> v8:
- Fix up resume/suspend (icc now correctly parks to 0, don't abuse
OPP & genpd throughout system-wide suspend)
- Don't handle ebi1_clk separately, the bulk ops handle it just fine
- Rebase on next-20230525 (no meaningful changes)
v7: https://lore.kernel.org/linux-arm-msm/[email protected]/
v6 -> v7:
- Rebase on next-20230519 (A640/650 speedbin merged already)
- separate out the .get_timestamp cb for gmu wrapper
- check for gmu presence inside a6xx_llc_slices_(init|destroy) instead
of before calling them
- use REG_A6XX_RBBM_GPR0_CNTL instead of literal 0x18
- move a6xx_bus_clear_pending_transactions to a6xx_gpu, clean it up
and reuse it for gmu wrapper gpus
- drop clearing RBBM_GBIF (GBIF from GX's POV) as part of draining the
buses, it's not necessary
- introduce a helper for gpu softreset
- sw-reset the gmu wrapper GPUS *after* draining GBIF and only reset
it if it's hung
- reword the commit message in "Remove both GBIF and RBBM GBIF halt
on hw init" and move it before gmu wrapper-specific changes
- drop set_rate logic from a6xx_pm_suspend as the clock simply gets
disabled and we don't have to worry about scaling problems as OPP
and devfreq take care of that, validated with debugcc
- drop a level of indentation in _a6xx_check_idle() to hopefully
improve readability
- check for !a610 instead of gmu_wrapper||a619_holi in sptprac cc
toggling in a6xx_set_hwcg()
- pick up krzk's rb on bindings
All external dependencies have been merged since the last revision.
v6: https://lore.kernel.org/r/[email protected]
v5 -> v6:
- Rebase on 8ead96783163 ("drm/msm/gpu: Move BO allocation out of hw_init")
(Add .ucode_load to funcs_gmuwrapper)
- Drop A6[45]0 speedbin deps, merged into msm-next
Dependencies:
- https://lore.kernel.org/linux-arm-msm/[email protected]/ (to work properly)
v5: https://lore.kernel.org/linux-arm-msm/[email protected]/
v4 -> v5:
- Add a newline before the new allOf:if: [3/15]
- Enforce 6 clocks on A619_holi/A610 [2/15]
- Pick up tags
- Improve error handling in a6xx_pm_resume [6/15]
- Add patch [1/15] (fix an existing issue) which can be picked
separately and account for it in [6/15]
- Rebase atop Akhil's CX shutdown patches and incorporate analogous logic
- Fix a regression introduced in v3 that made the fw loader expect
GMU fw on GMU wrapper GPUs
Dependencies:
- https://lore.kernel.org/linux-arm-msm/[email protected]/ (to apply)
- https://lore.kernel.org/linux-arm-msm/[email protected]/ (to work properly)
v4: https://lore.kernel.org/r/[email protected]
v3 -> v4:
- Drop the mistakenly-included and wrong A3xx-A5xx bindings changes
- Improve bindings commit messages to better explain what GMU Wrapper is
- Drop the A680 highest bank bit value adjustment patch
- Sort UBWC config variables in a reverse-Christmas-tree fashion [4/14]
- Don't alter any UBWC config values in [4/14]
- Do so for a619_holi in [8/14]
- Rebase on next-20230314 (shouldn't matter at all)
v3: https://lore.kernel.org/r/[email protected]
v2 -> v3:
New dependencies:
- https://lore.kernel.org/linux-arm-msm/[email protected]/T/#t
- https://lore.kernel.org/linux-arm-msm/[email protected]/
Sidenote: A speedbin rework is in progress, the of_machine_is_compatible
calls in A619_holi are ugly (but well, necessary..) but they'll be
replaced with socid matching in this or the next kernel cycle.
Due to the new way of identifying GMU wrapper GPUs, configuring 6350
to use wrapper would cause the wrong fuse values to be checked, but that
will be solved by the conversion + the ultimate goal is to use the GMU
whenever possible with the wrapper left for GMU-less Adrenos and early
bringup debugging of GMU-equipped ones.
- Ship dt-bindings in this series as we're referencing the compatible now
- "De-staticize" -> "remove static keyword" [3/15]
- Track down all the values in [4/15]
- Add many comments and explanations in [4/15]
- Fix possible return-before-mutex-unlock [5/15]
- Explain the GMU wrapper a bit more in the commit msg [5/15]
- Separate out pm_resume/suspend for GMU-wrapper GPUs to make things
cleaner [5/15]
- Don't check if `info` exists, it has to at this point [5/15]
- Assign gpu->info early and clean up following if statements in
a6xx_gpu_init [5/15]
- Determine whether we use GMU wrapper based on the GMU compatible
instead of a quirk [5/15]
- Use a struct field to annotate whether we're using gmu wrapper so
that it can be assigned at runtime (turns out a619 holi-ness cannot
be determined by patchid + that will make it easier to test out GMU
GPUs without actually turning on the GMU if anybody wants to do so)
[5/15]
- Unconditionally hook up gx to the gmu wrapper (otherwise our gpu
will not get power) [5/15]
- Don't check for gx domain presence in gmu_wrapper paths, it's
guaranteed [5/15]
- Use opp set rate in the gmuwrapper suspend path [5/15]
- Call opp functions on the GPU device and not on the DRM device of
mdp4/5/DPU1 half the time (WHOOOOPS!) [5/15]
- Disable the memory clock in a6xx_pm_suspend instead of enabling it
(moderate oops) [5/15]
- Call the forgotten clk_bulk_disable_unprepare in a6xx_pm_suspend [5/15]
- Set rate to FMIN (a6xx really doesn't like rate=0 + that's what
msm-5.x does anyway) before disabling core clock [5/15]
- pm_runtime_get_sync -> pm_runtime_resume_and_get [5/15]
- Don't annotate no cached BO support with a quirk, as A619_holi is
merged into the A619 entry in the big const struct - this means
that all GPUs operating in gmu wrapper configuration will be
implicitly treated as if they didn't have this feature [7/15]
- Drop OPP rate & icc related patches, they're a part of a separate
series now; rebase on it
- Clean up extra parentheses [8/15]
- Identify A619_holi by checking the compatible of its GMU instead
of patchlevel [8/15]
- Drop "Fix up A6XX protected registers" - unnecessary, Rob will add
a comment explaining why
- Fix existing UBWC values for A680, new patch [10/15]
- Use adreno_is_aXYZ macros in speedbin matching [13/15] - new patch
v2: https://lore.kernel.org/linux-arm-msm/[email protected]/
v1 -> v2:
- Fix A630 values in [2/14]
- Fix [6/14] for GMU-equipped GPUs
Link to v1: https://lore.kernel.org/linux-arm-msm/[email protected]/
This series concludes my couple-weeks-long suffering of figuring out
the ins and outs of the "non-standard" A6xx GPUs which feature no GMU.
The GMU functionality is essentially emulated by parting out a
"GMU wrapper" region, which is essentially just a register space
within the GPU. It's modeled to be as similar to the actual GMU
as possible while staying as unnecessary as we can make it - there are
no IRQs, no communication with a microcontroller, no RPMh communication,
etc. I tried to reuse as much code as possible without making
a mess where every even line is used for GMU and every odd line is
used for GMU wrapper..
This series contains:
- plumbing for non-GMU operation, if-ing out GMU calls based on
GMU presence
- GMU wrapper support
- A610 support (w/ speedbin)
- A619 support (w/ speedbin)
- couple of minor fixes and improvements
- VDDCX/VDDGX scaling fix for non-GMU GPUs (concerns more than just
A6xx)
- Enablement of opp interconnect properties
A619_holi works perfectly fine using the already-present A619 support
in mesa. A610 needs more work on that front, but can already replay
command traces captured on downstream.
NOTE: the "drm/msm/a6xx: Add support for A619_holi" patch contains
two occurrences of 0x18 used in place of a register #define, as it's
supposed to be RBBM_GPR0_CNTL, but that will only be present after
mesa-side changes are merged and headers are synced from there.
Speedbin patches depend on:
https://lore.kernel.org/linux-arm-msm/[email protected]/
Signed-off-by: Konrad Dybcio <[email protected]>
---
Konrad Dybcio (18):
dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx
dt-bindings: display/msm/gmu: Add GMU wrapper
drm/msm/a6xx: Remove static keyword from sptprac en/disable functions
drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
drm/msm/a6xx: Add a helper for software-resetting the GPU
drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
drm/msm/a6xx: Extend and explain UBWC config
drm/msm/a6xx: Introduce GMU wrapper support
drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
drm/msm/a6xx: Add support for A619_holi
drm/msm/a6xx: Add A610 support
drm/msm/a6xx: Fix some A619 tunables
drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
drm/msm/a6xx: Add A619_holi speedbin support
drm/msm/a6xx: Add A610 speedbin support
.../devicetree/bindings/display/msm/gmu.yaml | 50 +-
.../devicetree/bindings/display/msm/gpu.yaml | 61 ++-
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 122 +++--
drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 +
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 512 ++++++++++++++++++---
drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 4 +
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 +-
drivers/gpu/drm/msm/adreno/adreno_device.c | 17 +-
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 8 +-
drivers/gpu/drm/msm/adreno/adreno_gpu.h | 33 +-
10 files changed, 686 insertions(+), 137 deletions(-)
---
base-commit: 6a3d37b4d885129561e1cef361216f00472f7d2e
change-id: 20230223-topic-gmuwrapper-b4fff5fd7789
Best regards,
--
Konrad Dybcio <[email protected]>
Currently we're only deasserting REG_A6XX_RBBM_GBIF_HALT, but we also
need REG_A6XX_GBIF_HALT to be set to 0.
This is typically done automatically on successful GX collapse, but in
case that fails, we should take care of it.
Also, add a memory barrier to ensure it's gone through before jumping
to further initialization.
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 083ccb5bcb4e..dfde5fb65eed 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1003,8 +1003,12 @@ static int hw_init(struct msm_gpu *gpu)
a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
/* Clear GBIF halt in case GX domain was not collapsed */
- if (a6xx_has_gbif(adreno_gpu))
+ if (a6xx_has_gbif(adreno_gpu)) {
+ gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
+ /* Let's make extra sure that the GPU can access the memory.. */
+ mb();
+ }
gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
--
2.40.1
This function is responsible for telling the GPU to halt transactions
on all of its relevant buses, drain them and leave them in a predictable
state, so that the GPU can be e.g. reset cleanly.
Move the function to a6xx_gpu.c, remove the static keyword and add a
prototype in a6xx_gpu.h to accommodate the move.
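
For illustration only, the GBIF branch of the halt/drain sequence can be
modelled outside the kernel roughly as below. This is a sketch under stated
assumptions: struct fake_gbif, write_halt(), write_rbbm_halt(), poll_ack()
and drain_gbif() are hypothetical names, and the fake hardware acknowledges
every halt request immediately, which real hardware does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GBIF_CLIENT_HALT_MASK (1u << 0)
#define GBIF_ARB_HALT_MASK    (1u << 1)

/* Hypothetical stand-in for the halt/ack register pairs (not driver code) */
struct fake_gbif {
	uint32_t rbbm_halt, rbbm_halt_ack;	/* GX side of GBIF */
	uint32_t halt, halt_ack;		/* GBIF proper */
};

/* Assumption: the fake hardware mirrors each request into its ACK register */
static void write_rbbm_halt(struct fake_gbif *g, uint32_t v)
{
	g->rbbm_halt = v;
	g->rbbm_halt_ack = v;
}

static void write_halt(struct fake_gbif *g, uint32_t v)
{
	g->halt = v;
	g->halt_ack = v;
}

/* Bounded poll, loosely standing in for the driver's spin_until() */
static bool poll_ack(const uint32_t *reg, uint32_t mask)
{
	for (int i = 0; i < 1000; i++)
		if ((*reg & mask) == mask)
			return true;
	return false;
}

static void drain_gbif(struct fake_gbif *g, bool gx_off)
{
	if (gx_off) {
		/* Halt the gx side of GBIF */
		write_rbbm_halt(g, 1);
		assert(poll_ack(&g->rbbm_halt_ack, 1));
	}

	/* Halt new client requests on GBIF */
	write_halt(g, GBIF_CLIENT_HALT_MASK);
	assert(poll_ack(&g->halt_ack, GBIF_CLIENT_HALT_MASK));

	/* Halt all AXI requests on GBIF */
	write_halt(g, GBIF_ARB_HALT_MASK);
	assert(poll_ack(&g->halt_ack, GBIF_ARB_HALT_MASK));

	/* The GBIF halt needs to be explicitly cleared */
	write_halt(g, 0);
}
```

Note that in this model, as in the moved function, the RBBM-side halt stays
asserted after a gx_off drain; only the GBIF halt is cleared at the end.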
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 37 -----------------------------------
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 36 ++++++++++++++++++++++++++++++++++
drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 2 ++
3 files changed, 38 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 9421716a2fe5..b86be123ecd0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -868,43 +868,6 @@ static void a6xx_gmu_rpmh_off(struct a6xx_gmu *gmu)
(val & 1), 100, 1000);
}
-#define GBIF_CLIENT_HALT_MASK BIT(0)
-#define GBIF_ARB_HALT_MASK BIT(1)
-
-static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu,
- bool gx_off)
-{
- struct msm_gpu *gpu = &adreno_gpu->base;
-
- if (!a6xx_has_gbif(adreno_gpu)) {
- gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
- spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
- 0xf) == 0xf);
- gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
-
- return;
- }
-
- if (gx_off) {
- /* Halt the gx side of GBIF */
- gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 1);
- spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) & 1);
- }
-
- /* Halt new client requests on GBIF */
- gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
- spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
- (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
-
- /* Halt all AXI requests on GBIF */
- gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
- spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
- (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
-
- /* The GBIF halt needs to be explicitly cleared */
- gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
-}
-
/* Force the GMU off in case it isn't responsive */
static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
{
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index e34aa15156a4..6bb4da70f6a6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1597,6 +1597,42 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
}
+#define GBIF_CLIENT_HALT_MASK BIT(0)
+#define GBIF_ARB_HALT_MASK BIT(1)
+
+void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off)
+{
+ struct msm_gpu *gpu = &adreno_gpu->base;
+
+ if (!a6xx_has_gbif(adreno_gpu)) {
+ gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
+ spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
+ 0xf) == 0xf);
+ gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
+
+ return;
+ }
+
+ if (gx_off) {
+ /* Halt the gx side of GBIF */
+ gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 1);
+ spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) & 1);
+ }
+
+ /* Halt new client requests on GBIF */
+ gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
+ spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
+ (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
+
+ /* Halt all AXI requests on GBIF */
+ gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
+ spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
+ (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
+
+ /* The GBIF halt needs to be explicitly cleared */
+ gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
+}
+
static int a6xx_pm_resume(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index eea2e60ce3b7..9580def06d45 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -88,4 +88,6 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
int a6xx_gpu_state_put(struct msm_gpu_state *state);
+void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
+
#endif /* __A6XX_GPU_H__ */
--
2.40.1
Adreno 619 expects some tunables to be set differently. Make up for it.
Fixes: b7616b5c69e6 ("drm/msm/adreno: Add A619 support")
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c0d5973320d9..1a29e7dd9975 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1198,6 +1198,8 @@ static int hw_init(struct msm_gpu *gpu)
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
+ else if (adreno_is_a619(adreno_gpu))
+ gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00018000);
else if (adreno_is_a610(adreno_gpu))
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00080000);
else
@@ -1215,7 +1217,9 @@ static int hw_init(struct msm_gpu *gpu)
a6xx_set_ubwc_config(gpu);
/* Enable fault detection */
- if (adreno_is_a610(adreno_gpu))
+ if (adreno_is_a619(adreno_gpu))
+ gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3fffff);
+ else if (adreno_is_a610(adreno_gpu))
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3ffff);
else
gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x1fffff);
--
2.40.1
The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
we'd normally assign to the GMU as if they were a part of the GMU, even
though they are not". It's a (good) software representation of the GMU_CX
and GMU_GX register spaces within the GPUSS that helps us programmatically
treat these de-facto GMU-less parts in a way that's very similar to their
GMU-equipped cousins, greatly reducing code duplication.
The "wrapper" register space was specifically designed to mimic the layout
of a real GMU, though it rather obviously does not have the M3 core et al.
To sum it all up, the GMU wrapper is essentially a register space within
the GPU, which Linux sees as a dumbed-down regular GMU: there are no clocks,
interrupts, multiple reg spaces, iommus or OPP. Document it.
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
.../devicetree/bindings/display/msm/gmu.yaml | 50 ++++++++++++++++------
1 file changed, 38 insertions(+), 12 deletions(-)
diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml b/Documentation/devicetree/bindings/display/msm/gmu.yaml
index f31a26305ca9..5fc4106110ad 100644
--- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
@@ -19,16 +19,18 @@ description: |
properties:
compatible:
- items:
- - pattern: '^qcom,adreno-gmu-6[0-9][0-9]\.[0-9]$'
- - const: qcom,adreno-gmu
+ oneOf:
+ - items:
+ - pattern: '^qcom,adreno-gmu-6[0-9][0-9]\.[0-9]$'
+ - const: qcom,adreno-gmu
+ - const: qcom,adreno-gmu-wrapper
reg:
- minItems: 3
+ minItems: 1
maxItems: 4
reg-names:
- minItems: 3
+ minItems: 1
maxItems: 4
clocks:
@@ -44,7 +46,6 @@ properties:
- description: GMU HFI interrupt
- description: GMU interrupt
-
interrupt-names:
items:
- const: hfi
@@ -72,14 +73,8 @@ required:
- compatible
- reg
- reg-names
- - clocks
- - clock-names
- - interrupts
- - interrupt-names
- power-domains
- power-domain-names
- - iommus
- - operating-points-v2
additionalProperties: false
@@ -218,6 +213,28 @@ allOf:
- const: axi
- const: memnoc
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: qcom,adreno-gmu-wrapper
+ then:
+ properties:
+ reg:
+ items:
+ - description: GMU wrapper register space
+ reg-names:
+ items:
+ - const: gmu
+ else:
+ required:
+ - clocks
+ - clock-names
+ - interrupts
+ - interrupt-names
+ - iommus
+ - operating-points-v2
+
examples:
- |
#include <dt-bindings/clock/qcom,gpucc-sdm845.h>
@@ -250,3 +267,12 @@ examples:
iommus = <&adreno_smmu 5>;
operating-points-v2 = <&gmu_opp_table>;
};
+
+ gmu_wrapper: gmu@596a000 {
+ compatible = "qcom,adreno-gmu-wrapper";
+ reg = <0x0596a000 0x30000>;
+ reg-names = "gmu";
+ power-domains = <&gpucc GPU_CX_GDSC>,
+ <&gpucc GPU_GX_GDSC>;
+ power-domain-names = "cx", "gx";
+ };
--
2.40.1
A619_holi is implemented on at least two SoCs: SM4350 (holi) and SM6375
(blair). This seems to be the first time this has happened,
but it's easy to overcome by guarding the SoC-specific fuse values with
of_machine_is_compatible(). Do just that to enable frequency limiting
on these SoCs.
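
As a rough userspace illustration of the SoC-guarded matching (not the
driver code itself): a619_holi_speed_bin() below is a hypothetical name,
and the SoC compatible is passed in as a string in place of calling
of_machine_is_compatible(). The fuse values are the ones used in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the UINT_MAX "no match" return used by the driver helpers */
#define SPEED_BIN_UNKNOWN UINT32_MAX

/*
 * Hypothetical stand-in for the in-kernel helper: the SoC compatible is an
 * explicit argument so the lookup can be exercised outside the kernel.
 */
static uint32_t a619_holi_speed_bin(const char *soc, uint32_t fuse)
{
	assert(soc != NULL);

	/* Fuse value 0 maps to bin 0 regardless of the SoC */
	if (fuse == 0)
		return 0;

	if (strcmp(soc, "qcom,sm4350") == 0) {
		if (fuse == 138)
			return 1;
		if (fuse == 92)
			return 2;
	} else if (strcmp(soc, "qcom,sm6375") == 0) {
		if (fuse == 190)
			return 1;
		if (fuse == 177)
			return 2;
	}

	/* Unknown SoC, or a fuse value that belongs to the other SoC */
	return SPEED_BIN_UNKNOWN;
}
```

With this shape, a fuse value that is valid on one SoC (e.g. 138 on SM4350)
is rejected on the other instead of silently selecting a bin.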
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index ca4ffa44097e..d046af5f6de2 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2110,6 +2110,34 @@ static u32 a618_get_speed_bin(u32 fuse)
return UINT_MAX;
}
+static u32 a619_holi_get_speed_bin(u32 fuse)
+{
+ /*
+ * There are (at least) two SoCs implementing A619_holi: SM4350 (holi)
+ * and SM6375 (blair). Limit the fuse matching to the corresponding
+ * SoC to prevent bogus frequency setting (as improbable as it may be,
+ * given unexpected fuse values are.. unexpected! But still possible.)
+ */
+
+ if (fuse == 0)
+ return 0;
+
+ if (of_machine_is_compatible("qcom,sm4350")) {
+ if (fuse == 138)
+ return 1;
+ else if (fuse == 92)
+ return 2;
+ } else if (of_machine_is_compatible("qcom,sm6375")) {
+ if (fuse == 190)
+ return 1;
+ else if (fuse == 177)
+ return 2;
+ } else
+ pr_warn("Unknown SoC implementing A619_holi!\n");
+
+ return UINT_MAX;
+}
+
static u32 a619_get_speed_bin(u32 fuse)
{
if (fuse == 0)
@@ -2170,6 +2198,9 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u3
if (adreno_is_a618(adreno_gpu))
val = a618_get_speed_bin(fuse);
+ else if (adreno_is_a619_holi(adreno_gpu))
+ val = a619_holi_get_speed_bin(fuse);
+
else if (adreno_is_a619(adreno_gpu))
val = a619_get_speed_bin(fuse);
--
2.40.1
A610 and A619_holi don't support the has_cached_coherent feature. Disable it
to make the GPU stop crashing after almost each and every submission - the
received data on the GPU end was simply incomplete and garbled, resulting in
almost nothing being executed properly. Extend the disablement to
adreno_has_gmu_wrapper, as none of the GMU wrapper Adrenos that we don't
support yet seem to feature it.
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/adreno_device.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 8cff86e9d35c..b133755a56c4 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -551,7 +551,6 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
config.rev.minor, config.rev.patchid);
priv->is_a2xx = config.rev.core == 2;
- priv->has_cached_coherent = config.rev.core >= 6;
gpu = info->init(drm);
if (IS_ERR(gpu)) {
@@ -563,6 +562,10 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
if (ret)
return ret;
+ if (config.rev.core >= 6)
+ if (!adreno_has_gmu_wrapper(to_adreno_gpu(gpu)))
+ priv->has_cached_coherent = true;
+
return 0;
}
--
2.40.1
A619_holi is a GMU-less variant of the already-supported A619 GPU.
It's present on at least SM4350 (holi) and SM6375 (blair). No mesa
changes are required. Add the required kernel-side support for it.
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 27 +++++++++++++++++++++++++--
drivers/gpu/drm/msm/adreno/adreno_gpu.h | 5 +++++
2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 0a44762dbb6d..bb04f65e6f68 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -810,6 +810,9 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
if (adreno_is_a618(adreno_gpu))
return;
+ if (adreno_is_a619_holi(adreno_gpu))
+ hbb_lo = 0;
+
if (adreno_is_a640_family(adreno_gpu))
amsbc = 1;
@@ -1027,7 +1030,12 @@ static int hw_init(struct msm_gpu *gpu)
}
/* Clear GBIF halt in case GX domain was not collapsed */
- if (a6xx_has_gbif(adreno_gpu)) {
+ if (adreno_is_a619_holi(adreno_gpu)) {
+ gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
+ gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
+ /* Let's make extra sure that the GPU can access the memory.. */
+ mb();
+ } else if (a6xx_has_gbif(adreno_gpu)) {
gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
/* Let's make extra sure that the GPU can access the memory.. */
@@ -1036,6 +1044,9 @@ static int hw_init(struct msm_gpu *gpu)
gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
+ if (adreno_is_a619_holi(adreno_gpu))
+ a6xx_sptprac_enable(gmu);
+
/*
* Disable the trusted memory range - we don't actually supported secure
* memory rendering at this point in time and we don't want to block off
@@ -1656,12 +1667,18 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
#define GBIF_CLIENT_HALT_MASK BIT(0)
#define GBIF_ARB_HALT_MASK BIT(1)
#define VBIF_XIN_HALT_CTRL0_MASK GENMASK(3, 0)
+#define VBIF_RESET_ACK_MASK 0xF0
+#define GPR0_GBIF_HALT_REQUEST 0x1E0
void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off)
{
struct msm_gpu *gpu = &adreno_gpu->base;
- if (!a6xx_has_gbif(adreno_gpu)) {
+ if (adreno_is_a619_holi(adreno_gpu)) {
+ gpu_write(gpu, 0x18, GPR0_GBIF_HALT_REQUEST);
+ spin_until((gpu_read(gpu, REG_A6XX_RBBM_VBIF_GX_RESET_STATUS) &
+ (VBIF_RESET_ACK_MASK)) == VBIF_RESET_ACK_MASK);
+ } else if (!a6xx_has_gbif(adreno_gpu)) {
gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, VBIF_XIN_HALT_CTRL0_MASK);
spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
(VBIF_XIN_HALT_CTRL0_MASK)) == VBIF_XIN_HALT_CTRL0_MASK);
@@ -1756,6 +1773,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
if (ret)
goto err_bulk_clk;
+ if (adreno_is_a619_holi(adreno_gpu))
+ a6xx_sptprac_enable(gmu);
+
/* If anything goes south, tear the GPU down piece by piece.. */
if (ret) {
err_bulk_clk:
@@ -1815,6 +1835,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
/* Drain the outstanding traffic on memory buses */
a6xx_bus_clear_pending_transactions(adreno_gpu, true);
+ if (adreno_is_a619_holi(adreno_gpu))
+ a6xx_sptprac_disable(gmu);
+
clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
pm_runtime_put_sync(gmu->gxpd);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index ee5352bc5329..432fee5c1516 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -252,6 +252,11 @@ static inline int adreno_is_a619(struct adreno_gpu *gpu)
return gpu->revn == 619;
}
+static inline int adreno_is_a619_holi(struct adreno_gpu *gpu)
+{
+ return adreno_is_a619(gpu) && adreno_has_gmu_wrapper(gpu);
+}
+
static inline int adreno_is_a630(struct adreno_gpu *gpu)
{
return gpu->revn == 630;
--
2.40.1
Rename lower_bit to hbb_lo and explain what it signifies.
Add explanations (wherever possible) to other tunables.
Port setting min_access_length, ubwc_mode and hbb_hi from downstream.
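
To make the hbb_hi/hbb_lo split concrete, here is a small standalone
sketch. hbb_split() and rb_nc_mode_cntl() are hypothetical helpers that
do not exist in the driver; the field positions are taken from the
REG_A6XX_RB_NC_MODE_CNTL write in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two fields written to the hardware */
struct hbb_fields {
	uint32_t hi;	/* bit 2 of (hbb - 13) */
	uint32_t lo;	/* bits 1:0 of (hbb - 13) */
};

/* Split a Highest Bank Bit value the way the comment in the patch describes */
static struct hbb_fields hbb_split(uint32_t hbb)
{
	struct hbb_fields f;
	uint32_t rem;

	assert(hbb >= 13);	/* 13 is the minimum value allowed by hw */
	rem = hbb - 13;
	f.lo = rem & 0x3;
	f.hi = (rem >> 2) & 0x1;

	return f;
}

/* Pack RB_NC_MODE_CNTL with the same shifts as the gpu_write() in the patch */
static uint32_t rb_nc_mode_cntl(uint32_t rgb565_predicator, uint32_t hbb_hi,
				uint32_t amsbc, uint32_t min_acc_len,
				uint32_t hbb_lo, uint32_t ubwc_mode)
{
	return rgb565_predicator << 11 | hbb_hi << 10 | amsbc << 4 |
	       min_acc_len << 3 | hbb_lo << 1 | ubwc_mode;
}
```

Under this reading, the default hbb_lo = 2 corresponds to a highest bank
bit of 15, the A650/A660 value of 3 to 16, and the 7c3 value of 1 to 14.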
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 39 +++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index dfde5fb65eed..58bf405b85d8 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -786,10 +786,25 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
- u32 lower_bit = 2;
- u32 amsbc = 0;
+ /* Unknown, introduced with A650 family, related to UBWC mode/ver 4 */
u32 rgb565_predicator = 0;
+ /* Unknown, introduced with A650 family */
u32 uavflagprd_inv = 0;
+ /* Whether the minimum access length is 64 bits */
+ u32 min_acc_len = 0;
+ /* Entirely magic, per-GPU-gen value */
+ u32 ubwc_mode = 0;
+ /*
+ * The Highest Bank Bit value represents the bit of the highest DDR bank.
+ * We then subtract 13 from it (13 is the minimum value allowed by hw) and
+ * write the lowest two bits of the remaining value as hbb_lo and the
+ * one above it as hbb_hi to the hardware. This should ideally use DRAM
+ * type detection.
+ */
+ u32 hbb_hi = 0;
+ u32 hbb_lo = 2;
+ /* Unknown, introduced with A640/680 */
+ u32 amsbc = 0;
/* a618 is using the hw default values */
if (adreno_is_a618(adreno_gpu))
@@ -800,25 +815,31 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu)) {
/* TODO: get ddr type from bootloader and use 2 for LPDDR4 */
- lower_bit = 3;
+ hbb_lo = 3;
amsbc = 1;
rgb565_predicator = 1;
uavflagprd_inv = 2;
}
if (adreno_is_7c3(adreno_gpu)) {
- lower_bit = 1;
+ hbb_lo = 1;
amsbc = 1;
rgb565_predicator = 1;
uavflagprd_inv = 2;
}
gpu_write(gpu, REG_A6XX_RB_NC_MODE_CNTL,
- rgb565_predicator << 11 | amsbc << 4 | lower_bit << 1);
- gpu_write(gpu, REG_A6XX_TPL1_NC_MODE_CNTL, lower_bit << 1);
- gpu_write(gpu, REG_A6XX_SP_NC_MODE_CNTL,
- uavflagprd_inv << 4 | lower_bit << 1);
- gpu_write(gpu, REG_A6XX_UCHE_MODE_CNTL, lower_bit << 21);
+ rgb565_predicator << 11 | hbb_hi << 10 | amsbc << 4 |
+ min_acc_len << 3 | hbb_lo << 1 | ubwc_mode);
+
+ gpu_write(gpu, REG_A6XX_TPL1_NC_MODE_CNTL, hbb_hi << 4 |
+ min_acc_len << 3 | hbb_lo << 1 | ubwc_mode);
+
+ gpu_write(gpu, REG_A6XX_SP_NC_MODE_CNTL, hbb_hi << 10 |
+ uavflagprd_inv << 4 | min_acc_len << 3 |
+ hbb_lo << 1 | ubwc_mode);
+
+ gpu_write(gpu, REG_A6XX_UCHE_MODE_CNTL, min_acc_len << 23 | hbb_lo << 21);
}
static int a6xx_cp_init(struct msm_gpu *gpu)
--
2.40.1
As pointed out by Akhil during the review process of GMU wrapper
introduction [1], it makes sense to move this write into the function
that's responsible for forcibly shutting the GMU off.
It is also very convenient to move this to GMU-specific code, so that
it does not have to be guarded by an if-condition to avoid calling it
on GMU wrapper targets.
Move the write to the aforementioned a6xx_gmu_force_off() to achieve
that. No effective functional change.
[1] https://lore.kernel.org/linux-arm-msm/[email protected]/
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 6 ++++++
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 ------
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 87babbb2a19f..9421716a2fe5 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -912,6 +912,12 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
struct msm_gpu *gpu = &adreno_gpu->base;
+ /*
+ * Turn off keep alive that might have been enabled by the hang
+ * interrupt
+ */
+ gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
+
/* Flush all the queues */
a6xx_hfi_stop(gmu);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 9fb214f150dd..e34aa15156a4 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1274,12 +1274,6 @@ static void a6xx_recover(struct msm_gpu *gpu)
/* Halt SQE first */
gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 3);
- /*
- * Turn off keep alive that might have been enabled by the hang
- * interrupt
- */
- gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
-
pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
/* active_submit won't change until we make a submission */
--
2.40.1
A610 is one of (if not the) lowest-tier SKUs in the A6XX family. It
features no GMU, as it's implemented solely on SoCs with SMD_RPM.
What's more interesting is that it does not feature a VDDGX line
either, being powered solely by VDDCX, and it has an unfortunate hardware
quirk that makes its reset line broken - after a couple of assert/
deassert cycles, it will hang for good and will not wake up again.
This GPU requires mesa changes for proper rendering, and lots of them
at that. The command streams are quite far away from any other A6XX
GPU and hence it needs special care. This patch was validated both
by running an (incomplete) downstream mesa with some hacks (frames
rendered correctly, though some instructions made the GPU hangcheck
which is expected - garbage in, garbage out) and by replaying RD
traces captured with the downstream KGSL driver - no crashes there,
ever.
Add support for this GPU on the kernel side, which comes down to
pretty simply adding A612 HWCG tables, altering a few values and
adding a special case for handling the reset line.
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 101 +++++++++++++++++++++++++----
drivers/gpu/drm/msm/adreno/adreno_device.c | 12 ++++
drivers/gpu/drm/msm/adreno/adreno_gpu.h | 8 ++-
3 files changed, 108 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index bb04f65e6f68..c0d5973320d9 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -252,6 +252,56 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
a6xx_flush(gpu, ring);
}
+const struct adreno_reglist a612_hwcg[] = {
+ {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x22222222},
+ {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
+ {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x00000081},
+ {REG_A6XX_RBBM_CLOCK_HYST_SP0, 0x0000f3cf},
+ {REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x22222222},
+ {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x22222222},
+ {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x22222222},
+ {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x00022222},
+ {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x11111111},
+ {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x11111111},
+ {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x11111111},
+ {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x00011111},
+ {REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x77777777},
+ {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x77777777},
+ {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x77777777},
+ {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x00077777},
+ {REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x22222222},
+ {REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x01202222},
+ {REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x00002220},
+ {REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
+ {REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
+ {REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x00005555},
+ {REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x00000011},
+ {REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
+ {REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x04222222},
+ {REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x00002222},
+ {REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x02222222},
+ {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x00000002},
+ {REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x00002222},
+ {REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x00004000},
+ {REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x00002222},
+ {REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x00000200},
+ {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x00000000},
+ {REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x00000000},
+ {REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x00000000},
+ {REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+ {REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x00000000},
+ {REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x22222222},
+ {REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x00000004},
+ {REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x00000002},
+ {REG_A6XX_RBBM_ISDB_CNT, 0x00000182},
+ {REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x00000000},
+ {REG_A6XX_RBBM_SP_HYST_CNT, 0x00000000},
+ {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x00000222},
+ {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x00000111},
+ {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x00000555},
+ {},
+};
+
/* For a615 family (a615, a616, a618 and a619) */
const struct adreno_reglist a615_hwcg[] = {
{REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x02222222},
@@ -602,6 +652,8 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
if (adreno_is_a630(adreno_gpu))
clock_cntl_on = 0x8aa8aa02;
+ else if (adreno_is_a610(adreno_gpu))
+ clock_cntl_on = 0xaaa8aa82;
else
clock_cntl_on = 0x8aa8aa82;
@@ -612,13 +664,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
return;
/* Disable SP clock before programming HWCG registers */
- gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
+ if (!adreno_is_a610(adreno_gpu))
+ gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
gpu_write(gpu, reg->offset, state ? reg->value : 0);
/* Enable SP clock */
- gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
+ if (!adreno_is_a610(adreno_gpu))
+ gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
}
@@ -806,6 +860,13 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
/* Unknown, introduced with A640/680 */
u32 amsbc = 0;
+ if (adreno_is_a610(adreno_gpu)) {
+ /* HBB = 14 */
+ hbb_lo = 1;
+ min_acc_len = 1;
+ ubwc_mode = 1;
+ }
+
/* a618 is using the hw default values */
if (adreno_is_a618(adreno_gpu))
return;
@@ -1073,13 +1134,13 @@ static int hw_init(struct msm_gpu *gpu)
a6xx_set_hwcg(gpu, true);
/* VBIF/GBIF start*/
- if (adreno_is_a640_family(adreno_gpu) ||
+ if (adreno_is_a610(adreno_gpu) ||
+ adreno_is_a640_family(adreno_gpu) ||
adreno_is_a650_family(adreno_gpu)) {
gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE0, 0x00071620);
gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE1, 0x00071620);
gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE2, 0x00071620);
gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
- gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
gpu_write(gpu, REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL, 0x3);
} else {
gpu_write(gpu, REG_A6XX_RBBM_VBIF_CLIENT_QOS_CNTL, 0x3);
@@ -1107,18 +1168,26 @@ static int hw_init(struct msm_gpu *gpu)
gpu_write(gpu, REG_A6XX_UCHE_FILTER_CNTL, 0x804);
gpu_write(gpu, REG_A6XX_UCHE_CACHE_WAYS, 0x4);
- if (adreno_is_a640_family(adreno_gpu) ||
- adreno_is_a650_family(adreno_gpu))
+ if (adreno_is_a640_family(adreno_gpu) || adreno_is_a650_family(adreno_gpu)) {
gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x02000140);
- else
+ gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
+ } else if (adreno_is_a610(adreno_gpu)) {
+ gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x00800060);
+ gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x40201b16);
+ } else {
gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x010000c0);
- gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
+ gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
+ }
if (adreno_is_a660_family(adreno_gpu))
gpu_write(gpu, REG_A6XX_CP_LPAC_PROG_FIFO_SIZE, 0x00000020);
/* Setting the mem pool size */
- gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
+ if (adreno_is_a610(adreno_gpu)) {
+ gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 48);
+ gpu_write(gpu, REG_A6XX_CP_MEM_POOL_DBG_ADDR, 47);
+ } else
+ gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
/* Setting the primFifo thresholds default values,
* and vccCacheSkipDis=1 bit (0x200) for A640 and newer
@@ -1129,6 +1198,8 @@ static int hw_init(struct msm_gpu *gpu)
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
+ else if (adreno_is_a610(adreno_gpu))
+ gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00080000);
else
gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00180000);
@@ -1144,8 +1215,10 @@ static int hw_init(struct msm_gpu *gpu)
a6xx_set_ubwc_config(gpu);
/* Enable fault detection */
- gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL,
- (1 << 30) | 0x1fffff);
+ if (adreno_is_a610(adreno_gpu))
+ gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3ffff);
+ else
+ gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x1fffff);
gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
@@ -1675,7 +1748,7 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
struct msm_gpu *gpu = &adreno_gpu->base;
if (adreno_is_a619_holi(adreno_gpu)) {
- gpu_write(gpu, 0x18, GPR0_GBIF_HALT_REQUEST);
+ gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, GPR0_GBIF_HALT_REQUEST);
spin_until((gpu_read(gpu, REG_A6XX_RBBM_VBIF_GX_RESET_STATUS) &
(VBIF_RESET_ACK_MASK)) == VBIF_RESET_ACK_MASK);
} else if (!a6xx_has_gbif(adreno_gpu)) {
@@ -1709,6 +1782,10 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
{
+ /* 11nm chips (e.g. ones with A610) have hw issues with the reset line! */
+ if (adreno_is_a610(to_adreno_gpu(gpu)))
+ return;
+
gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
/* Add a barrier to avoid bad surprises */
mb();
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index b133755a56c4..2c2cdbdada4d 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -253,6 +253,18 @@ static const struct adreno_info gpulist[] = {
.quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
.init = a5xx_gpu_init,
.zapfw = "a540_zap.mdt",
+ }, {
+ .rev = ADRENO_REV(6, 1, 0, ANY_ID),
+ .revn = 610,
+ .name = "A610",
+ .fw = {
+ [ADRENO_FW_SQE] = "a630_sqe.fw",
+ },
+ .gmem = (SZ_128K + SZ_4K),
+ .inactive_period = 500,
+ .init = a6xx_gpu_init,
+ .zapfw = "a610_zap.mdt",
+ .hwcg = a612_hwcg,
}, {
.rev = ADRENO_REV(6, 1, 8, ANY_ID),
.revn = 618,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 432fee5c1516..7a5d595d4b99 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -55,7 +55,8 @@ struct adreno_reglist {
u32 value;
};
-extern const struct adreno_reglist a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[], a660_hwcg[];
+extern const struct adreno_reglist a612_hwcg[], a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[];
+extern const struct adreno_reglist a660_hwcg[];
struct adreno_info {
struct adreno_rev rev;
@@ -242,6 +243,11 @@ static inline int adreno_is_a540(struct adreno_gpu *gpu)
return gpu->revn == 540;
}
+static inline int adreno_is_a610(struct adreno_gpu *gpu)
+{
+ return gpu->revn == 610;
+}
+
static inline int adreno_is_a618(struct adreno_gpu *gpu)
{
return gpu->revn == 618;
--
2.40.1
The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
we'd normally assign to the GMU as if they were a part of the GMU, even
though they are not". It's a (good) software representation of the GMU_CX
and GMU_GX register spaces within the GPUSS that helps us programmatically
treat these de-facto GMU-less parts in a way that's very similar to their
GMU-equipped cousins, greatly reducing code duplication.
The "wrapper" register space was specifically designed to mimic the layout
of a real GMU, though it rather obviously does not have the M3 core et al.
GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
specified under the GPU node, just like their older cousins. Account
for that.
Signed-off-by: Konrad Dybcio <[email protected]>
---
.../devicetree/bindings/display/msm/gpu.yaml | 61 ++++++++++++++++++----
1 file changed, 52 insertions(+), 9 deletions(-)
diff --git a/Documentation/devicetree/bindings/display/msm/gpu.yaml b/Documentation/devicetree/bindings/display/msm/gpu.yaml
index 5dabe7b6794b..58ca8912a8c3 100644
--- a/Documentation/devicetree/bindings/display/msm/gpu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gpu.yaml
@@ -36,10 +36,7 @@ properties:
reg-names:
minItems: 1
- items:
- - const: kgsl_3d0_reg_memory
- - const: cx_mem
- - const: cx_dbgc
+ maxItems: 3
interrupts:
maxItems: 1
@@ -157,16 +154,62 @@ allOf:
required:
- clocks
- clock-names
+
- if:
properties:
compatible:
contains:
- pattern: '^qcom,adreno-6[0-9][0-9]\.[0-9]$'
-
- then: # Since Adreno 6xx series clocks should be defined in GMU
+ enum:
+ - qcom,adreno-610.0
+ - qcom,adreno-619.1
+ then:
properties:
- clocks: false
- clock-names: false
+ clocks:
+ minItems: 6
+ maxItems: 6
+
+ clock-names:
+ items:
+ - const: core
+ description: GPU Core clock
+ - const: iface
+ description: GPU Interface clock
+ - const: mem_iface
+ description: GPU Memory Interface clock
+ - const: alt_mem_iface
+ description: GPU Alternative Memory Interface clock
+ - const: gmu
+ description: CX GMU clock
+ - const: xo
+ description: GPUCC clocksource clock
+
+ reg-names:
+ minItems: 1
+ items:
+ - const: kgsl_3d0_reg_memory
+ - const: cx_dbgc
+
+ required:
+ - clocks
+ - clock-names
+ else:
+ if:
+ properties:
+ compatible:
+ contains:
+ pattern: '^qcom,adreno-6[0-9][0-9]\.[0-9]$'
+
+ then: # Starting with A6xx, the clocks are usually defined in the GMU node
+ properties:
+ clocks: false
+ clock-names: false
+
+ reg-names:
+ minItems: 1
+ items:
+ - const: kgsl_3d0_reg_memory
+ - const: cx_mem
+ - const: cx_dbgc
examples:
- |
--
2.40.1
Unify the indentation and explain the cryptic 0xF value.
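For reference, a minimal userspace sketch of why GENMASK(3, 0) is a
drop-in replacement for the literal 0xf. The BIT() and GENMASK() macros
below are simplified 32-bit stand-ins for the kernel ones, and
vbif_halt_acked() is a hypothetical helper that only mirrors the
spin_until() condition.

```c
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel's BIT() and GENMASK() */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

#define VBIF_XIN_HALT_CTRL0_MASK	GENMASK(3, 0)

/*
 * Hypothetical helper: nonzero once all four XIN halt bits are
 * acknowledged in a value read back from VBIF_XIN_HALT_CTRL1.
 */
static int vbif_halt_acked(uint32_t halt_ctrl1)
{
	return (halt_ctrl1 & VBIF_XIN_HALT_CTRL0_MASK) ==
	       VBIF_XIN_HALT_CTRL0_MASK;
}
```

GENMASK(3, 0) sets bits 3 down to 0, i.e. BIT(3) | BIT(2) | BIT(1) |
BIT(0), which is exactly 0xf.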
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 6bb4da70f6a6..e3ac3f045665 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1597,17 +1597,18 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
}
-#define GBIF_CLIENT_HALT_MASK BIT(0)
-#define GBIF_ARB_HALT_MASK BIT(1)
+#define GBIF_CLIENT_HALT_MASK BIT(0)
+#define GBIF_ARB_HALT_MASK BIT(1)
+#define VBIF_XIN_HALT_CTRL0_MASK GENMASK(3, 0)
void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off)
{
struct msm_gpu *gpu = &adreno_gpu->base;
if (!a6xx_has_gbif(adreno_gpu)) {
- gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
+ gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, VBIF_XIN_HALT_CTRL0_MASK);
spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
- 0xf) == 0xf);
+ (VBIF_XIN_HALT_CTRL0_MASK)) == VBIF_XIN_HALT_CTRL0_MASK);
gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
return;
--
2.40.1
Introduce a6xx_gpu_sw_reset() in preparation for adding GMU wrapper
GPUs and reuse it in a6xx_gmu_force_off().
This helper, contrary to the original usage in GMU code paths, adds
a full memory barrier (mb()) which, together with the necessary delay,
should ensure that the reset is never deasserted too quickly, e.g. due
to out-of-order execution.
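A userspace sketch of the intended assert/deassert sequence, assuming a
mocked register and delay. struct mock_gpu, mock_udelay() and the
mock_gpu_*() functions are hypothetical; the C11 fence merely stands in
for mb() and says nothing about real MMIO ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock device: one fake register plus a tally of requested delay time */
struct mock_gpu {
	uint32_t sw_reset_cmd;
	unsigned int delayed_us;
};

/* Stand-in for udelay(): records the request instead of busy-waiting */
static void mock_udelay(struct mock_gpu *gpu, unsigned int us)
{
	gpu->delayed_us += us;
}

static void mock_gpu_sw_reset(struct mock_gpu *gpu, bool assert_line)
{
	gpu->sw_reset_cmd = assert_line;
	/* Full fence, playing the role of mb() in the helper */
	atomic_thread_fence(memory_order_seq_cst);

	/* Only the asserted state has a minimum hold time */
	if (assert_line)
		mock_udelay(gpu, 100);
}

/* One full pulse: assert, hold for 100 us, then release */
static void mock_gpu_reset_pulse(struct mock_gpu *gpu)
{
	mock_gpu_sw_reset(gpu, true);
	mock_gpu_sw_reset(gpu, false);
}
```

Keeping the delay inside the helper means a caller cannot release the
line early by forgetting the udelay().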
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 +--
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 11 +++++++++++
drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index b86be123ecd0..5ba8cba69383 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -899,8 +899,7 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
a6xx_bus_clear_pending_transactions(adreno_gpu, true);
/* Reset GPU core blocks */
- gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
- udelay(100);
+ a6xx_gpu_sw_reset(gpu, true);
}
static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index e3ac3f045665..083ccb5bcb4e 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1634,6 +1634,17 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
}
+void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
+{
+ gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
+ /* Add a barrier to avoid bad surprises */
+ mb();
+
+ /* The reset line needs to be asserted for at least 100 us */
+ if (assert)
+ udelay(100);
+}
+
static int a6xx_pm_resume(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 9580def06d45..aa70390ee1c6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -89,5 +89,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
int a6xx_gpu_state_put(struct msm_gpu_state *state);
void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
+void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert);
#endif /* __A6XX_GPU_H__ */
--
2.40.1
Some (particularly SMD_RPM, a.k.a. non-RPMh) SoCs implement A6XX GPUs
but don't implement the associated GMUs, because the GMU directly pokes
at RPMh. Sadly, this means we have to take care of enabling & scaling
power rails, clocks and bandwidth ourselves.
Reuse existing Adreno-common code and modify the deeply-GMU-infused
A6XX code to facilitate these GPUs. This involves if-ing out lots
of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
the actual name that Qualcomm uses in their downstream kernels).
This is essentially a register region which is convenient to model
as a device. We'll use it for managing the GDSCs. The register
layout matches the actual GMU_CX/GX regions on the "real GMU" devices
and lets us reuse quite a bit of gmu_read/write/rmw calls.
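A simplified userspace model of how the probe path chooses between the
two callback tables. The mock_* names are hypothetical, the table holds
callback names as strings only for illustration, and the plain strcmp()
stands in for of_device_is_compatible(), which in the kernel matches
against every entry of the node's compatible list.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative table: real code stores function pointers, not names */
struct mock_gpu_funcs {
	const char *pm_suspend;
	const char *pm_resume;
};

/* GPUs with a real GMU keep the GMU-driven suspend/resume paths */
static const struct mock_gpu_funcs mock_funcs = {
	.pm_suspend = "a6xx_gmu_pm_suspend",
	.pm_resume = "a6xx_gmu_pm_resume",
};

/* GMU wrapper GPUs manage rails, clocks and bandwidth from the CPU */
static const struct mock_gpu_funcs mock_funcs_gmuwrapper = {
	.pm_suspend = "a6xx_pm_suspend",
	.pm_resume = "a6xx_pm_resume",
};

static bool mock_is_gmu_wrapper(const char *compatible)
{
	return strcmp(compatible, "qcom,adreno-gmu-wrapper") == 0;
}

static const struct mock_gpu_funcs *mock_pick_funcs(const char *compatible)
{
	return mock_is_gmu_wrapper(compatible) ? &mock_funcs_gmuwrapper
					       : &mock_funcs;
}
```

Deciding once at probe time keeps most per-call checks out of the
power management paths; the remaining adreno_has_gmu_wrapper() tests
cover code shared by both kinds of GPU.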
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 72 +++++++++-
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 211 ++++++++++++++++++++++++----
drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 +-
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 8 +-
drivers/gpu/drm/msm/adreno/adreno_gpu.h | 6 +
6 files changed, 277 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 5ba8cba69383..385ca3a12462 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1437,6 +1437,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
{
+ struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
struct platform_device *pdev = to_platform_device(gmu->dev);
@@ -1462,10 +1463,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
gmu->mmio = NULL;
gmu->rscc = NULL;
- a6xx_gmu_memory_free(gmu);
+ if (!adreno_has_gmu_wrapper(adreno_gpu)) {
+ a6xx_gmu_memory_free(gmu);
- free_irq(gmu->gmu_irq, gmu);
- free_irq(gmu->hfi_irq, gmu);
+ free_irq(gmu->gmu_irq, gmu);
+ free_irq(gmu->hfi_irq, gmu);
+ }
/* Drop reference taken in of_find_device_by_node */
put_device(gmu->dev);
@@ -1484,6 +1487,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
return 0;
}
+int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
+{
+ struct platform_device *pdev = of_find_device_by_node(node);
+ struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+ int ret;
+
+ if (!pdev)
+ return -ENODEV;
+
+ gmu->dev = &pdev->dev;
+
+ of_dma_configure(gmu->dev, node, true);
+
+ pm_runtime_enable(gmu->dev);
+
+ /* Mark legacy for manual SPTPRAC control */
+ gmu->legacy = true;
+
+ /* Map the GMU registers */
+ gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
+ if (IS_ERR(gmu->mmio)) {
+ ret = PTR_ERR(gmu->mmio);
+ goto err_mmio;
+ }
+
+ gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
+ if (IS_ERR(gmu->cxpd)) {
+ ret = PTR_ERR(gmu->cxpd);
+ goto err_mmio;
+ }
+
+ if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
+ ret = -ENODEV;
+ goto detach_cxpd;
+ }
+
+ init_completion(&gmu->pd_gate);
+ complete_all(&gmu->pd_gate);
+ gmu->pd_nb.notifier_call = cxpd_notifier_cb;
+
+ /* Get a link to the GX power domain to reset the GPU */
+ gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
+ if (IS_ERR(gmu->gxpd)) {
+ ret = PTR_ERR(gmu->gxpd);
+ goto err_mmio;
+ }
+
+ gmu->initialized = true;
+
+ return 0;
+
+detach_cxpd:
+ dev_pm_domain_detach(gmu->cxpd, false);
+
+err_mmio:
+ iounmap(gmu->mmio);
+
+ /* Drop reference taken in of_find_device_by_node */
+ put_device(gmu->dev);
+
+ return ret;
+}
+
int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
{
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 58bf405b85d8..0a44762dbb6d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -21,7 +21,7 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
/* Check that the GMU is idle */
- if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
+ if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_isidle(&a6xx_gpu->gmu))
return false;
/* Check tha the CX master is idle */
@@ -1018,10 +1018,13 @@ static int hw_init(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+ struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
int ret;
- /* Make sure the GMU keeps the GPU on while we set it up */
- a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
+ if (!adreno_has_gmu_wrapper(adreno_gpu)) {
+ /* Make sure the GMU keeps the GPU on while we set it up */
+ a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
+ }
/* Clear GBIF halt in case GX domain was not collapsed */
if (a6xx_has_gbif(adreno_gpu)) {
@@ -1148,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
0x3f0243f0);
}
+ if (adreno_has_gmu_wrapper(adreno_gpu)) {
+ /* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
+
+ /* Set up the CX GMU counter 0 to count busy ticks */
+ gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
+
+ /* Enable power counter 0 */
+ gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
+ gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
+ }
+
/* Protect registers from the CP */
a6xx_set_cp_protect(gpu);
@@ -1237,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
}
out:
+ if (adreno_has_gmu_wrapper(adreno_gpu))
+ return ret;
/*
* Tell the GMU that we are done touching the GPU and it can start power
* management
@@ -1271,9 +1287,6 @@ static void a6xx_dump(struct msm_gpu *gpu)
adreno_dump(gpu);
}
-#define VBIF_RESET_ACK_TIMEOUT 100
-#define VBIF_RESET_ACK_MASK 0x00f0
-
static void a6xx_recover(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -1311,6 +1324,15 @@ static void a6xx_recover(struct msm_gpu *gpu)
*/
gpu->active_submits = 0;
+ if (adreno_has_gmu_wrapper(adreno_gpu)) {
+ /* Drain the outstanding traffic on memory buses */
+ a6xx_bus_clear_pending_transactions(adreno_gpu, true);
+
+ /* Reset the GPU to a clean state */
+ a6xx_gpu_sw_reset(gpu, true);
+ a6xx_gpu_sw_reset(gpu, false);
+ }
+
reinit_completion(&gmu->pd_gate);
dev_pm_genpd_add_notifier(gmu->cxpd, &gmu->pd_nb);
dev_pm_genpd_synced_poweroff(gmu->cxpd);
@@ -1461,7 +1483,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
* Force the GPU to stay on until after we finish
* collecting information
*/
- gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
+ if (!adreno_has_gmu_wrapper(adreno_gpu))
+ gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
DRM_DEV_ERROR(&gpu->pdev->dev,
"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
@@ -1592,6 +1615,10 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
{
+ /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
+ if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
+ return;
+
llcc_slice_putd(a6xx_gpu->llc_slice);
llcc_slice_putd(a6xx_gpu->htw_llc_slice);
}
@@ -1601,6 +1628,10 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
{
struct device_node *phandle;
+ /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
+ if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
+ return;
+
/*
* There is a different programming path for targets with an mmu500
* attached, so detect if that is the case
@@ -1670,7 +1701,7 @@ void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
udelay(100);
}
-static int a6xx_pm_resume(struct msm_gpu *gpu)
+static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -1690,10 +1721,58 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
a6xx_llc_activate(a6xx_gpu);
- return 0;
+ return ret;
}
-static int a6xx_pm_suspend(struct msm_gpu *gpu)
+static int a6xx_pm_resume(struct msm_gpu *gpu)
+{
+ struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+ struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+ struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+ unsigned long freq = gpu->fast_rate;
+ struct dev_pm_opp *opp;
+ int ret;
+
+ gpu->needs_hw_init = true;
+
+ trace_msm_gpu_resume(0);
+
+ mutex_lock(&a6xx_gpu->gmu.lock);
+
+ opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
+ if (IS_ERR(opp)) {
+ ret = PTR_ERR(opp);
+ goto err_set_opp;
+ }
+
+ /* Set the core clock and bus bw, having VDD scaling in mind */
+ dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
+ dev_pm_opp_put(opp);
+
+ pm_runtime_resume_and_get(gmu->dev);
+ pm_runtime_resume_and_get(gmu->gxpd);
+
+ ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
+ if (ret)
+ goto err_bulk_clk;
+
+ /* If anything goes south, tear the GPU down piece by piece.. */
+ if (ret) {
+err_bulk_clk:
+ pm_runtime_put(gmu->gxpd);
+ pm_runtime_put(gmu->dev);
+ dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
+ }
+err_set_opp:
+ mutex_unlock(&a6xx_gpu->gmu.lock);
+
+ if (!ret)
+ msm_devfreq_resume(gpu);
+
+ return ret;
+}
+
+static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -1720,7 +1799,40 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
return 0;
}
-static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
+static int a6xx_pm_suspend(struct msm_gpu *gpu)
+{
+ struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+ struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+ struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+ int i;
+
+ trace_msm_gpu_suspend(0);
+
+ msm_devfreq_suspend(gpu);
+
+ mutex_lock(&a6xx_gpu->gmu.lock);
+
+ /* Drain the outstanding traffic on memory buses */
+ a6xx_bus_clear_pending_transactions(adreno_gpu, true);
+
+ clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
+
+ pm_runtime_put_sync(gmu->gxpd);
+ dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
+ pm_runtime_put_sync(gmu->dev);
+
+ mutex_unlock(&a6xx_gpu->gmu.lock);
+
+ if (a6xx_gpu->shadow_bo)
+ for (i = 0; i < gpu->nr_rings; i++)
+ a6xx_gpu->shadow[i] = 0;
+
+ gpu->suspend_count++;
+
+ return 0;
+}
+
+static int a6xx_gmu_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -1739,6 +1851,12 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
return 0;
}
+static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
+{
+ *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
+ return 0;
+}
+
static struct msm_ringbuffer *a6xx_active_ring(struct msm_gpu *gpu)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -2004,8 +2122,8 @@ static const struct adreno_gpu_funcs funcs = {
.set_param = adreno_set_param,
.hw_init = a6xx_hw_init,
.ucode_load = a6xx_ucode_load,
- .pm_suspend = a6xx_pm_suspend,
- .pm_resume = a6xx_pm_resume,
+ .pm_suspend = a6xx_gmu_pm_suspend,
+ .pm_resume = a6xx_gmu_pm_resume,
.recover = a6xx_recover,
.submit = a6xx_submit,
.active_ring = a6xx_active_ring,
@@ -2020,6 +2138,35 @@ static const struct adreno_gpu_funcs funcs = {
#if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
+#endif
+ .create_address_space = a6xx_create_address_space,
+ .create_private_address_space = a6xx_create_private_address_space,
+ .get_rptr = a6xx_get_rptr,
+ .progress = a6xx_progress,
+ },
+ .get_timestamp = a6xx_gmu_get_timestamp,
+};
+
+static const struct adreno_gpu_funcs funcs_gmuwrapper = {
+ .base = {
+ .get_param = adreno_get_param,
+ .set_param = adreno_set_param,
+ .hw_init = a6xx_hw_init,
+ .ucode_load = a6xx_ucode_load,
+ .pm_suspend = a6xx_pm_suspend,
+ .pm_resume = a6xx_pm_resume,
+ .recover = a6xx_recover,
+ .submit = a6xx_submit,
+ .active_ring = a6xx_active_ring,
+ .irq = a6xx_irq,
+ .destroy = a6xx_destroy,
+#if defined(CONFIG_DRM_MSM_GPU_STATE)
+ .show = a6xx_show,
+#endif
+ .gpu_busy = a6xx_gpu_busy,
+#if defined(CONFIG_DRM_MSM_GPU_STATE)
+ .gpu_state_get = a6xx_gpu_state_get,
+ .gpu_state_put = a6xx_gpu_state_put,
#endif
.create_address_space = a6xx_create_address_space,
.create_private_address_space = a6xx_create_private_address_space,
@@ -2050,15 +2197,31 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
adreno_gpu->registers = NULL;
+ /* Check if there is a GMU phandle and set it up */
+ node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
+ /* FIXME: How do we gracefully handle this? */
+ BUG_ON(!node);
+
+ adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
+
/*
* We need to know the platform type before calling into adreno_gpu_init
* so that the hw_apriv flag can be correctly set. Snoop into the info
* and grab the revision number
*/
info = adreno_info(config->rev);
-
- if (info && (info->revn == 650 || info->revn == 660 ||
- adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
+ if (!info)
+ return ERR_PTR(-EINVAL);
+
+ /* Assign these early so that we can use the is_aXYZ helpers */
+ /* Numeric revision IDs (e.g. 630) */
+ adreno_gpu->revn = info->revn;
+ /* New-style ADRENO_REV()-only */
+ adreno_gpu->rev = info->rev;
+ /* Quirk data */
+ adreno_gpu->info = info;
+
+ if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
adreno_gpu->base.hw_apriv = true;
a6xx_llc_slices_init(pdev, a6xx_gpu);
@@ -2069,7 +2232,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
return ERR_PTR(ret);
}
- ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+ if (adreno_has_gmu_wrapper(adreno_gpu))
+ ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
+ else
+ ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
if (ret) {
a6xx_destroy(&(a6xx_gpu->base.base));
return ERR_PTR(ret);
@@ -2082,13 +2248,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
priv->gpu_clamp_to_idle = true;
- /* Check if there is a GMU phandle and set it up */
- node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
-
- /* FIXME: How do we gracefully handle this? */
- BUG_ON(!node);
-
- ret = a6xx_gmu_init(a6xx_gpu, node);
+ if (adreno_has_gmu_wrapper(adreno_gpu))
+ ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
+ else
+ ret = a6xx_gmu_init(a6xx_gpu, node);
of_node_put(node);
if (ret) {
a6xx_destroy(&(a6xx_gpu->base.base));
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index aa70390ee1c6..c788b06e72da 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
+int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 30ecdff363e7..4e5d650578c6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
/* Get the generic state from the adreno core */
adreno_gpu_state_get(gpu, &a6xx_state->base);
- a6xx_get_gmu_registers(gpu, a6xx_state);
+ if (!adreno_has_gmu_wrapper(adreno_gpu)) {
+ a6xx_get_gmu_registers(gpu, a6xx_state);
- a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
- a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
- a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
+ a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
+ a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
+ a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
- a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
+ a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
+ }
/* If GX isn't on the rest of the data isn't going to be accessible */
- if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
+ if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
return &a6xx_state->base;
/* Get the banks of indexed registers */
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 6934cee07d42..5c5901d65950 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
if (!adreno_gpu->info->fw[i])
continue;
+ /* Skip loading GMU firmware with GMU Wrapper */
+ if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
+ continue;
+
/* Skip if the firmware has already been loaded */
if (adreno_gpu->fw[i])
continue;
@@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
u32 speedbin;
int ret;
- /* Only handle the core clock when GMU is not in use */
- if (config->rev.core < 6) {
+ /* Only handle the core clock when GMU is not in use (or is absent). */
+ if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
/*
* This can only be done before devm_pm_opp_of_add_table(), or
* dev_pm_opp_set_config() will WARN_ON()
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index f62612a5c70f..ee5352bc5329 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -115,6 +115,7 @@ struct adreno_gpu {
* code (a3xx_gpu.c) and stored in this common location.
*/
const unsigned int *reg_offsets;
+ bool gmu_is_wrapper;
};
#define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
@@ -145,6 +146,11 @@ struct adreno_platform_config {
bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
+static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
+{
+ return gpu->gmu_is_wrapper;
+}
+
static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
{
return (gpu->revn < 300);
--
2.40.1
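The firmware-loader hunk above adds one more "continue" check to the per-slot loop in adreno_load_fw(). Here is a minimal userspace sketch of that ordering. Everything prefixed with mock_ or MOCK_ is a hypothetical stand-in, not the driver's own types, and the actual request_firmware step is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical firmware slots; the real driver has its own ADRENO_FW_* enum. */
enum mock_fw_slot { MOCK_FW_SQE, MOCK_FW_GMU, MOCK_FW_MAX };

/* Hypothetical, heavily reduced stand-in for struct adreno_gpu. */
struct mock_gpu {
	bool gmu_is_wrapper;
	const char *fw_name[MOCK_FW_MAX];
	bool fw_loaded[MOCK_FW_MAX];
};

static bool mock_has_gmu_wrapper(const struct mock_gpu *gpu)
{
	return gpu->gmu_is_wrapper;
}

/* Returns how many firmware images were (pretend-)loaded by this call. */
static int mock_load_fw(struct mock_gpu *gpu)
{
	int loaded = 0;

	assert(gpu != NULL);

	for (int i = 0; i < MOCK_FW_MAX; i++) {
		/* No firmware name listed for this slot */
		if (!gpu->fw_name[i])
			continue;

		/* The new check: a GMU wrapper has no core to run GMU firmware */
		if (mock_has_gmu_wrapper(gpu) && i == MOCK_FW_GMU)
			continue;

		/* Already loaded on a previous call */
		if (gpu->fw_loaded[i])
			continue;

		gpu->fw_loaded[i] = true;
		loaded++;
	}

	return loaded;
}
```

With both slots named, a wrapper GPU loads only the SQE image, while a GMU-equipped GPU loads both on the first call and nothing on the second.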
A610 is implemented on at least three SoCs: SM6115 (bengal), SM6125
(trinket) and SM6225 (khaje). Trinket does not support speed binning
(only a single SKU exists) and we don't yet support khaje upstream.
Hence, add a fuse mapping table for bengal to allow for per-chip
frequency limiting.
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index d046af5f6de2..c304fa118cff 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2098,6 +2098,30 @@ static bool a6xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
return progress;
}
+static u32 a610_get_speed_bin(u32 fuse)
+{
+ /*
+ * There are (at least) three SoCs implementing A610: SM6125 (trinket),
+ * SM6115 (bengal) and SM6225 (khaje). Trinket does not have speedbinning,
+ * as only a single SKU exists and we don't support khaje upstream yet.
+ * Hence, this matching table is only valid for bengal and can be easily
+ * expanded if need be.
+ */
+
+ if (fuse == 0)
+ return 0;
+ else if (fuse == 206)
+ return 1;
+ else if (fuse == 200)
+ return 2;
+ else if (fuse == 157)
+ return 3;
+ else if (fuse == 127)
+ return 4;
+
+ return UINT_MAX;
+}
+
static u32 a618_get_speed_bin(u32 fuse)
{
if (fuse == 0)
@@ -2195,6 +2219,9 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u3
{
u32 val = UINT_MAX;
+ if (adreno_is_a610(adreno_gpu))
+ val = a610_get_speed_bin(fuse);
+
if (adreno_is_a618(adreno_gpu))
val = a618_get_speed_bin(fuse);
--
2.40.1
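For reference, the fuse table from the patch above can be exercised outside the kernel. The sketch below copies the bengal mapping and then applies the same "1 << val" step that fuse_to_supp_hw() uses to build the OPP supported-hw mask. The helper name supp_hw_mask() is hypothetical, and returning an all-ones mask for an unknown fuse is an assumption, since the driver's error branch is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Fuse-to-bin table copied from the patch above (valid for bengal/SM6115 only). */
static uint32_t a610_get_speed_bin(uint32_t fuse)
{
	switch (fuse) {
	case 0:
		return 0;
	case 206:
		return 1;
	case 200:
		return 2;
	case 157:
		return 3;
	case 127:
		return 4;
	default:
		return UINT32_MAX; /* unknown fuse value */
	}
}

/*
 * Hypothetical userspace stand-in for fuse_to_supp_hw(): turn the bin index
 * into a one-hot mask. ASSUMPTION: an unknown fuse yields an all-ones mask.
 */
static uint32_t supp_hw_mask(uint32_t fuse)
{
	uint32_t bin = a610_get_speed_bin(fuse);

	if (bin == UINT32_MAX)
		return UINT32_MAX;

	assert(bin < 32); /* the shift below must stay within 32 bits */
	return UINT32_C(1) << bin;
}
```

A fuse value of 206 maps to bin 1 and therefore to mask 0x2, which is the value an OPP table entry would have to match in its supported-hw property.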
Before transitioning to per-SoC rather than per-Adreno speedbin
fuse values (which needs another patchset to land elsewhere), a good
stopgap improvement is to use the adreno_is_aXYZ macros in place of
explicit revision matching. Do so to allow differentiating between
A619 and A619_holi.
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 18 +++++++++---------
drivers/gpu/drm/msm/adreno/adreno_gpu.h | 14 ++++++++++++--
2 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5faa85543428..ca4ffa44097e 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2163,23 +2163,23 @@ static u32 adreno_7c3_get_speed_bin(u32 fuse)
return UINT_MAX;
}
-static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
+static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u32 fuse)
{
u32 val = UINT_MAX;
- if (adreno_cmp_rev(ADRENO_REV(6, 1, 8, ANY_ID), rev))
+ if (adreno_is_a618(adreno_gpu))
val = a618_get_speed_bin(fuse);
- else if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
+ else if (adreno_is_a619(adreno_gpu))
val = a619_get_speed_bin(fuse);
- else if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
+ else if (adreno_is_7c3(adreno_gpu))
val = adreno_7c3_get_speed_bin(fuse);
- else if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
+ else if (adreno_is_a640(adreno_gpu))
val = a640_get_speed_bin(fuse);
- else if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
+ else if (adreno_is_a650(adreno_gpu))
val = a650_get_speed_bin(fuse);
if (val == UINT_MAX) {
@@ -2192,7 +2192,7 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
return (1 << val);
}
-static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
+static int a6xx_set_supported_hw(struct device *dev, struct adreno_gpu *adreno_gpu)
{
u32 supp_hw;
u32 speedbin;
@@ -2211,7 +2211,7 @@ static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
return ret;
}
- supp_hw = fuse_to_supp_hw(dev, rev, speedbin);
+ supp_hw = fuse_to_supp_hw(dev, adreno_gpu, speedbin);
ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
if (ret)
@@ -2330,7 +2330,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
a6xx_llc_slices_init(pdev, a6xx_gpu);
- ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
+ ret = a6xx_set_supported_hw(&pdev->dev, adreno_gpu);
if (ret) {
a6xx_destroy(&(a6xx_gpu->base.base));
return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 7a5d595d4b99..21513cec038f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -268,9 +268,9 @@ static inline int adreno_is_a630(struct adreno_gpu *gpu)
return gpu->revn == 630;
}
-static inline int adreno_is_a640_family(struct adreno_gpu *gpu)
+static inline int adreno_is_a640(struct adreno_gpu *gpu)
{
- return (gpu->revn == 640) || (gpu->revn == 680);
+ return gpu->revn == 640;
}
static inline int adreno_is_a650(struct adreno_gpu *gpu)
@@ -289,6 +289,11 @@ static inline int adreno_is_a660(struct adreno_gpu *gpu)
return gpu->revn == 660;
}
+static inline int adreno_is_a680(struct adreno_gpu *gpu)
+{
+ return gpu->revn == 680;
+}
+
/* check for a615, a616, a618, a619 or any derivatives */
static inline int adreno_is_a615_family(struct adreno_gpu *gpu)
{
@@ -306,6 +311,11 @@ static inline int adreno_is_a650_family(struct adreno_gpu *gpu)
return gpu->revn == 650 || gpu->revn == 620 || adreno_is_a660_family(gpu);
}
+static inline int adreno_is_a640_family(struct adreno_gpu *gpu)
+{
+ return adreno_is_a640(gpu) || adreno_is_a680(gpu);
+}
+
u64 adreno_private_address_space_size(struct msm_gpu *gpu);
int adreno_get_param(struct msm_gpu *gpu, struct msm_file_private *ctx,
uint32_t param, uint64_t *value, uint32_t *len);
--
2.40.1
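The adreno_gpu.h part of the patch above splits the old family check into an exact A640 check, an exact A680 check, and a family helper built from the two. A minimal sketch of that composition follows. It uses a mock struct that carries only revn, and it returns bool where the driver's helpers return int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock of the single field the helpers read; the real struct adreno_gpu is much larger. */
struct adreno_gpu {
	uint32_t revn;
};

/* Exact match: only A640 itself */
static inline bool adreno_is_a640(const struct adreno_gpu *gpu)
{
	return gpu->revn == 640;
}

/* Exact match: only A680 itself */
static inline bool adreno_is_a680(const struct adreno_gpu *gpu)
{
	return gpu->revn == 680;
}

/* Family match: composed from the two exact helpers */
static inline bool adreno_is_a640_family(const struct adreno_gpu *gpu)
{
	return adreno_is_a640(gpu) || adreno_is_a680(gpu);
}
```

The exact helper is what the speedbin matching wants, since an A680 must not pick up the A640 fuse table, while code that treats both parts alike keeps calling the family helper.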
The GPU can only be one model at a time, so at most one of these
checks can match. Turn the series of ifs into an if / else if chain
to save some CPU cycles.
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Konrad Dybcio <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 1a29e7dd9975..5faa85543428 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2170,16 +2170,16 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
if (adreno_cmp_rev(ADRENO_REV(6, 1, 8, ANY_ID), rev))
val = a618_get_speed_bin(fuse);
- if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
+ else if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
val = a619_get_speed_bin(fuse);
- if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
+ else if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
val = adreno_7c3_get_speed_bin(fuse);
- if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
+ else if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
val = a640_get_speed_bin(fuse);
- if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
+ else if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
val = a650_get_speed_bin(fuse);
if (val == UINT_MAX) {
--
2.40.1
On Mon, 29 May 2023 15:52:20 +0200, Konrad Dybcio wrote:
> The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
> we'd normally assign to the GMU as if they were a part of the GMU, even
> though they are not". It's a (good) software representation of the GMU_CX
> and GMU_GX register spaces within the GPUSS that helps us programmatically
> treat these de-facto GMU-less parts in a way that's very similar to their
> GMU-equipped cousins, massively saving up on code duplication.
>
> The "wrapper" register space was specifically designed to mimic the layout
> of a real GMU, though it rather obviously does not have the M3 core et al.
>
> GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
> specified under the GPU node, just like their older cousins. Account
> for that.
>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> .../devicetree/bindings/display/msm/gpu.yaml | 61 ++++++++++++++++++----
> 1 file changed, 52 insertions(+), 9 deletions(-)
>
Running 'make dtbs_check' with the schema in this patch gives the
following warnings. Consider if they are expected or the schema is
incorrect. These may not be new warnings.
Note that it is not yet a requirement to have 0 warnings for dtbs_check.
This will change in the future.
Full log is available here: https://patchwork.ozlabs.org/patch/1787121
gpu@2c00000: compatible: 'oneOf' conditional failed, one must be fixed:
arch/arm64/boot/dts/qcom/sm8150-hdk.dtb
arch/arm64/boot/dts/qcom/sm8150-mtp.dtb
On 30.05.2023 14:26, Krzysztof Kozlowski wrote:
> On Mon, 29 May 2023 15:52:20 +0200, Konrad Dybcio wrote:
>> The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
>> we'd normally assign to the GMU as if they were a part of the GMU, even
>> though they are not". It's a (good) software representation of the GMU_CX
>> and GMU_GX register spaces within the GPUSS that helps us programmatically
>> treat these de-facto GMU-less parts in a way that's very similar to their
>> GMU-equipped cousins, massively saving up on code duplication.
>>
>> The "wrapper" register space was specifically designed to mimic the layout
>> of a real GMU, though it rather obviously does not have the M3 core et al.
>>
>> GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
>> specified under the GPU node, just like their older cousins. Account
>> for that.
>>
>> Signed-off-by: Konrad Dybcio <[email protected]>
>> ---
>> .../devicetree/bindings/display/msm/gpu.yaml | 61 ++++++++++++++++++----
>> 1 file changed, 52 insertions(+), 9 deletions(-)
>>
>
> Running 'make dtbs_check' with the schema in this patch gives the
> following warnings. Consider if they are expected or the schema is
> incorrect. These may not be new warnings.
I think it'd be beneficial if the bot diffed the output of checks pre-
and post- patch.
Konrad
>
> Note that it is not yet a requirement to have 0 warnings for dtbs_check.
> This will change in the future.
>
> Full log is available here: https://patchwork.ozlabs.org/patch/1787121
>
>
> gpu@2c00000: compatible: 'oneOf' conditional failed, one must be fixed:
> arch/arm64/boot/dts/qcom/sm8150-hdk.dtb
> arch/arm64/boot/dts/qcom/sm8150-mtp.dtb
On 29.05.2023 15:52, Konrad Dybcio wrote:
> v7 -> v8:
> - Fix up resume/suspend (icc now correctly parks to 0, don't abuse
> OPP & genpd throughout system-wide suspend)
> - Don't handle ebi1_clk separately, the bulk ops handle it just fine
> - Rebase on next-20230525 (no meaningful changes)
Krzysztof pointed out to me in private that he has previously
reviewed the dt-bindings patches, but I managed to drop them by
accident.. I'll fix that in the next revision. I'll wait for more
comments before resending though.
Konrad
>
> v7: https://lore.kernel.org/linux-arm-msm/[email protected]/
>
> v6 -> v7:
> - Rebase on next-20230519 (A640/650 speedbin merged already)
>
> - separate out the .get_timestamp cb for gmu wrapper
>
> - check for gmu presence inside a6xx_llc_slices_(init|destroy) instead
> of before calling them
>
> - use REG_A6XX_RBBM_GPR0_CNTL instead of literal 0x18
>
> - move a6xx_bus_clear_pending_transactions to a6xx_gpu, clean it up
> and reuse it for gmu wrapper gpus
>
> - drop clearing RBBM_GBIF (GBIF from GX's POV) as part of draining the
> buses, it's not necessary
>
> - introduce a helper for gpu softreset
>
> - sw-reset the gmu wrapper GPUS *after* draining GBIF and only reset
> it if it's hung
>
> - reword the commit message in "Remove both GBIF and RBBM GBIF halt
> on hw init" and move it before gmu wrapper-specific changes
>
> - drop set_rate logic from a6xx_pm_suspend as the clock simply gets
> disabled and we don't have to worry about scaling problems as OPP
> and devfreq take care of that, validated with debugcc
>
> - drop a level of indentation in _a6xx_check_idle() to hopefully
> improve readability
>
> - check for !a610 instead of gmu_wrapper||a619_holi in sptprac cc
> toggling in a6xx_set_hwcg()
>
> - pick up krzk's rb on bindings
>
> All external dependencies have been merged since the last revision.
>
> v6: https://lore.kernel.org/r/[email protected]
>
> v5 -> v6:
> - Rebase on 8ead96783163 ("drm/msm/gpu: Move BO allocation out of hw_init")
> (Add .ucode_load to funcs_gmuwrapper)
> - Drop A6[45]0 speedbin deps, merged into msm-next
>
> Dependencies:
> - https://lore.kernel.org/linux-arm-msm/[email protected]/ (to work properly)
>
> v5: https://lore.kernel.org/linux-arm-msm/[email protected]/
>
> v4 -> v5:
> - Add a newline before the new allOf:if: [3/15]
> - Enforce 6 clocks on A619_holi/A610 [2/15]
> - Pick up tags
> - Improve error handling in a6xx_pm_resume [6/15]
> - Add patch [1/15] (fix an existing issue) which can be picked
> separately and account for it in [6/15]
> - Rebase atop Akhil's CX shutdown patches and incorporate analogous logic
> - Fix a regression introduced in v3 that made the fw loader expect
> GMU fw on GMU wrapper GPUs
>
> Dependencies:
> - https://lore.kernel.org/linux-arm-msm/[email protected]/ (to apply)
> - https://lore.kernel.org/linux-arm-msm/[email protected]/ (to work properly)
>
> v4: https://lore.kernel.org/r/[email protected]
>
> v3 -> v4:
> - Drop the mistakenly-included and wrong A3xx-A5xx bindings changes
> - Improve bindings commit messages to better explain what GMU Wrapper is
> - Drop the A680 highest bank bit value adjustment patch
> - Sort UBWC config variables in a reverse-Christmas-tree fashion [4/14]
> - Don't alter any UBWC config values in [4/14]
> - Do so for a619_holi in [8/14]
> - Rebase on next-20230314 (shouldn't matter at all)
>
> v3: https://lore.kernel.org/r/[email protected]
>
> v2 -> v3:
> New dependencies:
> - https://lore.kernel.org/linux-arm-msm/[email protected]/T/#t
> - https://lore.kernel.org/linux-arm-msm/[email protected]/
>
> Sidenote: A speedbin rework is in progress, the of_machine_is_compatible
> calls in A619_holi are ugly (but well, necessary..) but they'll be
> replaced with socid matching in this or the next kernel cycle.
>
> Due to the new way of identifying GMU wrapper GPUs, configuring 6350
> to use wrapper would cause the wrong fuse values to be checked, but that
> will be solved by the conversion + the ultimate goal is to use the GMU
> whenever possible with the wrapper left for GMU-less Adrenos and early
> bringup debugging of GMU-equipped ones.
>
> - Ship dt-bindings in this series as we're referencing the compatible now
>
> - "De-staticize" -> "remove static keyword" [3/15]
>
> - Track down all the values in [4/15]
>
> - Add many comments and explanations in [4/15]
>
> - Fix possible return-before-mutex-unlock [5/15]
>
> - Explain the GMU wrapper a bit more in the commit msg [5/15]
>
> - Separate out pm_resume/suspend for GMU-wrapper GPUs to make things
> cleaner [5/15]
>
> - Don't check if `info` exists, it has to at this point [5/15]
>
> - Assign gpu->info early and clean up following if statements in
> a6xx_gpu_init [5/15]
>
> - Determine whether we use GMU wrapper based on the GMU compatible
> instead of a quirk [5/15]
>
> - Use a struct field to annotate whether we're using gmu wrapper so
> that it can be assigned at runtime (turns out a619 holi-ness cannot
> be determined by patchid + that will make it easier to test out GMU
> GPUs without actually turning on the GMU if anybody wants to do so)
> [5/15]
>
> - Unconditionally hook up gx to the gmu wrapper (otherwise our gpu
> will not get power) [5/15]
>
> - Don't check for gx domain presence in gmu_wrapper paths, it's
> guaranteed [5/15]
>
> - Use opp set rate in the gmuwrapper suspend path [5/15]
>
> - Call opp functions on the GPU device and not on the DRM device of
> mdp4/5/DPU1 half the time (WHOOOOPS!) [5/15]
>
> - Disable the memory clock in a6xx_pm_suspend instead of enabling it
> (moderate oops) [5/15]
>
> - Call the forgotten clk_bulk_disable_unprepare in a6xx_pm_suspend [5/15]
>
> - Set rate to FMIN (a6xx really doesn't like rate=0 + that's what
> msm-5.x does anyway) before disabling core clock [5/15]
>
> - pm_runtime_get_sync -> pm_runtime_resume_and_get [5/15]
>
> - Don't annotate no cached BO support with a quirk, as A619_holi is
> merged into the A619 entry in the big const struct - this means
> that all GPUs operating in gmu wrapper configuration will be
> implicitly treated as if they didn't have this feature [7/15]
>
> - Drop OPP rate & icc related patches, they're a part of a separate
> series now; rebase on it
>
> - Clean up extra parentheses [8/15]
>
> - Identify A619_holi by checking the compatible of its GMU instead
> of patchlevel [8/15]
>
> - Drop "Fix up A6XX protected registers" - unnecessary, Rob will add
> a comment explaining why
>
> - Fix existing UBWC values for A680, new patch [10/15]
>
> - Use adreno_is_aXYZ macros in speedbin matching [13/15] - new patch
>
> v2: https://lore.kernel.org/linux-arm-msm/[email protected]/
>
> v1 -> v2:
> - Fix A630 values in [2/14]
> - Fix [6/14] for GMU-equipped GPUs
>
> Link to v1: https://lore.kernel.org/linux-arm-msm/[email protected]/
>
> This series concludes my couple-weeks-long suffering of figuring out
> the ins and outs of the "non-standard" A6xx GPUs which feature no GMU.
>
> The GMU functionality is essentially emulated by parting out a
> "GMU wrapper" region, which is essentially just a register space
> within the GPU. It's modeled to be as similar to the actual GMU
> as possible while staying as unnecessary as we can make it - there's
> no IRQs, communicating with a microcontroller, no RPMh communication
> etc. etc. I tried to reuse as much code as possible without making
> a mess where every even line is used for GMU and every odd line is
> used for GMU wrapper..
>
> This series contains:
> - plumbing for non-GMU operation, if-ing out GMU calls based on
> GMU presence
> - GMU wrapper support
> - A610 support (w/ speedbin)
> - A619 support (w/ speedbin)
> - couple of minor fixes and improvements
> - VDDCX/VDDGX scaling fix for non-GMU GPUs (concerns more than just
> A6xx)
> - Enablement of opp interconnect properties
>
> A619_holi works perfectly fine using the already-present A619 support
> in mesa. A610 needs more work on that front, but can already replay
> command traces captured on downstream.
>
> NOTE: the "drm/msm/a6xx: Add support for A619_holi" patch contains
> two occurrences of 0x18 used in place of a register #define, as it's
> supposed to be RBBM_GPR0_CNTL, but that will only be present after
> mesa-side changes are merged and headers are synced from there.
>
> Speedbin patches depend on:
> https://lore.kernel.org/linux-arm-msm/[email protected]/
>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> Konrad Dybcio (18):
> dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx
> dt-bindings: display/msm/gmu: Add GMU wrapper
> drm/msm/a6xx: Remove static keyword from sptprac en/disable functions
> drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
> drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
> drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
> drm/msm/a6xx: Add a helper for software-resetting the GPU
> drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
> drm/msm/a6xx: Extend and explain UBWC config
> drm/msm/a6xx: Introduce GMU wrapper support
> drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
> drm/msm/a6xx: Add support for A619_holi
> drm/msm/a6xx: Add A610 support
> drm/msm/a6xx: Fix some A619 tunables
> drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
> drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
> drm/msm/a6xx: Add A619_holi speedbin support
> drm/msm/a6xx: Add A610 speedbin support
>
> .../devicetree/bindings/display/msm/gmu.yaml | 50 +-
> .../devicetree/bindings/display/msm/gpu.yaml | 61 ++-
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 122 +++--
> drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 +
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 512 ++++++++++++++++++---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 4 +
> drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 +-
> drivers/gpu/drm/msm/adreno/adreno_device.c | 17 +-
> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 8 +-
> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 33 +-
> 10 files changed, 686 insertions(+), 137 deletions(-)
> ---
> base-commit: 6a3d37b4d885129561e1cef361216f00472f7d2e
> change-id: 20230223-topic-gmuwrapper-b4fff5fd7789
>
> Best regards,
On Mon, May 29, 2023 at 03:52:24PM +0200, Konrad Dybcio wrote:
>
> This function is responsible for telling the GPU to halt transactions
> on all of its relevant buses, drain them and leave them in a predictable
> state, so that the GPU can be e.g. reset cleanly.
>
> Move the function to a6xx_gpu.c, remove the static keyword and add a
> prototype in a6xx_gpu.h to accommodate the move.
>
> Signed-off-by: Konrad Dybcio <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 37 -----------------------------------
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 36 ++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 2 ++
> 3 files changed, 38 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 9421716a2fe5..b86be123ecd0 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -868,43 +868,6 @@ static void a6xx_gmu_rpmh_off(struct a6xx_gmu *gmu)
> (val & 1), 100, 1000);
> }
>
> -#define GBIF_CLIENT_HALT_MASK BIT(0)
> -#define GBIF_ARB_HALT_MASK BIT(1)
> -
> -static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu,
> - bool gx_off)
> -{
> - struct msm_gpu *gpu = &adreno_gpu->base;
> -
> - if (!a6xx_has_gbif(adreno_gpu)) {
> - gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
> - spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
> - 0xf) == 0xf);
> - gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
> -
> - return;
> - }
> -
> - if (gx_off) {
> - /* Halt the gx side of GBIF */
> - gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 1);
> - spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) & 1);
> - }
> -
> - /* Halt new client requests on GBIF */
> - gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> - spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> - (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> -
> - /* Halt all AXI requests on GBIF */
> - gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> - spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> - (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> -
> - /* The GBIF halt needs to be explicitly cleared */
> - gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> -}
> -
> /* Force the GMU off in case it isn't responsive */
> static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
> {
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index e34aa15156a4..6bb4da70f6a6 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1597,6 +1597,42 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> }
>
> +#define GBIF_CLIENT_HALT_MASK BIT(0)
> +#define GBIF_ARB_HALT_MASK BIT(1)
> +
> +void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off)
> +{
> + struct msm_gpu *gpu = &adreno_gpu->base;
> +
> + if (!a6xx_has_gbif(adreno_gpu)) {
> + gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
> + spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
> + 0xf) == 0xf);
> + gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
> +
> + return;
> + }
> +
> + if (gx_off) {
> + /* Halt the gx side of GBIF */
> + gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 1);
> + spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) & 1);
> + }
> +
> + /* Halt new client requests on GBIF */
> + gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> + spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> + (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> +
> + /* Halt all AXI requests on GBIF */
> + gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> + spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> + (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> +
> + /* The GBIF halt needs to be explicitly cleared */
> + gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> +}
> +
> static int a6xx_pm_resume(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index eea2e60ce3b7..9580def06d45 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -88,4 +88,6 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
> int a6xx_gpu_state_put(struct msm_gpu_state *state);
>
> +void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
> +
> #endif /* __A6XX_GPU_H__ */
>
> --
> 2.40.1
>
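The GBIF branch of the function quoted above follows a request/ack pattern: write a halt bit, poll the ack register until that bit shows up, repeat for the next stage, then clear the halt request. Below is a userspace sketch of that sequence against a fake register pair. struct fake_gbif, the helper names and the bounded poll count are all assumptions made for illustration. The driver itself goes through gpu_write(), gpu_read() and spin_until().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n)			(UINT32_C(1) << (n))
#define GBIF_CLIENT_HALT_MASK	BIT(0)
#define GBIF_ARB_HALT_MASK	BIT(1)

/* Hypothetical fake GBIF: the ack register simply mirrors the halt request. */
struct fake_gbif {
	uint32_t halt;
	uint32_t halt_ack;
	unsigned int writes;
};

static void fake_write_halt(struct fake_gbif *g, uint32_t val)
{
	g->halt = val;
	g->halt_ack = val; /* real hardware would ack only once traffic has drained */
	g->writes++;
}

/* Returns 0 once every bit in mask is acked, -1 if max_polls is exhausted. */
static int wait_for_ack(const struct fake_gbif *g, uint32_t mask, int max_polls)
{
	for (int i = 0; i < max_polls; i++)
		if ((g->halt_ack & mask) == mask)
			return 0;
	return -1;
}

/* Same three-step order as the GBIF path in the quoted function. */
static int gbif_drain(struct fake_gbif *g)
{
	assert(g != NULL);

	/* Halt new client requests */
	fake_write_halt(g, GBIF_CLIENT_HALT_MASK);
	if (wait_for_ack(g, GBIF_CLIENT_HALT_MASK, 1000))
		return -1;

	/* Halt all AXI requests */
	fake_write_halt(g, GBIF_ARB_HALT_MASK);
	if (wait_for_ack(g, GBIF_ARB_HALT_MASK, 1000))
		return -1;

	/* The halt request has to be cleared explicitly */
	fake_write_halt(g, 0);
	return 0;
}
```

Comparing the masked ack value against the full mask, rather than testing it for non-zero, matters once a mask covers more than one bit, as the VBIF path with its 0xf value does.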
On Mon, May 29, 2023 at 03:52:23PM +0200, Konrad Dybcio wrote:
>
> As pointed out by Akhil during the review process of GMU wrapper
> introduction [1], it makes sense to move this write into the function
> that's responsible for forcibly shutting the GMU off.
>
> It is also very convenient to move this to GMU-specific code, so that
> it does not have to be guarded by an if-condition to avoid calling it
> on GMU wrapper targets.
>
> Move the write to the aforementioned a6xx_gmu_force_off() to achieve
> that. No effective functional change.
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil.
>
> [1] https://lore.kernel.org/linux-arm-msm/[email protected]/
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 6 ++++++
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 ------
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 87babbb2a19f..9421716a2fe5 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -912,6 +912,12 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
> struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> struct msm_gpu *gpu = &adreno_gpu->base;
>
> + /*
> + * Turn off keep alive that might have been enabled by the hang
> + * interrupt
> + */
> + gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> +
> /* Flush all the queues */
> a6xx_hfi_stop(gmu);
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 9fb214f150dd..e34aa15156a4 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1274,12 +1274,6 @@ static void a6xx_recover(struct msm_gpu *gpu)
> /* Halt SQE first */
> gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 3);
>
> - /*
> - * Turn off keep alive that might have been enabled by the hang
> - * interrupt
> - */
> - gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> -
> pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
>
> /* active_submit won't change until we make a submission */
>
> --
> 2.40.1
>
On Mon, May 29, 2023 at 03:52:30PM +0200, Konrad Dybcio wrote:
>
> A610 and A619_holi don't support the feature. Disable it to make the GPU stop
> crashing after almost each and every submission - the received data on
> the GPU end was simply incomplete and garbled, resulting in almost nothing
> being executed properly. Extend the disablement to adreno_has_gmu_wrapper,
> as none of the GMU wrapper Adrenos that we don't support yet seem to feature it.
>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
> drivers/gpu/drm/msm/adreno/adreno_device.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index 8cff86e9d35c..b133755a56c4 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -551,7 +551,6 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
> config.rev.minor, config.rev.patchid);
>
> priv->is_a2xx = config.rev.core == 2;
> - priv->has_cached_coherent = config.rev.core >= 6;
>
> gpu = info->init(drm);
> if (IS_ERR(gpu)) {
> @@ -563,6 +562,10 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
> if (ret)
> return ret;
>
> + if (config.rev.core >= 6)
> + if (!adreno_has_gmu_wrapper(to_adreno_gpu(gpu)))
> + priv->has_cached_coherent = true;
> +
> return 0;
> }
>
>
> --
> 2.40.1
>
On Mon, May 29, 2023 at 03:52:25PM +0200, Konrad Dybcio wrote:
>
> Unify the indentation and explain the cryptic 0xF value.
>
> Signed-off-by: Konrad Dybcio <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 6bb4da70f6a6..e3ac3f045665 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1597,17 +1597,18 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> }
>
> -#define GBIF_CLIENT_HALT_MASK BIT(0)
> -#define GBIF_ARB_HALT_MASK BIT(1)
> +#define GBIF_CLIENT_HALT_MASK BIT(0)
> +#define GBIF_ARB_HALT_MASK BIT(1)
> +#define VBIF_XIN_HALT_CTRL0_MASK GENMASK(3, 0)
>
> void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off)
> {
> struct msm_gpu *gpu = &adreno_gpu->base;
>
> if (!a6xx_has_gbif(adreno_gpu)) {
> - gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
> + gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, VBIF_XIN_HALT_CTRL0_MASK);
> spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
> - 0xf) == 0xf);
> + (VBIF_XIN_HALT_CTRL0_MASK)) == VBIF_XIN_HALT_CTRL0_MASK);
> gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
>
> return;
>
> --
> 2.40.1
>
On Mon, May 29, 2023 at 03:52:26PM +0200, Konrad Dybcio wrote:
>
> Introduce a6xx_gpu_sw_reset() in preparation for adding GMU wrapper
> GPUs and reuse it in a6xx_gmu_force_off().
>
> This helper, contrary to the original usage in GMU code paths, adds
> a write memory barrier which together with the necessary delay should
> ensure that the reset is never deasserted too quickly due to e.g. OoO
> execution going crazy.
>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 +--
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 11 +++++++++++
> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index b86be123ecd0..5ba8cba69383 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -899,8 +899,7 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
> a6xx_bus_clear_pending_transactions(adreno_gpu, true);
>
> /* Reset GPU core blocks */
> - gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
> - udelay(100);
> + a6xx_gpu_sw_reset(gpu, true);
> }
>
> static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index e3ac3f045665..083ccb5bcb4e 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1634,6 +1634,17 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
> gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> }
>
> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
> +{
> + gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
> + /* Add a barrier to avoid bad surprises */
Can you please make this comment a bit clearer? Highlight that we
should ensure the register write is posted at hw before polling.
I think this barrier is required only during assert.
-Akhil.
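
Something along these lines is one possible reading of the comment above. It is only a sketch that builds in userspace: struct fake_gpu and the *_stub helpers are made-up stand-ins for gpu_write(), mb() and udelay(), and the parameter is renamed to assert_reset purely to avoid clashing with the assert() macro:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in that records what the helper did. */
struct fake_gpu {
	uint32_t sw_reset_cmd;  /* last value "written" to RBBM_SW_RESET_CMD */
	unsigned int barriers;  /* number of barriers issued */
	unsigned int delay_us;  /* total time spent in delays */
};

static void gpu_write_stub(struct fake_gpu *gpu, uint32_t val) { gpu->sw_reset_cmd = val; }
static void mb_stub(struct fake_gpu *gpu) { gpu->barriers++; }
static void udelay_stub(struct fake_gpu *gpu, unsigned int us) { gpu->delay_us += us; }

static void a6xx_gpu_sw_reset_sketch(struct fake_gpu *gpu, bool assert_reset)
{
	gpu_write_stub(gpu, assert_reset);

	/* Deassert needs neither the barrier nor the delay */
	if (!assert_reset)
		return;

	/* Make sure the register write is posted at hw before we start waiting */
	mb_stub(gpu);

	/* The reset line needs to be asserted for at least 100 us */
	udelay_stub(gpu, 100);
}
```

Whether the barrier can really be skipped on deassert is for the author to confirm on hardware.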
> + mb();
> +
> + /* The reset line needs to be asserted for at least 100 us */
> + if (assert)
> + udelay(100);
> +}
> +
> static int a6xx_pm_resume(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 9580def06d45..aa70390ee1c6 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -89,5 +89,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
> int a6xx_gpu_state_put(struct msm_gpu_state *state);
>
> void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert);
>
> #endif /* __A6XX_GPU_H__ */
>
> --
> 2.40.1
>
On Mon, 29 May 2023 15:52:20 +0200, Konrad Dybcio wrote:
> The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
> we'd normally assign to the GMU as if they were a part of the GMU, even
> though they are not". It's a (good) software representation of the GMU_CX
> and GMU_GX register spaces within the GPUSS that helps us programatically
> treat these de-facto GMU-less parts in a way that's very similar to their
> GMU-equipped cousins, massively saving up on code duplication.
>
> The "wrapper" register space was specifically designed to mimic the layout
> of a real GMU, though it rather obviously does not have the M3 core et al.
>
> GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
> specified under the GPU node, just like their older cousins. Account
> for that.
>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> .../devicetree/bindings/display/msm/gpu.yaml | 61 ++++++++++++++++++----
> 1 file changed, 52 insertions(+), 9 deletions(-)
>
Acked-by: Rob Herring <[email protected]>
On Tue, May 30, 2023 at 03:35:09PM +0200, Konrad Dybcio wrote:
>
>
> On 30.05.2023 14:26, Krzysztof Kozlowski wrote:
> > On Mon, 29 May 2023 15:52:20 +0200, Konrad Dybcio wrote:
> >> The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
> >> we'd normally assign to the GMU as if they were a part of the GMU, even
> >> though they are not". It's a (good) software representation of the GMU_CX
> >> and GMU_GX register spaces within the GPUSS that helps us programatically
> >> treat these de-facto GMU-less parts in a way that's very similar to their
> >> GMU-equipped cousins, massively saving up on code duplication.
> >>
> >> The "wrapper" register space was specifically designed to mimic the layout
> >> of a real GMU, though it rather obviously does not have the M3 core et al.
> >>
> >> GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
> >> specified under the GPU node, just like their older cousins. Account
> >> for that.
> >>
> >> Signed-off-by: Konrad Dybcio <[email protected]>
> >> ---
> >> .../devicetree/bindings/display/msm/gpu.yaml | 61 ++++++++++++++++++----
> >> 1 file changed, 52 insertions(+), 9 deletions(-)
> >>
> >
> > Running 'make dtbs_check' with the schema in this patch gives the
> > following warnings. Consider if they are expected or the schema is
> > incorrect. These may not be new warnings.
> I think it'd be beneficial if the bot diffed the output of checks pre-
> and post- patch.
Fix all the warnings and it will. ;) Care to donate h/w to run the build
twice every time?
Really what I care about on these is when I keep getting changes to a
schema and the list of warnings remains long and not getting fixed.
This case was less than useful with just the oneOf warning.
Rob
On 8.06.2023 22:58, Rob Herring wrote:
> On Tue, May 30, 2023 at 03:35:09PM +0200, Konrad Dybcio wrote:
>>
>>
>> On 30.05.2023 14:26, Krzysztof Kozlowski wrote:
>>> On Mon, 29 May 2023 15:52:20 +0200, Konrad Dybcio wrote:
>>>> The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
>>>> we'd normally assign to the GMU as if they were a part of the GMU, even
>>>> though they are not". It's a (good) software representation of the GMU_CX
>>>> and GMU_GX register spaces within the GPUSS that helps us programatically
>>>> treat these de-facto GMU-less parts in a way that's very similar to their
>>>> GMU-equipped cousins, massively saving up on code duplication.
>>>>
>>>> The "wrapper" register space was specifically designed to mimic the layout
>>>> of a real GMU, though it rather obviously does not have the M3 core et al.
>>>>
>>>> GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
>>>> specified under the GPU node, just like their older cousins. Account
>>>> for that.
>>>>
>>>> Signed-off-by: Konrad Dybcio <[email protected]>
>>>> ---
>>>> .../devicetree/bindings/display/msm/gpu.yaml | 61 ++++++++++++++++++----
>>>> 1 file changed, 52 insertions(+), 9 deletions(-)
>>>>
>>>
>>> Running 'make dtbs_check' with the schema in this patch gives the
>>> following warnings. Consider if they are expected or the schema is
>>> incorrect. These may not be new warnings.
>> I think it'd be beneficial if the bot diffed the output of checks pre-
>> and post- patch.
>
> Fix all the warnings and it will. ;)
Nice one :P
> Care to donate h/w to run the build
> twice every time?
Personally that might be a bit difficult, but I'm pretty sure KernelCI
farms don't run at full throttle 24/7, perhaps some of their capacity
could be borrowed?
>
> Really what I care about on these is when I keep getting changes to a
> schema and the list of warnings remains long and not getting fixed.
>
> This case was less than useful with just the oneOf warning.
Ack
Konrad
>
> Rob
On Mon, May 29, 2023 at 03:52:27PM +0200, Konrad Dybcio wrote:
>
> Currently we're only deasserting REG_A6XX_RBBM_GBIF_HALT, but we also
> need REG_A6XX_GBIF_HALT to be set to 0.
>
> This is typically done automatically on successful GX collapse, but in
> case that fails, we should take care of it.
>
> Also, add a memory barrier to ensure it's gone through before jumping
> to further initialization.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 083ccb5bcb4e..dfde5fb65eed 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1003,8 +1003,12 @@ static int hw_init(struct msm_gpu *gpu)
> a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>
> /* Clear GBIF halt in case GX domain was not collapsed */
> - if (a6xx_has_gbif(adreno_gpu))
> + if (a6xx_has_gbif(adreno_gpu)) {
> + gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> + /* Let's make extra sure that the GPU can access the memory.. */
> + mb();
This barrier is unnecessary because writel transactions are ordered and
we don't expect any traffic from the GPU immediately after this.
-Akhil
> + }
>
> gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
>
>
> --
> 2.40.1
>
On 9.06.2023 20:25, Akhil P Oommen wrote:
> On Mon, May 29, 2023 at 03:52:27PM +0200, Konrad Dybcio wrote:
>>
>> Currently we're only deasserting REG_A6XX_RBBM_GBIF_HALT, but we also
>> need REG_A6XX_GBIF_HALT to be set to 0.
>>
>> This is typically done automatically on successful GX collapse, but in
>> case that fails, we should take care of it.
>>
>> Also, add a memory barrier to ensure it's gone through before jumping
>> to further initialization.
>>
>> Reviewed-by: Dmitry Baryshkov <[email protected]>
>> Signed-off-by: Konrad Dybcio <[email protected]>
>> ---
>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index 083ccb5bcb4e..dfde5fb65eed 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -1003,8 +1003,12 @@ static int hw_init(struct msm_gpu *gpu)
>> a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>
>> /* Clear GBIF halt in case GX domain was not collapsed */
>> - if (a6xx_has_gbif(adreno_gpu))
>> + if (a6xx_has_gbif(adreno_gpu)) {
>> + gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>> gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
>> + /* Let's make extra sure that the GPU can access the memory.. */
>> + mb();
> This barrier is unnecessary because writel transactions are ordered and
> we don't expect a traffic from GPU immediately after this.
>
> -Akhil
Right, let's remove it!
Konrad
>> + }
>>
>> gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
>>
>>
>> --
>> 2.40.1
>>
On Mon, May 29, 2023 at 03:52:28PM +0200, Konrad Dybcio wrote:
>
> Rename lower_bit to hbb_lo and explain what it signifies.
> Add explanations (wherever possible to other tunables).
>
> Port setting min_access_length, ubwc_mode and hbb_hi from downstream.
>
> Reviewed-by: Rob Clark <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 39 +++++++++++++++++++++++++++--------
> 1 file changed, 30 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index dfde5fb65eed..58bf405b85d8 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -786,10 +786,25 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
> static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> - u32 lower_bit = 2;
> - u32 amsbc = 0;
> + /* Unknown, introduced with A650 family, related to UBWC mode/ver 4 */
> u32 rgb565_predicator = 0;
> + /* Unknown, introduced with A650 family */
> u32 uavflagprd_inv = 0;
> + /* Whether the minimum access length is 64 bits */
> + u32 min_acc_len = 0;
> + /* Entirely magic, per-GPU-gen value */
> + u32 ubwc_mode = 0;
> + /*
> + * The Highest Bank Bit value represents the bit of the highest DDR bank.
> + * We then subtract 13 from it (13 is the minimum value allowed by hw) and
> + * write the lowest two bits of the remaining value as hbb_lo and the
> + * one above it as hbb_hi to the hardware. This should ideally use DRAM
> + * type detection.
> + */
> + u32 hbb_hi = 0;
> + u32 hbb_lo = 2;
> + /* Unknown, introduced with A640/680 */
> + u32 amsbc = 0;
>
> /* a618 is using the hw default values */
> if (adreno_is_a618(adreno_gpu))
> @@ -800,25 +815,31 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
>
> if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu)) {
> /* TODO: get ddr type from bootloader and use 2 for LPDDR4 */
> - lower_bit = 3;
> + hbb_lo = 3;
> amsbc = 1;
> rgb565_predicator = 1;
> uavflagprd_inv = 2;
> }
>
> if (adreno_is_7c3(adreno_gpu)) {
> - lower_bit = 1;
> + hbb_lo = 1;
> amsbc = 1;
> rgb565_predicator = 1;
> uavflagprd_inv = 2;
> }
>
> gpu_write(gpu, REG_A6XX_RB_NC_MODE_CNTL,
> - rgb565_predicator << 11 | amsbc << 4 | lower_bit << 1);
> - gpu_write(gpu, REG_A6XX_TPL1_NC_MODE_CNTL, lower_bit << 1);
> - gpu_write(gpu, REG_A6XX_SP_NC_MODE_CNTL,
> - uavflagprd_inv << 4 | lower_bit << 1);
> - gpu_write(gpu, REG_A6XX_UCHE_MODE_CNTL, lower_bit << 21);
> + rgb565_predicator << 11 | hbb_hi << 10 | amsbc << 4 |
> + min_acc_len << 3 | hbb_lo << 1 | ubwc_mode);
> +
> + gpu_write(gpu, REG_A6XX_TPL1_NC_MODE_CNTL, hbb_hi << 4 |
> + min_acc_len << 3 | hbb_lo << 1 | ubwc_mode);
> +
> + gpu_write(gpu, REG_A6XX_SP_NC_MODE_CNTL, hbb_hi << 10 |
> + uavflagprd_inv << 4 | min_acc_len << 3 |
> + hbb_lo << 1 | ubwc_mode);
> +
> + gpu_write(gpu, REG_A6XX_UCHE_MODE_CNTL, min_acc_len << 23 | hbb_lo << 21);
> }
>
> static int a6xx_cp_init(struct msm_gpu *gpu)
>
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
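
A worked example of the HBB split described in the new comment, as a userspace sketch. struct hbb_fields, hbb_encode() and rb_nc_mode_cntl() are hypothetical names made up here; the arithmetic follows the comment (subtract 13, low two bits to hbb_lo, next bit to hbb_hi) and the shifts are copied from the REG_A6XX_RB_NC_MODE_CNTL write in the patch:

```c
#include <assert.h>
#include <stdint.h>

struct hbb_fields {
	uint32_t hi;  /* bit 2 of (highest bank bit - 13) */
	uint32_t lo;  /* bits 1..0 of (highest bank bit - 13) */
};

/* Split a highest-bank-bit number (13 or larger) into the two fields. */
static struct hbb_fields hbb_encode(uint32_t highest_bank_bit)
{
	uint32_t v = highest_bank_bit - 13;
	struct hbb_fields f = { .hi = (v >> 2) & 0x1, .lo = v & 0x3 };

	return f;
}

/* Same field layout as the RB_NC_MODE_CNTL write in the patch. */
static uint32_t rb_nc_mode_cntl(uint32_t rgb565_predicator, uint32_t amsbc,
				uint32_t min_acc_len, uint32_t ubwc_mode,
				struct hbb_fields hbb)
{
	return rgb565_predicator << 11 | hbb.hi << 10 | amsbc << 4 |
	       min_acc_len << 3 | hbb.lo << 1 | ubwc_mode;
}
```

With that reading, the default hbb_lo = 2 corresponds to a highest bank bit of 15, the A650/A660 value hbb_lo = 3 to 16, and 17 would be the first value that needs hbb_hi.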
> --
> 2.40.1
>
On Mon, May 29, 2023 at 03:52:29PM +0200, Konrad Dybcio wrote:
>
> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> but don't implement the associated GMUs. This is due to the fact that
> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> of enabling & scaling power rails, clocks and bandwidth ourselves.
>
> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> A6XX code to facilitate these GPUs. This involves if-ing out lots
> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> the actual name that Qualcomm uses in their downstream kernels).
>
> This is essentially a register region which is convenient to model
> as a device. We'll use it for managing the GDSCs. The register
> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 72 +++++++++-
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 211 ++++++++++++++++++++++++----
> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
> drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 +-
> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 8 +-
> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 6 +
> 6 files changed, 277 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 5ba8cba69383..385ca3a12462 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -1437,6 +1437,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>
> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> {
> + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> struct platform_device *pdev = to_platform_device(gmu->dev);
>
> @@ -1462,10 +1463,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> gmu->mmio = NULL;
> gmu->rscc = NULL;
>
> - a6xx_gmu_memory_free(gmu);
> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> + a6xx_gmu_memory_free(gmu);
>
> - free_irq(gmu->gmu_irq, gmu);
> - free_irq(gmu->hfi_irq, gmu);
> + free_irq(gmu->gmu_irq, gmu);
> + free_irq(gmu->hfi_irq, gmu);
> + }
>
> /* Drop reference taken in of_find_device_by_node */
> put_device(gmu->dev);
> @@ -1484,6 +1487,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> return 0;
> }
>
> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> +{
> + struct platform_device *pdev = of_find_device_by_node(node);
> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> + int ret;
> +
> + if (!pdev)
> + return -ENODEV;
> +
> + gmu->dev = &pdev->dev;
> +
> + of_dma_configure(gmu->dev, node, true);
> +
> + pm_runtime_enable(gmu->dev);
> +
> + /* Mark legacy for manual SPTPRAC control */
> + gmu->legacy = true;
> +
> + /* Map the GMU registers */
> + gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> + if (IS_ERR(gmu->mmio)) {
> + ret = PTR_ERR(gmu->mmio);
> + goto err_mmio;
> + }
> +
> + gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> + if (IS_ERR(gmu->cxpd)) {
> + ret = PTR_ERR(gmu->cxpd);
> + goto err_mmio;
> + }
> +
> + if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> + ret = -ENODEV;
> + goto detach_cxpd;
> + }
> +
> + init_completion(&gmu->pd_gate);
> + complete_all(&gmu->pd_gate);
> + gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> +
> + /* Get a link to the GX power domain to reset the GPU */
> + gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> + if (IS_ERR(gmu->gxpd)) {
> + ret = PTR_ERR(gmu->gxpd);
> + goto err_mmio;
> + }
> +
> + gmu->initialized = true;
> +
> + return 0;
> +
> +detach_cxpd:
> + dev_pm_domain_detach(gmu->cxpd, false);
> +
> +err_mmio:
> + iounmap(gmu->mmio);
> +
> + /* Drop reference taken in of_find_device_by_node */
> + put_device(gmu->dev);
> +
> + return ret;
> +}
> +
> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> {
> struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 58bf405b85d8..0a44762dbb6d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -21,7 +21,7 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>
> /* Check that the GMU is idle */
> - if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_isidle(&a6xx_gpu->gmu))
> return false;
>
> /* Check tha the CX master is idle */
> @@ -1018,10 +1018,13 @@ static int hw_init(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> int ret;
>
> - /* Make sure the GMU keeps the GPU on while we set it up */
> - a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> + /* Make sure the GMU keeps the GPU on while we set it up */
> + a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> + }
>
> /* Clear GBIF halt in case GX domain was not collapsed */
> if (a6xx_has_gbif(adreno_gpu)) {
> @@ -1148,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> 0x3f0243f0);
> }
>
> + if (adreno_has_gmu_wrapper(adreno_gpu)) {
> + /* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> +
> + /* Set up the CX GMU counter 0 to count busy ticks */
> + gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> +
> + /* Enable power counter 0 */
> + gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> + gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> + }
nit: For GMU targets, we do this in a6xx_rpmh_start(), which is an odd
place for it, and I don't know why it was decided to keep it there. I
don't see any reason why we cannot do it here for both GMU and GMU
wrapper targets, like in the downstream driver.
> +
> /* Protect registers from the CP */
> a6xx_set_cp_protect(gpu);
>
> @@ -1237,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> }
>
> out:
> + if (adreno_has_gmu_wrapper(adreno_gpu))
> + return ret;
> /*
> * Tell the GMU that we are done touching the GPU and it can start power
> * management
> @@ -1271,9 +1287,6 @@ static void a6xx_dump(struct msm_gpu *gpu)
> adreno_dump(gpu);
> }
>
> -#define VBIF_RESET_ACK_TIMEOUT 100
> -#define VBIF_RESET_ACK_MASK 0x00f0
> -
> static void a6xx_recover(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -1311,6 +1324,15 @@ static void a6xx_recover(struct msm_gpu *gpu)
> */
> gpu->active_submits = 0;
>
> + if (adreno_has_gmu_wrapper(adreno_gpu)) {
> + /* Drain the outstanding traffic on memory buses */
> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
> +
> + /* Reset the GPU to a clean state */
> + a6xx_gpu_sw_reset(gpu, true);
> + a6xx_gpu_sw_reset(gpu, false);
> + }
> +
> reinit_completion(&gmu->pd_gate);
> dev_pm_genpd_add_notifier(gmu->cxpd, &gmu->pd_nb);
> dev_pm_genpd_synced_poweroff(gmu->cxpd);
> @@ -1461,7 +1483,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> * Force the GPU to stay on until after we finish
> * collecting information
> */
> - gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> + if (!adreno_has_gmu_wrapper(adreno_gpu))
> + gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>
> DRM_DEV_ERROR(&gpu->pdev->dev,
> "gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> @@ -1592,6 +1615,10 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
>
> static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
> {
> + /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> + if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
> + return;
> +
> llcc_slice_putd(a6xx_gpu->llc_slice);
> llcc_slice_putd(a6xx_gpu->htw_llc_slice);
> }
> @@ -1601,6 +1628,10 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> {
> struct device_node *phandle;
>
> + /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> + if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
> + return;
> +
> /*
> * There is a different programming path for targets with an mmu500
> * attached, so detect if that is the case
> @@ -1670,7 +1701,7 @@ void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
> udelay(100);
> }
>
> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -1690,10 +1721,58 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>
> a6xx_llc_activate(a6xx_gpu);
>
> - return 0;
> + return ret;
> }
>
> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> +{
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> + unsigned long freq = gpu->fast_rate;
> + struct dev_pm_opp *opp;
> + int ret;
> +
> + gpu->needs_hw_init = true;
> +
> + trace_msm_gpu_resume(0);
> +
> + mutex_lock(&a6xx_gpu->gmu.lock);
Where is this lock initialized? If the init was moved out of
a6xx_gmu_init(), can you please share that patch?
> +
> + opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> + if (IS_ERR(opp)) {
> + ret = PTR_ERR(opp);
> + goto err_set_opp;
> + }
> + dev_pm_opp_put(opp);
> +
> + /* Set the core clock and bus bw, having VDD scaling in mind */
> + dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> +
> + pm_runtime_resume_and_get(gmu->dev);
> + pm_runtime_resume_and_get(gmu->gxpd);
> +
> + ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> + if (ret)
> + goto err_bulk_clk;
> +
> + /* If anything goes south, tear the GPU down piece by piece.. */
> + if (ret) {
> +err_bulk_clk:
Jumping with goto directly into another block looks odd to me. Why do you
need this label anyway?
> + pm_runtime_put(gmu->gxpd);
> + pm_runtime_put(gmu->dev);
> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
> + }
> +err_set_opp:
Generally, it is better to name the label after what you do at it,
e.g. "unlock_lock:".
Also, this function is small enough that it is better to return directly
in case of error. I think that would be more readable.
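
To illustrate the shape I have in mind, here is a rough userspace sketch. Everything in it is a stand-in: struct fake_gpu, fake_set_opp() and fake_clk_enable() are hypothetical and only mimic the order of the OPP, runtime PM and clock calls in the hunk above, so treat it as an outline rather than a replacement patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in tracking which resources are currently held. */
struct fake_gpu {
	bool fail_opp, fail_clk;  /* failure injection for the sketch */
	bool locked, opp_set, gmu_on, gx_on, clks_on, devfreq_on;
};

static int fake_set_opp(struct fake_gpu *g)
{
	if (g->fail_opp)
		return -EINVAL;
	g->opp_set = true;
	return 0;
}

static int fake_clk_enable(struct fake_gpu *g)
{
	if (g->fail_clk)
		return -EIO;
	g->clks_on = true;
	return 0;
}

static int resume_sketch(struct fake_gpu *g)
{
	int ret;

	g->locked = true;          /* mutex_lock() */

	ret = fake_set_opp(g);     /* find the OPP and set it */
	if (ret)
		goto unlock;

	g->gmu_on = true;          /* pm_runtime_resume_and_get(gmu->dev) */
	g->gx_on = true;           /* pm_runtime_resume_and_get(gmu->gxpd) */

	ret = fake_clk_enable(g);  /* clk_bulk_prepare_enable() */
	if (ret)
		goto power_off;

	g->locked = false;         /* mutex_unlock() */
	g->devfreq_on = true;      /* msm_devfreq_resume() */
	return 0;

power_off:
	/* Undo in reverse order of acquisition */
	g->gx_on = false;
	g->gmu_on = false;
	g->opp_set = false;        /* dev_pm_opp_set_opp(dev, NULL) */
unlock:
	g->locked = false;
	return ret;
}
```

The success path returns on its own, no label sits inside an if block, and each label says what it undoes.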
> + mutex_unlock(&a6xx_gpu->gmu.lock);
> +
> + if (!ret)
> + msm_devfreq_resume(gpu);
> +
> + return ret;
> +}
> +
> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -1720,7 +1799,40 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> return 0;
> }
>
> -static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> +{
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> + int i;
> +
> + trace_msm_gpu_suspend(0);
> +
> + msm_devfreq_suspend(gpu);
> +
> + mutex_lock(&a6xx_gpu->gmu.lock);
Again, is this initialized somewhere?
> +
> + /* Drain the outstanding traffic on memory buses */
> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
> +
> + clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> +
> + pm_runtime_put_sync(gmu->gxpd);
> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
> + pm_runtime_put_sync(gmu->dev);
> +
> + mutex_unlock(&a6xx_gpu->gmu.lock);
> +
> + if (a6xx_gpu->shadow_bo)
> + for (i = 0; i < gpu->nr_rings; i++)
> + a6xx_gpu->shadow[i] = 0;
> +
> + gpu->suspend_count++;
> +
> + return 0;
> +}
> +
> +static int a6xx_gmu_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -1739,6 +1851,12 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> return 0;
> }
>
> +static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> +{
> + *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> + return 0;
> +}
> +
> static struct msm_ringbuffer *a6xx_active_ring(struct msm_gpu *gpu)
> {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -2004,8 +2122,8 @@ static const struct adreno_gpu_funcs funcs = {
> .set_param = adreno_set_param,
> .hw_init = a6xx_hw_init,
> .ucode_load = a6xx_ucode_load,
> - .pm_suspend = a6xx_pm_suspend,
> - .pm_resume = a6xx_pm_resume,
> + .pm_suspend = a6xx_gmu_pm_suspend,
> + .pm_resume = a6xx_gmu_pm_resume,
> .recover = a6xx_recover,
> .submit = a6xx_submit,
> .active_ring = a6xx_active_ring,
> @@ -2020,6 +2138,35 @@ static const struct adreno_gpu_funcs funcs = {
> #if defined(CONFIG_DRM_MSM_GPU_STATE)
> .gpu_state_get = a6xx_gpu_state_get,
> .gpu_state_put = a6xx_gpu_state_put,
> +#endif
> + .create_address_space = a6xx_create_address_space,
> + .create_private_address_space = a6xx_create_private_address_space,
> + .get_rptr = a6xx_get_rptr,
> + .progress = a6xx_progress,
> + },
> + .get_timestamp = a6xx_gmu_get_timestamp,
> +};
> +
> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> + .base = {
> + .get_param = adreno_get_param,
> + .set_param = adreno_set_param,
> + .hw_init = a6xx_hw_init,
> + .ucode_load = a6xx_ucode_load,
> + .pm_suspend = a6xx_pm_suspend,
> + .pm_resume = a6xx_pm_resume,
> + .recover = a6xx_recover,
> + .submit = a6xx_submit,
> + .active_ring = a6xx_active_ring,
> + .irq = a6xx_irq,
> + .destroy = a6xx_destroy,
> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> + .show = a6xx_show,
> +#endif
> + .gpu_busy = a6xx_gpu_busy,
> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> + .gpu_state_get = a6xx_gpu_state_get,
> + .gpu_state_put = a6xx_gpu_state_put,
> #endif
> .create_address_space = a6xx_create_address_space,
> .create_private_address_space = a6xx_create_private_address_space,
> @@ -2050,15 +2197,31 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>
> adreno_gpu->registers = NULL;
>
> + /* Check if there is a GMU phandle and set it up */
> + node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> + /* FIXME: How do we gracefully handle this? */
> + BUG_ON(!node);
> +
> + adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> +
> /*
> * We need to know the platform type before calling into adreno_gpu_init
> * so that the hw_apriv flag can be correctly set. Snoop into the info
> * and grab the revision number
> */
> info = adreno_info(config->rev);
> -
> - if (info && (info->revn == 650 || info->revn == 660 ||
> - adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> + if (!info)
> + return ERR_PTR(-EINVAL);
> +
> + /* Assign these early so that we can use the is_aXYZ helpers */
> + /* Numeric revision IDs (e.g. 630) */
> + adreno_gpu->revn = info->revn;
> + /* New-style ADRENO_REV()-only */
> + adreno_gpu->rev = info->rev;
> + /* Quirk data */
> + adreno_gpu->info = info;
> +
> + if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> adreno_gpu->base.hw_apriv = true;
>
> a6xx_llc_slices_init(pdev, a6xx_gpu);
> @@ -2069,7 +2232,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> return ERR_PTR(ret);
> }
>
> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> + if (adreno_has_gmu_wrapper(adreno_gpu))
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> + else
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> if (ret) {
> a6xx_destroy(&(a6xx_gpu->base.base));
> return ERR_PTR(ret);
> @@ -2082,13 +2248,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> priv->gpu_clamp_to_idle = true;
>
> - /* Check if there is a GMU phandle and set it up */
> - node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> -
> - /* FIXME: How do we gracefully handle this? */
> - BUG_ON(!node);
> -
> - ret = a6xx_gmu_init(a6xx_gpu, node);
> + if (adreno_has_gmu_wrapper(adreno_gpu))
> + ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> + else
> + ret = a6xx_gmu_init(a6xx_gpu, node);
> of_node_put(node);
> if (ret) {
> a6xx_destroy(&(a6xx_gpu->base.base));
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index aa70390ee1c6..c788b06e72da 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>
> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>
> void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> index 30ecdff363e7..4e5d650578c6 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> /* Get the generic state from the adreno core */
> adreno_gpu_state_get(gpu, &a6xx_state->base);
>
> - a6xx_get_gmu_registers(gpu, a6xx_state);
> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> + a6xx_get_gmu_registers(gpu, a6xx_state);
>
> - a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> - a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> - a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> + a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> + a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> + a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>
> - a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> + a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> + }
>
> /* If GX isn't on the rest of the data isn't going to be accessible */
> - if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> return &a6xx_state->base;
>
> /* Get the banks of indexed registers */
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 6934cee07d42..5c5901d65950 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> if (!adreno_gpu->info->fw[i])
> continue;
>
> + /* Skip loading GMU firmware with GMU Wrapper */
> + if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> + continue;
> +
> /* Skip if the firmware has already been loaded */
> if (adreno_gpu->fw[i])
> continue;
> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> u32 speedbin;
> int ret;
>
> - /* Only handle the core clock when GMU is not in use */
> - if (config->rev.core < 6) {
> + /* Only handle the core clock when GMU is not in use (or is absent). */
> + if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> /*
> * This can only be done before devm_pm_opp_of_add_table(), or
> * dev_pm_opp_set_config() will WARN_ON()
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index f62612a5c70f..ee5352bc5329 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -115,6 +115,7 @@ struct adreno_gpu {
> * code (a3xx_gpu.c) and stored in this common location.
> */
> const unsigned int *reg_offsets;
> + bool gmu_is_wrapper;
> };
> #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>
> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>
> bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>
> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> +{
> + return gpu->gmu_is_wrapper;
> +}
> +
> static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> {
> return (gpu->revn < 300);
>
> --
> 2.40.1
>
I am still not fully on board with the idea of a gmu_wrapper node in the devicetree.
Aside from that, I don't see any other issues. Please check the few comments I left.
-Akhil.
On Mon, May 29, 2023 at 03:52:34PM +0200, Konrad Dybcio wrote:
>
> The GPU can only be one model at a time. Turn a series of ifs into an
> if + else-if chain to save some CPU cycles.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 1a29e7dd9975..5faa85543428 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2170,16 +2170,16 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
> if (adreno_cmp_rev(ADRENO_REV(6, 1, 8, ANY_ID), rev))
> val = a618_get_speed_bin(fuse);
>
> - if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
> + else if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
> val = a619_get_speed_bin(fuse);
>
> - if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
> + else if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
> val = adreno_7c3_get_speed_bin(fuse);
>
> - if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
> + else if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
> val = a640_get_speed_bin(fuse);
>
> - if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
> + else if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
> val = a650_get_speed_bin(fuse);
>
> if (val == UINT_MAX) {
>
> --
> 2.40.1
>
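For readers following the thread, the effect of the if -> else-if conversion can be sketched outside the kernel. This is only an illustration: `rev_matches()`, `nr_checks`, the two lookup functions and their return values are hypothetical stand-ins, not the driver's `adreno_cmp_rev()` or the real speed-bin helpers.

```c
#include <stdint.h>

/* Counts how many revision comparisons were evaluated (illustration only). */
static unsigned int nr_checks;

/* Hypothetical stand-in for a revision comparison. */
static int rev_matches(uint32_t want, uint32_t have)
{
	nr_checks++;
	return want == have;
}

/* Independent ifs: every comparison runs, even after one has matched. */
static uint32_t lookup_independent(uint32_t revn)
{
	uint32_t val = UINT32_MAX;

	if (rev_matches(618, revn))
		val = 1;	/* placeholder result */
	if (rev_matches(619, revn))
		val = 2;
	if (rev_matches(640, revn))
		val = 3;
	if (rev_matches(650, revn))
		val = 4;

	return val;
}

/* else-if chain: evaluation stops at the first match. */
static uint32_t lookup_chained(uint32_t revn)
{
	uint32_t val = UINT32_MAX;

	if (rev_matches(618, revn))
		val = 1;
	else if (rev_matches(619, revn))
		val = 2;
	else if (rev_matches(640, revn))
		val = 3;
	else if (rev_matches(650, revn))
		val = 4;

	return val;
}
```

Both variants return the same value because the conditions are mutually exclusive; the chained form simply skips the remaining comparisons once one has matched.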
On Mon, May 29, 2023 at 03:52:32PM +0200, Konrad Dybcio wrote:
>
> A610 is one of (if not the) lowest-tier SKUs in the A6XX family. It
> features no GMU, as it's implemented solely on SoCs with SMD_RPM.
> What's more interesting is that it does not feature a VDDGX line
> either, being powered solely by VDDCX and has an unfortunate hardware
> quirk that makes its reset line broken - after a couple of assert/
> deassert cycles, it will hang for good and will not wake up again.
>
> This GPU requires mesa changes for proper rendering, and lots of them
> at that. The command streams are quite far away from any other A6XX
> GPU and hence it needs special care. This patch was validated both
> by running an (incomplete) downstream mesa with some hacks (frames
> rendered correctly, though some instructions made the GPU hangcheck
> which is expected - garbage in, garbage out) and by replaying RD
> traces captured with the downstream KGSL driver - no crashes there,
> ever.
>
> Add support for this GPU on the kernel side, which comes down to
> pretty simply adding A612 HWCG tables, altering a few values and
> adding a special case for handling the reset line.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 101 +++++++++++++++++++++++++----
> drivers/gpu/drm/msm/adreno/adreno_device.c | 12 ++++
> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 8 ++-
> 3 files changed, 108 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index bb04f65e6f68..c0d5973320d9 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -252,6 +252,56 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> a6xx_flush(gpu, ring);
> }
>
> +const struct adreno_reglist a612_hwcg[] = {
> + {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x22222222},
> + {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
> + {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x00000081},
> + {REG_A6XX_RBBM_CLOCK_HYST_SP0, 0x0000f3cf},
> + {REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x22222222},
> + {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x22222222},
> + {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x22222222},
> + {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x00022222},
> + {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x11111111},
> + {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x11111111},
> + {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x11111111},
> + {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x00011111},
> + {REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x77777777},
> + {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x77777777},
> + {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x77777777},
> + {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x00077777},
> + {REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x22222222},
> + {REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x01202222},
> + {REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x00002220},
> + {REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
> + {REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
> + {REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x00005555},
> + {REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x00000011},
> + {REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
> + {REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x04222222},
> + {REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x00002222},
> + {REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x02222222},
> + {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x00000002},
> + {REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x00002222},
> + {REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x00004000},
> + {REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x00002222},
> + {REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x00000200},
> + {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x00000000},
> + {REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x00000000},
> + {REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x00000000},
> + {REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
> + {REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x00000000},
> + {REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x22222222},
> + {REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x00000004},
> + {REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x00000002},
> + {REG_A6XX_RBBM_ISDB_CNT, 0x00000182},
> + {REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x00000000},
> + {REG_A6XX_RBBM_SP_HYST_CNT, 0x00000000},
> + {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x00000222},
> + {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x00000111},
> + {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x00000555},
> + {},
> +};
> +
> /* For a615 family (a615, a616, a618 and a619) */
> const struct adreno_reglist a615_hwcg[] = {
> {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x02222222},
> @@ -602,6 +652,8 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>
> if (adreno_is_a630(adreno_gpu))
> clock_cntl_on = 0x8aa8aa02;
> + else if (adreno_is_a610(adreno_gpu))
> + clock_cntl_on = 0xaaa8aa82;
> else
> clock_cntl_on = 0x8aa8aa82;
>
> @@ -612,13 +664,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> return;
>
> /* Disable SP clock before programming HWCG registers */
> - gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> + if (!adreno_is_a610(adreno_gpu))
> + gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>
> for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
> gpu_write(gpu, reg->offset, state ? reg->value : 0);
>
> /* Enable SP clock */
> - gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> + if (!adreno_is_a610(adreno_gpu))
> + gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>
> gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
> }
> @@ -806,6 +860,13 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
> /* Unknown, introduced with A640/680 */
> u32 amsbc = 0;
>
> + if (adreno_is_a610(adreno_gpu)) {
> + /* HBB = 14 */
> + hbb_lo = 1;
> + min_acc_len = 1;
> + ubwc_mode = 1;
> + }
> +
> /* a618 is using the hw default values */
> if (adreno_is_a618(adreno_gpu))
> return;
> @@ -1073,13 +1134,13 @@ static int hw_init(struct msm_gpu *gpu)
> a6xx_set_hwcg(gpu, true);
>
> /* VBIF/GBIF start*/
> - if (adreno_is_a640_family(adreno_gpu) ||
> + if (adreno_is_a610(adreno_gpu) ||
> + adreno_is_a640_family(adreno_gpu) ||
> adreno_is_a650_family(adreno_gpu)) {
> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE0, 0x00071620);
> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE1, 0x00071620);
> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE2, 0x00071620);
> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
> - gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
> gpu_write(gpu, REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL, 0x3);
> } else {
> gpu_write(gpu, REG_A6XX_RBBM_VBIF_CLIENT_QOS_CNTL, 0x3);
> @@ -1107,18 +1168,26 @@ static int hw_init(struct msm_gpu *gpu)
> gpu_write(gpu, REG_A6XX_UCHE_FILTER_CNTL, 0x804);
> gpu_write(gpu, REG_A6XX_UCHE_CACHE_WAYS, 0x4);
>
> - if (adreno_is_a640_family(adreno_gpu) ||
> - adreno_is_a650_family(adreno_gpu))
> + if (adreno_is_a640_family(adreno_gpu) || adreno_is_a650_family(adreno_gpu)) {
> gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x02000140);
> - else
> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
> + } else if (adreno_is_a610(adreno_gpu)) {
> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x00800060);
> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x40201b16);
> + } else {
> gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x010000c0);
> - gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
> + }
>
> if (adreno_is_a660_family(adreno_gpu))
> gpu_write(gpu, REG_A6XX_CP_LPAC_PROG_FIFO_SIZE, 0x00000020);
>
> /* Setting the mem pool size */
> - gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
> + if (adreno_is_a610(adreno_gpu)) {
> + gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 48);
> + gpu_write(gpu, REG_A6XX_CP_MEM_POOL_DBG_ADDR, 47);
> + } else
> + gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
>
> /* Setting the primFifo thresholds default values,
> * and vccCacheSkipDis=1 bit (0x200) for A640 and newer
> @@ -1129,6 +1198,8 @@ static int hw_init(struct msm_gpu *gpu)
> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
> else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
> + else if (adreno_is_a610(adreno_gpu))
> + gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00080000);
> else
> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00180000);
>
> @@ -1144,8 +1215,10 @@ static int hw_init(struct msm_gpu *gpu)
> a6xx_set_ubwc_config(gpu);
>
> /* Enable fault detection */
> - gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL,
> - (1 << 30) | 0x1fffff);
> + if (adreno_is_a610(adreno_gpu))
> + gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3ffff);
> + else
> + gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x1fffff);
>
> gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
>
> @@ -1675,7 +1748,7 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
> struct msm_gpu *gpu = &adreno_gpu->base;
>
> if (adreno_is_a619_holi(adreno_gpu)) {
> - gpu_write(gpu, 0x18, GPR0_GBIF_HALT_REQUEST);
> + gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, GPR0_GBIF_HALT_REQUEST);
This looks like an unrelated change.
> spin_until((gpu_read(gpu, REG_A6XX_RBBM_VBIF_GX_RESET_STATUS) &
> (VBIF_RESET_ACK_MASK)) == VBIF_RESET_ACK_MASK);
> } else if (!a6xx_has_gbif(adreno_gpu)) {
> @@ -1709,6 +1782,10 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
>
> void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
> {
> + /* 11nm chips (e.g. ones with A610) have hw issues with the reset line! */
> + if (adreno_is_a610(to_adreno_gpu(gpu)))
> + return;
> +
> gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
> /* Add a barrier to avoid bad surprises */
> mb();
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index b133755a56c4..2c2cdbdada4d 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -253,6 +253,18 @@ static const struct adreno_info gpulist[] = {
> .quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
> .init = a5xx_gpu_init,
> .zapfw = "a540_zap.mdt",
> + }, {
> + .rev = ADRENO_REV(6, 1, 0, ANY_ID),
> + .revn = 610,
> + .name = "A610",
> + .fw = {
> + [ADRENO_FW_SQE] = "a630_sqe.fw",
> + },
> + .gmem = (SZ_128K + SZ_4K),
> + .inactive_period = 500,
Do you really want such a long inactive period?
> + .init = a6xx_gpu_init,
> + .zapfw = "a610_zap.mdt",
> + .hwcg = a612_hwcg,
> }, {
> .rev = ADRENO_REV(6, 1, 8, ANY_ID),
> .revn = 618,
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 432fee5c1516..7a5d595d4b99 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -55,7 +55,8 @@ struct adreno_reglist {
> u32 value;
> };
>
> -extern const struct adreno_reglist a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[], a660_hwcg[];
> +extern const struct adreno_reglist a612_hwcg[], a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[];
> +extern const struct adreno_reglist a660_hwcg[];
>
> struct adreno_info {
> struct adreno_rev rev;
> @@ -242,6 +243,11 @@ static inline int adreno_is_a540(struct adreno_gpu *gpu)
> return gpu->revn == 540;
> }
>
> +static inline int adreno_is_a610(struct adreno_gpu *gpu)
> +{
> + return gpu->revn == 610;
> +}
> +
> static inline int adreno_is_a618(struct adreno_gpu *gpu)
> {
> return gpu->revn == 618;
>
> --
> 2.40.1
>
Minor nits, but looks good to me.
-Akhil.
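As a side note on how a table such as a612_hwcg is consumed: the loop in a6xx_set_hwcg() quoted above walks the list until it reaches an entry whose offset is zero. A minimal userspace sketch of that sentinel-terminated pattern follows. The names `demo_reglist`, `demo_hwcg`, `fake_regs` and `apply_hwcg()` are made up for illustration, and the offsets are not real register addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the offset/value pair layout of a register list entry. */
struct demo_reglist {
	uint32_t offset;
	uint32_t value;
};

/* Fake register file standing in for MMIO space (illustration only). */
static uint32_t fake_regs[16];

static void fake_write(uint32_t offset, uint32_t value)
{
	fake_regs[offset] = value;
}

/* Made-up offsets; the zeroed last entry is the sentinel. */
static const struct demo_reglist demo_hwcg[] = {
	{ 0x1, 0x22222222 },
	{ 0x2, 0x00000081 },
	{ 0x5, 0x0000f3cf },
	{ 0 },
};

/*
 * Program every entry when state is non-zero, or clear it otherwise.
 * Returns the number of registers written.
 */
static size_t apply_hwcg(const struct demo_reglist *list, int state)
{
	const struct demo_reglist *reg;
	size_t n = 0;

	for (reg = list; reg->offset; reg++) {
		fake_write(reg->offset, state ? reg->value : 0);
		n++;
	}

	return n;
}
```

One consequence of this design is that a register at offset 0 can never appear in such a table, since it would be read as the terminator.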
On Mon, May 29, 2023 at 03:52:35PM +0200, Konrad Dybcio wrote:
>
> Before transitioning to using per-SoC and not per-Adreno speedbin
> fuse values (need another patchset to land elsewhere), a good
> improvement/stopgap solution is to use adreno_is_aXYZ macros in
> place of explicit revision matching. Do so to allow differentiating
> between A619 and A619_holi.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 18 +++++++++---------
> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 14 ++++++++++++--
> 2 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 5faa85543428..ca4ffa44097e 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2163,23 +2163,23 @@ static u32 adreno_7c3_get_speed_bin(u32 fuse)
> return UINT_MAX;
> }
>
> -static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
> +static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u32 fuse)
> {
> u32 val = UINT_MAX;
>
> - if (adreno_cmp_rev(ADRENO_REV(6, 1, 8, ANY_ID), rev))
> + if (adreno_is_a618(adreno_gpu))
> val = a618_get_speed_bin(fuse);
>
> - else if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
> + else if (adreno_is_a619(adreno_gpu))
> val = a619_get_speed_bin(fuse);
>
> - else if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
> + else if (adreno_is_7c3(adreno_gpu))
> val = adreno_7c3_get_speed_bin(fuse);
>
> - else if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
> + else if (adreno_is_a640(adreno_gpu))
> val = a640_get_speed_bin(fuse);
>
> - else if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
> + else if (adreno_is_a650(adreno_gpu))
> val = a650_get_speed_bin(fuse);
>
> if (val == UINT_MAX) {
> @@ -2192,7 +2192,7 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
> return (1 << val);
> }
>
> -static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
> +static int a6xx_set_supported_hw(struct device *dev, struct adreno_gpu *adreno_gpu)
> {
> u32 supp_hw;
> u32 speedbin;
> @@ -2211,7 +2211,7 @@ static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
> return ret;
> }
>
> - supp_hw = fuse_to_supp_hw(dev, rev, speedbin);
> + supp_hw = fuse_to_supp_hw(dev, adreno_gpu, speedbin);
>
> ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
> if (ret)
> @@ -2330,7 +2330,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>
> a6xx_llc_slices_init(pdev, a6xx_gpu);
>
> - ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
> + ret = a6xx_set_supported_hw(&pdev->dev, adreno_gpu);
> if (ret) {
> a6xx_destroy(&(a6xx_gpu->base.base));
> return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 7a5d595d4b99..21513cec038f 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -268,9 +268,9 @@ static inline int adreno_is_a630(struct adreno_gpu *gpu)
> return gpu->revn == 630;
> }
>
> -static inline int adreno_is_a640_family(struct adreno_gpu *gpu)
> +static inline int adreno_is_a640(struct adreno_gpu *gpu)
> {
> - return (gpu->revn == 640) || (gpu->revn == 680);
> + return gpu->revn == 640;
> }
>
> static inline int adreno_is_a650(struct adreno_gpu *gpu)
> @@ -289,6 +289,11 @@ static inline int adreno_is_a660(struct adreno_gpu *gpu)
> return gpu->revn == 660;
> }
>
> +static inline int adreno_is_a680(struct adreno_gpu *gpu)
> +{
> + return gpu->revn == 680;
> +}
> +
> /* check for a615, a616, a618, a619 or any derivatives */
> static inline int adreno_is_a615_family(struct adreno_gpu *gpu)
> {
> @@ -306,6 +311,11 @@ static inline int adreno_is_a650_family(struct adreno_gpu *gpu)
> return gpu->revn == 650 || gpu->revn == 620 || adreno_is_a660_family(gpu);
> }
>
> +static inline int adreno_is_a640_family(struct adreno_gpu *gpu)
> +{
> + return adreno_is_a640(gpu) || adreno_is_a680(gpu);
> +}
> +
> u64 adreno_private_address_space_size(struct msm_gpu *gpu);
> int adreno_get_param(struct msm_gpu *gpu, struct msm_file_private *ctx,
> uint32_t param, uint64_t *value, uint32_t *len);
>
> --
> 2.40.1
>
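The split of adreno_is_a640_family() into exact-match helpers plus a family helper can be illustrated with a small standalone sketch. `struct demo_gpu` and the `demo_is_*` names are hypothetical and only mirror the shape of the helpers in the patch above.

```c
/* Minimal stand-in for the GPU object; only the revision number matters here. */
struct demo_gpu {
	unsigned int revn;
};

/* Exact-match predicates, one per GPU. */
static inline int demo_is_a640(const struct demo_gpu *gpu)
{
	return gpu->revn == 640;
}

static inline int demo_is_a680(const struct demo_gpu *gpu)
{
	return gpu->revn == 680;
}

/* The family helper is composed from the exact-match predicates. */
static inline int demo_is_a640_family(const struct demo_gpu *gpu)
{
	return demo_is_a640(gpu) || demo_is_a680(gpu);
}
```

With this split, a speed-bin lookup can test for exactly A640 without also matching A680, while hardware setup that applies to both can keep using the family helper.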
On Mon, May 29, 2023 at 03:52:33PM +0200, Konrad Dybcio wrote:
>
> Adreno 619 expects some tunables to be set differently. Make up for it.
>
> Fixes: b7616b5c69e6 ("drm/msm/adreno: Add A619 support")
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index c0d5973320d9..1a29e7dd9975 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1198,6 +1198,8 @@ static int hw_init(struct msm_gpu *gpu)
> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
> else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
> + else if (adreno_is_a619(adreno_gpu))
> + gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00018000);
> else if (adreno_is_a610(adreno_gpu))
> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00080000);
> else
> @@ -1215,7 +1217,9 @@ static int hw_init(struct msm_gpu *gpu)
> a6xx_set_ubwc_config(gpu);
>
> /* Enable fault detection */
> - if (adreno_is_a610(adreno_gpu))
> + if (adreno_is_a619(adreno_gpu))
> + gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3fffff);
> + else if (adreno_is_a610(adreno_gpu))
> gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3ffff);
> else
> gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x1fffff);
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
>
> --
> 2.40.1
>
On Mon, May 29, 2023 at 03:52:36PM +0200, Konrad Dybcio wrote:
>
> A619_holi is implemented on at least two SoCs: SM4350 (holi) and SM6375
> (blair). This is what seems to be a first occurrence of this happening,
> but it's easy to overcome by guarding the SoC-specific fuse values with
> of_machine_is_compatible(). Do just that to enable frequency limiting
> on these SoCs.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index ca4ffa44097e..d046af5f6de2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2110,6 +2110,34 @@ static u32 a618_get_speed_bin(u32 fuse)
> return UINT_MAX;
> }
>
> +static u32 a619_holi_get_speed_bin(u32 fuse)
> +{
> + /*
> + * There are (at least) two SoCs implementing A619_holi: SM4350 (holi)
> + * and SM6375 (blair). Limit the fuse matching to the corresponding
> + * SoC to prevent bogus frequency setting (as improbable as it may be,
> + * given unexpected fuse values are.. unexpected! But still possible.)
> + */
> +
> + if (fuse == 0)
> + return 0;
> +
> + if (of_machine_is_compatible("qcom,sm4350")) {
> + if (fuse == 138)
> + return 1;
> + else if (fuse == 92)
> + return 2;
> + } else if (of_machine_is_compatible("qcom,sm6375")) {
> + if (fuse == 190)
> + return 1;
> + else if (fuse == 177)
> + return 2;
> + } else
> + pr_warn("Unknown SoC implementing A619_holi!\n");
> +
> + return UINT_MAX;
> +}
> +
> static u32 a619_get_speed_bin(u32 fuse)
> {
> if (fuse == 0)
> @@ -2170,6 +2198,9 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u3
> if (adreno_is_a618(adreno_gpu))
> val = a618_get_speed_bin(fuse);
>
> + else if (adreno_is_a619_holi(adreno_gpu))
> + val = a619_holi_get_speed_bin(fuse);
> +
> else if (adreno_is_a619(adreno_gpu))
> val = a619_get_speed_bin(fuse);
>
>
> --
> 2.40.1
>
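To make the fuse handling in this patch easier to follow, here is a userspace sketch of the fuse -> speed bin -> supported-hw bitmask flow. The fuse values are the ones from the patch above; everything else is an assumption for illustration: `demo_machine` and `demo_machine_is_compatible()` stand in for of_machine_is_compatible(), and returning UINT32_MAX for an unknown fuse is a simplification of whatever the driver does in that case.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the board's root compatible string. */
static const char *demo_machine = "qcom,sm4350";

static int demo_machine_is_compatible(const char *compat)
{
	return strcmp(demo_machine, compat) == 0;
}

/* Map a raw fuse value to a speed bin index, guarded per SoC. */
static uint32_t demo_a619_holi_speed_bin(uint32_t fuse)
{
	if (fuse == 0)
		return 0;

	if (demo_machine_is_compatible("qcom,sm4350")) {
		if (fuse == 138)
			return 1;
		else if (fuse == 92)
			return 2;
	} else if (demo_machine_is_compatible("qcom,sm6375")) {
		if (fuse == 190)
			return 1;
		else if (fuse == 177)
			return 2;
	}

	return UINT32_MAX;
}

/* Turn the bin index into a one-hot mask, as in "return (1 << val)". */
static uint32_t demo_fuse_to_supp_hw(uint32_t fuse)
{
	uint32_t bin = demo_a619_holi_speed_bin(fuse);

	/* Assumption: pass the "unknown" marker through unchanged. */
	if (bin == UINT32_MAX)
		return UINT32_MAX;

	return 1u << bin;
}
```

The SoC guard means a fuse value that is valid on one SoC is treated as unknown on the other, which is the point of the of_machine_is_compatible() checks in the patch.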
On Mon, May 29, 2023 at 03:52:37PM +0200, Konrad Dybcio wrote:
>
> A610 is implemented on at least three SoCs: SM6115 (bengal), SM6125
> (trinket) and SM6225 (khaje). Trinket does not support speed binning
> (only a single SKU exists) and we don't yet support khaje upstream.
> Hence, add a fuse mapping table for bengal to allow for per-chip
> frequency limiting.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Konrad Dybcio <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index d046af5f6de2..c304fa118cff 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2098,6 +2098,30 @@ static bool a6xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
> return progress;
> }
>
> +static u32 a610_get_speed_bin(u32 fuse)
> +{
> + /*
> + * There are (at least) three SoCs implementing A610: SM6125 (trinket),
> + * SM6115 (bengal) and SM6225 (khaje). Trinket does not have speedbinning,
> + * as only a single SKU exists and we don't support khaje upstream yet.
> + * Hence, this matching table is only valid for bengal and can be easily
> + * expanded if need be.
> + */
> +
> + if (fuse == 0)
> + return 0;
> + else if (fuse == 206)
> + return 1;
> + else if (fuse == 200)
> + return 2;
> + else if (fuse == 157)
> + return 3;
> + else if (fuse == 127)
> + return 4;
> +
> + return UINT_MAX;
> +}
> +
> static u32 a618_get_speed_bin(u32 fuse)
> {
> if (fuse == 0)
> @@ -2195,6 +2219,9 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u3
> {
> u32 val = UINT_MAX;
>
> + if (adreno_is_a610(adreno_gpu))
> + val = a610_get_speed_bin(fuse);
> +
Didn't you convert this chain to 'else if' in one of the earlier
patches?
Reviewed-by: Akhil P Oommen <[email protected]>
-Akhil.
> if (adreno_is_a618(adreno_gpu))
> val = a618_get_speed_bin(fuse);
>
>
> --
> 2.40.1
>
On 14.06.2023 22:18, Akhil P Oommen wrote:
> On Mon, May 29, 2023 at 03:52:37PM +0200, Konrad Dybcio wrote:
>>
>> A610 is implemented on at least three SoCs: SM6115 (bengal), SM6125
>> (trinket) and SM6225 (khaje). Trinket does not support speed binning
>> (only a single SKU exists) and we don't yet support khaje upstream.
>> Hence, add a fuse mapping table for bengal to allow for per-chip
>> frequency limiting.
>>
>> Reviewed-by: Dmitry Baryshkov <[email protected]>
>> Signed-off-by: Konrad Dybcio <[email protected]>
>> ---
>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 27 +++++++++++++++++++++++++++
>> 1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index d046af5f6de2..c304fa118cff 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -2098,6 +2098,30 @@ static bool a6xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>> return progress;
>> }
>>
>> +static u32 a610_get_speed_bin(u32 fuse)
>> +{
>> + /*
>> + * There are (at least) three SoCs implementing A610: SM6125 (trinket),
>> + * SM6115 (bengal) and SM6225 (khaje). Trinket does not have speedbinning,
>> + * as only a single SKU exists and we don't support khaje upstream yet.
>> + * Hence, this matching table is only valid for bengal and can be easily
>> + * expanded if need be.
>> + */
>> +
>> + if (fuse == 0)
>> + return 0;
>> + else if (fuse == 206)
>> + return 1;
>> + else if (fuse == 200)
>> + return 2;
>> + else if (fuse == 157)
>> + return 3;
>> + else if (fuse == 127)
>> + return 4;
>> +
>> + return UINT_MAX;
>> +}
>> +
>> static u32 a618_get_speed_bin(u32 fuse)
>> {
>> if (fuse == 0)
>> @@ -2195,6 +2219,9 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u3
>> {
>> u32 val = UINT_MAX;
>>
>> + if (adreno_is_a610(adreno_gpu))
>> + val = a610_get_speed_bin(fuse);
>> +
>
> Didn't you convert this chain to 'else if' in one of the earlier
> patches?
Right, missed this one!
Konrad
>
> Reviewed-by: Akhil P Oommen <[email protected]>
>
> -Akhil.
>> if (adreno_is_a618(adreno_gpu))
>> val = a618_get_speed_bin(fuse);
>>
>>
>> --
>> 2.40.1
>>
On 14.06.2023 21:41, Akhil P Oommen wrote:
> On Mon, May 29, 2023 at 03:52:32PM +0200, Konrad Dybcio wrote:
>>
>> A610 is one of (if not the) lowest-tier SKUs in the A6XX family. It
>> features no GMU, as it's implemented solely on SoCs with SMD_RPM.
>> What's more interesting is that it does not feature a VDDGX line
>> either, being powered solely by VDDCX and has an unfortunate hardware
>> quirk that makes its reset line broken - after a couple of assert/
>> deassert cycles, it will hang for good and will not wake up again.
>>
>> This GPU requires mesa changes for proper rendering, and lots of them
>> at that. The command streams are quite far away from any other A6XX
>> GPU and hence it needs special care. This patch was validated both
>> by running an (incomplete) downstream mesa with some hacks (frames
>> rendered correctly, though some instructions made the GPU hangcheck
>> which is expected - garbage in, garbage out) and by replaying RD
>> traces captured with the downstream KGSL driver - no crashes there,
>> ever.
>>
>> Add support for this GPU on the kernel side, which comes down to
>> pretty simply adding A612 HWCG tables, altering a few values and
>> adding a special case for handling the reset line.
>>
>> Reviewed-by: Dmitry Baryshkov <[email protected]>
>> Signed-off-by: Konrad Dybcio <[email protected]>
>> ---
>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 101 +++++++++++++++++++++++++----
>> drivers/gpu/drm/msm/adreno/adreno_device.c | 12 ++++
>> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 8 ++-
>> 3 files changed, 108 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index bb04f65e6f68..c0d5973320d9 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -252,6 +252,56 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>> a6xx_flush(gpu, ring);
>> }
>>
>> +const struct adreno_reglist a612_hwcg[] = {
>> + {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x22222222},
>> + {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x00000081},
>> + {REG_A6XX_RBBM_CLOCK_HYST_SP0, 0x0000f3cf},
>> + {REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x22222222},
>> + {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x22222222},
>> + {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x22222222},
>> + {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x00022222},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x11111111},
>> + {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x11111111},
>> + {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x11111111},
>> + {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x00011111},
>> + {REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x77777777},
>> + {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x77777777},
>> + {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x77777777},
>> + {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x00077777},
>> + {REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x22222222},
>> + {REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x01202222},
>> + {REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x00002220},
>> + {REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
>> + {REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
>> + {REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x00005555},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x00000011},
>> + {REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
>> + {REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x04222222},
>> + {REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x00002222},
>> + {REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x02222222},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x00000002},
>> + {REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x00002222},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x00004000},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x00002222},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x00000200},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x00000000},
>> + {REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x00000000},
>> + {REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x00000000},
>> + {REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
>> + {REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x00000000},
>> + {REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x22222222},
>> + {REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x00000004},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x00000002},
>> + {REG_A6XX_RBBM_ISDB_CNT, 0x00000182},
>> + {REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x00000000},
>> + {REG_A6XX_RBBM_SP_HYST_CNT, 0x00000000},
>> + {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x00000222},
>> + {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x00000111},
>> + {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x00000555},
>> + {},
>> +};
>> +
>> /* For a615 family (a615, a616, a618 and a619) */
>> const struct adreno_reglist a615_hwcg[] = {
>> {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x02222222},
>> @@ -602,6 +652,8 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>>
>> if (adreno_is_a630(adreno_gpu))
>> clock_cntl_on = 0x8aa8aa02;
>> + else if (adreno_is_a610(adreno_gpu))
>> + clock_cntl_on = 0xaaa8aa82;
>> else
>> clock_cntl_on = 0x8aa8aa82;
>>
>> @@ -612,13 +664,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>> return;
>>
>> /* Disable SP clock before programming HWCG registers */
>> - gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>> + if (!adreno_is_a610(adreno_gpu))
>> + gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>
>> for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
>> gpu_write(gpu, reg->offset, state ? reg->value : 0);
>>
>> /* Enable SP clock */
>> - gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>> + if (!adreno_is_a610(adreno_gpu))
>> + gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>
>> gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
>> }
>> @@ -806,6 +860,13 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
>> /* Unknown, introduced with A640/680 */
>> u32 amsbc = 0;
>>
>> + if (adreno_is_a610(adreno_gpu)) {
>> + /* HBB = 14 */
>> + hbb_lo = 1;
>> + min_acc_len = 1;
>> + ubwc_mode = 1;
>> + }
>> +
>> /* a618 is using the hw default values */
>> if (adreno_is_a618(adreno_gpu))
>> return;
>> @@ -1073,13 +1134,13 @@ static int hw_init(struct msm_gpu *gpu)
>> a6xx_set_hwcg(gpu, true);
>>
>> /* VBIF/GBIF start*/
>> - if (adreno_is_a640_family(adreno_gpu) ||
>> + if (adreno_is_a610(adreno_gpu) ||
>> + adreno_is_a640_family(adreno_gpu) ||
>> adreno_is_a650_family(adreno_gpu)) {
>> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE0, 0x00071620);
>> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE1, 0x00071620);
>> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE2, 0x00071620);
>> gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
>> - gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
>> gpu_write(gpu, REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL, 0x3);
>> } else {
>> gpu_write(gpu, REG_A6XX_RBBM_VBIF_CLIENT_QOS_CNTL, 0x3);
>> @@ -1107,18 +1168,26 @@ static int hw_init(struct msm_gpu *gpu)
>> gpu_write(gpu, REG_A6XX_UCHE_FILTER_CNTL, 0x804);
>> gpu_write(gpu, REG_A6XX_UCHE_CACHE_WAYS, 0x4);
>>
>> - if (adreno_is_a640_family(adreno_gpu) ||
>> - adreno_is_a650_family(adreno_gpu))
>> + if (adreno_is_a640_family(adreno_gpu) || adreno_is_a650_family(adreno_gpu)) {
>> gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x02000140);
>> - else
>> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
>> + } else if (adreno_is_a610(adreno_gpu)) {
>> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x00800060);
>> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x40201b16);
>> + } else {
>> gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x010000c0);
>> - gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
>> + gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
>> + }
>>
>> if (adreno_is_a660_family(adreno_gpu))
>> gpu_write(gpu, REG_A6XX_CP_LPAC_PROG_FIFO_SIZE, 0x00000020);
>>
>> /* Setting the mem pool size */
>> - gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
>> + if (adreno_is_a610(adreno_gpu)) {
>> + gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 48);
>> + gpu_write(gpu, REG_A6XX_CP_MEM_POOL_DBG_ADDR, 47);
>> + } else
>> + gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
>>
>> /* Setting the primFifo thresholds default values,
>> * and vccCacheSkipDis=1 bit (0x200) for A640 and newer
>> @@ -1129,6 +1198,8 @@ static int hw_init(struct msm_gpu *gpu)
>> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
>> else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
>> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
>> + else if (adreno_is_a610(adreno_gpu))
>> + gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00080000);
>> else
>> gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00180000);
>>
>> @@ -1144,8 +1215,10 @@ static int hw_init(struct msm_gpu *gpu)
>> a6xx_set_ubwc_config(gpu);
>>
>> /* Enable fault detection */
>> - gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL,
>> - (1 << 30) | 0x1fffff);
>> + if (adreno_is_a610(adreno_gpu))
>> + gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3ffff);
>> + else
>> + gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x1fffff);
>>
>> gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
>>
>> @@ -1675,7 +1748,7 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
>> struct msm_gpu *gpu = &adreno_gpu->base;
>>
>> if (adreno_is_a619_holi(adreno_gpu)) {
>> - gpu_write(gpu, 0x18, GPR0_GBIF_HALT_REQUEST);
>> + gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, GPR0_GBIF_HALT_REQUEST);
>
> This looks like an unrelated change.
Right, wrong commit.
>
>> spin_until((gpu_read(gpu, REG_A6XX_RBBM_VBIF_GX_RESET_STATUS) &
>> (VBIF_RESET_ACK_MASK)) == VBIF_RESET_ACK_MASK);
>> } else if (!a6xx_has_gbif(adreno_gpu)) {
>> @@ -1709,6 +1782,10 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
>>
>> void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
>> {
>> + /* 11nm chips (e.g. ones with A610) have hw issues with the reset line! */
>> + if (adreno_is_a610(to_adreno_gpu(gpu)))
>> + return;
>> +
>> gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
>> /* Add a barrier to avoid bad surprises */
>> mb();
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
>> index b133755a56c4..2c2cdbdada4d 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
>> @@ -253,6 +253,18 @@ static const struct adreno_info gpulist[] = {
>> .quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
>> .init = a5xx_gpu_init,
>> .zapfw = "a540_zap.mdt",
>> + }, {
>> + .rev = ADRENO_REV(6, 1, 0, ANY_ID),
>> + .revn = 610,
>> + .name = "A610",
>> + .fw = {
>> + [ADRENO_FW_SQE] = "a630_sqe.fw",
>> + },
>> + .gmem = (SZ_128K + SZ_4K),
>> + .inactive_period = 500,
>
> You really want such a long inactive period?
Whoooooops! I confused this with gdsc timeout.. Thanks for spotting
this!
Konrad
>
>> + .init = a6xx_gpu_init,
>> + .zapfw = "a610_zap.mdt",
>> + .hwcg = a612_hwcg,
>> }, {
>> .rev = ADRENO_REV(6, 1, 8, ANY_ID),
>> .revn = 618,
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> index 432fee5c1516..7a5d595d4b99 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> @@ -55,7 +55,8 @@ struct adreno_reglist {
>> u32 value;
>> };
>>
>> -extern const struct adreno_reglist a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[], a660_hwcg[];
>> +extern const struct adreno_reglist a612_hwcg[], a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[];
>> +extern const struct adreno_reglist a660_hwcg[];
>>
>> struct adreno_info {
>> struct adreno_rev rev;
>> @@ -242,6 +243,11 @@ static inline int adreno_is_a540(struct adreno_gpu *gpu)
>> return gpu->revn == 540;
>> }
>>
>> +static inline int adreno_is_a610(struct adreno_gpu *gpu)
>> +{
>> + return gpu->revn == 610;
>> +}
>> +
>> static inline int adreno_is_a618(struct adreno_gpu *gpu)
>> {
>> return gpu->revn == 618;
>>
>> --
>> 2.40.1
>>
>
> Minor nits, but looks good to me.
>
> -Akhil.
On 6.06.2023 19:18, Akhil P Oommen wrote:
> On Mon, May 29, 2023 at 03:52:26PM +0200, Konrad Dybcio wrote:
>>
>> Introduce a6xx_gpu_sw_reset() in preparation for adding GMU wrapper
>> GPUs and reuse it in a6xx_gmu_force_off().
>>
>> This helper, contrary to the original usage in GMU code paths, adds
>> a write memory barrier which together with the necessary delay should
>> ensure that the reset is never deasserted too quickly due to e.g. OoO
>> execution going crazy.
>>
>> Signed-off-by: Konrad Dybcio <[email protected]>
>> ---
>> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 +--
>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 11 +++++++++++
>> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
>> 3 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index b86be123ecd0..5ba8cba69383 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -899,8 +899,7 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
>> a6xx_bus_clear_pending_transactions(adreno_gpu, true);
>>
>> /* Reset GPU core blocks */
>> - gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
>> - udelay(100);
>> + a6xx_gpu_sw_reset(gpu, true);
>> }
>>
>> static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index e3ac3f045665..083ccb5bcb4e 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -1634,6 +1634,17 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
>> gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>> }
>>
>> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
>> +{
>> + gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
>> + /* Add a barrier to avoid bad surprises */
> Can you please make this comment a bit more clear? Highlight that we
> should ensure the register is posted at hw before polling.
>
> I think this barrier is required only during assert.
Generally it should not be strictly required at all, but I'm thinking
that it'd be good to keep it in both cases, so that:
if (assert)
	we don't keep writing things to the GPU if it's in reset
else
	we don't start writing things to the GPU before it comes
	out of reset
Also, if you squint hard enough at the commit message, you'll notice
I intended for this to only be a wmb, but for some reason generalized
it.. Perhaps that's another thing I should fix!
for v9..
Konrad
>
> -Akhil.
>> + mb();
>> +
>> + /* The reset line needs to be asserted for at least 100 us */
>> + if (assert)
>> + udelay(100);
>> +}
>> +
>> static int a6xx_pm_resume(struct msm_gpu *gpu)
>> {
>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index 9580def06d45..aa70390ee1c6 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -89,5 +89,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
>> int a6xx_gpu_state_put(struct msm_gpu_state *state);
>>
>> void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
>> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert);
>>
>> #endif /* __A6XX_GPU_H__ */
>>
>> --
>> 2.40.1
>>
On Thu, Jun 15, 2023 at 12:34:06PM +0200, Konrad Dybcio wrote:
>
> On 6.06.2023 19:18, Akhil P Oommen wrote:
> > On Mon, May 29, 2023 at 03:52:26PM +0200, Konrad Dybcio wrote:
> >>
> >> Introduce a6xx_gpu_sw_reset() in preparation for adding GMU wrapper
> >> GPUs and reuse it in a6xx_gmu_force_off().
> >>
> >> This helper, contrary to the original usage in GMU code paths, adds
> >> a write memory barrier which together with the necessary delay should
> >> ensure that the reset is never deasserted too quickly due to e.g. OoO
> >> execution going crazy.
> >>
> >> Signed-off-by: Konrad Dybcio <[email protected]>
> >> ---
> >> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 +--
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 11 +++++++++++
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
> >> 3 files changed, 13 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> index b86be123ecd0..5ba8cba69383 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> @@ -899,8 +899,7 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
> >> a6xx_bus_clear_pending_transactions(adreno_gpu, true);
> >>
> >> /* Reset GPU core blocks */
> >> - gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
> >> - udelay(100);
> >> + a6xx_gpu_sw_reset(gpu, true);
> >> }
> >>
> >> static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> index e3ac3f045665..083ccb5bcb4e 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> @@ -1634,6 +1634,17 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
> >> gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> >> }
> >>
> >> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
> >> +{
> >> + gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
> >> + /* Add a barrier to avoid bad surprises */
> > Can you please make this comment a bit more clear? Highlight that we
> > should ensure the register is posted at hw before polling.
> >
> > I think this barrier is required only during assert.
> Generally it should not be strictly required at all, but I'm thinking
> that it'd be good to keep it in both cases, so that:
>
> if (assert)
> 	we don't keep writing things to the GPU if it's in reset
> else
> 	we don't start writing things to the GPU before it comes
> 	out of reset
>
> Also, if you squint hard enough at the commit message, you'll notice
> I intended for this to only be a wmb, but for some reason generalized
> it.. Perhaps that's another thing I should fix!
> for v9..
wmb() doesn't provide any ordering guarantee with the delay loop.
A common practice is to just read back the same register before
the loop because a readl followed by delay() is guaranteed to be ordered.
-Akhil.
>
> Konrad
> >
> > -Akhil.
> >> + mb();
> >> +
> >> + /* The reset line needs to be asserted for at least 100 us */
> >> + if (assert)
> >> + udelay(100);
> >> +}
> >> +
> >> static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> {
> >> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> index 9580def06d45..aa70390ee1c6 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> @@ -89,5 +89,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
> >> int a6xx_gpu_state_put(struct msm_gpu_state *state);
> >>
> >> void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
> >> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert);
> >>
> >> #endif /* __A6XX_GPU_H__ */
> >>
> >> --
> >> 2.40.1
> >>
On 15.06.2023 22:11, Akhil P Oommen wrote:
> On Thu, Jun 15, 2023 at 12:34:06PM +0200, Konrad Dybcio wrote:
>>
>> On 6.06.2023 19:18, Akhil P Oommen wrote:
>>> On Mon, May 29, 2023 at 03:52:26PM +0200, Konrad Dybcio wrote:
>>>>
>>>> Introduce a6xx_gpu_sw_reset() in preparation for adding GMU wrapper
>>>> GPUs and reuse it in a6xx_gmu_force_off().
>>>>
>>>> This helper, contrary to the original usage in GMU code paths, adds
>>>> a write memory barrier which together with the necessary delay should
>>>> ensure that the reset is never deasserted too quickly due to e.g. OoO
>>>> execution going crazy.
>>>>
>>>> Signed-off-by: Konrad Dybcio <[email protected]>
>>>> ---
>>>> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 +--
>>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 11 +++++++++++
>>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
>>>> 3 files changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> index b86be123ecd0..5ba8cba69383 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> @@ -899,8 +899,7 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
>>>> a6xx_bus_clear_pending_transactions(adreno_gpu, true);
>>>>
>>>> /* Reset GPU core blocks */
>>>> - gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
>>>> - udelay(100);
>>>> + a6xx_gpu_sw_reset(gpu, true);
>>>> }
>>>>
>>>> static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> index e3ac3f045665..083ccb5bcb4e 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> @@ -1634,6 +1634,17 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
>>>> gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>>>> }
>>>>
>>>> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
>>>> +{
>>>> + gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
>>>> + /* Add a barrier to avoid bad surprises */
>>> Can you please make this comment a bit more clear? Highlight that we
>>> should ensure the register is posted at hw before polling.
>>>
>>> I think this barrier is required only during assert.
>> Generally it should not be strictly required at all, but I'm thinking
>> that it'd be good to keep it in both cases, so that:
>>
>> if (assert)
>> 	we don't keep writing things to the GPU if it's in reset
>> else
>> 	we don't start writing things to the GPU before it comes
>> 	out of reset
>>
>> Also, if you squint hard enough at the commit message, you'll notice
>> I intended for this to only be a wmb, but for some reason generalized
>> it.. Perhaps that's another thing I should fix!
>> for v9..
>
> wmb() doesn't provide any ordering guarantee with the delay loop.
Hm, fair.. I'm still not as fluent with memory access knowledge as I'd
like to be..
> A common practice is to just read back the same register before
> the loop because a readl followed by delay() is guaranteed to be ordered.
So, how should I proceed? Keep the r/w barrier, or add a readback and
a tiiiny (perhaps even using ndelay instead of udelay?) delay on de-assert?
Konrad
>
> -Akhil.
>>
>> Konrad
>>>
>>> -Akhil.
>>>> + mb();
>>>> +
>>>> + /* The reset line needs to be asserted for at least 100 us */
>>>> + if (assert)
>>>> + udelay(100);
>>>> +}
>>>> +
>>>> static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>> {
>>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> index 9580def06d45..aa70390ee1c6 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> @@ -89,5 +89,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
>>>> int a6xx_gpu_state_put(struct msm_gpu_state *state);
>>>>
>>>> void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
>>>> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert);
>>>>
>>>> #endif /* __A6XX_GPU_H__ */
>>>>
>>>> --
>>>> 2.40.1
>>>>
On Thu, Jun 15, 2023 at 10:59:23PM +0200, Konrad Dybcio wrote:
>
> On 15.06.2023 22:11, Akhil P Oommen wrote:
> > On Thu, Jun 15, 2023 at 12:34:06PM +0200, Konrad Dybcio wrote:
> >>
> >> On 6.06.2023 19:18, Akhil P Oommen wrote:
> >>> On Mon, May 29, 2023 at 03:52:26PM +0200, Konrad Dybcio wrote:
> >>>>
> >>>> Introduce a6xx_gpu_sw_reset() in preparation for adding GMU wrapper
> >>>> GPUs and reuse it in a6xx_gmu_force_off().
> >>>>
> >>>> This helper, contrary to the original usage in GMU code paths, adds
> >>>> a write memory barrier which together with the necessary delay should
> >>>> ensure that the reset is never deasserted too quickly due to e.g. OoO
> >>>> execution going crazy.
> >>>>
> >>>> Signed-off-by: Konrad Dybcio <[email protected]>
> >>>> ---
> >>>> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 +--
> >>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 11 +++++++++++
> >>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
> >>>> 3 files changed, 13 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> index b86be123ecd0..5ba8cba69383 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> @@ -899,8 +899,7 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
> >>>> a6xx_bus_clear_pending_transactions(adreno_gpu, true);
> >>>>
> >>>> /* Reset GPU core blocks */
> >>>> - gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
> >>>> - udelay(100);
> >>>> + a6xx_gpu_sw_reset(gpu, true);
> >>>> }
> >>>>
> >>>> static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> index e3ac3f045665..083ccb5bcb4e 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> @@ -1634,6 +1634,17 @@ void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_
> >>>> gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> >>>> }
> >>>>
> >>>> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
> >>>> +{
> >>>> + gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, assert);
> >>>> + /* Add a barrier to avoid bad surprises */
> >>> Can you please make this comment a bit more clear? Highlight that we
> >>> should ensure the register is posted at hw before polling.
> >>>
> >>> I think this barrier is required only during assert.
> >> Generally it should not be strictly required at all, but I'm thinking
> >> that it'd be good to keep it in both cases, so that:
> >>
> >> if (assert)
> >> 	we don't keep writing things to the GPU if it's in reset
> >> else
> >> 	we don't start writing things to the GPU before it comes
> >> 	out of reset
> >>
> >> Also, if you squint hard enough at the commit message, you'll notice
> >> I intended for this to only be a wmb, but for some reason generalized
> >> it.. Perhaps that's another thing I should fix!
> >> for v9..
> >
> > wmb() doesn't provide any ordering guarantee with the delay loop.
> Hm, fair.. I'm still not as fluent with memory access knowledge as I'd
> like to be..
>
> > A common practice is to just read back the same register before
> > the loop because a readl followed by delay() is guaranteed to be ordered.
> So, how should I proceed? Keep the r/w barrier, or add a readback and
> a tiiiny (perhaps even using ndelay instead of udelay?) delay on de-assert?
readback + delay (similar value as downstream). This path is exercised
rarely.
-Akhil.
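
For reference, the "readback + delay" shape suggested above could look roughly like the sketch below. This is a user-space mock, not the driver code: mock_gpu, mock_gpu_write(), mock_gpu_read(), mock_udelay() and sw_reset_sketch() are hypothetical stand-ins for the kernel's gpu_write()/gpu_read()/udelay() and for a6xx_gpu_sw_reset(). The 100 us assert time comes from the patch; the de-assert delay is left as a parameter because the thread only says "similar value as downstream" and does not give a number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock of a single GPU register, for illustration only. */
enum { MOCK_REG_SW_RESET_CMD = 0, MOCK_NUM_REGS = 1 };

struct mock_gpu {
	uint32_t regs[MOCK_NUM_REGS];
	unsigned int reads;                   /* register reads issued so far */
	unsigned int reads_before_last_delay; /* reads seen when the last delay began */
	unsigned int delayed_us;              /* total time "waited" */
};

/* Stand-in for gpu_write(): in the kernel this is a posted MMIO write. */
static void mock_gpu_write(struct mock_gpu *gpu, unsigned int reg, uint32_t val)
{
	gpu->regs[reg] = val;
}

/* Stand-in for gpu_read(): a real readl cannot complete before the
 * earlier write to the same device has reached the hardware. */
static uint32_t mock_gpu_read(struct mock_gpu *gpu, unsigned int reg)
{
	gpu->reads++;
	return gpu->regs[reg];
}

/* Stand-in for udelay(): records the wait instead of spinning. */
static void mock_udelay(struct mock_gpu *gpu, unsigned int us)
{
	gpu->reads_before_last_delay = gpu->reads;
	gpu->delayed_us += us;
}

/* Sketch of the helper: write, read back, then delay. */
static void sw_reset_sketch(struct mock_gpu *gpu, bool assert_reset,
			    unsigned int deassert_delay_us)
{
	mock_gpu_write(gpu, MOCK_REG_SW_RESET_CMD, assert_reset);

	/* Read the register back so the write is posted before the timer starts */
	(void)mock_gpu_read(gpu, MOCK_REG_SW_RESET_CMD);

	/* 100 us on assert is from the patch; the de-assert value is an assumption */
	mock_udelay(gpu, assert_reset ? 100 : deassert_delay_us);
}
```

The point of the ordering is that the delay only starts counting once the read has returned, which a barrier alone would not guarantee with respect to the delay loop.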
>
> Konrad
> >
> > -Akhil.
> >>
> >> Konrad
> >>>
> >>> -Akhil.
> >>>> + mb();
> >>>> +
> >>>> + /* The reset line needs to be asserted for at least 100 us */
> >>>> + if (assert)
> >>>> + udelay(100);
> >>>> +}
> >>>> +
> >>>> static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>> {
> >>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> index 9580def06d45..aa70390ee1c6 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> @@ -89,5 +89,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
> >>>> int a6xx_gpu_state_put(struct msm_gpu_state *state);
> >>>>
> >>>> void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu, bool gx_off);
> >>>> +void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert);
> >>>>
> >>>> #endif /* __A6XX_GPU_H__ */
> >>>>
> >>>> --
> >>>> 2.40.1
> >>>>
On 10.06.2023 00:06, Akhil P Oommen wrote:
> On Mon, May 29, 2023 at 03:52:29PM +0200, Konrad Dybcio wrote:
>>
>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>> but don't implement the associated GMUs. This is due to the fact that
>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>
>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>> the actual name that Qualcomm uses in their downstream kernels).
>>
>> This is essentially a register region which is convenient to model
>> as a device. We'll use it for managing the GDSCs. The register
>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>
>> Signed-off-by: Konrad Dybcio <[email protected]>
>> ---
>> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 72 +++++++++-
>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 211 ++++++++++++++++++++++++----
>> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
>> drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 +-
>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 8 +-
>> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 6 +
>> 6 files changed, 277 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index 5ba8cba69383..385ca3a12462 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -1437,6 +1437,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>>
>> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>> {
>> + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>> struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> struct platform_device *pdev = to_platform_device(gmu->dev);
>>
>> @@ -1462,10 +1463,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>> gmu->mmio = NULL;
>> gmu->rscc = NULL;
>>
>> - a6xx_gmu_memory_free(gmu);
>> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>> + a6xx_gmu_memory_free(gmu);
>>
>> - free_irq(gmu->gmu_irq, gmu);
>> - free_irq(gmu->hfi_irq, gmu);
>> + free_irq(gmu->gmu_irq, gmu);
>> + free_irq(gmu->hfi_irq, gmu);
>> + }
>>
>> /* Drop reference taken in of_find_device_by_node */
>> put_device(gmu->dev);
>> @@ -1484,6 +1487,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>> return 0;
>> }
>>
>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>> +{
>> + struct platform_device *pdev = of_find_device_by_node(node);
>> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> + int ret;
>> +
>> + if (!pdev)
>> + return -ENODEV;
>> +
>> + gmu->dev = &pdev->dev;
>> +
>> + of_dma_configure(gmu->dev, node, true);
>> +
>> + pm_runtime_enable(gmu->dev);
>> +
>> + /* Mark legacy for manual SPTPRAC control */
>> + gmu->legacy = true;
>> +
>> + /* Map the GMU registers */
>> + gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
>> + if (IS_ERR(gmu->mmio)) {
>> + ret = PTR_ERR(gmu->mmio);
>> + goto err_mmio;
>> + }
>> +
>> + gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
>> + if (IS_ERR(gmu->cxpd)) {
>> + ret = PTR_ERR(gmu->cxpd);
>> + goto err_mmio;
>> + }
>> +
>> + if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
>> + ret = -ENODEV;
>> + goto detach_cxpd;
>> + }
>> +
>> + init_completion(&gmu->pd_gate);
>> + complete_all(&gmu->pd_gate);
>> + gmu->pd_nb.notifier_call = cxpd_notifier_cb;
>> +
>> + /* Get a link to the GX power domain to reset the GPU */
>> + gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
>> + if (IS_ERR(gmu->gxpd)) {
>> + ret = PTR_ERR(gmu->gxpd);
>> + goto err_mmio;
>> + }
>> +
>> + gmu->initialized = true;
>> +
>> + return 0;
>> +
>> +detach_cxpd:
>> + dev_pm_domain_detach(gmu->cxpd, false);
>> +
>> +err_mmio:
>> + iounmap(gmu->mmio);
>> +
>> + /* Drop reference taken in of_find_device_by_node */
>> + put_device(gmu->dev);
>> +
>> + return ret;
>> +}
>> +
>> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>> {
>> struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index 58bf405b85d8..0a44762dbb6d 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -21,7 +21,7 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>
>> /* Check that the GMU is idle */
>> - if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_isidle(&a6xx_gpu->gmu))
>> return false;
>>
>> /* Check tha the CX master is idle */
>> @@ -1018,10 +1018,13 @@ static int hw_init(struct msm_gpu *gpu)
>> {
>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> int ret;
>>
>> - /* Make sure the GMU keeps the GPU on while we set it up */
>> - a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>> + /* Make sure the GMU keeps the GPU on while we set it up */
>> + a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>> + }
>>
>> /* Clear GBIF halt in case GX domain was not collapsed */
>> if (a6xx_has_gbif(adreno_gpu)) {
>> @@ -1148,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
>> 0x3f0243f0);
>> }
>>
>> + if (adreno_has_gmu_wrapper(adreno_gpu)) {
>> + /* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
>> +
>> + /* Set up the CX GMU counter 0 to count busy ticks */
>> + gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
>> +
>> + /* Enable power counter 0 */
>> + gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
>> + gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
>> + }
>
> nit: For gmu targets, we do this at a6xx_rpmh_start() which is an odd place
> to keep this. But I don't know the reason why it was decided to keep it
> there. I don't see any reason why we cannot keep it here for both
> gmu/gmu-wrapper like in the downstream driver.
I split it out into a separate patch and reused it for the next revision.
Tested on A630, the GMU doesn't complain.
>
>> +
>> /* Protect registers from the CP */
>> a6xx_set_cp_protect(gpu);
>>
>> @@ -1237,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
>> }
>>
>> out:
>> + if (adreno_has_gmu_wrapper(adreno_gpu))
>> + return ret;
>> /*
>> * Tell the GMU that we are done touching the GPU and it can start power
>> * management
>> @@ -1271,9 +1287,6 @@ static void a6xx_dump(struct msm_gpu *gpu)
>> adreno_dump(gpu);
>> }
>>
>> -#define VBIF_RESET_ACK_TIMEOUT 100
>> -#define VBIF_RESET_ACK_MASK 0x00f0
>> -
>> static void a6xx_recover(struct msm_gpu *gpu)
>> {
>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> @@ -1311,6 +1324,15 @@ static void a6xx_recover(struct msm_gpu *gpu)
>> */
>> gpu->active_submits = 0;
>>
>> + if (adreno_has_gmu_wrapper(adreno_gpu)) {
>> + /* Drain the outstanding traffic on memory buses */
>> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
>> +
>> + /* Reset the GPU to a clean state */
>> + a6xx_gpu_sw_reset(gpu, true);
>> + a6xx_gpu_sw_reset(gpu, false);
>> + }
>> +
>> reinit_completion(&gmu->pd_gate);
>> dev_pm_genpd_add_notifier(gmu->cxpd, &gmu->pd_nb);
>> dev_pm_genpd_synced_poweroff(gmu->cxpd);
>> @@ -1461,7 +1483,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
>> * Force the GPU to stay on until after we finish
>> * collecting information
>> */
>> - gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>> + if (!adreno_has_gmu_wrapper(adreno_gpu))
>> + gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>
>> DRM_DEV_ERROR(&gpu->pdev->dev,
>> "gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
>> @@ -1592,6 +1615,10 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
>>
>> static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
>> {
>> + /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
>> + if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
>> + return;
>> +
>> llcc_slice_putd(a6xx_gpu->llc_slice);
>> llcc_slice_putd(a6xx_gpu->htw_llc_slice);
>> }
>> @@ -1601,6 +1628,10 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
>> {
>> struct device_node *phandle;
>>
>> + /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
>> + if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
>> + return;
>> +
>> /*
>> * There is a different programming path for targets with an mmu500
>> * attached, so detect if that is the case
>> @@ -1670,7 +1701,7 @@ void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
>> udelay(100);
>> }
>>
>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
>> {
>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> @@ -1690,10 +1721,58 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>>
>> a6xx_llc_activate(a6xx_gpu);
>>
>> - return 0;
>> + return ret;
>> }
>>
>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
>> +{
>> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> + unsigned long freq = gpu->fast_rate;
>> + struct dev_pm_opp *opp;
>> + int ret;
>> +
>> + gpu->needs_hw_init = true;
>> +
>> + trace_msm_gpu_resume(0);
>> +
>> + mutex_lock(&a6xx_gpu->gmu.lock);
>
> Where is this lock initialized? If the init was moved out of
> a6xx_gmu_init(), can you please share that patch?
12abd735f030 ("drm/msm/a6xx: initialize GMU mutex earlier")
>
>> +
>> + opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>> + if (IS_ERR(opp)) {
>> + ret = PTR_ERR(opp);
>> + goto err_set_opp;
>> + }
>> + dev_pm_opp_put(opp);
>> +
>> + /* Set the core clock and bus bw, having VDD scaling in mind */
>> + dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>> +
>> + pm_runtime_resume_and_get(gmu->dev);
>> + pm_runtime_resume_and_get(gmu->gxpd);
>> +
>> + ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>> + if (ret)
>> + goto err_bulk_clk;
>> +
>> + /* If anything goes south, tear the GPU down piece by piece.. */
>> + if (ret) {
>> +err_bulk_clk:
>
> Goto jump directly to another block looks odd to me. Why do you need this label
> anyway?
If clk_bulk_prepare_enable() fails, trying to proceed will hang the
platform with unclocked accesses. We need to unwind everything that
has been done up until that point, in reverse order.
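
To illustrate the unwind order outside of the driver, here is a minimal
user-space sketch. Everything in it (acquire(), toy_resume(), the event
names) is made up for illustration and is not msm driver code; it only
shows that each failure point jumps to a label that releases what was
acquired before it, in reverse order.

```c
#include <string.h>

/* Log of acquire/release events, e.g. "+opp +pd -pd -opp". */
static char unwind_log[128];

static void log_event(const char *ev)
{
	if (unwind_log[0])
		strcat(unwind_log, " ");
	strcat(unwind_log, ev);
}

/* Hypothetical acquire step: fails with -1 when 'fail' is non-zero. */
static int acquire(const char *name, int fail)
{
	if (fail)
		return -1;
	log_event(name);
	return 0;
}

/*
 * Toy resume path: set a performance state, power up a domain, then
 * enable clocks. 'fail_step' (1..3) selects which step fails, 0 = none.
 */
static int toy_resume(int fail_step)
{
	int ret;

	unwind_log[0] = '\0';

	ret = acquire("+opp", fail_step == 1);
	if (ret)
		goto out;

	ret = acquire("+pd", fail_step == 2);
	if (ret)
		goto err_opp;

	ret = acquire("+clk", fail_step == 3);
	if (ret)
		goto err_pd;

	return 0;

	/* Release in reverse order of acquisition. */
err_pd:
	log_event("-pd");
err_opp:
	log_event("-opp");
out:
	return ret;
}
```

With the clock step failing, the log shows the domain being dropped
before the performance state, mirroring the order in the patch.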
>
>> + pm_runtime_put(gmu->gxpd);
>> + pm_runtime_put(gmu->dev);
>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
>> + }
>> +err_set_opp:
>
> Generally, it is better to name the label based on what you do here. For
> example: "unlock_lock:".
Label naming is a mixed bag throughout the kernel; I've seen many
uses of err_(what went wrong).
>
> Also, this function is small enough that it is better to return directly
> in case of error. I think that would be more readable.
Not really, adding the necessary cleanup steps in `if (ret)`
blocks would roughly double the function's size.
>
>> + mutex_unlock(&a6xx_gpu->gmu.lock);
>> +
>> + if (!ret)
>> + msm_devfreq_resume(gpu);
>> +
>> + return ret;
>> +}
>> +
>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>> {
>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> @@ -1720,7 +1799,40 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>> return 0;
>> }
>>
>> -static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>> +{
>> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> + int i;
>> +
>> + trace_msm_gpu_suspend(0);
>> +
>> + msm_devfreq_suspend(gpu);
>> +
>> + mutex_lock(&a6xx_gpu->gmu.lock);
>
> Again, is this initialized somewhere?
>
>> +
>> + /* Drain the outstanding traffic on memory buses */
>> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
>> +
>> + clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>> +
>> + pm_runtime_put_sync(gmu->gxpd);
>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
>> + pm_runtime_put_sync(gmu->dev);
>> +
>> + mutex_unlock(&a6xx_gpu->gmu.lock);
>> +
>> + if (a6xx_gpu->shadow_bo)
>> + for (i = 0; i < gpu->nr_rings; i++)
>> + a6xx_gpu->shadow[i] = 0;
>> +
>> + gpu->suspend_count++;
>> +
>> + return 0;
>> +}
>> +
>> +static int a6xx_gmu_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>> {
>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> @@ -1739,6 +1851,12 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>> return 0;
>> }
>>
>> +static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>> +{
>> + *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>> + return 0;
>> +}
>> +
>> static struct msm_ringbuffer *a6xx_active_ring(struct msm_gpu *gpu)
>> {
>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> @@ -2004,8 +2122,8 @@ static const struct adreno_gpu_funcs funcs = {
>> .set_param = adreno_set_param,
>> .hw_init = a6xx_hw_init,
>> .ucode_load = a6xx_ucode_load,
>> - .pm_suspend = a6xx_pm_suspend,
>> - .pm_resume = a6xx_pm_resume,
>> + .pm_suspend = a6xx_gmu_pm_suspend,
>> + .pm_resume = a6xx_gmu_pm_resume,
>> .recover = a6xx_recover,
>> .submit = a6xx_submit,
>> .active_ring = a6xx_active_ring,
>> @@ -2020,6 +2138,35 @@ static const struct adreno_gpu_funcs funcs = {
>> #if defined(CONFIG_DRM_MSM_GPU_STATE)
>> .gpu_state_get = a6xx_gpu_state_get,
>> .gpu_state_put = a6xx_gpu_state_put,
>> +#endif
>> + .create_address_space = a6xx_create_address_space,
>> + .create_private_address_space = a6xx_create_private_address_space,
>> + .get_rptr = a6xx_get_rptr,
>> + .progress = a6xx_progress,
>> + },
>> + .get_timestamp = a6xx_gmu_get_timestamp,
>> +};
>> +
>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>> + .base = {
>> + .get_param = adreno_get_param,
>> + .set_param = adreno_set_param,
>> + .hw_init = a6xx_hw_init,
>> + .ucode_load = a6xx_ucode_load,
>> + .pm_suspend = a6xx_pm_suspend,
>> + .pm_resume = a6xx_pm_resume,
>> + .recover = a6xx_recover,
>> + .submit = a6xx_submit,
>> + .active_ring = a6xx_active_ring,
>> + .irq = a6xx_irq,
>> + .destroy = a6xx_destroy,
>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>> + .show = a6xx_show,
>> +#endif
>> + .gpu_busy = a6xx_gpu_busy,
>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>> + .gpu_state_get = a6xx_gpu_state_get,
>> + .gpu_state_put = a6xx_gpu_state_put,
>> #endif
>> .create_address_space = a6xx_create_address_space,
>> .create_private_address_space = a6xx_create_private_address_space,
>> @@ -2050,15 +2197,31 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>
>> adreno_gpu->registers = NULL;
>>
>> + /* Check if there is a GMU phandle and set it up */
>> + node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>> + /* FIXME: How do we gracefully handle this? */
>> + BUG_ON(!node);
>> +
>> + adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>> +
>> /*
>> * We need to know the platform type before calling into adreno_gpu_init
>> * so that the hw_apriv flag can be correctly set. Snoop into the info
>> * and grab the revision number
>> */
>> info = adreno_info(config->rev);
>> -
>> - if (info && (info->revn == 650 || info->revn == 660 ||
>> - adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>> + if (!info)
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* Assign these early so that we can use the is_aXYZ helpers */
>> + /* Numeric revision IDs (e.g. 630) */
>> + adreno_gpu->revn = info->revn;
>> + /* New-style ADRENO_REV()-only */
>> + adreno_gpu->rev = info->rev;
>> + /* Quirk data */
>> + adreno_gpu->info = info;
>> +
>> + if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>> adreno_gpu->base.hw_apriv = true;
>>
>> a6xx_llc_slices_init(pdev, a6xx_gpu);
>> @@ -2069,7 +2232,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>> return ERR_PTR(ret);
>> }
>>
>> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>> + if (adreno_has_gmu_wrapper(adreno_gpu))
>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>> + else
>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>> if (ret) {
>> a6xx_destroy(&(a6xx_gpu->base.base));
>> return ERR_PTR(ret);
>> @@ -2082,13 +2248,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>> if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>> priv->gpu_clamp_to_idle = true;
>>
>> - /* Check if there is a GMU phandle and set it up */
>> - node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>> -
>> - /* FIXME: How do we gracefully handle this? */
>> - BUG_ON(!node);
>> -
>> - ret = a6xx_gmu_init(a6xx_gpu, node);
>> + if (adreno_has_gmu_wrapper(adreno_gpu))
>> + ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>> + else
>> + ret = a6xx_gmu_init(a6xx_gpu, node);
>> of_node_put(node);
>> if (ret) {
>> a6xx_destroy(&(a6xx_gpu->base.base));
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index aa70390ee1c6..c788b06e72da 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>> void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>
>> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>
>> void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> index 30ecdff363e7..4e5d650578c6 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>> /* Get the generic state from the adreno core */
>> adreno_gpu_state_get(gpu, &a6xx_state->base);
>>
>> - a6xx_get_gmu_registers(gpu, a6xx_state);
>> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>> + a6xx_get_gmu_registers(gpu, a6xx_state);
>>
>> - a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>> - a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>> - a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>> + a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>> + a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>> + a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>
>> - a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>> + a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>> + }
>>
>> /* If GX isn't on the rest of the data isn't going to be accessible */
>> - if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>> return &a6xx_state->base;
>>
>> /* Get the banks of indexed registers */
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> index 6934cee07d42..5c5901d65950 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>> if (!adreno_gpu->info->fw[i])
>> continue;
>>
>> + /* Skip loading GMU firmware with GMU Wrapper */
>> + if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>> + continue;
>> +
>> /* Skip if the firmware has already been loaded */
>> if (adreno_gpu->fw[i])
>> continue;
>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>> u32 speedbin;
>> int ret;
>>
>> - /* Only handle the core clock when GMU is not in use */
>> - if (config->rev.core < 6) {
>> + /* Only handle the core clock when GMU is not in use (or is absent). */
>> + if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>> /*
>> * This can only be done before devm_pm_opp_of_add_table(), or
>> * dev_pm_opp_set_config() will WARN_ON()
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> index f62612a5c70f..ee5352bc5329 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>> * code (a3xx_gpu.c) and stored in this common location.
>> */
>> const unsigned int *reg_offsets;
>> + bool gmu_is_wrapper;
>> };
>> #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>
>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>
>> bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>
>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>> +{
>> + return gpu->gmu_is_wrapper;
>> +}
>> +
>> static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>> {
>> return (gpu->revn < 300);
>>
>> --
>> 2.40.1
>>
>
> I am still not fully onboard with the idea of gmu_wrapper node in devicetree.
> Aside from that, I don't see any other issue. Please check the few comments I left.
Thanks for your review!
Konrad
>
> -Akhil.
>
On Thu, Jun 15, 2023 at 11:43:04PM +0200, Konrad Dybcio wrote:
>
> On 10.06.2023 00:06, Akhil P Oommen wrote:
> > On Mon, May 29, 2023 at 03:52:29PM +0200, Konrad Dybcio wrote:
> >>
> >> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >> but don't implement the associated GMUs. This is due to the fact that
> >> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>
> >> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >> the actual name that Qualcomm uses in their downstream kernels).
> >>
> >> This is essentially a register region which is convenient to model
> >> as a device. We'll use it for managing the GDSCs. The register
> >> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> >>
> >> Signed-off-by: Konrad Dybcio <[email protected]>
> >> ---
> >> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 72 +++++++++-
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 211 ++++++++++++++++++++++++----
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 +-
> >> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 8 +-
> >> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 6 +
> >> 6 files changed, 277 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> index 5ba8cba69383..385ca3a12462 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> @@ -1437,6 +1437,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
> >>
> >> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >> {
> >> + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >> struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> struct platform_device *pdev = to_platform_device(gmu->dev);
> >>
> >> @@ -1462,10 +1463,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >> gmu->mmio = NULL;
> >> gmu->rscc = NULL;
> >>
> >> - a6xx_gmu_memory_free(gmu);
> >> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >> + a6xx_gmu_memory_free(gmu);
> >>
> >> - free_irq(gmu->gmu_irq, gmu);
> >> - free_irq(gmu->hfi_irq, gmu);
> >> + free_irq(gmu->gmu_irq, gmu);
> >> + free_irq(gmu->hfi_irq, gmu);
> >> + }
> >>
> >> /* Drop reference taken in of_find_device_by_node */
> >> put_device(gmu->dev);
> >> @@ -1484,6 +1487,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> >> return 0;
> >> }
> >>
> >> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >> +{
> >> + struct platform_device *pdev = of_find_device_by_node(node);
> >> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> + int ret;
> >> +
> >> + if (!pdev)
> >> + return -ENODEV;
> >> +
> >> + gmu->dev = &pdev->dev;
> >> +
> >> + of_dma_configure(gmu->dev, node, true);
> >> +
> >> + pm_runtime_enable(gmu->dev);
> >> +
> >> + /* Mark legacy for manual SPTPRAC control */
> >> + gmu->legacy = true;
> >> +
> >> + /* Map the GMU registers */
> >> + gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> >> + if (IS_ERR(gmu->mmio)) {
> >> + ret = PTR_ERR(gmu->mmio);
> >> + goto err_mmio;
> >> + }
> >> +
> >> + gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> >> + if (IS_ERR(gmu->cxpd)) {
> >> + ret = PTR_ERR(gmu->cxpd);
> >> + goto err_mmio;
> >> + }
> >> +
> >> + if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> >> + ret = -ENODEV;
> >> + goto detach_cxpd;
> >> + }
> >> +
> >> + init_completion(&gmu->pd_gate);
> >> + complete_all(&gmu->pd_gate);
> >> + gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> >> +
> >> + /* Get a link to the GX power domain to reset the GPU */
> >> + gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> >> + if (IS_ERR(gmu->gxpd)) {
> >> + ret = PTR_ERR(gmu->gxpd);
> >> + goto err_mmio;
> >> + }
> >> +
> >> + gmu->initialized = true;
> >> +
> >> + return 0;
> >> +
> >> +detach_cxpd:
> >> + dev_pm_domain_detach(gmu->cxpd, false);
> >> +
> >> +err_mmio:
> >> + iounmap(gmu->mmio);
> >> +
> >> + /* Drop reference taken in of_find_device_by_node */
> >> + put_device(gmu->dev);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >> {
> >> struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> index 58bf405b85d8..0a44762dbb6d 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> @@ -21,7 +21,7 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> >> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>
> >> /* Check that the GMU is idle */
> >> - if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >> return false;
> >>
> >> /* Check tha the CX master is idle */
> >> @@ -1018,10 +1018,13 @@ static int hw_init(struct msm_gpu *gpu)
> >> {
> >> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> int ret;
> >>
> >> - /* Make sure the GMU keeps the GPU on while we set it up */
> >> - a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >> + /* Make sure the GMU keeps the GPU on while we set it up */
> >> + a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >> + }
> >>
> >> /* Clear GBIF halt in case GX domain was not collapsed */
> >> if (a6xx_has_gbif(adreno_gpu)) {
> >> @@ -1148,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> >> 0x3f0243f0);
> >> }
> >>
> >> + if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >> + /* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> >> +
> >> + /* Set up the CX GMU counter 0 to count busy ticks */
> >> + gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> >> +
> >> + /* Enable power counter 0 */
> >> + gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> >> + gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> >> + }
> >
> > nit: For gmu targets, we do this at a6xx_rpmh_start() which is an odd place
> > to keep this. But I don't know the reason why it was decided to keep it
> > there. I don't see any reason why we cannot keep it here for both
> > gmu/gmu-wrapper like in the downstream driver.
> I split it out into a separate patch and reused it in the next revision.
> Tested on A630, the GMU doesn't complain.
>
> >
> >> +
> >> /* Protect registers from the CP */
> >> a6xx_set_cp_protect(gpu);
> >>
> >> @@ -1237,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> >> }
> >>
> >> out:
> >> + if (adreno_has_gmu_wrapper(adreno_gpu))
> >> + return ret;
> >> /*
> >> * Tell the GMU that we are done touching the GPU and it can start power
> >> * management
> >> @@ -1271,9 +1287,6 @@ static void a6xx_dump(struct msm_gpu *gpu)
> >> adreno_dump(gpu);
> >> }
> >>
> >> -#define VBIF_RESET_ACK_TIMEOUT 100
> >> -#define VBIF_RESET_ACK_MASK 0x00f0
> >> -
> >> static void a6xx_recover(struct msm_gpu *gpu)
> >> {
> >> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> @@ -1311,6 +1324,15 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >> */
> >> gpu->active_submits = 0;
> >>
> >> + if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >> + /* Drain the outstanding traffic on memory buses */
> >> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
> >> +
> >> + /* Reset the GPU to a clean state */
> >> + a6xx_gpu_sw_reset(gpu, true);
> >> + a6xx_gpu_sw_reset(gpu, false);
> >> + }
> >> +
> >> reinit_completion(&gmu->pd_gate);
> >> dev_pm_genpd_add_notifier(gmu->cxpd, &gmu->pd_nb);
> >> dev_pm_genpd_synced_poweroff(gmu->cxpd);
> >> @@ -1461,7 +1483,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> >> * Force the GPU to stay on until after we finish
> >> * collecting information
> >> */
> >> - gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >> + if (!adreno_has_gmu_wrapper(adreno_gpu))
> >> + gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>
> >> DRM_DEV_ERROR(&gpu->pdev->dev,
> >> "gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> >> @@ -1592,6 +1615,10 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
> >>
> >> static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
> >> {
> >> + /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> >> + if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
> >> + return;
> >> +
> >> llcc_slice_putd(a6xx_gpu->llc_slice);
> >> llcc_slice_putd(a6xx_gpu->htw_llc_slice);
> >> }
> >> @@ -1601,6 +1628,10 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> >> {
> >> struct device_node *phandle;
> >>
> >> + /* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> >> + if (adreno_has_gmu_wrapper(&a6xx_gpu->base))
> >> + return;
> >> +
> >> /*
> >> * There is a different programming path for targets with an mmu500
> >> * attached, so detect if that is the case
> >> @@ -1670,7 +1701,7 @@ void a6xx_gpu_sw_reset(struct msm_gpu *gpu, bool assert)
> >> udelay(100);
> >> }
> >>
> >> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> >> {
> >> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> @@ -1690,10 +1721,58 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>
> >> a6xx_llc_activate(a6xx_gpu);
> >>
> >> - return 0;
> >> + return ret;
> >> }
> >>
> >> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> +{
> >> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> + unsigned long freq = gpu->fast_rate;
> >> + struct dev_pm_opp *opp;
> >> + int ret;
> >> +
> >> + gpu->needs_hw_init = true;
> >> +
> >> + trace_msm_gpu_resume(0);
> >> +
> >> + mutex_lock(&a6xx_gpu->gmu.lock);
> >
> > Where is this lock initialized? If the init was moved out of
> > a6xx_gmu_init(), can you please share that patch?
> 12abd735f030 ("drm/msm/a6xx: initialize GMU mutex earlier")
>
> >
> >> +
> >> + opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >> + if (IS_ERR(opp)) {
> >> + ret = PTR_ERR(opp);
> >> + goto err_set_opp;
> >> + }
> >> + dev_pm_opp_put(opp);
> >> +
> >> + /* Set the core clock and bus bw, having VDD scaling in mind */
> >> + dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> >> +
> >> + pm_runtime_resume_and_get(gmu->dev);
> >> + pm_runtime_resume_and_get(gmu->gxpd);
> >> +
> >> + ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> >> + if (ret)
> >> + goto err_bulk_clk;
> >> +
> >> + /* If anything goes south, tear the GPU down piece by piece.. */
> >> + if (ret) {
> >> +err_bulk_clk:
> >
> > Goto jump directly to another block looks odd to me. Why do you need this label
> > anyway?
> If clk_bulk_prepare_enable() fails, trying to proceed will hang the
> platform with unclocked accesses. We need to unwind everything that
> has been done up until that point, in reverse order.
I missed this response from you earlier.
But 'ret' is checked twice here: execution reaches the cleanup block
whenever ret is non-zero, even without the jump. So
"if (ret) goto err_bulk_clk;" looks unnecessary.
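
A standalone sketch of the point, outside the driver. The function
names and the 'cleanups' counter are hypothetical and exist only to
compare the two shapes; the first mirrors the control flow in the
patch, the second drops the label and the extra jump.

```c
/* Counts how many times the cleanup body ran. */
static int cleanups;

/* Shape used in the patch: explicit goto into the if block. */
static int with_goto(int ret)
{
	if (ret)
		goto err_bulk_clk;

	if (ret) {
err_bulk_clk:
		cleanups++;
	}
	return ret;
}

/* Same behaviour without the label or the jump. */
static int without_goto(int ret)
{
	if (ret)
		cleanups++;
	return ret;
}
```

Both variants run the cleanup exactly once on error and never on
success, which is why the jump does not seem to buy anything as long
as err_bulk_clk has a single caller.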
-Akhil.
>
> >
> >> + pm_runtime_put(gmu->gxpd);
> >> + pm_runtime_put(gmu->dev);
> >> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
> >> + }
> >> +err_set_opp:
> >
> > > Generally, it is better to name the label based on what you do here. For
> > > example: "unlock_lock:".
> That seems to be a mixed bag all throughout the kernel, I've seen many
> usages of err_(what went wrong)
>
> >
> > Also, this function is small enough that it is better to return directly
> > in case of error. I think that would be more readable.
> Not really, adding the necessary cleanup steps in `if (ret)`
> blocks would roughly double the function's size.
>
> >
> >> + mutex_unlock(&a6xx_gpu->gmu.lock);
> >> +
> >> + if (!ret)
> >> + msm_devfreq_resume(gpu);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> >> {
> >> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> @@ -1720,7 +1799,40 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> return 0;
> >> }
> >>
> >> -static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> +{
> >> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> + int i;
> >> +
> >> + trace_msm_gpu_suspend(0);
> >> +
> >> + msm_devfreq_suspend(gpu);
> >> +
> >> + mutex_lock(&a6xx_gpu->gmu.lock);
> >
> > Again, is this initialized somewhere?
> >
> >> +
> >> + /* Drain the outstanding traffic on memory buses */
> >> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
> >> +
> >> + clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >> +
> >> + pm_runtime_put_sync(gmu->gxpd);
> >> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
> >> + pm_runtime_put_sync(gmu->dev);
> >> +
> >> + mutex_unlock(&a6xx_gpu->gmu.lock);
> >> +
> >> + if (a6xx_gpu->shadow_bo)
> >> + for (i = 0; i < gpu->nr_rings; i++)
> >> + a6xx_gpu->shadow[i] = 0;
> >> +
> >> + gpu->suspend_count++;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int a6xx_gmu_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >> {
> >> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> @@ -1739,6 +1851,12 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >> return 0;
> >> }
> >>
> >> +static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >> +{
> >> + *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> >> + return 0;
> >> +}
> >> +
> >> static struct msm_ringbuffer *a6xx_active_ring(struct msm_gpu *gpu)
> >> {
> >> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> @@ -2004,8 +2122,8 @@ static const struct adreno_gpu_funcs funcs = {
> >> .set_param = adreno_set_param,
> >> .hw_init = a6xx_hw_init,
> >> .ucode_load = a6xx_ucode_load,
> >> - .pm_suspend = a6xx_pm_suspend,
> >> - .pm_resume = a6xx_pm_resume,
> >> + .pm_suspend = a6xx_gmu_pm_suspend,
> >> + .pm_resume = a6xx_gmu_pm_resume,
> >> .recover = a6xx_recover,
> >> .submit = a6xx_submit,
> >> .active_ring = a6xx_active_ring,
> >> @@ -2020,6 +2138,35 @@ static const struct adreno_gpu_funcs funcs = {
> >> #if defined(CONFIG_DRM_MSM_GPU_STATE)
> >> .gpu_state_get = a6xx_gpu_state_get,
> >> .gpu_state_put = a6xx_gpu_state_put,
> >> +#endif
> >> + .create_address_space = a6xx_create_address_space,
> >> + .create_private_address_space = a6xx_create_private_address_space,
> >> + .get_rptr = a6xx_get_rptr,
> >> + .progress = a6xx_progress,
> >> + },
> >> + .get_timestamp = a6xx_gmu_get_timestamp,
> >> +};
> >> +
> >> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> >> + .base = {
> >> + .get_param = adreno_get_param,
> >> + .set_param = adreno_set_param,
> >> + .hw_init = a6xx_hw_init,
> >> + .ucode_load = a6xx_ucode_load,
> >> + .pm_suspend = a6xx_pm_suspend,
> >> + .pm_resume = a6xx_pm_resume,
> >> + .recover = a6xx_recover,
> >> + .submit = a6xx_submit,
> >> + .active_ring = a6xx_active_ring,
> >> + .irq = a6xx_irq,
> >> + .destroy = a6xx_destroy,
> >> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >> + .show = a6xx_show,
> >> +#endif
> >> + .gpu_busy = a6xx_gpu_busy,
> >> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >> + .gpu_state_get = a6xx_gpu_state_get,
> >> + .gpu_state_put = a6xx_gpu_state_put,
> >> #endif
> >> .create_address_space = a6xx_create_address_space,
> >> .create_private_address_space = a6xx_create_private_address_space,
> >> @@ -2050,15 +2197,31 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>
> >> adreno_gpu->registers = NULL;
> >>
> >> + /* Check if there is a GMU phandle and set it up */
> >> + node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >> + /* FIXME: How do we gracefully handle this? */
> >> + BUG_ON(!node);
> >> +
> >> + adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> >> +
> >> /*
> >> * We need to know the platform type before calling into adreno_gpu_init
> >> * so that the hw_apriv flag can be correctly set. Snoop into the info
> >> * and grab the revision number
> >> */
> >> info = adreno_info(config->rev);
> >> -
> >> - if (info && (info->revn == 650 || info->revn == 660 ||
> >> - adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> >> + if (!info)
> >> + return ERR_PTR(-EINVAL);
> >> +
> >> + /* Assign these early so that we can use the is_aXYZ helpers */
> >> + /* Numeric revision IDs (e.g. 630) */
> >> + adreno_gpu->revn = info->revn;
> >> + /* New-style ADRENO_REV()-only */
> >> + adreno_gpu->rev = info->rev;
> >> + /* Quirk data */
> >> + adreno_gpu->info = info;
> >> +
> >> + if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> >> adreno_gpu->base.hw_apriv = true;
> >>
> >> a6xx_llc_slices_init(pdev, a6xx_gpu);
> >> @@ -2069,7 +2232,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >> return ERR_PTR(ret);
> >> }
> >>
> >> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >> + if (adreno_has_gmu_wrapper(adreno_gpu))
> >> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> >> + else
> >> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >> if (ret) {
> >> a6xx_destroy(&(a6xx_gpu->base.base));
> >> return ERR_PTR(ret);
> >> @@ -2082,13 +2248,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >> if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> >> priv->gpu_clamp_to_idle = true;
> >>
> >> - /* Check if there is a GMU phandle and set it up */
> >> - node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >> -
> >> - /* FIXME: How do we gracefully handle this? */
> >> - BUG_ON(!node);
> >> -
> >> - ret = a6xx_gmu_init(a6xx_gpu, node);
> >> + if (adreno_has_gmu_wrapper(adreno_gpu))
> >> + ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> >> + else
> >> + ret = a6xx_gmu_init(a6xx_gpu, node);
> >> of_node_put(node);
> >> if (ret) {
> >> a6xx_destroy(&(a6xx_gpu->base.base));
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> index aa70390ee1c6..c788b06e72da 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >> void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>
> >> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >>
> >> void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >> index 30ecdff363e7..4e5d650578c6 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> >> /* Get the generic state from the adreno core */
> >> adreno_gpu_state_get(gpu, &a6xx_state->base);
> >>
> >> - a6xx_get_gmu_registers(gpu, a6xx_state);
> >> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >> + a6xx_get_gmu_registers(gpu, a6xx_state);
> >>
> >> - a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >> - a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >> - a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >> + a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >> + a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >> + a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>
> >> - a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >> + a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >> + }
> >>
> >> /* If GX isn't on the rest of the data isn't going to be accessible */
> >> - if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >> return &a6xx_state->base;
> >>
> >> /* Get the banks of indexed registers */
> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> index 6934cee07d42..5c5901d65950 100644
> >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> >> if (!adreno_gpu->info->fw[i])
> >> continue;
> >>
> >> + /* Skip loading GMU firmware with GMU Wrapper */
> >> + if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> >> + continue;
> >> +
> >> /* Skip if the firmware has already been loaded */
> >> if (adreno_gpu->fw[i])
> >> continue;
> >> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> >> u32 speedbin;
> >> int ret;
> >>
> >> - /* Only handle the core clock when GMU is not in use */
> >> - if (config->rev.core < 6) {
> >> + /* Only handle the core clock when GMU is not in use (or is absent). */
> >> + if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> >> /*
> >> * This can only be done before devm_pm_opp_of_add_table(), or
> >> * dev_pm_opp_set_config() will WARN_ON()
> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >> index f62612a5c70f..ee5352bc5329 100644
> >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >> @@ -115,6 +115,7 @@ struct adreno_gpu {
> >> * code (a3xx_gpu.c) and stored in this common location.
> >> */
> >> const unsigned int *reg_offsets;
> >> + bool gmu_is_wrapper;
> >> };
> >> #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> >>
> >> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> >>
> >> bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> >>
> >> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> >> +{
> >> + return gpu->gmu_is_wrapper;
> >> +}
> >> +
> >> static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> >> {
> >> return (gpu->revn < 300);
> >>
> >> --
> >> 2.40.1
> >>
> >
> > I am still not fully onboard with the idea of gmu_wrapper node in devicetree.
> > Aside from that, I don't see any other issue. Please check the few comments I left.
> Thanks for your review!
>
> Konrad
> >
> > -Akhil.
> >
On 16.06.2023 19:54, Akhil P Oommen wrote:
> On Thu, Jun 15, 2023 at 11:43:04PM +0200, Konrad Dybcio wrote:
>>
>> On 10.06.2023 00:06, Akhil P Oommen wrote:
>>> On Mon, May 29, 2023 at 03:52:29PM +0200, Konrad Dybcio wrote:
>>>>
>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>>>> but don't implement the associated GMUs. This is due to the fact that
>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>>>
>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>>>> the actual name that Qualcomm uses in their downstream kernels).
>>>>
>>>> This is essentially a register region which is convenient to model
>>>> as a device. We'll use it for managing the GDSCs. The register
>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>>>
>>>> Signed-off-by: Konrad Dybcio <[email protected]>
>>>> ---
[...]
>>>> +
>>>> + ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>>>> + if (ret)
>>>> + goto err_bulk_clk;
>>>> +
>>>> + /* If anything goes south, tear the GPU down piece by piece.. */
>>>> + if (ret) {
>>>> +err_bulk_clk:
>>>
>>> Goto jump directly to another block looks odd to me. Why do you need this label
>>> anyway?
>> If clk_bulk_prepare_enable() fails, trying to proceed will hang the
>> platform with unclocked accesses. We need to unwind everything that
>> has been done up until that point, in reverse order.
>
> I missed this response from you earlier.
>
> But you are checking for 'ret' twice here. You will end up here even
> if you don't jump! So "if (ret) goto err_bulk_clk;" looks
> unnecessary.
>
> -Akhil.
Ohhh right, silly mistake on my part ;)
I already sent out a v9 since.. Please check it out and if you
have any further comments, I'll fix this, and if not.. Perhaps I
could fix it in an incremental patch if that revision is gtg?
Konrad
>
>>
>>>
>>>> + pm_runtime_put(gmu->gxpd);
>>>> + pm_runtime_put(gmu->dev);
>>>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
>>>> + }
>>>> +err_set_opp:
>>>
>>> Generally, it is better to name the label based on what you do here. For
>>> eg: "unlock_lock:".
>> That seems to be a mixed bag all throughout the kernel, I've seen many
>> usages of err_(what went wrong)
>>
>>>
>>> Also, this function is small enough that it is better to return directly
>>> in case of error. I think that would be more readable.
>> Not really, adding the necessary cleanup steps in `if (ret)`
>> blocks would roughly double the function's size.
>>
>>>
>>>> + mutex_unlock(&a6xx_gpu->gmu.lock);
>>>> +
>>>> + if (!ret)
>>>> + msm_devfreq_resume(gpu);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>>>> {
>>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> @@ -1720,7 +1799,40 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>> return 0;
>>>> }
>>>>
>>>> -static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>> +{
>>>> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>> + int i;
>>>> +
>>>> + trace_msm_gpu_suspend(0);
>>>> +
>>>> + msm_devfreq_suspend(gpu);
>>>> +
>>>> + mutex_lock(&a6xx_gpu->gmu.lock);
>>>
>>> Again, is this initialized somewhere?
>>>
>>>> +
>>>> + /* Drain the outstanding traffic on memory buses */
>>>> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
>>>> +
>>>> + clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>> +
>>>> + pm_runtime_put_sync(gmu->gxpd);
>>>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
>>>> + pm_runtime_put_sync(gmu->dev);
>>>> +
>>>> + mutex_unlock(&a6xx_gpu->gmu.lock);
>>>> +
>>>> + if (a6xx_gpu->shadow_bo)
>>>> + for (i = 0; i < gpu->nr_rings; i++)
>>>> + a6xx_gpu->shadow[i] = 0;
>>>> +
>>>> + gpu->suspend_count++;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int a6xx_gmu_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>> {
>>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> @@ -1739,6 +1851,12 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>> return 0;
>>>> }
>>>>
>>>> +static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>> +{
>>>> + *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>>>> + return 0;
>>>> +}
>>>> +
>>>> static struct msm_ringbuffer *a6xx_active_ring(struct msm_gpu *gpu)
>>>> {
>>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>> @@ -2004,8 +2122,8 @@ static const struct adreno_gpu_funcs funcs = {
>>>> .set_param = adreno_set_param,
>>>> .hw_init = a6xx_hw_init,
>>>> .ucode_load = a6xx_ucode_load,
>>>> - .pm_suspend = a6xx_pm_suspend,
>>>> - .pm_resume = a6xx_pm_resume,
>>>> + .pm_suspend = a6xx_gmu_pm_suspend,
>>>> + .pm_resume = a6xx_gmu_pm_resume,
>>>> .recover = a6xx_recover,
>>>> .submit = a6xx_submit,
>>>> .active_ring = a6xx_active_ring,
>>>> @@ -2020,6 +2138,35 @@ static const struct adreno_gpu_funcs funcs = {
>>>> #if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>> .gpu_state_get = a6xx_gpu_state_get,
>>>> .gpu_state_put = a6xx_gpu_state_put,
>>>> +#endif
>>>> + .create_address_space = a6xx_create_address_space,
>>>> + .create_private_address_space = a6xx_create_private_address_space,
>>>> + .get_rptr = a6xx_get_rptr,
>>>> + .progress = a6xx_progress,
>>>> + },
>>>> + .get_timestamp = a6xx_gmu_get_timestamp,
>>>> +};
>>>> +
>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>>>> + .base = {
>>>> + .get_param = adreno_get_param,
>>>> + .set_param = adreno_set_param,
>>>> + .hw_init = a6xx_hw_init,
>>>> + .ucode_load = a6xx_ucode_load,
>>>> + .pm_suspend = a6xx_pm_suspend,
>>>> + .pm_resume = a6xx_pm_resume,
>>>> + .recover = a6xx_recover,
>>>> + .submit = a6xx_submit,
>>>> + .active_ring = a6xx_active_ring,
>>>> + .irq = a6xx_irq,
>>>> + .destroy = a6xx_destroy,
>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>> + .show = a6xx_show,
>>>> +#endif
>>>> + .gpu_busy = a6xx_gpu_busy,
>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>> + .gpu_state_get = a6xx_gpu_state_get,
>>>> + .gpu_state_put = a6xx_gpu_state_put,
>>>> #endif
>>>> .create_address_space = a6xx_create_address_space,
>>>> .create_private_address_space = a6xx_create_private_address_space,
>>>> @@ -2050,15 +2197,31 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>
>>>> adreno_gpu->registers = NULL;
>>>>
>>>> + /* Check if there is a GMU phandle and set it up */
>>>> + node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>> + /* FIXME: How do we gracefully handle this? */
>>>> + BUG_ON(!node);
>>>> +
>>>> + adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>>>> +
>>>> /*
>>>> * We need to know the platform type before calling into adreno_gpu_init
>>>> * so that the hw_apriv flag can be correctly set. Snoop into the info
>>>> * and grab the revision number
>>>> */
>>>> info = adreno_info(config->rev);
>>>> -
>>>> - if (info && (info->revn == 650 || info->revn == 660 ||
>>>> - adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>>>> + if (!info)
>>>> + return ERR_PTR(-EINVAL);
>>>> +
>>>> + /* Assign these early so that we can use the is_aXYZ helpers */
>>>> + /* Numeric revision IDs (e.g. 630) */
>>>> + adreno_gpu->revn = info->revn;
>>>> + /* New-style ADRENO_REV()-only */
>>>> + adreno_gpu->rev = info->rev;
>>>> + /* Quirk data */
>>>> + adreno_gpu->info = info;
>>>> +
>>>> + if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>>>> adreno_gpu->base.hw_apriv = true;
>>>>
>>>> a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>> @@ -2069,7 +2232,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>> return ERR_PTR(ret);
>>>> }
>>>>
>>>> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>> + if (adreno_has_gmu_wrapper(adreno_gpu))
>>>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>>>> + else
>>>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>> if (ret) {
>>>> a6xx_destroy(&(a6xx_gpu->base.base));
>>>> return ERR_PTR(ret);
>>>> @@ -2082,13 +2248,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>> if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>>>> priv->gpu_clamp_to_idle = true;
>>>>
>>>> - /* Check if there is a GMU phandle and set it up */
>>>> - node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>> -
>>>> - /* FIXME: How do we gracefully handle this? */
>>>> - BUG_ON(!node);
>>>> -
>>>> - ret = a6xx_gmu_init(a6xx_gpu, node);
>>>> + if (adreno_has_gmu_wrapper(adreno_gpu))
>>>> + ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>>>> + else
>>>> + ret = a6xx_gmu_init(a6xx_gpu, node);
>>>> of_node_put(node);
>>>> if (ret) {
>>>> a6xx_destroy(&(a6xx_gpu->base.base));
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> index aa70390ee1c6..c788b06e72da 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>> void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>
>>>> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>>>
>>>> void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>> index 30ecdff363e7..4e5d650578c6 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>>> /* Get the generic state from the adreno core */
>>>> adreno_gpu_state_get(gpu, &a6xx_state->base);
>>>>
>>>> - a6xx_get_gmu_registers(gpu, a6xx_state);
>>>> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>> + a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>
>>>> - a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>> - a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>> - a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>> + a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>> + a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>> + a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>
>>>> - a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>> + a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>> + }
>>>>
>>>> /* If GX isn't on the rest of the data isn't going to be accessible */
>>>> - if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>> return &a6xx_state->base;
>>>>
>>>> /* Get the banks of indexed registers */
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> index 6934cee07d42..5c5901d65950 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>>>> if (!adreno_gpu->info->fw[i])
>>>> continue;
>>>>
>>>> + /* Skip loading GMU firmware with GMU Wrapper */
>>>> + if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>>>> + continue;
>>>> +
>>>> /* Skip if the firmware has already been loaded */
>>>> if (adreno_gpu->fw[i])
>>>> continue;
>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>>>> u32 speedbin;
>>>> int ret;
>>>>
>>>> - /* Only handle the core clock when GMU is not in use */
>>>> - if (config->rev.core < 6) {
>>>> + /* Only handle the core clock when GMU is not in use (or is absent). */
>>>> + if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>>>> /*
>>>> * This can only be done before devm_pm_opp_of_add_table(), or
>>>> * dev_pm_opp_set_config() will WARN_ON()
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>> index f62612a5c70f..ee5352bc5329 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>>>> * code (a3xx_gpu.c) and stored in this common location.
>>>> */
>>>> const unsigned int *reg_offsets;
>>>> + bool gmu_is_wrapper;
>>>> };
>>>> #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>>>
>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>>>
>>>> bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>>>
>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>>>> +{
>>>> + return gpu->gmu_is_wrapper;
>>>> +}
>>>> +
>>>> static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>>>> {
>>>> return (gpu->revn < 300);
>>>>
>>>> --
>>>> 2.40.1
>>>>
>>>
>>> I am still not fully onboard with the idea of gmu_wrapper node in devicetree.
>>> Aside from that, I don't see any other issue. Please check the few comments I left.
>> Thanks for your review!
>>
>> Konrad
>>>
>>> -Akhil.
>>>
On Sat, Jun 17, 2023 at 02:00:50AM +0200, Konrad Dybcio wrote:
>
> On 16.06.2023 19:54, Akhil P Oommen wrote:
> > On Thu, Jun 15, 2023 at 11:43:04PM +0200, Konrad Dybcio wrote:
> >>
> >> On 10.06.2023 00:06, Akhil P Oommen wrote:
> >>> On Mon, May 29, 2023 at 03:52:29PM +0200, Konrad Dybcio wrote:
> >>>>
> >>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >>>> but don't implement the associated GMUs. This is due to the fact that
> >>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>>>
> >>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >>>> the actual name that Qualcomm uses in their downstream kernels).
> >>>>
> >>>> This is essentially a register region which is convenient to model
> >>>> as a device. We'll use it for managing the GDSCs. The register
> >>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> >>>>
> >>>> Signed-off-by: Konrad Dybcio <[email protected]>
> >>>> ---
> [...]
>
> >>>> +
> >>>> + ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> >>>> + if (ret)
> >>>> + goto err_bulk_clk;
> >>>> +
> >>>> + /* If anything goes south, tear the GPU down piece by piece.. */
> >>>> + if (ret) {
> >>>> +err_bulk_clk:
> >>>
> >>> Goto jump directly to another block looks odd to me. Why do you need this label
> >>> anyway?
> >> If clk_bulk_prepare_enable() fails, trying to proceed will hang the
> >> platform with unclocked accesses. We need to unwind everything that
> >> has been done up until that point, in reverse order.
> >
> > I missed this response from you earlier.
> >
> > But you are checking for 'ret' twice here. You will end up here even
> > if you don't jump! So "if (ret) goto err_bulk_clk;" looks
> > unnecessary.
> >
> > -Akhil.
> Ohhh right, silly mistake on my part ;)
>
> I already sent out a v9 since.. Please check it out and if you
> have any further comments, I'll fix this, and if not.. Perhaps I
> could fix it in an incremental patch if that revision is gtg?
Incremental patch is fine as there is no functional issue.
-Akhil.
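
For anyone following along, here is a minimal standalone sketch of why the
extra check is redundant. It is not the driver code: fake_clk_enable(),
fake_unwind() and the unwound counter are hypothetical stand-ins for
clk_bulk_prepare_enable() and the pm_runtime/OPP teardown, and the only
assumption is that nothing modifies ret between the two checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the real clk/pm_runtime/OPP calls. */
static int unwound;	/* counts how many times the teardown ran */

static int fake_clk_enable(bool fail)
{
	return fail ? -5 : 0;	/* -5 mimics -EIO */
}

static void fake_unwind(void)
{
	unwound++;
}

/* Shape of the quoted hunk: check ret, jump, then check ret again. */
static int resume_with_goto(bool fail)
{
	int ret = fake_clk_enable(fail);

	if (ret)
		goto err_bulk_clk;

	/* ret is unchanged here, so this block is entered whenever we jump. */
	if (ret) {
err_bulk_clk:
		fake_unwind();
	}

	return ret;
}

/* Same behaviour with a single check and no label. */
static int resume_single_check(bool fail)
{
	int ret = fake_clk_enable(fail);

	if (ret)
		fake_unwind();

	return ret;
}

/* Both variants return the same code and unwind only on failure. */
static void check_equivalence(void)
{
	unwound = 0;
	assert(resume_with_goto(false) == resume_single_check(false));
	assert(unwound == 0);
	assert(resume_with_goto(true) == resume_single_check(true));
	assert(unwound == 2);
}
```

Calling check_equivalence() from a test program exercises both paths. The
label would only matter if some statement between the jump and the second
check could reset ret, which is not the case in the quoted hunk.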
>
> Konrad
> >
> >>
> >>>
> >>>> + pm_runtime_put(gmu->gxpd);
> >>>> + pm_runtime_put(gmu->dev);
> >>>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
> >>>> + }
> >>>> +err_set_opp:
> >>>
> >>> Generally, it is better to name the label based on what you do here. For
> >>> eg: "unlock_lock:".
> >> That seems to be a mixed bag all throughout the kernel, I've seen many
> >> usages of err_(what went wrong)
> >>
> >>>
> >>> Also, this function is small enough that it is better to return directly
> >>> in case of error. I think that would be more readable.
> >> Not really, adding the necessary cleanup steps in `if (ret)`
> >> blocks would roughly double the function's size.
> >>
> >>>
> >>>> + mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>> +
> >>>> + if (!ret)
> >>>> + msm_devfreq_resume(gpu);
> >>>> +
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> >>>> {
> >>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> @@ -1720,7 +1799,40 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>> return 0;
> >>>> }
> >>>>
> >>>> -static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>> +{
> >>>> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>> + int i;
> >>>> +
> >>>> + trace_msm_gpu_suspend(0);
> >>>> +
> >>>> + msm_devfreq_suspend(gpu);
> >>>> +
> >>>> + mutex_lock(&a6xx_gpu->gmu.lock);
> >>>
> >>> Again, is this initialized somewhere?
> >>>
> >>>> +
> >>>> + /* Drain the outstanding traffic on memory buses */
> >>>> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
> >>>> +
> >>>> + clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >>>> +
> >>>> + pm_runtime_put_sync(gmu->gxpd);
> >>>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
> >>>> + pm_runtime_put_sync(gmu->dev);
> >>>> +
> >>>> + mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>> +
> >>>> + if (a6xx_gpu->shadow_bo)
> >>>> + for (i = 0; i < gpu->nr_rings; i++)
> >>>> + a6xx_gpu->shadow[i] = 0;
> >>>> +
> >>>> + gpu->suspend_count++;
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +static int a6xx_gmu_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>>> {
> >>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> @@ -1739,6 +1851,12 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>>> return 0;
> >>>> }
> >>>>
> >>>> +static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>>> +{
> >>>> + *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> static struct msm_ringbuffer *a6xx_active_ring(struct msm_gpu *gpu)
> >>>> {
> >>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>> @@ -2004,8 +2122,8 @@ static const struct adreno_gpu_funcs funcs = {
> >>>> .set_param = adreno_set_param,
> >>>> .hw_init = a6xx_hw_init,
> >>>> .ucode_load = a6xx_ucode_load,
> >>>> - .pm_suspend = a6xx_pm_suspend,
> >>>> - .pm_resume = a6xx_pm_resume,
> >>>> + .pm_suspend = a6xx_gmu_pm_suspend,
> >>>> + .pm_resume = a6xx_gmu_pm_resume,
> >>>> .recover = a6xx_recover,
> >>>> .submit = a6xx_submit,
> >>>> .active_ring = a6xx_active_ring,
> >>>> @@ -2020,6 +2138,35 @@ static const struct adreno_gpu_funcs funcs = {
> >>>> #if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>> .gpu_state_get = a6xx_gpu_state_get,
> >>>> .gpu_state_put = a6xx_gpu_state_put,
> >>>> +#endif
> >>>> + .create_address_space = a6xx_create_address_space,
> >>>> + .create_private_address_space = a6xx_create_private_address_space,
> >>>> + .get_rptr = a6xx_get_rptr,
> >>>> + .progress = a6xx_progress,
> >>>> + },
> >>>> + .get_timestamp = a6xx_gmu_get_timestamp,
> >>>> +};
> >>>> +
> >>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> >>>> + .base = {
> >>>> + .get_param = adreno_get_param,
> >>>> + .set_param = adreno_set_param,
> >>>> + .hw_init = a6xx_hw_init,
> >>>> + .ucode_load = a6xx_ucode_load,
> >>>> + .pm_suspend = a6xx_pm_suspend,
> >>>> + .pm_resume = a6xx_pm_resume,
> >>>> + .recover = a6xx_recover,
> >>>> + .submit = a6xx_submit,
> >>>> + .active_ring = a6xx_active_ring,
> >>>> + .irq = a6xx_irq,
> >>>> + .destroy = a6xx_destroy,
> >>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>> + .show = a6xx_show,
> >>>> +#endif
> >>>> + .gpu_busy = a6xx_gpu_busy,
> >>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>> + .gpu_state_get = a6xx_gpu_state_get,
> >>>> + .gpu_state_put = a6xx_gpu_state_put,
> >>>> #endif
> >>>> .create_address_space = a6xx_create_address_space,
> >>>> .create_private_address_space = a6xx_create_private_address_space,
> >>>> @@ -2050,15 +2197,31 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>
> >>>> adreno_gpu->registers = NULL;
> >>>>
> >>>> + /* Check if there is a GMU phandle and set it up */
> >>>> + node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>> + /* FIXME: How do we gracefully handle this? */
> >>>> + BUG_ON(!node);
> >>>> +
> >>>> + adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> >>>> +
> >>>> /*
> >>>> * We need to know the platform type before calling into adreno_gpu_init
> >>>> * so that the hw_apriv flag can be correctly set. Snoop into the info
> >>>> * and grab the revision number
> >>>> */
> >>>> info = adreno_info(config->rev);
> >>>> -
> >>>> - if (info && (info->revn == 650 || info->revn == 660 ||
> >>>> - adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> >>>> + if (!info)
> >>>> + return ERR_PTR(-EINVAL);
> >>>> +
> >>>> + /* Assign these early so that we can use the is_aXYZ helpers */
> >>>> + /* Numeric revision IDs (e.g. 630) */
> >>>> + adreno_gpu->revn = info->revn;
> >>>> + /* New-style ADRENO_REV()-only */
> >>>> + adreno_gpu->rev = info->rev;
> >>>> + /* Quirk data */
> >>>> + adreno_gpu->info = info;
> >>>> +
> >>>> + if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> >>>> adreno_gpu->base.hw_apriv = true;
> >>>>
> >>>> a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>>> @@ -2069,7 +2232,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>> return ERR_PTR(ret);
> >>>> }
> >>>>
> >>>> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>> + if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> >>>> + else
> >>>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>> if (ret) {
> >>>> a6xx_destroy(&(a6xx_gpu->base.base));
> >>>> return ERR_PTR(ret);
> >>>> @@ -2082,13 +2248,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>> if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> >>>> priv->gpu_clamp_to_idle = true;
> >>>>
> >>>> - /* Check if there is a GMU phandle and set it up */
> >>>> - node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>> -
> >>>> - /* FIXME: How do we gracefully handle this? */
> >>>> - BUG_ON(!node);
> >>>> -
> >>>> - ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>> + if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>> + ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> >>>> + else
> >>>> + ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>> of_node_put(node);
> >>>> if (ret) {
> >>>> a6xx_destroy(&(a6xx_gpu->base.base));
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> index aa70390ee1c6..c788b06e72da 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>> void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>>
> >>>> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >>>>
> >>>> void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>> index 30ecdff363e7..4e5d650578c6 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> >>>> /* Get the generic state from the adreno core */
> >>>> adreno_gpu_state_get(gpu, &a6xx_state->base);
> >>>>
> >>>> - a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>> + a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>>
> >>>> - a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>> - a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>> - a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>> + a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>> + a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>> + a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>>
> >>>> - a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>> + a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>> + }
> >>>>
> >>>> /* If GX isn't on the rest of the data isn't going to be accessible */
> >>>> - if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>> return &a6xx_state->base;
> >>>>
> >>>> /* Get the banks of indexed registers */
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>> index 6934cee07d42..5c5901d65950 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> >>>> if (!adreno_gpu->info->fw[i])
> >>>> continue;
> >>>>
> >>>> + /* Skip loading GMU firmware with GMU Wrapper */
> >>>> + if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> >>>> + continue;
> >>>> +
> >>>> /* Skip if the firmware has already been loaded */
> >>>> if (adreno_gpu->fw[i])
> >>>> continue;
> >>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> >>>> u32 speedbin;
> >>>> int ret;
> >>>>
> >>>> - /* Only handle the core clock when GMU is not in use */
> >>>> - if (config->rev.core < 6) {
> >>>> + /* Only handle the core clock when GMU is not in use (or is absent). */
> >>>> + if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> >>>> /*
> >>>> * This can only be done before devm_pm_opp_of_add_table(), or
> >>>> * dev_pm_opp_set_config() will WARN_ON()
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>> index f62612a5c70f..ee5352bc5329 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
> >>>> * code (a3xx_gpu.c) and stored in this common location.
> >>>> */
> >>>> const unsigned int *reg_offsets;
> >>>> + bool gmu_is_wrapper;
> >>>> };
> >>>> #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> >>>>
> >>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> >>>>
> >>>> bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> >>>>
> >>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> >>>> +{
> >>>> + return gpu->gmu_is_wrapper;
> >>>> +}
> >>>> +
> >>>> static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> >>>> {
> >>>> return (gpu->revn < 300);
> >>>>
> >>>> --
> >>>> 2.40.1
> >>>>
> >>>
> >>> I am still not fully onboard with the idea of gmu_wrapper node in devicetree.
> >>> Aside from that, I don't see any other issue. Please check the few comments I left.
> >> Thanks for your review!
> >>
> >> Konrad
> >>>
> >>> -Akhil.
> >>>
On 17.06.2023 18:07, Akhil P Oommen wrote:
> On Sat, Jun 17, 2023 at 02:00:50AM +0200, Konrad Dybcio wrote:
>>
>> On 16.06.2023 19:54, Akhil P Oommen wrote:
>>> On Thu, Jun 15, 2023 at 11:43:04PM +0200, Konrad Dybcio wrote:
>>>>
>>>> On 10.06.2023 00:06, Akhil P Oommen wrote:
>>>>> On Mon, May 29, 2023 at 03:52:29PM +0200, Konrad Dybcio wrote:
>>>>>>
>>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>>>>>> but don't implement the associated GMUs. This is due to the fact that
>>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>>>>>
>>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>>>>>> the actual name that Qualcomm uses in their downstream kernels).
>>>>>>
>>>>>> This is essentially a register region which is convenient to model
>>>>>> as a device. We'll use it for managing the GDSCs. The register
>>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>>>>>
>>>>>> Signed-off-by: Konrad Dybcio <[email protected]>
>>>>>> ---
>> [...]
>>
>>>>>> +
>>>>>> + ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>>>>>> + if (ret)
>>>>>> + goto err_bulk_clk;
>>>>>> +
>>>>>> + /* If anything goes south, tear the GPU down piece by piece.. */
>>>>>> + if (ret) {
>>>>>> +err_bulk_clk:
>>>>>
>>>>> Goto jump directly to another block looks odd to me. Why do you need this label
>>>>> anyway?
>>>> If clk_bulk_prepare_enable() fails, trying to proceed will hang the
>>>> platform with unclocked accesses. We need to unwind everything that
>>>> has been done up until that point, in reverse order.
>>>
>>> I missed this response from you earlier.
>>>
>>> But you are checking for 'ret' twice here. You will end up here even
>>> if you don't jump! So "if (ret) goto err_bulk_clk;" looks
>>> unnecessary.
>>>
>>> -Akhil.
>> Ohhh right, silly mistake on my part ;)
>>
>> I already sent out a v9 since.. Please check it out and if you
>> have any further comments, I'll fix this, and if not.. Perhaps I
>> could fix it in an incremental patch if that revision is gtg?
>
> Incremental patch is fine as there is no functional issue.
Okay so I took another look with today's next that already contains
this series, and it currently looks like:

	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
	if (ret)
		goto err_bulk_clk;

	if (adreno_is_a619_holi(adreno_gpu))
		a6xx_sptprac_enable(gmu);

	/* If anything goes south, tear the GPU down piece by piece.. */
	if (ret) {
err_bulk_clk:

So it makes sense this way.. perhaps I just left it in this patch
by mistake when I was rebasing some changes. I guess it requires
no further action now?
Konrad
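
For anyone following the control flow discussed above, here is a minimal
userspace sketch of the same shape: a goto that jumps to a label placed
inside an `if (ret)` teardown block, so that a step between the failing
call and the teardown is skipped on error. Everything in it (struct
fake_gpu, fake_clk_enable(), fake_resume(), the -5 error value) is a
hypothetical stand-in and not the driver's code; the counters only mimic
runtime-PM references and the OPP vote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GPU's power state, not a kernel type. */
struct fake_gpu {
	int gmu_refs;		/* mimics a runtime-PM reference on the GMU device */
	int gx_refs;		/* mimics a runtime-PM reference on the GX domain */
	bool opp_set;		/* mimics an active OPP vote */
	bool clks_on;		/* mimics the bulk clocks being enabled */
	bool sptprac_on;	/* mimics the extra step that needs clocks */
};

/* Hypothetical clock enable that can be forced to fail. */
static int fake_clk_enable(struct fake_gpu *g, bool fail)
{
	if (fail)
		return -5;
	g->clks_on = true;
	return 0;
}

static int fake_resume(struct fake_gpu *g, bool fail_clk, bool needs_sptprac)
{
	int ret;

	assert(g != NULL);

	/* Acquire resources in order. */
	g->gmu_refs++;
	g->opp_set = true;
	g->gx_refs++;

	ret = fake_clk_enable(g, fail_clk);
	if (ret)
		goto err_bulk_clk;	/* skip the clock-dependent step below */

	if (needs_sptprac)
		g->sptprac_on = true;

	/*
	 * In this sketch ret is 0 on fall-through, so the teardown below
	 * runs only when entered through the label.
	 */
	if (ret) {
err_bulk_clk:
		/* Release in reverse order of acquisition. */
		g->gx_refs--;
		g->opp_set = false;
		g->gmu_refs--;
	}

	return ret;
}
```

Calling fake_resume() with fail_clk set leaves every counter back at
zero and never touches sptprac_on, which is the property the explicit
goto preserves once a clock-dependent step sits before the teardown
block.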
>
> -Akhil.
>
>>
>> Konrad
>>>
>>>>
>>>>>
>>>>>> + pm_runtime_put(gmu->gxpd);
>>>>>> + pm_runtime_put(gmu->dev);
>>>>>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
>>>>>> + }
>>>>>> +err_set_opp:
>>>>>
>>>>> Generally, it is better to name the label based on what you do here. For
>>>>> eg: "unlock_lock:".
>>>> That seems to be a mixed bag all throughout the kernel, I've seen many
>>>> usages of err_(what went wrong)
>>>>
>>>>>
>>>>> Also, this function is small enough that it is better to return directly
>>>>> in case of error. I think that would be more readable.
>>>> Not really, adding the necessary cleanup steps in `if (ret)`
>>>> blocks would roughly double the function's size.
>>>>
>>>>>
>>>>>> + mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>> +
>>>>>> + if (!ret)
>>>>>> + msm_devfreq_resume(gpu);
>>>>>> +
>>>>>> + return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>>>>>> {
>>>>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> @@ -1720,7 +1799,40 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> -static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>> +{
>>>>>> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> + struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>> + int i;
>>>>>> +
>>>>>> + trace_msm_gpu_suspend(0);
>>>>>> +
>>>>>> + msm_devfreq_suspend(gpu);
>>>>>> +
>>>>>> + mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>
>>>>> Again, is this initialized somewhere?
>>>>>
>>>>>> +
>>>>>> + /* Drain the outstanding traffic on memory buses */
>>>>>> + a6xx_bus_clear_pending_transactions(adreno_gpu, true);
>>>>>> +
>>>>>> + clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>>>> +
>>>>>> + pm_runtime_put_sync(gmu->gxpd);
>>>>>> + dev_pm_opp_set_opp(&gpu->pdev->dev, NULL);
>>>>>> + pm_runtime_put_sync(gmu->dev);
>>>>>> +
>>>>>> + mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>> +
>>>>>> + if (a6xx_gpu->shadow_bo)
>>>>>> + for (i = 0; i < gpu->nr_rings; i++)
>>>>>> + a6xx_gpu->shadow[i] = 0;
>>>>>> +
>>>>>> + gpu->suspend_count++;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int a6xx_gmu_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>>> {
>>>>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> @@ -1739,6 +1851,12 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>>> +{
>>>>>> + *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> static struct msm_ringbuffer *a6xx_active_ring(struct msm_gpu *gpu)
>>>>>> {
>>>>>> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>> @@ -2004,8 +2122,8 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>> .set_param = adreno_set_param,
>>>>>> .hw_init = a6xx_hw_init,
>>>>>> .ucode_load = a6xx_ucode_load,
>>>>>> - .pm_suspend = a6xx_pm_suspend,
>>>>>> - .pm_resume = a6xx_pm_resume,
>>>>>> + .pm_suspend = a6xx_gmu_pm_suspend,
>>>>>> + .pm_resume = a6xx_gmu_pm_resume,
>>>>>> .recover = a6xx_recover,
>>>>>> .submit = a6xx_submit,
>>>>>> .active_ring = a6xx_active_ring,
>>>>>> @@ -2020,6 +2138,35 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>> #if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>> .gpu_state_get = a6xx_gpu_state_get,
>>>>>> .gpu_state_put = a6xx_gpu_state_put,
>>>>>> +#endif
>>>>>> + .create_address_space = a6xx_create_address_space,
>>>>>> + .create_private_address_space = a6xx_create_private_address_space,
>>>>>> + .get_rptr = a6xx_get_rptr,
>>>>>> + .progress = a6xx_progress,
>>>>>> + },
>>>>>> + .get_timestamp = a6xx_gmu_get_timestamp,
>>>>>> +};
>>>>>> +
>>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>>>>>> + .base = {
>>>>>> + .get_param = adreno_get_param,
>>>>>> + .set_param = adreno_set_param,
>>>>>> + .hw_init = a6xx_hw_init,
>>>>>> + .ucode_load = a6xx_ucode_load,
>>>>>> + .pm_suspend = a6xx_pm_suspend,
>>>>>> + .pm_resume = a6xx_pm_resume,
>>>>>> + .recover = a6xx_recover,
>>>>>> + .submit = a6xx_submit,
>>>>>> + .active_ring = a6xx_active_ring,
>>>>>> + .irq = a6xx_irq,
>>>>>> + .destroy = a6xx_destroy,
>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>> + .show = a6xx_show,
>>>>>> +#endif
>>>>>> + .gpu_busy = a6xx_gpu_busy,
>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>> + .gpu_state_get = a6xx_gpu_state_get,
>>>>>> + .gpu_state_put = a6xx_gpu_state_put,
>>>>>> #endif
>>>>>> .create_address_space = a6xx_create_address_space,
>>>>>> .create_private_address_space = a6xx_create_private_address_space,
>>>>>> @@ -2050,15 +2197,31 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>
>>>>>> adreno_gpu->registers = NULL;
>>>>>>
>>>>>> + /* Check if there is a GMU phandle and set it up */
>>>>>> + node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>> + /* FIXME: How do we gracefully handle this? */
>>>>>> + BUG_ON(!node);
>>>>>> +
>>>>>> + adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>>>>>> +
>>>>>> /*
>>>>>> * We need to know the platform type before calling into adreno_gpu_init
>>>>>> * so that the hw_apriv flag can be correctly set. Snoop into the info
>>>>>> * and grab the revision number
>>>>>> */
>>>>>> info = adreno_info(config->rev);
>>>>>> -
>>>>>> - if (info && (info->revn == 650 || info->revn == 660 ||
>>>>>> - adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>>>>>> + if (!info)
>>>>>> + return ERR_PTR(-EINVAL);
>>>>>> +
>>>>>> + /* Assign these early so that we can use the is_aXYZ helpers */
>>>>>> + /* Numeric revision IDs (e.g. 630) */
>>>>>> + adreno_gpu->revn = info->revn;
>>>>>> + /* New-style ADRENO_REV()-only */
>>>>>> + adreno_gpu->rev = info->rev;
>>>>>> + /* Quirk data */
>>>>>> + adreno_gpu->info = info;
>>>>>> +
>>>>>> + if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>>>>>> adreno_gpu->base.hw_apriv = true;
>>>>>>
>>>>>> a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>>> @@ -2069,7 +2232,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>> return ERR_PTR(ret);
>>>>>> }
>>>>>>
>>>>>> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>> + if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>>>>>> + else
>>>>>> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>> if (ret) {
>>>>>> a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>> return ERR_PTR(ret);
>>>>>> @@ -2082,13 +2248,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>> if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>>>>>> priv->gpu_clamp_to_idle = true;
>>>>>>
>>>>>> - /* Check if there is a GMU phandle and set it up */
>>>>>> - node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>> -
>>>>>> - /* FIXME: How do we gracefully handle this? */
>>>>>> - BUG_ON(!node);
>>>>>> -
>>>>>> - ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>> + if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> + ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>>>>>> + else
>>>>>> + ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>> of_node_put(node);
>>>>>> if (ret) {
>>>>>> a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>> index aa70390ee1c6..c788b06e72da 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>> void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>>
>>>>>> int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>> void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>>>>>
>>>>>> void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>> index 30ecdff363e7..4e5d650578c6 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>>>>> /* Get the generic state from the adreno core */
>>>>>> adreno_gpu_state_get(gpu, &a6xx_state->base);
>>>>>>
>>>>>> - a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>> + if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>> + a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>>
>>>>>> - a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>> - a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>> - a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>> + a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>> + a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>> + a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>>
>>>>>> - a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>> + a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>> + }
>>>>>>
>>>>>> /* If GX isn't on the rest of the data isn't going to be accessible */
>>>>>> - if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>> + if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>> return &a6xx_state->base;
>>>>>>
>>>>>> /* Get the banks of indexed registers */
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>> index 6934cee07d42..5c5901d65950 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>>>>>> if (!adreno_gpu->info->fw[i])
>>>>>> continue;
>>>>>>
>>>>>> + /* Skip loading GMU firmware with GMU Wrapper */
>>>>>> + if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>>>>>> + continue;
>>>>>> +
>>>>>> /* Skip if the firmware has already been loaded */
>>>>>> if (adreno_gpu->fw[i])
>>>>>> continue;
>>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>>>>>> u32 speedbin;
>>>>>> int ret;
>>>>>>
>>>>>> - /* Only handle the core clock when GMU is not in use */
>>>>>> - if (config->rev.core < 6) {
>>>>>> + /* Only handle the core clock when GMU is not in use (or is absent). */
>>>>>> + if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>>>>>> /*
>>>>>> * This can only be done before devm_pm_opp_of_add_table(), or
>>>>>> * dev_pm_opp_set_config() will WARN_ON()
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>> index f62612a5c70f..ee5352bc5329 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>>>>>> * code (a3xx_gpu.c) and stored in this common location.
>>>>>> */
>>>>>> const unsigned int *reg_offsets;
>>>>>> + bool gmu_is_wrapper;
>>>>>> };
>>>>>> #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>>>>>
>>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>>>>>
>>>>>> bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>>>>>
>>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>>>>>> +{
>>>>>> + return gpu->gmu_is_wrapper;
>>>>>> +}
>>>>>> +
>>>>>> static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>>>>>> {
>>>>>> return (gpu->revn < 300);
>>>>>>
>>>>>> --
>>>>>> 2.40.1
>>>>>>
>>>>>
>>>>> I am still not fully onboard with the idea of gmu_wrapper node in devicetree.
>>>>> Aside from that, I don't see any other issue. Please check the few comments I left.
>>>> Thanks for your review!
>>>>
>>>> Konrad
>>>>>
>>>>> -Akhil.
>>>>>