2022-06-01 09:28:29

by Doug Anderson

Subject: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss

In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
bandwidth") we fully moved interconnect stuff to the DPU driver. This
had no change for sc7180 but _did_ have an impact for other SoCs. It
made them match the sc7180 scheme.

Unfortunately, the sc7180 scheme seems like it was a bit broken.
Specifically the interconnect needs to be on for more than just the
DPU driver's AXI bus. At the very least it also needs to be on for the
DSI driver's AXI bus. This can be seen fairly easily by doing this on
a ChromeOS sc7180-trogdor class device:

set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
sleep 10
cd /sys/bus/platform/devices/ae94000.dsi/power
echo on > control

When you do that, you'll get a warning splat in the logs about
"gcc_disp_hf_axi_clk status stuck at 'off'".

One could argue that perhaps what I have done above is "illegal" and
that it can't happen naturally in the system because in normal system
usage the DPU is pretty much always on when DSI is on. That being
said:
* In official ChromeOS builds (admittedly a 5.4 kernel with backports)
we have seen that splat at bootup.
* Even though we don't use "autosuspend" for these components, we
don't use the "put_sync" variants. Thus plausibly the DSI could stay
"runtime enabled" past when the DPU is enabled. Technically we
shouldn't do that if the DPU's suspend ends up yanking our clock.

Let's change things such that the "bare minimum" request for the
interconnect happens in the mdss driver again. That means that all of
the children can assume that the interconnect is on at the minimum
bandwidth. We'll then let the DPU request the higher amount that it
wants.

It should be noted that this isn't as hacky a solution as it might
initially appear. Specifically:
* Since MDSS and DPU each get their own reference to the
interconnect, the framework will actually handle aggregating
them. The two drivers are _not_ clobbering each other.
* When the Qualcomm interconnect driver aggregates it takes the max of
all the peaks. Thus having MDSS request a peak, as we're doing here,
won't actually change the total interconnect bandwidth (it won't be
added to the request for the DPU). This perhaps explains why the
"average" requested in MDSS was historically 0 since that one
_would_ be added in.
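
To make the aggregation argument concrete, here is a minimal userspace
sketch, not kernel code. It assumes the behavior described above
(averages are summed, peaks are maxed). struct bw_request, aggregate()
and mdss_min_vote() are hypothetical names invented for illustration;
BPS_TO_ICC is meant to mirror the arithmetic of the kernel's
Bps_to_icc() (bytes/s to kBps), and MIN_IB_BW is the value from this
patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match Bps_to_icc(): the framework counts in kBps. */
#define BPS_TO_ICC(x) ((x) / 1000)
#define MIN_IB_BW 400000000UL /* Min ib vote 400MB, as in the patch */

/* Hypothetical stand-in for one consumer's vote on a path. */
struct bw_request {
	uint32_t avg_bw;  /* kBps */
	uint32_t peak_bw; /* kBps */
};

/* Sum the averages and take the max of the peaks over all votes. */
static struct bw_request aggregate(const struct bw_request *reqs, size_t n)
{
	struct bw_request total = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		total.avg_bw += reqs[i].avg_bw;
		if (reqs[i].peak_bw > total.peak_bw)
			total.peak_bw = reqs[i].peak_bw;
	}
	return total;
}

/* The vote MDSS places while enabled: avg 0, peak at the minimum. */
static struct bw_request mdss_min_vote(void)
{
	struct bw_request r = { 0, (uint32_t)BPS_TO_ICC(MIN_IB_BW) };

	return r;
}
```

With an arbitrary, made-up DPU vote of avg 800000 / peak 1600000 kBps
next to the MDSS vote, the aggregate stays at 800000 / 1600000, so the
MDSS vote adds nothing. With the DPU vote dropped to 0 / 0, the
aggregate peak is 400000 kBps, which is the floor the children rely on.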

NOTE also that in the downstream ChromeOS 5.4 and 5.15 kernels, we're
also seeing some RPMH hangs that are addressed by this fix. These
hangs are showing up in the field and on _some_ devices with enough
stress testing of suspend/resume. Specifically right at suspend time
with a stack crawl that looks like this (from chromeos-5.15 tree):
rpmh_write_batch+0x19c/0x240
qcom_icc_bcm_voter_commit+0x210/0x420
qcom_icc_set+0x28/0x38
apply_constraints+0x70/0xa4
icc_set_bw+0x150/0x24c
dpu_runtime_resume+0x50/0x1c4
pm_generic_runtime_resume+0x30/0x44
__genpd_runtime_resume+0x68/0x7c
genpd_runtime_resume+0x12c/0x20c
__rpm_callback+0x98/0x138
rpm_callback+0x30/0x88
rpm_resume+0x370/0x4a0
__pm_runtime_resume+0x80/0xb0
dpu_kms_enable_commit+0x24/0x30
msm_atomic_commit_tail+0x12c/0x630
commit_tail+0xac/0x150
drm_atomic_helper_commit+0x114/0x11c
drm_atomic_commit+0x68/0x78
drm_atomic_helper_disable_all+0x158/0x1c8
drm_atomic_helper_suspend+0xc0/0x1c0
drm_mode_config_helper_suspend+0x2c/0x60
msm_pm_prepare+0x2c/0x40
pm_generic_prepare+0x30/0x44
genpd_prepare+0x80/0xd0
device_prepare+0x78/0x17c
dpm_prepare+0xb0/0x384
dpm_suspend_start+0x34/0xc0

We don't completely understand all the mechanisms in play, but the
hang seemed to come and go with random factors. It's not terribly
surprising that the hang is gone after this patch since the line of
code that was failing is no longer present in the kernel.

Fixes: a670ff578f1f ("drm/msm/dpu: always use mdp device to scale bandwidth")
Fixes: c33b7c0389e1 ("drm/msm/dpu: add support for clk and bw scaling for display")
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v2:
- Don't set bandwidth in init.

drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
2 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index 2b9d931474e0..3025184053e0 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -49,8 +49,6 @@
#define DPU_DEBUGFS_DIR "msm_dpu"
#define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"

-#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
-
static int dpu_kms_hw_init(struct msm_kms *kms);
static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);

@@ -1303,15 +1301,9 @@ static int __maybe_unused dpu_runtime_resume(struct device *dev)
struct dpu_kms *dpu_kms = to_dpu_kms(priv->kms);
struct drm_encoder *encoder;
struct drm_device *ddev;
- int i;

ddev = dpu_kms->dev;

- WARN_ON(!(dpu_kms->num_paths));
- /* Min vote of BW is required before turning on AXI clk */
- for (i = 0; i < dpu_kms->num_paths; i++)
- icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));
-
rc = clk_bulk_prepare_enable(dpu_kms->num_clocks, dpu_kms->clocks);
if (rc) {
DPU_ERROR("clock enable failed rc:%d\n", rc);
diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
index 0454a571adf7..e13c5c12b775 100644
--- a/drivers/gpu/drm/msm/msm_mdss.c
+++ b/drivers/gpu/drm/msm/msm_mdss.c
@@ -5,6 +5,7 @@

#include <linux/clk.h>
#include <linux/delay.h>
+#include <linux/interconnect.h>
#include <linux/irq.h>
#include <linux/irqchip.h>
#include <linux/irqdesc.h>
@@ -25,6 +26,8 @@
#define UBWC_CTRL_2 0x150
#define UBWC_PREDICTION_MODE 0x154

+#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */
+
struct msm_mdss {
struct device *dev;

@@ -36,8 +39,47 @@ struct msm_mdss {
unsigned long enabled_mask;
struct irq_domain *domain;
} irq_controller;
+ struct icc_path *path[2];
+ u32 num_paths;
};

+static int msm_mdss_parse_data_bus_icc_path(struct device *dev,
+ struct msm_mdss *msm_mdss)
+{
+ struct icc_path *path0 = of_icc_get(dev, "mdp0-mem");
+ struct icc_path *path1 = of_icc_get(dev, "mdp1-mem");
+
+ if (IS_ERR_OR_NULL(path0))
+ return PTR_ERR_OR_ZERO(path0);
+
+ msm_mdss->path[0] = path0;
+ msm_mdss->num_paths = 1;
+
+ if (!IS_ERR_OR_NULL(path1)) {
+ msm_mdss->path[1] = path1;
+ msm_mdss->num_paths++;
+ }
+
+ return 0;
+}
+
+static void msm_mdss_put_icc_path(void *data)
+{
+ struct msm_mdss *msm_mdss = data;
+ int i;
+
+ for (i = 0; i < msm_mdss->num_paths; i++)
+ icc_put(msm_mdss->path[i]);
+}
+
+static void msm_mdss_icc_request_bw(struct msm_mdss *msm_mdss, unsigned long bw)
+{
+ int i;
+
+ for (i = 0; i < msm_mdss->num_paths; i++)
+ icc_set_bw(msm_mdss->path[i], 0, Bps_to_icc(bw));
+}
+
static void msm_mdss_irq(struct irq_desc *desc)
{
struct msm_mdss *msm_mdss = irq_desc_get_handler_data(desc);
@@ -136,6 +178,13 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
{
int ret;

+ /*
+ * Several components have AXI clocks that can only be turned on if
+ * the interconnect is enabled (non-zero bandwidth). Let's make sure
+ * that the interconnects are at least at a minimum amount.
+ */
+ msm_mdss_icc_request_bw(msm_mdss, MIN_IB_BW);
+
ret = clk_bulk_prepare_enable(msm_mdss->num_clocks, msm_mdss->clocks);
if (ret) {
dev_err(msm_mdss->dev, "clock enable failed, ret:%d\n", ret);
@@ -178,6 +227,7 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
static int msm_mdss_disable(struct msm_mdss *msm_mdss)
{
clk_bulk_disable_unprepare(msm_mdss->num_clocks, msm_mdss->clocks);
+ msm_mdss_icc_request_bw(msm_mdss, 0);

return 0;
}
@@ -271,6 +321,13 @@ static struct msm_mdss *msm_mdss_init(struct platform_device *pdev, bool is_mdp5

dev_dbg(&pdev->dev, "mapped mdss address space @%pK\n", msm_mdss->mmio);

+ ret = msm_mdss_parse_data_bus_icc_path(&pdev->dev, msm_mdss);
+ if (ret)
+ return ERR_PTR(ret);
+ ret = devm_add_action_or_reset(&pdev->dev, msm_mdss_put_icc_path, msm_mdss);
+ if (ret)
+ return ERR_PTR(ret);
+
if (is_mdp5)
ret = mdp5_mdss_parse_clock(pdev, &msm_mdss->clocks);
else
--
2.36.1.255.ge46751e96f-goog



2022-06-01 20:07:31

by Abhinav Kumar

Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss



On 6/1/2022 3:04 AM, Dmitry Baryshkov wrote:
> On Wed, 1 Jun 2022 at 02:01, Douglas Anderson <[email protected]> wrote:
>>
>> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
>> bandwidth") we fully moved interconnect stuff to the DPU driver. This
>> had no change for sc7180 but _did_ have an impact for other SoCs. It
>> made them match the sc7180 scheme.
>
> [skipped the description]
>
>>
>> Changes in v2:
>> - Don't set bandwidth in init.
>>
>> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
>> drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
>> 2 files changed, 57 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>> index 2b9d931474e0..3025184053e0 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>> @@ -49,8 +49,6 @@
>> #define DPU_DEBUGFS_DIR "msm_dpu"
>> #define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"
>>
>> -#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
>> -
>> static int dpu_kms_hw_init(struct msm_kms *kms);
>> static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);
>>
>
> [skipped]
>
>> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
>> index 0454a571adf7..e13c5c12b775 100644
>> --- a/drivers/gpu/drm/msm/msm_mdss.c
>> +++ b/drivers/gpu/drm/msm/msm_mdss.c
>> @@ -5,6 +5,7 @@
>>
>> #include <linux/clk.h>
>> #include <linux/delay.h>
>> +#include <linux/interconnect.h>
>> #include <linux/irq.h>
>> #include <linux/irqchip.h>
>> #include <linux/irqdesc.h>
>> @@ -25,6 +26,8 @@
>> #define UBWC_CTRL_2 0x150
>> #define UBWC_PREDICTION_MODE 0x154
>>
>> +#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */
>
> As msm_mdss is now used for both DPU and MDP5 devices, could you
> please confirm that this value is valid for older devices too? E.g.
> db410c or 8974
>
I need to check with Kalyan on this value (400MB), as I am unable to find
documentation on it. Will update this thread when I do.

Prior to commit 627dc55c273da ("drm/msm/disp/dpu1: icc path
needs to be set before dpu runtime resume"), this value came from
the hw catalog:

@@ -1191,10 +1193,10 @@ static int __maybe_unused dpu_runtime_resume(struct device *dev)

ddev = dpu_kms->dev;

+ WARN_ON(!(dpu_kms->num_paths));
/* Min vote of BW is required before turning on AXI clk */
for (i = 0; i < dpu_kms->num_paths; i++)
- icc_set_bw(dpu_kms->path[i], 0,
- dpu_kms->catalog->perf.min_dram_ib);
+ icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));

After this, we moved to a hard-coded value; I am not sure why.

So there is nothing wrong with this change as such; the only question is
whether this value is correct for older chips.

But the real question is whether older chips even use icc.

Unless I am mistaken, only sc7180 and RB3/RB5 do.

So is there really any impact on the older chips from this change?

If not, we should probably let this one go ahead and move back to a
catalog-based approach while extending ICC to older chips.
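
As a rough illustration of what a catalog-based lookup with a fallback
could look like, here is a minimal standalone sketch. It is not the
driver's code: struct perf_cfg and min_ib_vote() are hypothetical names,
the field is assumed to be in bytes per second, and 0 is assumed to mean
"not specified for this SoC". Only MIN_IB_BW comes from the patch.

```c
#include <stddef.h>

#define MIN_IB_BW 400000000UL /* fallback: hard-coded value from the patch */

/* Hypothetical stand-in for per-SoC perf data in the hw catalog. */
struct perf_cfg {
	unsigned long min_dram_ib; /* assumed bytes/s; 0 = not specified */
};

/* Prefer the per-SoC value when present, else use the hard-coded one. */
static unsigned long min_ib_vote(const struct perf_cfg *perf)
{
	if (perf && perf->min_dram_ib)
		return perf->min_dram_ib;

	return MIN_IB_BW;
}
```

Chips without catalog data would keep today's 400MB vote, while chips
that need a different floor could supply their own value.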

Thanks

Abhinav

>> +
>> struct msm_mdss {
>> struct device *dev;
>>
>

2022-06-01 20:11:48

by Stephen Boyd

Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss

Quoting Douglas Anderson (2022-05-31 16:01:26)
> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
> bandwidth") we fully moved interconnect stuff to the DPU driver. This
> had no change for sc7180 but _did_ have an impact for other SoCs. It
> made them match the sc7180 scheme.
>
> Unfortunately, the sc7180 scheme seems like it was a bit broken.
> Specifically the interconnect needs to be on for more than just the
> DPU driver's AXI bus. In the very least it also needs to be on for the
> DSI driver's AXI bus. This can be seen fairly easily by doing this on
> a ChromeOS sc7180-trogdor class device:
>
> set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
> sleep 10
> cd /sys/bus/platform/devices/ae94000.dsi/power
> echo on > control
>
> When you do that, you'll get a warning splat in the logs about
> "gcc_disp_hf_axi_clk status stuck at 'off'".
>
> One could argue that perhaps what I have done above is "illegal" and
> that it can't happen naturally in the system because in normal system
> usage the DPU is pretty much always on when DSI is on. That being
> said:
> * In official ChromeOS builds (admittedly a 5.4 kernel with backports)
> we have seen that splat at bootup.
> * Even though we don't use "autosuspend" for these components, we
> don't use the "put_sync" variants. Thus plausibly the DSI could stay
> "runtime enabled" past when the DPU is enabled. Technically we
> shouldn't do that if the DPU's suspend ends up yanking our clock.
>
> Let's change things such that the "bare minimum" request for the
> interconnect happens in the mdss driver again. That means that all of
> the children can assume that the interconnect is on at the minimum
> bandwidth. We'll then let the DPU request the higher amount that it
> wants.
>
> It should be noted that this isn't as hacky of a solution as it might
> initially appear. Specifically:
> * Since MDSS and DPU individually get their own references to the
> interconnect then the framework will actually handle aggregating
> them. The two drivers are _not_ clobbering each other.
> * When the Qualcomm interconnect driver aggregates it takes the max of
> all the peaks. Thus having MDSS request a peak, as we're doing here,
> won't actually change the total interconnect bandwidth (it won't be
> added to the request for the DPU). This perhaps explains why the
> "average" requested in MDSS was historically 0 since that one
> _would_ be added in.
>
> NOTE also that in the downstream ChromeOS 5.4 and 5.15 kernels, we're
> also seeing some RPMH hangs that are addressed by this fix. These
> hangs are showing up in the field and on _some_ devices with enough
> stress testing of suspend/resume. Specifically right at suspend time
> with a stack crawl that looks like this (from chromeos-5.15 tree):
> rpmh_write_batch+0x19c/0x240
> qcom_icc_bcm_voter_commit+0x210/0x420
> qcom_icc_set+0x28/0x38
> apply_constraints+0x70/0xa4
> icc_set_bw+0x150/0x24c
> dpu_runtime_resume+0x50/0x1c4
> pm_generic_runtime_resume+0x30/0x44
> __genpd_runtime_resume+0x68/0x7c
> genpd_runtime_resume+0x12c/0x20c
> __rpm_callback+0x98/0x138
> rpm_callback+0x30/0x88
> rpm_resume+0x370/0x4a0
> __pm_runtime_resume+0x80/0xb0
> dpu_kms_enable_commit+0x24/0x30
> msm_atomic_commit_tail+0x12c/0x630
> commit_tail+0xac/0x150
> drm_atomic_helper_commit+0x114/0x11c
> drm_atomic_commit+0x68/0x78
> drm_atomic_helper_disable_all+0x158/0x1c8
> drm_atomic_helper_suspend+0xc0/0x1c0
> drm_mode_config_helper_suspend+0x2c/0x60
> msm_pm_prepare+0x2c/0x40
> pm_generic_prepare+0x30/0x44
> genpd_prepare+0x80/0xd0
> device_prepare+0x78/0x17c
> dpm_prepare+0xb0/0x384
> dpm_suspend_start+0x34/0xc0
>
> We don't completely understand all the mechanisms in play, but the
> hang seemed to come and go with random factors. It's not terribly
> surprising that the hang is gone after this patch since the line of
> code that was failing is no longer present in the kernel.
>
> Fixes: a670ff578f1f ("drm/msm/dpu: always use mdp device to scale bandwidth")
> Fixes: c33b7c0389e1 ("drm/msm/dpu: add support for clk and bw scaling for display")
> Signed-off-by: Douglas Anderson <[email protected]>
> ---

Reviewed-by: Stephen Boyd <[email protected]>

2022-06-01 20:21:30

by Dmitry Baryshkov

Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss

On Wed, 1 Jun 2022 at 02:01, Douglas Anderson <[email protected]> wrote:
>
> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
> bandwidth") we fully moved interconnect stuff to the DPU driver. This
> had no change for sc7180 but _did_ have an impact for other SoCs. It
> made them match the sc7180 scheme.

[skipped the description]

>
> Changes in v2:
> - Don't set bandwidth in init.
>
> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
> drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
> 2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> index 2b9d931474e0..3025184053e0 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> @@ -49,8 +49,6 @@
> #define DPU_DEBUGFS_DIR "msm_dpu"
> #define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"
>
> -#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
> -
> static int dpu_kms_hw_init(struct msm_kms *kms);
> static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);
>

[skipped]

> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
> index 0454a571adf7..e13c5c12b775 100644
> --- a/drivers/gpu/drm/msm/msm_mdss.c
> +++ b/drivers/gpu/drm/msm/msm_mdss.c
> @@ -5,6 +5,7 @@
>
> #include <linux/clk.h>
> #include <linux/delay.h>
> +#include <linux/interconnect.h>
> #include <linux/irq.h>
> #include <linux/irqchip.h>
> #include <linux/irqdesc.h>
> @@ -25,6 +26,8 @@
> #define UBWC_CTRL_2 0x150
> #define UBWC_PREDICTION_MODE 0x154
>
> +#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */

As msm_mdss is now used for both DPU and MDP5 devices, could you
please confirm that this value is valid for older devices too? E.g.
db410c or 8974

> +
> struct msm_mdss {
> struct device *dev;
>

--
With best wishes
Dmitry

2022-06-01 20:32:53

by Jessica Zhang

Subject: Re: [Freedreno] [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss



On 5/31/2022 4:01 PM, Douglas Anderson wrote:
> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
> bandwidth") we fully moved interconnect stuff to the DPU driver. This
> had no change for sc7180 but _did_ have an impact for other SoCs. It
> made them match the sc7180 scheme.
>
> Unfortunately, the sc7180 scheme seems like it was a bit broken.
> Specifically the interconnect needs to be on for more than just the
> DPU driver's AXI bus. In the very least it also needs to be on for the
> DSI driver's AXI bus. This can be seen fairly easily by doing this on
> a ChromeOS sc7180-trogdor class device:
>
> set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
> sleep 10
> cd /sys/bus/platform/devices/ae94000.dsi/power
> echo on > control
>
> When you do that, you'll get a warning splat in the logs about
> "gcc_disp_hf_axi_clk status stuck at 'off'".
>
> One could argue that perhaps what I have done above is "illegal" and
> that it can't happen naturally in the system because in normal system
> usage the DPU is pretty much always on when DSI is on. That being
> said:
> * In official ChromeOS builds (admittedly a 5.4 kernel with backports)
> we have seen that splat at bootup.
> * Even though we don't use "autosuspend" for these components, we
> don't use the "put_sync" variants. Thus plausibly the DSI could stay
> "runtime enabled" past when the DPU is enabled. Technically we
> shouldn't do that if the DPU's suspend ends up yanking our clock.
>
> Let's change things such that the "bare minimum" request for the
> interconnect happens in the mdss driver again. That means that all of
> the children can assume that the interconnect is on at the minimum
> bandwidth. We'll then let the DPU request the higher amount that it
> wants.
>
> It should be noted that this isn't as hacky of a solution as it might
> initially appear. Specifically:
> * Since MDSS and DPU individually get their own references to the
> interconnect then the framework will actually handle aggregating
> them. The two drivers are _not_ clobbering each other.
> * When the Qualcomm interconnect driver aggregates it takes the max of
> all the peaks. Thus having MDSS request a peak, as we're doing here,
> won't actually change the total interconnect bandwidth (it won't be
> added to the request for the DPU). This perhaps explains why the
> "average" requested in MDSS was historically 0 since that one
> _would_ be added in.
>
> NOTE also that in the downstream ChromeOS 5.4 and 5.15 kernels, we're
> also seeing some RPMH hangs that are addressed by this fix. These
> hangs are showing up in the field and on _some_ devices with enough
> stress testing of suspend/resume. Specifically right at suspend time
> with a stack crawl that looks like this (from chromeos-5.15 tree):
> rpmh_write_batch+0x19c/0x240
> qcom_icc_bcm_voter_commit+0x210/0x420
> qcom_icc_set+0x28/0x38
> apply_constraints+0x70/0xa4
> icc_set_bw+0x150/0x24c
> dpu_runtime_resume+0x50/0x1c4
> pm_generic_runtime_resume+0x30/0x44
> __genpd_runtime_resume+0x68/0x7c
> genpd_runtime_resume+0x12c/0x20c
> __rpm_callback+0x98/0x138
> rpm_callback+0x30/0x88
> rpm_resume+0x370/0x4a0
> __pm_runtime_resume+0x80/0xb0
> dpu_kms_enable_commit+0x24/0x30
> msm_atomic_commit_tail+0x12c/0x630
> commit_tail+0xac/0x150
> drm_atomic_helper_commit+0x114/0x11c
> drm_atomic_commit+0x68/0x78
> drm_atomic_helper_disable_all+0x158/0x1c8
> drm_atomic_helper_suspend+0xc0/0x1c0
> drm_mode_config_helper_suspend+0x2c/0x60
> msm_pm_prepare+0x2c/0x40
> pm_generic_prepare+0x30/0x44
> genpd_prepare+0x80/0xd0
> device_prepare+0x78/0x17c
> dpm_prepare+0xb0/0x384
> dpm_suspend_start+0x34/0xc0
>
> We don't completely understand all the mechanisms in play, but the
> hang seemed to come and go with random factors. It's not terribly
> surprising that the hang is gone after this patch since the line of
> code that was failing is no longer present in the kernel.
>
> Fixes: a670ff578f1f ("drm/msm/dpu: always use mdp device to scale bandwidth")
> Fixes: c33b7c0389e1 ("drm/msm/dpu: add support for clk and bw scaling for display")
> Signed-off-by: Douglas Anderson <[email protected]>

Tested-by: Jessica Zhang <[email protected]> # RB3 (sdm845) and RB5 (qrb5165)

> ---
>
> Changes in v2:
> - Don't set bandwidth in init.
>
> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
> drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
> 2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> index 2b9d931474e0..3025184053e0 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> @@ -49,8 +49,6 @@
> #define DPU_DEBUGFS_DIR "msm_dpu"
> #define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"
>
> -#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
> -
> static int dpu_kms_hw_init(struct msm_kms *kms);
> static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);
>
> @@ -1303,15 +1301,9 @@ static int __maybe_unused dpu_runtime_resume(struct device *dev)
> struct dpu_kms *dpu_kms = to_dpu_kms(priv->kms);
> struct drm_encoder *encoder;
> struct drm_device *ddev;
> - int i;
>
> ddev = dpu_kms->dev;
>
> - WARN_ON(!(dpu_kms->num_paths));
> - /* Min vote of BW is required before turning on AXI clk */
> - for (i = 0; i < dpu_kms->num_paths; i++)
> - icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));
> -
> rc = clk_bulk_prepare_enable(dpu_kms->num_clocks, dpu_kms->clocks);
> if (rc) {
> DPU_ERROR("clock enable failed rc:%d\n", rc);
> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
> index 0454a571adf7..e13c5c12b775 100644
> --- a/drivers/gpu/drm/msm/msm_mdss.c
> +++ b/drivers/gpu/drm/msm/msm_mdss.c
> @@ -5,6 +5,7 @@
>
> #include <linux/clk.h>
> #include <linux/delay.h>
> +#include <linux/interconnect.h>
> #include <linux/irq.h>
> #include <linux/irqchip.h>
> #include <linux/irqdesc.h>
> @@ -25,6 +26,8 @@
> #define UBWC_CTRL_2 0x150
> #define UBWC_PREDICTION_MODE 0x154
>
> +#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */
> +
> struct msm_mdss {
> struct device *dev;
>
> @@ -36,8 +39,47 @@ struct msm_mdss {
> unsigned long enabled_mask;
> struct irq_domain *domain;
> } irq_controller;
> + struct icc_path *path[2];
> + u32 num_paths;
> };
>
> +static int msm_mdss_parse_data_bus_icc_path(struct device *dev,
> + struct msm_mdss *msm_mdss)
> +{
> + struct icc_path *path0 = of_icc_get(dev, "mdp0-mem");
> + struct icc_path *path1 = of_icc_get(dev, "mdp1-mem");
> +
> + if (IS_ERR_OR_NULL(path0))
> + return PTR_ERR_OR_ZERO(path0);
> +
> + msm_mdss->path[0] = path0;
> + msm_mdss->num_paths = 1;
> +
> + if (!IS_ERR_OR_NULL(path1)) {
> + msm_mdss->path[1] = path1;
> + msm_mdss->num_paths++;
> + }
> +
> + return 0;
> +}
> +
> +static void msm_mdss_put_icc_path(void *data)
> +{
> + struct msm_mdss *msm_mdss = data;
> + int i;
> +
> + for (i = 0; i < msm_mdss->num_paths; i++)
> + icc_put(msm_mdss->path[i]);
> +}
> +
> +static void msm_mdss_icc_request_bw(struct msm_mdss *msm_mdss, unsigned long bw)
> +{
> + int i;
> +
> + for (i = 0; i < msm_mdss->num_paths; i++)
> + icc_set_bw(msm_mdss->path[i], 0, Bps_to_icc(bw));
> +}
> +
> static void msm_mdss_irq(struct irq_desc *desc)
> {
> struct msm_mdss *msm_mdss = irq_desc_get_handler_data(desc);
> @@ -136,6 +178,13 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
> {
> int ret;
>
> + /*
> + * Several components have AXI clocks that can only be turned on if
> + * the interconnect is enabled (non-zero bandwidth). Let's make sure
> + * that the interconnects are at least at a minimum amount.
> + */
> + msm_mdss_icc_request_bw(msm_mdss, MIN_IB_BW);
> +
> ret = clk_bulk_prepare_enable(msm_mdss->num_clocks, msm_mdss->clocks);
> if (ret) {
> dev_err(msm_mdss->dev, "clock enable failed, ret:%d\n", ret);
> @@ -178,6 +227,7 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
> static int msm_mdss_disable(struct msm_mdss *msm_mdss)
> {
> clk_bulk_disable_unprepare(msm_mdss->num_clocks, msm_mdss->clocks);
> + msm_mdss_icc_request_bw(msm_mdss, 0);
>
> return 0;
> }
> @@ -271,6 +321,13 @@ static struct msm_mdss *msm_mdss_init(struct platform_device *pdev, bool is_mdp5
>
> dev_dbg(&pdev->dev, "mapped mdss address space @%pK\n", msm_mdss->mmio);
>
> + ret = msm_mdss_parse_data_bus_icc_path(&pdev->dev, msm_mdss);
> + if (ret)
> + return ERR_PTR(ret);
> + ret = devm_add_action_or_reset(&pdev->dev, msm_mdss_put_icc_path, msm_mdss);
> + if (ret)
> + return ERR_PTR(ret);
> +
> if (is_mdp5)
> ret = mdp5_mdss_parse_clock(pdev, &msm_mdss->clocks);
> else
> --
> 2.36.1.255.ge46751e96f-goog
>

2022-06-01 20:51:20

by Abhinav Kumar

Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss

Hi Doug

On 5/31/2022 4:01 PM, Douglas Anderson wrote:
> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
> bandwidth") we fully moved interconnect stuff to the DPU driver. This
> had no change for sc7180 but _did_ have an impact for other SoCs. It
> made them match the sc7180 scheme.
>
> Unfortunately, the sc7180 scheme seems like it was a bit broken.
> Specifically the interconnect needs to be on for more than just the
> DPU driver's AXI bus. In the very least it also needs to be on for the
> DSI driver's AXI bus. This can be seen fairly easily by doing this on
> a ChromeOS sc7180-trogdor class device:
>
> set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
> sleep 10
> cd /sys/bus/platform/devices/ae94000.dsi/power
> echo on > control
>
> When you do that, you'll get a warning splat in the logs about
> "gcc_disp_hf_axi_clk status stuck at 'off'".
>
> One could argue that perhaps what I have done above is "illegal" and
> that it can't happen naturally in the system because in normal system
> usage the DPU is pretty much always on when DSI is on. That being
> said:
> * In official ChromeOS builds (admittedly a 5.4 kernel with backports)
> we have seen that splat at bootup.
> * Even though we don't use "autosuspend" for these components, we
> don't use the "put_sync" variants. Thus plausibly the DSI could stay
> "runtime enabled" past when the DPU is enabled. Technically we
> shouldn't do that if the DPU's suspend ends up yanking our clock.
>
> Let's change things such that the "bare minimum" request for the
> interconnect happens in the mdss driver again. That means that all of
> the children can assume that the interconnect is on at the minimum
> bandwidth. We'll then let the DPU request the higher amount that it
> wants.
>
> It should be noted that this isn't as hacky of a solution as it might
> initially appear. Specifically:
> * Since MDSS and DPU individually get their own references to the
> interconnect then the framework will actually handle aggregating
> them. The two drivers are _not_ clobbering each other.
> * When the Qualcomm interconnect driver aggregates it takes the max of
> all the peaks. Thus having MDSS request a peak, as we're doing here,
> won't actually change the total interconnect bandwidth (it won't be
> added to the request for the DPU). This perhaps explains why the
> "average" requested in MDSS was historically 0 since that one
> _would_ be added in.
>
> NOTE also that in the downstream ChromeOS 5.4 and 5.15 kernels, we're
> also seeing some RPMH hangs that are addressed by this fix. These
> hangs are showing up in the field and on _some_ devices with enough
> stress testing of suspend/resume. Specifically right at suspend time
> with a stack crawl that looks like this (from chromeos-5.15 tree):
> rpmh_write_batch+0x19c/0x240
> qcom_icc_bcm_voter_commit+0x210/0x420
> qcom_icc_set+0x28/0x38
> apply_constraints+0x70/0xa4
> icc_set_bw+0x150/0x24c
> dpu_runtime_resume+0x50/0x1c4
> pm_generic_runtime_resume+0x30/0x44
> __genpd_runtime_resume+0x68/0x7c
> genpd_runtime_resume+0x12c/0x20c
> __rpm_callback+0x98/0x138
> rpm_callback+0x30/0x88
> rpm_resume+0x370/0x4a0
> __pm_runtime_resume+0x80/0xb0
> dpu_kms_enable_commit+0x24/0x30
> msm_atomic_commit_tail+0x12c/0x630
> commit_tail+0xac/0x150
> drm_atomic_helper_commit+0x114/0x11c
> drm_atomic_commit+0x68/0x78
> drm_atomic_helper_disable_all+0x158/0x1c8
> drm_atomic_helper_suspend+0xc0/0x1c0
> drm_mode_config_helper_suspend+0x2c/0x60
> msm_pm_prepare+0x2c/0x40
> pm_generic_prepare+0x30/0x44
> genpd_prepare+0x80/0xd0
> device_prepare+0x78/0x17c
> dpm_prepare+0xb0/0x384
> dpm_suspend_start+0x34/0xc0
>
> We don't completely understand all the mechanisms in play, but the
> hang seemed to come and go with random factors. It's not terribly
> surprising that the hang is gone after this patch since the line of
> code that was failing is no longer present in the kernel.
>
> Fixes: a670ff578f1f ("drm/msm/dpu: always use mdp device to scale bandwidth")
> Fixes: c33b7c0389e1 ("drm/msm/dpu: add support for clk and bw scaling for display")
> Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>

We will test this out even on RB3/RB5 to make sure it boots up fine and
give Tested-by.

Thanks

Abhinav

> ---
>
> Changes in v2:
> - Don't set bandwidth in init.
>
> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
> drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
> 2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> index 2b9d931474e0..3025184053e0 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> @@ -49,8 +49,6 @@
> #define DPU_DEBUGFS_DIR "msm_dpu"
> #define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"
>
> -#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
> -
> static int dpu_kms_hw_init(struct msm_kms *kms);
> static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);
>
> @@ -1303,15 +1301,9 @@ static int __maybe_unused dpu_runtime_resume(struct device *dev)
> struct dpu_kms *dpu_kms = to_dpu_kms(priv->kms);
> struct drm_encoder *encoder;
> struct drm_device *ddev;
> - int i;
>
> ddev = dpu_kms->dev;
>
> - WARN_ON(!(dpu_kms->num_paths));
> - /* Min vote of BW is required before turning on AXI clk */
> - for (i = 0; i < dpu_kms->num_paths; i++)
> - icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));
> -
> rc = clk_bulk_prepare_enable(dpu_kms->num_clocks, dpu_kms->clocks);
> if (rc) {
> DPU_ERROR("clock enable failed rc:%d\n", rc);
> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
> index 0454a571adf7..e13c5c12b775 100644
> --- a/drivers/gpu/drm/msm/msm_mdss.c
> +++ b/drivers/gpu/drm/msm/msm_mdss.c
> @@ -5,6 +5,7 @@
>
> #include <linux/clk.h>
> #include <linux/delay.h>
> +#include <linux/interconnect.h>
> #include <linux/irq.h>
> #include <linux/irqchip.h>
> #include <linux/irqdesc.h>
> @@ -25,6 +26,8 @@
> #define UBWC_CTRL_2 0x150
> #define UBWC_PREDICTION_MODE 0x154
>
> +#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */
> +
> struct msm_mdss {
> struct device *dev;
>
> @@ -36,8 +39,47 @@ struct msm_mdss {
> unsigned long enabled_mask;
> struct irq_domain *domain;
> } irq_controller;
> + struct icc_path *path[2];
> + u32 num_paths;
> };
>
> +static int msm_mdss_parse_data_bus_icc_path(struct device *dev,
> + struct msm_mdss *msm_mdss)
> +{
> + struct icc_path *path0 = of_icc_get(dev, "mdp0-mem");
> + struct icc_path *path1 = of_icc_get(dev, "mdp1-mem");
> +
> + if (IS_ERR_OR_NULL(path0))
> + return PTR_ERR_OR_ZERO(path0);
> +
> + msm_mdss->path[0] = path0;
> + msm_mdss->num_paths = 1;
> +
> + if (!IS_ERR_OR_NULL(path1)) {
> + msm_mdss->path[1] = path1;
> + msm_mdss->num_paths++;
> + }
> +
> + return 0;
> +}
> +
> +static void msm_mdss_put_icc_path(void *data)
> +{
> + struct msm_mdss *msm_mdss = data;
> + int i;
> +
> + for (i = 0; i < msm_mdss->num_paths; i++)
> + icc_put(msm_mdss->path[i]);
> +}
> +
> +static void msm_mdss_icc_request_bw(struct msm_mdss *msm_mdss, unsigned long bw)
> +{
> + int i;
> +
> + for (i = 0; i < msm_mdss->num_paths; i++)
> + icc_set_bw(msm_mdss->path[i], 0, Bps_to_icc(bw));
> +}
> +
> static void msm_mdss_irq(struct irq_desc *desc)
> {
> struct msm_mdss *msm_mdss = irq_desc_get_handler_data(desc);
> @@ -136,6 +178,13 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
> {
> int ret;
>
> + /*
> + * Several components have AXI clocks that can only be turned on if
> + * the interconnect is enabled (non-zero bandwidth). Let's make sure
> + * that the interconnects are at least at a minimum amount.
> + */
> + msm_mdss_icc_request_bw(msm_mdss, MIN_IB_BW);
> +
> ret = clk_bulk_prepare_enable(msm_mdss->num_clocks, msm_mdss->clocks);
> if (ret) {
> dev_err(msm_mdss->dev, "clock enable failed, ret:%d\n", ret);
> @@ -178,6 +227,7 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
> static int msm_mdss_disable(struct msm_mdss *msm_mdss)
> {
> clk_bulk_disable_unprepare(msm_mdss->num_clocks, msm_mdss->clocks);
> + msm_mdss_icc_request_bw(msm_mdss, 0);
>
> return 0;
> }
> @@ -271,6 +321,13 @@ static struct msm_mdss *msm_mdss_init(struct platform_device *pdev, bool is_mdp5
>
> dev_dbg(&pdev->dev, "mapped mdss address space @%pK\n", msm_mdss->mmio);
>
> + ret = msm_mdss_parse_data_bus_icc_path(&pdev->dev, msm_mdss);
> + if (ret)
> + return ERR_PTR(ret);
> + ret = devm_add_action_or_reset(&pdev->dev, msm_mdss_put_icc_path, msm_mdss);
> + if (ret)
> + return ERR_PTR(ret);
> +
> if (is_mdp5)
> ret = mdp5_mdss_parse_clock(pdev, &msm_mdss->clocks);
> else

2022-06-01 21:38:29

by Dmitry Baryshkov

Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss

On Wed, 1 Jun 2022 at 20:18, Abhinav Kumar <[email protected]> wrote:
> On 6/1/2022 3:04 AM, Dmitry Baryshkov wrote:
> > On Wed, 1 Jun 2022 at 02:01, Douglas Anderson <[email protected]> wrote:
> >>
> >> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
> >> bandwidth") we fully moved interconnect stuff to the DPU driver. This
> >> had no change for sc7180 but _did_ have an impact for other SoCs. It
> >> made them match the sc7180 scheme.
> >
> > [skipped the description]
> >
> >>
> >> Changes in v2:
> >> - Don't set bandwidth in init.
> >>
> >> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
> >> drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
> >> 2 files changed, 57 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> >> index 2b9d931474e0..3025184053e0 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> >> @@ -49,8 +49,6 @@
> >> #define DPU_DEBUGFS_DIR "msm_dpu"
> >> #define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"
> >>
> >> -#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
> >> -
> >> static int dpu_kms_hw_init(struct msm_kms *kms);
> >> static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);
> >>
> >
> > [skipped]
> >
> >> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
> >> index 0454a571adf7..e13c5c12b775 100644
> >> --- a/drivers/gpu/drm/msm/msm_mdss.c
> >> +++ b/drivers/gpu/drm/msm/msm_mdss.c
> >> @@ -5,6 +5,7 @@
> >>
> >> #include <linux/clk.h>
> >> #include <linux/delay.h>
> >> +#include <linux/interconnect.h>
> >> #include <linux/irq.h>
> >> #include <linux/irqchip.h>
> >> #include <linux/irqdesc.h>
> >> @@ -25,6 +26,8 @@
> >> #define UBWC_CTRL_2 0x150
> >> #define UBWC_PREDICTION_MODE 0x154
> >>
> >> +#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */
> >
> > As msm_mdss is now used for both DPU and MDP5 devices, could you
> > please confirm that this value is valid for older devices too? E.g.
> > db410c or 8974
> >
> I need to check with Kalyan on this value (400MB) as I am unable to find
> documentation on this. Will update this thread when I do.
>
> So prior to this change 627dc55c273da ("drm/msm/disp/dpu1: icc path
> needs to be set before dpu runtime resume"), this value was coming from
> the hw catalog
>
> @@ -1191,10 +1193,10 @@ static int __maybe_unused
> dpu_runtime_resume(struct device *dev)
>
> ddev = dpu_kms->dev;
>
> + WARN_ON(!(dpu_kms->num_paths));
> /* Min vote of BW is required before turning on AXI clk */
> for (i = 0; i < dpu_kms->num_paths; i++)
> - icc_set_bw(dpu_kms->path[i], 0,
> - dpu_kms->catalog->perf.min_dram_ib);
> + icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));
>
> After this, we moved to a hard-coded value, I am not sure why.
>
> So nothing wrong with this change as such, the only question is whether
> this value is correct for older chips.
>
> But the question here is: are older chips even using icc?
>
> It seems like only sc7180 and RB3/RB5 are, unless I am mistaken.

We are not using it for msm8916 (though we most probably should). For
msm8996, the icc patches were by Yassine.

> So is there really any impact to the older chips with this change?
>
> If not, we should probably let this one go ahead and move back to
> catalog based approach while extending ICC for older chips.

Let's get this sorted out. I'm fine with 400 MBps, if that works for
all chipsets.
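
For reference, a minimal sketch of the unit arithmetic behind that figure. It assumes the kernel's Bps_to_icc() macro divides by 1000 (icc_set_bw() takes kBps); BPS_TO_ICC and min_ib_vote_kbps() are stand-in names made up here so the snippet builds in userspace, not kernel symbols.

```c
#include <stdint.h>

/*
 * Userspace stand-in (assumption) for the kernel's Bps_to_icc() macro,
 * which converts bytes per second to the kBps unit used by icc_set_bw().
 */
#define BPS_TO_ICC(x) ((x) / 1000)

/* Same value as the MIN_IB_BW constant added to msm_mdss.c by the patch. */
#define MIN_IB_BW 400000000UL

/* Hypothetical helper: the peak (ib) vote that ends up in icc_set_bw(). */
static inline uint32_t min_ib_vote_kbps(void)
{
	return (uint32_t)BPS_TO_ICC(MIN_IB_BW);
}
```

So the 400000000 Bps constant becomes a peak vote of 400000 kBps, i.e. 400 MBps.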

--
With best wishes
Dmitry

2022-06-01 23:05:16

by Abhinav Kumar

Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss



On 6/1/2022 12:58 PM, Dmitry Baryshkov wrote:
> On Wed, 1 Jun 2022 at 20:18, Abhinav Kumar <[email protected]> wrote:
>> On 6/1/2022 3:04 AM, Dmitry Baryshkov wrote:
>>> On Wed, 1 Jun 2022 at 02:01, Douglas Anderson <[email protected]> wrote:
>>>>
>>>> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
>>>> bandwidth") we fully moved interconnect stuff to the DPU driver. This
>>>> had no change for sc7180 but _did_ have an impact for other SoCs. It
>>>> made them match the sc7180 scheme.
>>>
>>> [skipped the description]
>>>
>>>>
>>>> Changes in v2:
>>>> - Don't set bandwidth in init.
>>>>
>>>> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
>>>> drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
>>>> 2 files changed, 57 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>>>> index 2b9d931474e0..3025184053e0 100644
>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>>>> @@ -49,8 +49,6 @@
>>>> #define DPU_DEBUGFS_DIR "msm_dpu"
>>>> #define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"
>>>>
>>>> -#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
>>>> -
>>>> static int dpu_kms_hw_init(struct msm_kms *kms);
>>>> static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);
>>>>
>>>
>>> [skipped]
>>>
>>>> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
>>>> index 0454a571adf7..e13c5c12b775 100644
>>>> --- a/drivers/gpu/drm/msm/msm_mdss.c
>>>> +++ b/drivers/gpu/drm/msm/msm_mdss.c
>>>> @@ -5,6 +5,7 @@
>>>>
>>>> #include <linux/clk.h>
>>>> #include <linux/delay.h>
>>>> +#include <linux/interconnect.h>
>>>> #include <linux/irq.h>
>>>> #include <linux/irqchip.h>
>>>> #include <linux/irqdesc.h>
>>>> @@ -25,6 +26,8 @@
>>>> #define UBWC_CTRL_2 0x150
>>>> #define UBWC_PREDICTION_MODE 0x154
>>>>
>>>> +#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */
>>>
>>> As msm_mdss is now used for both DPU and MDP5 devices, could you
>>> please confirm that this value is valid for older devices too? E.g.
>>> db410c or 8974
>>>
>> I need to check with Kalyan on this value (400MB) as I am unable to find
>> documentation on this. Will update this thread when I do.
>>
>> So prior to this change 627dc55c273da ("drm/msm/disp/dpu1: icc path
>> needs to be set before dpu runtime resume"), this value was coming from
>> the hw catalog
>>
>> @@ -1191,10 +1193,10 @@ static int __maybe_unused
>> dpu_runtime_resume(struct device *dev)
>>
>> ddev = dpu_kms->dev;
>>
>> + WARN_ON(!(dpu_kms->num_paths));
>> /* Min vote of BW is required before turning on AXI clk */
>> for (i = 0; i < dpu_kms->num_paths; i++)
>> - icc_set_bw(dpu_kms->path[i], 0,
>> - dpu_kms->catalog->perf.min_dram_ib);
>> + icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));
>>
>> After this, we moved to a hard-coded value, I am not sure why.
>>
>> So nothing wrong with this change as such, the only question is whether
>> this value is correct for older chips.
>>
>> But the question here is: are older chips even using icc?
>>
>> It seems like only sc7180 and RB3/RB5 are, unless I am mistaken.
>
> We are not using it for msm8916 (though we most probably should). For
> msm8996, the icc patches were by Yassine.
>
>> So is there really any impact to the older chips with this change?
>>
>> If not, we should probably let this one go ahead and move back to
>> catalog based approach while extending ICC for older chips.
>
> Let's get this sorted out. I'm fine with 400 MBps, if that works for
> all chipsets.
>

I can confirm, based on a discussion with my team, that a 400 MBps
minimum vote will work for all chipsets.

One additional thing to note, per a discussion with Doug on IRC: two ICC
paths now get created, one from mdp5's probe and the other from
msm_mdss_init().

The ICC driver will aggregate the votes and take the max for the second
parameter (IB).

So for normal use cases this will still work fine.
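
To illustrate the aggregation described above, here is a rough userspace model, not the actual interconnect provider code. It assumes what the patch description states: average (AB) votes are summed and peak (IB) votes take the max. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One consumer's request on a path, in kBps (hypothetical type). */
struct bw_vote {
	uint32_t avg_kbps;   /* "AB": average bandwidth */
	uint32_t peak_kbps;  /* "IB": peak bandwidth */
};

/*
 * Model of the aggregation: averages are added together, peaks are
 * combined by taking the maximum across all votes.
 */
static struct bw_vote aggregate_votes(const struct bw_vote *votes, size_t n)
{
	struct bw_vote agg = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		agg.avg_kbps += votes[i].avg_kbps;
		if (votes[i].peak_kbps > agg.peak_kbps)
			agg.peak_kbps = votes[i].peak_kbps;
	}

	return agg;
}
```

With an MDSS vote of { 0, 400000 } and a made-up DPU vote of { 1000000, 1600000 }, the model yields { 1000000, 1600000 }: the minimum vote from MDSS adds nothing to the total once the DPU asks for more, and it only takes effect when no higher peak is requested.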

Thanks

Abhinav


2022-06-01 23:10:26

by Dmitry Baryshkov

Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable back to mdss

On 01/06/2022 02:01, Douglas Anderson wrote:
> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
> bandwidth") we fully moved interconnect stuff to the DPU driver. This
> had no change for sc7180 but _did_ have an impact for other SoCs. It
> made them match the sc7180 scheme.
>
> Unfortunately, the sc7180 scheme seems like it was a bit broken.
> Specifically the interconnect needs to be on for more than just the
> DPU driver's AXI bus. In the very least it also needs to be on for the
> DSI driver's AXI bus. This can be seen fairly easily by doing this on
> a ChromeOS sc7180-trogdor class device:
>
> set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
> sleep 10
> cd /sys/bus/platform/devices/ae94000.dsi/power
> echo on > control
>
> When you do that, you'll get a warning splat in the logs about
> "gcc_disp_hf_axi_clk status stuck at 'off'".
>
> One could argue that perhaps what I have done above is "illegal" and
> that it can't happen naturally in the system because in normal system
> usage the DPU is pretty much always on when DSI is on. That being
> said:
> * In official ChromeOS builds (admittedly a 5.4 kernel with backports)
> we have seen that splat at bootup.
> * Even though we don't use "autosuspend" for these components, we
> don't use the "put_sync" variants. Thus plausibly the DSI could stay
> "runtime enabled" past when the DPU is enabled. Technically we
> shouldn't do that if the DPU's suspend ends up yanking our clock.
>
> Let's change things such that the "bare minimum" request for the
> interconnect happens in the mdss driver again. That means that all of
> the children can assume that the interconnect is on at the minimum
> bandwidth. We'll then let the DPU request the higher amount that it
> wants.
>
> It should be noted that this isn't as hacky of a solution as it might
> initially appear. Specifically:
> * Since MDSS and DPU individually get their own references to the
> interconnect then the framework will actually handle aggregating
> them. The two drivers are _not_ clobbering each other.
> * When the Qualcomm interconnect driver aggregates it takes the max of
> all the peaks. Thus having MDSS request a peak, as we're doing here,
> won't actually change the total interconnect bandwidth (it won't be
> added to the request for the DPU). This perhaps explains why the
> "average" requested in MDSS was historically 0 since that one
> _would_ be added in.
>
> NOTE also that in the downstream ChromeOS 5.4 and 5.15 kernels, we're
> also seeing some RPMH hangs that are addressed by this fix. These
> hangs are showing up in the field and on _some_ devices with enough
> stress testing of suspend/resume. Specifically right at suspend time
> with a stack crawl that looks like this (from chromeos-5.15 tree):
> rpmh_write_batch+0x19c/0x240
> qcom_icc_bcm_voter_commit+0x210/0x420
> qcom_icc_set+0x28/0x38
> apply_constraints+0x70/0xa4
> icc_set_bw+0x150/0x24c
> dpu_runtime_resume+0x50/0x1c4
> pm_generic_runtime_resume+0x30/0x44
> __genpd_runtime_resume+0x68/0x7c
> genpd_runtime_resume+0x12c/0x20c
> __rpm_callback+0x98/0x138
> rpm_callback+0x30/0x88
> rpm_resume+0x370/0x4a0
> __pm_runtime_resume+0x80/0xb0
> dpu_kms_enable_commit+0x24/0x30
> msm_atomic_commit_tail+0x12c/0x630
> commit_tail+0xac/0x150
> drm_atomic_helper_commit+0x114/0x11c
> drm_atomic_commit+0x68/0x78
> drm_atomic_helper_disable_all+0x158/0x1c8
> drm_atomic_helper_suspend+0xc0/0x1c0
> drm_mode_config_helper_suspend+0x2c/0x60
> msm_pm_prepare+0x2c/0x40
> pm_generic_prepare+0x30/0x44
> genpd_prepare+0x80/0xd0
> device_prepare+0x78/0x17c
> dpm_prepare+0xb0/0x384
> dpm_suspend_start+0x34/0xc0
>
> We don't completely understand all the mechanisms in play, but the
> hang seemed to come and go with random factors. It's not terribly
> surprising that the hang is gone after this patch since the line of
> code that was failing is no longer present in the kernel.
>
> Fixes: a670ff578f1f ("drm/msm/dpu: always use mdp device to scale bandwidth")
> Fixes: c33b7c0389e1 ("drm/msm/dpu: add support for clk and bw scaling for display")
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> Changes in v2:
> - Don't set bandwidth in init.

Reviewed-by: Dmitry Baryshkov <[email protected]>

>
> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 ----
> drivers/gpu/drm/msm/msm_mdss.c | 57 +++++++++++++++++++++++++
> 2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> index 2b9d931474e0..3025184053e0 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> @@ -49,8 +49,6 @@
> #define DPU_DEBUGFS_DIR "msm_dpu"
> #define DPU_DEBUGFS_HWMASKNAME "hw_log_mask"
>
> -#define MIN_IB_BW 400000000ULL /* Min ib vote 400MB */
> -
> static int dpu_kms_hw_init(struct msm_kms *kms);
> static void _dpu_kms_mmu_destroy(struct dpu_kms *dpu_kms);
>
> @@ -1303,15 +1301,9 @@ static int __maybe_unused dpu_runtime_resume(struct device *dev)
> struct dpu_kms *dpu_kms = to_dpu_kms(priv->kms);
> struct drm_encoder *encoder;
> struct drm_device *ddev;
> - int i;
>
> ddev = dpu_kms->dev;
>
> - WARN_ON(!(dpu_kms->num_paths));
> - /* Min vote of BW is required before turning on AXI clk */
> - for (i = 0; i < dpu_kms->num_paths; i++)
> - icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));
> -
> rc = clk_bulk_prepare_enable(dpu_kms->num_clocks, dpu_kms->clocks);
> if (rc) {
> DPU_ERROR("clock enable failed rc:%d\n", rc);
> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
> index 0454a571adf7..e13c5c12b775 100644
> --- a/drivers/gpu/drm/msm/msm_mdss.c
> +++ b/drivers/gpu/drm/msm/msm_mdss.c
> @@ -5,6 +5,7 @@
>
> #include <linux/clk.h>
> #include <linux/delay.h>
> +#include <linux/interconnect.h>
> #include <linux/irq.h>
> #include <linux/irqchip.h>
> #include <linux/irqdesc.h>
> @@ -25,6 +26,8 @@
> #define UBWC_CTRL_2 0x150
> #define UBWC_PREDICTION_MODE 0x154
>
> +#define MIN_IB_BW 400000000UL /* Min ib vote 400MB */
> +
> struct msm_mdss {
> struct device *dev;
>
> @@ -36,8 +39,47 @@ struct msm_mdss {
> unsigned long enabled_mask;
> struct irq_domain *domain;
> } irq_controller;
> + struct icc_path *path[2];
> + u32 num_paths;
> };
>
> +static int msm_mdss_parse_data_bus_icc_path(struct device *dev,
> + struct msm_mdss *msm_mdss)
> +{
> + struct icc_path *path0 = of_icc_get(dev, "mdp0-mem");
> + struct icc_path *path1 = of_icc_get(dev, "mdp1-mem");
> +
> + if (IS_ERR_OR_NULL(path0))
> + return PTR_ERR_OR_ZERO(path0);
> +
> + msm_mdss->path[0] = path0;
> + msm_mdss->num_paths = 1;
> +
> + if (!IS_ERR_OR_NULL(path1)) {
> + msm_mdss->path[1] = path1;
> + msm_mdss->num_paths++;
> + }
> +
> + return 0;
> +}
> +
> +static void msm_mdss_put_icc_path(void *data)
> +{
> + struct msm_mdss *msm_mdss = data;
> + int i;
> +
> + for (i = 0; i < msm_mdss->num_paths; i++)
> + icc_put(msm_mdss->path[i]);
> +}
> +
> +static void msm_mdss_icc_request_bw(struct msm_mdss *msm_mdss, unsigned long bw)
> +{
> + int i;
> +
> + for (i = 0; i < msm_mdss->num_paths; i++)
> + icc_set_bw(msm_mdss->path[i], 0, Bps_to_icc(bw));
> +}
> +
> static void msm_mdss_irq(struct irq_desc *desc)
> {
> struct msm_mdss *msm_mdss = irq_desc_get_handler_data(desc);
> @@ -136,6 +178,13 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
> {
> int ret;
>
> + /*
> + * Several components have AXI clocks that can only be turned on if
> + * the interconnect is enabled (non-zero bandwidth). Let's make sure
> + * that the interconnects are at least at a minimum amount.
> + */
> + msm_mdss_icc_request_bw(msm_mdss, MIN_IB_BW);
> +
> ret = clk_bulk_prepare_enable(msm_mdss->num_clocks, msm_mdss->clocks);
> if (ret) {
> dev_err(msm_mdss->dev, "clock enable failed, ret:%d\n", ret);
> @@ -178,6 +227,7 @@ static int msm_mdss_enable(struct msm_mdss *msm_mdss)
> static int msm_mdss_disable(struct msm_mdss *msm_mdss)
> {
> clk_bulk_disable_unprepare(msm_mdss->num_clocks, msm_mdss->clocks);
> + msm_mdss_icc_request_bw(msm_mdss, 0);
>
> return 0;
> }
> @@ -271,6 +321,13 @@ static struct msm_mdss *msm_mdss_init(struct platform_device *pdev, bool is_mdp5
>
> dev_dbg(&pdev->dev, "mapped mdss address space @%pK\n", msm_mdss->mmio);
>
> + ret = msm_mdss_parse_data_bus_icc_path(&pdev->dev, msm_mdss);
> + if (ret)
> + return ERR_PTR(ret);
> + ret = devm_add_action_or_reset(&pdev->dev, msm_mdss_put_icc_path, msm_mdss);
> + if (ret)
> + return ERR_PTR(ret);
> +
> if (is_mdp5)
> ret = mdp5_mdss_parse_clock(pdev, &msm_mdss->clocks);
> else


--
With best wishes
Dmitry