A few (independent) fixes to the MHI bus for issues that I have come across
while developing against the mainline code.
Jeffrey Hugo (3):
bus: mhi: core: Handle syserr during power_up
bus: mhi: core: Make sure to powerdown if mhi_sync_power_up fails
bus: mhi: core: Remove link_status() callback
drivers/bus/mhi/core/init.c | 6 ++----
drivers/bus/mhi/core/main.c | 5 ++---
drivers/bus/mhi/core/pm.c | 26 +++++++++++++++++++++++++-
include/linux/mhi.h | 2 --
4 files changed, 29 insertions(+), 10 deletions(-)
--
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
The MHI device may be in the syserr state when we attempt to init it in
power_up(). Since we have no local state, the handling is simple -
reset the device and wait for it to transition out of the reset state.
Signed-off-by: Jeffrey Hugo <[email protected]>
---
drivers/bus/mhi/core/pm.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 52690cb..cd6ba23 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -9,6 +9,7 @@
#include <linux/dma-direction.h>
#include <linux/dma-mapping.h>
#include <linux/interrupt.h>
+#include <linux/iopoll.h>
#include <linux/list.h>
#include <linux/mhi.h>
#include <linux/module.h>
@@ -760,6 +761,7 @@ static void mhi_deassert_dev_wake(struct mhi_controller *mhi_cntrl,
int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
{
+ enum mhi_state state;
enum mhi_ee_type current_ee;
enum dev_st_transition next_state;
struct device *dev = &mhi_cntrl->mhi_dev->dev;
@@ -829,6 +831,24 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
goto error_bhi_offset;
}
+ state = mhi_get_mhi_state(mhi_cntrl);
+ if (state == MHI_STATE_SYS_ERR) {
+ mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
+ ret = readl_poll_timeout(mhi_cntrl->regs + MHICTRL, val,
+ !(val & MHICTRL_RESET_MASK), 1000,
+ mhi_cntrl->timeout_ms * 1000);
+ if (ret) {
+ dev_info(dev, "Failed to reset syserr\n");
+ goto error_bhi_offset;
+ }
+
+ /*
+ * device cleares INTVEC as part of RESET processing,
+ * re-program it
+ */
+ mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
+ }
+
/* Transition to next state */
next_state = MHI_IN_PBL(current_ee) ?
DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
--
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
Powerdown is necessary if mhi_sync_power_up fails due to a timeout, to
clean up the resources. Otherwise a BUG could be triggered when
attempting to clean up MSIs because the IRQ is still active from a
request_irq().
Signed-off-by: Jeffrey Hugo <[email protected]>
---
drivers/bus/mhi/core/pm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index cd6ba23..1bfa334 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -922,7 +922,11 @@ int mhi_sync_power_up(struct mhi_controller *mhi_cntrl)
MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
msecs_to_jiffies(mhi_cntrl->timeout_ms));
- return (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -EIO;
+ ret = (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -EIO;
+
+ if (ret)
+ mhi_power_down(mhi_cntrl, false);
+ return ret;
}
EXPORT_SYMBOL(mhi_sync_power_up);
--
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
If the MHI core detects invalid data due to a PCI read, it calls into
the controller via link_status() to double check that the link is infact
down. All in all, this is pretty pointless, and racy. There are no good
reasons for this, and only drawbacks.
Its pointless because chances are, the controller is going to do the same
thing to determine if the link is down - attempt a PCI access and compare
the result. This does not make the link status decision any smarter.
Its racy because its possible that the link was down at the time of the
MHI core access, but then recovered before the controller access. In this
case, the controller will indicate the link is not down, and the MHI core
will precede to use a bad value as the MHI core does not attempt to retry
the access.
Retrying the access in the MHI core is a bad idea because again, it is
racy - what if the link is down again? Furthermore, there may be some
higher level state associated with the link status, that is now invalid
because the link went down.
The only reason why the MHI core could see "invalid" data when doing a PCI
access, that is actually valid, is if the register actually contained the
PCI spec defined sentinel for an invalid access. In this case, it is
arguable that the MHI implementation broken, and should be fixed, not
worked around.
Therefore, remove the link_status() callback before anyone attempts to
implement it.
Signed-off-by: Jeffrey Hugo <[email protected]>
---
drivers/bus/mhi/core/init.c | 6 ++----
drivers/bus/mhi/core/main.c | 5 ++---
include/linux/mhi.h | 2 --
3 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
index b38359c..2af08d57 100644
--- a/drivers/bus/mhi/core/init.c
+++ b/drivers/bus/mhi/core/init.c
@@ -812,10 +812,8 @@ int mhi_register_controller(struct mhi_controller *mhi_cntrl,
if (!mhi_cntrl)
return -EINVAL;
- if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put)
- return -EINVAL;
-
- if (!mhi_cntrl->status_cb || !mhi_cntrl->link_status)
+ if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put ||
+ !mhi_cntrl->status_cb)
return -EINVAL;
ret = parse_config(mhi_cntrl, config);
diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index eb4256b..473278b8 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -20,9 +20,8 @@ int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl,
{
u32 tmp = readl(base + offset);
- /* If there is any unexpected value, query the link status */
- if (PCI_INVALID_READ(tmp) &&
- mhi_cntrl->link_status(mhi_cntrl))
+ /* If the value is invalid, the link is down */
+ if (PCI_INVALID_READ(tmp))
return -EIO;
*out = tmp;
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index ad19960..be704a4 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -335,7 +335,6 @@ struct mhi_controller_config {
* @syserr_worker: System error worker
* @state_event: State change event
* @status_cb: CB function to notify power states of the device (required)
- * @link_status: CB function to query link status of the device (required)
* @wake_get: CB function to assert device wake (optional)
* @wake_put: CB function to de-assert device wake (optional)
* @wake_toggle: CB function to assert and de-assert device wake (optional)
@@ -417,7 +416,6 @@ struct mhi_controller {
void (*status_cb)(struct mhi_controller *mhi_cntrl,
enum mhi_callback cb);
- int (*link_status)(struct mhi_controller *mhi_cntrl);
void (*wake_get)(struct mhi_controller *mhi_cntrl, bool override);
void (*wake_put)(struct mhi_controller *mhi_cntrl, bool override);
void (*wake_toggle)(struct mhi_controller *mhi_cntrl);
--
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
Hi Jeff,
On Mon, Apr 06, 2020 at 03:04:37PM -0600, Jeffrey Hugo wrote:
> If the MHI core detects invalid data due to a PCI read, it calls into
> the controller via link_status() to double check that the link is infact
> down. All in all, this is pretty pointless, and racy. There are no good
> reasons for this, and only drawbacks.
>
> Its pointless because chances are, the controller is going to do the same
> thing to determine if the link is down - attempt a PCI access and compare
> the result. This does not make the link status decision any smarter.
>
> Its racy because its possible that the link was down at the time of the
> MHI core access, but then recovered before the controller access. In this
> case, the controller will indicate the link is not down, and the MHI core
> will precede to use a bad value as the MHI core does not attempt to retry
> the access.
>
> Retrying the access in the MHI core is a bad idea because again, it is
> racy - what if the link is down again? Furthermore, there may be some
> higher level state associated with the link status, that is now invalid
> because the link went down.
>
> The only reason why the MHI core could see "invalid" data when doing a PCI
> access, that is actually valid, is if the register actually contained the
> PCI spec defined sentinel for an invalid access. In this case, it is
> arguable that the MHI implementation broken, and should be fixed, not
> worked around.
>
> Therefore, remove the link_status() callback before anyone attempts to
> implement it.
>
> Signed-off-by: Jeffrey Hugo <[email protected]>
LGTM. But as per the IRC discussion I'd like the mhi_reg_read() to be
implemented as a callback in mhi_controller struct inorder to truly make MHI
a PCI agnostic bus.
Since we don't have any controller driver in mainline, I think it is the
good time to do this change.
For this,
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Thanks,
Mani
> ---
> drivers/bus/mhi/core/init.c | 6 ++----
> drivers/bus/mhi/core/main.c | 5 ++---
> include/linux/mhi.h | 2 --
> 3 files changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
> index b38359c..2af08d57 100644
> --- a/drivers/bus/mhi/core/init.c
> +++ b/drivers/bus/mhi/core/init.c
> @@ -812,10 +812,8 @@ int mhi_register_controller(struct mhi_controller *mhi_cntrl,
> if (!mhi_cntrl)
> return -EINVAL;
>
> - if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put)
> - return -EINVAL;
> -
> - if (!mhi_cntrl->status_cb || !mhi_cntrl->link_status)
> + if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put ||
> + !mhi_cntrl->status_cb)
> return -EINVAL;
>
> ret = parse_config(mhi_cntrl, config);
> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
> index eb4256b..473278b8 100644
> --- a/drivers/bus/mhi/core/main.c
> +++ b/drivers/bus/mhi/core/main.c
> @@ -20,9 +20,8 @@ int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl,
> {
> u32 tmp = readl(base + offset);
>
> - /* If there is any unexpected value, query the link status */
> - if (PCI_INVALID_READ(tmp) &&
> - mhi_cntrl->link_status(mhi_cntrl))
> + /* If the value is invalid, the link is down */
> + if (PCI_INVALID_READ(tmp))
> return -EIO;
>
> *out = tmp;
> diff --git a/include/linux/mhi.h b/include/linux/mhi.h
> index ad19960..be704a4 100644
> --- a/include/linux/mhi.h
> +++ b/include/linux/mhi.h
> @@ -335,7 +335,6 @@ struct mhi_controller_config {
> * @syserr_worker: System error worker
> * @state_event: State change event
> * @status_cb: CB function to notify power states of the device (required)
> - * @link_status: CB function to query link status of the device (required)
> * @wake_get: CB function to assert device wake (optional)
> * @wake_put: CB function to de-assert device wake (optional)
> * @wake_toggle: CB function to assert and de-assert device wake (optional)
> @@ -417,7 +416,6 @@ struct mhi_controller {
>
> void (*status_cb)(struct mhi_controller *mhi_cntrl,
> enum mhi_callback cb);
> - int (*link_status)(struct mhi_controller *mhi_cntrl);
> void (*wake_get)(struct mhi_controller *mhi_cntrl, bool override);
> void (*wake_put)(struct mhi_controller *mhi_cntrl, bool override);
> void (*wake_toggle)(struct mhi_controller *mhi_cntrl);
> --
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.
On Mon, Apr 06, 2020 at 03:04:36PM -0600, Jeffrey Hugo wrote:
> Powerdown is necessary if mhi_sync_power_up fails due to a timeout, to
> clean up the resources. Otherwise a BUG could be triggered when
> attempting to clean up MSIs because the IRQ is still active from a
> request_irq().
>
> Signed-off-by: Jeffrey Hugo <[email protected]>
> ---
> drivers/bus/mhi/core/pm.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index cd6ba23..1bfa334 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -922,7 +922,11 @@ int mhi_sync_power_up(struct mhi_controller *mhi_cntrl)
> MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
> msecs_to_jiffies(mhi_cntrl->timeout_ms));
>
> - return (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -EIO;
> + ret = (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -EIO;
> +
> + if (ret)
> + mhi_power_down(mhi_cntrl, false);
> + return ret;
I'd prefer the style of,
```
statement
if (cond)
statement
return
```
Please stick to this. The change itself looks good.
Thanks,
Mani
> }
> EXPORT_SYMBOL(mhi_sync_power_up);
>
> --
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.
On Mon, Apr 06, 2020 at 03:04:35PM -0600, Jeffrey Hugo wrote:
> The MHI device may be in the syserr state when we attempt to init it in
> power_up(). Since we have no local state, the handling is simple -
> reset the device and wait for it to transition out of the reset state.
>
> Signed-off-by: Jeffrey Hugo <[email protected]>
> ---
> drivers/bus/mhi/core/pm.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index 52690cb..cd6ba23 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -9,6 +9,7 @@
> #include <linux/dma-direction.h>
> #include <linux/dma-mapping.h>
> #include <linux/interrupt.h>
> +#include <linux/iopoll.h>
> #include <linux/list.h>
> #include <linux/mhi.h>
> #include <linux/module.h>
> @@ -760,6 +761,7 @@ static void mhi_deassert_dev_wake(struct mhi_controller *mhi_cntrl,
>
> int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
> {
> + enum mhi_state state;
> enum mhi_ee_type current_ee;
> enum dev_st_transition next_state;
> struct device *dev = &mhi_cntrl->mhi_dev->dev;
> @@ -829,6 +831,24 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
> goto error_bhi_offset;
> }
>
> + state = mhi_get_mhi_state(mhi_cntrl);
> + if (state == MHI_STATE_SYS_ERR) {
> + mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
> + ret = readl_poll_timeout(mhi_cntrl->regs + MHICTRL, val,
> + !(val & MHICTRL_RESET_MASK), 1000,
Hmm. Do we really need a max 1ms delay between each read? I'd prefer to have
100ns to reduce the wait time.
> + mhi_cntrl->timeout_ms * 1000);
> + if (ret) {
> + dev_info(dev, "Failed to reset syserr\n");
dev_info(dev, "Failed to reset MHI due to syserr state\n"); ?
Thanks,
Mani
> + goto error_bhi_offset;
> + }
> +
> + /*
> + * device cleares INTVEC as part of RESET processing,
> + * re-program it
> + */
> + mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
> + }
> +
> /* Transition to next state */
> next_state = MHI_IN_PBL(current_ee) ?
> DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
> --
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.
On 4/6/2020 11:58 PM, Manivannan Sadhasivam wrote:
> Hi Jeff,
>
> On Mon, Apr 06, 2020 at 03:04:37PM -0600, Jeffrey Hugo wrote:
>> If the MHI core detects invalid data due to a PCI read, it calls into
>> the controller via link_status() to double check that the link is infact
>> down. All in all, this is pretty pointless, and racy. There are no good
>> reasons for this, and only drawbacks.
>>
>> Its pointless because chances are, the controller is going to do the same
>> thing to determine if the link is down - attempt a PCI access and compare
>> the result. This does not make the link status decision any smarter.
>>
>> Its racy because its possible that the link was down at the time of the
>> MHI core access, but then recovered before the controller access. In this
>> case, the controller will indicate the link is not down, and the MHI core
>> will precede to use a bad value as the MHI core does not attempt to retry
>> the access.
>>
>> Retrying the access in the MHI core is a bad idea because again, it is
>> racy - what if the link is down again? Furthermore, there may be some
>> higher level state associated with the link status, that is now invalid
>> because the link went down.
>>
>> The only reason why the MHI core could see "invalid" data when doing a PCI
>> access, that is actually valid, is if the register actually contained the
>> PCI spec defined sentinel for an invalid access. In this case, it is
>> arguable that the MHI implementation broken, and should be fixed, not
>> worked around.
>>
>> Therefore, remove the link_status() callback before anyone attempts to
>> implement it.
>>
>> Signed-off-by: Jeffrey Hugo <[email protected]>
>
> LGTM. But as per the IRC discussion I'd like the mhi_reg_read() to be
> implemented as a callback in mhi_controller struct inorder to truly make MHI
> a PCI agnostic bus.
>
> Since we don't have any controller driver in mainline, I think it is the
> good time to do this change.
No problem. I thought you might prefer that approach, hence the
discussion. :)
Do you want that included in this change, or as a follow up?
>
> For this,
>
> Reviewed-by: Manivannan Sadhasivam <[email protected]>
>
> Thanks,
> Mani
>
>> ---
>> drivers/bus/mhi/core/init.c | 6 ++----
>> drivers/bus/mhi/core/main.c | 5 ++---
>> include/linux/mhi.h | 2 --
>> 3 files changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
>> index b38359c..2af08d57 100644
>> --- a/drivers/bus/mhi/core/init.c
>> +++ b/drivers/bus/mhi/core/init.c
>> @@ -812,10 +812,8 @@ int mhi_register_controller(struct mhi_controller *mhi_cntrl,
>> if (!mhi_cntrl)
>> return -EINVAL;
>>
>> - if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put)
>> - return -EINVAL;
>> -
>> - if (!mhi_cntrl->status_cb || !mhi_cntrl->link_status)
>> + if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put ||
>> + !mhi_cntrl->status_cb)
>> return -EINVAL;
>>
>> ret = parse_config(mhi_cntrl, config);
>> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
>> index eb4256b..473278b8 100644
>> --- a/drivers/bus/mhi/core/main.c
>> +++ b/drivers/bus/mhi/core/main.c
>> @@ -20,9 +20,8 @@ int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl,
>> {
>> u32 tmp = readl(base + offset);
>>
>> - /* If there is any unexpected value, query the link status */
>> - if (PCI_INVALID_READ(tmp) &&
>> - mhi_cntrl->link_status(mhi_cntrl))
>> + /* If the value is invalid, the link is down */
>> + if (PCI_INVALID_READ(tmp))
>> return -EIO;
>>
>> *out = tmp;
>> diff --git a/include/linux/mhi.h b/include/linux/mhi.h
>> index ad19960..be704a4 100644
>> --- a/include/linux/mhi.h
>> +++ b/include/linux/mhi.h
>> @@ -335,7 +335,6 @@ struct mhi_controller_config {
>> * @syserr_worker: System error worker
>> * @state_event: State change event
>> * @status_cb: CB function to notify power states of the device (required)
>> - * @link_status: CB function to query link status of the device (required)
>> * @wake_get: CB function to assert device wake (optional)
>> * @wake_put: CB function to de-assert device wake (optional)
>> * @wake_toggle: CB function to assert and de-assert device wake (optional)
>> @@ -417,7 +416,6 @@ struct mhi_controller {
>>
>> void (*status_cb)(struct mhi_controller *mhi_cntrl,
>> enum mhi_callback cb);
>> - int (*link_status)(struct mhi_controller *mhi_cntrl);
>> void (*wake_get)(struct mhi_controller *mhi_cntrl, bool override);
>> void (*wake_put)(struct mhi_controller *mhi_cntrl, bool override);
>> void (*wake_toggle)(struct mhi_controller *mhi_cntrl);
>> --
>> Qualcomm Technologies, Inc. is a member of the
>> Code Aurora Forum, a Linux Foundation Collaborative Project.
--
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
On 4/7/2020 12:14 AM, Manivannan Sadhasivam wrote:
> On Mon, Apr 06, 2020 at 03:04:36PM -0600, Jeffrey Hugo wrote:
>> Powerdown is necessary if mhi_sync_power_up fails due to a timeout, to
>> clean up the resources. Otherwise a BUG could be triggered when
>> attempting to clean up MSIs because the IRQ is still active from a
>> request_irq().
>>
>> Signed-off-by: Jeffrey Hugo <[email protected]>
>> ---
>> drivers/bus/mhi/core/pm.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
>> index cd6ba23..1bfa334 100644
>> --- a/drivers/bus/mhi/core/pm.c
>> +++ b/drivers/bus/mhi/core/pm.c
>> @@ -922,7 +922,11 @@ int mhi_sync_power_up(struct mhi_controller *mhi_cntrl)
>> MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
>> msecs_to_jiffies(mhi_cntrl->timeout_ms));
>>
>> - return (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -EIO;
>> + ret = (MHI_IN_MISSION_MODE(mhi_cntrl->ee)) ? 0 : -EIO;
>> +
>> + if (ret)
>> + mhi_power_down(mhi_cntrl, false);
>> + return ret;
>
> I'd prefer the style of,
>
> ```
> statement
> if (cond)
> statement
>
> return
> ```
>
> Please stick to this. The change itself looks good.
>
Sure, will do.
--
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
On 4/7/2020 12:26 AM, Manivannan Sadhasivam wrote:
> On Mon, Apr 06, 2020 at 03:04:35PM -0600, Jeffrey Hugo wrote:
>> The MHI device may be in the syserr state when we attempt to init it in
>> power_up(). Since we have no local state, the handling is simple -
>> reset the device and wait for it to transition out of the reset state.
>>
>> Signed-off-by: Jeffrey Hugo <[email protected]>
>> ---
>> drivers/bus/mhi/core/pm.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
>> index 52690cb..cd6ba23 100644
>> --- a/drivers/bus/mhi/core/pm.c
>> +++ b/drivers/bus/mhi/core/pm.c
>> @@ -9,6 +9,7 @@
>> #include <linux/dma-direction.h>
>> #include <linux/dma-mapping.h>
>> #include <linux/interrupt.h>
>> +#include <linux/iopoll.h>
>> #include <linux/list.h>
>> #include <linux/mhi.h>
>> #include <linux/module.h>
>> @@ -760,6 +761,7 @@ static void mhi_deassert_dev_wake(struct mhi_controller *mhi_cntrl,
>>
>> int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
>> {
>> + enum mhi_state state;
>> enum mhi_ee_type current_ee;
>> enum dev_st_transition next_state;
>> struct device *dev = &mhi_cntrl->mhi_dev->dev;
>> @@ -829,6 +831,24 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
>> goto error_bhi_offset;
>> }
>>
>> + state = mhi_get_mhi_state(mhi_cntrl);
>> + if (state == MHI_STATE_SYS_ERR) {
>> + mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
>> + ret = readl_poll_timeout(mhi_cntrl->regs + MHICTRL, val,
>> + !(val & MHICTRL_RESET_MASK), 1000,
>
> Hmm. Do we really need a max 1ms delay between each read? I'd prefer to have
> 100ns to reduce the wait time.
I assume you mean 100us since that's the units of the parameter, and
usleep_range is the actual delay mechanism. Please correct me if that
is a bad assumption.
I chose 1ms to try to avoid flooding the bus, since on one system we
care about, the round trip time was observed to be ~1ms. However, that
is fairly arbitrary, so a factor of 10 reduction don't seem like a
significant issue.
>
>> + mhi_cntrl->timeout_ms * 1000);
>> + if (ret) {
>> + dev_info(dev, "Failed to reset syserr\n");
>
> dev_info(dev, "Failed to reset MHI due to syserr state\n"); ?
>
Ah yes, that is clearer. Thanks
--
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
On Tue, Apr 07, 2020 at 09:03:28AM -0600, Jeffrey Hugo wrote:
> On 4/6/2020 11:58 PM, Manivannan Sadhasivam wrote:
> > Hi Jeff,
> >
> > On Mon, Apr 06, 2020 at 03:04:37PM -0600, Jeffrey Hugo wrote:
> > > If the MHI core detects invalid data due to a PCI read, it calls into
> > > the controller via link_status() to double check that the link is infact
> > > down. All in all, this is pretty pointless, and racy. There are no good
> > > reasons for this, and only drawbacks.
> > >
> > > Its pointless because chances are, the controller is going to do the same
> > > thing to determine if the link is down - attempt a PCI access and compare
> > > the result. This does not make the link status decision any smarter.
> > >
> > > Its racy because its possible that the link was down at the time of the
> > > MHI core access, but then recovered before the controller access. In this
> > > case, the controller will indicate the link is not down, and the MHI core
> > > will precede to use a bad value as the MHI core does not attempt to retry
> > > the access.
> > >
> > > Retrying the access in the MHI core is a bad idea because again, it is
> > > racy - what if the link is down again? Furthermore, there may be some
> > > higher level state associated with the link status, that is now invalid
> > > because the link went down.
> > >
> > > The only reason why the MHI core could see "invalid" data when doing a PCI
> > > access, that is actually valid, is if the register actually contained the
> > > PCI spec defined sentinel for an invalid access. In this case, it is
> > > arguable that the MHI implementation broken, and should be fixed, not
> > > worked around.
> > >
> > > Therefore, remove the link_status() callback before anyone attempts to
> > > implement it.
> > >
> > > Signed-off-by: Jeffrey Hugo <[email protected]>
> >
> > LGTM. But as per the IRC discussion I'd like the mhi_reg_read() to be
> > implemented as a callback in mhi_controller struct inorder to truly make MHI
> > a PCI agnostic bus.
> >
> > Since we don't have any controller driver in mainline, I think it is the
> > good time to do this change.
>
> No problem. I thought you might prefer that approach, hence the discussion.
> :)
>
> Do you want that included in this change, or as a follow up?
>
You can add a patch in this series itself.
Thanks,
Mani
> >
> > For this,
> >
> > Reviewed-by: Manivannan Sadhasivam <[email protected]>
> >
> > Thanks,
> > Mani
> >
> > > ---
> > > drivers/bus/mhi/core/init.c | 6 ++----
> > > drivers/bus/mhi/core/main.c | 5 ++---
> > > include/linux/mhi.h | 2 --
> > > 3 files changed, 4 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
> > > index b38359c..2af08d57 100644
> > > --- a/drivers/bus/mhi/core/init.c
> > > +++ b/drivers/bus/mhi/core/init.c
> > > @@ -812,10 +812,8 @@ int mhi_register_controller(struct mhi_controller *mhi_cntrl,
> > > if (!mhi_cntrl)
> > > return -EINVAL;
> > > - if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put)
> > > - return -EINVAL;
> > > -
> > > - if (!mhi_cntrl->status_cb || !mhi_cntrl->link_status)
> > > + if (!mhi_cntrl->runtime_get || !mhi_cntrl->runtime_put ||
> > > + !mhi_cntrl->status_cb)
> > > return -EINVAL;
> > > ret = parse_config(mhi_cntrl, config);
> > > diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
> > > index eb4256b..473278b8 100644
> > > --- a/drivers/bus/mhi/core/main.c
> > > +++ b/drivers/bus/mhi/core/main.c
> > > @@ -20,9 +20,8 @@ int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl,
> > > {
> > > u32 tmp = readl(base + offset);
> > > - /* If there is any unexpected value, query the link status */
> > > - if (PCI_INVALID_READ(tmp) &&
> > > - mhi_cntrl->link_status(mhi_cntrl))
> > > + /* If the value is invalid, the link is down */
> > > + if (PCI_INVALID_READ(tmp))
> > > return -EIO;
> > > *out = tmp;
> > > diff --git a/include/linux/mhi.h b/include/linux/mhi.h
> > > index ad19960..be704a4 100644
> > > --- a/include/linux/mhi.h
> > > +++ b/include/linux/mhi.h
> > > @@ -335,7 +335,6 @@ struct mhi_controller_config {
> > > * @syserr_worker: System error worker
> > > * @state_event: State change event
> > > * @status_cb: CB function to notify power states of the device (required)
> > > - * @link_status: CB function to query link status of the device (required)
> > > * @wake_get: CB function to assert device wake (optional)
> > > * @wake_put: CB function to de-assert device wake (optional)
> > > * @wake_toggle: CB function to assert and de-assert device wake (optional)
> > > @@ -417,7 +416,6 @@ struct mhi_controller {
> > > void (*status_cb)(struct mhi_controller *mhi_cntrl,
> > > enum mhi_callback cb);
> > > - int (*link_status)(struct mhi_controller *mhi_cntrl);
> > > void (*wake_get)(struct mhi_controller *mhi_cntrl, bool override);
> > > void (*wake_put)(struct mhi_controller *mhi_cntrl, bool override);
> > > void (*wake_toggle)(struct mhi_controller *mhi_cntrl);
> > > --
> > > Qualcomm Technologies, Inc. is a member of the
> > > Code Aurora Forum, a Linux Foundation Collaborative Project.
>
>
> --
> Jeffrey Hugo
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.
On Tue, Apr 07, 2020 at 09:11:33AM -0600, Jeffrey Hugo wrote:
> On 4/7/2020 12:26 AM, Manivannan Sadhasivam wrote:
> > On Mon, Apr 06, 2020 at 03:04:35PM -0600, Jeffrey Hugo wrote:
> > > The MHI device may be in the syserr state when we attempt to init it in
> > > power_up(). Since we have no local state, the handling is simple -
> > > reset the device and wait for it to transition out of the reset state.
> > >
> > > Signed-off-by: Jeffrey Hugo <[email protected]>
> > > ---
> > > drivers/bus/mhi/core/pm.c | 20 ++++++++++++++++++++
> > > 1 file changed, 20 insertions(+)
> > >
> > > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> > > index 52690cb..cd6ba23 100644
> > > --- a/drivers/bus/mhi/core/pm.c
> > > +++ b/drivers/bus/mhi/core/pm.c
> > > @@ -9,6 +9,7 @@
> > > #include <linux/dma-direction.h>
> > > #include <linux/dma-mapping.h>
> > > #include <linux/interrupt.h>
> > > +#include <linux/iopoll.h>
> > > #include <linux/list.h>
> > > #include <linux/mhi.h>
> > > #include <linux/module.h>
> > > @@ -760,6 +761,7 @@ static void mhi_deassert_dev_wake(struct mhi_controller *mhi_cntrl,
> > > int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
> > > {
> > > + enum mhi_state state;
> > > enum mhi_ee_type current_ee;
> > > enum dev_st_transition next_state;
> > > struct device *dev = &mhi_cntrl->mhi_dev->dev;
> > > @@ -829,6 +831,24 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
> > > goto error_bhi_offset;
> > > }
> > > + state = mhi_get_mhi_state(mhi_cntrl);
> > > + if (state == MHI_STATE_SYS_ERR) {
> > > + mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
> > > + ret = readl_poll_timeout(mhi_cntrl->regs + MHICTRL, val,
> > > + !(val & MHICTRL_RESET_MASK), 1000,
> >
> > Hmm. Do we really need a max 1ms delay between each read? I'd prefer to have
> > 100ns to reduce the wait time.
>
>
> I assume you mean 100us since that's the units of the parameter, and
> usleep_range is the actual delay mechanism. Please correct me if that is a
> bad assumption.
>
Yep, it will get extended to:
usleep_range((__sleep_us >> 2) + 1, __sleep_us)
So the max delay (range) here would be 100ns. Anyway, I'm okay with 1ms. Please
see below.
> I chose 1ms to try to avoid flooding the bus, since on one system we care
> about, the round trip time was observed to be ~1ms. However, that is fairly
> arbitrary, so a factor of 10 reduction don't seem like a significant issue.
>
Hmm. I thought 1ms is too much wait time but just looked into some downstream
controller implementation and they seem to be allowing the timeout value upto
2000ms. So this is fine. Sorry for the noise!
Thanks,
Mani
> >
> > > + mhi_cntrl->timeout_ms * 1000);
> > > + if (ret) {
> > > + dev_info(dev, "Failed to reset syserr\n");
> >
> > dev_info(dev, "Failed to reset MHI due to syserr state\n"); ?
> >
>
> Ah yes, that is clearer. Thanks
>
> --
> Jeffrey Hugo
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.