With the QAIC driver in -next, I'd like to suggest some MHI changes that
are specific to AIC100 devices, but that perhaps also provide a framework
for other device oddities.
AIC100 devices technically violate the MHI spec in two ways. Sadly, these
issues come from the device hardware, so host SW needs to work around
them.
The first issue, presented in this series, has to do with the
SOC_HW_VERSION register. This register is supposed to be initialized by
the hardware, before MHI becomes accessible to the host, to contain a
version string for the SoC of the device. This could be used by the host
MHI controller software to identify and handle version-to-version changes.
The AIC100 hardware does not initialize this register, and thus it
contains garbage.
This would not normally be much of a problem - the QAIC driver would just
never use it. However, the MHI stack reads this register as part of the
init sequence, and if the controller reports that the register is
inaccessible, then the init sequence fails. On some AIC100 cards, the
garbage value ends up being 0xFFFFFFFF, which the PCIe spec defines as a
special value indicating that the access failed. The MHI controller cannot
tell whether that value indicates a PCIe link issue or is just garbage.
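To make the ambiguity concrete, here is a minimal user-space sketch of a controller register-read hook that treats an all-ones value as a failed access. The function name and the use of a plain pointer in place of an MMIO readl() are assumptions for illustration; this is not the QAIC driver's actual code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a controller's register read hook. A real
 * driver would use readl() on an MMIO address; here the "register" is a
 * plain pointer so the logic can run in user space.
 */
static int ffff_read_reg(const volatile uint32_t *addr, uint32_t *out)
{
	uint32_t val = *addr;

	/*
	 * A failed PCIe read (e.g. link down) typically completes as all
	 * ones. A register that simply holds 0xFFFFFFFF looks identical,
	 * so the hook has no choice but to report both as an I/O error.
	 */
	if (val == UINT32_MAX)
		return -EIO;

	*out = val;
	return 0;
}
```

Called on a register holding 0xFFFFFFFF, the hook returns -EIO whether the link is actually down or the register was simply never initialized.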
QAIC needs a way to tell MHI not to use this register. Other buses have a
quirk mechanism - a way to describe oddities in a particular
implementation that have some kind of workaround. Since this seems to be
the first need for such a thing in MHI, introduce a quirk framework.
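A minimal sketch of how such a quirk could gate the read during init, assuming a bitmask field on the controller. The struct, the helper names, and the bit position chosen for MHI_QUIRK_SOC_HW_VERSION_UNRELIABLE are hypothetical; only the quirk name comes from this series.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit position; only the flag name comes from the series. */
#define MHI_QUIRK_SOC_HW_VERSION_UNRELIABLE (1UL << 0)

struct quirk_cntrl {
	unsigned long quirks;         /* quirk bitmask set by the controller driver */
	uint32_t soc_hw_version_reg;  /* simulated SOC_HW_VERSION contents */
	uint32_t soc_hw_version;      /* value recorded during init */
};

/* Same all-ones rule a PCIe controller's read hook would apply. */
static int quirk_read_reg(const struct quirk_cntrl *c, uint32_t *out)
{
	if (c->soc_hw_version_reg == UINT32_MAX)
		return -EIO;
	*out = c->soc_hw_version_reg;
	return 0;
}

static int quirk_init(struct quirk_cntrl *c)
{
	/* Device advertises that the register is garbage: do not read it. */
	if (c->quirks & MHI_QUIRK_SOC_HW_VERSION_UNRELIABLE) {
		c->soc_hw_version = 0;
		return 0;
	}

	/* Normal path: a read failure aborts init. */
	return quirk_read_reg(c, &c->soc_hw_version);
}
```

With the same 0xFFFFFFFF register contents, a controller without the quirk fails init with -EIO, while one that sets the quirk skips the read and continues.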
The second issue AIC100 has involves the PK Hash registers. A solution for
this is expected to be proposed in the near future and is anticipated to
make use of the quirk framework proposed here. With PK Hash, there are two
oddities to handle. AIC100 does not initialize these registers until the
SBL is running, which is later than the spec indicates, and in practice
is after MHI reads/caches them. Also, AIC100 does not have enough
registers defined to fully report the 5 PK Hash slots, so a custom
reporting format is defined by the device.
v2:
-Fix build error
-Fix typo in commit text
Jeffrey Hugo (2):
bus: mhi: host: Add quirk framework and initial quirk
accel/qaic: Add MHI_QUIRK_SOC_HW_VERSION_UNRELIABLE
drivers/accel/qaic/mhi_controller.c | 1 +
drivers/bus/mhi/host/init.c | 13 +++++++++----
include/linux/mhi.h | 18 ++++++++++++++++++
3 files changed, 28 insertions(+), 4 deletions(-)
--
2.40.1
AIC100 does not initialize the SOC_HW_VERSION MHI register as expected.
Some instances of AIC100 are observed to have 0xFFFFFFFF in this register,
which makes the controller think that the link is down and return an error
up to MHI. This results in a failed initialization.
Allow these cards to initialize by advertising
MHI_QUIRK_SOC_HW_VERSION_UNRELIABLE in the MHI controller.
Change-Id: I62f31a063abf49e8f22ee2aa630d76babba83c8d
Signed-off-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Carl Vanderlip <[email protected]>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <[email protected]>
---
drivers/accel/qaic/mhi_controller.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/accel/qaic/mhi_controller.c b/drivers/accel/qaic/mhi_controller.c
index 5036e58e7235..2c85063d8cc6 100644
--- a/drivers/accel/qaic/mhi_controller.c
+++ b/drivers/accel/qaic/mhi_controller.c
@@ -400,6 +400,7 @@ static struct mhi_controller_config aic100_config = {
 	.event_cfg = aic100_events,
 	.use_bounce_buf = false,
 	.m2_no_db = false,
+	.quirks = MHI_QUIRK_SOC_HW_VERSION_UNRELIABLE,
 };
 
 static int mhi_read_reg(struct mhi_controller *mhi_cntrl, void __iomem *addr, u32 *out)
On Fri, May 19, 2023 at 10:39:00AM -0600, Jeffrey Hugo wrote:
> With the QAIC driver in -next, I'd like to suggest some MHI changes that
> are specific to AIC100 devices, but that perhaps also provide a framework
> for other device oddities.
>
> AIC100 devices technically violate the MHI spec in two ways. Sadly, these
> issues come from the device hardware, so host SW needs to work around
> them.
>
> The first issue, presented in this series, has to do with the
> SOC_HW_VERSION register. This register is supposed to be initialized by
> the hardware, before MHI becomes accessible to the host, to contain a
> version string for the SoC of the device. This could be used by the host
> MHI controller software to identify and handle version-to-version changes.
> The AIC100 hardware does not initialize this register, and thus it
> contains garbage.
>
> This would not normally be much of a problem - the QAIC driver would just
> never use it. However, the MHI stack reads this register as part of the
> init sequence, and if the controller reports that the register is
> inaccessible, then the init sequence fails. On some AIC100 cards, the
> garbage value ends up being 0xFFFFFFFF, which the PCIe spec defines as a
> special value indicating that the access failed. The MHI controller cannot
> tell whether that value indicates a PCIe link issue or is just garbage.
>
> QAIC needs a way to tell MHI not to use this register. Other buses have a
> quirk mechanism - a way to describe oddities in a particular
> implementation that have some kind of workaround. Since this seems to be
> the first need for such a thing in MHI, introduce a quirk framework.
>
> The second issue AIC100 has involves the PK Hash registers. A solution for
> this is expected to be proposed in the near future and is anticipated to
> make use of the quirk framework proposed here. With PK Hash, there are two
> oddities to handle. AIC100 does not initialize these registers until the
> SBL is running, which is later than the spec indicates, and in practice
> is after MHI reads/caches them. Also, AIC100 does not have enough
> registers defined to fully report the 5 PK Hash slots, so a custom
> reporting format is defined by the device.
>
Looking at the two issues you reported above, it looks to me like they can be
handled inside the aic100 mhi_controller driver itself. Since the MHI stack
lets controller drivers provide the read_reg callback, if some registers are
not supported by the device, then the callback can provide some fixed dummy
data emulating the register until the issue is fixed in the device (if ever).
A quirk framework could be useful if the device misbehaves against the protocol
itself, but for register issues like this, I think the controller driver can
handle it itself.
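A rough user-space sketch of that suggestion: the controller's read hook recognizes the SOC_HW_VERSION offset and returns a fixed value instead of trusting the hardware. The struct and function names are hypothetical, and the 0x224 offset is an assumption for illustration; the real offset comes from the MHI register definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed offset of SOC_HW_VERSION within the MHI register space. */
#define EMU_SOC_HW_VERSION_OFFS 0x224
/* Fixed value reported instead of the uninitialized hardware contents. */
#define EMU_SOC_HW_VERSION_FAKE 0x0u

struct emu_cntrl {
	const volatile uint32_t *regs; /* simulated MMIO base */
};

static int emu_read_reg(const struct emu_cntrl *c,
			const volatile uint32_t *addr, uint32_t *out)
{
	size_t offs = (size_t)((const volatile char *)addr -
			       (const volatile char *)c->regs);
	uint32_t val;

	/* Emulate the unreliable register rather than reading it. */
	if (offs == EMU_SOC_HW_VERSION_OFFS) {
		*out = EMU_SOC_HW_VERSION_FAKE;
		return 0;
	}

	/* Every other register keeps the usual all-ones failure check. */
	val = *addr;
	if (val == UINT32_MAX)
		return -EIO;

	*out = val;
	return 0;
}
```

This keeps the workaround entirely inside the controller driver: the MHI core sees a well-behaved register and needs no quirk flag.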
What do you think?
- Mani
--
மணிவண்ணன் சதாசிவம்
On 6/8/2023 5:59 AM, Manivannan Sadhasivam wrote:
> On Fri, May 19, 2023 at 10:39:00AM -0600, Jeffrey Hugo wrote:
>> With the QAIC driver in -next, I'd like to suggest some MHI changes that
>> are specific to AIC100 devices, but that perhaps also provide a framework
>> for other device oddities.
>>
>> AIC100 devices technically violate the MHI spec in two ways. Sadly, these
>> issues come from the device hardware, so host SW needs to work around
>> them.
>>
>> The first issue, presented in this series, has to do with the
>> SOC_HW_VERSION register. This register is supposed to be initialized by
>> the hardware, before MHI becomes accessible to the host, to contain a
>> version string for the SoC of the device. This could be used by the host
>> MHI controller software to identify and handle version-to-version changes.
>> The AIC100 hardware does not initialize this register, and thus it
>> contains garbage.
>>
>> This would not normally be much of a problem - the QAIC driver would just
>> never use it. However, the MHI stack reads this register as part of the
>> init sequence, and if the controller reports that the register is
>> inaccessible, then the init sequence fails. On some AIC100 cards, the
>> garbage value ends up being 0xFFFFFFFF, which the PCIe spec defines as a
>> special value indicating that the access failed. The MHI controller cannot
>> tell whether that value indicates a PCIe link issue or is just garbage.
>>
>> QAIC needs a way to tell MHI not to use this register. Other buses have a
>> quirk mechanism - a way to describe oddities in a particular
>> implementation that have some kind of workaround. Since this seems to be
>> the first need for such a thing in MHI, introduce a quirk framework.
>>
>> The second issue AIC100 has involves the PK Hash registers. A solution for
>> this is expected to be proposed in the near future and is anticipated to
>> make use of the quirk framework proposed here. With PK Hash, there are two
>> oddities to handle. AIC100 does not initialize these registers until the
>> SBL is running, which is later than the spec indicates, and in practice
>> is after MHI reads/caches them. Also, AIC100 does not have enough
>> registers defined to fully report the 5 PK Hash slots, so a custom
>> reporting format is defined by the device.
>>
>
> Looking at the two issues you reported above, it looks to me like they can be
> handled inside the aic100 mhi_controller driver itself. Since the MHI stack
> lets controller drivers provide the read_reg callback, if some registers are
> not supported by the device, then the callback can provide some fixed dummy
> data emulating the register until the issue is fixed in the device (if ever).
>
> A quirk framework could be useful if the device misbehaves against the protocol
> itself, but for register issues like this, I think the controller driver can
> handle it itself.
>
> What do you think?
I think for the HW_VERSION register, your suggestion is very good, and
something I plan to adopt.
For the PK Hash registers, I don't think it quite works.
For HW_VERSION, I can hard-code a valid value, or just stub it out to 0,
since it appears to be consumed only by the MHI controller and we don't use it.
The PK Hash registers are programmed into the SoC, and can be unique
from SoC to SoC. I don't see how the driver can provide valid, but
faked information for them. Also, the user consumes this data via
sysfs. We'd like to give the data to the user, and we can't fake it.
Also the data is dynamic.
Let's start with the dynamic data issue. Right now MHI reads these
registers once, and caches the values. I would propose a quirk to
change that behavior for AIC100, but does MHI really need to operate in
a "read once" mode? Would something actually break if MHI read the
registers every time the sysfs node is accessed? Then sysfs would
display the latest data, which would be beneficial to AIC100 and should
not be a behavior change for other devices which have static data (MHI
just displays the same data because it hasn't changed).
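A minimal sketch of the "read on every access" behavior: the show routine reads the registers each time it is called rather than returning values cached at init. The register count, output format, and names are hypothetical; in the kernel, the equivalent sysfs show callback would more naturally build its output with sysfs_emit_at().

```c
#include <stdint.h>
#include <stdio.h>

#define PKH_NUM_REGS 4 /* hypothetical register count */

struct pkh_dev {
	const volatile uint32_t *pk_hash_regs; /* simulated device registers */
};

/*
 * Format the current register contents into buf. Nothing is cached:
 * every call re-reads the registers, so values the device fills in
 * late (e.g. once SBL is running) show up on the next read.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int pkh_show(const struct pkh_dev *d, char *buf, size_t len)
{
	size_t used = 0;

	for (int i = 0; i < PKH_NUM_REGS; i++) {
		int n = snprintf(buf + used, len - used, "PK_HASH[%d]: 0x%x\n",
				 i, (unsigned int)d->pk_hash_regs[i]);

		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

For a device whose registers never change, repeated calls produce identical output, so dropping the cache is only visible on devices like AIC100 that populate the values late.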
Do you recall the reason behind making the PK Hash registers read once
and cached?
>
> - Mani
>
On Mon, Jun 26, 2023 at 11:15:56AM -0600, Jeffrey Hugo wrote:
> On 6/8/2023 5:59 AM, Manivannan Sadhasivam wrote:
> > On Fri, May 19, 2023 at 10:39:00AM -0600, Jeffrey Hugo wrote:
> > > With the QAIC driver in -next, I'd like to suggest some MHI changes that
> > > are specific to AIC100 devices, but that perhaps also provide a framework
> > > for other device oddities.
> > >
> > > AIC100 devices technically violate the MHI spec in two ways. Sadly, these
> > > issues come from the device hardware, so host SW needs to work around
> > > them.
> > >
> > > The first issue, presented in this series, has to do with the
> > > SOC_HW_VERSION register. This register is supposed to be initialized by
> > > the hardware, before MHI becomes accessible to the host, to contain a
> > > version string for the SoC of the device. This could be used by the host
> > > MHI controller software to identify and handle version-to-version changes.
> > > The AIC100 hardware does not initialize this register, and thus it
> > > contains garbage.
> > >
> > > This would not normally be much of a problem - the QAIC driver would just
> > > never use it. However, the MHI stack reads this register as part of the
> > > init sequence, and if the controller reports that the register is
> > > inaccessible, then the init sequence fails. On some AIC100 cards, the
> > > garbage value ends up being 0xFFFFFFFF, which the PCIe spec defines as a
> > > special value indicating that the access failed. The MHI controller cannot
> > > tell whether that value indicates a PCIe link issue or is just garbage.
> > >
> > > QAIC needs a way to tell MHI not to use this register. Other buses have a
> > > quirk mechanism - a way to describe oddities in a particular
> > > implementation that have some kind of workaround. Since this seems to be
> > > the first need for such a thing in MHI, introduce a quirk framework.
> > >
> > > The second issue AIC100 has involves the PK Hash registers. A solution for
> > > this is expected to be proposed in the near future and is anticipated to
> > > make use of the quirk framework proposed here. With PK Hash, there are two
> > > oddities to handle. AIC100 does not initialize these registers until the
> > > SBL is running, which is later than the spec indicates, and in practice
> > > is after MHI reads/caches them. Also, AIC100 does not have enough
> > > registers defined to fully report the 5 PK Hash slots, so a custom
> > > reporting format is defined by the device.
> > >
> >
> > Looking at the two issues you reported above, it looks to me like they can be
> > handled inside the aic100 mhi_controller driver itself. Since the MHI stack
> > lets controller drivers provide the read_reg callback, if some registers are
> > not supported by the device, then the callback can provide some fixed dummy
> > data emulating the register until the issue is fixed in the device (if ever).
> >
> > A quirk framework could be useful if the device misbehaves against the protocol
> > itself, but for register issues like this, I think the controller driver can
> > handle it itself.
> >
> > What do you think?
>
> I think for the HW_VERSION register, your suggestion is very good, and
> something I plan to adopt.
>
> For the PK Hash registers, I don't think it quite works.
>
> For HW_VERSION, I can hard-code a valid value, or just stub it out to 0,
> since it appears to be consumed only by the MHI controller and we don't use it.
>
> The PK Hash registers are programmed into the SoC, and can be unique from
> SoC to SoC. I don't see how the driver can provide valid, but faked
> information for them. Also, the user consumes this data via sysfs. We'd
> like to give the data to the user, and we can't fake it. Also the data is
> dynamic.
>
> Let's start with the dynamic data issue. Right now MHI reads these registers
> once, and caches the values. I would propose a quirk to change that
> behavior for AIC100, but does MHI really need to operate in a "read once"
> mode? Would something actually break if MHI read the registers every time
> the sysfs node is accessed? Then sysfs would display the latest data, which
> would be beneficial to AIC100 and should not be a behavior change for other
> devices which have static data (MHI just displays the same data because it
> hasn't changed).
>
> Do you recall the reason behind making the PK Hash registers read once and
> cached?
>
I don't see an issue with reading the PK hash dynamically. I think the intention
behind caching mostly came from the fact that it was static data.
So you can read it dynamically every time.
- Mani
> >
> > - Mani
> >