It appears that hardware does not like cacheable accesses to this
region. Trying to access this shared memory region as Normal Memory
leads to a secure interrupt, which causes an endless loop somewhere in
TrustZone.

The only reason it works right now is that the Qualcomm Hypervisor
maps the same region as Non-Cacheable memory in Stage 2 translation
tables. The issue manifests if we want to use another hypervisor (like
Xen or KVM), which does not know anything about those specific
mappings. This patch fixes the issue by mapping the shared memory as
Write-Through. This removes the dependency on correct mappings in Stage 2
tables.
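A minimal sketch of why the Stage 2 mapping matters, assuming FEAT_S2FWB is not in use and considering only Normal memory (Device types are ignored): the Arm architecture then combines Stage 1 and Stage 2 cacheability by taking the less cacheable of the two, so an EL1 Write-Back mapping under a Non-Cacheable Stage 2 mapping is effectively Non-Cacheable. The enum and function names below are illustrative only, not kernel or Xen APIs, and they model architectural attributes rather than memremap() flags.

```c
#include <assert.h>

/* Normal-memory cacheability, ordered from least to most cacheable.
 * Illustrative names, not taken from the kernel or Xen. */
enum cacheability { CACHE_NC = 0, CACHE_WT = 1, CACHE_WB = 2 };

/* Effective cacheability of an EL1 access translated by both stages,
 * assuming FEAT_S2FWB is disabled: the less cacheable attribute wins. */
static enum cacheability combine_stages(enum cacheability stage1,
					enum cacheability stage2)
{
	return stage1 < stage2 ? stage1 : stage2;
}
```

Under the Qualcomm hypervisor, combine_stages(CACHE_WB, CACHE_NC) gives CACHE_NC. If another hypervisor maps the region Write-Back at Stage 2 (an assumption about Xen/KVM behaviour here), the EL1 Write-Back mapping stays Write-Back, which is when the problem appears.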
I tested this on SA8155P with Xen.
Signed-off-by: Volodymyr Babchuk <[email protected]>
---
drivers/soc/qcom/cmd-db.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/cmd-db.c b/drivers/soc/qcom/cmd-db.c
index a5fd68411bed5..dd5ababdb476c 100644
--- a/drivers/soc/qcom/cmd-db.c
+++ b/drivers/soc/qcom/cmd-db.c
@@ -324,7 +324,7 @@ static int cmd_db_dev_probe(struct platform_device *pdev)
 		return -EINVAL;
 	}
 
-	cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WB);
+	cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WT);
 	if (!cmd_db_header) {
 		ret = -ENOMEM;
 		cmd_db_header = NULL;
--
2.43.0
On Wed, Mar 27, 2024 at 08:09:34PM +0000, Volodymyr Babchuk wrote:
> It appears that hardware does not like cacheable accesses to this
> region. Trying to access this shared memory region as Normal Memory
> leads to a secure interrupt, which causes an endless loop somewhere in
> TrustZone.
>
> The only reason it works right now is that the Qualcomm Hypervisor
> maps the same region as Non-Cacheable memory in Stage 2 translation
> tables. The issue manifests if we want to use another hypervisor (like
> Xen or KVM), which does not know anything about those specific
> mappings. This patch fixes the issue by mapping the shared memory as
> Write-Through. This removes the dependency on correct mappings in Stage 2
> tables.
>
> I tested this on SA8155P with Xen.
>
Hi!
I observe a similar issue while trying to boot Linux in EL2 after taking
over qcom's hyp on a sc7180 WoA device:
[ 0.337736] CPU: All CPU(s) started at EL2
(...)
[ 0.475135] Serial: AMBA PL011 UART driver
[ 0.479649] Internal error: synchronous external abort: 0000000096000410 [#1] PREEMPT SMP
[ 0.488053] Modules linked in:
[ 0.491213] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 6.7.0 #41
[ 0.497310] Hardware name: Acer Aspire 1 (DT)
[ 0.501800] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.508964] pc : cmd_db_dev_probe+0x38/0xc4
[ 0.513290] lr : cmd_db_dev_probe+0x2c/0xc4
[ 0.517606] sp : ffff8000817ebab0
[ 0.521019] x29: ffff8000817ebab0 x28: 0000000000000000 x27: ffff800081346050
<uart cuts out>
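The syndrome in the log can be decoded by hand. The sketch below extracts the Data Abort fields of ESR_ELx using the bit positions from the Arm ARM (EC [31:26], IL [25], ISS [24:0], and within the ISS: FnV [10], WnR [6], DFSC [5:0]); the struct and function names are illustrative. For 0x96000410 it yields EC 0x25 (Data Abort taken without a change in Exception level), DFSC 0x10 (synchronous External abort, not on a translation table walk), FnV 1 (FAR not valid) and WnR 0, which suggests, though does not prove, that the faulting access in cmd_db_dev_probe() was a read rather than a write.

```c
#include <assert.h>
#include <stdint.h>

/* Data Abort fields of ESR_ELx; illustrative struct, not a kernel type. */
struct dabt_fields {
	unsigned int ec;   /* Exception Class, ESR bits [31:26] */
	unsigned int il;   /* Instruction Length, bit [25] */
	uint32_t iss;      /* Instruction Specific Syndrome, bits [24:0] */
	unsigned int fnv;  /* FAR not Valid, ISS bit [10] */
	unsigned int wnr;  /* Write not Read, ISS bit [6] */
	unsigned int dfsc; /* Data Fault Status Code, ISS bits [5:0] */
};

static struct dabt_fields decode_dabt_esr(uint64_t esr)
{
	struct dabt_fields f;

	f.ec = (unsigned int)((esr >> 26) & 0x3f);
	f.il = (unsigned int)((esr >> 25) & 0x1);
	f.iss = (uint32_t)(esr & 0x1ffffff);
	f.fnv = (f.iss >> 10) & 0x1;
	f.wnr = (f.iss >> 6) & 0x1;
	f.dfsc = f.iss & 0x3f;
	return f;
}
```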
Unfortunately, this patch doesn't help in this case (I believe I even
tried the same or a similar change a while back when debugging this).

Currently I can work around this by relocating the cmd-db while
still running under qcom's hyp [1], but it would be nice to find a generic
solution that doesn't need pre-boot hacks...

AFAIK this is not observed on at least sc8280xp WoA devices, and I'd
assume cros is not affected because they don't use qcom's TZ and instead
use TF-A (which is more friendly overall, though it still uses qcom's
proprietary qtiseclib under the hood).
Nikita
[1] https://github.com/TravMurav/slbounce/blob/main/src/dtbhack_main.c#L17
On 3/28/2024 1:39 AM, Volodymyr Babchuk wrote:
> It appears that hardware does not like cacheable accesses to this
> region. Trying to access this shared memory region as Normal Memory
> leads to a secure interrupt, which causes an endless loop somewhere in
> TrustZone.
Linux does not write into the cmd-db region. This region is write
protected by the XPU. Making this region uncached magically solves the
XPU write fault issue.
Can you please include the above details?
>
> The only reason it works right now is that the Qualcomm Hypervisor
> maps the same region as Non-Cacheable memory in Stage 2 translation
> tables. The issue manifests if we want to use another hypervisor (like
> Xen or KVM), which does not know anything about those specific
> mappings. This patch fixes the issue by mapping the shared memory as
> Write-Through. This removes the dependency on correct mappings in Stage 2
> tables.
Using MEMREMAP_WC also resolves this on qcm6490; see the comment below.
>
> I tested this on SA8155P with Xen.
>
> Signed-off-by: Volodymyr Babchuk <[email protected]>
> ---
> drivers/soc/qcom/cmd-db.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/soc/qcom/cmd-db.c b/drivers/soc/qcom/cmd-db.c
> index a5fd68411bed5..dd5ababdb476c 100644
> --- a/drivers/soc/qcom/cmd-db.c
> +++ b/drivers/soc/qcom/cmd-db.c
> @@ -324,7 +324,7 @@ static int cmd_db_dev_probe(struct platform_device *pdev)
> return -EINVAL;
> }
>
> - cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WB);
> + cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WT);
Downstream, we have the following, which resolved a similar issue on qcm6490:
cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WC);
Downstream SA8155P also has MEMREMAP_WC. Can you please give it a try
on your device?
Thanks,
Maulik
On Thu, Mar 28, 2024 at 04:12:11PM +0500, Nikita Travkin wrote:
> On Wed, Mar 27, 2024 at 08:09:34PM +0000, Volodymyr Babchuk wrote:
> > It appears that hardware does not like cacheable accesses to this
> > region. Trying to access this shared memory region as Normal Memory
> > leads to a secure interrupt, which causes an endless loop somewhere in
> > TrustZone.
> >
> > The only reason it works right now is that the Qualcomm Hypervisor
> > maps the same region as Non-Cacheable memory in Stage 2 translation
> > tables. The issue manifests if we want to use another hypervisor (like
> > Xen or KVM), which does not know anything about those specific
> > mappings. This patch fixes the issue by mapping the shared memory as
> > Write-Through. This removes the dependency on correct mappings in Stage 2
> > tables.
> >
> > I tested this on SA8155P with Xen.
> >
>
> Hi!
>
> I observe a similar issue while trying to boot Linux in EL2 after taking
> over qcom's hyp on a sc7180 WoA device:
>
> [ 0.337736] CPU: All CPU(s) started at EL2
> (...)
> [ 0.475135] Serial: AMBA PL011 UART driver
> [ 0.479649] Internal error: synchronous external abort: 0000000096000410 [#1] PREEMPT SMP
> [ 0.488053] Modules linked in:
> [ 0.491213] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 6.7.0 #41
> [ 0.497310] Hardware name: Acer Aspire 1 (DT)
> [ 0.501800] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 0.508964] pc : cmd_db_dev_probe+0x38/0xc4
> [ 0.513290] lr : cmd_db_dev_probe+0x2c/0xc4
> [ 0.517606] sp : ffff8000817ebab0
> [ 0.521019] x29: ffff8000817ebab0 x28: 0000000000000000 x27: ffff800081346050
> <uart cuts out>
>
> Unfortunately, this patch doesn't help in this case (I believe I even
> tried the same or a similar change a while back when debugging this).
>
I'm sorry, it looks like I made a mistake in my tooling while testing
this patch, which I only realized after trying Maulik's suggestion...
Both _WT and _WC fix the issue I see on sc7180 WoA, so whether you keep
the patch as is or change it to _WC as suggested:
Tested-By: Nikita Travkin <[email protected]> # sc7180 WoA in EL2
Thanks for looking into this!
Nikita
> Currently I can work around this by relocating the cmd-db while
> still running under qcom's hyp [1], but it would be nice to find a generic
> solution that doesn't need pre-boot hacks...
>
> AFAIK this is not observed on at least sc8280xp WoA devices, and I'd
> assume cros is not affected because they don't use qcom's TZ and instead
> use TF-A (which is more friendly overall, though it still uses qcom's
> proprietary qtiseclib under the hood).
>
> Nikita
>
> [1] https://github.com/TravMurav/slbounce/blob/main/src/dtbhack_main.c#L17
>
Hi Maulik
"Maulik Shah (mkshah)" <[email protected]> writes:
> On 3/28/2024 1:39 AM, Volodymyr Babchuk wrote:
>> It appears that hardware does not like cacheable accesses to this
>> region. Trying to access this shared memory region as Normal Memory
>> leads to a secure interrupt, which causes an endless loop somewhere in
>> TrustZone.
>
> Linux does not write into the cmd-db region. This region is write
> protected by the XPU. Making this region uncached magically solves the
> XPU write fault issue.
>
> Can you please include the above details?
Sure, I'll add this to the next version.
>> The only reason it works right now is that the Qualcomm Hypervisor
>> maps the same region as Non-Cacheable memory in Stage 2 translation
>> tables. The issue manifests if we want to use another hypervisor (like
>> Xen or KVM), which does not know anything about those specific
>> mappings. This patch fixes the issue by mapping the shared memory as
>> Write-Through. This removes the dependency on correct mappings in Stage 2
>> tables.
>
> Using MEMREMAP_WC also resolves this on qcm6490; see the comment below.
>
>> I tested this on SA8155P with Xen.
>> Signed-off-by: Volodymyr Babchuk <[email protected]>
>> ---
>> drivers/soc/qcom/cmd-db.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/drivers/soc/qcom/cmd-db.c b/drivers/soc/qcom/cmd-db.c
>> index a5fd68411bed5..dd5ababdb476c 100644
>> --- a/drivers/soc/qcom/cmd-db.c
>> +++ b/drivers/soc/qcom/cmd-db.c
>> @@ -324,7 +324,7 @@ static int cmd_db_dev_probe(struct platform_device *pdev)
>> return -EINVAL;
>> }
>> - cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WB);
>> + cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WT);
>
> Downstream, we have the following, which resolved a similar issue on qcm6490:
>
> cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WC);
>
> Downstream SA8155P also has MEMREMAP_WC. Can you please give it a try
> on your device?
Yes, MEMREMAP_WC works as well. This raises the question: which type is
more correct? I don't have a deep understanding of QCOM internals, so it is
hard for me to answer this question.
--
WBR, Volodymyr
On 3/29/2024 3:49 AM, Volodymyr Babchuk wrote:
>
> Hi Maulik
>
> "Maulik Shah (mkshah)" <[email protected]> writes:
>
>> On 3/28/2024 1:39 AM, Volodymyr Babchuk wrote:
>>> It appears that hardware does not like cacheable accesses to this
>>> region. Trying to access this shared memory region as Normal Memory
>>> leads to a secure interrupt, which causes an endless loop somewhere in
>>> TrustZone.
>>
>> Linux does not write into the cmd-db region. This region is write
>> protected by the XPU. Making this region uncached magically solves the
>> XPU write fault issue.
>>
>> Can you please include the above details?
>
> Sure, I'll add this to the next version.
>
Thanks.
>>
>> Downstream, we have the following, which resolved a similar issue on qcm6490:
>>
>> cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WC);
>>
>> Downstream SA8155P also has MEMREMAP_WC. Can you please give it a try
>> on your device?
>
> Yes, MEMREMAP_WC works as well. This raises the question: which type is
> more correct? I don't have a deep understanding of QCOM internals, so it is
> hard for me to answer this question.
>
XPU may have falsely detected a clean cache eviction as a "write" into the
write-protected region, so using the uncached MEMREMAP_WC flag may be
helping here and is more correct in my understanding.
This can also be included in the commit message.
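To make the hypothesis above concrete, here is a toy model, not a description of real XPU or cache behaviour: a single cache line sits in front of the write-protected region, and, as hypothesized, the XPU treats the eviction of a clean line as a write. A cacheable mapping allocates the line on a read and so eventually triggers the fault; a non-cacheable mapping (as MEMREMAP_WC is described in this thread) never allocates it. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum line_state { LINE_INVALID, LINE_CLEAN };

struct toy_system {
	bool cacheable_mapping; /* WB-style mapping vs. WC-style mapping */
	enum line_state line;   /* the single cache line covering the region */
	int xpu_faults;         /* transactions the toy XPU treated as writes */
};

/* CPU load from the protected region. Linux never stores to it. */
static void toy_read(struct toy_system *s)
{
	if (s->cacheable_mapping)
		s->line = LINE_CLEAN; /* allocate a clean line, data unmodified */
	/* non-cacheable: data goes straight to the CPU, nothing is allocated */
}

/* The cache decides to evict whatever it holds for this region. */
static void toy_evict(struct toy_system *s)
{
	if (s->line != LINE_CLEAN)
		return;
	/* Hypothesis under test: the XPU misreads a clean eviction as a write. */
	s->xpu_faults++;
	s->line = LINE_INVALID;
}
```

In this model the fault count depends only on whether the mapping is cacheable, never on whether Linux writes, which is the property the hypothesis relies on.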
Thanks,
Maulik
Quoting Stephan Gerhold (2024-04-11 01:02:01)
> On Wed, Apr 10, 2024 at 10:12:37PM +0000, Volodymyr Babchuk wrote:
> > Stephan Gerhold <[email protected]> writes:
> > > On Wed, Mar 27, 2024 at 11:29:09PM +0000, Caleb Connolly wrote:
> > >> On 27/03/2024 21:06, Konrad Dybcio wrote:
> > >> > On 27.03.2024 10:04 PM, Volodymyr Babchuk wrote:
> > >> >> Konrad Dybcio <[email protected]> writes:
> > >> >>> On 27.03.2024 9:09 PM, Volodymyr Babchuk wrote:
> > >> >>>> It appears that hardware does not like cacheable accesses to this
> > >> >>>> region. Trying to access this shared memory region as Normal Memory
> > >> >>>> leads to a secure interrupt, which causes an endless loop somewhere in
> > >> >>>> TrustZone.
> > >> >>>>
> > >> >>>> The only reason it works right now is that the Qualcomm Hypervisor
> > >> >>>> maps the same region as Non-Cacheable memory in Stage 2 translation
> > >> >>>> tables. The issue manifests if we want to use another hypervisor (like
> > >> >>>> Xen or KVM), which does not know anything about those specific
> > >> >>>> mappings. This patch fixes the issue by mapping the shared memory as
> > >> >>>> Write-Through. This removes the dependency on correct mappings in Stage 2
> > >> >>>> tables.
> > >> >>>>
> > >> >>>> I tested this on SA8155P with Xen.
> > >> >>>>
> > >> >>>> Signed-off-by: Volodymyr Babchuk <[email protected]>
> > >> >>>> ---
> > >> >>>
> > >> >>> Interesting..
> > >> >>>
> > >> >>> +Doug, Rob have you ever seen this on Chrome? (FYI, Volodymyr, chromebooks
> > >> >>> ship with no qcom hypervisor)
ChromeOS boots the kernel at EL2 on sc7180. More importantly, though, we
don't enable whichever xPU it is that you're running into.
> > >> >>
> > >> >> Well, maybe I was wrong to call this thing a "hypervisor". All I know
> > >> >> is that it sits in the hyp.mbn partition and all it does is set up EL2
> > >> >> before switching to EL1 and running UEFI.
> > >> >>
> > >> >> In my experiments I replaced the contents of hyp.mbn with U-Boot, which gave
> > >> >> me access to EL2 and I was able to boot Xen and then Linux as Dom0.
> > >> >
> > >> > Yeah we're talking about the same thing. I was just curious whether
> > >> > the Chrome folks have heard of it, or whether they have any changes/
> > >> > workarounds for it.
> > >>
> > >> Does Linux ever write to this region? Given that the Chromebooks don't
> > >> seem to have issues with this (we have a bunch of them in pmOS and I'd
> > >> be very very surprised if this was an issue there which nobody had tried
> > >> upstreaming before), I'd guess the significant difference here is between
> > >> booting Linux in EL2 (as Chromebooks do?) vs with Xen.
> > >>
> > >
> > > FWIW: This old patch series from Stephen Boyd is closely related:
> > > https://lore.kernel.org/linux-arm-msm/[email protected]/
> > >
> > >> The main use case I have is to map the command-db memory region on
> > >> Qualcomm devices with a read-only mapping. It's already a const marked
> > >> pointer and the API returns const pointers as well, so this series
> > >> makes sure that even stray writes can't modify the memory.
> > >
> > > Stephen, what was the end result of that patch series? Mapping the
> > > cmd-db read-only sounds cleaner than trying to be lucky with the right
> > > set of cache flags.
> > >
> >
> > I checked the series, but I am afraid that I have no capacity to finish
> > this. Would it be okay to move forward with my patch? I understand that
> > this is not the best solution, but it is simple and it works. If this is
> > fine, I'll send v2 with all comments addressed.
> >
>
> My current understanding is that the important property here is to have
> a non-cacheable mapping, which is the case for both MEMREMAP_WT and
> MEMREMAP_WC, but not MEMREMAP_WB. Unfortunately, the MEMREMAP_RO option
> Stephen introduced is also a cacheable mapping, which still seems to
> trigger the issue in some cases. I'm not sure why a cache writeback
> still happens when the mapping is read-only and nobody writes anything.
Qualcomm knows for certain. It's not a cache writeback per my
recollection. I recall the problem always being that it's a speculative
access to xPU-protected memory. If there's a cacheable mapping in the
non-secure page tables, then it may be loaded over the bus with the
non-secure (NS) bit set. Once the xPU sees that, it reboots the system.
It used to be that we could never map secure memory regions in the
kernel. I suspect with EL2 the story changes slightly. The hypervisor is
the one mapping cmd-db at stage2, so any speculative access goes on the
bus as EL2 tagged, and thus "approved" by the xPU. Then, if the
hypervisor sees EL1 (secure or non-secure) accessing cmd-db, it traps and
makes sure it can actually access that address. If not, the hypervisor
"panics" and reboots. Either way, EL1 can have a cacheable mapping and
EL2 can make sure the secrets are safe, while the cache never goes out
to the bus as anything besides EL2.
>
> You can also test it if you want. For a quick test,
>
> - cmd_db_header = memremap(rmem->base, rmem->size, MEMREMAP_WB);
> + cmd_db_header = ioremap_prot(rmem->base, rmem->size, _PAGE_KERNEL_RO);
>
> should be (largely) equivalent to MEMREMAP_RO with Stephen's patch
> series. I asked Nikita to test this on SC7180 and it still seems to
> cause the crash.
>
> It seems to work only with a read-only non-cacheable mapping, e.g. with
>
> + cmd_db_header = ioremap_prot(rmem->base, rmem->size,
> ((PROT_NORMAL_NC & ~PTE_WRITE) | PTE_RDONLY));
>
> The lines I just suggested for testing are highly architecture-specific
> though, so not usable for a proper patch. If MEMREMAP_RO does not solve
> the real problem here, then the work to make a usable read-only mapping
> would go beyond just finishing Stephen's patch series, since one would
> need to introduce some kind of MEMREMAP_RO_NC flag that creates a
> read-only non-cacheable mapping.
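The observations in this thread can be summarized as a small table. The sketch below records, for each mapping option, whether it is cacheable as characterized above and whether it is writable, and encodes the working hypothesis that only non-cacheable mappings avoid the abort. MEMREMAP_RO only exists in Stephen's unmerged series, and MEMREMAP_RO_NC is the hypothetical flag suggested above; neither is a current kernel API. The struct and helper names are also illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct mapping_option {
	const char *name;
	bool cacheable; /* as characterized in this thread, not a kernel field */
	bool writable;
};

static const struct mapping_option mapping_options[] = {
	{ "MEMREMAP_WB",    true,  true  },
	{ "MEMREMAP_WT",    false, true  }, /* non-cacheable on arm64 per the thread */
	{ "MEMREMAP_WC",    false, true  },
	{ "MEMREMAP_RO",    true,  false }, /* only in Stephen's unmerged series */
	{ "MEMREMAP_RO_NC", false, false }, /* hypothetical, does not exist */
};

static const struct mapping_option *find_option(const char *name)
{
	size_t n = sizeof(mapping_options) / sizeof(mapping_options[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(mapping_options[i].name, name) == 0)
			return &mapping_options[i];
	return NULL;
}

/* Working hypothesis from the reports above: only cacheability matters. */
static bool avoids_abort(const struct mapping_option *opt)
{
	return !opt->cacheable;
}
```

Under this hypothesis, MEMREMAP_RO_NC would be the only option that is both read-only and abort-free, which is why a cacheable MEMREMAP_RO alone would not be enough.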
>
> It is definitely easier to just change the driver to use the existing
> MEMREMAP_WC. Given the crash you found, the hardware/firmware seems to
> have a built-in write protection on most platforms anyway. :D
>
How is Xen mapping this protected memory region? It sounds like maybe
that should be mapped differently. Also, how is EL2 accessible on this
device?