From: Quentin Perret <[email protected]>
commit 43c1ff8b75011bc3e3e923adf31ba815864a2494 upstream.
Memory regions marked as "no-map" in the host device-tree routinely
include TrustZone carve-outs and DMA pools. Although donating such pages
to the hypervisor may not breach confidentiality, it could be used to
corrupt its state in uncontrollable ways. To prevent this, let's block
host-initiated memory transitions targeting "no-map" pages altogether in
nVHE protected mode as there should be no valid reason to do this in
current operation.
Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list
of memblock regions, so we can easily check for the presence of the
MEMBLOCK_NOMAP flag on a region containing pages being donated from the
host.
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Vincent Donnefort <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ bp: clean ]
Signed-off-by: Suraj Jitindar Singh <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 07f9dc9848ef..0f6c053686c7 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -195,7 +195,7 @@ struct kvm_mem_range {
u64 end;
};
-static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
+static struct memblock_region *find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
{
int cur, left = 0, right = hyp_memblock_nr;
struct memblock_region *reg;
@@ -218,18 +218,28 @@ static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
} else {
range->start = reg->base;
range->end = end;
- return true;
+ return reg;
}
}
- return false;
+ return NULL;
}
bool addr_is_memory(phys_addr_t phys)
{
struct kvm_mem_range range;
- return find_mem_range(phys, &range);
+ return !!find_mem_range(phys, &range);
+}
+
+static bool addr_is_allowed_memory(phys_addr_t phys)
+{
+ struct memblock_region *reg;
+ struct kvm_mem_range range;
+
+ reg = find_mem_range(phys, &range);
+
+ return reg && !(reg->flags & MEMBLOCK_NOMAP);
}
static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
@@ -348,7 +358,7 @@ static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot pr
static int host_stage2_idmap(u64 addr)
{
struct kvm_mem_range range;
- bool is_memory = find_mem_range(addr, &range);
+ bool is_memory = !!find_mem_range(addr, &range);
enum kvm_pgtable_prot prot;
int ret;
@@ -425,7 +435,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
struct check_walk_data *d = arg;
kvm_pte_t pte = *ptep;
- if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+ if (kvm_pte_valid(pte) && !addr_is_allowed_memory(kvm_pte_to_phys(pte)))
return -EINVAL;
return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
--
2.34.1
From: Will Deacon <[email protected]>
commit 09cce60bddd6461a93a5bf434265a47827d1bc6f upstream.
Since host stage-2 mappings are created lazily, we cannot rely solely on
the pte in order to recover the target physical address when checking a
host-initiated memory transition as this permits donation of unmapped
regions corresponding to MMIO or "no-map" memory.
Instead of inspecting the pte, move the addr_is_allowed_memory() check
into the host callback function where it is passed the physical address
directly from the walker.
Cc: Quentin Perret <[email protected]>
Fixes: e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ bp: s/ctx->addr/addr in __check_page_state_visitor due to missing commit
"KVM: arm64: Combine visitor arguments into a context structure"
in stable.
]
Signed-off-by: Suraj Jitindar Singh <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0f6c053686c7..0faa330a41ed 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -424,7 +424,7 @@ struct pkvm_mem_share {
struct check_walk_data {
enum pkvm_page_state desired;
- enum pkvm_page_state (*get_page_state)(kvm_pte_t pte);
+ enum pkvm_page_state (*get_page_state)(kvm_pte_t pte, u64 addr);
};
static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
@@ -435,10 +435,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
struct check_walk_data *d = arg;
kvm_pte_t pte = *ptep;
- if (kvm_pte_valid(pte) && !addr_is_allowed_memory(kvm_pte_to_phys(pte)))
- return -EINVAL;
-
- return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
+ return d->get_page_state(pte, addr) == d->desired ? 0 : -EPERM;
}
static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
@@ -453,8 +450,11 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
return kvm_pgtable_walk(pgt, addr, size, &walker);
}
-static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
+static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
{
+ if (!addr_is_allowed_memory(addr))
+ return PKVM_NOPAGE;
+
if (!kvm_pte_valid(pte) && pte)
return PKVM_NOPAGE;
@@ -521,7 +521,7 @@ static int host_initiate_unshare(u64 *completer_addr,
return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
}
-static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
+static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte, u64 addr)
{
if (!kvm_pte_valid(pte))
return PKVM_NOPAGE;
--
2.34.1
On Wed, 20 Sep 2023 20:27:28 +0100,
Suraj Jitindar Singh <[email protected]> wrote:
>
> From: Quentin Perret <[email protected]>
>
> commit 43c1ff8b75011bc3e3e923adf31ba815864a2494 upstream.
>
> Memory regions marked as "no-map" in the host device-tree routinely
> include TrustZone carve-outs and DMA pools. Although donating such pages
> to the hypervisor may not breach confidentiality, it could be used to
> corrupt its state in uncontrollable ways. To prevent this, let's block
> host-initiated memory transitions targeting "no-map" pages altogether in
> nVHE protected mode as there should be no valid reason to do this in
> current operation.
>
> Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list
> of memblock regions, so we can easily check for the presence of the
> MEMBLOCK_NOMAP flag on a region containing pages being donated from the
> host.
>
> Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
> Tested-by: Vincent Donnefort <[email protected]>
> Signed-off-by: Quentin Perret <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> [ bp: clean ]
What is this?
> Signed-off-by: Suraj Jitindar Singh <[email protected]>
What is the rationale for backporting this? It wasn't tagged as Cc: to
stable for a reason: pKVM isn't functional upstream, and won't be for
the next couple of cycles *at least*.
So as it stands, I'm against such a backport.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Wed, 20 Sep 2023 20:27:29 +0100,
Suraj Jitindar Singh <[email protected]> wrote:
>
> From: Will Deacon <[email protected]>
>
> commit 09cce60bddd6461a93a5bf434265a47827d1bc6f upstream.
>
> Since host stage-2 mappings are created lazily, we cannot rely solely on
> the pte in order to recover the target physical address when checking a
> host-initiated memory transition as this permits donation of unmapped
> regions corresponding to MMIO or "no-map" memory.
>
> Instead of inspecting the pte, move the addr_is_allowed_memory() check
> into the host callback function where it is passed the physical address
> directly from the walker.
>
> Cc: Quentin Perret <[email protected]>
> Fixes: e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> [ bp: s/ctx->addr/addr in __check_page_state_visitor due to missing commit
> "KVM: arm64: Combine visitor arguments into a context structure"
> in stable.
> ]
Same question.
> Signed-off-by: Suraj Jitindar Singh <[email protected]>
Again, I find this backport pretty pointless. What is the rationale
for it?
M.
--
Without deviation from the norm, progress is not possible.
On Thu, 2023-09-21 at 08:15 +0100, Marc Zyngier wrote:
> On Wed, 20 Sep 2023 20:27:29 +0100,
> Suraj Jitindar Singh <[email protected]> wrote:
> >
> > From: Will Deacon <[email protected]>
> >
> > commit 09cce60bddd6461a93a5bf434265a47827d1bc6f upstream.
> >
> > Since host stage-2 mappings are created lazily, we cannot rely
> > solely on
> > the pte in order to recover the target physical address when
> > checking a
> > host-initiated memory transition as this permits donation of
> > unmapped
> > regions corresponding to MMIO or "no-map" memory.
> >
> > Instead of inspecting the pte, move the addr_is_allowed_memory()
> > check
> > into the host callback function where it is passed the physical
> > address
> > directly from the walker.
> >
> > Cc: Quentin Perret <[email protected]>
> > Fixes: e82edcc75c4e ("KVM: arm64: Implement do_share() helper for
> > sharing memory")
> > Signed-off-by: Will Deacon <[email protected]>
> > Signed-off-by: Marc Zyngier <[email protected]>
> > Link:
> > https://lore.kernel.org/r/[email protected]
> > [ bp: s/ctx->addr/addr in __check_page_state_visitor due to missing
> > commit
> > "KVM: arm64: Combine visitor arguments into a context
> > structure"
> > in stable.
> > ]
>
> Same question.
Noting what changes were made to the patch from the upstream mainline
version when it was applied to the stable tree.
>
> > Signed-off-by: Suraj Jitindar Singh <[email protected]>
>
> Again, I find this backport pretty pointless. What is the rationale
> for it?
The 2 patches were backported to address CVE-2023-21264.
This one addresses the CVE.
Thanks
>
> M.
>
On Thu, 2023-09-21 at 08:13 +0100, Marc Zyngier wrote:
> On Wed, 20 Sep 2023 20:27:28 +0100,
> Suraj Jitindar Singh <[email protected]> wrote:
> >
> > From: Quentin Perret <[email protected]>
> >
> > commit 43c1ff8b75011bc3e3e923adf31ba815864a2494 upstream.
> >
> > Memory regions marked as "no-map" in the host device-tree routinely
> > include TrustZone carve-outs and DMA pools. Although donating such
> > pages
> > to the hypervisor may not breach confidentiality, it could be used
> > to
> > corrupt its state in uncontrollable ways. To prevent this, let's
> > block
> > host-initiated memory transitions targeting "no-map" pages
> > altogether in
> > nVHE protected mode as there should be no valid reason to do this
> > in
> > current operation.
> >
> > Thankfully, the pKVM EL2 hypervisor has a full copy of the host's
> > list
> > of memblock regions, so we can easily check for the presence of the
> > MEMBLOCK_NOMAP flag on a region containing pages being donated from
> > the
> > host.
> >
> > Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
> > Tested-by: Vincent Donnefort <[email protected]>
> > Signed-off-by: Quentin Perret <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > Signed-off-by: Marc Zyngier <[email protected]>
> > Link:
> > https://lore.kernel.org/r/[email protected]
> > [ bp: clean ]
>
> What is this?
Noting any details about the backport. In this case it was a clean
backport.
>
> > Signed-off-by: Suraj Jitindar Singh <[email protected]>
>
> What is the rationale for backporting this? It wasn't tagged as Cc:
> to
> stable for a reason: pKVM isn't functional upstream, and won't be for
> the next couple of cycles *at least*.
>
> So as it stands, I'm against such a backport.
>
The 2 patches were backported to address CVE-2023-21264.
This one provides context for the following patch.
I wasn't aware that it's non-functional. Does this mean that the code
won't be compiled or just that it can't actually be run currently from
the upstream codebase?
I guess I'm trying to understand if the conditions of the CVE are a
real concern even if it isn't technically functional.
Thanks
> Thanks,
>
> M.
>
On Thu, 21 Sep 2023 23:22:54 +0100,
"Jitindar Singh, Suraj" <[email protected]> wrote:
>
> On Thu, 2023-09-21 at 08:13 +0100, Marc Zyngier wrote:
> > On Wed, 20 Sep 2023 20:27:28 +0100,
> > Suraj Jitindar Singh <[email protected]> wrote:
> > >
> > > From: Quentin Perret <[email protected]>
> > >
> > > commit 43c1ff8b75011bc3e3e923adf31ba815864a2494 upstream.
> > >
> > > Memory regions marked as "no-map" in the host device-tree routinely
> > > include TrustZone carve-outs and DMA pools. Although donating such
> > > pages
> > > to the hypervisor may not breach confidentiality, it could be used
> > > to
> > > corrupt its state in uncontrollable ways. To prevent this, let's
> > > block
> > > host-initiated memory transitions targeting "no-map" pages
> > > altogether in
> > > nVHE protected mode as there should be no valid reason to do this
> > > in
> > > current operation.
> > >
> > > Thankfully, the pKVM EL2 hypervisor has a full copy of the host's
> > > list
> > > of memblock regions, so we can easily check for the presence of the
> > > MEMBLOCK_NOMAP flag on a region containing pages being donated from
> > > the
> > > host.
> > >
> > > Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
> > > Tested-by: Vincent Donnefort <[email protected]>
> > > Signed-off-by: Quentin Perret <[email protected]>
> > > Signed-off-by: Will Deacon <[email protected]>
> > > Signed-off-by: Marc Zyngier <[email protected]>
> > > Link:
> > > https://lore.kernel.org/r/[email protected]
> > > [ bp: clean ]
> >
> > What is this?
>
> Noting any details about the backport. In this case it was a clean
> backport.
I don't think this has anything to do here. If you want to add a note
indicating what was changed in the patch, make it *extremely* visible
in the commit message, and not hidden as some obscure form of
metadata.
>
> >
> > > Signed-off-by: Suraj Jitindar Singh <[email protected]>
> >
> > What is the rationale for backporting this? It wasn't tagged as Cc:
> > to
> > stable for a reason: pKVM isn't functional upstream, and won't be for
> > the next couple of cycles *at least*.
> >
> > So as it stands, I'm against such a backport.
> >
>
> The 2 patches were backported to address CVE-2023-21264.
> This one provides context for the following patch.
I care about CVEs as much as I care about holes in my socks (i.e. very
little). If there is a concern, it should be brought up on the list as
a discussion, and not as a consequence of some script kiddie
automatically generating CVEs.
> I wasn't aware that it's non-functional. Does this mean that the code
> won't be compiled or just that it can't actually be run currently from
> the upstream codebase?
This code is inactive unless you pass the correct option on the
command line, and as it is, it brings zero benefit over standard KVM. The
only place this matters is in the Android kernel, as it has full
support for pKVM, and has the fix already. We carry it upstream as a
courtesy to the pKVM developers, but that's about it.
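[For reference: the option referred to here is the `kvm-arm.mode=protected` kernel command-line parameter, documented in Documentation/admin-guide/kernel-parameters.txt; without it, KVM runs in its usual nVHE/VHE configuration and the protected-mode paths touched by these patches stay dormant. For example, appended to the bootloader's kernel command line:

```
kvm-arm.mode=protected
```
]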
> I guess I'm trying to understand if the conditions of the CVE are a
> real concern even if it isn't technically functional.
This CVE is a waste of precious bytes, and I have no interest in
seeing this backported.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Thu, Sep 21, 2023 at 10:22:54PM +0000, Jitindar Singh, Suraj wrote:
> On Thu, 2023-09-21 at 08:13 +0100, Marc Zyngier wrote:
> > On Wed, 20 Sep 2023 20:27:28 +0100,
> > Suraj Jitindar Singh <[email protected]> wrote:
> > >
> > > From: Quentin Perret <[email protected]>
> > >
> > > commit 43c1ff8b75011bc3e3e923adf31ba815864a2494 upstream.
> > >
> > > Memory regions marked as "no-map" in the host device-tree routinely
> > > include TrustZone carve-outs and DMA pools. Although donating such
> > > pages
> > > to the hypervisor may not breach confidentiality, it could be used
> > > to
> > > corrupt its state in uncontrollable ways. To prevent this, let's
> > > block
> > > host-initiated memory transitions targeting "no-map" pages
> > > altogether in
> > > nVHE protected mode as there should be no valid reason to do this
> > > in
> > > current operation.
> > >
> > > Thankfully, the pKVM EL2 hypervisor has a full copy of the host's
> > > list
> > > of memblock regions, so we can easily check for the presence of the
> > > MEMBLOCK_NOMAP flag on a region containing pages being donated from
> > > the
> > > host.
> > >
> > > Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
> > > Tested-by: Vincent Donnefort <[email protected]>
> > > Signed-off-by: Quentin Perret <[email protected]>
> > > Signed-off-by: Will Deacon <[email protected]>
> > > Signed-off-by: Marc Zyngier <[email protected]>
> > > Link:
> > > https://lore.kernel.org/r/[email protected]
> > > [ bp: clean ]
> >
> > What is this?
>
> Noting any details about the backport. In this case it was a clean
> backport.
>
> >
> > > Signed-off-by: Suraj Jitindar Singh <[email protected]>
> >
> > What is the rationale for backporting this? It wasn't tagged as Cc:
> > to
> > stable for a reason: pKVM isn't functional upstream, and won't be for
> > the next couple of cycles *at least*.
> >
> > So as it stands, I'm against such a backport.
> >
>
> The 2 patches were backported to address CVE-2023-21264.
> This one provides context for the following patch.
>
> I wasn't aware that it's non-functional. Does this mean that the code
> won't be compiled or just that it can't actually be run currently from
> the upstream codebase?
>
> I guess I'm trying to understand if the conditions of the CVE are a
> real concern even if it isn't technically functional.
Why do you think the CVE is actually even valid? Who filed it and why?
Remember, CVEs almost never mean anything for the kernel, they are not
able to be given out by the kernel security team, and they just don't
make any sense for us.
I'll go drop these patches from the stable queues for now, and wait for
you all to agree what is happening here.
thanks,
greg k-h