When running under TDX the virtio host is untrusted. The bulk
of the kernel memory is encrypted and protected, but the virtio
ring is in special shared memory that is shared with the
untrusted host.
This means virtio needs to be hardened against any attacks from
the host through the ring. Of course it's impossible to prevent DOS
(the host can chose at any time to stop doing IO), but there
should be no buffer overruns or similar that might give access to
any private memory in the guest.
virtio has a lot of modes, most are difficult to harden.
The best for hardening seems to be split mode without indirect
descriptors. This also simplifies the hardening job because
it's only a single code path.
Only allow split mode when in a protected guest. Followon
patches harden the split mode code paths, and we don't want
an malicious host to force anything else. Also disallow
indirect mode for similar reasons.
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/virtio/virtio_ring.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..f35629fa47b1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/dma-mapping.h>
+#include <linux/protected_guest.h>
#include <xen/xen.h>
#ifdef DEBUG
@@ -2221,8 +2222,16 @@ void vring_transport_features(struct virtio_device *vdev)
unsigned int i;
for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
+
+ /*
+ * In protected guest mode disallow packed or indirect
+ * because they ain't hardened.
+ */
+
switch (i) {
case VIRTIO_RING_F_INDIRECT_DESC:
+ if (protected_guest_has(VM_MEM_ENCRYPT))
+ goto clear;
break;
case VIRTIO_RING_F_EVENT_IDX:
break;
@@ -2231,9 +2240,12 @@ void vring_transport_features(struct virtio_device *vdev)
case VIRTIO_F_ACCESS_PLATFORM:
break;
case VIRTIO_F_RING_PACKED:
+ if (protected_guest_has(VM_MEM_ENCRYPT))
+ goto clear;
break;
case VIRTIO_F_ORDER_PLATFORM:
break;
+ clear:
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
--
2.25.4
?? 2021/6/3 ????8:41, Andi Kleen д??:
> When running under TDX the virtio host is untrusted. The bulk
> of the kernel memory is encrypted and protected, but the virtio
> ring is in special shared memory that is shared with the
> untrusted host.
>
> This means virtio needs to be hardened against any attacks from
> the host through the ring. Of course it's impossible to prevent DOS
> (the host can chose at any time to stop doing IO), but there
> should be no buffer overruns or similar that might give access to
> any private memory in the guest.
>
> virtio has a lot of modes, most are difficult to harden.
>
> The best for hardening seems to be split mode without indirect
> descriptors. This also simplifies the hardening job because
> it's only a single code path.
>
> Only allow split mode when in a protected guest. Followon
> patches harden the split mode code paths, and we don't want
> an malicious host to force anything else. Also disallow
> indirect mode for similar reasons.
>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> drivers/virtio/virtio_ring.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..f35629fa47b1 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -11,6 +11,7 @@
> #include <linux/module.h>
> #include <linux/hrtimer.h>
> #include <linux/dma-mapping.h>
> +#include <linux/protected_guest.h>
> #include <xen/xen.h>
>
> #ifdef DEBUG
> @@ -2221,8 +2222,16 @@ void vring_transport_features(struct virtio_device *vdev)
> unsigned int i;
>
> for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
> +
> + /*
> + * In protected guest mode disallow packed or indirect
> + * because they ain't hardened.
> + */
> +
> switch (i) {
> case VIRTIO_RING_F_INDIRECT_DESC:
> + if (protected_guest_has(VM_MEM_ENCRYPT))
> + goto clear;
So we will see huge performance regression without indirect descriptor.
We need to consider to address this.
Thanks
> break;
> case VIRTIO_RING_F_EVENT_IDX:
> break;
> @@ -2231,9 +2240,12 @@ void vring_transport_features(struct virtio_device *vdev)
> case VIRTIO_F_ACCESS_PLATFORM:
> break;
> case VIRTIO_F_RING_PACKED:
> + if (protected_guest_has(VM_MEM_ENCRYPT))
> + goto clear;
> break;
> case VIRTIO_F_ORDER_PLATFORM:
> break;
> + clear:
> default:
> /* We don't understand this bit. */
> __virtio_clear_bit(vdev, i);
> So we will see huge performance regression without indirect
> descriptor. We need to consider to address this.
A regression would be when some existing case would be slower.
That's not the case because the behavior for the existing cases does not
change.
Anyways when there are performance problems they can be addressed, but
first is to make it secure.
-Andi
>
> Thanks
>
>
>> break;
>> case VIRTIO_RING_F_EVENT_IDX:
>> break;
>> @@ -2231,9 +2240,12 @@ void vring_transport_features(struct
>> virtio_device *vdev)
>> case VIRTIO_F_ACCESS_PLATFORM:
>> break;
>> case VIRTIO_F_RING_PACKED:
>> + if (protected_guest_has(VM_MEM_ENCRYPT))
>> + goto clear;
>> break;
>> case VIRTIO_F_ORDER_PLATFORM:
>> break;
>> + clear:
>> default:
>> /* We don't understand this bit. */
>> __virtio_clear_bit(vdev, i);
>
在 2021/6/3 上午9:48, Andi Kleen 写道:
>
>> So we will see huge performance regression without indirect
>> descriptor. We need to consider to address this.
>
> A regression would be when some existing case would be slower.
>
> That's not the case because the behavior for the existing cases does
> not change.
>
> Anyways when there are performance problems they can be addressed, but
> first is to make it secure.
I agree, but I want to know why indirect descriptor needs to be
disabled. The table can't be wrote by the device since it's not coherent
swiotlb mapping.
Thanks
>
> -Andi
>
>
>>
>> Thanks
>>
>>
>>> break;
>>> case VIRTIO_RING_F_EVENT_IDX:
>>> break;
>>> @@ -2231,9 +2240,12 @@ void vring_transport_features(struct
>>> virtio_device *vdev)
>>> case VIRTIO_F_ACCESS_PLATFORM:
>>> break;
>>> case VIRTIO_F_RING_PACKED:
>>> + if (protected_guest_has(VM_MEM_ENCRYPT))
>>> + goto clear;
>>> break;
>>> case VIRTIO_F_ORDER_PLATFORM:
>>> break;
>>> + clear:
>>> default:
>>> /* We don't understand this bit. */
>>> __virtio_clear_bit(vdev, i);
>>
>
>
> I agree, but I want to know why indirect descriptor needs to be
> disabled. The table can't be wrote by the device since it's not
> coherent swiotlb mapping.
I had all kinds of problems with uninitialized entries in the indirect
table. So I gave up on it and concluded it would be too difficult to secure.
-Andi
在 2021/6/3 上午10:56, Andi Kleen 写道:
>
>>
>> I agree, but I want to know why indirect descriptor needs to be
>> disabled. The table can't be wrote by the device since it's not
>> coherent swiotlb mapping.
>
> I had all kinds of problems with uninitialized entries in the indirect
> table. So I gave up on it and concluded it would be too difficult to
> secure.
>
>
> -Andi
>
>
Ok, but what I meant is this, if we don't read from the descriptor ring,
and validate all the other metadata supplied by the device (used id and
len). Then there should be no way for the device to suppress the dma
flags to write to the indirect descriptor table.
Or do you have an example how it can do that?
Thanks
> Ok, but what I meant is this, if we don't read from the descriptor
> ring, and validate all the other metadata supplied by the device (used
> id and len). Then there should be no way for the device to suppress
> the dma flags to write to the indirect descriptor table.
>
> Or do you have an example how it can do that?
I don't. If you can validate everything it's probably ok
The only drawback is even more code to audit and test.
-Andi
On 6/2/21 5:41 PM, Andi Kleen wrote:
> Only allow split mode when in a protected guest. Followon
> patches harden the split mode code paths, and we don't want
> an malicious host to force anything else. Also disallow
> indirect mode for similar reasons.
I read this as "the virtio driver is buggy. Let's disable most of the
buggy code in one special case in which we need a driver without bugs.
In all the other cases (e.g. hardware virtio device connected over
USB-C), driver bugs are still allowed."
Can we just fix the driver without special cases?
--Andy
On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
> On 6/2/21 5:41 PM, Andi Kleen wrote:
>> Only allow split mode when in a protected guest. Followon
>> patches harden the split mode code paths, and we don't want
>> an malicious host to force anything else. Also disallow
>> indirect mode for similar reasons.
> I read this as "the virtio driver is buggy. Let's disable most of the
> buggy code in one special case in which we need a driver without bugs.
> In all the other cases (e.g. hardware virtio device connected over
> USB-C), driver bugs are still allowed."
My understanding is most of the other modes (except for split with separate descriptors) are obsolete and just there for compatibility. As long as they're deprecated they won't harm anyone.
-Andi
On Thu, Jun 3, 2021, at 11:00 AM, Andi Kleen wrote:
>
> On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
> > On 6/2/21 5:41 PM, Andi Kleen wrote:
> >> Only allow split mode when in a protected guest. Followon
> >> patches harden the split mode code paths, and we don't want
> >> an malicious host to force anything else. Also disallow
> >> indirect mode for similar reasons.
> > I read this as "the virtio driver is buggy. Let's disable most of the
> > buggy code in one special case in which we need a driver without bugs.
> > In all the other cases (e.g. hardware virtio device connected over
> > USB-C), driver bugs are still allowed."
>
> My understanding is most of the other modes (except for split with
> separate descriptors) are obsolete and just there for compatibility. As
> long as they're deprecated they won't harm anyone.
>
>
Tell that to every crypto downgrade attack ever.
I see two credible solutions:
1. Actually harden the virtio driver.
2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure. They can share code.
Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code! A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level.
> Tell that to every crypto downgrade attack ever.
That's exactly what this patch addresses.
>
> I see two credible solutions:
>
> 1. Actually harden the virtio driver.
That's exactly what this patchkit, and the alternative approaches, like
Jason's, are doing.
>
> 2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure. They can share code.
In most use cases the legacy driver is not insecure because there is no
memory protection anyways.
Yes maybe such a split would be a good idea for maintenance and maybe
performance reasons, but at least from the security perspective I don't
see any need for it.
>
> Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code! A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level
TDX and SEV use the arch hook to enforce DMA API, so that part is also
solved.
-Andi
On Thu, Jun 3, 2021, at 12:53 PM, Andi Kleen wrote:
>
> > Tell that to every crypto downgrade attack ever.
>
> That's exactly what this patch addresses.
>
> >
> > I see two credible solutions:
> >
> > 1. Actually harden the virtio driver.
> That's exactly what this patchkit, and the alternative approaches, like
> Jason's, are doing.
> >
> > 2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure. They can share code.
>
> In most use cases the legacy driver is not insecure because there is no
> memory protection anyways.
>
> Yes maybe such a split would be a good idea for maintenance and maybe
> performance reasons, but at least from the security perspective I don't
> see any need for it.
Please reread my email.
We do not need an increasing pile of kludges to make TDX and SEV “secure”. We need the actual loaded driver to be secure. The virtio architecture is full of legacy nonsense, and there is no good reason for SEV and TDX to be a giant special case.
As I said before, real PCIe (Thunderbolt/USB-C or anything else) has the exact same problem. The fact that TDX has encrypted memory is, at best, a poor proxy for the actual condition. The actual condition is that the host does not trust the device to implement the virtio protocol correctly.
>
> >
> > Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code! A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level
>
> TDX and SEV use the arch hook to enforce DMA API, so that part is also
> solved.
>
Can you point me to the code you’re referring to?
>
> -Andi
>
>
> We do not need an increasing pile of kludges
Do you mean disabling features is a kludge?
If yes I disagree with that characterization.
> to make TDX and SEV “secure”. We need the actual loaded driver to be secure. The virtio architecture is full of legacy nonsense,
> and there is no good reason for SEV and TDX to be a giant special case.
I don't know where you see a "giant special case". Except for the
limited feature negotiation all the changes are common, and the
disabling of features (which is not new BTW, but already done e.g. with
forcing DMA API in some cases) can be of course used by all these other
technologies too. But it just cannot be done by default for everything
because it would break compatibility. So every technology with such
requirements has to explicitly opt-in.
>
> As I said before, real PCIe (Thunderbolt/USB-C or anything else) has the exact same problem. The fact that TDX has encrypted memory is, at best, a poor proxy for the actual condition. The actual condition is that the host does not trust the device to implement the virtio protocol correctly.
Right they can do similar limitations of feature sets. But again it
cannot be default.
>
>>
>> TDX and SEV use the arch hook to enforce DMA API, so that part is also
>> solved.
>>
> Can you point me to the code you’re referring to?
See 4/8 in this patch kit. It uses an existing hook which is already
used in tree by s390.
-Andi
在 2021/6/4 上午3:31, Andy Lutomirski 写道:
>
> On Thu, Jun 3, 2021, at 11:00 AM, Andi Kleen wrote:
>> On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
>>> On 6/2/21 5:41 PM, Andi Kleen wrote:
>>>> Only allow split mode when in a protected guest. Followon
>>>> patches harden the split mode code paths, and we don't want
>>>> an malicious host to force anything else. Also disallow
>>>> indirect mode for similar reasons.
>>> I read this as "the virtio driver is buggy. Let's disable most of the
>>> buggy code in one special case in which we need a driver without bugs.
>>> In all the other cases (e.g. hardware virtio device connected over
>>> USB-C), driver bugs are still allowed."
>> My understanding is most of the other modes (except for split with
>> separate descriptors) are obsolete and just there for compatibility. As
>> long as they're deprecated they won't harm anyone.
>>
>>
> Tell that to every crypto downgrade attack ever.
>
> I see two credible solutions:
>
> 1. Actually harden the virtio driver.
>
> 2. Have a new virtio-modern driver and use it for modern use cases. Maybe rename the old driver virtio-legacy or virtio-insecure. They can share code.
Note that we had already split legacy driver out which can be turned off
via Kconfig.
>
> Another snag you may hit: virtio’s heuristic for whether to use proper DMA ops or to bypass them is a giant kludge. I’m very slightly optimistic that getting the heuristic wrong will make the driver fail to operate but won’t allow the host to take over the guest, but I’m not really convinced. And I wrote that code! A virtio-modern mode probably should not have a heuristic, and the various iommu-bypassing modes should be fixed to work at the bus level, not the device level.
I remember there's a very long discussion about this and probably
without any conclusion. Fortunately, the management layer has been
taught to enforce VIRTIO_F_ACCESS_PLATFORM for encrypted guests.
A possible way to fix this is without any conflicts is to mandate the
VIRTIO_F_ACCESS_PLATFORM in version 1.2.
Thanks
>
在 2021/6/4 上午2:00, Andi Kleen 写道:
>
> On 6/3/2021 10:33 AM, Andy Lutomirski wrote:
>> On 6/2/21 5:41 PM, Andi Kleen wrote:
>>> Only allow split mode when in a protected guest. Followon
>>> patches harden the split mode code paths, and we don't want
>>> an malicious host to force anything else. Also disallow
>>> indirect mode for similar reasons.
>> I read this as "the virtio driver is buggy. Let's disable most of the
>> buggy code in one special case in which we need a driver without bugs.
>> In all the other cases (e.g. hardware virtio device connected over
>> USB-C), driver bugs are still allowed."
>
> My understanding is most of the other modes (except for split with
> separate descriptors) are obsolete and just there for compatibility.
> As long as they're deprecated they won't harm anyone.
>
> -Andi
>
For "mode" do you packed vs split? If yes, it's not just for
compatibility. Though packed virtqueue is designed to be more hardware
friendly, most hardware vendors choose to start from split.
Thanks
On 6/3/21 4:32 PM, Andi Kleen wrote:
>
>> We do not need an increasing pile of kludges
>
> Do you mean disabling features is a kludge?
>
> If yes I disagree with that characterization.
>
>
>> to make TDX and SEV “secure”. We need the actual loaded driver to be
>> secure. The virtio architecture is full of legacy nonsense,
>> and there is no good reason for SEV and TDX to be a giant special case.
>
> I don't know where you see a "giant special case". Except for the
> limited feature negotiation all the changes are common, and the
> disabling of features (which is not new BTW, but already done e.g. with
> forcing DMA API in some cases) can be of course used by all these other
> technologies too. But it just cannot be done by default for everything
> because it would break compatibility. So every technology with such
> requirements has to explicitly opt-in.
>
>
>>
>> As I said before, real PCIe (Thunderbolt/USB-C or anything else) has
>> the exact same problem. The fact that TDX has encrypted memory is, at
>> best, a poor proxy for the actual condition. The actual condition is
>> that the host does not trust the device to implement the virtio
>> protocol correctly.
>
> Right they can do similar limitations of feature sets. But again it
> cannot be default.
Let me try again.
For most Linux drivers, a report that a misbehaving device can corrupt
host memory is a bug, not a feature. If a USB device can corrupt kernel
memory, that's a serious bug. If a USB-C device can corrupt kernel
memory, that's also a serious bug, although, sadly, we probably have
lots of these bugs. If a Firewire device can corrupt kernel memory,
news at 11. If a Bluetooth or WiFi peer can corrupt kernel memory,
people write sonnets about it and give it clever names. Why is virtio
special?
If, for some reason, the virtio driver cannot be fixed so that it is
secure and compatible [1], then I think that the limited cases that are
secure should be accessible to anyone, with or without TDX. Have a
virtio.secure_mode module option or a udev-controllable parameter or an
alternative driver name or *something*. An alternative driver name
would allow userspace to prevent the insecure mode from auto-binding to
devices. And make whatever system configures encrypted guests for
security use this mode. (Linux is not going to be magically secure just
by booting it in TDX. There's a whole process of unsealing or remote
attestation, something needs to prevent the hypervisor from connecting a
virtual keyboard and typing init=/bin/bash, something needs to provision
an SSH key, etc.)
In my opinion, it is not so great to identify bugs in the driver and
then say that they're only being fixed for TDX and SEV.
Keep in mind that, as I understand it, there is nothing virt specific
about virtio. There are real physical devices that speak virtio.
[1] The DMA quirk is nasty. Fortunately, it's the only case I'm aware
of in which the virtio driver genuinely cannot be made secure and
compatible at the smae time. Also, fortunately, most real deployments
except on powerpc work just fine with the DMA quirk unquirked.
>
>
>>
>>>
>>> TDX and SEV use the arch hook to enforce DMA API, so that part is also
>>> solved.
>>>
>> Can you point me to the code you’re referring to?
>
> See 4/8 in this patch kit. It uses an existing hook which is already
> used in tree by s390.
This one:
int arch_has_restricted_virtio_memory_access(void)
+{
+ return is_tdx_guest();
+}
I'm looking at a fairly recent kernel, and I don't see anything for s390
wired up in vring_use_dma_api.
> For most Linux drivers, a report that a misbehaving device can corrupt
> host memory is a bug, not a feature. If a USB device can corrupt kernel
> memory, that's a serious bug. If a USB-C device can corrupt kernel
> memory, that's also a serious bug, although, sadly, we probably have
> lots of these bugs. If a Firewire device can corrupt kernel memory,
> news at 11. If a Bluetooth or WiFi peer can corrupt kernel memory,
> people write sonnets about it and give it clever names. Why is virtio
> special?
Well for most cases it's pointless because they don't have any memory
protection anyways.
Why break compatibility if it does not buy you anything?
Anyways if you want to enable the restricted mode for something else,
it's easy to do. The cases where it matters seem to already work on it,
like the user space virtio ring.
My changes for boundary checking are enabled unconditionally anyways, as
well as the other patchkits.
>
> This one:
>
> int arch_has_restricted_virtio_memory_access(void)
> +{
> + return is_tdx_guest();
> +}
>
> I'm looking at a fairly recent kernel, and I don't see anything for s390
> wired up in vring_use_dma_api.
It's not using vring_use_dma_api, but enforces the DMA API at virtio
ring setup time, same as SEV/TDX.
-Andi
在 2021/6/4 上午1:33, Andy Lutomirski 写道:
> On 6/2/21 5:41 PM, Andi Kleen wrote:
>> Only allow split mode when in a protected guest. Followon
>> patches harden the split mode code paths, and we don't want
>> an malicious host to force anything else. Also disallow
>> indirect mode for similar reasons.
> I read this as "the virtio driver is buggy. Let's disable most of the
> buggy code in one special case in which we need a driver without bugs.
> In all the other cases (e.g. hardware virtio device connected over
> USB-C), driver bugs are still allowed."
>
> Can we just fix the driver without special cases?
I think we can, this is what this series tries to do:
https://www.spinics.net/lists/kvm/msg241825.html
It tries to fix without a special caring for any specific features.
Thanks
>
> --Andy
>
在 2021/6/3 下午9:55, Andi Kleen 写道:
>
>> Ok, but what I meant is this, if we don't read from the descriptor
>> ring, and validate all the other metadata supplied by the device
>> (used id and len). Then there should be no way for the device to
>> suppress the dma flags to write to the indirect descriptor table.
>>
>> Or do you have an example how it can do that?
>
> I don't. If you can validate everything it's probably ok
>
> The only drawback is even more code to audit and test.
>
> -Andi
>
>
Ok, then I'm going to post a formal series, please have a look and we
can start from there.
Thanks