On 8/3/22 11:11, Tom Lendacky wrote:
> +	/*
> +	 * Use the MSR protocol when either:
> +	 * - executing in an NMI to avoid any possibility of a deadlock
> +	 * - per-CPU GHCBs are not yet registered, since __vmgexit_psc()
> +	 *   uses the per-CPU GHCB.
> +	 */
> +	if (in_nmi() || !ghcb_percpu_ready)
> +		return early_set_pages_state(__pa(vaddr), npages, op);
> +
> +	spin_lock_irqsave(&psc_desc_lock, flags);
Would it be simpler to just do a spin_trylock_irqsave()? You fall back
to early_set_pages_state() whenever you can't acquire the lock.
That avoids even having to know what the situations are where you
_might_ recurse. If it recurses, the trylock will just naturally fail.
You simply can't have bugs where the "(in_nmi() || !ghcb_percpu_ready)"
conditional was wrong.
On 8/3/22 13:17, Dave Hansen wrote:
> On 8/3/22 11:11, Tom Lendacky wrote:
>> +	/*
>> +	 * Use the MSR protocol when either:
>> +	 * - executing in an NMI to avoid any possibility of a deadlock
>> +	 * - per-CPU GHCBs are not yet registered, since __vmgexit_psc()
>> +	 *   uses the per-CPU GHCB.
>> +	 */
>> +	if (in_nmi() || !ghcb_percpu_ready)
>> +		return early_set_pages_state(__pa(vaddr), npages, op);
>> +
>> +	spin_lock_irqsave(&psc_desc_lock, flags);
>
> Would it be simpler to just do a spin_trylock_irqsave()? You fall back
> to early_set_pages_state() whenever you can't acquire the lock.
I was looking at that and can definitely go that route if this approach is
preferred.
Thanks,
Tom
>
> That avoids even having to know what the situations are where you
> _might_ recurse. If it recurses, the trylock will just naturally fail.
> You simply can't have bugs where the "(in_nmi() || !ghcb_percpu_ready)"
> conditional was wrong.
On 8/3/22 11:21, Tom Lendacky wrote:
>> Would it be simpler to just do a spin_trylock_irqsave()? You fall back
>> to early_set_pages_state() whenever you can't acquire the lock.
>
> I was looking at that and can definitely go that route if this approach
> is preferred.
I prefer it for sure.
This whole iteration does look good to me versus the per-cpu version, so
I say go ahead with doing this for v2 once you wait a bit for any more
feedback.
On 8/3/22 13:24, Dave Hansen wrote:
> On 8/3/22 11:21, Tom Lendacky wrote:
>>> Would it be simpler to just do a spin_trylock_irqsave()? You fall back
>>> to early_set_pages_state() whenever you can't acquire the lock.
>>
>> I was looking at that and can definitely go that route if this approach
>> is preferred.
>
> I prefer it for sure.
>
> This whole iteration does look good to me versus the per-cpu version, so
> I say go ahead with doing this for v2 once you wait a bit for any more
> feedback.
I'm still concerned about the whole spinlock and performance. What if I
reduce the number of entries in the PSC structure to, say, 64, which
reduces the size of the struct to 520 bytes? Any issue if that is put on
the stack instead? It definitely makes things less complicated and feels
like a good compromise on the size vs. the number of PSC VMGEXIT requests.
Thanks,
Tom
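For reference, the 520-byte figure follows from the GHCB Page State Change layout: an 8-byte header plus 8 bytes per entry, so 8 + 64 * 8 = 520. A rough size check, with the field layout modeled on that descriptor and the 64-entry limit treated as the hypothetical value being proposed here:

#define PSC_STACK_MAX_ENTRY	64

struct psc_hdr {
	u16 cur_entry;
	u16 end_entry;
	u32 reserved;
} __packed;				/* 8 bytes */

struct psc_entry {
	u64 cur_page	: 12,
	    gfn		: 40,
	    operation	: 4,
	    pagesize	: 1,
	    reserved	: 7;
} __packed;				/* 8 bytes */

struct snp_psc_desc {
	struct psc_hdr hdr;
	struct psc_entry entries[PSC_STACK_MAX_ENTRY];
} __packed;

/* 8 + 64 * 8 == 520 bytes, small enough to live on the stack. */
static_assert(sizeof(struct snp_psc_desc) == 520);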
On 8/3/22 14:03, Tom Lendacky wrote:
>> This whole iteration does look good to me versus the per-cpu version, so
>> I say go ahead with doing this for v2 once you wait a bit for any more
>> feedback.
>
> I'm still concerned about the whole spinlock and performance. What if I
> reduce the number of entries in the PSC structure to, say, 64, which
> reduces the size of the struct to 520 bytes. Any issue if that is put on
> the stack, instead? It definitely makes things less complicated and
> feels like a good compromise on the size vs the number of PSC VMGEXIT
> requests.
That would be fine too.
But, I doubt there will be any real performance issues coming out of
this. As bad as this MSR thing is, I suspect it's not half as
disastrous as the global spinlock in Kirill's patches.
Also, private<->shared page conversions are *NOT* common from what I can
tell. There are a few pages converted at boot, but most of the
guest<->host communication goes through the swiotlb pages, which are static.
Are there other things that SEV uses this structure for that I'm missing?
On 8/3/22 16:18, Dave Hansen wrote:
> On 8/3/22 14:03, Tom Lendacky wrote:
>>> This whole iteration does look good to me versus the per-cpu version, so
>>> I say go ahead with doing this for v2 once you wait a bit for any more
>>> feedback.
>>
>> I'm still concerned about the whole spinlock and performance. What if I
>> reduce the number of entries in the PSC structure to, say, 64, which
>> reduces the size of the struct to 520 bytes. Any issue if that is put on
>> the stack, instead? It definitely makes things less complicated and
>> feels like a good compromise on the size vs the number of PSC VMGEXIT
>> requests.
>
> That would be fine too.
Ok.
>
> But, I doubt there will be any real performance issues coming out of
> this. As bad as this MSR thing is, I suspect it's not half as
> disastrous as the global spinlock in Kirill's patches.
>
> Also, private<->shared page conversions are *NOT* common from what I can
> tell. There are a few pages converted at boot, but most of the
> guest<->host communication goes through the swiotlb pages, which are static.
Generally, that's true. But, e.g., a dma_alloc_coherent() actually doesn't
go through SWIOTLB, but instead allocates the pages and makes them shared,
which results in a page state change. The NVMe driver was calling that API
a lot. In this case, though, the NVMe driver was running in IRQ context
and set_memory_decrypted() could sleep, so an unencrypted DMA memory pool
was created to work around the sleeping issue and reduce the page state
changes. It's just things like that that make me wary.
Thanks,
Tom
>
> Are there other things that SEV uses this structure for that I'm missing?
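A minimal sketch of the allocation path Tom describes, assuming a direct-mapped (non-SWIOTLB) device in a memory-encrypted guest; guest_dma_alloc() is an illustrative stand-in for the dma-direct code, not the kernel implementation:

void *guest_dma_alloc(struct device *dev, size_t size,
		      dma_addr_t *dma_handle, gfp_t gfp)
{
	unsigned int order = get_order(size);
	struct page *page;
	void *vaddr;

	/* No SWIOTLB bounce: the allocation itself becomes the DMA buffer. */
	page = alloc_pages(gfp, order);
	if (!page)
		return NULL;
	vaddr = page_address(page);

	/*
	 * Sharing the buffer with the host requires a private->shared
	 * page state change. set_memory_decrypted() may sleep, which is
	 * why callers in IRQ context ended up needing a pre-decrypted
	 * atomic DMA pool instead.
	 */
	if (set_memory_decrypted((unsigned long)vaddr, 1 << order)) {
		__free_pages(page, order);
		return NULL;
	}

	*dma_handle = phys_to_dma(dev, page_to_phys(page));
	return vaddr;
}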
On 8/3/22 14:34, Tom Lendacky wrote:
>> Also, private<->shared page conversions are *NOT* common from what I can
>> tell. There are a few pages converted at boot, but most of the
>> guest<->host communication goes through the swiotlb pages, which are
>> static.
>
> Generally, that's true. But, e.g., a dma_alloc_coherent() actually
> doesn't go through SWIOTLB, but instead allocates the pages and makes
> them shared, which results in a page state change. The NVMe driver was
> calling that API a lot. In this case, though, the NVMe driver was
> running in IRQ context and set_memory_decrypted() could sleep, so an
> unencrypted DMA memory pool was created to work around the sleeping
> issue and reduce the page state changes. It's just things like that,
> that make me wary.
Interesting. Is that a real passthrough NVMe device or the hypervisor
presenting a virtual one that just happens to use the NVMe driver?
I'm pretty sure the TDX folks have been banking on having very few page
state changes. But, part of that at least is their expectation of
relying heavily on virtio.
I wonder if their expectations are accurate, or if, once TDX gets out
into the real world, their hopes will be dashed.
On 8/3/22 16:48, Dave Hansen wrote:
> On 8/3/22 14:34, Tom Lendacky wrote:
>>> Also, private<->shared page conversions are *NOT* common from what I can
>>> tell. There are a few pages converted at boot, but most of the
>>> guest<->host communication goes through the swiotlb pages, which are
>>> static.
>>
>> Generally, that's true. But, e.g., a dma_alloc_coherent() actually
>> doesn't go through SWIOTLB, but instead allocates the pages and makes
>> them shared, which results in a page state change. The NVMe driver was
>> calling that API a lot. In this case, though, the NVMe driver was
>> running in IRQ context and set_memory_decrypted() could sleep, so an
>> unencrypted DMA memory pool was created to work around the sleeping
>> issue and reduce the page state changes. It's just things like that,
>> that make me wary.
>
> Interesting. Is that a real passthrough NVMe device or the hypervisor
> presenting a virtual one that just happens to use the NVMe driver?
Hmmm... not sure, possibly the latter. I just knew that whatever it was,
the NVMe driver was being used.
Thanks,
Tom
>
> I'm pretty sure the TDX folks have been banking on having very few page
> state changes. But, part of that at least is their expectation of
> relying heavily on virtio.
>
> I wonder if their expectations are accurate, or if, once TDX gets out
> into the real world, their hopes will be dashed.