Hi all-
As promised when Thomas did his GDT fixmap work, here is a draft patch
to speed up KVM by extending it.
The downside of this patch is that it makes the fixmap significantly
larger on 64-bit systems if NR_CPUS is large (it adds 15 more pages
per CPU). I don't know if we care at all. It also bloats the kernel
image by 4k and wastes 4k of RAM for the entire time the system is
booted. We could avoid the latter bit (sort of) by not mapping the
extra fixmap pages at all and handling the resulting faults somehow.
That would scare me -- now we have IRET generating #PF when running
malicious code, and that way lies utter madness.
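For concreteness, the sizing works out roughly like this (a sketch of the
fixmap layout, not the literal patch -- see the link below for the real
thing):

/*
 * A GDTR limit of 0xFFFF covers 64KiB, i.e. 16 pages per CPU instead of
 * the single page we map today, hence the 15 extra fixmap pages per CPU.
 */
#define PAGES_PER_GDT	16	/* 64KiB / 4KiB */

enum fixed_addresses {
	/* ... */
	FIX_GDT_REMAP_BEGIN,
	FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS * PAGES_PER_GDT - 1,
	/* ... */
};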
The upside is that we don't need to do LGDT after a vmexit on VMX.
LGDT is slooooooooooow. But no, I haven't benchmarked this yet.
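The idea is roughly the following (an untested sketch, not the literal
patch): advertise a GDT limit that already matches what VMX forces on
exit, so there is nothing left to fix up on the vmexit path.

void load_fixmap_gdt(int cpu)
{
	struct desc_ptr gdt_descr;

	gdt_descr.address = (unsigned long)get_cpu_gdt_ro(cpu);
	/*
	 * VM exit unconditionally sets GDTR.limit to 0xFFFF.  If the
	 * read-only fixmap GDT is mapped with that limit to begin with,
	 * KVM no longer has to LGDT after every vmexit just to shrink
	 * the limit back down.
	 */
	gdt_descr.size = 0xFFFF;
	load_gdt(&gdt_descr);
}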
What do you all think?
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/kvm&id=e249a09787d6956b52d8260b2326d8f12f768799
Andrew/Boris/Juergen: what does Xen think about setting a very high
GDT limit? Will it let us? Should I fix it by changing
load_fixmap_gdt() (i.e. uncommenting the commented bit) or by teaching
the Xen paravirt code to just ignore the monstrous limit? Or is it
not a problem in the first place?
--Andy
On 09/06/2017 02:13, Andy Lutomirski wrote:
> Hi all-
>
> As promised when Thomas did his GDT fixmap work, here is a draft patch
> to speed up KVM by extending it.
>
> The downside of this patch is that it makes the fixmap significantly
> larger on 64-bit systems if NR_CPUS is large (it adds 15 more pages
> per CPU). I don't know if we care at all. It also bloats the kernel
> image by 4k and wastes 4k of RAM for the entire time the system is
> booted. We could avoid the latter bit (sort of) by not mapping the
> extra fixmap pages at all and handling the resulting faults somehow.
> That would scare me -- now we have IRET generating #PF when running
> malicious code, and that way lies utter madness.
>
> The upside is that we don't need to do LGDT after a vmexit on VMX.
> LGDT is slooooooooooow. But no, I haven't benchmarked this yet.
>
> What do you all think?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/kvm&id=e249a09787d6956b52d8260b2326d8f12f768799
>
> Andrew/Boris/Juergen: what does Xen think about setting a very high
> GDT limit? Will it let us? Should I fix it by changing
> load_fixmap_gdt() (i.e. uncommenting the commented bit) or by teaching
> the Xen paravirt code to just ignore the monstrous limit? Or is it
> not a problem in the first place?
When running PV, any selector under 0xe000 is fair game, and anything
over that is Xen's.
OTOH, the set of software running as a PV guest and also running KVM is
empty. An HVM guest (which, when nested, is the only viable option for
running KVM) has total control over its GDT.
~Andrew
On 09/06/2017 03:13, Andy Lutomirski wrote:
> Hi all-
>
> As promised when Thomas did his GDT fixmap work, here is a draft patch
> to speed up KVM by extending it.
>
> The downside of this patch is that it makes the fixmap significantly
> larger on 64-bit systems if NR_CPUS is large (it adds 15 more pages
> per CPU). I don't know if we care at all. It also bloats the kernel
> image by 4k and wastes 4k of RAM for the entire time the system is
> booted. We could avoid the latter bit (sort of) by not mapping the
> extra fixmap pages at all and handling the resulting faults somehow.
> That would scare me -- now we have IRET generating #PF when running
> malicious code, and that way lies utter madness.
>
> The upside is that we don't need to do LGDT after a vmexit on VMX.
> LGDT is slooooooooooow. But no, I haven't benchmarked this yet.
>
> What do you all think?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/kvm&id=e249a09787d6956b52d8260b2326d8f12f768799
Not sure I understand this completely, but:
/* Get the fixmap index for a specific processor */
static inline unsigned int get_cpu_gdt_ro_index(int cpu)
{
- return FIX_GDT_REMAP_BEGIN + cpu;
+ return FIX_GDT_REMAP_END - cpu * PAGES_PER_GDT;
}
Isn't this off by one? I think it should be
FIX_GDT_REMAP_END + 1 - cpu * PAGES_PER_GDT
or just FIX_GDT_REMAP_BEGIN + cpu * PAGES_PER_GDT? That is for example:
FIX_GDT_REMAP_BEGIN = 100
get_cpu_gdt_ro_index(0) = 100
get_cpu_gdt_ro_index(1) = 116
get_cpu_gdt_ro_index(2) = 132
get_cpu_gdt_ro_index(3) = 148
FIX_GDT_REMAP_END = 163
Paolo
On Fri, Jun 9, 2017 at 12:14 AM, Paolo Bonzini <[email protected]> wrote:
>
>
> On 09/06/2017 03:13, Andy Lutomirski wrote:
>> Hi all-
>>
>> As promised when Thomas did his GDT fixmap work, here is a draft patch
>> to speed up KVM by extending it.
>>
>> The downside of this patch is that it makes the fixmap significantly
>> larger on 64-bit systems if NR_CPUS is large (it adds 15 more pages
>> per CPU). I don't know if we care at all. It also bloats the kernel
>> image by 4k and wastes 4k of RAM for the entire time the system is
>> booted. We could avoid the latter bit (sort of) by not mapping the
>> extra fixmap pages at all and handling the resulting faults somehow.
>> That would scare me -- now we have IRET generating #PF when running
>> malicious code, and that way lies utter madness.
>>
>> The upside is that we don't need to do LGDT after a vmexit on VMX.
>> LGDT is slooooooooooow. But no, I haven't benchmarked this yet.
>>
>> What do you all think?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/kvm&id=e249a09787d6956b52d8260b2326d8f12f768799
>
> Not sure I understand this completely, but:
>
> /* Get the fixmap index for a specific processor */
> static inline unsigned int get_cpu_gdt_ro_index(int cpu)
> {
> - return FIX_GDT_REMAP_BEGIN + cpu;
> + return FIX_GDT_REMAP_END - cpu * PAGES_PER_GDT;
> }
>
> Isn't this off by one? I think it should be
>
> FIX_GDT_REMAP_END + 1 - cpu * PAGES_PER_GDT
>
> or just FIX_GDT_REMAP_BEGIN + cpu * PAGES_PER_GDT? That is for example:
>
> FIX_GDT_REMAP_BEGIN = 100
> get_cpu_gdt_ro_index(0) = 100
> get_cpu_gdt_ro_index(1) = 116
> get_cpu_gdt_ro_index(2) = 132
> get_cpu_gdt_ro_index(3) = 148
> FIX_GDT_REMAP_END = 163
The issue here is that the fixmap is upside down: lower indices are
*higher* addresses, which means that, if we have a multi-page GDT, we
need get_cpu_gdt_ro_index() to return the index of the lowest-addressed
page of each GDT, i.e. its base page. The simplest way seems to be to lay
the per-CPU GDTs out in ascending address order.
With the range of indices being 100 .. 163 (with 4 CPUs), we'd want
the GDTs to be at:
163..148
147..132
131..116
115..100
so FIX_GDT_REMAP_END - cpu * PAGES_PER_GDT is correct, I think. Or am
I still off by one?
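Concretely, with your example numbers (PAGES_PER_GDT == 16, four CPUs,
indices 100..163) -- just an illustration, not the patch itself:

/* Hypothetical values from the example above, not the real enum. */
#define FIX_GDT_REMAP_BEGIN	100
#define PAGES_PER_GDT		16
#define FIX_GDT_REMAP_END	(FIX_GDT_REMAP_BEGIN + 4 * PAGES_PER_GDT - 1)	/* 163 */

static inline unsigned int get_cpu_gdt_ro_index(int cpu)
{
	return FIX_GDT_REMAP_END - cpu * PAGES_PER_GDT;
}

/*
 * Since a higher fixmap index means a lower virtual address, the returned
 * index is the lowest-addressed (base) page of each GDT, and each GDT
 * occupies indices [idx - PAGES_PER_GDT + 1 .. idx]:
 *
 *   cpu 0: 163..148
 *   cpu 1: 147..132
 *   cpu 2: 131..116
 *   cpu 3: 115..100
 *
 * so the last page used is exactly FIX_GDT_REMAP_BEGIN and nothing
 * overlaps or runs off the end.
 */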
--Andy
On 09/06/2017 17:45, Andy Lutomirski wrote:
> On Fri, Jun 9, 2017 at 12:14 AM, Paolo Bonzini <[email protected]> wrote:
>>
>>
>> On 09/06/2017 03:13, Andy Lutomirski wrote:
>>> Hi all-
>>>
>>> As promised when Thomas did his GDT fixmap work, here is a draft patch
>>> to speed up KVM by extending it.
>>>
>>> The downside of this patch is that it makes the fixmap significantly
>>> larger on 64-bit systems if NR_CPUS is large (it adds 15 more pages
>>> per CPU). I don't know if we care at all. It also bloats the kernel
>>> image by 4k and wastes 4k of RAM for the entire time the system is
>>> booted. We could avoid the latter bit (sort of) by not mapping the
>>> extra fixmap pages at all and handling the resulting faults somehow.
>>> That would scare me -- now we have IRET generating #PF when running
>>> malicious code, and that way lies utter madness.
>>>
>>> The upside is that we don't need to do LGDT after a vmexit on VMX.
>>> LGDT is slooooooooooow. But no, I haven't benchmarked this yet.
>>>
>>> What do you all think?
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/kvm&id=e249a09787d6956b52d8260b2326d8f12f768799
>>
>> Not sure I understand this completely, but:
>>
>> /* Get the fixmap index for a specific processor */
>> static inline unsigned int get_cpu_gdt_ro_index(int cpu)
>> {
>> - return FIX_GDT_REMAP_BEGIN + cpu;
>> + return FIX_GDT_REMAP_END - cpu * PAGES_PER_GDT;
>> }
>>
>> Isn't this off by one? I think it should be
>>
>> FIX_GDT_REMAP_END + 1 - cpu * PAGES_PER_GDT
>>
>> or just FIX_GDT_REMAP_BEGIN + cpu * PAGES_PER_GDT? That is for example:
>>
>> FIX_GDT_REMAP_BEGIN = 100
>> get_cpu_gdt_ro_index(0) = 100
>> get_cpu_gdt_ro_index(1) = 116
>> get_cpu_gdt_ro_index(2) = 132
>> get_cpu_gdt_ro_index(3) = 148
>> FIX_GDT_REMAP_END = 163
>
> The issue here is that the fixmap is upside down: lower indices are
> *higher* addresses, which means that, if we have a multi-page GDT, we
> need get_cpu_gdt_ro_index() to return the index of the lowest-addressed
> page of each GDT, i.e. its base page. The simplest way seems to be to lay
> the per-CPU GDTs out in ascending address order.
>
> With the range of indices being 100 .. 163 (with 4 CPUs), we'd want
> the GDTs to be at:
>
> 163..148
> 147..132
> 131..116
> 115..100
>
> so FIX_GDT_REMAP_END - cpu * PAGES_PER_GDT is correct, I think. Or am
> I still off by one?
No, you're right. Thanks for explaining!
Paolo