LinuxLists.cc - [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

On Wed, Sep 02, 2020 at 10:59:20AM +0200, Vitaly Kuznetsov wrote:
> Peter Xu <[email protected]> writes:
> > My whole point was more about trying to understand the problem behind.
> > Providing a fast path for reading pci holes seems to be reasonable as is,
> > however it's just that I'm confused on why there're so many reads on the pci
> > holes after all. Another important question is I'm wondering how this series
> > will finally help the use case of microvm. I'm not sure I get the whole point
> > of it, but... if microvm is the major use case of this, it would be good to
> > provide some quick numbers on those if possible.
> >
> > For example, IIUC microvm uses qboot (as a better alternative than seabios) for
> > fast boot, and qboot has:
> >
> > https://github.com/bonzini/qboot/blob/master/pci.c#L20
> >
> > I'm kind of curious whether qboot will still be used when this series is used
> > with microvm VMs? Since those are still at least PIO based.
>
> I'm afraid there is no 'grand plan' for everything at this moment :-(
> For traditional VMs 0.04 sec per boot is negligible and definitely not
> worth adding a feature, memory requirements are also very
> different. When it comes to microvm-style usage things change.
>
> '8193' PCI hole accesses I mention in the PATCH0 blurb are just from
> Linux as I was doing direct kernel boot, we can't get better than that
> (if PCI is in the game of course). Firmware (qboot, seabios,...) can
> only add more. I *think* the plan is to eventually switch them all to
> MMCFG, at least for KVM guests, by default but we need something to put
> to the advertisement.

I see a similar ~8k PCI hole reads with a -kernel boot w/ OVMF. All but 60
of those are from pcibios_fixup_peer_bridges(), and all are from the kernel.
My understanding is that pcibios_fixup_peer_bridges() is useful if and only
if there multiple root buses. And AFAICT, when running under QEMU, the only
way for there to be multiple buses in is if there is an explicit bridge
created ("pxb" or "pxb-pcie"). Based on the cover letter from those[*], the
main reason for creating a bridge is to handle pinned CPUs on a NUMA system
with pass-through devices. That use case seems highly unlikely to cross
paths with micro VMs, i.e. micro VMs will only ever have a single bus.
Unless I'm mistaken, microvm doesn't even support PCI, does it?

If all of the above is true, this can be handled by adding "pci=lastbus=0"
as a guest kernel param to override its scanning of buses. And couldn't
that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
to the end user?

[*] https://www.redhat.com/archives/libvir-list/2016-March/msg01213.html

2020-09-04 07:22:51

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

Hi,

> attached. Though those should be far less than 8000+, and those should also be
> pio rather than mmio.

Well, seabios 1.14 (qemu 5.1) added mmio support (to speed up boot a
little bit, mmio is one and pio is two vmexits).

Depends on q35 obviously, and a few pio accesses remain b/c seabios has
to first setup mmconfig.

take care,
Gerd

2020-09-04 07:30:28

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

Hi,

> Unless I'm mistaken, microvm doesn't even support PCI, does it?

Correct, no pci support right now.

We could probably wire up ecam (arm/virt style) for pcie support, once
the acpi support for mictovm finally landed (we need acpi for that
because otherwise the kernel wouldn't find the pcie bus).

Question is whenever there is a good reason to do so. Why would someone
prefer microvm with pcie support over q35?

> If all of the above is true, this can be handled by adding "pci=lastbus=0"
> as a guest kernel param to override its scanning of buses. And couldn't
> that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
> to the end user?

microvm_fix_kernel_cmdline() is a hack, not a solution.

Beside that I doubt this has much of an effect on microvm because
it doesn't support pcie in the first place.

take care,
Gerd

2020-09-04 16:01:22

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

On Fri, Sep 04, 2020 at 09:29:05AM +0200, Gerd Hoffmann wrote:
> Hi,
>
> > Unless I'm mistaken, microvm doesn't even support PCI, does it?
>
> Correct, no pci support right now.
>
> We could probably wire up ecam (arm/virt style) for pcie support, once
> the acpi support for mictovm finally landed (we need acpi for that
> because otherwise the kernel wouldn't find the pcie bus).
>
> Question is whenever there is a good reason to do so. Why would someone
> prefer microvm with pcie support over q35?
>
> > If all of the above is true, this can be handled by adding "pci=lastbus=0"
> > as a guest kernel param to override its scanning of buses. And couldn't
> > that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
> > to the end user?
>
> microvm_fix_kernel_cmdline() is a hack, not a solution.
>
> Beside that I doubt this has much of an effect on microvm because
> it doesn't support pcie in the first place.

I am so confused. Vitaly, can you clarify exactly what QEMU VM type this
series is intended to help? If this is for microvm, then why is the guest
doing PCI scanning in the first place? If it's for q35, why is the
justification for microvm-like workloads?

Either way, I think it makes sense explore other options before throwing
something into KVM, e.g. modifying guest command line, adding a KVM hint,
"fixing" QEMU, etc...

2020-09-07 08:38:47

by Vitaly Kuznetsov

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

Sean Christopherson <[email protected]> writes:

> On Fri, Sep 04, 2020 at 09:29:05AM +0200, Gerd Hoffmann wrote:
>> Hi,
>>
>> > Unless I'm mistaken, microvm doesn't even support PCI, does it?
>>
>> Correct, no pci support right now.
>>
>> We could probably wire up ecam (arm/virt style) for pcie support, once
>> the acpi support for mictovm finally landed (we need acpi for that
>> because otherwise the kernel wouldn't find the pcie bus).
>>
>> Question is whenever there is a good reason to do so. Why would someone
>> prefer microvm with pcie support over q35?
>>
>> > If all of the above is true, this can be handled by adding "pci=lastbus=0"
>> > as a guest kernel param to override its scanning of buses. And couldn't
>> > that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
>> > to the end user?
>>
>> microvm_fix_kernel_cmdline() is a hack, not a solution.
>>
>> Beside that I doubt this has much of an effect on microvm because
>> it doesn't support pcie in the first place.
>
> I am so confused. Vitaly, can you clarify exactly what QEMU VM type this
> series is intended to help? If this is for microvm, then why is the guest
> doing PCI scanning in the first place? If it's for q35, why is the
> justification for microvm-like workloads?

I'm not exactly sure about the plans for particular machine types, the
intention was to use this for pcie in QEMU in general so whatever
machine type uses pcie will benefit.

Now, it seems that we have a more sophisticated landscape. The
optimization will only make sense to speed up boot so all 'traditional'
VM types with 'traditional' firmware are out of
question. 'Container-like' VMs seem to avoid PCI for now, I'm not sure
if it's because they're in early stages of their development, because
they can get away without PCI or, actually, because of slowness at boot
(which we're trying to tackle with this feature). I'd definitely like to
hear more what people think about this.

>
> Either way, I think it makes sense explore other options before throwing
> something into KVM, e.g. modifying guest command line, adding a KVM hint,
> "fixing" QEMU, etc...
>

Initially, this feature looked like a small and straitforward
(micro-)optimization to me: memory regions with 'PCI hole' semantics do
exist and we can speed up access to them. Ideally, I'd like to find
other 'constant memory' regions requiring fast access and come up with
an interface to create them in KVM but so far nothing interesting came
up...

--
Vitaly

2020-09-07 10:53:52

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

On Fri, Sep 04, 2020 at 09:29:05AM +0200, Gerd Hoffmann wrote:
> Hi,
>
> > Unless I'm mistaken, microvm doesn't even support PCI, does it?
>
> Correct, no pci support right now.
>
> We could probably wire up ecam (arm/virt style) for pcie support, once
> the acpi support for mictovm finally landed (we need acpi for that
> because otherwise the kernel wouldn't find the pcie bus).
>
> Question is whenever there is a good reason to do so. Why would someone
> prefer microvm with pcie support over q35?

The usual reasons to use pcie apply to microvm just the same.
E.g.: pass through of pcie devices?

> > If all of the above is true, this can be handled by adding "pci=lastbus=0"
> > as a guest kernel param to override its scanning of buses. And couldn't
> > that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
> > to the end user?
>
> microvm_fix_kernel_cmdline() is a hack, not a solution.
>
> Beside that I doubt this has much of an effect on microvm because
> it doesn't support pcie in the first place.
>
> take care,
> Gerd

2020-09-07 10:56:55

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

On Thu, Sep 03, 2020 at 11:12:12PM -0700, Sean Christopherson wrote:
> On Wed, Sep 02, 2020 at 10:59:20AM +0200, Vitaly Kuznetsov wrote:
> > Peter Xu <[email protected]> writes:
> > > My whole point was more about trying to understand the problem behind.
> > > Providing a fast path for reading pci holes seems to be reasonable as is,
> > > however it's just that I'm confused on why there're so many reads on the pci
> > > holes after all. Another important question is I'm wondering how this series
> > > will finally help the use case of microvm. I'm not sure I get the whole point
> > > of it, but... if microvm is the major use case of this, it would be good to
> > > provide some quick numbers on those if possible.
> > >
> > > For example, IIUC microvm uses qboot (as a better alternative than seabios) for
> > > fast boot, and qboot has:
> > >
> > > https://github.com/bonzini/qboot/blob/master/pci.c#L20
> > >
> > > I'm kind of curious whether qboot will still be used when this series is used
> > > with microvm VMs? Since those are still at least PIO based.
> >
> > I'm afraid there is no 'grand plan' for everything at this moment :-(
> > For traditional VMs 0.04 sec per boot is negligible and definitely not
> > worth adding a feature, memory requirements are also very
> > different. When it comes to microvm-style usage things change.
> >
> > '8193' PCI hole accesses I mention in the PATCH0 blurb are just from
> > Linux as I was doing direct kernel boot, we can't get better than that
> > (if PCI is in the game of course). Firmware (qboot, seabios,...) can
> > only add more. I *think* the plan is to eventually switch them all to
> > MMCFG, at least for KVM guests, by default but we need something to put
> > to the advertisement.
>
> I see a similar ~8k PCI hole reads with a -kernel boot w/ OVMF. All but 60
> of those are from pcibios_fixup_peer_bridges(), and all are from the kernel.
> My understanding is that pcibios_fixup_peer_bridges() is useful if and only
> if there multiple root buses. And AFAICT, when running under QEMU, the only
> way for there to be multiple buses in is if there is an explicit bridge
> created ("pxb" or "pxb-pcie"). Based on the cover letter from those[*], the
> main reason for creating a bridge is to handle pinned CPUs on a NUMA system
> with pass-through devices. That use case seems highly unlikely to cross
> paths with micro VMs, i.e. micro VMs will only ever have a single bus.

My position is it's not all black and white, workloads do not
cleanly partition to these that care about boot speed and those
that don't. So IMHO we care about boot speed with pcie even if
microvm does not use it at the moment.

> Unless I'm mistaken, microvm doesn't even support PCI, does it?
>
> If all of the above is true, this can be handled by adding "pci=lastbus=0"
> as a guest kernel param to override its scanning of buses. And couldn't
> that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
> to the end user?
>
> [*] https://www.redhat.com/archives/libvir-list/2016-March/msg01213.html

2020-09-07 11:37:37

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

On Mon, Sep 07, 2020 at 10:37:39AM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > On Fri, Sep 04, 2020 at 09:29:05AM +0200, Gerd Hoffmann wrote:
> >> Hi,
> >>
> >> > Unless I'm mistaken, microvm doesn't even support PCI, does it?
> >>
> >> Correct, no pci support right now.
> >>
> >> We could probably wire up ecam (arm/virt style) for pcie support, once
> >> the acpi support for mictovm finally landed (we need acpi for that
> >> because otherwise the kernel wouldn't find the pcie bus).
> >>
> >> Question is whenever there is a good reason to do so. Why would someone
> >> prefer microvm with pcie support over q35?
> >>
> >> > If all of the above is true, this can be handled by adding "pci=lastbus=0"
> >> > as a guest kernel param to override its scanning of buses. And couldn't
> >> > that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
> >> > to the end user?
> >>
> >> microvm_fix_kernel_cmdline() is a hack, not a solution.
> >>
> >> Beside that I doubt this has much of an effect on microvm because
> >> it doesn't support pcie in the first place.
> >
> > I am so confused. Vitaly, can you clarify exactly what QEMU VM type this
> > series is intended to help? If this is for microvm, then why is the guest
> > doing PCI scanning in the first place? If it's for q35, why is the
> > justification for microvm-like workloads?
>
> I'm not exactly sure about the plans for particular machine types, the
> intention was to use this for pcie in QEMU in general so whatever
> machine type uses pcie will benefit.
>
> Now, it seems that we have a more sophisticated landscape. The
> optimization will only make sense to speed up boot so all 'traditional'
> VM types with 'traditional' firmware are out of
> question. 'Container-like' VMs seem to avoid PCI for now, I'm not sure
> if it's because they're in early stages of their development, because
> they can get away without PCI or, actually, because of slowness at boot
> (which we're trying to tackle with this feature). I'd definitely like to
> hear more what people think about this.

I suspect microvms will need pci eventually. I would much rather KVM
had an exit-less discovery mechanism in place by then because
learning from history if it doesn't they will do some kind of
hack on the kernel command line, and everyone will be stuck
supporting that for years ...

> >
> > Either way, I think it makes sense explore other options before throwing
> > something into KVM, e.g. modifying guest command line, adding a KVM hint,
> > "fixing" QEMU, etc...
> >
>
> Initially, this feature looked like a small and straitforward
> (micro-)optimization to me: memory regions with 'PCI hole' semantics do
> exist and we can speed up access to them. Ideally, I'd like to find
> other 'constant memory' regions requiring fast access and come up with
> an interface to create them in KVM but so far nothing interesting came
> up...

True, me neither.

> --
> Vitaly

2020-09-11 17:01:49

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

On Mon, Sep 07, 2020 at 07:32:23AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 07, 2020 at 10:37:39AM +0200, Vitaly Kuznetsov wrote:
> > Sean Christopherson <[email protected]> writes:
> >
> > > On Fri, Sep 04, 2020 at 09:29:05AM +0200, Gerd Hoffmann wrote:
> > >> Hi,
> > >>
> > >> > Unless I'm mistaken, microvm doesn't even support PCI, does it?
> > >>
> > >> Correct, no pci support right now.
> > >>
> > >> We could probably wire up ecam (arm/virt style) for pcie support, once
> > >> the acpi support for mictovm finally landed (we need acpi for that
> > >> because otherwise the kernel wouldn't find the pcie bus).
> > >>
> > >> Question is whenever there is a good reason to do so. Why would someone
> > >> prefer microvm with pcie support over q35?
> > >>
> > >> > If all of the above is true, this can be handled by adding "pci=lastbus=0"
> > >> > as a guest kernel param to override its scanning of buses. And couldn't
> > >> > that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
> > >> > to the end user?
> > >>
> > >> microvm_fix_kernel_cmdline() is a hack, not a solution.
> > >>
> > >> Beside that I doubt this has much of an effect on microvm because
> > >> it doesn't support pcie in the first place.
> > >
> > > I am so confused. Vitaly, can you clarify exactly what QEMU VM type this
> > > series is intended to help? If this is for microvm, then why is the guest
> > > doing PCI scanning in the first place? If it's for q35, why is the
> > > justification for microvm-like workloads?
> >
> > I'm not exactly sure about the plans for particular machine types, the
> > intention was to use this for pcie in QEMU in general so whatever
> > machine type uses pcie will benefit.
> >
> > Now, it seems that we have a more sophisticated landscape. The
> > optimization will only make sense to speed up boot so all 'traditional'
> > VM types with 'traditional' firmware are out of
> > question. 'Container-like' VMs seem to avoid PCI for now, I'm not sure
> > if it's because they're in early stages of their development, because
> > they can get away without PCI or, actually, because of slowness at boot
> > (which we're trying to tackle with this feature). I'd definitely like to
> > hear more what people think about this.
>
> I suspect microvms will need pci eventually. I would much rather KVM
> had an exit-less discovery mechanism in place by then because
> learning from history if it doesn't they will do some kind of
> hack on the kernel command line, and everyone will be stuck
> supporting that for years ...

Is it not an option for the VMM to "accurately" enumerate the number of buses?
E.g. if the VMM has devices on only bus 0, then enumerate that there is one
bus so that the guest doesn't try and probe devices that can't possibly exist.
Or is that completely non-sensical and/or violate PCIe spec?

2020-09-18 08:54:35

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

Hi,

> I see a similar ~8k PCI hole reads with a -kernel boot w/ OVMF. All but 60
> of those are from pcibios_fixup_peer_bridges(), and all are from the kernel.

pcibios_fixup_peer_bridges() looks at pcibios_last_bus, and that in turn
seems to be set according to the mmconfig size (in
arch/x86/pci/mmconfig-shared.c).

So, maybe we just need to declare a smaller mmconfig window in the acpi
tables, depending on the number of pci busses actually used ...

> If all of the above is true, this can be handled by adding "pci=lastbus=0"

... so we don't need manual quirks like this?

take care,
Gerd

2020-09-18 09:37:54

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

Hi,

> > We could probably wire up ecam (arm/virt style) for pcie support, once
> > the acpi support for mictovm finally landed (we need acpi for that
> > because otherwise the kernel wouldn't find the pcie bus).
> >
> > Question is whenever there is a good reason to do so. Why would someone
> > prefer microvm with pcie support over q35?
>
> The usual reasons to use pcie apply to microvm just the same.
> E.g.: pass through of pcie devices?

Playground:
https://git.kraxel.org/cgit/qemu/log/?h=sirius/microvm-usb

Adds support for usb and pcie (use -machine microvm,usb=on,pcie=on
to enable). Reuses the gpex used on arm/aarch64. Seems to work ok
on a quick test.

Not fully sure how to deal correctly with ioports. The gpex device
has a mmio window for the io address space. Will that approach work
on x86 too? Anyway, just not having a ioport range seems to be a
valid configuation, so I've just disabled them for now ...

take care,
Gerd

2020-09-18 12:36:37

[permalink] [raw]

Subject: Re: [PATCH v2 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

On Fri, Sep 11, 2020 at 10:00:31AM -0700, Sean Christopherson wrote:
> On Mon, Sep 07, 2020 at 07:32:23AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 07, 2020 at 10:37:39AM +0200, Vitaly Kuznetsov wrote:
> > > Sean Christopherson <[email protected]> writes:
> > >
> > > > On Fri, Sep 04, 2020 at 09:29:05AM +0200, Gerd Hoffmann wrote:
> > > >> Hi,
> > > >>
> > > >> > Unless I'm mistaken, microvm doesn't even support PCI, does it?
> > > >>
> > > >> Correct, no pci support right now.
> > > >>
> > > >> We could probably wire up ecam (arm/virt style) for pcie support, once
> > > >> the acpi support for mictovm finally landed (we need acpi for that
> > > >> because otherwise the kernel wouldn't find the pcie bus).
> > > >>
> > > >> Question is whenever there is a good reason to do so. Why would someone
> > > >> prefer microvm with pcie support over q35?
> > > >>
> > > >> > If all of the above is true, this can be handled by adding "pci=lastbus=0"
> > > >> > as a guest kernel param to override its scanning of buses. And couldn't
> > > >> > that be done by QEMU's microvm_fix_kernel_cmdline() to make it transparent
> > > >> > to the end user?
> > > >>
> > > >> microvm_fix_kernel_cmdline() is a hack, not a solution.
> > > >>
> > > >> Beside that I doubt this has much of an effect on microvm because
> > > >> it doesn't support pcie in the first place.
> > > >
> > > > I am so confused. Vitaly, can you clarify exactly what QEMU VM type this
> > > > series is intended to help? If this is for microvm, then why is the guest
> > > > doing PCI scanning in the first place? If it's for q35, why is the
> > > > justification for microvm-like workloads?
> > >
> > > I'm not exactly sure about the plans for particular machine types, the
> > > intention was to use this for pcie in QEMU in general so whatever
> > > machine type uses pcie will benefit.
> > >
> > > Now, it seems that we have a more sophisticated landscape. The
> > > optimization will only make sense to speed up boot so all 'traditional'
> > > VM types with 'traditional' firmware are out of
> > > question. 'Container-like' VMs seem to avoid PCI for now, I'm not sure
> > > if it's because they're in early stages of their development, because
> > > they can get away without PCI or, actually, because of slowness at boot
> > > (which we're trying to tackle with this feature). I'd definitely like to
> > > hear more what people think about this.
> >
> > I suspect microvms will need pci eventually. I would much rather KVM
> > had an exit-less discovery mechanism in place by then because
> > learning from history if it doesn't they will do some kind of
> > hack on the kernel command line, and everyone will be stuck
> > supporting that for years ...
>
> Is it not an option for the VMM to "accurately" enumerate the number of buses?
> E.g. if the VMM has devices on only bus 0, then enumerate that there is one
> bus so that the guest doesn't try and probe devices that can't possibly exist.
> Or is that completely non-sensical and/or violate PCIe spec?

There is some tension here, in that one way to make guest boot faster
is to defer hotplug of devices until after it booted.

--
MST

2020-09-21 17:22:58