Would it be possible to revert:
commit 228cf79376f13b98f2e1ac10586311312757675c
Author: Konstantin Ozerkov <[email protected]>
Date: Wed Oct 26 19:11:01 2011 +0400
ALSA: intel8x0: Improve performance in virtual environment
Presumably one or more of the following is true:
a) The inside_vm == true case is just an optimization and should apply
unconditionally.
b) The inside_vm == true case is incorrect and should be fixed or disabled.
c) The inside_vm == true case is a special case that makes sense when
IO is very very slow but doesn't make sense when IO is fast. If so,
why not literally measure the time that the IO takes and switch over
to the "inside VM" path when IO is slow?
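A minimal sketch of what option (c) could look like, assuming the driver
has already collected a handful of register-read latencies in nanoseconds.
The function name io_looks_slow(), the MAX_SAMPLES cap and the threshold
are all hypothetical and do not come from the intel8x0 driver; the median
is used only so that a single outlier does not flip the decision.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on how many latency samples are considered. */
#define MAX_SAMPLES 64

/* qsort() comparator for uint64_t values, ascending. */
int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Return true when the median of the measured read latencies is above
 * threshold_ns, i.e. when IO looks slow enough to prefer the cheap path.
 * Returns false when there are no samples.
 */
bool io_looks_slow(const uint64_t *samples_ns, size_t n, uint64_t threshold_ns)
{
	uint64_t sorted[MAX_SAMPLES];

	if (n == 0)
		return false;
	if (n > MAX_SAMPLES)
		n = MAX_SAMPLES;

	/* Sort a private copy so the caller's buffer is left untouched. */
	memcpy(sorted, samples_ns, n * sizeof(sorted[0]));
	qsort(sorted, n, sizeof(sorted[0]), cmp_u64);

	return sorted[n / 2] > threshold_ns;
}
```

Note that this only measures read speed; it says nothing about whether
the device updates its position registers consistently.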
There are a pile of nonsensical "are we in a VM" checks of various
sorts scattered throughout the kernel, they're all a mess to maintain
(there are lots of kinds of VMs in the world, and Linux may not even
know it's a guest), and, in most cases, it appears that the correct
solution is to delete the checks. I just removed a nasty one in the
x86_32 entry asm, and this one is written in C so it should be a piece
of cake :)
--Andy
On Tue, 29 Mar 2016 23:37:32 +0200,
Andy Lutomirski wrote:
>
> Would it be possible to revert:
>
> commit 228cf79376f13b98f2e1ac10586311312757675c
> Author: Konstantin Ozerkov <[email protected]>
> Date: Wed Oct 26 19:11:01 2011 +0400
>
> ALSA: intel8x0: Improve performance in virtual environment
>
> Presumably one or more of the following is true:
>
> a) The inside_vm == true case is just an optimization and should apply
> unconditionally.
>
> b) The inside_vm == true case is incorrect and should be fixed or disabled.
>
> c) The inside_vm == true case is a special case that makes sense then
> IO is very very slow but doesn't make sense when IO is fast. If so,
> why not literally measure the time that the IO takes and switch over
> to the "inside VM" path when IO is slow?
The more important condition is rather that the register updates of CIV
and PICB are atomic. This is mostly satisfied only on a VM, and, unlike
the IO read speed, it can't be measured easily.
> There are a pile of nonsensical "are we in a VM" checks of various
> sorts scattered throughout the kernel, they're all a mess to maintain
> (there are lots of kinds of VMs in the world, and Linux may not even
> know it's a guest), and, in most cases, it appears that the correct
> solution is to delete the checks. I just removed a nasty one in the
> x86_32 entry asm, and this one is written in C so it should be a piece
> of cake :)
This cake looks sweet, but a worm is hidden behind the cream.
The loop in the code itself is already a kludge for the buggy hardware
where the inconsistent read happens not so often (only at the boundary
and in a racy way). It would be nice if we could have a more reliable
way to know the hardware bugginess, but it's difficult,
unsurprisingly.
thanks,
Takashi
On Wed, Mar 30, 2016 at 08:07:04AM +0200, Takashi Iwai wrote:
> On Tue, 29 Mar 2016 23:37:32 +0200,
> Andy Lutomirski wrote:
> >
> > Would it be possible to revert:
> >
> > commit 228cf79376f13b98f2e1ac10586311312757675c
> > Author: Konstantin Ozerkov <[email protected]>
> > Date: Wed Oct 26 19:11:01 2011 +0400
> >
> > ALSA: intel8x0: Improve performance in virtual environment
> >
> > Presumably one or more of the following is true:
> >
> > a) The inside_vm == true case is just an optimization and should apply
> > unconditionally.
> >
> > b) The inside_vm == true case is incorrect and should be fixed or disabled.
> >
> > c) The inside_vm == true case is a special case that makes sense then
> > IO is very very slow but doesn't make sense when IO is fast. If so,
> > why not literally measure the time that the IO takes and switch over
> > to the "inside VM" path when IO is slow?
BTW can we simulate this on bare metal by throttling an IO bus, or
perhaps mucking with the scheduler ?
I ask as I wonder if a similar type of optimization may be useful
to first simulate with other types of buses for other IO devices
we might use in virtualization environments. If so, I'd be curious
to know if similar optimizations might be possible for
other sound cards, or other IO devices.
> More important condition is rather that the register updates of CIV
> and PICB are atomic.
To help with this can you perhaps elaborate a bit more on what the code
does? As I read it snd_intel8x0_pcm_pointer() gets a pointer to some
sort of audio frame we are in and uses two values to see if we are
going to be evaluating the right frame, we use an optimization of
some sort to skip one check for virtual environments. We seem to need
this given that on a virtual environment it is assumed that the sound
card is emulated, and as such an IO read there is rather expensive.
Can you confirm and/or elaborate a bit more what this does ?
To try to help understand what is going on can you describe what CIV
and PICB are exactly ?
> This is satisfied mostly only on VM, and can't
> be measured easily unlike the IO read speed.
Interesting, note the original patch claimed it was for KVM and
Parallels hypervisor only, but since the code uses:
+#if defined(__i386__) || defined(__x86_64__)
+ inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#endif
This makes it apply to Xen as well, which makes this hack broader,
but is it only applicable when an emulated device is
used ? What about if a hypervisor is used and PCI passthrough is
used ?
> > There are a pile of nonsensical "are we in a VM" checks of various
> > sorts scattered throughout the kernel, they're all a mess to maintain
> > (there are lots of kinds of VMs in the world, and Linux may not even
> > know it's a guest), and, in most cases, it appears that the correct
> > solution is to delete the checks. I just removed a nasty one in the
> > x86_32 entry asm, and this one is written in C so it should be a piece
> > of cake :)
>
> This cake looks sweet, but a worm is hidden behind the cream.
> The loop in the code itself is already a kludge for the buggy hardware
> where the inconsistent read happens not so often (only at the boundary
> and in a racy way). It would be nice if we can have a more reliably
> way to know the hardware buggyness, but it's difficult,
> unsurprisingly.
The concern here is setting precedents for VM cases sprinkled in the kernel.
The assumption here is such special cases are really papering over another
type of issue, so it's best to ultimately try to root cause the issue in
a more generalized fashion.
Stephen Hemminger pointed out to me a while ago that the Linux scheduler
really can't tell apart latencies incurred, for instance, due to
network IO and, say, latencies incurred by heavy computation. We also don't
have information to feed the scheduler to provide reasonable latency
guarantees. The same should apply to sound IO latency issues, however
this example seems to reveal very type-of-device specifics which are
used to make certain compromises. If the issue can be tied to discrepancies
on the scheduler in differentiating latencies incurred by IO or CPU bound
work loads, and certain compromises are indeed very device specific it
would make a generic solution perhaps really hard to address.
Virtual environments have another subtle issue, which I've been suspecting for
a while might get worse over time, and that is that in certain types of
virtualized environments you have to deal with at least (unless you are using
nested virtual environments) two schedulers, each making perhaps very different
decisions, and each perhaps perceiving different conditions, and reacting
perhaps at different times to the same exact event. In the networking world,
two different solutions in two separate layers trying to solve a similar
issue with different algorithms have now proven to wreak havoc; it's what
we know as bufferbloat. Ignoring IO, if we just consider the discrepancy in
scheduler information between a guest on a hypervisor and a bare metal box,
we already know odd issues can occur in situations such as a large number
of guests ramping up (say booting 100 guests) or dynamic topology changes.
To address these things there have been, IMHO, knee-jerk reactions to the
problem on hypervisors, for instance:
a) CPU pinning [0]
b) CPU affinity [0]
c) CPU pools [1]
d) NUMA aware scheduling [2]
[0] http://wiki.xen.org/wiki/Automatic_NUMA_Placement
[1] http://wiki.xen.org/wiki/Cpupools_Howto
[2] http://wiki.xen.org/wiki/Xen_4.3_NUMA_Aware_Scheduling
I have suspected these are just paper work-arounds over the real
issues... but I have no evidence to confirm this yet. If there
is a possibility that is true, the discrepancies on types of
latencies incurred by CPU bound or IO bound should exacerbate
this issue even further.
Would a real time scheduler provide any semantics / heuristics
to help with any of this ?
Luis
On Fri, 01 Apr 2016 00:26:18 +0200,
Luis R. Rodriguez wrote:
>
> On Wed, Mar 30, 2016 at 08:07:04AM +0200, Takashi Iwai wrote:
> > On Tue, 29 Mar 2016 23:37:32 +0200,
> > Andy Lutomirski wrote:
> > >
> > > Would it be possible to revert:
> > >
> > > commit 228cf79376f13b98f2e1ac10586311312757675c
> > > Author: Konstantin Ozerkov <[email protected]>
> > > Date: Wed Oct 26 19:11:01 2011 +0400
> > >
> > > ALSA: intel8x0: Improve performance in virtual environment
> > >
> > > Presumably one or more of the following is true:
> > >
> > > a) The inside_vm == true case is just an optimization and should apply
> > > unconditionally.
> > >
> > > b) The inside_vm == true case is incorrect and should be fixed or disabled.
> > >
> > > c) The inside_vm == true case is a special case that makes sense then
> > > IO is very very slow but doesn't make sense when IO is fast. If so,
> > > why not literally measure the time that the IO takes and switch over
> > > to the "inside VM" path when IO is slow?
>
> BTW can we simulate this on bare metal by throttling an IO bus, or
> perhaps mucking with the scheduler ?
>
> I ask as I wonder if similar type of optimization may be useful
> to first simulate with other types of buses for other IO devices
> we might use in virtualization environments. If so, I'd be curious
> to know if similar type of optimizations might be possible for
> other sounds cards, or other IO devices.
There aren't so many sound devices requiring such a workaround.
> > More important condition is rather that the register updates of CIV
> > and PICB are atomic.
>
> To help with this can you perhaps elaborate a bit more on what the code
> does? As I read it snd_intel8x0_pcm_pointer() gets a pointer to some
> sort of audio frame we are in and uses two values to see if we are
> going to be evaluating the right frame, we use an optimization of
> some sort to skip one check for virtual environments. We seem to need
> this given that on a virtual environment it is assumed that the sound
> card is emulated, and as such an IO read there is rather expensive.
>
> Can you confirm and/or elaborate a bit more what this does ?
>
> To try to help understand what is going on can you describe what CIV
> and PICB are exactly ?
CIV and PICB registers are a pair, and we calculate the linear position
in a ring buffer from both. However, they sometimes get out of sync
under stress, and the position calculated from such values may wrongly
go backward. To avoid this, the PICB register is read a second time and
compared with the previous value, looping until the two match. This
check is skipped on a VM.
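The read-and-recheck loop described above can be sketched roughly as
follows. This is a userspace illustration, not the driver code: the
reg_ops accessors, the fake_dev replay device, read_position() and the
position formula (PICB taken as frames remaining in the current
fragment, all fragments the same size) are assumptions made for the
example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register accessors; a real driver would do port or MMIO reads. */
struct reg_ops {
	uint8_t (*read_civ)(void *ctx);
	uint16_t (*read_picb)(void *ctx);
	void *ctx;
};

/* Fake device for demonstration: replays a fixed script of PICB values. */
struct fake_dev {
	uint8_t civ;
	const uint16_t *picb_script;
	size_t len;
	size_t next;
};

uint8_t fake_read_civ(void *ctx)
{
	return ((struct fake_dev *)ctx)->civ;
}

uint16_t fake_read_picb(void *ctx)
{
	struct fake_dev *d = ctx;
	uint16_t v = d->picb_script[d->next];

	/* Stay on the last scripted value once the script runs out. */
	if (d->next + 1 < d->len)
		d->next++;
	return v;
}

/*
 * Return the linear position in frames, or -1 if no stable PICB value
 * was seen within max_tries attempts. With skip_recheck set, the first
 * CIV/PICB pair is trusted as-is (the "inside VM" path).
 */
long read_position(const struct reg_ops *ops, long frag_frames,
		   bool skip_recheck, int max_tries)
{
	while (max_tries-- > 0) {
		uint8_t civ = ops->read_civ(ops->ctx);
		uint16_t picb = ops->read_picb(ops->ctx);

		/* Second PICB read: retry if the value changed in between. */
		if (!skip_recheck && picb != ops->read_picb(ops->ctx))
			continue;

		return (long)civ * frag_frames + (frag_frames - picb);
	}
	return -1;
}
```

With a script of 100, 90, 90, 90 the rechecking path rejects the first
pair and settles on 90, while the skipping path returns immediately
with 100 at the cost of one register read fewer.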
> > This is satisfied mostly only on VM, and can't
> > be measured easily unlike the IO read speed.
>
> Interesting, note the original patch claimed it was for KVM and
> Parallels hypervisor only, but since the code uses:
>
> +#if defined(__i386__) || defined(__x86_64__)
> + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> +#endif
>
> This makes it apply also to Xen as well, this makes this hack more
> broad, but does is it only applicable when an emulated device is
> used ? What about if a hypervisor is used and PCI passthrough is
> used ?
A good question. Xen was added there at the time from positive
results by quick tests, but it might show an issue if it's running on
a very old chip with PCI passthrough. But I'm not sure whether PCI
passthrough would work on such old chipsets at all.
> > > There are a pile of nonsensical "are we in a VM" checks of various
> > > sorts scattered throughout the kernel, they're all a mess to maintain
> > > (there are lots of kinds of VMs in the world, and Linux may not even
> > > know it's a guest), and, in most cases, it appears that the correct
> > > solution is to delete the checks. I just removed a nasty one in the
> > > x86_32 entry asm, and this one is written in C so it should be a piece
> > > of cake :)
> >
> > This cake looks sweet, but a worm is hidden behind the cream.
> > The loop in the code itself is already a kludge for the buggy hardware
> > where the inconsistent read happens not so often (only at the boundary
> > and in a racy way). It would be nice if we can have a more reliably
> > way to know the hardware buggyness, but it's difficult,
> > unsurprisingly.
>
> The concern here is setting precedents for VM cases sprinkled in the kernel.
> The assumption here is such special cases are really paper'ing over another
> type of issue, so its best to ultimately try to root cause the issue in
> a more generalized fashion.
Well, it's rather bare metal that shows the buggy behavior, thus we
need to paper over it. In that sense, it's the other way round; we don't
tune for VM. The VM check we're discussing is rather for skipping the
strange workaround.
You may ask whether we can remove the whole workaround instead. It's
practically impossible. We don't know which models behave this way and
which do not. And the hardware in question is (literally) thousands of
variants of damn old PC mobos. Any fundamental change needs to be
verified on all these machines...
Takashi
On Fri, Apr 01, 2016 at 07:34:10AM +0200, Takashi Iwai wrote:
> On Fri, 01 Apr 2016 00:26:18 +0200,
> Luis R. Rodriguez wrote:
> >
> > On Wed, Mar 30, 2016 at 08:07:04AM +0200, Takashi Iwai wrote:
> > > On Tue, 29 Mar 2016 23:37:32 +0200,
> > > Andy Lutomirski wrote:
> > > >
> > > > Would it be possible to revert:
> > > >
> > > > commit 228cf79376f13b98f2e1ac10586311312757675c
> > > > Author: Konstantin Ozerkov <[email protected]>
> > > > Date: Wed Oct 26 19:11:01 2011 +0400
> > > >
> > > > ALSA: intel8x0: Improve performance in virtual environment
> > > >
> > > > Presumably one or more of the following is true:
> > > >
> > > > a) The inside_vm == true case is just an optimization and should apply
> > > > unconditionally.
> > > >
> > > > b) The inside_vm == true case is incorrect and should be fixed or disabled.
> > > >
> > > > c) The inside_vm == true case is a special case that makes sense then
> > > > IO is very very slow but doesn't make sense when IO is fast. If so,
> > > > why not literally measure the time that the IO takes and switch over
> > > > to the "inside VM" path when IO is slow?
> >
> > BTW can we simulate this on bare metal by throttling an IO bus, or
> > perhaps mucking with the scheduler ?
> >
> > I ask as I wonder if similar type of optimization may be useful
> > to first simulate with other types of buses for other IO devices
> > we might use in virtualization environments. If so, I'd be curious
> > to know if similar type of optimizations might be possible for
> > other sounds cards, or other IO devices.
>
> There aren't so many sound devices requiring such a workaround.
Why not, what makes this special?
> > > More important condition is rather that the register updates of CIV
> > > and PICB are atomic.
> >
> > To help with this can you perhaps elaborate a bit more on what the code
> > does? As I read it snd_intel8x0_pcm_pointer() gets a pointer to some
> > sort of audio frame we are in and uses two values to see if we are
> > going to be evaluating the right frame, we use an optimization of
> > some sort to skip one check for virtual environments. We seem to need
> > this given that on a virtual environment it is assumed that the sound
> > card is emulated, and as such an IO read there is rather expensive.
> >
> > Can you confirm and/or elaborate a bit more what this does ?
> >
> > To try to help understand what is going on can you describe what CIV
> > and PICB are exactly ?
>
> CIV and PICB registers are a pair and we calculate the linear position
> in a ring buffer from both two.
What type of ring buffer is this ?
> However, they are divorced sometimes
> under stress, and the position calculated from such values may go
> backward wrongly. For avoiding it, there is the second read of the
> PICB register and compare with the previous value, and loop until it
> matches. This check is skipped on VM.
I see. Is this a software emulation *bug*, or an IO issue due to
virtualization? I tried to read what the pointer() (that's what it's called)
callback does but since there is no documentation for any of the callbacks I
have no clue whatsoever.
If the former, could we somehow detect an emulated device other than through
this type of check ? Or could we *add* a capability of some sort to detect it
on the driver ? This would not address the removal, but it could mean finding a
way to address emulation issues.
If it's an IO issue -- exactly what is causing the delays in IO ?
> > > This is satisfied mostly only on VM, and can't
> > > be measured easily unlike the IO read speed.
> >
> > Interesting, note the original patch claimed it was for KVM and
> > Parallels hypervisor only, but since the code uses:
> >
> > +#if defined(__i386__) || defined(__x86_64__)
> > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > +#endif
> >
> > This makes it apply also to Xen as well, this makes this hack more
> > broad, but does is it only applicable when an emulated device is
> > used ? What about if a hypervisor is used and PCI passthrough is
> > used ?
>
> A good question. Xen was added there at the time from positive
> results by quick tests, but it might show an issue if it's running on
> a very old chip with PCI passthrough. But I'm not sure whether PCI
> passthrough would work on such old chipsets at all.
If it did have an issue then that would have to be special cased, that
is the module parameter would not need to be enabled for such type of
systems, and heuristics would be needed. As you note, fortunately this
may not be common though... but if this type of work around may be
taken as a precedent to enable other types of hacks in other drivers
I'm very fearful of more hacks later needing these considerations as
well.
> > > > There are a pile of nonsensical "are we in a VM" checks of various
> > > > sorts scattered throughout the kernel, they're all a mess to maintain
> > > > (there are lots of kinds of VMs in the world, and Linux may not even
> > > > know it's a guest), and, in most cases, it appears that the correct
> > > > solution is to delete the checks. I just removed a nasty one in the
> > > > x86_32 entry asm, and this one is written in C so it should be a piece
> > > > of cake :)
> > >
> > > This cake looks sweet, but a worm is hidden behind the cream.
> > > The loop in the code itself is already a kludge for the buggy hardware
> > > where the inconsistent read happens not so often (only at the boundary
> > > and in a racy way). It would be nice if we can have a more reliably
> > > way to know the hardware buggyness, but it's difficult,
> > > unsurprisingly.
> >
> > The concern here is setting precedents for VM cases sprinkled in the kernel.
> > The assumption here is such special cases are really paper'ing over another
> > type of issue, so its best to ultimately try to root cause the issue in
> > a more generalized fashion.
>
> Well, it's rather bare metal that shows the buggy behavior, thus we
> need to paper over it. In that sense, it's other way round; we don't
> tune for VM. The VM check we're discussing is rather for skipping the
> strange workaround.
What is it exactly about a VM that enables this work around to be skipped?
I don't quite get it yet.
> You may ask whether we can reduce the whole workaround instead. It's
> practically impossible. We don't know which models doing so and which
> not. And, the hardware in question are (literally) thousands of
> variants of damn old PC mobos. Any fundamental change needs to be
> verified on all these machines...
What if we can come up with an algorithm on the ring buffer that would
satisfy both cases without special casing it ? Is removing this VM
check really impossible?
Luis
On Sat, 02 Apr 2016 00:28:31 +0200,
Luis R. Rodriguez wrote:
>
> On Fri, Apr 01, 2016 at 07:34:10AM +0200, Takashi Iwai wrote:
> > On Fri, 01 Apr 2016 00:26:18 +0200,
> > Luis R. Rodriguez wrote:
> > >
> > > On Wed, Mar 30, 2016 at 08:07:04AM +0200, Takashi Iwai wrote:
> > > > On Tue, 29 Mar 2016 23:37:32 +0200,
> > > > Andy Lutomirski wrote:
> > > > >
> > > > > Would it be possible to revert:
> > > > >
> > > > > commit 228cf79376f13b98f2e1ac10586311312757675c
> > > > > Author: Konstantin Ozerkov <[email protected]>
> > > > > Date: Wed Oct 26 19:11:01 2011 +0400
> > > > >
> > > > > ALSA: intel8x0: Improve performance in virtual environment
> > > > >
> > > > > Presumably one or more of the following is true:
> > > > >
> > > > > a) The inside_vm == true case is just an optimization and should apply
> > > > > unconditionally.
> > > > >
> > > > > b) The inside_vm == true case is incorrect and should be fixed or disabled.
> > > > >
> > > > > c) The inside_vm == true case is a special case that makes sense then
> > > > > IO is very very slow but doesn't make sense when IO is fast. If so,
> > > > > why not literally measure the time that the IO takes and switch over
> > > > > to the "inside VM" path when IO is slow?
> > >
> > > BTW can we simulate this on bare metal by throttling an IO bus, or
> > > perhaps mucking with the scheduler ?
> > >
> > > I ask as I wonder if similar type of optimization may be useful
> > > to first simulate with other types of buses for other IO devices
> > > we might use in virtualization environments. If so, I'd be curious
> > > to know if similar type of optimizations might be possible for
> > > other sounds cards, or other IO devices.
> >
> > There aren't so many sound devices requiring such a workaround.
>
> Why not, what makes this special?
The hardware bugginess.
> > > > More important condition is rather that the register updates of CIV
> > > > and PICB are atomic.
> > >
> > > To help with this can you perhaps elaborate a bit more on what the code
> > > does? As I read it snd_intel8x0_pcm_pointer() gets a pointer to some
> > > sort of audio frame we are in and uses two values to see if we are
> > > going to be evaluating the right frame, we use an optimization of
> > > some sort to skip one check for virtual environments. We seem to need
> > > this given that on a virtual environment it is assumed that the sound
> > > card is emulated, and as such an IO read there is rather expensive.
> > >
> > > Can you confirm and/or elaborate a bit more what this does ?
> > >
> > > To try to help understand what is going on can you describe what CIV
> > > and PICB are exactly ?
> >
> > CIV and PICB registers are a pair and we calculate the linear position
> > in a ring buffer from both two.
>
> What type of ring buffer is this ?
A normal PCM ring buffer via PCI DMA transfer.
> > However, they are divorced sometimes
> > under stress, and the position calculated from such values may go
> > backward wrongly. For avoiding it, there is the second read of the
> > PICB register and compare with the previous value, and loop until it
> > matches. This check is skipped on VM.
>
> I see. Is this a software emulation *bug*, or an IO issue due to
> virtualization? I tried to read what the pointer() (that's what its called)
> callback does but since there is no documentation for any of the callbacks I
> have no clue what so ever.
>
> If the former, could a we somehow detect an emulated device other than through
> this type of check ? Or could we *add* a capability of some sort to detect it
> on the driver ? This would not address the removal, but it could mean finding a
> way to address emulation issues.
>
> If its an IO issue -- exactly what is the causing the delays in IO ?
Luis, there is no problem with the emulation itself. It's rather an
optimization to lighten the host side load, as I/O access on a VM is
much heavier.
> > > > This is satisfied mostly only on VM, and can't
> > > > be measured easily unlike the IO read speed.
> > >
> > > Interesting, note the original patch claimed it was for KVM and
> > > Parallels hypervisor only, but since the code uses:
> > >
> > > +#if defined(__i386__) || defined(__x86_64__)
> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > > +#endif
> > >
> > > This makes it apply also to Xen as well, this makes this hack more
> > > broad, but does is it only applicable when an emulated device is
> > > used ? What about if a hypervisor is used and PCI passthrough is
> > > used ?
> >
> > A good question. Xen was added there at the time from positive
> > results by quick tests, but it might show an issue if it's running on
> > a very old chip with PCI passthrough. But I'm not sure whether PCI
> > passthrough would work on such old chipsets at all.
>
> If it did have an issue then that would have to be special cased, that
> is the module parameter would not need to be enabled for such type of
> systems, and heuristics would be needed. As you note, fortunately this
> may not be common though...
Actually this *is* a module parameter. If set to a boolean value, the
check can be applied / skipped forcibly. So, if there had been a problem
on Xen, it should have been reported. That's why I wrote it's not a
common case. This comes from real experience.
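The override behaviour described here can be sketched as a tri-state
resolution. This is a sketch under assumptions: the name
resolve_inside_vm() is hypothetical, and it assumes a negative value
means "not set by the user, auto-detect", which the thread does not
spell out.

```c
#include <stdbool.h>

/*
 * param:               negative = unset (auto-detect), 0 = force the
 *                      recheck to stay enabled, positive = force skip.
 * hypervisor_detected: result of whatever detection the driver uses,
 *                      e.g. the X86_FEATURE_HYPERVISOR test quoted above.
 *
 * Returns true when the second PICB read should be skipped.
 */
bool resolve_inside_vm(int param, bool hypervisor_detected)
{
	/* An explicit user setting always wins over detection. */
	if (param >= 0)
		return param != 0;

	return hypervisor_detected;
}
```

With this shape, a passthrough setup that misbehaves can force the
recheck back on by setting the parameter to 0, without any code change.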
> but if this type of work around may be
> taken as a precedent to enable other types of hacks in other drivers
> I'm very fearful of more hacks later needing these considerations as
> well.
>
> > > > > There are a pile of nonsensical "are we in a VM" checks of various
> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
> > > > > know it's a guest), and, in most cases, it appears that the correct
> > > > > solution is to delete the checks. I just removed a nasty one in the
> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
> > > > > of cake :)
> > > >
> > > > This cake looks sweet, but a worm is hidden behind the cream.
> > > > The loop in the code itself is already a kludge for the buggy hardware
> > > > where the inconsistent read happens not so often (only at the boundary
> > > > and in a racy way). It would be nice if we can have a more reliably
> > > > way to know the hardware buggyness, but it's difficult,
> > > > unsurprisingly.
> > >
> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
> > > The assumption here is such special cases are really paper'ing over another
> > > type of issue, so its best to ultimately try to root cause the issue in
> > > a more generalized fashion.
> >
> > Well, it's rather bare metal that shows the buggy behavior, thus we
> > need to paper over it. In that sense, it's other way round; we don't
> > tune for VM. The VM check we're discussing is rather for skipping the
> > strange workaround.
>
> What is it exactly about a VM that enables this work around to be skipped?
> I don't quite get it yet.
A VM -- at least a full one with sound hardware emulation --
doesn't have the hardware bug. So, the check isn't needed.
> > You may ask whether we can reduce the whole workaround instead. It's
> > practically impossible. We don't know which models doing so and which
> > not. And, the hardware in question are (literally) thousands of
> > variants of damn old PC mobos. Any fundamental change needs to be
> > verified on all these machines...
>
> What if we can come up with algorithm on the ring buffer that would
> satisfy both cases without special casing it ? Is removing this VM
> check impossible really?
Yes, it's practically impossible, see my comment above.
Whatever you change, you need to verify it on real machines. And that's
very difficult to achieve.
thanks,
Takashi
On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai <[email protected]> wrote:
> On Sat, 02 Apr 2016 00:28:31 +0200,
> Luis R. Rodriguez wrote:
>> If the former, could a we somehow detect an emulated device other than through
>> this type of check ? Or could we *add* a capability of some sort to detect it
>> on the driver ? This would not address the removal, but it could mean finding a
>> way to address emulation issues.
>>
>> If its an IO issue -- exactly what is the causing the delays in IO ?
>
> Luis, there is no problem about emulation itself. It's rather an
> optimization to lighten the host side load, as I/O access on a VM is
> much heavier.
>
>> > > > This is satisfied mostly only on VM, and can't
>> > > > be measured easily unlike the IO read speed.
>> > >
>> > > Interesting, note the original patch claimed it was for KVM and
>> > > Parallels hypervisor only, but since the code uses:
>> > >
>> > > +#if defined(__i386__) || defined(__x86_64__)
>> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
>> > > +#endif
>> > >
>> > > This makes it apply also to Xen as well, this makes this hack more
>> > > broad, but does is it only applicable when an emulated device is
>> > > used ? What about if a hypervisor is used and PCI passthrough is
>> > > used ?
>> >
>> > A good question. Xen was added there at the time from positive
>> > results by quick tests, but it might show an issue if it's running on
>> > a very old chip with PCI passthrough. But I'm not sure whether PCI
>> > passthrough would work on such old chipsets at all.
>>
>> If it did have an issue then that would have to be special cased, that
>> is the module parameter would not need to be enabled for such type of
>> systems, and heuristics would be needed. As you note, fortunately this
>> may not be common though...
>
> Actually this *is* module parametered. If set to a boolean value, it
> can be applied / skipped forcibly. So, if there has been a problem on
> Xen, this should have been reported. That's why I wrote it's no
> common case. This comes from the real experience.
>
>> but if this type of work around may be
>> taken as a precedent to enable other types of hacks in other drivers
>> I'm very fearful of more hacks later needing these considerations as
>> well.
>>
>> > > > > There are a pile of nonsensical "are we in a VM" checks of various
>> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
>> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
>> > > > > know it's a guest), and, in most cases, it appears that the correct
>> > > > > solution is to delete the checks. I just removed a nasty one in the
>> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
>> > > > > of cake :)
>> > > >
>> > > > This cake looks sweet, but a worm is hidden behind the cream.
>> > > > The loop in the code itself is already a kludge for the buggy hardware
>> > > > where the inconsistent read happens not so often (only at the boundary
>> > > > and in a racy way). It would be nice if we can have a more reliably
>> > > > way to know the hardware buggyness, but it's difficult,
>> > > > unsurprisingly.
>> > >
>> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
>> > > The assumption here is such special cases are really paper'ing over another
>> > > type of issue, so its best to ultimately try to root cause the issue in
>> > > a more generalized fashion.
>> >
>> > Well, it's rather bare metal that shows the buggy behavior, thus we
>> > need to paper over it. In that sense, it's other way round; we don't
>> > tune for VM. The VM check we're discussing is rather for skipping the
>> > strange workaround.
>>
>> What is it exactly about a VM that enables this work around to be skipped?
>> I don't quite get it yet.
>
> VM -- at least the full one with the sound hardware emulation --
> doesn't have the hardware bug. So, the check isn't needed.

Here's the issue, though: asking "am I in a VM" is not a good way to
learn properties of hardware. Just off the top of my head, here are
some types of VM and what they might imply about hardware:

Intel Kernel Guard: your sound card is passed through from real hardware.

Xen: could go either way. In dom0, it's likely passed through. In
domU, it could be passed through or emulated, and I believe this is
the case for all of the Xen variants.

KVM: Probably emulated, but could be passed through.

I think the main reason that Luis and I are both uncomfortable with
"am I in a VM" checks is that they're rarely the right thing to be
detecting, the APIs are poorly designed, and most of the use cases in
the kernel are using them as a proxy for something else and would be
clearer and more future proof if they tested what they actually need
to test more directly.

>
>> > You may ask whether we can reduce the whole workaround instead. It's
>> > practically impossible. We don't know which models doing so and which
>> > not. And, the hardware in question are (literally) thousands of
>> > variants of damn old PC mobos. Any fundamental change needs to be
>> > verified on all these machines...
>>
>> What if we can come up with algorithm on the ring buffer that would
>> satisfy both cases without special casing it ? Is removing this VM
>> check impossible really?
>
> Yes, it's impossible practically, see my comment above.
> Whatever you change, you need to verify it on real machines. And it's
> very difficult to achieve.

But, given what I think you're saying, you only need to test one way:
if the non-VM code works and is just slow on a VM, then wouldn't it be
okay if there were some heuristic that were always right on bare metal
and mostly right on a VM?

Anyway, I still don't see what's wrong with just measuring how long an
iteration of your loop takes. Sure, on both bare metal and on a VM,
there are all kinds of timing errors due to SMI and such, but I don't
think it's true at all that hypervisors will show you only guest time.
The sound drivers don't run early in boot -- they run when full kernel
functionality is available. Both the ktime_* APIs and
CLOCK_MONOTONIC_RAW should give actual physical elapsed time. After
all, if they didn't, then simply reading the clock in a VM guest would
be completely broken.

In other words, a simple heuristic could be that, if each of the first
four iterations takes >100 microseconds (or whatever the actual number
is that starts causing real problems on a VM), then switch to the VM
variant. After all, if you run on native hardware that's so slow that
your loop will just time out, then you don't gain anything by actually
letting it time out, and, if you're on a VM that's so fast that it
doesn't matter, then it shouldn't matter what you do.
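
To make the idea concrete, here is a rough userspace sketch of that
decision, not driver code. vm_path_wanted(), SLOW_ITER_NS and
SAMPLE_ITERS are names made up for illustration, and the 100
microsecond threshold is just the placeholder number from above, not
a measured value. The per-iteration durations are assumed to come
from something like CLOCK_MONOTONIC_RAW deltas:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: the "100 microseconds" placeholder, in ns. */
#define SLOW_ITER_NS 100000ULL
/* Number of leading loop iterations to sample. */
#define SAMPLE_ITERS 4

/*
 * Decide whether to switch to the "inside VM" path.
 *
 * iter_ns[] holds the measured elapsed time of each loop iteration in
 * nanoseconds. Returns true only if every one of the first
 * SAMPLE_ITERS iterations took strictly longer than SLOW_ITER_NS.
 */
static bool vm_path_wanted(const uint64_t *iter_ns, size_t n)
{
	size_t i;

	if (n < SAMPLE_ITERS)
		return false;	/* too few samples: stay on the default path */

	assert(iter_ns != NULL);

	for (i = 0; i < SAMPLE_ITERS; i++) {
		if (iter_ns[i] <= SLOW_ITER_NS)
			return false;	/* one fast iteration is enough to say no */
	}
	return true;
}
```

A single fast iteration keeps the default (bare metal) path, so the
sketch errs on the side of the existing behavior.
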
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
On Sat, 02 Apr 2016 14:57:44 +0200,
Andy Lutomirski wrote:
>
> On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai <[email protected]> wrote:
> > On Sat, 02 Apr 2016 00:28:31 +0200,
> > Luis R. Rodriguez wrote:
> >> If the former, could a we somehow detect an emulated device other than through
> >> this type of check ? Or could we *add* a capability of some sort to detect it
> >> on the driver ? This would not address the removal, but it could mean finding a
> >> way to address emulation issues.
> >>
> >> If its an IO issue -- exactly what is the causing the delays in IO ?
> >
> > Luis, there is no problem about emulation itself. It's rather an
> > optimization to lighten the host side load, as I/O access on a VM is
> > much heavier.
> >
> >> > > > This is satisfied mostly only on VM, and can't
> >> > > > be measured easily unlike the IO read speed.
> >> > >
> >> > > Interesting, note the original patch claimed it was for KVM and
> >> > > Parallels hypervisor only, but since the code uses:
> >> > >
> >> > > +#if defined(__i386__) || defined(__x86_64__)
> >> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> >> > > +#endif
> >> > >
> >> > > This makes it apply also to Xen as well, this makes this hack more
> >> > > broad, but does is it only applicable when an emulated device is
> >> > > used ? What about if a hypervisor is used and PCI passthrough is
> >> > > used ?
> >> >
> >> > A good question. Xen was added there at the time from positive
> >> > results by quick tests, but it might show an issue if it's running on
> >> > a very old chip with PCI passthrough. But I'm not sure whether PCI
> >> > passthrough would work on such old chipsets at all.
> >>
> >> If it did have an issue then that would have to be special cased, that
> >> is the module parameter would not need to be enabled for such type of
> >> systems, and heuristics would be needed. As you note, fortunately this
> >> may not be common though...
> >
> > Actually this *is* module parametered. If set to a boolean value, it
> > can be applied / skipped forcibly. So, if there has been a problem on
> > Xen, this should have been reported. That's why I wrote it's no
> > common case. This comes from the real experience.
> >
> >> but if this type of work around may be
> >> taken as a precedent to enable other types of hacks in other drivers
> >> I'm very fearful of more hacks later needing these considerations as
> >> well.
> >>
> >> > > > > There are a pile of nonsensical "are we in a VM" checks of various
> >> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
> >> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
> >> > > > > know it's a guest), and, in most cases, it appears that the correct
> >> > > > > solution is to delete the checks. I just removed a nasty one in the
> >> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
> >> > > > > of cake :)
> >> > > >
> >> > > > This cake looks sweet, but a worm is hidden behind the cream.
> >> > > > The loop in the code itself is already a kludge for the buggy hardware
> >> > > > where the inconsistent read happens not so often (only at the boundary
> >> > > > and in a racy way). It would be nice if we can have a more reliably
> >> > > > way to know the hardware buggyness, but it's difficult,
> >> > > > unsurprisingly.
> >> > >
> >> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
> >> > > The assumption here is such special cases are really paper'ing over another
> >> > > type of issue, so its best to ultimately try to root cause the issue in
> >> > > a more generalized fashion.
> >> >
> >> > Well, it's rather bare metal that shows the buggy behavior, thus we
> >> > need to paper over it. In that sense, it's other way round; we don't
> >> > tune for VM. The VM check we're discussing is rather for skipping the
> >> > strange workaround.
> >>
> >> What is it exactly about a VM that enables this work around to be skipped?
> >> I don't quite get it yet.
> >
> > VM -- at least the full one with the sound hardware emulation --
> > doesn't have the hardware bug. So, the check isn't needed.
>
> Here's the issue, though: asking "am I in a VM" is not a good way to
> learn properties of hardware. Just off the top of my head, here are
> some types of VM and what they might imply about hardware:
>
> Intel Kernel Guard: your sound card is passed through from real hardware.
>
> Xen: could go either way. In dom0, it's likely passed through. In
> domU, it could be passed through or emulated, and I believe this is
> the case for all of the Xen variants.
>
> KVM: Probably emulated, but could be passed through.
>
> I think the main reason that Luis and I are both uncomfortable with
> "am I in a VM" checks is that they're rarely the right thing to be
> detecting, the APIs are poorly designed, and most of the use cases in
> the kernel are using them as a proxy for something else and would be
> clearer and more future proof if they tested what they actually need
> to test more directly.

Please, guys, take a look at the code more closely. This is applied
only to the known emulated PCI devices, and the driver shows the
kernel message:

static int snd_intel8x0_inside_vm(struct pci_dev *pci)
....
	/* check for known (emulated) devices */
	if (pci->subsystem_vendor == PCI_SUBVENDOR_ID_REDHAT_QUMRANET &&
	    pci->subsystem_device == PCI_SUBDEVICE_ID_QEMU) {
		/* KVM emulated sound, PCI SSID: 1af4:1100 */
		msg = "enable KVM";
	} else if (pci->subsystem_vendor == 0x1ab8) {
		/* Parallels VM emulated sound, PCI SSID: 1ab8:xxxx */
		msg = "enable Parallels VM";
	} else {
		msg = "disable (unknown or VT-d) VM";
		result = 0;
	}

	if (msg != NULL)
		dev_info(&pci->dev, "%s optimization\n", msg);

> >> > You may ask whether we can reduce the whole workaround instead. It's
> >> > practically impossible. We don't know which models doing so and which
> >> > not. And, the hardware in question are (literally) thousands of
> >> > variants of damn old PC mobos. Any fundamental change needs to be
> >> > verified on all these machines...
> >>
> >> What if we can come up with algorithm on the ring buffer that would
> >> satisfy both cases without special casing it ? Is removing this VM
> >> check impossible really?
> >
> > Yes, it's impossible practically, see my comment above.
> > Whatever you change, you need to verify it on real machines. And it's
> > very difficult to achieve.
>
> But, given what I think you're saying, you only need to test one way:
> if the non-VM code works and is just slow on a VM, then wouldn't it be
> okay if there were some heuristic that were always right on bare metal
> and mostly right on a VM?

This is the current implementation :) It's the simplest way.

> Anyway, I still don't see what's wrong with just measuring how long an
> iteration of your loop takes. Sure, on both bare metal and on a VM,
> there are all kinds of timing errors due to SMI and such, but I don't
> think it's true at all that hypervisors will show you only guest time.
> The sound drivers don't run early in boot -- they run when full kernel
> functionality is available. Both the ktime_* APIs and
> CLOCK_MONOTONIC_RAW should give actual physical elapsed time. After
> all, if they didn't, then simply reading the clock in a VM guest would
> be completely broken.

Well, remember the driver also serves 20-year-old 32-bit PC mobos...

> In other words, a simple heuristic could be that, if each of the first
> four iterations takes >100 microseconds (or whatever the actual number
> is that starts causing real problems on a VM), then switch to the VM
> variant. After all, if you run on native hardware that's so slow that
> your loop will just time out, then you don't gain anything by actually
> letting it time out, and, if you're on a VM that's so fast that it
> doesn't matter, then it shouldn't matter what you do.

Sorry, no. Although the purpose of the inside_vm flag is
optimization, it isn't applied merely because I/O is slow. It's
applicable because the device works without the hardware bug
workaround.

IOW, what we need to know is not the I/O speed. Even if I/O is
slow, it's still wrong to skip the workaround if the sound device
misbehaves just like the real hardware. Instead, we need to know
which devices don't need the bug workaround. And this can't be
measured easily. Thus, the only sensible way is the whitelist, as
in the current code.

thanks,

Takashi
On Apr 2, 2016 12:07 PM, "Takashi Iwai" <[email protected]> wrote:
>
> On Sat, 02 Apr 2016 14:57:44 +0200,
> Andy Lutomirski wrote:
> >
> > On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai <[email protected]> wrote:
> > > On Sat, 02 Apr 2016 00:28:31 +0200,
> > > Luis R. Rodriguez wrote:
> > >> If the former, could a we somehow detect an emulated device other than through
> > >> this type of check ? Or could we *add* a capability of some sort to detect it
> > >> on the driver ? This would not address the removal, but it could mean finding a
> > >> way to address emulation issues.
> > >>
> > >> If its an IO issue -- exactly what is the causing the delays in IO ?
> > >
> > > Luis, there is no problem about emulation itself. It's rather an
> > > optimization to lighten the host side load, as I/O access on a VM is
> > > much heavier.
> > >
> > >> > > > This is satisfied mostly only on VM, and can't
> > >> > > > be measured easily unlike the IO read speed.
> > >> > >
> > >> > > Interesting, note the original patch claimed it was for KVM and
> > >> > > Parallels hypervisor only, but since the code uses:
> > >> > >
> > >> > > +#if defined(__i386__) || defined(__x86_64__)
> > >> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > >> > > +#endif
> > >> > >
> > >> > > This makes it apply also to Xen as well, this makes this hack more
> > >> > > broad, but does is it only applicable when an emulated device is
> > >> > > used ? What about if a hypervisor is used and PCI passthrough is
> > >> > > used ?
> > >> >
> > >> > A good question. Xen was added there at the time from positive
> > >> > results by quick tests, but it might show an issue if it's running on
> > >> > a very old chip with PCI passthrough. But I'm not sure whether PCI
> > >> > passthrough would work on such old chipsets at all.
> > >>
> > >> If it did have an issue then that would have to be special cased, that
> > >> is the module parameter would not need to be enabled for such type of
> > >> systems, and heuristics would be needed. As you note, fortunately this
> > >> may not be common though...
> > >
> > > Actually this *is* module parametered. If set to a boolean value, it
> > > can be applied / skipped forcibly. So, if there has been a problem on
> > > Xen, this should have been reported. That's why I wrote it's no
> > > common case. This comes from the real experience.
> > >
> > >> but if this type of work around may be
> > >> taken as a precedent to enable other types of hacks in other drivers
> > >> I'm very fearful of more hacks later needing these considerations as
> > >> well.
> > >>
> > >> > > > > There are a pile of nonsensical "are we in a VM" checks of various
> > >> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
> > >> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
> > >> > > > > know it's a guest), and, in most cases, it appears that the correct
> > >> > > > > solution is to delete the checks. I just removed a nasty one in the
> > >> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
> > >> > > > > of cake :)
> > >> > > >
> > >> > > > This cake looks sweet, but a worm is hidden behind the cream.
> > >> > > > The loop in the code itself is already a kludge for the buggy hardware
> > >> > > > where the inconsistent read happens not so often (only at the boundary
> > >> > > > and in a racy way). It would be nice if we can have a more reliably
> > >> > > > way to know the hardware buggyness, but it's difficult,
> > >> > > > unsurprisingly.
> > >> > >
> > >> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
> > >> > > The assumption here is such special cases are really paper'ing over another
> > >> > > type of issue, so its best to ultimately try to root cause the issue in
> > >> > > a more generalized fashion.
> > >> >
> > >> > Well, it's rather bare metal that shows the buggy behavior, thus we
> > >> > need to paper over it. In that sense, it's other way round; we don't
> > >> > tune for VM. The VM check we're discussing is rather for skipping the
> > >> > strange workaround.
> > >>
> > >> What is it exactly about a VM that enables this work around to be skipped?
> > >> I don't quite get it yet.
> > >
> > > VM -- at least the full one with the sound hardware emulation --
> > > doesn't have the hardware bug. So, the check isn't needed.
> >
> > Here's the issue, though: asking "am I in a VM" is not a good way to
> > learn properties of hardware. Just off the top of my head, here are
> > some types of VM and what they might imply about hardware:
> >
> > Intel Kernel Guard: your sound card is passed through from real hardware.
> >
> > Xen: could go either way. In dom0, it's likely passed through. In
> > domU, it could be passed through or emulated, and I believe this is
> > the case for all of the Xen variants.
> >
> > KVM: Probably emulated, but could be passed through.
> >
> > I think the main reason that Luis and I are both uncomfortable with
> > "am I in a VM" checks is that they're rarely the right thing to be
> > detecting, the APIs are poorly designed, and most of the use cases in
> > the kernel are using them as a proxy for something else and would be
> > clearer and more future proof if they tested what they actually need
> > to test more directly.
>
> Please, guys, take a look at the code more closely. This is applied
> only to the known emulated PCI devices, and the driver shows the
> kernel message:
>
> static int snd_intel8x0_inside_vm(struct pci_dev *pci)
> ....
> /* check for known (emulated) devices */
> if (pci->subsystem_vendor == PCI_SUBVENDOR_ID_REDHAT_QUMRANET &&
> pci->subsystem_device == PCI_SUBDEVICE_ID_QEMU) {
> /* KVM emulated sound, PCI SSID: 1af4:1100 */
> msg = "enable KVM";
> } else if (pci->subsystem_vendor == 0x1ab8) {
> /* Parallels VM emulated sound, PCI SSID: 1ab8:xxxx */
> msg = "enable Parallels VM";
> } else {
> msg = "disable (unknown or VT-d) VM";
> result = 0;
> }

Now I'm more confused. Why are you checking the PCI IDs *and* whether
a hypervisor is detected? Why not check only the IDs?

In any event, at the very least the comment is misleading:

	/* detect KVM and Parallels virtual environments */
	result = kvm_para_available();
#ifdef X86_FEATURE_HYPERVISOR
	result = result || boot_cpu_has(X86_FEATURE_HYPERVISOR);
#endif

You're detecting KVM (sometimes) and the x86 "hypervisor" bit. The
latter has no particularly well-defined meaning. You're also missing
Xen PV, I believe, and I think that Xen PV + QEMU is a real thing, and
you'll fail to detect it, even though it can present a QEMU-emulated
card.

In other words, how is this code any different from a simple whitelist
of two specific cards that work a little differently from others?

--Andy
On Sat, 02 Apr 2016 20:05:21 +0200,
Andy Lutomirski wrote:
>
> On Apr 2, 2016 12:07 PM, "Takashi Iwai" <[email protected]> wrote:
> >
> > On Sat, 02 Apr 2016 14:57:44 +0200,
> > Andy Lutomirski wrote:
> > >
> > > On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai <[email protected]> wrote:
> > > > On Sat, 02 Apr 2016 00:28:31 +0200,
> > > > Luis R. Rodriguez wrote:
> > > >> If the former, could a we somehow detect an emulated device other than through
> > > >> this type of check ? Or could we *add* a capability of some sort to detect it
> > > >> on the driver ? This would not address the removal, but it could mean finding a
> > > >> way to address emulation issues.
> > > >>
> > > >> If its an IO issue -- exactly what is the causing the delays in IO ?
> > > >
> > > > Luis, there is no problem about emulation itself. It's rather an
> > > > optimization to lighten the host side load, as I/O access on a VM is
> > > > much heavier.
> > > >
> > > >> > > > This is satisfied mostly only on VM, and can't
> > > >> > > > be measured easily unlike the IO read speed.
> > > >> > >
> > > >> > > Interesting, note the original patch claimed it was for KVM and
> > > >> > > Parallels hypervisor only, but since the code uses:
> > > >> > >
> > > >> > > +#if defined(__i386__) || defined(__x86_64__)
> > > >> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > > >> > > +#endif
> > > >> > >
> > > >> > > This makes it apply also to Xen as well, this makes this hack more
> > > >> > > broad, but does is it only applicable when an emulated device is
> > > >> > > used ? What about if a hypervisor is used and PCI passthrough is
> > > >> > > used ?
> > > >> >
> > > >> > A good question. Xen was added there at the time from positive
> > > >> > results by quick tests, but it might show an issue if it's running on
> > > >> > a very old chip with PCI passthrough. But I'm not sure whether PCI
> > > >> > passthrough would work on such old chipsets at all.
> > > >>
> > > >> If it did have an issue then that would have to be special cased, that
> > > >> is the module parameter would not need to be enabled for such type of
> > > >> systems, and heuristics would be needed. As you note, fortunately this
> > > >> may not be common though...
> > > >
> > > > Actually this *is* module parametered. If set to a boolean value, it
> > > > can be applied / skipped forcibly. So, if there has been a problem on
> > > > Xen, this should have been reported. That's why I wrote it's no
> > > > common case. This comes from the real experience.
> > > >
> > > >> but if this type of work around may be
> > > >> taken as a precedent to enable other types of hacks in other drivers
> > > >> I'm very fearful of more hacks later needing these considerations as
> > > >> well.
> > > >>
> > > >> > > > > There are a pile of nonsensical "are we in a VM" checks of various
> > > >> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
> > > >> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
> > > >> > > > > know it's a guest), and, in most cases, it appears that the correct
> > > >> > > > > solution is to delete the checks. I just removed a nasty one in the
> > > >> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
> > > >> > > > > of cake :)
> > > >> > > >
> > > >> > > > This cake looks sweet, but a worm is hidden behind the cream.
> > > >> > > > The loop in the code itself is already a kludge for the buggy hardware
> > > >> > > > where the inconsistent read happens not so often (only at the boundary
> > > >> > > > and in a racy way). It would be nice if we can have a more reliably
> > > >> > > > way to know the hardware buggyness, but it's difficult,
> > > >> > > > unsurprisingly.
> > > >> > >
> > > >> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
> > > >> > > The assumption here is such special cases are really paper'ing over another
> > > >> > > type of issue, so its best to ultimately try to root cause the issue in
> > > >> > > a more generalized fashion.
> > > >> >
> > > >> > Well, it's rather bare metal that shows the buggy behavior, thus we
> > > >> > need to paper over it. In that sense, it's other way round; we don't
> > > >> > tune for VM. The VM check we're discussing is rather for skipping the
> > > >> > strange workaround.
> > > >>
> > > >> What is it exactly about a VM that enables this work around to be skipped?
> > > >> I don't quite get it yet.
> > > >
> > > > VM -- at least the full one with the sound hardware emulation --
> > > > doesn't have the hardware bug. So, the check isn't needed.
> > >
> > > Here's the issue, though: asking "am I in a VM" is not a good way to
> > > learn properties of hardware. Just off the top of my head, here are
> > > some types of VM and what they might imply about hardware:
> > >
> > > Intel Kernel Guard: your sound card is passed through from real hardware.
> > >
> > > Xen: could go either way. In dom0, it's likely passed through. In
> > > domU, it could be passed through or emulated, and I believe this is
> > > the case for all of the Xen variants.
> > >
> > > KVM: Probably emulated, but could be passed through.
> > >
> > > I think the main reason that Luis and I are both uncomfortable with
> > > "am I in a VM" checks is that they're rarely the right thing to be
> > > detecting, the APIs are poorly designed, and most of the use cases in
> > > the kernel are using them as a proxy for something else and would be
> > > clearer and more future proof if they tested what they actually need
> > > to test more directly.
> >
> > Please, guys, take a look at the code more closely. This is applied
> > only to the known emulated PCI devices, and the driver shows the
> > kernel message:
> >
> > static int snd_intel8x0_inside_vm(struct pci_dev *pci)
> > ....
> > /* check for known (emulated) devices */
> > if (pci->subsystem_vendor == PCI_SUBVENDOR_ID_REDHAT_QUMRANET &&
> > pci->subsystem_device == PCI_SUBDEVICE_ID_QEMU) {
> > /* KVM emulated sound, PCI SSID: 1af4:1100 */
> > msg = "enable KVM";
> > } else if (pci->subsystem_vendor == 0x1ab8) {
> > /* Parallels VM emulated sound, PCI SSID: 1ab8:xxxx */
> > msg = "enable Parallels VM";
> > } else {
> > msg = "disable (unknown or VT-d) VM";
> > result = 0;
> > }
>
> Now I'm more confused. Why are you checking the PCI IDs *and* whether
> a hypervisor is detected? Why not check only the IDs?
>
> In any event, at the very least the comment is misleading:
>
> /* detect KVM and Parallels virtual environments */
> result = kvm_para_available();
> #ifdef X86_FEATURE_HYPERVISOR
> result = result || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> #endif
>
> You're detecting KVM (sometimes) and the x86 "hypervisor" bit. The
> latter has no particularly well-defined meaning. You're also missing
> Xen PV, I believe, and I think that Xen PV + QEMU is a real thing, and
> you'll fail to detect it, even though it can present a QEMU-emulated
> card.
>
> In other words, how is this code any different from a simple whitelist
> of two specific cards that work a little differently from others?

The PCI ID whitelist was introduced later, in commit 7fb4f392bd27.
Before that, we relied on the VM detection as a switch for
skipping the workaround. The VM detection code was kept there just to
be sure, in case the whitelist isn't 100% correct.

Looking at the current status, the whitelist alone seems enough, so
the VM detection code could be dropped, I suppose.
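
Just to illustrate, a whitelist-only check would boil down to
something like the sketch below. It's a standalone userspace mock,
not the actual driver code: struct pci_ssid and
intel8x0_is_known_emulated() are made-up names, and only the
subsystem IDs are taken from the comments in the current code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subsystem IDs as given in the comments of snd_intel8x0_inside_vm(). */
#define SSVID_REDHAT_QUMRANET	0x1af4
#define SSID_QEMU		0x1100
#define SSVID_PARALLELS		0x1ab8

/* Hypothetical stand-in for the two struct pci_dev fields that matter. */
struct pci_ssid {
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/*
 * Return true only for the known emulated devices, i.e. the ones for
 * which the hardware bug workaround can be skipped. No hypervisor
 * detection is involved; anything else keeps the workaround.
 */
static bool intel8x0_is_known_emulated(const struct pci_ssid *id)
{
	assert(id != NULL);

	if (id->subsystem_vendor == SSVID_REDHAT_QUMRANET &&
	    id->subsystem_device == SSID_QEMU)
		return true;	/* KVM emulated sound, PCI SSID 1af4:1100 */

	if (id->subsystem_vendor == SSVID_PARALLELS)
		return true;	/* Parallels emulated sound, PCI SSID 1ab8:xxxx */

	return false;		/* unknown or passed-through device */
}
```

A real device passed through to a guest doesn't match either entry,
so it would still get the workaround.
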
Takashi
On 02/04/16 13:57, Andy Lutomirski wrote:
> On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai <[email protected]> wrote:
>> On Sat, 02 Apr 2016 00:28:31 +0200,
>> Luis R. Rodriguez wrote:
>>> If the former, could a we somehow detect an emulated device other than through
>>> this type of check ? Or could we *add* a capability of some sort to detect it
>>> on the driver ? This would not address the removal, but it could mean finding a
>>> way to address emulation issues.
>>>
>>> If its an IO issue -- exactly what is the causing the delays in IO ?
>>
>> Luis, there is no problem about emulation itself. It's rather an
>> optimization to lighten the host side load, as I/O access on a VM is
>> much heavier.
>>
>>>>>> This is satisfied mostly only on VM, and can't
>>>>>> be measured easily unlike the IO read speed.
>>>>>
>>>>> Interesting, note the original patch claimed it was for KVM and
>>>>> Parallels hypervisor only, but since the code uses:
>>>>>
>>>>> +#if defined(__i386__) || defined(__x86_64__)
>>>>> + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
>>>>> +#endif
>>>>>
>>>>> This makes it apply also to Xen as well, this makes this hack more
>>>>> broad, but does is it only applicable when an emulated device is
>>>>> used ? What about if a hypervisor is used and PCI passthrough is
>>>>> used ?
>>>>
>>>> A good question. Xen was added there at the time from positive
>>>> results by quick tests, but it might show an issue if it's running on
>>>> a very old chip with PCI passthrough. But I'm not sure whether PCI
>>>> passthrough would work on such old chipsets at all.
>>>
>>> If it did have an issue then that would have to be special cased, that
>>> is the module parameter would not need to be enabled for such type of
>>> systems, and heuristics would be needed. As you note, fortunately this
>>> may not be common though...
>>
>> Actually this *is* module parametered. If set to a boolean value, it
>> can be applied / skipped forcibly. So, if there has been a problem on
>> Xen, this should have been reported. That's why I wrote it's no
>> common case. This comes from the real experience.
>>
>>> but if this type of work around may be
>>> taken as a precedent to enable other types of hacks in other drivers
>>> I'm very fearful of more hacks later needing these considerations as
>>> well.
>>>
>>>>>>> There are a pile of nonsensical "are we in a VM" checks of various
>>>>>>> sorts scattered throughout the kernel, they're all a mess to maintain
>>>>>>> (there are lots of kinds of VMs in the world, and Linux may not even
>>>>>>> know it's a guest), and, in most cases, it appears that the correct
>>>>>>> solution is to delete the checks. I just removed a nasty one in the
>>>>>>> x86_32 entry asm, and this one is written in C so it should be a piece
>>>>>>> of cake :)
>>>>>>
>>>>>> This cake looks sweet, but a worm is hidden behind the cream.
>>>>>> The loop in the code itself is already a kludge for the buggy hardware
>>>>>> where the inconsistent read happens not so often (only at the boundary
>>>>>> and in a racy way). It would be nice if we can have a more reliably
>>>>>> way to know the hardware buggyness, but it's difficult,
>>>>>> unsurprisingly.
>>>>>
>>>>> The concern here is setting precedents for VM cases sprinkled in the kernel.
>>>>> The assumption here is such special cases are really paper'ing over another
>>>>> type of issue, so its best to ultimately try to root cause the issue in
>>>>> a more generalized fashion.
>>>>
>>>> Well, it's rather bare metal that shows the buggy behavior, thus we
>>>> need to paper over it. In that sense, it's other way round; we don't
>>>> tune for VM. The VM check we're discussing is rather for skipping the
>>>> strange workaround.
>>>
>>> What is it exactly about a VM that enables this work around to be skipped?
>>> I don't quite get it yet.
>>
>> VM -- at least the full one with the sound hardware emulation --
>> doesn't have the hardware bug. So, the check isn't needed.
>
> Here's the issue, though: asking "am I in a VM" is not a good way to
> learn properties of hardware. Just off the top of my head, here are
> some types of VM and what they might imply about hardware:
>
> Intel Kernel Guard: your sound card is passed through from real hardware.
>
> Xen: could go either way. In dom0, it's likely passed through. In
> domU, it could be passed through or emulated, and I believe this is
> the case for all of the Xen variants.
>
> KVM: Probably emulated, but could be passed through.

I'm not sure exactly why I was CC'd into this thread, but this is an
important point -- even if you're running in a VM, you may actually have
direct un-emulated IO access to a real (buggy) piece of hardware; in
which case it sounds like you still need the work-around. So
boot_cpu_has(X86_FEATURE_HYPERVISOR) is probably not the right check.

 -George
On Mon, 04 Apr 2016 11:05:43 +0200,
George Dunlap wrote:
>
> On 02/04/16 13:57, Andy Lutomirski wrote:
> > On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai <[email protected]> wrote:
> >> On Sat, 02 Apr 2016 00:28:31 +0200,
> >> Luis R. Rodriguez wrote:
> >>> If the former, could a we somehow detect an emulated device other than through
> >>> this type of check ? Or could we *add* a capability of some sort to detect it
> >>> on the driver ? This would not address the removal, but it could mean finding a
> >>> way to address emulation issues.
> >>>
> >>> If its an IO issue -- exactly what is the causing the delays in IO ?
> >>
> >> Luis, there is no problem about emulation itself. It's rather an
> >> optimization to lighten the host side load, as I/O access on a VM is
> >> much heavier.
> >>
> >>>>>> This is satisfied mostly only on VM, and can't
> >>>>>> be measured easily unlike the IO read speed.
> >>>>>
> >>>>> Interesting, note the original patch claimed it was for KVM and
> >>>>> Parallels hypervisor only, but since the code uses:
> >>>>>
> >>>>> +#if defined(__i386__) || defined(__x86_64__)
> >>>>> + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> >>>>> +#endif
> >>>>>
> >>>>> This makes it apply also to Xen as well, this makes this hack more
> >>>>> broad, but does is it only applicable when an emulated device is
> >>>>> used ? What about if a hypervisor is used and PCI passthrough is
> >>>>> used ?
> >>>>
> >>>> A good question. Xen was added there at the time from positive
> >>>> results by quick tests, but it might show an issue if it's running on
> >>>> a very old chip with PCI passthrough. But I'm not sure whether PCI
> >>>> passthrough would work on such old chipsets at all.
> >>>
> >>> If it did have an issue then that would have to be special cased, that
> >>> is the module parameter would not need to be enabled for such type of
> >>> systems, and heuristics would be needed. As you note, fortunately this
> >>> may not be common though...
> >>
> >> Actually this *is* module parametered. If set to a boolean value, it
> >> can be applied / skipped forcibly. So, if there has been a problem on
> >> Xen, this should have been reported. That's why I wrote it's no
> >> common case. This comes from the real experience.
> >>
> >>> but if this type of work around may be
> >>> taken as a precedent to enable other types of hacks in other drivers
> >>> I'm very fearful of more hacks later needing these considerations as
> >>> well.
> >>>
> >>>>>>> There are a pile of nonsensical "are we in a VM" checks of various
> >>>>>>> sorts scattered throughout the kernel, they're all a mess to maintain
> >>>>>>> (there are lots of kinds of VMs in the world, and Linux may not even
> >>>>>>> know it's a guest), and, in most cases, it appears that the correct
> >>>>>>> solution is to delete the checks. I just removed a nasty one in the
> >>>>>>> x86_32 entry asm, and this one is written in C so it should be a piece
> >>>>>>> of cake :)
> >>>>>>
> >>>>>> This cake looks sweet, but a worm is hidden behind the cream.
> >>>>>> The loop in the code itself is already a kludge for the buggy hardware
> >>>>>> where the inconsistent read happens not so often (only at the boundary
> >>>>>> and in a racy way). It would be nice if we can have a more reliably
> >>>>>> way to know the hardware buggyness, but it's difficult,
> >>>>>> unsurprisingly.
> >>>>>
> >>>>> The concern here is setting precedents for VM cases sprinkled in the kernel.
> >>>>> The assumption here is such special cases are really paper'ing over another
> >>>>> type of issue, so its best to ultimately try to root cause the issue in
> >>>>> a more generalized fashion.
> >>>>
> >>>> Well, it's rather bare metal that shows the buggy behavior, thus we
> >>>> need to paper over it. In that sense, it's other way round; we don't
> >>>> tune for VM. The VM check we're discussing is rather for skipping the
> >>>> strange workaround.
> >>>
> >>> What is it exactly about a VM that enables this work around to be skipped?
> >>> I don't quite get it yet.
> >>
> >> VM -- at least the full one with the sound hardware emulation --
> >> doesn't have the hardware bug. So, the check isn't needed.
> >
> > Here's the issue, though: asking "am I in a VM" is not a good way to
> > learn properties of hardware. Just off the top of my head, here are
> > some types of VM and what they might imply about hardware:
> >
> > Intel Kernel Guard: your sound card is passed through from real hardware.
> >
> > Xen: could go either way. In dom0, it's likely passed through. In
> > domU, it could be passed through or emulated, and I believe this is
> > the case for all of the Xen variants.
> >
> > KVM: Probably emulated, but could be passed through.
>
> I'm not sure exactly why I was CC'd into this thread, but this is an
> important point -- even if you're running in a VM, you may actually have
> direct un-emulated IO access to a real (buggy) piece of hardware; in
> which case it sounds like you still need the work-around. So
> boot_cpu_has(X86_FEATURE_HYPERVISOR) is probably not the right check.
The VM check is kept there only to show a kernel message, so that if
a similar issue is seen on another VM, the user may notice it more
easily. The VM check itself no longer changes any kernel behavior.
Takashi
On Sat, Apr 02, 2016 at 10:22:44PM +0200, Takashi Iwai wrote:
> On Sat, 02 Apr 2016 20:05:21 +0200,
> Andy Lutomirski wrote:
> >
> > On Apr 2, 2016 12:07 PM, "Takashi Iwai" <[email protected]> wrote:
> > >
> > > On Sat, 02 Apr 2016 14:57:44 +0200,
> > > Andy Lutomirski wrote:
> > > >
> > > > On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai <[email protected]> wrote:
> > > > > On Sat, 02 Apr 2016 00:28:31 +0200,
> > > > > Luis R. Rodriguez wrote:
> > > > >> If the former, could a we somehow detect an emulated device other than through
> > > > >> this type of check ? Or could we *add* a capability of some sort to detect it
> > > > >> on the driver ? This would not address the removal, but it could mean finding a
> > > > >> way to address emulation issues.
> > > > >>
> > > > >> If its an IO issue -- exactly what is the causing the delays in IO ?
> > > > >
> > > > > Luis, there is no problem about emulation itself. It's rather an
> > > > > optimization to lighten the host side load, as I/O access on a VM is
> > > > > much heavier.
> > > > >
> > > > >> > > > This is satisfied mostly only on VM, and can't
> > > > >> > > > be measured easily unlike the IO read speed.
> > > > >> > >
> > > > >> > > Interesting, note the original patch claimed it was for KVM and
> > > > >> > > Parallels hypervisor only, but since the code uses:
> > > > >> > >
> > > > >> > > +#if defined(__i386__) || defined(__x86_64__)
> > > > >> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > > > >> > > +#endif
> > > > >> > >
> > > > >> > > This makes it apply also to Xen as well, this makes this hack more
> > > > >> > > broad, but does is it only applicable when an emulated device is
> > > > >> > > used ? What about if a hypervisor is used and PCI passthrough is
> > > > >> > > used ?
> > > > >> >
> > > > >> > A good question. Xen was added there at the time from positive
> > > > >> > results by quick tests, but it might show an issue if it's running on
> > > > >> > a very old chip with PCI passthrough. But I'm not sure whether PCI
> > > > >> > passthrough would work on such old chipsets at all.
> > > > >>
> > > > >> If it did have an issue then that would have to be special cased, that
> > > > >> is the module parameter would not need to be enabled for such type of
> > > > >> systems, and heuristics would be needed. As you note, fortunately this
> > > > >> may not be common though...
> > > > >
> > > > > Actually this *is* module parametered. If set to a boolean value, it
> > > > > can be applied / skipped forcibly. So, if there has been a problem on
> > > > > Xen, this should have been reported. That's why I wrote it's no
> > > > > common case. This comes from the real experience.
> > > > >
> > > > >> but if this type of work around may be
> > > > >> taken as a precedent to enable other types of hacks in other drivers
> > > > >> I'm very fearful of more hacks later needing these considerations as
> > > > >> well.
> > > > >>
> > > > >> > > > > There are a pile of nonsensical "are we in a VM" checks of various
> > > > >> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
> > > > >> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
> > > > >> > > > > know it's a guest), and, in most cases, it appears that the correct
> > > > >> > > > > solution is to delete the checks. I just removed a nasty one in the
> > > > >> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
> > > > >> > > > > of cake :)
> > > > >> > > >
> > > > >> > > > This cake looks sweet, but a worm is hidden behind the cream.
> > > > >> > > > The loop in the code itself is already a kludge for the buggy hardware
> > > > >> > > > where the inconsistent read happens not so often (only at the boundary
> > > > >> > > > and in a racy way). It would be nice if we can have a more reliably
> > > > >> > > > way to know the hardware buggyness, but it's difficult,
> > > > >> > > > unsurprisingly.
> > > > >> > >
> > > > >> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
> > > > >> > > The assumption here is such special cases are really paper'ing over another
> > > > >> > > type of issue, so its best to ultimately try to root cause the issue in
> > > > >> > > a more generalized fashion.
> > > > >> >
> > > > >> > Well, it's rather bare metal that shows the buggy behavior, thus we
> > > > >> > need to paper over it. In that sense, it's other way round; we don't
> > > > >> > tune for VM. The VM check we're discussing is rather for skipping the
> > > > >> > strange workaround.
> > > > >>
> > > > >> What is it exactly about a VM that enables this work around to be skipped?
> > > > >> I don't quite get it yet.
> > > > >
> > > > > VM -- at least the full one with the sound hardware emulation --
> > > > > doesn't have the hardware bug. So, the check isn't needed.
> > > >
> > > > Here's the issue, though: asking "am I in a VM" is not a good way to
> > > > learn properties of hardware. Just off the top of my head, here are
> > > > some types of VM and what they might imply about hardware:
> > > >
> > > > Intel Kernel Guard: your sound card is passed through from real hardware.
> > > >
> > > > Xen: could go either way. In dom0, it's likely passed through. In
> > > > domU, it could be passed through or emulated, and I believe this is
> > > > the case for all of the Xen variants.
> > > >
> > > > KVM: Probably emulated, but could be passed through.
> > > >
> > > > I think the main reason that Luis and I are both uncomfortable with
> > > > "am I in a VM" checks is that they're rarely the right thing to be
> > > > detecting, the APIs are poorly designed, and most of the use cases in
> > > > the kernel are using them as a proxy for something else and would be
> > > > clearer and more future proof if they tested what they actually need
> > > > to test more directly.
> > >
> > > Please, guys, take a look at the code more closely. This is applied
> > > only to the known emulated PCI devices, and the driver shows the
> > > kernel message:
> > >
> > > static int snd_intel8x0_inside_vm(struct pci_dev *pci)
> > > ....
> > > /* check for known (emulated) devices */
> > > if (pci->subsystem_vendor == PCI_SUBVENDOR_ID_REDHAT_QUMRANET &&
> > > pci->subsystem_device == PCI_SUBDEVICE_ID_QEMU) {
> > > /* KVM emulated sound, PCI SSID: 1af4:1100 */
> > > msg = "enable KVM";
> > > } else if (pci->subsystem_vendor == 0x1ab8) {
> > > /* Parallels VM emulated sound, PCI SSID: 1ab8:xxxx */
> > > msg = "enable Parallels VM";
> > > } else {
> > > msg = "disable (unknown or VT-d) VM";
> > > result = 0;
> > > }
> >
> > Now I'm more confused. Why are you checking the PCI IDs *and* whether
> > a hypervisor is detected? Why not check only the IDs?
> >
> > In any event, at the very least the comment is misleading:
> >
> > /* detect KVM and Parallels virtual environments */
> > result = kvm_para_available();
> > #ifdef X86_FEATURE_HYPERVISOR
> > result = result || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > #endif
> >
> > You're detecting KVM (sometimes) and the x86 "hypervisor" bit. The
> > latter has no particularly well-defined meaning. You're also missing
> > Xen PV, I believe, and I think that Xen PV + QEMU is a real thing, and
> > you'll fail to detect it, even though it can present a QEMU-emulated
> > card.
Indeed, I think this check would fail to detect some Xen instances.
> > In other words, how is this code any different from a simple whitelist
> > of two specific cards that work a little differently from others?
>
> The PCI ID whitelist was introduced later in the commit 7fb4f392bd27,
> and before that, we relied on the VM detection as a switch for
> skipping the workaround. The VM detection code was kept there just to
> be sure, in case the whitelist isn't 100% correct.
>
> Looking at the current status, the whitelist alone seems enough, so
> the VM detection code could be dropped, I suppose.
Great, such a change should perhaps also highlight the impact on Xen
then.
Luis
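For illustration only, here is a rough sketch of what "the whitelist alone" could look like once the hypervisor detection is dropped. It is not the driver's code: struct fake_pci_dev, is_known_emulated(), skip_workaround() and the tri-state override argument are hypothetical names made up for this example, and the only values taken from the thread are the subsystem IDs (1af4:1100 for KVM/QEMU, 1ab8:xxxx for Parallels).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct pci_dev fields that matter here. */
struct fake_pci_dev {
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/* Subsystem IDs as quoted in the thread. */
#define SSID_VENDOR_QUMRANET	0x1af4	/* KVM/QEMU emulated sound, 1af4:1100 */
#define SSID_DEVICE_QEMU	0x1100
#define SSID_VENDOR_PARALLELS	0x1ab8	/* Parallels emulated sound, 1ab8:xxxx */

/*
 * Hypothetical helper: decide from the PCI subsystem IDs alone whether the
 * card is one of the known emulated devices. No hypervisor bit is consulted,
 * so a passed-through real card is never matched just because we run in a VM.
 */
static bool is_known_emulated(const struct fake_pci_dev *dev)
{
	assert(dev != NULL);

	if (dev->subsystem_vendor == SSID_VENDOR_QUMRANET &&
	    dev->subsystem_device == SSID_DEVICE_QEMU)
		return true;
	if (dev->subsystem_vendor == SSID_VENDOR_PARALLELS)
		return true;
	return false;
}

/*
 * Assumed tri-state override, modelled on the "module parametered" remark:
 * a negative value means auto-detect, 0 forces the workaround to stay,
 * and a positive value forces it to be skipped.
 */
static bool skip_workaround(const struct fake_pci_dev *dev, int override)
{
	if (override >= 0)
		return override != 0;
	return is_known_emulated(dev);
}
```

With this shape, a device reporting 1af4:1100 skips the workaround by default, any other device keeps it, and the override still lets a user force either behavior on a setup the list does not cover.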