2022-06-06 18:04:54

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 2/2] vfio/pci: Remove console drivers

Console drivers can create conflicts with PCI resources resulting in
userspace getting mmap failures to memory BARs. This is especially evident
when trying to re-use the system primary console for userspace drivers.
Attempt to remove all nature of conflicting drivers as part of our VGA
initialization.

Reported-by: Laszlo Ersek <[email protected]>
Tested-by: Laszlo Ersek <[email protected]>
Suggested-by: Gerd Hoffmann <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a0d69ddaf90d..e0cbcbc2aee1 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -13,6 +13,7 @@
#include <linux/device.h>
#include <linux/eventfd.h>
#include <linux/file.h>
+#include <linux/fb.h>
#include <linux/interrupt.h>
#include <linux/iommu.h>
#include <linux/module.h>
@@ -29,6 +30,8 @@

#include <linux/vfio_pci_core.h>

+#include <drm/drm_aperture.h>
+
#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
#define DRIVER_DESC "core driver for VFIO based PCI devices"

@@ -1793,6 +1796,20 @@ static int vfio_pci_vga_init(struct vfio_pci_core_device *vdev)
if (!vfio_pci_is_vga(pdev))
return 0;

+#if IS_REACHABLE(CONFIG_DRM)
+ drm_aperture_detach_platform_drivers(pdev);
+#endif
+
+#if IS_REACHABLE(CONFIG_FB)
+ ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
+ if (ret)
+ return ret;
+#endif
+
+ ret = vga_remove_vgacon(pdev);
+ if (ret)
+ return ret;
+
ret = vga_client_register(pdev, vfio_pci_set_decode);
if (ret)
return ret;



2022-06-08 11:43:36

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi Alex

Am 06.06.22 um 19:53 schrieb Alex Williamson:
> Console drivers can create conflicts with PCI resources resulting in
> userspace getting mmap failures to memory BARs. This is especially evident
> when trying to re-use the system primary console for userspace drivers.
> Attempt to remove all nature of conflicting drivers as part of our VGA
> initialization.

First a dumb question about your use case. You want to assign a PCI
graphics card to a virtual machine and need to remove the generic driver
from the framebuffer?

>
> Reported-by: Laszlo Ersek <[email protected]>
> Tested-by: Laszlo Ersek <[email protected]>
> Suggested-by: Gerd Hoffmann <[email protected]>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a0d69ddaf90d..e0cbcbc2aee1 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -13,6 +13,7 @@
> #include <linux/device.h>
> #include <linux/eventfd.h>
> #include <linux/file.h>
> +#include <linux/fb.h>
> #include <linux/interrupt.h>
> #include <linux/iommu.h>
> #include <linux/module.h>
> @@ -29,6 +30,8 @@
>
> #include <linux/vfio_pci_core.h>
>
> +#include <drm/drm_aperture.h>
> +
> #define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
> #define DRIVER_DESC "core driver for VFIO based PCI devices"
>
> @@ -1793,6 +1796,20 @@ static int vfio_pci_vga_init(struct vfio_pci_core_device *vdev)
> if (!vfio_pci_is_vga(pdev))
> return 0;
>
> +#if IS_REACHABLE(CONFIG_DRM)
> + drm_aperture_detach_platform_drivers(pdev);
> +#endif
> +
> +#if IS_REACHABLE(CONFIG_FB)
> + ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
> + if (ret)
> + return ret;
> +#endif
> +
> + ret = vga_remove_vgacon(pdev);
> + if (ret)
> + return ret;
> +

You shouldn't have to copy any of the implementation of the aperture
helpers.

If you call drm_aperture_remove_conflicting_pci_framebuffers() it should
work correctly. The only reason why it requires a DRM driver structure
as second argument is for the driver's name. [1] And that name is only
used for printing an info message. [2]

The plan forward would be to drop patch 1 entirely.

For patch 2, the most trivial workaround is to instanciate struct
drm_driver here and set the name field to 'vdev->vdev.ops->name'. In the
longer term, the aperture helpers will be moved out of DRM and into a
more prominent location. That workaround will be cleaned up then.

Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could
be changed to accept the name string as second argument, but that's
quite a bit of churn within the DRM code.

Best regards
Thomas

[1]
https://elixir.bootlin.com/linux/v5.18.2/source/drivers/gpu/drm/drm_aperture.c#L347
[2]
https://elixir.bootlin.com/linux/v5.18.2/source/drivers/video/fbdev/core/fbmem.c#L1570


> ret = vga_client_register(pdev, vfio_pci_set_decode);
> if (ret)
> return ret;
>
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature

2022-06-08 14:31:39

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi Thomas,

On Wed, 8 Jun 2022 13:11:21 +0200
Thomas Zimmermann <[email protected]> wrote:

> Hi Alex
>
> Am 06.06.22 um 19:53 schrieb Alex Williamson:
> > Console drivers can create conflicts with PCI resources resulting in
> > userspace getting mmap failures to memory BARs. This is especially evident
> > when trying to re-use the system primary console for userspace drivers.
> > Attempt to remove all nature of conflicting drivers as part of our VGA
> > initialization.
>
> First a dumb question about your use case. You want to assign a PCI
> graphics card to a virtual machine and need to remove the generic driver
> from the framebuffer?

Exactly.

> > Reported-by: Laszlo Ersek <[email protected]>
> > Tested-by: Laszlo Ersek <[email protected]>
> > Suggested-by: Gerd Hoffmann <[email protected]>
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index a0d69ddaf90d..e0cbcbc2aee1 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -13,6 +13,7 @@
> > #include <linux/device.h>
> > #include <linux/eventfd.h>
> > #include <linux/file.h>
> > +#include <linux/fb.h>
> > #include <linux/interrupt.h>
> > #include <linux/iommu.h>
> > #include <linux/module.h>
> > @@ -29,6 +30,8 @@
> >
> > #include <linux/vfio_pci_core.h>
> >
> > +#include <drm/drm_aperture.h>
> > +
> > #define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
> > #define DRIVER_DESC "core driver for VFIO based PCI devices"
> >
> > @@ -1793,6 +1796,20 @@ static int vfio_pci_vga_init(struct vfio_pci_core_device *vdev)
> > if (!vfio_pci_is_vga(pdev))
> > return 0;
> >
> > +#if IS_REACHABLE(CONFIG_DRM)
> > + drm_aperture_detach_platform_drivers(pdev);
> > +#endif
> > +
> > +#if IS_REACHABLE(CONFIG_FB)
> > + ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
> > + if (ret)
> > + return ret;
> > +#endif
> > +
> > + ret = vga_remove_vgacon(pdev);
> > + if (ret)
> > + return ret;
> > +
>
> You shouldn't have to copy any of the implementation of the aperture
> helpers.
>
> If you call drm_aperture_remove_conflicting_pci_framebuffers() it should
> work correctly. The only reason why it requires a DRM driver structure
> as second argument is for the driver's name. [1] And that name is only
> used for printing an info message. [2]

vfio-pci is not dependent on CONFIG_DRM, therefore we need to open code
this regardless. The only difference if we were to use the existing
function would be something like:

#if IS_REACHABLE(CONFIG_DRM)
ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &dummy_drm_driver);
if (ret)
return ret;
#else
#if IS_REACHABLE(CONFIG_FB)
ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
if (ret)
return ret;
#endif
ret = vga_remove_vgacon(pdev);
if (ret)
return ret;
#endif

It's also bad practice to create a dummy DRM driver struct with some
assumption which fields are used. If the usage is later expanded, we'd
only discover it by users getting segfaults. If DRM wanted to make
such an API guarantee, then we shouldn't have commit 97c9bfe3f660
("drm/aperture: Pass DRM driver structure instead of driver name").

> The plan forward would be to drop patch 1 entirely.
>
> For patch 2, the most trivial workaround is to instanciate struct
> drm_driver here and set the name field to 'vdev->vdev.ops->name'. In the
> longer term, the aperture helpers will be moved out of DRM and into a
> more prominent location. That workaround will be cleaned up then.

Trivial in execution, but as above, this is poor practice and should be
avoided.

> Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could
> be changed to accept the name string as second argument, but that's
> quite a bit of churn within the DRM code.

The series as presented was exactly meant to provide the most correct
solution with the least churn to the DRM code. The above referenced
commit precludes us from calling the existing DRM function directly
without resorting to poor practices of assuming the usage of the DRM
driver struct. Using the existing DRM function also does not prevent
us from open coding the remainder of the function to avoid a CONFIG_DRM
dependency. Thanks,

Alex

2022-06-08 16:07:59

by Gerd Hoffmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi,

> You shouldn't have to copy any of the implementation of the aperture
> helpers.

That comes from the aperture helpers being part of drm ...

> For patch 2, the most trivial workaround is to instanciate struct drm_driver
> here and set the name field to 'vdev->vdev.ops->name'. In the longer term,
> the aperture helpers will be moved out of DRM and into a more prominent
> location. That workaround will be cleaned up then.

... but if the long-term plan is to clean that up properly anyway I
don't see the point in bike shedding too much on the details of some
temporary solution.

> Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could be
> changed to accept the name string as second argument, but that's quite a bit
> of churn within the DRM code.

Also pointless churn because you'll have the very same churn again when
moving the aperture helpers out of drm.

take care,
Gerd

2022-06-09 09:26:59

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi

Am 08.06.22 um 16:04 schrieb Alex Williamson:
>>
>> You shouldn't have to copy any of the implementation of the aperture
>> helpers.
>>
>> If you call drm_aperture_remove_conflicting_pci_framebuffers() it should
>> work correctly. The only reason why it requires a DRM driver structure
>> as second argument is for the driver's name. [1] And that name is only
>> used for printing an info message. [2]
>
> vfio-pci is not dependent on CONFIG_DRM, therefore we need to open code
> this regardless. The only difference if we were to use the existing
> function would be something like:
>
> #if IS_REACHABLE(CONFIG_DRM)
> ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &dummy_drm_driver);
> if (ret)
> return ret;
> #else
> #if IS_REACHABLE(CONFIG_FB)
> ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
> if (ret)
> return ret;
> #endif
> ret = vga_remove_vgacon(pdev);
> if (ret)
> return ret;
> #endif
>
> It's also bad practice to create a dummy DRM driver struct with some
> assumption which fields are used. If the usage is later expanded, we'd
> only discover it by users getting segfaults. If DRM wanted to make
> such an API guarantee, then we shouldn't have commit 97c9bfe3f660
> ("drm/aperture: Pass DRM driver structure instead of driver name").

What you're open coding within vfio is legacy code and we want to remove
it as much as possible from the aperture helpers.

Tying the helpers to DRM was in mistake in retrospective. We wanted
something tailored to the needs of DRM. Now that we've seen quite a few
corner cases in the interaction among graphics subsystems, we need
something else. The order of creating devices and loading driver modules
is crucial to making the hand-over between drivers work reliably. So
far, this luckily has only been a problem in theory, but not practice.

>
>> The plan forward would be to drop patch 1 entirely.
>>
>> For patch 2, the most trivial workaround is to instanciate struct
>> drm_driver here and set the name field to 'vdev->vdev.ops->name'. In the
>> longer term, the aperture helpers will be moved out of DRM and into a
>> more prominent location. That workaround will be cleaned up then.
>
> Trivial in execution, but as above, this is poor practice and should be
> avoided.
>
>> Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could
>> be changed to accept the name string as second argument, but that's
>> quite a bit of churn within the DRM code.
>
> The series as presented was exactly meant to provide the most correct
> solution with the least churn to the DRM code. The above referenced
> commit precludes us from calling the existing DRM function directly
> without resorting to poor practices of assuming the usage of the DRM
> driver struct. Using the existing DRM function also does not prevent
> us from open coding the remainder of the function to avoid a CONFIG_DRM
> dependency. Thanks,

Please have a look at the attached patch. It moves the aperture helpers
to a location common to the various possible users (DRM, fbdev, vfio).
The DRM interfaces remain untouched for now. The patch should provide
what you need in vfio and also serve our future use cases for graphics
drivers. If possible, please create your patch on top of it.

Best regards
Thomas

>
> Alex
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


Attachments:
0001-drm-Implement-DRM-aperture-helpers-under-video.patch (23.86 kB)
OpenPGP_signature (855.00 B)
OpenPGP digital signature
Download all attachments

2022-06-09 21:56:00

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

On Thu, 9 Jun 2022 11:13:22 +0200
Thomas Zimmermann <[email protected]> wrote:
>
> Please have a look at the attached patch. It moves the aperture helpers
> to a location common to the various possible users (DRM, fbdev, vfio).
> The DRM interfaces remain untouched for now. The patch should provide
> what you need in vfio and also serve our future use cases for graphics
> drivers. If possible, please create your patch on top of it.

Looks good to me, this of course makes the vfio change quite trivial.
One change I'd request:

diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index 40c50fa2dd70..7f3c44e1538b 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -10,6 +10,7 @@ config VGA_CONSOLE
depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
!ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
+ select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
default y
help
Saying Y here will allow you to use Linux in text mode through a

This should be VFIO_PCI_CORE. Thanks,

Alex

2022-06-09 22:06:27

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

On Thu, 9 Jun 2022 15:41:02 -0600
Alex Williamson <[email protected]> wrote:

> On Thu, 9 Jun 2022 11:13:22 +0200
> Thomas Zimmermann <[email protected]> wrote:
> >
> > Please have a look at the attached patch. It moves the aperture helpers
> > to a location common to the various possible users (DRM, fbdev, vfio).
> > The DRM interfaces remain untouched for now. The patch should provide
> > what you need in vfio and also serve our future use cases for graphics
> > drivers. If possible, please create your patch on top of it.
>
> Looks good to me, this of course makes the vfio change quite trivial.
> One change I'd request:
>
> diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
> index 40c50fa2dd70..7f3c44e1538b 100644
> --- a/drivers/video/console/Kconfig
> +++ b/drivers/video/console/Kconfig
> @@ -10,6 +10,7 @@ config VGA_CONSOLE
> depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
> (!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
> !ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
> + select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
> default y
> help
> Saying Y here will allow you to use Linux in text mode through a
>
> This should be VFIO_PCI_CORE. Thanks,

Also, whatever tree this lands in, I'd appreciate a topic branch being
made available so I can more easily get the vfio change in on the same
release. Thanks,

Alex

2022-06-10 07:13:25

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi

Am 09.06.22 um 23:44 schrieb Alex Williamson:
> On Thu, 9 Jun 2022 15:41:02 -0600
> Alex Williamson <[email protected]> wrote:
>
>> On Thu, 9 Jun 2022 11:13:22 +0200
>> Thomas Zimmermann <[email protected]> wrote:
>>>
>>> Please have a look at the attached patch. It moves the aperture helpers
>>> to a location common to the various possible users (DRM, fbdev, vfio).
>>> The DRM interfaces remain untouched for now. The patch should provide
>>> what you need in vfio and also serve our future use cases for graphics
>>> drivers. If possible, please create your patch on top of it.
>>
>> Looks good to me, this of course makes the vfio change quite trivial.
>> One change I'd request:
>>
>> diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
>> index 40c50fa2dd70..7f3c44e1538b 100644
>> --- a/drivers/video/console/Kconfig
>> +++ b/drivers/video/console/Kconfig
>> @@ -10,6 +10,7 @@ config VGA_CONSOLE
>> depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
>> (!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
>> !ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
>> + select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
>> default y
>> help
>> Saying Y here will allow you to use Linux in text mode through a
>>
>> This should be VFIO_PCI_CORE. Thanks,

I attached an updated patch to this email.

>
> Also, whatever tree this lands in, I'd appreciate a topic branch being
> made available so I can more easily get the vfio change in on the same
> release. Thanks,

You can add my patch to your series and merge it through vfio. You'd
only have to cc dri-devel for the patch's review. I guess it's more
important for vfio than DRM. We have no hurry on the DRM side, but v5.20
would be nice.

Best regards
Thomas

>
> Alex
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


Attachments:
0001-drm-Implement-DRM-aperture-helpers-under-video.patch (23.86 kB)
OpenPGP_signature (855.00 B)
OpenPGP digital signature
Download all attachments

2022-06-10 15:08:53

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

On Fri, 10 Jun 2022 09:03:15 +0200
Thomas Zimmermann <[email protected]> wrote:

> Hi
>
> Am 09.06.22 um 23:44 schrieb Alex Williamson:
> > On Thu, 9 Jun 2022 15:41:02 -0600
> > Alex Williamson <[email protected]> wrote:
> >
> >> On Thu, 9 Jun 2022 11:13:22 +0200
> >> Thomas Zimmermann <[email protected]> wrote:
> >>>
> >>> Please have a look at the attached patch. It moves the aperture helpers
> >>> to a location common to the various possible users (DRM, fbdev, vfio).
> >>> The DRM interfaces remain untouched for now. The patch should provide
> >>> what you need in vfio and also serve our future use cases for graphics
> >>> drivers. If possible, please create your patch on top of it.
> >>
> >> Looks good to me, this of course makes the vfio change quite trivial.
> >> One change I'd request:
> >>
> >> diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
> >> index 40c50fa2dd70..7f3c44e1538b 100644
> >> --- a/drivers/video/console/Kconfig
> >> +++ b/drivers/video/console/Kconfig
> >> @@ -10,6 +10,7 @@ config VGA_CONSOLE
> >> depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
> >> (!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
> >> !ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
> >> + select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
> >> default y
> >> help
> >> Saying Y here will allow you to use Linux in text mode through a
> >>
> >> This should be VFIO_PCI_CORE. Thanks,
>
> I attached an updated patch to this email.
>
> >
> > Also, whatever tree this lands in, I'd appreciate a topic branch being
> > made available so I can more easily get the vfio change in on the same
> > release. Thanks,
>
> You can add my patch to your series and merge it through vfio. You'd
> only have to cc dri-devel for the patch's review. I guess it's more
> important for vfio than DRM. We have no hurry on the DRM side, but v5.20
> would be nice.

Ok, I didn't realize you were offering the patch for me to post and
merge. I'll do that. Thanks!

Alex

2022-12-04 00:45:13

by [email protected]

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi,

I hope it is ok to reply to this old thread. Unfortunately, I found a
problem only now after upgrading to 6.0.

My setup has multiple GPUs (2), and I depend on EFIFB to have a working console.
pre-patch behavior, when I bind the vfio-pci to my secondary GPU both
the passthrough and the EFIFB keep working fine.
post-patch behavior, when I bind the vfio-pci to the secondary GPU,
the EFIFB disappears from the system, binding the console to the
"dummy console".
Whenever you try to access the terminal, you have the screen stuck in
whatever was the last buffer content, which gives the impression of
"freezing," but I can still type.
Everything else works, including the passthrough.

I can only think about a few options:

- Is there a way to have EFIFB show up again? After all it looks like
the kernel has just abandoned it, but the buffer is still there. I
can't find a single message about the secondary card and EFIFB in
dmesg, but there's a message for the primary card and EFIFB.
- Can we have a boolean controlling the behavior of vfio-pci
altogether or at least controlling the behavior of vfio-pci for that
specific ID? I know there's already some option for vfio-pci and VGA
cards, would it be appropriate to attach this behavior to that option?

Thanks,

Carlos Augusto

2022-12-05 01:45:07

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

On Sat, 3 Dec 2022 17:12:38 -0700
"[email protected]" <[email protected]> wrote:

> Hi,
>
> I hope it is ok to reply to this old thread.

It is, but the only relic of the thread is the subject. For reference,
the latest version of this posted is here:

https://lore.kernel.org/all/[email protected]/

Which is committed as:

d17378062079 ("vfio/pci: Remove console drivers")

> Unfortunately, I found a
> problem only now after upgrading to 6.0.
>
> My setup has multiple GPUs (2), and I depend on EFIFB to have a working console.
> pre-patch behavior, when I bind the vfio-pci to my secondary GPU both
> the passthrough and the EFIFB keep working fine.
> post-patch behavior, when I bind the vfio-pci to the secondary GPU,
> the EFIFB disappears from the system, binding the console to the
> "dummy console".
> Whenever you try to access the terminal, you have the screen stuck in
> whatever was the last buffer content, which gives the impression of
> "freezing," but I can still type.
> Everything else works, including the passthrough.

This sounds like the call to aperture_remove_conflicting_pci_devices()
is removing the conflicting driver itself rather than removing the
device from the driver. Is it not possible to unbind the GPU from
efifb before binding the GPU to vfio-pci to effectively nullify the
added call?

> I can only think about a few options:
>
> - Is there a way to have EFIFB show up again? After all it looks like
> the kernel has just abandoned it, but the buffer is still there. I
> can't find a single message about the secondary card and EFIFB in
> dmesg, but there's a message for the primary card and EFIFB.
> - Can we have a boolean controlling the behavior of vfio-pci
> altogether or at least controlling the behavior of vfio-pci for that
> specific ID? I know there's already some option for vfio-pci and VGA
> cards, would it be appropriate to attach this behavior to that option?

I suppose we could have an opt-out module option on vfio-pci to skip
the above call, but clearly it would be better if things worked by
default. We cannot make full use of GPUs with vfio-pci if they're
still in use by host console drivers. The intention was certainly to
unbind the device from any low level drivers rather than disable use of
a console driver entirely. DRM/GPU folks, is that possibly an
interface we could implement? Thanks,

Alex

2022-12-05 09:53:35

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi

Am 05.12.22 um 01:51 schrieb Alex Williamson:
> On Sat, 3 Dec 2022 17:12:38 -0700
> "[email protected]" <[email protected]> wrote:
>
>> Hi,
>>
>> I hope it is ok to reply to this old thread.
>
> It is, but the only relic of the thread is the subject. For reference,
> the latest version of this posted is here:
>
> https://lore.kernel.org/all/[email protected]/
>
> Which is committed as:
>
> d17378062079 ("vfio/pci: Remove console drivers")
>
>> Unfortunately, I found a
>> problem only now after upgrading to 6.0.
>>
>> My setup has multiple GPUs (2), and I depend on EFIFB to have a working console.

Which GPUs do you have?

>> pre-patch behavior, when I bind the vfio-pci to my secondary GPU both
>> the passthrough and the EFIFB keep working fine.
>> post-patch behavior, when I bind the vfio-pci to the secondary GPU,
>> the EFIFB disappears from the system, binding the console to the
>> "dummy console".

The efifb would likely use the first GPU. And vfio-pci should only
remove the generic driver from the second device. Are you sure that
you're not somehow using the first GPU with vfio-pci.

>> Whenever you try to access the terminal, you have the screen stuck in
>> whatever was the last buffer content, which gives the impression of
>> "freezing," but I can still type.
>> Everything else works, including the passthrough.
>
> This sounds like the call to aperture_remove_conflicting_pci_devices()
> is removing the conflicting driver itself rather than removing the
> device from the driver. Is it not possible to unbind the GPU from
> efifb before binding the GPU to vfio-pci to effectively nullify the
> added call?
>
>> I can only think about a few options:
>>
>> - Is there a way to have EFIFB show up again? After all it looks like
>> the kernel has just abandoned it, but the buffer is still there. I
>> can't find a single message about the secondary card and EFIFB in
>> dmesg, but there's a message for the primary card and EFIFB.
>> - Can we have a boolean controlling the behavior of vfio-pci
>> altogether or at least controlling the behavior of vfio-pci for that
>> specific ID? I know there's already some option for vfio-pci and VGA
>> cards, would it be appropriate to attach this behavior to that option?
>
> I suppose we could have an opt-out module option on vfio-pci to skip
> the above call, but clearly it would be better if things worked by
> default. We cannot make full use of GPUs with vfio-pci if they're
> still in use by host console drivers. The intention was certainly to
> unbind the device from any low level drivers rather than disable use of
> a console driver entirely. DRM/GPU folks, is that possibly an
> interface we could implement? Thanks,

When vfio-pci gives the GPU device to the guest, which driver driver is
bound to it?

Best regards
Thomas

>
> Alex
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature

2022-12-05 10:22:37

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi

Am 05.12.22 um 10:32 schrieb [email protected]:
> I have a rtx 3070 and a 3090, I am absolutely sure I am binding vfio-pci
> to the 3090 and not the 3070.
>
> I have bound the driver in two different ways, first by passing the IDs
> to the module and alternatively by manipulating the system interface and
> use the override (this is what I originally had to do when I used two
> 1080s, so I know it works).
>
> While the 3090 doesn't show a console, there's a remnant from the refund
> (and grub previously) there.
>
> The assessment Alex made previously, where
> aperture_remove_conflicting_pci_devices() is removing the driver (EFIFB)
> instead of the device seems correct, but it could also can be a quirky
> of how EFIFB is implemented. I recall reading a long time ago that EFIFB
> is a special device and once it detects changes it would simply give up.
> There was also no way to attach a device to it again as it depends on
> being preloaded outside the kernel; once something takes over the buffer
> reinitializing is "impossible". I never went deeper to try and
> understand it.

We recently reworked fbdev's interaction with the aperture helpers. [1]
All devices should now be removed iff the driver has been bound to it
(which should be the case here) The patches went into an v6.1-rc.

Could you try the most recent v6.1-rc and report if this fixes the problem?

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/106040/

>
>
> On Mon, Dec 5, 2022, 2:00 AM Thomas Zimmermann <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hi
>
> Am 05.12.22 um 01:51 schrieb Alex Williamson:
> > On Sat, 3 Dec 2022 17:12:38 -0700
> > "[email protected]" <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I hope it is ok to reply to this old thread.
> >
> > It is, but the only relic of the thread is the subject.  For
> reference,
> > the latest version of this posted is here:
> >
> >
> https://lore.kernel.org/all/[email protected]/ <https://lore.kernel.org/all/[email protected]/>
> >
> > Which is committed as:
> >
> > d17378062079 ("vfio/pci: Remove console drivers")
> >
> >> Unfortunately, I found a
> >> problem only now after upgrading to 6.0.
> >>
> >> My setup has multiple GPUs (2), and I depend on EFIFB to have a
> working console.
>
> Which GPUs do you have?
>
> >> pre-patch behavior, when I bind the vfio-pci to my secondary GPU
> both
> >> the passthrough and the EFIFB keep working fine.
> >> post-patch behavior, when I bind the vfio-pci to the secondary GPU,
> >> the EFIFB disappears from the system, binding the console to the
> >> "dummy console".
>
> The efifb would likely use the first GPU. And vfio-pci should only
> remove the generic driver from the second device. Are you sure that
> you're not somehow using the first GPU with vfio-pci.
>
> >> Whenever you try to access the terminal, you have the screen
> stuck in
> >> whatever was the last buffer content, which gives the impression of
> >> "freezing," but I can still type.
> >> Everything else works, including the passthrough.
> >
> > This sounds like the call to
> aperture_remove_conflicting_pci_devices()
> > is removing the conflicting driver itself rather than removing the
> > device from the driver.  Is it not possible to unbind the GPU from
> > efifb before binding the GPU to vfio-pci to effectively nullify the
> > added call?
> >
> >> I can only think about a few options:
> >>
> >> - Is there a way to have EFIFB show up again? After all it looks
> like
> >> the kernel has just abandoned it, but the buffer is still there. I
> >> can't find a single message about the secondary card and EFIFB in
> >> dmesg, but there's a message for the primary card and EFIFB.
> >> - Can we have a boolean controlling the behavior of vfio-pci
> >> altogether or at least controlling the behavior of vfio-pci for that
> >> specific ID? I know there's already some option for vfio-pci and VGA
> >> cards, would it be appropriate to attach this behavior to that
> option?
> >
> > I suppose we could have an opt-out module option on vfio-pci to skip
> > the above call, but clearly it would be better if things worked by
> > default.  We cannot make full use of GPUs with vfio-pci if they're
> > still in use by host console drivers.  The intention was certainly to
> > unbind the device from any low level drivers rather than disable
> use of
> > a console driver entirely.  DRM/GPU folks, is that possibly an
> > interface we could implement?  Thanks,
>
> When vfio-pci gives the GPU device to the guest, which driver driver is
> bound to it?
>
> Best regards
> Thomas
>
> >
> > Alex
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature

2022-12-05 22:20:46

by [email protected]

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi Thomas,

On Mon, Dec 5, 2022 at 3:11 AM Thomas Zimmermann <[email protected]> wrote:
>
> Hi
>
> Am 05.12.22 um 10:32 schrieb [email protected]:
> > I have a rtx 3070 and a 3090, I am absolutely sure I am binding vfio-pci
> > to the 3090 and not the 3070.
> >
> > I have bound the driver in two different ways, first by passing the IDs
> > to the module and alternatively by manipulating the system interface and
> > use the override (this is what I originally had to do when I used two
> > 1080s, so I know it works).
> >
> > While the 3090 doesn't show a console, there's a remnant from the refund
> > (and grub previously) there.
> >
> > The assessment Alex made previously, where
> > aperture_remove_conflicting_pci_devices() is removing the driver (EFIFB)
> > instead of the device seems correct, but it could also can be a quirky
> > of how EFIFB is implemented. I recall reading a long time ago that EFIFB
> > is a special device and once it detects changes it would simply give up.
> > There was also no way to attach a device to it again as it depends on
> > being preloaded outside the kernel; once something takes over the buffer
> > reinitializing is "impossible". I never went deeper to try and
> > understand it.
>
> We recently reworked fbdev's interaction with the aperture helpers. [1]
> All devices should now be removed iff the driver has been bound to it
> (which should be the case here) The patches went into an v6.1-rc.
>
> Could you try the most recent v6.1-rc and report if this fixes the problem?

I just tried the latest one, v6.1-rc8, and I can see all the commits
for the series you mentioned there.

The same freeze behavior happens when I load vfio-pci:

[ 6.525463] VFIO - User Level meta-driver version: 0.3
[ 6.528231] Console: switching to colour dummy device 320x90

--
Carlos

>
> Best regards
> Thomas
>
> [1] https://patchwork.freedesktop.org/series/106040/
>
> >
> >
> > On Mon, Dec 5, 2022, 2:00 AM Thomas Zimmermann <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> > Hi
> >
> > Am 05.12.22 um 01:51 schrieb Alex Williamson:
> > > On Sat, 3 Dec 2022 17:12:38 -0700
> > > "[email protected]" <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> I hope it is ok to reply to this old thread.
> > >
> > > It is, but the only relic of the thread is the subject. For
> > reference,
> > > the latest version of this posted is here:
> > >
> > >
> > https://lore.kernel.org/all/[email protected]/ <https://lore.kernel.org/all/[email protected]/>
> > >
> > > Which is committed as:
> > >
> > > d17378062079 ("vfio/pci: Remove console drivers")
> > >
> > >> Unfortunately, I found a
> > >> problem only now after upgrading to 6.0.
> > >>
> > >> My setup has multiple GPUs (2), and I depend on EFIFB to have a
> > working console.
> >
> > Which GPUs do you have?
> >
> > >> pre-patch behavior, when I bind the vfio-pci to my secondary GPU
> > both
> > >> the passthrough and the EFIFB keep working fine.
> > >> post-patch behavior, when I bind the vfio-pci to the secondary GPU,
> > >> the EFIFB disappears from the system, binding the console to the
> > >> "dummy console".
> >
> > The efifb would likely use the first GPU. And vfio-pci should only
> > remove the generic driver from the second device. Are you sure that
> > you're not somehow using the first GPU with vfio-pci.
> >
> > >> Whenever you try to access the terminal, you have the screen
> > stuck in
> > >> whatever was the last buffer content, which gives the impression of
> > >> "freezing," but I can still type.
> > >> Everything else works, including the passthrough.
> > >
> > > This sounds like the call to
> > aperture_remove_conflicting_pci_devices()
> > > is removing the conflicting driver itself rather than removing the
> > > device from the driver. Is it not possible to unbind the GPU from
> > > efifb before binding the GPU to vfio-pci to effectively nullify the
> > > added call?
> > >
> > >> I can only think about a few options:
> > >>
> > >> - Is there a way to have EFIFB show up again? After all it looks
> > like
> > >> the kernel has just abandoned it, but the buffer is still there. I
> > >> can't find a single message about the secondary card and EFIFB in
> > >> dmesg, but there's a message for the primary card and EFIFB.
> > >> - Can we have a boolean controlling the behavior of vfio-pci
> > >> altogether or at least controlling the behavior of vfio-pci for that
> > >> specific ID? I know there's already some option for vfio-pci and VGA
> > >> cards, would it be appropriate to attach this behavior to that
> > option?
> > >
> > > I suppose we could have an opt-out module option on vfio-pci to skip
> > > the above call, but clearly it would be better if things worked by
> > > default. We cannot make full use of GPUs with vfio-pci if they're
> > > still in use by host console drivers. The intention was certainly to
> > > unbind the device from any low level drivers rather than disable
> > use of
> > > a console driver entirely. DRM/GPU folks, is that possibly an
> > > interface we could implement? Thanks,
> >
> > When vfio-pci gives the GPU device to the guest, which driver driver is
> > bound to it?
> >
> > Best regards
> > Thomas
> >
> > >
> > > Alex
> > >
> >
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Software Solutions Germany GmbH
> > Maxfeldstr. 5, 90409 Nürnberg, Germany
> > (HRB 36809, AG Nürnberg)
> > Geschäftsführer: Ivo Totev
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev

2023-01-02 10:50:27

by Shawn Michaels

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfio/pci: Remove console drivers

Hi,

I just upgraded my system (after 7 months) and I also lost my framebuffer on boot.
At first, I thought that my computer was freezing on startup, but it turns out
that it is running fine (I can SSH to it and even startx remotely), only the
framebuffer stops working very early on boot:

:: running early hook [udev]
Starting systemd-udevd version 252.4-2-arch
[3.400568] VFIO - User Level meta-driver version: 0.3
*LOSING FRAMEBUFFER HERE*

Running startx from an SSH session starts my X server and display works again.

I have two identical GPU cards (nVidia GTX 1070). I have been using the method
from [1] for years in order to prevent the nvidia driver from grabbing my guest
GPU. As mb already pointed out, vfio_pci now kills the framebuffer of the host
GPU even though it is assigned to the guest GPU. I could isolate it to the
following line from [1]:

echo "vfio-pci" > "$GPU/driver_override"

I also double checked and this is correctly written to the guest GPU, and not
to the host GPU. My kernel version is:

Linux cc 6.1.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 21 Dec 2022 22:27:55 +0000 x86_64 GNU/Linux

You can find people having the same issue in threads from [2] and [3]. There are
also some reports on the VFIO discord server.

This is a problem for people using an encrypted filesystem (password needs to be
typed blindly) or in case boot fails for some reason (and you cannot see console
logs). In my case, I switched from manually starting X with startx to using a
graphical login manager. Blacklisting the vfio_pci module by passing the following
kernel parameter brings the boot console back:

module_blacklist=vfio_pci

Happy new year everyone

Shawn

[1] https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Passthrough_all_GPUs_but_the_boot_GPU
[2] https://bbs.archlinux.org/viewtopic.php?pid=2063423
[3] https://forum.level1techs.com/t/linux-kernel-6-seems-to-be-incompatible-with-the-vfio-pci-module-needed-for-pci-passthrough/190039/11