Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).
Key elements of this series are:
- detection of Jailhouse via device tree hypervisor node
- function-level PCI scan if Jailhouse is detected
- MMCONFIG support for x86 guests
As most changes affect x86, I would suggest to route the series also via
tip after the necessary acks are collected.
Changes in v2:
- adjusted commit log and include ordering in patch 2
- rebased over Linus master
Jan
[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org
CC: Benedikt Spranger <[email protected]>
CC: Mark Rutland <[email protected]>
CC: Otavio Pontes <[email protected]>
CC: Rob Herring <[email protected]>
Jan Kiszka (5):
jailhouse: Provide detection for non-x86 systems
PCI: Scan all functions when running over Jailhouse
x86: Consolidate PCI_MMCONFIG configs
x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
MAINTAINERS: Add entry for Jailhouse
Otavio Pontes (1):
x86/jailhouse: Enable PCI mmconfig access in inmates
Documentation/devicetree/bindings/jailhouse.txt | 8 ++++++++
MAINTAINERS | 7 +++++++
arch/x86/Kconfig | 11 ++++++-----
arch/x86/include/asm/jailhouse_para.h | 2 +-
arch/x86/include/asm/pci_x86.h | 2 ++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/jailhouse.c | 7 +++++++
arch/x86/pci/legacy.c | 4 +++-
arch/x86/pci/mmconfig-shared.c | 4 ++--
drivers/pci/probe.c | 4 +++-
include/linux/hypervisor.h | 17 +++++++++++++++--
12 files changed, 56 insertions(+), 14 deletions(-)
create mode 100644 Documentation/devicetree/bindings/jailhouse.txt
--
2.13.6
From: Jan Kiszka <[email protected]>
Not sure if those two worked by design or just by chance so far. In any
case, it's at least cleaner and clearer to express this in a single
config statement.
Signed-off-by: Jan Kiszka <[email protected]>
---
arch/x86/Kconfig | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..63e85e7da12e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,9 @@ config PCI_DIRECT
depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
config PCI_MMCONFIG
- def_bool y
- depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+ bool "Support mmconfig PCI config space access" if X86_64
+ default y
+ depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
config PCI_OLPC
def_bool y
@@ -2657,10 +2658,6 @@ config PCI_DOMAINS
def_bool y
depends on PCI
-config PCI_MMCONFIG
- bool "Support mmconfig PCI config space access"
- depends on X86_64 && PCI && ACPI
-
config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
--
2.13.6
From: Jan Kiszka <[email protected]>
Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0. Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate
there are more functions.
The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which
means a Linux guest won't find them.
Change Linux PCI probing so it scans all function numbers when
running as a guest over Jailhouse.
This is technically prohibited by the spec, so it is possible that
PCI devices without the Multi-Function Device bit set may have
unexpected behavior in response to this probe.
Based on patch by Benedikt Spranger, adding Jailhouse probing to avoid
changing the behavior in the absence of the hypervisor.
CC: Benedikt Spranger <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
arch/x86/pci/legacy.c | 4 +++-
drivers/pci/probe.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
#include <linux/init.h>
#include <linux/export.h>
#include <linux/pci.h>
+#include <asm/jailhouse_para.h>
#include <asm/pci_x86.h>
/*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
void pcibios_scan_specific_bus(int busn)
{
+ int stride = jailhouse_paravirt() ? 1 : 8;
int devfn;
u32 l;
if (pci_find_bus(0, busn))
return;
- for (devfn = 0; devfn < 256; devfn += 8) {
+ for (devfn = 0; devfn < 256; devfn += stride) {
if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
l != 0x0000 && l != 0xffff) {
DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..ce728251ae36 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
#include <linux/pci-aspm.h>
#include <linux/aer.h>
#include <linux/acpi.h>
+#include <linux/hypervisor.h>
#include <linux/irqdomain.h>
#include <linux/pm_runtime.h>
#include "pci.h"
@@ -2517,6 +2518,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
unsigned int available_buses)
{
unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
+ unsigned int stride = jailhouse_paravirt() ? 1 : 8;
unsigned int start = bus->busn_res.start;
unsigned int devfn, cmax, max = start;
struct pci_dev *dev;
@@ -2524,7 +2526,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
dev_dbg(&bus->dev, "scanning bus\n");
/* Go find them, Rover! */
- for (devfn = 0; devfn < 0x100; devfn += 8)
+ for (devfn = 0; devfn < 0x100; devfn += stride)
pci_scan_slot(bus, devfn);
/* Reserve buses for SR-IOV capability */
--
2.13.6
From: Jan Kiszka <[email protected]>
Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.
CC: Rob Herring <[email protected]>
CC: Mark Rutland <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
---
Documentation/devicetree/bindings/jailhouse.txt | 8 ++++++++
arch/x86/include/asm/jailhouse_para.h | 2 +-
include/linux/hypervisor.h | 17 +++++++++++++++--
3 files changed, 24 insertions(+), 3 deletions(-)
create mode 100644 Documentation/devicetree/bindings/jailhouse.txt
diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: GPL2.0 */
/*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
*
* Copyright (c) Siemens AG, 2015-2017
*
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
*/
#ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
#include <asm/x86_init.h>
+
static inline void hypervisor_pin_vcpu(int cpu)
{
x86_platform.hyper.pin_vcpu(cpu);
}
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
static inline void hypervisor_pin_vcpu(int cpu)
{
}
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+ return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
#endif /* __LINUX_HYPEVISOR_H */
--
2.13.6
From: Jan Kiszka <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
---
MAINTAINERS | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 93a12af4f180..4b889f282c77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7521,6 +7521,13 @@ Q: http://patchwork.linuxtv.org/project/linux-media/list/
S: Maintained
F: drivers/media/dvb-frontends/ix2505v*
+JAILHOUSE HYPERVISOR INTERFACE
+M: Jan Kiszka <[email protected]>
+L: [email protected]
+S: Maintained
+F: arch/x86/kernel/jailhouse.c
+F: arch/x86/include/asm/jailhouse_para.h
+
JC42.4 TEMPERATURE SENSOR DRIVER
M: Guenter Roeck <[email protected]>
L: [email protected]
--
2.13.6
From: Otavio Pontes <[email protected]>
Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration space.
Signed-off-by: Otavio Pontes <[email protected]>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG]
Signed-off-by: Jan Kiszka <[email protected]>
---
arch/x86/include/asm/pci_x86.h | 2 ++
arch/x86/kernel/jailhouse.c | 7 +++++++
arch/x86/pci/mmconfig-shared.c | 4 ++--
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr);
extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
extern struct list_head pci_mmcfg_list;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..7fe2a73da0b3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
if (pcibios_last_bus < 0)
pcibios_last_bus = 0xff;
+#ifdef CONFIG_PCI_MMCONFIG
+ if (setup_data.pci_mmconfig_base) {
+ pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
+ pci_mmcfg_arch_init();
+ }
+#endif
+
return 0;
}
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
return new;
}
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
- int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+ int end, u64 addr)
{
struct pci_mmcfg_region *new;
--
2.13.6
From: Jan Kiszka <[email protected]>
Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.
Saves some bytes in the Jailhouse non-root kernel.
Signed-off-by: Jan Kiszka <[email protected]>
---
arch/x86/Kconfig | 6 +++++-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/cpu/amd.c | 2 +-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 63e85e7da12e..5b0ac52e357a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
- depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
+ depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
config PCI_OLPC
def_bool y
@@ -2658,6 +2658,10 @@ config PCI_DOMAINS
def_bool y
depends on PCI
+config MMCONF_FAM10H
+ def_bool y
+ depends on PCI_MMCONFIG && ACPI
+
config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_GART_IOMMU) += amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
- obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o
+ obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
obj-y += vsmp_64.o
endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
static void init_amd_gh(struct cpuinfo_x86 *c)
{
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
/* do this for boot cpu */
if (c == &boot_cpu_data)
check_enable_amd_mmconf_dmi();
--
2.13.6
On Wed, 28 Feb 2018, Jan Kiszka wrote:
> From: Jan Kiszka <[email protected]>
>
> Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
> have a function 0. Therefore, Linux scans for devices at function 0
> (devfn 0/8/16/...) and only scans for other functions if function 0
> has its Multi-Function Device bit set or ARI or SR-IOV indicate
> there are more functions.
>
> The Jailhouse hypervisor may pass individual functions of a
> multi-function device to a guest without passing function 0, which
> means a Linux guest won't find them.
>
> Change Linux PCI probing so it scans all function numbers when
> running as a guest over Jailhouse.
> void pcibios_scan_specific_bus(int busn)
> {
> + int stride = jailhouse_paravirt() ? 1 : 8;
> int devfn;
> u32 l;
>
> if (pci_find_bus(0, busn))
> return;
>
> - for (devfn = 0; devfn < 256; devfn += 8) {
> + for (devfn = 0; devfn < 256; devfn += stride) {
> if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> l != 0x0000 && l != 0xffff) {
> DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
Shouldn't that take the situation into account where the MFD bit is set on
a regular devfn, i.e. (devfn % 8) == 0? In that case you'd scan the
subfunctions twice.
Thanks,
tglx
On 2018-02-28 09:44, Thomas Gleixner wrote:
> On Wed, 28 Feb 2018, Jan Kiszka wrote:
>
>> From: Jan Kiszka <[email protected]>
>>
>> Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
>> have a function 0. Therefore, Linux scans for devices at function 0
>> (devfn 0/8/16/...) and only scans for other functions if function 0
>> has its Multi-Function Device bit set or ARI or SR-IOV indicate
>> there are more functions.
>>
>> The Jailhouse hypervisor may pass individual functions of a
>> multi-function device to a guest without passing function 0, which
>> means a Linux guest won't find them.
>>
>> Change Linux PCI probing so it scans all function numbers when
>> running as a guest over Jailhouse.
>
>> void pcibios_scan_specific_bus(int busn)
>> {
>> + int stride = jailhouse_paravirt() ? 1 : 8;
>> int devfn;
>> u32 l;
>>
>> if (pci_find_bus(0, busn))
>> return;
>>
>> - for (devfn = 0; devfn < 256; devfn += 8) {
>> + for (devfn = 0; devfn < 256; devfn += stride) {
>> if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>> l != 0x0000 && l != 0xffff) {
>> DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
>
> Shouldn't that take the situation into account where the MFD bit is set on
> a regular devfn, i.e. (devfn % 8) == 0? In that case you'd scan the
> subfunctions twice.
Good point, and it also applies to pci_scan_child_bus_extend. Will add
some filters.
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
On Wed, Feb 28, 2018 at 8:34 AM, Jan Kiszka <[email protected]> wrote:
> From: Jan Kiszka <[email protected]>
>
> Not sure if those two worked by design or just by chance so far. In any
> case, it's at least cleaner and clearer to express this in a single
> config statement.
I would add a reference to the commit which brought that in the first place.
>
> Signed-off-by: Jan Kiszka <[email protected]>
> ---
> arch/x86/Kconfig | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index eb7f43f23521..63e85e7da12e 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2641,8 +2641,9 @@ config PCI_DIRECT
> depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
>
> config PCI_MMCONFIG
> - def_bool y
> - depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
> + bool "Support mmconfig PCI config space access" if X86_64
> + default y
> + depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
Looking to the above context I would rather put it like
depends on PCI && (ACPI || SFI) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))
--
With Best Regards,
Andy Shevchenko
On Wed, Feb 28, 2018 at 05:45:37PM +0200, Andy Shevchenko wrote:
> On Wed, Feb 28, 2018 at 8:34 AM, Jan Kiszka <[email protected]> wrote:
> > From: Jan Kiszka <[email protected]>
> >
> > Not sure if those two worked by design or just by chance so far. In any
> > case, it's at least cleaner and clearer to express this in a single
> > config statement.
It would be nice if this were a complete statement of what the patch
does, but without the subject, it's not. E.g., as I'm composing this
response in an editor window, I can't see the subject, so it seems
incomplete.
> > Signed-off-by: Jan Kiszka <[email protected]>
> > ---
> > arch/x86/Kconfig | 9 +++------
> > 1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index eb7f43f23521..63e85e7da12e 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2641,8 +2641,9 @@ config PCI_DIRECT
> > depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
> >
> > config PCI_MMCONFIG
> > - def_bool y
> > - depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
> > + bool "Support mmconfig PCI config space access" if X86_64
> > + default y
> > + depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
>
> Looking to the above context I would rather put it like
>
> depends on PCI && (ACPI || SFI) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))
The changelog doesn't point out any intended functional change, but I
think both these proposals add some new configs that previously could
not occur, e.g.,
CONFIG_X86_64=y
CONFIG_SFI=y
# CONFIG_ACPI is unset
CONFIG_PCI_MMCONFIG=y
If this is intended, the changelog should mention it.