2014-04-19 02:53:35

by Myron Stowe

[permalink] [raw]
Subject: [PATCH v2 0/5] x86/PCI: Add AMD hostbridge support for newer AMD systems

Currently the kernel only supports AMD hostbridges up to Family 11h.
This causes PCI numa_node information to be reported incorrectly
for newer Families with multiple sockets.

This patch series augments the logic to discover newer AMD hostbridge types
(Family 12h, 14h 15h and 16h).

Note:
* Patch 1 and 2 are functional changes for newer AMD Families
* Patch 3 is general clean up and restructuring
* Patch 4 warns when BIOS has not supplied an ACPI _PXM method for a
hostbridge and we "guess" a hostbridge's node value
* Patch 5 removes the now redundant 'quirk_amd_nb_node'

-v2: This series was introduced by Suravee Suthikulanit but seems to have
stalled. I would like to get this moving again and eventually pulled
upstream so I've taken Suravee's series and -
o Rebased the original three patches to v3.15-rc1,
o Added a warn if we "guess" a hostbridge's node value (patch 4),
o I believe with the additional Family support that
'quirk_amd_nb_node' is now redundant so I added patch 5 which
removes it.

Reference:
https://bugzilla.kernel.org/show_bug.cgi?id=72051

Myron Stowe (2):
PCI: Remove redundant 'quirk_amd_nb_node'
ACPI/PCI: Warn if we have to "guess" host bridge node information

Suravee Suthikulpanit (3):
x86/PCI: Miscellaneous code clean up for early_fillup_mp_bus_info
x86/PCI: Support additional MMIO range capabilities
x86/PCI: Add support for generic AMD hostbridges


arch/x86/kernel/quirks.c | 58 -------
arch/x86/pci/acpi.c | 6 +
arch/x86/pci/amd_bus.c | 409 ++++++++++++++++++++++++++++++----------------
3 files changed, 271 insertions(+), 202 deletions(-)

--


2014-04-19 02:53:43

by Myron Stowe

[permalink] [raw]
Subject: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

From: Suravee Suthikulpanit <[email protected]>

This patch adds supports for additional MMIO ranges (16 ranges). Also,
each MMIO base/limit can now support up to 48-bit MMIO addresses.
However, this requires initializing the ECS sooner since the new registers
are in the ECS ranges.

This applies for AMD Family 15h and later.

Reference: Advanced Micro Devices (AMD), "BIOS and Kernel Developer's
Guide (BKDG) for AMD Family 15h Models 00h-0fh Processors." Section 3.4
Device [1F:18]h Function 1 Configuration Registers;
D18F1x[1CC:180,BC:80] MMIO Base/Limit,
D18F1x[D8,D0,C8,C0] IO-Space Base,
D18F1x[DC,D4,CC,C4] IO-Space Limit,
42301 Rev 3.12 - October 11, 2012.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Myron Stowe <[email protected]>
Tested-by: Aravind Gopalakrishnan <[email protected]>
---

arch/x86/pci/amd_bus.c | 126 +++++++++++++++++++++++++++++++++++++++---------
1 files changed, 103 insertions(+), 23 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index c8cd75c..1b2895e 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -13,11 +13,30 @@

#define AMD_NB_F0_NODE_ID 0x60
#define AMD_NB_F0_UNIT_ID 0x64
+#define AMD_NB_F1_MMIO_BASE_REG 0x80
+#define AMD_NB_F1_MMIO_LIMIT_REG 0x84
+#define AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG 0x180
+#define AMD_NB_F1_IO_BASE_ADDR_REG 0xc0
+#define AMD_NB_F1_IO_LIMIT_ADDR_REG 0xc4
#define AMD_NB_F1_CONFIG_MAP_REG 0xe0

#define RANGE_NUM 16
+#define AMD_NB_F1_MMIO_RANGES 16
+#define AMD_NB_F1_IOPORT_RANGES 4
#define AMD_NB_F1_CONFIG_MAP_RANGES 4

+#define AMD_PCIE_CF8(bus, dev, fn, reg) \
+ (0x80000000 | \
+ ((reg & 0xF00) << 16) | \
+ ((bus & 0xF) << 16) | \
+ ((dev & 0x1F) << 11) | \
+ ((fn & 0x7) << 8) | \
+ ((reg & 0xFC)))
+
+static bool amd_early_ecs_enabled;
+
+static int __init pci_io_ecs_init(u8 bus, u8 slot);
+
/*
* This discovers the pcibus <-> node mapping on AMD K8, Family 10h, 12h,
* 14h, 15h, and 16h processor based systems. This also gets peer root bus
@@ -39,6 +58,20 @@ static struct amd_hostbridge hb_probes[] __initdata = {
{ 0xff, 0 , PCI_DEVICE_ID_AMD_10H_NB_HT },
};

+/* This version of read_pci_config allows reading registers in the ECS area */
+static inline int _amd_read_pci_config(u8 bus, u8 slot, u8 fn, u32 offset)
+{
+ u32 value;
+
+ if ((!amd_early_ecs_enabled) && (offset > 0xFF))
+ return -1;
+
+ outl(AMD_PCIE_CF8(bus, slot, fn, offset), 0xcf8);
+ value = inl(0xcfc);
+
+ return value;
+}
+
static struct pci_root_info __init *find_pci_root_info(int node, int link)
{
struct pci_root_info *info;
@@ -48,6 +81,9 @@ static struct pci_root_info __init *find_pci_root_info(int node, int link)
if (info->node == node && info->link == link)
return info;

+ pr_warn("AMD-Bus: WARNING: Failed to find root info for node %#x, link %#x\n",
+ node, link);
+
return NULL;
}

@@ -118,6 +154,12 @@ static int __init early_fill_mp_bus_info(void)

pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);

+ /* We enable ECS mode prior to probing MMIO as MMIO related registers
+ * are in the ECS area.
+ */
+ pci_io_ecs_init(bus, slot);
+
+ found = false;
for (i = 0; i < AMD_NB_F1_CONFIG_MAP_RANGES; i++) {
int min_bus;
int max_bus;
@@ -128,6 +170,7 @@ static int __init early_fill_mp_bus_info(void)
if ((reg & 7) != 3)
continue;

+ found = true;
min_bus = (reg >> 16) & 0xff;
max_bus = (reg >> 24) & 0xff;
node = (reg >> 4) & 0x07;
@@ -136,6 +179,13 @@ static int __init early_fill_mp_bus_info(void)
info = alloc_pci_root_info(min_bus, max_bus, node, link);
}

+ if (!found) {
+ /* In case there is no AMDNB_F1_CONFIG_MAP_REGs,
+ * we just use default to bus 0, node 0 link 0)
+ */
+ info = alloc_pci_root_info(0, 0, 0, 0);
+ }
+
/* get the default node and link for left over res */
reg = read_pci_config(bus, slot, 0, AMD_NB_F0_NODE_ID);
def_node = (reg >> 8) & 0x07;
@@ -144,14 +194,17 @@ static int __init early_fill_mp_bus_info(void)

memset(range, 0, sizeof(range));
add_range(range, RANGE_NUM, 0, 0, 0xffff + 1);
+
/* io port resource */
- for (i = 0; i < 4; i++) {
- reg = read_pci_config(bus, slot, 1, 0xc0 + (i << 3));
+ for (i = 0; i < AMD_NB_F1_IOPORT_RANGES; i++) {
+ reg = read_pci_config(bus, slot, 1,
+ AMD_NB_F1_IO_BASE_ADDR_REG + (i << 3));
if (!(reg & 3))
continue;

start = reg & 0xfff000;
- reg = read_pci_config(bus, slot, 1, 0xc4 + (i << 3));
+ reg = read_pci_config(bus, slot, 1,
+ AMD_NB_F1_IO_LIMIT_ADDR_REG + (i << 3));
node = reg & 0x07;
link = (reg >> 4) & 0x03;
end = (reg & 0xfff000) | 0xfff;
@@ -169,6 +222,7 @@ static int __init early_fill_mp_bus_info(void)
update_res(info, start, end, IORESOURCE_IO, 1);
subtract_range(range, RANGE_NUM, start, end + 1);
}
+
/* add left over io port range to def node/link, [0, 0xffff] */
/* find the position */
info = find_pci_root_info(def_node, def_link);
@@ -211,22 +265,46 @@ static int __init early_fill_mp_bus_info(void)
}

/* mmio resource */
- for (i = 0; i < 8; i++) {
- reg = read_pci_config(bus, slot, 1, 0x80 + (i << 3));
+ for (i = 0; i < AMD_NB_F1_MMIO_RANGES; i++) {
+ u64 tmp;
+ u32 base = AMD_NB_F1_MMIO_BASE_REG + (i << 3);
+ u32 limit = AMD_NB_F1_MMIO_LIMIT_REG + (i << 3);
+ u32 base_limit_hi = AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG + (i << 2);
+
+ if (i >= 12) {
+ /* Range 12 base/limit starts at 0x2A0 */
+ base += 0x1c0;
+ limit += 0x1c0;
+ /* Range 12 base/limit hi starts at 0x2C0 */
+ base_limit_hi += 0x110;
+ } else if (i >= 8) {
+ /* Range 8 base/limit starts at 0x1A0 */
+ base += 0xe0;
+ limit += 0xe0;
+ /* Range 12 base/limit hi starts at 0x1C0 */
+ base_limit_hi += 0x20;
+ }
+
+ /* Base lo */
+ reg = _amd_read_pci_config(bus, slot, 1, base);
if (!(reg & 3))
continue;

- start = reg & 0xffffff00; /* 39:16 on 31:8*/
- start <<= 8;
- reg = read_pci_config(bus, slot, 1, 0x84 + (i << 3));
+ start = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */
+
+ /* Limit lo */
+ reg = _amd_read_pci_config(bus, slot, 1, limit);
node = reg & 0x07;
link = (reg >> 4) & 0x03;
- end = (reg & 0xffffff00);
- end <<= 8;
- end |= 0xffff;
+ end = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */
+ end |= 0xffffUL;

- info = find_pci_root_info(node, link);
+ /* Base/Limit hi */
+ tmp = _amd_read_pci_config(bus, slot, 1, base_limit_hi);
+ start |= ((tmp & 0xffUL) << 40);
+ end |= ((tmp & (0xffUL << 16)) << 24);

+ info = find_pci_root_info(node, link);
if (!info)
continue;

@@ -354,20 +432,24 @@ static struct notifier_block amd_cpu_notifier = {
.notifier_call = amd_cpu_notify,
};

-static void __init pci_enable_pci_io_ecs(void)
+static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
{
#ifdef CONFIG_AMD_NB
unsigned int i, n;
+ u8 limit;

for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
- u8 bus = amd_nb_bus_dev_ranges[i].bus;
- u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
- u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
+ /* Try matching for the bus range */
+ if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
+ (slot != amd_nb_bus_dev_ranges[i].dev_base))
+ continue;
+
+ limit = amd_nb_bus_dev_ranges[i].dev_limit;

+ /* Setup all northbridges within the range */
for (; slot < limit; ++slot) {
u32 val = read_pci_config(bus, slot, 3, 0);
-
- if (!early_is_amd_nb(val))
+ if (!val)
continue;

val = read_pci_config(bus, slot, 3, 0x8c);
@@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
val |= ENABLE_CF8_EXT_CFG >> 32;
write_pci_config(bus, slot, 3, 0x8c, val);
}
+ amd_early_ecs_enabled = true;
++n;
}
}
#endif
}

-static int __init pci_io_ecs_init(void)
+static int __init pci_io_ecs_init(u8 bus, u8 slot)
{
int cpu;

@@ -389,9 +472,7 @@ static int __init pci_io_ecs_init(void)
if (boot_cpu_data.x86 < 0x10)
return 0;

- /* Try the PCI method first. */
- if (early_pci_allowed())
- pci_enable_pci_io_ecs();
+ pci_enable_pci_io_ecs(bus, slot);

cpu_notifier_register_begin();
for_each_online_cpu(cpu)
@@ -411,7 +492,6 @@ static int __init amd_postcore_init(void)
return 0;

early_fill_mp_bus_info();
- pci_io_ecs_init();

return 0;
}

2014-04-19 02:53:54

by Myron Stowe

[permalink] [raw]
Subject: [PATCH v2 3/5] x86/PCI: Miscellaneous code clean up for early_fillup_mp_bus_info

From: Suravee Suthikulpanit <[email protected]>

* Refactoring of the early_fill_mp_bus_info function into multiple helper
functions since it is getting long, and difficult to follow.

* Merge early_fill_mp_bus_info into amd_postcore_init as there is no need to
have this as a separate function.

* Use pr_xxx instead of printk

* Prepend "AMD-Bus" for each print.

* The current code is using "fam10h_mmconf_*" in several place. As this
file is no longer specific to family10h systems, this patch changes
such occurrances to "amd_mmconf_*" instead for clarity.

No functional changes.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Myron Stowe <[email protected]>
Tested-by: Aravind Gopalakrishnan <[email protected]>
---

arch/x86/pci/amd_bus.c | 284 ++++++++++++++++++++++++++----------------------
1 files changed, 152 insertions(+), 132 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index 1b2895e..c15add6 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -35,8 +35,6 @@

static bool amd_early_ecs_enabled;

-static int __init pci_io_ecs_init(u8 bus, u8 slot);
-
/*
* This discovers the pcibus <-> node mapping on AMD K8, Family 10h, 12h,
* 14h, 15h, and 16h processor based systems. This also gets peer root bus
@@ -87,86 +85,51 @@ static struct pci_root_info __init *find_pci_root_info(int node, int link)
return NULL;
}

-/**
- * early_fill_mp_bus_to_node()
- * called before pcibios_scan_root and pci_scan_bus
- * fills the mp_bus_to_cpumask array based according to the LDT Bus Number
- * Registers found in the K8 northbridge
- */
-static int __init early_fill_mp_bus_info(void)
+static struct amd_hostbridge * __init probe_pci_hostbridge(void)
{
int i;
- unsigned bus;
- unsigned slot;
- int node;
- int link;
- int def_node;
- int def_link;
- struct pci_root_info *info;
- u32 reg;
- u64 start;
- u64 end;
- struct range range[RANGE_NUM];
- u64 val;
- u32 address;
- bool found;
- struct resource fam10h_mmconf_res, *fam10h_mmconf;
- u64 fam10h_mmconf_start;
- u64 fam10h_mmconf_end;
-
- if (!early_pci_allowed())
- return -1;
+ struct amd_hostbridge *hb = NULL;

- found = false;
for (i = 0; i < ARRAY_SIZE(hb_probes); i++) {
u32 id;
u16 device;
u16 vendor;
u32 class;

- bus = hb_probes[i].bus;
- slot = hb_probes[i].slot;
- id = read_pci_config(bus, slot, 0, PCI_VENDOR_ID);
+ hb = &hb_probes[i];
+ id = read_pci_config(hb->bus, hb->slot, 0, PCI_VENDOR_ID);
vendor = id & 0xffff;
device = (id>>16) & 0xffff;
- class = read_pci_config(bus, slot, 0,
+ class = read_pci_config(hb->bus, hb->slot, 0,
PCI_CLASS_REVISION) >> 16;

if (PCI_VENDOR_ID_AMD == vendor) {
- if (hb_probes[i].device == device) {
- found = true;
+ if (hb->device == device)
break;
- }

- if ((hb_probes[i].device == PCI_ANY_ID) &&
+ if ((hb->device == PCI_ANY_ID) &&
(class == PCI_CLASS_BRIDGE_HOST)) {
- hb_probes[i].device = device;
- found = true;
+ hb->device = device;
break;
}
}
+ hb = NULL;
}

- if (!found) {
- pr_warn("AMD hostbridge not found\n");
- return 0;
- }
-
- pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);
+ return hb;
+}

- /* We enable ECS mode prior to probing MMIO as MMIO related registers
- * are in the ECS area.
- */
- pci_io_ecs_init(bus, slot);
+static int __init setup_pci_root_info(struct amd_hostbridge *hb)
+{
+ int i, min_bus, max_bus;
+ int node = 0, link = 0;
+ bool found = false;

- found = false;
for (i = 0; i < AMD_NB_F1_CONFIG_MAP_RANGES; i++) {
- int min_bus;
- int max_bus;
- reg = read_pci_config(bus, slot, 1,
- AMD_NB_F1_CONFIG_MAP_REG + (i << 2));
+ int offset = AMD_NB_F1_CONFIG_MAP_REG + (i << 2);
+ u32 reg = read_pci_config(hb->bus, hb->slot, 1, offset);

- /* Check if that register is enabled for bus range */
+ /* Check if register is enabled for bus range */
if ((reg & 7) != 3)
continue;

@@ -176,34 +139,41 @@ static int __init early_fill_mp_bus_info(void)
node = (reg >> 4) & 0x07;
link = (reg >> 8) & 0x03;

- info = alloc_pci_root_info(min_bus, max_bus, node, link);
+ if (!alloc_pci_root_info(min_bus, max_bus, node, link))
+ return -ENOMEM;
}

if (!found) {
- /* In case there is no AMDNB_F1_CONFIG_MAP_REGs,
- * we just use default to bus 0, node 0 link 0)
+ /* In case there are no AMDNB_F1_CONFIG_MAP_REGs, default
+ * to bus 0, node 0 link 0.
*/
- info = alloc_pci_root_info(0, 0, 0, 0);
+ if (!alloc_pci_root_info(0, 0, 0, 0))
+ return -ENOMEM;
}

- /* get the default node and link for left over res */
- reg = read_pci_config(bus, slot, 0, AMD_NB_F0_NODE_ID);
- def_node = (reg >> 8) & 0x07;
- reg = read_pci_config(bus, slot, 0, AMD_NB_F0_UNIT_ID);
- def_link = (reg >> 8) & 0x03;
+ return 0;
+}
+
+static void __init probe_ioport_resource(struct amd_hostbridge *hb)
+{
+ int i, node, link;
+ u64 start, end;
+ u32 reg;
+ struct pci_root_info *info;
+ struct range range[RANGE_NUM];

memset(range, 0, sizeof(range));
add_range(range, RANGE_NUM, 0, 0, 0xffff + 1);

- /* io port resource */
+ /* I/O port resource */
for (i = 0; i < AMD_NB_F1_IOPORT_RANGES; i++) {
- reg = read_pci_config(bus, slot, 1,
+ reg = read_pci_config(hb->bus, hb->slot, 1,
AMD_NB_F1_IO_BASE_ADDR_REG + (i << 3));
if (!(reg & 3))
continue;

start = reg & 0xfff000;
- reg = read_pci_config(bus, slot, 1,
+ reg = read_pci_config(hb->bus, hb->slot, 1,
AMD_NB_F1_IO_LIMIT_ADDR_REG + (i << 3));
node = reg & 0x07;
link = (reg >> 4) & 0x03;
@@ -213,19 +183,23 @@ static int __init early_fill_mp_bus_info(void)
if (!info)
continue; /* not found */

- printk(KERN_DEBUG "node %d link %d: io port [%llx, %llx]\n",
- node, link, start, end);
+ pr_debug("AMD-Bus: index %d node %d link %d: io port [%llx, %llx]\n",
+ i, node, link, start, end);

- /* kernel only handle 16 bit only */
+ /* Kernel handles 16 bit only */
if (end > 0xffff)
end = 0xffff;
update_res(info, start, end, IORESOURCE_IO, 1);
subtract_range(range, RANGE_NUM, start, end + 1);
}

- /* add left over io port range to def node/link, [0, 0xffff] */
- /* find the position */
- info = find_pci_root_info(def_node, def_link);
+ /* Add left over I/O port range to def node/link, [0, 0xffff] */
+ /* Find the position */
+ reg = read_pci_config(hb->bus, hb->slot, 0, AMD_NB_F0_NODE_ID);
+ node = (reg >> 8) & 0x07;
+ reg = read_pci_config(hb->bus, hb->slot, 0, AMD_NB_F0_UNIT_ID);
+ link = (reg >> 8) & 0x03;
+ info = find_pci_root_info(node, link);
if (info) {
for (i = 0; i < RANGE_NUM; i++) {
if (!range[i].end)
@@ -235,36 +209,47 @@ static int __init early_fill_mp_bus_info(void)
IORESOURCE_IO, 1);
}
}
+}
+
+static void __init probe_mmio_resource(struct amd_hostbridge *hb)
+{
+ int i, node, link;
+ u32 address, reg;
+ u64 start, end, val, amd_mmconf_start, amd_mmconf_end;
+ struct pci_root_info *info;
+ struct resource amd_mmconf_res, *amd_mmconf;
+ struct range range[RANGE_NUM];

memset(range, 0, sizeof(range));
+
/* 0xfd00000000-0xffffffffff for HT */
end = cap_resource((0xfdULL<<32) - 1);
end++;
add_range(range, RANGE_NUM, 0, 0, end);

- /* need to take out [0, TOM) for RAM*/
+ /* Need to take out [0, TOM] for RAM */
address = MSR_K8_TOP_MEM1;
rdmsrl(address, val);
end = (val & 0xffffff800000ULL);
- printk(KERN_INFO "TOM: %016llx aka %lldM\n", end, end>>20);
+ pr_info("AMD-Bus: TOM: %016llx aka %lldM\n", end, end>>20);
if (end < (1ULL<<32))
subtract_range(range, RANGE_NUM, 0, end);

- /* get mmconfig */
- fam10h_mmconf = amd_get_mmconfig_range(&fam10h_mmconf_res);
- /* need to take out mmconf range */
- if (fam10h_mmconf) {
- printk(KERN_DEBUG "Fam 10h mmconf %pR\n", fam10h_mmconf);
- fam10h_mmconf_start = fam10h_mmconf->start;
- fam10h_mmconf_end = fam10h_mmconf->end;
- subtract_range(range, RANGE_NUM, fam10h_mmconf_start,
- fam10h_mmconf_end + 1);
+ /* Get mmconfig */
+ amd_mmconf = amd_get_mmconfig_range(&amd_mmconf_res);
+ /* Need to take out MMCONF range */
+ if (amd_mmconf) {
+ pr_debug("AMD-Bus: mmconf %pR\n", amd_mmconf);
+ amd_mmconf_start = amd_mmconf->start;
+ amd_mmconf_end = amd_mmconf->end;
+ subtract_range(range, RANGE_NUM, amd_mmconf_start,
+ amd_mmconf_end + 1);
} else {
- fam10h_mmconf_start = 0;
- fam10h_mmconf_end = 0;
+ amd_mmconf_start = 0;
+ amd_mmconf_end = 0;
}

- /* mmio resource */
+ /* Probing MMIO base/limit regs */
for (i = 0; i < AMD_NB_F1_MMIO_RANGES; i++) {
u64 tmp;
u32 base = AMD_NB_F1_MMIO_BASE_REG + (i << 3);
@@ -286,21 +271,21 @@ static int __init early_fill_mp_bus_info(void)
}

/* Base lo */
- reg = _amd_read_pci_config(bus, slot, 1, base);
+ reg = _amd_read_pci_config(hb->bus, hb->slot, 1, base);
if (!(reg & 3))
continue;

start = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */

/* Limit lo */
- reg = _amd_read_pci_config(bus, slot, 1, limit);
+ reg = _amd_read_pci_config(hb->bus, hb->slot, 1, limit);
node = reg & 0x07;
link = (reg >> 4) & 0x03;
end = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */
end |= 0xffffUL;

/* Base/Limit hi */
- tmp = _amd_read_pci_config(bus, slot, 1, base_limit_hi);
+ tmp = _amd_read_pci_config(hb->bus, hb->slot, 1, base_limit_hi);
start |= ((tmp & 0xffUL) << 40);
end |= ((tmp & (0xffUL << 16)) << 24);

@@ -308,36 +293,38 @@ static int __init early_fill_mp_bus_info(void)
if (!info)
continue;

- printk(KERN_DEBUG "node %d link %d: mmio [%llx, %llx]",
- node, link, start, end);
+ pr_debug("AMD-Bus: index %d node %d link %d: mmio [%llx, %llx]",
+ i, node, link, start, end);
+
/*
- * some sick allocation would have range overlap with fam10h
- * mmconf range, so need to update start and end.
+ * Some (sick) allocations have range overlapping with MMCONF
+ * range, so need to update start and end.
*/
- if (fam10h_mmconf_end) {
+ if (amd_mmconf_end) {
int changed = 0;
u64 endx = 0;
- if (start >= fam10h_mmconf_start &&
- start <= fam10h_mmconf_end) {
- start = fam10h_mmconf_end + 1;
+ if (start >= amd_mmconf_start &&
+ start <= amd_mmconf_end) {
+ start = amd_mmconf_end + 1;
changed = 1;
}

- if (end >= fam10h_mmconf_start &&
- end <= fam10h_mmconf_end) {
- end = fam10h_mmconf_start - 1;
+ if (end >= amd_mmconf_start &&
+ end <= amd_mmconf_end) {
+ end = amd_mmconf_start - 1;
changed = 1;
}

- if (start < fam10h_mmconf_start &&
- end > fam10h_mmconf_end) {
- /* we got a hole */
- endx = fam10h_mmconf_start - 1;
- update_res(info, start, endx, IORESOURCE_MEM, 0);
+ if (start < amd_mmconf_start &&
+ end > amd_mmconf_end) {
+ /* We got a hole */
+ endx = amd_mmconf_start - 1;
+ update_res(info, start, endx,
+ IORESOURCE_MEM, 0);
subtract_range(range, RANGE_NUM, start,
endx + 1);
printk(KERN_CONT " ==> [%llx, %llx]", start, endx);
- start = fam10h_mmconf_end + 1;
+ start = amd_mmconf_end + 1;
changed = 1;
}
if (changed) {
@@ -356,25 +343,30 @@ static int __init early_fill_mp_bus_info(void)
printk(KERN_CONT "\n");
}

- /* need to take out [4G, TOM2) for RAM*/
+ /* Need to take out [4G, TOM2] for RAM */
/* SYS_CFG */
address = MSR_K8_SYSCFG;
rdmsrl(address, val);
- /* TOP_MEM2 is enabled? */
+ /* Is TOP_MEM2 enabled? */
if (val & (1<<21)) {
/* TOP_MEM2 */
address = MSR_K8_TOP_MEM2;
rdmsrl(address, val);
end = (val & 0xffffff800000ULL);
- printk(KERN_INFO "TOM2: %016llx aka %lldM\n", end, end>>20);
+ pr_info("AMD-Bus: TOM2: %016llx aka %lldM\n",
+ end, end>>20);
subtract_range(range, RANGE_NUM, 1ULL<<32, end);
}

/*
- * add left over mmio range to def node/link ?
- * that is tricky, just record range in from start_min to 4G
+ * Add left over MMIO range to def node/link?
+ * That is tricky, just record range from start_min to 4G.
*/
- info = find_pci_root_info(def_node, def_link);
+ reg = read_pci_config(hb->bus, hb->slot, 0, AMD_NB_F0_NODE_ID);
+ node = (reg >> 8) & 0x07;
+ reg = read_pci_config(hb->bus, hb->slot, 0, AMD_NB_F0_UNIT_ID);
+ link = (reg >> 8) & 0x03;
+ info = find_pci_root_info(node, link);
if (info) {
for (i = 0; i < RANGE_NUM; i++) {
if (!range[i].end)
@@ -385,20 +377,6 @@ static int __init early_fill_mp_bus_info(void)
IORESOURCE_MEM, 1);
}
}
-
- list_for_each_entry(info, &pci_root_infos, list) {
- int busnum;
- struct pci_root_res *root_res;
-
- busnum = info->busn.start;
- printk(KERN_DEBUG "bus: %pR on node %x link %x\n",
- &info->busn, info->node, info->link);
- list_for_each_entry(root_res, &info->resources, list)
- printk(KERN_DEBUG "bus: %02x %pR\n",
- busnum, &root_res->res);
- }
-
- return 0;
}

#define ENABLE_CF8_EXT_CFG (1ULL << 46)
@@ -464,15 +442,15 @@ static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
#endif
}

-static int __init pci_io_ecs_init(u8 bus, u8 slot)
+static int __init pci_io_ecs_init(struct amd_hostbridge *hb)
{
int cpu;

- /* assume all cpus from fam10h have IO ECS */
+ /* Assume all CPUs from Family 10h onward have IO ECS */
if (boot_cpu_data.x86 < 0x10)
return 0;

- pci_enable_pci_io_ecs(bus, slot);
+ pci_enable_pci_io_ecs(hb->bus, hb->slot);

cpu_notifier_register_begin();
for_each_online_cpu(cpu)
@@ -486,12 +464,54 @@ static int __init pci_io_ecs_init(u8 bus, u8 slot)
return 0;
}

+/**
+ * amd_postcore_init is called before pcibios_scan_root and pci_scan_bus.
+ */
static int __init amd_postcore_init(void)
{
+ struct pci_root_info *info;
+ struct amd_hostbridge *hb;
+
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
return 0;

- early_fill_mp_bus_info();
+ if (!early_pci_allowed())
+ return -1;
+
+ hb = probe_pci_hostbridge();
+ if (!hb) {
+ pr_warn("AMD-Bus: AMD hostbridge not found\n");
+ return -1;
+ }
+
+ pr_debug("AMD-Bus: Found hostbridge %x:%x.0 (device ID = %#x)\n",
+ hb->bus, hb->slot, hb->device);
+
+ /* Enable Extended Configuration Space (ECS) mode prior to probing
+ * MMIO since the MMIO related registers are in the ECS area.
+ */
+ pci_io_ecs_init(hb);
+
+ if (setup_pci_root_info(hb) != 0) {
+ pr_warn("AMD-Bus: Failed to setup pci root info.\n");
+ return -1;
+ }
+
+ probe_ioport_resource(hb);
+
+ probe_mmio_resource(hb);
+
+ list_for_each_entry(info, &pci_root_infos, list) {
+ int busnum;
+ struct pci_root_res *root_res;
+
+ busnum = info->busn.start;
+ pr_debug("AMD-Bus: bus: %pR on node %x link %x\n",
+ &info->busn, info->node, info->link);
+ list_for_each_entry(root_res, &info->resources, list)
+ pr_debug("AMD-Bus: bus: %02x %pR\n",
+ busnum, &root_res->res);
+ }

return 0;
}

2014-04-19 02:54:11

by Myron Stowe

[permalink] [raw]
Subject: [PATCH v2 5/5] PCI: Remove redundant 'quirk_amd_nb_node'

With the amd_bus.c updates to support additional AMD processors (11h, 12h,
14h 15h and 16h) 'quirk_amd_nb_node' seems to be redundant. This patch
removes it.

Signed-off-by: Myron Stowe <[email protected]>
---

arch/x86/kernel/quirks.c | 58 ----------------------------------------------
1 files changed, 0 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index ff898bb..abbd6bc 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -514,64 +514,6 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,

#endif

-#if defined(CONFIG_PCI) && defined(CONFIG_NUMA)
-/* Set correct numa_node information for AMD NB functions */
-static void quirk_amd_nb_node(struct pci_dev *dev)
-{
- struct pci_dev *nb_ht;
- unsigned int devfn;
- u32 node;
- u32 val;
-
- devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
- nb_ht = pci_get_slot(dev->bus, devfn);
- if (!nb_ht)
- return;
-
- pci_read_config_dword(nb_ht, 0x60, &val);
- node = pcibus_to_node(dev->bus) | (val & 7);
- /*
- * Some hardware may return an invalid node ID,
- * so check it first:
- */
- if (node_online(node))
- set_dev_node(&dev->dev, node);
- pci_dev_put(nb_ht);
-}
-
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_MISC,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_HT,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_MAP,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_DRAM,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_MISC,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_LINK,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F0,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F1,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F2,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F3,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F4,
- quirk_amd_nb_node);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F5,
- quirk_amd_nb_node);
-
-#endif
-
#ifdef CONFIG_PCI
/*
* Processor does not ensure DRAM scrub read/write sequence

2014-04-19 02:54:19

by Myron Stowe

[permalink] [raw]
Subject: [PATCH v2 4/5] ACPI/PCI: Warn if we have to "guess" host bridge node information

The vast majority of platforms are not supplying ACPI _PXM (proximity)
information corresponding to host bridge (PNP0A03/PNP0A08) devices
resulting in sysfs "numa_node" values of -1 (NUMA_NO_NODE) [1]:
# for i in /sys/devices/pci0000\:00/*/numa_node; do cat $i; done | uniq
-1

# find /sys/ -name "numa_node" | while read fname; do cat $fname; \
done | uniq
-1

AMD based platforms provide a fall-back for this situation via amd_bus.c.
These platforms snoop out the information by directly reading specific
registers from the Northbridge and caching them via 'alloc_pci_root_info'.

Later during boot processing when host bridges are discovered -
'pci_acpi_scan_root' - the kernel looks for their corresponding ACPI _PXM
method - drivers/acpi/numa.c::acpi_get_node(). If the BIOS supplied a
_PXM method then that node (proximity) value is associated. If the BIOS
did not supply a _PXM method *and* the platform is AMD based, the
fall-back cached values obtained directly from the Northbridge are used;
otherwise, "NUMA_NO_NODE" is associated.

There are a number of issues with this fall-back mechanism the most
notable being that amd_bus.c extracts a 3-bit number from a CPU register
and uses it as the node number. The node numbers used by Linux are
logical and there's no reason they need to be identical to settings in the
CPU registers. So if we have some node information obtained in the normal
way (from _PXM, SLIT, SRAT, etc.) and some from amd_bus.c, there's no
reason to believe they will be compatible.

This patch warns when this situation occurs:
pci_root PNP0A08:00: [Firmware Bug]: No _PXM; guessing node number 0

[1] https://bugzilla.kernel.org/show_bug.cgi?id=72051

Signed-off-by: Myron Stowe <[email protected]>
---

arch/x86/pci/acpi.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 01edac6..80c09ba 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -489,8 +489,12 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
}

node = acpi_get_node(device->handle);
- if (node == NUMA_NO_NODE)
+ if (node == NUMA_NO_NODE) {
node = x86_pci_root_bus_node(busnum);
+ if (node != NUMA_NO_NODE)
+ dev_info(&device->dev, FW_BUG "No _PXM; guessing node number %x\n",
+ node);
+ }

if (node != NUMA_NO_NODE && !node_online(node))
node = NUMA_NO_NODE;

2014-04-19 02:53:40

by Myron Stowe

[permalink] [raw]
Subject: [PATCH v2 1/5] x86/PCI: Add support for generic AMD hostbridges

From: Suravee Suthikulpanit <[email protected]>

AMD hostbridges gnenerally show up as PCI device 0:18.0. This patch adds
logic to automatically probe the device at this location and check PCI
device class code.

This patch supports platforms with the following AMD hostbridge types:
(K8, family10h, 11h, 12h, 14h 15h and 16h processors).

Reference(s):
https://bugzilla.kernel.org/show_bug.cgi?id=72051
Advanced Micro Devices (AMD), "BIOS and Kernel Developer's
Guide (BKDG) for AMD Family 15h Models 00h-0fh Processors." Section
3.4 Device [1F:18]h Function 1 Configuration Registers; D18F1x[EC:E0]
Configuration Map, 42301 Rev 3.12 - October 11, 2012.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Myron Stowe <[email protected]>
Tested-by: Aravind Gopalakrishnan <[email protected]>
---

arch/x86/pci/amd_bus.c | 71 ++++++++++++++++++++++++++++++++----------------
1 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index e88f4c5..c8cd75c 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -11,27 +11,34 @@

#include "bus_numa.h"

+#define AMD_NB_F0_NODE_ID 0x60
+#define AMD_NB_F0_UNIT_ID 0x64
+#define AMD_NB_F1_CONFIG_MAP_REG 0xe0
+
+#define RANGE_NUM 16
+#define AMD_NB_F1_CONFIG_MAP_RANGES 4
+
/*
- * This discovers the pcibus <-> node mapping on AMD K8.
- * also get peer root bus resource for io,mmio
+ * This discovers the pcibus <-> node mapping on AMD K8, Family 10h, 12h,
+ * 14h, 15h, and 16h processor based systems. This also gets peer root bus
+ * resources corresponding to I/O and MMIO addresses.
*/

-struct pci_hostbridge_probe {
+struct amd_hostbridge {
u32 bus;
u32 slot;
- u32 vendor;
u32 device;
};

-static struct pci_hostbridge_probe pci_probes[] __initdata = {
- { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1100 },
- { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1200 },
- { 0xff, 0, PCI_VENDOR_ID_AMD, 0x1200 },
- { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1300 },
+static struct amd_hostbridge hb_probes[] __initdata = {
+ /* Standard AMD Hostbriges are at bus 0x0 slot 0x18.
+ * In case of PCI_ANY_ID, the logic will try matching
+ * PCI_CLASS_BRIDGE_HOST instead.
+ */
+ { 0x0 , 0x18, PCI_ANY_ID },
+ { 0xff, 0 , PCI_DEVICE_ID_AMD_10H_NB_HT },
};

-#define RANGE_NUM 16
-
static struct pci_root_info __init *find_pci_root_info(int node, int link)
{
struct pci_root_info *info;
@@ -75,31 +82,47 @@ static int __init early_fill_mp_bus_info(void)
return -1;

found = false;
- for (i = 0; i < ARRAY_SIZE(pci_probes); i++) {
+ for (i = 0; i < ARRAY_SIZE(hb_probes); i++) {
u32 id;
u16 device;
u16 vendor;
+ u32 class;

- bus = pci_probes[i].bus;
- slot = pci_probes[i].slot;
+ bus = hb_probes[i].bus;
+ slot = hb_probes[i].slot;
id = read_pci_config(bus, slot, 0, PCI_VENDOR_ID);
-
vendor = id & 0xffff;
device = (id>>16) & 0xffff;
- if (pci_probes[i].vendor == vendor &&
- pci_probes[i].device == device) {
- found = true;
- break;
+ class = read_pci_config(bus, slot, 0,
+ PCI_CLASS_REVISION) >> 16;
+
+ if (PCI_VENDOR_ID_AMD == vendor) {
+ if (hb_probes[i].device == device) {
+ found = true;
+ break;
+ }
+
+ if ((hb_probes[i].device == PCI_ANY_ID) &&
+ (class == PCI_CLASS_BRIDGE_HOST)) {
+ hb_probes[i].device = device;
+ found = true;
+ break;
+ }
}
}

- if (!found)
+ if (!found) {
+ pr_warn("AMD hostbridge not found\n");
return 0;
+ }

- for (i = 0; i < 4; i++) {
+ pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);
+
+ for (i = 0; i < AMD_NB_F1_CONFIG_MAP_RANGES; i++) {
int min_bus;
int max_bus;
- reg = read_pci_config(bus, slot, 1, 0xe0 + (i << 2));
+ reg = read_pci_config(bus, slot, 1,
+ AMD_NB_F1_CONFIG_MAP_REG + (i << 2));

/* Check if that register is enabled for bus range */
if ((reg & 7) != 3)
@@ -114,9 +137,9 @@ static int __init early_fill_mp_bus_info(void)
}

/* get the default node and link for left over res */
- reg = read_pci_config(bus, slot, 0, 0x60);
+ reg = read_pci_config(bus, slot, 0, AMD_NB_F0_NODE_ID);
def_node = (reg >> 8) & 0x07;
- reg = read_pci_config(bus, slot, 0, 0x64);
+ reg = read_pci_config(bus, slot, 0, AMD_NB_F0_UNIT_ID);
def_link = (reg >> 8) & 0x03;

memset(range, 0, sizeof(range));

2014-04-19 11:31:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] x86/PCI: Add support for generic AMD hostbridges

On Fri, Apr 18, 2014 at 08:53:16PM -0600, Myron Stowe wrote:
> From: Suravee Suthikulpanit <[email protected]>
>
> AMD hostbridges gnenerally show up as PCI device 0:18.0. This patch adds
> logic to automatically probe the device at this location and check PCI
> device class code.
>
> This patch supports platforms with the following AMD hostbridge types:
> (K8, family10h, 11h, 12h, 14h 15h and 16h processors).
>
> Reference(s):
> https://bugzilla.kernel.org/show_bug.cgi?id=72051
> Advanced Micro Devices (AMD), "BIOS and Kernel Developer's
> Guide (BKDG) for AMD Family 15h Models 00h-0fh Processors." Section
> 3.4 Device [1F:18]h Function 1 Configuration Registers; D18F1x[EC:E0]
> Configuration Map, 42301 Rev 3.12 - October 11, 2012.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> Signed-off-by: Myron Stowe <[email protected]>
> Tested-by: Aravind Gopalakrishnan <[email protected]>
> ---
>
> arch/x86/pci/amd_bus.c | 71 ++++++++++++++++++++++++++++++++----------------
> 1 files changed, 47 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
> index e88f4c5..c8cd75c 100644
> --- a/arch/x86/pci/amd_bus.c
> +++ b/arch/x86/pci/amd_bus.c
> @@ -11,27 +11,34 @@
>
> #include "bus_numa.h"
>
> +#define AMD_NB_F0_NODE_ID 0x60
> +#define AMD_NB_F0_UNIT_ID 0x64
> +#define AMD_NB_F1_CONFIG_MAP_REG 0xe0
> +
> +#define RANGE_NUM 16
> +#define AMD_NB_F1_CONFIG_MAP_RANGES 4
> +
> /*
> - * This discovers the pcibus <-> node mapping on AMD K8.
> - * also get peer root bus resource for io,mmio
> + * This discovers the pcibus <-> node mapping on AMD K8, Family 10h, 12h,
> + * 14h, 15h, and 16h processor based systems. This also gets peer root bus
> + * resources corresponding to I/O and MMIO addresses.
> */
>
> -struct pci_hostbridge_probe {
> +struct amd_hostbridge {
> u32 bus;
> u32 slot;
> - u32 vendor;
> u32 device;
> };
>
> -static struct pci_hostbridge_probe pci_probes[] __initdata = {
> - { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1100 },
> - { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1200 },
> - { 0xff, 0, PCI_VENDOR_ID_AMD, 0x1200 },
> - { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1300 },
> +static struct amd_hostbridge hb_probes[] __initdata = {
> + /* Standard AMD Hostbriges are at bus 0x0 slot 0x18.
> + * In case of PCI_ANY_ID, the logic will try matching
> + * PCI_CLASS_BRIDGE_HOST instead.
> + */

Standard kernel comment style:

/*
* Something interesting for the code below.
* This is the second comment line.
*/

> + { 0x0 , 0x18, PCI_ANY_ID },
> + { 0xff, 0 , PCI_DEVICE_ID_AMD_10H_NB_HT },
> };
>
> -#define RANGE_NUM 16
> -
> static struct pci_root_info __init *find_pci_root_info(int node, int link)
> {
> struct pci_root_info *info;
> @@ -75,31 +82,47 @@ static int __init early_fill_mp_bus_info(void)

Btw, this function is huuuge. It could use a splitup some day.

> return -1;
>
> found = false;
> - for (i = 0; i < ARRAY_SIZE(pci_probes); i++) {
> + for (i = 0; i < ARRAY_SIZE(hb_probes); i++) {
> u32 id;
> u16 device;
> u16 vendor;
> + u32 class;
>
> - bus = pci_probes[i].bus;
> - slot = pci_probes[i].slot;
> + bus = hb_probes[i].bus;
> + slot = hb_probes[i].slot;
> id = read_pci_config(bus, slot, 0, PCI_VENDOR_ID);
> -
> vendor = id & 0xffff;
> device = (id>>16) & 0xffff;
> - if (pci_probes[i].vendor == vendor &&
> - pci_probes[i].device == device) {
> - found = true;
> - break;
> + class = read_pci_config(bus, slot, 0,
> + PCI_CLASS_REVISION) >> 16;
> +
> + if (PCI_VENDOR_ID_AMD == vendor) {

Flip comparison:

if (vendor == PCI_VENDOR_ID_AMD)

and then save yourself another indentation level:

if (vendor != PCI_VENDOR_ID_AMD)
continue;

> + if (hb_probes[i].device == device) {
> + found = true;
> + break;
> + }
> +
> + if ((hb_probes[i].device == PCI_ANY_ID) &&
> + (class == PCI_CLASS_BRIDGE_HOST)) {
> + hb_probes[i].device = device;
> + found = true;
> + break;
> + }
> }
> }
>
> - if (!found)
> + if (!found) {
> + pr_warn("AMD hostbridge not found\n");
> return 0;
> + }

I guess this should return a negative value... Not that it matters much
from looking at the call site in the amd_postcore_init initcall.

>
> - for (i = 0; i < 4; i++) {
> + pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);

Why not pr_info? I think this is useful info we want to print out at
default level.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-19 13:52:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Fri, Apr 18, 2014 at 08:53:23PM -0600, Myron Stowe wrote:
> From: Suravee Suthikulpanit <[email protected]>
>
> This patch adds supports for additional MMIO ranges (16 ranges). Also,
> each MMIO base/limit can now support up to 48-bit MMIO addresses.
> However, this requires initializing the ECS sooner since the new registers
> are in the ECS ranges.
>
> This applies for AMD Family 15h and later.
>
> Reference: Advanced Micro Devices (AMD), "BIOS and Kernel Developer's
> Guide (BKDG) for AMD Family 15h Models 00h-0fh Processors." Section 3.4
> Device [1F:18]h Function 1 Configuration Registers;
> D18F1x[1CC:180,BC:80] MMIO Base/Limit,
> D18F1x[D8,D0,C8,C0] IO-Space Base,
> D18F1x[DC,D4,CC,C4] IO-Space Limit,
> 42301 Rev 3.12 - October 11, 2012.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> Signed-off-by: Myron Stowe <[email protected]>
> Tested-by: Aravind Gopalakrishnan <[email protected]>
> ---
>
> arch/x86/pci/amd_bus.c | 126 +++++++++++++++++++++++++++++++++++++++---------
> 1 files changed, 103 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
> index c8cd75c..1b2895e 100644
> --- a/arch/x86/pci/amd_bus.c
> +++ b/arch/x86/pci/amd_bus.c
> @@ -13,11 +13,30 @@
>
> #define AMD_NB_F0_NODE_ID 0x60
> #define AMD_NB_F0_UNIT_ID 0x64
> +#define AMD_NB_F1_MMIO_BASE_REG 0x80
> +#define AMD_NB_F1_MMIO_LIMIT_REG 0x84
> +#define AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG 0x180
> +#define AMD_NB_F1_IO_BASE_ADDR_REG 0xc0
> +#define AMD_NB_F1_IO_LIMIT_ADDR_REG 0xc4
> #define AMD_NB_F1_CONFIG_MAP_REG 0xe0
>
> #define RANGE_NUM 16
> +#define AMD_NB_F1_MMIO_RANGES 16
> +#define AMD_NB_F1_IOPORT_RANGES 4
> #define AMD_NB_F1_CONFIG_MAP_RANGES 4
>
> +#define AMD_PCIE_CF8(bus, dev, fn, reg) \
> + (0x80000000 | \
> + ((reg & 0xF00) << 16) | \
> + ((bus & 0xF) << 16) | \
> + ((dev & 0x1F) << 11) | \
> + ((fn & 0x7) << 8) | \
> + ((reg & 0xFC)))
> +
> +static bool amd_early_ecs_enabled;
> +
> +static int __init pci_io_ecs_init(u8 bus, u8 slot);

Please move the function above its caller instead of adding a forward
declaration.

> +
> /*
> * This discovers the pcibus <-> node mapping on AMD K8, Family 10h, 12h,
> * 14h, 15h, and 16h processor based systems. This also gets peer root bus
> @@ -39,6 +58,20 @@ static struct amd_hostbridge hb_probes[] __initdata = {
> { 0xff, 0 , PCI_DEVICE_ID_AMD_10H_NB_HT },
> };
>
> +/* This version of read_pci_config allows reading registers in the ECS area */
> +static inline int _amd_read_pci_config(u8 bus, u8 slot, u8 fn, u32 offset)
> +{
> + u32 value;
> +
> + if ((!amd_early_ecs_enabled) && (offset > 0xFF))
> + return -1;
> +
> + outl(AMD_PCIE_CF8(bus, slot, fn, offset), 0xcf8);
> + value = inl(0xcfc);
> +
> + return value;
> +
> static struct pci_root_info __init *find_pci_root_info(int node, int link)
> {
> struct pci_root_info *info;
> @@ -48,6 +81,9 @@ static struct pci_root_info __init *find_pci_root_info(int node, int link)
> if (info->node == node && info->link == link)
> return info;
>
> + pr_warn("AMD-Bus: WARNING: Failed to find root info for node %#x, link %#x\n",

Prefixes like "AMD-Bus" are done using pr_fmt.

> + node, link);
> +
> return NULL;
> }
>
> @@ -118,6 +154,12 @@ static int __init early_fill_mp_bus_info(void)
>
> pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);
>
> + /* We enable ECS mode prior to probing MMIO as MMIO related registers
> + * are in the ECS area.
> + */
> + pci_io_ecs_init(bus, slot);

Well, if enabling the ECS failed somehow, you probably want to save
yourself the _amd_read_pci_config() calls further down.

Which means:

* you should propagate an error code from that pci_io_ecs_init() function

* you can carve out the huge loop iterating over AMD_NB_F1_MMIO_RANGES
along with the pci ECS enablement call to pci_io_ecs_init into a
separate function and thus slim down that huge early_fill_mp_bus_info()
monster a bit.

> +
> + found = false;
> for (i = 0; i < AMD_NB_F1_CONFIG_MAP_RANGES; i++) {
> int min_bus;
> int max_bus;
> @@ -128,6 +170,7 @@ static int __init early_fill_mp_bus_info(void)
> if ((reg & 7) != 3)
> continue;
>
> + found = true;
> min_bus = (reg >> 16) & 0xff;
> max_bus = (reg >> 24) & 0xff;
> node = (reg >> 4) & 0x07;
> @@ -136,6 +179,13 @@ static int __init early_fill_mp_bus_info(void)
> info = alloc_pci_root_info(min_bus, max_bus, node, link);
> }
>
> + if (!found) {
> + /* In case there is no AMDNB_F1_CONFIG_MAP_REGs,
> + * we just use default to bus 0, node 0 link 0)
> + */
> + info = alloc_pci_root_info(0, 0, 0, 0);
> + }

No need for curly braces around a single statement - simply put the
comment before the if (...).

> +
> /* get the default node and link for left over res */
> reg = read_pci_config(bus, slot, 0, AMD_NB_F0_NODE_ID);
> def_node = (reg >> 8) & 0x07;
> @@ -144,14 +194,17 @@ static int __init early_fill_mp_bus_info(void)
>
> memset(range, 0, sizeof(range));
> add_range(range, RANGE_NUM, 0, 0, 0xffff + 1);
> +
> /* io port resource */
> - for (i = 0; i < 4; i++) {
> - reg = read_pci_config(bus, slot, 1, 0xc0 + (i << 3));
> + for (i = 0; i < AMD_NB_F1_IOPORT_RANGES; i++) {
> + reg = read_pci_config(bus, slot, 1,
> + AMD_NB_F1_IO_BASE_ADDR_REG + (i << 3));
> if (!(reg & 3))
> continue;
>
> start = reg & 0xfff000;
> - reg = read_pci_config(bus, slot, 1, 0xc4 + (i << 3));
> + reg = read_pci_config(bus, slot, 1,
> + AMD_NB_F1_IO_LIMIT_ADDR_REG + (i << 3));
> node = reg & 0x07;
> link = (reg >> 4) & 0x03;
> end = (reg & 0xfff000) | 0xfff;
> @@ -169,6 +222,7 @@ static int __init early_fill_mp_bus_info(void)
> update_res(info, start, end, IORESOURCE_IO, 1);
> subtract_range(range, RANGE_NUM, start, end + 1);
> }
> +
> /* add left over io port range to def node/link, [0, 0xffff] */
> /* find the position */
> info = find_pci_root_info(def_node, def_link);
> @@ -211,22 +265,46 @@ static int __init early_fill_mp_bus_info(void)
> }
>
> /* mmio resource */
> - for (i = 0; i < 8; i++) {
> - reg = read_pci_config(bus, slot, 1, 0x80 + (i << 3));
> + for (i = 0; i < AMD_NB_F1_MMIO_RANGES; i++) {
> + u64 tmp;
> + u32 base = AMD_NB_F1_MMIO_BASE_REG + (i << 3);
> + u32 limit = AMD_NB_F1_MMIO_LIMIT_REG + (i << 3);
> + u32 base_limit_hi = AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG + (i << 2);
> +
> + if (i >= 12) {
> + /* Range 12 base/limit starts at 0x2A0 */
> + base += 0x1c0;
> + limit += 0x1c0;
> + /* Range 12 base/limit hi starts at 0x2C0 */
> + base_limit_hi += 0x110;
> + } else if (i >= 8) {
> + /* Range 8 base/limit starts at 0x1A0 */
> + base += 0xe0;
> + limit += 0xe0;
> + /* Range 12 base/limit hi starts at 0x1C0 */
> + base_limit_hi += 0x20;
> + }
> +
> + /* Base lo */
> + reg = _amd_read_pci_config(bus, slot, 1, base);
> if (!(reg & 3))
> continue;
>
> - start = reg & 0xffffff00; /* 39:16 on 31:8*/
> - start <<= 8;
> - reg = read_pci_config(bus, slot, 1, 0x84 + (i << 3));
> + start = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */
> +
> + /* Limit lo */
> + reg = _amd_read_pci_config(bus, slot, 1, limit);
> node = reg & 0x07;
> link = (reg >> 4) & 0x03;
> - end = (reg & 0xffffff00);
> - end <<= 8;
> - end |= 0xffff;
> + end = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */
> + end |= 0xffffUL;
>
> - info = find_pci_root_info(node, link);
> + /* Base/Limit hi */
> + tmp = _amd_read_pci_config(bus, slot, 1, base_limit_hi);
> + start |= ((tmp & 0xffUL) << 40);
> + end |= ((tmp & (0xffUL << 16)) << 24);
>
> + info = find_pci_root_info(node, link);
> if (!info)
> continue;
>
> @@ -354,20 +432,24 @@ static struct notifier_block amd_cpu_notifier = {
> .notifier_call = amd_cpu_notify,
> };
>
> -static void __init pci_enable_pci_io_ecs(void)
> +static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
> {
> #ifdef CONFIG_AMD_NB
> unsigned int i, n;
> + u8 limit;
>
> for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
> - u8 bus = amd_nb_bus_dev_ranges[i].bus;
> - u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
> - u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
> + /* Try matching for the bus range */
> + if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
> + (slot != amd_nb_bus_dev_ranges[i].dev_base))
> + continue;
> +
> + limit = amd_nb_bus_dev_ranges[i].dev_limit;
>
> + /* Setup all northbridges within the range */
> for (; slot < limit; ++slot) {
> u32 val = read_pci_config(bus, slot, 3, 0);
> -
> - if (!early_is_amd_nb(val))
> + if (!val)
> continue;
>
> val = read_pci_config(bus, slot, 3, 0x8c);
> @@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
> val |= ENABLE_CF8_EXT_CFG >> 32;

What a fun shifting!

Maybe you should do

#define ENABLE_CF8_EXT_CFG BIT(46 - 32)

to show exactly what you mean and how the bit is defined in MSR NB_CFG1
and also show how the high 32-bits are mapped into F3x8c, while at it.

And then you can drop the shifting at the call site.


> write_pci_config(bus, slot, 3, 0x8c, val);
> }
> + amd_early_ecs_enabled = true;

You probably should check whether the PCI config write stuck before
setting this.

> ++n;
> }
> }
> #endif
> }
>
> -static int __init pci_io_ecs_init(void)
> +static int __init pci_io_ecs_init(u8 bus, u8 slot)
> {
> int cpu;
>
> @@ -389,9 +472,7 @@ static int __init pci_io_ecs_init(void)
> if (boot_cpu_data.x86 < 0x10)
> return 0;
>
> - /* Try the PCI method first. */
> - if (early_pci_allowed())
> - pci_enable_pci_io_ecs();

Where did the if-check go?

> + pci_enable_pci_io_ecs(bus, slot);
>
> cpu_notifier_register_begin();
> for_each_online_cpu(cpu)
> @@ -411,7 +492,6 @@ static int __init amd_postcore_init(void)
> return 0;
>
> early_fill_mp_bus_info();
> - pci_io_ecs_init();
>
> return 0;
> }
>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-20 07:59:54

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

Drop Andreas' old email address from CC as it keeps bouncing.

On Sat, Apr 19, 2014 at 03:52:20PM +0200, Borislav Petkov wrote:
> > -static void __init pci_enable_pci_io_ecs(void)
> > +static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
> > {
> > #ifdef CONFIG_AMD_NB
> > unsigned int i, n;
> > + u8 limit;
> >
> > for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
> > - u8 bus = amd_nb_bus_dev_ranges[i].bus;
> > - u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
> > - u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
> > + /* Try matching for the bus range */
> > + if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
> > + (slot != amd_nb_bus_dev_ranges[i].dev_base))
> > + continue;
> > +
> > + limit = amd_nb_bus_dev_ranges[i].dev_limit;
> >
> > + /* Setup all northbridges within the range */
> > for (; slot < limit; ++slot) {
> > u32 val = read_pci_config(bus, slot, 3, 0);
> > -
> > - if (!early_is_amd_nb(val))
> > + if (!val)
> > continue;
> >
> > val = read_pci_config(bus, slot, 3, 0x8c);
> > @@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
> > val |= ENABLE_CF8_EXT_CFG >> 32;
>
> What a fun shifting!
>
> Maybe you should do
>
> #define ENABLE_CF8_EXT_CFG BIT(46 - 32)
>
> to show exactly what you mean and how the bit is defined in MSR NB_CFG1
> and also show how the high 32-bits are mapped into F3x8c, while at it.
>
> And then you can drop the shifting at the call site.

Ok, I see another fun with this ECS enabling:

There's a enable_pci_io_ecs() which enables ECS through the NB_CFG MSR
which is called as part of the notifier *and* there's a PCI write to
that same bit in pci_enable_pci_io_ecs() which iterates over all NBs.

So, AFAICT, we do it twice and the second time is not needed. Which
means, you probably can drop pci_enable_pci_io_ecs() completely and use
solely the notifier?

Yes, no?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-20 08:02:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] x86/PCI: Miscellaneous code clean up for early_fillup_mp_bus_info

On Fri, Apr 18, 2014 at 08:53:31PM -0600, Myron Stowe wrote:
> From: Suravee Suthikulpanit <[email protected]>
>
> * Refactoring of the early_fill_mp_bus_info function into multiple helper
> functions since it is getting long, and difficult to follow.

Much better.

> * Merge early_fill_mp_bus_info into amd_postcore_init as there is no need to
> have this as a separate function.
>
> * Use pr_xxx instead of printk

You can convert the printk(KERN_CONT -> pr_cont( too, while at it.
Although, this would prepend the fmt thing on every line ...

> * Prepend "AMD-Bus" for each print.

That's done with pr_fmt

> * The current code is using "fam10h_mmconf_*" in several place. As this
> file is no longer specific to family10h systems, this patch changes
> such occurrances to "amd_mmconf_*" instead for clarity.

...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-20 10:21:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] ACPI/PCI: Warn if we have to "guess" host bridge node information

On Fri, Apr 18, 2014 at 08:53:39PM -0600, Myron Stowe wrote:
> The vast majority of platforms are not supplying ACPI _PXM (proximity)
> information corresponding to host bridge (PNP0A03/PNP0A08) devices
> resulting in sysfs "numa_node" values of -1 (NUMA_NO_NODE) [1]:
> # for i in /sys/devices/pci0000\:00/*/numa_node; do cat $i; done | uniq
> -1
>
> # find /sys/ -name "numa_node" | while read fname; do cat $fname; \
> done | uniq
> -1
>
> AMD based platforms provide a fall-back for this situation via amd_bus.c.
> These platforms snoop out the information by directly reading specific
> registers from the Northbridge and caching them via 'alloc_pci_root_info'.
>
> Later during boot processing when host bridges are discovered -
> 'pci_acpi_scan_root' - the kernel looks for their corresponding ACPI _PXM
> method - drivers/acpi/numa.c::acpi_get_node(). If the BIOS supplied a
> _PXM method then that node (proximity) value is associated. If the BIOS
> did not supply a _PXM method *and* the platform is AMD based, the
> fall-back cached values obtained directly from the Northbridge are used;
> otherwise, "NUMA_NO_NODE" is associated.
>
> There are a number of issues with this fall-back mechanism the most
> notable being that amd_bus.c extracts a 3-bit number from a CPU register
> and uses it as the node number. The node numbers used by Linux are
> logical and there's no reason they need to be identical to settings in the
> CPU registers. So if we have some node information obtained in the normal
> way (from _PXM, SLIT, SRAT, etc.) and some from amd_bus.c, there's no
> reason to believe they will be compatible.
>
> This patch warns when this situation occurs:
> pci_root PNP0A08:00: [Firmware Bug]: No _PXM; guessing node number 0
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=72051
>
> Signed-off-by: Myron Stowe <[email protected]>
> ---
>
> arch/x86/pci/acpi.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index 01edac6..80c09ba 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -489,8 +489,12 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
> }
>
> node = acpi_get_node(device->handle);
> - if (node == NUMA_NO_NODE)
> + if (node == NUMA_NO_NODE) {
> node = x86_pci_root_bus_node(busnum);
> + if (node != NUMA_NO_NODE)
> + dev_info(&device->dev, FW_BUG "No _PXM; guessing node number %x\n",

Hmm, I'm not really convinced this message is user-friendly enough. Can
we be more descriptive here please?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-20 10:54:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] PCI: Remove redundant 'quirk_amd_nb_node'

On Fri, Apr 18, 2014 at 08:53:46PM -0600, Myron Stowe wrote:
> With the amd_bus.c updates to support additional AMD processors (11h, 12h,
> 14h 15h and 16h) 'quirk_amd_nb_node' seems to be redundant. This patch
> removes it.
>
> Signed-off-by: Myron Stowe <[email protected]>

Good idea. Has this been tested, though? There are a bunch of places
using dev_to_node() which this quirk corrects, especially 'local_cpus'
in a bunch of sysfs nodes in the PCI hierarchy /sys/devices/pci*.

We want to make sure this doesn't break that and other dev_to_node()
callsites.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-20 13:44:23

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] PCI: Remove redundant 'quirk_amd_nb_node'

On Sun, Apr 20, 2014 at 4:54 AM, Borislav Petkov <[email protected]> wrote:
> On Fri, Apr 18, 2014 at 08:53:46PM -0600, Myron Stowe wrote:
>> With the amd_bus.c updates to support additional AMD processors (11h, 12h,
>> 14h 15h and 16h) 'quirk_amd_nb_node' seems to be redundant. This patch
>> removes it.
>>
>> Signed-off-by: Myron Stowe <[email protected]>
>
> Good idea. Has this been tested, though? There are a bunch of places
> using dev_to_node() which this quirk corrects, especially 'local_cpus'
> in a bunch of sysfs nodes in the PCI hierarchy /sys/devices/pci*.

Borislav:

Thanks for the review, suggestions, and comments.

Just and FYI - I'll be gone until late in this coming week on a
backpacking trip in remote Utah. I'll work your suggestions when I
get back.

With respect to this patch and testing. Yes, I did test and it worked
for my situation. I would like to hear from Daniel as I'm not exactly
sure what his last change was specific to.

Daniel: with the previous patches in this series applied do you still
need this quirk for your situation?

Thanks,
Myron
>
> We want to make sure this doesn't break that and other dev_to_node()
> callsites.

Agreed

>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.

2014-04-21 16:54:21

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] PCI: Remove redundant 'quirk_amd_nb_node'

Hi Myron,

On 04/20/2014 09:44 PM, Myron Stowe wrote:
> On Sun, Apr 20, 2014 at 4:54 AM, Borislav Petkov <[email protected]> wrote:
>> On Fri, Apr 18, 2014 at 08:53:46PM -0600, Myron Stowe wrote:
>>> With the amd_bus.c updates to support additional AMD processors (11h, 12h,
>>> 14h 15h and 16h) 'quirk_amd_nb_node' seems to be redundant. This patch
>>> removes it.
>>>
>>> Signed-off-by: Myron Stowe <[email protected]>
>>
>> Good idea. Has this been tested, though? There are a bunch of places
>> using dev_to_node() which this quirk corrects, especially 'local_cpus'
>> in a bunch of sysfs nodes in the PCI hierarchy /sys/devices/pci*.
>
> Borislav:
>
> Thanks for the review, suggestions, and comments.
>
> Just and FYI - I'll be gone until late in this coming week on a
> backpacking trip in remote Utah. I'll work your suggestions when I
> get back.
>
> With respect to this patch and testing. Yes, I did test and it worked
> for my situation. I would like to hear from Daniel as I'm not exactly
> sure what his last change was specific to.
>
> Daniel: with the previous patches in this series applied do you still
> need this quirk for your situation?

It's fine enough for us, since we'll always emit a PXM method for any
PCI host bridges (but roots), which gets correctly preserved with your
patches [1]. I don't have reason to suspect northbridges at bus 0,
devices 0x19 and later in each PCI domain having the wrong NUMA node is
going to cause any issues.

We could improve things by changing probe_pci_hostbridge to iterate all
PCI bus domains, but it's probably not a big deal to bail with one
warning, as we currently see multiple (which would become a lot on most
of our systems):

[ 3.155974] AMD-Bus: TOM: 00000000d8000000 aka 3456M
[ 3.160003]
[ 3.161496] AMD-Bus: WARNING: Failed to find root info for node 0x4,
link 0x0
[ 3.168625]
[ 3.170002]
[ 3.171492] AMD-Bus: WARNING: Failed to find root info for node 0x4,
link 0x0
[ 3.178614] AMD-Bus: WARNING: Failed to find root info for node 0x4,
link 0x0
[ 3.180003] AMD-Bus: TOM2: 0000001900000000 aka 102400M

Thanks,
Daniel

-- [1] (two server NumaConnect system)

# lspci
0000:00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
0000:00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
PCI to PCI bridge (external gfx1 port B)
0000:00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
0000:00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
0000:00:12.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0 USB OHCI1 Controller
0000:00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB EHCI Controller
0000:00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
0000:00:13.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0 USB OHCI1 Controller
0000:00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB EHCI Controller
0000:00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus
Controller (rev 3d)
0000:00:14.1 IDE interface: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 IDE Controller
0000:00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 LPC host controller
0000:00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
PCI to PCI Bridge
0000:00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
0000:00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0000:00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0000:00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0000:00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0000:00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0000:00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0000:00:19.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0000:00:19.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0000:00:19.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0000:00:19.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0000:00:19.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0000:00:19.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0000:00:1a.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0000:00:1a.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0000:00:1a.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0000:00:1a.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0000:00:1a.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0000:00:1a.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0000:00:1b.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0000:00:1b.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0000:00:1b.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0000:00:1b.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0000:00:1b.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0000:00:1b.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0000:00:1c.0 Host bridge: Numascale AS NumaChip N601 (rev 02)
0000:00:1c.1 Host bridge: Numascale AS NumaChip N602 (rev 02)
0000:01:04.0 VGA compatible controller: Matrox Electronics Systems Ltd.
MGA G200eW WPCM450 (rev 0a)
0000:02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit
Network Connection (rev 01)
0000:02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit
Network Connection (rev 01)
0001:00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
0001:00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
PCI to PCI bridge (external gfx1 port B)
0001:00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
0001:00:14.0 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
SMBus Controller (rev 3d)
0001:00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
PCI to PCI Bridge
0001:00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0001:00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0001:00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0001:00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0001:00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0001:00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0001:00:19.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0001:00:19.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0001:00:19.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0001:00:19.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0001:00:19.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0001:00:19.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0001:00:1a.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0001:00:1a.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0001:00:1a.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0001:00:1a.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0001:00:1a.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0001:00:1a.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0001:00:1b.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 0
0001:00:1b.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 1
0001:00:1b.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 2
0001:00:1b.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 3
0001:00:1b.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 4
0001:00:1b.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
Processor Function 5
0001:00:1c.0 Host bridge: Numascale AS NumaChip N601 (rev 02)
0001:00:1c.1 Host bridge: Numascale AS NumaChip N602 (rev 02)
0001:02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit
Network Connection (rev 01)
0001:02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit
Network Connection (rev 01)

# find /sys/devices/pci* -name numa_node -print -exec cat {} \;
/sys/devices/pci0000:00/0000:00:00.0/numa_node
0
/sys/devices/pci0000:00/0000:00:11.0/numa_node
0
/sys/devices/pci0000:00/0000:00:12.0/numa_node
0
/sys/devices/pci0000:00/0000:00:12.1/numa_node
0
/sys/devices/pci0000:00/0000:00:12.2/numa_node
0
/sys/devices/pci0000:00/0000:00:13.0/numa_node
0
/sys/devices/pci0000:00/0000:00:13.1/numa_node
0
/sys/devices/pci0000:00/0000:00:13.2/numa_node
0
/sys/devices/pci0000:00/0000:00:14.0/numa_node
0
/sys/devices/pci0000:00/0000:00:14.1/numa_node
0
/sys/devices/pci0000:00/0000:00:14.3/numa_node
0
/sys/devices/pci0000:00/0000:00:14.4/0000:01:04.0/numa_node
0
/sys/devices/pci0000:00/0000:00:14.4/numa_node
0
/sys/devices/pci0000:00/0000:00:14.5/numa_node
0
/sys/devices/pci0000:00/0000:00:18.0/numa_node
0
/sys/devices/pci0000:00/0000:00:18.1/numa_node
0
/sys/devices/pci0000:00/0000:00:18.2/numa_node
0
/sys/devices/pci0000:00/0000:00:18.3/numa_node
0
/sys/devices/pci0000:00/0000:00:18.4/numa_node
0
/sys/devices/pci0000:00/0000:00:18.5/numa_node
0
/sys/devices/pci0000:00/0000:00:19.0/numa_node
0
/sys/devices/pci0000:00/0000:00:19.1/numa_node
0
/sys/devices/pci0000:00/0000:00:19.2/numa_node
0
/sys/devices/pci0000:00/0000:00:19.3/numa_node
0
/sys/devices/pci0000:00/0000:00:19.4/numa_node
0
/sys/devices/pci0000:00/0000:00:19.5/numa_node
0
/sys/devices/pci0000:00/0000:00:0d.0/0000:02:00.0/numa_node
0
/sys/devices/pci0000:00/0000:00:0d.0/0000:02:00.1/numa_node
0
/sys/devices/pci0000:00/0000:00:0d.0/numa_node
0
/sys/devices/pci0000:00/0000:00:1a.0/numa_node
0
/sys/devices/pci0000:00/0000:00:1a.1/numa_node
0
/sys/devices/pci0000:00/0000:00:1a.2/numa_node
0
/sys/devices/pci0000:00/0000:00:1a.3/numa_node
0
/sys/devices/pci0000:00/0000:00:1a.4/numa_node
0
/sys/devices/pci0000:00/0000:00:1a.5/numa_node
0
/sys/devices/pci0000:00/0000:00:1b.0/numa_node
0
/sys/devices/pci0000:00/0000:00:1b.1/numa_node
0
/sys/devices/pci0000:00/0000:00:1b.2/numa_node
0
/sys/devices/pci0000:00/0000:00:1b.3/numa_node
0
/sys/devices/pci0000:00/0000:00:1b.4/numa_node
0
/sys/devices/pci0000:00/0000:00:1b.5/numa_node
0
/sys/devices/pci0000:00/0000:00:1c.0/numa_node
0
/sys/devices/pci0000:00/0000:00:1c.1/numa_node
0
/sys/devices/pci0001:00/0001:00:00.0/numa_node
4
/sys/devices/pci0001:00/0001:00:11.0/numa_node
4
/sys/devices/pci0001:00/0001:00:14.0/numa_node
4
/sys/devices/pci0001:00/0001:00:14.4/numa_node
4
/sys/devices/pci0001:00/0001:00:18.0/numa_node
4
/sys/devices/pci0001:00/0001:00:18.1/numa_node
4
/sys/devices/pci0001:00/0001:00:18.2/numa_node
4
/sys/devices/pci0001:00/0001:00:18.3/numa_node
4
/sys/devices/pci0001:00/0001:00:18.4/numa_node
4
/sys/devices/pci0001:00/0001:00:18.5/numa_node
4
/sys/devices/pci0001:00/0001:00:19.0/numa_node
4
/sys/devices/pci0001:00/0001:00:19.1/numa_node
4
/sys/devices/pci0001:00/0001:00:19.2/numa_node
4
/sys/devices/pci0001:00/0001:00:19.3/numa_node
4
/sys/devices/pci0001:00/0001:00:19.4/numa_node
4
/sys/devices/pci0001:00/0001:00:19.5/numa_node
4
/sys/devices/pci0001:00/0001:00:0d.0/0001:02:00.0/numa_node
4
/sys/devices/pci0001:00/0001:00:0d.0/0001:02:00.1/numa_node
4
/sys/devices/pci0001:00/0001:00:0d.0/numa_node
4
/sys/devices/pci0001:00/0001:00:1a.0/numa_node
4
/sys/devices/pci0001:00/0001:00:1a.1/numa_node
4
/sys/devices/pci0001:00/0001:00:1a.2/numa_node
4
/sys/devices/pci0001:00/0001:00:1a.3/numa_node
4
/sys/devices/pci0001:00/0001:00:1a.4/numa_node
4
/sys/devices/pci0001:00/0001:00:1a.5/numa_node
4
/sys/devices/pci0001:00/0001:00:1b.0/numa_node
4
/sys/devices/pci0001:00/0001:00:1b.1/numa_node
4
/sys/devices/pci0001:00/0001:00:1b.2/numa_node
4
/sys/devices/pci0001:00/0001:00:1b.3/numa_node
4
/sys/devices/pci0001:00/0001:00:1b.4/numa_node
4
/sys/devices/pci0001:00/0001:00:1b.5/numa_node
4
/sys/devices/pci0001:00/0001:00:1c.0/numa_node
4
/sys/devices/pci0001:00/0001:00:1c.1/numa_node
4
--
Daniel J Blueman
Principal Software Engineer, Numascale

2014-04-25 22:24:34

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Sun, Apr 20, 2014 at 1:59 AM, Borislav Petkov <[email protected]> wrote:
> Drop Andreas' old email address from CC as it keeps bouncing.
>
> On Sat, Apr 19, 2014 at 03:52:20PM +0200, Borislav Petkov wrote:
>> > -static void __init pci_enable_pci_io_ecs(void)
>> > +static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
>> > {
>> > #ifdef CONFIG_AMD_NB
>> > unsigned int i, n;
>> > + u8 limit;
>> >
>> > for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
>> > - u8 bus = amd_nb_bus_dev_ranges[i].bus;
>> > - u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
>> > - u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
>> > + /* Try matching for the bus range */
>> > + if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
>> > + (slot != amd_nb_bus_dev_ranges[i].dev_base))
>> > + continue;
>> > +
>> > + limit = amd_nb_bus_dev_ranges[i].dev_limit;
>> >
>> > + /* Setup all northbridges within the range */
>> > for (; slot < limit; ++slot) {
>> > u32 val = read_pci_config(bus, slot, 3, 0);
>> > -
>> > - if (!early_is_amd_nb(val))
>> > + if (!val)
>> > continue;
>> >
>> > val = read_pci_config(bus, slot, 3, 0x8c);
>> > @@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
>> > val |= ENABLE_CF8_EXT_CFG >> 32;
>>
>> What a fun shifting!
>>
>> Maybe you should do
>>
>> #define ENABLE_CF8_EXT_CFG BIT(46 - 32)
>>
>> to show exactly what you mean and how the bit is defined in MSR NB_CFG1
>> and also show how the high 32-bits are mapped into F3x8c, while at it.
>>
>> And then you can drop the shifting at the call site.
>
> Ok, I see another fun with this ECS enabling:
>
> There's a enable_pci_io_ecs() which enables ECS through the NB_CFG MSR
> which is called as part of the notifier *and* there's a PCI write to
> that same bit in pci_enable_pci_io_ecs() which iterates over all NBs.
>
> So, AFAICT, we do it twice and the second time is not needed. Which
> means, you probably can drop pci_enable_pci_io_ecs() completely and use
> solely the notifier?

It does look as if there is some duplication with respect to setting
MSR_AMD64_NB_CFG's (which is aliased at D18F3x8c [1])
ENABLE_CF8_EXT_CFG enable bit but there are at least a couple of
differences.

enable_pci_io_ecs() only sets the bit on one NB whereas
pci_enable_pci_io_ecs iterates over all the NBs (as you mentioned
above). The other difference has something to do with Xen; see the
origin of pci_enable_pci_io_ecs - commit 24d9b70b8.

>
> Yes, no?

Suravee, Kim - either of you want to chime in here?


[1] Read-write; Per-node. MSRC001_001F[31:0] is an alias of D18F3x88.
MSRC001_001F[63:32] is an alias of
D18F3x8C.

Myron
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-04-26 09:10:48

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

+ Robert.

On Fri, Apr 25, 2014 at 04:24:31PM -0600, Myron Stowe wrote:
> On Sun, Apr 20, 2014 at 1:59 AM, Borislav Petkov <[email protected]> wrote:
> > Drop Andreas' old email address from CC as it keeps bouncing.
> >
> > On Sat, Apr 19, 2014 at 03:52:20PM +0200, Borislav Petkov wrote:
> >> > -static void __init pci_enable_pci_io_ecs(void)
> >> > +static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
> >> > {
> >> > #ifdef CONFIG_AMD_NB
> >> > unsigned int i, n;
> >> > + u8 limit;
> >> >
> >> > for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
> >> > - u8 bus = amd_nb_bus_dev_ranges[i].bus;
> >> > - u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
> >> > - u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
> >> > + /* Try matching for the bus range */
> >> > + if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
> >> > + (slot != amd_nb_bus_dev_ranges[i].dev_base))
> >> > + continue;
> >> > +
> >> > + limit = amd_nb_bus_dev_ranges[i].dev_limit;
> >> >
> >> > + /* Setup all northbridges within the range */
> >> > for (; slot < limit; ++slot) {
> >> > u32 val = read_pci_config(bus, slot, 3, 0);
> >> > -
> >> > - if (!early_is_amd_nb(val))
> >> > + if (!val)
> >> > continue;
> >> >
> >> > val = read_pci_config(bus, slot, 3, 0x8c);
> >> > @@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
> >> > val |= ENABLE_CF8_EXT_CFG >> 32;
> >>
> >> What a fun shifting!
> >>
> >> Maybe you should do
> >>
> >> #define ENABLE_CF8_EXT_CFG BIT(46 - 32)
> >>
> >> to show exactly what you mean and how the bit is defined in MSR NB_CFG1
> >> and also show how the high 32-bits are mapped into F3x8c, while at it.
> >>
> >> And then you can drop the shifting at the call site.
> >
> > Ok, I see another fun with this ECS enabling:
> >
> > There's a enable_pci_io_ecs() which enables ECS through the NB_CFG MSR
> > which is called as part of the notifier *and* there's a PCI write to
> > that same bit in pci_enable_pci_io_ecs() which iterates over all NBs.
> >
> > So, AFAICT, we do it twice and the second time is not needed. Which
> > means, you probably can drop pci_enable_pci_io_ecs() completely and use
> > solely the notifier?
>
> It does look as if there is some duplication with respect to setting
> MSR_AMD64_NB_CFG's (which is aliased at D18F3x8c [1])
> ENABLE_CF8_EXT_CFG enable bit but there are at least a couple of
> differences.
>
> enable_pci_io_ecs() only sets the bit on one NB whereas
> pci_enable_pci_io_ecs iterates over all the NBs (as you mentioned
> above). The other difference has something to do with Xen; see the
> origin of pci_enable_pci_io_ecs - commit 24d9b70b8.

Of course it is xen - what else?! We do have to carry special code in
baremetal just for it because it is special and we all can't seem to get
enough of its crap.

Oh well, I guess we should at least comment this and refer to 24d9b70b8
so that the explanation is right there, in the code.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-28 20:50:54

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

[+cc Jan (24d9b70b8 author), Yinghai]

On Sat, Apr 26, 2014 at 3:10 AM, Borislav Petkov <[email protected]> wrote:
> + Robert.
>
> On Fri, Apr 25, 2014 at 04:24:31PM -0600, Myron Stowe wrote:
>> On Sun, Apr 20, 2014 at 1:59 AM, Borislav Petkov <[email protected]> wrote:
>> > Drop Andreas' old email address from CC as it keeps bouncing.
>> >
>> > On Sat, Apr 19, 2014 at 03:52:20PM +0200, Borislav Petkov wrote:
>> >> > -static void __init pci_enable_pci_io_ecs(void)
>> >> > +static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
>> >> > {
>> >> > #ifdef CONFIG_AMD_NB
>> >> > unsigned int i, n;
>> >> > + u8 limit;
>> >> >
>> >> > for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
>> >> > - u8 bus = amd_nb_bus_dev_ranges[i].bus;
>> >> > - u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
>> >> > - u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
>> >> > + /* Try matching for the bus range */
>> >> > + if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
>> >> > + (slot != amd_nb_bus_dev_ranges[i].dev_base))
>> >> > + continue;
>> >> > +
>> >> > + limit = amd_nb_bus_dev_ranges[i].dev_limit;
>> >> >
>> >> > + /* Setup all northbridges within the range */
>> >> > for (; slot < limit; ++slot) {
>> >> > u32 val = read_pci_config(bus, slot, 3, 0);
>> >> > -
>> >> > - if (!early_is_amd_nb(val))
>> >> > + if (!val)
>> >> > continue;
>> >> >
>> >> > val = read_pci_config(bus, slot, 3, 0x8c);
>> >> > @@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
>> >> > val |= ENABLE_CF8_EXT_CFG >> 32;
>> >>
>> >> What a fun shifting!
>> >>
>> >> Maybe you should do
>> >>
>> >> #define ENABLE_CF8_EXT_CFG BIT(46 - 32)
>> >>
>> >> to show exactly what you mean and how the bit is defined in MSR NB_CFG1
>> >> and also show how the high 32-bits are mapped into F3x8c, while at it.
>> >>
>> >> And then you can drop the shifting at the call site.
>> >
>> > Ok, I see another fun with this ECS enabling:
>> >
>> > There's a enable_pci_io_ecs() which enables ECS through the NB_CFG MSR
>> > which is called as part of the notifier *and* there's a PCI write to
>> > that same bit in pci_enable_pci_io_ecs() which iterates over all NBs.
>> >
>> > So, AFAICT, we do it twice and the second time is not needed. Which
>> > means, you probably can drop pci_enable_pci_io_ecs() completely and use
>> > solely the notifier?
>>
>> It does look as if there is some duplication with respect to setting
>> MSR_AMD64_NB_CFG's (which is aliased at D18F3x8c [1])
>> ENABLE_CF8_EXT_CFG enable bit but there are at least a couple of
>> differences.
>>
>> enable_pci_io_ecs() only sets the bit on one NB whereas
>> pci_enable_pci_io_ecs iterates over all the NBs (as you mentioned
>> above). The other difference has something to do with Xen; see the
>> origin of pci_enable_pci_io_ecs - commit 24d9b70b8.
>
> Of course it is xen - what else?! We do have to carry special code in
> baremetal just for it because it is special and we all can't seem to get
> enough of its crap.
>
> Oh well, I guess we should at least comment this and refer to 24d9b70b8
> so that the explanation is right there, in the code.

This is probably obvious, but my interest here is to (1) make sure all
systems in the field run well (so we need quirks to work around BIOS
and other issues), and (2) eliminate the need for kernel changes to
support future systems. So far we seem to be concentrating on (1) and
neglecting (2), which means we're always reacting to things that are
broken.

This I/O ECS thing seems likely to cause future problems. My
understanding (based on sec 2.8 of [1]) is that enable_pci_io_ecs()
and pci_enable_pci_io_ecs() are there to enable access to extended
config space (offsets 256-4095) via the 0xcf8/0xcfc I/O ports.

Per sec 4.1.1 of [2], we should be using ECAM (the memory-mapped
enhanced configuration mechanism, i.e., MMCONFIG) to access extended
config space, and the BIOS should supply an MCFG table.

So why do we need to enable I/O access to ECS on AMD chips at all? Is
this a workaround for a broken BIOS that doesn't supply an MCFG table?

>From reading the path below, I think raw_pci_read() will use
pci_direct_conf1 for (domain 0 [cfg 0-255]). For everything else, it
will use (a) pci_mmcfg if there's a valid MCFG or (b) pci_direct_conf1
if there's no MCFG and this is an AMD >= fam10h CPU, i.e.,
PCI_HAS_IO_ECS is set.

pci_arch_init
type = pci_direct_probe
pci_mmcfg_early_init
__pci_mmcfg_init
pci_mmcfg_arch_init
raw_pci_ext_ops = &pci_mmcfg
pci_direct_init
if (type == 1)
raw_pci_ops = &pci_direct_conf1
if (raw_pci_ext_ops)
return
if (!pci_probe & PCI_HAS_IO_ECS)
return
raw_pci_ext_ops = &pci_direct_conf1

I think we should try to get rid of amd_bus.c, e.g., only run
amd_postcore_init() for BIOS dates < 2015. It looks like a crutch
that is perpetuating buggy BIOSes and costing us maintenance effort.
We don't need anything similar for Intel CPUs, and I don't see a
compelling reason why we need it for AMD.

Bjorn

[1] BIOS and Kernel Developer's Guide for AMD Family 15h Models
00h-0Fh Processors Rev 3.14 (document number 42301)
[2] PCI Firmware Specification, Rev 3.0, June 20, 2005

2014-04-28 21:11:00

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] x86/PCI: Add support for generic AMD hostbridges

On Sat, Apr 19, 2014 at 5:31 AM, Borislav Petkov <[email protected]> wrote:
> On Fri, Apr 18, 2014 at 08:53:16PM -0600, Myron Stowe wrote:
>> From: Suravee Suthikulpanit <[email protected]>
>>
>> AMD hostbridges gnenerally show up as PCI device 0:18.0. This patch adds
>> logic to automatically probe the device at this location and check PCI
>> device class code.
>>
>> This patch supports platforms with the following AMD hostbridge types:
>> (K8, family10h, 11h, 12h, 14h 15h and 16h processors).
>>
>> Reference(s):
>> https://bugzilla.kernel.org/show_bug.cgi?id=72051
>> Advanced Micro Devices (AMD), "BIOS and Kernel Developer's
>> Guide (BKDG) for AMD Family 15h Models 00h-0fh Processors." Section
>> 3.4 Device [1F:18]h Function 1 Configuration Registers; D18F1x[EC:E0]
>> Configuration Map, 42301 Rev 3.12 - October 11, 2012.
>>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> Signed-off-by: Myron Stowe <[email protected]>
>> Tested-by: Aravind Gopalakrishnan <[email protected]>
>> ---
>>
>> arch/x86/pci/amd_bus.c | 71 ++++++++++++++++++++++++++++++++----------------
>> 1 files changed, 47 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
>> index e88f4c5..c8cd75c 100644
>> --- a/arch/x86/pci/amd_bus.c
>> +++ b/arch/x86/pci/amd_bus.c
>> @@ -11,27 +11,34 @@
>>
>> #include "bus_numa.h"
>>
>> +#define AMD_NB_F0_NODE_ID 0x60
>> +#define AMD_NB_F0_UNIT_ID 0x64
>> +#define AMD_NB_F1_CONFIG_MAP_REG 0xe0
>> +
>> +#define RANGE_NUM 16
>> +#define AMD_NB_F1_CONFIG_MAP_RANGES 4
>> +
>> /*
>> - * This discovers the pcibus <-> node mapping on AMD K8.
>> - * also get peer root bus resource for io,mmio
>> + * This discovers the pcibus <-> node mapping on AMD K8, Family 10h, 12h,
>> + * 14h, 15h, and 16h processor based systems. This also gets peer root bus
>> + * resources corresponding to I/O and MMIO addresses.
>> */
>>
>> -struct pci_hostbridge_probe {
>> +struct amd_hostbridge {
>> u32 bus;
>> u32 slot;
>> - u32 vendor;
>> u32 device;
>> };
>>
>> -static struct pci_hostbridge_probe pci_probes[] __initdata = {
>> - { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1100 },
>> - { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1200 },
>> - { 0xff, 0, PCI_VENDOR_ID_AMD, 0x1200 },
>> - { 0, 0x18, PCI_VENDOR_ID_AMD, 0x1300 },
>> +static struct amd_hostbridge hb_probes[] __initdata = {
>> + /* Standard AMD Hostbriges are at bus 0x0 slot 0x18.
>> + * In case of PCI_ANY_ID, the logic will try matching
>> + * PCI_CLASS_BRIDGE_HOST instead.
>> + */
>
> Standard kernel comment style:
>
> /*
> * Something interesting for the code below.
> * This is the second comment line.
> */
>

I went ahead and made this change in patch 1/5 (even though most
non-functional changes are in patch 3/5) as it seems to belong here.

>> + { 0x0 , 0x18, PCI_ANY_ID },
>> + { 0xff, 0 , PCI_DEVICE_ID_AMD_10H_NB_HT },
>> };
>>
>> -#define RANGE_NUM 16
>> -
>> static struct pci_root_info __init *find_pci_root_info(int node, int link)
>> {
>> struct pci_root_info *info;
>> @@ -75,31 +82,47 @@ static int __init early_fill_mp_bus_info(void)
>
> Btw, this function is huuuge. It could use a splitup some day.
>

Yes, Suravee did split this up in patch 3/5

>> return -1;
>>
>> found = false;
>> - for (i = 0; i < ARRAY_SIZE(pci_probes); i++) {
>> + for (i = 0; i < ARRAY_SIZE(hb_probes); i++) {
>> u32 id;
>> u16 device;
>> u16 vendor;
>> + u32 class;
>>
>> - bus = pci_probes[i].bus;
>> - slot = pci_probes[i].slot;
>> + bus = hb_probes[i].bus;
>> + slot = hb_probes[i].slot;
>> id = read_pci_config(bus, slot, 0, PCI_VENDOR_ID);
>> -
>> vendor = id & 0xffff;
>> device = (id>>16) & 0xffff;
>> - if (pci_probes[i].vendor == vendor &&
>> - pci_probes[i].device == device) {
>> - found = true;
>> - break;
>> + class = read_pci_config(bus, slot, 0,
>> + PCI_CLASS_REVISION) >> 16;
>> +
>> + if (PCI_VENDOR_ID_AMD == vendor) {
>
> Flip comparison:
>
> if (vendor == PCI_VENDOR_ID_AMD)
>
> and then save yourself another indentation level:
>
> if (vendor != PCI_VENDOR_ID_AMD)
> continue;
>

Yes, much better

>> + if (hb_probes[i].device == device) {
>> + found = true;
>> + break;
>> + }
>> +
>> + if ((hb_probes[i].device == PCI_ANY_ID) &&
>> + (class == PCI_CLASS_BRIDGE_HOST)) {
>> + hb_probes[i].device = device;
>> + found = true;
>> + break;
>> + }
>> }
>> }
>>
>> - if (!found)
>> + if (!found) {
>> + pr_warn("AMD hostbridge not found\n");
>> return 0;
>> + }
>
> I guess this should return a negative value... Not that it matters much
> from looking at the call site in the amd_postcore_init initcall.
>

Patch 3/5 cleans this up.

>>
>> - for (i = 0; i < 4; i++) {
>> + pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);
>
> Why not pr_info? I think this is useful info we want to print out at
> default level.

Sure, ok ;)
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-04-28 21:19:37

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Sat, Apr 19, 2014 at 7:52 AM, Borislav Petkov <[email protected]> wrote:
> On Fri, Apr 18, 2014 at 08:53:23PM -0600, Myron Stowe wrote:
>> From: Suravee Suthikulpanit <[email protected]>
>>
>> This patch adds supports for additional MMIO ranges (16 ranges). Also,
>> each MMIO base/limit can now support up to 48-bit MMIO addresses.
>> However, this requires initializing the ECS sooner since the new registers
>> are in the ECS ranges.
>>
>> This applies for AMD Family 15h and later.
>>
>> Reference: Advanced Micro Devices (AMD), "BIOS and Kernel Developer's
>> Guide (BKDG) for AMD Family 15h Models 00h-0fh Processors." Section 3.4
>> Device [1F:18]h Function 1 Configuration Registers;
>> D18F1x[1CC:180,BC:80] MMIO Base/Limit,
>> D18F1x[D8,D0,C8,C0] IO-Space Base,
>> D18F1x[DC,D4,CC,C4] IO-Space Limit,
>> 42301 Rev 3.12 - October 11, 2012.
>>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> Signed-off-by: Myron Stowe <[email protected]>
>> Tested-by: Aravind Gopalakrishnan <[email protected]>
>> ---
>>
>> arch/x86/pci/amd_bus.c | 126 +++++++++++++++++++++++++++++++++++++++---------
>> 1 files changed, 103 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
>> index c8cd75c..1b2895e 100644
>> --- a/arch/x86/pci/amd_bus.c
>> +++ b/arch/x86/pci/amd_bus.c
>> @@ -13,11 +13,30 @@
>>
>> #define AMD_NB_F0_NODE_ID 0x60
>> #define AMD_NB_F0_UNIT_ID 0x64
>> +#define AMD_NB_F1_MMIO_BASE_REG 0x80
>> +#define AMD_NB_F1_MMIO_LIMIT_REG 0x84
>> +#define AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG 0x180
>> +#define AMD_NB_F1_IO_BASE_ADDR_REG 0xc0
>> +#define AMD_NB_F1_IO_LIMIT_ADDR_REG 0xc4
>> #define AMD_NB_F1_CONFIG_MAP_REG 0xe0
>>
>> #define RANGE_NUM 16
>> +#define AMD_NB_F1_MMIO_RANGES 16
>> +#define AMD_NB_F1_IOPORT_RANGES 4
>> #define AMD_NB_F1_CONFIG_MAP_RANGES 4
>>
>> +#define AMD_PCIE_CF8(bus, dev, fn, reg) \
>> + (0x80000000 | \
>> + ((reg & 0xF00) << 16) | \
>> + ((bus & 0xF) << 16) | \
>> + ((dev & 0x1F) << 11) | \
>> + ((fn & 0x7) << 8) | \
>> + ((reg & 0xFC)))
>> +
>> +static bool amd_early_ecs_enabled;
>> +
>> +static int __init pci_io_ecs_init(u8 bus, u8 slot);
>
> Please move the function above its caller instead of adding a forward
> declaration.
>

Patch 3/5 cleans this up

>> +
>> /*
>> * This discovers the pcibus <-> node mapping on AMD K8, Family 10h, 12h,
>> * 14h, 15h, and 16h processor based systems. This also gets peer root bus
>> @@ -39,6 +58,20 @@ static struct amd_hostbridge hb_probes[] __initdata = {
>> { 0xff, 0 , PCI_DEVICE_ID_AMD_10H_NB_HT },
>> };
>>
>> +/* This version of read_pci_config allows reading registers in the ECS area */
>> +static inline int _amd_read_pci_config(u8 bus, u8 slot, u8 fn, u32 offset)
>> +{
>> + u32 value;
>> +
>> + if ((!amd_early_ecs_enabled) && (offset > 0xFF))
>> + return -1;
>> +
>> + outl(AMD_PCIE_CF8(bus, slot, fn, offset), 0xcf8);
>> + value = inl(0xcfc);
>> +
>> + return value;
>> +
>> static struct pci_root_info __init *find_pci_root_info(int node, int link)
>> {
>> struct pci_root_info *info;
>> @@ -48,6 +81,9 @@ static struct pci_root_info __init *find_pci_root_info(int node, int link)
>> if (info->node == node && info->link == link)
>> return info;
>>
>> + pr_warn("AMD-Bus: WARNING: Failed to find root info for node %#x, link %#x\n",
>
> Prefixes like "AMD-Bus" are done using pr_fmt.
>

Ok, I incorporated pr_fmt changes into patch 3/5

>> + node, link);
>> +
>> return NULL;
>> }
>>
>> @@ -118,6 +154,12 @@ static int __init early_fill_mp_bus_info(void)
>>
>> pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);
>>
>> + /* We enable ECS mode prior to probing MMIO as MMIO related registers
>> + * are in the ECS area.
>> + */
>> + pci_io_ecs_init(bus, slot);
>
> Well, if enabling the ECS failed somehow, you probably want to save
> yourself the _amd_read_pci_config() calls further down.
>
> Which means:
>
> * you should propagate an error code from that pci_io_ecs_init() function
>
> * you can carve out the huge loop iterating over AMD_NB_F1_MMIO_RANGES
> along with the pci ECS enablement call to pci_io_ecs_init into a
> separate function and thus slim down that huge early_fill_mp_bus_info()
> monster a bit.
>

Yes but this was not trivial since most, if not all, usages ignore any
return value.

I'm wanting to focus on just fixing the issue for the most part and
making as few changes as possible. This is primarly due to:
o The lack of engagement by the AMD engineers (Suravee posted these
changes but has since been unresponsive),
o As Bjorn just mentioned, we would like to get rid of this entire
file (as we did in a similar manner with intel_bus) as it is just an
"endless treadmill" that needs attention with every new HW release,
o I want my name to be minimally related to this ;)

>> +
>> + found = false;
>> for (i = 0; i < AMD_NB_F1_CONFIG_MAP_RANGES; i++) {
>> int min_bus;
>> int max_bus;
>> @@ -128,6 +170,7 @@ static int __init early_fill_mp_bus_info(void)
>> if ((reg & 7) != 3)
>> continue;
>>
>> + found = true;
>> min_bus = (reg >> 16) & 0xff;
>> max_bus = (reg >> 24) & 0xff;
>> node = (reg >> 4) & 0x07;
>> @@ -136,6 +179,13 @@ static int __init early_fill_mp_bus_info(void)
>> info = alloc_pci_root_info(min_bus, max_bus, node, link);
>> }
>>
>> + if (!found) {
>> + /* In case there is no AMDNB_F1_CONFIG_MAP_REGs,
>> + * we just use default to bus 0, node 0 link 0)
>> + */
>> + info = alloc_pci_root_info(0, 0, 0, 0);
>> + }
>
> No need for curly braces around a single statement - simply put the
> comment before the if (...).
>

Done

>> +
>> /* get the default node and link for left over res */
>> reg = read_pci_config(bus, slot, 0, AMD_NB_F0_NODE_ID);
>> def_node = (reg >> 8) & 0x07;
>> @@ -144,14 +194,17 @@ static int __init early_fill_mp_bus_info(void)
>>
>> memset(range, 0, sizeof(range));
>> add_range(range, RANGE_NUM, 0, 0, 0xffff + 1);
>> +
>> /* io port resource */
>> - for (i = 0; i < 4; i++) {
>> - reg = read_pci_config(bus, slot, 1, 0xc0 + (i << 3));
>> + for (i = 0; i < AMD_NB_F1_IOPORT_RANGES; i++) {
>> + reg = read_pci_config(bus, slot, 1,
>> + AMD_NB_F1_IO_BASE_ADDR_REG + (i << 3));
>> if (!(reg & 3))
>> continue;
>>
>> start = reg & 0xfff000;
>> - reg = read_pci_config(bus, slot, 1, 0xc4 + (i << 3));
>> + reg = read_pci_config(bus, slot, 1,
>> + AMD_NB_F1_IO_LIMIT_ADDR_REG + (i << 3));
>> node = reg & 0x07;
>> link = (reg >> 4) & 0x03;
>> end = (reg & 0xfff000) | 0xfff;
>> @@ -169,6 +222,7 @@ static int __init early_fill_mp_bus_info(void)
>> update_res(info, start, end, IORESOURCE_IO, 1);
>> subtract_range(range, RANGE_NUM, start, end + 1);
>> }
>> +
>> /* add left over io port range to def node/link, [0, 0xffff] */
>> /* find the position */
>> info = find_pci_root_info(def_node, def_link);
>> @@ -211,22 +265,46 @@ static int __init early_fill_mp_bus_info(void)
>> }
>>
>> /* mmio resource */
>> - for (i = 0; i < 8; i++) {
>> - reg = read_pci_config(bus, slot, 1, 0x80 + (i << 3));
>> + for (i = 0; i < AMD_NB_F1_MMIO_RANGES; i++) {
>> + u64 tmp;
>> + u32 base = AMD_NB_F1_MMIO_BASE_REG + (i << 3);
>> + u32 limit = AMD_NB_F1_MMIO_LIMIT_REG + (i << 3);
>> + u32 base_limit_hi = AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG + (i << 2);
>> +
>> + if (i >= 12) {
>> + /* Range 12 base/limit starts at 0x2A0 */
>> + base += 0x1c0;
>> + limit += 0x1c0;
>> + /* Range 12 base/limit hi starts at 0x2C0 */
>> + base_limit_hi += 0x110;
>> + } else if (i >= 8) {
>> + /* Range 8 base/limit starts at 0x1A0 */
>> + base += 0xe0;
>> + limit += 0xe0;
>> + /* Range 12 base/limit hi starts at 0x1C0 */
>> + base_limit_hi += 0x20;
>> + }
>> +
>> + /* Base lo */
>> + reg = _amd_read_pci_config(bus, slot, 1, base);
>> if (!(reg & 3))
>> continue;
>>
>> - start = reg & 0xffffff00; /* 39:16 on 31:8*/
>> - start <<= 8;
>> - reg = read_pci_config(bus, slot, 1, 0x84 + (i << 3));
>> + start = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */
>> +
>> + /* Limit lo */
>> + reg = _amd_read_pci_config(bus, slot, 1, limit);
>> node = reg & 0x07;
>> link = (reg >> 4) & 0x03;
>> - end = (reg & 0xffffff00);
>> - end <<= 8;
>> - end |= 0xffff;
>> + end = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */
>> + end |= 0xffffUL;
>>
>> - info = find_pci_root_info(node, link);
>> + /* Base/Limit hi */
>> + tmp = _amd_read_pci_config(bus, slot, 1, base_limit_hi);
>> + start |= ((tmp & 0xffUL) << 40);
>> + end |= ((tmp & (0xffUL << 16)) << 24);
>>
>> + info = find_pci_root_info(node, link);
>> if (!info)
>> continue;
>>
>> @@ -354,20 +432,24 @@ static struct notifier_block amd_cpu_notifier = {
>> .notifier_call = amd_cpu_notify,
>> };
>>
>> -static void __init pci_enable_pci_io_ecs(void)
>> +static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
>> {
>> #ifdef CONFIG_AMD_NB
>> unsigned int i, n;
>> + u8 limit;
>>
>> for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
>> - u8 bus = amd_nb_bus_dev_ranges[i].bus;
>> - u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
>> - u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
>> + /* Try matching for the bus range */
>> + if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
>> + (slot != amd_nb_bus_dev_ranges[i].dev_base))
>> + continue;
>> +
>> + limit = amd_nb_bus_dev_ranges[i].dev_limit;
>>
>> + /* Setup all northbridges within the range */
>> for (; slot < limit; ++slot) {
>> u32 val = read_pci_config(bus, slot, 3, 0);
>> -
>> - if (!early_is_amd_nb(val))
>> + if (!val)
>> continue;
>>
>> val = read_pci_config(bus, slot, 3, 0x8c);
>> @@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
>> val |= ENABLE_CF8_EXT_CFG >> 32;
>
> What a fun shifting!
>
> Maybe you should do
>
> #define ENABLE_CF8_EXT_CFG BIT(46 - 32)
>
> to show exactly what you mean and how the bit is defined in MSR NB_CFG1
> and also show how the high 32-bits are mapped into F3x8c, while at it.
>
> And then you can drop the shifting at the call site.
>
>

Yes, this confused me at first also so I added an additional ENABLE_CF8_EXT_CFG

>> write_pci_config(bus, slot, 3, 0x8c, val);
>> }
>> + amd_early_ecs_enabled = true;
>
> You probably should check whether the PCI config write stuck before
> setting this.
>
>> ++n;
>> }
>> }
>> #endif
>> }
>>
>> -static int __init pci_io_ecs_init(void)
>> +static int __init pci_io_ecs_init(u8 bus, u8 slot)
>> {
>> int cpu;
>>
>> @@ -389,9 +472,7 @@ static int __init pci_io_ecs_init(void)
>> if (boot_cpu_data.x86 < 0x10)
>> return 0;
>>
>> - /* Try the PCI method first. */
>> - if (early_pci_allowed())
>> - pci_enable_pci_io_ecs();
>
> Where did the if-check go?
>

Good catch. I assume Suravee didn't drop this on purpose?

>> + pci_enable_pci_io_ecs(bus, slot);
>>
>> cpu_notifier_register_begin();
>> for_each_online_cpu(cpu)
>> @@ -411,7 +492,6 @@ static int __init amd_postcore_init(void)
>> return 0;
>>
>> early_fill_mp_bus_info();
>> - pci_io_ecs_init();
>>
>> return 0;
>> }
>>
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-04-28 21:21:53

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] x86/PCI: Miscellaneous code clean up for early_fillup_mp_bus_info

On Sun, Apr 20, 2014 at 2:02 AM, Borislav Petkov <[email protected]> wrote:
> On Fri, Apr 18, 2014 at 08:53:31PM -0600, Myron Stowe wrote:
>> From: Suravee Suthikulpanit <[email protected]>
>>
>> * Refactoring of the early_fill_mp_bus_info function into multiple helper
>> functions since it is getting long, and difficult to follow.
>
> Much better.
>

Yes, it had become too big!

>> * Merge early_fill_mp_bus_info into amd_postcore_init as there is no need to
>> have this as a separate function.
>>
>> * Use pr_xxx instead of printk
>
> You can convert the printk(KERN_CONT -> pr_cont( too, while at it.
> Although, this would prepend the fmt thing on every line ...
>

Yeah, I ignored this since the additional pr_fmt() output was too
noisy and redundant

>> * Prepend "AMD-Bus" for each print.
>
> That's done with pr_fmt
>

Done

>> * The current code is using "fam10h_mmconf_*" in several place. As this
>> file is no longer specific to family10h systems, this patch changes
>> such occurrances to "amd_mmconf_*" instead for clarity.
>
> ...
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-04-28 21:24:42

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] ACPI/PCI: Warn if we have to "guess" host bridge node information

On Sun, Apr 20, 2014 at 4:21 AM, Borislav Petkov <[email protected]> wrote:
> On Fri, Apr 18, 2014 at 08:53:39PM -0600, Myron Stowe wrote:
>> The vast majority of platforms are not supplying ACPI _PXM (proximity)
>> information corresponding to host bridge (PNP0A03/PNP0A08) devices
>> resulting in sysfs "numa_node" values of -1 (NUMA_NO_NODE) [1]:
>> # for i in /sys/devices/pci0000\:00/*/numa_node; do cat $i; done | uniq
>> -1
>>
>> # find /sys/ -name "numa_node" | while read fname; do cat $fname; \
>> done | uniq
>> -1
>>
>> AMD based platforms provide a fall-back for this situation via amd_bus.c.
>> These platforms snoop out the information by directly reading specific
>> registers from the Northbridge and caching them via 'alloc_pci_root_info'.
>>
>> Later during boot processing when host bridges are discovered -
>> 'pci_acpi_scan_root' - the kernel looks for their corresponding ACPI _PXM
>> method - drivers/acpi/numa.c::acpi_get_node(). If the BIOS supplied a
>> _PXM method then that node (proximity) value is associated. If the BIOS
>> did not supply a _PXM method *and* the platform is AMD based, the
>> fall-back cached values obtained directly from the Northbridge are used;
>> otherwise, "NUMA_NO_NODE" is associated.
>>
>> There are a number of issues with this fall-back mechanism the most
>> notable being that amd_bus.c extracts a 3-bit number from a CPU register
>> and uses it as the node number. The node numbers used by Linux are
>> logical and there's no reason they need to be identical to settings in the
>> CPU registers. So if we have some node information obtained in the normal
>> way (from _PXM, SLIT, SRAT, etc.) and some from amd_bus.c, there's no
>> reason to believe they will be compatible.
>>
>> This patch warns when this situation occurs:
>> pci_root PNP0A08:00: [Firmware Bug]: No _PXM; guessing node number 0
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=72051
>>
>> Signed-off-by: Myron Stowe <[email protected]>
>> ---
>>
>> arch/x86/pci/acpi.c | 6 +++++-
>> 1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
>> index 01edac6..80c09ba 100644
>> --- a/arch/x86/pci/acpi.c
>> +++ b/arch/x86/pci/acpi.c
>> @@ -489,8 +489,12 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>> }
>>
>> node = acpi_get_node(device->handle);
>> - if (node == NUMA_NO_NODE)
>> + if (node == NUMA_NO_NODE) {
>> node = x86_pci_root_bus_node(busnum);
>> + if (node != NUMA_NO_NODE)
>> + dev_info(&device->dev, FW_BUG "No _PXM; guessing node number %x\n",
>
> Hmm, I'm not really convinced this message is user-friendly enough. Can
> we be more descriptive here please?
>

How about -
dev_info(&device->dev, FW_BUG "no _PXM; falling back to node %d from
hardware (may be inconsistent with ACPI node numbers)\n", node);

> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-04-28 21:28:46

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] PCI: Remove redundant 'quirk_amd_nb_node'

On Sun, Apr 20, 2014 at 4:54 AM, Borislav Petkov <[email protected]> wrote:
> On Fri, Apr 18, 2014 at 08:53:46PM -0600, Myron Stowe wrote:
>> With the amd_bus.c updates to support additional AMD processors (11h, 12h,
>> 14h 15h and 16h) 'quirk_amd_nb_node' seems to be redundant. This patch
>> removes it.
>>
>> Signed-off-by: Myron Stowe <[email protected]>
>
> Good idea. Has this been tested, though? There are a bunch of places
> using dev_to_node() which this quirk corrects, especially 'local_cpus'
> in a bunch of sysfs nodes in the PCI hierarchy /sys/devices/pci*.

My testing was related to the PCI hierarchy /sys/devices/pci* as this
was the root issue I'm trying to get resolved.

Daniel also responded - Thanks Dan! - indicating that the changes are
ok related to his concerns.

>
> We want to make sure this doesn't break that and other dev_to_node()
> callsites.

Agreed, which is at least in part why this is a separate patch.

I'm going to run some more testing and then post with most of your
comments incorporated. Thanks again for the review.

Myron
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-04-28 21:40:55

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Mon, Apr 28, 2014 at 02:50:29PM -0600, Bjorn Helgaas wrote:
> This I/O ECS thing seems likely to cause future problems. My
> understanding (based on sec 2.8 of [1]) is that enable_pci_io_ecs()
> and pci_enable_pci_io_ecs() are there to enable access to extended
> config space (offsets 256-4095) via the 0xcf8/0xcfc I/O ports.
>
> Per sec 4.1.1 of [2], we should be using ECAM (the memory-mapped
> enhanced configuration mechanism, i.e., MMCONFIG) to access extended
> config space, and the BIOS should supply an MCFG table.
>
> So why do we need to enable I/O access to ECS on AMD chips at all? Is
> this a workaround for a broken BIOS that doesn't supply an MCFG table?

That's a good question. 831d991821dae doesn't say but I have a hunch
Andreas and/or Robert should know. I seem to vaguely remember it
might've been because of a missing MCFG but have flushed it out of the
cache long time ago.

Let's ask them.

Andreas, Robert, guys, do you remember why we did the PCI IO ECS access?
B0rked BIOSes?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-29 02:02:40

by Suravee Suthikulpanit

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] PCI: Remove redundant 'quirk_amd_nb_node'

On 4/21/2014 11:53 AM, Daniel J Blueman wrote:
> Hi Myron,
>
> On 04/20/2014 09:44 PM, Myron Stowe wrote:
>> On Sun, Apr 20, 2014 at 4:54 AM, Borislav Petkov <[email protected]> wrote:
>>> On Fri, Apr 18, 2014 at 08:53:46PM -0600, Myron Stowe wrote:
>>>> With the amd_bus.c updates to support additional AMD processors
>>>> (11h, 12h,
>>>> 14h 15h and 16h) 'quirk_amd_nb_node' seems to be redundant. This
>>>> patch
>>>> removes it.
>>>>
>>>> Signed-off-by: Myron Stowe <[email protected]>
>>>
>>> Good idea. Has this been tested, though? There are a bunch of places
>>> using dev_to_node() which this quirk corrects, especially 'local_cpus'
>>> in a bunch of sysfs nodes in the PCI hierarchy /sys/devices/pci*.
>>
>> Borislav:
>>
>> Thanks for the review, suggestions, and comments.
>>
>> Just and FYI - I'll be gone until late in this coming week on a
>> backpacking trip in remote Utah. I'll work your suggestions when I
>> get back.
>>
>> With respect to this patch and testing. Yes, I did test and it worked
>> for my situation. I would like to hear from Daniel as I'm not exactly
>> sure what his last change was specific to.
>>
>> Daniel: with the previous patches in this series applied do you still
>> need this quirk for your situation?
>
> It's fine enough for us, since we'll always emit a PXM method for any
> PCI host bridges (but roots), which gets correctly preserved with your
> patches [1]. I don't have reason to suspect northbridges at bus 0,
> devices 0x19 and later in each PCI domain having the wrong NUMA node is
> going to cause any issues.
>
> We could improve things by changing probe_pci_hostbridge to iterate all
> PCI bus domains, but it's probably not a big deal to bail with one
> warning, as we currently see multiple (which would become a lot on most
> of our systems):
>
> [ 3.155974] AMD-Bus: TOM: 00000000d8000000 aka 3456M
> [ 3.160003]
> [ 3.161496] AMD-Bus: WARNING: Failed to find root info for node 0x4,
> link 0x0
> [ 3.168625]
> [ 3.170002]
> [ 3.171492] AMD-Bus: WARNING: Failed to find root info for node 0x4,
> link 0x0
> [ 3.178614] AMD-Bus: WARNING: Failed to find root info for node 0x4,
> link 0x0
> [ 3.180003] AMD-Bus: TOM2: 0000001900000000 aka 102400M
>
> Thanks,
> Daniel
>
> -- [1] (two server NumaConnect system)
>
> # lspci
> 0000:00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
> Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
> 0000:00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
> PCI to PCI bridge (external gfx1 port B)
> 0000:00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
> 0000:00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> 0000:00:12.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0 USB OHCI1 Controller
> 0000:00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 USB EHCI Controller
> 0000:00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> 0000:00:13.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0 USB OHCI1 Controller
> 0000:00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 USB EHCI Controller
> 0000:00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus
> Controller (rev 3d)
> 0000:00:14.1 IDE interface: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 IDE Controller
> 0000:00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 LPC host controller
> 0000:00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
> PCI to PCI Bridge
> 0000:00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
> 0000:00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0000:00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0000:00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0000:00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0000:00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0000:00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0000:00:19.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0000:00:19.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0000:00:19.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0000:00:19.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0000:00:19.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0000:00:19.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0000:00:1a.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0000:00:1a.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0000:00:1a.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0000:00:1a.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0000:00:1a.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0000:00:1a.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0000:00:1b.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0000:00:1b.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0000:00:1b.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0000:00:1b.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0000:00:1b.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0000:00:1b.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0000:00:1c.0 Host bridge: Numascale AS NumaChip N601 (rev 02)
> 0000:00:1c.1 Host bridge: Numascale AS NumaChip N602 (rev 02)
> 0000:01:04.0 VGA compatible controller: Matrox Electronics Systems Ltd.
> MGA G200eW WPCM450 (rev 0a)
> 0000:02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit
> Network Connection (rev 01)
> 0000:02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit
> Network Connection (rev 01)
> 0001:00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
> Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
> 0001:00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
> PCI to PCI bridge (external gfx1 port B)
> 0001:00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI]
> SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
> 0001:00:14.0 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
> SMBus Controller (rev 3d)
> 0001:00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
> PCI to PCI Bridge
> 0001:00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0001:00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0001:00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0001:00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0001:00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0001:00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0001:00:19.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0001:00:19.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0001:00:19.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0001:00:19.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0001:00:19.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0001:00:19.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0001:00:1a.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0001:00:1a.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0001:00:1a.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0001:00:1a.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0001:00:1a.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0001:00:1a.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0001:00:1b.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 0
> 0001:00:1b.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 1
> 0001:00:1b.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 2
> 0001:00:1b.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 3
> 0001:00:1b.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 4
> 0001:00:1b.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h
> Processor Function 5
> 0001:00:1c.0 Host bridge: Numascale AS NumaChip N601 (rev 02)
> 0001:00:1c.1 Host bridge: Numascale AS NumaChip N602 (rev 02)
> 0001:02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit
> Network Connection (rev 01)
> 0001:02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit
> Network Connection (rev 01)
>
> # find /sys/devices/pci* -name numa_node -print -exec cat {} \;
> /sys/devices/pci0000:00/0000:00:00.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:11.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:12.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:12.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:12.2/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:13.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:13.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:13.2/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:14.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:14.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:14.3/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:14.4/0000:01:04.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:14.4/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:14.5/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:18.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:18.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:18.2/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:18.3/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:18.4/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:18.5/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:19.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:19.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:19.2/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:19.3/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:19.4/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:19.5/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:0d.0/0000:02:00.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:0d.0/0000:02:00.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:0d.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1a.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1a.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1a.2/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1a.3/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1a.4/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1a.5/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1b.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1b.1/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1b.2/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1b.3/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1b.4/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1b.5/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1c.0/numa_node
> 0
> /sys/devices/pci0000:00/0000:00:1c.1/numa_node
> 0
> /sys/devices/pci0001:00/0001:00:00.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:11.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:14.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:14.4/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:18.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:18.1/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:18.2/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:18.3/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:18.4/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:18.5/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:19.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:19.1/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:19.2/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:19.3/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:19.4/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:19.5/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:0d.0/0001:02:00.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:0d.0/0001:02:00.1/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:0d.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1a.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1a.1/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1a.2/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1a.3/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1a.4/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1a.5/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1b.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1b.1/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1b.2/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1b.3/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1b.4/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1b.5/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1c.0/numa_node
> 4
> /sys/devices/pci0001:00/0001:00:1c.1/numa_node
> 4

Sorry for late reply. My concern is that removing the
"quirk_amd_nb_node()" will affect the value of "numa_node" of the host
bridge devices (i.e. X:00.[18|19|1a|1b|1c|1d|1e|1f].X). I am not sure
if any code is using this information. But in theory, these host-bridge
devices are not on the same node as where the PCI root complex lives
(e.g. 0 and 4 from the example above).

If we want the "numa_node" to really representing the actual node, then
the quirk has to stay for now. We might need to come up with a
different logic to replace the quirks here, which would automatically
determine the actual node value for these host-bridge devices.

Suravee

Suravee

2014-04-29 02:47:25

by Suravee Suthikulpanit

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 4/28/2014 4:19 PM, Myron Stowe wrote:
> On Sat, Apr 19, 2014 at 7:52 AM, Borislav Petkov <[email protected]> wrote:
>> On Fri, Apr 18, 2014 at 08:53:23PM -0600, Myron Stowe wrote:
>>> From: Suravee Suthikulpanit <[email protected]>

.......

>>> @@ -118,6 +154,12 @@ static int __init early_fill_mp_bus_info(void)
>>>
>>> pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);
>>>
>>> + /* We enable ECS mode prior to probing MMIO as MMIO related registers
>>> + * are in the ECS area.
>>> + */
>>> + pci_io_ecs_init(bus, slot);
>>
>> Well, if enabling the ECS failed somehow, you probably want to save
>> yourself the _amd_read_pci_config() calls further down.
>>
>> Which means:
>>
>> * you should propagate an error code from that pci_io_ecs_init() function
>>
>> * you can carve out the huge loop iterating over AMD_NB_F1_MMIO_RANGES
>> along with the pci ECS enablement call to pci_io_ecs_init into a
>> separate function and thus slim down that huge early_fill_mp_bus_info()
>> monster a bit.
>>
>
> Yes but this was not trivial since most, if not all, usages ignore any
> return value.
>
> I'm wanting to focus on just fixing the issue for the most part and
> making as few changes as possible. This is primarly due to:
> o The lack of engagement by the AMD engineers (Suravee posted these
> changes but has since been unresponsive),
> o As Bjorn just mentioned, we would like to get rid of this entire
> file (as we did in a similar manner with intel_bus) as it is just an
> "endless treadmill" that needs attention with every new HW release,
> o I want my name to be minimally related to this ;)
>
[Suravee] Sorry for late reply and did not follow up on this patch in a
timely manner. Also, thank you Myron for helping to follow up with the
patch set.

Regarding the error code from pci_io_ecs_init(), it is currently always
return 0 (success). I am not sure what error code should I
check/propagate. Let me know if I am missing something here.

......

>>> -static int __init pci_io_ecs_init(void)
>>> +static int __init pci_io_ecs_init(u8 bus, u8 slot)
>>> {
>>> int cpu;
>>>
>>> @@ -389,9 +472,7 @@ static int __init pci_io_ecs_init(void)
>>> if (boot_cpu_data.x86 < 0x10)
>>> return 0;
>>>
>>> - /* Try the PCI method first. */
>>> - if (early_pci_allowed())
>>> - pci_enable_pci_io_ecs();
>>
>> Where did the if-check go?
>>
>
> Good catch. I assume Suravee didn't drop this on purpose?

[SURAVEE] I dropped the "early_pci_allowed()" here since I have moved
the calling of "pci_io_ecs_init()" into the "early_fill_mp_bus_info()"
since I would like to enable PCI ecs access at earlier stage. The
"early_fill_mp_bus_info() has already check for "early_pci_allowed()"
already at the beginning of the function. So, this should not be needed
here.

Suravee

2014-04-29 03:04:43

by Suravee Suthikulpanit

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 4/25/2014 5:24 PM, Myron Stowe wrote:
> On Sun, Apr 20, 2014 at 1:59 AM, Borislav Petkov <[email protected]> wrote:
>> Drop Andreas' old email address from CC as it keeps bouncing.
>>
>> On Sat, Apr 19, 2014 at 03:52:20PM +0200, Borislav Petkov wrote:
>>>> -static void __init pci_enable_pci_io_ecs(void)
>>>> +static void __init pci_enable_pci_io_ecs(u8 bus, u8 slot)
>>>> {
>>>> #ifdef CONFIG_AMD_NB
>>>> unsigned int i, n;
>>>> + u8 limit;
>>>>
>>>> for (n = i = 0; !n && amd_nb_bus_dev_ranges[i].dev_limit; ++i) {
>>>> - u8 bus = amd_nb_bus_dev_ranges[i].bus;
>>>> - u8 slot = amd_nb_bus_dev_ranges[i].dev_base;
>>>> - u8 limit = amd_nb_bus_dev_ranges[i].dev_limit;
>>>> + /* Try matching for the bus range */
>>>> + if ((bus != amd_nb_bus_dev_ranges[i].bus) ||
>>>> + (slot != amd_nb_bus_dev_ranges[i].dev_base))
>>>> + continue;
>>>> +
>>>> + limit = amd_nb_bus_dev_ranges[i].dev_limit;
>>>>
>>>> + /* Setup all northbridges within the range */
>>>> for (; slot < limit; ++slot) {
>>>> u32 val = read_pci_config(bus, slot, 3, 0);
>>>> -
>>>> - if (!early_is_amd_nb(val))
>>>> + if (!val)
>>>> continue;
>>>>
>>>> val = read_pci_config(bus, slot, 3, 0x8c);
>>>> @@ -375,13 +457,14 @@ static void __init pci_enable_pci_io_ecs(void)
>>>> val |= ENABLE_CF8_EXT_CFG >> 32;
>>>
>>> What a fun shifting!
>>>
>>> Maybe you should do
>>>
>>> #define ENABLE_CF8_EXT_CFG BIT(46 - 32)
>>>
>>> to show exactly what you mean and how the bit is defined in MSR NB_CFG1
>>> and also show how the high 32-bits are mapped into F3x8c, while at it.
>>>
>>> And then you can drop the shifting at the call site.
>>
>> Ok, I see another fun with this ECS enabling:
>>
>> There's a enable_pci_io_ecs() which enables ECS through the NB_CFG MSR
>> which is called as part of the notifier *and* there's a PCI write to
>> that same bit in pci_enable_pci_io_ecs() which iterates over all NBs.
>>
>> So, AFAICT, we do it twice and the second time is not needed. Which
>> means, you probably can drop pci_enable_pci_io_ecs() completely and use
>> solely the notifier?
>
> It does look as if there is some duplication with respect to setting
> MSR_AMD64_NB_CFG's (which is aliased at D18F3x8c [1])
> ENABLE_CF8_EXT_CFG enable bit but there are at least a couple of
> differences.
>
> enable_pci_io_ecs() only sets the bit on one NB whereas
> pci_enable_pci_io_ecs iterates over all the NBs (as you mentioned
> above). The other difference has something to do with Xen; see the
> origin of pci_enable_pci_io_ecs - commit 24d9b70b8.
>
>>
>> Yes, no?
>
> Suravee, Kim - either of you want to chime in here?

I believe the notifier is mainly for the cores which are initially
offline, and then brought up online afterward.

Suravee

2014-04-29 07:06:18

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

>>> On 28.04.14 at 22:50, <[email protected]> wrote:
>>> > There's a enable_pci_io_ecs() which enables ECS through the NB_CFG MSR
>>> > which is called as part of the notifier *and* there's a PCI write to
>>> > that same bit in pci_enable_pci_io_ecs() which iterates over all NBs.
>>> >
>>> > So, AFAICT, we do it twice and the second time is not needed. Which
>>> > means, you probably can drop pci_enable_pci_io_ecs() completely and use
>>> > solely the notifier?
>>>
>>> It does look as if there is some duplication with respect to setting
>>> MSR_AMD64_NB_CFG's (which is aliased at D18F3x8c [1])
>>> ENABLE_CF8_EXT_CFG enable bit but there are at least a couple of
>>> differences.
>>>
>>> enable_pci_io_ecs() only sets the bit on one NB whereas
>>> pci_enable_pci_io_ecs iterates over all the NBs (as you mentioned
>>> above). The other difference has something to do with Xen; see the
>>> origin of pci_enable_pci_io_ecs - commit 24d9b70b8.

As stated there - Xen Dom0 is running on virtual CPUs, i.e. which
physical CPU the code gets executed on is unknown, and hence the
MSR based approach just does not reliably work there.

Of course, with (since then) grown needs to access extended config
space in Xen itself, this enabling could be moved into the hypervisor.
Back then the hypervisor didn't have much need for this, so it seemed
more reasonable (albeit I'm rather certain Borislav is going to disagree
with this) to augment the Dom0 side code.

> This is probably obvious, but my interest here is to (1) make sure all
> systems in the field run well (so we need quirks to work around BIOS
> and other issues), and (2) eliminate the need for kernel changes to
> support future systems. So far we seem to be concentrating on (1) and
> neglecting (2), which means we're always reacting to things that are
> broken.
>
> This I/O ECS thing seems likely to cause future problems. My
> understanding (based on sec 2.8 of [1]) is that enable_pci_io_ecs()
> and pci_enable_pci_io_ecs() are there to enable access to extended
> config space (offsets 256-4095) via the 0xcf8/0xcfc I/O ports.
>
> Per sec 4.1.1 of [2], we should be using ECAM (the memory-mapped
> enhanced configuration mechanism, i.e., MMCONFIG) to access extended
> config space, and the BIOS should supply an MCFG table.
>
> So why do we need to enable I/O access to ECS on AMD chips at all? Is
> this a workaround for a broken BIOS that doesn't supply an MCFG table?

Yes. Rather than leaving such systems entirely without access to
extended config space, it seemed (and still seems to me) better to
at least have this fallback in place. If after this discussion you decide
to drop this code here, it would be nice if you could let us know up
front, so we can decide whether to add the code to the hypervisor
instead or - just like on Intel systems - rely on MCFG being properly
exposed by the firmware.

Jan

2014-04-29 07:33:18

by Andreas Herrmann

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Mon, Apr 28, 2014 at 11:40:36PM +0200, Borislav Petkov wrote:
> On Mon, Apr 28, 2014 at 02:50:29PM -0600, Bjorn Helgaas wrote:
> > This I/O ECS thing seems likely to cause future problems. My
> > understanding (based on sec 2.8 of [1]) is that enable_pci_io_ecs()
> > and pci_enable_pci_io_ecs() are there to enable access to extended
> > config space (offsets 256-4095) via the 0xcf8/0xcfc I/O ports.
> >
> > Per sec 4.1.1 of [2], we should be using ECAM (the memory-mapped
> > enhanced configuration mechanism, i.e., MMCONFIG) to access extended
> > config space, and the BIOS should supply an MCFG table.
> >
> > So why do we need to enable I/O access to ECS on AMD chips at all? Is
> > this a workaround for a broken BIOS that doesn't supply an MCFG table?
>
> That's a good question. 831d991821dae doesn't say but I have a hunch
> Andreas and/or Robert should know. I seem to vaguely remember it
> might've been because of a missing MCFG but have flushed it out of the
> cache long time ago.
>
> Let's ask them.
>
> Andreas, Robert, guys, do you remember why we did the PCI IO ECS access?
> B0rked BIOSes?


I am sure, it's because some server systems had MMIO ECS access not
enabled in BIOS. I can't remember which systems were affected.


Andreas

2014-04-29 10:20:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Tue, Apr 29, 2014 at 09:33:09AM +0200, Andreas Herrmann wrote:
> I am sure, it's because some server systems had MMIO ECS access not
> enabled in BIOS. I can't remember which systems were affected.

Ok, now AMD people: what's the story with IO ECS, can we assume that on
everything after F10h, BIOS has a sensible MCFG and we can limit this to
F10h only? I like Bjorn's idea but we need to make sure a working MCFG
is ubiquitous.

Which begs the real question: Suravee, why are you even touching IO ECS
provided F15h and later have a MCFG? Or, do they?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-29 11:19:29

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 29.04.14 09:33:09, Andreas Herrmann wrote:
> On Mon, Apr 28, 2014 at 11:40:36PM +0200, Borislav Petkov wrote:
> > On Mon, Apr 28, 2014 at 02:50:29PM -0600, Bjorn Helgaas wrote:
> > > This I/O ECS thing seems likely to cause future problems. My
> > > understanding (based on sec 2.8 of [1]) is that enable_pci_io_ecs()
> > > and pci_enable_pci_io_ecs() are there to enable access to extended
> > > config space (offsets 256-4095) via the 0xcf8/0xcfc I/O ports.
> > >
> > > Per sec 4.1.1 of [2], we should be using ECAM (the memory-mapped
> > > enhanced configuration mechanism, i.e., MMCONFIG) to access extended
> > > config space, and the BIOS should supply an MCFG table.

This is due to the non-atomic access to 0xcf8/0xcfc.

IIRC, another problem with io cfg space access is that it is bound to
register rax or so. The kernel implementation may not change here.

Otherall mmconfig access is much better implemented and thus
recommended to use by the os, while io cfg is used mainly by the bios.

> > > So why do we need to enable I/O access to ECS on AMD chips at all? Is
> > > this a workaround for a broken BIOS that doesn't supply an MCFG table?
> >
> > That's a good question. 831d991821dae doesn't say but I have a hunch
> > Andreas and/or Robert should know. I seem to vaguely remember it
> > might've been because of a missing MCFG but have flushed it out of the
> > cache long time ago.
> >
> > Let's ask them.
> >
> > Andreas, Robert, guys, do you remember why we did the PCI IO ECS access?
> > B0rked BIOSes?
>
>
> I am sure, it's because some server systems had MMIO ECS access not
> enabled in BIOS. I can't remember which systems were affected.

Yes, mmio ecs was not enabled on some systems by the bios. We needed
to use cf8/cfc io access.

If MMCONFIG is available it is the default, cf8/cfc is only used
otherwise. Also, some kernel configs may disable MMCONFIG.

PCI ECS was especially needed to enable the IBS interrupt for hw
profiling.

-Robert

2014-04-29 14:15:49

by Steffen Persvold

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 29 Apr 2014, at 3:20 , Borislav Petkov <[email protected]> wrote:

> On Tue, Apr 29, 2014 at 09:33:09AM +0200, Andreas Herrmann wrote:
>> I am sure, it's because some server systems had MMIO ECS access not
>> enabled in BIOS. I can't remember which systems were affected.
>
> Ok, now AMD people: what's the story with IO ECS, can we assume that on
> everything after F10h, BIOS has a sensible MCFG and we can limit this to
> F10h only? I like Bjorn's idea but we need to make sure a working MCFG
> is ubiquitous.
>
> Which begs the real question: Suravee, why are you even touching IO ECS
> provided F15h and later have a MCFG? Or, do they?
>

Our experience with this is that Fam10h and later have a very well working MCFG setup, earlier generations not so much (hence IO ECS was needed).

Cheers,
Steffen

2014-04-29 15:17:08

by Suravee Suthikulpanit

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 4/29/2014 5:20 AM, Borislav Petkov wrote:
> On Tue, Apr 29, 2014 at 09:33:09AM +0200, Andreas Herrmann wrote:
>> I am sure, it's because some server systems had MMIO ECS access not
>> enabled in BIOS. I can't remember which systems were affected.
>
If you are referring to accessing PCI ECS ranges via 0xCF8, then yes,
BIOS disable this as described below in the BKDG.

"The BIOS may use either configuration space access mechanism during
boot. Before booting the OS, BIOS must disable IO access to ECS, enable
MMIO configuration and build an ACPI defined MCFG table. BIOS ACPI code
must use MMIO to access configuration space."

> Ok, now AMD people: what's the story with IO ECS, can we assume that on
> everything after F10h, BIOS has a sensible MCFG and we can limit this to
> F10h only? I like Bjorn's idea but we need to make sure a working MCFG
> is ubiquitous.
>
> Which begs the real question: Suravee, why are you even touching IO ECS
> provided F15h and later have a MCFG? Or, do they?
>

As I was trying to generalize the logic inside amd_bus.c, which seems to
be used mainly as a fallback mechanism, I tried to maintain the existing
code, which does many things:
1. Setup numa_node information (if PXM doesn't exist)
2. Probe NB for MMIO resources (if MCFG doesn't exist)
3. Probe NB for IO resources
4. Setup IO ECS

In the new code, the IO ECS was needed to retrieve the
AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG (offset 0x180) during the early
initialization as part of (2) logic. However, this register exists only
on the newer systems. However, as you mentioned, for (2) we can assume
that the MCFG exists for most of the systems (family10h and later), and
should be used instead.

The main purpose of this patch set is mainly to deal with the the node
information (1). So, we might need to split these all up and handle
them separately as needed where (2) and (3) will be used as fallback for
older systems where MCFG does not exist. I am not sure if where we need (4).

Suravee

2014-04-29 17:17:29

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

In addition to Boris I have the following:

On 18.04.14 20:53:23, Myron Stowe wrote:
> +#define AMD_PCIE_CF8(bus, dev, fn, reg) \
> + (0x80000000 | \
> + ((reg & 0xF00) << 16) | \
> + ((bus & 0xF) << 16) | \
> + ((dev & 0x1F) << 11) | \
> + ((fn & 0x7) << 8) | \
> + ((reg & 0xFC)))

Make this a static function.

> +/* This version of read_pci_config allows reading registers in the ECS area */
> +static inline int _amd_read_pci_config(u8 bus, u8 slot, u8 fn, u32 offset)

No need to inline this, no need for a single tailing underscore.

> +{
> + u32 value;
> +
> + if ((!amd_early_ecs_enabled) && (offset > 0xFF))
> + return -1;
> +
> + outl(AMD_PCIE_CF8(bus, slot, fn, offset), 0xcf8);
> + value = inl(0xcfc);
> +
> + return value;

This design is broken. The return type does not match the functions
return value and you can't do error checking as -1 can be a value too.

Why reinventing this new instead of trying to reuse functions in
arch/x86/pci/direct.c/early.c or so?

Would an access via mmconfig space at this early stage possible? If so
this code could be skipped as we wouln't need it for fam10h (no early
access needed) nor fam15h or newer (mmconfig available).

> +}
> +
> static struct pci_root_info __init *find_pci_root_info(int node, int link)
> {
> struct pci_root_info *info;
> @@ -48,6 +81,9 @@ static struct pci_root_info __init *find_pci_root_info(int node, int link)
> if (info->node == node && info->link == link)
> return info;
>
> + pr_warn("AMD-Bus: WARNING: Failed to find root info for node %#x, link %#x\n",
> + node, link);

As there might be (virtual) systems where this case is valid, I
wouldn't print anything here. We should only print things that exist
or error/warnings.

-Robert

> +
> return NULL;
> }
>
> @@ -118,6 +154,12 @@ static int __init early_fill_mp_bus_info(void)
>
> pr_debug("Found AMD hostbridge at %x:%x.0\n", bus, slot);

2014-04-29 19:14:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Tue, Apr 29, 2014 at 10:16:57AM -0500, Suravee Suthikulanit wrote:
> In the new code, the IO ECS was needed to retrieve the
> AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG (offset 0x180) during the early
> initialization as part of (2) logic. However, this register exists only on
> the newer systems. However, as you mentioned, for (2) we can assume that
> the MCFG exists for most of the systems (family10h and later), and should be
> used instead.
>
> The main purpose of this patch set is mainly to deal with the the node
> information (1). So, we might need to split these all up and handle them
> separately as needed where (2) and (3) will be used as fallback for older
> systems where MCFG does not exist.

So sounds to me like we want to get rid of the whole IO ECS deal
altogether then.

Now, I'm wondering whether we should kill it completely since I don't
think anyone cares about numa node info being correct on K8, or? I'm
specifically turning to our numascale friends who love to have a lot of
nodes. :-)

Daniel, Steffen?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-29 19:16:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] ACPI/PCI: Warn if we have to "guess" host bridge node information

On Mon, Apr 28, 2014 at 03:24:39PM -0600, Myron Stowe wrote:
> >> @@ -489,8 +489,12 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
> >> }
> >>
> >> node = acpi_get_node(device->handle);
> >> - if (node == NUMA_NO_NODE)
> >> + if (node == NUMA_NO_NODE) {
> >> node = x86_pci_root_bus_node(busnum);
> >> + if (node != NUMA_NO_NODE)
> >> + dev_info(&device->dev, FW_BUG "No _PXM; guessing node number %x\n",
> >
> > Hmm, I'm not really convinced this message is user-friendly enough. Can
> > we be more descriptive here please?
> >
>
> How about -
> dev_info(&device->dev, FW_BUG "no _PXM; falling back to node %d from
> hardware (may be inconsistent with ACPI node numbers)\n", node);

Yep, better.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-29 19:29:55

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] PCI: Remove redundant 'quirk_amd_nb_node'

On Mon, Apr 28, 2014 at 8:02 PM, Suravee Suthikulanit
<[email protected]> wrote:

> Sorry for late reply. My concern is that removing the "quirk_amd_nb_node()"
> will affect the value of "numa_node" of the host bridge devices (i.e.
> X:00.[18|19|1a|1b|1c|1d|1e|1f].X). I am not sure if any code is using this
> information. But in theory, these host-bridge devices are not on the same
> node as where the PCI root complex lives (e.g. 0 and 4 from the example
> above).

I doubt anything in the kernel uses the node number for these devices
(00:[18|19|...]). The only place the PCI core uses it is to run a
driver probe method on the same node as the device. Are there even
drivers for these devices?

They aren't PCI-to-PCI bridges, so there are no devices below them
that would be affected by their node numbers.

> If we want the "numa_node" to really representing the actual node, then the
> quirk has to stay for now. We might need to come up with a different logic
> to replace the quirks here, which would automatically determine the actual
> node value for these host-bridge devices.

It sounds like the numa_node for these devices in sysfs is misleading
unless we have a quirk like this. If that's important, I think it
could be fixed by having the BIOS provide _PXM methods for them. But
I don't think it is, and I'm inclined to remove the quirk.

I'm pushing hard to get rid of CPU-specific code like this because
there are generic methods to do it, e.g., _PXM, and the CPU-specific
code is a perennial maintenance headache.

Bjorn

2014-04-29 21:40:30

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Tue, Apr 29, 2014 at 1:14 PM, Borislav Petkov <[email protected]> wrote:
> On Tue, Apr 29, 2014 at 10:16:57AM -0500, Suravee Suthikulanit wrote:
>> In the new code, the IO ECS was needed to retrieve the
>> AMD_NB_F1_MMIO_BASE_LIMIT_HI_REG (offset 0x180) during the early
>> initialization as part of (2) logic. However, this register exists only on
>> the newer systems. However, as you mentioned, for (2) we can assume that
>> the MCFG exists for most of the systems (family10h and later), and should be
>> used instead.
>>
>> The main purpose of this patch set is mainly to deal with the the node
>> information (1). So, we might need to split these all up and handle them
>> separately as needed where (2) and (3) will be used as fallback for older
>> systems where MCFG does not exist.
>
> So sounds to me like we want to get rid of the whole IO ECS deal
> altogether then.
>
> Now, I'm wondering whether we should kill it completely since I don't
> think anyone cares about numa node info being correct on K8, or? I'm
> specifically turning to our numascale friends who love to have a lot of
> nodes. :-)

I think we need to be careful here as there are two unrelated topics
being discussed together. What started this whole thread was the need
for sysfs related numa_node information with respect to PCI devices
(1). Without patch 1, platforms with newer AMD CPUs end up having
'-1' numa_node values for all PCI devices.

IO ECS has no bearing on patch 1, it only comes into play with patch 2
which is concerned with MMIO resource information when MCFG doesn't
exist. For the particular issue I'm trying to get resolved, patch 2
is not needed. However, since we have expended time and effort on
this subject, perhaps we should get this cleaned up while it has our
attention.

I'm all for deleting as much of amd_bus.c as possible due to its
"perennial maintenance headache". The obvious choices seem to be all,
or some combination, of:
o removing IO ECS logic,
o removing IO/MMIO logic (assuming MCFG issues were long enough ago
to no longer be a concern),
o start deprecating amd_bus.c by adding logic to skip if BIOS >= 2015

Myron

>
> Daniel, Steffen?
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

2014-04-30 06:41:57

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 18.04.14 20:53:23, Myron Stowe wrote:
> From: Suravee Suthikulpanit <[email protected]>

> - start = reg & 0xffffff00; /* 39:16 on 31:8*/
> - start <<= 8;
> - reg = read_pci_config(bus, slot, 1, 0x84 + (i << 3));
> + start = (reg & 0xffffff00UL) << 8; /* 39:16 on 31:8 */

I think this is not save anymore with respect to u32/u64 casting. The
original was.

-Robert

2014-04-30 07:00:51

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 29.04.14 15:40:28, Myron Stowe wrote:
> On Tue, Apr 29, 2014 at 1:14 PM, Borislav Petkov <[email protected]> wrote:
> > So sounds to me like we want to get rid of the whole IO ECS deal
> > altogether then.
> >
> > Now, I'm wondering whether we should kill it completely since I don't
> > think anyone cares about numa node info being correct on K8, or? I'm
> > specifically turning to our numascale friends who love to have a lot of
> > nodes. :-)

Maybe I did get you wrong, but IO ECS was introduced with fam10h and
is not related to k8.

> I think we need to be careful here as there are two unrelated topics
> being discussed together. What started this whole thread was the need
> for sysfs related numa_node information with respect to PCI devices
> (1). Without patch 1, platforms with newer AMD CPUs end up having
> '-1' numa_node values for all PCI devices.
>
> IO ECS has no bearing on patch 1, it only comes into play with patch 2
> which is concerned with MMIO resource information when MCFG doesn't
> exist. For the particular issue I'm trying to get resolved, patch 2
> is not needed. However, since we have expended time and effort on
> this subject, perhaps we should get this cleaned up while it has our
> attention.
>
> I'm all for deleting as much of amd_bus.c as possible due to its
> "perennial maintenance headache". The obvious choices seem to be all,
> or some combination, of:
> o removing IO ECS logic,
> o removing IO/MMIO logic (assuming MCFG issues were long enough ago
> to no longer be a concern),
> o start deprecating amd_bus.c by adding logic to skip if BIOS >= 2015

I don't see any reason for big changes actually. Just bind the IO ECS
logic to fam10h (either with fam check or pci device depending on the
implementation, xen's flavor would be pci). This is something stricter
than 'if BIOS >= 2015'. It leaves code as it is which is maintainable.

You implement the new logic for for newer families. No need for one
implementation that fits all.

-Robert

2014-04-30 07:50:45

by Suravee Suthikulpanit

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities



On 04/30/2014 02:00 AM, Robert Richter wrote:
> On 29.04.14 15:40:28, Myron Stowe wrote:
>> On Tue, Apr 29, 2014 at 1:14 PM, Borislav Petkov <[email protected]> wrote:
>>> So sounds to me like we want to get rid of the whole IO ECS deal
>>> altogether then.
>>>
>>> Now, I'm wondering whether we should kill it completely since I don't
>>> think anyone cares about numa node info being correct on K8, or? I'm
>>> specifically turning to our numascale friends who love to have a lot of
>>> nodes. :-)
>
> Maybe I did get you wrong, but IO ECS was introduced with fam10h and
> is not related to k8.
>
>> I think we need to be careful here as there are two unrelated topics
>> being discussed together. What started this whole thread was the need
>> for sysfs related numa_node information with respect to PCI devices
>> (1). Without patch 1, platforms with newer AMD CPUs end up having
>> '-1' numa_node values for all PCI devices.
>>
>> IO ECS has no bearing on patch 1, it only comes into play with patch 2
>> which is concerned with MMIO resource information when MCFG doesn't
>> exist. For the particular issue I'm trying to get resolved, patch 2
>> is not needed. However, since we have expended time and effort on
>> this subject, perhaps we should get this cleaned up while it has our
>> attention.
>>
>> I'm all for deleting as much of amd_bus.c as possible due to its
>> "perennial maintenance headache". The obvious choices seem to be all,
>> or some combination, of:
>> o removing IO ECS logic,
>> o removing IO/MMIO logic (assuming MCFG issues were long enough ago
>> to no longer be a concern),
>> o start deprecating amd_bus.c by adding logic to skip if BIOS >= 2015
>
> I don't see any reason for big changes actually. Just bind the IO ECS
> logic to fam10h (either with fam check or pci device depending on the
> implementation, xen's flavor would be pci). This is something stricter
> than 'if BIOS >= 2015'. It leaves code as it is which is maintainable.
>
> You implement the new logic for for newer families. No need for one
> implementation that fits all.
>
> -Robert
>
Actually, if ECS is needed by IBS, then wouldn't this still be needed
for every family since 10h and later (except family11h).

Suravee

2014-04-30 09:51:37

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On 30.04.14 02:50:21, Suravee Suthikulpanit wrote:
> Actually, if ECS is needed by IBS, then wouldn't this still be needed for
> every family since 10h and later (except family11h).

Yes, but you have stated that pci mmconfig should work on families >
10h. Thus, ECS would work there out-of-the-box (at least after the
system brought pci up, ibs is initialized after pci setup).

-Robert

2014-04-30 23:03:57

by Myron Stowe

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/PCI: Support additional MMIO range capabilities

On Wed, Apr 30, 2014 at 1:00 AM, Robert Richter <[email protected]> wrote:
> On 29.04.14 15:40:28, Myron Stowe wrote:
>> On Tue, Apr 29, 2014 at 1:14 PM, Borislav Petkov <[email protected]> wrote:
>> > So sounds to me like we want to get rid of the whole IO ECS deal
>> > altogether then.
>> >
>> > Now, I'm wondering whether we should kill it completely since I don't
>> > think anyone cares about numa node info being correct on K8, or? I'm
>> > specifically turning to our numascale friends who love to have a lot of
>> > nodes. :-)
>
> Maybe I did get you wrong, but IO ECS was introduced with fam10h and
> is not related to k8.
>
>> I think we need to be careful here as there are two unrelated topics
>> being discussed together. What started this whole thread was the need
>> for sysfs related numa_node information with respect to PCI devices
>> (1). Without patch 1, platforms with newer AMD CPUs end up having
>> '-1' numa_node values for all PCI devices.
>>
>> IO ECS has no bearing on patch 1, it only comes into play with patch 2
>> which is concerned with MMIO resource information when MCFG doesn't
>> exist. For the particular issue I'm trying to get resolved, patch 2
>> is not needed. However, since we have expended time and effort on
>> this subject, perhaps we should get this cleaned up while it has our
>> attention.
>>
>> I'm all for deleting as much of amd_bus.c as possible due to its
>> "perennial maintenance headache". The obvious choices seem to be all,
>> or some combination, of:
>> o removing IO ECS logic,
>> o removing IO/MMIO logic (assuming MCFG issues were long enough ago
>> to no longer be a concern),
>> o start deprecating amd_bus.c by adding logic to skip if BIOS >= 2015
>
> I don't see any reason for big changes actually. Just bind the IO ECS
> logic to fam10h (either with fam check or pci device depending on the
> implementation, xen's flavor would be pci). This is something stricter
> than 'if BIOS >= 2015'. It leaves code as it is which is maintainable.
>
> You implement the new logic for for newer families. No need for one
> implementation that fits all.

I wasn't explicit enough with respect to "deleting as much of
amd_bus.c as possible ..." so I'll try again.

Earlier in this thread - https://lkml.org/lkml/2014/4/28/524 - Bjorn
expressed the desire to "eliminate the need for kernel changes to
support future systems. So far we seem to be concentrating on (1) and
neglecting (2), which means we're always reacting to things that are
broken. ... I think we should try to get rid of amd_bus.c ...".

Then, again in this thread - https://lkml.org/lkml/2014/4/29/360 -
Suravee noted: "... the existing code, which does many things:
1. Setup numa_node information (if PXM doesn't exist)
2. Probe NB for MMIO resources (if MCFG doesn't exist)
3. Probe NB for IO resources
4. Setup IO ECS

So let's walk through these. (1) was put in place to "snoop out, from
the HW" numa_node information. It is "snooped" and cached. Then,
later in booting, if the platform does not supply an ACPI _PXM method
corresponding to the hostbridge *and* we are on a AMD based platform,
the "snooped" numa_node information is retrieved and used. There are
two issues with this approach. First, "The node numbers used by Linux
are logical and there's no reason they need to be identical to
settings in the CPU registers. So if we got some node information in
the normal way (from _PXM, SLIT, SRAT, etc.) and some from your patch,
there's no reason to believe they would be compatible." [1]. Second,
there is a architectural agnostic way to get this information; the
ACPI _PXM method. Looking at numerous 'acpidump' captures, the vast
majority of platform BIOS' are not implementing _PXM methods
corresponding to hostbridges - we need to try and correct this and get
away from this current, error prone, fall-back mechanism (again: see
[1]).

(2) and (3) were put in place for similar reasons but with respect to
MCFG - during its early phases, it was either buggy or BIOS' were not
supplying ACPI MCFG tables. This was long enough ago that I expect we
are well past those issues with new systems today. MCFG, _CBA, and
_CRS are again architectural agnostic ways to get MMCONFIG and
resource (I/O Port, and MMIO) information. With respect to (2) and
(3) we were in a similar situation with Intel based systems and for a
brief period of time had 'intel_bus.c'. We were encountering the same
"perennial maintenance headache" issues with 'intel_bus.c' and finally
with Bjorn's efforts in implementing _CRS as the default for platforms
with BIOS >= 2008 [2] we were able to obviate 'intel_bus.c' completely
- something we should be similarly striving for here with amd_bus.c.

(4) is a little more interesting. It seems to be related to Xen, non
MMIO based ECS enabled platforms, and IBS. Xen has indicated that
they can "decide whether to add the code to the hypervisor instead or
- just like on Intel systems - rely on MCFG being properly
exposed by the firmware." [3]. Again, I expect we are past the early
implementations of platforms that don't have MMIO based ECS enabled.
That leaves IBS which I'm completely unfamiliar with [4].

With the possible exception of (4), there should be ACPI based
architectural agnostic ways to get the information being discussed
here. MCFG, _CBA, and _CRS are mature and provide solutions to (2)
and (3). There are platforms in the field, the vast majority
actually, that still do not implement _PXM methods corresponding to
hostbridges (1). Patch 1 of this series provides a fall-back for that
situation for AMD based platforms only; albeit a solution with
problems itself as expressed above. For (1), the proper solution is
to get platform BIOS' to implement _PXM methods.

As a result, it seems like we should be pursuing an avenue to move us
out of the current "perennial maintenance headache" design that
amd_bus.c presents. As such, I'm going to start working on an
additional patch to this series that only runs 'amd_postcore_init()'
for BIOS dates < 2015.

[1] https://lkml.org/lkml/2014/3/17/390
[2] Kernel commit 7bc5e3f "x86/PCI: use host bridge _CRS info by
default on 2008 and newer machines"
[3] https://lkml.org/lkml/2014/4/29/66
[4] https://lkml.org/lkml/2014/4/30/153 - "ECS would work there
out-of-the-box (at least after the system brought pci up, ibs is
initialized after pci setup)."

Myron


>
> -Robert