2012-05-23 06:34:53

by Yinghai Lu

Subject: [PATCH 00/11] PCI: resource allocation related

From: Yinghai Lu <[email protected]>

This patchset improves resource allocation to find more suitable assignments:
1. Try to assign 64-bit resources above 4G first.
2. Find free space that matches the needed size.
3. Place resources so that free space with larger alignment is left for later users.
4. Treat option ROM BARs as optional resources.

The series can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-res-alloc

It is based on the PCI for-3.6 branch.

Thanks

Yinghai Lu

Yinghai Lu (11):
PCI: Should add children device res to fail list
PCI: Try to allocate mem64 above 4G at first
intel-gtt: Read 64bit for gmar_bus_addr
PCI: Make sure assign same align with large size resource at first
resources: Split out __allocate_resource()
resource: make find_resource could return just fit resource
PCI: Don't allocate small resource in big empty space.
resource: only return range with needed align
PCI: Add is_pci_iov_resource_idx()
PCI: Sort unassigned resources with correct alignment
PCI: Treat ROM resource as optional during assigning.

drivers/char/agp/intel-gtt.c | 14 ++++--
drivers/pci/bus.c | 38 +++++++++++---
drivers/pci/setup-bus.c | 78 ++++++++++++++++++-----------
drivers/pci/setup-res.c | 28 +++++++----
include/linux/ioport.h | 8 +++
include/linux/pci.h | 23 ++++++++
kernel/resource.c | 114 ++++++++++++++++++++++++++++++++++++++---
7 files changed, 243 insertions(+), 60 deletions(-)

--
1.7.7


2012-05-23 06:34:58

by Yinghai Lu

Subject: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

and will fall back to below 4g if it can not find any above 4g.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/bus.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..2429f1f 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -122,13 +122,17 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
int i, ret = -ENOMEM;
struct resource *r;
resource_size_t max = -1;
+ resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;

type_mask |= IORESOURCE_IO | IORESOURCE_MEM;

/* don't allocate too high if the pref mem doesn't support 64bit*/
- if (!(res->flags & IORESOURCE_MEM_64))
+ if (!(res->flags & IORESOURCE_MEM_64)) {
max = PCIBIOS_MAX_MEM_32;
+ bottom = 0;
+ }

+again:
pci_bus_for_each_resource(bus, r, i) {
if (!r)
continue;
@@ -145,12 +149,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,

/* Ok, try it out.. */
ret = allocate_resource(r, res, size,
- r->start ? : min,
+ max(bottom, r->start ? : min),
max, align,
alignf, alignf_data);
if (ret == 0)
- break;
+ return 0;
}
+
+ if (bottom != 0) {
+ bottom = 0;
+ goto again;
+ }
+
return ret;
}

--
1.7.7
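The fallback logic above can be modeled outside the kernel. The following is a minimal sketch, not kernel code: the `struct gap` free list and the `alloc_addr()` helper are hypothetical names. A 64-bit request first restricts the search bottom to just above 4G, and only retries with bottom = 0 when nothing fits up there.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MEM_32 0xffffffffULL

/* Hypothetical free range in a bus window: [start, end], inclusive. */
struct gap {
	uint64_t start, end;
};

/* Two-pass allocation: prefer above 4G for 64-bit BARs, then fall back. */
static uint64_t alloc_addr(const struct gap *gaps, int n, uint64_t size,
			   int is_mem64)
{
	uint64_t bottom = is_mem64 ? MAX_MEM_32 + 1 : 0;
	int i;

again:
	for (i = 0; i < n; i++) {
		/* clamp the candidate start to the current bottom */
		uint64_t start = gaps[i].start > bottom ? gaps[i].start : bottom;

		if (start + size - 1 <= gaps[i].end)
			return start;
	}
	if (bottom != 0) {	/* nothing above 4G: retry unrestricted */
		bottom = 0;
		goto again;
	}
	return (uint64_t)-1;	/* no space anywhere */
}
```

Note that the real code also caps `max` at PCIBIOS_MAX_MEM_32 for 32-bit-only BARs; the sketch models only the bottom side of the constraint.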

2012-05-23 06:35:10

by Yinghai Lu

Subject: [PATCH 08/11] resource: only return range with needed align

Compare the alignment of the range when placed at the head versus the tail
of the free area, and pick the placement with the smaller alignment, leaving
the better-aligned address for future users.

Signed-off-by: Yinghai Lu <[email protected]>
---
kernel/resource.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index b4dae55..8277090 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -506,8 +506,20 @@ out:
}

if (!ret) {
- new->start = avail_start;
+ /* compare which one have max order */
+ new->start = round_down(avail_start + avail_size - size,
+ constraint->align);
+ new->end = avail_start + avail_size - 1;
+ new->start = constraint->alignf(constraint->alignf_data, new,
+ size, constraint->align);
new->end = new->start + size - 1;
+
+ if (new->start < avail_start ||
+ new->end > (avail_start + avail_size - 1) ||
+ __ffs64(new->start) >= __ffs64(avail_start)) {
+ new->start = avail_start;
+ new->end = new->start + size - 1;
+ }
}
return ret;
}
--
1.7.7
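The head-versus-tail decision can be illustrated with a standalone helper. This is a sketch under the assumption that @align is a power of two; `align_order()` mirrors `__ffs64()` (index of the lowest set bit), and `place()` is an illustrative name, not the kernel function. The tail placement is kept only when its start is *less* aligned than the head, so the better-aligned address survives for a future, more demanding request.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment order of an address: index of its lowest set bit,
 * like the kernel's __ffs64(); treat 0 as maximally aligned. */
static unsigned int align_order(uint64_t addr)
{
	return addr ? (unsigned int)__builtin_ctzll(addr) : 64;
}

/* Pick a start for @size inside [avail_start, avail_start + avail_size):
 * try the tail rounded down to @align (a power of two), but fall back to
 * the head when the tail start would be the better-aligned address. */
static uint64_t place(uint64_t avail_start, uint64_t avail_size,
		      uint64_t size, uint64_t align)
{
	uint64_t tail = (avail_start + avail_size - size) & ~(align - 1);

	if (tail < avail_start ||
	    tail + size > avail_start + avail_size ||
	    align_order(tail) >= align_order(avail_start))
		return avail_start;
	return tail;
}
```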

2012-05-23 06:35:06

by Yinghai Lu

Subject: [PATCH 07/11] PCI: Don't allocate small resource in big empty space.

Use the updated find_resource() to return a resource range that closely
matches the requested size, instead of using the head of a bigger free range.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/bus.c | 22 ++++++++++++++++++----
drivers/pci/setup-bus.c | 12 ++++++++----
drivers/pci/setup-res.c | 28 ++++++++++++++++++----------
include/linux/ioport.h | 8 ++++++++
include/linux/pci.h | 10 ++++++++++
kernel/resource.c | 22 +++++++++++++++++++---
6 files changed, 81 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 2429f1f..a7ba102 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -110,14 +110,14 @@ void pci_bus_remove_resources(struct pci_bus *bus)
* for a specific device resource.
*/
int
-pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
+pci_bus_alloc_resource_fit(struct pci_bus *bus, struct resource *res,
resource_size_t size, resource_size_t align,
resource_size_t min, unsigned int type_mask,
resource_size_t (*alignf)(void *,
const struct resource *,
resource_size_t,
resource_size_t),
- void *alignf_data)
+ void *alignf_data, bool fit)
{
int i, ret = -ENOMEM;
struct resource *r;
@@ -148,10 +148,10 @@ again:
continue;

/* Ok, try it out.. */
- ret = allocate_resource(r, res, size,
+ ret = allocate_resource_fit(r, res, size,
max(bottom, r->start ? : min),
max, align,
- alignf, alignf_data);
+ alignf, alignf_data, fit);
if (ret == 0)
return 0;
}
@@ -164,6 +164,20 @@ again:
return ret;
}

+int
+pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
+ resource_size_t size, resource_size_t align,
+ resource_size_t min, unsigned int type_mask,
+ resource_size_t (*alignf)(void *,
+ const struct resource *,
+ resource_size_t,
+ resource_size_t),
+ void *alignf_data)
+{
+ return pci_bus_alloc_resource_fit(bus, res, size, align, min, type_mask,
+ alignf, alignf_data, false);
+}
+
/**
* pci_bus_add_device - add a single device
* @dev: device to add
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0568f29..4d6823f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -275,7 +275,7 @@ out:
* requests that could not satisfied to the failed_list.
*/
static void assign_requested_resources_sorted(struct list_head *head,
- struct list_head *fail_head)
+ struct list_head *fail_head, bool fit)
{
struct resource *res;
struct pci_dev_resource *dev_res;
@@ -285,7 +285,7 @@ static void assign_requested_resources_sorted(struct list_head *head,
res = dev_res->res;
idx = res - &dev_res->dev->resource[0];
if (resource_size(res) &&
- pci_assign_resource(dev_res->dev, idx)) {
+ pci_assign_resource_fit(dev_res->dev, idx, fit)) {
if (fail_head) {
/*
* if the failed res is for ROM BAR, and it will
@@ -320,6 +320,7 @@ static void __assign_resources_sorted(struct list_head *head,
LIST_HEAD(local_fail_head);
struct pci_dev_resource *save_res;
struct pci_dev_resource *dev_res;
+ bool fit = true;

/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -339,7 +340,7 @@ static void __assign_resources_sorted(struct list_head *head,
dev_res->res);

/* Try updated head list with add_size added */
- assign_requested_resources_sorted(head, &local_fail_head);
+ assign_requested_resources_sorted(head, &local_fail_head, fit);

/* all assigned with add_size ? */
if (list_empty(&local_fail_head)) {
@@ -366,9 +367,12 @@ static void __assign_resources_sorted(struct list_head *head,
}
free_list(&save_head);

+ /* will need to expand later, so not use fit */
+ fit = false;
+
requested_and_reassign:
/* Satisfy the must-have resource requests */
- assign_requested_resources_sorted(head, fail_head);
+ assign_requested_resources_sorted(head, fail_head, fit);

/* Try to satisfy any additional optional resource
requests */
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index eea85da..7010ad9 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -128,7 +128,8 @@ void pci_disable_bridge_window(struct pci_dev *dev)
}

static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
- int resno, resource_size_t size, resource_size_t align)
+ int resno, resource_size_t size, resource_size_t align,
+ bool fit)
{
struct resource *res = dev->resource + resno;
resource_size_t min;
@@ -137,9 +138,9 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;

/* First, try exact prefetching match.. */
- ret = pci_bus_alloc_resource(bus, res, size, align, min,
+ ret = pci_bus_alloc_resource_fit(bus, res, size, align, min,
IORESOURCE_PREFETCH,
- pcibios_align_resource, dev);
+ pcibios_align_resource, dev, fit);

if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
/*
@@ -148,8 +149,8 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
* But a prefetching area can handle a non-prefetching
* window (it will just not perform as well).
*/
- ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
- pcibios_align_resource, dev);
+ ret = pci_bus_alloc_resource_fit(bus, res, size, align, min, 0,
+ pcibios_align_resource, dev, fit);
}
return ret;
}
@@ -206,7 +207,8 @@ static int pci_revert_fw_address(struct resource *res, struct pci_dev *dev,
return ret;
}

-static int _pci_assign_resource(struct pci_dev *dev, int resno, int size, resource_size_t min_align)
+static int _pci_assign_resource(struct pci_dev *dev, int resno, int size,
+ resource_size_t min_align, bool fit)
{
struct resource *res = dev->resource + resno;
struct pci_bus *bus;
@@ -214,7 +216,8 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno, int size, resour
char *type;

bus = dev->bus;
- while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
+ while ((ret = __pci_assign_resource(bus, dev, resno, size,
+ min_align, fit))) {
if (!bus->parent || !bus->self->transparent)
break;
bus = bus->parent;
@@ -253,7 +256,7 @@ int pci_reassign_resource(struct pci_dev *dev, int resno, resource_size_t addsiz

/* already aligned with min_align */
new_size = resource_size(res) + addsize;
- ret = _pci_assign_resource(dev, resno, new_size, min_align);
+ ret = _pci_assign_resource(dev, resno, new_size, min_align, false);
if (!ret) {
res->flags &= ~IORESOURCE_STARTALIGN;
dev_info(&dev->dev, "BAR %d: reassigned %pR\n", resno, res);
@@ -263,7 +266,7 @@ int pci_reassign_resource(struct pci_dev *dev, int resno, resource_size_t addsiz
return ret;
}

-int pci_assign_resource(struct pci_dev *dev, int resno)
+int pci_assign_resource_fit(struct pci_dev *dev, int resno, bool fit)
{
struct resource *res = dev->resource + resno;
resource_size_t align, size;
@@ -279,7 +282,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)

bus = dev->bus;
size = resource_size(res);
- ret = _pci_assign_resource(dev, resno, size, align);
+ ret = _pci_assign_resource(dev, resno, size, align, fit);

/*
* If we failed to assign anything, let's try the address
@@ -298,6 +301,11 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
return ret;
}

+int pci_assign_resource(struct pci_dev *dev, int resno)
+{
+ return pci_assign_resource_fit(dev, resno, false);
+}
+
int pci_enable_resources(struct pci_dev *dev, int mask)
{
u16 cmd, old_cmd;
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 589e0e7..255852e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -148,6 +148,14 @@ extern struct resource *insert_resource_conflict(struct resource *parent, struct
extern int insert_resource(struct resource *parent, struct resource *new);
extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
extern void arch_remove_reservations(struct resource *avail);
+int allocate_resource_fit(struct resource *root, struct resource *new,
+ resource_size_t size, resource_size_t min,
+ resource_size_t max, resource_size_t align,
+ resource_size_t (*alignf)(void *,
+ const struct resource *,
+ resource_size_t,
+ resource_size_t),
+ void *alignf_data, bool fit);
extern int allocate_resource(struct resource *root, struct resource *new,
resource_size_t size, resource_size_t min,
resource_size_t max, resource_size_t align,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a0e2d7f..c0704a0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -827,6 +827,7 @@ int __pci_reset_function_locked(struct pci_dev *dev);
int pci_reset_function(struct pci_dev *dev);
void pci_update_resource(struct pci_dev *dev, int resno);
int __must_check pci_assign_resource(struct pci_dev *dev, int i);
+int __must_check pci_assign_resource_fit(struct pci_dev *dev, int i, bool fit);
int __must_check pci_reassign_resource(struct pci_dev *dev, int i, resource_size_t add_size, resource_size_t align);
int pci_select_bars(struct pci_dev *dev, unsigned long flags);

@@ -943,6 +944,15 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
resource_size_t,
resource_size_t),
void *alignf_data);
+int __must_check pci_bus_alloc_resource_fit(struct pci_bus *bus,
+ struct resource *res, resource_size_t size,
+ resource_size_t align, resource_size_t min,
+ unsigned int type_mask,
+ resource_size_t (*alignf)(void *,
+ const struct resource *,
+ resource_size_t,
+ resource_size_t),
+ void *alignf_data, bool fit);
void pci_enable_bridges(struct pci_bus *bus);

/* Proper probing supporting hot-pluggable devices */
diff --git a/kernel/resource.c b/kernel/resource.c
index 45ab24d..b4dae55 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -580,7 +580,7 @@ static int __allocate_resource(struct resource *root, struct resource *new,
const struct resource *,
resource_size_t,
resource_size_t),
- void *alignf_data, bool lock)
+ void *alignf_data, bool lock, bool fit)
{
int err;
struct resource_constraint constraint;
@@ -602,13 +602,28 @@ static int __allocate_resource(struct resource *root, struct resource *new,

if (lock)
write_lock(&resource_lock);
- err = find_resource(root, new, size, &constraint, false);
+ err = find_resource(root, new, size, &constraint, fit);
if (err >= 0 && __request_resource(root, new))
err = -EBUSY;
if (lock)
write_unlock(&resource_lock);
return err;
}
+int allocate_resource_fit(struct resource *root, struct resource *new,
+ resource_size_t size, resource_size_t min,
+ resource_size_t max, resource_size_t align,
+ resource_size_t (*alignf)(void *,
+ const struct resource *,
+ resource_size_t,
+ resource_size_t),
+ void *alignf_data, bool fit)
+{
+ bool lock = true;
+
+ return __allocate_resource(root, new, size, min, max, align,
+ alignf, alignf_data, lock, fit);
+}
+
int allocate_resource(struct resource *root, struct resource *new,
resource_size_t size, resource_size_t min,
resource_size_t max, resource_size_t align,
@@ -619,9 +634,10 @@ int allocate_resource(struct resource *root, struct resource *new,
void *alignf_data)
{
bool lock = true;
+ bool fit = false;

return __allocate_resource(root, new, size, min, max, align,
- alignf, alignf_data, lock);
+ alignf, alignf_data, lock, fit);
}

EXPORT_SYMBOL(allocate_resource);
--
1.7.7

2012-05-23 06:35:49

by Yinghai Lu

Subject: [PATCH 10/11] PCI: Sort unassigned resources with correct alignment

For SIZEALIGN-type resources, we need to add back add_size from the optional
resource list during __dev_sort_resources(); otherwise those optional
resources will get skipped.

The SRIOV BAR is a special case: its size is always re-read from the BAR.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/setup-bus.c | 33 ++++++++++++++++++++++++++-------
1 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 99bc728..ed32864 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -120,8 +120,25 @@ static resource_size_t get_res_add_size(struct list_head *head,
return 0;
}

+static resource_size_t __pci_resource_alignment(
+ struct pci_dev *dev,
+ struct resource *r,
+ struct list_head *realloc_head)
+{
+ resource_size_t r_align, add_size = 0;
+
+ if ((r->flags & IORESOURCE_SIZEALIGN) && realloc_head)
+ add_size = get_res_add_size(realloc_head, r);
+ r->end += add_size;
+ r_align = pci_resource_alignment(dev, r);
+ r->end -= add_size;
+
+ return r_align;
+}
/* Sort resources by alignment */
-static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
+static void pdev_sort_resources(struct pci_dev *dev,
+ struct list_head *realloc_head,
+ struct list_head *head)
{
int i;

@@ -139,7 +156,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
if (!(r->flags) || r->parent)
continue;

- r_align = pci_resource_alignment(dev, r);
+ r_align = __pci_resource_alignment(dev, r, realloc_head);
if (!r_align) {
dev_warn(&dev->dev, "BAR %d: %pR has bogus alignment\n",
i, r);
@@ -159,8 +176,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
list_for_each_entry(dev_res, head, list) {
resource_size_t align;

- align = pci_resource_alignment(dev_res->dev,
- dev_res->res);
+ align = __pci_resource_alignment(dev_res->dev,
+ dev_res->res,
+ realloc_head);

if (r_align > align ||
(r_align == align &&
@@ -175,6 +193,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
}

static void __dev_sort_resources(struct pci_dev *dev,
+ struct list_head *realloc_head,
struct list_head *head)
{
u16 class = dev->class >> 8;
@@ -191,7 +210,7 @@ static void __dev_sort_resources(struct pci_dev *dev,
return;
}

- pdev_sort_resources(dev, head);
+ pdev_sort_resources(dev, realloc_head, head);
}

static inline void reset_resource(struct resource *res)
@@ -387,7 +406,7 @@ static void pdev_assign_resources_sorted(struct pci_dev *dev,
{
LIST_HEAD(head);

- __dev_sort_resources(dev, &head);
+ __dev_sort_resources(dev, add_head, &head);
__assign_resources_sorted(&head, add_head, fail_head);

}
@@ -400,7 +419,7 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
LIST_HEAD(head);

list_for_each_entry(dev, &bus->devices, bus_list)
- __dev_sort_resources(dev, &head);
+ __dev_sort_resources(dev, realloc_head, &head);

__assign_resources_sorted(&head, realloc_head, fail_head);
}
--
1.7.7
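The grow-then-shrink trick in __pci_resource_alignment() can be shown in isolation. A sketch with illustrative names (the helper below is not the kernel function), assuming a SIZEALIGN resource whose alignment equals its size:

```c
#include <assert.h>
#include <stdint.h>

struct res {
	uint64_t start, end;	/* inclusive range */
};

/* Temporarily extend the resource by its optional add_size so the
 * alignment (== size for SIZEALIGN resources) reflects the optional
 * part too, then restore the original end before returning. */
static uint64_t sizealign_with_optional(struct res *r, uint64_t add_size)
{
	uint64_t align;

	r->end += add_size;
	align = r->end - r->start + 1;
	r->end -= add_size;
	return align;
}
```

With this, a BAR sorts by the alignment it will need after expansion, while its recorded size stays at the must-have value.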

2012-05-23 06:35:46

by Yinghai Lu

Subject: [PATCH 11/11] PCI: Treat ROM resource as optional during assigning.

Try to allocate ROM resources together with the requested ones; if they
cannot be assigned, fall back to the requested resources only and just skip
the ROM resources.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/setup-bus.c | 21 +++++++--------------
include/linux/pci.h | 5 +++++
2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ed32864..41c08d6 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -305,18 +305,10 @@ static void assign_requested_resources_sorted(struct list_head *head,
idx = res - &dev_res->dev->resource[0];
if (resource_size(res) &&
pci_assign_resource_fit(dev_res->dev, idx, fit)) {
- if (fail_head) {
- /*
- * if the failed res is for ROM BAR, and it will
- * be enabled later, don't add it to the list
- */
- if (!((idx == PCI_ROM_RESOURCE) &&
- (!(res->flags & IORESOURCE_ROM_ENABLE))))
- add_to_list(fail_head,
- dev_res->dev, res,
- 0 /* dont care */,
- 0 /* dont care */);
- }
+ if (fail_head)
+ add_to_list(fail_head, dev_res->dev, res,
+ 0 /* dont care */,
+ 0 /* dont care */);
reset_resource(res);
}
}
@@ -833,8 +825,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
continue;
r_size = resource_size(r);

- /* put SRIOV requested res to the optional list */
- if (realloc_head && is_pci_iov_resource_idx(i)) {
+ /* put SRIOV/ROM requested res to the optional list */
+ if (realloc_head && (is_pci_iov_resource_idx(i) ||
+ is_pci_rom_resource_idx(i))) {
r->end = r->start - 1;
add_to_list(realloc_head, dev, r, r_size, 0/* dont' care */);
children_add_size += r_size;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0d254d9..08c081c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -122,6 +122,11 @@ static inline bool is_pci_iov_resource_idx(int i)
return false;
}

+static inline bool is_pci_rom_resource_idx(int i)
+{
+ return i == PCI_ROM_RESOURCE;
+}
+
typedef int __bitwise pci_power_t;

#define PCI_D0 ((pci_power_t __force) 0)
--
1.7.7

2012-05-23 06:36:16

by Yinghai Lu

Subject: [PATCH 09/11] PCI: Add is_pci_iov_resource_idx()

This lets us remove one #ifdef in setup-bus.c and share the code that was
inside that #ifdef block.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/setup-bus.c | 7 +++----
include/linux/pci.h | 8 ++++++++
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4d6823f..99bc728 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -813,16 +813,15 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
if (r->parent || (r->flags & mask) != type)
continue;
r_size = resource_size(r);
-#ifdef CONFIG_PCI_IOV
+
/* put SRIOV requested res to the optional list */
- if (realloc_head && i >= PCI_IOV_RESOURCES &&
- i <= PCI_IOV_RESOURCE_END) {
+ if (realloc_head && is_pci_iov_resource_idx(i)) {
r->end = r->start - 1;
add_to_list(realloc_head, dev, r, r_size, 0/* dont' care */);
children_add_size += r_size;
continue;
}
-#endif
+
/* For bridges size != alignment */
align = pci_resource_alignment(dev, r);
order = __ffs(align) - 20;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c0704a0..0d254d9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -114,6 +114,14 @@ enum {
DEVICE_COUNT_RESOURCE = PCI_NUM_RESOURCES,
};

+static inline bool is_pci_iov_resource_idx(int i)
+{
+#ifdef CONFIG_PCI_IOV
+ return i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END;
+#endif
+ return false;
+}
+
typedef int __bitwise pci_power_t;

#define PCI_D0 ((pci_power_t __force) 0)
--
1.7.7

2012-05-23 06:36:47

by Yinghai Lu

Subject: [PATCH 06/11] resource: make find_resource could return just fit resource

Find all suitable empty slots and pick the one that just fits, sparing the
big slots for requests that need them later.

Signed-off-by: Yinghai Lu <[email protected]>
---
kernel/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 41d7050..45ab24d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -435,7 +435,7 @@ static int __find_resource(struct resource *root, struct resource *old,
alloc.end = alloc.start + size - 1;
if (resource_contains(&avail, &alloc)) {
new->start = alloc.start;
- new->end = alloc.end;
+ new->end = !old ? avail.end : alloc.end;
return 0;
}
}
@@ -450,14 +450,66 @@ next: if (!this || this->end == root->end)
return -EBUSY;
}

+struct avail_resource {
+ struct list_head list;
+ struct resource res;
+};
/*
* Find empty slot in the resource tree given range and alignment.
*/
static int find_resource(struct resource *root, struct resource *new,
resource_size_t size,
- struct resource_constraint *constraint)
+ struct resource_constraint *constraint, bool fit)
{
- return __find_resource(root, NULL, new, size, constraint);
+ int ret = -1;
+ LIST_HEAD(head);
+ struct avail_resource *avail, *tmp;
+ resource_size_t avail_start = 0, avail_size = -1ULL;
+
+ if (!fit) {
+ ret = __find_resource(root, NULL, new, size, constraint);
+ if (!ret)
+ new->end = new->start + size - 1;
+ return ret;
+ }
+
+again:
+ /* find all suitable ones */
+ avail = kzalloc(sizeof(*avail), GFP_KERNEL);
+ if (!avail)
+ goto out;
+
+ avail->res.start = new->start;
+ avail->res.end = new->end;
+ avail->res.flags = new->flags;
+ ret = __find_resource(root, NULL, &avail->res, size, constraint);
+ if (ret || __request_resource(root, &avail->res)) {
+ ret = -EBUSY;
+ kfree(avail);
+ goto out;
+ }
+ /* add to the list */
+ list_add(&avail->list, &head);
+ goto again;
+
+out:
+ /* pick up the smallest one and delete the list */
+ list_for_each_entry_safe(avail, tmp, &head, list) {
+ if (resource_size(&avail->res) < avail_size) {
+ avail_size = resource_size(&avail->res);
+ avail_start = avail->res.start;
+ ret = 0;
+ }
+ list_del(&avail->list);
+ __release_resource(&avail->res);
+ kfree(avail);
+ }
+
+ if (!ret) {
+ new->start = avail_start;
+ new->end = new->start + size - 1;
+ }
+ return ret;
}

/**
@@ -550,7 +602,7 @@ static int __allocate_resource(struct resource *root, struct resource *new,

if (lock)
write_lock(&resource_lock);
- err = find_resource(root, new, size, &constraint);
+ err = find_resource(root, new, size, &constraint, false);
if (err >= 0 && __request_resource(root, new))
err = -EBUSY;
if (lock)
--
1.7.7
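The effect of the fit mode can be modeled without the resource tree. The kernel enumerates candidates by repeatedly calling __find_resource() and temporarily requesting each hit; over a flat list of free gaps, the same policy reduces to a classic best-fit scan. The `best_fit()` helper below is a hypothetical illustration, not a kernel API:

```c
#include <assert.h>
#include <stdint.h>

struct gap {
	uint64_t start, end;	/* inclusive free range */
};

/* Return the start of the smallest gap that can hold @size,
 * or (uint64_t)-1 when nothing fits; larger gaps are left free
 * for bigger requests that may come later. */
static uint64_t best_fit(const struct gap *gaps, int n, uint64_t size)
{
	uint64_t best_start = (uint64_t)-1, best_size = (uint64_t)-1;
	int i;

	for (i = 0; i < n; i++) {
		uint64_t gsize = gaps[i].end - gaps[i].start + 1;

		if (gsize >= size && gsize < best_size) {
			best_size = gsize;
			best_start = gaps[i].start;
		}
	}
	return best_start;
}
```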

2012-05-23 06:34:56

by Yinghai Lu

Subject: [PATCH 01/11] PCI: Should add children device res to fail list

We now stop retrying based on the try count, so we don't need the root-bus
check as a stop sign anymore.

Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/setup-bus.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 192172c..64d478f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -283,7 +283,7 @@ static void assign_requested_resources_sorted(struct list_head *head,
idx = res - &dev_res->dev->resource[0];
if (resource_size(res) &&
pci_assign_resource(dev_res->dev, idx)) {
- if (fail_head && !pci_is_root_bus(dev_res->dev->bus)) {
+ if (fail_head) {
/*
* if the failed res is for ROM BAR, and it will
* be enabled later, don't add it to the list
--
1.7.7

2012-05-23 06:37:20

by Yinghai Lu

Subject: [PATCH 04/11] PCI: Make sure assign same align with large size resource at first

When sorting resources of equal alignment, put the one with the larger size
before the smaller one.


Signed-off-by: Yinghai Lu <[email protected]>
---
drivers/pci/setup-bus.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 64d478f..0568f29 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -128,7 +128,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
for (i = 0; i < PCI_NUM_RESOURCES; i++) {
struct resource *r;
struct pci_dev_resource *dev_res, *tmp;
- resource_size_t r_align;
+ resource_size_t r_align, r_size;
struct list_head *n;

r = &dev->resource[i];
@@ -145,6 +145,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
i, r);
continue;
}
+ r_size = resource_size(r);

tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
if (!tmp)
@@ -161,7 +162,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
align = pci_resource_alignment(dev_res->dev,
dev_res->res);

- if (r_align > align) {
+ if (r_align > align ||
+ (r_align == align &&
+ r_size > resource_size(dev_res->res))) {
n = &dev_res->list;
break;
}
--
1.7.7
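The resulting ordering is easiest to see as a plain comparator. A sketch with a hypothetical flattened `struct bar` (the kernel sorts a linked list in place rather than calling qsort()):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened view of a BAR for sorting purposes. */
struct bar {
	uint64_t align, size;
};

/* Larger alignment first; for equal alignment, larger size first,
 * so equally aligned BARs are packed without avoidable holes. */
static int bar_cmp(const void *a, const void *b)
{
	const struct bar *x = a, *y = b;

	if (x->align != y->align)
		return x->align > y->align ? -1 : 1;
	if (x->size != y->size)
		return x->size > y->size ? -1 : 1;
	return 0;
}
```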

2012-05-23 06:37:18

by Yinghai Lu

Subject: [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr

That bar could be 64bit pref mem.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: David Airlie <[email protected]>
---
drivers/char/agp/intel-gtt.c | 14 ++++++++++----
1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 7f025fb..77e150e 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -770,16 +770,22 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
static bool intel_enable_gtt(void)
{
u32 gma_addr;
+ u32 addr_hi = 0;
u8 __iomem *reg;
+ int pos;

if (INTEL_GTT_GEN <= 2)
- pci_read_config_dword(intel_private.pcidev, I810_GMADDR,
- &gma_addr);
+ pos = I810_GMADDR;
else
- pci_read_config_dword(intel_private.pcidev, I915_GMADDR,
- &gma_addr);
+ pos = I915_GMADDR;
+
+ pci_read_config_dword(intel_private.pcidev, pos, &gma_addr);
+
+ if (gma_addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
+ pci_read_config_dword(intel_private.pcidev, pos + 4, &addr_hi);

intel_private.gma_bus_addr = (gma_addr & PCI_BASE_ADDRESS_MEM_MASK);
+ intel_private.gma_bus_addr |= (u64)addr_hi << 32;

if (INTEL_GTT_GEN >= 6)
return true;
--
1.7.7
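The BAR decoding the patch performs can be sketched as a pure function. `decode_gmadr()` is an illustrative name (the constants match the values the kernel uses): the low dword's type bits say whether a high dword follows, and if so the bus address gains bits 63:32 from it.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_BASE_ADDRESS_MEM_TYPE_64	0x04
#define PCI_BASE_ADDRESS_MEM_MASK	(~0x0fULL)

/* Combine the low (and, for a 64-bit BAR, high) config dwords
 * into the full bus address. */
static uint64_t decode_gmadr(uint32_t lo, uint32_t hi)
{
	uint64_t addr = lo & PCI_BASE_ADDRESS_MEM_MASK;

	if (lo & PCI_BASE_ADDRESS_MEM_TYPE_64)
		addr |= (uint64_t)hi << 32;
	return addr;
}
```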

2012-05-23 06:38:17

by Yinghai Lu

Subject: [PATCH 05/11] resources: Split out __allocate_resource()

It takes a bool lock parameter, so we can call it from other functions that
already hold the resource lock.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Andrew Morton <[email protected]>
---
kernel/resource.c | 26 +++++++++++++++++++++-----
1 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 51ade23..41d7050 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -521,14 +521,14 @@ out:
* @alignf: alignment function, optional, called if not NULL
* @alignf_data: arbitrary data to pass to the @alignf function
*/
-int allocate_resource(struct resource *root, struct resource *new,
+static int __allocate_resource(struct resource *root, struct resource *new,
resource_size_t size, resource_size_t min,
resource_size_t max, resource_size_t align,
resource_size_t (*alignf)(void *,
const struct resource *,
resource_size_t,
resource_size_t),
- void *alignf_data)
+ void *alignf_data, bool lock)
{
int err;
struct resource_constraint constraint;
@@ -542,19 +542,35 @@ int allocate_resource(struct resource *root, struct resource *new,
constraint.alignf = alignf;
constraint.alignf_data = alignf_data;

- if ( new->parent ) {
+ if (new->parent && lock) {
/* resource is already allocated, try reallocating with
the new constraints */
return reallocate_resource(root, new, size, &constraint);
}

- write_lock(&resource_lock);
+ if (lock)
+ write_lock(&resource_lock);
err = find_resource(root, new, size, &constraint);
if (err >= 0 && __request_resource(root, new))
err = -EBUSY;
- write_unlock(&resource_lock);
+ if (lock)
+ write_unlock(&resource_lock);
return err;
}
+int allocate_resource(struct resource *root, struct resource *new,
+ resource_size_t size, resource_size_t min,
+ resource_size_t max, resource_size_t align,
+ resource_size_t (*alignf)(void *,
+ const struct resource *,
+ resource_size_t,
+ resource_size_t),
+ void *alignf_data)
+{
+ bool lock = true;
+
+ return __allocate_resource(root, new, size, min, max, align,
+ alignf, alignf_data, lock);
+}

EXPORT_SYMBOL(allocate_resource);

--
1.7.7

2012-05-23 07:21:54

by Dave Airlie

Subject: Re: [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr

On Wed, May 23, 2012 at 7:34 AM, Yinghai Lu <[email protected]> wrote:
> That bar could be 64bit pref mem.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: David Airlie <[email protected]>

Adding Daniel Vetter.

Dave.
> ---
>  drivers/char/agp/intel-gtt.c |   14 ++++++++++----
>  1 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 7f025fb..77e150e 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -770,16 +770,22 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
>  static bool intel_enable_gtt(void)
>  {
>         u32 gma_addr;
> +       u32 addr_hi = 0;
>         u8 __iomem *reg;
> +       int pos;
>
>         if (INTEL_GTT_GEN <= 2)
> -               pci_read_config_dword(intel_private.pcidev, I810_GMADDR,
> -                                     &gma_addr);
> +               pos = I810_GMADDR;
>         else
> -               pci_read_config_dword(intel_private.pcidev, I915_GMADDR,
> -                                     &gma_addr);
> +               pos = I915_GMADDR;
> +
> +       pci_read_config_dword(intel_private.pcidev, pos, &gma_addr);
> +
> +       if (gma_addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
> +               pci_read_config_dword(intel_private.pcidev, pos + 4, &addr_hi);
>
>         intel_private.gma_bus_addr = (gma_addr & PCI_BASE_ADDRESS_MEM_MASK);
> +       intel_private.gma_bus_addr |= (u64)addr_hi << 32;
>
>         if (INTEL_GTT_GEN >= 6)
>                 return true;
> --
> 1.7.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

2012-05-23 07:43:28

by Daniel Vetter

Subject: Re: [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr

On Wed, May 23, 2012 at 08:21:51AM +0100, Dave Airlie wrote:
> On Wed, May 23, 2012 at 7:34 AM, Yinghai Lu <[email protected]> wrote:
> > That bar could be 64bit pref mem.
> >
> > Signed-off-by: Yinghai Lu <[email protected]>
> > Cc: David Airlie <[email protected]>
>
> Adding Daniel Vetter.

Reviewed-by: Daniel Vetter <[email protected]>
--
Daniel Vetter
Mail: [email protected]
Mobile: +41 (0)79 365 57 48

2012-05-23 15:58:08

by Linus Torvalds

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <[email protected]> wrote:
> and will fall back to below 4g if it can not find any above 4g.

Has this been tested on 32-bit machines without PAE? There might be
things that just happen to work because their allocations were always
done bottom-up.

Or do we have something else that protects us from the "oops, we can't
actually *map* those pages"?

Linus

2012-05-23 17:30:15

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <[email protected]> wrote:
>> and will fall back to below 4g if it can not find any above 4g.
>
> Has this been tested on 32-bit machines without PAE? There might be
> things that just happen to work because their allocations were always
> done bottom-up.

Good point. That problem should be addressed before this patch.

>
> Or do we have something else that protects us from the "oops, we can't
> actually *map* those pages"?

Steven tested it on his setup.

I tested some InfiniBand cards.

Thanks

Yinghai

2012-05-23 18:40:48

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Wed, May 23, 2012 at 10:30 AM, Yinghai Lu <[email protected]> wrote:
> On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <[email protected]> wrote:
>>> and will fall back to below 4g if it can not find any above 4g.
>>
>> Has this been tested on 32-bit machines without PAE? There might be
>> things that just happen to work because their allocations were always
>> done bottom-up.
>
> Good point. that problem should be addressed at first before this patch.

Just checked the code for 32-bit machines without PAE.

When X86_PAE is not set, phys_addr_t (aka resource_size_t) is 32-bit,
so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
    resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
will leave bottom at 0.
Also, in arch/x86/kernel/setup.c::setup_arch(),
    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
will set iomem_resource.end to 0xffffffff.

When X86_PAE is set but the CPU does not support PAE,
phys_addr_t aka resource_size_t will be 32bit,
so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
    resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
will set bottom to 4G, but in arch/x86/kernel/setup.c::setup_arch()
    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
will set iomem_resource.end to 0xffffffff, because x86_phys_bits is 32 when PAE
is not detected in arch/x86/kernel/cpu/common.c::get_cpu_cap().
That means the first try will fail, so it will go to the second try with
bottom set to 0.

So both cases are safe with this patch.

Thanks

Yinghai

2012-05-25 04:36:55

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Wed, May 23, 2012 at 11:40:46AM -0700, Yinghai Lu wrote:
> On Wed, May 23, 2012 at 10:30 AM, Yinghai Lu <[email protected]> wrote:
> > On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
> > <[email protected]> wrote:
> >> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <[email protected]> wrote:
> >>> and will fall back to below 4g if it can not find any above 4g.
> >>
> >> Has this been tested on 32-bit machines without PAE? There might be
> >> things that just happen to work because their allocations were always
> >> done bottom-up.
> >
> > Good point. that problem should be addressed at first before this patch.
>
> Just checked code for 32bit machines without PAE.
>
> when X86_PAE is not set, phys_addr_t aka resource_size_t will be 32bit.
> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
> will have bottom to 0.
> resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
> also in arch/x86/kernel/setup.c::setup_arch()
> iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
> will have iomem_resource.end to 0xffffffff
>
> when X86_PAE is set, but CPU does not support PAE.
> phys_addr_t aka resource_size_t will be 32bit.

I think you meant phys_addr_t and resource_size_t will be *64* bit
when X86_PAE is set. Obvious to you, but quite confusing to non-x86
experts like me :)

> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
> will have bottom to 4g.
> resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
> but
> in arch/x86/kernel/setup.c::setup_arch()
> iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
> will have iomem_resource.end to 0xffffffff, because x86_phys_bits is 32 when PAE
> is not detected in arch/x86/kernel/cpu/common.c::get_cpu_cap.
> that mean first try will fail, so it will go to second try with bottom to 0.
>
> so both case are safe with this patch.

I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
overflowing to zero -- that means the reader has to know what the
value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
ways if we changed it.

What do you think of a patch like the following? It makes it
explicit that we can only allocate space the CPU can address.

commit feded2ae21d6160292726ccd5128080d42395be4
Author: Bjorn Helgaas <[email protected]>
Date: Thu May 24 22:15:26 2012 -0600

PCI: try to allocate 64-bit resources above 4GB

If we have a 64-bit resource, try to allocate it above 4GB first. If that
fails, either because there's no space or the CPU can't address space above
4GB (iomem_resource.end is the highest address the CPU supports), we'll
fall back to allocating space below 4GB.

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..2c56693 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -121,14 +121,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t start = 0;
+	resource_size_t end = PCIBIOS_MAX_MEM_32;

 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;

-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	/* If this is a 64-bit resource, prefer space above 4GB */
+	if (res->flags & IORESOURCE_MEM_64) {
+		start = PCIBIOS_MAX_MEM_32 + 1ULL;
+		end = iomem_resource.end;
+	}

+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +149,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,

 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
 	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
+	}
+
 	return ret;
 }

2012-05-25 17:53:28

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Thu, May 24, 2012 at 9:36 PM, Bjorn Helgaas <[email protected]> wrote:
> On Wed, May 23, 2012 at 11:40:46AM -0700, Yinghai Lu wrote:
>> On Wed, May 23, 2012 at 10:30 AM, Yinghai Lu <[email protected]> wrote:
>> > On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
>> > <[email protected]> wrote:
>> >> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <[email protected]> wrote:
>> >>> and will fall back to below 4g if it can not find any above 4g.
>> >>
>> >> Has this been tested on 32-bit machines without PAE? There might be
>> >> things that just happen to work because their allocations were always
>> >> done bottom-up.
>> >
>> > Good point. that problem should be addressed at first before this patch.
>>
>> Just checked code for 32bit machines without PAE.
>>
>> when X86_PAE is not set, phys_addr_t aka resource_size_t will be 32bit.
>> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
>> will have bottom to 0.
>>     resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
>> also in arch/x86/kernel/setup.c::setup_arch()
>>    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
>> will have iomem_resource.end to 0xffffffff
>>
>> when X86_PAE is set, but CPU does not support PAE.
>> phys_addr_t aka resource_size_t will be 32bit.
>
> I think you meant phys_addr_t and resource_size_t will be *64* bit
> when X86_PAE is set.  Obvious to you, but quite confusing to non-x86
> experts like me :)
>
>> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
>> will have bottom to 4g.
>>     resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
>> but
>> in arch/x86/kernel/setup.c::setup_arch()
>>    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
>> will have iomem_resource.end to 0xffffffff, because x86_phys_bits is 32 when PAE
>> is not detected in arch/x86/kernel/cpu/common.c::get_cpu_cap.
>> that mean first try will fail, so it will go to second try with bottom to 0.
>>
>> so both case are safe with this patch.
>
> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
> overflowing to zero -- that means the reader has to know what the
> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
> ways if we changed it.
>
> What do you think of a patch like the following?  It makes it
> explicit that we can only allocate space the CPU can address.
>
> commit feded2ae21d6160292726ccd5128080d42395be4
> Author: Bjorn Helgaas <[email protected]>
> Date:   Thu May 24 22:15:26 2012 -0600
>
>    PCI: try to allocate 64-bit resources above 4GB
>
>    If we have a 64-bit resource, try to allocate it above 4GB first.  If that
>    fails, either because there's no space or the CPU can't address space above
>    4GB (iomem_resource.end is the highest address the CPU supports), we'll
>    fall back to allocating space below 4GB.
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 4ce5ef2..2c56693 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -121,14 +121,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>  {
>        int i, ret = -ENOMEM;
>        struct resource *r;
> -      resource_size_t max = -1;
> +      resource_size_t start = 0;
> +      resource_size_t end = PCIBIOS_MAX_MEM_32;
>
>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>
> -      /* don't allocate too high if the pref mem doesn't support 64bit*/
> -      if (!(res->flags & IORESOURCE_MEM_64))
> -              max = PCIBIOS_MAX_MEM_32;
> +      /* If this is a 64-bit resource, prefer space above 4GB */
> +      if (res->flags & IORESOURCE_MEM_64) {
> +              start = PCIBIOS_MAX_MEM_32 + 1ULL;
> +              end = iomem_resource.end;

But here we still have PCIBIOS_MAX_MEM_32 + 1ULL, which will still
overflow to 0...

Also, because all MMIO is contained in iomem_resource, we don't need to
specify it explicitly and can keep using -1 as max,
i.e. avoid referring to the global iomem_resource here.

So this version is the same as the old version, just with the check on
res->flags & IORESOURCE_MEM_64 reversed.

> +       }
>
> +again:
>        pci_bus_for_each_resource(bus, r, i) {
>                if (!r)
>                        continue;
> @@ -145,12 +149,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>
>                /* Ok, try it out.. */
>                ret = allocate_resource(r, res, size,
> -                                      r->start ? : min,
> -                                      max, align,
> +                                      max(start, r->start ? : min),
> +                                      end, align,
>                                        alignf, alignf_data);
>                if (ret == 0)
> -                      break;
> +                      return 0;
>        }
> +
> +      if (start != 0) {
> +              start = 0;
> +              goto again;
> +      }
> +
>        return ret;
>  }
>

2012-05-25 18:39:28

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <[email protected]> wrote:
>> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>> overflowing to zero -- that means the reader has to know what the
>> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>> ways if we changed it.
>>

Please check if the attached one is clearer.

It makes max and bottom apply only to _MEM resources, not to the default case.

-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	if (res->flags & IORESOURCE_MEM) {
+		if (!(res->flags & IORESOURCE_MEM_64))
+			max = PCIBIOS_MAX_MEM_32;
+		else if (PCIBIOS_MAX_MEM_32 != -1)
+			bottom = (resource_size_t)(1ULL<<32);
+	}

It still does not affect other arches.


Thanks

Yinghai


Attachments:
allocate_high_at_first.patch (1.90 kB)

2012-05-25 19:37:22

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <[email protected]> wrote:
> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
> >> overflowing to zero -- that means the reader has to know what the
> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
> >> ways if we changed it.
> >>
>
> please check if attached one is more clear.
>
> make max and bottom is only related to _MEM and not default one.
>
> -	if (!(res->flags & IORESOURCE_MEM_64))
> -		max = PCIBIOS_MAX_MEM_32;
> +	if (res->flags & IORESOURCE_MEM) {
> +		if (!(res->flags & IORESOURCE_MEM_64))
> +			max = PCIBIOS_MAX_MEM_32;
> +		else if (PCIBIOS_MAX_MEM_32 != -1)
> +			bottom = (resource_size_t)(1ULL<<32);
> +	}
>
> will still not affect to other arches.

That's goofy. You're proposing to make only x86_64 and x86-PAE try to put
64-bit BARs above 4GB. Why should this be specific to x86? I acknowledge
that there's risk in doing this, but if it's a good idea for x86_64, it
should also be a good idea for other 64-bit architectures.

And testing for "is this x86_32 without PAE?" with
"PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
important bit of arch-specific behavior.

Tangential question about allocate_resource(): Is its "max" argument
really necessary? We'll obviously only allocate from inside the root
resource, so "max" is just a way to artificially avoid the end of
that resource. Is there really a case where that's required?

"min" makes sense because in a case like this, it's valid to allocate from
anywhere in the root resource, but we want to try to allocate from the >4GB
part first, then fall back to allocating from the whole resource. I'm not
sure there's a corresponding case for "max."

Getting back to this patch, I don't think we should need to adjust "max" at
all. For example, this:

commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
Author: Bjorn Helgaas <[email protected]>
Date: Thu May 24 22:15:26 2012 -0600

PCI: try to allocate 64-bit mem resources above 4GB

If we have a 64-bit mem resource, try to allocate it above 4GB first. If
that fails, we'll fall back to allocating space below 4GB.

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..075e5b1 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t start = 0;
+	resource_size_t end = MAX_RESOURCE;

 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;

-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	/* If this is a 64-bit mem resource, try above 4GB first */
+	if (res->flags & IORESOURCE_MEM_64)
+		start = (resource_size_t) (1ULL << 32);

+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +147,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,

 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
 	}
+
 	return ret;
 }

2012-05-25 20:18:37

by H. Peter Anvin

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On 05/25/2012 12:37 PM, Bjorn Helgaas wrote:
>
> That's goofy. You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB. Why should this be specific to x86? I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
>

And so it is (assuming, of course, that we can address that memory.)

-hpa

2012-05-25 20:19:50

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <[email protected]> wrote:
> On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
>> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <[email protected]> wrote:
>> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>> >> overflowing to zero -- that means the reader has to know what the
>> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>> >> ways if we changed it.
>> >>
>>
>> please check if attached one is more clear.
>>
>> make max and bottom is only related to _MEM and not default one.
>>
>> -       if (!(res->flags & IORESOURCE_MEM_64))
>> -               max = PCIBIOS_MAX_MEM_32;
>> +       if (res->flags & IORESOURCE_MEM) {
>> +               if (!(res->flags & IORESOURCE_MEM_64))
>> +                       max = PCIBIOS_MAX_MEM_32;
>> +               else if (PCIBIOS_MAX_MEM_32 != -1)
>> +                       bottom = (resource_size_t)(1ULL<<32);
>> +       }
>>
>> will still not affect to other arches.
>
> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
>
> And testing for "is this x86_32 without PAE?" with
> "PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
> important bit of arch-specific behavior.
>
> Tangential question about allocate_resource():  Is its "max" argument
> really necessary?  We'll obviously only allocate from inside the root
> resource, so "max" is just a way to artificially avoid the end of
> that resource.  Is there really a case where that's required?
>
> "min" makes sense because in a case like this, it's valid to allocate from
> anywhere in the root resource, but we want to try to allocate from the >4GB
> part first, then fall back to allocating from the whole resource.  I'm not
> sure there's a corresponding case for "max."
>
> Getting back to this patch, I don't think we should need to adjust "max" at
> all.  For example, this:
>
> commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
> Author: Bjorn Helgaas <[email protected]>
> Date:   Thu May 24 22:15:26 2012 -0600
>
>    PCI: try to allocate 64-bit mem resources above 4GB
>
>    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
>    that fails, we'll fall back to allocating space below 4GB.
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 4ce5ef2..075e5b1 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>  {
>        int i, ret = -ENOMEM;
>        struct resource *r;
> -      resource_size_t max = -1;
> +      resource_size_t start = 0;
> +      resource_size_t end = MAX_RESOURCE;

Yeah, MAX_RESOURCE is better than -1.

>
>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>
> -      /* don't allocate too high if the pref mem doesn't support 64bit*/
> -      if (!(res->flags & IORESOURCE_MEM_64))
> -              max = PCIBIOS_MAX_MEM_32;

We can not remove this one;
otherwise we could allocate a range above 4G to a non-MEM64 resource.


> +      /* If this is a 64-bit mem resource, try above 4GB first */
> +      if (res->flags & IORESOURCE_MEM_64)
> +              start = (resource_size_t) (1ULL << 32);

This could affect other arches; let's see if they are OK with it.

Please check the merged version.

Also we have:

include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)

Should we merge them later?

Thanks

Yinghai


Attachments:
allocate_high_at_first_v3.patch (2.20 kB)

2012-05-25 21:55:25

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 2:19 PM, Yinghai Lu <[email protected]> wrote:
> On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
>>> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <[email protected]> wrote:
>>> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>>> >> overflowing to zero -- that means the reader has to know what the
>>> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>>> >> ways if we changed it.
>>> >>
>>>
>>> please check if attached one is more clear.
>>>
>>> make max and bottom is only related to _MEM and not default one.
>>>
>>> -       if (!(res->flags & IORESOURCE_MEM_64))
>>> -               max = PCIBIOS_MAX_MEM_32;
>>> +       if (res->flags & IORESOURCE_MEM) {
>>> +               if (!(res->flags & IORESOURCE_MEM_64))
>>> +                       max = PCIBIOS_MAX_MEM_32;
>>> +               else if (PCIBIOS_MAX_MEM_32 != -1)
>>> +                       bottom = (resource_size_t)(1ULL<<32);
>>> +       }
>>>
>>> will still not affect to other arches.
>>
>> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
>> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
>> that there's risk in doing this, but if it's a good idea for x86_64, it
>> should also be a good idea for other 64-bit architectures.
>>
>> And testing for "is this x86_32 without PAE?" with
>> "PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
>> important bit of arch-specific behavior.
>>
>> Tangential question about allocate_resource():  Is its "max" argument
>> really necessary?  We'll obviously only allocate from inside the root
>> resource, so "max" is just a way to artificially avoid the end of
>> that resource.  Is there really a case where that's required?
>>
>> "min" makes sense because in a case like this, it's valid to allocate from
>> anywhere in the root resource, but we want to try to allocate from the >4GB
>> part first, then fall back to allocating from the whole resource.  I'm not
>> sure there's a corresponding case for "max."
>>
>> Getting back to this patch, I don't think we should need to adjust "max" at
>> all.  For example, this:
>>
>> commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
>> Author: Bjorn Helgaas <[email protected]>
>> Date:   Thu May 24 22:15:26 2012 -0600
>>
>>    PCI: try to allocate 64-bit mem resources above 4GB
>>
>>    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
>>    that fails, we'll fall back to allocating space below 4GB.
>>
>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>> index 4ce5ef2..075e5b1 100644
>> --- a/drivers/pci/bus.c
>> +++ b/drivers/pci/bus.c
>> @@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>>  {
>>        int i, ret = -ENOMEM;
>>        struct resource *r;
>> -      resource_size_t max = -1;
>> +      resource_size_t start = 0;
>> +      resource_size_t end = MAX_RESOURCE;
>
> yeah, MAX_RESOURCE is better than -1.
>
>>
>>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>>
>> -      /* don't allocate too high if the pref mem doesn't support 64bit*/
>> -      if (!(res->flags & IORESOURCE_MEM_64))
>> -              max = PCIBIOS_MAX_MEM_32;
>
> can not remove this one.
> otherwise will could allocate above 4g range to non MEM64 resource.

Yeah, I convince myself of the dumbest things sometimes. It occurred
to me while driving home that we need this, but you beat me to it :)

I think we actually have a separate bug here. On 64-bit non-x86
architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
attempt to avoid putting a 32-bit BAR above 4G only works on x86,
where PCIBIOS_MAX_MEM_32 is 0xffffffff.

        /* don't allocate too high if the pref mem doesn't support 64bit*/
        if (!(res->flags & IORESOURCE_MEM_64))
                max = PCIBIOS_MAX_MEM_32;

I think we should fix this with a separate patch that removes
PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
0xffffffff (or some other "max 32-bit value" symbol). I don't think
there's anything arch-specific about this.

So I'd like to see two patches here:
1) Avoid allocating 64-bit regions for 32-bit BARs
2) Try to allocate regions above 4GB for 64-bit BARs

> also we have
>
> include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
> arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
>
> we should merge them later?

I would support that.

Bjorn

2012-05-25 21:59:11

by H. Peter Anvin

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
>
> I think we actually have a separate bug here. On 64-bit non-x86
> architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
> attempt to avoid putting a 32-bit BAR above 4G only works on x86,
> where PCIBIOS_MAX_MEM_32 is 0xffffffff.
>
>         /* don't allocate too high if the pref mem doesn't support 64bit*/
>         if (!(res->flags & IORESOURCE_MEM_64))
>                 max = PCIBIOS_MAX_MEM_32;
>
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol). I don't think
> there's anything arch-specific about this.
>
> So I'd like to see two patches here:
> 1) Avoid allocating 64-bit regions for 32-bit BARs
> 2) Try to allocate regions above 4GB for 64-bit BARs
>

Do we also need to track the maximum address available to the CPU?

-hpa

2012-05-25 22:15:18

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 3:58 PM, H. Peter Anvin <[email protected]> wrote:
> On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
>>
>> I think we actually have a separate bug here.  On 64-bit non-x86
>> architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
>> attempt to avoid putting a 32-bit BAR above 4G only works on x86,
>> where PCIBIOS_MAX_MEM_32 is 0xffffffff.
>>
>>         /* don't allocate too high if the pref mem doesn't support 64bit*/
>>         if (!(res->flags & IORESOURCE_MEM_64))
>>                 max = PCIBIOS_MAX_MEM_32;
>>
>> I think we should fix this with a separate patch that removes
>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>> there's anything arch-specific about this.
>>
>> So I'd like to see two patches here:
>>   1) Avoid allocating 64-bit regions for 32-bit BARs
>>   2) Try to allocate regions above 4GB for 64-bit BARs
>
> Do we also need to track the maximum address available to the CPU?

We are allocating from the resources available on this bus, which
means the host bridge apertures or a subset. If the arch supplies
those, that should be enough. If it doesn't, we default to
iomem_resource. On x86, we trim that down to only the supported
address space:

iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;

But I'm sure some other architectures don't do that yet. I guess
that's one of the risks -- if an arch doesn't tell us the apertures
and doesn't trim iomem_resource, we could allocate a non-addressable
region.

Bjorn

2012-05-25 23:10:25

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <[email protected]> wrote:
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
> there's anything arch-specific about this.
>
> So I'd like to see two patches here:
>  1) Avoid allocating 64-bit regions for 32-bit BARs
>  2) Try to allocate regions above 4GB for 64-bit BARs

Sure. Please check the two updated patches.

Thanks

Yinghai


Attachments:
32_bit_bar_allocation.patch (2.50 kB)
allocate_high_at_first_v4.patch (2.09 kB)

2012-05-26 00:13:05

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <[email protected]> wrote:
> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <[email protected]> wrote:
>> I think we should fix this with a separate patch that removes
>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>> there's anything arch-specific about this.
>>
>> So I'd like to see two patches here:
>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>  2) Try to allocate regions above 4GB for 64-bit BARs
>
> Sure. please check updated two patches.

I think the first one looks good.

I'm curious about the second. Why did you add the IORESOURCE_MEM
test? That doesn't affect the "start =" piece because
IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.

But it does affect the "end =" part. Previously we limited all I/O
and 32-bit mem BARs to the low 4GB. This patch makes it so we limit
32-bit mem BARs to the low 4GB, but we don't limit I/O BARs. But I/O
BARs can only have 32 bits of address, so it seems like we should
limit them the same way as 32-bit mem BARs. So I expected something
like this:

    if (res->flags & IORESOURCE_MEM_64) {
        start = (resource_size_t) (1ULL << 32);
        end = PCI_MAX_RESOURCE;
    } else {
        start = 0;
        end = PCI_MAX_RESOURCE_32;
    }

2012-05-26 15:02:13

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 6:12 PM, Bjorn Helgaas <[email protected]> wrote:
> On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <[email protected]> wrote:
>> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <[email protected]> wrote:
>>> I think we should fix this with a separate patch that removes
>>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>>> there's anything arch-specific about this.
>>>
>>> So I'd like to see two patches here:
>>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>>  2) Try to allocate regions above 4GB for 64-bit BARs
>>
>> Sure. please check updated two patches.
>
> I think the first one looks good.
>
> I'm curious about the second. Why did you add the IORESOURCE_MEM
> test? That doesn't affect the "start =" piece because
> IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.
>
> But it does affect the "end =" part. Previously we limited all I/O
> and 32-bit mem BARs to the low 4GB. This patch makes it so we limit
> 32-bit mem BARs to the low 4GB, but we don't limit I/O BARs. But I/O
> BARs can only have 32 bits of address, so it seems like we should
> limit them the same way as 32-bit mem BARs. So I expected something
> like this:
>
>    if (res->flags & IORESOURCE_MEM_64) {
>        start = (resource_size_t) (1ULL << 32);
>        end = PCI_MAX_RESOURCE;
>    } else {
>        start = 0;
>        end = PCI_MAX_RESOURCE_32;
>    }

Another bug here: we're trying to restrict the *bus* addresses we
allocate, but we're applying the limits to *CPU* addresses.
Therefore, this only works as intended when CPU addresses are the same
as bus addresses, i.e., when the host bridge applies no address
translation. That happens to be the case for x86, but is not the case
in general.

I think we need a third patch to fix this problem.

2012-05-29 17:55:28

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, May 25, 2012 at 5:12 PM, Bjorn Helgaas <[email protected]> wrote:
> On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <[email protected]> wrote:
>> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <[email protected]> wrote:
>>> I think we should fix this with a separate patch that removes
>>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>>> 0xffffffff (or some other "max 32-bit value" symbol). I don't think
>>> there's anything arch-specific about this.
>>>
>>> So I'd like to see two patches here:
>>> 1) Avoid allocating 64-bit regions for 32-bit BARs
>>> 2) Try to allocate regions above 4GB for 64-bit BARs
>>
>> Sure. please check updated two patches.
>
> I think the first one looks good.
>
> I'm curious about the second. Why did you add the IORESOURCE_MEM
> test? That doesn't affect the "start =" piece because
> IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.
>
> But it does affect the "end =" part. Previously we limited all I/O
> and 32-bit mem BARs to the low 4GB. This patch makes it so we limit
> 32-bit mem BARs to the low 4GB, but we don't limit I/O BARs. But I/O
> BARs can only have 32 bits of address, so it seems like we should
> limit them the same way as 32-bit mem BARs. So I expected something
> like this:
>
>    if (res->flags & IORESOURCE_MEM_64) {
>        start = (resource_size_t) (1ULL << 32);
>        end = PCI_MAX_RESOURCE;
>    } else {
>        start = 0;
>        end = PCI_MAX_RESOURCE_32;
>    }

x86 uses 16 bits.

some others use 32 bits:
#define IO_SPACE_LIMIT 0xffffffff

ia64 and sparc use 64 bits:
#define IO_SPACE_LIMIT 0xffffffffffffffffUL

but pci itself only supports 16-bit and 32-bit io ports.

maybe later we can add
PCI_MAX_RESOURCE_16

to handle 16-bit and 32-bit io ports.

Yinghai

2012-05-29 17:56:47

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Sat, May 26, 2012 at 8:01 AM, Bjorn Helgaas <[email protected]> wrote:
> Another bug here: we're trying to restrict the *bus* addresses we
> allocate, but we're applying the limits to *CPU* addresses.
> Therefore, this only works as intended when CPU addresses are the same
> as bus addresses, i.e., when the host bridge applies no address
> translation. That happens to be the case for x86, but is not the case
> in general.
>
> I think we need a third patch to fix this problem.

yes.

2012-05-29 17:57:48

by H. Peter Anvin

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>
> x86 are using 16bits.
>
> some others use 32 bits.
> #define IO_SPACE_LIMIT 0xffffffff
>
> ia64 and sparc are using 64bits.
> #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>
> but pci only support 16bits and 32bits.
>
> maybe later we can add
> PCI_MAX_RESOURCE_16
>
> to handle 16bits and 32bit io ports.
>

Shouldn't this be dealt by root port apertures?

-hpa

2012-05-29 18:17:12

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <[email protected]> wrote:
> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>
>> x86 are using 16bits.
>>
>> some others use 32 bits.
>> #define IO_SPACE_LIMIT 0xffffffff
>>
>> ia64 and sparc are using 64bits.
>> #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>>
>> but pci only support 16bits and 32bits.
>>
>> maybe later we can add
>> PCI_MAX_RESOURCE_16
>>
>> to handle 16bits and 32bit io ports.
>>
>
> Shouldn't this be dealt by root port apertures?
>

a pci bridge can support either a 16-bit or a 32-bit io port window,
but we did not record whether 32 bits is supported.

so during allocation, we could have assigned an address above 64k to a
bridge that only decodes 16 bits.

x86 is ok, because ioport.end is always set to 0xffff, but other arches
with IO_SPACE_LIMIT of 0xffffffff or 0xffffffffffffffffUL may have a
problem.

Yinghai

2012-05-29 19:03:41

by H. Peter Anvin

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>
> pci bridge could support 16bits and 32bits io port.
> but we did not record if 32bits is supported.
>

Okay, so this is the actual problem, no?

> so during allocating, could have allocated above 64k address to non
> 32bit bridge.
>
> but x86 is ok, because ioport.end always set to 0xffff.
> other arches with IO_SPACE_LIMIT with 0xffffffff or
> 0xffffffffffffffffUL may have problem.

The latter is nonsense, the PCI-side address space is only 32 bits wide.

-hpa

2012-05-29 19:23:52

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <[email protected]> wrote:
> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <[email protected]> wrote:
>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>
>>> x86 are using 16bits.
>>>
>>> some others use 32 bits.
>>> #define IO_SPACE_LIMIT 0xffffffff
>>>
>>> ia64 and sparc are using 64bits.
>>> #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>>>
>>> but pci only support 16bits and 32bits.
>>>
>>> maybe later we can add
>>> PCI_MAX_RESOURCE_16
>>>
>>> to handle 16bits and 32bit io ports.
>>>
>>
>> Shouldn't this be dealt by root port apertures?
>>
>
> pci bridge could support 16bits and 32bits io port.
> but we did not record if 32bits is supported.
>
> so during allocating, could have allocated above 64k address to non
> 32bit bridge.
>
> but x86 is ok, because ioport.end always set to 0xffff.
> other arches with IO_SPACE_LIMIT with 0xffffffff or
> 0xffffffffffffffffUL may have problem.

I think current IO_SPACE_LIMIT usage is a little confused. The
"ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
a CPU-side address, not a bus address. Other uses, e.g., in
__pci_read_base(), apply it to bus addresses from BARs, which is
wrong. Host bridges apply I/O port offsets just like they apply
memory offsets. The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
there's no restriction on CPU-side I/O port addresses, but any given
host bridge will translate its I/O port aperture to bus addresses that
fit in 32 bits.

None of this is really relevant to the question I asked, namely, "why
Yinghai's patch doesn't limit I/O BAR values to 32 bits?" That
constraint is clearly a requirement because I/O BARs are only 32 bits
wide, but I don't think it needs to be enforced in the code here. The
host bridge or upstream P2P bridge apertures should already take care
of that automatically. I don't think the 16- or 32-bitness of P2P
bridge apertures is relevant here, because the I/O resources available
on the secondary bus already reflect that.

After all that discussion, I think my objection here boils down to
"you shouldn't change the I/O BAR constraints in a patch that claims
to allocate 64-bit *memory* BARs above 4GB."

I think the code below is still the clearest way to set the constraints:

if (res->flags & IORESOURCE_MEM_64) {
        start = (resource_size_t) (1ULL << 32);
        end = PCI_MAX_RESOURCE;
} else {
        start = 0;
        end = PCI_MAX_RESOURCE_32;
}

It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
because host bridge apertures should already enforce that, but I think
the code above just makes it clearer.

2012-05-29 20:40:30

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <[email protected]> wrote:
> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <[email protected]> wrote:
>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <[email protected]> wrote:
>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>
>>>> x86 are using 16bits.
>>>>
>>>> some others use 32 bits.
>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>
>>>> ia64 and sparc are using 64bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>>>>
>>>> but pci only support 16bits and 32bits.
>>>>
>>>> maybe later we can add
>>>> PCI_MAX_RESOURCE_16
>>>>
>>>> to handle 16bits and 32bit io ports.
>>>>
>>>
>>> Shouldn't this be dealt by root port apertures?
>>>
>>
>> pci bridge could support 16bits and 32bits io port.
>> but we did not record if 32bits is supported.
>>
>> so during allocating, could have allocated above 64k address to non
>> 32bit bridge.
>>
>> but x86 is ok, because ioport.end always set to 0xffff.
>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>> 0xffffffffffffffffUL may have problem.
>
> I think current IO_SPACE_LIMIT usage is a little confused. The
> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
> a CPU-side address, not a bus address. Other uses, e.g., in
> __pci_read_base(), apply it to bus addresses from BARs, which is
> wrong. Host bridges apply I/O port offsets just like they apply
> memory offsets. The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
> there's no restriction on CPU-side I/O port addresses, but any given
> host bridge will translate its I/O port aperture to bus addresses that
> fit in 32 bits.
>
> None of this is really relevant to the question I asked, namely, "why
> Yinghai's patch doesn't limit I/O BAR values to 32 bits?" That
> constraint is clearly a requirement because I/O BARs are only 32 bits
> wide, but I don't think it needs to be enforced in the code here. The
> host bridge or upstream P2P bridge apertures should already take care
> of that automatically. I don't think the 16- or 32-bitness of P2P
> bridge apertures is relevant here, because the I/O resources available
> on the secondary bus already reflect that.
>
> After all that discussion, I think my objection here boils down to
> "you shouldn't change the I/O BAR constraints in a patch that claims
> to allocate 64-bit *memory* BARs above 4GB."
>
> I think the code below is still the clearest way to set the constraints:
>
>    if (res->flags & IORESOURCE_MEM_64) {
>        start = (resource_size_t) (1ULL << 32);
>        end = PCI_MAX_RESOURCE;
>    } else {
>        start = 0;
>        end = PCI_MAX_RESOURCE_32;
>    }
>
> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
> because host bridge apertures should already enforce that, but I think
> the code above just makes it clearer.


ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.

also an RFC to limit 16-bit ioport handling. it only helps other arches
that support 32-bit ioports but have bridges that only support 16-bit io
ports.

Thanks

Yinghai


Attachments:
allocate_high_at_first_v5.patch (2.13 kB)
fix_32bit_ioport.patch (5.10 kB)

2012-05-29 20:46:06

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <[email protected]> wrote:
> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>
>> pci bridge could support 16bits and 32bits io port.
>> but we did not record if 32bits is supported.
>>
>
> Okay, so this is the actual problem, no?

their fw could not need kernel help to allocate io ports, or they are
only use device that support 32bit ioport.

>
>> so during allocating, could have allocated above 64k address to non
>> 32bit bridge.
>>
>> but x86 is ok, because ioport.end always set to 0xffff.
>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>> 0xffffffffffffffffUL may have problem.
>
> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>
maybe they have unified io include ioport and mem io?

Yinghai

2012-05-29 20:51:13

by H. Peter Anvin

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On 05/29/2012 01:46 PM, Yinghai Lu wrote:
> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <[email protected]> wrote:
>> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>
>> Okay, so this is the actual problem, no?
>
> their fw could not need kernel help to allocate io ports, or they are
> only use device that support 32bit ioport.
>

That barely parses, never mind makes sense.

>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?
>

The bus-side address space should not be more than 32 bits no matter
what. As Bjorn indicates, you seem to be mixing up bus and cpu
addresses all over the place.

-hpa

2012-05-29 20:55:34

by David Miller

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

From: Yinghai Lu <[email protected]>
Date: Tue, 29 May 2012 13:46:03 -0700

> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <[email protected]> wrote:
>> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?

The resource values stored are the actual physical I/O addresses on
some platforms. Sparc, for example, falls into this category.

Therefore ~0UL is the only feasible limit for them.

We have to distinguish explicitly when we are operating upon actual
PCI bus view addresses vs. the values that end up in the resources.
They are entirely different, and have completely different address
ranges.

2012-05-29 23:25:09

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <[email protected]> wrote:
> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <[email protected]> wrote:
>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <[email protected]> wrote:
>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>>
>>>>> x86 are using 16bits.
>>>>>
>>>>> some others use 32 bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>>
>>>>> ia64 and sparc are using 64bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>>>>>
>>>>> but pci only support 16bits and 32bits.
>>>>>
>>>>> maybe later we can add
>>>>> PCI_MAX_RESOURCE_16
>>>>>
>>>>> to handle 16bits and 32bit io ports.
>>>>>
>>>>
>>>> Shouldn't this be dealt by root port apertures?
>>>>
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> I think current IO_SPACE_LIMIT usage is a little confused. The
>> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
>> a CPU-side address, not a bus address. Other uses, e.g., in
>> __pci_read_base(), apply it to bus addresses from BARs, which is
>> wrong. Host bridges apply I/O port offsets just like they apply
>> memory offsets. The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
>> there's no restriction on CPU-side I/O port addresses, but any given
>> host bridge will translate its I/O port aperture to bus addresses that
>> fit in 32 bits.
>>
>> None of this is really relevant to the question I asked, namely, "why
>> Yinghai's patch doesn't limit I/O BAR values to 32 bits?" That
>> constraint is clearly a requirement because I/O BARs are only 32 bits
>> wide, but I don't think it needs to be enforced in the code here. The
>> host bridge or upstream P2P bridge apertures should already take care
>> of that automatically. I don't think the 16- or 32-bitness of P2P
>> bridge apertures is relevant here, because the I/O resources available
>> on the secondary bus already reflect that.
>>
>> After all that discussion, I think my objection here boils down to
>> "you shouldn't change the I/O BAR constraints in a patch that claims
>> to allocate 64-bit *memory* BARs above 4GB."
>>
>> I think the code below is still the clearest way to set the constraints:
>>
>>    if (res->flags & IORESOURCE_MEM_64) {
>>        start = (resource_size_t) (1ULL << 32);
>>        end = PCI_MAX_RESOURCE;
>>    } else {
>>        start = 0;
>>        end = PCI_MAX_RESOURCE_32;
>>    }
>>
>> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
>> because host bridge apertures should already enforce that, but I think
>> the code above just makes it clearer.
>
>
> ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.

I like the fact that this patch no longer changes anything for I/O
resources. I assume this is part of fixing some bug (Steven's?). I'd
like to have a pointer in the changelog to a bugzilla or discussion
about the bug.

The effect of this patch is similar to what I did earlier with
b126b4703afa4 and e7f8567db9a7 (allocate space from top down), though
this one is more limited and it won't change quite as much. We ran
into problems (BIOS defects, I think) and had to revert my patches, so
it's quite possible that we'll run into similar problems here.

I'm a little nervous because this is a fundamental area that explores
new areas of the address space minefield. I think we're generally
safer if we follow a path similar to where Windows has been. I think
Windows also prefers space above 4GB for 64-bit BARs, but I suspect
that's just a natural consequence of allocating from the top down. So
we'll place things just above 4GB, and Windows will place things as
high as possible.

I don't know the best solution here. This patch ("bottom-up above
4GB") is one possibility. Another is to allocate only 64-bit BARs
top-down. Or maybe allocate everything top-down on machines newer
than some date. They all seem ugly. What makes me uneasy is that
your patch strikes out on a new path that is different from what we've
done before *and* different from what Windows does.

2012-05-29 23:27:27

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <[email protected]> wrote:
> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <[email protected]> wrote:
>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <[email protected]> wrote:
>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>>
>>>>> x86 are using 16bits.
>>>>>
>>>>> some others use 32 bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>>
>>>>> ia64 and sparc are using 64bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>>>>>
>>>>> but pci only support 16bits and 32bits.
>>>>>
>>>>> maybe later we can add
>>>>> PCI_MAX_RESOURCE_16
>>>>>
>>>>> to handle 16bits and 32bit io ports.
>>>>>
>>>>
>>>> Shouldn't this be dealt by root port apertures?
>>>>
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> I think current IO_SPACE_LIMIT usage is a little confused. The
>> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
>> a CPU-side address, not a bus address. Other uses, e.g., in
>> __pci_read_base(), apply it to bus addresses from BARs, which is
>> wrong. Host bridges apply I/O port offsets just like they apply
>> memory offsets. The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
>> there's no restriction on CPU-side I/O port addresses, but any given
>> host bridge will translate its I/O port aperture to bus addresses that
>> fit in 32 bits.
>>
>> None of this is really relevant to the question I asked, namely, "why
>> Yinghai's patch doesn't limit I/O BAR values to 32 bits?" That
>> constraint is clearly a requirement because I/O BARs are only 32 bits
>> wide, but I don't think it needs to be enforced in the code here. The
>> host bridge or upstream P2P bridge apertures should already take care
>> of that automatically. I don't think the 16- or 32-bitness of P2P
>> bridge apertures is relevant here, because the I/O resources available
>> on the secondary bus already reflect that.
>>
>> After all that discussion, I think my objection here boils down to
>> "you shouldn't change the I/O BAR constraints in a patch that claims
>> to allocate 64-bit *memory* BARs above 4GB."
>>
>> I think the code below is still the clearest way to set the constraints:
>>
>>    if (res->flags & IORESOURCE_MEM_64) {
>>        start = (resource_size_t) (1ULL << 32);
>>        end = PCI_MAX_RESOURCE;
>>    } else {
>>        start = 0;
>>        end = PCI_MAX_RESOURCE_32;
>>    }
>>
>> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
>> because host bridge apertures should already enforce that, but I think
>> the code above just makes it clearer.
>
>
> ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.
>
> also an RFC to limit 16-bit ioport handling. it only helps other arches
> that support 32-bit ioports but have bridges that only support 16-bit io
> ports.

I don't understand this one at all. It looks like you mashed together
at least two changes: (1) prefer I/O space above 64K if available, and
(2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
that to limit allocations.

I don't see the justification for (1). What problem does this solve?

I don't see the need for (2). Even without the start/end constraints
in pci_bus_alloc_resource(), we only allocate from the resources
available on the bus. Apparently you think this list might be
incorrect, i.e., it might include I/O space above 64K when it
shouldn't. I don't think that's the case, but if it is, we should fix
the list of bus resources, not add a hack to avoid parts of it.

2012-05-29 23:34:00

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 4:27 PM, Bjorn Helgaas <[email protected]> wrote:
> I don't understand this one at all. It looks like you mashed together
> at least two changes: (1) prefer I/O space above 64K if available, and
> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
> that to limit allocations.

let's drop the one about IORESOURCE_IO_32.

that should only fix one theoretical reallocation bug outside of x86.

Thanks

Yinghai

2012-05-29 23:47:55

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 5:33 PM, Yinghai Lu <[email protected]> wrote:
> On Tue, May 29, 2012 at 4:27 PM, Bjorn Helgaas <[email protected]> wrote:
>> I don't understand this one at all. It looks like you mashed together
>> at least two changes: (1) prefer I/O space above 64K if available, and
>> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
>> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
>> that to limit allocations.
>
> let's drop the one about IORESOURCE_IO_32.
>
> that should only fix one theoretical reallocation bug outside of x86.

If there's a bug, we should try to fix it. I just don't understand
what the bug *is*.

2012-05-30 07:40:50

by Steven Newbury

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On 30/05/12 00:27, Bjorn Helgaas wrote:
> On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <[email protected]>
> wrote:
>> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas
>> <[email protected]> wrote:
>>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu
>>> <[email protected]> wrote:
>>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin
>>>> <[email protected]> wrote:
>>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:

*SNIP*

>>
>>
>> ok, please check the version, that put back PCI_MAX_RESOURCE_32
>> for io ports.
>>
>> also an RFC to limit 16-bit ioport handling. it only helps other
>> arches that support 32-bit ioports but have bridges that only
>> support 16-bit io ports.
>
> I don't understand this one at all. It looks like you mashed
> together at least two changes: (1) prefer I/O space above 64K if
> available, and (2) mark secondary bus resources with
> IORESOURCE_IO_32 when the P2P bridge I/O window address decode type
> is PCI_IO_RANGE_TYPE_32 and use that to limit allocations.
>
> I don't see the justification for (1). What problem does this
> solve?
>
I can potentially see this helping with hotplug, where a new device
has only 16 bit io ports, but if there's no space remaining under 64k
allocation would fail. Preferring above 64k means preserving that
limited resource. This is exactly equivalent to my original
motivation for preferring 64 bit PCI mem in order to preserve 32 bit
address space for hotplug devices without 64 bit BARs.

2012-05-30 16:28:11

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Wed, May 30, 2012 at 1:40 AM, Steven Newbury <[email protected]> wrote:
> On 30/05/12 00:27, Bjorn Helgaas wrote:
>> On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <[email protected]>
>> wrote:
>>> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas
>>> <[email protected]> wrote:
>>>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu
>>>> <[email protected]> wrote:
>>>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin
>>>>> <[email protected]> wrote:
>>>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>
> *SNIP*
>
>>>
>>>
>>> ok, please check the version, that put back PCI_MAX_RESOURCE_32
>>> for io ports.
>>>
>>> also an RFC to limit 16-bit ioport handling. it only helps other
>>> arches that support 32-bit ioports but have bridges that only
>>> support 16-bit io ports.
>>
>> I don't understand this one at all. It looks like you mashed
>> together at least two changes: (1) prefer I/O space above 64K if
>> available, and (2) mark secondary bus resources with
>> IORESOURCE_IO_32 when the P2P bridge I/O window address decode type
>> is PCI_IO_RANGE_TYPE_32 and use that to limit allocations.
>>
>> I don't see the justification for (1). What problem does this
>> solve?
>>
> I can potentially see this helping with hotplug, where a new device
> has only 16 bit io ports, but if there's no space remaining under 64k
> allocation would fail. Preferring above 64k means preserving that
> limited resource. This is exactly equivalent to my original
> motivation for preferring 64 bit PCI mem in order to preserve 32 bit
> address space for hotplug devices without 64 bit BARs.

You're right, the spec does allow the upper 16 bits of I/O BARs to be
hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
some sense. I don't think it applies to x86, since I don't think
there's a way to generate an I/O access to anything above 64K, but it
could help other arches.

I'm inclined to be conservative and wait until we find a problem where
a patch like this would help.

2012-05-30 16:31:12

by H. Peter Anvin

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On 05/30/2012 09:27 AM, Bjorn Helgaas wrote:
>
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense. I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
>
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.
>

The really conservative thing is to just use 16-bit addresses for I/O on
all platforms. I think a lot of non-x86 platforms do that.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-05-30 16:34:47

by Linus Torvalds

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Wed, May 30, 2012 at 9:27 AM, Bjorn Helgaas <[email protected]> wrote:
>
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense. I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
>
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.

I agree. I would be *very* suspicious of "it might help with other
architectures".

Sure, other architectures might be using non-16-bit IO addresses for
their motherboard resources (just because they can). However, any
hotplug device is most likely (by far) to have been tested and
designed for x86, which means that even if it might look like it has
more than 16 bits of IO space, it will never have been tested that
way.

So rather than make it likely to help other architectures, I'd say
that it would make it likely to break things.

Linus

2012-06-01 23:30:54

by Yinghai Lu

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <[email protected]> wrote:
>
> The bus-side address space should not be more than 32 bits no matter
> what. As Bjorn indicates, you seem to be mixing up bus and cpu
> addresses all over the place.

please check the updated patches, which use the converted pci bus address
for boundary checking.

Thanks

Yinghai Lu


Attachments:
pcibus_addr_converting_bus.patch (4.36 kB)
allocate_high_at_first_v6.patch (4.17 kB)

2012-06-04 01:06:10

by Bjorn Helgaas

Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <[email protected]> wrote:
> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <[email protected]> wrote:
>>
>> The bus-side address space should not be more than 32 bits no matter
>> what. As Bjorn indicates, you seem to be mixing up bus and cpu
>> addresses all over the place.
>
> Please check the updated patches, which use the converted PCI bus address
> for boundary checking.

What problem does this fix? There's significant risk that this
allocation change will make us trip over something, so it must fix
something to make it worth considering.

Steve's problem doesn't count because that's a "pci=nocrs" case that
will always require special handling. A general solution is not
possible without a BIOS change (to describe >4GB apertures) or a
native host bridge driver (to discover >4GB apertures from the
hardware). These patches only make Steve's machine work by accident
-- they make us put the video device above 4GB, and we're just lucky
that the host bridge claims that region.

One possibility is some sort of boot-time option to force a PCI device
to a specified address. That would be useful for debugging as well as
for Steve's machine.

2012-06-05 02:37:10

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <[email protected]> wrote:
> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <[email protected]> wrote:
>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <[email protected]> wrote:
>>>
>>> The bus-side address space should not be more than 32 bits no matter
>>> what. As Bjorn indicates, you seem to be mixing up bus and cpu
>>> addresses all over the place.
>>
>> Please check the updated patches, which use the converted PCI bus address
>> for boundary checking.
>
> What problem does this fix? There's significant risk that this
> allocation change will make us trip over something, so it must fix
> something to make it worth considering.

If we do not enable this, we will never find these problems.

On one of my test setups, _CRS does state a 64-bit resource range,
but when I cleared some device resources manually and let the kernel
allocate high, I found that those devices did not work with their drivers.
It turned out _CRS stated a bigger range than what the chipset setting supports.
With a fix in the BIOS, allocating high now works on that platform.

>
> Steve's problem doesn't count because that's a "pci=nocrs" case that
> will always require special handling.

but pci=nocrs is still supported; indeed, some systems do not work with
pci=use_crs.

> A general solution is not
> possible without a BIOS change (to describe >4GB apertures) or a
> native host bridge driver (to discover >4GB apertures from the
> hardware). These patches only make Steve's machine work by accident
> -- they make us put the video device above 4GB, and we're just lucky
> that the host bridge claims that region.

Some BIOSes appear to route ranges not stated in _CRS to the legacy
chain by default; others do not, and only the stated ranges count.
So with pci=nocrs we still have some chance of getting allocate-high working.

>
> One possibility is some sort of boot-time option to force a PCI device
> to a specified address. That would be useful for debugging as well as
> for Steve's machine.

yeah, how about

pci=alloc_high

and default to disabled ?

Thanks

Yinghai

2012-06-05 04:51:09

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <[email protected]> wrote:
> On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <[email protected]> wrote:
>>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <[email protected]> wrote:
>>>>
>>>> The bus-side address space should not be more than 32 bits no matter
>>>> what. As Bjorn indicates, you seem to be mixing up bus and cpu
>>>> addresses all over the place.
>>>
>>> Please check the updated patches, which use the converted PCI bus address
>>> for boundary checking.
>>
>> What problem does this fix? There's significant risk that this
>> allocation change will make us trip over something, so it must fix
>> something to make it worth considering.
>
> If we do not enable this, we will never find these problems.

Sorry, that didn't make any sense to me. I'm hoping you will point us
to a bug report that is fixed by this patch.

> On one of my test setups, _CRS does state a 64-bit resource range,
> but when I cleared some device resources manually and let the kernel
> allocate high, I found that those devices did not work with their drivers.
> It turned out _CRS stated a bigger range than what the chipset setting supports.
> With a fix in the BIOS, allocating high now works on that platform.

I didn't understand this either, sorry. Are you saying that this
patch helps us work around a BIOS defect?

>> Steve's problem doesn't count because that's a "pci=nocrs" case that
>> will always require special handling.
>
> but pci=nocrs is still supported; indeed, some systems do not work with
> pci=use_crs.
>
>> A general solution is not
>> possible without a BIOS change (to describe >4GB apertures) or a
>> native host bridge driver (to discover >4GB apertures from the
>> hardware). These patches only make Steve's machine work by accident
>> -- they make us put the video device above 4GB, and we're just lucky
>> that the host bridge claims that region.
>
> Some BIOSes appear to route ranges not stated in _CRS to the legacy
> chain by default; others do not, and only the stated ranges count.
> So with pci=nocrs we still have some chance of getting allocate-high working.

The patch as proposed changes behavior for all systems, whether we're
using _CRS or not (in fact, it even changes the behavior for non-x86
systems). The only case we know of where it fixes something is
Steve's system, where he already has to use "pci=nocrs" in order for
it to help. My point is that it would be safer to leave things as
they are for everybody, and merely ask Steve to use "pci=nocrs
pci=alloc_high" or something similar.

>> One possibility is some sort of boot-time option to force a PCI device
>> to a specified address. That would be useful for debugging as well as
>> for Steve's machine.
>
> yeah, how about
>
> pci=alloc_high
>
> and default to disabled ?

I was actually thinking of something more specific, e.g., a way to
place one device at an exact address. I've implemented that a couple
times already for testing various things. But maybe a more general
option like "pci=alloc_high" would make sense, too.

Linux has a long history of allocating bottom-up. Windows has a long
history of allocating top-down. You're proposing a third alternative,
allocating bottom-up starting at 4GB for 64-bit BARs. If we change
this area, I would prefer something that follows Windows because I
think it will be closer to what's been tested by Windows. Do you
think your alternative is better?

2012-06-05 05:04:59

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Mon, Jun 4, 2012 at 9:50 PM, Bjorn Helgaas <[email protected]> wrote:
> On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <[email protected]> wrote:
>> On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <[email protected]> wrote:
>>> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <[email protected]> wrote:
>>>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <[email protected]> wrote:
>>>>>
>>>>> The bus-side address space should not be more than 32 bits no matter
>>>>> what. As Bjorn indicates, you seem to be mixing up bus and cpu
>>>>> addresses all over the place.
>>>>
>>>> Please check the updated patches, which use the converted PCI bus address
>>>> for boundary checking.
>>>
>>> What problem does this fix? There's significant risk that this
>>> allocation change will make us trip over something, so it must fix
>>> something to make it worth considering.
>>
>> If we do not enable this, we will never find these problems.
>
> Sorry, that didn't make any sense to me. I'm hoping you will point us
> to a bug report that is fixed by this patch.

Currently it only helps Steve's test case.

>
>> On one of my test setups, _CRS does state a 64-bit resource range,
>> but when I cleared some device resources manually and let the kernel
>> allocate high, I found that those devices did not work with their drivers.
>> It turned out _CRS stated a bigger range than what the chipset setting supports.
>> With a fix in the BIOS, allocating high now works on that platform.
>
> I didn't understand this either, sorry. Are you saying that this
> patch helps us work around a BIOS defect?

It helped us find one BIOS defect.

>
>> yeah, how about
>>
>> pci=alloc_high
>>
>> and default to disabled ?
>
> I was actually thinking of something more specific, e.g., a way to
> place one device at an exact address. I've implemented that a couple
> times already for testing various things. But maybe a more general
> option like "pci=alloc_high" would make sense, too.

yeah.

>
....
> Linux has a long history of allocating bottom-up. Windows has a long
> history of allocating top-down. You're proposing a third alternative,
> allocating bottom-up starting at 4GB for 64-bit BARs. If we change
> this area, I would prefer something that follows Windows because I
> think it will be closer to what's been tested by Windows. Do you
> think your alternative is better?

I hope we can figure out how Windows makes it work.

Steve, can you check whether Windows works with your test case?

If it works, we may try to do the same thing from Linux, so you will not
need to append "pci=nocrs pci=alloc_high"...

Thanks

Yinghai

2012-06-06 09:44:13

by Steven Newbury

[permalink] [raw]
Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Tue,  5 Jun 2012, 06:04:57 BST, Yinghai Lu <[email protected]> wrote:
> > Linux has a long history of allocating bottom-up.  Windows has a long
> > history of allocating top-down.  You're proposing a third alternative,
> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
> > this area, I would prefer something that follows Windows because I
> > think it will be closer to what's been tested by Windows.  Do you
> > think your alternative is better?
>
> I hope we can figure out how Windows makes it work.
>
> Steve, can you check whether Windows works with your test case?
>
> If it works, we may try to do the same thing from Linux, so you will not
> need to append "pci=nocrs pci=alloc_high"...
>
Unfortunately I don't have a 64-bit version of Windows to test with. Vista (32-bit) fails to even boot when docked; hot-plugging fails to allocate resources, but at least doesn't crash.

From what I've read about the (64-bit) Windows allocation strategy, it's closer to Yinghai's method than the Linux default, preferring 64-bit resources (>4G) when possible. I'll try to find the specification document again.

2012-06-06 16:18:39

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first

On Wed, Jun 6, 2012 at 2:44 AM, Steven Newbury <[email protected]> wrote:
> On Tue, 5 Jun 2012, 06:04:57 BST, Yinghai Lu <[email protected]> wrote:
>> > Linux has a long history of allocating bottom-up. Windows has a long
>> > history of allocating top-down. You're proposing a third alternative,
>> > allocating bottom-up starting at 4GB for 64-bit BARs. If we change
>> > this area, I would prefer something that follows Windows because I
>> > think it will be closer to what's been tested by Windows. Do you
>> > think your alternative is better?
>>
>> I hope we can figure out how Windows makes it work.
>>
>> Steve, can you check whether Windows works with your test case?
>>
>> If it works, we may try to do the same thing from Linux, so you will not
>> need to append "pci=nocrs pci=alloc_high"...
>
> Unfortunately I don't have a 64-bit version of Windows to test with. Vista (32-bit) fails to even boot when docked; hot-plugging fails to allocate resources, but at least doesn't crash.
>
> From what I've read about the (64-bit) Windows allocation strategy, it's closer to Yinghai's method than the Linux default, preferring 64-bit resources (>4G) when possible. I'll try to find the specification document again.

Here's the host bridge info from the BIOS (from
https://bugzilla.kernel.org/show_bug.cgi?id=10461 attachment
https://bugzilla.kernel.org/attachment.cgi?id=72869):

ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7]
pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff]
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]

There's no aperture above 4GB. So I don't think any version of
Windows will ever assign a BAR above 4GB.