Hi Bjorn,
Sorry for the delay on this one and pushing it after RC1.
Feel free to queue it up for 4.20 if it looks fine.
I've added comments to the git log and source explaining why calculate_iosize
was left unchanged. Basically I could not synthesize a condition where it would
have affected the topology.
v1->v2: Comments
Jon Derrick (1):
PCI: Equalize hotplug memory for non/occupied slots
drivers/pci/setup-bus.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
--
1.8.3.1
Currently, a hotplug bridge will be given hpmemsize additional memory if
available, in order to satisfy any future hotplug allocation
requirements.
These calculations don't consider the current memory size of the hotplug
bridge/slot, so hotplug bridges/slots which have downstream devices will
get their current allocation in addition to the hpmemsize value.
This makes for possibly undesirable results with a mix of unoccupied and
occupied slots (ex, with hpmemsize=2M):
02:03.0 PCI bridge: <-- Occupied
Memory behind bridge: d6200000-d64fffff [size=3M]
02:04.0 PCI bridge: <-- Unoccupied
Memory behind bridge: d6500000-d66fffff [size=2M]
This change considers the current allocation size when using the
hpmemsize parameter to make the reservations predictable for the mix of
unoccupied and occupied slots:
02:03.0 PCI bridge: <-- Occupied
Memory behind bridge: d6200000-d63fffff [size=2M]
02:04.0 PCI bridge: <-- Unoccupied
Memory behind bridge: d6400000-d65fffff [size=2M]
The calculation for IO (hpiosize) should be similar, but the platform
firmware I've encountered (including QEMU) provides strict allocations
for IO and does not leave free IO resources for hotplug buses, so I
could not exercise this calculation.
Signed-off-by: Jon Derrick <[email protected]>
---
drivers/pci/setup-bus.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1824..70d0aba 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -831,7 +831,8 @@ static resource_size_t calculate_iosize(resource_size_t size,
static resource_size_t calculate_memsize(resource_size_t size,
resource_size_t min_size,
- resource_size_t size1,
+ resource_size_t add_size,
+ resource_size_t children_add_size,
resource_size_t old_size,
resource_size_t align)
{
@@ -841,7 +842,15 @@ static resource_size_t calculate_memsize(resource_size_t size,
old_size = 0;
if (size < old_size)
size = old_size;
- size = ALIGN(size + size1, align);
+
+ /*
+ * Consider the current allocation size when adding size for extra
+ * hotplug memory. This ensures that occupied slots don't receive
+ * unnecessary memory allocations in addition to their current size.
+ * The calculation should be similar for calculate_iosize, but was
+ * unable to be tested.
+ */
+ size = ALIGN(max(size, add_size) + children_add_size, align);
return size;
}
@@ -1079,12 +1088,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
min_align = calculate_mem_align(aligns, max_order);
min_align = max(min_align, window_alignment(bus, b_res->flags));
- size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), min_align);
+ size0 = calculate_memsize(size, min_size, 0, 0, resource_size(b_res), min_align);
add_align = max(min_align, add_align);
- if (children_add_size > add_size)
- add_size = children_add_size;
- size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
- calculate_memsize(size, min_size, add_size,
+ size1 = (!realloc_head || (realloc_head && !add_size && !children_add_size)) ? size0 :
+ calculate_memsize(size, min_size, add_size, children_add_size,
resource_size(b_res), add_align);
if (!size0 && !size1) {
if (b_res->start || b_res->end)
--
1.8.3.1
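The effect of the calculate_memsize() change in the patch above can be
reproduced in user space. The following is only a minimal sketch, not the
kernel code: it models just the final line of the function (the min_size and
old_size handling is left out), and the names align_up(), memsize_before(),
memsize_after() and SZ_1M are hypothetical stand-ins defined here. It also
assumes that the occupied slot's downstream devices need 1M, which is what the
3M result in the commit message implies for hpmemsize=2M.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

#define SZ_1M ((resource_size_t)1 << 20)	/* local definition for this sketch */

/* Round x up to a power-of-two boundary, in the spirit of the kernel's ALIGN(). */
static resource_size_t align_up(resource_size_t x, resource_size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Final step before the patch: hotplug memory is added on top of size. */
static resource_size_t memsize_before(resource_size_t size,
				      resource_size_t add_size,
				      resource_size_t align)
{
	return align_up(size + add_size, align);
}

/* Final step after the patch: size is only raised to add_size if smaller. */
static resource_size_t memsize_after(resource_size_t size,
				     resource_size_t add_size,
				     resource_size_t children_add_size,
				     resource_size_t align)
{
	resource_size_t base = size > add_size ? size : add_size;

	return align_up(base + children_add_size, align);
}
```

Under those assumptions, memsize_before(SZ_1M, 2 * SZ_1M, SZ_1M) gives 3M for
the occupied slot, while memsize_after(SZ_1M, 2 * SZ_1M, 0, SZ_1M) gives 2M.
An unoccupied slot (size 0) gets 2M from both versions.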
On Thu, Aug 30, 2018 at 04:12:00PM -0600, Jon Derrick wrote:
> Currently, a hotplug bridge will be given hpmemsize additional memory if
> available, in order to satisfy any future hotplug allocation
> requirements.
>
> These calculations don't consider the current memory size of the hotplug
> bridge/slot, so hotplug bridges/slots which have downstream devices will
> get their current allocation in addition to the hpmemsize value.
>
> This makes for possibly undesirable results with a mix of unoccupied and
> occupied slots (ex, with hpmemsize=2M):
>
> 02:03.0 PCI bridge: <-- Occupied
> Memory behind bridge: d6200000-d64fffff [size=3M]
> 02:04.0 PCI bridge: <-- Unoccupied
> Memory behind bridge: d6500000-d66fffff [size=2M]
>
> This change considers the current allocation size when using the
> hpmemsize parameter to make the reservations predictable for the mix of
> unoccupied and occupied slots:
>
> 02:03.0 PCI bridge: <-- Occupied
> Memory behind bridge: d6200000-d63fffff [size=2M]
> 02:04.0 PCI bridge: <-- Unoccupied
> Memory behind bridge: d6400000-d65fffff [size=2M]
>
> The calculation for IO (hpiosize) should be similar, but the platform
> firmware I've encountered (including QEMU) provides strict allocations
> for IO and does not leave free IO resources for hotplug buses, so I
> could not exercise this calculation.
>
> Signed-off-by: Jon Derrick <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
On Thu, Aug 30, 2018 at 04:11:59PM -0600, Jon Derrick wrote:
> Hi Bjorn,
>
> Sorry for the delay on this one and pushing it after RC1.
> Feel free to queue it up for 4.20 if it looks fine.
>
> I've added comments to the git log and source explaining why
> calculate_iosize was left unchanged. Basically I could not
> synthesize a condition where it would have affected the topology.
In other words, the only reason you didn't change the
calculate_iosize() path was because you couldn't test it?
I appreciate your desire to avoid untested changes, but I think it's
very important to preserve and even improve the symmetry between
calculate_memsize() and calculate_iosize(). For example, it's not
obvious why the order is different here:
  calculate_iosize():
        size = ALIGN(size + size1, align);
        if (size < old_size)
                size = old_size;
  calculate_memsize():
        if (size < old_size)
                size = old_size;
        size = ALIGN(size + size1, align);
So I don't want to diverge them further unless there's a real
functional reason why we need to handle I/O port space differently
than MMIO space.
You've tested the MMIO path, and I'm willing to take the risk of
doing the same thing in the I/O port path.
Bjorn
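The ordering difference pointed out above changes the result only when
old_size is larger than size. The following is a minimal user-space sketch of
the two orderings, not the kernel functions themselves: io_order(),
mem_order() and align_up() are hypothetical names defined here, and the
remaining logic of calculate_iosize() and calculate_memsize() is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

/* Round x up to a power-of-two boundary, in the spirit of the kernel's ALIGN(). */
static resource_size_t align_up(resource_size_t x, resource_size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Ordering quoted for calculate_iosize(): add and align, then clamp. */
static resource_size_t io_order(resource_size_t size, resource_size_t size1,
				resource_size_t old_size, resource_size_t align)
{
	size = align_up(size + size1, align);
	if (size < old_size)
		size = old_size;
	return size;
}

/* Ordering quoted for calculate_memsize(): clamp, then add and align. */
static resource_size_t mem_order(resource_size_t size, resource_size_t size1,
				 resource_size_t old_size, resource_size_t align)
{
	if (size < old_size)
		size = old_size;
	return align_up(size + size1, align);
}
```

With size=0x1000, size1=0x1000, old_size=0x4000 and align=0x1000, io_order()
returns 0x4000 (the extra size1 is absorbed by the old size), while
mem_order() returns 0x5000 (size1 is added on top of the old size). When
old_size does not exceed size, both orderings return the same value.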
On Mon, 2018-09-17 at 15:53 -0500, Bjorn Helgaas wrote:
> On Thu, Aug 30, 2018 at 04:11:59PM -0600, Jon Derrick wrote:
> > Hi Bjorn,
> >
> > Sorry for the delay on this one and pushing it after RC1.
> > Feel free to queue it up for 4.20 if it looks fine.
> >
> > I've added comments to the git log and source explaining why
> > calculate_iosize was left unchanged. Basically I could not
> > synthesize a condition where it would have affected the topology.
>
> In other words, the only reason you didn't change the
> calculate_iosize() path was because you couldn't test it?
>
I did try to synthesize it in hardware and QEMU, without success. The
firmware didn't provide the necessary topology to hit the flexible IO
provisioning conditions.
> I appreciate your desire to avoid untested changes, but I think it's
> very important to preserve and even improve the symmetry between
> calculate_memsize() and calculate_iosize(). For example, it's not
> obvious why the order is different here:
>
>   calculate_iosize():
>         size = ALIGN(size + size1, align);
>         if (size < old_size)
>                 size = old_size;
>
I agree; this part didn't make much sense to me either, which was another
reason I left it as-is. Looking at it again, I think it's a harmless
calculation that bounds IO size tightly, but it could also be reordered as
below to provide for the additional IO (assuming this code ever runs).
>   calculate_memsize():
>         if (size < old_size)
>                 size = old_size;
>         size = ALIGN(size + size1, align);
>
> So I don't want to diverge them further unless there's a real
> functional reason why we need to handle I/O port space differently
> than MMIO space.
>
> You've tested the MMIO path, and I'm willing to take the risk of
> doing the same thing in the I/O port path.
>
> Bjorn
Great! I'll follow up with a patch as soon as I can.
Jon