Hi Andi,
I don't see this patch being applied in any of the tree yet.
Resending this incase you guys might have missed it.
I assume this will go through the ACPI tree.
The need for this patch was explained in
http://marc.info/?l=linux-kernel&m=121441818818598&w=2
The v2 patch was discussed in
http://marc.info/?l=linux-acpi&m=121433339619175&w=2
I then sent a incremental patch on top of v2 to fix some conditions and
handle the change in the e820_search_gap interface.
In this v3 patch, i have folded the incremental fix into the v2 patch.
Please apply this.
Thanks,
Alok
--
acpi based pci gap calculation - v3
From: Alok N Kataria <[email protected]>
Evaluates the _CRS object under PCI0 looking for producer resources.
Then searches the e820 memory space for a gap within these producer resources.
Allows us to find a gap for the unclaimed pci resources or MMIO resources
for hotplug devices within the BIOS allowed pci regions.
v1->v2: Some changes in the print strings.
Also note the prototype for e820_search_gap has changed in this
iteration, but the return value is ignored while calling it over here.
v2->v3: Pass the pci producer resources end_addr (got from _CRS) to the
e820_search_gap function.
We limit the search only within the producer region's address space.
Also add a condition to check if the resource lies in the 32bit
address range.
Signed-off-by: Alok N Kataria <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/pci/acpi.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 80 insertions(+), 0 deletions(-)
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 19af069..fff2c42 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -4,6 +4,7 @@
#include <linux/irq.h>
#include <linux/dmi.h>
#include <asm/numa.h>
+#include <asm/e820.h>
#include "pci.h"
struct pci_root_info {
@@ -14,6 +15,11 @@ struct pci_root_info {
int busnum;
};
+struct gap_info {
+ unsigned long gapstart;
+ unsigned long gapsize;
+};
+
static acpi_status
resource_to_addr(struct acpi_resource *resource,
struct acpi_resource_address64 *addr)
@@ -110,6 +116,78 @@ adjust_transparent_bridge_resources(struct pci_bus *bus)
}
}
+static acpi_status search_gap(struct acpi_resource *resource, void *data)
+{
+ struct acpi_resource_address64 addr;
+ acpi_status status;
+ struct gap_info *gap = data;
+ unsigned long long start_addr, end_addr;
+
+ status = resource_to_addr(resource, &addr);
+ if (ACPI_SUCCESS(status) &&
+ addr.resource_type == ACPI_MEMORY_RANGE &&
+ addr.address_length > gap->gapsize) {
+ start_addr = addr.minimum + addr.translation_offset;
+ /*
+ * We want space only in the 32bit address range
+ */
+ if (start_addr < UINT_MAX) {
+ end_addr = start_addr + addr.address_length;
+ e820_search_gap(&gap->gapstart, &gap->gapsize,
+ start_addr, end_addr);
+ }
+ }
+
+ return AE_OK;
+}
+
+/*
+ * Search for a hole in the 32 bit address space for PCI to assign MMIO
+ * resources, for hotplug or unconfigured resources.
+ * We query the CRS object of the PCI root device to look for possible producer
+ * resources in the tree and consider these while calulating the start address
+ * for this hole.
+ */
+static void pci_setup_gap(acpi_handle *handle)
+{
+ struct gap_info gap;
+ acpi_status status;
+
+ gap.gapstart = 0;
+ gap.gapsize = 0x400000;
+
+ status = acpi_walk_resources(handle, METHOD_NAME__CRS,
+ search_gap, &gap);
+
+ if (ACPI_SUCCESS(status)) {
+ unsigned long round;
+
+ if (!gap.gapstart) {
+ printk(KERN_ERR "ACPI: Warning: Cannot find a gap "
+ "in the 32bit address range for PCI\n"
+ "ACPI: PCI devices may collide with "
+ "hotpluggable memory address range\n");
+ }
+ /*
+ * Round the gapstart, uses the same logic as in
+ * e820_gap_setup
+ */
+ round = 0x100000;
+ while ((gap.gapsize >> 4) > round)
+ round += round;
+ /* Fun with two's complement */
+ pci_mem_start = (gap.gapstart + round) & -round;
+
+ printk(KERN_INFO "ACPI: PCI resources should "
+ "start at %lx (gap: %lx:%lx)\n",
+ pci_mem_start, gap.gapstart, gap.gapsize);
+ } else {
+ printk(KERN_ERR "ACPI: Error while searching for gap in "
+ "the 32bit address range for PCI\n");
+ }
+}
+
+
static void
get_current_resources(struct acpi_device *device, int busnum,
int domain, struct pci_bus *bus)
@@ -220,6 +298,8 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_device *device, int do
if (bus && (pci_probe & PCI_USE__CRS))
get_current_resources(device, busnum, domain, bus);
+
+ pci_setup_gap(device->handle);
return bus;
}
On Tuesday, July 15, 2008 11:59 am Alok Kataria wrote:
> Hi Andi,
>
> I don't see this patch being applied in any of the tree yet.
> Resending this incase you guys might have missed it.
> I assume this will go through the ACPI tree.
>
> The need for this patch was explained in
> http://marc.info/?l=linux-kernel&m=121441818818598&w=2
>
> The v2 patch was discussed in
> http://marc.info/?l=linux-acpi&m=121433339619175&w=2
>
> I then sent a incremental patch on top of v2 to fix some conditions and
> handle the change in the e820_search_gap interface.
>
> In this v3 patch, i have folded the incremental fix into the v2 patch.
> Please apply this.
This should probably go to linux-pci too.
I wonder if this is stable enough to go into 2.6.27? Most of the PCI bugs we
have at the moment are related to PCI resource allocation failures. My hope
is that finding more space will fix most of them. Assuming this patch
doesn't have any dependencies, I can put it in my linux-next branch.
Also, TJ was working on something similar; any comments TJ?
Thanks,
Jesse
On Tue, 2008-07-15 at 13:28 -0700, Jesse Barnes wrote:
> On Tuesday, July 15, 2008 11:59 am Alok Kataria wrote:
> > Hi Andi,
> >
> > I don't see this patch being applied in any of the tree yet.
> > Resending this incase you guys might have missed it.
> > I assume this will go through the ACPI tree.
> >
> > The need for this patch was explained in
> > http://marc.info/?l=linux-kernel&m=121441818818598&w=2
> >
> > The v2 patch was discussed in
> > http://marc.info/?l=linux-acpi&m=121433339619175&w=2
> >
> > I then sent a incremental patch on top of v2 to fix some conditions and
> > handle the change in the e820_search_gap interface.
> >
> > In this v3 patch, i have folded the incremental fix into the v2 patch.
> > Please apply this.
>
> This should probably go to linux-pci too.
>
> I wonder if this is stable enough to go into 2.6.27?
I have tested it with couple of different BIOS settings and it seems to
work as it should.
> Most of the PCI bugs we
> have at the moment are related to PCI resource allocation failures. My hope
> is that finding more space will fix most of them. Assuming this patch
> doesn't have any dependencies, I can put it in my linux-next branch.
No dependencies, I had added a function e820_search_gap which is used by
this patch. This function is already in the mainline tree.
Thanks for applying.
--
Alok
On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
> I have tested it with couple of different BIOS settings and it seems to
> work as it should.
>
> > Most of the PCI bugs we
> > have at the moment are related to PCI resource allocation failures. My
> > hope is that finding more space will fix most of them. Assuming this
> > patch doesn't have any dependencies, I can put it in my linux-next
> > branch.
>
> No dependencies, I had added a function e820_search_gap which is used by
> this patch. This function is already in the mainline tree.
> Thanks for applying.
Ok, I stuffed it in my linux-next branch. I'll let it sit there for a day or
so though, just to shake out any build problems etc. in linux-next before
asking Linus to pull the whole lot.
Thanks,
Jesse
* Jesse Barnes <[email protected]> wrote:
> On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
> > I have tested it with couple of different BIOS settings and it seems to
> > work as it should.
> >
> > > Most of the PCI bugs we
> > > have at the moment are related to PCI resource allocation failures. My
> > > hope is that finding more space will fix most of them. Assuming this
> > > patch doesn't have any dependencies, I can put it in my linux-next
> > > branch.
> >
> > No dependencies, I had added a function e820_search_gap which is
> > used by this patch. This function is already in the mainline tree.
> > Thanks for applying.
>
> Ok, I stuffed it in my linux-next branch. I'll let it sit there for a
> day or so though, just to shake out any build problems etc. in
> linux-next before asking Linus to pull the whole lot.
thanks. The general x86 infrastructure bits this patch depends on are
upstream already.
Ingo
Jesse Barnes wrote:
> On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
>> I have tested it with couple of different BIOS settings and it seems to
>> work as it should.
>>
>>> Most of the PCI bugs we
>>> have at the moment are related to PCI resource allocation failures. My
>>> hope is that finding more space will fix most of them. Assuming this
>>> patch doesn't have any dependencies, I can put it in my linux-next
>>> branch.
>> No dependencies, I had added a function e820_search_gap which is used by
>> this patch. This function is already in the mainline tree.
>> Thanks for applying.
>
> Ok, I stuffed it in my linux-next branch. I'll let it sit there for a day or
> so though, just to shake out any build problems etc. in linux-next before
> asking Linus to pull the whole lot.
For me it seems a little risky to push up that early. I think it would
be better to let it sit longer in linux-next for this. After all this is
a major change how IO resources are allocated. That is why I didn't pull
it into the ACPI release branch.
-Andi
Alok Kataria wrote:
> Hi Andi,
>
> I don't see this patch being applied in any of the tree yet.
> Resending this incase you guys might have missed it.
Sorry still some backlog from vacation last week. However the focus
right now is on patches for 2.6.27 and I don't think your patch
is really tested well enough yet for that. It is fairly intrusive
and changes a critical piece of code. More likely it's a .28 thing,
or possibly a late merge candidate (but that's also not clear)
-Andi
On Tuesday, July 15, 2008 10:40 pm Andi Kleen wrote:
> Jesse Barnes wrote:
> > On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
> >> I have tested it with couple of different BIOS settings and it seems to
> >> work as it should.
> >>
> >>> Most of the PCI bugs we
> >>> have at the moment are related to PCI resource allocation failures. My
> >>> hope is that finding more space will fix most of them. Assuming this
> >>> patch doesn't have any dependencies, I can put it in my linux-next
> >>> branch.
> >>
> >> No dependencies, I had added a function e820_search_gap which is used by
> >> this patch. This function is already in the mainline tree.
> >> Thanks for applying.
> >
> > Ok, I stuffed it in my linux-next branch. I'll let it sit there for a
> > day or so though, just to shake out any build problems etc. in linux-next
> > before asking Linus to pull the whole lot.
>
> For me it seems a little risky to push up that early. I think it would
> be better to let it sit longer in linux-next for this. After all this is
> a major change how IO resources are allocated. That is why I didn't pull
> it into the ACPI release branch.
The only problem there is that linux-next doesn't get nearly the sort of
testing coverage we need for this kind of change. It's also small, and
reverting it is easy, so if we run into big problems that we can't fix we can
always back it out...
I'm interested in hearing people's thoughts on this.
Jesse
> The only problem there is that linux-next doesn't get nearly the sort of
> testing coverage we need for this kind of change.
Normally I tend to wait for one -mm release, which seems to be tested
by a reasonable number of people. If it survives that it is good
to be tested in Linus' tree.
Just stuffing this in in literally the last minute doesn't seem
like a good idea.
-Andi
On Wednesday, July 16, 2008 9:33 am Andi Kleen wrote:
> > The only problem there is that linux-next doesn't get nearly the sort of
> > testing coverage we need for this kind of change.
>
> Normally I tend to wait for one -mm release, which seems to be tested
> by a reasonable number of people. If it survives that it is good
> to be tested in Linus' tree.
>
> Just stuffing this in in literally the last minute doesn't seem
> like a good idea.
Well it's hardly last minute given that the merge window only opened a couple
of days ago...
But beyond that, now that I've thought about it a bit more I'm not even sure
the patch is really correct (though it works on my test machines). Shouldn't
we be looking at _PRS not _CRS? And ideally we should try to find even more
space, not less. This patch made one of my machines lose quite a bit of
space:
...
Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
...
ACPI: PCI resources should start at c0000000 (gap: bf000000:31000000)
...
which is a step backwards. With that in mind, I reverted the patch before
asking Linus to pull; I'm hopeful we can do better though. I'd love to never
see "resource allocation failed" messages anymore.
Thanks,
Jesse
Hi Jesse,
On Wed, 2008-07-16 at 17:03 -0700, Jesse Barnes wrote:
> On Wednesday, July 16, 2008 9:33 am Andi Kleen wrote:
> > > The only problem there is that linux-next doesn't get nearly the sort of
> > > testing coverage we need for this kind of change.
> >
> > Normally I tend to wait for one -mm release, which seems to be tested
> > by a reasonable number of people. If it survives that it is good
> > to be tested in Linus' tree.
> >
> > Just stuffing this in in literally the last minute doesn't seem
> > like a good idea.
>
> Well it's hardly last minute given that the merge window only opened a couple
> of days ago...
>
> But beyond that, now that I've thought about it a bit more I'm not even sure
> the patch is really correct (though it works on my test machines). Shouldn't
> we be looking at _PRS not _CRS?
IMO, the current resource allocations will give us a better idea of
which region is free.
Besides, from what i read PRS is optional and not all BIOS's may export
that.
> And ideally we should try to find even more
> space, not less. This patch made one of my machines lose quite a bit of
> space:
>
> ...
> Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
> ...
> ACPI: PCI resources should start at c0000000 (gap: bf000000:31000000)
> ...
>
Yep, we should try to find more space but we should also make sure that
this space doesn't clash with any other resource.
As explained in my first mail while posting v1 of these patches, the
kernel ignores the memory hotplug range while calculating this gap for
PCI, and consequently these regions collide.
With this patch we have put some constraints like looking only in the
producer regions rather than all regions which are right now not
reserved/used by the e820 map.
However, looking back again i think we can improve this further in some
cases.
I have made some more changes in the e820_search_gap algorithm to find
the *biggest* un-utilized gap in the 32bit address range, and have added
some debug print messages.
Can you please try this patch on your system and mail me your full
dmesg.
Thanks,
Alok
---
arch/x86/kernel/e820.c | 19 +++++++++++++++++++
arch/x86/pci/acpi.c | 4 ++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index a5383ae..298aeec 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -539,6 +539,8 @@ __init int e820_search_gap(unsigned long *gapstart, unsigned long *gapsize,
last = (end_addr && end_addr < MAX_GAP_END) ? end_addr : MAX_GAP_END;
+ printk("E820_DEBUG: Searching for gap between (0x%08lx - 0x%08llx)\n",
+ start_addr, end_addr);
while (--i >= 0) {
unsigned long long start = e820.map[i].addr;
unsigned long long end = start + e820.map[i].size;
@@ -556,9 +558,26 @@ __init int e820_search_gap(unsigned long *gapstart, unsigned long *gapsize,
if (gap >= *gapsize) {
*gapsize = gap;
*gapstart = end;
+ printk("E820_DEBUG: Found gap starting at "
+ "0x%08llx size 0x%08llx\n", end, gap);
found = 1;
}
}
+ if (start > start_addr) {
+ unsigned long gap, prev_end;
+ prev_end = e820.map[i-1].addr + e820.map[i-1].size;
+ if (start_addr > prev_end) {
+ gap = start - start_addr;
+ if (gap >=*gapsize) {
+ *gapsize = gap;
+ *gapstart = start_addr;
+ printk("E820_DEBUG: Found gap at start starting at "
+ "0x%08llx size 0x%08llx\n", end, gap);
+ found = 1;
+ start = start_addr;
+ }
+ }
+ }
if (start < last)
last = start;
}
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index fff2c42..1a88149 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -131,8 +131,12 @@ static acpi_status search_gap(struct acpi_resource *resource, void *data)
/*
* We want space only in the 32bit address range
*/
+ printk("ACPI_DEBUG start_addr 0x%08lx gapsize 0x%08lx "
+ "address_length 0x%08lx\n", start_addr, gap->gapsize,
+ addr.address_length);
if (start_addr < UINT_MAX) {
end_addr = start_addr + addr.address_length;
+ printk("\t\tend_addr is 0x%08lx\n", end_addr);
e820_search_gap(&gap->gapstart, &gap->gapsize,
start_addr, end_addr);
}
On Tue, 2008-07-22 at 14:50 -0700, Jesse Barnes wrote:
> On Monday, July 21, 2008 10:59 am Alok Kataria wrote:
> > Hi Jesse,
> >
> > Did you get a chance to try this patch on your box. Let me know what are
> > the values you get now.
>
> Here's the dmesg from my box with the ACPI gap stuff applied.
>
> I'm still more inclined to TJ's approach though; it should give us a lot more
> space for PCI devices; though you're right that avoiding conflicts is
> definitely important too...
Hi Jesse,
Thanks for sending the log.
In the log that you sent me, please note the following debug messages
-------
E820_DEBUG: Searching for gap between (0x00000000 - 0x100000000)
E820_DEBUG: Found gap starting at 0xbf000000 size 0x40f00000
Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
-------
This is the gap that was allocated by walking just the e820_map
With my changes we query the _CRS resource and get following info
------
ACPI_DEBUG start_addr 0xf8000000 gapsize 0x00400000 address_length 0x06b00000
end_addr is 0xfeb00000
E820_DEBUG: Searching for gap between (0xf8000000 - 0xfeb00000)
E820_DEBUG: Found gap at start starting at 0x100000000 size 0x07f00000
ACPI_DEBUG start_addr 0xbf000000 gapsize 0x07f00000 address_length 0x31000000
end_addr is 0xf0000000
E820_DEBUG: Searching for gap between (0xbf000000 - 0xf0000000)
E820_DEBUG: Found gap starting at 0xbf000000 size 0x31000000
------
So there are 2 producer regions one from [0xBF000000 - 0xF0000000] and
another from [0xF8000000 - 0xFEB00000]. That means BIOS has reserved the
area from [0xF0000000 - 0xF7FFFFFF] for some other resource.
If you look a little below in the log there is this
----
system 00:01: iomem range 0xf0000000-0xf7ffffff has been reserved
----
So the gap that we had calculated first i.e. from e820_setup_gap did
contain a collision i.e. though a resource was reserved from
[0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
account. My patch fixes this issue.
So, IMHO this is a BUG and should be fixed. Please let me know your
views.
Thanks,
Alok
On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> In the log that you sent me, please note the following debug messages
> -------
> E820_DEBUG: Searching for gap between (0x00000000 - 0x100000000)
> E820_DEBUG: Found gap starting at 0xbf000000 size 0x40f00000
> Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
> -------
>
> This is the gap that was allocated by walking just the e820_map
>
> With my changes we query the _CRS resource and get following info
> ------
> ACPI_DEBUG start_addr 0xf8000000 gapsize 0x00400000 address_length
> 0x06b00000 end_addr is 0xfeb00000
> E820_DEBUG: Searching for gap between (0xf8000000 - 0xfeb00000)
> E820_DEBUG: Found gap at start starting at 0x100000000 size 0x07f00000
> ACPI_DEBUG start_addr 0xbf000000 gapsize 0x07f00000 address_length
> 0x31000000 end_addr is 0xf0000000
> E820_DEBUG: Searching for gap between (0xbf000000 - 0xf0000000)
> E820_DEBUG: Found gap starting at 0xbf000000 size 0x31000000
> ------
>
> So there are 2 producer regions one from [0xBF000000 - 0xF0000000] and
> another from [0xF8000000 - 0xFEB00000]. That means BIOS has reserved the
> area from [0xF0000000 - 0xF7FFFFFF] for some other resource.
> If you look a little below in the log there is this
>
> ----
> system 00:01: iomem range 0xf0000000-0xf7ffffff has been reserved
> ----
>
> So the gap that we had calculated first i.e. from e820_setup_gap did
> contain a collision i.e. though a resource was reserved from
> [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> account. My patch fixes this issue.
>
> So, IMHO this is a BUG and should be fixed. Please let me know your
> views.
Yes, there's a conflict there, but on many machines it's probably a harmless
one. My main concerns are these:
1) it changes long standing behavior and doesn't fix any real reported bugs
I'm aware of (feel free to point me at some)
2) it looks like it will dramatically reduce the available PCI resource
space on at least some platforms, and that space is already scarce in our
current scheme
So basically, I'm just feeling very conservative about any changes to resource
allocation, that's all.
If this change were coupled with one that allowed us to exploit more available
address space for PCI resources I'd feel much more comfortable about it,
which is why I'm so interested in TJ's work (/me goes off to read his wiki
now).
Thanks,
Jesse
Hi Jesse,
On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
> On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> > So the gap that we had calculated first i.e. from e820_setup_gap did
> > contain a collision i.e. though a resource was reserved from
> > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> > account. My patch fixes this issue.
> >
> > So, IMHO this is a BUG and should be fixed. Please let me know your
> > views.
>
> Yes, there's a conflict there, but on many machines it's probably a harmless
> one. My main concerns are these:
> 1) it changes long standing behavior and doesn't fix any real reported bugs
> I'm aware of (feel free to point me at some)
The problem that I see is with Memory Hotadd, though my BIOS exports the
correct SRAT table and tells the OS that some regions are hot pluggable,
PCI gap calculation ignores that info and assigns some "option ROMS" in
hot pluggable memory region.
Because of this when i try to hot add memory, the OS sees a resource
allocation conflict and bails out. Which shouldn't have happened,
right ?
> 2) it looks like it will dramatically reduce the available PCI resource
> space on at least some platforms, and that space is already scarce in our
> current scheme
IMO, these gaps are used only for the option ROMS or hot pluggable
devices which in itself are rare. So its not like we are reducing the
whole available PCI resource space, but only the space that is needed by
these optional ROMS.
And i think its inevitable if we have to remove these conflicts with any
other subsystem.
>
> So basically, I'm just feeling very conservative about any changes to resource
> allocation, that's all.
>
> If this change were coupled with one that allowed us to exploit more available
> address space for PCI resources I'd feel much more comfortable about it,
> which is why I'm so interested in TJ's work (/me goes off to read his wiki
> now).
/me to having a look.
Thanks,
Alok
>
> Thanks,
> Jesse
On Wednesday, July 23, 2008 10:10 am Alok Kataria wrote:
> Hi Jesse,
>
> On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
> > On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> > > So the gap that we had calculated first i.e. from e820_setup_gap did
> > > contain a collision i.e. though a resource was reserved from
> > > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> > > account. My patch fixes this issue.
> > >
> > > So, IMHO this is a BUG and should be fixed. Please let me know your
> > > views.
> >
> > Yes, there's a conflict there, but on many machines it's probably a
> > harmless one. My main concerns are these:
> > 1) it changes long standing behavior and doesn't fix any real reported
> > bugs I'm aware of (feel free to point me at some)
>
> The problem that I see is with Memory Hotadd, though my BIOS exports the
> correct SRAT table and tells the OS that some regions are hot pluggable,
> PCI gap calculation ignores that info and assigns some "option ROMS" in
> hot pluggable memory region.
>
> Because of this when i try to hot add memory, the OS sees a resource
> allocation conflict and bails out. Which shouldn't have happened,
> right ?
Ah ok, so you've got a real problem here. And I don't deny that there's a
conflict that we should fix.
> > 2) it looks like it will dramatically reduce the available PCI resource
> > space on at least some platforms, and that space is already scarce
> > in our current scheme
>
> IMO, these gaps are used only for the option ROMS or hot pluggable
> devices which in itself are rare. So its not like we are reducing the
> whole available PCI resource space, but only the space that is needed by
> these optional ROMS.
> And i think its inevitable if we have to remove these conflicts with any
> other subsystem.
Actually we have other allocations. In many cases the BIOS won't assign MMIO
resources, so we do it at boot time. Most of the open PCI bugs we have today
are resource allocation failures, and not just for ROMs, so hopefully you can
understand my worry. Using all available gaps rather than just the largest
seems like the obvious first step to increasing our available space, and
would give us more leeway to avoid stomping on other reserved space.
Jesse
On Wed, 2008-07-23 at 10:52 -0700, Jesse Barnes wrote:
> On Wednesday, July 23, 2008 10:10 am Alok Kataria wrote:
> > Hi Jesse,
> >
> > On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
> > > On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> > > > So the gap that we had calculated first i.e. from e820_setup_gap did
> > > > contain a collision i.e. though a resource was reserved from
> > > > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> > > > account. My patch fixes this issue.
> > > >
> > > > So, IMHO this is a BUG and should be fixed. Please let me know your
> > > > views.
> > >
> > > Yes, there's a conflict there, but on many machines it's probably a
> > > harmless one. My main concerns are these:
> > > 1) it changes long standing behavior and doesn't fix any real reported
> > > bugs I'm aware of (feel free to point me at some)
> >
> > The problem that I see is with Memory Hotadd, though my BIOS exports the
> > correct SRAT table and tells the OS that some regions are hot pluggable,
> > PCI gap calculation ignores that info and assigns some "option ROMS" in
> > hot pluggable memory region.
> >
> > Because of this when i try to hot add memory, the OS sees a resource
> > allocation conflict and bails out. Which shouldn't have happened,
> > right ?
>
> Ah ok, so you've got a real problem here. And I don't deny that there's a
> conflict that we should fix.
Thanks :-)
>
> > > 2) it looks like it will dramatically reduce the available PCI resource
> > > space on at least some platforms, and that space is already scarce
> > > in our current scheme
> >
> > IMO, these gaps are used only for the option ROMS or hot pluggable
> > devices which in itself are rare. So its not like we are reducing the
> > whole available PCI resource space, but only the space that is needed by
> > these optional ROMS.
> > And i think its inevitable if we have to remove these conflicts with any
> > other subsystem.
>
> Actually we have other allocations. In many cases the BIOS won't assign MMIO
> resources, so we do it at boot time. Most of the open PCI bugs we have today
> are resource allocation failures, and not just for ROMs, so hopefully you can
> understand my worry. Using all available gaps rather than just the largest
> seems like the obvious first step to increasing our available space, and
> would give us more leeway to avoid stomping on other reserved space.
>
Yeah i agree that this should be the approach.
I went through TJ's wiki and in the solutions section the first point
talks about
"keeping a list of all available PCI iomem regions which will be derived
from the e820 map (and the CRS object of the root PCI device).
TJ/Jesse, do we have any patches/work-in-progress which actually does
this ?
I can then add the CRS object stuff to that then.
Thanks,
Alok