On Wed, Nov 21, 2007 at 09:49:03AM +0000, Jan Beulich wrote:
> >I will definitely take a look at this but due to holiday/vacation
> >I may not make any progress on it until next week. I'm not sure
> >if I totally follow your description but I suspect it will
> >become more understandable when I look at the code. BTW, what
> >system/PCI adapter combination are you seeing this with? It
> >would be nice if I could figure out how to reproduce the problem
> >here.
>
> The basic consideration to make is what happens if the bridge's
> non-prefetch window provides just enough space to cover all non-ROM
> non-prefetch regions.
Thanks. I think I understand now.
A kernel without the patch always forced creation of a prefetch
window for expansion ROMs, which was incorrect on systems where
insufficient memory resources are available for both non-prefetch
and prefetch windows. On the systems I was dealing with, the
BIOS assumes (correctly, I believe) that expansion ROM memory
resource needs will be satisfied from the non-prefetch window.
On those same systems, the BIOS (via _CRS) does not provide
enough additional prefetch or non-prefetch memory for even a
minimum size prefetch window. These systems limit the memory
resources provided for already installed PCI devices to the
bare minimum because of the large number of PCI devices in a
multi-node configuration that are competing for the available
resources.
A kernel with the patch forces expansion ROM resources to be
allocated from a non-prefetch window which, based on your
input, is apparently problematic on at least one system due to
the BIOS (via e820 hole?) not providing enough non-prefetch memory
to construct a large enough non-prefetch window to accommodate
the underlying expansion ROMs.
I think it would still be useful to know what system/PCI adapter
combination you are seeing the problem with so that I can try
to put together a setup that will reproduce the problem here.
Alternatively, it would be good if you wouldn't mind providing
more information, testing possible fixes, etc. As a start,
could you send me the following taken with and without the
patch?
- /proc/iomem
- /proc/ioports
- `dmesg` output
- `lspci -vt` output
- `lspci -vvv` output
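If it saves some typing, the five items above could be gathered in one pass with something like the following. This is only a sketch: the `collect` and `output_name` helpers and the file naming are made up for illustration, `lspci` has to be installed, and some of the output may be incomplete unless it is run as root.

```python
import pathlib
import subprocess

# Items requested above: two /proc files and three command outputs.
PROC_FILES = ["/proc/iomem", "/proc/ioports"]
COMMANDS = {
    "dmesg": ["dmesg"],
    "lspci-vt": ["lspci", "-vt"],
    "lspci-vvv": ["lspci", "-vvv"],
}


def output_name(item, suffix):
    """Map '/proc/iomem' plus suffix 'patched' to 'iomem.patched' (hypothetical naming)."""
    return f"{pathlib.PurePosixPath(item).name}.{suffix}"


def collect(outdir, suffix):
    """Write each requested item to outdir/<name>.<suffix>."""
    out = pathlib.Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for path in PROC_FILES:
        text = pathlib.Path(path).read_text()
        (out / output_name(path, suffix)).write_text(text)
    for name, argv in COMMANDS.items():
        # Raises FileNotFoundError if the tool is not installed.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        (out / output_name(name, suffix)).write_text(result.stdout)
```

Calling `collect("pci-logs", "patched")` on the patched kernel and `collect("pci-logs", "reverted")` on the other would leave two comparable sets of files in one directory.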
>
> >Also, is it happening both with and without the 'pci=use_crs'
> >kernel option?
>
> I have to admit I didn't even know about that option;
It is quite new. It was added with a separate patch that I
submitted at the same time as the "Avoid creating P2P prefetch
window for expansion ROMs" patch.
Prior to constraining PCI memory resource allocations to what
_CRS returned, memory resources left unassigned by the BIOS
(including those required for expansion ROMs) were being allocated
from often incorrect ranges in the e820 hole. After adding the
_CRS constraint we encountered an expansion ROM allocation
problem that motivated the "Avoid creating P2P prefetch window
for expansion ROMs" change. If I recall correctly, we ran
into the expansion ROM allocation problem when using a PCIe
adapter which has an on-board P2P bridge (PCIe-to-PCI/PCI-X)
above a device with an expansion ROM.
> when using it the
> box doesn't find its root anymore (the fusion MPT driver reports that it
> can't map the adapter's memory), and /proc/iomem doesn't show any
> iomem regions apart from the frame buffer used by vesafb, ACPI memory
> and one single range fffa0000-fffabfff. Below is the relevant fragment
> of the kernel messages.
Interesting. I actually anticipated that this sort of thing
might happen on some systems with BIOSes that export a _CRS
that does not correctly return all of the resources available
under the associated root bridge. This is one reason why the
'pci=use_crs' feature is "off" by default. You might want to make
sure that the BIOS on that system is up to date and, if so,
let the h/w vendor know about this.
Greg,
Without a closer look at the code and the information I
requested above I am not 100% certain that my expansion ROM
allocation fix is incomplete but it seems pretty likely that
it is. If you feel that the possible regression that Jan
reported is more important than the issue I was trying to solve
feel free to remove the change until I can come up with
something better.
Thanks,
Gary
--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
[email protected]
http://www.ibm.com/linux/ltc
>
> Jan
>
> PCI: Using ACPI for IRQ routing
> PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
> PCI: Cannot allocate resource region 8 of bridge 0000:00:01.0
> PCI: Cannot allocate resource region 9 of bridge 0000:00:01.0
> PCI: Cannot allocate resource region 8 of bridge 0000:00:02.0
> PCI: Cannot allocate resource region 9 of bridge 0000:00:02.0
> PCI: Cannot allocate resource region 8 of bridge 0000:00:04.0
> PCI: Cannot allocate resource region 9 of bridge 0000:00:04.0
> PCI: Cannot allocate resource region 8 of bridge 0000:00:05.0
> PCI: Cannot allocate resource region 9 of bridge 0000:00:05.0
> PCI: Cannot allocate resource region 8 of bridge 0000:00:06.0
> PCI: Cannot allocate resource region 8 of bridge 0000:0a:00.2
> PCI: Cannot allocate resource region 8 of bridge 0000:00:07.0
> PCI: Cannot allocate resource region 8 of bridge 0000:0d:00.0
> PCI: Cannot allocate resource region 8 of bridge 0000:00:1e.0
> PCI: Cannot allocate resource region 9 of bridge 0000:00:1e.0
> PCI: Cannot allocate resource region 0 of device 0000:00:1d.7
> PCI: Cannot allocate resource region 0 of device 0000:0a:00.0
> PCI: Cannot allocate resource region 0 of device 0000:0c:02.0
> PCI: Cannot allocate resource region 0 of device 0000:0c:02.1
> PCI: Cannot allocate resource region 1 of device 0000:0e:05.0
> PCI: Cannot allocate resource region 3 of device 0000:0e:05.0
> PCI: Cannot allocate resource region 1 of device 0000:0e:05.1
> PCI: Cannot allocate resource region 3 of device 0000:0e:05.1
> PCI: Cannot allocate resource region 0 of device 0000:10:00.0
> PCI: Cannot allocate resource region 2 of device 0000:10:00.0
> Setting up standard PCI resources
> pnp: 00:06: ioport range 0x400-0x4bf could not be reserved
> pnp: 00:06: ioport range 0x800-0x87f has been reserved
> pnp: 00:06: ioport range 0x3f0-0x3f5 has been reserved
> pnp: 00:06: ioport range 0x3f7-0x3f7 has been reserved
> pnp: 00:06: ioport range 0x880-0x883 has been reserved
> pnp: 00:06: ioport range 0xca4-0xca4 has been reserved
> pnp: 00:06: ioport range 0xca5-0xca5 has been reserved
> pnp: 00:0a: ioport range 0xca2-0xca2 has been reserved
> pnp: 00:0a: ioport range 0xca3-0xca3 has been reserved
> assign 0000:00:06.0#8 (cur=100000 min=80000000)
> PCI: Failed to allocate mem resource #8:200000@100000 for 0000:00:06.0
> assign 0000:00:07.0#8 (cur=100000 min=80000000)
> PCI: Failed to allocate mem resource #8:300000@100000 for 0000:00:07.0
> PCI: Failed to allocate mem resource #0:400@0 for 0000:00:1d.7
> PCI: Bridge: 0000:00:01.0
> IO window: 7000-7fff
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:02.0
> IO window: 6000-6fff
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:03.0
> IO window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:04.0
> IO window: 5000-5fff
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:05.0
> IO window: 4000-4fff
> MEM window: disabled.
> PREFETCH window: disabled.
> assign 0000:0a:00.2#8 (cur=100000 min=80000000)
> PCI: Failed to allocate mem resource #8:100000@100000 for 0000:0a:00.2
> PCI: Failed to allocate mem resource #0:1000@0 for 0000:0a:00.0
> PCI: Bridge: 0000:0a:00.0
> IO window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Failed to allocate mem resource #6:20000@0 for 0000:0c:02.0
> PCI: Failed to allocate mem resource #0:10000@0 for 0000:0c:02.0
> PCI: Failed to allocate mem resource #0:10000@0 for 0000:0c:02.1
> PCI: Bridge: 0000:0a:00.2
> IO window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:06.0
> IO window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> assign 0000:0d:00.0#8 (cur=100000 min=80000000)
> PCI: Failed to allocate mem resource #8:300000@100000 for 0000:0d:00.0
> PCI: Failed to allocate mem resource #6:100000@0 for 0000:0e:05.0
> PCI: Failed to allocate mem resource #6:100000@0 for 0000:0e:05.1
> PCI: Failed to allocate mem resource #1:10000@0 for 0000:0e:05.0
> PCI: Failed to allocate mem resource #3:10000@0 for 0000:0e:05.0
> PCI: Failed to allocate mem resource #1:10000@0 for 0000:0e:05.1
> PCI: Failed to allocate mem resource #3:10000@0 for 0000:0e:05.1
> PCI: Bridge: 0000:0d:00.0
> IO window: 3000-3fff
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:0d:00.2
> IO window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:07.0
> IO window: 3000-3fff
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Failed to allocate mem resource #0:8000000@0 for 0000:10:00.0
> PCI: Failed to allocate mem resource #6:20000@0 for 0000:10:00.0
> PCI: Failed to allocate mem resource #2:10000@0 for 0000:10:00.0
> PCI: Bridge: 0000:00:1e.0
> IO window: 2000-2fff
> MEM window: disabled.
> PREFETCH window: disabled.
>
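To get a quick overview of a log like the one quoted above, the "Failed to allocate" lines can be tallied by resource type. The sketch below rests on two assumptions of mine: that the message has the shape `#<resource>:<size>@<start>` with hexadecimal values, and that the resource numbers follow the PCI core's convention (0-5 are BARs, 6 is the expansion ROM, 8 and 9 are a bridge's MEM and PREFETCH windows). The `kind` and `summarize` names are illustrative only.

```python
import re
from collections import Counter

# Assumed message shape: "#<resource>:<size>@<start> for <device>", values in hex.
FAILED = re.compile(
    r"Failed to allocate mem resource #(\d+):([0-9a-f]+)@([0-9a-f]+) for (\S+)"
)


def kind(resno):
    """Classify a resource number (assumed PCI core numbering)."""
    if resno <= 5:
        return "BAR"
    if resno == 6:
        return "expansion ROM"
    if resno == 8:
        return "bridge MEM window"
    if resno == 9:
        return "bridge PREFETCH window"
    return "other"


def summarize(log_text):
    """Count failures and total requested bytes per resource kind."""
    counts, sizes = Counter(), Counter()
    for m in FAILED.finditer(log_text):
        k = kind(int(m.group(1)))
        counts[k] += 1
        sizes[k] += int(m.group(2), 16)
    return counts, sizes
```

The leading "> " quoting does not matter because the pattern is searched for anywhere in the text, so a saved copy of the mail can be passed to `summarize` directly.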
On Mon, Nov 26, 2007 at 01:59:54PM -0800, Gary Hade wrote:
> Greg,
> Without a closer look at the code and the information I
> requested above I am not 100% certain that my expansion ROM
> allocation fix is incomplete but it seems pretty likely that
> it is. If you feel that the possible regression that Jan
> reported is more important than the issue I was trying to solve
> feel free to remove the change until I can come up with
> something better.
If you want me to revert the changes in Linus's tree right now, that's
fine, I will, just send me a patch to do so.
But if you think you can work to solve this problem quickly, then I'm
more than willing to keep the changes in the tree.
It's up to you.
thanks,
greg k-h
On Mon, Nov 26, 2007 at 02:15:45PM -0800, Greg KH wrote:
> On Mon, Nov 26, 2007 at 01:59:54PM -0800, Gary Hade wrote:
> > Greg,
> > Without a closer look at the code and the information I
> > requested above I am not 100% certain that my expansion ROM
> > allocation fix is incomplete but it seems pretty likely that
> > it is. If you feel that the possible regression that Jan
> > reported is more important than the issue I was trying to solve
> > feel free to remove the change until I can come up with
> > something better.
>
> If you want me to revert the changes in Linus's tree right now, that's
> fine, I will, just send me a patch to do so.
>
> But if you think you can work to solve this problem quickly, then I'm
> more than willing to keep the changes in the tree.
>
> It's up to you.
Thanks Greg. If I am unable to provide a revised patch (or properly
defend the current one :) in the next few days I will send you
a revert patch.
Gary
>A kernel without the patch always forced creation of a prefetch
>window for expansion ROMs which was incorrect on systems where
>insufficient memory resources are available for both non-prefetch
>and prefetch windows. On the systems I was dealing with, the
>BIOS assumes (correctly, I believe) that expansion ROM memory
>resource needs will be satisfied from the non-prefetch window.
Why would ROM space generally need to be non-prefetchable? I can
see that special cases might require this, but as long as ROM space
really is just normal code and data, there's nothing wrong with
prefetching from it I would think. Of course I realize there's no way
to specify that on a per-device basis, so I think the BIOS must be
relied upon here.
>I think it would still be useful to know what system/PCI adapter
>combination you are seeing the problem with so that I can try
>to put together a setup that will reproduce the problem here.
>Alternatively, it would be good if you wouldn't mind providing
>more information, testing possible fixes, etc. As a start,
>could you send me the following taken with and without the
>patch?
> - /proc/iomem
> - /proc/ioports
> - `dmesg` output
> - `lspci -vt` output
> - `lspci -vvv` output
Attached. *.0 is with the patch, *.1 is with it reverted. All output from
the SLE10SP2 (2.6.16.54-based) kernel that has that patch backported.
I'd also like to note that I found that a second of the systems I'm
regularly dealing with also has similar problems. I'm just not normally
looking at the boot messages that closely.
Jan
Hi Jan,
On Tue, Nov 27, 2007 at 09:28:25AM +0000, Jan Beulich wrote:
> >A kernel without the patch always forced creation of a prefetch
> >window for expansion ROMs which was incorrect on systems where
> >insufficient memory resources are available for both non-prefetch
> >and prefetch windows. On the systems I was dealing with, the
> >BIOS assumes (correctly, I believe) that expansion ROM memory
> >resource needs will be satisfied from the non-prefetch window.
>
> Why would ROM space generally need to be non-prefetchable? I can
> see that special cases might require this, but as long as ROM space
> really is just normal code and data, there's nothing wrong with
> prefetching from it I would think. Of course I realize there's no way
> to specify that on a per-device basis, so I think the BIOS must be
> relied upon here.
Yes, I believe you are correct. It is my understanding that
expansion ROM space is not "required" to be non-prefetchable
but "can be" non-prefetchable. Since it "can be" non-prefetchable
it is also my understanding that the BIOS can assume that the
kernel will allocate expansion ROM space from the non-prefetch
window if sufficient space exists there. The BIOS does this to
conserve PCI memory to make more space available to other PCI
devices (already installed or that may be hotplugged) in the large
number of other PCI slots that exist in a multi-node system.
For example, I believe the minimum p2p bridge window size is
1MB so if the BIOS determines that all the PCI memory needs
(including expansion ROMs) for devices below the bridge can
be satisfied from a 1MB non-prefetch window, it can provide
(via _CRS) only 1MB of PCI memory for the non-prefetch window
and nothing for a prefetch window. If the kernel strictly
requires expansion ROM space to be allocated from a prefetch
window the BIOS would have to provide an additional 1MB of PCI
memory for a prefetch window. So, in this example the kernel
wants 2MB (1MB for non-prefetch window, 1MB for prefetch window)
but the BIOS only provides 1MB for the non-prefetch window.
I hope this makes more sense now.
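The arithmetic in that example can be written down as a rough sketch. It assumes a 1MB minimum window size and granularity, assumes the devices below the bridge have no prefetchable BARs, and ignores the alignment of individual regions, so it is not what the kernel's sizing code actually does. The function names and region sizes are made up.

```python
MB = 1 << 20  # assumed minimum P2P bridge memory window size and granularity


def window_size(region_sizes):
    """Round the summed region sizes up to a whole number of 1MB units."""
    total = sum(region_sizes)
    return -(-total // MB) * MB  # ceiling division; 0 means no window needed


def memory_needed(bar_sizes, rom_sizes, rom_in_prefetch):
    """Return (non-prefetch, prefetch) window sizes for one bridge."""
    if rom_in_prefetch:
        # ROMs forced into their own prefetch window.
        return window_size(bar_sizes), window_size(rom_sizes)
    # ROMs share the non-prefetch window with the other regions.
    return window_size(list(bar_sizes) + list(rom_sizes)), 0
```

With two 64KB BARs and one 128KB expansion ROM, `memory_needed([0x10000, 0x10000], [0x20000], True)` asks for 1MB plus 1MB, while the same call with `False` fits everything into a single 1MB non-prefetch window, which is all the BIOS provides in the example.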
>
> >I think it would still be useful to know what system/PCI adapter
> >combination you are seeing the problem with so that I can try
> >to put together a setup that will reproduce the problem here.
> >Alternatively, it would be good if you wouldn't mind providing
> >more information, testing possible fixes, etc. As a start,
> >could you send me the following taken with and without the
> >patch?
> > - /proc/iomem
> > - /proc/ioports
> > - `dmesg` output
> > - `lspci -vt` output
> > - `lspci -vvv` output
>
> Attached. *.0 is with the patch, *.1 is with it reverted. All output from
> the SLE10SP2 (2.6.16.54-based) kernel that has that patch backported.
Thanks!
>
> I'd also like to note that I found that a second of the systems I'm
> regularly dealing with also has similar problems. I'm just not normally
> looking at the boot messages that closely.
Sounds like I better get to work. :)
Thanks,
Gary
On Mon, Nov 26, 2007 at 02:15:45PM -0800, Greg KH wrote:
> On Mon, Nov 26, 2007 at 01:59:54PM -0800, Gary Hade wrote:
> > Greg,
> > Without a closer look at the code and the information I
> > requested above I am not 100% certain that my expansion ROM
> > allocation fix is incomplete but it seems pretty likely that
> > it is. If you feel that the possible regression that Jan
> > reported is more important than the issue I was trying to solve
> > feel free to remove the change until I can come up with
> > something better.
>
> If you want me to revert the changes in Linus's tree right now, that's
> fine, I will, just send me a patch to do so.
>
> But if you think you can work to solve this problem quickly, then I'm
> more than willing to keep the changes in the tree.
Greg/Jan, this took me a little longer than I expected.
Thanks for your patience.
Because of the things that I learned from Jun'ichi Nomura
during our discussions in relation to his proposed
"pci: Omit error message for benign allocation failure" change
( re: http://marc.info/?l=linux-kernel&m=119679719828284&w=2 )
and the additional information below, provided by the BIOS
engineer with whom I have also been discussing the problem,
I have decided to propose:
  1. Eliminating the default expansion ROM memory allocation
     attempts, with the old behavior still accessible via the
     pci=rom boot option.
  2. Reverting my "Avoid creating P2P prefetch window for
     expansion ROMs" change, to eliminate the type of
     regressions that Jan spotted for those who might
     want to force the old behavior with pci=rom.
Patches and more details for both of these to follow.
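For anyone testing, the following sketch shows one way to check from user space which pci= options a system was booted with, and so whether expansion ROM allocation would be attempted under proposal 1. The helper names are hypothetical and the logic only mirrors the proposal as described here, not any kernel code.

```python
def pci_options(cmdline):
    """Return the set of comma-separated values given to pci= on a kernel command line."""
    opts = set()
    for word in cmdline.split():
        if word.startswith("pci="):
            opts.update(o for o in word[len("pci="):].split(",") if o)
    return opts


def rom_allocation_requested(cmdline):
    """Under proposal 1, expansion ROM space would be assigned only with pci=rom."""
    return "rom" in pci_options(cmdline)
```

On a running system the command line can be read with `open("/proc/cmdline").read()` and passed to either function.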
BIOS engineer said:
"Our BIOS is really not reserving any resources for the option roms.
BIOS used the non-prefetchable window to shadow the option rom.
So, there is room there for one option rom to be mapped at a time.
If multiple option roms are on the same bus, there may be room for
only one of them, as BIOS maps only one at a time to shadow it.
BIOS is grouping option rom bar with the non-prefetchable window,
because option rom bar and non-prefetchable window are 32-bit,
while prefetchable window could be 64-bit. Also, the spec doesn't
explicitly specify option rom bar as prefetchable. In fact, our BIOS
ensures DWORD aligned read during option rom scan & shadow,
as some devices had problem dealing with some alignment/size."
Thanks,
Gary