Hi,
I have a problem getting a couple of PCI cards to play nicely together. The
error I get (in dmesg) is:
2.6.26 kernel:
[ 0.215619] PCI: Cannot allocate resource region 0 of device 0000:01:04.0
2.6.30 kernel:
[ 0.138390] pci 0000:01:04.0: BAR 0: can't allocate resource
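The two kernels word the same failure differently. When comparing saved dmesg output from the "working" and "broken" setups, it can help to normalise both formats. Below is a minimal sketch, assuming only the two message formats quoted above. The helper name `find_alloc_failures` is hypothetical and not part of any kernel tool.

```python
import re

# PCI address in the DDDD:BB:DD.F form used in the messages above.
DEV = r"(?P<dev>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"

# Hypothetical patterns covering only the two wordings quoted in this thread.
PATTERNS = [
    # 2.6.26: "PCI: Cannot allocate resource region 0 of device 0000:01:04.0"
    re.compile(r"Cannot allocate resource region (?P<bar>\d+) of device " + DEV),
    # 2.6.30: "pci 0000:01:04.0: BAR 0: can't allocate resource"
    re.compile(r"pci " + DEV + r": BAR (?P<bar>\d+): can't allocate resource"),
]


def find_alloc_failures(log_text):
    """Return (device, bar) tuples for every allocation-failure line found."""
    failures = []
    for line in log_text.splitlines():
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                failures.append((match.group("dev"), int(match.group("bar"))))
                break  # one match per line is enough
    return failures
```

Running it over the dmesg output from each configuration and comparing the resulting lists shows at a glance which device and BAR the kernel gave up on.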
And (for both kernels) the output of lspci has the following line:
01:04.0 Unclassified device [0080]: Device 0002:0080
Hardware Details:
- My motherboard is a DigitalLogic MSM945. It has a PC104 bus.
- I have two PC104 cards:
i) 8-port serial card (DiamondSystems Emerald-MM-8Plus)
ii) PC/104+ to MiniPCI Adapter (ConnectTech) (no MiniPCI cards installed).
(I have run both of these cards with a different motherboard, no problems).
I've compared two conditions:
a) "working": serial card installed by itself, and
b) "broken": serial + MiniPCI cards both installed
For the 2.6.26 kernel, I've attached the output of
- dmesg
- lspci
- lspci -v
- cat /proc/iomem
for both 'working' and 'broken' cases.
(Note: in both cases the following two lines appear in the dmesg output:
[ 0.185656] PCI: BIOS Bug: MCFG area at e0000000 is not reserved in ACPI motherboard resources
[ 0.185711] PCI: Not using MMCONFIG
Is this significant?)
Does anyone have any ideas what to try here? I've been fighting with this for
a while now...
(Please Cc me on any response).
Thanks,
Alex
I added linux-pci because this appears PCI-related.
On Tuesday 05 January 2010 02:18:02 am Alex Brooks wrote:
> I have a problem getting a couple of PCI cards to play nicely together. ...
> And (for both kernels) the output of lspci has the following line:
> 01:04.0 Unclassified device [0080]: Device 0002:0080
When you have only the 8-port serial card installed, it appears at
01:04.0. When you have both cards installed, we don't see a new device,
and whatever is at 01:04.0 no longer looks like the octal UART.
> Hardware Details:
> - My motherboard is a DigitalLogic MSM945. It has a PC104 bus.
> - I have two PC104 cards:
> i) 8-port serial card (DiamondSystems Emerald-MM-8Plus)
> ii) PC/104+ to MiniPCI Adapter (ConnectTech) (no MiniPCI cards installed).
Do you have these two cards?
http://www.diamondsystems.com/products/emeraldmm8plus,
http://www.connecttech.com/sub/Products/PC104plus_MiniPCI.asp
If so, both of those web pages mention jumpers that set the card's
PCI device ("slot") number. My guess is that both of your cards
are set to the same number.
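An address such as 0000:01:04.0 is domain:bus:device.function. Bjorn's reply describes the jumpers as selecting the card's device ("slot") number, so on that reading they control the "04" field. The sketch below splits an lspci-style address into its fields. `PciAddress` and `parse_pci_address` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int    # the "slot" number the card jumpers are described as setting
    function: int


def parse_pci_address(text):
    """Parse 'DDDD:BB:DD.F' or the short 'BB:DD.F' form printed by lspci."""
    parts = text.split(":")
    if len(parts) == 2:          # short form: assume domain 0000
        parts = ["0000"] + parts
    if len(parts) != 3:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, dev_fn = parts
    device, function = dev_fn.split(".")   # raises ValueError if '.' is missing
    return PciAddress(int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))
```

Two cards jumpered to the same device number would both try to answer at the same bus/device pair. That is why a duplicate setting is the first thing to rule out.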
Bjorn
> > I have a problem getting a couple of PCI cards to play nicely together.
> > ...
> >
> > And (for both kernels) the output of lspci has the following line:
> > 01:04.0 Unclassified device [0080]: Device 0002:0080
>
> When you have only the 8-port serial card installed, it appears at
> 01:04.0. When you have both cards installed, we don't see a new device,
> and whatever is at 01:04.0 no longer looks like the octal UART.
I'm pretty sure that it is the octal UART, but it isn't being recognised
properly. To test this theory, I tried with (a) no cards at all, and (b)
just the MiniPCI adapter (no serial card) -- nothing else appears at 01:04.0
(output of lspci attached).
> > Hardware Details:
> > - My motherboard is a DigitalLogic MSM945. It has a PC104 bus.
> > - I have two PC104 cards:
> > i) 8-port serial card (DiamondSystems Emerald-MM-8Plus)
> > ii) PC/104+ to MiniPCI Adapter (ConnectTech) (no MiniPCI cards
> > installed).
>
> Do you have these two cards?
>
> http://www.diamondsystems.com/products/emeraldmm8plus,
> http://www.connecttech.com/sub/Products/PC104plus_MiniPCI.asp
Yes that's right.
> If so, both of those web pages mention jumpers that set the card's
> PCI device ("slot") number. My guess is that both of your cards
> are set to the same number.
I'd already looked for this, I'm certain I have the jumpers right (I tried
intentionally setting them incorrectly, the failure mode is different).
Alex
On Tuesday 05 January 2010 03:19:47 pm Alex Brooks wrote:
> > > I have a problem getting a couple of PCI cards to play nicely together.
> > > ...
> > >
> > > And (for both kernels) the output of lspci has the following line:
> > > 01:04.0 Unclassified device [0080]: Device 0002:0080
> >
> > When you have only the 8-port serial card installed, it appears at
> > 01:04.0. When you have both cards installed, we don't see a new device,
> > and whatever is at 01:04.0 no longer looks like the octal UART.
>
> I'm pretty sure that it is the octal UART, but it isn't being recognised
> properly.
I'm sure it *is* the UART, but we're getting the wrong data back from
it, so we can't use it. The Mini PCI adapter must be interfering with
the UART board somehow. Since you said the Mini PCI slot on the adapter
is empty, and the jumpers are programmed correctly, I suspect the adapter
is broken. Maybe a connector problem? Do you have another one that
behaves the same way?
If you put a Mini PCI card in the slot and try this in a system without
the UART card, does the Mini PCI card work correctly? The jumpers on
the adapter should determine the PCI bus number where the Mini PCI device
appears.
> To test this theory, I tried with (a) no cards at all, and (b)
> just the MiniPCI adapter (no serial card) -- nothing else appears at 01:04.0
> (output of lspci attached).
The PC/104+ to Mini PCI adapter looks like it's completely passive. It
should be invisible to the OS, so it won't appear in lspci. For both
cases, the lspci output you attached is exactly what I would expect.
I don't see any indication that this is a software problem (I know nothing
about PC/104, so let me know if you disagree, and why).
> > If so, both of those web pages mention jumpers that set the card's
> > PCI device ("slot") number. My guess is that both of your cards
> > are set to the same number.
>
> I'd already looked for this, I'm certain I have the jumpers right (I tried
> intentionally setting them incorrectly, the failure mode is different).
How did you set them, and what was the failure? If the adapter is really
passive and the Mini PCI slot is empty, it seems strange that the adapter
jumper setting would make any difference at all.
Bjorn
> > > > I have a problem getting a couple of PCI cards to play nicely
> > > > together. ...
> > > >
> > > > And (for both kernels) the output of lspci has the following line:
> > > > 01:04.0 Unclassified device [0080]: Device 0002:0080
> > >
> > > When you have only the 8-port serial card installed, it appears at
> > > 01:04.0. When you have both cards installed, we don't see a new device,
> > > and whatever is at 01:04.0 no longer looks like the octal UART.
> >
> > I'm pretty sure that it is the octal UART, but it isn't being recognised
> > properly.
>
> I'm sure it *is* the UART, but we're getting the wrong data back from
> it, so we can't use it. The Mini PCI adapter must be interfering with
> the UART board somehow. Since you said the Mini PCI slot on the adapter
> is empty, and the jumpers are programmed correctly, I suspect the adapter
> is broken. Maybe a connector problem? Do you have another one that
> behaves the same way?
Unfortunately I don't have spare PC104 cards. But I do have a spare PC104
motherboard (not the same brand/model), and on it I can verify that both
cards work correctly.
> If you put a Mini PCI card in the slot and try this in a system without
> the UART card, does the Mini PCI card work correctly? The jumpers on
> the adapter should determine the PCI bus number where the Mini PCI device
> appears.
Yes, a MiniPCI card in the slot works regardless of what other PCI cards are
on the bus. I'm trying to debug this without a MiniPCI card to remove extra
variables.
> > To test this theory, I tried with (a) no cards at all, and (b)
> > just the MiniPCI adapter (no serial card) -- nothing else appears at
> > 01:04.0 (output of lspci attached).
>
> The PC/104+ to Mini PCI adapter looks like it's completely passive. It
> should be invisible to the OS, so it won't appear in lspci. For both
> cases, the lspci output you attached is exactly what I would expect.
>
> I don't see any indication that this is a software problem (I know nothing
> about PC/104, so let me know if you disagree, and why).
I don't have a smoking gun that points at software; it was just a guess based
on:
a) all the pieces of hardware work in certain combinations
b) there's a nasty-looking error message in dmesg
> > > If so, both of those web pages mention jumpers that set the card's
> > > PCI device ("slot") number. My guess is that both of your cards
> > > are set to the same number.
> >
> > I'd already looked for this, I'm certain I have the jumpers right (I
> > tried intentionally setting them incorrectly, the failure mode is
> > different).
>
> How did you set them, and what was the failure? If the adapter is really
> passive and the Mini PCI slot is empty, it seems strange that the adapter
> jumper setting would make any difference at all.
I tried a few different incorrect settings; at least one of them put both
cards on the same ID. The computer failed to boot (i.e. it never got as far
as the Linux kernel).
Alex
--
------------------------------
Dr Alex Brooks
Marathon Robotics Pty Ltd
National Innovation Centre
4 Cornwallis Street
Eveleigh, NSW 2015
Sydney, Australia
Ph: +61 2 9209 4021
Web: http://www.marathon-robotics.com
------------------------------
On Tuesday 05 January 2010 07:16:21 pm Alex Brooks wrote:
> > I don't see any indication that this is a software problem (I know
> > nothing about PC/104, so let me know if you disagree, and why).
>
> I don't have a smoking gun that points at software, it was just my guess
> based on:
> a) all the pieces of hardware work in certain combinations
> b) there's a nasty-looking error message in dmesg
You mentioned two messages. Here's the first:
PCI: BIOS Bug: MCFG area at e0000000 is not reserved in ACPI motherboard resources
PCI: Not using MMCONFIG.
and the second:
PCI: Cannot allocate resource region 0 of device 0000:01:04.0
PCI: Error while updating region 0000:01:04.0/0 (feb00002 != 00000000)
The "Error while updating region" message is because we wrote something
to a BAR, read it back, and they didn't match. I suppose this could be
some kind of config space problem related to the fact that we're not
using MMCONFIG.
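The two hex numbers in "(feb00002 != 00000000)" follow the standard 32-bit BAR layout: bit 0 selects I/O or memory space, bits 2:1 give the memory type, and bit 3 marks the region prefetchable. Below is a simplified sketch of decoding a BAR and of a write/read-back comparison that ignores the flag bits. It assumes the first number is the value written and the second the value read back. This is an illustration only, not the kernel's own code, and both function names are hypothetical.

```python
def decode_bar(value):
    """Decode a 32-bit PCI BAR value into its flag bits and base address."""
    if value & 0x1:                            # bit 0 set: I/O-space BAR
        return {"space": "io", "address": value & 0xFFFFFFFC}
    return {
        "space": "mem",
        "type": (value >> 1) & 0x3,            # 0 = 32-bit, 1 = below 1 MiB (legacy), 2 = 64-bit
        "prefetchable": bool(value & 0x8),
        "address": value & 0xFFFFFFF0,
    }


def bar_update_mismatch(written, read_back):
    """True if the address bits read back differ from those written.

    Only the address bits are compared; the low flag bits are masked off.
    """
    mask = 0xFFFFFFFC if written & 0x1 else 0xFFFFFFF0
    return ((written ^ read_back) & mask) != 0
```

With these helpers, `decode_bar(0xfeb00002)` reports a memory BAR based at 0xfeb00000, and `bar_update_mismatch(0xfeb00002, 0x00000000)` returns True, which is the situation the error message describes.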
Can you try removing the MCFG reservation check, so we do try to use
MMCONFIG? If you do this on a current kernel, we'll also get a little
more debug output about the subsequent resource allocation failure.
Bjorn
> You mentioned two messages. Here's the first:
> PCI: BIOS Bug: MCFG area at e0000000 is not reserved in ACPI motherboard resources
> PCI: Not using MMCONFIG.
>
> and the second:
> PCI: Cannot allocate resource region 0 of device 0000:01:04.0
> PCI: Error while updating region 0000:01:04.0/0 (feb00002 != 00000000)
>
> The "Error while updating region" message is because we wrote something
> to a BAR, read it back, and they didn't match. I suppose this could be
> some kind of config space problem related to the fact that we're not
> using MMCONFIG.
>
> Can you try removing the MCFG reservation check, so we do try to use
> MMCONFIG? If you do this on a current kernel, we'll also get a little
> more debug output about the subsequent resource allocation failure.
I did this on my 2.6.26 kernel (is this current enough or would a later kernel
be better?) and attached the dmesg output.
Alex
On Thu, 2010-01-07 at 12:09 +1100, Alex Brooks wrote:
> > You mentioned two messages. Here's the first:
> > PCI: BIOS Bug: MCFG area at e0000000 is not reserved in ACPI motherboard resources
> > PCI: Not using MMCONFIG.
> >
> > and the second:
> > PCI: Cannot allocate resource region 0 of device 0000:01:04.0
> > PCI: Error while updating region 0000:01:04.0/0 (feb00002 != 00000000)
> >
> > The "Error while updating region" message is because we wrote something
> > to a BAR, read it back, and they didn't match. I suppose this could be
> > some kind of config space problem related to the fact that we're not
> > using MMCONFIG.
> >
> > Can you try removing the MCFG reservation check, so we do try to use
> > MMCONFIG? If you do this on a current kernel, we'll also get a little
> > more debug output about the subsequent resource allocation failure.
>
> I did this on my 2.6.26 kernel (is this current enough or would a later kernel
> be better?) and attached the dmesg output.
Looks like you got the same errors when accessing config space as
before.
I was hoping for a kernel directly from Linus' git repo, e.g.,
2.6.33-rc3; I don't think the debug output I'm thinking about went in
until after 2.6.32 was released.
Bjorn
> I was hoping for a kernel directly from Linus' git repo, e.g.,
> 2.6.33-rc3; I don't think the debug output I'm thinking about went in
> until after 2.6.32 was released.
I'm not sure how to modify the 2.6.33-rc3 kernel source exactly re MCFG -- the
dmesg output for 2.6.33-rc3 is quite different (attached), including some new
information about an address collision for the recalcitrant device:
pci 0000:01:04.0: address space collision: [mem 0x00800000-0x00800fff] already in use
pci 0000:01:04.0: can't reserve [mem 0x00800000-0x00800fff]
Does this shed any more light on things (or can you tell me what I could
modify to get better debug info)?
Thanks,
Alex
On Saturday 09 January 2010 08:07:26 pm Alex Brooks wrote:
> > I was hoping for a kernel directly from Linus' git repo, e.g.,
> > 2.6.33-rc3; I don't think the debug output I'm thinking about went in
> > until after 2.6.32 was released.
>
> I'm not sure how to modify the 2.6.33-rc3 kernel source exactly re MCFG -- the
> dmesg output for 2.6.33-rc3 is quite different (attached),
It's interesting that we now use MMCONFIG by default; no tweaking
necessary. I guess Linux just got a little smarter between 2.6.26
and 2.6.33 -- in this case, it looks like we reduce the size of the
MMCONFIG region from what the BIOS reported.
But I don't think MMCONFIG is relevant to this problem anyway.
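MMCONFIG (the PCI Express "enhanced configuration access mechanism") maps config space linearly: 1 MiB per bus, 32 KiB per device, and 4 KiB per function. Shrinking the region therefore means covering fewer buses. The sketch below assumes the reported MCFG base (e0000000) corresponds to bus 0. The thread does not say which bus range 2.6.33 actually kept, and both helper names are hypothetical.

```python
def ecam_address(base, bus, device, function, offset=0):
    """Physical address of a config-space register under MMCONFIG/ECAM.

    Assumes `base` is the address of bus 0's configuration space.
    """
    if not (0 <= bus <= 255 and 0 <= device <= 31
            and 0 <= function <= 7 and 0 <= offset <= 0xFFF):
        raise ValueError("PCI address field out of range")
    return base + (bus << 20) + (device << 15) + (function << 12) + offset


def ecam_region_size(start_bus, end_bus):
    """Bytes of MMCONFIG window needed to cover buses start_bus..end_bus."""
    return (end_bus - start_bus + 1) << 20
```

Under that assumption, device 01:04.0 has its config space at `ecam_address(0xe0000000, 1, 4, 0)`, which is 0xe0120000. A window covering all 256 buses needs 256 MiB.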
> including some new
> information about an address collision for the recalcitrant device:
>
> pci 0000:01:04.0: address space collision: [mem 0x00800000-0x00800fff] already in use
> pci 0000:01:04.0: can't reserve [mem 0x00800000-0x00800fff]
>
> Does this shed any more light on things (or can you tell me what I could
> modify to get better debug info)?
It shows that we think the octal UART is at 0x00800000, which
doesn't seem valid (I think it's in the middle of your system RAM).
This is before Linux moves anything around, so normally this would
be what BIOS left in the BAR. But BIOS puts it at 0xfebff000 in the
working case (without the Mini PCI adapter), and the adapter shouldn't
even be visible to the BIOS, so I would expect the octal UART to still
be at the same address.
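One way to test the guess that 0x00800000 lands in system RAM is to look it up in the attached `cat /proc/iomem` output. Below is a minimal sketch, assuming the usual "start-end : name" line format with hex addresses. Nesting is ignored, and `parse_iomem` and `overlapping` are hypothetical helpers.

```python
def parse_iomem(text):
    """Parse /proc/iomem-style lines 'start-end : name' into (start, end, name)."""
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue                           # skip blank or unexpected lines
        span, name = line.strip().split(" : ", 1)
        start, end = (int(x, 16) for x in span.split("-"))
        entries.append((start, end, name))
    return entries


def overlapping(entries, start, end):
    """Names of all entries whose range intersects [start, end] (inclusive)."""
    return [name for s, e, name in entries if s <= end and start <= e]
```

Calling `overlapping(parse_iomem(text), 0x00800000, 0x00800fff)` on the real file lists everything already registered at the address where the kernel found the octal UART's BAR.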
I'm afraid I still don't see a software problem here. To me (and I'm
certainly not a hardware person), it feels like an electrical problem
on the PCI bus: we read a BAR and it has a nonsensical value, we write
the BAR and can't read that value back, we read the vendor/device/class
codes and get nonsense. It's also interesting that most of these
nonsense values we read seem to have only one bit set.
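The "only one bit set" observation is easy to check mechanically. A value has exactly one bit set when it is a non-zero power of two, i.e. `v & (v - 1) == 0`. The sketch below applies that test to the values quoted in this thread for 01:04.0 in the broken configuration: the lspci IDs 0002:0080, the class [0080], and the 0x00800000 BAR address. `is_single_bit` is a hypothetical helper name.

```python
def is_single_bit(value):
    """True if exactly one bit is set (a non-zero power of two)."""
    return value != 0 and (value & (value - 1)) == 0


# Values reported for 01:04.0 in the broken configuration, as quoted in this thread.
observed = {
    "vendor ID": 0x0002,
    "device ID": 0x0080,
    "class (as shown by lspci)": 0x0080,
    "BAR 0 address": 0x00800000,
}

for label, value in observed.items():
    print(f"{label}: {value:#010x}  single bit set: {is_single_bit(value)}")
```

By contrast, the address the BIOS assigned in the working case (0xfebff000) has many bits set.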
Bjorn
> > including some new
> > information about an address collision for the recalcitrant device:
> >
> > pci 0000:01:04.0: address space collision: [mem 0x00800000-0x00800fff] already in use
> > pci 0000:01:04.0: can't reserve [mem 0x00800000-0x00800fff]
> >
> > Does this shed any more light on things (or can you tell me what I could
> > modify to get better debug info)?
>
> It shows that we think the octal UART is at 0x00800000, which
> doesn't seem valid (I think it's in the middle of your system RAM).
>
> This is before Linux moves anything around, so normally this would
> be what BIOS left in the BAR. But BIOS puts it at 0xfebff000 in the
> working case (without the Mini PCI adapter), and the adapter shouldn't
> even be visible to the BIOS, so I would expect the octal UART to still
> be at the same address.
>
> I'm afraid I still don't see a software problem here. To me (and I'm
> certainly not a hardware person), it feels like an electrical problem
> on the PCI bus: we read a BAR and it has a nonsensical value, we write
> the BAR and can't read that value back, we read the vendor/device/class
> codes and get nonsense. It's also interesting that most of these
> nonsense values we read seem to have only one bit set.
OK, at this point I think I'll look into switching hardware. Thanks very much
for all the help.
Alex