The powerpc version of pci_resource_to_user() and associated hooks
used by /proc/bus/pci and /sys/bus/pci mmap have been broken for some
time on machines that don't have a 1:1 mapping of devices (basically
on non-PowerMacs) and have PCI devices above 32 bits.
This attempts to fix it as well as possible.
The rule is supposed to be that pci_resource_to_user() always converts
the resources back into a BAR values since that's what the /proc
interface was supposed to deal with. However, for X to work on platforms
where PCI MMIO is not mapped 1:1, it became a habit of platforms like
sparc or powerpc to pass "fixed up" values there sinve X expects to
be able to use values from /proc/bus/pci/devices as offsets to mmap
of /dev/mem...
So we keep that contraption here, causing also /sys/*/resource to
expose fully absolute MMIO addresses instead of BAR values, which is
ugly, but should still work as long as those are only used to calculate
alignment within a page.
X is still broken when built 32 bits on machines where PCI MMIO can be
above 32 bits space unfortunately. It looks like somebody (DaveM ?)
hacked a fix in X to handle long long resources and had the good idea
to wrap it in #ifdef __sparc__ :-(
I suppose distro can do a quick fix here to get current X working, at
least until we have the new code from Ian Romanick that is supposed to
do proper use of sysfs for mapping resources.
This version of the patch is not well tested, mostly for comments at
this point.
As for fixing X to deal with 64 bits values when built as a 32 bits
binary, edit hw/xfree86/os-support/linux/lnx_pci.c and make it use
the definitions inside #ifdef __sparc__ at the top of the file. I
can't see any good reason why this was made sparc only in the first
place though...
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
Dave (Airlie), maybe you want to fix X git ? As a temporary measure
at least until we get Ian code ?
Index: linux-cell/arch/powerpc/kernel/pci_32.c
===================================================================
--- linux-cell.orig/arch/powerpc/kernel/pci_32.c 2006-11-13 13:37:51.000000000 +1100
+++ linux-cell/arch/powerpc/kernel/pci_32.c 2006-11-13 19:01:28.000000000 +1100
@@ -1546,7 +1546,7 @@ pci_resource_to_bus(struct pci_dev *pdev
static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
- unsigned long *offset,
+ resource_size_t *offset,
enum pci_mmap_state mmap_state)
{
struct pci_controller *hose = pci_bus_to_hose(dev->bus->number);
@@ -1558,7 +1558,9 @@ static struct resource *__pci_mmap_make_
/* If memory, add on the PCI bridge address offset */
if (mmap_state == pci_mmap_mem) {
+#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
*offset += hose->pci_mem_offset;
+#endif
res_bit = IORESOURCE_MEM;
} else {
io_offset = hose->io_base_virt - (void __iomem *)_IO_BASE;
@@ -1626,9 +1628,6 @@ static pgprot_t __pci_mmap_set_pgprot(st
else
prot |= _PAGE_GUARDED;
- printk("PCI map for %s:%llx, prot: %lx\n", pci_name(dev),
- (unsigned long long)rp->start, prot);
-
return __pgprot(prot);
}
@@ -1697,7 +1696,7 @@ int pci_mmap_page_range(struct pci_dev *
enum pci_mmap_state mmap_state,
int write_combine)
{
- unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
+ resource_size_t offset = vma->vm_pgoff << PAGE_SHIFT;
struct resource *rp;
int ret;
@@ -1810,21 +1809,43 @@ void pci_resource_to_user(const struct p
resource_size_t *start, resource_size_t *end)
{
struct pci_controller *hose = pci_bus_to_hose(dev->bus->number);
- unsigned long offset = 0;
+ resource_size_t offset = 0;
if (hose == NULL)
return;
if (rsrc->flags & IORESOURCE_IO)
- offset = (void __iomem *)_IO_BASE - hose->io_base_virt
- + hose->io_base_phys;
+ offset = (unsigned long)hose->io_base_virt - _IO_BASE;
+
+ /* We pass a fully fixed up address to userland for MMIO instead of
+ * a BAR value because X is lame and expects to be able to use that
+ * to pass to /dev/mem !
+ *
+ * That means that we'll have potentially 64 bits values where some
+ * userland apps only expect 32 (like X itself since it thinks only
+ * Sparc has 64 bits MMIO) but if we don't do that, we break it on
+ * 32 bits CHRPs :-(
+ *
+ * Hopefully, the sysfs insterface is immune to that gunk. Once X
+ * has been fixed (and the fix spread enough), we can re-enable the
+ * 2 lines below and pass down a BAR value to userland. In that case
+ * we'll also have to re-enable the matching code in
+ * __pci_mmap_make_offset().
+ *
+ * BenH.
+ */
+#if 0
+ else if (rsrc->flags & IORESOURCE_MEM)
+ offset = hose->pci_mem_offset;
+#endif
- *start = rsrc->start + offset;
- *end = rsrc->end + offset;
+ *start = rsrc->start - offset;
+ *end = rsrc->end - offset;
}
void __init
-pci_init_resource(struct resource *res, unsigned long start, unsigned long end,
+pci_init_resource(struct resource *res, resource_size_t start,
+ resource_size_t end,
int flags, char *name)
{
res->start = start;
Index: linux-cell/arch/powerpc/kernel/pci_64.c
===================================================================
--- linux-cell.orig/arch/powerpc/kernel/pci_64.c 2006-11-13 13:37:51.000000000 +1100
+++ linux-cell/arch/powerpc/kernel/pci_64.c 2006-11-13 19:01:21.000000000 +1100
@@ -689,7 +689,7 @@ int pci_proc_domain(struct pci_bus *bus)
* Returns negative error code on failure, zero on success.
*/
static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
- unsigned long *offset,
+ resource_size_t *offset,
enum pci_mmap_state mmap_state)
{
struct pci_controller *hose = pci_bus_to_host(dev->bus);
@@ -701,7 +701,9 @@ static struct resource *__pci_mmap_make_
/* If memory, add on the PCI bridge address offset */
if (mmap_state == pci_mmap_mem) {
+#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
*offset += hose->pci_mem_offset;
+#endif
res_bit = IORESOURCE_MEM;
} else {
io_offset = (unsigned long)hose->io_base_virt - pci_io_base;
@@ -769,9 +771,6 @@ static pgprot_t __pci_mmap_set_pgprot(st
else
prot |= _PAGE_GUARDED;
- printk(KERN_DEBUG "PCI map for %s:%lx, prot: %lx\n", pci_name(dev), rp->start,
- prot);
-
return __pgprot(prot);
}
@@ -839,7 +838,7 @@ pgprot_t pci_phys_mem_access_prot(struct
int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
enum pci_mmap_state mmap_state, int write_combine)
{
- unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
+ resource_size_t offset = vma->vm_pgoff << PAGE_SHIFT;
struct resource *rp;
int ret;
@@ -1342,20 +1341,41 @@ EXPORT_SYMBOL(pci_read_irq_line);
void pci_resource_to_user(const struct pci_dev *dev, int bar,
const struct resource *rsrc,
- u64 *start, u64 *end)
+ resource_size_t *start, resource_size_t *end)
{
struct pci_controller *hose = pci_bus_to_host(dev->bus);
- unsigned long offset = 0;
+ resource_size_t offset = 0;
if (hose == NULL)
return;
if (rsrc->flags & IORESOURCE_IO)
- offset = pci_io_base - (unsigned long)hose->io_base_virt +
- hose->io_base_phys;
+ offset = (unsigned long)hose->io_base_virt - pci_io_base;
+
+ /* We pass a fully fixed up address to userland for MMIO instead of
+ * a BAR value because X is lame and expects to be able to use that
+ * to pass to /dev/mem !
+ *
+ * That means that we'll have potentially 64 bits values where some
+ * userland apps only expect 32 (like X itself since it thinks only
+ * Sparc has 64 bits MMIO) but if we don't do that, we break it on
+ * 32 bits CHRPs :-(
+ *
+ * Hopefully, the sysfs insterface is immune to that gunk. Once X
+ * has been fixed (and the fix spread enough), we can re-enable the
+ * 2 lines below and pass down a BAR value to userland. In that case
+ * we'll also have to re-enable the matching code in
+ * __pci_mmap_make_offset().
+ *
+ * BenH.
+ */
+#if 0
+ else if (rsrc->flags & IORESOURCE_MEM)
+ offset = hose->pci_mem_offset;
+#endif
- *start = rsrc->start + offset;
- *end = rsrc->end + offset;
+ *start = rsrc->start - offset;
+ *end = rsrc->end - offset;
}
struct pci_controller* pci_find_hose_for_OF_device(struct device_node* node)
Index: linux-cell/include/asm-powerpc/pci-bridge.h
===================================================================
--- linux-cell.orig/include/asm-powerpc/pci-bridge.h 2006-11-13 13:37:50.000000000 +1100
+++ linux-cell/include/asm-powerpc/pci-bridge.h 2006-11-13 13:51:00.000000000 +1100
@@ -31,12 +31,12 @@ struct pci_controller {
int last_busno;
void __iomem *io_base_virt;
- unsigned long io_base_phys;
+ resource_size_t io_base_phys;
/* Some machines have a non 1:1 mapping of
* the PCI memory space in the CPU bus space
*/
- unsigned long pci_mem_offset;
+ resource_size_t pci_mem_offset;
unsigned long pci_io_size;
struct pci_ops *ops;
Index: linux-cell/include/asm-ppc/pci-bridge.h
===================================================================
--- linux-cell.orig/include/asm-ppc/pci-bridge.h 2006-11-13 13:37:50.000000000 +1100
+++ linux-cell/include/asm-ppc/pci-bridge.h 2006-11-13 13:47:30.000000000 +1100
@@ -20,8 +20,8 @@ extern unsigned long pci_bus_mem_base_ph
extern struct pci_controller* pcibios_alloc_controller(void);
/* Helper function for setting up resources */
-extern void pci_init_resource(struct resource *res, unsigned long start,
- unsigned long end, int flags, char *name);
+extern void pci_init_resource(struct resource *res, resource_size_t start,
+ resource_size_t end, int flags, char *name);
/* Get the PCI host controller for a bus */
extern struct pci_controller* pci_bus_to_hose(int bus);
@@ -50,12 +50,12 @@ struct pci_controller {
int bus_offset;
void __iomem *io_base_virt;
- unsigned long io_base_phys;
+ resource_size_t io_base_phys;
/* Some machines (PReP) have a non 1:1 mapping of
* the PCI memory space in the CPU bus space
*/
- unsigned long pci_mem_offset;
+ resource_size_t pci_mem_offset;
struct pci_ops *ops;
volatile unsigned int __iomem *cfg_addr;
From: Benjamin Herrenschmidt <[email protected]>
Date: Mon, 13 Nov 2006 19:16:30 +1100
> The powerpc version of pci_resource_to_user() and associated hooks
> used by /proc/bus/pci and /sys/bus/pci mmap have been broken for some
> time on machines that don't have a 1:1 mapping of devices (basically
> on non-PowerMacs) and have PCI devices above 32 bits.
>
> This attempts to fix it as well as possible.
>
> The rule is supposed to be that pci_resource_to_user() always converts
> the resources back into a BAR values since that's what the /proc
> interface was supposed to deal with. However, for X to work on platforms
> where PCI MMIO is not mapped 1:1, it became a habit of platforms like
> sparc or powerpc to pass "fixed up" values there sinve X expects to
> be able to use values from /proc/bus/pci/devices as offsets to mmap
> of /dev/mem...
Not on Sparc. /dev/mem mmap()'s are basically verbotten on Sparc,
period.
That's the whole reason I created /proc/bus/pci mmap(). So that
X could simply read a BAR, open /proc/bus/pci/${DEVICE} and
mmap() with some range within the BAR as the offset/size tuple.
If an application wants all of I/O space (to poke at VGA addresses for
a domain, for example), it finds the root bridge for that domain and
you can mmap()'s it's I/O space that way on platforms the implement
I/O space via memory accesses.
The only thing X was (unfortunately) still using the "devices" file
for is device existence, and this is obviously broken because there is
zero domain information in that procfs file so it's impossible to use
correctly. :-)
> X is still broken when built 32 bits on machines where PCI MMIO can be
> above 32 bits space unfortunately. It looks like somebody (DaveM ?)
> hacked a fix in X to handle long long resources and had the good idea
> to wrap it in #ifdef __sparc__ :-(
Sorry, it was the only 32/64 platform at the time that old X code was
written and the X maintainers at the time were unbelievably anal :-/
So the gist of your change is that X isn't obtaining BAR values
in the correct context on powerpc, and so you're going to hack up
the "devices" files output to "help" X out.
This doesn't sound sane to me.
What sounds better to me is that X does the right thing, which is
obtain the BAR from the PCI config space to determine what values to
pass in to /proc/bus/pci mmap() calls. And if it wants raw addresses
to pass in to /dev/mem mmap()'s on platforms where that works (ie. not
Sparc, to begin with) it should obtain those values from the "devices"
file which must be values suitable as /dev/mem offsets.
I strongly look forward to Ian's new X code, that is for sure :-)
> Not on Sparc. /dev/mem mmap()'s are basically verbotten on Sparc,
> period.
>
> That's the whole reason I created /proc/bus/pci mmap(). So that
> X could simply read a BAR, open /proc/bus/pci/${DEVICE} and
> mmap() with some range within the BAR as the offset/size tuple.
Ok, well, that would be lovely if that API was followed by X for more
than just sparc :-) Basically, I started by fixing ppc to properly
accept BAR values in /proc/bus/pci/${DEVICE} and provide BAR values
in /proc/bus/pci/devices and that broke X.
The problem is that X expects a value in "devices" that it can pass
to /dev/mem :-( At least on ppc and other non-sparc platforms. Thus I
need to have /proc/bus/pci/devices provide "fixed" up values in
processor bus address space instead of BAR values, and what I provide in
that file is also what mmap expects as an offset for ${DEVICE} files,
thus I'm screwed... Unless I "disconnect" the two (possibly by passing
an argument to pci_resource_to_user() ...)
> If an application wants all of I/O space (to poke at VGA addresses for
> a domain, for example), it finds the root bridge for that domain and
> you can mmap()'s it's I/O space that way on platforms the implement
> I/O space via memory accesses.
Yeah, that's what X should do on ppc too, right now it doesn't. There's
Ian patches to completely overhaul X PCI that should fix all of that,
but in the meantime, I think I should look into turning all those ifdef
__sparc__ into if defined(__sparc__) || defined(__powerpc__) in X ! I'll
have a deeper look this week.
> The only thing X was (unfortunately) still using the "devices" file
> for is device existence, and this is obviously broken because there is
> zero domain information in that procfs file so it's impossible to use
> correctly. :-)
Yes, it's screwed when you start having domains. It also uses it to do
that horrible mmap trick I explained above. It basically takes a BAR
value, then reads the devices file entry for the device, and iterate the
BARs, reading them from the card to compare them to the value passed in,
and returning the base address in "devices" for use by mmap to /dev/mem.
> > X is still broken when built 32 bits on machines where PCI MMIO can be
> > above 32 bits space unfortunately. It looks like somebody (DaveM ?)
> > hacked a fix in X to handle long long resources and had the good idea
> > to wrap it in #ifdef __sparc__ :-(
>
> Sorry, it was the only 32/64 platform at the time that old X code was
> written and the X maintainers at the time were unbelievably anal :-/
Yeah, I remember.
> So the gist of your change is that X isn't obtaining BAR values
> in the correct context on powerpc, and so you're going to hack up
> the "devices" files output to "help" X out.
>
> This doesn't sound sane to me.
Well, we've been hacking up the "devices" file for some time. I fact, it
seems like X code is written with the expectation that all archs do that
by providing a "fixed" value there, it seems to be the whole point of
the xf86GetPciOffsetFromOS() and xf86GetOSOffsetFromPCI() functions in
there. They do what I explained above, convering a BAR value into
something that can be passed to /dev/mem. This is broken, but this is
the only way to get X to work
If I "fix" the kernel to do the right thing, that is pass BAR values in
devices and expect BAR values in mmap, then I will break existing X
setups on machines where PCI is not mapped 1:1 (that is mostly CHRP
machines).
The problem I'm fixing in this patch is that while we were providing the
hacked up value in "devices", we were expecting the BAR value in mmap,
and there are apps expecting us to be consistent between the two, thus
the breakage.
> What sounds better to me is that X does the right thing, which is
> obtain the BAR from the PCI config space to determine what values to
> pass in to /proc/bus/pci mmap() calls. And if it wants raw addresses
> to pass in to /dev/mem mmap()'s on platforms where that works (ie. not
> Sparc, to begin with) it should obtain those values from the "devices"
> file which must be values suitable as /dev/mem offsets.
Ok so you think I should stick to my old code that is passing "fixed"
values in "devices" and expect BAR values in mmap ? The problem if you
do that is that you break /sys then. Look at what pci-sysfs.c does:
/* pci_mmap_page_range() expects the same kind of entry as coming
* from /proc/bus/pci/ which is a "user visible" value. If this is
* different from the resource itself, arch will do necessary fixup.
*/
pci_resource_to_user(pdev, i, res, &start, &end);
vma->vm_pgoff += start >> PAGE_SHIFT;
It calls pci_resource_to_user() which is the function used to display
the resouces both in /sys/*/resource and /proc/bus/pci/devices, and adds
that base to the offset before calling the arch pci_mmap_page_range()
function.
That function expects the same values as /proc/ mmap offsets. Thus that
means that /proc/ mmap offets haev to be the same as what
pci_resource_to_user() return which is also... waht is
in /proc/bus/pci/devices ! yuck !
The only way to actually get things right -and-
have /proc/bus/pci/devices expose "fixed up" values that can be used
with /dev/mem would be to add an argument to pci_resource_to_user() that
would be set differently depending on wether it's called by the /proc
code to populate "devices" or wether it's called by syfs to do the fixup
above.
> I strongly look forward to Ian's new X code, that is for sure :-)
Me too :-)
Cheers,
Ben.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
David Miller wrote:
> From: Benjamin Herrenschmidt <[email protected]>
> Date: Mon, 13 Nov 2006 19:16:30 +1100
>
>> X is still broken when built 32 bits on machines where PCI MMIO can be
>> above 32 bits space unfortunately. It looks like somebody (DaveM ?)
>> hacked a fix in X to handle long long resources and had the good idea
>> to wrap it in #ifdef __sparc__ :-(
>
> Sorry, it was the only 32/64 platform at the time that old X code was
> written and the X maintainers at the time were unbelievably anal :-/
>
> So the gist of your change is that X isn't obtaining BAR values
> in the correct context on powerpc, and so you're going to hack up
> the "devices" files output to "help" X out.
>
> This doesn't sound sane to me.
It doesn't sound terribly sane to me. What's wrong with just opening
/sys/bus/pci/devices/*/resource[0-5]? It seems like that solves all the
problems.
> What sounds better to me is that X does the right thing, which is
> obtain the BAR from the PCI config space to determine what values to
> pass in to /proc/bus/pci mmap() calls. And if it wants raw addresses
> to pass in to /dev/mem mmap()'s on platforms where that works (ie. not
> Sparc, to begin with) it should obtain those values from the "devices"
> file which must be values suitable as /dev/mem offsets.
>
> I strongly look forward to Ian's new X code, that is for sure :-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
iD8DBQFFWSbAX1gOwKyEAw8RAtceAKCc2PrYJNg8v2LcClLwTfEmo1aGzwCfRR7o
TkJnY+7IMpmWUQt/7FAW6A4=
=tDJc
-----END PGP SIGNATURE-----
> > So the gist of your change is that X isn't obtaining BAR values
> > in the correct context on powerpc, and so you're going to hack up
> > the "devices" files output to "help" X out.
> >
> > This doesn't sound sane to me.
>
> It doesn't sound terribly sane to me. What's wrong with just opening
> /sys/bus/pci/devices/*/resource[0-5]? It seems like that solves all the
> problems.
We have to, that's how we get current X to work. Of course it will be
much better once X uses sysfs, no question about that.
Ben.
From: Benjamin Herrenschmidt <[email protected]>
Date: Tue, 14 Nov 2006 12:59:54 +1100
> If I "fix" the kernel to do the right thing, that is pass BAR values in
> devices and expect BAR values in mmap, then I will break existing X
> setups on machines where PCI is not mapped 1:1 (that is mostly CHRP
> machines).
>
> The problem I'm fixing in this patch is that while we were providing the
> hacked up value in "devices", we were expecting the BAR value in mmap,
> and there are apps expecting us to be consistent between the two, thus
> the breakage.
Ok, I don't see much alternatives for you then. I have no real
objections to your patch.
On Mon, 2006-11-13 at 21:07 -0800, David Miller wrote:
> From: Benjamin Herrenschmidt <[email protected]>
> Date: Tue, 14 Nov 2006 12:59:54 +1100
>
> > If I "fix" the kernel to do the right thing, that is pass BAR values in
> > devices and expect BAR values in mmap, then I will break existing X
> > setups on machines where PCI is not mapped 1:1 (that is mostly CHRP
> > machines).
> >
> > The problem I'm fixing in this patch is that while we were providing the
> > hacked up value in "devices", we were expecting the BAR value in mmap,
> > and there are apps expecting us to be consistent between the two, thus
> > the breakage.
>
> Ok, I don't see much alternatives for you then. I have no real
> objections to your patch.
Thanks. Fortunately, it's all going away soon :-)
Ben.