2004-09-04 20:00:11

by [email protected]

[permalink] [raw]
Subject: multi-domain PCI and sysfs

How do multiple PCI domains appear in sysfs? I don't own a machine
with these so I can't just look.

Do they appear like:
/sys/devices/pci0000:00
/sys/devices/pci0001:00
/sys/devices/pci0002:00

I'm trying to figure out where to attach a sysfs attribute for turning
vga off in a domain. I'd like to do something like:
/sys/devices/pci0000:00/vga
/sys/devices/pci0001:00/vga
/sys/devices/pci0002:00/vga

I need to know what the domains look like in sysfs in order to pick
the right place for the attribute.

--
Jon Smirl
[email protected]


2004-09-04 21:58:23

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Saturday, September 4, 2004 1:00 pm, Jon Smirl wrote:
> How do multiple PCI domains appear in sysfs? I don't own a machine
> with these so I can't just look.
>
> Do they appear like:
> /sys/devices/pci0000:00
> /sys/devices/pci0001:00
> /sys/devices/pci0002:00

Yep, on all the machines I've used.

sn2 (ia64):
[root@flatearth ~]# ls -l /sys/devices
total 0
drwxr-xr-x 5 root root 0 Sep 5 08:07 pci0000:01
drwxr-xr-x 3 root root 0 Sep 5 08:07 pci0000:02
drwxr-xr-x 2 root root 0 Sep 5 08:07 platform
drwxr-xr-x 5 root root 0 Sep 5 08:07 system
[root@flatearth ~]# ls -l /sys/devices/pci0000\:02
total 0
drwxr-xr-x 2 root root 0 Sep 5 08:07 0000:02:01.0
-rw-r--r-- 1 root root 16384 Sep 5 08:07 detach_state

ppc32:
jbarnes@mill:~$ ls -l /sys/devices
total 0
drwxr-xr-x 5 root root 0 Sep 4 13:37 pci0000:00/
drwxr-xr-x 13 root root 0 Sep 4 13:37 pci0001:01/
drwxr-xr-x 7 root root 0 Sep 4 13:37 pci0002:06/
drwxr-xr-x 3 root root 0 Sep 4 13:37 platform/
drwxr-xr-x 4 root root 0 Sep 4 13:37 system/
drwxr-xr-x 5 root root 0 Sep 4 13:37 uni-n-i2c/
jbarnes@mill:~$ ls -l /sys/devices/pci0001\:01/
total 0
drwxr-xr-x 3 root root 0 Sep 4 13:37 0001:01:0b.0/
drwxr-xr-x 3 root root 0 Sep 4 13:37 0001:01:12.0/
drwxr-xr-x 3 root root 0 Sep 4 13:37 0001:01:13.0/
drwxr-xr-x 4 root root 0 Sep 4 13:37 0001:01:17.0/
drwxr-xr-x 3 root root 0 Sep 4 13:37 0001:01:18.0/
drwxr-xr-x 3 root root 0 Sep 4 13:37 0001:01:19.0/
drwxr-xr-x 4 root root 0 Sep 4 13:37 0001:01:1a.0/
drwxr-xr-x 4 root root 0 Sep 4 13:37 0001:01:1b.0/
drwxr-xr-x 4 root root 0 Sep 4 13:37 0001:01:1b.1/
drwxr-xr-x 3 root root 0 Sep 4 13:37 0001:01:1b.2/
-rw-r--r-- 1 root root 4096 Sep 4 13:37 detach_state
drwxr-xr-x 2 root root 0 Sep 4 13:37 power/

Jesse

2004-09-04 22:12:39

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Sat, 4 Sep 2004 14:57:46 -0700, Jesse Barnes <[email protected]> wrote:
> Yep, on all the machines I've used.
>
> sn2 (ia64):
> [root@flatearth ~]# ls -l /sys/devices
> total 0
> drwxr-xr-x 5 root root 0 Sep 5 08:07 pci0000:01
> drwxr-xr-x 3 root root 0 Sep 5 08:07 pci0000:02
> drwxr-xr-x 2 root root 0 Sep 5 08:07 platform
> drwxr-xr-x 5 root root 0 Sep 5 08:07 system

sn2 looks wrong. It should be
> drwxr-xr-x 5 root root 0 Sep 5 08:07 pci0000:01
> drwxr-xr-x 3 root root 0 Sep 5 08:07 pci0001:02
> drwxr-xr-x 2 root root 0 Sep 5 08:07 platform
> drwxr-xr-x 5 root root 0 Sep 5 08:07 system

--
Jon Smirl
[email protected]

2004-09-04 22:27:54

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Saturday, September 4, 2004 3:12 pm, Jon Smirl wrote:
> On Sat, 4 Sep 2004 14:57:46 -0700, Jesse Barnes <[email protected]>
wrote:
> > Yep, on all the machines I've used.
> >
> > sn2 (ia64):
> > [root@flatearth ~]# ls -l /sys/devices
> > total 0
> > drwxr-xr-x 5 root root 0 Sep 5 08:07 pci0000:01
> > drwxr-xr-x 3 root root 0 Sep 5 08:07 pci0000:02
> > drwxr-xr-x 2 root root 0 Sep 5 08:07 platform
> > drwxr-xr-x 5 root root 0 Sep 5 08:07 system
>
> sn2 looks wrong. It should be
>
> > drwxr-xr-x 5 root root 0 Sep 5 08:07 pci0000:01
> > drwxr-xr-x 3 root root 0 Sep 5 08:07 pci0001:02
> > drwxr-xr-x 2 root root 0 Sep 5 08:07 platform
> > drwxr-xr-x 5 root root 0 Sep 5 08:07 system

It only has one domain though, so it's correct. Both busses are in domain 0.

Jesse

2004-09-04 22:45:30

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

Is this a multipath configuration where pci0000:01 and pci0000:02 can
both get to the same target bus? So both busses are top level busses?

I'm trying to figure out where to stick the vga=0/1 attribute for
disabling all the VGA devices in a domain. It's starting to look like
there isn't a single node in sysfs that corresponds to a domain, in
this case there are two for the same domain.

On Sat, 4 Sep 2004 15:27:50 -0700, Jesse Barnes <[email protected]> wrote:
> On Saturday, September 4, 2004 3:12 pm, Jon Smirl wrote:
> > On Sat, 4 Sep 2004 14:57:46 -0700, Jesse Barnes <[email protected]>
> wrote:
> > > Yep, on all the machines I've used.
> > >
> > > sn2 (ia64):
> > > [root@flatearth ~]# ls -l /sys/devices
> > > total 0
> > > drwxr-xr-x 5 root root 0 Sep 5 08:07 pci0000:01
> > > drwxr-xr-x 3 root root 0 Sep 5 08:07 pci0000:02
> > > drwxr-xr-x 2 root root 0 Sep 5 08:07 platform
> > > drwxr-xr-x 5 root root 0 Sep 5 08:07 system
> >
> > sn2 looks wrong. It should be
> >
> > > drwxr-xr-x 5 root root 0 Sep 5 08:07 pci0000:01
> > > drwxr-xr-x 3 root root 0 Sep 5 08:07 pci0001:02
> > > drwxr-xr-x 2 root root 0 Sep 5 08:07 platform
> > > drwxr-xr-x 5 root root 0 Sep 5 08:07 system
>
> It only has one domain though, so it's correct. Both busses are in domain 0.
>
> Jesse
>



--

Jon Smirl
[email protected]

2004-09-04 23:04:03

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Saturday, September 4, 2004 3:45 pm, Jon Smirl wrote:
> Is this a multipath configuration where pci0000:01 and pci0000:02 can
> both get to the same target bus? So both busses are top level busses?
>
> I'm trying to figure out where to stick the vga=0/1 attribute for
> disabling all the VGA devices in a domain. It's starting to look like
> there isn't a single node in sysfs that corresponds to a domain, in
> this case there are two for the same domain.

Yes, I think that's the case. Matthew would probably know for sure though.

Jesse

2004-09-05 23:04:32

by Matthew Wilcox

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Sat, Sep 04, 2004 at 04:03:56PM -0700, Jesse Barnes wrote:
> On Saturday, September 4, 2004 3:45 pm, Jon Smirl wrote:
> > Is this a multipath configuration where pci0000:01 and pci0000:02 can
> > both get to the same target bus? So both busses are top level busses?
> >
> > I'm trying to figure out where to stick the vga=0/1 attribute for
> > disabling all the VGA devices in a domain. It's starting to look like
> > there isn't a single node in sysfs that corresponds to a domain, in
> > this case there are two for the same domain.
>
> Yes, I think that's the case. Matthew would probably know for sure though.

Huh, eh, what? There's no such thing as multipath PCI configurations.
The important concepts in PCI are:

- the function
- the device
- the bus
- the root bus
- the domain
- the machine

That is, a machine contains one or more PCI domains, each domain contains
one or more root busses, each root bus may have bridges to a collection
of other busses, each bus contains one or more devices, each device
contains one or more functions.

Many people confuse the domain and the root bus. HP has made many
machines with multiple root busses in a single PCI domain, particularly
in the PA-RISC line. Frequently the topology of these machines looks
like this: http://www.hp.com/products1/itanium/chipset/2-way_block.html

Each of the six chips labelled "hp zx1 I/O adapters" is a host to PCI
bridge; thus this sample machine has 6 root busses. Despite that, it's
a single domain -- all the devices are numbered so as to not overlap.
It's theoretically possible to communicate between devices under different
I/O adapters, but this isn't a supported configuration.

HP's multiple domain machines (eg rx8620) look something like several
of these tied together. That's not really how it works, but that's a
good way to imagine them.

I haven't really looked at the VGA attribute. I think Ivan or Grant
would be better equipped to help you on this front. I remember them
rehashing it 2-3 years ago.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-09-05 23:50:21

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

So how do multiple root buses work? Since they are all in the same
domain it seems like a single IO operation would go to all of the root
bridges simultaneously. Then each root bridge matches to the address
and decides to send the IO operation on if there is a match.

Would the diagram be equivalent to having a single root bridge with
six transparent PCI bridges attached? The purpose of the six host
bridges is to let six PCI transactions happen in parallel, right?
Other that probing the PCI config space of the host bridges, is there
any way in software to know that there are six bridges, or is this
completely transparent?

When implementing the VGA control I'm running into the problem that
there is no root node in sysfs for the top of a domain. I need a
domain node to attach an attribute disabling all VGA devices in the
domain. In the zx1 diagram I could have vga devices on any of the
PCI-X or AGP buses.

With the xz1 sysfs would look like this:
/sys/devices/pci0000:00
/sys/devices/pci0000:01
/sys/devices/pci0000:02
/sys/devices/pci0000:03
/sys/devices/pci0000:04
/sys/devices/pci0000:05

would it make more sense if it looked like this?
/sys/devices/pci0000/00
/sys/devices/pci0000/01
/sys/devices/pci0000/02
/sys/devices/pci0000/03
/sys/devices/pci0000/04
/sys/devices/pci0000/05

Now I have a node for the top of the domain. I can make a vga
attribute like this:
/sys/devices/pci0000/vga
set it to zero and my code will disable all VGA devices in the domain.

My home system would switch from
/sys/devices/pci0000:00
to
/sys/devices/pci0000/00

pp32 from
/sys/devices/pci0000:00
/sys/devices/pci0001:01
/sys/devices/pci0002:06
to
/sys/devices/pci0000/00
/sys/devices/pci0001/01
/sys/devices/pci0002/06

On Mon, 6 Sep 2004 00:04:25 +0100, Matthew Wilcox <[email protected]> wrote:
>
>
> On Sat, Sep 04, 2004 at 04:03:56PM -0700, Jesse Barnes wrote:
> > On Saturday, September 4, 2004 3:45 pm, Jon Smirl wrote:
> > > Is this a multipath configuration where pci0000:01 and pci0000:02 can
> > > both get to the same target bus? So both busses are top level busses?
> > >
> > > I'm trying to figure out where to stick the vga=0/1 attribute for
> > > disabling all the VGA devices in a domain. It's starting to look like
> > > there isn't a single node in sysfs that corresponds to a domain, in
> > > this case there are two for the same domain.
> >
> > Yes, I think that's the case. Matthew would probably know for sure though.
>
> Huh, eh, what? There's no such thing as multipath PCI configurations.
> The important concepts in PCI are:
>
> - the function
> - the device
> - the bus
> - the root bus
> - the domain
> - the machine
>
> That is, a machine contains one or more PCI domains, each domain contains
> one or more root busses, each root bus may have bridges to a collection
> of other busses, each bus contains one or more devices, each device
> contains one or more functions.
>
> Many people confuse the domain and the root bus. HP has made many
> machines with multiple root busses in a single PCI domain, particularly
> in the PA-RISC line. Frequently the topology of these machines looks
> like this: http://www.hp.com/products1/itanium/chipset/2-way_block.html
>
> Each of the six chips labelled "hp zx1 I/O adapters" is a host to PCI
> bridge; thus this sample machine has 6 root busses. Despite that, it's
> a single domain -- all the devices are numbered so as to not overlap.
> It's theoretically possible to communicate between devices under different
> I/O adapters, but this isn't a supported configuration.
>
> HP's multiple domain machines (eg rx8620) look something like several
> of these tied together. That's not really how it works, but that's a
> good way to imagine them.
>
> I haven't really looked at the VGA attribute. I think Ivan or Grant
> would be better equipped to help you on this front. I remember them
> rehashing it 2-3 years ago.
>
> --
> "Next the statesmen will invent cheap lies, putting the blame upon
> the nation that is attacked, and every man will be glad of those
> conscience-soothing falsities, and will diligently study them, and refuse
> to examine any refutations of them; and thus he will by and by convince
> himself that the war is just, and will thank God for the better sleep
> he enjoys after this process of grotesque self-deception." -- Mark Twain
>



--
Jon Smirl
[email protected]

2004-09-06 00:03:16

by Alan

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Llu, 2004-09-06 at 00:50, Jon Smirl wrote:
> So how do multiple root buses work? Since they are all in the same
> domain it seems like a single IO operation would go to all of the root
> bridges simultaneously. Then each root bridge matches to the address
> and decides to send the IO operation on if there is a match.

Architecture specific. Even in the PC world it varies.

Alan

2004-09-06 00:07:39

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Sunday, September 5, 2004 4:04 pm, Matthew Wilcox wrote:
> On Sat, Sep 04, 2004 at 04:03:56PM -0700, Jesse Barnes wrote:
> > On Saturday, September 4, 2004 3:45 pm, Jon Smirl wrote:
> > > Is this a multipath configuration where pci0000:01 and pci0000:02 can
> > > both get to the same target bus? So both busses are top level busses?
> > >
> > > I'm trying to figure out where to stick the vga=0/1 attribute for
> > > disabling all the VGA devices in a domain. It's starting to look like
> > > there isn't a single node in sysfs that corresponds to a domain, in
> > > this case there are two for the same domain.
> >
> > Yes, I think that's the case. Matthew would probably know for sure
> > though.
>
> Huh, eh, what? There's no such thing as multipath PCI configurations.
> The important concepts in PCI are:

Right, but I was answering his question about whether or not there was a place
to stick his 'vga' control file on a per-domain basis. There would be if the
layout was something like this:

/sys/devices/pciDDDD/BB/SS.F/foo
rather than the current
/sys/devices/pciDDDD:BB/DDDD:BB:SS.F/foo

> I haven't really looked at the VGA attribute. I think Ivan or Grant
> would be better equipped to help you on this front. I remember them
> rehashing it 2-3 years ago.

I'm actually ok with a system wide vga arbitration driver, assuming that we'll
never have to worry about the scalability of stuff that wants to do legacy
vga I/O.

Thanks,
Jesse

2004-09-06 00:31:31

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Sun, 5 Sep 2004 17:06:50 -0700, Jesse Barnes <[email protected]> wrote:
> /sys/devices/pciDDDD/BB/SS.F/foo
> rather than the current
> /sys/devices/pciDDDD:BB/DDDD:BB:SS.F/foo
>

/sys/devices/pciDDDD/BB/DDDD:BB:SS.F/foo

Would be better. You want the fully qualified location on the device node.


--
Jon Smirl
[email protected]

2004-09-06 01:39:12

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

Another way to look at this would be to create one vga device per
domain. I could then hook the vga=0/1 attribute off from this device
and avoid the problem of creating domain nodes under /sys/devices


On Sun, 5 Sep 2004 17:06:50 -0700, Jesse Barnes <[email protected]> wrote:
> On Sunday, September 5, 2004 4:04 pm, Matthew Wilcox wrote:
> > On Sat, Sep 04, 2004 at 04:03:56PM -0700, Jesse Barnes wrote:
> > > On Saturday, September 4, 2004 3:45 pm, Jon Smirl wrote:
> > > > Is this a multipath configuration where pci0000:01 and pci0000:02 can
> > > > both get to the same target bus? So both busses are top level busses?
> > > >
> > > > I'm trying to figure out where to stick the vga=0/1 attribute for
> > > > disabling all the VGA devices in a domain. It's starting to look like
> > > > there isn't a single node in sysfs that corresponds to a domain, in
> > > > this case there are two for the same domain.
> > >
> > > Yes, I think that's the case. Matthew would probably know for sure
> > > though.
> >
> > Huh, eh, what? There's no such thing as multipath PCI configurations.
> > The important concepts in PCI are:
>
> Right, but I was answering his question about whether or not there was a place
> to stick his 'vga' control file on a per-domain basis. There would be if the
> layout was something like this:
>
> /sys/devices/pciDDDD/BB/SS.F/foo
> rather than the current
> /sys/devices/pciDDDD:BB/DDDD:BB:SS.F/foo
>
> > I haven't really looked at the VGA attribute. I think Ivan or Grant
> > would be better equipped to help you on this front. I remember them
> > rehashing it 2-3 years ago.
>
> I'm actually ok with a system wide vga arbitration driver, assuming that we'll
> never have to worry about the scalability of stuff that wants to do legacy
> vga I/O.
>
> Thanks,
> Jesse
>
>



--
Jon Smirl
[email protected]

2004-09-06 01:41:29

by Matthew Wilcox

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Sun, Sep 05, 2004 at 07:50:04PM -0400, Jon Smirl wrote:
> So how do multiple root buses work? Since they are all in the same
> domain it seems like a single IO operation would go to all of the root
> bridges simultaneously. Then each root bridge matches to the address
> and decides to send the IO operation on if there is a match.

Not exactly -- each rope (the line that connects the memory & I/O
controller to the I/O adapter) has a limited bandwidth, so the memory
& I/O controller makes the decision which rope each transaction is
destined for.

> When implementing the VGA control I'm running into the problem that
> there is no root node in sysfs for the top of a domain. I need a
> domain node to attach an attribute disabling all VGA devices in the
> domain. In the zx1 diagram I could have vga devices on any of the
> PCI-X or AGP buses.

Why would it be a problem if the attribute is per-bus, as it is right now?
see bus->bridge_ctl (PCI_BRIDGE_CTL_VGA)

> With the xz1 sysfs would look like this:
> /sys/devices/pci0000:00
> /sys/devices/pci0000:01
> /sys/devices/pci0000:02
> /sys/devices/pci0000:03
> /sys/devices/pci0000:04
> /sys/devices/pci0000:05

Actually, they're sparsely numbered to allow for people plugging in pci-pci
bridges on cards, so:

$ ls -1 /sys/devices/
pci0000:00
pci0000:80
pci0000:a0
pci0000:c0
platform
system

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-09-07 23:01:05

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Mon, 6 Sep 2004 02:40:58 +0100, Matthew Wilcox <[email protected]> wrote:
> > When implementing the VGA control I'm running into the problem that
> > there is no root node in sysfs for the top of a domain. I need a
> > domain node to attach an attribute disabling all VGA devices in the
> > domain. In the zx1 diagram I could have vga devices on any of the
> > PCI-X or AGP buses.
>
> Why would it be a problem if the attribute is per-bus, as it is right now?
> see bus->bridge_ctl (PCI_BRIDGE_CTL_VGA)
>
> Actually, they're sparsely numbered to allow for people plugging in pci-pci
> bridges on cards, so:
>
> $ ls -1 /sys/devices/
> pci0000:00
> pci0000:80
> pci0000:a0
> pci0000:c0
> platform
> system

How many active VGA devices can I have in this system 1 or 4? If the
answer is 4, how do I independently address each VGA card? If the
answer is one, you can see why I want a pci0000 node to hold the
attribute for turning it off and on.

How many simultaneous VGA devices does this system allow?

ppc32:
jbarnes@mill:~$ ls -l /sys/devices
total 0
drwxr-xr-x 5 root root 0 Sep 4 13:37 pci0000:00/
drwxr-xr-x 13 root root 0 Sep 4 13:37 pci0001:01/
drwxr-xr-x 7 root root 0 Sep 4 13:37 pci0002:06/
drwxr-xr-x 3 root root 0 Sep 4 13:37 platform/
drwxr-xr-x 4 root root 0 Sep 4 13:37 system/
drwxr-xr-x 5 root root 0 Sep 4 13:37 uni-n-i2c/

I would think it is three active devices.

Does a PCI domain imply separate PCI IO address spaces, or does it
just mean separate PCI Config spaces?

Can an x86 machine use separate PCI IO address spaces?

--
Jon Smirl
[email protected]

2004-09-07 23:17:14

by David Miller

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tue, 7 Sep 2004 18:58:53 -0400
Jon Smirl <[email protected]> wrote:

> > $ ls -1 /sys/devices/
> > pci0000:00
> > pci0000:80
> > pci0000:a0
> > pci0000:c0
> > platform
> > system
>
> How many active VGA devices can I have in this system 1 or 4? If the
> answer is 4, how do I independently address each VGA card? If the
> answer is one, you can see why I want a pci0000 node to hold the
> attribute for turning it off and on.

I don't know about the above but for a multi-domain system the
way it works is that the I/O ports are accessed using a different
base address for each domain.

On my box I/O space looks like:

7ffed000000-7ffedffffff : SCHIZO0 PBMA
7ffed000300-7ffed0003ff : 0000:00:04.0
7ffed000300-7ffed0003fe : qlogicfc
7ffef000000-7ffefffffff : SCHIZO0 PBMB
7ffef000300-7ffef0003ff : 0001:00:06.0
7ffef000300-7ffef0003ff : sym53c8xx
7ffef000400-7ffef0004ff : 0001:00:06.1
7ffef000400-7ffef0004ff : sym53c8xx

SCHIZO is the PCI controller name, there are two PCI segments.

davem@nuts:~$ ls /sys/devices/pci*
/sys/devices/pci0000:00:
0000:00:00.0 0000:00:01.0 0000:00:04.0 detach_state

/sys/devices/pci0001:00:
0001:00:00.0 0001:00:03.0 0001:00:05.2 0001:00:06.1
0001:00:01.0 0001:00:05.0 0001:00:05.3 detach_state
0001:00:02.0 0001:00:05.1 0001:00:06.0
davem@nuts:~$

So to access some VGA port on domain zero the
address would likely end up being:

0x7ffed000000 + VGA_FOO

This is a real touchy area btw, because if there is no
VGA card, such I/O port accesses are going to trap and
we need to have a common way to handle that somehow.

2004-09-08 03:40:19

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tue, 7 Sep 2004 16:11:40 -0700, David S. Miller <[email protected]> wrote:
> On Tue, 7 Sep 2004 18:58:53 -0400
> Jon Smirl <[email protected]> wrote:
> > How many active VGA devices can I have in this system 1 or 4? If the
> > answer is 4, how do I independently address each VGA card? If the
> > answer is one, you can see why I want a pci0000 node to hold the
> > attribute for turning it off and on.
>
> I don't know about the above but for a multi-domain system the
> way it works is that the I/O ports are accessed using a different
> base address for each domain.

How does this work for IO ports in port space instead of memory mapped IO?

--
Jon Smirl
[email protected]

2004-09-08 04:15:33

by David Miller

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tue, 7 Sep 2004 23:39:49 -0400
Jon Smirl <[email protected]> wrote:

> On Tue, 7 Sep 2004 16:11:40 -0700, David S. Miller <[email protected]> wrote:
> > On Tue, 7 Sep 2004 18:58:53 -0400
> > Jon Smirl <[email protected]> wrote:
> > > How many active VGA devices can I have in this system 1 or 4? If the
> > > answer is 4, how do I independently address each VGA card? If the
> > > answer is one, you can see why I want a pci0000 node to hold the
> > > attribute for turning it off and on.
> >
> > I don't know about the above but for a multi-domain system the
> > way it works is that the I/O ports are accessed using a different
> > base address for each domain.
>
> How does this work for IO ports in port space instead of memory mapped IO?

Those are IO ports in port space. IO ports and PCI memory space just
live in different physical memory windows, no special instructions
for IO port space access as on x86.

2004-09-08 04:16:36

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tuesday, September 7, 2004 4:11 pm, David S. Miller wrote:
> This is a real touchy area btw, because if there is no
> VGA card, such I/O port accesses are going to trap and
> we need to have a common way to handle that somehow.

So I take it your platform won't soft fail the accesses and return all 1s? On
ia64, I've got a patch to add some machine check code to deal with it, but it
requires pre-registration of the regions that are to be used for legacy I/O
(i.e. I have to record the memory range and pid at /proc/bus/pci mmap time so
that the machine check handler can send a SIGBUS). A potentially cleaner
option which Ben and I would prefer is to use the vga device Jon is creating
to do legacy I/O with explicit read/write or ioctl calls.

Jesse

2004-09-08 04:17:01

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tuesday, September 7, 2004 8:39 pm, Jon Smirl wrote:
> On Tue, 7 Sep 2004 16:11:40 -0700, David S. Miller <[email protected]>
wrote:
> > On Tue, 7 Sep 2004 18:58:53 -0400
> >
> > Jon Smirl <[email protected]> wrote:
> > > How many active VGA devices can I have in this system 1 or 4? If the
> > > answer is 4, how do I independently address each VGA card? If the
> > > answer is one, you can see why I want a pci0000 node to hold the
> > > attribute for turning it off and on.
> >
> > I don't know about the above but for a multi-domain system the
> > way it works is that the I/O ports are accessed using a different
> > base address for each domain.
>
> How does this work for IO ports in port space instead of memory mapped IO?

On sn2 at least, it's the same thing. Each PCI segment has a 'base address'
that can be used for legacy I/O. Just add the port you want to access to the
base and hope that a card responds before a master abort occurs.

Jesse

2004-09-08 04:19:41

by David Miller

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tue, 7 Sep 2004 21:15:09 -0700
Jesse Barnes <[email protected]> wrote:

> On Tuesday, September 7, 2004 4:11 pm, David S. Miller wrote:
> > This is a real touchy area btw, because if there is no
> > VGA card, such I/O port accesses are going to trap and
> > we need to have a common way to handle that somehow.
>
> So I take it your platform won't soft fail the accesses and return all 1s?

Nope, you get a machine check trap. We have to catch these when doing
PCI config space accesses in the kernel too. Grep for pci_poke_* in
arch/sparc64/kernel/*.c

> A potentially cleaner option which Ben and I would prefer is to use
> the vga device Jon is creating to do legacy I/O with explicit
> read/write or ioctl calls.

Definitely. Note that xfree86 already has a signal handler for this
stuff, ppc generates traps like sparc64 too.

2004-09-08 04:26:16

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tuesday, September 7, 2004 9:16 pm, David S. Miller wrote:
> > A potentially cleaner option which Ben and I would prefer is to use
> > the vga device Jon is creating to do legacy I/O with explicit
> > read/write or ioctl calls.
>
> Definitely. Note that xfree86 already has a signal handler for this
> stuff, ppc generates traps like sparc64 too.

Doing SIGBUS on ia64 was painful, due to the way the CPU chooses to not
generate errors until bad data is actually consumed, but that's the approach
we're taking at the moment. I'd rather have the ioctls though, so I'm glad
you're up for it. My hope is that we can have a unified Linux device access
method in X and get rid of all (or at least most) of the ppc/sparc/ia64/etc.
specific hacks in the tree...

Jesse

2004-09-08 06:01:43

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tue, 7 Sep 2004 21:25:41 -0700, Jesse Barnes <[email protected]> wrote:
> you're up for it. My hope is that we can have a unified Linux device access
> method in X and get rid of all (or at least most) of the ppc/sparc/ia64/etc.
> specific hacks in the tree...

X on GL is going to eliminate all device access from X. Everything
will be handled from the OpenGL layer. When everything is finished
even the OpenGL layer won't do hardware access either, it will IOCTL
the DRM driver to do it. In the final solution the only user of the
VGA control should be the secondary card reset program.

Where is the PCI segment base address stored in the PCI driver
structures? I'm still having trouble with the fact that the PCI driver
does not have a clear structure representing a PCI segment. Shouldn't
there be a structure corresponding to a segment?

>From what I understand right now the SN2 machine can not have two
active VGA cards since it does not have two PCI segments. Without two
segments there is no way to tell the legacy addresses apart.

--
Jon Smirl
[email protected]

2004-09-08 06:55:30

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

Another part I don't understand... PCI VGA hardware is designed to
respond to IN/OUT instructions to port space. ppc64/ia64 don't have
IN/OUT port instructions. Is there some special hardware on ppc64/ia64
that declares part of the PCI IO space "legacy space" and turns
read/writes there into IN/OUT port cycles on the PCI bus so that the
legacy hardware can see the accesses?

On machines without this "legacy space" translation hardware (ie all
32b x86 bit machines) I can only have a single VGA adapter active
since there is only a single legacy space and inb/outb are real
instructions.

On machines with "legacy space" translation I can have one active VGA
card per translator. How do I know how many translators there are? Is
only one per domain/segment allowed?

How does ppc32 handle VGA port instructions, is the "legacy
translation" space at the bottom of the PCI address space?

I looked at io.h on IA64, how do apps select which legacy IO space
they are using? Now I see add_io_space() and related code.

Maybe it's not a good idea to have a 32b x86 person writing this
driver. Is there a cross platform structure that corresponds to IO
spaces?


--
Jon Smirl
[email protected]

2004-09-08 11:12:57

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Wed, 8 Sep 2004, Jon Smirl wrote:
> Another part I don't understand... PCI VGA hardware is designed to
> respond to IN/OUT instructions to port space. ppc64/ia64 don't have
> IN/OUT port instructions. Is there some special hardware on ppc64/ia64
> that declares part of the PCI IO space "legacy space" and turns
^^^^^^^^^^^^ ^^^^^^^^^^^^
What you call `legacy space' above is called `PCI I/O space'.
What you call `PCI IO space' above is CPU address (memory) space.

> read/writes there into IN/OUT port cycles on the PCI bus so that the
> legacy hardware can see the accesses?

Like Dave already said:
| Those are IO ports in port space. IO ports and PCI memory space just
| live in different physical memory windows, no special instructions
| for IO port space access as on x86.

PCI has I/O space and memory space (and config space).

On ia32, you access PCI I/O space (`I/O ports') using IN/OUT instructions.
On non-ia32, you access PCI I/O space by accessing a special region of the CPU
address space.

You access PCI memory space by accessing a special region of the CPU
address space on all platforms I'm aware of. On ia32, it starts at CPU address
zero, and there's no offset between CPU physical addresses and PCI bus
addresses. On other platforms, there may be offsets.

The first (first in PCI bus space!) 16 MiB of PCI memory space is special in
that it's ISA memory space. On ia32, it's always at CPU address zero. On other
platforms, it may be at offset zero of PCI memory space, or it may be in a
completely different region of CPU address space. Or it may not be accessible
at all (cfr. some PowerMacs).

For access to PCI config space, there are even more possibilities. On ia32 it's
usually done using indirect access to PCI I/O space 0xcf8 etc. On other
platforms it's usually done by accessing a special region of the CPU address
space. Or in a different way ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2004-09-08 11:38:20

by Matthew Wilcox

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Wed, Sep 08, 2004 at 01:11:36PM +0200, Geert Uytterhoeven wrote:
> On ia32, you access PCI I/O space (`I/O ports') using IN/OUT instructions.
> On non-ia32, you access PCI I/O space by accessing a special region of the CPU
> address space.

That's one way of doing it, sure ... ;-)

With HP's Dino PCI controller, you generate I/O cycles on the PCI bus by
writing the address to the DINO_PCI_ADDR register, then reading or writing
the DINO_IO_DATA register.

> You access PCI memory space by accessing a special region of the CPU
> address space on all platforms I'm aware of.

The PPC64 iSeries port seems to do a hypervisor call for readb() et al.
Nasty stuff.

> On ia32, it starts at CPU address
> zero, and there's no offset between CPU physical addresses and PCI bus
> addresses. On other platforms, there may be offsets.

It can even depend on machine model ... only some ia64 platforms have
different bus view and physical view (see Documentation/IO-mapping.txt)

> For access to PCI config space, there are even more possibilities. On ia32
> it's usually done using indirect access to PCI I/O space 0xcf8 etc. On other
> platforms it's usually done by accessing a special region of the CPU address
> space. Or in a different way ;-)

We have four options on i386 -- direct1, direct2, bios and mmconfig.
Other platforms ... well, get even weirder. Magic registers, firmware
calls, memory mapped. It's all been done.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-09-08 14:08:55

by Alan

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Mer, 2004-09-08 at 07:01, Jon Smirl wrote:
> X on GL is going to eliminate all device access from X. Everything

Hardly.

> will be handled from the OpenGL layer. When everything is finished

Nope. It might change the access model for some user space applications.

> Where is the PCI segment base address stored in the PCI driver
> structures? I'm still having trouble with the fact that the PCI driver
> does not have a clear structure representing a PCI segment. Shouldn't
> there be a structure corresponding to a segment?

Who says PCI is a top level bus. You can have PCI bridges on a different
top level bus on some systems - totally independant. Not sure if you can
hot plug PCI root bridges on any parisc boxes but that would sure be
fun.

> >From what I understand right now the SN2 machine can not have two
> active VGA cards since it does not have two PCI segments. Without two
> segments there is no way to tell the legacy addresses apart.

Some of the NUMA x86 boxes have multiple I/O spaces too. So your I/O
address effectively includes a "system node" section.

2004-09-08 15:06:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tue, Sep 07, 2004 at 06:58:53PM -0400, Jon Smirl wrote:
> How many active VGA devices can I have in this system 1 or 4? If the
> answer is 4, how do I independently address each VGA card? If the
> answer is one, you can see why I want a pci0000 node to hold the
> attribute for turning it off and on.

Each root bridge has the ability to route the VGA memory space. I imagine
only one has that bit set at a time.

> How many simultaneous VGA devices does this system allow?

I don't think HP supports a configuration other than having the VGA
device on the AGP bus, but there's no reason one couldn't put a VGA
device in every PCI-X slot, is there?

> Does a PCI domain imply separate PCI IO address spaces, or does it
> just mean separate PCI Config spaces?

It only requires separate config space. In practice, you may well
get separate IO port space and/or memory space per domain, or even per
root bus. On PA-RISC, we extend the IO port space to 24 bit and use
the top 8 bits to determine which PCI root bus we're talking to.

> Can an x86 machine use separate PCI IO address spaces?

I think some of the NUMAQ stuff can have separate IO port space, but
you'd have to ask someone familiar with the architecture like Martin
Bligh or Bill Irwin.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-09-08 16:06:22

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Tuesday, September 7, 2004 11:01 pm, Jon Smirl wrote:
> X on GL is going to eliminate all device access from X. Everything
> will be handled from the OpenGL layer. When everything is finished
> even the OpenGL layer won't do hardware access either, it will IOCTL
> the DRM driver to do it. In the final solution the only user of the
> VGA control should be the secondary card reset program.

Oh right, I forgot. Anyway, the card reset program needs to get at this stuff
somehow.

> Where is the PCI segment base address stored in the PCI driver
> structures? I'm still having trouble with the fact that the PCI driver
> does not have a clear structure representing a PCI segment. Shouldn't
> there be a structure corresponding to a segment?

That would be nice, maybe an extra resource or something? I haven't looked at
the sparc code, but it probably deals with this (sn2 has platform specific
functions to get the base address for a bus).

> From what I understand right now the SN2 machine can not have two
> active VGA cards since it does not have two PCI segments. Without two
> segments there is no way to tell the legacy addresses apart.

sn2 does have multiple PCI segments, we just don't export them yet.

Jesse

2004-09-08 18:23:33

by David Miller

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Wed, 8 Sep 2004 09:02:14 -0700
Jesse Barnes <[email protected]> wrote:

> > Where is the PCI segment base address stored in the PCI driver
> > structures? I'm still having trouble with the fact that the PCI driver
> > does not have a clear structure representing a PCI segment. Shouldn't
> > there be a structure corresponding to a segment?
>
> That would be nice, maybe an extra resource or something? I haven't looked at
> the sparc code, but it probably deals with this (sn2 has platform specific
> functions to get the base address for a bus).

We store them directly in pci_resource_*(pdev,BAR_NUM) as physical
addresses.

2004-09-08 18:26:47

by David Miller

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Wed, 8 Sep 2004 02:55:15 -0400
Jon Smirl <[email protected]> wrote:

> Another part I don't understand... PCI VGA hardware is designed to
> respond to IN/OUT instructions to port space. ppc64/ia64 don't have
> IN/OUT port instructions.

On ppc, sparc, and other non-x86 platforms, when you perform load/store
instructions within the port I/O space window, the PCI controller emits
the IN/OUT transactions exactly as if an x86 processor had executed
an in{bwl}/out{bwl} instruction.

2004-09-08 18:33:27

by Jesse Barnes

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Wednesday, September 8, 2004 11:20 am, David S. Miller wrote:
> On Wed, 8 Sep 2004 09:02:14 -0700
>
> Jesse Barnes <[email protected]> wrote:
> > > Where is the PCI segment base address stored in the PCI driver
> > > structures? I'm still having trouble with the fact that the PCI driver
> > > does not have a clear structure representing a PCI segment. Shouldn't
> > > there be a structure corresponding to a segment?
> >
> > That would be nice, maybe an extra resource or something? I haven't
> > looked at the sparc code, but it probably deals with this (sn2 has
> > platform specific functions to get the base address for a bus).
>
> We store them directly in pci_resource_*(pdev,BAR_NUM) as physical
> addresses.

Oh, right, you have them stored in each bridge, right? I should do the same
thing for sn2...

Thanks,
Jesse

2004-09-08 18:59:39

by David Miller

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Wed, 8 Sep 2004 11:32:28 -0700
Jesse Barnes <[email protected]> wrote:

> On Wednesday, September 8, 2004 11:20 am, David S. Miller wrote:
> > On Wed, 8 Sep 2004 09:02:14 -0700
> >
> > Jesse Barnes <[email protected]> wrote:
> > > > Where is the PCI segment base address stored in the PCI driver
> > > > structures? I'm still having trouble with the fact that the PCI driver
> > > > does not have a clear structure representing a PCI segment. Shouldn't
> > > > there be a structure corresponding to a segment?
> > >
> > > That would be nice, maybe an extra resource or something? I haven't
> > > looked at the sparc code, but it probably deals with this (sn2 has
> > > platform specific functions to get the base address for a bus).
> >
> > We store them directly in pci_resource_*(pdev,BAR_NUM) as physical
> > addresses.
>
> Oh, right, you have them stored in each bridge, right? I should do the same
> thing for sn2...

Oh you mean the base of the entire I/O space? We store that in the
pdev arch level private area.

2004-09-08 23:47:32

by Alan

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Mer, 2004-09-08 at 19:21, David S. Miller wrote:
> On ppc, sparc, and other non-x86 platforms, when you perform load/store
> instructions within the port I/O space window, the PCI controller emits
> the IN/OUT transactions exactly as if an x86 processor had executed
> an in{bwl}/out{bwl} instruction.

Some of them are not quite that pretty. In certain cases outb gets
translated into code that does horrors vaguely of the form

spin_lock_irqsave
lane = (addr & 3) << 3;
writel(1<<(addr&3), somecontroller->lanes);
writel(addr, somecontroller->offset);
writel(val << lane, somecontroller->somewhere);
spin_unlock_irqrestore

Other code has extra magic reads in to work around fpga pci bridges that
forgot out is synchronous and writel is posted.

A better summary from the higher levels of the kernel would be "don't
look behind the sofa, there might be a monster lurking".

The only way I can see VGA routing working is to have some kind of arch
code that can tell you which devices are on the same VGA legacy tree.
That then allows a vga layer to walk VGA devices and ask arch code the
typically simple question

pci_vga_shared_router(pdev1, pdev2)

Alan

2004-09-09 00:31:53

by [email protected]

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Wed, 08 Sep 2004 23:41:05 +0100, Alan Cox <[email protected]> wrote:
> The only way I can see VGA routing working is to have some kind of arch
> code that can tell you which devices are on the same VGA legacy tree.
> That then allows a vga layer to walk VGA devices and ask arch code the
> typically simple question

This is the core problem, I'm missing this piece of data. I need it to
know how many VGA devices to create since there needs to be one for
each VGA legacy tree.

All of my previous replies were confused since I was associating VGA
legacy trees and PCI domains, which apparently have nothing to do with
each other. I'm working on hardware that has neither multiple legacy
trees or domains so I have no experience in dealing with them.

I think the problem is more basic than building a VGA device. I
wouldn't be having trouble if there were structures for each "PCI IO
space". An x86 machine would have one of these structs. Other
architectures would have multiple ones. You need these structs to find
any PCI legacy device, the problem is not specific to VGA.

Shouldn't we first create a cross platform structure that represents
the "PCI IO spaces" available in the system? Then I could walk this
list and easily know how many VGA devices to create. Each VGA device
would then use this structure to know the PCI base address for each
"IO space" operation.

I suspect "PCI IO spaces" are a function of PCI bridge chips. We
already have structures corresponding to these chips. Maybe all I
need to know is how to query a bridge chips config and see if it is
implementing a "PCI IO space". Then I could walk the bridge structures
and know how many VGA devices to create.

--
Jon Smirl
[email protected]

2004-09-09 14:11:25

by Alan

[permalink] [raw]
Subject: Re: multi-domain PCI and sysfs

On Iau, 2004-09-09 at 01:31, Jon Smirl wrote:
> I think the problem is more basic than building a VGA device. I
> wouldn't be having trouble if there were structures for each "PCI IO
> space". An x86 machine would have one of these structs. Other

Depends which x86.

The single trivial arch function I proposed in the previous mail is
enough to untangle this problem and has two virtues

1. For most platforms the implementation is "return 1"
2. The minimal implementation is merely less efficient so you don't
have to hack every conceivable case at once.

> architectures would have multiple ones. You need these structs to find
> any PCI legacy device, the problem is not specific to VGA.

There are essentially no other devices we care about. IDE legacy is
dealt with BIOS level and never touched again - so why bother designing
for them.