2004-03-17 21:37:38

by Martin Hicks

Subject: Exporting physical topology information


Hi,

I'm trying to figure out what the best way is to export a minimal amount
of physical topology information to userland. Would it be acceptable to
export this kind of information with sysfs?

I'm not proposing that we build an entire physical topology tree in
sysfs, but just providing an attribute file. The two most obvious
examples of where this would be useful are nodes and pci busses. The
Altix platform is a modular system with CPU bricks and IO bricks. We
currently have no method for locating where "node0" is, nor do we have a
method for locating pci bus 0000:20, for example.

If we could physically locate a PCI bus, then it would be much easier
to (for example) locate our defective SCSI disk that is target4 on the
SCSI controller that is on pci bus 0000:20.

The attached patch, courtesy of Jesse Barnes, exports a physid attribute
for each node, which indicates the physical location of the node. It is
Altix-specific.

thanks
mh

--
Martin Hicks Wild Open Source Inc.
[email protected] 613-266-2296


Attachments:
physid.patch (2.48 kB)

2004-03-18 17:45:37

by Jesse Barnes

Subject: Re: Exporting physical topology information

On Wednesday 17 March 2004 1:37 pm, Martin Hicks wrote:
> I'm not proposing that we build an entire physical topology tree in
> sysfs, but just providing an attribute file. The two most obvious
> examples of where this would be useful is for nodes and pci busses. The
> Altix platform is a modular system with CPU bricks and IO bricks. We
> currently have no method for locating where "node0" is, nor do we have a
> method for locating pci bus 0000:20, for example.

I'm curious how other arches deal with this too. Like on ppc64 when
you want to remove a CPU or set of CPUs, you have to bring it (or all
of the cores on a given module) down via software, then go into the
lab and find the module to pull it out. Is there a mapping somewhere
that the user is expected to use? A hypervisor call of some sort to
make some lights blink?

> If we could physically locate a PCI bus, then it would be much easier
> to (for example) locate our defective SCSI disk that is target4 on the
> SCSI controller that is on pci bus 0000:20.

This seems like one of the main uses--find components that went bad.
Physically locating a CPU, DIMM, PCI board, or disk would all be
easier if we provided some sort of physical identifier and
logical->physical mapping information. On IRIX, we actually expose
the whole physical hierarchy of the system in /hw. One of the
problems with that approach is that every time a new system
configuration is released the kernel has to be updated to know about
it, resulting in /hw paths that change over time, and from system to
system...

Jesse

2004-03-19 00:18:01

by Greg KH

Subject: Re: Exporting physical topology information

On Wed, Mar 17, 2004 at 04:37:14PM -0500, Martin Hicks wrote:
>
> Hi,
>
> I'm trying to figure out what the best way is to export a minimal amount
> of physical topology information to userland. Would it be acceptable to
> export this kind of information with sysfs?
>
> I'm not proposing that we build an entire physical topology tree in
> sysfs, but just providing an attribute file. The two most obvious
> examples of where this would be useful is for nodes and pci busses. The
> Altix platform is a modular system with CPU bricks and IO bricks. We
> currently have no method for locating where "node0" is, nor do we have a
> method for locating pci bus 0000:20, for example.
>
> If we could physically locate a PCI bus, then it would be much easier
> to (for example) locate our defective SCSI disk that is target4 on the
> SCSI controller that is on pci bus 0000:20.

Um, what's wrong with the current /sys/class/pci_bus/*/cpuaffinity files
for determining this topology information? That is why it was added.

thanks,

greg k-h

2004-03-19 17:48:29

by Martin Hicks

Subject: Re: Exporting physical topology information


On Thu, Mar 18, 2004 at 03:21:39PM -0800, Greg KH wrote:
> On Wed, Mar 17, 2004 at 04:37:14PM -0500, Martin Hicks wrote:
> >
> > Hi,
> >
> > If we could physically locate a PCI bus, then it would be much easier
> > to (for example) locate our defective SCSI disk that is target4 on the
> > SCSI controller that is on pci bus 0000:20.
>
> Um, what's wrong with the current /sys/class/pci_bus/*/cpuaffinity files
> for determining this topology information? That is why it was added.

This gives us more logical topology information. It still doesn't tell
us where in the room the specific piece of equipment is.

mh

--
Martin Hicks Wild Open Source Inc.
[email protected] 613-266-2296

2004-03-19 17:52:28

by Jesse Barnes

Subject: Re: Exporting physical topology information

On Thursday 18 March 2004 3:21 pm, Greg KH wrote:
> > If we could physically locate a PCI bus, then it would be much easier
> > to (for example) locate our defective SCSI disk that is target4 on the
> > SCSI controller that is on pci bus 0000:20.
>
> Um, what's wrong with the current /sys/class/pci_bus/*/cpuaffinity files
> for determining this topology information? That is why it was added.

Nothing, except that it only provides logical information. In a large
system, it's really useful to be able to physically locate a component
somehow. That was the idea behind adding 'physid'. For example:

[jbarnes@spamtin pci0000:02]$ pwd
/sys/devices/pci0000:02
[jbarnes@spamtin pci0000:02]$ cat physid
rack: 5
module: 12
slot: 3

or for nodes:

[jbarnes@spamtin node2]$ cat physid
rack: 1
module: 3
slot: 1

Then you could walk into the lab and know exactly which device to
kick. Obviously, these values would be platform specific, though on
ia64 and some x86 platforms, we could probably use the ACPI namespace
to access some of the info, and on ppc the OF namespace might have it.

Thanks,
Jesse
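
As an illustration of what userspace would have to do with the multi-line
output shown above, here is a minimal sketch of a parser for that text. The
struct physid type and the parse_physid() helper are hypothetical names made
up for this example; the "rack", "module" and "slot" keys are taken from the
sample output, and nothing here is part of the posted patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields in the sample output. */
struct physid {
	int rack;
	int module;
	int slot;
};

/*
 * Parse text of the form "rack: 5\nmodule: 12\nslot: 3\n".
 * Returns 0 when all three keys were found, -1 otherwise.
 */
int parse_physid(const char *text, struct physid *out)
{
	int found = 0;
	const char *line = text;

	while (line && *line) {
		char key[16];
		int value;

		/* Each line is expected to look like "key: number". */
		if (sscanf(line, " %15[a-z] : %d", key, &value) == 2) {
			if (strcmp(key, "rack") == 0) {
				out->rack = value;
				found |= 1;
			} else if (strcmp(key, "module") == 0) {
				out->module = value;
				found |= 2;
			} else if (strcmp(key, "slot") == 0) {
				out->slot = value;
				found |= 4;
			}
		}

		/* Advance to the start of the next line, if there is one. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return found == 7 ? 0 : -1;
}
```

A caller would read the attribute file into a buffer and pass it to
parse_physid(). Needing a parser at all is the cost of putting several
values in one attribute file.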

2004-03-19 18:44:15

by Greg KH

Subject: Re: Exporting physical topology information

On Fri, Mar 19, 2004 at 09:51:52AM -0800, Jesse Barnes wrote:
> On Thursday 18 March 2004 3:21 pm, Greg KH wrote:
> > > If we could physically locate a PCI bus, then it would be much easier
> > > to (for example) locate our defective SCSI disk that is target4 on the
> > > SCSI controller that is on pci bus 0000:20.
> >
> > Um, what's wrong with the current /sys/class/pci_bus/*/cpuaffinity files
> > for determining this topology information? That is why it was added.
>
> Nothing, except that it only provides logical information. In a large
> system, it's really useful to be able to physically locate a component
> somehow. That was the idea behind adding 'physid'. For example:
>
> [jbarnes@spamtin pci0000:02]$ pwd
> /sys/devices/pci0000:02
> [jbarnes@spamtin pci0000:02]$ cat physid
> rack: 5
> module: 12
> slot: 3

Hm, that looks to violate the "one value per file" mandate of sysfs,
right? Right now PCI Hotplug slots have a LED on them that you can
flash from userspace to help locate the physical slot that you want to
change. I also know of large PCI drawers that have LEDs that flash to
locate them.

Also, this is _very_ hardware/platform specific. If you want to try to
implement this, I'd be interested in what the patch would look like.

thanks,

greg k-h

2004-03-19 18:43:47

by Greg KH

Subject: Re: Exporting physical topology information

On Fri, Mar 19, 2004 at 12:48:26PM -0500, Martin Hicks wrote:
>
> On Thu, Mar 18, 2004 at 03:21:39PM -0800, Greg KH wrote:
> > On Wed, Mar 17, 2004 at 04:37:14PM -0500, Martin Hicks wrote:
> > >
> > > Hi,
> > >
> > > If we could physically locate a PCI bus, then it would be much easier
> > > to (for example) locate our defective SCSI disk that is target4 on the
> > > SCSI controller that is on pci bus 0000:20.
> >
> > Um, what's wrong with the current /sys/class/pci_bus/*/cpuaffinity files
> > for determining this topology information? That is why it was added.
>
> This gives us more logical topology information. It still doesn't tell
> us where in the room the specific piece of equipment is.

True, but isn't that what labels on your CPU nodes are for?

:)

greg k-h

2004-03-19 18:53:48

by Jesse Barnes

Subject: Re: Exporting physical topology information

On Friday 19 March 2004 9:59 am, Greg KH wrote:
> Hm, that looks to violate the "one value per file" mandate of sysfs,
> right? Right now PCI Hotplug slots have a LED on them that you can

Yeah... my original patch to implement node physids used the Altix
module id, which looks like rrrtss, where rrr is a rack id, t is a
brick type, and ss is the rack slot, e.g. 001c12, so it was one value.
The example above was just brainstorming, I'm sure there are better
ways to do it.
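
To make the rrrtss layout concrete, here is a small userspace sketch that
splits such a string into its parts. The struct module_loc type and the
decode_module_id() helper are hypothetical, and the sketch assumes the id is
exactly six characters with decimal digits in the rack and slot positions;
the real Altix format may allow more than that.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded form of an "rrrtss" module id. */
struct module_loc {
	int rack;		/* rrr: rack id */
	char brick_type;	/* t: brick type character */
	int slot;		/* ss: slot within the rack */
};

/*
 * Decode an id such as "001c12" (assumed fixed width, decimal digits).
 * Returns 0 on success, -1 if the string does not fit that layout.
 */
int decode_module_id(const char *id, struct module_loc *out)
{
	static const int digit_pos[] = { 0, 1, 2, 4, 5 };
	size_t i;

	if (strlen(id) != 6)
		return -1;

	/* Every position except the brick type must be a digit. */
	for (i = 0; i < sizeof(digit_pos) / sizeof(digit_pos[0]); i++)
		if (!isdigit((unsigned char)id[digit_pos[i]]))
			return -1;

	out->rack = (id[0] - '0') * 100 + (id[1] - '0') * 10 + (id[2] - '0');
	out->brick_type = id[3];
	out->slot = (id[4] - '0') * 10 + (id[5] - '0');
	return 0;
}
```

With the example from the text, "001c12" decodes to rack 1, brick type 'c',
slot 12. The kernel side can then export a single string and leave the
splitting to tools that know the platform.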

> flash from userspace to help locate the physical slot that you want to
> change. I also know of large PCI drawers that have LEDs that flash to
> locate them.

Yeah, that makes things easy, but it would be nice to cover CPUs and
memory banks too, so you can go remove the DIMM with a persistent
single or double bit error, or a CPU with a bad cache or whatever. I
imagine some hardware has blinking lights for that too.

> Also, this is _very_ hardware/platform specific. If you want to try to
> implement this, I'd be interested in what the patch would look like.

Here's the (very platform specific) patch I did for Altix, just to see
what it would look like, and to solicit comments. There's some other
per-node stuff that would be nice to have available to userspace too,
mostly for administrative purposes, like the chipset revision and
type, firmware revision, and other hardware specific details. One way
to export that sort of thing is with some sort of arbitrary data blob,
but like you said, that violates the sysfs one file, one value
principle.

Thanks,
Jesse


===== arch/ia64/mm/numa.c 1.6 vs edited =====
--- 1.6/arch/ia64/mm/numa.c Sun Jan 11 22:54:38 2004
+++ edited/arch/ia64/mm/numa.c Fri Jan 23 11:56:48 2004
@@ -20,6 +20,8 @@
 #include <linux/bootmem.h>
 #include <asm/mmzone.h>
 #include <asm/numa.h>
+#include <asm/sn/nodepda.h>
+#include <asm/sn/module.h>

 static struct memblk *sysfs_memblks;
 static struct node *sysfs_nodes;
@@ -50,6 +52,13 @@
 			break;

 	return (i < num_memblks) ? node_memblk[i].nid : (num_memblks ? -1 : 0);
+}
+
+void node_to_physid(int node, char *buf)
+{
+	struct nodepda_s *nodeinfo = NODEPDA(node);
+
+	format_module_id(buf, nodeinfo->module->id, MODULE_FORMAT_BRIEF);
 }

 static int __init topology_init(void)
===== drivers/base/node.c 1.16 vs edited =====
--- 1.16/drivers/base/node.c Mon Dec 29 13:37:47 2003
+++ edited/drivers/base/node.c Fri Jan 23 12:25:44 2004
@@ -56,6 +56,17 @@
 static SYSDEV_ATTR(meminfo,S_IRUGO,node_read_meminfo,NULL);


+static ssize_t node_read_physid(struct sys_device * dev, char * buf)
+{
+	struct node *node_dev = to_node(dev);
+	int len;
+
+	len = snprintf(buf, NODE_MAX_PHYSID + 1, "%s\n", node_dev->physid);
+	return len;
+}
+
+static SYSDEV_ATTR(physid,S_IRUGO,node_read_physid,NULL);
+
 /*
  * register_node - Setup a driverfs device for a node.
  * @num - Node number to use when creating the device.
@@ -67,6 +78,7 @@
 	int error;

 	node->cpumap = node_to_cpumask(num);
+	node_to_physid(num, node->physid);
 	node->sysdev.id = num;
 	node->sysdev.cls = &node_class;
 	error = sys_device_register(&node->sysdev);
@@ -74,6 +86,7 @@
 	if (!error){
 		sysdev_create_file(&node->sysdev, &attr_cpumap);
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
+		sysdev_create_file(&node->sysdev, &attr_physid);
 	}
 	return error;
 }
===== include/asm-ia64/topology.h 1.9 vs edited =====
--- 1.9/include/asm-ia64/topology.h Wed Jun 18 18:38:50 2003
+++ edited/include/asm-ia64/topology.h Fri Jan 23 11:40:51 2004
@@ -60,6 +60,8 @@

 void build_cpu_to_node_map(void);

+extern void node_to_physid(int node, char *buf);
+
 #endif /* CONFIG_NUMA */

 #include <asm-generic/topology.h>
===== include/asm-ia64/sn/module.h 1.10 vs edited =====
--- 1.10/include/asm-ia64/sn/module.h Sun Jan 18 22:36:15 2004
+++ edited/include/asm-ia64/sn/module.h Fri Jan 23 11:38:34 2004
@@ -14,6 +14,7 @@


 #include <linux/config.h>
+#include <asm/sn/sgi.h>
 #include <asm/sn/klconfig.h>
 #include <asm/sn/ksys/elsc.h>

===== include/linux/node.h 1.5 vs edited =====
--- 1.5/include/linux/node.h Mon Aug 18 19:46:23 2003
+++ edited/include/linux/node.h Fri Jan 23 11:32:44 2004
@@ -22,8 +22,11 @@
 #include <linux/sysdev.h>
 #include <linux/cpumask.h>

+#define NODE_MAX_PHYSID 80
+
 struct node {
 	cpumask_t cpumap; /* Bitmap of CPUs on the Node */
+	char physid[NODE_MAX_PHYSID]; /* Physical ID of node */
 	struct sys_device sysdev;
 };