Request for comments.
This patch creates a virtual filesystem that represents an Open Firmware
device tree. It has been tested on an OLPC x86 system, but the code is
not processor-specific (apart from its current location under arch/i386).
It requires an Open Firmware implementation that can stay resident during
Linux startup.
It is similar in some respect to fs/proc/proc_devtree.c , but does not
use procfs, nor does it require an intermediate layer of code to
create a flattened representation of the device tree.
The patch applies cleanly against the current version of
git://dev.laptop.org/olpc-2.6 . (commit 5b9429be6056864b938ff6f39e5df3cecbbfcf4b).
Please cc me (Mitch Bradley <[email protected]>) on comments.
OLPC users will need to upgrade their firmware to
http://wiki.laptop.org/go/OLPC_Firmware_Q2B14 to use this.
diff --git a/.config b/.config
index 6087ae7..f15900f 100644
--- a/.config
+++ b/.config
@@ -246,6 +246,7 @@ # CONFIG_MCA is not set
# CONFIG_SCx200 is not set
CONFIG_OLPC=y
CONFIG_OLPC_PM=y
+CONFIG_OFW_FS=y
#
# PCCARD (PCMCIA/CardBus) support
diff --git a/Documentation/filesystems/ofwfs.txt b/Documentation/filesystems/ofwfs.txt
new file mode 100644
index 0000000..9dd94de
--- /dev/null
+++ b/Documentation/filesystems/ofwfs.txt
@@ -0,0 +1,115 @@
+ ofwfs - Open Firmware File System
+
+ofwfs makes the Open Firmware device tree available to Linux as
+a mountable virtual filesystem.
+
+== Introduction ==
+
+Open Firmware (OFW) is boot firmware as defined by IEEE Standard
+1275-1994*. It is common on SPARC (where it is called "Open Boot", a
+Sun trademark) and PowerPC processors, and is available for several
+other processors. In the x86 realm, the highest profile platform that
+uses Open Firmware is the One Laptop Per Child system. Open source
+implemenations can be found at openbios.org.
+
+The Open Firmware device tree is a hierarchical description of the
+system hardware, plus a few other bits of information about the
+firmware itself. The root of the device tree represents the main
+system bus. There are child nodes for bridges to subordinate buses,
+for individual peripheral devices, and for "pseudo devices" representing
+other useful information.
+
+Each device tree node contains a set of properties that describe the
+object in question. They specify such things as the name and type of
+the device, its address ranges within the address space of its parent
+bus, the interrupts that it uses, clock frequencies, and other
+information that is useful for device identification and configuration.
+
+== Filesystem Representation ==
+
+ofwfs exports the OFW device tree as a Linux filesystem. Filesystem
+directories correspond to a device tree nodes, with the directory name
+being the fully-qualifed (name@address) node name. Regular files
+correspond to properties, with file name being the same as the property name
+and the file contents being the verbatim property value.
+
+== Property Value Encoding ==
+
+OFW property values are encoded using a well-defined format that is the
+concatenation of some number of items, each encoded in a platform-independent
+manner according to its type. For example, a 32-bit integer is encoded
+as 4 binary bytes in big-endian order (i.e. network byte order), whereas
+a text string is encoded as a null-terminated sequence of bytes. There
+is no padding between concatenated items; each item within a composite
+property value begins immediately after the preceding item, without regard
+to any alignment restrictions that may exist on a particular processor.
+
+Composite property values that contain mixed types (e.g. a 32-bit integer
+followed by several strings) exist, but are rare. Most property values
+are either a single item (e.g. a string or a number) or an array of similar
+items (e.g. a list of numbers).
+
+A regular file in ofwfs contains the exact byte sequence that comprises the
+OFW property value. Properties are not reformatted into text form, so numeric
+property values appear as binary integers. While this is inconvenient
+for viewing, it is generally easier for programs that read property values,
+and it means that Open Firmware documentation about property values applies
+directly, without having to separately document an ASCII transformation
+(which would have to separately specified for different kinds of properties).
+
+== Environment Assumptions ==
+
+The ofwfs code assumes that the Open Firmware client interface callback
+can be executed during Linux kernel startup (specifically, at "core_initcall"
+time). When ofwfs is initialized, it copies out the property values, so that
+subsequent accesses to the tree do not require callbacks into OFW.
+
+After initialization, ofwfs can be used by kernel components such as drivers,
+and it can be mounted so that userspace applications can access it.
+
+== Recommended Mount Point ==
+
+The recommended mount point for ofwfs is /ofw. (TBD: Should it be mounted
+somewhere under /sys instead?)
+
+== Related Work ==
+
+fs/proc/proc_devtree.c is an existing interface to the Open Firmware device
+tree that is used for the PowerPC and SPARC architectures. Using the
+procfs infrastructure, it creates /proc/device-tree as the root of the
+device tree. proc_devtree.c assumes that the device tree information has
+already been copied out of Open Firmware into a "flattened device tree"
+data structure.
+
+The ofwfs development began as a attempt to use proc_devtree.c, but
+got bogged down in the complexity of the interface code to create the
+flattened tree. A direct callback interface turned out to be much
+simpler, resulting in a small, elegant implementation of /proc/device-tree .
+
+That morphed into ofwfs as a result of discussions with kernel developers,
+who said that /proc was falling into disfavor, so that a separate virtual
+filesystem would be more likely to meet the approval of the kernel team.
+
+== Specific Tree Items ==
+
+Specifications for required and optional properties for various kinds
+of processors, buses, and devices may be found in the various "bindings"
+documents on the Open Firmware Working Group site at
+http://playground.sun.com/1275 .
+
+
+(* The IEEE standard expired in 1999. IEEE standards automatically expire
+after 5 years unless renewed, and the Open Firmware working group decided
+not to undertake the considerable effort necessary for a reaffirmation
+ballot. Nevertheless, Open Firmware as defined by the IEEE document
+remains in use on many PowerPC and SPARC systems, and on a smaller scale
+for other processors.)
+
+
+== Acknowledgements ==
+
+* Paul Mackerras wrote proc_devtree.c, which we used as a starting point.
+* David Kahn converted proc_devtree.c to use a direct interface to OFW.
+* Arnd Bergmann wrote a virtual filesystem version of proc_devtree.c
+* Mitch Bradley put all the pieces together, got it working, and wrote
+ this document.
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt
index c04a421..8c9ab88 100644
--- a/Documentation/i386/zero-page.txt
+++ b/Documentation/i386/zero-page.txt
@@ -28,7 +28,8 @@ Offset Type Description
0xa0 16 bytes System description table truncated to 16 bytes.
( struct sys_desc_table_struct )
- 0xb0 - 0x13f Free. Add more parameters here if you really need them.
+ 0xb0 16 bytes Open Firmware information (magic, version, callback, idt)
+ 0xc0 - 0x13f Free. Add more parameters here if you really need them.
0x140- 0x1be EDID_INFO Video mode setup
0x1c4 unsigned long EFI system table pointer
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 3a2eccf..01fdd40 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -1141,6 +1141,15 @@ config OLPC_PM
Add support for the Geode power management facilities on the
OLPC Childrens Machine
+config OFW_FS
+ bool "Support for Open Firmware device tree filesystem"
+ default y if OLPC
+ help
+ This option adds a virtual filesystem which contains an
+ image of the Open Firmware device tree (a description of
+ the system hardware that is exported from the boot firmware).
+ If unsure, say N here.
+
source "drivers/pcmcia/Kconfig"
source "drivers/pci/hotplug/Kconfig"
diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 8b365db..718b3cf 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -45,6 +45,7 @@ EXTRA_AFLAGS := -traditional
obj-$(CONFIG_SCx200) += scx200.o
obj-$(CONFIG_OLPC) += olpc.o
obj-$(CONFIG_OLPC_PM) += olpc-pm.o
+obj-$(CONFIG_OFW_FS) += callofw.o ofw_fs.o
# vsyscall.o contains the vsyscall DSO images as __initdata.
# We must build both images before we can assemble it.
diff --git a/arch/i386/kernel/callofw.c b/arch/i386/kernel/callofw.c
new file mode 100644
index 0000000..b2c7a0e
--- /dev/null
+++ b/arch/i386/kernel/callofw.c
@@ -0,0 +1,96 @@
+/*
+ * callofw.c - Open Firmware client interface for 32-bit systems.
+ * This code is intended to be portable to any 32-bit Open Firmware
+ * implementation with a standard client interface that can be
+ * called when Linux is running.
+ */
+
+#include <stdarg.h>
+#include <asm/callofw.h>
+#include <linux/spinlock.h>
+
+int (*call_firmware)(int *);
+
+static DEFINE_SPINLOCK(prom_lock);
+
+#define MAXARGS 20
+int callofw(char *name, int numargs, int numres, ...)
+{
+ va_list ap;
+ int argarray[MAXARGS+3];
+ int argnum = 3;
+ int retval;
+ int *intp;
+ unsigned long flags;
+
+// printk(KERN_WARNING "CALLOFW: %s\n", name);
+ if (call_firmware == NULL)
+ return (-1);
+
+ argarray[0] = (int)name;
+ argarray[1] = numargs;
+ argarray[2] = numres;
+
+ if ((numargs + numres) > MAXARGS) {
+ return (-1);
+ }
+
+ va_start(ap, numres);
+ while (numargs--) {
+ argarray[argnum++] = va_arg(ap, int);
+ }
+
+ spin_lock_irqsave(&prom_lock, flags);
+ retval = call_firmware(argarray);
+ spin_unlock_irqrestore(&prom_lock, flags);
+
+ if (retval == 0) {
+ while (numres--) {
+ intp = va_arg(ap, int *);
+ *intp = argarray[argnum++];
+ }
+ }
+ va_end(ap);
+ return (retval);
+}
+#undef MAXARGS
+
+#if 0
+
+The return value from callofw in all cases is 0 if the attempt to call the
+function succeeded, nonzero otherwise. That return value is from the
+gateway function only. Any results from the called function are returned
+via output argument pointers.
+
+Here are call templates for all the standard OFW client services.
+
+callofw("test", 1, 1, namestr, &missing);
+callofw("peer", 1, 1, phandle, &sibling_phandle);
+callofw("child", 1, 1, phandle, &child_phandle);
+callofw("parent", 1, 1, phandle, &parent_phandle);
+callofw("instance_to_package", 1, 1, ihandle, &phandle);
+callofw("getproplen", 2, 1, phandle, namestr, &proplen);
+callofw("getprop", 4, 1, phandle, namestr, bufaddr, buflen, &size);
+callofw("nextprop", 3, 1, phandle, previousstr, bufaddr, &flag);
+callofw("setprop", 4, 1, phandle, namestr, bufaddr, len, &size);
+callofw("canon", 3, 1, devspecstr, bufaddr, buflen, &length);
+callofw("finddevice", 1, 1, devspecstr, &phandle);
+callofw("instance-to-path", 3, 1, ihandle, bufaddr, buflen, &length);
+callofw("package-to-path", 3, 1, phandle, bufaddr, buflen, &length);
+callofw("call_method", numin, numout, in0, in1, ..., &out0, &out1, ...);
+callofw("open", 1, 1, devspecstr, &ihandle);
+callofw("close", 1, 0, ihandle);
+callofw("read", 3, 1, ihandle, addr, len, &actual);
+callofw("write", 3, 1, ihandle, addr, len, &actual);
+callofw("seek", 3, 1, ihandle, pos_hi, pos_lo, &status);
+callofw("claim", 3, 1, virtaddr, size, align, &baseaddr);
+callofw("release", 2, 0, virtaddr, size);
+callofw("boot", 1, 0, bootspecstr);
+callofw("enter", 0, 0);
+callofw("exit", 0, 0);
+callofw("chain", 5, 0, virtaddr, size, entryaddr, argsaddr, len);
+callofw("interpret", numin+1, numout+1, cmdstr, in0, ..., &catchres, &out0, ...);
+callofw("set-callback", 1, 1, newfuncaddr, &oldfuncaddr);
+callofw("set-symbol-lookup", 2, 0, symtovaladdr, valtosymaddr);
+callofw("milliseconds", 0, 1, &ms);
+#endif
diff --git a/arch/i386/kernel/head.S b/arch/i386/kernel/head.S
index ca31f18..a0ac805 100644
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -113,6 +113,32 @@ ENTRY(startup_32)
* Warning: don't use %esi or the stack in this code. However, %esp
* can be used as a GPR if you really need it...
*/
+#ifdef CONFIG_OFW_FS
+/*
+ * If Open Firmware booted us, save the OFW client interface callback address
+ * and preserve the OFW page mappings by priming the kernel's new page
+ * directory area with a copy of the OFW page directory. That lets OFW stay
+ * resident in high memory (high in both the virtual and physical spaces)
+ * for at least long enough to copy out the device tree.
+ */
+ movl $(boot_params - __PAGE_OFFSET + OFW_INFO_OFFSET), %ebp
+ cmpl $0x2057464F, (%ebp) /* Magic number "OFW " */
+ jne 1f
+
+ mov 0x8(%ebp), %eax /* Save callback address */
+ mov %eax, call_firmware - __PAGE_OFFSET
+
+ /* Copy the OFW pdir into swapper_pg_dir */
+ movl %esi, %edx /* save %esi */
+ movl $(swapper_pg_dir - __PAGE_OFFSET), %edi
+ movl %cr3, %esi /* Source is current pg_dir base address */
+ movl $1024, %ecx /* Number of page directory entries */
+ rep
+ movsl
+ movl %edx, %esi /* restore %esi */
+1:
+#endif
+
page_pde_offset = (__PAGE_OFFSET >> 20);
movl $(pg0 - __PAGE_OFFSET), %edi
diff --git a/arch/i386/kernel/ofw_fs.c b/arch/i386/kernel/ofw_fs.c
new file mode 100644
index 0000000..30ca359
--- /dev/null
+++ b/arch/i386/kernel/ofw_fs.c
@@ -0,0 +1,261 @@
+/*
+ * Open Firmware Device Tree Filesystem
+ *
+ * By Mitch Bradley ([email protected]), with assistance from David Kahn.
+ * Most of the basic virtual file system structure was taken from a
+ * "promfs" example written by Arnd Bergmann.
+ *
+ * This code should be portable to any system (regardless of CPU) that
+ * has an Open Firmware client interface that can be called when this
+ * module is initialized.
+ */
+
+#include <linux/dcache.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+
+#include <asm/callofw.h>
+
+/* 1275 in little-endian ASCII (for IEEE 1275 - the Open Firmware Standard) */
+#define OFWFS_MAGIC 0x35373231
+
+struct propval {
+ int length;
+ char value[];
+};
+
+static ssize_t ofwfs_read(struct file *file, char __user *data,
+ size_t len, loff_t *ppos)
+{
+ struct propval *propval = file->f_dentry->d_inode->i_private;
+
+ return simple_read_from_buffer(data, len, ppos,
+ propval->value, propval->length);
+}
+
+static struct file_operations ofwfs_property_operations = {
+ .read = ofwfs_read,
+};
+
+static struct inode *ofwfs_new_inode(struct super_block *sb,
+ int mode, void *data)
+{
+ struct inode *inode;
+
+ inode = new_inode(sb);
+ if (!inode)
+ goto out;
+
+ inode->i_mode = mode;
+ inode->i_uid = current->fsuid;
+ inode->i_gid = current->fsgid;
+ inode->i_blkbits = PAGE_CACHE_SHIFT;
+ inode->i_blocks = 0;
+ inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+ inode->i_private = data;
+ switch (mode & S_IFMT) {
+ case S_IFDIR:
+ inode->i_op = &simple_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+ break;
+ case S_IFREG:
+ inode->i_fop = &ofwfs_property_operations;
+ inode->i_bytes = ((struct propval *)data)->length;
+ inode->i_size = (loff_t)(((struct propval *)data)->length);
+ break;
+ }
+out:
+ return inode;
+}
+
+static int ofwfs_create_prop(struct super_block *sb, struct dentry *dir,
+ char *propname, struct propval *propval)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+ int ret;
+
+ ret = -ENOMEM;
+ inode = ofwfs_new_inode(sb, S_IFREG | 0444, propval);
+ if (!inode)
+ goto out;
+ dentry = d_alloc_name(dir, propname);
+ if (!dentry)
+ goto out_iput;
+ d_add(dentry, inode);
+ return 0;
+
+out_iput:
+ iput(inode);
+out:
+ return ret;
+}
+
+static int ofwfs_create_props(struct super_block *sb, struct dentry *dir,
+ phandle node)
+{
+ char propname[32];
+ int security, len;
+ struct propval *propval;
+ int ret = 0;
+ int flag;
+
+ propname[0] = '\0';
+
+ while ((void)callofw("nextprop", 3, 1, node, propname, propname,
+ &flag), flag == 1) {
+ security = strncmp(propname, "security-", 9) == 0;
+ len = 0;
+ if (!security)
+ (void)callofw("getproplen", 2, 1, node, propname, &len);
+
+ propval = kmalloc(sizeof(struct propval) + len, GFP_KERNEL);
+
+ propval->length = len;
+ (void) callofw("getprop", 4, 1, node, propname, propval->value,
+ len, &len);
+
+ ret = ofwfs_create_prop(sb, dir, propname, propval);
+ if (ret)
+ goto out;
+ }
+
+out:
+ if (ret) {
+ WARN_ON(1);
+/*
+ ofwfs_remove_props(sb, dir);
+*/
+ }
+
+ return ret;
+}
+
+static int ofwfs_create_dir(struct super_block *sb,
+ struct dentry *parent, phandle node)
+{
+ int ret;
+ int pathlen;
+ int i;
+ phandle child = 0;
+ struct dentry *dentry = NULL;
+ struct inode *inode = NULL;
+ char *path;
+
+ ret = -ENOMEM;
+
+ inode = ofwfs_new_inode(sb, S_IFDIR | 0555, node);
+ if (!inode)
+ goto out;
+
+ if (parent) {
+ (void) callofw("package-to-path", 3, 1, node, NULL, 0,
+ &pathlen);
+
+ path = kmalloc((size_t) pathlen + 1, GFP_KERNEL);
+ if (!path)
+ goto out_iput;
+
+ (void) callofw("package-to-path", 3, 1, node, path,
+ pathlen + 1, &i);
+
+ dentry = d_alloc_name(parent, strrchr(path, '/') + 1);
+ kfree(path);
+ if (!dentry)
+ goto out_iput;
+ d_add(dentry, inode);
+ } else {
+ dentry = d_alloc_root(inode);
+ if (!dentry)
+ goto out_iput;
+ sb->s_root = dentry;
+ }
+
+ ret = ofwfs_create_props(sb, dentry, node);
+ if (ret)
+ goto out_cleanup;
+
+ (void) callofw("child", 1, 1, node, &child);
+ while (child) {
+ ret = ofwfs_create_dir(sb, dentry, child);
+ if (ret)
+ goto out;
+ (void) callofw("peer", 1, 1, child, &child);
+ }
+ return 0;
+
+out_cleanup:
+ WARN_ON(1);
+// ofwfs_remove_dirs();
+ dput(dentry);
+out_iput:
+ iput(inode);
+out:
+ return ret;
+}
+
+static int ofwfs_create_root(struct super_block *sb, void *data)
+{
+ phandle root;
+
+ root = 0;
+ (void) callofw("peer", 1, 1, 0, &root);
+
+ if (root == 0) {
+ printk(KERN_ERR "ofwfs: can't find device tree root\n");
+ return -ENOENT;
+ }
+
+ return ofwfs_create_dir(sb, NULL, root);
+}
+
+static struct super_operations ofwfs_super_operations = {
+ .statfs = simple_statfs,
+ .drop_inode = generic_delete_inode,
+};
+
+static int ofwfs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = OFWFS_MAGIC;
+ sb->s_op = &ofwfs_super_operations;
+ return ofwfs_create_root(sb, data);
+}
+
+static int ofwfs_get_sb(struct file_system_type *fs_type,
+ int flags, const char *name, void *data, struct vfsmount *mnt)
+{
+ return get_sb_single(fs_type, flags, data, ofwfs_fill_super, mnt);
+}
+
+static struct file_system_type ofwfs_type = {
+ .owner = THIS_MODULE,
+ .name = "ofwfs",
+ .get_sb = ofwfs_get_sb,
+ .kill_sb = kill_litter_super,
+};
+
+static int __init ofwfs_init(void)
+{
+ int ret;
+
+ ret = register_filesystem(&ofwfs_type);
+ if (ret)
+ return ret;
+
+ kern_mount(&ofwfs_type);
+ return 0;
+}
+
+core_initcall(ofwfs_init);
+
+static void __exit ofwfs_exit(void)
+{
+ unregister_filesystem(&ofwfs_type);
+}
+// module_exit(ofwfs_exit);
diff --git a/include/asm-i386/callofw.h b/include/asm-i386/callofw.h
new file mode 100644
index 0000000..594cb63
--- /dev/null
+++ b/include/asm-i386/callofw.h
@@ -0,0 +1,22 @@
+#ifndef _I386_PROM_H
+#define _I386_PROM_H
+#ifdef __KERNEL__
+
+/*
+ * Definitions for Open Firmware client interface on 32-bit system.
+ * OF Cell size is 4. Integer properties are encoded big endian,
+ * as with all OF implementations.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/types.h>
+
+typedef void *phandle;
+
+extern int callofw(char *, int, int, ...);
+
+#endif /* __KERNEL__ */
+#endif /* _I386_PROM_H */
diff --git a/include/asm-i386/setup.h b/include/asm-i386/setup.h
index 2734909..397dc6d 100644
--- a/include/asm-i386/setup.h
+++ b/include/asm-i386/setup.h
@@ -24,6 +24,7 @@ #define OLD_CL_MAGIC 0xA33F
#define OLD_CL_BASE_ADDR 0x90000
#define OLD_CL_OFFSET 0x90022
#define NEW_CL_POINTER 0x228 /* Relative to real mode data */
+#define OFW_INFO_OFFSET 0xb0 /* Relative to real mode data */
#ifndef __ASSEMBLY__
/*
@@ -41,6 +42,7 @@ #define APM_BIOS_INFO (*(struct apm_bios
#define IST_INFO (*(struct ist_info *) (PARAM+0x60))
#define DRIVE_INFO (*(struct drive_info_struct *) (PARAM+0x80))
#define SYS_DESC_TABLE (*(struct sys_desc_table_struct*)(PARAM+0xa0))
+/* OFW INFO is 0x10 bytes starting at offset 0xb0 */
#define EFI_SYSTAB ((efi_system_table_t *) *((unsigned long *)(PARAM+0x1c4)))
#define EFI_MEMDESC_SIZE (*((unsigned long *) (PARAM+0x1c8)))
#define EFI_MEMDESC_VERSION (*((unsigned long *) (PARAM+0x1cc)))
From: Mitch Bradley <[email protected]>
Date: Sat, 30 Dec 2006 15:38:46 -1000
> Request for comments.
>
> This patch creates a virtual filesystem that represents an Open Firmware
> device tree. It has been tested on an OLPC x86 system, but the code is
> not processor-specific (apart from its current location under arch/i386).
>
> It requires an Open Firmware implementation that can stay resident during
> Linux startup.
>
> It is similar in some respect to fs/proc/proc_devtree.c , but does not
> use procfs, nor does it require an intermediate layer of code to
> create a flattened representation of the device tree.
>
> The patch applies cleanly against the current version of
> git://dev.laptop.org/olpc-2.6 . (commit 5b9429be6056864b938ff6f39e5df3cecbbfcf4b).
>
> Please cc me (Mitch Bradley <[email protected]>) on comments.
Can we please not have N different interfaces to the open-firmware
calls so that perhaps powerpc and Sparc have a chance of using this
code too?
On sparc and powerpc, we even build an in-kernel data structure of the
entire open-firmware device tree that code like your's could use if
you make a simple setup layer for i386 as well. We have interfaces for
modifying property values at run time too.
I would strongly suggest looking at things like
arch/{sparc,sparc64,powerpc}/kernel/prom.c and
include/asm-{sparc,sparc64,powerpc}/prom.h and
arch/{sparc,sparc64,powerpc}/kernel/of_device.c and
include/asm-{sparc,sparc64,powerpc}/of_device.h
since we've already invested a lot of thought and
infrastructure into providing interfaces to this information
on powerpc and the two sparc platforms.
Thanks.
David Miller wrote:
> ...
> Can we please not have N different interfaces to the open-firmware
> calls so that perhaps powerpc and Sparc have a chance of using this
> code too?
>
The base interface function is callofw(), which is effectively identical
to call_prom_ret() in arch/powerpc/kernel/prom_init.c . So it seems
that PowerPC could use it. I suppose I could change the name of
callofw() to call_prom_ret(), thus making the base interface identical
to PowerPC's. All it does is argument marshalling, translating between
C varargs argument lists and the OFW argarray format.
SPARC should be able to use that same base interface function directly.
It is written to the standard OFW client interface. The x86 client
interface that I tested it on is essentially the same code that is in
OBP. It wouldn't work on ancient Sun machines with the sunmon romvec
interface, but Sun stopped making such machines something like 16 years ago.
> On sparc and powerpc, we even build an in-kernel data structure of the
> entire open-firmware device tree that code like your's could use if
> you make a simple setup layer for i386 as well. We have interfaces for
> modifying property values at run time too.
>
> I would strongly suggest looking at things like
> arch/{sparc,sparc64,powerpc}/kernel/prom.c and
> include/asm-{sparc,sparc64,powerpc}/prom.h and
> arch/{sparc,sparc64,powerpc}/kernel/of_device.c and
> include/asm-{sparc,sparc64,powerpc}/of_device.h
> since we've already invested a lot of thought and
> infrastructure into providing interfaces to this information
> on powerpc and the two sparc platforms.
>
>
>
I did look at those files, until my eyes glazed over. In powerpc land,
the files that are the underlayer for proc_devtree.c comprise 4700 lines
of code (the files you list plus prom_init.c). In sparc land, it is
only 3200 lines (the files you list plus the prom interface library).
On top of that, proc_devtree.c is 233 lines.
In contrast, ofw_fs.c is 261 lines, and the base interface function
callofw() is 97 lines (half of them comments).
Admittedly, this is something of an apples-to-oranges comparison,
because ofw_fs only exports a read-only device tree and nothing else.
But in the case where that is all you need, a direct interface to OFW
that avoids the middleman seems like a good choice.
I did consider first creating a memory data structure identical to the
powerpc/sparc one, but that looked like it was going to be essentially
twice as much code for no extra capability. The code to traverse the
device tree and create the memory data structure is roughly the same as
the code to create the filesystem structure. I just didn't see the
value of an intermediate representation for systems that don't otherwise
need it. (A setup layer would have let me use proc_devtree.c directly,
so the total amount of new code would have been the same, but many
people told me that if I even suggested using procfs the kernel gurus
would blow me out of the water without bothering to blink.)
In the SPARC and PowerPC spaces, Open Firmware is widespread, so it
makes sense for those kernels to use OFW extensively. In x86 land, OFW
is far from being the dominant firmware, so the x86 kernel is unlikely
to depend on OFW services at a deep level. That being the case, the
deep-integration features of the sparc and powerpc OFW interfaces are
not needed in x86 land. But a lightweight interface to the device tree
is certainly useful for the platforms that do have OFW. It might be
useful for other processors as well, especially on platforms that don't
need the deep configurability that drove the OFW design.
From: Mitch Bradley <[email protected]>
Date: Sat, 30 Dec 2006 23:36:45 -1000
> The base interface function is callofw(), which is effectively identical
> to call_prom_ret() in arch/powerpc/kernel/prom_init.c . So it seems
> that PowerPC could use it. I suppose I could change the name of
> callofw() to call_prom_ret(), thus making the base interface identical
> to PowerPC's. All it does is argument marshalling, translating between
> C varargs argument lists and the OFW argarray format.
Please create explicit function calls for each operation, this
way the caller is immune to open-firmware implementation details.
> SPARC should be able to use that same base interface function directly.
> It is written to the standard OFW client interface.
Sparc 32-bit predates the OFW specification and does things differently.
Please use a functional interface using a C function for each device
tree operation, then the implementation behind it doesn't matter.
> It wouldn't work on ancient Sun machines with the sunmon romvec
> interface, but Sun stopped making such machines something like 16 years ago.
Yet we still support them in the 32-bit sparc port. And it's so
easy to support this properly, please use the clean stuff we've
created for this.
> I did consider first creating a memory data structure identical to the
> powerpc/sparc one, but that looked like it was going to be essentially
> twice as much code for no extra capability. The code to traverse the
> device tree and create the memory data structure is roughly the same as
> the code to create the filesystem structure. I just didn't see the
> value of an intermediate representation for systems that don't otherwise
> need it.
Since we put it in memory, we have zero reason to call into the
firmware for device tree access and this simplifies things a lot.
But all of that really doesn't matter, if you use a functional
C interface for each device tree access operation, it doesn't
matter what's behind it right?
Responding to two replies in one email ...
Mitch Bradley wrote:
>
> David Miller wrote:
>> ...
>> Can we please not have N different interfaces to the open-firmware
>> calls so that perhaps powerpc and Sparc have a chance of using this
>> code too?
David,
I helped Mitch do this "port", so I'd like to chime in here.
Mitch and I both looked extensively at the powerpc and sparc
code that's there. There's an entire extra layer there that
isn't needed if all you want to do is expose the device
tree to userland via the file system, and that's all that was
needed in this case, plus, the desire is to keep OFW in memory
in this case, so caching of the device tree is unnecessary for
this simple implementation.
We didn't create a new interface here, all we did was
write code for the existing OFW client interface and call it
directly, rather than go through extra layers of code to do
that.Both the existing powerpc and sparc code seems to be
burdened with the use of legacy interfaces.
David Miller wrote:
> Please create explicit function calls for each operation, this
> way the caller is immune to open-firmware implementation details.
I did that initially, but Mitch and I agreed that it's just
a waste of more code for this case, where the majority of
i386 implementations are not going to use it.
The opportunity for layering an interface in between the kernel
and the firmware still exists with the trivial callofw interface.
>
>> SPARC should be able to use that same base interface function directly.
>> It is written to the standard OFW client interface.
>
> Sparc 32-bit predates the OFW specification and does things differently.
15 year old 32-bit SPARC machines use a different interface.
None of those machines are even supported by Sun anymore
(hardware or software support in any form.) Nonetheless,
we haven't changed any of that.
And by the way, Mitch and I are intimately familiar with that
stuff from our days together at Sun. (I'm still at Sun.) Mitch
and I have worked together on these firmware interfaces for a long
time now.
>
> Please use a functional interface using a C function for each device
> tree operation, then the implementation behind it doesn't matter.
The opportunity for layering a different interface in between the kernel
and the firmware already exists with the callofw and cif_handler interfaces.
In fact, before Mitch got the code working, I had written an entire
emulation layer to test the file system code out. All it did was hook
into that existing cif_handler interface, providing its own address
rather than using a cif handler address in the firmware. At the end of
the day, Mitch got the cif calls working to the actual firmware before
we tried the emulation layer, but the point is that it was fairly trivial
to implement.
>> It wouldn't work on ancient Sun machines with the sunmon romvec
>> interface, but Sun stopped making such machines something like 16 years ago.
>
> Yet we still support them in the 32-bit sparc port. And it's so
> easy to support this properly, please use the clean stuff we've
> created for this.
We haven't changed the sparc implementation, it all still works.
All we've done is created a trivial implementation for exporting
the device tree to userland that isn't burdened by the powerpc
and sparc legacy code that's in there now.
>
>> I did consider first creating a memory data structure identical to the
>> powerpc/sparc one, but that looked like it was going to be essentially
>> twice as much code for no extra capability. The code to traverse the
>> device tree and create the memory data structure is roughly the same as
>> the code to create the filesystem structure. I just didn't see the
>> value of an intermediate representation for systems that don't otherwise
>> need it.
>
> Since we put it in memory, we have zero reason to call into the
> firmware for device tree access and this simplifies things a lot.
And it also wastes resources, if all you need is a read-only
view of the device tree in the filesystem.
-David
From: David Kahn <[email protected]>
Date: Sun, 31 Dec 2006 02:11:53 -0800
> All we've done is created a trivial implementation for exporting
> the device tree to userland that isn't burdened by the powerpc
> and sparc legacy code that's in there now.
So now we'll have _3_ different implementations of exporting
the OFW device tree via procfs. Your's, the proc_devtree
of powerpc, and sparc's /proc/openprom
That doesn't make any sense to me, having 3 ways of doing the same
exact thing and making no attempt to share code at all.
If you want to do something new that consolidates everything, with the
goal of deprecating the existing stuff, that's great! But with they
way you're doing this, all the sparc and powerpc implementations
really can't take advantage of it.
Am I the only person who sees something very wrong with this?
David Miller wrote:
> From: David Kahn <[email protected]>
> Date: Sun, 31 Dec 2006 02:11:53 -0800
>
>> All we've done is created a trivial implementation for exporting
>> the device tree to userland that isn't burdened by the powerpc
>> and sparc legacy code that's in there now.
>
> So now we'll have _3_ different implementations of exporting
> the OFW device tree via procfs. Your's, the proc_devtree
> of powerpc, and sparc's /proc/openprom
I would not exactly call what we have for powerpc
"exporting the OFW device tree". I don't quite know
what it is, but it isn't as simple as exporting the
OFW device tree. I don't think we really wanted to
get into any of that here.
> If you want to do something new that consolidates everything, with the
> goal of deprecating the existing stuff, that's great! But with they
> way you're doing this, all the sparc and powerpc implementations
> really can't take advantage of it.
Sure they can take advantage of it, if what they need
to export is a read-only copy of the actual device tree
without any legacy burden or having a writable/changeable
copy of it with a trivial implementation. But that's not
up to us, and that's not what's been done for powerpc and
sparc.
This is a trivial implementation that suits it's purpose.
It's simple. I'm not sure what more is needed for this
project when it's pretty clear that i386 will never need
any additional support for open firmware.
-David
Hi,
On Sunday 31 December 2006 11:49, David Miller wrote:
> From: David Kahn <[email protected]>
> Date: Sun, 31 Dec 2006 02:11:53 -0800
>
> > All we've done is created a trivial implementation for exporting
> > the device tree to userland that isn't burdened by the powerpc
> > and sparc legacy code that's in there now.
>
> So now we'll have _3_ different implementations of exporting
> the OFW device tree via procfs. Your's, the proc_devtree
> of powerpc, and sparc's /proc/openprom
>
> That doesn't make any sense to me, having 3 ways of doing the same
> exact thing and making no attempt to share code at all.
>
> If you want to do something new that consolidates everything, with the
> goal of deprecating the existing stuff, that's great! But with they
> way you're doing this, all the sparc and powerpc implementations
> really can't take advantage of it.
>
> Am I the only person who sees something very wrong with this?
Nope you aren't, ACK to a unified user-space export from my side as well.
Yours,
--
Ren? Rebe - ExactCODE GmbH - Europe, Germany, Berlin
http://exactcode.de | http://t2-project.org | http://rene.rebe.name
+49 (0)30 / 255 897 45
On 12/31/06, Mitch Bradley <[email protected]> wrote:
> diff --git a/arch/i386/kernel/ofw_fs.c b/arch/i386/kernel/ofw_fs.c
> new file mode 100644
> index 0000000..30ca359
> --- /dev/null
> +++ b/arch/i386/kernel/ofw_fs.c
> @@ -0,0 +1,261 @@
> +/* 1275 in little-endian ASCII (for IEEE 1275 - the Open Firmware Standard) */
> +#define OFWFS_MAGIC 0x35373231
Please put this in <linux/magic.h>
On 12/31/06, Mitch Bradley <[email protected]> wrote:
> + case S_IFREG:
> + inode->i_fop = &ofwfs_property_operations;
> + inode->i_bytes = ((struct propval *)data)->length;
> + inode->i_size = (loff_t)(((struct propval *)data)->length);
As you need the struct twice, use a local variable.
On 12/31/06, Mitch Bradley <[email protected]> wrote:
> +static int ofwfs_create_props(struct super_block *sb, struct dentry *dir,
> + phandle node)
> +{
> + char propname[32];
> + int security, len;
> + struct propval *propval;
> + int ret = 0;
> + int flag;
> +
> + propname[0] = '\0';
> +
> + while ((void)callofw("nextprop", 3, 1, node, propname, propname,
> + &flag), flag == 1) {
Please do the call within the loop body and use an explicit bereak.
> + security = strncmp(propname, "security-", 9) == 0;
> + len = 0;
Redundant assignment, no?
> + if (!security)
> + (void)callofw("getproplen", 2, 1, node, propname, &len);
We don't use void casts to suppress return value in Linux. You can
just drop it (elsewhere too).
On 12/31/06, Mitch Bradley <[email protected]> wrote:
> +out:
> + if (ret) {
> + WARN_ON(1);
> +/*
> + ofwfs_remove_props(sb, dir);
> +*/
> + }
Hmm?
On 12/31/06, Mitch Bradley <[email protected]> wrote:
> +static int __init ofwfs_init(void)
> +{
> + int ret;
> +
> + ret = register_filesystem(&ofwfs_type);
> + if (ret)
> + return ret;
> +
> + kern_mount(&ofwfs_type);
kern_mount can fail so you'd better do IS_ERR/PTR_ERR here.
> diff --git a/include/asm-i386/callofw.h b/include/asm-i386/callofw.h
> new file mode 100644
> index 0000000..594cb63
> --- /dev/null
> +++ b/include/asm-i386/callofw.h
> +#include <linux/types.h>
> +
> +typedef void *phandle;
The typedef seems useless and confusing at best. Can we drop it?
On Dec 30 2006 15:38, Mitch Bradley wrote:
>
> Request for comments.
>
> It is similar in some respect to fs/proc/proc_devtree.c , but does
> not use procfs, nor does it require an intermediate layer of code
> to create a flattened representation of the device tree.
NB: openpromfs does not use procfs either. It just happens to be
mounted somewhere down there.
> +++ b/Documentation/filesystems/ofwfs.txt
> +== Property Value Encoding ==
> +
> +A regular file in ofwfs contains the exact byte sequence that
> +comprises the OFW property value. Properties are not reformatted
> +into text form, so numeric property values appear as binary
> +integers. While this is inconvenient for viewing, it is generally
> +easier for programs that read property values, and it means that
> +Open Firmware documentation about property values applies
> +directly, without having to separately document an ASCII
> +transformation (which would have to separately specified for
> +different kinds of properties).
I appreciate the ASCII formatting that openpromfs currently does.
Perhaps, should OFWFS be used, it could offer both?
> +== Environment Assumptions ==
> +
> +The ofwfs code assumes that the Open Firmware client interface
> +callback can be executed during Linux kernel startup
> +(specifically, at "core_initcall" time). When ofwfs is
> +initialized, it copies out the property values, so that subsequent
> +accesses to the tree do not require callbacks into OFW.
openpromfs also has many parts read-only, only one object,
/options/security-password, is writable. Hence caching it once and
forever seems ok.
BUT, the eeprom utility may be used to modify values, and if used, I
would like to see ofwfs show the updated value. openpromfs does it
today:
15:09 ares:/proc/openprom/options # cat oem-banner?
false
15:09 ares:/proc/openprom/options # eeprom 'oem-banner?=true'
15:09 ares:/proc/openprom/options # cat oem-banner?
true
(BTW, why does not openpromfs have it rw?)
> +== Recommended Mount Point ==
> +
> +The recommended mount point for ofwfs is /ofw. (TBD: Should it be
> +mounted somewhere under /sys instead?)
Somewhere in /sys/firmware perhaps.
-`J'
--
On Sun, Dec 31, 2006 at 02:49:17AM -0800, David Miller wrote:
> From: David Kahn <[email protected]>
> Date: Sun, 31 Dec 2006 02:11:53 -0800
>
> > All we've done is created a trivial implementation for exporting
> > the device tree to userland that isn't burdened by the powerpc
> > and sparc legacy code that's in there now.
>
> So now we'll have _3_ different implementations of exporting
> the OFW device tree via procfs. Your's, the proc_devtree
> of powerpc, and sparc's /proc/openprom
>
> That doesn't make any sense to me, having 3 ways of doing the same
> exact thing and making no attempt to share code at all.
>
> If you want to do something new that consolidates everything, with the
> goal of deprecating the existing stuff, that's great! But with they
> way you're doing this, all the sparc and powerpc implementations
> really can't take advantage of it.
>
> Am I the only person who sees something very wrong with this?
No, I completely agree with you on this.
If firmworks really wants to have a spearate filesystem that's fine.
But please start with the ppc or sparc code and massage it into a
a separate filesystem before adding support for a new platform.
The last thing we need is more duplication.
In case anyone wants to start this based on ppc I'd gladfully help
to test this on pmac (32 and 64bit) and cell (64 bit).
I made all the changes Pekka suggested, except:
> + security = strncmp(propname, "security-", 9) == 0;
>> + len = 0;
>
> Redundant assignment, no?
>
>> + if (!security)
>> + (void)callofw("getproplen", 2, 1, node,
>> propname, &len);
>
That assignment turns out not to be redundant. If a security variable
is recognized, you want the length to be 0 so as not to expose the
password. In that case the following "getproplen" call won't be executed.
That logic was adapted from the existing file fs/proc/devtree.c . It
turns out that the code there has a bug: You really want to look for
just "security-password" ; there is no need to, and good reasons not to,
suppress the length of "security-mode" and "security-#badlogins". (Good
OFW implementations won't leak the password length anyway, so check is
only needed as a workaround).
I have rewritten the code for clarity and correctness thusly:
if (strcmp(propname, "security-password") == 0) {
len = 0; /* Don't leak password length */
} else {
callofw("getproplen", 2, 1, node, propname, &len);
}
From: Jan Engelhardt <[email protected]>
Date: Sun, 31 Dec 2006 15:12:26 +0100 (MET)
> BUT, the eeprom utility may be used to modify values, and if used, I
> would like to see ofwfs show the updated value. openpromfs does it
> today:
>
> 15:09 ares:/proc/openprom/options # cat oem-banner?
> false
> 15:09 ares:/proc/openprom/options # eeprom 'oem-banner?=true'
> 15:09 ares:/proc/openprom/options # cat oem-banner?
> true
>
> (BTW, why does not openpromfs have it rw?)
It used to be able to :-)
When I changed sparc/sparc64 over to an in-memory copy of the
OFW tree, I wasn't able to retain writable property support
in openpromfs. It just needs to be implemented and I never
found the desire nor time.
Happily, everyone uses 'eeprom' or some other program accessing
the tree via /dev/openprom to change values.
I'm incredibly surprised how much resistence there is from the
i386 OFW folks to do this right. It would be like 80 lines of
code to suck the device tree into kernel memory, or if they don't
want to do that they can use inline function wrappers to provide
the clean C-language interface to all of this and the cost to
i386-OFW would be zero with the benefit that other platforms could
use the code potentially.
Doing the same thing 3 different ways, knowingly, is just very bad
engineering. That is how you end up with a big fat pile of
unmaintainable poo instead of a clean maintainable source tree. If we
fix a bug in one of these things, the other 2 are so different that if
the bug is in the others we'll never know and it's not easy to check
so people won't do it.
So please do this crap right.
Thanks.
From: Christoph Hellwig <[email protected]>
Date: Sun, 31 Dec 2006 15:41:03 +0000
> On Sun, Dec 31, 2006 at 02:49:17AM -0800, David Miller wrote:
> > Am I the only person who sees something very wrong with this?
>
> No, I completely agree with you on this.
>
> If firmworks really wants to have a spearate filesystem that's fine.
> But please start with the ppc or sparc code and massage it into a
> a separate filesystem before adding support for a new platform.
>
> The last thing we need is more duplication.
>
> In case anyone wants to start this based on ppc I'd gladfully help
> to test this on pmac (32 and 64bit) and cell (64 bit).
Thanks for the vote of sanity Christoph.
I'm happy to test on sparc64 of course too :-)
On Dec 31 2006 12:45, David Miller wrote:
>From: Jan Engelhardt <[email protected]>
>
>> BUT, the eeprom utility may be used to modify values, and if used, I
>> would like to see ofwfs show the updated value. openpromfs does it
>> today:
>>
>> 15:09 ares:/proc/openprom/options # cat oem-banner?
>> false
>> 15:09 ares:/proc/openprom/options # eeprom 'oem-banner?=true'
>> 15:09 ares:/proc/openprom/options # cat oem-banner?
>> true
>>
>> (BTW, why does not openpromfs have it rw?)
>
>It used to be able to :-)
>
>When I changed sparc/sparc64 over to an in-memory copy of the
>OFW tree, I wasn't able to retain writable property support
>in openpromfs. It just needs to be implemented and I never
>found the desire nor time.
However, there is this one pseudofile "security-password" which _is_
writable, what about it?
-`J'
--
>>> All we've done is created a trivial implementation for exporting
>>> the device tree to userland that isn't burdened by the powerpc
>>> and sparc legacy code that's in there now.
>>
>> So now we'll have _3_ different implementations of exporting
>> the OFW device tree via procfs. Your's, the proc_devtree
>> of powerpc, and sparc's /proc/openprom
>>
>> That doesn't make any sense to me, having 3 ways of doing the same
>> exact thing and making no attempt to share code at all.
Not the same exact thing -- using a text representation for
the property contents is a very different thing (and completely
braindead).
>> If you want to do something new that consolidates everything, with the
>> goal of deprecating the existing stuff, that's great! But with they
>> way you're doing this, all the sparc and powerpc implementations
>> really can't take advantage of it.
The problem is that it is almost impossible to consolidate the
current OF code. It would be possible to consolidate the
filesystem code only though (and that's a good plan of course).
>> Am I the only person who sees something very wrong with this?
>
> No, I completely agree with you on this.
>
> If firmworks really wants to have a spearate filesystem that's fine.
> But please start with the ppc or sparc code and massage it into a
> a separate filesystem before adding support for a new platform.
It's so much easier to start from scratch in this case ;-)
> In case anyone wants to start this based on ppc I'd gladfully help
> to test this on pmac (32 and 64bit) and cell (64 bit).
Let's not base it on the PowerPC code, we don't want /proc.
How about this as a high-level design, it would be simple for
the OLPC guys to implement, and very easy for the other archs
to hook up to:
Every architecture that supports the device tree filesystem,
initialises a "struct device_tree_ops" with a bunch of
pointers to functions that allow you to traverse the device
tree and read its properties (and maybe write properties, or
even delete and create new nodes. The devtree filesystem code
simply calls into these functions to do the actual operations
on the device tree (access an in-kernel data structure, call
the OF, or both -- or something entirely different, who knows).
Segher
Folks,
If we reused the current code in fs/proc/proc_devtree.c
and re-wrote the underlying of_* routines for i386 only,
(in the hope of removing the complexity not needed for
this implementation) would that be an acceptable
implementation?
In other words, the of_* routines continue to define the
interface between kernel and the firmware/OS
layer. Although that code in proc_devtree.c defining
the functions duplicate_name() and fixup_name() is still
troubling to me.
IMHO, the directory entries in the filesystem
should be in the form "node-name@unit-address" (eg: /pci@1f,0,
"pci" is the node name, "@" is the separator character defined
by IEEE 1275, and "1f,0" is the unit-address,
which are always guaranteed to be unique. That's part of the
reason we did a separate implementation. I'm not sure
how we'll resolve that part of it or what problem that
code is trying to resolve by changing the directory names
that appear in the filesystem in a rather odd way. It's
not possible to have two ambiguously fully qualified nodes in the OFW
device tree, otherwise you would never be able to select
a specific one by name.
-David
>> +A regular file in ofwfs contains the exact byte sequence that
>> +comprises the OFW property value. Properties are not reformatted
>> +into text form, so numeric property values appear as binary
>> +integers. While this is inconvenient for viewing, it is generally
>> +easier for programs that read property values, and it means that
>> +Open Firmware documentation about property values applies
>> +directly, without having to separately document an ASCII
>> +transformation (which would have to separately specified for
>> +different kinds of properties).
>
> I appreciate the ASCII formatting that openpromfs currently does.
The text representation (in openpromfs) is *useless*, since it cannot
be transformed back into the binary representation.
Most tools have an easier time using the raw binary version too.
If a user wants to see text, they can use an "lsdevtree" program.
> Perhaps, should OFWFS be used, it could offer both?
Such transformations don't belong in the kernel but in userland.
>> +The recommended mount point for ofwfs is /ofw. (TBD: Should it be
>> +mounted somewhere under /sys instead?)
>
> Somewhere in /sys/firmware perhaps.
While I hate deep paths, /sys/firmware/device-tree does sound good.
Segher
> I would not exactly call what we have for powerpc
> "exporting the OFW device tree". I don't quite know
> what it is, but it isn't as simple as exporting the
> OFW device tree. I don't think we really wanted to
> get into any of that here.
The Linux PowerPC port uses an OF-like device tree on
*every* platform, even those that don't have a native OF.
It also has to work around lots of bugs in many OF
implementations. It's doing lots of stuff, and not all
of it is nicely separated into logical modules I'm afraid.
> Sure they can take advantage of it, if what they need
> to export is a read-only copy of the actual device tree
> without any legacy burden or having a writable/changeable
> copy of it with a trivial implementation.
So give them an interface that allows them to hook up to
your new code :-)
> This is a trivial implementation that suits it's purpose.
> It's simple. I'm not sure what more is needed for this
> project when it's pretty clear that i386 will never need
> any additional support for open firmware.
Never say never...
Segher
Some comments, mostly coding style:
> - 0xb0 - 0x13f Free. Add more parameters here if you really need them.
> + 0xb0 16 bytes Open Firmware information (magic, version, callback,
> idt)
Is there an OF ISA binding for x86 somewhere? And don't
point me to the source code, I'd like to see an actual
reference doc ;-)
> +// printk(KERN_WARNING "CALLOFW: %s\n", name);
Please remove disabled code. And don't use // comments
at all please.
> + if (call_firmware == NULL)
> + return (-1);
No parentheses around return value.
> + argarray[0] = (int)name;
This would warn on 64-bit systems, better write it as
(u32)(u64)name (and make sure that "name" is somewhere
in the low 4GB of memory). This doesn't handle 64-bit
client interface either btw.
> + while (numargs--) {
> + argarray[argnum++] = va_arg(ap, int);
> + }
No braces around single statements.
> +#undef MAXARGS
Why this #undef? That's nasty style, and this is at the
end of file anyway.
> +#if 0
Better use a normal comment.
> +Here are call templates for all the standard OFW client services.
You missed "instance-to-interposed-path" (standard, although
not required).
> + * By Mitch Bradley ([email protected]), with assistance from David
> Kahn.
> + * Most of the basic virtual file system structure was taken from a
> + * "promfs" example written by Arnd Bergmann. + *
My mailer messed up this line, that means you have trailing
spaces here :-)
> + + if (root == 0) {
Same here.
> +++ b/include/asm-i386/callofw.h
> @@ -0,0 +1,22 @@
> +#ifndef _I386_PROM_H
> +#define _I386_PROM_H
Better make the #define correspond to the file name.
Segher
From: David Kahn <[email protected]>
Date: Sun, 31 Dec 2006 19:37:52 -0800
> IMHO, the directory entries in the filesystem
> should be in the form "node-name@unit-address" (eg: /pci@1f,0,
> "pci" is the node name, "@" is the separator character defined
> by IEEE 1275, and "1f,0" is the unit-address,
> which are always guaranteed to be unique. That's part of the
> reason we did a separate implementation. I'm not sure
> how we'll resolve that part of it or what problem that
> code is trying to resolve by changing the directory names
> that appear in the filesystem in a rather odd way. It's
> not possible to have two ambiguously fully qualified nodes in the OFW
> device tree, otherwise you would never be able to select
> a specific one by name.
Absolutely, and if you look that's how Sparc's openpromfs names
things now. It even goes through all of the trouble to make
sure the unit addressing matches what the Sparc OBP uses
precisely.
From: Segher Boessenkool <[email protected]>
Date: Mon, 1 Jan 2007 04:33:13 +0100
> >>> All we've done is created a trivial implementation for exporting
> >>> the device tree to userland that isn't burdened by the powerpc
> >>> and sparc legacy code that's in there now.
> >>
> >> So now we'll have _3_ different implementations of exporting
> >> the OFW device tree via procfs. Your's, the proc_devtree
> >> of powerpc, and sparc's /proc/openprom
> >>
> >> That doesn't make any sense to me, having 3 ways of doing the same
> >> exact thing and making no attempt to share code at all.
>
> Not the same exact thing -- using a text representation for
> the property contents is a very different thing (and completely
> braindead).
The filesystem bit is for groveling around and getting information
from the shell prompt, or shell scripts. Text processing.
If you want the binary bits, export it with something like
/dev/openprom. We don't generally export binary representation
files out of /proc or /sys, in fact this rule I believe is layed
our precisely somewhere at least in the sysfs case.
> Every architecture that supports the device tree filesystem,
> initialises a "struct device_tree_ops" with a bunch of
> pointers to functions that allow you to traverse the device
> tree and read its properties (and maybe write properties, or
> even delete and create new nodes. The devtree filesystem code
> simply calls into these functions to do the actual operations
> on the device tree (access an in-kernel data structure, call
> the OF, or both -- or something entirely different, who knows).
That's the only key point in my opinion, any clean interface that
sits in front of this stuff is fine as long as it encompasses
all of the necessary operations and allows just about any
implementation underneath.
>> Not the same exact thing -- using a text representation for
>> the property contents is a very different thing (and completely
>> braindead).
>
> The filesystem bit is for groveling around and getting information
> from the shell prompt, or shell scripts. Text processing.
>
> If you want the binary bits, export it with something like
> /dev/openprom.
But we'd have to implement a tree structure on top of that.
If for doing "real" things we have to use some device file with
some helper program, we can do away with the whole filesystem
thing just as well -- *always* use that helper program.
On the PowerPC side of things, we have binary properties in
the device tree filesystem, and have a simple user space
program to make it readable as text.
> We don't generally export binary representation
> files out of /proc or /sys, in fact this rule I believe is layed
> our precisely somewhere at least in the sysfs case.
That only works for sysfs because there is the "one value
per file" rule -- something the OF device tree doesn't do,
and doesn't need to do, since it uses a well-defined binary
format.
If you *really* want (the option of) showing things as text
in the filesystem, you better make it so that there is a
one-to-one translation back to binary. For example, what
does this mean, is it a text string or two bytes:
01.02
Yes you as a user can guess, but scripts can't (reliably).
>> Every architecture that supports the device tree filesystem,
>> initialises a "struct device_tree_ops" with a bunch of
>> pointers to functions that allow you to traverse the device
>> tree and read its properties (and maybe write properties, or
>> even delete and create new nodes. The devtree filesystem code
>> simply calls into these functions to do the actual operations
>> on the device tree (access an in-kernel data structure, call
>> the OF, or both -- or something entirely different, who knows).
>
> That's the only key point in my opinion, any clean interface that
> sits in front of this stuff is fine as long as it encompasses
> all of the necessary operations and allows just about any
> implementation underneath.
Yeah. And I'd like to have it as a collection of function
pointers so the arch code can put in the implementation it
needs at boot time. Flexibility is good.
Segher
David Miller wrote:
>
> We don't generally export binary representation
> files out of /proc or /sys, in fact this rule I believe is layed
> our precisely somewhere at least in the sysfs case.
>
>
pci-sysfs exports PCI config space in binary.
On Jan 1 2007 08:10, Mitch Bradley wrote:
>>
>> We don't generally export binary representation
>> files out of /proc or /sys, in fact this rule I believe is layed
>> our precisely somewhere at least in the sysfs case.
>>
> pci-sysfs exports PCI config space in binary.
cat /sys/bus/pci/devices/0000\:01\:00.0/subsystem_device
0x0c60
Can't find the binary thing. But I've seen it before -- esp.
/proc/usb when it existed. It's really sad, because you can learn a
lot from the text representations. Having to figure out what the
proper utility and all its options is often takes some time.
-`J'
--
From: Segher Boessenkool <[email protected]>
Date: Mon, 1 Jan 2007 18:48:33 +0100
> If you *really* want (the option of) showing things as text
> in the filesystem, you better make it so that there is a
> one-to-one translation back to binary. For example, what
> does this mean, is it a text string or two bytes:
>
> 01.02
>
> Yes you as a user can guess, but scripts can't (reliably).
We have some extensive code in fs/openpromfs/inode.c that
determines whether a property is text or not. I can't
guarentee it works %100, but it's very context dependant
(only the driver "knows") but it works for all the cases
I've tried.
I really think you're making a mountain out of a mole hill, to be
honest :-)
>> If you *really* want (the option of) showing things as text
>> in the filesystem, you better make it so that there is a
>> one-to-one translation back to binary. For example, what
>> does this mean, is it a text string or two bytes:
>>
>> 01.02
>>
>> Yes you as a user can guess, but scripts can't (reliably).
>
> We have some extensive code in fs/openpromfs/inode.c that
> determines whether a property is text or not. I can't
> guarentee it works %100, but it's very context dependant
> (only the driver "knows") but it works for all the cases
> I've tried.
It's still a heuristic, I don't think the kernel should be
doing things like this; leave the guesswork to userland,
where different users can guess in different ways if they
want/need.
Some real life properties contain _both_ a binary part and
a text part, btw.
> I really think you're making a mountain out of a mole hill, to be
> honest :-)
Heh. There is one big problem: text representation is useless
(to scripts etc.) unless it can be transformed back to binary;
i.e., it has to be possible to reliably detect _how_ some
property is represented into text, something that cannot be
done with how openpromfs handles it.
Segher
David Miller wrote:
> We have some extensive code in fs/openpromfs/inode.c that
> determines whether a property is text or not. I can't
> guarentee it works %100, but it's very context dependant
> (only the driver "knows") but it works for all the cases
> I've tried.
The problem with guessing, as you've noted, is that you
can't be 100% correct for device specific stuff.
Sure, you can guess that standard properties defined
by the core spec like "ranges" and "reg" consist of
integer data, but you can't make any guess about
device-specific stuff. (The heuristics you mentioned
just look at the data to see if it consists of printable
characters as far as I can tell, and that too isn't
foolproof, as you noted.)
Properties that consist of simple string data will just
show up as printable string data, but it's usually best to
leave the interpretation of binary properties up to the
entity that's consuming them, since they have to
know how to interpret them.
Also, the sparc port doesn't have to deal with
endian issues, since prop-encoding is big-endian.
There's really no way to "guess" properly.
A userland library or program can do whatever
it wants to as a helper. The interpretation of
the data should be done by the entity that consumes
it.
If that doesn't fit the model of /sys or /proc,
I suppose it could be done in a separate file
system, but that's overkill, isn't it?
-David
From: Segher Boessenkool <[email protected]>
Date: Tue, 2 Jan 2007 00:52:36 +0100
> There is one big problem: text representation is useless
> (to scripts etc.) unless it can be transformed back to binary;
> i.e., it has to be possible to reliably detect _how_ some
> property is represented into text, something that cannot be
> done with how openpromfs handles it.
Text is text is text for many propertiers, in particular
the ones you actually end up wanting to modify.
The biggest and most used example are the device aliases
and the 'boot-device' and 'boot-file' environment variables.
We even have a special case for that latter case on sparc
via:
echo 'foo.image' >/proc/sys/kernel/reboot-cmd
In order for a problem to exist, you have to show counter
examples where the problem triggers and something fails.
What in userspace wants to modify a OFW property, which
is not text?
In my experience all such cases are limited to ASCII text
valued properties, such as device aliases, environment
variables, and things like nvramrc.
From: David Kahn <[email protected]>
Date: Mon, 01 Jan 2007 17:40:25 -0800
> If that doesn't fit the model of /sys or /proc,
> I suppose it could be done in a separate file
> system, but that's overkill, isn't it?
Or by a device driver, which is what OFW systems have
been doing for years, and we have it already, it's called
/dev/openprom and if you provide the of_*() API you could
use it out of the box too.
> I'm incredibly surprised how much resistence there is from the
> i386 OFW folks to do this right. It would be like 80 lines of
> code to suck the device tree into kernel memory, or if they don't
> want to do that they can use inline function wrappers to provide
> the clean C-language interface to all of this and the cost to
> i386-OFW would be zero with the benefit that other platforms could
> use the code potentially.
>
> Doing the same thing 3 different ways, knowingly, is just very bad
> engineering. That is how you end up with a big fat pile of
> unmaintainable poo instead of a clean maintainable source tree. If we
> fix a bug in one of these things, the other 2 are so different that if
> the bug is in the others we'll never know and it's not easy to check
> so people won't do it.
>
> So please do this crap right.
I strongly agree. Nowadays, both powerpc and sparc use an in-memory copy
of the tree (wether you use the flattened format during the trampoline
from OF runtime to the kernel or not is a different matter, we created
that for the sake of kexec and embedded devices with no real OF, but the
end result is the same, a kernel based tree structure).
There is already powerpc's /proc/device-tree and sparc's openpromfs, I'm
all about converging that to a single implementation (a filesystem is
fine) that uses the in-memory tree.
Ben.
> I would strongly suggest looking at things like
> arch/{sparc,sparc64,powerpc}/kernel/prom.c and
> include/asm-{sparc,sparc64,powerpc}/prom.h and
> arch/{sparc,sparc64,powerpc}/kernel/of_device.c and
> include/asm-{sparc,sparc64,powerpc}/of_device.h
> since we've already invested a lot of thought and
> infrastructure into providing interfaces to this information
> on powerpc and the two sparc platforms.
In addition, I haven't given on the idea one day of actually merging the
powerpc and sparc implementation of a lot of that stuff. Mostly the
device-tree accessors proper, the of_device/of_platform bits etc... into
something like drivers/of1394 maybe.
Thus if i386 is going to have a device-tree, please use the same
interfaces.
Ben.
From: Benjamin Herrenschmidt <[email protected]>
Date: Tue, 02 Jan 2007 14:45:31 +1100
> In addition, I haven't given on the idea one day of actually merging the
> powerpc and sparc implementation of a lot of that stuff. Mostly the
> device-tree accessors proper, the of_device/of_platform bits etc... into
> something like drivers/of1394 maybe.
>
> Thus if i386 is going to have a device-tree, please use the same
> interfaces.
Yes a large amount of the code is identical.
In fact we've been finding bugs in each other's stuff along
the way, so the sooner we consolidate the sooner we stop
having to fix every bug twice :)
> The base interface function is callofw(), which is effectively identical
> to call_prom_ret() in arch/powerpc/kernel/prom_init.c . So it seems
> that PowerPC could use it. I suppose I could change the name of
> callofw() to call_prom_ret(), thus making the base interface identical
> to PowerPC's. All it does is argument marshalling, translating between
> C varargs argument lists and the OFW argarray format.
Except that none of the powerpc platforms can keep OF alive after the
kernel has booted, which is why we do an in-memory copy of the tree.
We have well defined interfaces to access that copy and you should do
the same on i386.
We also have defined mecanisms for non-OF firmwares to pass device-trees
that follow the OF defined bindings, mostly for embedded things (and
there are already some of these on the field).
> SPARC should be able to use that same base interface function directly.
> It is written to the standard OFW client interface. The x86 client
> interface that I tested it on is essentially the same code that is in
> OBP. It wouldn't work on ancient Sun machines with the sunmon romvec
> interface, but Sun stopped making such machines something like 16 years ago.
sparc64 also uses an in-memory copy of the tree nowadays.
> I did look at those files, until my eyes glazed over. In powerpc land,
> the files that are the underlayer for proc_devtree.c comprise 4700 lines
> of code (the files you list plus prom_init.c). In sparc land, it is
> only 3200 lines (the files you list plus the prom interface library).
> On top of that, proc_devtree.c is 233 lines.
>
> In contrast, ofw_fs.c is 261 lines, and the base interface function
> callofw() is 97 lines (half of them comments).
I think the number of lines of code involved here is totally irrelevant.
Especially if you just add together all the stuff in prom.c which isn't
entirely related to simply accessing the device-tree. You are just
over-simplifying. (And in fact, powerpc also carries some earlier
version of the interfaces that is deprecated but that we haven't removed
yet).
> Admittedly, this is something of an apples-to-oranges comparison,
> because ofw_fs only exports a read-only device tree and nothing else.
> But in the case where that is all you need, a direct interface to OFW
> that avoids the middleman seems like a good choice.
Except that your approach is only ever useable when there is a "live" OF
still present which isn't the case on powerpc.
> I did consider first creating a memory data structure identical to the
> powerpc/sparc one, but that looked like it was going to be essentially
> twice as much code for no extra capability.
The code to create it is fairly small and __init. Doesn't matter -that-
much.
> The code to traverse the
> device tree and create the memory data structure is roughly the same as
> the code to create the filesystem structure. I just didn't see the
> value of an intermediate representation for systems that don't otherwise
> need it. (A setup layer would have let me use proc_devtree.c directly,
> so the total amount of new code would have been the same, but many
> people told me that if I even suggested using procfs the kernel gurus
> would blow me out of the water without bothering to blink.)
There are some performances advantages, when walking the tree later on,
in not using direct OF accesses all the time (on platforms where that is
possible at all) too. In addition, the kernel implementation with the
in-memory tree adds some refcounting and locking while I suspect that to
access a real OF, you have to hard-single-thread everything.
Finally, you can't have something as simple as powerpc's get_property()
(that just returns a pointer to the property content) with direct OF
access unless you use some magic static buffer or some crap around those
lines, or add passing of a buffer in, so from a driver pointer of view,
the interface provided by powerpc/sparc is nicer.
There are additional issues on various platforms related to actually
passing buffers in/out OF, which cannot always access the full system
memory (think about OF running in 32 bits mode on some machines etc...).
> In the SPARC and PowerPC spaces, Open Firmware is widespread, so it
> makes sense for those kernels to use OFW extensively. In x86 land, OFW
> is far from being the dominant firmware, so the x86 kernel is unlikely
> to depend on OFW services at a deep level. That being the case, the
> deep-integration features of the sparc and powerpc OFW interfaces are
> not needed in x86 land. But a lightweight interface to the device tree
> is certainly useful for the platforms that do have OFW. It might be
> useful for other processors as well, especially on platforms that don't
> need the deep configurability that drove the OFW design.
I still don't agree with having yet another interface different to the
existing ones.
Ben.
> This is a trivial implementation that suits it's purpose.
> It's simple. I'm not sure what more is needed for this
> project when it's pretty clear that i386 will never need
> any additional support for open firmware.
I don't agree. It's definitely not clear to me.... Especially as open
source OF implementations are starting to show up, it makes a lot of
sense to start representing non-enumerable devices (CPUs, legacy
devices, embedded things etc...) in the device-tree and I fail to see
why an x86 OF-based machine would not get the ability to use it in the
same way as powerpc or sparc does in that regard.
Ben.
On Sun, 2006-12-31 at 19:37 -0800, David Kahn wrote:
> Folks,
>
> If we reused the current code in fs/proc/proc_devtree.c
> and re-wrote the underlying of_* routines for i386 only,
> (in the hope of removing the complexity not needed for
> this implementation) would that be an acceptable
> implementation?
>
> In other words, the of_* routines continue to define the
> interface between kernel and the firmware/OS
> layer. Although that code in proc_devtree.c defining
> the functions duplicate_name() and fixup_name() is still
> troubling to me.
The main problem with the current powerpc interface to the device-tree
(though as I said earlier, it's very handy for drivers) is that
get_property() doesn't take a buffer pointer. Thus, it's not very
suitable for an interface that involves calling into OF to retreive the
properties. It's really an interface designed around the idea that the
tree is in kernel memory, and the lifetime of the properties is tied to
the lifetime of the node.
> IMHO, the directory entries in the filesystem
> should be in the form "node-name@unit-address" (eg: /pci@1f,0,
> "pci" is the node name, "@" is the separator character defined
> by IEEE 1275, and "1f,0" is the unit-address,
> which are always guaranteed to be unique.
They should be. The problem is buggy OF implementations. For example,
both IBM and Apple OFs have the nasty habit of having under the CPU
nodes an "l2-cache" node with no unit-address -and- a property with the
same name (which contains just a phandle to that l2-cache node btw).
There are other examples too (some pmacs have a duplicate i2c bus with
some one of the copies containing only a subset of the devices)
In general, we don't fabricate the @unit-address part, we use OF's own
package-to-path (unlike sparc which I think doesn't always have that
method), and thus we have to deal with implementations that return no
unit-address or duplicate names.
> It's
> not possible to have two ambiguously fully qualified nodes in the OFW
> device tree, otherwise you would never be able to select
> a specific one by name.
Well, it happens to be the case though. The code is to work around that.
A normal bug-free tree should never trigger the workarounds.
Ben.
> The filesystem bit is for groveling around and getting information
> from the shell prompt, or shell scripts. Text processing.
>
> If you want the binary bits, export it with something like
> /dev/openprom. We don't generally export binary representation
> files out of /proc or /sys, in fact this rule I believe is layed
> our precisely somewhere at least in the sysfs case.
Well, on powerpc, we've always had it binary. We expose the files with
the exact binary content of the properties. We then use paulus' "lsprop"
tool which is installed by default on pretty much all ppc distros, which
duplicates OF's heuristics to display property contents (as strings, hex
values or mix of both depending on that they contain).
It has proved a good idea in general as I can easily get an exact
device-tree dump from users by asking for a tarball of /proc/device-tree
and in some case, the data in there -is- binary (For example, the EDID
properties for monitors left by video drivers, or things like that).
Ben.
From: Benjamin Herrenschmidt <[email protected]>
Date: Tue, 02 Jan 2007 15:05:59 +1100
> It has proved a good idea in general as I can easily get an exact
> device-tree dump from users by asking for a tarball of /proc/device-tree
> and in some case, the data in there -is- binary (For example, the EDID
> properties for monitors left by video drivers, or things like that).
Yes and with openpromfs I can get the EDID too :-)
root@sunset:/proc/openprom/pci@1f,700000/SUNW,XVR-100@3# cat edid
00ffffff.ffffff00.4dd9d000.67175700.2d0d0103.0e321f78.eacea9a3.574c9926.19484cbd.ee80a940.81808140.01010101.01010101.0101734b.80a072b0.2a4080d0.1300ef35.1100001c.483f4030.62b03240.40c01300.ef351100.001e0000.00fd0037.411e5f14.000a2020.20202020.000000fc.0053444d.2d503233.32570a20.202000af
root@sunset:/proc/openprom/pci@1f,700000/SUNW,XVR-100@3#
I think there is high value in an OFW filesystem representation
that gives you _EXACTLY_ what the OFW command line prompt does
when you try to traverse the device tree from there, and that
is what openpromfs tries to do.
If you want raw access, use a character device or a similar auxilliary
access to the data items. Another idea is to provide a seperate file
operation (such as ioctl) on the OFW property files in order to fetch
things raw and in binary.
When I get some binary data out of a procfs or sysfs file I feel like
strangling somebody. I'm grovelling around in a filesystem from the
command line so that I can get some information as a user. If you
don't give me text I can't tell what the heck it is.
Simple system tools should not need to interpret binary data in
order to provide access to simple structured data like this, that's
just stupid.
> I think there is high value in an OFW filesystem representation
> that gives you _EXACTLY_ what the OFW command line prompt does
> when you try to traverse the device tree from there, and that
> is what openpromfs tries to do.
Except that every OFW implementation I have here shows you different
things for the same original binary data :-)
> If you want raw access, use a character device or a similar auxilliary
> access to the data items. Another idea is to provide a seperate file
> operation (such as ioctl) on the OFW property files in order to fetch
> things raw and in binary.
>
> When I get some binary data out of a procfs or sysfs file I feel like
> strangling somebody. I'm grovelling around in a filesystem from the
> command line so that I can get some information as a user. If you
> don't give me text I can't tell what the heck it is.
>
> Simple system tools should not need to interpret binary data in
> order to provide access to simple structured data like this, that's
> just stupid.
I would agree with you if the data was properly typed in the first place
but it's not, thus you end up with heuristics and I hate heuristics in
the kernel :-) Now, that's also why everybody on ppc has "lsprop" at
hand which does the "pretty printing" thing.
I like being able to have a simple way (ie. tar /proc/device-tree) to
tell user to send me their DT and have in the end an exact binary
representation so I can actually dig for problems, like a wrong phandle
in an interrupt-map or stuff like that...
Cheers,
Ben.
From: Benjamin Herrenschmidt <[email protected]>
Date: Tue, 02 Jan 2007 15:57:05 +1100
> I like being able to have a simple way (ie. tar /proc/device-tree) to
> tell user to send me their DT and have in the end an exact binary
> representation so I can actually dig for problems, like a wrong phandle
> in an interrupt-map or stuff like that...
"prtconf -pv" is what I'd ask the user to do on Sparc, or something
similar.
In over 10 years of the sparc port there's never been a situation
where "prtconf -pv" or similar did not get me the information I
needed. :-)
"prtconf" walks the device tree raw using /dev/openprom and
pretty prints it like I assume your ppc "lsprop" thing does.
On Mon, 2007-01-01 at 21:01 -0800, David Miller wrote:
> From: Benjamin Herrenschmidt <[email protected]>
> Date: Tue, 02 Jan 2007 15:57:05 +1100
>
> > I like being able to have a simple way (ie. tar /proc/device-tree) to
> > tell user to send me their DT and have in the end an exact binary
> > representation so I can actually dig for problems, like a wrong phandle
> > in an interrupt-map or stuff like that...
>
> "prtconf -pv" is what I'd ask the user to do on Sparc, or something
> similar.
>
> In over 10 years of the sparc port there's never been a situation
> where "prtconf -pv" or similar did not get me the information I
> needed. :-)
>
> "prtconf" walks the device tree raw using /dev/openprom and
> pretty prints it like I assume your ppc "lsprop" thing does.
Probably. The question now is that if we want to somewhat converge, what
to do... either change sparc habits or change powerpc habits :-) I'll
let that fight happen between you and paulus and watch while having a
beer at LCA though :-)
Ben.
From: Benjamin Herrenschmidt <[email protected]>
Date: Tue, 02 Jan 2007 16:09:14 +1100
> Probably. The question now is that if we want to somewhat converge, what
> to do... either change sparc habits or change powerpc habits :-) I'll
> let that fight happen between you and paulus and watch while having a
> beer at LCA though :-)
compromise is always possible, especially over beer :)
>> It has proved a good idea in general as I can easily get an exact
>> device-tree dump from users by asking for a tarball of
>> /proc/device-tree
>> and in some case, the data in there -is- binary (For example, the EDID
>> properties for monitors left by video drivers, or things like that).
>
> Yes and with openpromfs I can get the EDID too :-)
>
> root@sunset:/proc/openprom/pci@1f,700000/SUNW,XVR-100@3# cat edid
> 00ffffff.ffffff00.4dd9d000.67175700.2d0d0103.0e321f78.eacea9a3.574c9926
> .19484cbd.ee80a940.81808140.01010101.01010101.0101734b.80a072b0.2a4080d
> 0.1300ef35.1100001c.483f4030.62b03240.40c01300.ef351100.001e0000.00fd00
> 37.411e5f14.000a2020.20202020.000000fc.0053444d.2d503233.32570a20.20200
> 0af
> root@sunset:/proc/openprom/pci@1f,700000/SUNW,XVR-100@3#
And how do you know from this output that the property
didn't contain this stuff as a text string?
> I think there is high value in an OFW filesystem representation
> that gives you _EXACTLY_ what the OFW command line prompt does
> when you try to traverse the device tree from there, and that
> is what openpromfs tries to do.
I suppose you're referring to the OF .properties command,
it's trivial to get at the *actual property contents* from
the OF prompt too.
.properties is a OF _user interface_ command, it is a helper
thing to show the properties in a more easily human digestible
format. The analogue in Linux would be a user land tool.
> If you want raw access, use a character device or a similar auxilliary
> access to the data items. Another idea is to provide a seperate file
> operation (such as ioctl) on the OFW property files in order to fetch
> things raw and in binary.
If user space can get raw access to the data via a device
file, there shouldn't be a filesystem at all, just some
tool that interfaces to the device. Since the data we're
representing is naturally tree-based though, a filesystem
is a much nicer solution than a character device with a
bunch of ioctls.
> When I get some binary data out of a procfs or sysfs file I feel like
> strangling somebody.
Right, let's encode PCI config space (all of it, not just the
few standard fields) into 256 files config-00, ..., config-ff.
Superb plan!
The fact is, if you get some data (a property value, or the
contents of config space), that you do not know what it means
or how it is structured even, there is no good way to represent
it other than passing it through as-is. Or you can guess, and
guess wrong (sometimes at least). Scripts parsing your output
will have to read all formats you produce and transform it back
into something usable -- they can *never* use it as-is, they
can't assume some property will always be text for example (be
text according to your heuristic, that is). Oh, except the format
you chose cannot be transformed back at all.
Another example of that then: I'd like to _see_ the difference
between (as hex bytes) "61 62 63" and "61 62 63 00" -- if only
to find bugs in OF code, or whatever. With the property_show()
thing in openpromfs, I can't tell (well maybe the first one
crashes on the strlen() call, if prop->val doesn't get an extra
0 tagged onto the end during creation, I didn't check).
> I'm grovelling around in a filesystem from the
> command line so that I can get some information as a user. If you
> don't give me text I can't tell what the heck it is.
Sure, but you don't need the kernel to present it as text, use
some helper thing instead, that can use *much* better heuristics
(like .properties in OF actually does) to decide on matters.
> Simple system tools should not need to interpret binary data in
> order to provide access to simple structured data like this, that's
> just stupid.
The structure within a property is context-dependent, it isn't
simple at all (in general). And the binary data doesn't need
interpretation, the text representation of it does though --
interpretation back to usable binary form.
Segher
>> There is one big problem: text representation is useless
>> (to scripts etc.) unless it can be transformed back to binary;
>> i.e., it has to be possible to reliably detect _how_ some
>> property is represented into text, something that cannot be
>> done with how openpromfs handles it.
>
> Text is text is text for many propertiers,
Sure some properties contain (or _should_ contain!) just
one simple text string. You can simply "cat" those when
they are binary in the filesystem too FWIW.
> in particular
> the ones you actually end up wanting to modify.
But that is just one simple use of the filesystem. One that
doesn't work at all on PowerPC btw -- at kernel run time,
we don't have access to OF at all anymore.
> In order for a problem to exist, you have to show counter
> examples where the problem triggers and something fails.
>
> What in userspace wants to modify a OFW property, which
> is not text?
I never was talking about modifying. Most things that
most users want to modify aren't normal OF properties
anyway, but configuration variables. In some/many OF
implementations updating those via the device tree doesn't
work.
> In my experience all such cases are limited to ASCII text
> valued properties, such as device aliases, environment
> variables, and things like nvramrc.
If the text representation in the file system was unambiguous,
it wouldn't be useless for scripts anymore, merely very
inconvenient.
Segher
>> So please do this crap right.
>
> I strongly agree. Nowadays, both powerpc and sparc use an in-memory
> copy
> of the tree (wether you use the flattened format during the trampoline
> from OF runtime to the kernel or not is a different matter, we created
> that for the sake of kexec and embedded devices with no real OF, but
> the
> end result is the same, a kernel based tree structure).
Are you really suggesting that using a kernel copy of the
device tree is the correct thing to do, and the only correct
thing to do -- with the sole argument that "that's what the
current ports do"?
> There is already powerpc's /proc/device-tree and sparc's openpromfs,
> I'm
> all about converging that to a single implementation (a filesystem is
> fine)
We all agree on that, the OLPC people too, they just didn't
have time yet.
> that uses the in-memory tree.
...but to that I can't agree.
Segher
> In addition, I haven't given on the idea one day of actually merging
> the
> powerpc and sparc implementation of a lot of that stuff. Mostly the
> device-tree accessors proper, the of_device/of_platform bits etc...
> into
> something like drivers/of1394 maybe.
1394? :-)
> Thus if i386 is going to have a device-tree, please use the same
> interfaces.
Such a hypothetical "converged" interface would not be
identical to either the current SPARC or the PowerPC
code, nor just a mix of those two. It's a great plan
to try to get to this eventually, but it will take time.
A lot of it perhaps.
Segher
> Except that none of the powerpc platforms can keep OF alive after the
> kernel has booted, which is why we do an in-memory copy of the tree.
Adding that functionality hasn't gotten easier at all since
we use the flattened tree for everything, heh.
> We have well defined interfaces to access that copy and you should do
> the same on i386.
There should be a well-defined "C-style" interface, yes. But it
does _not_ have to be a copy of the data and it does _not_ have
to be the same interface.
You also can't require "i386" to add that stuff right now, when
they only have one user (the fs code) that is perfectly happy
calling the low-level interface for the three or four calls it
needs.
> We also have defined mecanisms for non-OF firmwares to pass
> device-trees
> that follow the OF defined bindings, mostly for embedded things (and
> there are already some of these on the field).
Yeah, and that's a great thing. Any "unified" interface we
come up with should support both those "flat trees", and
"real OF trees", but also "real, still alive during kernel
run-time, OF".
> There are some performances advantages, when walking the tree later on,
> in not using direct OF accesses all the time (on platforms where that
> is
> possible at all) too.
Yes. A problem that can easily be solved by a little cache
thing (for the device nodes, not even the property data) if
it turns out to be a real advantage.
> In addition, the kernel implementation with the
> in-memory tree adds some refcounting and locking while I suspect that
> to
> access a real OF, you have to hard-single-thread everything.
Not single thread -- but a "global OF lock" yes. Not that
it matters too much, (almost) all property accesses are init
time anyway (which is effectively single threaded).
> Finally, you can't have something as simple as powerpc's get_property()
> (that just returns a pointer to the property content) with direct OF
> access unless you use some magic static buffer or some crap around
> those
> lines, or add passing of a buffer in, so from a driver pointer of view,
> the interface provided by powerpc/sparc is nicer.
But since you have a _put() function anyway, for your reference
counting, having magic (automatically allocated, not static of
course) buffers works just fine.
> There are additional issues on various platforms related to actually
> passing buffers in/out OF, which cannot always access the full system
> memory (think about OF running in 32 bits mode on some machines
> etc...).
Or simply using a 32-bit client interface, yes. You can sidestep that
issue by not having OF around at all anymore by the time the kernel
proper starts, but that's not a good argument for not running with OF
still alive.
> I still don't agree with having yet another interface different to the
> existing ones.
Yes, the interfaces should be consolidated (at least as far as it
makes sense, and people can reach agreement) -- let's get there step
by step please, don't say "you can't do an OF interface in the kernel
since it's not identical to ours and you didn't do all the necessary
work to merge and fix everything up front" :-)
So let's decide on the text-vs.-binary fs thing first...
Segher
>> IMHO, the directory entries in the filesystem
>> should be in the form "node-name@unit-address" (eg: /pci@1f,0,
>> "pci" is the node name, "@" is the separator character defined
>> by IEEE 1275, and "1f,0" is the unit-address,
>> which are always guaranteed to be unique.
>
> They should be. The problem is buggy OF implementations. For example,
> both IBM and Apple OFs have the nasty habit of having under the CPU
> nodes an "l2-cache" node with no unit-address -and- a property with the
> same name
That is perfectly valid FWIW. Not a "best practice" or anything,
but valid nonetheless.
Device tree semantics do not fit POSIX filesystem semantics 100%,
you do need some workarounds for some edge cases yes.
>> It's
>> not possible to have two ambiguously fully qualified nodes in the OFW
>> device tree, otherwise you would never be able to select
>> a specific one by name.
>
> Well, it happens to be the case though. The code is to work around
> that.
> A normal bug-free tree should never trigger the workarounds.
Well it's not *technically* a bug to have two device nodes with
an exact identical path in OF, but sure :-)
Segher
>> Simple system tools should not need to interpret binary data in
>> order to provide access to simple structured data like this, that's
>> just stupid.
>
> I would agree with you if the data was properly typed in the first
> place
> but it's not,
OF device tree properties are "properly typed" just fine -- it's
just that only the intended consumers of the data know what to
expect, you certainly can't derive the data structures from just
looking at the data in one instance of a property.
Put another way, if you know what you are looking for in the
device tree, you can decode a property just fine -- if you're
given a random property on the other hand, you can never get
any better than the generic property structure of "array of bytes".
Segher
* Segher Boessenkool <[email protected]> [070102 12:37]:
> Are you really suggesting that using a kernel copy of the
> device tree is the correct thing to do, and the only correct
> thing to do -- with the sole argument that "that's what the
> current ports do"?
There might also be a speed advantage if any drivers or user space
programs operate heavily on the device tree on those systems with _lots_
of devices..
Stefan
--
coresystems GmbH • Brahmsstr. 16 • D-79104 Freiburg i. Br.
Tel.: +49 761 7668825 • Fax: +49 761 7664613
Email: [email protected] • http://www.coresystems.de/
Benjamin Herrenschmidt wrote:
>> This is a trivial implementation that suits it's purpose.
>> It's simple. I'm not sure what more is needed for this
>> project when it's pretty clear that i386 will never need
>> any additional support for open firmware.
>
> I don't agree. It's definitely not clear to me.... Especially as open
If Linuxbios has its way then this is indeed not true. We are in the
middle of our design for V3 and the hardware representation will be
based on a device-tree description. From that we plan to auto generate
the various tables that i386 stuff uses. But ideally the kernel could
get all it needed from the device-tree.
At least 1 major manufacturer is interested in providing LinuxBios on
some of its motherboards and AMD supports LinuxBios so in the near
future x86 should have a real reason to have full device-tree support.
--
Richard A. Smith.
On Tue, 2007-01-02 at 12:45 +0100, Segher Boessenkool wrote:
> > In addition, I haven't given on the idea one day of actually merging
> > the
> > powerpc and sparc implementation of a lot of that stuff. Mostly the
> > device-tree accessors proper, the of_device/of_platform bits etc...
> > into
> > something like drivers/of1394 maybe.
>
> 1394? :-)
Yeah, I'm tired, of1275 :-)
> > Thus if i386 is going to have a device-tree, please use the same
> > interfaces.
>
> Such a hypothetical "converged" interface would not be
> identical to either the current SPARC or the PowerPC
> code, nor just a mix of those two. It's a great plan
> to try to get to this eventually, but it will take time.
> A lot of it perhaps.
Ben.
On Tue, 2007-01-02 at 12:37 +0100, Segher Boessenkool wrote:
> >> So please do this crap right.
> >
> > I strongly agree. Nowadays, both powerpc and sparc use an in-memory
> > copy
> > of the tree (wether you use the flattened format during the trampoline
> > from OF runtime to the kernel or not is a different matter, we created
> > that for the sake of kexec and embedded devices with no real OF, but
> > the
> > end result is the same, a kernel based tree structure).
>
> Are you really suggesting that using a kernel copy of the
> device tree is the correct thing to do, and the only correct
> thing to do -- with the sole argument that "that's what the
> current ports do"?
Well, there are reasons why that's what the current ports do :-)
We could of course have the interface work either on a copy of the tree
or on a real OF (though that means changing things like get_property on
powerpc and fixing the gazillions of users) but I tend to think that
working on a copy always is more efficient.
Ben.
> We could of course have the interface work either on a copy of the tree
> or on a real OF (though that means changing things like get_property on
> powerpc and fixing the gazillions of users) but I tend to think that
> working on a copy always is more efficient.
>
>
The patch that I posted creates a copy in the virtual filesystem
domain. So the efficiency issue is moot.
The filesystem domain copy is smaller than the in-memory tree, as it
omits a lot of unnecessary fields.
On Tue, 2007-01-02 at 13:22 +0100, Segher Boessenkool wrote:
> > Except that none of the powerpc platforms can keep OF alive after the
> > kernel has booted, which is why we do an in-memory copy of the tree.
>
> Adding that functionality hasn't gotten easier at all since
> we use the flattened tree for everything, heh.
Not for everything, only for the trampoline between the client interface
environment and the kernel. But yeah, we haven't tried hard to go in
that direction at all.
.../...
> Not single thread -- but a "global OF lock" yes. Not that
> it matters too much, (almost) all property accesses are init
> time anyway (which is effectively single threaded).
Not that true anymore. A lot of driver probe is being threaded nowadays,
either bcs of the new multithread probing bits, or because they get
loaded by userland from some initramfs etc..
> > Finally, you can't have something as simple as powerpc's get_property()
> > (that just returns a pointer to the property content) with direct OF
> > access unless you use some magic static buffer or some crap around
> > those
> > lines, or add passing of a buffer in, so from a driver pointer of view,
> > the interface provided by powerpc/sparc is nicer.
>
> But since you have a _put() function anyway, for your reference
> counting, having magic (automatically allocated, not static of
> course) buffers works just fine.
You can maybe create an in-memory cache of all properties in a node from
whatever function that returns a node pointer and dispose of them on
_put(), that would allow to keep the get_property semantics as-is with a
"live" tree, but I still don't see the point
Ben.
On Tue, 2007-01-02 at 10:11 -1000, Mitch Bradley wrote:
>
> > We could of course have the interface work either on a copy of the tree
> > or on a real OF (though that means changing things like get_property on
> > powerpc and fixing the gazillions of users) but I tend to think that
> > working on a copy always is more efficient.
> >
> >
> The patch that I posted creates a copy in the virtual filesystem
> domain. So the efficiency issue is moot.
>
> The filesystem domain copy is smaller than the in-memory tree, as it
> omits a lot of unnecessary fields.
"a lot" ? Hrm... et me see... a device-node contains the pointers for
the tree structure and the property list ...
What other fields ? We do have indeed a void * on powerpc which is handy
for the PCI code in there but I might get rid of it at one point and we
have a copy of the phandle which is useful as soon as you try to parse
things like the interrupt-tree.
Now, you can have that, and then have a filesystem capable of
instanciating dentries & inodes on the fly based on that structure (and
thus purge them on memory pressure too. In which case the footprint is
actually smaller since inodes/dentries are big, thus it's a good thing
to be able to purge them and re-create them on the fly.
I do object basically to having something that doesn't also provide
in-kernel interfaces to access the device nodes & properties. I don't
agree with the reasoning that x86 will never need it. Now, we have
currently two slightly different interfaces, on powerpc and sparc, to
access them, and that's I think we should try to converge and use the
same interface for x86.
In addition, as sparc64 also moved to an in-memory copy of the tree, I
tend to be fairly convinced that we should also move toward the same
-implementation- also based on an in-memory copy of the tree which is
more efficient (and doesn't use a significant amount of RAM).
Ben.
>> Are you really suggesting that using a kernel copy of the
>> device tree is the correct thing to do, and the only correct
>> thing to do -- with the sole argument that "that's what the
>> current ports do"?
>
> Well, there are reasons why that's what the current ports do :-)
Sure. It might have been a good choice for the current two ports
(probably was even, at least at that point in time the choice was
made -- doesn't mean you can't ever step back on it).
That doesn't mean it's a good choice for everything and certainly
it's not the only realistic choice for everything.
> We could of course have the interface work either on a copy of the tree
> or on a real OF
Yes. I'd like to steer in that direction eventually.
> (though that means changing things like get_property on
> powerpc
You can keep the function name, and have it redirect through
a struct device_tree_ops or similar.
> and fixing the gazillions of users)
which isn't needed if we can manage to keep the function arguments
identical.
We also can keep the old names as compatibility accessors for
PowerPC only for a while, until everything is converted -- SPARC
can do the same then. You can't really keep the old PowerPC names
in kernel-wide code anyway, "get_property" is a way too generic
name for that for example.
> but I tend to think that
> working on a copy always is more efficient.
In general, nothing using the device tree cares about a few
cycles extra (well, a few thousand perhaps). The sole exception
would seem to be the filesystem; and it could easily keep a cache,
one with a global invalidate (via callback?) on any device tree
change even -- changes are infrequent, and basically non-existent
after early kernel boot. Does any generic such cache for all
simple filesystems exist already?
Segher
>> Not single thread -- but a "global OF lock" yes. Not that
>> it matters too much, (almost) all property accesses are init
>> time anyway (which is effectively single threaded).
>
> Not that true anymore. A lot of driver probe is being threaded
> nowadays,
> either bcs of the new multithread probing bits, or because they get
> loaded by userland from some initramfs etc..
The kernel doesn't care if one CPU is in OF land while the others
are doing other stuff -- well you have to make sure the OF won't
try to use a hardware device at the same time as the kernel, true.
>>> Finally, you can't have something as simple as powerpc's
>>> get_property()
>>> (that just returns a pointer to the property content) with direct OF
>>> access unless you use some magic static buffer or some crap around
>>> those
>>> lines, or add passing of a buffer in, so from a driver pointer of
>>> view,
>>> the interface provided by powerpc/sparc is nicer.
>>
>> But since you have a _put() function anyway, for your reference
>> counting, having magic (automatically allocated, not static of
>> course) buffers works just fine.
>
> You can maybe create an in-memory cache of all properties in a node
> from
> whatever function that returns a node pointer and dispose of them on
> _put(), that would allow to keep the get_property semantics as-is with
> a
> "live" tree, but I still don't see the point
I'm a bit concerned about the 100kB or so of data duplication
(on a *quite big* device tree), and the extra code you need
(all changes have to be done to both tree copies). Maybe
I shouldn't be worried; still, it's obviously not a great
idea to *require* any arch to get and keep a full copy of
the tree -- it's wasteful and unnecessary.
Segher
> The kernel doesn't care if one CPU is in OF land while the others
> are doing other stuff -- well you have to make sure the OF won't
> try to use a hardware device at the same time as the kernel, true.
That statement alone hides an absolute can of worms btw ;-)
> I'm a bit concerned about the 100kB or so of data duplication
> (on a *quite big* device tree), and the extra code you need
> (all changes have to be done to both tree copies). Maybe
> I shouldn't be worried; still, it's obviously not a great
> idea to *require* any arch to get and keep a full copy of
> the tree -- it's wasteful and unnecessary.
Well, big device-trees generally are on big machines with enough memory
not to care and the only platform I know where the DT can actually
change over time is IBM pSeries when doing DLPAR, in which case, OF is
dead, it all happens via magic HV/RTAS calls and the kernel is
-supposed- to maintain it's own copy and add/remove nodes from it.
Ben.
> I do object basically to having something that doesn't also provide
> in-kernel interfaces to access the device nodes & properties.
That's the wrong way around. Work is underway to instead
have the devicetreefs *use* the in-kernel interfaces. Would
that be acceptable?
> I don't
> agree with the reasoning that x86 will never need it.
Neither do I :-)
> Now, we have
> currently two slightly different interfaces, on powerpc and sparc, to
> access them, and that's I think we should try to converge and use the
> same interface for x86.
All should converge on the same interface. That does not
ab initio mean we should converge on what you currently
have (although that might eventually be that case).
> In addition, as sparc64 also moved to an in-memory copy of the tree, I
> tend to be fairly convinced that we should also move toward the same
> -implementation- also based on an in-memory copy of the tree which is
> more efficient (and doesn't use a significant amount of RAM).
Leaving aside the issue of in-memory or not, I don't think
it is realistic to think any completely common implementation
will work for this -- it might for current SPARC+PowerPC+OLPC,
but more stuff will be added over time...
Segher
>> The kernel doesn't care if one CPU is in OF land while the others
>> are doing other stuff -- well you have to make sure the OF won't
>> try to use a hardware device at the same time as the kernel, true.
>
> That statement alone hides an absolute can of worms btw ;-)
Oh I know. With a sane OF implementation, things will work
out fine though.
>> I'm a bit concerned about the 100kB or so of data duplication
>> (on a *quite big* device tree), and the extra code you need
>> (all changes have to be done to both tree copies). Maybe
>> I shouldn't be worried; still, it's obviously not a great
>> idea to *require* any arch to get and keep a full copy of
>> the tree -- it's wasteful and unnecessary.
>
> Well, big device-trees generally are on big machines with enough memory
> not to care and the only platform I know where the DT can actually
> change over time is IBM pSeries when doing DLPAR, in which case, OF is
> dead, it all happens via magic HV/RTAS calls and the kernel is
> -supposed- to maintain it's own copy and add/remove nodes from it.
You're almost convincing me. I'll sleep on it a night.
Segher
> All should converge on the same interface. That does not
> ab initio mean we should converge on what you currently
> have (although that might eventually be that case).
Well, Dave and I will happen to be in the same place in a few weeks for
LCA so we might spend some time having a look there if we don't have any
better to do :-)
> Leaving aside the issue of in-memory or not, I don't think
> it is realistic to think any completely common implementation
> will work for this -- it might for current SPARC+PowerPC+OLPC,
> but more stuff will be added over time...
And ? I don't see why a mostly common implementations wouldn't work,
provided that we provide hooks in the right place.
It's pretty clear to me that the actual construction of the in-memory
tree will remain platform specific (powerpc has this flattened format
used for the trampoline for example and so far, I don't think other
platforms plan to use it, though it might be a good idea too :-) sparc
has "issues" related to firmwares that aren't quite OF, etc...)
But it's also clear that the in-kernel representation, accessors and
filesystem could/should be totally identical, including all we build on
top, like prom_parse, of_device/of_platform device stuff etc.. (for
which I need to re-sync with davem too btw, as he did some fixes that I
didn't backport to powerpc... sigh)
The other -one- thing that has to be different is the write back for
properties that can be changed (/options typically) where the write back
mecanism is definitely platform specific.
Ben.
From: Segher Boessenkool <[email protected]>
Date: Tue, 2 Jan 2007 22:15:50 +0100
> We also can keep the old names as compatibility accessors for
> PowerPC only for a while, until everything is converted -- SPARC
> can do the same then. You can't really keep the old PowerPC names
> in kernel-wide code anyway, "get_property" is a way too generic
> name for that for example.
That's a simple change for PPC, sed 's/get_property/of_get_property//'
and that's what sparc used since the beginning of it's in-kernel
copy of the device tree implementation.
From: Segher Boessenkool <[email protected]>
Date: Tue, 2 Jan 2007 22:28:21 +0100
> >> Not single thread -- but a "global OF lock" yes. Not that
> >> it matters too much, (almost) all property accesses are init
> >> time anyway (which is effectively single threaded).
> >
> > Not that true anymore. A lot of driver probe is being threaded
> > nowadays,
> > either bcs of the new multithread probing bits, or because they get
> > loaded by userland from some initramfs etc..
>
> The kernel doesn't care if one CPU is in OF land while the others
> are doing other stuff -- well you have to make sure the OF won't
> try to use a hardware device at the same time as the kernel, true.
True, but at the very least you have to prevent multiple cpus
from enterring OFW. In fact this is very important.
OFW is not multi-threaded therefore you can't let multiple CPUs call
into OFW at one time. You must use some kind of locking mechanism,
and that locking mechanism is not simple because it has to not just
stop the other cpus, it has to be able to stop the other cpus yet
still allow them to receive SMP cross-calls from the firmware if the
OFW call is 'stop' or similar.
> I'm a bit concerned about the 100kB or so of data duplication
> (on a *quite big* device tree), and the extra code you need
> (all changes have to be done to both tree copies). Maybe
> I shouldn't be worried; still, it's obviously not a great
> idea to *require* any arch to get and keep a full copy of
> the tree -- it's wasteful and unnecessary.
The largest amount of memory I've ever seen consumed on sparc64
was 76K and this is 1) 64-bit and 2) an ENORMOUS machine with
lots of cpus and devices. And I know because sparc64 prints
a kernel message at boot which states how much memory was
consumed by the in-kernel device tree copy.
That ENORMOUS machine also had a lot of ram.
76K, that's just over 8 FREAKIN PAGES. It's nothing!
I think all this worry is overblown, terribly so in fact.
Please let's get over this memory consumption non-issue and move
on to more productive talk.
From: Segher Boessenkool <[email protected]>
Date: Tue, 2 Jan 2007 22:37:32 +0100
> Leaving aside the issue of in-memory or not, I don't think
> it is realistic to think any completely common implementation
> will work for this -- it might for current SPARC+PowerPC+OLPC,
> but more stuff will be added over time...
I see nothing supporting this IMHO bogus claim.
If you can traverse the device tree using OFW calls, you
can do it to build the in-kernel copy of the tree too.
How the tree is populated, etc., is not an issue for the
common code, for sure. Each platform does that in whatever
way is appropriate.
But the tree traversal, getting property values, etc. is indeed
a task for the common code and that is exactly what Ben is
suggesting.
From: Segher Boessenkool <[email protected]>
Date: Tue, 2 Jan 2007 22:40:17 +0100
> >> The kernel doesn't care if one CPU is in OF land while the others
> >> are doing other stuff -- well you have to make sure the OF won't
> >> try to use a hardware device at the same time as the kernel, true.
> >
> > That statement alone hides an absolute can of worms btw ;-)
>
> Oh I know. With a sane OF implementation, things will work
> out fine though.
Note that on PPC they do not co-exist with the OFW after
the kernel boots up because it's an enormous can of worms
if not downright impossible.
If I didn't need to make certain OFW calls after bootup on
sparc I'd throw away the OFW mappings and image too.
With software replaced TLBs it's an enormous hassle to
coexist safely with the OFW image. The locking is a
second order issue.
>> Leaving aside the issue of in-memory or not, I don't think
>> it is realistic to think any completely common implementation
>> will work for this -- it might for current SPARC+PowerPC+OLPC,
>> but more stuff will be added over time...
>
> And ? I don't see why a mostly common implementations wouldn't work,
> provided that we provide hooks in the right place.
Now read back and see that that is very close to what I said.
> It's pretty clear to me that the actual construction of the in-memory
> tree will remain platform specific (powerpc has this flattened format
> used for the trampoline for example and so far, I don't think other
> platforms plan to use it, though it might be a good idea too :-) sparc
> has "issues" related to firmwares that aren't quite OF, etc...)
>
> But it's also clear that the in-kernel representation, accessors and
> filesystem could/should be totally identical, including all we build on
> top, like prom_parse, of_device/of_platform device stuff etc.. (for
> which I need to re-sync with davem too btw, as he did some fixes that I
> didn't backport to powerpc... sigh)
The biggest problem is the huge collection of workarounds we have
for PowerPC alone already -- if that can be moved into some
quirk collection thing, where certain quirks are only run on some
systems, it might scale.
You'll also have to deal with endianness finally (you can *not*
access an integer property via an int*).
It will be easiest to start with a biggish collection of hooks,
that doesn't require too much code change, and slowly converge
stuff.
> The other -one- thing that has to be different is the write back for
> properties that can be changed (/options typically) where the write
> back
> mecanism is definitely platform specific.
All properties can be changed, any new property can be created.
Oh you mean after you killed OF -- yeah, it gets a bit harder
then eh :-)
Segher
> The biggest problem is the huge collection of workarounds we have
> for PowerPC alone already -- if that can be moved into some
> quirk collection thing, where certain quirks are only run on some
> systems, it might scale.
Well, if you notice, I've been moving most of such workarounds to
prom_init.c, that is, to the code that actually fetches the device-tree
from the firmware, which make things easier.
There are still a few, notably in prom_parse.c, but I'm pretty confident
they aren't too invasive (and I need to backport more fixes from david
there) and in macio_asic, but the later is specific enough to not be a
problem.
> You'll also have to deal with endianness finally (you can *not*
> access an integer property via an int*).
Yes, that's one big thing we'll need to do.
> It will be easiest to start with a biggish collection of hooks,
> that doesn't require too much code change, and slowly converge
> stuff.
Hooks ? How so ?
It's easier to start merging powerpc and sparc I reckon and then, "fix"
that so that it works on x86 :-)
> All properties can be changed, any new property can be created.
> Oh you mean after you killed OF -- yeah, it gets a bit harder
> then eh :-)
You know very well what I meant :-) The ones where it makes sense to
write them back to OF. On machines where OF dies, there are mecanism via
the nvram to store properties in /options (like boot-device etc...),
there are at least 2 such mecanisms (apple oldworld and chrp), and on
machines where OF is still alive, that should probably be an OF call of
some sort.
So we do want some sort of platform hook for -these-, but at the same
time, that's fairly low on the list. Currently, we have no code in the
kernel to deal with that on powerpc, it's all userland via /dev/nvram,
and I reckon it might just stay that way with platform specific userland
tools for a little longer.
Ben.
>>>> Not single thread -- but a "global OF lock" yes. Not that
>>>> it matters too much, (almost) all property accesses are init
>>>> time anyway (which is effectively single threaded).
>>>
>>> Not that true anymore. A lot of driver probe is being threaded
>>> nowadays,
>>> either bcs of the new multithread probing bits, or because they get
>>> loaded by userland from some initramfs etc..
>>
>> The kernel doesn't care if one CPU is in OF land while the others
>> are doing other stuff -- well you have to make sure the OF won't
>> try to use a hardware device at the same time as the kernel, true.
>
> True, but at the very least you have to prevent multiple cpus
> from enterring OFW. In fact this is very important.
Yes. "Global OF lock".
> OFW is not multi-threaded
You are not _guaranteed_ it is multithreaded, and you don't
know it's threading model (or how to do thread synchronisation).
> therefore you can't let multiple CPUs call
> into OFW at one time. You must use some kind of locking mechanism,
> and that locking mechanism is not simple because it has to not just
> stop the other cpus, it has to be able to stop the other cpus yet
> still allow them to receive SMP cross-calls from the firmware if the
> OFW call is 'stop' or similar.
YOu don't need to *stop* the other CPUs, you just need to
prevent them from entering the client interface. Put a lock
in front.
>> I'm a bit concerned about the 100kB or so of data duplication
>> (on a *quite big* device tree), and the extra code you need
>> (all changes have to be done to both tree copies). Maybe
>> I shouldn't be worried; still, it's obviously not a great
>> idea to *require* any arch to get and keep a full copy of
>> the tree -- it's wasteful and unnecessary.
>
> The largest amount of memory I've ever seen consumed on sparc64
> was 76K and this is 1) 64-bit and 2) an ENORMOUS machine with
> lots of cpus and devices. And I know because sparc64 prints
> a kernel message at boot which states how much memory was
> consumed by the in-kernel device tree copy.
The in-OF tree uses a bit more memory, depending on implementation.
It's hard to tell though, it contains so much more than the
properties-only tree, perhaps you're right.
> Please let's get over this memory consumption non-issue and move
> on to more productive talk.
Okay -- so answer the second part of my concern please: if you keep
a copy, you need to keep both in sync -- that means every change
by the kernel has to be done twice, and you won't ever be told about
changes by the OF, so you have to get a full fresh copy every single
time you return from an OF client call that could have changed a
property.
Segher
>> Leaving aside the issue of in-memory or not, I don't think
>> it is realistic to think any completely common implementation
>> will work for this -- it might for current SPARC+PowerPC+OLPC,
>> but more stuff will be added over time...
>
> I see nothing supporting this IMHO bogus claim.
Please keep in mind that not all systems want to kill OF
as soon as they enter the kernel -- some want to keep it
active basically forever (or only remove it when the user
asks for it).
Segher
[snipping a bit for now]
> It's easier to start merging powerpc and sparc I reckon
Well it won't hurt to merge and clean up these two first, sure.
> and then, "fix"
> that so that it works on x86 :-)
That works, if the goal is to just add x86/OLPC to the list of
platforms that have a device tree fs. I thought the plan was
to create a single, more generic, OF interface/API in the kernel
though.
Segher
On Jan 3 2007 01:52, Segher Boessenkool wrote:
>> > Leaving aside the issue of in-memory or not, I don't think
>> > it is realistic to think any completely common implementation
>> > will work for this -- it might for current SPARC+PowerPC+OLPC,
>> > but more stuff will be added over time...
>>
>> I see nothing supporting this IMHO bogus claim.
>
> Please keep in mind that not all systems want to kill OF
> as soon as they enter the kernel -- some want to keep it
> active basically forever (or only remove it when the user
> asks for it).
Kill OF? sparc does not want that IMO, how else should I return to
the 'ok' prompt?
-`J'
--
From: Segher Boessenkool <[email protected]>
Date: Wed, 3 Jan 2007 01:48:02 +0100
> > therefore you can't let multiple CPUs call
> > into OFW at one time. You must use some kind of locking mechanism,
> > and that locking mechanism is not simple because it has to not just
> > stop the other cpus, it has to be able to stop the other cpus yet
> > still allow them to receive SMP cross-calls from the firmware if the
> > OFW call is 'stop' or similar.
>
> YOu don't need to *stop* the other CPUs, you just need to
> prevent them from entering the client interface. Put a lock
> in front.
That's not the issue.
If the global OFW lock disables interrupts, other cpus trying to get
that lock can't receive CPU cross calls since they are delivered via
interrupts, get it? That's the issue you need to be careful about.
> > Please let's get over this memory consumption non-issue and move
> > on to more productive talk.
>
> Okay -- so answer the second part of my concern please: if you keep
> a copy, you need to keep both in sync -- that means every change
> by the kernel has to be done twice, and you won't ever be told about
> changes by the OF, so you have to get a full fresh copy every single
> time you return from an OF client call that could have changed a
> property.
Sure, you need to call OFW when a property is changed, and we have
code to handle that perfectly fine in the sparc of_*() code.
There are mechanisms on the OFW implementations that do hot plugging
to learn about the OFW tree changes. On some sparc64 platforms,
for example, you have to do a "SUNW,foo-operation" OFW call when a
board is hot-plugged in an Ex000 enterprise box, and after that call
finishes successfully, you know where the new board is in the OFW
tree so you import everything underneath. You do the opposite for
a hot-unplug sequence.
Every platform is going to handle this differently.
But there is nothing about it that precludes doing an in-kernel
OFW tree.
Even the most brute-force implementation is possible. When any
hot-plug event occurs, validate the current in-kernel tree. It's that
easy.
I guess it's good to have someone like you, the dissenter in the
group, but Ben and I will snuff you out completely soon enough :-)))
From: Segher Boessenkool <[email protected]>
Date: Wed, 3 Jan 2007 01:52:06 +0100
> >> Leaving aside the issue of in-memory or not, I don't think
> >> it is realistic to think any completely common implementation
> >> will work for this -- it might for current SPARC+PowerPC+OLPC,
> >> but more stuff will be added over time...
> >
> > I see nothing supporting this IMHO bogus claim.
>
> Please keep in mind that not all systems want to kill OF
> as soon as they enter the kernel -- some want to keep it
> active basically forever (or only remove it when the user
> asks for it).
That's what we do on sparc32 and sparc64, I of course understand this.
From: Segher Boessenkool <[email protected]>
Date: Wed, 3 Jan 2007 02:14:34 +0100
> [snipping a bit for now]
>
> > and then, "fix"
> > that so that it works on x86 :-)
>
> That works, if the goal is to just add x86/OLPC to the list of
> platforms that have a device tree fs. I thought the plan was
> to create a single, more generic, OF interface/API in the kernel
> though.
Absolutely.
But let's get the first-order issue solved so the OLPC people
can get the functionality they need. That's the most practical
approach.
From: Jan Engelhardt <[email protected]>
Date: Wed, 3 Jan 2007 02:13:39 +0100 (MET)
>
> On Jan 3 2007 01:52, Segher Boessenkool wrote:
> >> > Leaving aside the issue of in-memory or not, I don't think
> >> > it is realistic to think any completely common implementation
> >> > will work for this -- it might for current SPARC+PowerPC+OLPC,
> >> > but more stuff will be added over time...
> >>
> >> I see nothing supporting this IMHO bogus claim.
> >
> > Please keep in mind that not all systems want to kill OF
> > as soon as they enter the kernel -- some want to keep it
> > active basically forever (or only remove it when the user
> > asks for it).
>
> Kill OF? sparc does not want that IMO, how else should I return to
> the 'ok' prompt?
PowerPC kills OF because it really has to, that's one of numerous
reasons that it started sucking the device tree into a kernel copy
early in the bootup and using that for device discovery etc.
To be honest, the 'ok' prompt is of limited value when you have
things like Alt-SysRq and PPC's XMON debugger in the kernel already.
In fact, the 'ok' prompt is an ENORMOUS pain in the ass to support
on machines with USB keyboards, because sharing the USB host
controller is beyond non-trivial. I've never implemented support
for that on sparc64 and I frankly have no desire to do the work
necessary to support that. It simply is not worth it.
> In fact, the 'ok' prompt is an ENORMOUS pain in the ass to support
> on machines with USB keyboards, because sharing the USB host
> controller is beyond non-trivial. I've never implemented support
> for that on sparc64 and I frankly have no desire to do the work
> necessary to support that. It simply is not worth it.
I was wondering about that .... :-)
Device sharing with a "live" OF is just an absolute pain in the ass and
I'm actually pretty happy not to have to do it. Segher and I, and more
recently, paulus and I, have been discussing about ways to deal with it
or make a version of SLOF (segher's pet OF implementation) that could
stay alive but the more I think about the burden, the more I'm inclined
to just give up...
There have been a recurring need for system vendors to provide
"firmware" type code to be called from the OS or to co-exist with the OS
though. Wether that's a good idea or not or wether those vendor
justifications for that are valid or not is of course debatable ;-)
Some of those attemptes resulted in horrors ranging from SMM BIOSes to
RTAS, via ACPI AML stuff or even worse, Apple's platform function
"scripts" in the device-tree.
I think every single of these approaches have proven that it actually
caused more problems than it solves.
Most of the time, the goal is to actually ease the OS work by
abstracting dodgy motherboard bits, like all the random IOs to toggle to
enable clocks on a given bus for power management etc... But every time,
it's been abused as a way to hide implementation details or hardware
specs, and everytime, bugs in those "firmware" provided blobs have
proven more damageable than having clear documentation for the HW and
proper driver code to deal with it.
Ben.
>>> therefore you can't let multiple CPUs call
>>> into OFW at one time. You must use some kind of locking mechanism,
>>> and that locking mechanism is not simple because it has to not just
>>> stop the other cpus, it has to be able to stop the other cpus yet
>>> still allow them to receive SMP cross-calls from the firmware if the
>>> OFW call is 'stop' or similar.
>>
>> YOu don't need to *stop* the other CPUs, you just need to
>> prevent them from entering the client interface. Put a lock
>> in front.
>
> That's not the issue.
>
> If the global OFW lock disables interrupts,
Why would it?
> other cpus trying to get
> that lock can't receive CPU cross calls since they are delivered via
> interrupts, get it? That's the issue you need to be careful about.
Yes I "get it", thank you. I don't see it as an insurmountable
problem though, even if it exists at all.
>> Okay -- so answer the second part of my concern please: if you keep
>> a copy, you need to keep both in sync -- that means every change
>> by the kernel has to be done twice, and you won't ever be told about
>> changes by the OF, so you have to get a full fresh copy every single
>> time you return from an OF client call that could have changed a
>> property.
>
> Sure, you need to call OFW when a property is changed, and we have
> code to handle that perfectly fine in the sparc of_*() code.
>
> There are mechanisms on the OFW implementations that do hot plugging
> to learn about the OFW tree changes. On some sparc64 platforms,
> for example, you have to do a "SUNW,foo-operation" OFW call when a
> board is hot-plugged in an Ex000 enterprise box, and after that call
> finishes successfully, you know where the new board is in the OFW
> tree so you import everything underneath. You do the opposite for
> a hot-unplug sequence.
>
> Every platform is going to handle this differently.
Yes.
> But there is nothing about it that precludes doing an in-kernel
> OFW tree.
Nothing _precludes_ doing that of course. And it is good
to have an in-kernel tree obviously, you pretty much need
one to make the connection between the OF device tree and
the kernel notion of devices. I'm just saying that you
shouldn't consider the kernel copy the "master", unless of
course you killed OF.
> Even the most brute-force implementation is possible. When any
> hot-plug event occurs, validate the current in-kernel tree. It's that
> easy.
On more events than just hot-plug, but yes. Now consider the
consequences: no part of the kernel can hang on to a reference
to a property, or a device node, while any such event can happen
(at least if it requires the data in there to still be valid ;-) )
This means that anything that uses the device tree always should
go over some accessor functions and get all the info they need
at once, not wait until later.
> I guess it's good to have someone like you, the dissenter in the
> group, but Ben and I will snuff you out completely soon enough :-)))
Heh. I guess I look at the problem more from the OF side, and
you more from the SPARC kernel side, and Ben of course more from
the PowerPC kernel side. That doesn't make me a "dissenter", we
all have (mostly) the same goal in the end.
Segher
>> Kill OF? sparc does not want that IMO, how else should I return to
>> the 'ok' prompt?
>
> PowerPC kills OF because it really has to,
No it doesn't. It has to on some (very common, heh) subarchs.
> that's one of numerous
> reasons that it started sucking the device tree into a kernel copy
> early in the bootup and using that for device discovery etc.
>
> To be honest, the 'ok' prompt is of limited value when you have
> things like Alt-SysRq and PPC's XMON debugger in the kernel already.
When I ported the PowerPC kernel to a new platform years ago,
the big problems happened *before* any of those methods worked,
at really early boot time (the PowerPC port has become a lot
better since then fwiw, it might be easier now). Running (and
debugging) other client programs was a lot easier, since you
could always drop back to OF on panic conditions so you could
investigate what was going on without using hardware probes.
> In fact, the 'ok' prompt is an ENORMOUS pain in the ass to support
> on machines with USB keyboards, because sharing the USB host
> controller is beyond non-trivial. I've never implemented support
> for that on sparc64 and I frankly have no desire to do the work
> necessary to support that. It simply is not worth it.
Oh yes, USB keyboards, lovely. Use a serial port, or a dedicated
network controller, or similar, for the OF console instead ;-)
Segher
>> In fact, the 'ok' prompt is an ENORMOUS pain in the ass to support
>> on machines with USB keyboards, because sharing the USB host
>> controller is beyond non-trivial. I've never implemented support
>> for that on sparc64 and I frankly have no desire to do the work
>> necessary to support that. It simply is not worth it.
>
> I was wondering about that .... :-)
>
> Device sharing with a "live" OF is just an absolute pain in the ass and
> I'm actually pretty happy not to have to do it.
Yep. The sane thing to do is to simply _not_ share devices,
use some debug serial / enet / whatever port exclusively for
OF, and/or only share easily sharable devices. Some resources
_have_ to be shared of course (CPU, memory, MMU, system timer),
but there are OF bindings that describe how to do that in detail.
It gets bloody hard to do things at all if some OF implementation
c.q. some OS implementation doesn't play along of course :-/
> Segher and I, and more
> recently, paulus and I, have been discussing about ways to deal with it
> or make a version of SLOF (segher's pet OF implementation)
I object to the word "pet". And it's not my property, anyway.
> that could
> stay alive but the more I think about the burden, the more I'm inclined
> to just give up...
But I'm not. All I'm asking for is that you don't needlessly
make things hard / impossible for me. It's a bit of an
engineering challenge already ;-)
> There have been a recurring need for system vendors to provide
> "firmware" type code to be called from the OS or to co-exist with the
> OS
> though. Wether that's a good idea or not or wether those vendor
> justifications for that are valid or not is of course debatable ;-)
>
> Some of those attemptes resulted in horrors ranging from SMM BIOSes to
> RTAS, via ACPI AML stuff or even worse, Apple's platform function
> "scripts" in the device-tree.
All those things essentially work like a black box; you have no
idea what they do, what resources they use, etc.
> I think every single of these approaches have proven that it actually
> caused more problems than it solves.
Yes exactly.
> Most of the time, the goal is to actually ease the OS work by
> abstracting dodgy motherboard bits, like all the random IOs to toggle
> to
> enable clocks on a given bus for power management etc... But every
> time,
> it's been abused as a way to hide implementation details or hardware
> specs,
Yes.
> and everytime, bugs in those "firmware" provided blobs have
> proven more damageable than having clear documentation for the HW and
> proper driver code to deal with it.
And partially it's because you're just not calling them correctly --
not necessarily your fault, documentation tends to be severely
lacking.
Segher
From: Segher Boessenkool <[email protected]>
Date: Wed, 3 Jan 2007 16:23:53 +0100
> >>> therefore you can't let multiple CPUs call
> >>> into OFW at one time. You must use some kind of locking mechanism,
> >>> and that locking mechanism is not simple because it has to not just
> >>> stop the other cpus, it has to be able to stop the other cpus yet
> >>> still allow them to receive SMP cross-calls from the firmware if the
> >>> OFW call is 'stop' or similar.
> >>
> >> YOu don't need to *stop* the other CPUs, you just need to
> >> prevent them from entering the client interface. Put a lock
> >> in front.
> >
> > That's not the issue.
> >
> > If the global OFW lock disables interrupts,
>
> Why would it?
Because if a serial/keyboard/other-console-input-device interrupt
arrives, from the user hitting the "BREAK" key sequence, you'll have
to enter OBP from a hardware interrupt handler.
Look, this thread is already tiring me, having to explain a lot
of things and not making much useful progress. So I'll just be
honest and say that I'll go work on other things which need my
time and won't take part in this thread any more.
Take care.
Mitch Bradley wrote:
> Request for comments.
Sorry to revive this thread, I just ran through the discussion. I'm
about 50% in agreement with the idea.
I'd like to put in my $.02 in favor of having a way to pass the OF
device tree to the kernel, in much the same way we pass stuff like
ACPI and PIRQ and MP tables now.
I'd like to also put in my $.02 in OPPOSITION go doing this via
callofw() or any bios callback mechanism.
We have hoped, for some years now, to use the OF device tree as a way
to pass information from (e.g.) LinuxBIOS to the kernel. It seemed
like a solid and tested path. The various kernels understand the tree,
especially non-linux kernels which LinuxBIOS boots. It's a simple
format and well-designed, unlike ACPI. We have always though it was a
very nice design.
But any mechanism that depends on a callback to OFW is way too
limiting. As soon as you put that callback in, you lock out
- uBoot
- LinuxBIOS
- Bochs BIOS as used in Xen and other hypervisors and emulators
- any path that uses kexec (since the first kernel probably shut down
OF)
- etherboot
- GNUFI
So, while the idea of the OF tree is very general, and IMHO very
desirable, the idea of the callback is very specific to one firmware
implementation on one CPU architecture on one platform -- the OLPC --
and is hence not desirable at all.
An idea that is potentially applicable and usable on all BIOSes
becomes usable on only one.
OFW is open source now. I think it's time to reexamine the basic
assumptions about the need for a callback, and see if something better
can't be done. Also, as others mentioned, callback into any sort of
firmware on SMP can and does get messy. I've seen this in practice on
EFI-based machines.
Otherwise, this is just too limited to be of any use to those of us
doing more than just OFW.
Mitch, is there some way to get OF device tree to the kernel without
involving a callback? That would be quite nice.
thanks
ron
Segher has a modification to the devtree patch that creates a lower
level ops vector that can be implemented with callback or non-callback.
It is still being tested.
>
On 1/11/07, Mitch Bradley <[email protected]> wrote:
> Segher has a modification to the devtree patch that creates a lower
> level ops vector that can be implemented with callback or non-callback.
>
> It is still being tested.
Wonderful! If the non-callback version works out, then we can greatly
widen the potential use of the OF device tree for many BIOSes. If that
works, then we can put the proprietary tables into a small box and
ignore them :-)
thanks
ron
* ron minnich <[email protected]> [070111 18:39]:
> I'd like to put in my $.02 in favor of having a way to pass the OF
> device tree to the kernel, in much the same way we pass stuff like
> ACPI and PIRQ and MP tables now.
This works fine for just passing the device tree, but it will fail for
the next step of being able to use the firmware in the OS, and returning
sanely to the firmware.
> But any mechanism that depends on a callback to OFW is way too
> limiting. As soon as you put that callback in, you lock out
> - uBoot
People in the uBoot project have been discussing adding OFW
functionality because it is so useful.
> - LinuxBIOS
With some of the payloads, especially those you list below anyways.
> - Bochs BIOS as used in Xen and other hypervisors and emulators
You dont need it there. Think compatible. We are not going to get _rid_
of legacy bios traces for another 10 years anyways.
> - any path that uses kexec (since the first kernel probably shut down
> OF)
No, that path works fine. The first kernel uses OFW, so it wont shut it
down. Only thing is you need to pass the callback to the loaded kernel.
1 line of code.
> - etherboot
ok, well.
> - GNUFI
Reading UEFI.
There you have a large number of callbacks anyways which will be
supported anyways unless large parts of the industry stop their
"Cuius regio, eius religio" behavior and do sane solutions rather than
(U)EFI.
This is a completely different path, and out of reach for us anyways.
With GNUFI we support EFI callbacks, the same way as we support 16bit
legacy callbacks with ADLO, and OFW callbacks with OFW.
> So, while the idea of the OF tree is very general, and IMHO very
> desirable, the idea of the callback is very specific to one firmware
> implementation on one CPU architecture on one platform -- the OLPC --
> and is hence not desirable at all.
SUN, IBM, APPLE, ... lots of vendors have been using OFW for their
machines. It is by no means a local phenomenon. This way of doing things
has been working since about two decades now with nearly no changes.
And it covers at least Sparc, Sparc64, PPC, PPC64, Mips, Arm, X86,
X86-64. That's 8 CPU architectures. Far more successful than any other
approach.
> An idea that is potentially applicable and usable on all BIOSes
> becomes usable on only one.
The idea is to only have one in the end, and that is flexible enough to
customize it. That's why it is open and we can look at the source.
Some PowerPC systems use a flat tree without callback interface. Sure,
if the machine is too slow to do the real thing, why not. But why throw
away flexibility otherwise? It is not like we have to rely on some evil
patented closed source crap that vendors hail and court as IP.
> OFW is open source now. I think it's time to reexamine the basic
> assumptions about the need for a callback, and see if something better
> can't be done.
I fully agree. And I believe there are very good things that can be done
with callbacks. The reasons callbacks are evil is that you dont know
what you call into. This is not at all the case here. It's a mere
function call that calls some highly board specific code, not unlike all
the calls we do in LinuxBIOS already today. Since we're 100% open
source, we don't "cross a border" anymore.
> Also, as others mentioned, callback into any sort of
> firmware on SMP can and does get messy. I've seen this in practice on
> EFI-based machines.
Yes, executing closed source half undocumented binary blobs from your
software might have undocumented harmful side effects. But that is a
different story, no?
> Otherwise, this is just too limited to be of any use to those of us
> doing more than just OFW.
Yes, the patch was specific to Open Firmware. It is just another
interface next to
- 16bit legacy callbacks
- (u)efi legacy callbacks
- existing openfirmware support code for non-x86 platforms.
But: It is a first step that, as a mid-term goal, allows us to unify OFW
support on all platforms to some extent. Of course. There will always be
quirks. We have quirks in about every subsystem of Linux today, so that
is not more scary.
So what we need is one in-kernel interface that allows us to plug-in
support code for each of the supported firmware interfaces. The attempt
to use the same firmware interface on more than one platform helps that
path.
> Mitch, is there some way to get OF device tree to the kernel without
> involving a callback? That would be quite nice.
That is a nice idea, but unless there is any LinuxBIOS version that
creates such a device tree and exports it as a data structure to the OS,
why would we want to add such support to the Linux kernel?
Stefan
--
coresystems GmbH • Brahmsstr. 16 • D-79104 Freiburg i. Br.
Tel.: +49 761 7668825 • Fax: +49 761 7664613
Email: [email protected] • http://www.coresystems.de/
>> Segher has a modification to the devtree patch that creates a lower
>> level ops vector that can be implemented with callback or
>> non-callback.
>>
>> It is still being tested.
It currently only hooks up OLPC, and I don't have any of
those [hint hint], so I need external testers. I'll send
the patch to LKML if it does work. However, the way it
implements the filesystem will need significant changes,
but we'll discuss that later :-)
> Wonderful! If the non-callback version works out,
We use it on many PowerPC platforms already, it works
just fine.
> then we can greatly
> widen the potential use of the OF device tree for many BIOSes. If that
> works, then we can put the proprietary tables into a small box and
> ignore them :-)
Yup, the DTC-generated tree is just a single binary blob
from the perspective of passing it around.
Segher
>> I'd like to put in my $.02 in favor of having a way to pass the OF
>> device tree to the kernel, in much the same way we pass stuff like
>> ACPI and PIRQ and MP tables now.
>
> This works fine for just passing the device tree, but it will fail for
> the next step of being able to use the firmware in the OS, and
> returning
> sanely to the firmware.
Not everyone wants/needs that. Flexibility is key.
>> - any path that uses kexec (since the first kernel probably shut down
>> OF)
>
> No, that path works fine. The first kernel uses OFW, so it wont shut it
> down. Only thing is you need to pass the callback to the loaded kernel.
Depends. The kernel _can_ shut down OF; in that case, it
becomes responsible for passing the device tree along to
the kexec'd kernel.
>> - etherboot
>
> ok, well.
Heh :-)
>> OFW is open source now. I think it's time to reexamine the basic
>> assumptions about the need for a callback, and see if something better
>> can't be done.
>
> I fully agree. And I believe there are very good things that can be
> done
> with callbacks. The reasons callbacks are evil is that you dont know
> what you call into. This is not at all the case here. It's a mere
> function call that calls some highly board specific code, not unlike
> all
> the calls we do in LinuxBIOS already today. Since we're 100% open
> source, we don't "cross a border" anymore.
Oh you *do* cross a border, and that is a good thing here; it
is a stable API, and that makes a lot of sense here.
> - 16bit legacy callbacks
> - (u)efi legacy callbacks
> - existing openfirmware support code for non-x86 platforms.
>
> But: It is a first step that, as a mid-term goal, allows us to unify
> OFW
> support on all platforms to some extent.
Yes.
>> Mitch, is there some way to get OF device tree to the kernel without
>> involving a callback? That would be quite nice.
>
> That is a nice idea, but unless there is any LinuxBIOS version that
> creates such a device tree and exports it as a data structure to the
> OS,
> why would we want to add such support to the Linux kernel?
The PowerPC arch code already handles this.
Segher
On 1/11/07, Stefan Reinauer <[email protected]> wrote:
> This works fine for just passing the device tree, but it will fail for
> the next step of being able to use the firmware in the OS, and returning
> sanely to the firmware.
And why is it we need to do that, presently? And how, in a virtualized
environment, for example, would you plan to support this calling into
firmware? (I sort of know how IBM does it, I am wondering how OFW
would plan to do it).
We can standardize passing a device tree structure across a very wide
range of environments. But supporting callbacks is necessarily going
to be a much smaller range of environments. It sounds, however, like
it will be possible to do both the callback and non-callback cases, so
I think I'm fine with that anyway. I will wait for Segher's patch.
thanks
ron
On 1/11/07, Segher Boessenkool <[email protected]> wrote:
> Depends. The kernel _can_ shut down OF; in that case, it
> becomes responsible for passing the device tree along to
> the kexec'd kernel.
>
so we will need the non-callback support anyway, it seems?
thanks
ron