2005-05-13 20:12:54

by Arnd Bergmann

Subject: [PATCH 0/8] ppc64: Introduce Cell/BPA platform, v2

This series of patches adds support for a fifth platform type in the
ppc64 architecture tree. The Broadband Processor Architecture (BPA)
is the platform that machines using the Cell processor are expected
to follow; currently only prototype hardware exists for it.

Except for the last patch, these are functionally the same as
the first version but are updated for 2.6.12-rc4 and contain
changes based on the feedback I got so far.

The first three patches add some infrastructure that is used by
BPA machines but is not really specific to them and could be used
by other new platform types as well.

The next three patches add the actual platform code, which should
be usable for any BPA compatible implementation.

Patch 7 introduces a new file system to make use of the SPUs inside
the processors. This patch is still in a prototype stage and not
intended for merging yet. The final patch adds some user space code
in the Documentation directory that clarifies how to use the file
system. This one should become a separate package at a later point.

Arnd <><


2005-05-13 20:06:09

by Arnd Bergmann

Subject: [PATCH 8/8] ppc64: add spufs user library

This adds a user space library as a counterpart to the kernel side spufs.
Since the hardware is not available yet, this is mostly for documenting
the spufs API and is not intended for merging into mainline.

As the API matures, libspu will become a separate package.

From: Dirk Herrendoerfer <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

--- linux-cg.orig/Documentation/bpa/libspu/Makefile 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/Makefile 2005-05-13 11:37:31.481923136 -0400
@@ -0,0 +1,41 @@
+#*
+#* libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+#* Copyright (C) 2005 IBM Corp.
+#*
+#* This library is free software; you can redistribute it and/or modify it
+#* under the terms of the GNU Lesser General Public License as published by
+#* the Free Software Foundation; either version 2.1 of the License,
+#* or (at your option) any later version.
+#*
+#* This library is distributed in the hope that it will be useful, but
+#* WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+#* or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+#* License for more details.
+#*
+#* You should have received a copy of the GNU Lesser General Public License
+#* along with this library; if not, write to the Free Software Foundation,
+#* Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+#*
+
+CC := gcc
+CTAGS = ctags
+
+CFLAGS := -O2 -m32 -Wall -I. -Iinclude -DDEBUG -g \
+
+libspu_OBJS := elf_loader.o bpathread.o
+OBJS := libbpathread.a $(libspu_OBJS)
+
+all: $(OBJS)
+
+libbpathread.a: $(libspu_OBJS)
+ ar -r $@ $(libspu_OBJS)
+
+tests:
+ make -C test/start-stop
+
+tags:
+ $(CTAGS) -R .
+
+clean:
+ rm -f $(OBJS) *~ tags
+ make -C test/start-stop clean
--- linux-cg.orig/Documentation/bpa/libspu/README 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/README 2005-05-13 11:37:31.482922984 -0400
@@ -0,0 +1,7 @@
+This is an example of how to use the SPEs in applications. It is by
+no means complete or perfect - it is meant as a reference of how to
+use the SPUFS implementation in linux.
+As more and more features of SPUFS become available, this code will
+be extended too.
+
+D.Herrendoerfer <[email protected]>
--- linux-cg.orig/Documentation/bpa/libspu/bpathread.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/bpathread.c 2005-05-13 11:37:31.483922832 -0400
@@ -0,0 +1,314 @@
+/*
+ * libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+ * Copyright (C) 2005 IBM Corp.
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License,
+ * or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <elf.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+
+#include <bpathread.h>
+#include <elf_loader.h>
+#include <spe_exec.h>
+
+#define __PRINTF(fmt, args...) { fprintf(stderr,fmt , ## args); }
+#ifdef DEBUG
+#define DEBUG_PRINTF(fmt, args...) __PRINTF(fmt , ## args)
+#else
+#define DEBUG_PRINTF(fmt, args...)
+#endif
+
+
+int thread_num = 0;
+
+/*
+ * Helpers
+ *
+ * */
+
+struct thread_start_info
+{
+ char pathname[40]; /* */
+ int thread_num; /* */
+};
+
+struct thread_store
+{
+ pthread_t spe_thread;
+ int thread_return_value;
+ int fd_mbox;
+ unsigned int state;
+};
+
+static struct thread_store spe_thread_store[1024];
+
+/*
+ * int spe_ldr[]:
+ * SPE code that performs the actual parameter setting:
+ */
+static int spe_ldr[] = {
+ 0x30fff083,
+ 0x30fff284,
+ 0x30fff485,
+ 0x30fff686,
+ 0x30fff000,
+ 0x35000000,
+ 0x00000000,
+ 0x00000000
+};
+
+/**
+ * Library API
+ *
+ */
+
+speid_t
+spe_create_thread (int gid, void *start, void *argp, void *envp, int mask,
+ int flags)
+{
+ int rc, memfd;
+ addr64 argp64, envp64;
+ pthread_t thread;
+ char memname[40], pathname[40];
+ void *spe_ld_buf;
+ ssize_t count = 0, num = 0;
+ struct spe_ld_info ld_info;
+ struct thread_start_info *thread_info;
+ struct spe_exec_params spe_params __attribute__ ((aligned (4096)));
+
+ DEBUG_PRINTF ("spe_create_thread(0x%x, %p, %p, %p, 0x%x, 0x%x)\n",
+ gid, start, argp, envp, mask, flags);
+
+ argp64.ull = (unsigned long long) (unsigned long) argp;
+ envp64.ull = (unsigned long long) (unsigned long) envp;
+
+ /* Make the SPU Directory */
+
+ sprintf (pathname, "/spu/bpathread-%i-%i", getpid (), thread_num);
+
+ DEBUG_PRINTF ("mkdir %s\n", pathname);
+
+ rc = mkdir (pathname, S_IRUSR | S_IWUSR | S_IXUSR);
+ if (rc < 0)
+ {
+ DEBUG_PRINTF ("Could not make dir %s\n", pathname);
+ return -1;
+ }
+
+ sprintf (memname, "%s/mem", pathname);
+
+ /* Check SPE */
+
+ memfd = open (memname, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+ if (memfd < 0)
+ {
+ DEBUG_PRINTF ("Could not open SPE mem file.\n");
+ return -1;
+ }
+
+ /* Prepare Loader */
+
+ spe_ld_buf = malloc (LS_SIZE);
+ thread_info = malloc (sizeof (*thread_info));
+
+ if (!spe_ld_buf || !thread_info)
+ {
+ DEBUG_PRINTF ("Could not allocate SPE memory. \n");
+ errno = ENOMEM;
+ return -1;
+ }
+
+ memset(spe_ld_buf, 0, LS_SIZE);
+
+ rc = load_spe_elf (start, spe_ld_buf, &ld_info);
+ if (rc != 0)
+ {
+ DEBUG_PRINTF ("Load SPE ELF failed..\n");
+ return -1;
+ }
+
+ /* Add SPE exec program */
+
+ DEBUG_PRINTF ("Add exec prog dst:0x%04x size:0x%04x\n",
+ SPE_LDR_START, sizeof (spe_ldr));
+ memcpy (spe_ld_buf + SPE_LDR_START, &spe_ldr, sizeof (spe_ldr));
+
+ /* Add SPE exec parameters */
+
+ spe_params.entry = ld_info.entry;
+ spe_params.gpr4[0] = argp64.ui[0];
+ spe_params.gpr4[1] = argp64.ui[1];
+ spe_params.gpr5[0] = envp64.ui[0];
+ spe_params.gpr5[1] = envp64.ui[1];
+
+ DEBUG_PRINTF ("Add exec param dst:0x%04x size:0x%04x\n",
+ SPE_PARAM_START, sizeof (spe_params));
+ memcpy (spe_ld_buf + SPE_PARAM_START, &spe_params,
+ sizeof (spe_params));
+
+ /* Copy SPE image to SPUfs */
+ do
+ {
+ num = write (memfd, spe_ld_buf + count, LS_SIZE - count);
+ if (num == -1)
+ {
+ DEBUG_PRINTF ("Transfer SPE ELF failed..\n");
+ return -1;
+ }
+
+ count += num;
+ }
+ while (count < LS_SIZE && num);
+ close (memfd);
+
+ /* Free the SPE Buffer */
+ free (spe_ld_buf);
+
+ strcpy (thread_info->pathname, pathname);
+ thread_info->thread_num = thread_num;
+
+ spe_thread_store[thread_num].state = BPA_THREAD_START;
+
+ rc = pthread_create (&thread, NULL, spe_thread, thread_info);
+
+ rc = thread_num;
+
+ while (spe_thread_store[thread_num].state != BPA_THREAD_IDLE)
+ {
+ thread_num++;
+ }
+
+ spe_thread_store[rc].spe_thread = thread;
+
+ return rc;
+}
+
+int
+spe_wait (speid_t speid, int *status, int options)
+{
+ int rc;
+
+ DEBUG_PRINTF ("spu_wait(0x%x, %p, 0x%x)\n", speid, status, options);
+
+ rc = pthread_join (spe_thread_store[speid].spe_thread,
+ (void **) status);
+
+ spe_thread_store[speid].state = BPA_THREAD_IDLE;
+
+ DEBUG_PRINTF ("Thread ended.\n");
+ return rc;
+}
+
+int
+spe_kill (speid_t speid, int sig)
+{
+ int rc;
+
+ rc = pthread_kill (spe_thread_store[speid].spe_thread, sig);
+
+ return rc;
+}
+
+/*
+ * Thread Code
+ *
+ * */
+
+void *
+spe_thread (void *ptr)
+{
+ char runname[40], mboxname[40], pathname[40];
+ int runfd, mboxfd;
+ int num;
+ struct thread_start_info *thread_info;
+
+ struct spufs_run_arg
+ {
+ unsigned npc; /* inout: Next Program Counter */
+ unsigned short code; /* out: SPU status */
+ unsigned short status;
+ };
+ struct spufs_run_arg arg = { SPE_LDR_PROG_start, };
+
+ DEBUG_PRINTF ("In thread\n");
+
+ thread_info = (struct thread_start_info *) ptr;
+
+ num = thread_info->thread_num;
+ strcpy (pathname, thread_info->pathname);
+
+ free (thread_info);
+
+ DEBUG_PRINTF ("thread: %i.\n", num);
+ DEBUG_PRINTF ("pathname: %s.\n", pathname);
+
+ sprintf (runname, "%s/run", pathname);
+ runfd = open (runname, O_RDONLY);
+ if (runfd < 0)
+ {
+ DEBUG_PRINTF ("Could not open SPU run file.\n");
+ spe_thread_store[num].thread_return_value = -EINVAL;
+ pthread_exit ((void *) spe_thread_store[num].
+ thread_return_value);
+ }
+
+ sprintf (mboxname, "%s/m_box", pathname);
+ mboxfd = open (mboxname, O_RDONLY);
+ if (mboxfd < 0)
+ {
+ DEBUG_PRINTF ("Could not open SPE mailbox file.\n");
+ spe_thread_store[num].thread_return_value = -EINVAL;
+ pthread_exit ((void *) spe_thread_store[num].
+ thread_return_value);
+ }
+ else
+ {
+ spe_thread_store[num].fd_mbox = mboxfd;
+ }
+
+ spe_thread_store[num].state = BPA_THREAD_RUNNING;
+
+ int ret = ioctl (runfd, _IOWR ('s', 0, struct spufs_run_arg), &arg)
+ & 0xff;
+ if (ret < 0)
+ {
+ DEBUG_PRINTF ("Could not ioctl() on SPE run file.\n");
+ spe_thread_store[num].thread_return_value = -EINVAL;
+ pthread_exit ((void *) spe_thread_store[num].
+ thread_return_value);
+ }
+
+ close (runfd);
+
+ spe_thread_store[num].state = BPA_THREAD_ENDED;
+
+ DEBUG_PRINTF ("SPE thread result: %08x:%04x:%04x\n", arg.npc,
+ arg.code, arg.status);
+
+ spe_thread_store[num].thread_return_value = arg.code;
+ return ((void *) spe_thread_store[num].thread_return_value);
+}
--- linux-cg.orig/Documentation/bpa/libspu/bpathread.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/bpathread.h 2005-05-13 11:37:31.484922680 -0400
@@ -0,0 +1,47 @@
+/*
+ * libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+ * Copyright (C) 2005 IBM Corp.
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License,
+ * or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#ifndef _bpathread_h_
+#define _bpathread_h_
+
+typedef int speid_t;
+
+/* APIs for SPE threads.
+ */
+
+extern speid_t spe_create_thread (int gid, void *start,
+ void *argp, void *envp,
+ int mask, int flags);
+
+extern int spe_wait (speid_t speid, int *status, int options);
+
+extern int spe_kill (speid_t speid, int sig);
+
+
+/* SPE-thread internals
+ */
+
+void *spe_thread (void *ptr);
+
+#define BPA_THREAD_IDLE 0
+#define BPA_THREAD_START 1
+#define BPA_THREAD_RUNNING 2
+#define BPA_THREAD_ENDED 3
+
+
+#endif
--- linux-cg.orig/Documentation/bpa/libspu/elf_loader.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/elf_loader.c 2005-05-13 11:37:31.484922680 -0400
@@ -0,0 +1,130 @@
+/* libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+ * Copyright (C) 2005 IBM Corp.
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License,
+ * or (at your option) any later version.
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+ * License for more details.
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <malloc.h>
+#include <errno.h>
+#include <elf.h>
+#include <elf_loader.h>
+
+#define __PRINTF(fmt, args...) { fprintf(stderr,fmt , ## args); }
+#ifdef DEBUG
+#define DEBUG_PRINTF(fmt, args...) __PRINTF(fmt , ## args)
+#else
+#define DEBUG_PRINTF(fmt, args...)
+#endif
+
+int
+load_spe_elf (void *elf_start, void *ld_buffer, struct spe_ld_info *ld_info)
+{
+ Elf32_Ehdr *ehdr;
+ static const unsigned char expected[EI_PAD] = {
+ [EI_MAG0] = ELFMAG0,
+ [EI_MAG1] = ELFMAG1,
+ [EI_MAG2] = ELFMAG2,
+ [EI_MAG3] = ELFMAG3,
+ [EI_CLASS] = ELFCLASS32,
+ [EI_DATA] = ELFDATA2MSB,
+ [EI_VERSION] = EV_CURRENT,
+ [EI_OSABI] = ELFOSABI_SYSV,
+ [EI_ABIVERSION] = 0
+ };
+ Elf32_Phdr *phdr;
+ Elf32_Phdr *ph;
+ int num_load_seg = 0;
+
+ DEBUG_PRINTF ("load_spe_elf(%p, %p)\n", elf_start, ld_buffer);
+ ehdr = (Elf32_Ehdr *) elf_start;
+
+ /* Validate ELF */
+ if (memcmp (ehdr->e_ident, expected, EI_PAD) != 0)
+ {
+ DEBUG_PRINTF ("invalid ELF header.\n");
+ DEBUG_PRINTF ("expected 0x%016llX != 0x%016llX\n",
+ *(long long *) expected, *(long long *) ehdr);
+ errno = EINVAL;
+ return -errno;
+ }
+
+ /* Validate the machine type */
+ if (ehdr->e_machine != 0x17)
+ {
+ DEBUG_PRINTF ("not an SPE ELF object");
+ errno = EINVAL;
+ return -errno;
+ }
+
+ /* Validate ELF object type. */
+ if (ehdr->e_type != ET_EXEC)
+ {
+ DEBUG_PRINTF ("invalid SPE ELF type.\n");
+ DEBUG_PRINTF ("SPU type %d != %d\n", ehdr->e_type, ET_EXEC);
+ errno = EINVAL;
+ DEBUG_PRINTF ("parse_spu_elf(): errno=%d.\n", errno);
+ return -errno;
+ }
+
+ /* Start processing headers */
+ phdr = (Elf32_Phdr *) ((char *) ehdr + ehdr->e_phoff);
+
+ /*
+ * Load all PT_LOAD segments onto the SPU local store buffer.
+ */
+ DEBUG_PRINTF ("Segments: 0x%x\n", ehdr->e_phnum);
+ for (ph = phdr; ph < &phdr[ehdr->e_phnum]; ++ph)
+ {
+ switch (ph->p_type)
+ {
+ case PT_LOAD:
+ /* DEBUG_PRINTF ("PT_LOAD)\n"); */
+ /* Only LOAD non-zero segments. */
+ if (ph->p_filesz)
+ {
+ num_load_seg++;
+
+ DEBUG_PRINTF
+ ("SPE_LOAD %p (0x%x) -> %p (0x%x) (%i bytes)\n",
+ ld_buffer + ph->p_vaddr,
+ ph->p_vaddr,
+ elf_start + ph->p_paddr,
+ ph->p_paddr, ph->p_filesz);
+ memcpy (ld_buffer + ph->p_vaddr,
+ elf_start + ph->p_paddr,
+ ph->p_filesz);
+ }
+ break;
+ }
+ }
+ if (num_load_seg == 0)
+ {
+ DEBUG_PRINTF ("no segments to load");
+ errno = EINVAL;
+ return -errno;
+ }
+
+ /* Remember where the code wants to be started */
+ ld_info->entry = ehdr->e_entry;
+ DEBUG_PRINTF ("entry = 0x%x\n", ehdr->e_entry);
+
+ return 0;
+
+}
--- linux-cg.orig/Documentation/bpa/libspu/elf_loader.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/elf_loader.h 2005-05-13 11:37:31.485922528 -0400
@@ -0,0 +1,40 @@
+/*
+ * libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+ * Copyright (C) 2005 IBM Corp.
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License,
+ * or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#define LS_SIZE 0x40000 /* 256K (in bytes) */
+
+#define SPE_LDR_PROG_start (LS_SIZE - 512) // location of spu_ld.so prog
+#define SPE_LDR_PARAMS_start (LS_SIZE - 128) // location of spu_ldr_params
+
+typedef union
+{
+ unsigned long long ull;
+ unsigned int ui[2];
+} addr64;
+
+struct spe_ld_info
+{
+ unsigned int entry; /* Entry point of SPU image */
+};
+
+/*
+ * Global API : */
+
+int load_spe_elf (void *elf_start, void *ld_buffer,
+ struct spe_ld_info *ld_info);
--- linux-cg.orig/Documentation/bpa/libspu/spe_exec.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/spe_exec.h 2005-05-13 11:37:31.485922528 -0400
@@ -0,0 +1,43 @@
+/*
+ * libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+ * Copyright (C) 2005 IBM Corp.
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License,
+ * or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef _spe_exec_h_
+#define _spe_exec_h_
+
+#define SPE_LDR_START 0x0003fe00
+#define SPE_PARAM_START 0x0003ff80
+
+
+/*
+ * struct spe_exec_params:
+ *
+ * Holds the (per thread) parameters for the spe program
+*/
+
+struct spe_exec_params
+{
+ unsigned int entry; /* entry point for application. */
+ unsigned int gpr3[4]; /* initial setting for $3 */
+ unsigned int gpr4[4]; /* initial setting for $4 */
+ unsigned int gpr5[4]; /* initial setting for $5 */
+ unsigned int gpr6[4]; /* initial setting for $6 */
+
+};
+
+#endif
--- linux-cg.orig/Documentation/bpa/libspu/test/start-stop/Makefile 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/test/start-stop/Makefile 2005-05-13 11:37:31.486922376 -0400
@@ -0,0 +1,43 @@
+#*
+#* libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+#* Copyright (C) 2005 IBM Corp.
+#*
+#* This library is free software; you can redistribute it and/or modify it
+#* under the terms of the GNU Lesser General Public License as published by
+#* the Free Software Foundation; either version 2.1 of the License,
+#* or (at your option) any later version.
+#*
+#* This library is distributed in the hope that it will be useful, but
+#* WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+#* or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+#* License for more details.
+#*
+#* You should have received a copy of the GNU Lesser General Public License
+#* along with this library; if not, write to the Free Software Foundation,
+#* Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+#*
+
+CC := gcc
+SPECC := spu-gcc
+CTAGS = ctags
+
+CFLAGS := -O2 -m32 -Wall -I../.. -I../../include -g
+SPECFLAGS := -O2 -Wall -I../../include
+
+LDFLAGS := -m32
+LIBS := -L../.. -l bpathread -l pthread
+
+SPE_OBJS := spe-start-stop
+OBJS := ppe-start-stop
+
+all: $(OBJS) $(SPE_OBJS)
+
+clean:
+ rm -f $(OBJS) $(SPE_OBJS)
+
+ppe-start-stop: ppe-start-stop.c
+ $(CC) -o $@ $< $(CFLAGS) $(LDFLAGS) $(LIBS)
+
+spe-start-stop: spe-start-stop.c
+ $(SPECC) $(SPECFLAGS) -o $@ $<
+
--- linux-cg.orig/Documentation/bpa/libspu/test/start-stop/ppe-start-stop.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/test/start-stop/ppe-start-stop.c 2005-05-13 11:37:31.487922224 -0400
@@ -0,0 +1,89 @@
+/*
+ * libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+ * Copyright (C) 2005 IBM Corp.
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License,
+ * or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <bpathread.h>
+
+
+void *load_binary(char *bin)
+{
+ int binfd;
+ struct stat statbuf;
+ int ret;
+ void *buf;
+ void *pos;
+ off_t size;
+
+ binfd = open(bin, O_RDONLY);
+ if (binfd < 0)
+ return NULL;
+
+ ret = fstat(binfd, &statbuf);
+ if (ret < 0)
+ return NULL;
+
+ buf = malloc(statbuf.st_size + 16);
+ if (!buf)
+ return NULL;
+
+ buf = (void *)(((unsigned long)buf + 16) & ~15);
+ pos = buf;
+ size = statbuf.st_size;
+
+ do {
+ ret = read(binfd, pos, size);
+ if (ret > 0) {
+ pos += ret;
+ size -= ret;
+ }
+ if (ret < 0)
+ return NULL;
+ } while (size > 0 && ret);
+
+ return buf;
+}
+
+int main(int argc, char* argv[])
+{
+ char *binary;
+ int threadnum,status;
+
+ if (argc != 2) {
+ printf("usage: pu spu-executable\n");
+ exit(1);
+ }
+
+ binary = load_binary(argv[1]);
+ if (!binary)
+ exit(2);
+
+ threadnum = spe_create_thread(0, binary, NULL, NULL, 0, 0);
+
+ spe_wait(threadnum,&status,0);
+
+ printf("Thread returned status: %04x\n",status);
+ return 0;
+}
--- linux-cg.orig/Documentation/bpa/libspu/test/start-stop/spe-start-stop.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/test/start-stop/spe-start-stop.c 2005-05-13 11:37:31.487922224 -0400
@@ -0,0 +1,23 @@
+/*
+ * libbpathread - A wrapper library to adapt the JSRE SPU usage model to SPUFS
+ * Copyright (C) 2005 IBM Corp.
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License,
+ * or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+int main(void)
+{
+ return 0;
+}
--- linux-cg.orig/Documentation/bpa/libspu/tools/README 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/tools/README 2005-05-13 11:37:31.487922224 -0400
@@ -0,0 +1,8 @@
+Contents of this Directory
+
+elfspe-register:
+Script to register a SPE-ELF loading app with binfmt_misc.
+
+embedspu:
+Script to attach a SPE-ELF object to an executable.
+
--- linux-cg.orig/Documentation/bpa/libspu/tools/elfspe-register 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/tools/elfspe-register 2005-05-13 11:37:31.488922072 -0400
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+echo ':spu:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x17::/home/uweigand/runspu:' > /proc/sys/fs/binfmt_misc/register
+
--- linux-cg.orig/Documentation/bpa/libspu/tools/embedspu 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/Documentation/bpa/libspu/tools/embedspu 2005-05-13 11:37:31.488922072 -0400
@@ -0,0 +1,56 @@
+#!/bin/sh
+
+#
+# Embed SPE ELF executable into PPE object file, and define a
+# global pointer variable referring to the embedded file.
+#
+# Usage: embedspu [flags] symbol_name input_filename output_filename
+#
+# input_filename: SPE ELF executable to be embedded
+# output_filename: Resulting PPE object file
+# symbol_name: Name of global pointer variable to be defined
+# flags: GCC flags defining PPE object file format
+# (e.g. -m32 or -m64)
+#
+
+# Argument parsing
+SYMBOL=
+INFILE=
+OUTFILE=
+FLAGS=
+
+while [ -n "$1" ]; do
+ case $1 in
+ -*) FLAGS="${FLAGS} $1"
+ shift ;;
+ *) if [ -z $SYMBOL ]; then
+ SYMBOL=$1
+ elif [ -z $INFILE ]; then
+ INFILE=$1
+ elif [ -z $OUTFILE ]; then
+ OUTFILE=$1
+ fi
+ shift ;;
+ esac
+done
+
+if [ -z "$SYMBOL" -o -z "$INFILE" -o -z "$OUTFILE" ]; then
+ echo "Usage: $0 [symbol_name] [input_filename] [output_filename]"
+ exit 1
+fi
+
+# The section name as defined by the SPU ABI
+SECTION=.spuelf.${INFILE}
+
+# Build object file holding pointer to embedded section
+gcc ${FLAGS} -x c -c -o ${OUTFILE} - <<EOF || exit 1
+static char __section__[] __attribute__((section("${SECTION}"), aligned(128))) = { };
+void *${SYMBOL} = __section__;
+EOF
+
+# Add embedded section contents into object file
+objcopy --add-section ${SECTION}=${INFILE} \
+ --set-section-flags ${SECTION}=alloc,load,readonly,data,contents \
+ --strip-unneeded ${OUTFILE} \
+ || rm -f ${OUTFILE}
+

2005-05-13 20:12:49

by Arnd Bergmann

Subject: [PATCH 7/8] ppc64: SPU file system

This is an early version of the SPU file system, which is used
to run code on the Synergistic Processing Units of the Broadband
Engine.

The file system provides a name space similar to POSIX shared
memory or message queues. Users that have write permissions
on the file system can create directories in the spufs root.

Every directory represents an SPU context, which is currently
mapped to a physical SPU, but that is going to change to a
virtualization scheme in future updates.
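
As a rough illustration of the two paragraphs above, creating a context
comes down to a mkdir() below the spufs mount point. The sketch below is
not part of the patch: the helper names spu_context_create() and
spu_context_destroy() are made up for this example, the mount point is
passed in by the caller (the libspu example in patch 8 uses /spu), and
tearing the context down with rmdir() is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an SPU context by making a directory below the spufs root.
 * "root" is wherever spufs is mounted (an assumption of this sketch),
 * "name" is the context name chosen by the caller.
 * Returns 0 on success or a negative errno value.
 */
static int spu_context_create(const char *root, const char *name,
                              char *path, size_t pathlen)
{
    int n = snprintf(path, pathlen, "%s/%s", root, name);

    /* Refuse truncated path names instead of creating the wrong one. */
    if (n < 0 || (size_t)n >= pathlen)
        return -ENAMETOOLONG;

    /* Owner-only access, matching the mode used by libspu in patch 8. */
    if (mkdir(path, S_IRUSR | S_IWUSR | S_IXUSR) < 0)
        return -errno;

    return 0;
}

/* Assumed teardown: remove the context directory again. */
static int spu_context_destroy(const char *path)
{
    return rmdir(path) < 0 ? -errno : 0;
}
```

A caller would pass the mount point and a unique name, for example
spu_context_create("/spu", "demo", path, sizeof(path)), and then open
the files inside the returned path.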

An SPU context directory contains a predefined set of files
used for manipulating the state of the logical SPU. Users
can change permissions on those files, but cannot add or
remove individual files without removing the complete directory.

The current set of files is:

/mem    The contents of the local store memory of the SPU.
        This can be accessed like a regular shared memory
        file and contains both code and data in the address
        space of the SPU.
        The implemented file operations currently are read(),
        write() and mmap(). We will need our own address
        space operations as soon as we allow the SPU context
        to be scheduled away from the physical SPU into
        page cache.

/run    A stub file that lets us do ioctl. The only ioctl
        method we need is the spu_run() call. spu_run suspends
        the current thread from the host CPU and transfers
        the flow of execution to the SPU.
        The ioctl call returns to the calling thread when a state
        is entered that cannot be handled by the kernel, e.g.
        an error in the SPU code or an exit() from it.
        When a signal is pending for the host CPU thread, the
        ioctl is interrupted and the SPU stopped in order to
        call the signal handler.

/mbox   The first SPU to CPU communication mailbox. This file
        is read-only and can be read in units of 32 bits.
        The file can only be used in non-blocking mode, and
        even poll() will not block on it.
        When no data is available in the mailbox, read() fails
        with EAGAIN.

/ibox   The second SPU to CPU communication mailbox. This file
        is similar to the first mailbox file, but can be read
        in blocking I/O mode, and the poll family of system
        calls can be used to wait for it.

/wbox   The CPU to SPU communication mailbox. It is write-only
        and can be written in units of 32 bits. If the mailbox
        is full, write() will block and poll can be used to
        wait for it to become empty again.

Other files are planned but currently are not implemented or
not functional.
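
For the /run file, a user space call might look like the sketch below.
The argument structure and the request number are copied from the
libspu example in patch 8; the macro name SPUFS_RUN and the helper
spu_run_once() are hypothetical, and how an interrupted call is
reported to the caller is not specified here, so the sketch simply
passes any error back.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Argument layout as used by the libspu example in patch 8. */
struct spufs_run_arg {
    unsigned npc;           /* inout: next program counter */
    unsigned short code;    /* out: SPU status */
    unsigned short status;
};

/* Request number as issued by libspu; the macro name is made up here. */
#define SPUFS_RUN _IOWR('s', 0, struct spufs_run_arg)

/*
 * Start the SPU at "entry" and block until control returns to the
 * host thread. Returns 0 on success or a negative errno value; the
 * raw ioctl() result is checked before anything else is done with it,
 * so a failure is never mistaken for a status code.
 */
static int spu_run_once(int runfd, unsigned entry, struct spufs_run_arg *arg)
{
    arg->npc = entry;
    arg->code = 0;
    arg->status = 0;

    if (ioctl(runfd, SPUFS_RUN, arg) < 0)
        return -errno;

    return 0;
}
```

After a successful return, arg->npc, arg->code and arg->status hold
whatever the kernel reported for the stop condition.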

Signed-off-by: Arnd Bergmann <[email protected]>

--- linux-cg.orig/arch/ppc64/kernel/Makefile 2005-05-13 15:23:43.019961032 -0400
+++ linux-cg/arch/ppc64/kernel/Makefile 2005-05-13 17:25:48.121935456 -0400
@@ -53,6 +53,7 @@ obj-$(CONFIG_HVCS) += hvcserver.o
obj-$(CONFIG_IBMVIO) += vio.o
obj-$(CONFIG_XICS) += xics.o
obj-$(CONFIG_MPIC) += mpic.o
+obj-$(CONFIG_SPU_FS) += spu_base.o

obj-$(CONFIG_PPC_PMAC) += pmac_setup.o pmac_feature.o pmac_pci.o \
pmac_time.o pmac_nvram.o pmac_low_i2c.o
--- linux-cg.orig/arch/ppc64/kernel/spu_base.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/arch/ppc64/kernel/spu_base.c 2005-05-13 17:25:48.124935000 -0400
@@ -0,0 +1,579 @@
+/*
+ * Low-level SPU handling
+ *
+ * (C) Copyright IBM Deutschland Entwicklung GmbH 2005
+ *
+ * Author: Arnd Bergmann <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#define DEBUG 1
+
+#include <linux/interrupt.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+
+#include <asm/io.h>
+#include <asm/prom.h>
+#include <asm/semaphore.h>
+#include <asm/spu.h>
+#include <asm/mmu_context.h>
+
+#include "bpa_iic.h"
+
+static int __spu_trap_invalid_dma(struct spu *spu)
+{
+ pr_debug("%s\n", __FUNCTION__);
+ force_sig(SIGBUS, /* info, */ spu->task);
+ return 0;
+}
+
+static int __spu_trap_dma_align(struct spu *spu)
+{
+ pr_debug("%s\n", __FUNCTION__);
+ force_sig(SIGBUS, /* info, */ spu->task);
+ return 0;
+}
+
+static int __spu_trap_error(struct spu *spu)
+{
+ pr_debug("%s\n", __FUNCTION__);
+ force_sig(SIGILL, /* info, */ spu->task);
+ return 0;
+}
+
+static int __spu_trap_data_seg(struct spu *spu, unsigned long ea)
+{
+ struct spu_priv2 __iomem *priv2;
+ struct mm_struct *mm;
+
+ pr_debug("%s\n", __FUNCTION__);
+
+ if (REGION_ID(ea) != USER_REGION_ID) {
+ printk("invalid region access at %016lx\n", ea);
+ return 1;
+ }
+
+ priv2 = spu->priv2;
+ mm = spu->mm;
+
+ if (spu->slb_replace >= 8)
+ spu->slb_replace = 0;
+
+ out_be64(&priv2->slb_index_W, spu->slb_replace);
+ out_be64(&priv2->slb_vsid_RW,
+ (get_vsid(mm->context.id, ea) << SLB_VSID_SHIFT)
+ | SLB_VSID_USER);
+ out_be64(&priv2->slb_esid_RW, (ea & ESID_MASK) | SLB_ESID_V);
+ out_be64(&priv2->mfc_control_RW, MFC_CNTL_RESTART_DMA_COMMAND);
+
+ printk("set slb %d context %lx, ea %016lx, vsid %016lx, esid %016lx\n",
+ spu->slb_replace, mm->context.id, ea,
+ (get_vsid(mm->context.id, ea) << SLB_VSID_SHIFT)| SLB_VSID_USER,
+ (ea & ESID_MASK) | SLB_ESID_V);
+ return 0;
+}
+
+static int __spu_trap_data_map(struct spu *spu, unsigned long ea)
+{
+ unsigned long dsisr;
+ struct spu_priv1 __iomem *priv1;
+
+ pr_debug("%s\n", __FUNCTION__);
+ priv1 = spu->priv1;
+ dsisr = in_be64(&priv1->mfc_dsisr_RW);
+
+ if (dsisr & MFC_DSISR_PTE_NOT_FOUND) {
+ printk("pte lookup ea %016lx, dsisr %lx\n", ea, dsisr);
+ wake_up(&spu->stop_wq);
+ } else {
+ printk("unexpected data fault ea %016lx, dsisr %lx\n", ea, dsisr);
+ }
+
+ return 0;
+}
+
+static int __spu_trap_mailbox(struct spu *spu)
+{
+ pr_debug("%s\n", __FUNCTION__);
+ wake_up(&spu->mbox_wq);
+ return 0;
+}
+
+static int __spu_trap_stop(struct spu *spu)
+{
+ pr_debug("%s\n", __FUNCTION__);
+ spu->stop_code = in_be32(&spu->problem->spu_status_R);
+ wake_up(&spu->stop_wq);
+ return 0;
+}
+
+static int __spu_trap_halt(struct spu *spu)
+{
+ pr_debug("%s\n", __FUNCTION__);
+ spu->stop_code = in_be32(&spu->problem->spu_status_R);
+ wake_up(&spu->stop_wq);
+ return 0;
+}
+
+static int __spu_trap_tag_group(struct spu *spu)
+{
+ pr_debug("%s\n", __FUNCTION__);
+ /* wake_up(&spu->dma_wq); */
+ return 0;
+}
+
+static irqreturn_t
+spu_irq_class_0(int irq, void *data, struct pt_regs *regs)
+{
+ struct spu *spu;
+ unsigned long stat;
+
+ spu = data;
+ stat = in_be64(&spu->priv1->int_stat_class0_RW);
+
+ if (stat & 1) /* invalid MFC DMA */
+ __spu_trap_invalid_dma(spu);
+
+ if (stat & 2) /* invalid DMA alignment */
+ __spu_trap_dma_align(spu);
+
+ if (stat & 4) /* error on SPU */
+ __spu_trap_error(spu);
+
+ out_be64(&spu->priv1->int_stat_class0_RW, stat);
+ return stat ? IRQ_HANDLED : IRQ_NONE;
+}
+
+static irqreturn_t
+spu_irq_class_1(int irq, void *data, struct pt_regs *regs)
+{
+ struct spu *spu;
+ unsigned long stat, dar;
+
+ spu = data;
+ stat = in_be64(&spu->priv1->int_stat_class1_RW);
+ dar = in_be64(&spu->priv1->mfc_dar_RW);
+
+ if (stat & 1) /* segment fault */
+ __spu_trap_data_seg(spu, dar);
+
+ if (stat & 2) { /* mapping fault */
+ __spu_trap_data_map(spu, dar);
+ }
+
+ if (stat & 4) /* ls compare & suspend on get */
+ ;
+
+ if (stat & 8) /* ls compare & suspend on put */
+ ;
+
+ out_be64(&spu->priv1->int_stat_class1_RW, stat);
+ return stat ? IRQ_HANDLED : IRQ_NONE;
+}
+
+static irqreturn_t
+spu_irq_class_2(int irq, void *data, struct pt_regs *regs)
+{
+ struct spu *spu;
+ unsigned long stat;
+
+ spu = data;
+ stat = in_be64(&spu->priv1->int_stat_class2_RW);
+
+ if (stat & 1) /* mailbox */
+ __spu_trap_mailbox(spu);
+
+ if (stat & 2) /* SPU stop-and-signal */
+ __spu_trap_stop(spu);
+
+ if (stat & 4) /* SPU halted */
+ __spu_trap_halt(spu);
+
+ if (stat & 8) /* DMA tag group complete */
+ __spu_trap_tag_group(spu);
+
+ out_be64(&spu->priv1->int_stat_class2_RW, stat);
+ return stat ? IRQ_HANDLED : IRQ_NONE;
+}
+
+static int
+spu_request_irqs(struct spu *spu)
+{
+ int ret;
+ int irq_base;
+
+ irq_base = IIC_NODE_STRIDE * spu->node + IIC_SPE_OFFSET;
+
+ snprintf(spu->irq_c0, sizeof (spu->irq_c0), "spe%02d.0", spu->number);
+ ret = request_irq(irq_base + spu->isrc,
+ spu_irq_class_0, 0, spu->irq_c0, spu);
+ if (ret)
+ goto out;
+ out_be64(&spu->priv1->int_mask_class0_RW, 0x7);
+
+ snprintf(spu->irq_c1, sizeof (spu->irq_c1), "spe%02d.1", spu->number);
+ ret = request_irq(irq_base + IIC_CLASS_STRIDE + spu->isrc,
+ spu_irq_class_1, 0, spu->irq_c1, spu);
+ if (ret)
+ goto out1;
+ out_be64(&spu->priv1->int_mask_class1_RW, 0x3);
+
+ snprintf(spu->irq_c2, sizeof (spu->irq_c2), "spe%02d.2", spu->number);
+ ret = request_irq(irq_base + 2*IIC_CLASS_STRIDE + spu->isrc,
+ spu_irq_class_2, 0, spu->irq_c2, spu);
+ if (ret)
+ goto out2;
+ out_be64(&spu->priv1->int_mask_class2_RW, 0xf);
+ goto out;
+
+out2:
+ free_irq(irq_base + IIC_CLASS_STRIDE + spu->isrc, spu);
+out1:
+ free_irq(irq_base + spu->isrc, spu);
+out:
+ return ret;
+}
+
+static void
+spu_free_irqs(struct spu *spu)
+{
+ int irq_base;
+
+ irq_base = IIC_NODE_STRIDE * spu->node + IIC_SPE_OFFSET;
+
+ free_irq(irq_base + spu->isrc, spu);
+ free_irq(irq_base + IIC_CLASS_STRIDE + spu->isrc, spu);
+ free_irq(irq_base + 2*IIC_CLASS_STRIDE + spu->isrc, spu);
+}
+
+static LIST_HEAD(spu_list);
+static DECLARE_MUTEX(spu_mutex);
+
+struct spu *spu_alloc(void)
+{
+ struct spu *spu;
+
+ down(&spu_mutex);
+ if (!list_empty(&spu_list)) {
+ spu = list_entry(spu_list.next, struct spu, list);
+ list_del_init(&spu->list);
+ printk("Got SPU %x\n", spu->isrc);
+ } else {
+ printk("No SPU left\n");
+ spu = NULL;
+ }
+ up(&spu_mutex);
+ return spu;
+}
+EXPORT_SYMBOL(spu_alloc);
+
+void spu_free(struct spu *spu)
+{
+ down(&spu_mutex);
+ list_add_tail(&spu->list, &spu_list);
+ up(&spu_mutex);
+}
+EXPORT_SYMBOL(spu_free);
+
+extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap); //XXX
+static int spu_handle_pte_fault(struct spu *spu)
+{
+ struct spu_problem __iomem *prob;
+ struct spu_priv1 __iomem *priv1;
+ struct spu_priv2 __iomem *priv2;
+ unsigned long ea, access, is_write;
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+ int ret;
+
+ printk("%s\n", __FUNCTION__);
+ prob = spu->problem;
+ priv1 = spu->priv1;
+ priv2 = spu->priv2;
+
+ ea = in_be64(&priv1->mfc_dar_RW);
+ access = _PAGE_PRESENT | _PAGE_USER;
+ is_write = in_be64(&priv1->mfc_dsisr_RW) & 0x02000000;
+ mm = spu->mm;
+
+ ret = hash_page(ea, access, 0x300);
+ if (ret < 0) {
+ printk("error in hash_page!\n");
+ ret = -EFAULT;
+ goto out_err;
+ }
+
+ printk("current %ld, spu %ld, ea %ld\n", current->mm->context.id, mm->context.id, ea);
+ if (!ret) {
+ printk("hash inserted, vsid %lx\n", get_vsid(current->mm->context.id, ea));
+ goto out_restart;
+ }
+
+ ret = -EFAULT;
+ if (ea >= TASK_SIZE)
+ goto out_err;
+
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, ea);
+ if (!vma)
+ goto out;
+
+ if (is_write) {
+ if (!(vma->vm_flags & VM_WRITE))
+ goto out;
+ }
+
+ ret = 0;
+/* FIXME add missing code from do_page_fault */
+ switch (handle_mm_fault(mm, vma, ea, is_write)) {
+ case VM_FAULT_MINOR:
+ printk("minor\n");
+ current->min_flt++;
+ break;
+ case VM_FAULT_MAJOR:
+ printk("major\n");
+ current->maj_flt++;
+ break;
+ case VM_FAULT_SIGBUS:
+ ret = -EFAULT;
+ break;
+ case VM_FAULT_OOM:
+ ret = -ENOMEM;
+ break;
+ default:
+ BUG();
+ }
+out:
+ up_read(&mm->mmap_sem);
+ if (ret)
+ goto out_err;
+out_restart:
+ out_be64(&priv2->mfc_control_RW, MFC_CNTL_RESTART_DMA_COMMAND);
+out_err:
+ printk("%s: returning %d\n", __FUNCTION__, ret);
+ return ret;
+}
+
+int spu_run(struct spu *spu)
+{
+ struct spu_problem __iomem *prob;
+ struct spu_priv1 __iomem *priv1;
+ struct spu_priv2 __iomem *priv2;
+ unsigned long status;
+ int count = 10;
+ int ret;
+
+ prob = spu->problem;
+ priv1 = spu->priv1;
+ priv2 = spu->priv2;
+ spu->mm = current->mm;
+ spu->task = current;
+ out_be32(&prob->spu_runcntl_RW, SPU_RUNCNTL_RUNNABLE);
+
+ do {
+ ret = wait_event_interruptible(spu->stop_wq,
+ (!((status = in_be32(&prob->spu_status_R)) & 0x1))
+ || (in_be64(&priv1->mfc_dsisr_RW) & MFC_DSISR_PTE_NOT_FOUND));
+
+ if (status & SPU_STATUS_STOPPED_BY_STOP)
+ ret = -EAGAIN;
+ else if (status & SPU_STATUS_STOPPED_BY_HALT)
+ ret = -EIO;
+ else if (in_be64(&priv1->mfc_dsisr_RW) & MFC_DSISR_PTE_NOT_FOUND)
+ ret = spu_handle_pte_fault(spu);
+
+ } while (!ret && count--);
+ out_be32(&prob->spu_runcntl_RW, SPU_RUNCNTL_STOP);
+ out_be64(&priv2->slb_invalidate_all_W, 0);
+ spu->mm = NULL;
+ spu->task = NULL;
+
+ return ret;
+}
+EXPORT_SYMBOL(spu_run);
+
+static void __iomem * __init map_spe_prop(struct device_node *n,
+ const char *name)
+{
+ struct address_prop {
+ unsigned long address;
+ unsigned int len;
+ } __attribute__((packed)) *prop;
+
+ void *p;
+ int proplen;
+
+ p = get_property(n, name, &proplen);
+ if (!p || proplen != sizeof (struct address_prop))
+ return NULL;
+
+ prop = p;
+
+ return ioremap(prop->address, prop->len);
+}
+
+static void spu_unmap(struct spu *spu)
+{
+ iounmap(spu->priv2);
+ iounmap(spu->priv1);
+ iounmap(spu->problem);
+ iounmap((u8 __iomem *)spu->local_store);
+}
+
+static int __init spu_map_device(struct spu *spu, struct device_node *spe)
+{
+ unsigned int *isrc_prop;
+ int ret;
+
+ ret = -ENODEV;
+ isrc_prop = (u32 *)get_property(spe, "isrc", NULL);
+ if (!isrc_prop)
+ goto out;
+ spu->isrc = *isrc_prop;
+
+ spu->name = get_property(spe, "name", NULL);
+ if (!spu->name)
+ goto out;
+
+ /* we use local store as ram, not io memory */
+ spu->local_store = (u8 __force *) map_spe_prop(spe, "local-store");
+ if (!spu->local_store)
+ goto out;
+
+ spu->problem= map_spe_prop(spe, "problem");
+ if (!spu->problem)
+ goto out_unmap;
+
+ spu->priv1= map_spe_prop(spe, "priv1");
+ if (!spu->priv1)
+ goto out_unmap;
+
+ spu->priv2= map_spe_prop(spe, "priv2");
+ if (!spu->priv2)
+ goto out_unmap;
+ ret = 0;
+ goto out;
+
+out_unmap:
+ spu_unmap(spu);
+out:
+ return ret;
+}
+
+static int __init find_spu_node_id(struct device_node *spe)
+{
+ unsigned int *id;
+ struct device_node *cpu;
+
+ cpu = spe->parent->parent;
+ id = (unsigned int *)get_property(cpu, "node-id", NULL);
+
+ return id ? *id : 0;
+}
+
+static int __init create_spu(struct device_node *spe)
+{
+ struct spu *spu;
+ int ret;
+ static int number;
+
+ ret = -ENOMEM;
+ spu = kmalloc(sizeof (*spu), GFP_KERNEL);
+ if (!spu)
+ goto out;
+
+ ret = spu_map_device(spu, spe);
+ if (ret)
+ goto out_free;
+
+ spu->node = find_spu_node_id(spe);
+ spu->stop_code = 0;
+ spu->slb_replace = 0;
+ spu->mm = NULL;
+
+ out_be64(&spu->priv1->mfc_sdr_RW, mfspr(SPRN_SDR1));
+ out_be64(&spu->priv1->mfc_sr1_RW, 0x33);
+
+ init_waitqueue_head(&spu->stop_wq);
+ init_waitqueue_head(&spu->mbox_wq);
+
+ down(&spu_mutex);
+ spu->number = number++;
+ ret = spu_request_irqs(spu);
+ if (ret)
+ goto out_unmap;
+
+ list_add(&spu->list, &spu_list);
+ up(&spu_mutex);
+
+ printk(KERN_DEBUG "Using SPE %s %02x %p %p %p %p %d\n",
+ spu->name, spu->isrc, spu->local_store,
+ spu->problem, spu->priv1, spu->priv2, spu->number);
+ goto out;
+
+out_unmap:
+ up(&spu_mutex);
+ spu_unmap(spu);
+out_free:
+ kfree(spu);
+out:
+ return ret;
+}
+
+static void destroy_spu(struct spu *spu)
+{
+ list_del_init(&spu->list);
+
+ spu_free_irqs(spu);
+ spu_unmap(spu);
+ kfree(spu);
+}
+
+static void cleanup_spu_base(void)
+{
+ struct spu *spu, *tmp;
+ down(&spu_mutex);
+ list_for_each_entry_safe(spu, tmp, &spu_list, list)
+ destroy_spu(spu);
+ up(&spu_mutex);
+}
+module_exit(cleanup_spu_base);
+
+static int __init init_spu_base(void)
+{
+ struct device_node *node;
+ int ret;
+
+ ret = -ENODEV;
+ for (node = of_find_node_by_type(NULL, "spc");
+ node; node = of_find_node_by_type(node, "spc")) {
+ ret = create_spu(node);
+ if (ret) {
+ printk(KERN_WARNING "%s: Error initializing %s\n",
+ __FUNCTION__, node->name);
+ cleanup_spu_base();
+ break;
+ }
+ }
+ return ret;
+}
+module_init(init_spu_base);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Arnd Bergmann <[email protected]>");
--- linux-cg.orig/arch/ppc64/mm/hash_utils.c 2005-05-13 15:15:07.870991152 -0400
+++ linux-cg/arch/ppc64/mm/hash_utils.c 2005-05-13 17:25:48.126934696 -0400
@@ -354,6 +354,7 @@ int hash_page(unsigned long ea, unsigned

return ret;
}
+EXPORT_SYMBOL_GPL(hash_page);

void flush_hash_page(unsigned long context, unsigned long ea, pte_t pte,
int local)
--- linux-cg.orig/fs/Kconfig 2005-05-13 15:15:07.872990848 -0400
+++ linux-cg/fs/Kconfig 2005-05-13 17:25:48.128934392 -0400
@@ -853,6 +853,16 @@ config HUGETLBFS
config HUGETLB_PAGE
def_bool HUGETLBFS

+config SPU_FS
+ tristate "SPU file system"
+ default m
+ depends on PPC_BPA
+ help
+ The SPU file system is used to access Synergistic Processing
+ Units on machines implementing the Broadband Processor
+ Architecture.
+
+
config RAMFS
bool
default y
--- linux-cg.orig/fs/Makefile 2005-05-13 15:15:07.874990544 -0400
+++ linux-cg/fs/Makefile 2005-05-13 17:25:48.131933936 -0400
@@ -95,3 +95,4 @@ obj-$(CONFIG_BEFS_FS) += befs/
obj-$(CONFIG_HOSTFS) += hostfs/
obj-$(CONFIG_HPPFS) += hppfs/
obj-$(CONFIG_DEBUG_FS) += debugfs/
+obj-$(CONFIG_SPU_FS) += spufs/
--- linux-cg.orig/fs/spufs/Makefile 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/fs/spufs/Makefile 2005-05-13 17:25:48.133933632 -0400
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SPU_FS) += spufs.o
+
+spufs-y += inode.o
--- linux-cg.orig/fs/spufs/inode.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/fs/spufs/inode.c 2005-05-13 17:25:48.135933328 -0400
@@ -0,0 +1,991 @@
+/*
+ * SPU file system
+ *
+ * (C) Copyright IBM Deutschland Entwicklung GmbH 2005
+ *
+ * Author: Arnd Bergmann <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/backing-dev.h>
+#include <linux/init.h>
+#include <linux/ioctl.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+
+#include <asm/io.h>
+#include <asm/semaphore.h>
+#include <asm/spu.h>
+#include <asm/uaccess.h>
+
+/* SPU context abstraction */
+struct spu_context {
+ struct spu *spu; /* pointer to a physical SPU if SPUFS_DIRECT */
+ struct rw_semaphore backing_sema; /* protects the above */
+ spinlock_t mmio_lock; /* protects mmio access */
+ long sig;
+
+ struct kref kref;
+};
+
+static struct spu_context *
+alloc_spu_context(void)
+{
+ struct spu_context *ctx;
+ ctx = kmalloc(sizeof *ctx, GFP_KERNEL);
+ if (!ctx)
+ goto out;
+ ctx->spu = spu_alloc();
+ if (!ctx->spu)
+ goto out_free;
+ init_rwsem(&ctx->backing_sema);
+ spin_lock_init(&ctx->mmio_lock);
+ kref_init(&ctx->kref);
+ goto out;
+out_free:
+ kfree(ctx);
+ ctx = NULL;
+out:
+ return ctx;
+}
+
+static void
+destroy_spu_context(struct kref *kref)
+{
+ struct spu_context *ctx;
+ ctx = container_of(kref, struct spu_context, kref);
+ if (ctx->spu)
+ spu_free(ctx->spu);
+ kfree(ctx);
+}
+
+static struct spu_context *
+get_spu_context(struct spu_context *ctx)
+{
+ kref_get(&ctx->kref);
+ return ctx;
+}
+
+static void
+put_spu_context(struct spu_context *ctx)
+{
+ kref_put(&ctx->kref, &destroy_spu_context);
+}
+
+/* The magic number for our file system */
+enum {
+ SPUFS_MAGIC = 0x23c9b64e,
+};
+
+/* bits in the inode flags */
+enum {
+ SPUFS_DIRECT, /* Data resides on a physical SPU */
+};
+
+struct spufs_inode_info {
+ struct spu_context *i_ctx;
+ struct inode vfs_inode;
+};
+
+static kmem_cache_t *spufs_inode_cache;
+#define SPUFS_I(inode) container_of(inode, struct spufs_inode_info, vfs_inode)
+
+/* Information about the backing dev, same as ramfs */
+
+static struct backing_dev_info spufs_backing_dev_info = {
+ .ra_pages = 0, /* No readahead */
+ .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK |
+ BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY | BDI_CAP_READ_MAP |
+ BDI_CAP_WRITE_MAP,
+};
+
+static struct address_space_operations spufs_aops = {
+ .readpage = simple_readpage,
+ .prepare_write = simple_prepare_write,
+ .commit_write = simple_commit_write,
+};
+
+/* File operations */
+
+static int
+spufs_open(struct inode *inode, struct file *file)
+{
+ struct spufs_inode_info *i = SPUFS_I(inode);
+ file->private_data = i->i_ctx;
+ return 0;
+}
+
+static ssize_t
+spufs_read(struct file *file, char __user *buffer, size_t size, loff_t *pos)
+{
+ struct spu *spu;
+ struct spu_context *ctx;
+ int ret;
+
+ ctx = file->private_data;
+ spu = ctx->spu;
+
+ down_read(&ctx->backing_sema);
+ if (spu->number & 0/*1*/) {
+ ret = generic_file_read(file, buffer, size, pos);
+ goto out;
+ }
+
+ ret = 0;
+ if (*pos < 0 || *pos >= LS_SIZE) /* size is unsigned, so check the offset first */
+ goto out;
+ size = min_t(size_t, LS_SIZE - *pos, size);
+ *pos += size;
+ ret = copy_to_user(buffer, spu->local_store + *pos - size, size);
+ ret = ret ? -EFAULT : size;
+
+out:
+ up_read(&ctx->backing_sema);
+ return ret;
+}
+
+static ssize_t
+spufs_write(struct file *file, const char __user *buffer, size_t size, loff_t *pos)
+{
+ struct spu_context *ctx = file->private_data;
+ struct spu *spu = ctx->spu;
+
+ if (spu->number & 0) //1)
+ return generic_file_write(file, buffer, size, pos);
+
+ if (*pos < 0 || *pos >= LS_SIZE) /* size is unsigned, so check the offset first */
+ return -EFBIG;
+ size = min_t(size_t, LS_SIZE - *pos, size);
+ *pos += size;
+ return copy_from_user(spu->local_store + *pos - size,
+ buffer, size) ? -EFAULT : size;
+}
+
+static int
+spufs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct spu_context *ctx = file->private_data;
+ struct spu *spu = ctx->spu;
+ unsigned long pfn;
+
+ if (spu->number & 0) //1)
+ return generic_file_mmap(file, vma);
+
+ vma->vm_flags |= VM_RESERVED;
+ pfn = __pa(spu->local_store) >> PAGE_SHIFT;
+ /*
+ * This will work for actual SPUs, but not for vmalloc memory:
+ */
+ if (remap_pfn_range(vma, vma->vm_start, pfn,
+ vma->vm_end-vma->vm_start, vma->vm_page_prot))
+ return -EAGAIN;
+ /**/
+ return 0;
+}
+
+static struct file_operations spufs_mem_fops = {
+ .open = spufs_open,
+ .read = spufs_read,
+ .write = spufs_write,
+ .mmap = spufs_mmap,
+ .llseek = generic_file_llseek,
+};
+
+/* generic open function for all pipe-like files */
+static int spufs_pipe_open(struct inode *inode, struct file *file)
+{
+ struct spufs_inode_info *i = SPUFS_I(inode);
+ file->private_data = i->i_ctx;
+
+ return nonseekable_open(inode, file);
+}
+
+static ssize_t spufs_mbox_read(struct file *file, char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct spu_context *ctx;
+ struct spu_problem __iomem *prob;
+ u32 mbox_stat;
+ u32 mbox_data;
+
+ if (len < 4)
+ return -EINVAL;
+
+ ctx = file->private_data;
+ prob = ctx->spu->problem;
+ mbox_stat = in_be32(&prob->mb_stat_R);
+ if (!(mbox_stat & 0x0000ff))
+ return -EAGAIN;
+
+ mbox_data = in_be32(&prob->pu_mb_R);
+
+ if (copy_to_user(buf, &mbox_data, sizeof mbox_data))
+ return -EFAULT;
+
+ return 4;
+}
+
+static struct file_operations spufs_mbox_fops = {
+ .open = spufs_pipe_open,
+ .read = spufs_mbox_read,
+};
+
+static ssize_t spufs_ibox_read(struct file *file, char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct spu_context *ctx;
+ struct spu_problem __iomem *prob;
+ struct spu_priv2 __iomem *priv2;
+ u32 mbox_stat;
+ u32 ibox_data;
+ ssize_t ret;
+
+ if (len < 4)
+ return -EINVAL;
+
+ ctx = file->private_data;
+ prob = ctx->spu->problem;
+ priv2 = ctx->spu->priv2;
+
+ mbox_stat = in_be32(&prob->mb_stat_R);
+ if (!(mbox_stat & 0xff0000))
+ return -EAGAIN;
+
+ ibox_data = in_be64(&priv2->puint_mb_R);
+
+ ret = 4;
+ if (copy_to_user(buf, &ibox_data, sizeof ibox_data))
+ ret = -EFAULT;
+
+ return ret;
+}
+
+static unsigned int spufs_ibox_poll(struct file *file, poll_table *wait)
+{
+ struct spu_context *ctx;
+ struct spu_problem __iomem *prob;
+ u32 mbox_stat;
+ unsigned int mask;
+
+ ctx = file->private_data;
+ prob = ctx->spu->problem;
+ mbox_stat = in_be32(&prob->mb_stat_R);
+
+ poll_wait(file, &ctx->spu->mbox_wq, wait);
+
+ mask = 0;
+ if (mbox_stat & 0xff0000)
+ mask |= POLLIN | POLLRDNORM;
+
+ return mask;
+}
+
+static struct file_operations spufs_ibox_fops = {
+ .open = spufs_pipe_open,
+ .read = spufs_ibox_read,
+ .poll = spufs_ibox_poll,
+};
+
+static ssize_t spufs_wbox_write(struct file *file, const char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct spu_context *ctx;
+ struct spu_problem __iomem *prob;
+ u32 mbox_stat;
+ u32 wbox_data;
+
+ if (len < 4)
+ return -EINVAL;
+
+ ctx = file->private_data;
+ prob = ctx->spu->problem;
+ mbox_stat = in_be32(&prob->mb_stat_R);
+ if (!(mbox_stat & 0x00ff00))
+ return -EAGAIN;
+
+ if (copy_from_user(&wbox_data, buf, sizeof wbox_data))
+ return -EFAULT;
+
+ out_be32(&prob->spu_mb_W, wbox_data);
+
+ return 4;
+}
+
+static unsigned int spufs_wbox_poll(struct file *file, poll_table *wait)
+{
+ struct spu_context *ctx;
+ struct spu_problem __iomem *prob;
+ u32 mbox_stat;
+ unsigned int mask;
+
+ ctx = file->private_data;
+ prob = ctx->spu->problem;
+ mbox_stat = in_be32(&prob->mb_stat_R);
+
+ poll_wait(file, &ctx->spu->mbox_wq, wait);
+
+ mask = 0;
+ if (mbox_stat & 0x00ff00)
+ mask = POLLOUT | POLLWRNORM;
+
+ return mask;
+}
+
+static struct file_operations spufs_wbox_fops = {
+ .open = spufs_pipe_open,
+ .write = spufs_wbox_write,
+ .poll = spufs_wbox_poll,
+};
+
+static int spufs_run_open(struct inode *inode, struct file *file)
+{
+ struct spufs_inode_info *i = SPUFS_I(inode);
+ file->private_data = i->i_ctx;
+
+ return nonseekable_open(inode, file);
+}
+
+struct spufs_run_arg {
+ u32 npc; /* inout: Next Program Counter */
+ u32 status; /* out: SPU status */
+};
+
+static long spufs_run_ioctl(struct file *file, unsigned int num,
+ unsigned long arg)
+{
+ struct spu_context *ctx;
+ struct spu_problem __iomem *prob;
+ struct spufs_run_arg data;
+ int ret;
+
+ if (num != _IOWR('s', 0, struct spufs_run_arg))
+ return -EINVAL;
+
+ if (copy_from_user(&data, (void __user *)arg, sizeof data))
+ return -EFAULT;
+
+ ctx = file->private_data;
+ prob = ctx->spu->problem;
+ out_be32(&prob->spu_npc_RW, data.npc);
+ wmb();
+
+ ret = spu_run(ctx->spu);
+/*
+ prob->spu_npc_RW = data.npc;
+ ctx->spu->mm = current->mm;
+ wmb();
+ prob->spu_runcntl_RW = SPU_RUNCNTL_RUNNABLE;
+ mb();
+
+ ret = wait_event_interruptible(ctx->spu->stop_wq,
+ prob->spu_status_R & 0x3e);
+
+ prob->spu_runcntl_RW = SPU_RUNCNTL_STOP;
+ ctx->spu->mm = NULL;
+*/
+ data.status = in_be32(&prob->spu_status_R);
+ data.npc = in_be32(&prob->spu_npc_RW);
+ if (copy_to_user((void __user *)arg, &data, sizeof data))
+ ret = -EFAULT;
+
+ return ret;
+}
+
+static struct file_operations spufs_run_fops = {
+ .open = spufs_run_open,
+ .unlocked_ioctl = spufs_run_ioctl,
+ .compat_ioctl = spufs_run_ioctl,
+};
+
+
+/**** spufs attributes
+ *
+ * Attributes in spufs behave similarly to those in sysfs:
+ *
+ * Writing to an attribute immediately sets a value; an open file can be
+ * written to multiple times.
+ *
+ * Reading from an attribute creates a buffer from the value that might get
+ * read with multiple read calls. When the attribute has been read completely,
+ * no further read calls are possible until the file is opened again.
+ *
+ * All spufs attributes contain a text representation of a numeric value that
+ * is accessed with the get() and set() functions.
+ *
+ * Perhaps these file operations could be put in debugfs or libfs instead;
+ * they are not really SPU specific.
+ */
+
+struct spufs_attr {
+ long (*get)(struct spu_context *);
+ void (*set)(struct spu_context *, long);
+ struct spu_context *ctx;
+ char get_buf[24]; /* enough to store a long and "\n\0" */
+ char set_buf[24];
+ struct semaphore sem; /* protects access to these buffers */
+};
+
+/* spufs_attr_open is called by an actual attribute open file operation
+ * to set the attribute specific access operations. */
+static int spufs_attr_open(struct inode *inode, struct file *file,
+ long (*get)(struct spu_context *),
+ void (*set)(struct spu_context *, long))
+{
+ struct spufs_attr *attr;
+
+ /* reading/writing needs the respective get/set operation */
+ if (((file->f_mode & FMODE_READ) && !get) ||
+ ((file->f_mode & FMODE_WRITE) && !set))
+ return -EACCES;
+
+ attr = kmalloc(sizeof *attr, GFP_KERNEL);
+ if (!attr)
+ return -ENOMEM;
+
+ attr->get = get;
+ attr->set = set;
+ attr->ctx = SPUFS_I(inode)->i_ctx;
+ init_MUTEX(&attr->sem);
+
+ file->private_data = attr;
+
+ return nonseekable_open(inode, file);
+}
+
+static int spufs_attr_close(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+ return 0;
+}
+
+/* read from the buffer that is filled with the get function */
+static ssize_t spufs_attr_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct spufs_attr *attr;
+ size_t size;
+ ssize_t ret;
+
+ attr = file->private_data;
+
+ down(&attr->sem);
+ if (*ppos) /* continued read */
+ size = strlen(attr->get_buf);
+ else /* first read */
+ size = scnprintf(attr->get_buf, sizeof (attr->get_buf),
+ "%ld\n", attr->get(attr->ctx));
+
+ ret = simple_read_from_buffer(buf, len, ppos, attr->get_buf, size);
+ up(&attr->sem);
+ return ret;
+}
+
+/* interpret the buffer as a number to call the set function with */
+static ssize_t spufs_attr_write(struct file *file, const char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct spufs_attr *attr;
+ long val;
+ size_t size;
+ ssize_t ret;
+
+
+ attr = file->private_data;
+
+ down(&attr->sem);
+ ret = -EFAULT;
+ size = min(sizeof (attr->set_buf) - 1, len);
+ if (copy_from_user(attr->set_buf, buf, size))
+ goto out;
+
+ ret = len; /* claim we got the whole input */
+ attr->set_buf[size] = '\0';
+ val = simple_strtol(attr->set_buf, NULL, 0);
+ attr->set(attr->ctx, val);
+out:
+ up(&attr->sem);
+ return ret;
+}
+
+#define spufs_attribute(name) \
+static int name ## _open(struct inode *inode, struct file *file) \
+{ \
+ return spufs_attr_open(inode, file, &name ## _get, &name ## _set); \
+} \
+static struct file_operations name = { \
+ .open = name ## _open, \
+ .release = spufs_attr_close, \
+ .read = spufs_attr_read, \
+ .write = spufs_attr_write, \
+};
+
+
+static void spufs_signal1_type_set(struct spu_context *ctx, long val)
+{
+ ctx->sig = val;
+}
+
+static long spufs_signal1_type_get(struct spu_context *ctx)
+{
+ return ctx->sig;
+}
+
+spufs_attribute(spufs_signal1_type);
+
+static void spufs_class0_stat_set(struct spu_context *ctx, long val)
+{
+ out_be64(&ctx->spu->priv1->int_stat_class0_RW, val);
+}
+
+static long spufs_class0_stat_get(struct spu_context *ctx)
+{
+ return in_be64(&ctx->spu->priv1->int_stat_class0_RW);
+}
+
+spufs_attribute(spufs_class0_stat);
+
+static void spufs_class1_stat_set(struct spu_context *ctx, long val)
+{
+ out_be64(&ctx->spu->priv1->int_stat_class1_RW, val);
+}
+
+static long spufs_class1_stat_get(struct spu_context *ctx)
+{
+ return in_be64(&ctx->spu->priv1->int_stat_class1_RW);
+}
+
+spufs_attribute(spufs_class1_stat);
+
+static void spufs_class2_stat_set(struct spu_context *ctx, long val)
+{
+ out_be64(&ctx->spu->priv1->int_stat_class2_RW, val);
+}
+
+static long spufs_class2_stat_get(struct spu_context *ctx)
+{
+ return in_be64(&ctx->spu->priv1->int_stat_class2_RW);
+}
+
+spufs_attribute(spufs_class2_stat);
+
+static void spufs_class0_mask_set(struct spu_context *ctx, long val)
+{
+ out_be64(&ctx->spu->priv1->int_mask_class0_RW, val);
+}
+
+static long spufs_class0_mask_get(struct spu_context *ctx)
+{
+ return in_be64(&ctx->spu->priv1->int_mask_class0_RW);
+}
+
+spufs_attribute(spufs_class0_mask);
+
+static void spufs_class1_mask_set(struct spu_context *ctx, long val)
+{
+ out_be64(&ctx->spu->priv1->int_mask_class1_RW, val);
+}
+
+static long spufs_class1_mask_get(struct spu_context *ctx)
+{
+ return in_be64(&ctx->spu->priv1->int_mask_class1_RW);
+}
+
+spufs_attribute(spufs_class1_mask);
+
+static void spufs_class2_mask_set(struct spu_context *ctx, long val)
+{
+ out_be64(&ctx->spu->priv1->int_mask_class2_RW, val);
+}
+
+static long spufs_class2_mask_get(struct spu_context *ctx)
+{
+ return in_be64(&ctx->spu->priv1->int_mask_class2_RW);
+}
+
+spufs_attribute(spufs_class2_mask);
+
+#define priv1_attr(name) \
+static void spufs_ ## name ## _set(struct spu_context *ctx, long val) \
+{ out_be64(&ctx->spu->priv1->name, val); } \
+static long spufs_ ## name ## _get(struct spu_context *ctx) \
+{ return in_be64(&ctx->spu->priv1->name); } \
+spufs_attribute(spufs_ ## name)
+
+#define priv2_attr(name) \
+static void spufs_ ## name ## _set(struct spu_context *ctx, long val) \
+{ out_be64(&ctx->spu->priv2->name, val); } \
+static long spufs_ ## name ## _get(struct spu_context *ctx) \
+{ return in_be64(&ctx->spu->priv2->name); } \
+spufs_attribute(spufs_ ## name)
+
+priv1_attr(mfc_sr1_RW);
+priv1_attr(mfc_fir_R);
+priv1_attr(mfc_fir_status_or_W);
+priv1_attr(mfc_fir_status_and_W);
+priv1_attr(mfc_fir_mask_R);
+priv1_attr(mfc_fir_mask_or_W);
+priv1_attr(mfc_fir_mask_and_W);
+priv1_attr(mfc_fir_chkstp_enable_RW);
+priv1_attr(mfc_cer_R);
+priv1_attr(mfc_dsisr_RW);
+priv1_attr(mfc_dsir_R);
+priv1_attr(mfc_sdr_RW);
+priv2_attr(mfc_control_RW);
+
+/* Inode operations */
+
+static struct inode *
+spufs_alloc_inode(struct super_block *sb)
+{
+ struct spufs_inode_info *ei;
+
+ ei = kmem_cache_alloc(spufs_inode_cache, SLAB_KERNEL);
+ if (!ei)
+ return NULL;
+ return &ei->vfs_inode;
+}
+
+static void
+spufs_destroy_inode(struct inode *inode)
+{
+ kmem_cache_free(spufs_inode_cache, SPUFS_I(inode));
+}
+
+static void
+spufs_init_once(void *p, kmem_cache_t * cachep, unsigned long flags)
+{
+ struct spufs_inode_info *ei = p;
+
+ if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) ==
+ SLAB_CTOR_CONSTRUCTOR) {
+ inode_init_once(&ei->vfs_inode);
+ }
+}
+
+static struct inode *
+spufs_new_inode(struct super_block *sb, int mode)
+{
+ struct inode *inode;
+
+ inode = new_inode(sb);
+ if (!inode)
+ goto out;
+
+ inode->i_mode = mode;
+ inode->i_uid = current->fsuid;
+ inode->i_gid = current->fsgid;
+ inode->i_blksize = PAGE_CACHE_SIZE;
+ inode->i_blocks = 0;
+ inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+out:
+ return inode;
+}
+
+static int
+spufs_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ struct inode *inode = dentry->d_inode;
+
+/* dump_stack();
+ printk("ia_size %lld, i_size:%lld\n", attr->ia_size, inode->i_size);
+*/
+ if (attr->ia_size != inode->i_size)
+ return -EINVAL;
+ return inode_setattr(inode, attr);
+}
+
+/*
+static int
+spufs_create(struct inode *dir, struct dentry *dentry,
+ int mode, struct nameidata *nd)
+{
+ static struct inode_operations iops = {
+ .getattr = simple_getattr,
+ .setattr = spufs_setattr,
+ };
+
+
+ struct inode *inode;
+ int ret;
+
+ ret = -ENOSPC;
+ inode = spufs_new_inode(dir->i_sb, S_IFREG | mode);
+ if (!inode)
+ goto out;
+ inode->i_op = &iops;
+ inode->i_fop = &spufs_mem_fops;
+ inode->i_size = LS_SIZE;
+ SPUFS_I(inode)->i_spu = spu_alloc();
+ if (!SPUFS_I(inode)->i_spu)
+ goto out_iput;
+ inode->i_mapping->a_ops = &spufs_aops;
+ inode->i_mapping->backing_dev_info = &spufs_backing_dev_info;
+ d_instantiate(dentry, inode);
+ dget(dentry);
+ return 0;
+out_iput:
+ iput(inode);
+out:
+ return ret;
+}
+*/
+
+static void
+spufs_delete_inode(struct inode *inode)
+{
+ if (SPUFS_I(inode)->i_ctx)
+ put_spu_context(SPUFS_I(inode)->i_ctx);
+ clear_inode(inode);
+}
+
+static struct tree_descr spufs_dir_contents[] = {
+ { "mem", &spufs_mem_fops, 0644, },
+ { "run", &spufs_run_fops, 0400, },
+ { "mbox", &spufs_mbox_fops, 0400, },
+ { "ibox", &spufs_ibox_fops, 0400, },
+ { "wbox", &spufs_wbox_fops, 0200, },
+ { "signal1_type", &spufs_signal1_type, 0600, },
+ { "signal2_type", &spufs_signal1_type, 0600, },
+
+#if 1 /* debugging only */
+ { "class0_mask", &spufs_class0_mask, 0600, },
+ { "class1_mask", &spufs_class1_mask, 0600, },
+ { "class2_mask", &spufs_class2_mask, 0600, },
+ { "class0_stat", &spufs_class0_stat, 0600, },
+ { "class1_stat", &spufs_class1_stat, 0600, },
+ { "class2_stat", &spufs_class2_stat, 0600, },
+ { "sr1", &spufs_mfc_sr1_RW, 0600, },
+ { "fir", &spufs_mfc_fir_R, 0400, },
+ { "fir_status_or", &spufs_mfc_fir_status_or_W, 0200, },
+ { "fir_status_and", &spufs_mfc_fir_status_and_W, 0200, },
+ { "fir_mask", &spufs_mfc_fir_mask_R, 0400, },
+ { "fir_mask_or", &spufs_mfc_fir_mask_or_W, 0200, },
+ { "fir_mask_and", &spufs_mfc_fir_mask_and_W, 0200, },
+ { "fir_chkstp", &spufs_mfc_fir_chkstp_enable_RW, 0600, },
+ { "cer", &spufs_mfc_cer_R, 0400, },
+ { "dsisr", &spufs_mfc_dsisr_RW, 0600, },
+ { "dsir", &spufs_mfc_dsir_R, 0400, },
+ { "cntl", &spufs_mfc_control_RW, 0600, },
+ { "sdr", &spufs_mfc_sdr_RW, 0600, },
+#endif
+ {},
+};
+
+static int
+spufs_fill_dir(struct dentry *dir, struct tree_descr *files,
+ int mode, struct spu_context *ctx)
+{
+ struct inode *inode;
+ struct dentry *dentry;
+ int ret;
+
+ static struct inode_operations iops = {
+ .getattr = simple_getattr,
+ .setattr = spufs_setattr,
+ };
+
+ ret = -ENOSPC;
+ while (files->name && files->name[0]) {
+ dentry = d_alloc_name(dir, files->name);
+ if (!dentry)
+ goto out;
+ inode = spufs_new_inode(dir->d_sb,
+ S_IFREG | (files->mode & mode));
+ if (!inode)
+ goto out;
+ inode->i_op = &iops;
+ inode->i_fop = files->ops;
+ inode->i_mapping->a_ops = &spufs_aops;
+ inode->i_mapping->backing_dev_info = &spufs_backing_dev_info;
+ SPUFS_I(inode)->i_ctx = get_spu_context(ctx);
+
+ d_add(dentry, inode);
+ files++;
+ }
+ return 0;
+out:
+ // FIXME: remove all files that are left
+ return ret;
+}
+
+static int
+spufs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+ int ret;
+ struct inode *inode;
+ struct spu_context *ctx;
+
+ ret = -ENOSPC;
+ inode = spufs_new_inode(dir->i_sb, mode | S_IFDIR);
+ if (!inode)
+ goto out;
+
+ if (dir->i_mode & S_ISGID) {
+ inode->i_gid = dir->i_gid;
+ inode->i_mode |= S_ISGID;
+ }
+ ctx = alloc_spu_context();
+ SPUFS_I(inode)->i_ctx = ctx;
+ if (!ctx)
+ goto out_iput;
+
+ inode->i_op = &simple_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+ ret = spufs_fill_dir(dentry, spufs_dir_contents, mode, ctx);
+ if (ret)
+ goto out_free_ctx;
+
+ d_instantiate(dentry, inode);
+ dget(dentry);
+ dir->i_nlink++;
+ goto out;
+
+out_free_ctx:
+ put_spu_context(ctx);
+out_iput:
+ iput(inode);
+out:
+ return ret;
+}
+
+/* This looks really wrong! */
+static int spufs_rmdir(struct inode *root, struct dentry *dir_dentry)
+{
+ struct dentry *dentry;
+ int err;
+
+ spin_lock(&dcache_lock);
+
+ /* check if any entry is used */
+ err = -EBUSY;
+ list_for_each_entry(dentry, &dir_dentry->d_subdirs, d_child) {
+ if (d_unhashed(dentry) || !dentry->d_inode)
+ continue;
+ if (atomic_read(&dentry->d_count) != 1)
+ goto out;
+ }
+ /* remove all entries */
+ err = 0;
+ list_for_each_entry(dentry, &dir_dentry->d_subdirs, d_child) {
+ if (d_unhashed(dentry) || !dentry->d_inode)
+ continue;
+ atomic_dec(&dentry->d_count);
+ __d_drop(dentry);
+ }
+out:
+ spin_unlock(&dcache_lock);
+ if (!err) {
+ shrink_dcache_parent(dir_dentry);
+ err = simple_rmdir(root, dir_dentry);
+ }
+ return err;
+}
+
+/* File system initialization */
+
+static int
+spufs_create_root(struct super_block *sb) {
+ static struct inode_operations spufs_dir_inode_operations = {
+ .lookup = simple_lookup,
+ .mkdir = spufs_mkdir,
+ .rmdir = spufs_rmdir,
+// .rename = simple_rename, // XXX maybe
+ };
+
+ struct inode *inode;
+ int ret;
+
+ ret = -ENOMEM;
+ inode = spufs_new_inode(sb, S_IFDIR | 0777);
+
+ if (inode) {
+ inode->i_op = &spufs_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+ SPUFS_I(inode)->i_ctx = NULL;
+ sb->s_root = d_alloc_root(inode);
+ if (!sb->s_root)
+ iput(inode);
+ else
+ ret = 0;
+ }
+ return ret;
+}
+
+static int
+spufs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ static struct super_operations s_ops = {
+ .alloc_inode = spufs_alloc_inode,
+ .destroy_inode = spufs_destroy_inode,
+ .statfs = simple_statfs,
+ .delete_inode = spufs_delete_inode,
+ .drop_inode = generic_delete_inode,
+ };
+
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = SPUFS_MAGIC;
+ sb->s_op = &s_ops;
+
+ return spufs_create_root(sb);
+}
+
+static struct super_block *
+spufs_get_sb(struct file_system_type *fstype, int flags,
+ const char *name, void *data)
+{
+ return get_sb_single(fstype, flags, data, spufs_fill_super);
+}
+
+static struct file_system_type spufs_type = {
+ .owner = THIS_MODULE,
+ .name = "spufs",
+ .get_sb = spufs_get_sb,
+ .kill_sb = kill_litter_super,
+};
+
+static int spufs_init(void)
+{
+ int ret;
+ ret = -ENOMEM;
+ spufs_inode_cache = kmem_cache_create("spufs_inode_cache",
+ sizeof(struct spufs_inode_info), 0,
+ SLAB_HWCACHE_ALIGN, spufs_init_once, NULL);
+
+ if (!spufs_inode_cache)
+ goto out;
+ ret = register_filesystem(&spufs_type);
+ if (ret)
+ kmem_cache_destroy(spufs_inode_cache);
+out:
+ return ret;
+}
+module_init(spufs_init);
+
+static void spufs_exit(void)
+{
+ unregister_filesystem(&spufs_type);
+ kmem_cache_destroy(spufs_inode_cache);
+}
+module_exit(spufs_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Arnd Bergmann <[email protected]>");
+
--- linux-cg.orig/include/asm-ppc64/spu.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/include/asm-ppc64/spu.h 2005-05-13 17:25:48.137933024 -0400
@@ -0,0 +1,463 @@
+/*
+ * SPU core / file system interface and HW structures
+ *
+ * (C) Copyright IBM Deutschland Entwicklung GmbH 2005
+ *
+ * Author: Arnd Bergmann <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _SPU_H
+#define _SPU_H
+
+#define LS_ORDER (6) /* 256 kB with 4 kB pages */
+
+#define LS_SIZE (PAGE_SIZE << LS_ORDER)
+
+struct spu {
+ char *name;
+ u8 *local_store;
+ struct spu_problem __iomem *problem;
+ struct spu_priv1 __iomem *priv1;
+ struct spu_priv2 __iomem *priv2;
+ struct list_head list;
+ int number;
+ u32 isrc;
+ u32 node;
+ struct kref kref;
+ size_t ls_size;
+ unsigned int slb_replace;
+ struct mm_struct *mm;
+ struct task_struct *task;
+
+ u32 stop_code;
+ wait_queue_head_t stop_wq;
+ wait_queue_head_t mbox_wq;
+
+ char irq_c0[8];
+ char irq_c1[8];
+ char irq_c2[8];
+};
+
+struct spu *spu_alloc(void);
+void spu_free(struct spu *spu);
+int spu_run(struct spu *spu);
+
+/*
+ * This defines the Local Store, Problem Area and Privilege Area of an SPU.
+ */
+
+union MFC_TagSizeClassCmd {
+ struct {
+ u16 mfc_size;
+ u16 mfc_tag;
+ u8 pad;
+ u8 mfc_rclassid;
+ u16 mfc_cmd;
+ } u;
+ struct {
+ u32 mfc_size_tag32;
+ u32 mfc_class_cmd32;
+ } by32;
+ u64 all64;
+};
+
+struct MFC_cq_sr {
+ u64 mfc_cq_data0_RW;
+ u64 mfc_cq_data1_RW;
+ u64 mfc_cq_data2_RW;
+ u64 mfc_cq_data3_RW;
+};
+
+struct spu_problem {
+ u8 pad_0x0000_0x3000[0x3000 - 0x0000]; /* 0x0000 */
+
+ /* DMA Area */
+ u8 pad_0x3000_0x3004[0x4]; /* 0x3000 */
+ u32 mfc_lsa_W; /* 0x3004 */
+ u64 mfc_ea_W; /* 0x3008 */
+ union MFC_TagSizeClassCmd mfc_union_W; /* 0x3010 */
+ u8 pad_0x3018_0x3104[0xec]; /* 0x3018 */
+ u32 dma_qstatus_R; /* 0x3104 */
+ u8 pad_0x3108_0x3204[0xfc]; /* 0x3108 */
+ u32 dma_querytype_RW; /* 0x3204 */
+ u8 pad_0x3208_0x321c[0x14]; /* 0x3208 */
+ u32 dma_querymask_RW; /* 0x321c */
+ u8 pad_0x3220_0x322c[0xc]; /* 0x3220 */
+ u32 dma_tagstatus_R; /* 0x322c */
+#define DMA_TAGSTATUS_INTR_ANY 1u
+#define DMA_TAGSTATUS_INTR_ALL 2u
+ u8 pad_0x3230_0x4000[0x4000 - 0x3230]; /* 0x3230 */
+
+ /* SPU Control Area */
+ u8 pad_0x4000_0x4004[0x4]; /* 0x4000 */
+ u32 pu_mb_R; /* 0x4004 */
+ u8 pad_0x4008_0x400c[0x4]; /* 0x4008 */
+ u32 spu_mb_W; /* 0x400c */
+ u8 pad_0x4010_0x4014[0x4]; /* 0x4010 */
+ u32 mb_stat_R; /* 0x4014 */
+ u8 pad_0x4018_0x401c[0x4]; /* 0x4018 */
+ u32 spu_runcntl_RW; /* 0x401c */
+#define SPU_RUNCNTL_STOP 0L
+#define SPU_RUNCNTL_RUNNABLE 1L
+ u8 pad_0x4020_0x4024[0x4]; /* 0x4020 */
+ u32 spu_status_R; /* 0x4024 */
+#define SPU_STATUS_STOPPED 0x0
+#define SPU_STATUS_RUNNING 0x1
+#define SPU_STATUS_STOPPED_BY_STOP 0x2
+#define SPU_STATUS_STOPPED_BY_HALT 0x4
+#define SPU_STATUS_WAITING_FOR_CHANNEL 0x8
+#define SPU_STATUS_SINGLE_STEP 0x10
+ u8 pad_0x4028_0x402c[0x4]; /* 0x4028 */
+ u32 spu_spe_R; /* 0x402c */
+ u8 pad_0x4030_0x4034[0x4]; /* 0x4030 */
+ u32 spu_npc_RW; /* 0x4034 */
+ u8 pad_0x4038_0x14000[0x14000 - 0x4038]; /* 0x4038 */
+
+ /* Signal Notification Area */
+ u8 pad_0x14000_0x1400c[0xc]; /* 0x14000 */
+ u32 signal_notify1; /* 0x1400c */
+ u8 pad_0x14010_0x1c00c[0x7ffc]; /* 0x14010 */
+ u32 signal_notify2; /* 0x1c00c */
+} __attribute__ ((aligned(0x20000)));
+
+/* SPU Privilege 2 State Area */
+struct spu_priv2 {
+ /* MFC Registers */
+ u8 pad_0x0000_0x1100[0x1100 - 0x0000]; /* 0x0000 */
+
+ /* SLB Management Registers */
+ u8 pad_0x1100_0x1108[0x8]; /* 0x1100 */
+ u64 slb_index_W; /* 0x1108 */
+#define SLB_INDEX_MASK 0x7L
+ u64 slb_esid_RW; /* 0x1110 */
+ u64 slb_vsid_RW; /* 0x1118 */
+#define SLB_VSID_SUPERVISOR_STATE (0x1ull << 11)
+#define SLB_VSID_SUPERVISOR_STATE_MASK (0x1ull << 11)
+#define SLB_VSID_PROBLEM_STATE (0x1ull << 10)
+#define SLB_VSID_PROBLEM_STATE_MASK (0x1ull << 10)
+#define SLB_VSID_EXECUTE_SEGMENT (0x1ull << 9)
+#define SLB_VSID_NO_EXECUTE_SEGMENT (0x1ull << 9)
+#define SLB_VSID_EXECUTE_SEGMENT_MASK (0x1ull << 9)
+#define SLB_VSID_4K_PAGE (0x0 << 8)
+#define SLB_VSID_LARGE_PAGE (0x1ull << 8)
+#define SLB_VSID_PAGE_SIZE_MASK (0x1ull << 8)
+#define SLB_VSID_CLASS_MASK (0x1ull << 7)
+#define SLB_VSID_VIRTUAL_PAGE_SIZE_MASK (0x1ull << 6)
+ u64 slb_invalidate_entry_W; /* 0x1120 */
+ u64 slb_invalidate_all_W; /* 0x1128 */
+ u8 pad_0x1130_0x2000[0x2000 - 0x1130]; /* 0x1130 */
+
+ /* Context Save / Restore Area */
+ struct MFC_cq_sr spuq[16]; /* 0x2000 */
+ struct MFC_cq_sr puq[8]; /* 0x2200 */
+ u8 pad_0x2300_0x3000[0x3000 - 0x2300]; /* 0x2300 */
+
+ /* MFC Control */
+ u64 mfc_control_RW; /* 0x3000 */
+#define MFC_CNTL_RESUME_DMA_QUEUE (0ull << 0)
+#define MFC_CNTL_SUSPEND_DMA_QUEUE (1ull << 0)
+#define MFC_CNTL_SUSPEND_DMA_QUEUE_MASK (1ull << 0)
+#define MFC_CNTL_NORMAL_DMA_QUEUE_OPERATION (0ull << 8)
+#define MFC_CNTL_SUSPEND_IN_PROGRESS (1ull << 8)
+#define MFC_CNTL_SUSPEND_COMPLETE (3ull << 8)
+#define MFC_CNTL_SUSPEND_DMA_STATUS_MASK (3ull << 8)
+#define MFC_CNTL_DMA_QUEUES_EMPTY (1ull << 14)
+#define MFC_CNTL_DMA_QUEUES_EMPTY_MASK (1ull << 14)
+#define MFC_CNTL_PURGE_DMA_REQUEST (1ull << 15)
+#define MFC_CNTL_PURGE_DMA_IN_PROGRESS (1ull << 24)
+#define MFC_CNTL_PURGE_DMA_COMPLETE (3ull << 24)
+#define MFC_CNTL_PURGE_DMA_STATUS_MASK (3ull << 24)
+#define MFC_CNTL_RESTART_DMA_COMMAND (1ull << 32)
+#define MFC_CNTL_DMA_COMMAND_REISSUE_PENDING (1ull << 32)
+#define MFC_CNTL_DMA_COMMAND_REISSUE_STATUS_MASK (1ull << 32)
+#define MFC_CNTL_MFC_PRIVILEGE_STATE (2ull << 33)
+#define MFC_CNTL_MFC_PROBLEM_STATE (3ull << 33)
+#define MFC_CNTL_MFC_KEY_PROTECTION_STATE_MASK (3ull << 33)
+#define MFC_CNTL_DECREMENTER_HALTED (1ull << 35)
+#define MFC_CNTL_DECREMENTER_RUNNING (1ull << 40)
+#define MFC_CNTL_DECREMENTER_STATUS_MASK (1ull << 40)
+ u8 pad_0x3008_0x4000[0x4000 - 0x3008]; /* 0x3008 */
+
+ /* Interrupt Mailbox */
+ u64 puint_mb_R; /* 0x4000 */
+ u8 pad_0x4008_0x4040[0x4040 - 0x4008]; /* 0x4008 */
+
+ /* SPU Control */
+ u64 spu_privcntl_RW; /* 0x4040 */
+#define SPU_PRIVCNTL_MODE_NORMAL (0x0ull << 0)
+#define SPU_PRIVCNTL_MODE_SINGLE_STEP (0x1ull << 0)
+#define SPU_PRIVCNTL_MODE_MASK (0x1ull << 0)
+#define SPU_PRIVCNTL_NO_ATTENTION_EVENT (0x0ull << 1)
+#define SPU_PRIVCNTL_ATTENTION_EVENT (0x1ull << 1)
+#define SPU_PRIVCNTL_ATTENTION_EVENT_MASK (0x1ull << 1)
+#define SPU_PRIVCNT_LOAD_REQUEST_NORMAL (0x0ull << 2)
+#define SPU_PRIVCNT_LOAD_REQUEST_ENABLE_MASK (0x1ull << 2)
+ u8 pad_0x4048_0x4058[0x10]; /* 0x4048 */
+ u64 spu_lslr_RW; /* 0x4058 */
+ u64 spu_chnlcntptr_RW; /* 0x4060 */
+ u64 spu_chnlcnt_RW; /* 0x4068 */
+ u64 spu_chnldata_RW; /* 0x4070 */
+ u64 spu_cfg_RW; /* 0x4078 */
+ u8 pad_0x4080_0x5000[0x5000 - 0x4080]; /* 0x4080 */
+
+ /* PV2_ImplRegs: Implementation-specific privileged-state 2 regs */
+ u64 spu_pm_trace_tag_status_RW; /* 0x5000 */
+ u64 spu_tag_status_query_RW; /* 0x5008 */
+#define TAG_STATUS_QUERY_CONDITION_BITS (0x3ull << 32)
+#define TAG_STATUS_QUERY_MASK_BITS (0xffffffffull)
+ u64 spu_cmd_buf1_RW; /* 0x5010 */
+#define SPU_COMMAND_BUFFER_1_LSA_BITS (0x7ffffull << 32)
+#define SPU_COMMAND_BUFFER_1_EAH_BITS (0xffffffffull)
+ u64 spu_cmd_buf2_RW; /* 0x5018 */
+#define SPU_COMMAND_BUFFER_2_EAL_BITS ((0xffffffffull) << 32)
+#define SPU_COMMAND_BUFFER_2_TS_BITS (0xffffull << 16)
+#define SPU_COMMAND_BUFFER_2_TAG_BITS (0x3full)
+ u64 spu_atomic_status_RW; /* 0x5020 */
+} __attribute__ ((aligned(0x20000)));
+
+/* SPU Privilege 1 State Area */
+struct spu_priv1 {
+ /* Control and Configuration Area */
+ u64 mfc_sr1_RW; /* 0x000 */
+#define MFC_STATE1_LOCAL_STORAGE_DECODE_MASK 0x01ull
+#define MFC_STATE1_BUS_TLBIE_MASK 0x02ull
+#define MFC_STATE1_REAL_MODE_OFFSET_ENABLE_MASK 0x04ull
+#define MFC_STATE1_PROBLEM_STATE_MASK 0x08ull
+#define MFC_STATE1_RELOCATE_MASK 0x10ull
+#define MFC_STATE1_MASTER_RUN_CONTROL_MASK 0x20ull
+ u64 mfc_lpid_RW; /* 0x008 */
+ u64 spu_idr_RW; /* 0x010 */
+ u64 mfc_vr_RO; /* 0x018 */
+#define MFC_VERSION_BITS (0xffff << 16)
+#define MFC_REVISION_BITS (0xffff)
+#define MFC_GET_VERSION_BITS(vr) (((vr) & MFC_VERSION_BITS) >> 16)
+#define MFC_GET_REVISION_BITS(vr) ((vr) & MFC_REVISION_BITS)
+ u64 spu_vr_RO; /* 0x020 */
+#define SPU_VERSION_BITS (0xffff << 16)
+#define SPU_REVISION_BITS (0xffff)
+#define SPU_GET_VERSION_BITS(vr) (((vr) & SPU_VERSION_BITS) >> 16)
+#define SPU_GET_REVISION_BITS(vr) ((vr) & SPU_REVISION_BITS)
+ u8 pad_0x28_0x100[0x100 - 0x28]; /* 0x28 */
+
+
+ /* Interrupt Area */
+ u64 int_mask_class0_RW; /* 0x100 */
+#define CLASS0_ENABLE_DMA_ALIGNMENT_INTR 0x1L
+#define CLASS0_ENABLE_INVALID_DMA_COMMAND_INTR 0x2L
+#define CLASS0_ENABLE_SPU_ERROR_INTR 0x4L
+#define CLASS0_ENABLE_MFC_FIR_INTR 0x8L
+ u64 int_mask_class1_RW; /* 0x108 */
+#define CLASS1_ENABLE_SEGMENT_FAULT_INTR 0x1L
+#define CLASS1_ENABLE_STORAGE_FAULT_INTR 0x2L
+#define CLASS1_ENABLE_LS_COMPARE_SUSPEND_ON_GET_INTR 0x4L
+#define CLASS1_ENABLE_LS_COMPARE_SUSPEND_ON_PUT_INTR 0x8L
+ u64 int_mask_class2_RW; /* 0x110 */
+#define CLASS2_ENABLE_MAILBOX_INTR 0x1L
+#define CLASS2_ENABLE_SPU_STOP_INTR 0x2L
+#define CLASS2_ENABLE_SPU_HALT_INTR 0x4L
+#define CLASS2_ENABLE_SPU_DMA_TAG_GROUP_COMPLETE_INTR 0x8L
+ u8 pad_0x118_0x140[0x28]; /* 0x118 */
+ u64 int_stat_class0_RW; /* 0x140 */
+ u64 int_stat_class1_RW; /* 0x148 */
+ u64 int_stat_class2_RW; /* 0x150 */
+ u8 pad_0x158_0x180[0x28]; /* 0x158 */
+ u64 int_route_RW; /* 0x180 */
+
+ /* Interrupt Routing */
+ u8 pad_0x188_0x200[0x200 - 0x188]; /* 0x188 */
+
+ /* Atomic Unit Control Area */
+ u64 mfc_atomic_flush_RW; /* 0x200 */
+#define mfc_atomic_flush_enable 0x1L
+ u8 pad_0x208_0x280[0x78]; /* 0x208 */
+ u64 resource_allocation_groupID_RW; /* 0x280 */
+ u64 resource_allocation_enable_RW; /* 0x288 */
+ u8 pad_0x290_0x380[0x380 - 0x290]; /* 0x290 */
+
+ /* MFC Fault Isolation Area */
+ /* mfc_fir_R: MFC Fault Isolation Register.
+ * mfc_fir_status_or_W: MFC Fault Isolation Status OR Register.
+ * mfc_fir_status_and_W: MFC Fault Isolation Status AND Register.
+ * mfc_fir_mask_R: MFC FIR Mask Register.
+ * mfc_fir_mask_or_W: MFC FIR Mask OR Register.
+ * mfc_fir_mask_and_W: MFC FIR Mask AND Register.
+ * mfc_fir_chkstp_enable_RW: MFC FIR Checkstop Enable Register.
+ */
+ u64 mfc_fir_R; /* 0x380 */
+ u64 mfc_fir_status_or_W; /* 0x388 */
+ u64 mfc_fir_status_and_W; /* 0x390 */
+ u64 mfc_fir_mask_R; /* 0x398 */
+ u64 mfc_fir_mask_or_W; /* 0x3a0 */
+ u64 mfc_fir_mask_and_W; /* 0x3a8 */
+ u64 mfc_fir_chkstp_enable_RW; /* 0x3b0 */
+ u8 pad_0x3b8_0x3c8[0x3c8 - 0x3b8]; /* 0x3b8 */
+
+ /* SPU_Cache_ImplRegs: Implementation-dependent cache registers */
+
+ u64 smf_sbi_signal_sel; /* 0x3c8 */
+#define smf_sbi_mask_lsb 56
+#define smf_sbi_shift (63 - smf_sbi_mask_lsb)
+#define smf_sbi_mask (0x301LL << smf_sbi_shift)
+#define smf_sbi_bus0_bits (0x001LL << smf_sbi_shift)
+#define smf_sbi_bus2_bits (0x100LL << smf_sbi_shift)
+#define smf_sbi2_bus0_bits (0x201LL << smf_sbi_shift)
+#define smf_sbi2_bus2_bits (0x300LL << smf_sbi_shift)
+ u64 smf_ato_signal_sel; /* 0x3d0 */
+#define smf_ato_mask_lsb 35
+#define smf_ato_shift (63 - smf_ato_mask_lsb)
+#define smf_ato_mask (0x3LL << smf_ato_shift)
+#define smf_ato_bus0_bits (0x2LL << smf_ato_shift)
+#define smf_ato_bus2_bits (0x1LL << smf_ato_shift)
+ u8 pad_0x3d8_0x400[0x400 - 0x3d8]; /* 0x3d8 */
+
+ /* TLB Management Registers */
+ u64 mfc_sdr_RW; /* 0x400 */
+ u8 pad_0x408_0x500[0xf8]; /* 0x408 */
+ u64 tlb_index_hint_RO; /* 0x500 */
+ u64 tlb_index_W; /* 0x508 */
+ u64 tlb_vpn_RW; /* 0x510 */
+ u64 tlb_rpn_RW; /* 0x518 */
+ u8 pad_0x520_0x540[0x20]; /* 0x520 */
+ u64 tlb_invalidate_entry_W; /* 0x540 */
+ u64 tlb_invalidate_all_W; /* 0x548 */
+ u8 pad_0x550_0x580[0x580 - 0x550]; /* 0x550 */
+
+ /* SPU_MMU_ImplRegs: Implementation-dependent MMU registers */
+ u64 smm_hid; /* 0x580 */
+#define PAGE_SIZE_MASK 0xf000000000000000ull
+#define PAGE_SIZE_16MB_64KB 0x2000000000000000ull
+ u8 pad_0x588_0x600[0x600 - 0x588]; /* 0x588 */
+
+ /* MFC Status/Control Area */
+ u64 mfc_accr_RW; /* 0x600 */
+#define MFC_ACCR_EA_ACCESS_GET (1 << 0)
+#define MFC_ACCR_EA_ACCESS_PUT (1 << 1)
+#define MFC_ACCR_LS_ACCESS_GET (1 << 3)
+#define MFC_ACCR_LS_ACCESS_PUT (1 << 4)
+ u8 pad_0x608_0x610[0x8]; /* 0x608 */
+ u64 mfc_dsisr_RW; /* 0x610 */
+#define MFC_DSISR_PTE_NOT_FOUND (1 << 30)
+#define MFC_DSISR_ACCESS_DENIED (1 << 27)
+#define MFC_DSISR_ATOMIC (1 << 26)
+#define MFC_DSISR_ACCESS_PUT (1 << 25)
+#define MFC_DSISR_ADDR_MATCH (1 << 22)
+#define MFC_DSISR_LS (1 << 17)
+#define MFC_DSISR_L (1 << 16)
+#define MFC_DSISR_ADDRESS_OVERFLOW (1 << 0)
+ u8 pad_0x618_0x620[0x8]; /* 0x618 */
+ u64 mfc_dar_RW; /* 0x620 */
+ u8 pad_0x628_0x700[0x700 - 0x628]; /* 0x628 */
+
+ /* Replacement Management Table (RMT) Area */
+ u64 rmt_index_RW; /* 0x700 */
+ u8 pad_0x708_0x710[0x8]; /* 0x708 */
+ u64 rmt_data1_RW; /* 0x710 */
+ u8 pad_0x718_0x800[0x800 - 0x718]; /* 0x718 */
+
+ /* Control/Configuration Registers */
+ u64 mfc_dsir_R; /* 0x800 */
+#define MFC_DSIR_Q (1 << 31)
+#define MFC_DSIR_SPU_QUEUE MFC_DSIR_Q
+ u64 mfc_lsacr_RW; /* 0x808 */
+#define MFC_LSACR_COMPARE_MASK ((~0ull) << 32)
+#define MFC_LSACR_COMPARE_ADDR ((~0ull) >> 32)
+ u64 mfc_lscrr_R; /* 0x810 */
+#define MFC_LSCRR_Q (1 << 31)
+#define MFC_LSCRR_SPU_QUEUE MFC_LSCRR_Q
+#define MFC_LSCRR_QI_SHIFT 32
+#define MFC_LSCRR_QI_MASK ((~0ull) << MFC_LSCRR_QI_SHIFT)
+ u8 pad_0x818_0x900[0x900 - 0x818]; /* 0x818 */
+
+ /* Real Mode Support Registers */
+ u64 mfc_rm_boundary; /* 0x900 */
+ u8 pad_0x908_0x938[0x30]; /* 0x908 */
+ u64 smf_dma_signal_sel; /* 0x938 */
+#define mfc_dma1_mask_lsb 41
+#define mfc_dma1_shift (63 - mfc_dma1_mask_lsb)
+#define mfc_dma1_mask (0x3LL << mfc_dma1_shift)
+#define mfc_dma1_bits (0x1LL << mfc_dma1_shift)
+#define mfc_dma2_mask_lsb 43
+#define mfc_dma2_shift (63 - mfc_dma2_mask_lsb)
+#define mfc_dma2_mask (0x3LL << mfc_dma2_shift)
+#define mfc_dma2_bits (0x1LL << mfc_dma2_shift)
+ u8 pad_0x940_0xa38[0xf8]; /* 0x940 */
+ u64 smm_signal_sel; /* 0xa38 */
+#define smm_sig_mask_lsb 12
+#define smm_sig_shift (63 - smm_sig_mask_lsb)
+#define smm_sig_mask (0x3LL << smm_sig_shift)
+#define smm_sig_bus0_bits (0x2LL << smm_sig_shift)
+#define smm_sig_bus2_bits (0x1LL << smm_sig_shift)
+ u8 pad_0xa40_0xc00[0xc00 - 0xa40]; /* 0xa40 */
+
+ /* DMA Command Error Area */
+ u64 mfc_cer_R; /* 0xc00 */
+#define MFC_CER_Q (1 << 31)
+#define MFC_CER_SPU_QUEUE MFC_CER_Q
+ u8 pad_0xc08_0x1000[0x1000 - 0xc08]; /* 0xc08 */
+
+ /* PV1_ImplRegs: Implementation-dependent privileged-state 1 regs */
+ /* SPU ECC and error mask registers */
+ u64 spu_ecc_cntl_RW; /* 0x1000 */
+#define SPU_ECC_CNTL_E (1ull << 0ull)
+#define SPU_ECC_CNTL_ENABLE SPU_ECC_CNTL_E
+#define SPU_ECC_CNTL_DISABLE (~SPU_ECC_CNTL_E & 1L)
+#define SPU_ECC_CNTL_S (1ull << 1ull)
+#define SPU_ECC_STOP_AFTER_ERROR SPU_ECC_CNTL_S
+#define SPU_ECC_CONTINUE_AFTER_ERROR (~SPU_ECC_CNTL_S & 2L)
+#define SPU_ECC_CNTL_B (1ull << 2ull)
+#define SPU_ECC_BACKGROUND_ENABLE SPU_ECC_CNTL_B
+#define SPU_ECC_BACKGROUND_DISABLE (~SPU_ECC_CNTL_B & 4L)
+#define SPU_ECC_CNTL_I_SHIFT 3ull
+#define SPU_ECC_CNTL_I_MASK (3ull << SPU_ECC_CNTL_I_SHIFT)
+#define SPU_ECC_WRITE_ALWAYS (~SPU_ECC_CNTL_I_MASK & 12L)
+#define SPU_ECC_WRITE_CORRECTABLE (1ull << SPU_ECC_CNTL_I_SHIFT)
+#define SPU_ECC_WRITE_UNCORRECTABLE (3ull << SPU_ECC_CNTL_I_SHIFT)
+#define SPU_ECC_CNTL_D (1ull << 5ull)
+#define SPU_ECC_DETECTION_ENABLE SPU_ECC_CNTL_D
+#define SPU_ECC_DETECTION_DISABLE (~SPU_ECC_CNTL_D & 32L)
+ u64 spu_ecc_stat_RW; /* 0x1008 */
+#define SPU_ECC_CORRECTED_ERROR (1ull << 0ul)
+#define SPU_ECC_UNCORRECTED_ERROR (1ull << 1ul)
+#define SPU_ECC_SCRUB_COMPLETE (1ull << 2ul)
+#define SPU_ECC_SCRUB_IN_PROGRESS (1ull << 3ul)
+#define SPU_ECC_INSTRUCTION_ERROR (1ull << 4ul)
+#define SPU_ECC_DATA_ERROR (1ull << 5ul)
+#define SPU_ECC_DMA_ERROR (1ull << 6ul)
+#define SPU_ECC_STATUS_CNT_MASK (256ull << 8)
+ u64 spu_ecc_addr_RW; /* 0x1010 */
+ u64 spu_err_mask_RW; /* 0x1018 */
+#define SPU_ERR_ILLEGAL_INSTR (1ull << 0ul)
+#define SPU_ERR_ILLEGAL_CHANNEL (1ull << 1ul)
+ u8 pad_0x1020_0x1028[0x1028 - 0x1020]; /* 0x1020 */
+
+ /* SPU Debug-Trace Bus (DTB) Selection Registers */
+ u64 spu_trig0_sel; /* 0x1028 */
+ u64 spu_trig1_sel; /* 0x1030 */
+ u64 spu_trig2_sel; /* 0x1038 */
+ u64 spu_trig3_sel; /* 0x1040 */
+ u64 spu_trace_sel; /* 0x1048 */
+#define spu_trace_sel_mask 0x1f1fLL
+#define spu_trace_sel_bus0_bits 0x1000LL
+#define spu_trace_sel_bus2_bits 0x0010LL
+ u64 spu_event0_sel; /* 0x1050 */
+ u64 spu_event1_sel; /* 0x1058 */
+ u64 spu_event2_sel; /* 0x1060 */
+ u64 spu_event3_sel; /* 0x1068 */
+ u64 spu_trace_cntl; /* 0x1070 */
+} __attribute__ ((aligned(0x2000)));
+
+#endif
--- linux-cg.orig/mm/memory.c 2005-05-13 15:15:07.883989176 -0400
+++ linux-cg/mm/memory.c 2005-05-13 17:25:48.140932568 -0400
@@ -2194,6 +2194,7 @@ unsigned long vmalloc_to_pfn(void * vmal
{
return page_to_pfn(vmalloc_to_page(vmalloc_addr));
}
+EXPORT_SYMBOL_GPL(handle_mm_fault);

EXPORT_SYMBOL(vmalloc_to_pfn);


2005-05-13 20:14:54

by Arnd Bergmann

Subject: [PATCH 6/8] ppc64: Add driver for BPA iommu

Implementation of software load support for the BE iommu. This is very
different from other iommu code on ppc64, since we only do a static mapping.
The mapping is currently hardcoded. It should really be read from the
firmware, but the firmware does not set up the device nodes yet. There is
a single 512MB DMA window at 0x20000000 through which PCI, USB and ethernet
reach our RAM.
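
As a rough illustration of the static mapping (a sketch only, not part of
the patch): assuming the window covers I/O addresses 0x20000000 up to
0x40000000, translating a bus address back to a real address is a plain
subtraction, the same arithmetic as "address += iopt_phys_offset" in
map_iopt_entry(). The names DMA_WINDOW_BASE, DMA_WINDOW_SIZE,
dma_window_contains() and dma_window_to_real() are hypothetical and exist
only for this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values taken from the hardcoded setup described above (assumed layout). */
#define DMA_WINDOW_BASE 0x20000000ull	/* start of the window on the I/O side */
#define DMA_WINDOW_SIZE 0x20000000ull	/* 512MB */

/* Hypothetical helper: true if io_addr lies inside the DMA window. */
static bool dma_window_contains(uint64_t io_addr)
{
	return io_addr >= DMA_WINDOW_BASE &&
	       io_addr - DMA_WINDOW_BASE < DMA_WINDOW_SIZE;
}

/* Hypothetical helper: real address backing an I/O address in the window.
 * Only meaningful when dma_window_contains(io_addr) is true. */
static uint64_t dma_window_to_real(uint64_t io_addr)
{
	return io_addr - DMA_WINDOW_BASE;
}
```

The special case for the Spider I/O range in map_iopt_entry() is left out
of this sketch on purpose.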

The Cell processor can put the I/O page table either in memory like
the hashed page table (hardware load) or have the operating system
write the entries into memory mapped CPU registers (software load).

I use the software load mechanism because I know that all I/O page
table entries for the amount of installed physical memory fit into
the IO TLB cache. At the point when we get machines with more than
4GB of installed memory, we can either use hardware I/O page table
access like the other platforms do or dynamically update the I/O
TLB entries when a page fault occurs in the I/O subsystem.
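
The 4GB limit can be checked with a little arithmetic. This is a sketch
that assumes the cache geometry stated in the comments of bpa_iommu.c
(a 4-way associative cache with a 6 bit hashed index) and the 16MB
io_page_size used there; the names IOC_INDEX_BITS, IOC_WAYS, ioc_entries()
and ioc_coverage() are made up for the example.

```c
#include <stdint.h>

#define IOC_INDEX_BITS 6u		/* "hashed 6 bit index" */
#define IOC_WAYS 4u			/* "4-way associative pte cache" */
#define IO_PAGE_SIZE 0x1000000ull	/* io_page_size, 16MB */

/* Number of I/O page table entries the cache can hold. */
static uint64_t ioc_entries(void)
{
	return ((uint64_t)1 << IOC_INDEX_BITS) * IOC_WAYS;
}

/* Amount of I/O address space covered when every entry maps one page. */
static uint64_t ioc_coverage(void)
{
	return ioc_entries() * IO_PAGE_SIZE;
}
```

Under these assumptions the cache holds 64 * 4 = 256 entries, and 256
pages of 16MB cover exactly the 4GB that bpa_map_iommu() walks.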

A dynamic software load could then reuse the helper functions that I have
implemented for the static mapping in order to do the TLB cache updates.

Signed-off-by: Arnd Bergmann <[email protected]>

Index: linus-2.5/arch/ppc64/kernel/Makefile
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/Makefile 2005-04-22 07:01:07.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/Makefile 2005-04-29 10:01:44.000000000 +0200
@@ -34,7 +34,8 @@
pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \
pSeries_setup.o pSeries_iommu.o

-obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o bpa_iic.o spider-pic.o
+obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_iommu.o bpa_nvram.o \
+ bpa_iic.o spider-pic.o

obj-$(CONFIG_EEH) += eeh.o
obj-$(CONFIG_PROC_FS) += proc_ppc64.o
Index: linus-2.5/arch/ppc64/kernel/bpa_iommu.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linus-2.5/arch/ppc64/kernel/bpa_iommu.c 2005-04-29 10:24:03.000000000 +0200
@@ -0,0 +1,377 @@
+/*
+ * IOMMU implementation for Broadband Processor Architecture
+ * We just establish a linear mapping at boot by setting all the
+ * IOPT cache entries in the CPU.
+ * The mapping functions should be identical to pci_direct_iommu,
+ * except for the handling of the high order bit that is required
+ * by the Spider bridge. These should be split into a separate
+ * file at the point where we get a different bridge chip.
+ *
+ * Copyright (C) 2005 IBM Deutschland Entwicklung GmbH,
+ * Arnd Bergmann <[email protected]>
+ *
+ * Based on linear mapping
+ * Copyright (C) 2003 Benjamin Herrenschmidt ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+
+#include <asm/sections.h>
+#include <asm/iommu.h>
+#include <asm/io.h>
+#include <asm/prom.h>
+#include <asm/pci-bridge.h>
+#include <asm/machdep.h>
+#include <asm/pmac_feature.h>
+#include <asm/abs_addr.h>
+#include <asm/system.h>
+
+#include "pci.h"
+#include "bpa_iommu.h"
+
+static inline unsigned long
+get_iopt_entry(unsigned long real_address, unsigned long ioid,
+ unsigned long prot)
+{
+ return (prot & IOPT_PROT_MASK)
+ | (IOPT_COHERENT)
+ | (IOPT_ORDER_VC)
+ | (real_address & IOPT_RPN_MASK)
+ | (ioid & IOPT_IOID_MASK);
+}
+
+typedef struct {
+ unsigned long val;
+} ioste;
+
+static inline ioste
+mk_ioste(unsigned long val)
+{
+ ioste ioste = { .val = val, };
+ return ioste;
+}
+
+static inline ioste
+get_iost_entry(unsigned long iopt_base, unsigned long io_address, unsigned page_size)
+{
+ unsigned long ps;
+ unsigned long iostep;
+ unsigned long nnpt;
+ unsigned long shift;
+
+ switch (page_size) {
+ case 0x1000000:
+ ps = IOST_PS_16M;
+ nnpt = 0; /* one page per segment */
+ shift = 5; /* segment has 16 iopt entries */
+ break;
+
+ case 0x100000:
+ ps = IOST_PS_1M;
+ nnpt = 0; /* one page per segment */
+ shift = 1; /* segment has 256 iopt entries */
+ break;
+
+ case 0x10000:
+ ps = IOST_PS_64K;
+ nnpt = 0x07; /* 8 pages per io page table */
+ shift = 0; /* all entries are used */
+ break;
+
+ case 0x1000:
+ ps = IOST_PS_4K;
+ nnpt = 0x7f; /* 128 pages per io page table */
+ shift = 0; /* all entries are used */
+ break;
+
+ default: /* not a known compile time constant */
+ BUILD_BUG_ON(1);
+ break;
+ }
+
+ iostep = iopt_base +
+ /* need 8 bytes per iopte */
+ (((io_address / page_size * 8)
+ /* align io page tables on 4k page boundaries */
+ << shift)
+ /* nnpt+1 pages go into each iopt */
+ & ~(nnpt << 12));
+
+ nnpt++; /* this seems to work, but the documentation is not clear
+ about whether we put nnpt or nnpt-1 into the ioste bits.
+ In theory, this can't work for 4k pages. */
+ return mk_ioste(IOST_VALID_MASK
+ | (iostep & IOST_PT_BASE_MASK)
+ | ((nnpt << 5) & IOST_NNPT_MASK)
+ | (ps & IOST_PS_MASK));
+}
+
+/* compute the address of an io pte */
+static inline unsigned long
+get_ioptep(ioste iost_entry, unsigned long io_address)
+{
+ unsigned long iopt_base;
+ unsigned long page_size;
+ unsigned long page_number;
+ unsigned long iopt_offset;
+
+ iopt_base = iost_entry.val & IOST_PT_BASE_MASK;
+ page_size = iost_entry.val & IOST_PS_MASK;
+
+ /* decode page size to compute page number */
+ page_number = (io_address & 0x0fffffff) >> (10 + 2 * page_size);
+ /* page number is an offset into the io page table */
+ iopt_offset = (page_number << 3) & 0x7fff8ul;
+ return iopt_base + iopt_offset;
+}
+
+/* compute the tag field of the iopt cache entry */
+static inline unsigned long
+get_ioc_tag(ioste iost_entry, unsigned long io_address)
+{
+ unsigned long iopte = get_ioptep(iost_entry, io_address);
+
+ return IOPT_VALID_MASK
+ | ((iopte & 0x00000000000000ff8ul) >> 3)
+ | ((iopte & 0x0000003fffffc0000ul) >> 9);
+}
+
+/* compute the hashed 6 bit index for the 4-way associative pte cache */
+static inline unsigned long
+get_ioc_hash(ioste iost_entry, unsigned long io_address)
+{
+ unsigned long iopte = get_ioptep(iost_entry, io_address);
+
+ return ((iopte & 0x000000000000001f8ul) >> 3)
+ ^ ((iopte & 0x00000000000020000ul) >> 17)
+ ^ ((iopte & 0x00000000000010000ul) >> 15)
+ ^ ((iopte & 0x00000000000008000ul) >> 13)
+ ^ ((iopte & 0x00000000000004000ul) >> 11)
+ ^ ((iopte & 0x00000000000002000ul) >> 9)
+ ^ ((iopte & 0x00000000000001000ul) >> 7);
+}
+
+/* same as above, but pretend that we have a simpler 1-way associative
+ pte cache with an 8 bit index */
+static inline unsigned long
+get_ioc_hash_1way(ioste iost_entry, unsigned long io_address)
+{
+ unsigned long iopte = get_ioptep(iost_entry, io_address);
+
+ return ((iopte & 0x000000000000001f8ul) >> 3)
+ ^ ((iopte & 0x00000000000020000ul) >> 17)
+ ^ ((iopte & 0x00000000000010000ul) >> 15)
+ ^ ((iopte & 0x00000000000008000ul) >> 13)
+ ^ ((iopte & 0x00000000000004000ul) >> 11)
+ ^ ((iopte & 0x00000000000002000ul) >> 9)
+ ^ ((iopte & 0x00000000000001000ul) >> 7)
+ ^ ((iopte & 0x0000000000000c000ul) >> 8);
+}
+
+static inline ioste
+get_iost_cache(void __iomem *base, unsigned long index)
+{
+ unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR);
+ return mk_ioste(in_be64(&p[index]));
+}
+
+static inline void
+set_iost_cache(void __iomem *base, unsigned long index, ioste ste)
+{
+ unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR);
+ pr_debug("ioste %02lx was %016lx, store %016lx", index,
+ get_iost_cache(base, index).val, ste.val);
+ out_be64(&p[index], ste.val);
+ pr_debug(" now %016lx\n", get_iost_cache(base, index).val);
+}
+
+static inline unsigned long
+get_iopt_cache(void __iomem *base, unsigned long index, unsigned long *tag)
+{
+ unsigned long __iomem *tags = base + IOC_PT_CACHE_DIR;
+ unsigned long __iomem *p = base + IOC_PT_CACHE_REG;
+
+ *tag = in_be64(&tags[index]);
+ rmb();
+ return in_be64(p);
+}
+
+static inline void
+set_iopt_cache(void __iomem *base, unsigned long index,
+ unsigned long tag, unsigned long val)
+{
+ unsigned long __iomem *tags = base + IOC_PT_CACHE_DIR;
+ unsigned long __iomem *p = base + IOC_PT_CACHE_REG;
+ pr_debug("iopt %02lx was v%016lx/t%016lx, store v%016lx/t%016lx\n",
+ index, get_iopt_cache(base, index, &oldtag), oldtag, val, tag);
+
+ out_be64(p, val);
+ out_be64(&tags[index], tag);
+}
+
+static inline void
+set_iost_origin(void __iomem *base)
+{
+ unsigned long __iomem *p = base + IOC_ST_ORIGIN;
+ unsigned long origin = IOSTO_ENABLE | IOSTO_SW;
+
+ pr_debug("iost_origin %016lx, now %016lx\n", in_be64(p), origin);
+ out_be64(p, origin);
+}
+
+static inline void
+set_iocmd_config(void __iomem *base)
+{
+ unsigned long __iomem *p = base + 0xc00;
+ unsigned long conf;
+
+ conf = in_be64(p);
+ pr_debug("iost_conf %016lx, now %016lx\n", conf, conf | IOCMD_CONF_TE);
+ out_be64(p, conf | IOCMD_CONF_TE);
+}
+
+/* FIXME: get these from the device tree */
+#define ioc_base 0x20000511000ull
+#define ioc_mmio_base 0x20000510000ull
+#define ioid 0x48a
+#define iopt_phys_offset (- 0x20000000) /* We have a 512MB offset from the SB */
+#define io_page_size 0x1000000
+
+static unsigned long map_iopt_entry(unsigned long address)
+{
+ switch (address >> 20) {
+ case 0x600:
+ address = 0x24020000000ull; /* spider i/o */
+ break;
+ default:
+ address += iopt_phys_offset;
+ break;
+ }
+
+ return get_iopt_entry(address, ioid, IOPT_PROT_RW);
+}
+
+static void iommu_bus_setup_null(struct pci_bus *b) { }
+static void iommu_dev_setup_null(struct pci_dev *d) { }
+
+/* initialize the iommu to support a simple linear mapping
+ * for each DMA window used by any device. For now, we
+ * happen to know that there is only one DMA window in use,
+ * starting at iopt_phys_offset. */
+static void bpa_map_iommu(void)
+{
+ unsigned long address;
+ void __iomem *base;
+ ioste ioste;
+ unsigned long index;
+
+ base = __ioremap(ioc_base, 0x1000, _PAGE_NO_CACHE);
+ pr_debug("%lx mapped to %p\n", ioc_base, base);
+ set_iocmd_config(base);
+ iounmap(base);
+
+ base = __ioremap(ioc_mmio_base, 0x1000, _PAGE_NO_CACHE);
+ pr_debug("%lx mapped to %p\n", ioc_mmio_base, base);
+
+ set_iost_origin(base);
+
+ for (address = 0; address < 0x100000000ul; address += io_page_size) {
+ ioste = get_iost_entry(0x10000000000ul, address, io_page_size);
+ if ((address & 0xfffffff) == 0) /* segment start */
+ set_iost_cache(base, address >> 28, ioste);
+ index = get_ioc_hash_1way(ioste, address);
+ pr_debug("addr %08lx, index %02lx, ioste %016lx\n",
+ address, index, ioste.val);
+ set_iopt_cache(base,
+ get_ioc_hash_1way(ioste, address),
+ get_ioc_tag(ioste, address),
+ map_iopt_entry(address));
+ }
+ iounmap(base);
+}
+
+
+static void *bpa_alloc_coherent(struct device *hwdev, size_t size,
+ dma_addr_t *dma_handle, unsigned int __nocast flag)
+{
+ void *ret;
+
+ ret = (void *)__get_free_pages(flag, get_order(size));
+ if (ret != NULL) {
+ memset(ret, 0, size);
+ *dma_handle = virt_to_abs(ret) | BPA_DMA_VALID;
+ }
+ return ret;
+}
+
+static void bpa_free_coherent(struct device *hwdev, size_t size,
+ void *vaddr, dma_addr_t dma_handle)
+{
+ free_pages((unsigned long)vaddr, get_order(size));
+}
+
+static dma_addr_t bpa_map_single(struct device *hwdev, void *ptr,
+ size_t size, enum dma_data_direction direction)
+{
+ return virt_to_abs(ptr) | BPA_DMA_VALID;
+}
+
+static void bpa_unmap_single(struct device *hwdev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction direction)
+{
+}
+
+static int bpa_map_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, enum dma_data_direction direction)
+{
+ int i;
+
+ for (i = 0; i < nents; i++, sg++) {
+ sg->dma_address = (page_to_phys(sg->page) + sg->offset)
+ | BPA_DMA_VALID;
+ sg->dma_length = sg->length;
+ }
+
+ return nents;
+}
+
+static void bpa_unmap_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, enum dma_data_direction direction)
+{
+}
+
+static int bpa_dma_supported(struct device *dev, u64 mask)
+{
+ return mask < 0x100000000ull;
+}
+
+void bpa_init_iommu(void)
+{
+ bpa_map_iommu();
+
+ /* Direct I/O, IOMMU off */
+ ppc_md.iommu_dev_setup = iommu_dev_setup_null;
+ ppc_md.iommu_bus_setup = iommu_bus_setup_null;
+
+ pci_dma_ops.alloc_coherent = bpa_alloc_coherent;
+ pci_dma_ops.free_coherent = bpa_free_coherent;
+ pci_dma_ops.map_single = bpa_map_single;
+ pci_dma_ops.unmap_single = bpa_unmap_single;
+ pci_dma_ops.map_sg = bpa_map_sg;
+ pci_dma_ops.unmap_sg = bpa_unmap_sg;
+ pci_dma_ops.dma_supported = bpa_dma_supported;
+}
Index: linus-2.5/arch/ppc64/kernel/bpa_iommu.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linus-2.5/arch/ppc64/kernel/bpa_iommu.h 2005-04-29 09:47:29.000000000 +0200
@@ -0,0 +1,65 @@
+#ifndef BPA_IOMMU_H
+#define BPA_IOMMU_H
+
+/* some constants */
+enum {
+ /* segment table entries */
+ IOST_VALID_MASK = 0x8000000000000000ul,
+ IOST_TAG_MASK = 0x3000000000000000ul,
+ IOST_PT_BASE_MASK = 0x000003fffffff000ul,
+ IOST_NNPT_MASK = 0x0000000000000fe0ul,
+ IOST_PS_MASK = 0x000000000000000ful,
+
+ IOST_PS_4K = 0x1,
+ IOST_PS_64K = 0x3,
+ IOST_PS_1M = 0x5,
+ IOST_PS_16M = 0x7,
+
+ /* iopt tag register */
+ IOPT_VALID_MASK = 0x0000000200000000ul,
+ IOPT_TAG_MASK = 0x00000001fffffffful,
+
+ /* iopt cache register */
+ IOPT_PROT_MASK = 0xc000000000000000ul,
+ IOPT_PROT_NONE = 0x0000000000000000ul,
+ IOPT_PROT_READ = 0x4000000000000000ul,
+ IOPT_PROT_WRITE = 0x8000000000000000ul,
+ IOPT_PROT_RW = 0xc000000000000000ul,
+ IOPT_COHERENT = 0x2000000000000000ul,
+
+ IOPT_ORDER_MASK = 0x1800000000000000ul,
+ /* order access to same IOID/VC on same address */
+ IOPT_ORDER_ADDR = 0x0800000000000000ul,
+ /* similar, but only after a write access */
+ IOPT_ORDER_WRITES = 0x1000000000000000ul,
+ /* Order all accesses to same IOID/VC */
+ IOPT_ORDER_VC = 0x1800000000000000ul,
+
+ IOPT_RPN_MASK = 0x000003fffffff000ul,
+ IOPT_HINT_MASK = 0x0000000000000800ul,
+ IOPT_IOID_MASK = 0x00000000000007fful,
+
+ IOSTO_ENABLE = 0x8000000000000000ul,
+ IOSTO_ORIGIN = 0x000003fffffff000ul,
+ IOSTO_HW = 0x0000000000000800ul,
+ IOSTO_SW = 0x0000000000000400ul,
+
+ IOCMD_CONF_TE = 0x0000800000000000ul,
+
+ /* memory mapped registers */
+ IOC_PT_CACHE_DIR = 0x000,
+ IOC_ST_CACHE_DIR = 0x800,
+ IOC_PT_CACHE_REG = 0x910,
+ IOC_ST_ORIGIN = 0x918,
+ IOC_CONF = 0x930,
+
+ /* The high bit needs to be set on every DMA address,
+ only 2GB are addressable */
+ BPA_DMA_VALID = 0x80000000,
+ BPA_DMA_MASK = 0x7fffffff,
+};
+
+
+void bpa_init_iommu(void);
+
+#endif
Index: linus-2.5/arch/ppc64/kernel/bpa_setup.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/bpa_setup.c 2005-04-22 06:59:58.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/bpa_setup.c 2005-04-29 10:01:12.000000000 +0200
@@ -46,6 +46,7 @@

#include "pci.h"
#include "bpa_iic.h"
+#include "bpa_iommu.h"

#ifdef DEBUG
#define DBG(fmt...) udbg_printf(fmt)
@@ -179,7 +180,7 @@

hpte_init_native();

- pci_direct_iommu_init();
+ bpa_init_iommu();

ppc64_interrupt_controller = IC_BPA_IIC;


2005-05-13 20:19:20

by Arnd Bergmann

Subject: [PATCH 5/8] ppc64: Add driver for BPA interrupt controllers

Add support for the internal interrupt controller (IIC) that is
integrated into BPA CPUs. There is one IIC for each SMT thread.

The mapping of interrupt numbers to HW interrupt sources
is described in arch/ppc64/kernel/bpa_iic.h.
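
As a rough illustration of that mapping, here is a minimal user-space
sketch of how an SPE interrupt source ends up as a Linux IRQ number.
The constants are copied from bpa_iic.h in this patch; the helper name
example_spe_irq() is hypothetical and only mirrors the SPE branch of
iic_external_get_irq(), it is not part of the patch.

```c
#include <stdint.h>

/* Constants copied from bpa_iic.h in this patch. */
enum {
	IIC_SPE_OFFSET   = 0x40, /* Start of SPE interrupts */
	IIC_CLASS_STRIDE = 0x10, /* SPE IRQs per class */
	IIC_NODE_STRIDE  = 0x80, /* Total IRQs per node */
};

/*
 * Hypothetical helper (not in the patch): compute the IRQ number for an
 * SPE interrupt from the class and source fields of the pending register.
 * The high nibble of source is the node, the low nibble is the unit.
 * Returns -1 when the unit is not an SPE or the class is out of range.
 */
static int example_spe_irq(uint8_t irq_class, uint8_t source)
{
	unsigned int node = source >> 4;
	unsigned int unit = source & 0xf;
	int is_spe = (unit >= 0x01 && unit <= 0x04) ||
		     (unit >= 0x07 && unit <= 0x0a);

	if (!is_spe || irq_class > 2)
		return -1;

	return IIC_SPE_OFFSET
		+ irq_class * IIC_CLASS_STRIDE
		+ node * IIC_NODE_STRIDE
		+ unit;
}
```

For example, class 1 on unit 3 of node 0 gives 0x53, which falls in the
51-5a range that the header lists for SPU class 1.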

This version hardcodes the 'Spider' chip as the secondary
interrupt controller. That is not really generic for the
architecture, but at the moment it is the only secondary
PIC that exists.
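
To make the cascade a bit more concrete, the sketch below shows how an
interrupt reported by the secondary PIC could be folded into the
per-node IRQ space. It assumes the decoding used by spider_get_irq() in
this patch (IRQ number in the top byte of TIR_CS, 63 meaning "nothing
pending"). Both function names are hypothetical, and unlike the patch
the sketch checks for the "nothing pending" case before adding the
offsets.

```c
#include <stdint.h>

/* Constants copied from bpa_iic.h in this patch. */
enum {
	IIC_EXT_OFFSET  = 0x00, /* Start of south bridge IRQs */
	IIC_NODE_STRIDE = 0x80, /* Total IRQs per node */
};

/*
 * Hypothetical stand-in for spider_get_irq(): decode a value read from
 * the Spider "Current Status" register. Takes the register value as an
 * argument instead of reading MMIO, so it can run in user space.
 */
static int example_spider_decode(uint32_t cs)
{
	int irq = cs >> 24;

	return irq != 63 ? irq : -1;
}

/*
 * Hypothetical helper: translate a Spider interrupt on a given node
 * into the global IRQ number, or -1 if nothing is pending.
 */
static int example_ext_irq(unsigned int node, uint32_t cs)
{
	int irq = example_spider_decode(cs);

	if (irq < 0)
		return -1;

	return IIC_EXT_OFFSET + irq + node * IIC_NODE_STRIDE;
}
```

With these assumptions, Spider source 5 on node 1 becomes IRQ 0x85.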

This will need a little more work once boards with multiple
external interrupt controllers exist.

Signed-off-by: Arnd Bergmann <[email protected]>

Index: linus-2.5/arch/ppc64/Kconfig
===================================================================
--- linus-2.5.orig/arch/ppc64/Kconfig 2005-04-22 06:59:52.000000000 +0200
+++ linus-2.5/arch/ppc64/Kconfig 2005-04-22 06:59:58.000000000 +0200
@@ -106,6 +106,21 @@
bool
default y

+config XICS
+ depends on PPC_PSERIES
+ bool
+ default y
+
+config MPIC
+ depends on PPC_PSERIES || PPC_PMAC || PPC_MAPLE
+ bool
+ default y
+
+config BPA_IIC
+ depends on PPC_BPA
+ bool
+ default y
+
# VMX is pSeries only for now until somebody writes the iSeries
# exception vectors for it
config ALTIVEC
Index: linus-2.5/arch/ppc64/kernel/Makefile
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/Makefile 2005-04-22 06:59:52.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/Makefile 2005-04-22 07:01:07.000000000 +0200
@@ -28,13 +28,13 @@
mf.o HvLpEvent.o iSeries_proc.o iSeries_htab.o \
iSeries_iommu.o

-obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o
+obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o

obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \
pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \
- xics.o pSeries_setup.o pSeries_iommu.o
+ pSeries_setup.o pSeries_iommu.o

-obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o
+obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o bpa_iic.o spider-pic.o

obj-$(CONFIG_EEH) += eeh.o
obj-$(CONFIG_PROC_FS) += proc_ppc64.o
@@ -50,6 +50,8 @@
obj-$(CONFIG_BOOTX_TEXT) += btext.o
obj-$(CONFIG_HVCS) += hvcserver.o
obj-$(CONFIG_IBMVIO) += vio.o
+obj-$(CONFIG_XICS) += xics.o
+obj-$(CONFIG_MPIC) += mpic.o

obj-$(CONFIG_PPC_PMAC) += pmac_setup.o pmac_feature.o pmac_pci.o \
pmac_time.o pmac_nvram.o pmac_low_i2c.o
Index: linus-2.5/arch/ppc64/kernel/bpa_iic.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linus-2.5/arch/ppc64/kernel/bpa_iic.c 2005-04-22 06:59:58.000000000 +0200
@@ -0,0 +1,270 @@
+/*
+ * BPA Internal Interrupt Controller
+ *
+ * (C) Copyright IBM Deutschland Entwicklung GmbH 2005
+ *
+ * Author: Arnd Bergmann <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/config.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/percpu.h>
+#include <linux/types.h>
+
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include <asm/prom.h>
+#include <asm/ptrace.h>
+
+#include "bpa_iic.h"
+
+struct iic_pending_bits {
+ u32 data;
+ u8 flags;
+ u8 class;
+ u8 source;
+ u8 prio;
+};
+
+enum iic_pending_flags {
+ IIC_VALID = 0x80,
+ IIC_IPI = 0x40,
+};
+
+struct iic_regs {
+ struct iic_pending_bits pending;
+ struct iic_pending_bits pending_destr;
+ u64 generate;
+ u64 prio;
+};
+
+struct iic {
+ struct iic_regs __iomem *regs;
+};
+
+static DEFINE_PER_CPU(struct iic, iic);
+
+void iic_local_enable(void)
+{
+ out_be64(&__get_cpu_var(iic).regs->prio, 0xff);
+}
+
+void iic_local_disable(void)
+{
+ out_be64(&__get_cpu_var(iic).regs->prio, 0x0);
+}
+
+static unsigned int iic_startup(unsigned int irq)
+{
+ return 0;
+}
+
+static void iic_enable(unsigned int irq)
+{
+ iic_local_enable();
+}
+
+static void iic_disable(unsigned int irq)
+{
+}
+
+static void iic_end(unsigned int irq)
+{
+ iic_local_enable();
+}
+
+static struct hw_interrupt_type iic_pic = {
+ .typename = " BPA-IIC ",
+ .startup = iic_startup,
+ .enable = iic_enable,
+ .disable = iic_disable,
+ .end = iic_end,
+};
+
+static int iic_external_get_irq(struct iic_pending_bits pending)
+{
+ int irq;
+ unsigned char node, unit;
+
+ node = pending.source >> 4;
+ unit = pending.source & 0xf;
+ irq = -1;
+
+ /*
+ * This mapping is specific to the Broadband
+ * Engine. We might need to get the numbers
+ * from the device tree to support future CPUs.
+ */
+ switch (unit) {
+ case 0x00:
+ case 0x0b:
+ /*
+ * One of these units can be connected
+ * to an external interrupt controller.
+ */
+ if (pending.prio > 0x3f ||
+ pending.class != 2)
+ break;
+ irq = IIC_EXT_OFFSET
+ + spider_get_irq(pending.prio + node * IIC_NODE_STRIDE)
+ + node * IIC_NODE_STRIDE;
+ break;
+ case 0x01 ... 0x04:
+ case 0x07 ... 0x0a:
+ /*
+ * These units are connected to the SPEs
+ */
+ if (pending.class > 2)
+ break;
+ irq = IIC_SPE_OFFSET
+ + pending.class * IIC_CLASS_STRIDE
+ + node * IIC_NODE_STRIDE
+ + unit;
+ break;
+ }
+ if (irq == -1)
+ printk(KERN_WARNING "Unexpected interrupt class %02x, "
+ "source %02x, prio %02x, cpu %02x\n", pending.class,
+ pending.source, pending.prio, smp_processor_id());
+ return irq;
+}
+
+/* Get an IRQ number from the pending state register of the IIC */
+int iic_get_irq(struct pt_regs *regs)
+{
+ struct iic *iic;
+ int irq;
+ struct iic_pending_bits pending;
+
+ iic = &__get_cpu_var(iic);
+ *(unsigned long *) &pending =
+ in_be64((unsigned long __iomem *) &iic->regs->pending_destr);
+
+ irq = -1;
+ if (pending.flags & IIC_VALID) {
+ if (pending.flags & IIC_IPI) {
+ irq = IIC_IPI_OFFSET + (pending.prio >> 4);
+/*
+ if (irq > 0x80)
+ printk(KERN_WARNING "Unexpected IPI prio %02x "
+ "on CPU %02x\n", pending.prio,
+ smp_processor_id());
+*/
+ } else {
+ irq = iic_external_get_irq(pending);
+ }
+ }
+ return irq;
+}
+
+static struct iic_regs __iomem *find_iic(int cpu)
+{
+ struct device_node *np;
+ int nodeid = cpu / 2;
+ unsigned long regs;
+ struct iic_regs __iomem *iic_regs;
+
+ for (np = of_find_node_by_type(NULL, "cpu");
+ np;
+ np = of_find_node_by_type(np, "cpu")) {
+ if (nodeid == *(int *)get_property(np, "node-id", NULL))
+ break;
+ }
+
+ if (!np) {
+ printk(KERN_WARNING "IIC: CPU %d not found\n", cpu);
+ iic_regs = NULL;
+ } else {
+ regs = *(long *)get_property(np, "iic", NULL);
+
+ /* hack until we have decided on the devtree info */
+ regs += 0x400;
+ if (cpu & 1)
+ regs += 0x20;
+
+ printk(KERN_DEBUG "IIC for CPU %d at %lx\n", cpu, regs);
+ iic_regs = __ioremap(regs, sizeof(struct iic_regs),
+ _PAGE_NO_CACHE);
+ }
+ return iic_regs;
+}
+
+#ifdef CONFIG_SMP
+void iic_setup_cpu(void)
+{
+ out_be64(&__get_cpu_var(iic).regs->prio, 0xff);
+}
+
+void iic_cause_IPI(int cpu, int mesg)
+{
+ out_be64(&per_cpu(iic, cpu).regs->generate, mesg);
+}
+
+static irqreturn_t iic_ipi_action(int irq, void *dev_id, struct pt_regs *regs)
+{
+
+ smp_message_recv(irq - IIC_IPI_OFFSET, regs);
+ return IRQ_HANDLED;
+}
+
+static void iic_request_ipi(int irq, const char *name)
+{
+ /* IPIs are marked SA_INTERRUPT as they must run with irqs
+ * disabled */
+ get_irq_desc(irq)->handler = &iic_pic;
+ get_irq_desc(irq)->status |= IRQ_PER_CPU;
+ request_irq(irq, iic_ipi_action, SA_INTERRUPT, name, NULL);
+}
+
+void iic_request_IPIs(void)
+{
+ iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_CALL_FUNCTION, "IPI-call");
+ iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_RESCHEDULE, "IPI-resched");
+#ifdef CONFIG_DEBUGGER
+ iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_DEBUGGER_BREAK, "IPI-debug");
+#endif /* CONFIG_DEBUGGER */
+}
+#endif /* CONFIG_SMP */
+
+static void iic_setup_spe_handlers(void)
+{
+ int be, isrc;
+
+ /* Assume two threads per BE are present */
+ for (be=0; be < num_present_cpus() / 2; be++) {
+ for (isrc = 0; isrc < IIC_CLASS_STRIDE * 3; isrc++) {
+ int irq = IIC_NODE_STRIDE * be + IIC_SPE_OFFSET + isrc;
+ get_irq_desc(irq)->handler = &iic_pic;
+ }
+ }
+}
+
+void iic_init_IRQ(void)
+{
+ int cpu, irq_offset;
+ struct iic *iic;
+
+ irq_offset = 0;
+ for_each_cpu(cpu) {
+ iic = &per_cpu(iic, cpu);
+ iic->regs = find_iic(cpu);
+ if (iic->regs)
+ out_be64(&iic->regs->prio, 0xff);
+ }
+ iic_setup_spe_handlers();
+}
Index: linus-2.5/arch/ppc64/kernel/bpa_iic.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linus-2.5/arch/ppc64/kernel/bpa_iic.h 2005-04-22 06:59:58.000000000 +0200
@@ -0,0 +1,62 @@
+#ifndef ASM_BPA_IIC_H
+#define ASM_BPA_IIC_H
+#ifdef __KERNEL__
+/*
+ * Mapping of IIC pending bits into per-node
+ * interrupt numbers.
+ *
+ * IRQ FF CC SS PP FF CC SS PP Description
+ *
+ * 00-3f 80 02 +0 00 - 80 02 +0 3f South Bridge
+ * 00-3f 80 02 +b 00 - 80 02 +b 3f South Bridge
+ * 41-4a 80 00 +1 ** - 80 00 +a ** SPU Class 0
+ * 51-5a 80 01 +1 ** - 80 01 +a ** SPU Class 1
+ * 61-6a 80 02 +1 ** - 80 02 +a ** SPU Class 2
+ * 70-7f C0 ** ** 00 - C0 ** ** 0f IPI
+ *
+ * F flags
+ * C class
+ * S source
+ * P Priority
+ * + node number
+ * * don't care
+ *
+ * A node consists of a Broadband Engine and an optional
+ * south bridge device providing a maximum of 64 IRQs.
+ * The south bridge may be connected to either IOIF0
+ * or IOIF1.
+ * Each SPE is represented as three IRQ lines, one per
+ * interrupt class.
+ * 16 IRQ numbers are reserved for inter-processor
+ * interrupts, although these are only used in the
+ * range of the first node.
+ *
+ * This scheme needs 128 IRQ numbers per BIF node ID,
+ * which means that with the total of 512 lines
+ * available, we can have a maximum of four nodes.
+ */
+
+enum {
+ IIC_EXT_OFFSET = 0x00, /* Start of south bridge IRQs */
+ IIC_NUM_EXT = 0x40, /* Number of south bridge IRQs */
+ IIC_SPE_OFFSET = 0x40, /* Start of SPE interrupts */
+ IIC_CLASS_STRIDE = 0x10, /* SPE IRQs per class */
+ IIC_IPI_OFFSET = 0x70, /* Start of IPI IRQs */
+ IIC_NUM_IPIS = 0x10, /* IRQs reserved for IPI */
+ IIC_NODE_STRIDE = 0x80, /* Total IRQs per node */
+};
+
+extern void iic_init_IRQ(void);
+extern int iic_get_irq(struct pt_regs *regs);
+extern void iic_cause_IPI(int cpu, int mesg);
+extern void iic_request_IPIs(void);
+extern void iic_setup_cpu(void);
+extern void iic_local_enable(void);
+extern void iic_local_disable(void);
+
+
+extern void spider_init_IRQ(void);
+extern int spider_get_irq(unsigned long int_pending);
+
+#endif
+#endif /* ASM_BPA_IIC_H */
Index: linus-2.5/arch/ppc64/kernel/bpa_setup.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/bpa_setup.c 2005-04-22 06:59:52.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/bpa_setup.c 2005-04-22 06:59:58.000000000 +0200
@@ -45,6 +45,7 @@
#include <asm/cputable.h>

#include "pci.h"
+#include "bpa_iic.h"

#ifdef DEBUG
#define DBG(fmt...) udbg_printf(fmt)
@@ -143,6 +144,9 @@

static void __init bpa_setup_arch(void)
{
+ ppc_md.init_IRQ = iic_init_IRQ;
+ ppc_md.get_irq = iic_get_irq;
+
#ifdef CONFIG_SMP
smp_init_pSeries();
#endif
@@ -158,7 +162,7 @@
/* Find and initialize PCI host bridges */
init_pci_config_tokens();
find_and_init_phbs();
-
+ spider_init_IRQ();
#ifdef CONFIG_DUMMY_CONSOLE
conswitchp = &dummy_con;
#endif
Index: linus-2.5/arch/ppc64/kernel/pSeries_smp.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/pSeries_smp.c 2005-04-22 06:58:22.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/pSeries_smp.c 2005-04-22 06:59:58.000000000 +0200
@@ -1,5 +1,5 @@
/*
- * SMP support for pSeries machines.
+ * SMP support for pSeries and BPA machines.
*
* Dave Engebretsen, Peter Bergner, and
* Mike Corrigan {engebret|bergner|mikec}@us.ibm.com
@@ -47,6 +47,7 @@
#include <asm/pSeries_reconfig.h>

#include "mpic.h"
+#include "bpa_iic.h"

#ifdef DEBUG
#define DBG(fmt...) udbg_printf(fmt)
@@ -286,6 +287,7 @@
return 1;
}

+#ifdef CONFIG_XICS
static inline void smp_xics_do_message(int cpu, int msg)
{
set_bit(msg, &xics_ipi_message[cpu].value);
@@ -334,6 +336,37 @@
rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE,
(1UL << interrupt_server_size) - 1 - default_distrib_server, 1);
}
+#endif /* CONFIG_XICS */
+#ifdef CONFIG_BPA_IIC
+static void smp_iic_message_pass(int target, int msg)
+{
+ unsigned int i;
+
+ if (target < NR_CPUS) {
+ iic_cause_IPI(target, msg);
+ } else {
+ for_each_online_cpu(i) {
+ if (target == MSG_ALL_BUT_SELF
+ && i == smp_processor_id())
+ continue;
+ iic_cause_IPI(i, msg);
+ }
+ }
+}
+
+static int __init smp_iic_probe(void)
+{
+ iic_request_IPIs();
+
+ return cpus_weight(cpu_possible_map);
+}
+
+static void __devinit smp_iic_setup_cpu(int cpu)
+{
+ if (cpu != boot_cpuid)
+ iic_setup_cpu();
+}
+#endif /* CONFIG_BPA_IIC */

static DEFINE_SPINLOCK(timebase_lock);
static unsigned long timebase = 0;
@@ -388,14 +421,15 @@

return 1;
}
-
+#ifdef CONFIG_MPIC
static struct smp_ops_t pSeries_mpic_smp_ops = {
.message_pass = smp_mpic_message_pass,
.probe = smp_mpic_probe,
.kick_cpu = smp_pSeries_kick_cpu,
.setup_cpu = smp_mpic_setup_cpu,
};
-
+#endif
+#ifdef CONFIG_XICS
static struct smp_ops_t pSeries_xics_smp_ops = {
.message_pass = smp_xics_message_pass,
.probe = smp_xics_probe,
@@ -403,6 +437,16 @@
.setup_cpu = smp_xics_setup_cpu,
.cpu_bootable = smp_pSeries_cpu_bootable,
};
+#endif
+#ifdef CONFIG_BPA_IIC
+static struct smp_ops_t bpa_iic_smp_ops = {
+ .message_pass = smp_iic_message_pass,
+ .probe = smp_iic_probe,
+ .kick_cpu = smp_pSeries_kick_cpu,
+ .setup_cpu = smp_iic_setup_cpu,
+ .cpu_bootable = smp_pSeries_cpu_bootable,
+};
+#endif

/* This is called very early */
void __init smp_init_pSeries(void)
@@ -411,10 +455,25 @@

DBG(" -> smp_init_pSeries()\n");

- if (ppc64_interrupt_controller == IC_OPEN_PIC)
+ switch (ppc64_interrupt_controller) {
+#ifdef CONFIG_MPIC
+ case IC_OPEN_PIC:
smp_ops = &pSeries_mpic_smp_ops;
- else
+ break;
+#endif
+#ifdef CONFIG_XICS
+ case IC_PPC_XIC:
smp_ops = &pSeries_xics_smp_ops;
+ break;
+#endif
+#ifdef CONFIG_BPA_IIC
+ case IC_BPA_IIC:
+ smp_ops = &bpa_iic_smp_ops;
+ break;
+#endif
+ default:
+ panic("Invalid interrupt controller");
+ }

#ifdef CONFIG_HOTPLUG_CPU
smp_ops->cpu_disable = pSeries_cpu_disable;
Index: linus-2.5/arch/ppc64/kernel/smp.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/smp.c 2005-04-22 06:58:22.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/smp.c 2005-04-22 06:59:58.000000000 +0200
@@ -71,7 +71,7 @@

int smt_enabled_at_boot = 1;

-#ifdef CONFIG_PPC_MULTIPLATFORM
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_PMAC) || defined(CONFIG_PPC_MAPLE)
void smp_mpic_message_pass(int target, int msg)
{
/* make sure we're sending something that translates to an IPI */
Index: linus-2.5/arch/ppc64/kernel/spider-pic.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linus-2.5/arch/ppc64/kernel/spider-pic.c 2005-04-22 06:59:58.000000000 +0200
@@ -0,0 +1,191 @@
+/*
+ * External Interrupt Controller on Spider South Bridge
+ *
+ * (C) Copyright IBM Deutschland Entwicklung GmbH 2005
+ *
+ * Author: Arnd Bergmann <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+
+#include <asm/pgtable.h>
+#include <asm/prom.h>
+#include <asm/io.h>
+
+#include "bpa_iic.h"
+
+/* register layout taken from Spider spec, table 7.4-4 */
+enum {
+ TIR_DEN = 0x004, /* Detection Enable Register */
+ TIR_MSK = 0x084, /* Mask Level Register */
+ TIR_EDC = 0x0c0, /* Edge Detection Clear Register */
+ TIR_PNDA = 0x100, /* Pending Register A */
+ TIR_PNDB = 0x104, /* Pending Register B */
+ TIR_CS = 0x144, /* Current Status Register */
+ TIR_LCSA = 0x150, /* Level Current Status Register A */
+ TIR_LCSB = 0x154, /* Level Current Status Register B */
+ TIR_LCSC = 0x158, /* Level Current Status Register C */
+ TIR_LCSD = 0x15c, /* Level Current Status Register D */
+ TIR_CFGA = 0x200, /* Setting Register A0 */
+ TIR_CFGB = 0x204, /* Setting Register B0 */
+ /* 0x208 ... 0x3ff Setting Register An/Bn */
+ TIR_PPNDA = 0x400, /* Packet Pending Register A */
+ TIR_PPNDB = 0x404, /* Packet Pending Register B */
+ TIR_PIERA = 0x408, /* Packet Output Error Register A */
+ TIR_PIERB = 0x40c, /* Packet Output Error Register B */
+ TIR_PIEN = 0x444, /* Packet Output Enable Register */
+ TIR_PIPND = 0x454, /* Packet Output Pending Register */
+ TIRDID = 0x484, /* Spider Device ID Register */
+ REISTIM = 0x500, /* Reissue Command Timeout Time Setting */
+ REISTIMEN = 0x504, /* Reissue Command Timeout Setting */
+ REISWAITEN = 0x508, /* Reissue Wait Control*/
+};
+
+static void __iomem *spider_pics[4];
+
+static void __iomem *spider_get_pic(int irq)
+{
+ int node = irq / IIC_NODE_STRIDE;
+ irq %= IIC_NODE_STRIDE;
+
+ if (irq >= IIC_EXT_OFFSET &&
+ irq < IIC_EXT_OFFSET + IIC_NUM_EXT &&
+ spider_pics)
+ return spider_pics[node];
+ return NULL;
+}
+
+static int spider_get_nr(unsigned int irq)
+{
+ return (irq % IIC_NODE_STRIDE) - IIC_EXT_OFFSET;
+}
+
+static void __iomem *spider_get_irq_config(int irq)
+{
+ void __iomem *pic;
+ pic = spider_get_pic(irq);
+ return pic + TIR_CFGA + 8 * spider_get_nr(irq);
+}
+
+static void spider_enable_irq(unsigned int irq)
+{
+ void __iomem *cfg = spider_get_irq_config(irq);
+ irq = spider_get_nr(irq);
+
+ out_be32(cfg, in_be32(cfg) | 0x3107000eu);
+ out_be32(cfg + 4, in_be32(cfg + 4) | 0x00020000u | irq);
+}
+
+static void spider_disable_irq(unsigned int irq)
+{
+ void __iomem *cfg = spider_get_irq_config(irq);
+ irq = spider_get_nr(irq);
+
+ out_be32(cfg, in_be32(cfg) & ~0x30000000u);
+}
+
+static unsigned int spider_startup_irq(unsigned int irq)
+{
+ spider_enable_irq(irq);
+ return 0;
+}
+
+static void spider_shutdown_irq(unsigned int irq)
+{
+ spider_disable_irq(irq);
+}
+
+static void spider_end_irq(unsigned int irq)
+{
+ spider_enable_irq(irq);
+}
+
+static void spider_ack_irq(unsigned int irq)
+{
+ spider_disable_irq(irq);
+ iic_local_enable();
+}
+
+static struct hw_interrupt_type spider_pic = {
+ .typename = " SPIDER ",
+ .startup = spider_startup_irq,
+ .shutdown = spider_shutdown_irq,
+ .enable = spider_enable_irq,
+ .disable = spider_disable_irq,
+ .ack = spider_ack_irq,
+ .end = spider_end_irq,
+};
+
+
+int spider_get_irq(unsigned long int_pending)
+{
+ void __iomem *regs = spider_get_pic(int_pending);
+ unsigned long cs;
+ int irq;
+
+ cs = in_be32(regs + TIR_CS);
+
+ irq = cs >> 24;
+ if (irq != 63)
+ return irq;
+
+ return -1;
+}
+
+void spider_init_IRQ(void)
+{
+ int node;
+ struct device_node *dn;
+ unsigned int *property;
+ long spiderpic;
+ int n;
+
+/* FIXME: detect multiple PICs as soon as the device tree has them */
+ for (node = 0; node < 1; node++) {
+ dn = of_find_node_by_path("/");
+ n = prom_n_addr_cells(dn);
+ property = (unsigned int *) get_property(dn,
+ "platform-spider-pic", NULL);
+
+ if (!property)
+ continue;
+ for (spiderpic = 0; n > 0; --n)
+ spiderpic = (spiderpic << 32) + *property++;
+ printk(KERN_DEBUG "SPIDER addr: %lx\n", spiderpic);
+ spider_pics[node] = __ioremap(spiderpic, 0x800, _PAGE_NO_CACHE);
+ for (n = 0; n < IIC_NUM_EXT; n++) {
+ int irq = n + IIC_EXT_OFFSET + node * IIC_NODE_STRIDE;
+ get_irq_desc(irq)->handler = &spider_pic;
+
+ /* do not mask any interrupts because of level */
+ out_be32(spider_pics[node] + TIR_MSK, 0x0);
+
+ /* disable edge detection clear */
+ /* out_be32(spider_pics[node] + TIR_EDC, 0x0); */
+
+ /* enable interrupt packets to be output */
+ out_be32(spider_pics[node] + TIR_PIEN,
+ in_be32(spider_pics[node] + TIR_PIEN) | 0x1);
+
+ /* Enable the interrupt detection enable bit. Do this last! */
+ out_be32(spider_pics[node] + TIR_DEN,
+ in_be32(spider_pics[node] +TIR_DEN) | 0x1);
+
+ }
+ }
+}

2005-05-13 20:21:53

by Arnd Bergmann

Subject: [PATCH 4/8] ppc64: add BPA platform type

This adds the basic support for running on BPA machines.
So far, the only such machine is the IBM workstation, and the
code will not run on others without a little more generalization.

It should be possible to configure a kernel that combines
CONFIG_PPC_BPA with any of the other multiplatform
targets.

Signed-off-by: Arnd Bergmann <[email protected]>

Index: linus-2.5/MAINTAINERS
===================================================================
--- linus-2.5.orig/MAINTAINERS 2005-05-09 08:14:59.000000000 +0200
+++ linus-2.5/MAINTAINERS 2005-05-09 08:17:38.000000000 +0200
@@ -493,6 +493,13 @@
W: http://sourceforge.net/projects/bonding/
S: Supported

+BROADBAND PROCESSOR ARCHITECTURE
+P: Arnd Bergmann
+M: [email protected]
+L: [email protected]
+W: http://linuxppc64.org
+S: Supported
+
BTTV VIDEO4LINUX DRIVER
P: Gerd Knorr
M: [email protected]
Index: linus-2.5/arch/ppc64/Kconfig
===================================================================
--- linus-2.5.orig/arch/ppc64/Kconfig 2005-05-09 08:15:08.000000000 +0200
+++ linus-2.5/arch/ppc64/Kconfig 2005-05-09 08:17:38.000000000 +0200
@@ -77,6 +77,10 @@
bool " IBM pSeries & new iSeries"
default y

+config PPC_BPA
+ bool " Broadband Processor Architecture"
+ depends on PPC_MULTIPLATFORM
+
config PPC_PMAC
depends on PPC_MULTIPLATFORM
bool " Apple G5 based machines"
@@ -256,7 +260,7 @@

config PPC_RTAS
bool
- depends on PPC_PSERIES
+ depends on PPC_PSERIES || PPC_BPA
default y

config RTAS_PROC
Index: linus-2.5/arch/ppc64/Makefile
===================================================================
--- linus-2.5.orig/arch/ppc64/Makefile 2005-05-09 08:15:08.000000000 +0200
+++ linus-2.5/arch/ppc64/Makefile 2005-05-09 08:17:38.000000000 +0200
@@ -90,12 +90,14 @@
boottarget-$(CONFIG_PPC_PSERIES) := zImage zImage.initrd
boottarget-$(CONFIG_PPC_MAPLE) := zImage zImage.initrd
boottarget-$(CONFIG_PPC_ISERIES) := vmlinux.sminitrd vmlinux.initrd vmlinux.sm
+boottarget-$(CONFIG_PPC_BPA) := zImage zImage.initrd
$(boottarget-y): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@

bootimage-$(CONFIG_PPC_PSERIES) := $(boot)/zImage
bootimage-$(CONFIG_PPC_PMAC) := vmlinux
bootimage-$(CONFIG_PPC_MAPLE) := $(boot)/zImage
+bootimage-$(CONFIG_PPC_BPA) := zImage
bootimage-$(CONFIG_PPC_ISERIES) := vmlinux
BOOTIMAGE := $(bootimage-y)
install: vmlinux
Index: linus-2.5/arch/ppc64/kernel/Makefile
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/Makefile 2005-05-09 08:16:57.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/Makefile 2005-05-09 08:17:38.000000000 +0200
@@ -34,6 +34,8 @@
pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \
xics.o pSeries_setup.o pSeries_iommu.o

+obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o
+
obj-$(CONFIG_EEH) += eeh.o
obj-$(CONFIG_PROC_FS) += proc_ppc64.o
obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o
@@ -60,6 +62,7 @@
obj-$(CONFIG_PPC_PMAC) += pmac_smp.o smp-tbsync.o
obj-$(CONFIG_PPC_ISERIES) += iSeries_smp.o
obj-$(CONFIG_PPC_PSERIES) += pSeries_smp.o
+obj-$(CONFIG_PPC_BPA) += pSeries_smp.o
obj-$(CONFIG_PPC_MAPLE) += smp-tbsync.o
endif

Index: linus-2.5/arch/ppc64/kernel/bpa_setup.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linus-2.5/arch/ppc64/kernel/bpa_setup.c 2005-05-09 08:17:38.000000000 +0200
@@ -0,0 +1,207 @@
+/*
+ * linux/arch/ppc64/kernel/bpa_setup.c
+ *
+ * Copyright (C) 1995 Linus Torvalds
+ * Adapted from 'alpha' version by Gary Thomas
+ * Modified by Cort Dougan ([email protected])
+ * Modified by PPC64 Team, IBM Corp
+ * Modified by BPA Team, IBM Deutschland Entwicklung GmbH
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#undef DEBUG
+
+#include <linux/config.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/stddef.h>
+#include <linux/unistd.h>
+#include <linux/slab.h>
+#include <linux/user.h>
+#include <linux/reboot.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/irq.h>
+#include <linux/seq_file.h>
+#include <linux/root_dev.h>
+#include <linux/console.h>
+
+#include <asm/mmu.h>
+#include <asm/processor.h>
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include <asm/prom.h>
+#include <asm/rtas.h>
+#include <asm/pci-bridge.h>
+#include <asm/iommu.h>
+#include <asm/dma.h>
+#include <asm/machdep.h>
+#include <asm/time.h>
+#include <asm/nvram.h>
+#include <asm/cputable.h>
+
+#include "pci.h"
+
+#ifdef DEBUG
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+extern void pSeries_get_boot_time(struct rtc_time *rtc_time);
+extern void pSeries_get_rtc_time(struct rtc_time *rtc_time);
+extern int pSeries_set_rtc_time(struct rtc_time *rtc_time);
+
+extern unsigned long ppc_proc_freq;
+extern unsigned long ppc_tb_freq;
+
+void bpa_get_cpuinfo(struct seq_file *m)
+{
+ struct device_node *root;
+ const char *model = "";
+
+ root = of_find_node_by_path("/");
+ if (root)
+ model = get_property(root, "model", NULL);
+ seq_printf(m, "machine\t\t: BPA %s\n", model);
+ of_node_put(root);
+}
+
+static void __init bpa_progress(char *s, unsigned short hex)
+{
+ printk("*** %04x : %s\n", hex, s ? s : "");
+}
+
+extern void setup_default_decr(void);
+
+/* Some sane defaults: 125 MHz timebase, 1GHz processor */
+#define DEFAULT_TB_FREQ 125000000UL
+#define DEFAULT_PROC_FREQ (DEFAULT_TB_FREQ * 8)
+
+/* FIXME: consolidate this into rtas.c or similar */
+static void __init pSeries_calibrate_decr(void)
+{
+ struct device_node *cpu;
+ struct div_result divres;
+ unsigned int *fp;
+ int node_found;
+
+ /*
+ * The cpu node should have a timebase-frequency property
+ * to tell us the rate at which the decrementer counts.
+ */
+ cpu = of_find_node_by_type(NULL, "cpu");
+
+ ppc_tb_freq = DEFAULT_TB_FREQ; /* hardcoded default */
+ node_found = 0;
+ if (cpu != 0) {
+ fp = (unsigned int *)get_property(cpu, "timebase-frequency",
+ NULL);
+ if (fp != 0) {
+ node_found = 1;
+ ppc_tb_freq = *fp;
+ }
+ }
+ if (!node_found)
+ printk(KERN_ERR "WARNING: Estimating decrementer frequency "
+ "(not found)\n");
+
+ ppc_proc_freq = DEFAULT_PROC_FREQ;
+ node_found = 0;
+ if (cpu != 0) {
+ fp = (unsigned int *)get_property(cpu, "clock-frequency",
+ NULL);
+ if (fp != 0) {
+ node_found = 1;
+ ppc_proc_freq = *fp;
+ }
+ }
+ if (!node_found)
+ printk(KERN_ERR "WARNING: Estimating processor frequency "
+ "(not found)\n");
+
+ of_node_put(cpu);
+
+ printk(KERN_INFO "time_init: decrementer frequency = %lu.%.6lu MHz\n",
+ ppc_tb_freq/1000000, ppc_tb_freq%1000000);
+ printk(KERN_INFO "time_init: processor frequency = %lu.%.6lu MHz\n",
+ ppc_proc_freq/1000000, ppc_proc_freq%1000000);
+
+ tb_ticks_per_jiffy = ppc_tb_freq / HZ;
+ tb_ticks_per_sec = tb_ticks_per_jiffy * HZ;
+ tb_ticks_per_usec = ppc_tb_freq / 1000000;
+ tb_to_us = mulhwu_scale_factor(ppc_tb_freq, 1000000);
+ div128_by_32(1024*1024, 0, tb_ticks_per_sec, &divres);
+ tb_to_xs = divres.result_low;
+
+ setup_default_decr();
+}
+
+static void __init bpa_setup_arch(void)
+{
+#ifdef CONFIG_SMP
+ smp_init_pSeries();
+#endif
+
+ /* init to some ~sane value until calibrate_delay() runs */
+ loops_per_jiffy = 50000000;
+
+ if (ROOT_DEV == 0) {
+ printk("No ramdisk, default root is /dev/hda2\n");
+ ROOT_DEV = Root_HDA2;
+ }
+
+ /* Find and initialize PCI host bridges */
+ init_pci_config_tokens();
+ find_and_init_phbs();
+
+#ifdef CONFIG_DUMMY_CONSOLE
+ conswitchp = &dummy_con;
+#endif
+
+ /* bpa_nvram_init(); */
+}
+
+/*
+ * Early initialization. Relocation is on but do not reference unbolted pages
+ */
+static void __init bpa_init_early(void)
+{
+ DBG(" -> bpa_init_early()\n");
+
+ hpte_init_native();
+
+ pci_direct_iommu_init();
+
+ ppc64_interrupt_controller = IC_BPA_IIC;
+
+ DBG(" <- bpa_init_early()\n");
+}
+
+
+static int __init bpa_probe(int platform)
+{
+ if (platform != PLATFORM_BPA)
+ return 0;
+
+ return 1;
+}
+
+struct machdep_calls __initdata bpa_md = {
+ .probe = bpa_probe,
+ .setup_arch = bpa_setup_arch,
+ .init_early = bpa_init_early,
+ .get_cpuinfo = bpa_get_cpuinfo,
+ .restart = rtas_restart,
+ .power_off = rtas_power_off,
+ .halt = rtas_halt,
+ .get_boot_time = pSeries_get_boot_time,
+ .get_rtc_time = pSeries_get_rtc_time,
+ .set_rtc_time = pSeries_set_rtc_time,
+ .calibrate_decr = pSeries_calibrate_decr,
+ .progress = bpa_progress,
+};
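
The derived timebase values at the end of pSeries_calibrate_decr() follow
directly from the timebase frequency. Below is a minimal user-space sketch
of that integer arithmetic. It assumes HZ is 1000 (the patch does not say
what the configured value is), and struct tb_params and
tb_params_from_freq() are hypothetical names used only for illustration;
the kernel keeps these values in globals.

```c
#include <assert.h>

/* Assumption: HZ = 1000; the real value depends on the kernel config. */
#define SKETCH_HZ 1000UL

/* Default from the patch: 125 MHz timebase. */
#define DEFAULT_TB_FREQ 125000000UL

/* Hypothetical container for values the kernel stores in globals. */
struct tb_params {
	unsigned long ticks_per_jiffy;
	unsigned long ticks_per_sec;
	unsigned long ticks_per_usec;
};

/* Mirror the integer arithmetic used in pSeries_calibrate_decr(). */
static struct tb_params tb_params_from_freq(unsigned long tb_freq)
{
	struct tb_params p;

	assert(tb_freq >= SKETCH_HZ);	/* otherwise ticks_per_jiffy is 0 */

	p.ticks_per_jiffy = tb_freq / SKETCH_HZ;
	/* Recomputed from the truncated per-jiffy value, as in the patch. */
	p.ticks_per_sec = p.ticks_per_jiffy * SKETCH_HZ;
	p.ticks_per_usec = tb_freq / 1000000UL;
	return p;
}
```

Because ticks_per_sec is rebuilt from the truncated per-jiffy value, it can
come out slightly below the raw frequency when the frequency is not a
multiple of HZ.
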
Index: linus-2.5/arch/ppc64/kernel/cpu_setup_power4.S
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/cpu_setup_power4.S 2005-05-08 09:51:35.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/cpu_setup_power4.S 2005-05-09 08:17:38.000000000 +0200
@@ -73,7 +73,21 @@

_GLOBAL(__setup_cpu_power4)
blr
-
+
+_GLOBAL(__setup_cpu_be)
+ /* Set large page sizes LP=0: 16MB, LP=1: 64KB */
+ addi r3, 0, 0
+ ori r3, r3, HID6_LB
+ sldi r3, r3, 32
+ nor r3, r3, r3
+ mfspr r4, SPRN_HID6
+ and r4, r4, r3
+ addi r3, 0, 0x02000
+ sldi r3, r3, 32
+ or r4, r4, r3
+ mtspr SPRN_HID6, r4
+ blr
+
_GLOBAL(__setup_cpu_ppc970)
mfspr r0,SPRN_HID0
li r11,5 /* clear DOZE and SLEEP */
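
For readers who prefer C to assembly, the read-modify-write that
__setup_cpu_be performs on HID6 can be sketched as below. This is only an
illustration of the bit manipulation, not kernel code:
hid6_select_large_pages() is a hypothetical name, and the function works on
a plain integer instead of the SPR.

```c
#include <assert.h>
#include <stdint.h>

#define HID6_LB 0xF000u	/* (0x0F << 12), as defined in the patch */

/* C rendering of the mask-and-set sequence in __setup_cpu_be. */
static uint64_t hid6_select_large_pages(uint64_t hid6)
{
	/* The assembly shifts both the mask and the new value left by 32. */
	const uint64_t lb_mask = (uint64_t)HID6_LB << 32;
	const uint64_t lb_value = (uint64_t)0x2000 << 32;

	assert((lb_value & ~lb_mask) == 0);	/* value stays inside the field */

	/* Clear the LB field, then write the new page size selection. */
	return (hid6 & ~lb_mask) | lb_value;
}
```

All bits outside the shifted LB field are passed through unchanged.
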
Index: linus-2.5/arch/ppc64/kernel/cputable.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/cputable.c 2005-05-08 09:51:35.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/cputable.c 2005-05-09 08:17:38.000000000 +0200
@@ -34,6 +34,7 @@
extern void __setup_cpu_power3(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_power4(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec);
+extern void __setup_cpu_be(unsigned long offset, struct cpu_spec* spec);


/* We only set the altivec features if the kernel was compiled with altivec
@@ -162,6 +163,16 @@
__setup_cpu_power4,
COMMON_PPC64_FW
},
+ { /* BE DD1.x */
+ 0xffff0000, 0x00700000, "Broadband Engine",
+ CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE |
+ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP |
+ CPU_FTR_SMT,
+ COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP,
+ 128, 128,
+ __setup_cpu_be,
+ COMMON_PPC64_FW
+ },
{ /* default match */
0x00000000, 0x00000000, "POWER4 (compatible)",
CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE |
Index: linus-2.5/arch/ppc64/kernel/irq.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/irq.c 2005-05-08 09:51:35.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/irq.c 2005-05-09 08:17:38.000000000 +0200
@@ -395,6 +395,9 @@
if (ppc64_interrupt_controller == IC_OPEN_PIC)
return real_irq; /* no mapping for openpic (for now) */

+ if (ppc64_interrupt_controller == IC_BPA_IIC)
+ return real_irq; /* no mapping for iic either */
+
/* don't map interrupts < MIN_VIRT_IRQ */
if (real_irq < MIN_VIRT_IRQ) {
virt_irq_to_real_map[real_irq] = real_irq;
Index: linus-2.5/arch/ppc64/kernel/proc_ppc64.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/proc_ppc64.c 2005-05-08 09:51:35.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/proc_ppc64.c 2005-05-09 08:17:38.000000000 +0200
@@ -53,7 +53,7 @@
if (!root)
return 1;

- if (!(systemcfg->platform & PLATFORM_PSERIES))
+ if (!(systemcfg->platform & (PLATFORM_PSERIES | PLATFORM_BPA)))
return 0;

if (!proc_mkdir("rtas", root))
Index: linus-2.5/arch/ppc64/kernel/prom_init.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/prom_init.c 2005-05-09 08:15:08.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/prom_init.c 2005-05-09 08:17:38.000000000 +0200
@@ -1844,9 +1844,9 @@
&getprop_rval, sizeof(getprop_rval));

/*
- * On pSeries, copy the CPU hold code
+ * On pSeries and BPA, copy the CPU hold code
*/
- if (RELOC(of_platform) & PLATFORM_PSERIES)
+ if (RELOC(of_platform) & (PLATFORM_PSERIES | PLATFORM_BPA))
copy_and_flush(0, KERNELBASE - offset, 0x100, 0);

/*
Index: linus-2.5/arch/ppc64/kernel/setup.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/setup.c 2005-05-08 09:51:35.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/setup.c 2005-05-09 08:17:38.000000000 +0200
@@ -348,6 +348,7 @@
extern struct machdep_calls pSeries_md;
extern struct machdep_calls pmac_md;
extern struct machdep_calls maple_md;
+extern struct machdep_calls bpa_md;

/* Ultimately, stuff them in an elf section like initcalls... */
static struct machdep_calls __initdata *machines[] = {
@@ -360,6 +361,9 @@
#ifdef CONFIG_PPC_MAPLE
&maple_md,
#endif /* CONFIG_PPC_MAPLE */
+#ifdef CONFIG_PPC_BPA
+ &bpa_md,
+#endif
NULL
};

Index: linus-2.5/arch/ppc64/kernel/traps.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/traps.c 2005-05-09 08:04:31.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/traps.c 2005-05-09 08:17:38.000000000 +0200
@@ -126,6 +126,10 @@
printk("POWERMAC ");
nl = 1;
break;
+ case PLATFORM_BPA:
+ printk("BPA ");
+ nl = 1;
+ break;
}
if (nl)
printk("\n");
Index: linus-2.5/include/asm-ppc64/mmu.h
===================================================================
--- linus-2.5.orig/include/asm-ppc64/mmu.h 2005-05-09 08:15:42.000000000 +0200
+++ linus-2.5/include/asm-ppc64/mmu.h 2005-05-09 08:20:31.000000000 +0200
@@ -47,9 +47,10 @@
#define SLB_VSID_KS ASM_CONST(0x0000000000000800)
#define SLB_VSID_KP ASM_CONST(0x0000000000000400)
#define SLB_VSID_N ASM_CONST(0x0000000000000200) /* no-execute */
-#define SLB_VSID_L ASM_CONST(0x0000000000000100) /* largepage 16M */
+#define SLB_VSID_L ASM_CONST(0x0000000000000100) /* largepage */
#define SLB_VSID_C ASM_CONST(0x0000000000000080) /* class */
-
+#define SLB_VSID_LS ASM_CONST(0x0000000000000070) /* size of largepage */
+
#define SLB_VSID_KERNEL (SLB_VSID_KP|SLB_VSID_C)
#define SLB_VSID_USER (SLB_VSID_KP|SLB_VSID_KS)

Index: linus-2.5/include/asm-ppc64/processor.h
===================================================================
--- linus-2.5.orig/include/asm-ppc64/processor.h 2005-05-09 08:04:46.000000000 +0200
+++ linus-2.5/include/asm-ppc64/processor.h 2005-05-09 08:17:38.000000000 +0200
@@ -217,14 +217,22 @@
#define HID0_ABE (1<<3) /* Address Broadcast Enable */
#define HID0_BHTE (1<<2) /* Branch History Table Enable */
#define HID0_BTCD (1<<1) /* Branch target cache disable */
+#define SPRN_HID6 0x3F9 /* Hardware Implementation Register 6 */
+#define HID6_LB (0x0F<<12) /* Concurrent Large Page Modes */
+#define HID6_DLP (1<<20) /* Disable all large page modes (4K only) */
#define SPRN_MSRDORM 0x3F1 /* Hardware Implementation Register 1 */
#define SPRN_HID1 0x3F1 /* Hardware Implementation Register 1 */
#define SPRN_IABR 0x3F2 /* Instruction Address Breakpoint Register */
#define SPRN_NIADORM 0x3F3 /* Hardware Implementation Register 2 */
#define SPRN_HID4 0x3F4 /* 970 HID4 */
#define SPRN_HID5 0x3F6 /* 970 HID5 */
-#define SPRN_TSC 0x3FD /* Thread switch control */
-#define SPRN_TST 0x3FC /* Thread switch timeout */
+#define SPRN_TSCR 0x399 /* Thread switch control on BE */
+#define SPRN_TTR 0x39A /* Thread switch timeout on BE */
+#define TSCR_DEC_ENABLE 0x200000 /* Decrementer Interrupt */
+#define TSCR_EE_ENABLE 0x100000 /* External Interrupt */
+#define TSCR_EE_BOOST 0x080000 /* External Interrupt Boost */
+#define SPRN_TSC 0x3FD /* Thread switch control on others */
+#define SPRN_TST 0x3FC /* Thread switch timeout on others */
#define SPRN_IAC1 0x3F4 /* Instruction Address Compare 1 */
#define SPRN_IAC2 0x3F5 /* Instruction Address Compare 2 */
#define SPRN_ICCR 0x3FB /* Instruction Cache Cacheability Register */
@@ -411,8 +419,9 @@
#define PV_POWER5 0x003A
#define PV_POWER5p 0x003B
#define PV_970FX 0x003C
-#define PV_630 0x0040
-#define PV_630p 0x0041
+#define PV_630 0x0040
+#define PV_630p 0x0041
+#define PV_BE 0x0070

/* Platforms supported by PPC64 */
#define PLATFORM_PSERIES 0x0100
@@ -421,6 +430,7 @@
#define PLATFORM_LPAR 0x0001
#define PLATFORM_POWERMAC 0x0400
#define PLATFORM_MAPLE 0x0500
+#define PLATFORM_BPA 0x1000

/* Compatibility with drivers coming from PPC32 world */
#define _machine (systemcfg->platform)
@@ -432,6 +442,7 @@
#define IC_INVALID 0
#define IC_OPEN_PIC 1
#define IC_PPC_XIC 2
+#define IC_BPA_IIC 3

#define XGLUE(a,b) a##b
#define GLUE(a,b) XGLUE(a,b)
Index: linus-2.5/include/asm-ppc64/smp.h
===================================================================
--- linus-2.5.orig/include/asm-ppc64/smp.h 2005-05-08 09:51:35.000000000 +0200
+++ linus-2.5/include/asm-ppc64/smp.h 2005-05-09 08:17:38.000000000 +0200
@@ -85,6 +85,14 @@

extern struct smp_ops_t *smp_ops;

+#ifdef CONFIG_PPC_PSERIES
+void vpa_init(int cpu);
+#else
+static inline void vpa_init(int cpu)
+{
+}
+#endif /* CONFIG_PPC_PSERIES */
+
#endif /* __ASSEMBLY__ */

#endif /* !(_PPC64_SMP_H) */

2005-05-13 20:26:18

by Arnd Bergmann

Subject: [PATCH 3/8] ppc64: add a watchdog driver for rtas

Add a watchdog using the RTAS OS surveillance service. This is
provided as a simpler alternative to rtasd. The added value
is that it works with standard watchdog client programs and
can therefore also do user space monitoring.
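
A minimal sketch of what such a client does with the device, under the
assumption that it uses the write-based interface: the driver in this patch
treats any non-empty write as a keepalive and, unless built with
CONFIG_WATCHDOG_NOWAYOUT, accepts the character 'V' as the announcement of
an intentional close. The helper names wd_keepalive() and wd_magic_close()
are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Pet the watchdog: any one-byte write counts as a keepalive. */
static int wd_keepalive(int fd)
{
	const char byte = '\0';

	assert(fd >= 0);
	return write(fd, &byte, 1) == 1 ? 0 : -1;
}

/* Announce an intentional close with the magic character, then close. */
static int wd_magic_close(int fd)
{
	const char magic = 'V';

	assert(fd >= 0);
	if (write(fd, &magic, 1) != 1)
		return -1;
	return close(fd);
}
```

A client would typically obtain fd by opening the watchdog device node
write-only, call wd_keepalive() more often than the timeout, and call
wd_magic_close() on orderly shutdown. Closing without the 'V' leaves the
watchdog running.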

On BPA, rtasd is not really useful because the hardware does
not have much to report with event-scan.

The driver should also work on other platforms that support
the RTAS OS surveillance calls.

From: Utz Bacher <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

--- linux-2.6-ppc.orig/drivers/char/watchdog/Kconfig 2005-03-18 07:08:59.836902728 -0500
+++ linux-2.6-ppc/drivers/char/watchdog/Kconfig 2005-03-18 07:09:12.047905480 -0500
@@ -414,6 +414,16 @@ config WATCHDOG_RIO
machines. The watchdog timeout period is normally one minute but
can be changed with a boot-time parameter.

+# ppc64 RTAS watchdog
+config WATCHDOG_RTAS
+ tristate "RTAS watchdog"
+ depends on WATCHDOG && PPC_RTAS
+ help
+ This driver adds watchdog support for the RTAS watchdog.
+
+ To compile this driver as a module, choose M here. The module
+ will be called wdrtas.
+
#
# ISA-based Watchdog Cards
#
--- linux-2.6-ppc.orig/drivers/char/watchdog/Makefile 2005-03-18 07:08:59.857899536 -0500
+++ linux-2.6-ppc/drivers/char/watchdog/Makefile 2005-03-18 07:09:52.344904960 -0500
@@ -33,6 +33,7 @@ obj-$(CONFIG_USBPCWATCHDOG) += pcwd_usb.
obj-$(CONFIG_IXP4XX_WATCHDOG) += ixp4xx_wdt.o
obj-$(CONFIG_IXP2000_WATCHDOG) += ixp2000_wdt.o
obj-$(CONFIG_8xx_WDT) += mpc8xx_wdt.o
+obj-$(CONFIG_WATCHDOG_RTAS) += wdrtas.o

# Only one watchdog can succeed. We probe the hardware watchdog
# drivers first, then the softdog driver. This means if your hardware
--- linux-2.6-ppc.orig/drivers/char/watchdog/wdrtas.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6-ppc/drivers/char/watchdog/wdrtas.c 2005-03-18 07:09:12.051904872 -0500
@@ -0,0 +1,691 @@
+/*
+ * FIXME: add wdrtas_get_status and wdrtas_get_boot_status as soon as
+ * RTAS calls are available
+ */
+
+/*
+ * RTAS watchdog driver
+ *
+ * (C) Copyright IBM Corp. 2005
+ * device driver to exploit watchdog RTAS functions
+ *
+ * Authors : Utz Bacher <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/config.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/reboot.h>
+#include <linux/types.h>
+#include <linux/watchdog.h>
+
+#include <asm/rtas.h>
+#include <asm/uaccess.h>
+
+#define WDRTAS_MAGIC_CHAR 42
+#define WDRTAS_SUPPORTED_MASK (WDIOF_SETTIMEOUT | \
+ WDIOF_MAGICCLOSE)
+
+MODULE_AUTHOR("Utz Bacher <[email protected]>");
+MODULE_DESCRIPTION("RTAS watchdog driver");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR);
+MODULE_ALIAS_MISCDEV(TEMP_MINOR);
+
+#ifdef CONFIG_WATCHDOG_NOWAYOUT
+static int wdrtas_nowayout = 1;
+#else
+static int wdrtas_nowayout = 0;
+#endif
+
+static volatile int wdrtas_miscdev_open = 0;
+static char wdrtas_expect_close = 0;
+
+static int wdrtas_interval;
+
+#define WDRTAS_THERMAL_SENSOR 3
+static int wdrtas_token_get_sensor_state;
+#define WDRTAS_SURVEILLANCE_IND 9000
+static int wdrtas_token_set_indicator;
+#define WDRTAS_SP_SPI 28
+static int wdrtas_token_get_sp;
+static int wdrtas_token_event_scan;
+
+#define WDRTAS_DEFAULT_INTERVAL 300
+
+#define WDRTAS_LOGBUFFER_LEN 128
+static char wdrtas_logbuffer[WDRTAS_LOGBUFFER_LEN];
+
+
+/*** watchdog access functions */
+
+/**
+ * wdrtas_set_interval - sets the watchdog interval
+ * @interval: new interval
+ *
+ * returns 0 on success, <0 on failures
+ *
+ * wdrtas_set_interval sets the watchdog keepalive interval by calling the
+ * RTAS function set-indicator (surveillance). The unit of interval is
+ * seconds.
+ */
+static int
+wdrtas_set_interval(int interval)
+{
+ long result;
+ static int print_msg = 10;
+
+ /* rtas uses minutes */
+ interval = (interval + 59) / 60;
+
+ result = rtas_call(wdrtas_token_set_indicator, 3, 1, NULL,
+ WDRTAS_SURVEILLANCE_IND, 0, interval);
+ if ( (result < 0) && (print_msg) ) {
+ printk("wdrtas: setting the watchdog to %i timeout failed: "
+ "%li\n", interval, result);
+ print_msg--;
+ }
+
+ return result;
+}
+
+/**
+ * wdrtas_get_interval - returns the current watchdog interval
+ * @fallback_value: value (in seconds) to use, if the RTAS call fails
+ *
+ * returns the interval
+ *
+ * wdrtas_get_interval returns the current watchdog keepalive interval
+ * as reported by the RTAS function ibm,get-system-parameter. The unit
+ * of the return value is seconds.
+ */
+static int
+wdrtas_get_interval(int fallback_value)
+{
+ long result;
+ char value[4];
+
+ result = rtas_call(wdrtas_token_get_sp, 3, 1, NULL,
+ WDRTAS_SP_SPI, (void *)__pa(&value), 4);
+ if ( (result < 0) || (value[0] != 0) || (value[1] != 2) ||
+ (value[3] != 0) ) {
+ printk("wdrtas: could not get sp_spi watchdog timeout (%li). "
+ "Continuing\n", result);
+ return fallback_value;
+ }
+
+ /* rtas uses minutes */
+ return ((int)value[2]) * 60;
+}
+
+/**
+ * wdrtas_timer_start - starts watchdog
+ *
+ * wdrtas_timer_start starts the watchdog by calling the RTAS function
+ * set-indicator (surveillance)
+ */
+static void
+wdrtas_timer_start(void)
+{
+ wdrtas_set_interval(wdrtas_interval);
+}
+
+/**
+ * wdrtas_timer_stop - stops watchdog
+ *
+ * wdrtas_timer_stop stops the watchdog timer by calling the RTAS function
+ * set-indicator (surveillance)
+ */
+static void
+wdrtas_timer_stop(void)
+{
+ wdrtas_set_interval(0);
+}
+
+/**
+ * wdrtas_log_scanned_event - logs an event we received during keepalive
+ *
+ * wdrtas_log_scanned_event prints a message to the log buffer dumping
+ * the results of the last event-scan call
+ */
+static void
+wdrtas_log_scanned_event(void)
+{
+ int i;
+
+ for (i = 0; i < WDRTAS_LOGBUFFER_LEN; i += 16)
+ printk("wdrtas: dumping event (line %i/%i), data = "
+ "%02x %02x %02x %02x %02x %02x %02x %02x "
+ "%02x %02x %02x %02x %02x %02x %02x %02x\n",
+ (i / 16) + 1, (WDRTAS_LOGBUFFER_LEN / 16),
+ wdrtas_logbuffer[i + 0], wdrtas_logbuffer[i + 1],
+ wdrtas_logbuffer[i + 2], wdrtas_logbuffer[i + 3],
+ wdrtas_logbuffer[i + 4], wdrtas_logbuffer[i + 5],
+ wdrtas_logbuffer[i + 6], wdrtas_logbuffer[i + 7],
+ wdrtas_logbuffer[i + 8], wdrtas_logbuffer[i + 9],
+ wdrtas_logbuffer[i + 10], wdrtas_logbuffer[i + 11],
+ wdrtas_logbuffer[i + 12], wdrtas_logbuffer[i + 13],
+ wdrtas_logbuffer[i + 14], wdrtas_logbuffer[i + 15]);
+}
+
+/**
+ * wdrtas_timer_keepalive - resets watchdog timer to keep system alive
+ *
+ * wdrtas_timer_keepalive restarts the watchdog timer by calling the
+ * RTAS function event-scan and repeats these calls as long as there are
+ * events available. All events will be dumped.
+ */
+static void
+wdrtas_timer_keepalive(void)
+{
+ long result;
+
+ do {
+ result = rtas_call(wdrtas_token_event_scan, 4, 1, NULL,
+ RTAS_EVENT_SCAN_ALL_EVENTS, 0,
+ (void *)__pa(wdrtas_logbuffer),
+ WDRTAS_LOGBUFFER_LEN);
+ if (result < 0)
+ printk("wdrtas: event-scan failed: %li\n",result);
+ if (result == 0)
+ wdrtas_log_scanned_event();
+ } while (result == 0);
+}
+
+/**
+ * wdrtas_get_temperature - returns current temperature
+ *
+ * returns temperature or <0 on failures
+ *
+ * wdrtas_get_temperature returns the current temperature in Fahrenheit. It
+ * uses the RTAS call get-sensor-state, token 3 to do so
+ */
+static int
+wdrtas_get_temperature(void)
+{
+ long result;
+ int temperature = 0;
+
+ result = rtas_call(wdrtas_token_get_sensor_state, 2, 2,
+ (void *)__pa(&temperature),
+ WDRTAS_THERMAL_SENSOR, 0);
+
+ if (result < 0)
+ printk("wdrtas: reading the thermal sensor failed: %li\n",
+ result);
+ else
+ temperature = ((temperature * 9) / 5) + 32; /* fahrenheit */
+
+ return temperature;
+}
+
+/**
+ * wdrtas_get_status - returns the status of the watchdog
+ *
+ * returns a bitmask of defines WDIOF_... as defined in
+ * include/linux/watchdog.h
+ */
+static int
+wdrtas_get_status(void)
+{
+ return 0; /* TODO */
+}
+
+/**
+ * wdrtas_get_boot_status - returns the reason for the last boot
+ *
+ * returns a bitmask of defines WDIOF_... as defined in
+ * include/linux/watchdog.h, indicating why the watchdog rebooted the system
+ */
+static int
+wdrtas_get_boot_status(void)
+{
+ return 0; /* TODO */
+}
+
+/*** watchdog API and operations stuff */
+
+/* wdrtas_write - called when watchdog device is written to
+ * @file: file structure
+ * @buf: user buffer with data
+ * @len: amount of data written
+ * @ppos: position in file
+ *
+ * returns the number of successfully processed characters, which is always
+ * the number of bytes passed to this function
+ *
+ * wdrtas_write processes all the data given to it and looks for the magic
+ * character 'V'. This character allows the watchdog device to be closed
+ * properly.
+ */
+static ssize_t
+wdrtas_write(struct file *file, const char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ int i;
+ char c;
+
+ if (!len)
+ goto out;
+
+ if (!wdrtas_nowayout) {
+ wdrtas_expect_close = 0;
+ /* look for 'V' */
+ for (i = 0; i < len; i++) {
+ if (get_user(c, buf + i))
+ return -EFAULT;
+ /* allow to close device */
+ if (c == 'V')
+ wdrtas_expect_close = WDRTAS_MAGIC_CHAR;
+ }
+ }
+
+ wdrtas_timer_keepalive();
+
+out:
+ return len;
+}
+
+/**
+ * wdrtas_ioctl - ioctl function for the watchdog device
+ * @inode: inode structure
+ * @file: file structure
+ * @cmd: command for ioctl
+ * @arg: argument pointer
+ *
+ * returns 0 on success, <0 on failure
+ *
+ * wdrtas_ioctl implements the watchdog API ioctls
+ */
+static int
+wdrtas_ioctl(struct inode *inode, struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+ int __user *argp = (void *)arg;
+ int i;
+ static struct watchdog_info wdinfo = {
+ .options = WDRTAS_SUPPORTED_MASK,
+ .firmware_version = 0,
+ .identity = "wdrtas"
+ };
+
+ switch (cmd) {
+ case WDIOC_GETSUPPORT:
+ if (copy_to_user(argp, &wdinfo, sizeof(wdinfo)))
+ return -EFAULT;
+ return 0;
+
+ case WDIOC_GETSTATUS:
+ i = wdrtas_get_status();
+ return put_user(i, argp);
+
+ case WDIOC_GETBOOTSTATUS:
+ i = wdrtas_get_boot_status();
+ return put_user(i, argp);
+
+ case WDIOC_GETTEMP:
+ if (wdrtas_token_get_sensor_state == RTAS_UNKNOWN_SERVICE)
+ return -EOPNOTSUPP;
+
+ i = wdrtas_get_temperature();
+ return put_user(i, argp);
+
+ case WDIOC_SETOPTIONS:
+ if (get_user(i, argp))
+ return -EFAULT;
+ if (i & WDIOS_DISABLECARD)
+ wdrtas_timer_stop();
+ if (i & WDIOS_ENABLECARD) {
+ wdrtas_timer_keepalive();
+ wdrtas_timer_start();
+ }
+ if (i & WDIOS_TEMPPANIC) {
+ /* not implemented. Done by H8 */
+ }
+ return 0;
+
+ case WDIOC_KEEPALIVE:
+ wdrtas_timer_keepalive();
+ return 0;
+
+ case WDIOC_SETTIMEOUT:
+ if (get_user(i, argp))
+ return -EFAULT;
+
+ if (wdrtas_set_interval(i))
+ return -EINVAL;
+
+ wdrtas_timer_keepalive();
+
+ if (wdrtas_token_get_sp == RTAS_UNKNOWN_SERVICE)
+ wdrtas_interval = i;
+ else
+ wdrtas_interval = wdrtas_get_interval(i);
+ /* fallthrough */
+
+ case WDIOC_GETTIMEOUT:
+ return put_user(wdrtas_interval, argp);
+
+ default:
+ return -ENOIOCTLCMD;
+ }
+}
+
+/**
+ * wdrtas_open - open function of watchdog device
+ * @inode: inode structure
+ * @file: file structure
+ *
+ * returns 0 on success, -EBUSY if the file has been opened already, <0 on
+ * other failures
+ *
+ * function called when watchdog device is opened
+ */
+static int
+wdrtas_open(struct inode *inode, struct file *file)
+{
+ /* only open once */
+ if (xchg(&wdrtas_miscdev_open,1))
+ return -EBUSY;
+
+ wdrtas_timer_start();
+ wdrtas_timer_keepalive();
+
+ return nonseekable_open(inode, file);
+}
+
+/**
+ * wdrtas_close - close function of watchdog device
+ * @inode: inode structure
+ * @file: file structure
+ *
+ * returns 0 on success
+ *
+ * close function. Always succeeds
+ */
+static int
+wdrtas_close(struct inode *inode, struct file *file)
+{
+ /* only stop watchdog, if this was announced using 'V' before */
+ if (wdrtas_expect_close == WDRTAS_MAGIC_CHAR)
+ wdrtas_timer_stop();
+ else {
+ printk("wdrtas: got unexpected close. Watchdog "
+ "not stopped.\n");
+ wdrtas_timer_keepalive();
+ }
+
+ wdrtas_expect_close = 0;
+ xchg(&wdrtas_miscdev_open,0);
+ return 0;
+}
+
+/**
+ * wdrtas_temp_read - gives back the temperature in fahrenheit
+ * @file: file structure
+ * @buf: user buffer
+ * @count: number of bytes to be read
+ * @ppos: position in file
+ *
+ * returns always 1 or -EFAULT in case of user space copy failures, <0 on
+ * other failures
+ *
+ * wdrtas_temp_read gives the temperature to the users by copying this
+ * value as one byte into the user space buffer. The unit is Fahrenheit...
+ */
+static ssize_t
+wdrtas_temp_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int temperature = 0;
+
+ temperature = wdrtas_get_temperature();
+ if (temperature < 0)
+ return temperature;
+
+ if (copy_to_user(buf, &temperature, 1))
+ return -EFAULT;
+
+ return 1;
+}
+
+/**
+ * wdrtas_temp_open - open function of temperature device
+ * @inode: inode structure
+ * @file: file structure
+ *
+ * returns 0 on success, <0 on failure
+ *
+ * function called when temperature device is opened
+ */
+static int
+wdrtas_temp_open(struct inode *inode, struct file *file)
+{
+ return nonseekable_open(inode, file);
+}
+
+/**
+ * wdrtas_temp_close - close function of temperature device
+ * @inode: inode structure
+ * @file: file structure
+ *
+ * returns 0 on success
+ *
+ * close function. Always succeeds
+ */
+static int
+wdrtas_temp_close(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+/**
+ * wdrtas_reboot - reboot notifier function
+ * @this: notifier block structure
+ * @code: reboot code
+ * @ptr: unused
+ *
+ * returns NOTIFY_DONE
+ *
+ * wdrtas_reboot stops the watchdog in case of a reboot
+ */
+static int
+wdrtas_reboot(struct notifier_block *this, unsigned long code, void *ptr)
+{
+ if ( (code==SYS_DOWN) || (code==SYS_HALT) )
+ wdrtas_timer_stop();
+
+ return NOTIFY_DONE;
+}
+
+/*** initialization stuff */
+
+static struct file_operations wdrtas_fops = {
+ .owner = THIS_MODULE,
+ .llseek = no_llseek,
+ .write = wdrtas_write,
+ .ioctl = wdrtas_ioctl,
+ .open = wdrtas_open,
+ .release = wdrtas_close,
+};
+
+static struct miscdevice wdrtas_miscdev = {
+ .minor = WATCHDOG_MINOR,
+ .name = "watchdog",
+ .fops = &wdrtas_fops,
+};
+
+static struct file_operations wdrtas_temp_fops = {
+ .owner = THIS_MODULE,
+ .llseek = no_llseek,
+ .read = wdrtas_temp_read,
+ .open = wdrtas_temp_open,
+ .release = wdrtas_temp_close,
+};
+
+static struct miscdevice wdrtas_tempdev = {
+ .minor = TEMP_MINOR,
+ .name = "temperature",
+ .fops = &wdrtas_temp_fops,
+};
+
+static struct notifier_block wdrtas_notifier = {
+ .notifier_call = wdrtas_reboot,
+};
+
+/**
+ * wdrtas_get_tokens - reads in RTAS tokens
+ *
+ * returns 0 on success, <0 on failure
+ *
+ * wdrtas_get_tokens reads in the tokens for the RTAS calls used in
+ * this watchdog driver. It tolerates "get-sensor-state" and
+ * "ibm,get-system-parameter" being unavailable.
+ */
+static int
+wdrtas_get_tokens(void)
+{
+ wdrtas_token_get_sensor_state = rtas_token("get-sensor-state");
+ if (wdrtas_token_get_sensor_state == RTAS_UNKNOWN_SERVICE) {
+ printk("wdrtas: couldn't get token for get-sensor-state. "
+ "Trying to continue without temperature support.\n");
+ }
+
+ wdrtas_token_get_sp = rtas_token("ibm,get-system-parameter");
+ if (wdrtas_token_get_sp == RTAS_UNKNOWN_SERVICE) {
+ printk("wdrtas: couldn't get token for "
+ "ibm,get-system-parameter. Trying to continue with "
+ "a default timeout value of %i seconds.\n",
+ WDRTAS_DEFAULT_INTERVAL);
+ }
+
+ wdrtas_token_set_indicator = rtas_token("set-indicator");
+ if (wdrtas_token_set_indicator == RTAS_UNKNOWN_SERVICE) {
+ printk("wdrtas: couldn't get token for set-indicator. "
+ "Terminating watchdog code.\n");
+ return -EIO;
+ }
+
+ wdrtas_token_event_scan = rtas_token("event-scan");
+ if (wdrtas_token_event_scan == RTAS_UNKNOWN_SERVICE) {
+ printk("wdrtas: couldn't get token for event-scan. "
+ "Terminating watchdog code.\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/**
+ * wdrtas_unregister_devs - unregisters the misc dev handlers
+ *
+ * wdrtas_unregister_devs unregisters the watchdog and temperature watchdog
+ * misc devs
+ */
+static void
+wdrtas_unregister_devs(void)
+{
+ misc_deregister(&wdrtas_miscdev);
+ if (wdrtas_token_get_sensor_state != RTAS_UNKNOWN_SERVICE)
+ misc_deregister(&wdrtas_tempdev);
+}
+
+/**
+ * wdrtas_register_devs - registers the misc dev handlers
+ *
+ * returns 0 on success, <0 on failure
+ *
+ * wdrtas_register_devs registers the watchdog and temperature watchdog
+ * misc devs
+ */
+static int
+wdrtas_register_devs(void)
+{
+ int result;
+
+ result = misc_register(&wdrtas_miscdev);
+ if (result) {
+ printk("wdrtas: couldn't register watchdog misc device. "
+ "Terminating watchdog code.\n");
+ return result;
+ }
+
+ if (wdrtas_token_get_sensor_state != RTAS_UNKNOWN_SERVICE) {
+ result = misc_register(&wdrtas_tempdev);
+ if (result) {
+ printk("wdrtas: couldn't register watchdog "
+ "temperature misc device. Continuing without "
+ "temperature support.\n");
+ wdrtas_token_get_sensor_state = RTAS_UNKNOWN_SERVICE;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * wdrtas_init - init function of the watchdog driver
+ *
+ * returns 0 on success, <0 on failure
+ *
+ * registers the file handlers and the reboot notifier
+ */
+static int __init
+wdrtas_init(void)
+{
+ if (wdrtas_get_tokens())
+ return -ENODEV;
+
+ if (wdrtas_register_devs())
+ return -ENODEV;
+
+ if (register_reboot_notifier(&wdrtas_notifier)) {
+ printk("wdrtas: could not register reboot notifier. "
+ "Terminating watchdog code.\n");
+ wdrtas_unregister_devs();
+ return -ENODEV;
+ }
+
+ if (wdrtas_token_get_sp == RTAS_UNKNOWN_SERVICE)
+ wdrtas_interval = WDRTAS_DEFAULT_INTERVAL;
+ else
+ wdrtas_interval = wdrtas_get_interval(WDRTAS_DEFAULT_INTERVAL);
+
+ return 0;
+}
+
+/**
+ * wdrtas_exit - exit function of the watchdog driver
+ *
+ * unregisters the file handlers and the reboot notifier
+ */
+static void __exit
+wdrtas_exit(void)
+{
+ if (!wdrtas_nowayout)
+ wdrtas_timer_stop();
+
+ wdrtas_unregister_devs();
+
+ unregister_reboot_notifier(&wdrtas_notifier);
+}
+
+module_init(wdrtas_init);
+module_exit(wdrtas_exit);
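
One consequence of the driver above that is easy to miss: the watchdog API
works in seconds, while the driver passes whole minutes to RTAS and rounds
up. A minimal sketch of the two conversions, using the hypothetical helper
names wd_secs_to_minutes() and wd_minutes_to_secs():

```c
#include <assert.h>

/* Round seconds up to whole minutes, as wdrtas_set_interval() does. */
static int wd_secs_to_minutes(int secs)
{
	assert(secs >= 0);
	return (secs + 59) / 60;
}

/* Convert minutes back to seconds, as wdrtas_get_interval() does. */
static int wd_minutes_to_secs(int minutes)
{
	assert(minutes >= 0);
	return minutes * 60;
}
```

Assuming the firmware reports back the value that was set, a request for
90 seconds through WDIOC_SETTIMEOUT would therefore be returned to the
caller as 120 seconds. A request for 0 maps to 0 minutes, which is what
wdrtas_timer_stop() relies on.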

2005-05-13 20:32:05

by Arnd Bergmann

Subject: [PATCH 1/8] ppc64: split out generic rtas code from pSeries_pci.c

BPA uses RTAS for PCI access but should not be mixed up with
pSeries-specific code. Splitting it out also avoids some #ifdefs.
Other platforms that want to use rtas_pci.c can add their own
platform_pci.c with platform-specific fixups.
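
To illustrate the kind of code that is platform independent here, the
sketch below restates two pieces of the config space helpers this patch
removes from pSeries_pci.c: the offset validity check and the packing of
the config address handed to RTAS. The function names and the flattened
parameters are hypothetical; the kernel versions take a struct device_node.

```c
#include <assert.h>

/* Offsets below 256 are always valid; 256..4095 need extended config space. */
static int config_offset_valid(unsigned int where, int has_ext_config)
{
	if (where < 256)
		return 1;
	if (where < 4096 && has_ext_config)
		return 1;
	return 0;
}

/* Pack bus, devfn and register offset the way rtas_read_config() does. */
static unsigned long rtas_config_addr(unsigned int busno, unsigned int devfn,
				      unsigned int where)
{
	assert(busno <= 0xff && devfn <= 0xff && where < 4096);

	/* The upper 4 bits of a 12-bit offset land in address bits 28..31. */
	return ((unsigned long)(where & 0xf00) << 20) |
	       ((unsigned long)busno << 16) |
	       ((unsigned long)devfn << 8) |
	       (where & 0xff);
}
```

Nothing in either helper depends on pSeries, which is consistent with
treating this code as generic.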

Signed-off-by: Arnd Bergmann <[email protected]>

--- linux-cg.orig/arch/ppc64/kernel/Makefile 2005-05-13 14:56:19.016994560 -0400
+++ linux-cg/arch/ppc64/kernel/Makefile 2005-05-13 15:00:05.111971888 -0400
@@ -32,13 +32,14 @@ obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram

obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \
pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \
- xics.o rtas.o pSeries_setup.o pSeries_iommu.o
+ xics.o pSeries_setup.o pSeries_iommu.o

obj-$(CONFIG_EEH) += eeh.o
obj-$(CONFIG_PROC_FS) += proc_ppc64.o
obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_MODULES) += module.o ppc_ksyms.o
+obj-$(CONFIG_PPC_RTAS) += rtas.o rtas_pci.o
obj-$(CONFIG_RTAS_PROC) += rtas-proc.o
obj-$(CONFIG_SCANLOG) += scanlog.o
obj-$(CONFIG_VIOPATH) += viopath.o
--- linux-cg.orig/arch/ppc64/kernel/mpic.h 2005-05-13 14:56:19.018994256 -0400
+++ linux-cg/arch/ppc64/kernel/mpic.h 2005-05-13 15:00:10.785908048 -0400
@@ -265,3 +265,6 @@ extern void mpic_send_ipi(unsigned int i
extern int mpic_get_one_irq(struct mpic *mpic, struct pt_regs *regs);
/* This one gets to the primary mpic */
extern int mpic_get_irq(struct pt_regs *regs);
+
+/* global mpic for pSeries */
+extern struct mpic *pSeries_mpic;
--- linux-cg.orig/arch/ppc64/kernel/pSeries_pci.c 2005-05-13 14:57:09.556898776 -0400
+++ linux-cg/arch/ppc64/kernel/pSeries_pci.c 2005-05-13 15:00:10.786907896 -0400
@@ -1,13 +1,11 @@
/*
- * pSeries_pci.c
+ * arch/ppc64/kernel/pSeries_pci.c
*
* Copyright (C) 2001 Dave Engebretsen, IBM Corporation
* Copyright (C) 2003 Anton Blanchard <[email protected]>, IBM
*
* pSeries specific routines for PCI.
*
- * Based on code from pci.c and chrp_pci.c
- *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
@@ -23,430 +21,18 @@
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

+#include <linux/init.h>
+#include <linux/ioport.h>
#include <linux/kernel.h>
-#include <linux/threads.h>
#include <linux/pci.h>
#include <linux/string.h>
-#include <linux/init.h>
-#include <linux/bootmem.h>

-#include <asm/io.h>
-#include <asm/pgtable.h>
-#include <asm/irq.h>
-#include <asm/prom.h>
-#include <asm/machdep.h>
#include <asm/pci-bridge.h>
-#include <asm/iommu.h>
-#include <asm/rtas.h>
+#include <asm/prom.h>

-#include "mpic.h"
#include "pci.h"

-/* RTAS tokens */
-static int read_pci_config;
-static int write_pci_config;
-static int ibm_read_pci_config;
-static int ibm_write_pci_config;
-
-static int s7a_workaround;
-
-extern struct mpic *pSeries_mpic;
-
-static int config_access_valid(struct device_node *dn, int where)
-{
- if (where < 256)
- return 1;
- if (where < 4096 && dn->pci_ext_config_space)
- return 1;
-
- return 0;
-}
-
-static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val)
-{
- int returnval = -1;
- unsigned long buid, addr;
- int ret;
-
- if (!dn)
- return PCIBIOS_DEVICE_NOT_FOUND;
- if (!config_access_valid(dn, where))
- return PCIBIOS_BAD_REGISTER_NUMBER;
-
- addr = ((where & 0xf00) << 20) | (dn->busno << 16) |
- (dn->devfn << 8) | (where & 0xff);
- buid = dn->phb->buid;
- if (buid) {
- ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
- addr, buid >> 32, buid & 0xffffffff, size);
- } else {
- ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size);
- }
- *val = returnval;
-
- if (ret)
- return PCIBIOS_DEVICE_NOT_FOUND;
-
- if (returnval == EEH_IO_ERROR_VALUE(size)
- && eeh_dn_check_failure (dn, NULL))
- return PCIBIOS_DEVICE_NOT_FOUND;
-
- return PCIBIOS_SUCCESSFUL;
-}
-
-static int rtas_pci_read_config(struct pci_bus *bus,
- unsigned int devfn,
- int where, int size, u32 *val)
-{
- struct device_node *busdn, *dn;
-
- if (bus->self)
- busdn = pci_device_to_OF_node(bus->self);
- else
- busdn = bus->sysdata; /* must be a phb */
-
- /* Search only direct children of the bus */
- for (dn = busdn->child; dn; dn = dn->sibling)
- if (dn->devfn == devfn)
- return rtas_read_config(dn, where, size, val);
- return PCIBIOS_DEVICE_NOT_FOUND;
-}
-
-static int rtas_write_config(struct device_node *dn, int where, int size, u32 val)
-{
- unsigned long buid, addr;
- int ret;
-
- if (!dn)
- return PCIBIOS_DEVICE_NOT_FOUND;
- if (!config_access_valid(dn, where))
- return PCIBIOS_BAD_REGISTER_NUMBER;
-
- addr = ((where & 0xf00) << 20) | (dn->busno << 16) |
- (dn->devfn << 8) | (where & 0xff);
- buid = dn->phb->buid;
- if (buid) {
- ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val);
- } else {
- ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val);
- }
-
- if (ret)
- return PCIBIOS_DEVICE_NOT_FOUND;
-
- return PCIBIOS_SUCCESSFUL;
-}
-
-static int rtas_pci_write_config(struct pci_bus *bus,
- unsigned int devfn,
- int where, int size, u32 val)
-{
- struct device_node *busdn, *dn;
-
- if (bus->self)
- busdn = pci_device_to_OF_node(bus->self);
- else
- busdn = bus->sysdata; /* must be a phb */
-
- /* Search only direct children of the bus */
- for (dn = busdn->child; dn; dn = dn->sibling)
- if (dn->devfn == devfn)
- return rtas_write_config(dn, where, size, val);
- return PCIBIOS_DEVICE_NOT_FOUND;
-}
-
-struct pci_ops rtas_pci_ops = {
- rtas_pci_read_config,
- rtas_pci_write_config
-};
-
-int is_python(struct device_node *dev)
-{
- char *model = (char *)get_property(dev, "model", NULL);
-
- if (model && strstr(model, "Python"))
- return 1;
-
- return 0;
-}
-
-static int get_phb_reg_prop(struct device_node *dev,
- unsigned int addr_size_words,
- struct reg_property64 *reg)
-{
- unsigned int *ui_ptr = NULL, len;
-
- /* Found a PHB, now figure out where his registers are mapped. */
- ui_ptr = (unsigned int *)get_property(dev, "reg", &len);
- if (ui_ptr == NULL)
- return 1;
-
- if (addr_size_words == 1) {
- reg->address = ((struct reg_property32 *)ui_ptr)->address;
- reg->size = ((struct reg_property32 *)ui_ptr)->size;
- } else {
- *reg = *((struct reg_property64 *)ui_ptr);
- }
-
- return 0;
-}
-
-static void python_countermeasures(struct device_node *dev,
- unsigned int addr_size_words)
-{
- struct reg_property64 reg_struct;
- void __iomem *chip_regs;
- volatile u32 val;
-
- if (get_phb_reg_prop(dev, addr_size_words, &reg_struct))
- return;
-
- /* Python's register file is 1 MB in size. */
- chip_regs = ioremap(reg_struct.address & ~(0xfffffUL), 0x100000);
-
- /*
- * Firmware doesn't always clear this bit which is critical
- * for good performance - Anton
- */
-
-#define PRG_CL_RESET_VALID 0x00010000
-
- val = in_be32(chip_regs + 0xf6030);
- if (val & PRG_CL_RESET_VALID) {
- printk(KERN_INFO "Python workaround: ");
- val &= ~PRG_CL_RESET_VALID;
- out_be32(chip_regs + 0xf6030, val);
- /*
- * We must read it back for changes to
- * take effect
- */
- val = in_be32(chip_regs + 0xf6030);
- printk("reg0: %x\n", val);
- }
-
- iounmap(chip_regs);
-}
-
-void __init init_pci_config_tokens (void)
-{
- read_pci_config = rtas_token("read-pci-config");
- write_pci_config = rtas_token("write-pci-config");
- ibm_read_pci_config = rtas_token("ibm,read-pci-config");
- ibm_write_pci_config = rtas_token("ibm,write-pci-config");
-}
-
-unsigned long __devinit get_phb_buid (struct device_node *phb)
-{
- int addr_cells;
- unsigned int *buid_vals;
- unsigned int len;
- unsigned long buid;
-
- if (ibm_read_pci_config == -1) return 0;
-
- /* PHB's will always be children of the root node,
- * or so it is promised by the current firmware. */
- if (phb->parent == NULL)
- return 0;
- if (phb->parent->parent)
- return 0;
-
- buid_vals = (unsigned int *) get_property(phb, "reg", &len);
- if (buid_vals == NULL)
- return 0;
-
- addr_cells = prom_n_addr_cells(phb);
- if (addr_cells == 1) {
- buid = (unsigned long) buid_vals[0];
- } else {
- buid = (((unsigned long)buid_vals[0]) << 32UL) |
- (((unsigned long)buid_vals[1]) & 0xffffffff);
- }
- return buid;
-}
-
-static int phb_set_bus_ranges(struct device_node *dev,
- struct pci_controller *phb)
-{
- int *bus_range;
- unsigned int len;
-
- bus_range = (int *) get_property(dev, "bus-range", &len);
- if (bus_range == NULL || len < 2 * sizeof(int)) {
- return 1;
- }
-
- phb->first_busno = bus_range[0];
- phb->last_busno = bus_range[1];
-
- return 0;
-}
-
-static int __devinit setup_phb(struct device_node *dev,
- struct pci_controller *phb,
- unsigned int addr_size_words)
-{
- pci_setup_pci_controller(phb);
-
- if (is_python(dev))
- python_countermeasures(dev, addr_size_words);
-
- if (phb_set_bus_ranges(dev, phb))
- return 1;
-
- phb->arch_data = dev;
- phb->ops = &rtas_pci_ops;
- phb->buid = get_phb_buid(dev);
-
- return 0;
-}
-
-static void __devinit add_linux_pci_domain(struct device_node *dev,
- struct pci_controller *phb,
- struct property *of_prop)
-{
- memset(of_prop, 0, sizeof(struct property));
- of_prop->name = "linux,pci-domain";
- of_prop->length = sizeof(phb->global_number);
- of_prop->value = (unsigned char *)&of_prop[1];
- memcpy(of_prop->value, &phb->global_number, sizeof(phb->global_number));
- prom_add_property(dev, of_prop);
-}
-
-static struct pci_controller * __init alloc_phb(struct device_node *dev,
- unsigned int addr_size_words)
-{
- struct pci_controller *phb;
- struct property *of_prop;
-
- phb = alloc_bootmem(sizeof(struct pci_controller));
- if (phb == NULL)
- return NULL;
-
- of_prop = alloc_bootmem(sizeof(struct property) +
- sizeof(phb->global_number));
- if (!of_prop)
- return NULL;
-
- if (setup_phb(dev, phb, addr_size_words))
- return NULL;
-
- add_linux_pci_domain(dev, phb, of_prop);
-
- return phb;
-}
-
-static struct pci_controller * __devinit alloc_phb_dynamic(struct device_node *dev, unsigned int addr_size_words)
-{
- struct pci_controller *phb;
-
- phb = (struct pci_controller *)kmalloc(sizeof(struct pci_controller),
- GFP_KERNEL);
- if (phb == NULL)
- return NULL;
-
- if (setup_phb(dev, phb, addr_size_words))
- return NULL;
-
- phb->is_dynamic = 1;
-
- /* TODO: linux,pci-domain? */
-
- return phb;
-}
-
-unsigned long __init find_and_init_phbs(void)
-{
- struct device_node *node;
- struct pci_controller *phb;
- unsigned int root_size_cells = 0;
- unsigned int index;
- unsigned int *opprop = NULL;
- struct device_node *root = of_find_node_by_path("/");
-
- if (ppc64_interrupt_controller == IC_OPEN_PIC) {
- opprop = (unsigned int *)get_property(root,
- "platform-open-pic", NULL);
- }
-
- root_size_cells = prom_n_size_cells(root);
-
- index = 0;
-
- for (node = of_get_next_child(root, NULL);
- node != NULL;
- node = of_get_next_child(root, node)) {
- if (node->type == NULL || strcmp(node->type, "pci") != 0)
- continue;
-
- phb = alloc_phb(node, root_size_cells);
- if (!phb)
- continue;
-
- pci_process_bridge_OF_ranges(phb, node);
- pci_setup_phb_io(phb, index == 0);
-
- if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) {
- int addr = root_size_cells * (index + 2) - 1;
- mpic_assign_isu(pSeries_mpic, index, opprop[addr]);
- }
-
- index++;
- }
-
- of_node_put(root);
- pci_devs_phb_init();
-
- /*
- * pci_probe_only and pci_assign_all_buses can be set via properties
- * in chosen.
- */
- if (of_chosen) {
- int *prop;
-
- prop = (int *)get_property(of_chosen, "linux,pci-probe-only",
- NULL);
- if (prop)
- pci_probe_only = *prop;
-
- prop = (int *)get_property(of_chosen,
- "linux,pci-assign-all-buses", NULL);
- if (prop)
- pci_assign_all_buses = *prop;
- }
-
- return 0;
-}
-
-struct pci_controller * __devinit init_phb_dynamic(struct device_node *dn)
-{
- struct device_node *root = of_find_node_by_path("/");
- unsigned int root_size_cells = 0;
- struct pci_controller *phb;
- struct pci_bus *bus;
- int primary;
-
- root_size_cells = prom_n_size_cells(root);
-
- primary = list_empty(&hose_list);
- phb = alloc_phb_dynamic(dn, root_size_cells);
- if (!phb)
- return NULL;
-
- pci_process_bridge_OF_ranges(phb, dn);
-
- pci_setup_phb_io_dynamic(phb, primary);
- of_node_put(root);
-
- pci_devs_phb_init_dynamic(phb);
- phb->last_busno = 0xff;
- bus = pci_scan_bus(phb->first_busno, phb->ops, phb->arch_data);
- phb->bus = bus;
- phb->last_busno = bus->subordinate;
-
- return phb;
-}
-EXPORT_SYMBOL(init_phb_dynamic);
+static int __initdata s7a_workaround;

#if 0
void pcibios_name_device(struct pci_dev *dev)
@@ -474,7 +60,7 @@ void pcibios_name_device(struct pci_dev
DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pcibios_name_device);
#endif

-static void check_s7a(void)
+static void __init check_s7a(void)
{
struct device_node *root;
char *model;
@@ -488,56 +74,6 @@ static void check_s7a(void)
}
}

-/* RPA-specific bits for removing PHBs */
-int pcibios_remove_root_bus(struct pci_controller *phb)
-{
- struct pci_bus *b = phb->bus;
- struct resource *res;
- int rc, i;
-
- res = b->resource[0];
- if (!res->flags) {
- printk(KERN_ERR "%s: no IO resource for PHB %s\n", __FUNCTION__,
- b->name);
- return 1;
- }
-
- rc = unmap_bus_range(b);
- if (rc) {
- printk(KERN_ERR "%s: failed to unmap IO on bus %s\n",
- __FUNCTION__, b->name);
- return 1;
- }
-
- if (release_resource(res)) {
- printk(KERN_ERR "%s: failed to release IO on bus %s\n",
- __FUNCTION__, b->name);
- return 1;
- }
-
- for (i = 1; i < 3; ++i) {
- res = b->resource[i];
- if (!res->flags && i == 0) {
- printk(KERN_ERR "%s: no MEM resource for PHB %s\n",
- __FUNCTION__, b->name);
- return 1;
- }
- if (res->flags && release_resource(res)) {
- printk(KERN_ERR
- "%s: failed to release IO %d on bus %s\n",
- __FUNCTION__, i, b->name);
- return 1;
- }
- }
-
- list_del(&phb->list_node);
- if (phb->is_dynamic)
- kfree(phb);
-
- return 0;
-}
-EXPORT_SYMBOL(pcibios_remove_root_bus);
-
static void __init pSeries_request_regions(void)
{
if (!isa_io_base)
--- linux-cg.orig/arch/ppc64/kernel/rtas_pci.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cg/arch/ppc64/kernel/rtas_pci.c 2005-05-13 15:00:10.788907592 -0400
@@ -0,0 +1,495 @@
+/*
+ * arch/ppc64/kernel/rtas_pci.c
+ *
+ * Copyright (C) 2001 Dave Engebretsen, IBM Corporation
+ * Copyright (C) 2003 Anton Blanchard <[email protected]>, IBM
+ *
+ * RTAS specific routines for PCI.
+ *
+ * Based on code from pci.c, chrp_pci.c and pSeries_pci.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/threads.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include <asm/irq.h>
+#include <asm/prom.h>
+#include <asm/machdep.h>
+#include <asm/pci-bridge.h>
+#include <asm/iommu.h>
+#include <asm/rtas.h>
+
+#include "mpic.h"
+#include "pci.h"
+
+/* RTAS tokens */
+static int read_pci_config;
+static int write_pci_config;
+static int ibm_read_pci_config;
+static int ibm_write_pci_config;
+
+static int config_access_valid(struct device_node *dn, int where)
+{
+ if (where < 256)
+ return 1;
+ if (where < 4096 && dn->pci_ext_config_space)
+ return 1;
+
+ return 0;
+}
+
+static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val)
+{
+ int returnval = -1;
+ unsigned long buid, addr;
+ int ret;
+
+ if (!dn)
+ return PCIBIOS_DEVICE_NOT_FOUND;
+ if (!config_access_valid(dn, where))
+ return PCIBIOS_BAD_REGISTER_NUMBER;
+
+ addr = ((where & 0xf00) << 20) | (dn->busno << 16) |
+ (dn->devfn << 8) | (where & 0xff);
+ buid = dn->phb->buid;
+ if (buid) {
+ ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
+ addr, buid >> 32, buid & 0xffffffff, size);
+ } else {
+ ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size);
+ }
+ *val = returnval;
+
+ if (ret)
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ if (returnval == EEH_IO_ERROR_VALUE(size)
+ && eeh_dn_check_failure (dn, NULL))
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ return PCIBIOS_SUCCESSFUL;
+}
+
+static int rtas_pci_read_config(struct pci_bus *bus,
+ unsigned int devfn,
+ int where, int size, u32 *val)
+{
+ struct device_node *busdn, *dn;
+
+ if (bus->self)
+ busdn = pci_device_to_OF_node(bus->self);
+ else
+ busdn = bus->sysdata; /* must be a phb */
+
+ /* Search only direct children of the bus */
+ for (dn = busdn->child; dn; dn = dn->sibling)
+ if (dn->devfn == devfn)
+ return rtas_read_config(dn, where, size, val);
+ return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+static int rtas_write_config(struct device_node *dn, int where, int size, u32 val)
+{
+ unsigned long buid, addr;
+ int ret;
+
+ if (!dn)
+ return PCIBIOS_DEVICE_NOT_FOUND;
+ if (!config_access_valid(dn, where))
+ return PCIBIOS_BAD_REGISTER_NUMBER;
+
+ addr = ((where & 0xf00) << 20) | (dn->busno << 16) |
+ (dn->devfn << 8) | (where & 0xff);
+ buid = dn->phb->buid;
+ if (buid) {
+ ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val);
+ } else {
+ ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val);
+ }
+
+ if (ret)
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ return PCIBIOS_SUCCESSFUL;
+}
+
+static int rtas_pci_write_config(struct pci_bus *bus,
+ unsigned int devfn,
+ int where, int size, u32 val)
+{
+ struct device_node *busdn, *dn;
+
+ if (bus->self)
+ busdn = pci_device_to_OF_node(bus->self);
+ else
+ busdn = bus->sysdata; /* must be a phb */
+
+ /* Search only direct children of the bus */
+ for (dn = busdn->child; dn; dn = dn->sibling)
+ if (dn->devfn == devfn)
+ return rtas_write_config(dn, where, size, val);
+ return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+struct pci_ops rtas_pci_ops = {
+ rtas_pci_read_config,
+ rtas_pci_write_config
+};
+
+int is_python(struct device_node *dev)
+{
+ char *model = (char *)get_property(dev, "model", NULL);
+
+ if (model && strstr(model, "Python"))
+ return 1;
+
+ return 0;
+}
+
+static int get_phb_reg_prop(struct device_node *dev,
+ unsigned int addr_size_words,
+ struct reg_property64 *reg)
+{
+ unsigned int *ui_ptr = NULL, len;
+
+ /* Found a PHB, now figure out where his registers are mapped. */
+ ui_ptr = (unsigned int *)get_property(dev, "reg", &len);
+ if (ui_ptr == NULL)
+ return 1;
+
+ if (addr_size_words == 1) {
+ reg->address = ((struct reg_property32 *)ui_ptr)->address;
+ reg->size = ((struct reg_property32 *)ui_ptr)->size;
+ } else {
+ *reg = *((struct reg_property64 *)ui_ptr);
+ }
+
+ return 0;
+}
+
+static void python_countermeasures(struct device_node *dev,
+ unsigned int addr_size_words)
+{
+ struct reg_property64 reg_struct;
+ void __iomem *chip_regs;
+ volatile u32 val;
+
+ if (get_phb_reg_prop(dev, addr_size_words, &reg_struct))
+ return;
+
+ /* Python's register file is 1 MB in size. */
+ chip_regs = ioremap(reg_struct.address & ~(0xfffffUL), 0x100000);
+
+ /*
+ * Firmware doesn't always clear this bit which is critical
+ * for good performance - Anton
+ */
+
+#define PRG_CL_RESET_VALID 0x00010000
+
+ val = in_be32(chip_regs + 0xf6030);
+ if (val & PRG_CL_RESET_VALID) {
+ printk(KERN_INFO "Python workaround: ");
+ val &= ~PRG_CL_RESET_VALID;
+ out_be32(chip_regs + 0xf6030, val);
+ /*
+ * We must read it back for changes to
+ * take effect
+ */
+ val = in_be32(chip_regs + 0xf6030);
+ printk("reg0: %x\n", val);
+ }
+
+ iounmap(chip_regs);
+}
+
+void __init init_pci_config_tokens (void)
+{
+ read_pci_config = rtas_token("read-pci-config");
+ write_pci_config = rtas_token("write-pci-config");
+ ibm_read_pci_config = rtas_token("ibm,read-pci-config");
+ ibm_write_pci_config = rtas_token("ibm,write-pci-config");
+}
+
+unsigned long __devinit get_phb_buid (struct device_node *phb)
+{
+ int addr_cells;
+ unsigned int *buid_vals;
+ unsigned int len;
+ unsigned long buid;
+
+ if (ibm_read_pci_config == -1) return 0;
+
+ /* PHB's will always be children of the root node,
+ * or so it is promised by the current firmware. */
+ if (phb->parent == NULL)
+ return 0;
+ if (phb->parent->parent)
+ return 0;
+
+ buid_vals = (unsigned int *) get_property(phb, "reg", &len);
+ if (buid_vals == NULL)
+ return 0;
+
+ addr_cells = prom_n_addr_cells(phb);
+ if (addr_cells == 1) {
+ buid = (unsigned long) buid_vals[0];
+ } else {
+ buid = (((unsigned long)buid_vals[0]) << 32UL) |
+ (((unsigned long)buid_vals[1]) & 0xffffffff);
+ }
+ return buid;
+}
+
+static int phb_set_bus_ranges(struct device_node *dev,
+ struct pci_controller *phb)
+{
+ int *bus_range;
+ unsigned int len;
+
+ bus_range = (int *) get_property(dev, "bus-range", &len);
+ if (bus_range == NULL || len < 2 * sizeof(int)) {
+ return 1;
+ }
+
+ phb->first_busno = bus_range[0];
+ phb->last_busno = bus_range[1];
+
+ return 0;
+}
+
+static int __devinit setup_phb(struct device_node *dev,
+ struct pci_controller *phb,
+ unsigned int addr_size_words)
+{
+ pci_setup_pci_controller(phb);
+
+ if (is_python(dev))
+ python_countermeasures(dev, addr_size_words);
+
+ if (phb_set_bus_ranges(dev, phb))
+ return 1;
+
+ phb->arch_data = dev;
+ phb->ops = &rtas_pci_ops;
+ phb->buid = get_phb_buid(dev);
+
+ return 0;
+}
+
+static void __devinit add_linux_pci_domain(struct device_node *dev,
+ struct pci_controller *phb,
+ struct property *of_prop)
+{
+ memset(of_prop, 0, sizeof(struct property));
+ of_prop->name = "linux,pci-domain";
+ of_prop->length = sizeof(phb->global_number);
+ of_prop->value = (unsigned char *)&of_prop[1];
+ memcpy(of_prop->value, &phb->global_number, sizeof(phb->global_number));
+ prom_add_property(dev, of_prop);
+}
+
+static struct pci_controller * __init alloc_phb(struct device_node *dev,
+ unsigned int addr_size_words)
+{
+ struct pci_controller *phb;
+ struct property *of_prop;
+
+ phb = alloc_bootmem(sizeof(struct pci_controller));
+ if (phb == NULL)
+ return NULL;
+
+ of_prop = alloc_bootmem(sizeof(struct property) +
+ sizeof(phb->global_number));
+ if (!of_prop)
+ return NULL;
+
+ if (setup_phb(dev, phb, addr_size_words))
+ return NULL;
+
+ add_linux_pci_domain(dev, phb, of_prop);
+
+ return phb;
+}
+
+static struct pci_controller * __devinit alloc_phb_dynamic(struct device_node *dev, unsigned int addr_size_words)
+{
+ struct pci_controller *phb;
+
+ phb = (struct pci_controller *)kmalloc(sizeof(struct pci_controller),
+ GFP_KERNEL);
+ if (phb == NULL)
+ return NULL;
+
+ if (setup_phb(dev, phb, addr_size_words))
+ return NULL;
+
+ phb->is_dynamic = 1;
+
+ /* TODO: linux,pci-domain? */
+
+ return phb;
+}
+
+unsigned long __init find_and_init_phbs(void)
+{
+ struct device_node *node;
+ struct pci_controller *phb;
+ unsigned int root_size_cells = 0;
+ unsigned int index;
+ unsigned int *opprop = NULL;
+ struct device_node *root = of_find_node_by_path("/");
+
+ if (ppc64_interrupt_controller == IC_OPEN_PIC) {
+ opprop = (unsigned int *)get_property(root,
+ "platform-open-pic", NULL);
+ }
+
+ root_size_cells = prom_n_size_cells(root);
+
+ index = 0;
+
+ for (node = of_get_next_child(root, NULL);
+ node != NULL;
+ node = of_get_next_child(root, node)) {
+ if (node->type == NULL || strcmp(node->type, "pci") != 0)
+ continue;
+
+ phb = alloc_phb(node, root_size_cells);
+ if (!phb)
+ continue;
+
+ pci_process_bridge_OF_ranges(phb, node);
+ pci_setup_phb_io(phb, index == 0);
+#ifdef CONFIG_PPC_PSERIES
+ if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) {
+ int addr = root_size_cells * (index + 2) - 1;
+ mpic_assign_isu(pSeries_mpic, index, opprop[addr]);
+ }
+#endif
+ index++;
+ }
+
+ of_node_put(root);
+ pci_devs_phb_init();
+
+ /*
+ * pci_probe_only and pci_assign_all_buses can be set via properties
+ * in chosen.
+ */
+ if (of_chosen) {
+ int *prop;
+
+ prop = (int *)get_property(of_chosen, "linux,pci-probe-only",
+ NULL);
+ if (prop)
+ pci_probe_only = *prop;
+
+ prop = (int *)get_property(of_chosen,
+ "linux,pci-assign-all-buses", NULL);
+ if (prop)
+ pci_assign_all_buses = *prop;
+ }
+
+ return 0;
+}
+
+struct pci_controller * __devinit init_phb_dynamic(struct device_node *dn)
+{
+ struct device_node *root = of_find_node_by_path("/");
+ unsigned int root_size_cells = 0;
+ struct pci_controller *phb;
+ struct pci_bus *bus;
+ int primary;
+
+ root_size_cells = prom_n_size_cells(root);
+
+ primary = list_empty(&hose_list);
+ phb = alloc_phb_dynamic(dn, root_size_cells);
+ if (!phb)
+ return NULL;
+
+ pci_process_bridge_OF_ranges(phb, dn);
+
+ pci_setup_phb_io_dynamic(phb, primary);
+ of_node_put(root);
+
+ pci_devs_phb_init_dynamic(phb);
+ phb->last_busno = 0xff;
+ bus = pci_scan_bus(phb->first_busno, phb->ops, phb->arch_data);
+ phb->bus = bus;
+ phb->last_busno = bus->subordinate;
+
+ return phb;
+}
+EXPORT_SYMBOL(init_phb_dynamic);
+
+/* RPA-specific bits for removing PHBs */
+int pcibios_remove_root_bus(struct pci_controller *phb)
+{
+ struct pci_bus *b = phb->bus;
+ struct resource *res;
+ int rc, i;
+
+ res = b->resource[0];
+ if (!res->flags) {
+ printk(KERN_ERR "%s: no IO resource for PHB %s\n", __FUNCTION__,
+ b->name);
+ return 1;
+ }
+
+ rc = unmap_bus_range(b);
+ if (rc) {
+ printk(KERN_ERR "%s: failed to unmap IO on bus %s\n",
+ __FUNCTION__, b->name);
+ return 1;
+ }
+
+ if (release_resource(res)) {
+ printk(KERN_ERR "%s: failed to release IO on bus %s\n",
+ __FUNCTION__, b->name);
+ return 1;
+ }
+
+ for (i = 1; i < 3; ++i) {
+ res = b->resource[i];
+ if (!res->flags && i == 0) {
+ printk(KERN_ERR "%s: no MEM resource for PHB %s\n",
+ __FUNCTION__, b->name);
+ return 1;
+ }
+ if (res->flags && release_resource(res)) {
+ printk(KERN_ERR
+ "%s: failed to release IO %d on bus %s\n",
+ __FUNCTION__, i, b->name);
+ return 1;
+ }
+ }
+
+ list_del(&phb->list_node);
+ if (phb->is_dynamic)
+ kfree(phb);
+
+ return 0;
+}
+EXPORT_SYMBOL(pcibios_remove_root_bus);

2005-05-13 20:32:40

by Arnd Bergmann

Subject: [PATCH 2/8] ppc64: add a minimal nvram driver

The firmware provides the location and size of the nvram
in the device tree, so the driver does not really contain
any hardware-specific bits and could be used on other
machines as well.
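
The read path boils down to a bounds-clamped copy out of the mapped region.
A minimal user-space sketch, assuming an ordinary byte array in place of the
ioremap()ed NVRAM and leaving out the spinlock; the function name and the
explicit nvram/nvram_len parameters are inventions of this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies up to count bytes starting at *index, clamps the request to the
 * end of the region, and advances *index. Returns the number of bytes
 * copied, or 0 once *index is at or past the end.
 */
long sketch_nvram_read(const unsigned char *nvram, long nvram_len,
		       char *buf, size_t count, long *index)
{
	assert(nvram != NULL && buf != NULL && index != NULL);
	assert(*index >= 0);

	if (*index >= nvram_len)
		return 0;
	/* Compare against the remaining length so the sum cannot overflow. */
	if (count > (size_t)(nvram_len - *index))
		count = (size_t)(nvram_len - *index);

	memcpy(buf, nvram + *index, count);

	*index += (long)count;
	return (long)count;
}
```

The clamp is written against the remaining length rather than as
`*index + count > len`, which is a choice of this sketch to sidestep
signed/unsigned mixing, not something the patch does.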

From: Utz Bacher <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

Index: linus-2.5/arch/ppc64/kernel/bpa_nvram.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linus-2.5/arch/ppc64/kernel/bpa_nvram.c 2005-04-20 01:55:36.000000000 +0200
@@ -0,0 +1,118 @@
+/*
+ * NVRAM for CPBW
+ *
+ * (C) Copyright IBM Corp. 2005
+ *
+ * Authors : Utz Bacher <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#include <asm/machdep.h>
+#include <asm/nvram.h>
+#include <asm/prom.h>
+
+static void __iomem *bpa_nvram_start;
+static long bpa_nvram_len;
+static spinlock_t bpa_nvram_lock = SPIN_LOCK_UNLOCKED;
+
+static ssize_t bpa_nvram_read(char *buf, size_t count, loff_t *index)
+{
+ unsigned long flags;
+
+ if (*index >= bpa_nvram_len)
+ return 0;
+ if (*index + count > bpa_nvram_len)
+ count = bpa_nvram_len - *index;
+
+ spin_lock_irqsave(&bpa_nvram_lock, flags);
+
+ memcpy_fromio(buf, bpa_nvram_start + *index, count);
+
+ spin_unlock_irqrestore(&bpa_nvram_lock, flags);
+
+ *index += count;
+ return count;
+}
+
+static ssize_t bpa_nvram_write(char *buf, size_t count, loff_t *index)
+{
+ unsigned long flags;
+
+ if (*index >= bpa_nvram_len)
+ return 0;
+ if (*index + count > bpa_nvram_len)
+ count = bpa_nvram_len - *index;
+
+ spin_lock_irqsave(&bpa_nvram_lock, flags);
+
+ memcpy_toio(bpa_nvram_start + *index, buf, count);
+
+ spin_unlock_irqrestore(&bpa_nvram_lock, flags);
+
+ *index += count;
+ return count;
+}
+
+static ssize_t bpa_nvram_get_size(void)
+{
+ return bpa_nvram_len;
+}
+
+int __init bpa_nvram_init(void)
+{
+ struct device_node *nvram_node;
+ unsigned long *buffer;
+ int proplen;
+ unsigned long nvram_addr;
+ int ret;
+
+ ret = -ENODEV;
+ nvram_node = of_find_node_by_type(NULL, "nvram");
+ if (!nvram_node)
+ goto out;
+
+ ret = -EIO;
+ buffer = (unsigned long *)get_property(nvram_node, "reg", &proplen);
+ if (proplen != 2*sizeof(unsigned long))
+ goto out;
+
+ ret = -ENODEV;
+ nvram_addr = buffer[0];
+ bpa_nvram_len = buffer[1];
+ if ( (!bpa_nvram_len) || (!nvram_addr) )
+ goto out;
+
+ bpa_nvram_start = ioremap(nvram_addr, bpa_nvram_len);
+ if (!bpa_nvram_start)
+ goto out;
+
+ printk(KERN_INFO "BPA NVRAM, %luk mapped to %p\n",
+ bpa_nvram_len >> 10, bpa_nvram_start);
+
+ ppc_md.nvram_read = bpa_nvram_read;
+ ppc_md.nvram_write = bpa_nvram_write;
+ ppc_md.nvram_size = bpa_nvram_get_size;
+
+out:
+ of_node_put(nvram_node);
+ return ret;
+}
Index: linus-2.5/include/asm-ppc64/nvram.h
===================================================================
--- linus-2.5.orig/include/asm-ppc64/nvram.h 2005-04-20 01:54:03.000000000 +0200
+++ linus-2.5/include/asm-ppc64/nvram.h 2005-04-20 01:55:36.000000000 +0200
@@ -70,6 +70,7 @@

extern int pSeries_nvram_init(void);
extern int pmac_nvram_init(void);
+extern int bpa_nvram_init(void);

/* PowerMac specific nvram stuffs */


2005-05-13 23:35:00

by Benjamin Herrenschmidt

Subject: Re: [PATCH 7/8] ppc64: SPU file system


> /run A stub file that lets us do ioctl. The only ioctl
> method we need is the spu_run() call. spu_run suspends
> the current thread from the host CPU and transfers
> the flow of execution to the SPU.
> The ioctl call return to the calling thread when a state
> is entered that can not be handled by the kernel, e.g.
> an error in the SPU code or an exit() from it.
> When a signal is pending for the host CPU thread, the
> ioctl is interrupted and the SPU stopped in order to
> call the signal handler.

ioctls are generally considered evil... what about a write() method
that takes a command?
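
Roughly what a write()-based command interface could look like, as a
user-space sketch of the parsing step only. The command name "run", the enum
and the function name are all hypothetical; the patch defines no such
command set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command codes, not taken from the spufs patch. */
enum sketch_spu_cmd {
	SKETCH_CMD_INVALID = -1,
	SKETCH_CMD_RUN = 0
};

/*
 * Interprets the bytes handed to a write() call as one textual command.
 * A single trailing newline is tolerated so that "echo run > file" works.
 */
int sketch_parse_command(const char *buf, size_t count)
{
	assert(buf != NULL);

	if (count > 0 && buf[count - 1] == '\n')
		count--;
	if (count == 3 && memcmp(buf, "run", 3) == 0)
		return SKETCH_CMD_RUN;

	return SKETCH_CMD_INVALID;
}
```

A real write method would still have to copy the data from user space and
then block until the SPU stops, which this sketch does not attempt to show.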

Ben.


2005-05-14 07:46:19

by Greg KH

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Fri, May 13, 2005 at 09:29:06PM +0200, Arnd Bergmann wrote:
> This is an early version of the SPU file system, which is used
> to run code on the Synergistic Processing Units of the Broadband
> Engine.

The whitespace seems a bit damaged in places, check your tabs vs.
spaces...

> /run A stub file that lets us do ioctl.

No, as Ben said, do not do this. Use write. And as you are only doing
1 type of ioctl, it shouldn't be an issue. Also it will be faster than
the ioctl due to lack of BKL usage :)

And I don't think your code does proper permission checking and validation
of the data; you should verify that this is all correct.

> +/**** spufs attributes
> + *
> + * Attributes in spufs behave similar to those in sysfs:
> + *
> + * Writing to an attribute immediately sets a value, an open file can be
> + * written to multiple times.
> + *
> + * Reading from an attribute creates a buffer from the value that might get
> + * read with multiple read calls. When the attribute has been read completely,
> + * no further read calls are possible until the file is opened again.
> + *
> + * All spufs attributes contain a text representation of a numeric value that
> + * are accessed with the get() and set() functions.
> + *
> + * Perhaps these file operations could be put in debugfs or libfs instead,
> + * they are not really SPU specific.

Yes they should. I'll gladly take them for debugfs or like you state,
libfs is probably the better place for them so everyone can use them.

If you make up a patch, I'll fix up debugfs to use them properly.
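
To make the semantics in the quoted comment concrete, here is a user-space
sketch of such a text-backed numeric attribute: a write parses and sets the
value immediately, a read formats the value once and then hands out that
snapshot across several calls until it is exhausted. The struct and function
names are assumptions of this sketch and are not the spufs_attr_* code from
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct sketch_attr {
	unsigned long long value; /* stands in for the get()/set() backend */
	char buf[24];             /* text snapshot, created on first read */
	size_t len;               /* valid bytes in buf, 0 = not formatted */
};

/* Returns bytes copied; 0 means the snapshot has been read completely. */
long sketch_attr_read(struct sketch_attr *a, char *out, size_t count,
		      size_t *pos)
{
	assert(a != NULL && out != NULL && pos != NULL);

	if (a->len == 0)
		a->len = (size_t)snprintf(a->buf, sizeof(a->buf), "%llu\n",
					  a->value);
	if (*pos >= a->len)
		return 0;
	if (count > a->len - *pos)
		count = a->len - *pos;

	memcpy(out, a->buf + *pos, count);
	*pos += count;
	return (long)count;
}

/* Returns bytes consumed, or -EINVAL if the text is not a number. */
long sketch_attr_write(struct sketch_attr *a, const char *in, size_t count)
{
	char tmp[24];
	char *end;
	unsigned long long v;

	assert(a != NULL && in != NULL);

	if (count == 0 || count >= sizeof(tmp))
		return -EINVAL;
	memcpy(tmp, in, count);
	tmp[count] = '\0';

	errno = 0;
	v = strtoull(tmp, &end, 0); /* base 0: accepts decimal, 0x.., 0.. */
	if (end == tmp || errno == ERANGE)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	a->value = v;
	return (long)count;
}
```

Nothing here is SPU specific, which is the argument for moving the real
helpers to libfs.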

> +#define spufs_attribute(name) \
> +static int name ## _open(struct inode *inode, struct file *file) \
> +{ \
> + return spufs_attr_open(inode, file, &name ## _get, &name ## _set); \
> +} \
> +static struct file_operations name = { \
> + .open = name ## _open, \
> + .release = spufs_attr_close, \
> + .read = spufs_attr_read, \
> + .write = spufs_attr_write, \
> +};

No module owner set? Be careful if not...

> +static struct tree_descr spufs_dir_contents[] = {
> + { "mem", &spufs_mem_fops, 0644, },

Named identifiers are the better way to do this (yeah, longer code I
know...)
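
Assuming "named identifiers" means C99 designated initializers, the table
could be written as in the sketch below. The sketch_* types are stand-ins
so that the example compiles outside the kernel; they only mirror the three
fields used in the quoted table.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_fops { int unused; };  /* stand-in for struct file_operations */

struct sketch_tree_descr {           /* stand-in for struct tree_descr */
	const char *name;
	const struct sketch_fops *ops;
	int mode;
};

static const struct sketch_fops sketch_mem_fops = { 0 };
static const struct sketch_fops sketch_run_fops = { 0 };

/* Each field is named, so reordering the struct cannot silently break it. */
const struct sketch_tree_descr sketch_dir_contents[] = {
	{ .name = "mem", .ops = &sketch_mem_fops, .mode = 0644, },
	{ .name = "run", .ops = &sketch_run_fops, .mode = 0400, },
	{ .name = NULL, },           /* terminator, like the {} entry */
};

/* Walks the table the same way the quoted spufs_fill_dir() loop does. */
size_t sketch_count_entries(const struct sketch_tree_descr *files)
{
	size_t n = 0;

	assert(files != NULL);
	while (files->name && files->name[0]) {
		n++;
		files++;
	}
	return n;
}
```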

> + { "run", &spufs_run_fops, 0400, },
> + { "mbox", &spufs_mbox_fops, 0400, },
> + { "ibox", &spufs_ibox_fops, 0400, },
> + { "wbox", &spufs_wbox_fops, 0200, },
> + { "signal1_type", &spufs_signal1_type, 0600, },
> + { "signal2_type", &spufs_signal1_type, 0600, },
> +
> +#if 1 /* debugging only */
> + { "class0_mask", &spufs_class0_mask, 0600, },
> + { "class1_mask", &spufs_class1_mask, 0600, },
> + { "class2_mask", &spufs_class2_mask, 0600, },
> + { "class0_stat", &spufs_class0_stat, 0600, },
> + { "class1_stat", &spufs_class1_stat, 0600, },
> + { "class2_stat", &spufs_class2_stat, 0600, },
> + { "sr1", &spufs_mfc_sr1_RW, 0600, },
> + { "fir", &spufs_mfc_fir_R, 0400, },
> + { "fir_status_or", &spufs_mfc_fir_status_or_W, 0200, },
> + { "fir_status_and", &spufs_mfc_fir_status_and_W, 0200, },
> + { "fir_mask", &spufs_mfc_fir_mask_R, 0400, },
> + { "fir_mask_or", &spufs_mfc_fir_mask_or_W, 0200, },
> + { "fir_mask_and", &spufs_mfc_fir_mask_and_W, 0200, },
> + { "fir_chkstp", &spufs_mfc_fir_chkstp_enable_RW, 0600, },
> + { "cer", &spufs_mfc_cer_R, 0400, },
> + { "dsisr", &spufs_mfc_dsisr_RW, 0600, },
> + { "dsir", &spufs_mfc_dsir_R, 0200, },
> + { "cntl", &spufs_mfc_control_RW, 0600, },
> + { "sdr", &spufs_mfc_sdr_RW, 0600, },
> +#endif
> + {},
> +};
> +
> +static int
> +spufs_fill_dir(struct dentry *dir, struct tree_descr *files,
> + int mode, struct spu_context *ctx)
> +{
> + struct inode *inode;
> + struct dentry *dentry;
> + int ret;
> +
> + static struct inode_operations iops = {
> + .getattr = simple_getattr,
> + .setattr = spufs_setattr,
> + };
> +
> + ret = -ENOSPC;
> + while (files->name && files->name[0]) {
> + dentry = d_alloc_name(dir, files->name);
> + if (!dentry)
> + goto out;
> + inode = spufs_new_inode(dir->d_sb,
> + S_IFREG | (files->mode & mode));
> + if (!inode)
> + goto out;
> + inode->i_op = &iops;
> + inode->i_fop = files->ops;
> + inode->i_mapping->a_ops = &spufs_aops;
> + inode->i_mapping->backing_dev_info = &spufs_backing_dev_info;
> + SPUFS_I(inode)->i_ctx = get_spu_context(ctx);
> +
> + d_add(dentry, inode);
> + files++;
> + }
> + return 0;
> +out:
> + // FIXME: remove all files that are left
> + return ret;
> +}
> +
> +static int
> +spufs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> +{
> + int ret;
> + struct inode *inode;
> + struct spu_context *ctx;
> +
> + ret = -ENOSPC;
> + inode = spufs_new_inode(dir->i_sb, mode | S_IFDIR);
> + if (!inode)
> + goto out;
> +
> + if (dir->i_mode & S_ISGID) {
> + inode->i_gid = dir->i_gid;
> + inode->i_mode |= S_ISGID;
> + }
> + ctx = alloc_spu_context();
> + SPUFS_I(inode)->i_ctx = ctx;
> + if (!ctx)
> + goto out_iput;
> +
> + inode->i_op = &simple_dir_inode_operations;
> + inode->i_fop = &simple_dir_operations;
> + ret = spufs_fill_dir(dentry, spufs_dir_contents, mode, ctx);
> + if (ret)
> + goto out_free_ctx;
> +
> + d_instantiate(dentry, inode);
> + dget(dentry);
> + dir->i_nlink++;
> + goto out;
> +
> +out_free_ctx:
> + put_spu_context(ctx);
> +out_iput:
> + iput(inode);
> +out:
> + return ret;
> +}
> +
> +/* This looks really wrong! */
> +static int spufs_rmdir(struct inode *root, struct dentry *dir_dentry)

Why do you need this? Doesn't 'simple_rmdir' work for you?

The rest of your ramfs based fs code looks a bit complex. Can't it be
as "simple" as the debugfs code is (only 100 lines for a fs.) Or is it
doing different types of things that I'm completely misunderstanding?

And I still think that 100 lines of code to make a ramfs type fs is a
bit big, need to work on that one of these days...

> +union MFC_TagSizeClassCmd {
> + struct {
> + u16 mfc_size;
> + u16 mfc_tag;
> + u8 pad;
> + u8 mfc_rclassid;
> + u16 mfc_cmd;
> + } u;
> + struct {
> + u32 mfc_size_tag32;
> + u32 mfc_class_cmd32;
> + } by32;
> + u64 all64;
> +};

Remember __u16 and friends for structures that cross the user/kernel
boundary (like your ioctl that you will be rewriting...)

thanks,

greg k-h

2005-05-14 13:21:54

by Arnd Bergmann

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Saturday 14 May 2005 09:45, Greg KH wrote:
> On Fri, May 13, 2005 at 09:29:06PM +0200, Arnd Bergmann wrote:
> > /run A stub file that lets us do ioctl.
>
> No, as Ben said, do not do this. Use write. And as you are only doing
> 1 type of ioctl, it shouldn't be an issue. Also it will be faster than
> the ioctl due to lack of BKL usage :)

I've been back and forth between a number of interfaces here and haven't
found one that I'm really happy with. Using write() is probably my least
favorite one, but these are the alternatives I've come up with so far:

1. ioctl:
pro:
- easy to do in a file system
- can have both input and output arguments
contra:
- ugly
- weakly typed
- unpopular

2. sys_spufs_run(int fd, __u32 pc, __u32 *new_pc, __u32 *status):
pro:
- strong types
- can have both input and output arguments
contra:
- does not fit file system semantics well
- bad for prototyping

3. read:
pro:
- fits file system semantics
- can still return a struct { __u32 new_pc; __u32 status; };
contra:
- no way to pass updated instruction pointer directly

4. write:
pro:
- fits file system semantics
- can take instruction pointer as input
contra:
- no output data

The main problem is the way that the ABI requires the main loop
to work, which is roughly:

pc = initial_instruction_pointer;
do {
	set_pc(pc);
	status = enter_spu();
	if ((status & 0xff00) == SPU_WANTS_EXIT)
		return (status & 0xff);
	if ((status & 0xffff) == SPU_LIBRARY_CALL) {
		pc = get_pc();
		do_library_call(*(unsigned int *)(local_store_pointer + pc));
		pc += 4;
	}
	if ((status & 0xffff) < SPU_USER_CODE)
		do_user_defined_stuff(status);
} while (!((status & 0xffff0000) & ERROR_MASK));

Currently, I'm doing all this in user space, i.e. the kernel does not
need to know about the different status codes that are reserved for
exit or library calls.

Having a new system call would keep the basic concept of the ioctl
and may or may not be nicer but is certainly harder to debug for now.

One thing I could do instead is have the kernel automatically increment
the program counter when the spu requests a library call. This should
be ok, because the SPU_LIBRARY_CALL stuff has already been defined
to have a very specific operating system independent meaning and as
soon as we want to do system calls from the SPU, the kernel needs to
know about some status codes anyway.

In that case, I can make the SPU instruction pointer another file
in the SPU context directory that only needs to be written once.
The operation that starts the SPU code could be an eight byte read
system call returning the new instruction pointer (needed to get
the library call arguments) and the status code that lets the
user determine the required action.
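
A minimal user-space sketch of how the caller might unpack such an eight
byte result, assuming the layout matches the struct { __u32 new_pc;
__u32 status; } from option 3 above. The names spu_run_result and
spu_decode_result are made up for illustration and are not part of the
patch:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the eight bytes returned by read() on "run". */
struct spu_run_result {
	uint32_t new_pc;	/* instruction pointer after the SPU stopped */
	uint32_t status;	/* status code telling the caller what to do */
};

/*
 * Copy a raw buffer into the result structure.
 * Returns 0 on success, -1 if the buffer is not exactly eight bytes.
 */
static int spu_decode_result(const void *buf, size_t len,
			     struct spu_run_result *res)
{
	if (len != sizeof(*res))
		return -1;
	memcpy(res, buf, sizeof(*res));
	return 0;
}
```

The caller would pass the buffer and the byte count that read() returned,
so a short read is rejected instead of being silently interpreted.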

Using a write call instead of read makes the interface even more
complicated because it would require the user to read the status
from a separate file after write returns to check what needs to
be done and then use lseek() or yet another file to access the
instruction pointer.

> And I don't quite think you do the proper permission and validate of the
> data in your code, you should verify this is all correct.

Yes, I'm sure I got that wrong. I'll put that on my todo list.

> > +/**** spufs attributes
> > + *

> > + * Perhaps these file operations could be put in debugfs or libfs instead,
> > + * they are not really SPU specific.
>
> Yes they should. I'll gladly take them for debugfs or like you state,
> libfs is probably the better place for them so everyone can use them.
>
> If you make up a patch, I'll fix up debugfs to use them properly.

Ok. I'll do the patch for libfs then. I've been thinking about
changing

+#define spufs_attribute(name) \
+static int name ## _open(struct inode *inode, struct file *file) \
+{ \
+ return spufs_attr_open(inode, file, &name ## _get, &name ## _set); \
+} \
+static struct file_operations name = { \
+ .open = name ## _open, \
+ .release = spufs_attr_close, \
+ .read = spufs_attr_read, \
+ .write = spufs_attr_write, \
+};

to take a format string argument as well, which is then used in the
spufs_attr_read function instead of the hardcoded "%ld\n". Do you think
I should do that or rather keep the current implementation?
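
To illustrate the idea, here is a rough user-space sketch of an
attribute that carries its own format string. Everything in it
(struct attr, DEFINE_ATTR, attr_format, example_get) is a hypothetical
stand-in and not the spufs or libfs code; it also uses an unsigned
value so that a hex format is well defined, whereas the current code
hardcodes "%ld\n":

```c
#include <stdio.h>

/* Hypothetical stand-in for an attribute with a per-attribute format. */
struct attr {
	unsigned long (*get)(void);	/* returns the current value */
	const char *fmt;		/* e.g. "%lu\n" or "0x%lx\n" */
};

/* Define an attribute from a getter and a format string. */
#define DEFINE_ATTR(name, getter, format) \
static const struct attr name = { .get = getter, .fmt = format }

/* Render the attribute into buf, returning what snprintf() returns. */
static int attr_format(const struct attr *a, char *buf, size_t size)
{
	return snprintf(buf, size, a->fmt, a->get());
}

/* Example getter used only for this sketch. */
static unsigned long example_get(void)
{
	return 42;
}

DEFINE_ATTR(example_dec, example_get, "%lu\n");
DEFINE_ATTR(example_hex, example_get, "0x%lx\n");
```

With this shape, a register such as "sr1" could be shown in hex while a
counter stays decimal, without a second read function.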

> > +#define spufs_attribute(name) \
> > +static int name ## _open(struct inode *inode, struct file *file) \
> > +{ \
> > + return spufs_attr_open(inode, file, &name ## _get, &name ## _set); \
> > +} \
> > +static struct file_operations name = { \
> > + .open = name ## _open, \
> > + .release = spufs_attr_close, \
> > + .read = spufs_attr_read, \
> > + .write = spufs_attr_write, \
> > +};
>
> No module owner set? Be careful if not...

Right. Is there ever a reason to have file operations without owner?
Maybe dentry_open() could warn about this.

> > +static struct tree_descr spufs_dir_contents[] = {
> > + { "mem", &spufs_mem_fops, 0644, },
>
> Named identifiers are the better way to do this (yeah, longer code I
> know...)

Ok. I took the concept from fs/nfsd/nfsctl.c, thinking that Al knows
how to best do these things, but I can of course change this.

> > +/* This looks really wrong! */
> > +static int spufs_rmdir(struct inode *root, struct dentry *dir_dentry)
>
> Why do you need this? Doesn't 'simple_rmdir' work for you?

The idea was to keep the file system contents consistent with the
underlying data structures. If I allow users to unlink context
directories or files in there, there is no longer a way to extract
reliable information from the file system, e.g. for the debugger
or for implementing something like spu_ps.

My solution was to force the dentries in each directory to be
present. When the directory is created, the files are already
there and unlinking a single file is impossible. To destroy the
spu context, the user has to rmdir it, which will either remove
all files in there as well or fail in the case that any file is
still open.

Of course that is not really Posix behavior, but it avoids some
other pitfalls.
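
The rule described above can be modelled in a few lines of user-space C.
This is only a sketch of the intended semantics, not the spufs code: the
structure, the function names, the file count and the choice of -EPERM
and -EBUSY as error codes are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define CTX_NFILES 4	/* hypothetical; the real directory has more entries */

struct ctx_dir {
	int alive;			/* 1 while the context exists */
	int present[CTX_NFILES];	/* 1 while the file entry exists */
	int open_count[CTX_NFILES];	/* open descriptors per file */
};

/* Creating the directory creates all of its files at once. */
static void ctx_mkdir(struct ctx_dir *d)
{
	size_t i;

	d->alive = 1;
	for (i = 0; i < CTX_NFILES; i++) {
		d->present[i] = 1;
		d->open_count[i] = 0;
	}
}

/* Unlinking a single file is never allowed. */
static int ctx_unlink(struct ctx_dir *d, size_t i)
{
	(void)d;
	(void)i;
	return -EPERM;
}

/* rmdir removes every file, or fails if any file is still open. */
static int ctx_rmdir(struct ctx_dir *d)
{
	size_t i;

	for (i = 0; i < CTX_NFILES; i++)
		if (d->open_count[i] > 0)
			return -EBUSY;
	for (i = 0; i < CTX_NFILES; i++)
		d->present[i] = 0;
	d->alive = 0;
	return 0;
}
```

The point of the model is that a directory is either complete or gone,
so a tool walking the file system never sees a half-removed context.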

> The rest of your ramfs based fs code looks a bit complex. Can't it be
> as "simple" as the debugfs code is (only 100 lines for a fs.) Or is it
> doing different types of things that I'm completely misunderstanding?

Apart from my special directory semantics, I plan to have a rather
unusual way to map the "mem" files: Each spu context can be either
present on a physical SPU or stored in memory, so I can create a large
number of SPU contexts despite the limitation of physical SPUs present
in the machine.

When the SPU context is executing code, it obviously has to be on a
physical SPU and a memory map of the "mem" file accesses the actual
local store memory that is accessible in the real address space of
the kernel. The context save operation copies the local store memory
into a virtual file that lives only in page cache, exactly how
ramfs deals with its files. Switching between these two states should
be possible without breaking user space programs that have the
file mapped into their address space.

This mechanism is not implemented yet, but some of my code is already
prepared for it.

I also intend to split some parts out from inode.c, probably have
a file.c that contains all the file operations and another context.c
that deals with the interface to the low level spu code and with
abstracting logical spu context from physical spus.

I suppose I should also go over my code to find unnecessary functionality.

> Remember __u16 and friends for structures that cross the user/kernel
> boundary (like your ioctl that you will be rewriting...)

Yes. There are no data structures that are shared with user space
except the current ioctl argument. The MFC_TagSizeClassCmd (yes, I
need to remember to change the name some day, currently this still
uses the identifiers from the spec) and the others are defined
by the hardware interface.
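
For reference, a sketch of what the same union looks like with
explicitly sized types and a compile-time size check. It uses the
<stdint.h> names so that it builds in user space; inside the kernel the
u16/__u16 family plays that role. The lower-case type name is invented
here and is not a proposed rename:

```c
#include <stdint.h>

/* Same field layout as MFC_TagSizeClassCmd, with fixed-width types. */
union mfc_tag_size_class_cmd {
	struct {
		uint16_t mfc_size;
		uint16_t mfc_tag;
		uint8_t  pad;
		uint8_t  mfc_rclassid;
		uint16_t mfc_cmd;
	} u;
	struct {
		uint32_t mfc_size_tag32;
		uint32_t mfc_class_cmd32;
	} by32;
	uint64_t all64;
};

/* All three views must cover exactly the same eight bytes. */
_Static_assert(sizeof(union mfc_tag_size_class_cmd) == 8,
	       "MFC command word must be 64 bits");
```

Because every member has a fixed width, the size is the same for 32 bit
and 64 bit builds, which is what matters if such a structure is ever
shared with user space.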

Thanks for all your comments,

Arnd <><

2005-05-15 06:31:36

by Benjamin Herrenschmidt

Subject: Re: [PATCH 7/8] ppc64: SPU file system


> Using a write call instead of read makes the interface even more
> complicated because it would require the user to read the status
> from a separate file after write returns to check what needs to
> be done and then use lseek() or yet another file to access the
> instruction pointer.

Why not just write(pc) to start and read back status from the same
file ?

Ben.


2005-05-15 09:07:29

by Pavel Machek

Subject: Re: [PATCH 7/8] ppc64: SPU file system

Hi!

> > /run A stub file that lets us do ioctl. The only ioctl
> > method we need is the spu_run() call. spu_run suspends
> > the current thread from the host CPU and transfers
> > the flow of execution to the SPU.
> > The ioctl call returns to the calling thread when a state
> > is entered that can not be handled by the kernel, e.g.
> > an error in the SPU code or an exit() from it.
> > When a signal is pending for the host CPU thread, the
> > ioctl is interrupted and the SPU stopped in order to
> > call the signal handler.
>
> ioctl's are generally considered evil ... what about a write() method
> writing a command ?

That's even more evil than ioctl()... Try doing 32-vs-64bit conversion
on write...
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-05-15 10:25:21

by Arnd Bergmann

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Sunday 15 May 2005 08:29, Benjamin Herrenschmidt wrote:
> Why not just write(pc) to start and read back status from the same
> file ?

I suppose you are thinking of the simple_transaction_read() style
interface. I've got the feeling that this is generally even
less popular than ioctl because

- it is still an untyped interface (as would be a read() based one)
- you can't do 32 bit emulation (doesn't matter for me, we only
have 32 bit data)
- it is non-atomic
- it doubles the system call overhead

One operation that I want to allow is to have an infinite loop
running on the SPU that does a simple operation (e.g. process
one MPEG macroblock) and have that called by multiple unrelated
processes in turns. When my operation is not atomic, users need
to have additional IPC serialization of their accesses. Most
would want that anyway, but it is not a requirement with an
interface that needs only a single system call.

For the extra syscall overhead, I would like to see measurements
of a real world application before I change to an interface that
is slower in theory. Do you have measurements for the time spent
in a trivial system call on G5 or Power4?

Arnd <><

2005-05-15 11:24:21

by Andi Kleen

Subject: Re: [PATCH 7/8] ppc64: SPU file system

Greg KH <[email protected]> writes:
>
> No, as Ben said, do not do this. Use write. And as you are only doing
> 1 type of ioctl, it shouldn't be an issue. Also it will be faster than
> the ioctl due to lack of BKL usage :)

The problem is that if something is wrong regarding 32bit/64bit
compatibility (I am not saying Arnd will get it wrong, but
as a general rule someone will get it wrong and it has happened, e.g.
in usbfs) then it is impossible to do any compat emulation
on read/write.

So I would actually prefer ioctl because it is safer.

-Andi

2005-05-15 12:05:06

by Benjamin Herrenschmidt

Subject: Re: [PATCH 7/8] ppc64: SPU file system


> > ioctl's are generally considered evil ... what about a write() method
> > writing a command ?
>
> That's even more evil than ioctl()... Try doing 32-vs-64bit conversion
> on write...

I don't see the problem ... if you are passing a structure, you have to
convert it anyway, and it's bad practice. I was thinking about passing
ascii so it can be controlled by shell scripts.

Ben.


2005-05-15 12:06:29

by Arnd Bergmann

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Sunday 15 May 2005 12:08, Arnd Bergmann wrote:
> On Sunday 15 May 2005 08:29, Benjamin Herrenschmidt wrote:
> > Why not just write(pc) to start and read back status from the same
> > file ?

I just remembered the strongest reason against using write() to set
the instruction pointer: It breaks signal delivery during execution
of SPU code. With an ioctl or system call based interface, the kernel
simply updates the instruction pointer in process memory before
calling a signal handler. When/if the signal handler returns, it
does the same call again with the updated argument and the SPU
continues to fetch code at the point where it stopped.

If I do a read() based interface, there are no input parameters
at all, so restarted system calls work as well.

How about this one:

read() starts execution and returns the status value in a four
byte buffer.
Calling lseek() on the "run" file updates the instruction pointer,
so the library call can work like this plus error handling:

extern char *mapped_local_store;
uint32_t status;
int runfd = open("run", O_RDONLY);
lseek(runfd, INITIAL_INSTRUCTION, SEEK_SET);
do {
	read(runfd, &status, 4);
	if (status == SPU_DO_LIBRARY_CALL) {
		size_t arg = lseek(runfd, 4, SEEK_CUR) - 4;
		do_library_call(mapped_local_store + arg);
	}
} while (status != SPU_EXIT);
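
The "plus error handling" part for the read() step could look roughly
like the helper below. It is a sketch only: read_status is a made-up
name, and returning -EIO for a short read is an assumption. Since
read() takes no input arguments here, retrying after EINTR simply
repeats the same call:

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one four byte status word from fd.
 * Retries when a signal interrupts the call; returns 0 on success,
 * a negative errno value on failure, or -EIO on a short read.
 */
static int read_status(int fd, uint32_t *status)
{
	ssize_t n;

	do {
		n = read(fd, status, sizeof(*status));
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	if ((size_t)n != sizeof(*status))
		return -EIO;
	return 0;
}
```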

Arnd <><

2005-05-15 12:49:23

by Arnd Bergmann

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Sunday 15 May 2005 14:02, Benjamin Herrenschmidt wrote:
>
> > That's even more evil than ioctl()... Try doing 32-vs-64bit conversion
> > on write...
>
> I don't see the problem ... if you are passing a structure, you have to
> convert it anyway, and it's bad practice. I was thinking about passing
> ascii so it can be controlled by shell scripts.

Parsing multi-value ASCII data is error prone. In kernel space, I would
not want to do anything more complex than a simple_strtoul(), if only
for the reason of not giving bad examples.

When passing binary structures, there is a significant difference between
passing them through ioctl or read/write: for ioctl, we already have a
rather complicated method of detecting whether and how to convert them
(f_op->compat_ioctl, hash lookup and the deprecated dynamic registration).

For read/write, there is no way to tell if you need to do the conversion,
even if the file operation is aware of the actual data layout of both
variants. Moreover, a good implementation of a read/write file operation
should be able to deal with resuming partial transfers.
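
A small sketch of the layout problem, using two invented command
structures (neither is from the patch). A handler behind write() only
sees a byte count and a buffer, so if the structure uses 'long' it
cannot tell which of the two sizes the caller meant:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command using 'long': size depends on the caller's ABI. */
struct run_cmd_long {
	unsigned long pc;	/* 4 bytes for 32 bit tasks, 8 for 64 bit */
	unsigned long flags;
};

/* The same command with fixed-width members: 8 bytes everywhere. */
struct run_cmd_fixed {
	uint32_t pc;
	uint32_t flags;
};

/* Size of struct run_cmd_long as a 32 bit process would write it. */
#define RUN_CMD_LONG_SIZE_32BIT (2 * sizeof(uint32_t))

/* Nonzero if this build's layout matches what a 32 bit task sends. */
static int layouts_agree(void)
{
	return sizeof(struct run_cmd_long) == RUN_CMD_LONG_SIZE_32BIT;
}
```

On a 64 bit build layouts_agree() is expected to return 0, which is
exactly the case where a conversion layer would be needed and where
read/write offer no hook for one.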

Regarding the shell scripting possibility, I don't really see the point.
The only code that should actually use the kernel interfaces is something
like an /lib/ld-spu.so interpreter and that is better implemented in C
anyway because it needs to parse ELF structures and such.

Arnd <><

2005-05-16 20:19:16

by Roland Dreier

Subject: Re: [PATCH 7/8] ppc64: SPU file system

Greg> No, as Ben said, do not do this. Use write. And as you are
Greg> only doing 1 type of ioctl, it shouldn't be an issue. Also
Greg> it will be faster than the ioctl due to lack of BKL usage :)

This is no longer true. ioctls don't have to take the BKL now that
struct file_operations has unlocked_ioctl and compat_ioctl.

- R.

2005-05-16 21:42:06

by Greg KH

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Mon, May 16, 2005 at 01:14:58PM -0700, Roland Dreier wrote:
> Greg> No, as Ben said, do not do this. Use write. And as you are
> Greg> only doing 1 type of ioctl, it shouldn't be an issue. Also
> Greg> it will be faster than the ioctl due to lack of BKL usage :)
>
> This is no longer true. ioctls don't have to take the BKL now that
> struct file_operations has unlocked_ioctl and compat_ioctl.

Yes, but his patch did not use them :)

thanks,

greg k-h

2005-05-16 21:45:42

by Greg KH

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Sat, May 14, 2005 at 03:05:06PM +0200, Arnd Bergmann wrote:
> On Saturday 14 May 2005 09:45, Greg KH wrote:
> > On Fri, May 13, 2005 at 09:29:06PM +0200, Arnd Bergmann wrote:
> > > /run A stub file that lets us do ioctl.
> >
> > No, as Ben said, do not do this. Use write. And as you are only doing
> > 1 type of ioctl, it shouldn't be an issue. Also it will be faster than
> > the ioctl due to lack of BKL usage :)
>
> I've been back and forth between a number of interfaces here and haven't
> found one that I'm really happy with. Using write() is probably my least
> favorite one, but these are the alternatives I've come up with so far:
>
> 1. ioctl:
> pro:
> - easy to do in a file system
> - can have both input and output arguments
> contra:
> - ugly
> - weakly typed
> - unpopular
>
> 2. sys_spufs_run(int fd, __u32 pc, __u32 *new_pc, __u32 *status):
> pro:
> - strong types
> - can have both input and output arguments
> contra:
> - does not fit file system semantics well
> - bad for prototyping

I suggest you do this. Based on what you say you want the code to do, I
agree, write() doesn't really work out well (but it might, and if you
want an example of how to do it, look at the ibmasm driver, it
implements write() in a way much like what you are wanting to do.)

> > > +/**** spufs attributes
> > > + *
>
> > > + * Perhaps these file operations could be put in debugfs or libfs instead,
> > > + * they are not really SPU specific.
> >
> > Yes they should. I'll gladly take them for debugfs or like you state,
> > libfs is probably the better place for them so everyone can use them.
> >
> > If you make up a patch, I'll fix up debugfs to use them properly.
>
> Ok. I'll do the patch for libfs then. I've been thinking about
> changing
>
> +#define spufs_attribute(name) \
> +static int name ## _open(struct inode *inode, struct file *file) \
> +{ \
> + return spufs_attr_open(inode, file, &name ## _get, &name ## _set); \
> +} \
> +static struct file_operations name = { \
> + .open = name ## _open, \
> + .release = spufs_attr_close, \
> + .read = spufs_attr_read, \
> + .write = spufs_attr_write, \
> +};
>
> to take a format string argument as well, which is then used in the
> spufs_attr_read function instead of the hardcoded "%ld\n". Do you think
> I should do that or rather keep the current implementation?

yeah, you probably need the format string.

> > > +#define spufs_attribute(name) \
> > > +static int name ## _open(struct inode *inode, struct file *file) \
> > > +{ \
> > > + return spufs_attr_open(inode, file, &name ## _get, &name ## _set); \
> > > +} \
> > > +static struct file_operations name = { \
> > > + .open = name ## _open, \
> > > + .release = spufs_attr_close, \
> > > + .read = spufs_attr_read, \
> > > + .write = spufs_attr_write, \
> > > +};
> >
> > No module owner set? Be careful if not...
>
> Right. Is there ever a reason to have file operations without owner?

Code built into the kernel?

> Maybe dentry_open() could warn about this.

Would die a horrible death due to the above :)

> > > +/* This looks really wrong! */
> > > +static int spufs_rmdir(struct inode *root, struct dentry *dir_dentry)
> >
> > Why do you need this? Doesn't 'simple_rmdir' work for you?
>
> The idea was to keep the file system contents consistent with the
> underlying data structures. If I allow users to unlink context
> directories or files in there, there is no longer a way to extract
> reliable information from the file system, e.g. for the debugger
> or for implementing something like spu_ps.
>
> My solution was to force the dentries in each directory to be
> present. When the directory is created, the files are already
> there and unlinking a single file is impossible. To destroy the
> spu context, the user has to rmdir it, which will either remove
> all files in there as well or fail in the case that any file is
> still open.

Ick.

> Of course that is not really Posix behavior, but it avoids some
> other pitfalls.

Go with a syscall :)

> > Remember __u16 and friends for structures that cross the user/kernel
> > boundary (like your ioctl that you will be rewriting...)
>
> Yes. There are no data structures that are shared with user space
> except the current ioctl argument. The MFC_TagSizeClassCmd (yes, I
> need to remember to change the name some day, currently this still
> uses the identifiers from the spec) and the others are defined
> by the hardware interface.

Identifiers that are named as per a spec are ok to leave alone. We did
that with USB, as it makes sense to do it that way for anyone who reads
the spec and the code.

But if your spec is only for the Linux OS, well, that's a different
issue...

thanks,

greg k-h

2005-05-16 22:26:44

by Arnd Bergmann

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Monday 16 May 2005 22:58, Greg KH wrote:
> On Sat, May 14, 2005 at 03:05:06PM +0200, Arnd Bergmann wrote:

> > 2. sys_spufs_run(int fd, __u32 pc, __u32 *new_pc, __u32 *status):
> > pro:
> > - strong types
> > - can have both input and output arguments
> > contra:
> > - does not fit file system semantics well
> > - bad for prototyping
>
> I suggest you do this. Based on what you say you want the code to do, I
> agree, write() doesn't really work out well

The syscall approach has another small disadvantage in that I need to
do a callback registration mechanism for it if I want to have spufs as
a loadable module. I could of course require spufs to be builtin, but
that complicates prototype testing (as mentioned) and enlarges combined
pSeries/powermac/BPA distro kernels.

I think I'll leave the ioctl for now and add a note saying that I need
to replace it with a syscall or the write/read or lseek/read based
approach when I arrive at a more feature complete point.

> (but it might, and if you
> want an example of how to do it, look at the ibmasm driver, it
> implements write() in a way much like what you are wanting to do.)

That would be the same write/read combination as Ben's second
proposal and the nfsctl file system, right?

> > My solution was to force the dentries in each directory to be
> > present. When the directory is created, the files are already
> > there and unlinking a single file is impossible. To destroy the
> > spu context, the user has to rmdir it, which will either remove
> > all files in there as well or fail in the case that any file is
> > still open.
>
> Ick.
>
> > Of course that is not really Posix behavior, but it avoids some
> > other pitfalls.
>
> Go with a syscall :)

Sorry, I'm not following that reasoning. How does a syscall help
with the problem of atomic context destruction?

Arnd <><

2005-05-16 22:31:09

by Greg KH

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Tue, May 17, 2005 at 12:01:05AM +0200, Arnd Bergmann wrote:
> On Monday 16 May 2005 22:58, Greg KH wrote:
> > On Sat, May 14, 2005 at 03:05:06PM +0200, Arnd Bergmann wrote:
>
> > > 2. sys_spufs_run(int fd, __u32 pc, __u32 *new_pc, __u32 *status):
> > > pro:
> > > - strong types
> > > - can have both input and output arguments
> > > contra:
> > > - does not fit file system semantics well
> > > - bad for prototyping
> >
> > I suggest you do this. Based on what you say you want the code to do, I
> > agree, write() doesn't really work out well
>
> The syscall approach has another small disadvantage in that I need to
> do a callback registration mechanism for it if I want to have spufs as
> a loadable module. I could of course require spufs to be builtin, but
> that complicates prototype testing (as mentioned) and enlarges combined
> pSeries/powermac/BPA distro kernels.

Huh? We can handle syscalls in modules these days pretty simply. Look
at how nfs and others do it.

> I think I'll leave the ioctl for now and add a note saying that I need
> to replace it with a syscall or the write/read or lseek/read based
> approach when I arrive at a more feature complete point.

Nah, make it a syscall :)

> > (but it might, and if you
> > want an example of how to do it, look at the ibmasm driver, it
> > implements write() in a way much like what you are wanting to do.)
>
> That would be the same write/read combination as Ben's second
> proposal and the nfsctl file system, right?

Yes.

> > > My solution was to force the dentries in each directory to be
> > > present. When the directory is created, the files are already
> > > there and unlinking a single file is impossible. To destroy the
> > > spu context, the user has to rmdir it, which will either remove
> > > all files in there as well or fail in the case that any file is
> > > still open.
> >
> > Ick.
> >
> > > Of course that is not really Posix behavior, but it avoids some
> > > other pitfalls.
> >
> > Go with a syscall :)
>
> Sorry, I'm not following that reasoning. How does a syscall help
> with the problem of atomic context destruction?

Sorry, I thought they were referring to the same issue.

greg k-h

2005-05-16 22:41:26

by Arnd Bergmann

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Tuesday 17 May 2005 00:27, Greg KH wrote:
> Huh? We can handle syscalls in modules these days pretty simply. Look
> at how nfs and others do it.

Well afaics, nfs works around this issue by having fs/nfsctl.o always
as a builtin and abstracting the calls through a file system using
read/write. That would be Ben's idea again, i.e. not actually
using a system call.

The only widely used module that I'm aware of ever implementing a system
call was the TUX web accelerator that used a hack in entry.S
and its own dynamic registration.

Arnd <><

2005-05-16 22:49:16

by Greg KH

Subject: Re: [PATCH 7/8] ppc64: SPU file system

On Tue, May 17, 2005 at 12:22:56AM +0200, Arnd Bergmann wrote:
> On Tuesday 17 May 2005 00:27, Greg KH wrote:
> > Huh? We can handle syscalls in modules these days pretty simply. Look
> > at how nfs and others do it.
>
> Well afaics, nfs works around this issue by having fs/nfsctl.o always
> as a builtin and abstracting the calls through a file system using
> read/write. That would be Ben's idea again, i.e. not actually
> using a system call.
>
> The only widely used module that I'm aware of ever implementing a system
> call was the TUX web accelerator that used a hack in entry.S
> and its own dynamic registration.

Sorry, but I was thinking of the cond_syscall() stuff, to allow syscalls
in modules or code that just happens to not be built into the kernel.

thanks,

greg k-h

2005-05-17 07:26:33

by Paul Mackerras

Subject: Re: [PATCH 4/8] ppc64: add BPA platform type

Arnd Bergmann writes:

> This adds the basic support for running on BPA machines.
> So far, this is only the IBM workstation, and it will
> not run on others without a little more generalization.

> +/* FIXME: consolidate this into rtas.c or similar */
> +static void __init pSeries_calibrate_decr(void)

Shouldn't this be called bpa_calibrate_decr or something similar?

> -#define PV_630 0x0040
> -#define PV_630p 0x0041
> +#define PV_630 0x0040
> +#define PV_630p 0x0041

Hmmm, I don't think your patch needs to clean up the whitespace here.

Regards,
Paul.

2005-05-17 11:25:13

by Arnd Bergmann

Subject: Re: [PATCH 4/8] ppc64: add BPA platform type

On Tuesday 17 May 2005 09:01, Paul Mackerras wrote:
> Arnd Bergmann writes:
>
> > This adds the basic support for running on BPA machines.
> > So far, this is only the IBM workstation, and it will
> > not run on others without a little more generalization.
>
> > +/* FIXME: consolidate this into rtas.c or similar */
> > +static void __init pSeries_calibrate_decr(void)
>
> Shouldn't this be called bpa_calibrate_decr or something similar?

The function is identical to the one for pSeries, and I'd
prefer to have only one copy of it with a more generic name.
Actually, it looks like maple and perhaps pmac have a very
similar *_calibrate_decr function, so I could perhaps
just put this into time.c as generic_calibrate_decr().

[ Ben, can you tell if pSeries_calibrate_decr should work on
all G5 macs or if it can be changed to support them as well? ]

On a similar issue, I just remembered that I wanted to
create a rtas_time.c to hold the rtc access functions
for pSeries and BPA. Do you think that's a good idea?

> > -#define PV_630 0x0040
> > -#define PV_630p 0x0041
> > +#define PV_630 0x0040
> > +#define PV_630p 0x0041
>
> Hmmm, I don't think your patch needs to clean up the whitespace here.

ok.

Arnd <><

2005-05-17 20:40:52

by Nathan Lynch

Subject: Re: [PATCH 3/8] ppc64: add a watchdog driver for rtas

Arnd Bergmann wrote:
> +static volatile int wdrtas_miscdev_open = 0;
...
> +static int
> +wdrtas_open(struct inode *inode, struct file *file)
> +{
> + /* only open once */
> + if (xchg(&wdrtas_miscdev_open,1))
> + return -EBUSY;

The volatile and xchg strike me as an obscure method for ensuring only
one process at a time can open this file. Any reason a semaphore
couldn't be used?

> +static int
> +wdrtas_close(struct inode *inode, struct file *file)
> +{
> + /* only stop watchdog, if this was announced using 'V' before */
> + if (wdrtas_expect_close == WDRTAS_MAGIC_CHAR)
> + wdrtas_timer_stop();
> + else {
> + printk("wdrtas: got unexpected close. Watchdog "
> + "not stopped.\n");

printk's need a valid log level specified. There are several in this
file that lack them.


Nathan

2005-05-18 07:31:56

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 3/8] ppc64: add a watchdog driver for rtas

On Tuesday 17 May 2005 22:40, Nathan Lynch wrote:
> Arnd Bergmann wrote:
> > +static volatile int wdrtas_miscdev_open = 0;
> ...
> > +static int
> > +wdrtas_open(struct inode *inode, struct file *file)
> > +{
> > + /* only open once */
> > + if (xchg(&wdrtas_miscdev_open,1))
> > + return -EBUSY;
>
> The volatile and xchg strike me as an obscure method for ensuring only
> one process at a time can open this file. Any reason a semaphore
> couldn't be used?

A semaphore would also be the wrong approach since we don't want
processes to block but instead to fail opening the watchdog twice.
Other watchdog drivers use atomic_t or bitops to guard open, which
imho would be the better solution.
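
For illustration, here is a minimal user-space sketch of such a non-blocking open guard. It assumes C11 atomics as a stand-in for the kernel's atomic_t or bitops, and the names wd_is_open, wd_open and wd_release are hypothetical, not taken from the wdrtas driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical "device is open" flag; static storage starts out false. */
static atomic_bool wd_is_open;

/* Claim the device. Never blocks: a second opener gets -EBUSY. */
static int wd_open(void)
{
	/* atomic_exchange returns the previous value, like the kernel's xchg() */
	if (atomic_exchange(&wd_is_open, true))
		return -EBUSY;
	return 0;
}

/* Give the device back so that the next open can succeed. */
static void wd_release(void)
{
	assert(atomic_load(&wd_is_open));	/* release without open is a bug */
	atomic_store(&wd_is_open, false);
}
```

The point of the sketch is only that a failed claim returns immediately instead of sleeping, which is the behaviour wanted for a watchdog device.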

Of course, there is also Wim's plan to do a unified watchdog driver
that would solve this once and for all.

> > + printk("wdrtas: got unexpected close. Watchdog "
> > + "not stopped.\n");
>
> printk's need a valid log level specified. There are several in this
> file that lack them.

Right.

Utz, do you have time to fix up these issues? If not, I probably won't
look into it this week either.

Thanks,

Arnd <><

2005-05-18 12:58:10

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH] libfs: add simple attribute files

Based on the discussion about spufs attributes, this is my suggestion
for a more generic attribute file support that can be used by both
debugfs and spufs.

Simple attribute files behave similarly to sequential files from
a kernel programmer's perspective in that a standard set of file
operations is provided and only an open operation needs to
be written that registers file-specific get() and set() functions.

These operations are defined as

void foo_set(void *data, long val); and
long foo_get(void *data);

where data is the inode->u.generic_ip pointer of the file and the
operations just need to make sense of that pointer. The infrastructure
makes sure this works correctly with concurrent access and partial
read calls.

A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
using the attributes.
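
To make the get()/set() contract concrete, here is a rough user-space model of it. This is only a sketch under assumptions: struct attr_model, attr_model_read and attr_model_write are hypothetical names that mimic the text conversion done on read and write, and they leave out the locking and partial-read handling that the kernel code in this patch provides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space model of one attribute file. */
struct attr_model {
	long (*get)(void *data);
	void (*set)(void *data, long val);
	void *data;		/* opaque pointer handed to get() and set() */
	const char *fmt;	/* printf format used for reads, e.g. "%ld\n" */
};

/* Render the current value as text; returns snprintf's result. */
static int attr_model_read(const struct attr_model *a, char *buf, size_t len)
{
	assert(a && a->get && buf && len > 0);
	return snprintf(buf, len, a->fmt, a->get(a->data));
}

/* Parse the text as a number (base auto-detected) and pass it to set(). */
static void attr_model_write(const struct attr_model *a, const char *text)
{
	assert(a && a->set && text);
	a->set(a->data, strtol(text, NULL, 0));
}

/* Example accessors for a plain int, in the role of foo_get/foo_set. */
static long foo_get(void *data)
{
	return *(int *)data;
}

static void foo_set(void *data, long val)
{
	*(int *)data = (int)val;
}
```

A caller would fill in a struct attr_model with foo_get, foo_set, a pointer to an int and "%ld\n"; the accessors never see the text form, they only interpret the data pointer.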

This patch already contains the changes for debugfs to use attributes
for its internal file operations.

Signed-off-by: Arnd Bergmann <[email protected]>

---
fs/debugfs/file.c | 74 +++++++++++++++++++----------------------
fs/libfs.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 48 +++++++++++++++++++++++++++
3 files changed, 177 insertions(+), 39 deletions(-)

fs/libfs.c: needs update
Index: linus-2.5/include/linux/fs.h
===================================================================
--- linus-2.5.orig/include/linux/fs.h 2005-05-18 10:58:52.000000000 +0200
+++ linus-2.5/include/linux/fs.h 2005-05-18 14:07:10.000000000 +0200
@@ -1657,6 +1657,55 @@
ar->size = n;
}

+/*
+ * simple attribute files
+ *
+ * These attributes behave similar to those in sysfs:
+ *
+ * Writing to an attribute immediately sets a value, an open file can be
+ * written to multiple times.
+ *
+ * Reading from an attribute creates a buffer from the value that might get
+ * read with multiple read calls. When the attribute has been read
+ * completely, no further read calls are possible until the file is opened
+ * again.
+ *
+ * All spufs attributes contain a text representation of a numeric value
+ * that are accessed with the get() and set() functions.
+ *
+ * Perhaps these file operations could be put in debugfs or libfs instead,
+ * they are not really SPU specific.
+ */
+#define DEFINE_SIMPLE_ATTRIBUTE(__fops, __get, __set, __fmt) \
+static int __fops ## _open(struct inode *inode, struct file *file) \
+{ \
+ __simple_attr_check_format(__fmt, 0ul); \
+ return simple_attr_open(inode, file, &__get, &__set, __fmt); \
+} \
+static struct file_operations __fops = { \
+ .owner = THIS_MODULE, \
+ .open = __fops ## _open, \
+ .release = simple_attr_close, \
+ .read = simple_attr_read, \
+ .write = simple_attr_write, \
+};
+
+static inline void __attribute__((format(printf, 1, 2)))
+__simple_attr_check_format(const char *fmt, ...)
+{
+ /* don't do anything, just let the compiler check the arguments; */
+}
+
+int simple_attr_open(struct inode *inode, struct file *file,
+ long (*get)(void *), void (*set)(void *, long),
+ const char *fmt);
+int simple_attr_close(struct inode *inode, struct file *file);
+ssize_t simple_attr_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos);
+ssize_t simple_attr_write(struct file *file, const char __user *buf,
+ size_t len, loff_t *ppos);
+
+
#ifdef CONFIG_SECURITY
static inline char *alloc_secdata(void)
{
Index: linus-2.5/fs/libfs.c
===================================================================
--- linus-2.5.orig/fs/libfs.c 2005-05-18 10:58:52.000000000 +0200
+++ linus-2.5/fs/libfs.c 2005-05-18 12:06:49.000000000 +0200
@@ -519,6 +519,100 @@
return 0;
}

+/* Simple attribute files */
+
+struct simple_attr {
+ long (*get)(void *);
+ void (*set)(void *, long);
+ char get_buf[24]; /* enough to store a long and "\n\0" */
+ char set_buf[24];
+ void *data;
+ const char *fmt; /* format for read operation */
+ struct semaphore sem; /* protects access to these buffers */
+};
+
+/* simple_attr_open is called by an actual attribute open file operation
+ * to set the attribute specific access operations. */
+int simple_attr_open(struct inode *inode, struct file *file,
+ long (*get)(void *), void (*set)(void *, long),
+ const char *fmt)
+{
+ struct simple_attr *attr;
+
+ /* reading/writing needs the respective get/set operation */
+ if (((file->f_mode & FMODE_READ) && !get) ||
+ ((file->f_mode & FMODE_WRITE) && !set))
+ return -EACCES;
+
+ attr = kmalloc(sizeof *attr, GFP_KERNEL);
+ if (!attr)
+ return -ENOMEM;
+
+ attr->get = get;
+ attr->set = set;
+ attr->data = inode->u.generic_ip;
+ attr->fmt = fmt;
+ init_MUTEX(&attr->sem);
+
+ file->private_data = attr;
+
+ return nonseekable_open(inode, file);
+}
+
+int simple_attr_close(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+ return 0;
+}
+
+/* read from the buffer that is filled with the get function */
+ssize_t simple_attr_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct simple_attr *attr;
+ size_t size;
+ ssize_t ret;
+
+ attr = file->private_data;
+
+ down(&attr->sem);
+ if (*ppos) /* continued read */
+ size = strlen(attr->get_buf);
+ else /* first read */
+ size = scnprintf(attr->get_buf, sizeof (attr->get_buf),
+ attr->fmt, attr->get(attr->data));
+
+ ret = simple_read_from_buffer(buf, len, ppos, attr->get_buf, size);
+ up(&attr->sem);
+ return ret;
+}
+
+/* interpret the buffer as a number to call the set function with */
+ssize_t simple_attr_write(struct file *file, const char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct simple_attr *attr;
+ long val;
+ size_t size;
+ ssize_t ret;
+
+ attr = file->private_data;
+
+ down(&attr->sem);
+ ret = -EFAULT;
+ size = min(sizeof (attr->set_buf) - 1, len);
+ if (copy_from_user(attr->set_buf, buf, size))
+ goto out;
+
+ ret = len; /* claim we got the whole input */
+ attr->set_buf[size] = '\0';
+ val = simple_strtol(attr->set_buf, NULL, 0);
+ attr->set(attr->data, val);
+out:
+ up(&attr->sem);
+ return ret;
+}
+
EXPORT_SYMBOL(dcache_dir_close);
EXPORT_SYMBOL(dcache_dir_lseek);
EXPORT_SYMBOL(dcache_dir_open);
Index: linus-2.5/fs/debugfs/file.c
===================================================================
--- linus-2.5.orig/fs/debugfs/file.c 2005-05-18 10:58:52.000000000 +0200
+++ linus-2.5/fs/debugfs/file.c 2005-05-18 12:22:16.000000000 +0200
@@ -45,45 +45,6 @@
.open = default_open,
};

-#define simple_type(type, format, temptype, strtolfn) \
-static ssize_t read_file_##type(struct file *file, char __user *user_buf, \
- size_t count, loff_t *ppos) \
-{ \
- char buf[32]; \
- type *val = file->private_data; \
- \
- snprintf(buf, sizeof(buf), format "\n", *val); \
- return simple_read_from_buffer(user_buf, count, ppos, buf, strlen(buf));\
-} \
-static ssize_t write_file_##type(struct file *file, const char __user *user_buf,\
- size_t count, loff_t *ppos) \
-{ \
- char *endp; \
- char buf[32]; \
- int buf_size; \
- type *val = file->private_data; \
- temptype tmp; \
- \
- memset(buf, 0x00, sizeof(buf)); \
- buf_size = min(count, (sizeof(buf)-1)); \
- if (copy_from_user(buf, user_buf, buf_size)) \
- return -EFAULT; \
- \
- tmp = strtolfn(buf, &endp, 0); \
- if ((endp == buf) || ((type)tmp != tmp)) \
- return -EINVAL; \
- *val = tmp; \
- return count; \
-} \
-static struct file_operations fops_##type = { \
- .read = read_file_##type, \
- .write = write_file_##type, \
- .open = default_open, \
-};
-simple_type(u8, "%c", unsigned long, simple_strtoul);
-simple_type(u16, "%hi", unsigned long, simple_strtoul);
-simple_type(u32, "%i", unsigned long, simple_strtoul);
-
/**
* debugfs_create_u8 - create a file in the debugfs filesystem that is used to read and write a unsigned 8 bit value.
*
@@ -109,6 +70,18 @@
* NULL or !NULL instead as to eliminate the need for #ifdef in the calling
* code.
*/
+static void debugfs_u8_set(void *data, long val)
+{
+ *(u8 *)data = val;
+}
+
+static long debugfs_u8_get(void *data)
+{
+ return *(u8 *)data;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, "%lu\n");
+
struct dentry *debugfs_create_u8(const char *name, mode_t mode,
struct dentry *parent, u8 *value)
{
@@ -141,6 +114,17 @@
* NULL or !NULL instead as to eliminate the need for #ifdef in the calling
* code.
*/
+static void debugfs_u16_set(void *data, long val)
+{
+ *(u16 *)data = val;
+}
+
+static long debugfs_u16_get(void *data)
+{
+ return *(u16 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_u16, debugfs_u16_get, debugfs_u16_set, "%lu\n");
+
struct dentry *debugfs_create_u16(const char *name, mode_t mode,
struct dentry *parent, u16 *value)
{
@@ -173,6 +157,18 @@
* NULL or !NULL instead as to eliminate the need for #ifdef in the calling
* code.
*/
+static void debugfs_u32_set(void *data, long val)
+{
+ *(u32 *)data = val;
+}
+
+static long debugfs_u32_get(void *data)
+{
+ return *(u32 *)data;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_u32, debugfs_u32_get, debugfs_u32_set, "%lu\n");
+
struct dentry *debugfs_create_u32(const char *name, mode_t mode,
struct dentry *parent, u32 *value)
{

2005-05-18 15:01:00

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 3/8] ppc64: add a watchdog driver for rtas

On Wednesday 18 May 2005 16:45, Nathan Lynch wrote:
>
> > A semaphore would also be the wrong approach since we don't want
> > processes to block but instead to fail opening the watchdog twice.
>
> I should have been more explicit. What I had in mind was using
> down_trylock and returning -EBUSY if it failed.

Well, that's also pointless. If the only operations you ever do
on a semaphore are down_trylock() and up(), you end up using
only the atomic variable in there while wasting a few bytes of
extra memory for storing the wait queue head ;-)

Arnd <><

2005-05-18 15:31:22

by Nathan Lynch

[permalink] [raw]
Subject: Re: [PATCH 3/8] ppc64: add a watchdog driver for rtas

Arnd Bergmann wrote:
> On Tuesday 17 May 2005 22:40, Nathan Lynch wrote:
> > Arnd Bergmann wrote:
> > > +static volatile int wdrtas_miscdev_open = 0;
> > ...
> > > +static int
> > > +wdrtas_open(struct inode *inode, struct file *file)
> > > +{
> > > + /* only open once */
> > > + if (xchg(&wdrtas_miscdev_open,1))
> > > + return -EBUSY;
> >
> > The volatile and xchg strike me as an obscure method for ensuring only
> > one process at a time can open this file. Any reason a semaphore
> > couldn't be used?
>
> A semaphore would also be the wrong approach since we don't want
> processes to block but instead to fail opening the watchdog twice.

I should have been more explicit. What I had in mind was using
down_trylock and returning -EBUSY if it failed.

Nathan

2005-05-18 20:20:25

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] libfs: add simple attribute files

On Wed, May 18, 2005 at 02:40:59PM +0200, Arnd Bergmann wrote:
> Based on the discussion about spufs attributes, this is my suggestion
> for a more generic attribute file support that can be used by both
> debugfs and spufs.

Thanks for the patch. I've cleaned it up a bit (drop the spufs
comments, changed the access check, and made the val be u64, and
exported the symbols and cleaned up the debugfs portion) and added it to
my tree. It should show up in the next -mm release. I've included the
patch below so you can see my changes.

thanks,

greg k-h

---------------

Based on the discussion about spufs attributes, this is my suggestion
for a more generic attribute file support that can be used by both
debugfs and spufs.

Simple attribute files behave similarly to sequential files from
a kernel programmer's perspective in that a standard set of file
operations is provided and only an open operation needs to
be written that registers file-specific get() and set() functions.

These operations are defined as

void foo_set(void *data, long val); and
long foo_get(void *data);

where data is the inode->u.generic_ip pointer of the file and the
operations just need to make sense of that pointer. The infrastructure
makes sure this works correctly with concurrent access and partial
read calls.

A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
using the attributes.

This patch already contains the changes for debugfs to use attributes
for its internal file operations.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/debugfs/file.c | 67 +++++++++++++++--------------------
fs/libfs.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 46 ++++++++++++++++++++++++
3 files changed, 174 insertions(+), 38 deletions(-)

--- gregkh-2.6.orig/include/linux/fs.h 2005-05-18 11:16:39.000000000 -0700
+++ gregkh-2.6/include/linux/fs.h 2005-05-18 11:16:51.000000000 -0700
@@ -1657,6 +1657,52 @@
ar->size = n;
}

+/*
+ * simple attribute files
+ *
+ * These attributes behave similar to those in sysfs:
+ *
+ * Writing to an attribute immediately sets a value, an open file can be
+ * written to multiple times.
+ *
+ * Reading from an attribute creates a buffer from the value that might get
+ * read with multiple read calls. When the attribute has been read
+ * completely, no further read calls are possible until the file is opened
+ * again.
+ *
+ * All attributes contain a text representation of a numeric value
+ * that are accessed with the get() and set() functions.
+ */
+#define DEFINE_SIMPLE_ATTRIBUTE(__fops, __get, __set, __fmt) \
+static int __fops ## _open(struct inode *inode, struct file *file) \
+{ \
+ __simple_attr_check_format(__fmt, 0ul); \
+ return simple_attr_open(inode, file, &__get, &__set, __fmt); \
+} \
+static struct file_operations __fops = { \
+ .owner = THIS_MODULE, \
+ .open = __fops ## _open, \
+ .release = simple_attr_close, \
+ .read = simple_attr_read, \
+ .write = simple_attr_write, \
+};
+
+static inline void __attribute__((format(printf, 1, 2)))
+__simple_attr_check_format(const char *fmt, ...)
+{
+ /* don't do anything, just let the compiler check the arguments; */
+}
+
+int simple_attr_open(struct inode *inode, struct file *file,
+ u64 (*get)(void *), void (*set)(void *, u64),
+ const char *fmt);
+int simple_attr_close(struct inode *inode, struct file *file);
+ssize_t simple_attr_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos);
+ssize_t simple_attr_write(struct file *file, const char __user *buf,
+ size_t len, loff_t *ppos);
+
+
#ifdef CONFIG_SECURITY
static inline char *alloc_secdata(void)
{
--- gregkh-2.6.orig/fs/libfs.c 2005-05-18 11:16:39.000000000 -0700
+++ gregkh-2.6/fs/libfs.c 2005-05-18 11:19:09.000000000 -0700
@@ -519,6 +519,101 @@
return 0;
}

+/* Simple attribute files */
+
+struct simple_attr {
+ u64 (*get)(void *);
+ void (*set)(void *, u64);
+ char get_buf[24]; /* enough to store a u64 and "\n\0" */
+ char set_buf[24];
+ void *data;
+ const char *fmt; /* format for read operation */
+ struct semaphore sem; /* protects access to these buffers */
+};
+
+/* simple_attr_open is called by an actual attribute open file operation
+ * to set the attribute specific access operations. */
+int simple_attr_open(struct inode *inode, struct file *file,
+ u64 (*get)(void *), void (*set)(void *, u64),
+ const char *fmt)
+{
+ struct simple_attr *attr;
+
+ attr = kmalloc(sizeof(*attr), GFP_KERNEL);
+ if (!attr)
+ return -ENOMEM;
+
+ attr->get = get;
+ attr->set = set;
+ attr->data = inode->u.generic_ip;
+ attr->fmt = fmt;
+ init_MUTEX(&attr->sem);
+
+ file->private_data = attr;
+
+ return nonseekable_open(inode, file);
+}
+
+int simple_attr_close(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+ return 0;
+}
+
+/* read from the buffer that is filled with the get function */
+ssize_t simple_attr_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct simple_attr *attr;
+ size_t size;
+ ssize_t ret;
+
+ attr = file->private_data;
+
+ if (!attr->get)
+ return -EACCES;
+
+ down(&attr->sem);
+ if (*ppos) /* continued read */
+ size = strlen(attr->get_buf);
+ else /* first read */
+ size = scnprintf(attr->get_buf, sizeof(attr->get_buf),
+ attr->fmt, attr->get(attr->data));
+
+ ret = simple_read_from_buffer(buf, len, ppos, attr->get_buf, size);
+ up(&attr->sem);
+ return ret;
+}
+
+/* interpret the buffer as a number to call the set function with */
+ssize_t simple_attr_write(struct file *file, const char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct simple_attr *attr;
+ u64 val;
+ size_t size;
+ ssize_t ret;
+
+ attr = file->private_data;
+
+ if (!attr->set)
+ return -EACCES;
+
+ down(&attr->sem);
+ ret = -EFAULT;
+ size = min(sizeof(attr->set_buf) - 1, len);
+ if (copy_from_user(attr->set_buf, buf, size))
+ goto out;
+
+ ret = len; /* claim we got the whole input */
+ attr->set_buf[size] = '\0';
+ val = simple_strtol(attr->set_buf, NULL, 0);
+ attr->set(attr->data, val);
+out:
+ up(&attr->sem);
+ return ret;
+}
+
EXPORT_SYMBOL(dcache_dir_close);
EXPORT_SYMBOL(dcache_dir_lseek);
EXPORT_SYMBOL(dcache_dir_open);
@@ -547,3 +642,7 @@
EXPORT_SYMBOL(simple_transaction_get);
EXPORT_SYMBOL(simple_transaction_read);
EXPORT_SYMBOL(simple_transaction_release);
+EXPORT_SYMBOL_GPL(simple_attr_open);
+EXPORT_SYMBOL_GPL(simple_attr_close);
+EXPORT_SYMBOL_GPL(simple_attr_read);
+EXPORT_SYMBOL_GPL(simple_attr_write);
--- gregkh-2.6.orig/fs/debugfs/file.c 2005-05-18 11:16:39.000000000 -0700
+++ gregkh-2.6/fs/debugfs/file.c 2005-05-18 11:18:35.000000000 -0700
@@ -45,44 +45,15 @@
.open = default_open,
};

-#define simple_type(type, format, temptype, strtolfn) \
-static ssize_t read_file_##type(struct file *file, char __user *user_buf, \
- size_t count, loff_t *ppos) \
-{ \
- char buf[32]; \
- type *val = file->private_data; \
- \
- snprintf(buf, sizeof(buf), format "\n", *val); \
- return simple_read_from_buffer(user_buf, count, ppos, buf, strlen(buf));\
-} \
-static ssize_t write_file_##type(struct file *file, const char __user *user_buf,\
- size_t count, loff_t *ppos) \
-{ \
- char *endp; \
- char buf[32]; \
- int buf_size; \
- type *val = file->private_data; \
- temptype tmp; \
- \
- memset(buf, 0x00, sizeof(buf)); \
- buf_size = min(count, (sizeof(buf)-1)); \
- if (copy_from_user(buf, user_buf, buf_size)) \
- return -EFAULT; \
- \
- tmp = strtolfn(buf, &endp, 0); \
- if ((endp == buf) || ((type)tmp != tmp)) \
- return -EINVAL; \
- *val = tmp; \
- return count; \
-} \
-static struct file_operations fops_##type = { \
- .read = read_file_##type, \
- .write = write_file_##type, \
- .open = default_open, \
-};
-simple_type(u8, "%c", unsigned long, simple_strtoul);
-simple_type(u16, "%hi", unsigned long, simple_strtoul);
-simple_type(u32, "%i", unsigned long, simple_strtoul);
+static void debugfs_u8_set(void *data, u64 val)
+{
+ *(u8 *)data = val;
+}
+static u64 debugfs_u8_get(void *data)
+{
+ return *(u8 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, "%lu\n");

/**
* debugfs_create_u8 - create a file in the debugfs filesystem that is used to read and write a unsigned 8 bit value.
@@ -116,6 +87,16 @@
}
EXPORT_SYMBOL_GPL(debugfs_create_u8);

+static void debugfs_u16_set(void *data, u64 val)
+{
+ *(u16 *)data = val;
+}
+static u64 debugfs_u16_get(void *data)
+{
+ return *(u16 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_u16, debugfs_u16_get, debugfs_u16_set, "%lu\n");
+
/**
* debugfs_create_u16 - create a file in the debugfs filesystem that is used to read and write a unsigned 8 bit value.
*
@@ -148,6 +129,16 @@
}
EXPORT_SYMBOL_GPL(debugfs_create_u16);

+static void debugfs_u32_set(void *data, u64 val)
+{
+ *(u32 *)data = val;
+}
+static u64 debugfs_u32_get(void *data)
+{
+ return *(u32 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_u32, debugfs_u32_get, debugfs_u32_set, "%lu\n");
+
/**
* debugfs_create_u32 - create a file in the debugfs filesystem that is used to read and write a unsigned 8 bit value.
*

2005-05-19 08:46:09

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] libfs: add simple attribute files

On Wednesday 18 May 2005 22:24, Greg KH wrote:

> Thanks for the patch. I've cleaned it up a bit (drop the spufs
> comments, changed the access check, and made the val be u64, and
> exported the symbols and cleaned up the debugfs portion) and added it to
> my tree. It should show up in the next -mm release. I've included the
> patch below so you can see my changes.

Great, thanks for cleaning up those mistakes.

I noticed one small problem with the change from 'long' to 'u64', in
that you did not change it in all places. In particular, using "%lu" to
print a u64 value will always do the wrong thing on big-endian 32 bit
platforms and maybe on some others.
Since 'u64' is '%llu' on most platforms but '%lu' on some 64 bit
platforms, I'd either do explicit cast to unsigned long long in
the printf or use unsigned long long throughout the code.
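
As a small user-space sketch of the explicit-cast variant (using uint64_t from <stdint.h> as a stand-in for the kernel's u64; format_u64 is a hypothetical helper, not something in the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value portably: cast to unsigned long long and use
 * "%llu", which matches on both 32-bit and 64-bit platforms.
 */
static int format_u64(char *buf, size_t len, uint64_t val)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%llu\n", (unsigned long long)val);
}
```

With the cast in place the format string no longer depends on whether the platform defines its 64-bit type as unsigned long or unsigned long long.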

> void foo_set(void *data, long val); and
^^ u64
> long foo_get(void *data);
^^ u64

> +#define DEFINE_SIMPLE_ATTRIBUTE(__fops, __get, __set, __fmt) \
> +static int __fops ## _open(struct inode *inode, struct file *file) \
> +{ \
> + __simple_attr_check_format(__fmt, 0ul); \
^^^^ 0ull

> + else /* first read */
> + size = scnprintf(attr->get_buf, sizeof(attr->get_buf),
> + attr->fmt, attr->get(attr->data));
^^ (unsigned long long)

> +DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, "%lu\n");
> +DEFINE_SIMPLE_ATTRIBUTE(fops_u16, debugfs_u16_get, debugfs_u16_set, "%lu\n");
> +DEFINE_SIMPLE_ATTRIBUTE(fops_u32, debugfs_u32_get, debugfs_u32_set, "%lu\n");
%llu ^^^^

I also noticed that it is not possible to pass NULL operations to
DEFINE_SIMPLE_ATTRIBUTE() unless you change

--- a/include/linux/fs.h 2005-05-19 10:17:53.000000000 +0200
+++ b/include/linux/fs.h 2005-05-19 10:14:57.000000000 +0200
@@ -1680,7 +1680,7 @@
static int __fops ## _open(struct inode *inode, struct file *file) \
{ \
__simple_attr_check_format(__fmt, 0ul); \
- return simple_attr_open(inode, file, &__get, &__set, __fmt); \
+ return simple_attr_open(inode, file, __get, __set, __fmt); \
} \
static struct file_operations __fops = { \
.owner = THIS_MODULE, \
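
The reason for dropping the '&' can be shown with a stand-alone sketch. Everything here is hypothetical and user-space only (get_fn, has_getter, call_get and the ATTR_HAS_GETTER macro are illustration names, not kernel API): a function name already decays to a function pointer, so passing it without '&' works, while '&NULL' would not compile.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*get_fn)(void *);

/* Hypothetical stand-in for simple_attr_open(): is there a getter at all? */
static int has_getter(get_fn get)
{
	return get != NULL;
}

/* Call the getter; callers are expected to check has_getter() first. */
static long call_get(get_fn get, void *data)
{
	assert(get != NULL);
	return get(data);
}

/*
 * Pass the argument through unchanged. Writing "&__get" here would
 * expand ATTR_HAS_GETTER(NULL) to "&NULL", which is a compile error.
 */
#define ATTR_HAS_GETTER(__get) has_getter(__get)

/* Example getter that ignores its data pointer. */
static long answer_get(void *data)
{
	(void)data;
	return 42;
}
```

Both ATTR_HAS_GETTER(answer_get) and ATTR_HAS_GETTER(NULL) compile with this form of the macro, which is what a write-only or read-only attribute would need.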

I'm currently away from my test machine, so I think it's easier if you
just update your patch yourself, but I could also send you an update
patch later if you prefer.

Arnd <><