have hypervisor extensions (e.g. the P4080 which has an e500mc core).
I think it makes sense for this patchset to go through Kumar Gala's -next
branch, but I still need ACKs from various people on the parts that are
not e500-specific.
1. powerpc: make irq_choose_cpu() available to all PIC drivers
2. powerpc: introduce ePAPR embedded hypervisor hcall interface
3. powerpc: introduce the ePAPR embedded hypervisor vmpic driver
4. powerpc: add Freescale hypervisor partition control functions
5. powerpc/85xx: add board support for the Freescale hypervisor
6. tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver
7. drivers/misc: introduce Freescale hypervisor management driver
Ben Herrenschmidt, please review/ack parts 1-3.
Greg Kroah-Hartman, please review/ack part 6.
Andrew Morton, please review/ack part 7.
Thank you very much for looking at this patchset. I hope to have it included
in 2.6.40.
From: Stuart Yoder <[email protected]>
Move irq_choose_cpu() into arch/powerpc/kernel/irq.c so that it can be used
by other PIC drivers. The function is not MPIC-specific.
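
For reviewers who want to check the selection policy without a PowerPC
target, here is a minimal user-space model of it. It is only a sketch:
a plain 32-bit mask stands in for struct cpumask, and MODEL_NR_CPUS,
model_next_cpu() and model_choose_cpu() are hypothetical names that do not
exist in the kernel. Locking and the translation to a hardware CPU ID are
left out.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 32	/* hypothetical; stands in for nr_cpu_ids */

/* Lowest set bit at position >= start, or MODEL_NR_CPUS if there is none. */
static int model_next_cpu(uint32_t mask, int start)
{
	int cpu;

	for (cpu = start; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/*
 * Model of irq_choose_cpu(): 'mask' is the requested affinity, 'online' is
 * the set of online CPUs, and 'all' is the mask of all possible CPUs.
 */
static int model_choose_cpu(uint32_t mask, uint32_t online, uint32_t all)
{
	static int rover;	/* last CPU handed out by round-robin */
	int cpu;

	assert(online != 0);	/* at least one CPU must be online */

	if (mask != all) {
		/* Explicit affinity: take the first requested online CPU. */
		cpu = model_next_cpu(mask & online, 0);
		if (cpu < MODEL_NR_CPUS)
			return cpu;
		/* No requested CPU is online: fall back to round-robin. */
	}

	rover = model_next_cpu(online, rover + 1);
	if (rover >= MODEL_NR_CPUS)
		rover = model_next_cpu(online, 0);
	return rover;
}
```

With the "all CPUs" mask the model walks the online CPUs in turn, starting
after CPU 0 as the kernel code does. An explicit mask returns its first
online CPU and leaves the rover untouched.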
Signed-off-by: Stuart Yoder <[email protected]>
Signed-off-by: Timur Tabi <[email protected]>
---
arch/powerpc/include/asm/irq.h | 2 ++
arch/powerpc/kernel/irq.c | 35 +++++++++++++++++++++++++++++++++++
arch/powerpc/sysdev/mpic.c | 36 ------------------------------------
3 files changed, 37 insertions(+), 36 deletions(-)
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 67ab5fb..1792d84 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -342,5 +342,7 @@ extern int call_handle_irq(int irq, void *p1,
struct thread_info *tp, void *func);
extern void do_IRQ(struct pt_regs *regs);
+int irq_choose_cpu(const struct cpumask *mask);
+
#endif /* _ASM_IRQ_H */
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index f621b7d..ce9bf93 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -870,6 +870,41 @@ unsigned int irq_find_mapping(struct irq_host *host,
}
EXPORT_SYMBOL_GPL(irq_find_mapping);
+#ifdef CONFIG_SMP
+int irq_choose_cpu(const struct cpumask *mask)
+{
+ int cpuid;
+
+ if (cpumask_equal(mask, cpu_all_mask)) {
+ static int irq_rover;
+ static DEFINE_RAW_SPINLOCK(irq_rover_lock);
+ unsigned long flags;
+
+ /* Round-robin distribution... */
+do_round_robin:
+ raw_spin_lock_irqsave(&irq_rover_lock, flags);
+
+ irq_rover = cpumask_next(irq_rover, cpu_online_mask);
+ if (irq_rover >= nr_cpu_ids)
+ irq_rover = cpumask_first(cpu_online_mask);
+
+ cpuid = irq_rover;
+
+ raw_spin_unlock_irqrestore(&irq_rover_lock, flags);
+ } else {
+ cpuid = cpumask_first_and(mask, cpu_online_mask);
+ if (cpuid >= nr_cpu_ids)
+ goto do_round_robin;
+ }
+
+ return get_hard_smp_processor_id(cpuid);
+}
+#else
+int irq_choose_cpu(const struct cpumask *mask)
+{
+ return hard_smp_processor_id();
+}
+#endif
unsigned int irq_radix_revmap_lookup(struct irq_host *host,
irq_hw_number_t hwirq)
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index f91c065..5ae1142 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -571,42 +571,6 @@ static void __init mpic_scan_ht_pics(struct mpic *mpic)
#endif /* CONFIG_MPIC_U3_HT_IRQS */
-#ifdef CONFIG_SMP
-static int irq_choose_cpu(const struct cpumask *mask)
-{
- int cpuid;
-
- if (cpumask_equal(mask, cpu_all_mask)) {
- static int irq_rover = 0;
- static DEFINE_RAW_SPINLOCK(irq_rover_lock);
- unsigned long flags;
-
- /* Round-robin distribution... */
- do_round_robin:
- raw_spin_lock_irqsave(&irq_rover_lock, flags);
-
- irq_rover = cpumask_next(irq_rover, cpu_online_mask);
- if (irq_rover >= nr_cpu_ids)
- irq_rover = cpumask_first(cpu_online_mask);
-
- cpuid = irq_rover;
-
- raw_spin_unlock_irqrestore(&irq_rover_lock, flags);
- } else {
- cpuid = cpumask_first_and(mask, cpu_online_mask);
- if (cpuid >= nr_cpu_ids)
- goto do_round_robin;
- }
-
- return get_hard_smp_processor_id(cpuid);
-}
-#else
-static int irq_choose_cpu(const struct cpumask *mask)
-{
- return hard_smp_processor_id();
-}
-#endif
-
#define mpic_irq_to_hw(virq) ((unsigned int)irq_map[virq].hwirq)
/* Find an mpic associated with a given linux interrupt */
--
1.7.3.4
ePAPR hypervisors provide operating system services via a "hypercall"
interface. The following steps need to be performed to make an hcall:
1. Load r11 with the hcall number
2. Load specific other registers with parameters
3. Issue the instruction "sc 1"
4. The return code is in r3
5. Other returned parameters are in other registers.
To provide this service to the kernel, these steps are wrapped in inline
assembly functions. Standard ePAPR hcalls are in epapr_hcalls.h, and Freescale
extensions are in fsl_hcalls.h.
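
As an illustration of step 1, the value loaded into r11 combines a vendor ID
(upper 16 bits) with a per-vendor hcall number (lower 16 bits). The sketch
below copies the token macros and a few constants from the two headers so
the encoding can be checked on any host. hcall_token_vendor() and
hcall_token_num() are hypothetical helpers added only for this example; they
are not part of the patch.

```c
#include <stdint.h>

/* Vendor IDs and token macros, copied from epapr_hcalls.h and fsl_hcalls.h */
#define EV_EPAPR_VENDOR_ID	1
#define EV_FSL_VENDOR_ID	2

#define _EV_HCALL_TOKEN(id, num)	(((id) << 16) | (num))
#define EV_HCALL_TOKEN(num)	_EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, num)
#define FH_HCALL_TOKEN(num)	_EV_HCALL_TOKEN(EV_FSL_VENDOR_ID, num)

/* Two hcall numbers, one from each header */
#define EV_INT_EOI	10
#define FH_SEND_NMI	11

/* Hypothetical helper (not in the patch): vendor ID field of a token */
static inline uint32_t hcall_token_vendor(uint32_t token)
{
	return token >> 16;
}

/* Hypothetical helper (not in the patch): hcall number field of a token */
static inline uint32_t hcall_token_num(uint32_t token)
{
	return token & 0xffff;
}
```

So EV_HCALL_TOKEN(EV_INT_EOI) evaluates to 0x1000a and
FH_HCALL_TOKEN(FH_SEND_NMI) to 0x2000b. Both vendors share the same hcall
number space layout and differ only in the upper half of r11.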
Signed-off-by: Timur Tabi <[email protected]>
---
arch/powerpc/include/asm/epapr_hcalls.h | 502 +++++++++++++++++++++++
arch/powerpc/include/asm/fsl_hcalls.h | 655 +++++++++++++++++++++++++++++++
2 files changed, 1157 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/include/asm/epapr_hcalls.h
create mode 100644 arch/powerpc/include/asm/fsl_hcalls.h
diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
new file mode 100644
index 0000000..f3b0c2c
--- /dev/null
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -0,0 +1,502 @@
+/*
+ * ePAPR hcall interface
+ *
+ * Copyright 2008-2011 Freescale Semiconductor, Inc.
+ *
+ * Author: Timur Tabi <[email protected]>
+ *
+ * This file is provided under a dual BSD/GPL license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* A "hypercall" is an "sc 1" instruction. This header file provides C
+ * wrapper functions for the ePAPR hypervisor interface. It is intended
+ * for use by Linux device drivers and other operating systems.
+ *
+ * The hypercalls are implemented as inline assembly, rather than assembly
+ * language functions in a .S file, for optimization. It allows
+ * the caller to issue the hypercall instruction directly, improving both
+ * performance and memory footprint.
+ */
+
+#ifndef _EPAPR_HCALLS_H
+#define _EPAPR_HCALLS_H
+
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <asm/byteorder.h>
+
+#define EV_BYTE_CHANNEL_SEND 1
+#define EV_BYTE_CHANNEL_RECEIVE 2
+#define EV_BYTE_CHANNEL_POLL 3
+#define EV_INT_SET_CONFIG 4
+#define EV_INT_GET_CONFIG 5
+#define EV_INT_SET_MASK 6
+#define EV_INT_GET_MASK 7
+#define EV_INT_IACK 9
+#define EV_INT_EOI 10
+#define EV_INT_SEND_IPI 11
+#define EV_INT_SET_TASK_PRIORITY 12
+#define EV_INT_GET_TASK_PRIORITY 13
+#define EV_DOORBELL_SEND 14
+#define EV_MSGSND 15
+#define EV_IDLE 16
+
+/* vendor ID: epapr */
+#define EV_LOCAL_VENDOR_ID 0 /* for private use */
+#define EV_EPAPR_VENDOR_ID 1
+#define EV_FSL_VENDOR_ID 2 /* Freescale Semiconductor */
+#define EV_IBM_VENDOR_ID 3 /* IBM */
+#define EV_GHS_VENDOR_ID 4 /* Green Hills Software */
+#define EV_ENEA_VENDOR_ID 5 /* Enea */
+#define EV_WR_VENDOR_ID 6 /* Wind River Systems */
+#define EV_AMCC_VENDOR_ID 7 /* Applied Micro Circuits */
+#define EV_KVM_VENDOR_ID 42 /* KVM */
+
+/* The max number of bytes that a byte channel can send or receive per call */
+#define EV_BYTE_CHANNEL_MAX_BYTES 16
+
+
+#define _EV_HCALL_TOKEN(id, num) (((id) << 16) | (num))
+#define EV_HCALL_TOKEN(hcall_num) _EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, hcall_num)
+
+/* epapr error codes */
+#define EV_EPERM 1 /* Operation not permitted */
+#define EV_ENOENT 2 /* Entry Not Found */
+#define EV_EIO 3 /* I/O error occurred */
+#define EV_EAGAIN 4 /* The operation had insufficient
+ * resources to complete and should be
+ * retried
+ */
+#define EV_ENOMEM 5 /* There was insufficient memory to
+ * complete the operation */
+#define EV_EFAULT 6 /* Bad guest address */
+#define EV_ENODEV 7 /* No such device */
+#define EV_EINVAL 8 /* An argument supplied to the hcall
+ was out of range or invalid */
+#define EV_INTERNAL 9 /* An internal error occurred */
+#define EV_CONFIG 10 /* A configuration error was detected */
+#define EV_INVALID_STATE 11 /* The object is in an invalid state */
+#define EV_UNIMPLEMENTED 12 /* Unimplemented hypercall */
+#define EV_BUFFER_OVERFLOW 13 /* Caller-supplied buffer too small */
+
+/*
+ * Hypercall register clobber list
+ *
+ * These macros are used to define the list of clobbered registers during a
+ * hypercall. Technically, registers r0 and r3-r12 are always clobbered,
+ * but the gcc inline assembly syntax does not allow us to specify registers
+ * on the clobber list that are also on the input/output list. Therefore,
+ * the list of clobbered registers depends on the number of register
+ * parameters ("+r" and "=r") passed to the hypercall.
+ *
+ * Each assembly block should use one of the HCALL_CLOBBERSx macros. As a
+ * general rule, 'x' is the number of parameters passed to the assembly
+ * block *except* for r11.
+ *
+ * If you're not sure, just use the smallest value of 'x' that does not
+ * generate a compilation error. Because these are static inline functions,
+ * the compiler will only check the clobber list for a function if you
+ * compile code that calls that function.
+ *
+ * r3 and r11 are not included in any clobbers list because they are always
+ * listed as output registers.
+ *
+ * XER, CTR, and LR are currently listed as clobbers because it's uncertain
+ * whether they will be clobbered.
+ *
+ * Note that r11 can be used as an output parameter.
+ */
+
+/* List of common clobbered registers. Do not use this macro. */
+#define EV_HCALL_CLOBBERS "r0", "r12", "xer", "ctr", "lr", "cc"
+
+#define EV_HCALL_CLOBBERS8 EV_HCALL_CLOBBERS
+#define EV_HCALL_CLOBBERS7 EV_HCALL_CLOBBERS8, "r10"
+#define EV_HCALL_CLOBBERS6 EV_HCALL_CLOBBERS7, "r9"
+#define EV_HCALL_CLOBBERS5 EV_HCALL_CLOBBERS6, "r8"
+#define EV_HCALL_CLOBBERS4 EV_HCALL_CLOBBERS5, "r7"
+#define EV_HCALL_CLOBBERS3 EV_HCALL_CLOBBERS4, "r6"
+#define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5"
+#define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4"
+
+
+/*
+ * We use "uintptr_t" to define a register because it's guaranteed to be a
+ * 32-bit integer on a 32-bit platform, and a 64-bit integer on a 64-bit
+ * platform.
+ *
+ * All registers are either input/output or output only. Registers that are
+ * initialized before making the hypercall are input/output. All
+ * input/output registers are represented with "+r". Output-only registers
+ * are represented with "=r". Do not specify any unused registers. The
+ * clobber list will tell the compiler that the hypercall modifies those
+ * registers, which is good enough.
+ */
+
+/**
+ * ev_int_set_config - configure the specified interrupt
+ * @interrupt: the interrupt number
+ * @config: configuration for this interrupt
+ * @priority: interrupt priority
+ * @destination: destination CPU number
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_int_set_config(unsigned int interrupt,
+ uint32_t config, unsigned int priority, uint32_t destination)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r6 __asm__("r6");
+
+ r11 = EV_HCALL_TOKEN(EV_INT_SET_CONFIG);
+ r3 = interrupt;
+ r4 = config;
+ r5 = priority;
+ r6 = destination;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6)
+ : : EV_HCALL_CLOBBERS4
+ );
+
+ return r3;
+}
+
+/**
+ * ev_int_get_config - return the config of the specified interrupt
+ * @interrupt: the interrupt number
+ * @config: returned configuration for this interrupt
+ * @priority: returned interrupt priority
+ * @destination: returned destination CPU number
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_int_get_config(unsigned int interrupt,
+ uint32_t *config, unsigned int *priority, uint32_t *destination)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r6 __asm__("r6");
+
+ r11 = EV_HCALL_TOKEN(EV_INT_GET_CONFIG);
+ r3 = interrupt;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "=r" (r4), "=r" (r5), "=r" (r6)
+ : : EV_HCALL_CLOBBERS4
+ );
+
+ *config = r4;
+ *priority = r5;
+ *destination = r6;
+
+ return r3;
+}
+
+/**
+ * ev_int_set_mask - sets the mask for the specified interrupt source
+ * @interrupt: the interrupt number
+ * @mask: 0=enable interrupts, 1=disable interrupts
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_int_set_mask(unsigned int interrupt,
+ unsigned int mask)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = EV_HCALL_TOKEN(EV_INT_SET_MASK);
+ r3 = interrupt;
+ r4 = mask;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ return r3;
+}
+
+/**
+ * ev_int_get_mask - returns the mask for the specified interrupt source
+ * @interrupt: the interrupt number
+ * @mask: returned mask for this interrupt (0=enabled, 1=disabled)
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_int_get_mask(unsigned int interrupt,
+ unsigned int *mask)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = EV_HCALL_TOKEN(EV_INT_GET_MASK);
+ r3 = interrupt;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "=r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ *mask = r4;
+
+ return r3;
+}
+
+/**
+ * ev_int_eoi - signal the end of interrupt processing
+ * @interrupt: the interrupt number
+ *
+ * This function signals the end of processing for the specified
+ * interrupt, which must be the interrupt currently in service. By
+ * definition, this is also the highest-priority interrupt.
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_int_eoi(unsigned int interrupt)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = EV_HCALL_TOKEN(EV_INT_EOI);
+ r3 = interrupt;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+/**
+ * ev_byte_channel_send - send characters to a byte channel
+ * @handle: byte channel handle
+ * @count: (input) num of chars to send, (output) num chars sent
+ * @buffer: pointer to a 16-byte buffer
+ *
+ * @buffer must be at least 16 bytes long, because all 16 bytes will be
+ * read from memory into registers, even if count < 16.
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_byte_channel_send(unsigned int handle,
+ unsigned int *count, const char buffer[EV_BYTE_CHANNEL_MAX_BYTES])
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r6 __asm__("r6");
+ register uintptr_t r7 __asm__("r7");
+ register uintptr_t r8 __asm__("r8");
+ const uint32_t *p = (const uint32_t *) buffer;
+
+ r11 = EV_HCALL_TOKEN(EV_BYTE_CHANNEL_SEND);
+ r3 = handle;
+ r4 = *count;
+ r5 = be32_to_cpu(p[0]);
+ r6 = be32_to_cpu(p[1]);
+ r7 = be32_to_cpu(p[2]);
+ r8 = be32_to_cpu(p[3]);
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3),
+ "+r" (r4), "+r" (r5), "+r" (r6), "+r" (r7), "+r" (r8)
+ : : EV_HCALL_CLOBBERS6
+ );
+
+ *count = r4;
+
+ return r3;
+}
+
+/**
+ * ev_byte_channel_receive - fetch characters from a byte channel
+ * @handle: byte channel handle
+ * @count: (input) max num of chars to receive, (output) num chars received
+ * @buffer: pointer to a 16-byte buffer
+ *
+ * The size of @buffer must be at least 16 bytes, even if you request fewer
+ * than 16 characters, because we always write 16 bytes to @buffer. This is
+ * for performance reasons.
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_byte_channel_receive(unsigned int handle,
+ unsigned int *count, char buffer[EV_BYTE_CHANNEL_MAX_BYTES])
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r6 __asm__("r6");
+ register uintptr_t r7 __asm__("r7");
+ register uintptr_t r8 __asm__("r8");
+ uint32_t *p = (uint32_t *) buffer;
+
+ r11 = EV_HCALL_TOKEN(EV_BYTE_CHANNEL_RECEIVE);
+ r3 = handle;
+ r4 = *count;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4),
+ "=r" (r5), "=r" (r6), "=r" (r7), "=r" (r8)
+ : : EV_HCALL_CLOBBERS6
+ );
+
+ *count = r4;
+ p[0] = cpu_to_be32(r5);
+ p[1] = cpu_to_be32(r6);
+ p[2] = cpu_to_be32(r7);
+ p[3] = cpu_to_be32(r8);
+
+ return r3;
+}
+
+/**
+ * ev_byte_channel_poll - returns the status of the byte channel buffers
+ * @handle: byte channel handle
+ * @rx_count: returned count of bytes in receive queue
+ * @tx_count: returned count of free space in transmit queue
+ *
+ * This function reports the amount of data in the receive queue (i.e. the
+ * number of bytes you can read), and the amount of free space in the transmit
+ * queue (i.e. the number of bytes you can write).
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_byte_channel_poll(unsigned int handle,
+ unsigned int *rx_count, unsigned int *tx_count)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+
+ r11 = EV_HCALL_TOKEN(EV_BYTE_CHANNEL_POLL);
+ r3 = handle;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "=r" (r4), "=r" (r5)
+ : : EV_HCALL_CLOBBERS3
+ );
+
+ *rx_count = r4;
+ *tx_count = r5;
+
+ return r3;
+}
+
+/**
+ * ev_int_iack - acknowledge an interrupt
+ * @handle: handle to the target interrupt controller
+ * @vector: returned interrupt vector
+ *
+ * If handle is zero, the function returns the next interrupt source
+ * number to be handled irrespective of the hierarchy or cascading
+ * of interrupt controllers. If non-zero, specifies a handle to the
+ * interrupt controller that is the target of the acknowledge.
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_int_iack(unsigned int handle,
+ unsigned int *vector)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = EV_HCALL_TOKEN(EV_INT_IACK);
+ r3 = handle;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "=r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ *vector = r4;
+
+ return r3;
+}
+
+/**
+ * ev_doorbell_send - send a doorbell to another partition
+ * @handle: doorbell send handle
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_doorbell_send(unsigned int handle)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = EV_HCALL_TOKEN(EV_DOORBELL_SEND);
+ r3 = handle;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+/**
+ * ev_idle -- wait for next interrupt on this core
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int ev_idle(void)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = EV_HCALL_TOKEN(EV_IDLE);
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "=r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+#endif
diff --git a/arch/powerpc/include/asm/fsl_hcalls.h b/arch/powerpc/include/asm/fsl_hcalls.h
new file mode 100644
index 0000000..922d9b5
--- /dev/null
+++ b/arch/powerpc/include/asm/fsl_hcalls.h
@@ -0,0 +1,655 @@
+/*
+ * Freescale hypervisor call interface
+ *
+ * Copyright 2008-2010 Freescale Semiconductor, Inc.
+ *
+ * Author: Timur Tabi <[email protected]>
+ *
+ * This file is provided under a dual BSD/GPL license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _FSL_HCALLS_H
+#define _FSL_HCALLS_H
+
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <asm/byteorder.h>
+#include <asm/epapr_hcalls.h>
+
+#define FH_API_VERSION 1
+
+#define FH_ERR_GET_INFO 1
+#define FH_PARTITION_GET_DTPROP 2
+#define FH_PARTITION_SET_DTPROP 3
+#define FH_PARTITION_RESTART 4
+#define FH_PARTITION_GET_STATUS 5
+#define FH_PARTITION_START 6
+#define FH_PARTITION_STOP 7
+#define FH_PARTITION_MEMCPY 8
+#define FH_DMA_ENABLE 9
+#define FH_DMA_DISABLE 10
+#define FH_SEND_NMI 11
+#define FH_VMPIC_GET_MSIR 12
+#define FH_SYSTEM_RESET 13
+#define FH_GET_CORE_STATE 14
+#define FH_ENTER_NAP 15
+#define FH_EXIT_NAP 16
+#define FH_CLAIM_DEVICE 17
+#define FH_PARTITION_STOP_DMA 18
+
+/* vendor ID: Freescale Semiconductor */
+#define FH_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_FSL_VENDOR_ID, num)
+
+/*
+ * We use "uintptr_t" to define a register because it's guaranteed to be a
+ * 32-bit integer on a 32-bit platform, and a 64-bit integer on a 64-bit
+ * platform.
+ *
+ * All registers are either input/output or output only. Registers that are
+ * initialized before making the hypercall are input/output. All
+ * input/output registers are represented with "+r". Output-only registers
+ * are represented with "=r". Do not specify any unused registers. The
+ * clobber list will tell the compiler that the hypercall modifies those
+ * registers, which is good enough.
+ */
+
+/**
+ * fh_send_nmi - send NMI to virtual cpu(s).
+ * @vcpu_mask: send NMI to virtual cpu(s) specified by this mask.
+ *
+ * Returns 0 for success, or EINVAL for invalid vcpu_mask.
+ */
+static inline unsigned int fh_send_nmi(unsigned int vcpu_mask)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_SEND_NMI);
+ r3 = vcpu_mask;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+/* Arbitrary limits to avoid excessive memory allocation in hypervisor */
+#define FH_DTPROP_MAX_PATHLEN 4096
+#define FH_DTPROP_MAX_PROPLEN 32768
+
+/**
+ * fh_partition_get_dtprop - get a property from a guest device tree.
+ * @handle: handle of partition whose device tree is to be accessed
+ * @dtpath_addr: physical address of device tree path to access
+ * @propname_addr: physical address of name of property
+ * @propvalue_addr: physical address of property value buffer
+ * @propvalue_len: length of buffer on entry, length of property on return
+ *
+ * Returns zero on success, non-zero on error.
+ */
+static inline unsigned int fh_partition_get_dtprop(int handle,
+ uint64_t dtpath_addr,
+ uint64_t propname_addr,
+ uint64_t propvalue_addr,
+ uint32_t *propvalue_len)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r6 __asm__("r6");
+ register uintptr_t r7 __asm__("r7");
+ register uintptr_t r8 __asm__("r8");
+ register uintptr_t r9 __asm__("r9");
+ register uintptr_t r10 __asm__("r10");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_GET_DTPROP);
+ r3 = handle;
+
+#ifdef CONFIG_PHYS_64BIT
+ r4 = dtpath_addr >> 32;
+ r6 = propname_addr >> 32;
+ r8 = propvalue_addr >> 32;
+#else
+ r4 = 0;
+ r6 = 0;
+ r8 = 0;
+#endif
+ r5 = (uint32_t)dtpath_addr;
+ r7 = (uint32_t)propname_addr;
+ r9 = (uint32_t)propvalue_addr;
+ r10 = *propvalue_len;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11),
+ "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6), "+r" (r7),
+ "+r" (r8), "+r" (r9), "+r" (r10)
+ : : EV_HCALL_CLOBBERS8
+ );
+
+ *propvalue_len = r4;
+ return r3;
+}
+
+/**
+ * fh_partition_set_dtprop - set a property in a guest device tree.
+ * @handle: handle of partition whose device tree is to be accessed
+ * @dtpath_addr: physical address of device tree path to access
+ * @propname_addr: physical address of name of property
+ * @propvalue_addr: physical address of property value
+ * @propvalue_len: length of property
+ *
+ * Returns zero on success, non-zero on error.
+ */
+static inline unsigned int fh_partition_set_dtprop(int handle,
+ uint64_t dtpath_addr,
+ uint64_t propname_addr,
+ uint64_t propvalue_addr,
+ uint32_t propvalue_len)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r6 __asm__("r6");
+ register uintptr_t r8 __asm__("r8");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r7 __asm__("r7");
+ register uintptr_t r9 __asm__("r9");
+ register uintptr_t r10 __asm__("r10");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_SET_DTPROP);
+ r3 = handle;
+
+#ifdef CONFIG_PHYS_64BIT
+ r4 = dtpath_addr >> 32;
+ r6 = propname_addr >> 32;
+ r8 = propvalue_addr >> 32;
+#else
+ r4 = 0;
+ r6 = 0;
+ r8 = 0;
+#endif
+ r5 = (uint32_t)dtpath_addr;
+ r7 = (uint32_t)propname_addr;
+ r9 = (uint32_t)propvalue_addr;
+ r10 = propvalue_len;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11),
+ "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6), "+r" (r7),
+ "+r" (r8), "+r" (r9), "+r" (r10)
+ : : EV_HCALL_CLOBBERS8
+ );
+
+ return r3;
+}
+
+/**
+ * fh_partition_restart - reboot the current partition
+ * @partition: partition ID
+ *
+ * Returns an error code if reboot failed. Does not return if it succeeds.
+ */
+static inline unsigned int fh_partition_restart(unsigned int partition)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_RESTART);
+ r3 = partition;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+#define FH_PARTITION_STOPPED 0
+#define FH_PARTITION_RUNNING 1
+#define FH_PARTITION_STARTING 2
+#define FH_PARTITION_STOPPING 3
+#define FH_PARTITION_PAUSING 4
+#define FH_PARTITION_PAUSED 5
+#define FH_PARTITION_RESUMING 6
+
+/**
+ * fh_partition_get_status - gets the status of a partition
+ * @partition: partition ID
+ * @status: returned status code
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_partition_get_status(unsigned int partition,
+ unsigned int *status)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_GET_STATUS);
+ r3 = partition;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "=r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ *status = r4;
+
+ return r3;
+}
+
+/**
+ * fh_partition_start - boots and starts execution of the specified partition
+ * @partition: partition ID
+ * @entry_point: guest physical address to start execution
+ *
+ * The hypervisor creates a 1-to-1 virtual/physical IMA mapping, so at boot
+ * time, guest physical addresses are the same as guest virtual addresses.
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_partition_start(unsigned int partition,
+ uint32_t entry_point, int load)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_START);
+ r3 = partition;
+ r4 = entry_point;
+ r5 = load;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4), "+r" (r5)
+ : : EV_HCALL_CLOBBERS3
+ );
+
+ return r3;
+}
+
+/**
+ * fh_partition_stop - stops another partition
+ * @partition: partition ID
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_partition_stop(unsigned int partition)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_STOP);
+ r3 = partition;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+/**
+ * struct fh_sg_list: definition of the fh_partition_memcpy S/G list
+ * @source: guest physical address to copy from
+ * @target: guest physical address to copy to
+ * @size: number of bytes to copy
+ * @reserved: reserved, must be zero
+ *
+ * The scatter/gather list for fh_partition_memcpy() is an array of these
+ * structures. The array must be guest physically contiguous.
+ *
+ * This structure must be aligned on a 32-byte boundary, so that no single
+ * structure can span two pages.
+ */
+struct fh_sg_list {
+ uint64_t source; /**< guest physical address to copy from */
+ uint64_t target; /**< guest physical address to copy to */
+ uint64_t size; /**< number of bytes to copy */
+ uint64_t reserved; /**< reserved, must be zero */
+} __attribute__ ((aligned(32)));
+
+/**
+ * fh_partition_memcpy - copies data from one guest to another
+ * @source: the ID of the partition to copy from
+ * @target: the ID of the partition to copy to
+ * @sg_list: guest physical address of an array of &fh_sg_list structures
+ * @count: the number of entries in @sg_list
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_partition_memcpy(unsigned int source,
+ unsigned int target, phys_addr_t sg_list, unsigned int count)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r6 __asm__("r6");
+ register uintptr_t r7 __asm__("r7");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_MEMCPY);
+ r3 = source;
+ r4 = target;
+ r5 = (uint32_t) sg_list;
+
+#ifdef CONFIG_PHYS_64BIT
+ r6 = sg_list >> 32;
+#else
+ r6 = 0;
+#endif
+ r7 = count;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11),
+ "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6), "+r" (r7)
+ : : EV_HCALL_CLOBBERS5
+ );
+
+ return r3;
+}
+
+/**
+ * fh_dma_enable - enable DMA for the specified device
+ * @liodn: the LIODN of the I/O device for which to enable DMA
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_dma_enable(unsigned int liodn)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_DMA_ENABLE);
+ r3 = liodn;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+/**
+ * fh_dma_disable - disable DMA for the specified device
+ * @liodn: the LIODN of the I/O device for which to disable DMA
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_dma_disable(unsigned int liodn)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_DMA_DISABLE);
+ r3 = liodn;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+
+/**
+ * fh_vmpic_get_msir - returns the MPIC-MSI register value
+ * @interrupt: the interrupt number
+ * @msir_val: returned MPIC-MSI register value
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_vmpic_get_msir(unsigned int interrupt,
+ unsigned int *msir_val)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = FH_HCALL_TOKEN(FH_VMPIC_GET_MSIR);
+ r3 = interrupt;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "=r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ *msir_val = r4;
+
+ return r3;
+}
+
+/**
+ * fh_system_reset - reset the system
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_system_reset(void)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_SYSTEM_RESET);
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "=r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+
+/**
+ * fh_err_get_info - get platform error information
+ * @queue: queue ID:
+ * 0 for guest error event queue
+ * 1 for global error event queue
+ *
+ * @pointer to store the platform error data:
+ * platform error data is returned in registers r4 - r11
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_err_get_info(int queue, uint32_t *bufsize,
+ uint32_t addr_hi, uint32_t addr_lo, int peek)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+ register uintptr_t r5 __asm__("r5");
+ register uintptr_t r6 __asm__("r6");
+ register uintptr_t r7 __asm__("r7");
+
+ r11 = FH_HCALL_TOKEN(FH_ERR_GET_INFO);
+ r3 = queue;
+ r4 = *bufsize;
+ r5 = addr_hi;
+ r6 = addr_lo;
+ r7 = peek;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6),
+ "+r" (r7)
+ : : EV_HCALL_CLOBBERS5
+ );
+
+ *bufsize = r4;
+
+ return r3;
+}
+
+
+#define FH_VCPU_RUN 0
+#define FH_VCPU_IDLE 1
+#define FH_VCPU_NAP 2
+
+/**
+ * fh_get_core_state - get the state of a vcpu
+ *
+ * @handle: handle of partition containing the vcpu
+ * @vcpu: vcpu number within the partition
+ * @state: the current state of the vcpu, see FH_VCPU_*
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_get_core_state(unsigned int handle,
+ unsigned int vcpu, unsigned int *state)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = FH_HCALL_TOKEN(FH_GET_CORE_STATE);
+ r3 = handle;
+ r4 = vcpu;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ *state = r4;
+ return r3;
+}
+
+/**
+ * fh_enter_nap - enter nap on a vcpu
+ *
+ * Note that though the API supports entering nap on a vcpu other
+ * than the caller, this may not be implemented and may return EINVAL.
+ *
+ * @handle: handle of partition containing the vcpu
+ * @vcpu: vcpu number within the partition
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_enter_nap(unsigned int handle, unsigned int vcpu)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = FH_HCALL_TOKEN(FH_ENTER_NAP);
+ r3 = handle;
+ r4 = vcpu;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ return r3;
+}
+
+/**
+ * fh_exit_nap - exit nap on a vcpu
+ * @handle: handle of partition containing the vcpu
+ * @vcpu: vcpu number within the partition
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_exit_nap(unsigned int handle, unsigned int vcpu)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+ register uintptr_t r4 __asm__("r4");
+
+ r11 = FH_HCALL_TOKEN(FH_EXIT_NAP);
+ r3 = handle;
+ r4 = vcpu;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3), "+r" (r4)
+ : : EV_HCALL_CLOBBERS2
+ );
+
+ return r3;
+}
+/**
+ * fh_claim_device - claim a "claimable" shared device
+ * @handle: fsl,hv-device-handle of node to claim
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_claim_device(unsigned int handle)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_CLAIM_DEVICE);
+ r3 = handle;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+
+/**
+ * fh_partition_stop_dma - run deferred DMA disabling on a partition's devices
+ *
+ * This applies to devices which a partition owns either privately,
+ * or which are claimable and still actively owned by that partition,
+ * and which do not have the no-dma-disable property.
+ *
+ * @handle: partition (must be stopped) whose DMA is to be disabled
+ *
+ * Returns 0 for success, or an error code.
+ */
+static inline unsigned int fh_partition_stop_dma(unsigned int handle)
+{
+ register uintptr_t r11 __asm__("r11");
+ register uintptr_t r3 __asm__("r3");
+
+ r11 = FH_HCALL_TOKEN(FH_PARTITION_STOP_DMA);
+ r3 = handle;
+
+ __asm__ __volatile__ ("sc 1"
+ : "+r" (r11), "+r" (r3)
+ : : EV_HCALL_CLOBBERS1
+ );
+
+ return r3;
+}
+#endif
--
1.7.3.4
From: Ashish Kalra <[email protected]>
The Freescale ePAPR reference hypervisor provides interrupt controller services
via a hypercall interface, instead of emulating the MPIC controller. This is
called the VMPIC.
The ePAPR "virtual interrupt controller" provides interrupt controller services
for external interrupts. External interrupts received by a partition can come
from two sources:
- Hardware interrupts - hardware interrupts come from external
interrupt lines or on-chip I/O devices.
- Virtual interrupts - virtual interrupts are generated by the hypervisor
as part of some hypervisor service or hypervisor-created virtual device.
Both types of interrupts are processed using the same programming model and
the same set of hypercalls.
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Timur Tabi <[email protected]>
---
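For reviewers, here is a minimal user-space sketch of the two translations the
driver performs: device tree sense value to Linux trigger type (as in
ehv_pic_host_xlate) and Linux trigger type to vecpri bits (as in
ehv_pic_type_to_vecpri). The sketch_* names are hypothetical, the IRQ_TYPE_*
values are assumed to match <linux/irq.h>, and the assert() bounds check is an
addition that the driver itself does not have.

```c
#include <assert.h>
#include <stdint.h>

/* Linux trigger types; values assumed to match <linux/irq.h>. */
enum {
	IRQ_TYPE_EDGE_RISING  = 0x1,
	IRQ_TYPE_EDGE_FALLING = 0x2,
	IRQ_TYPE_EDGE_BOTH    = 0x3,
	IRQ_TYPE_LEVEL_HIGH   = 0x4,
	IRQ_TYPE_LEVEL_LOW    = 0x8,
	IRQ_TYPE_SENSE_MASK   = 0xf,
};

/* Constants copied from this patch. */
#define IRQ_TYPE_MPIC_DIRECT			4
#define EHV_PIC_VECPRI_POLARITY_NEGATIVE	0
#define EHV_PIC_VECPRI_POLARITY_POSITIVE	1
#define EHV_PIC_VECPRI_SENSE_EDGE		0
#define EHV_PIC_VECPRI_SENSE_LEVEL		0x2

/* Hypothetical helper: second interrupt cell -> Linux trigger type. */
unsigned int sketch_sense_to_linux(uint32_t cell)
{
	static const unsigned char map[4] = {
		IRQ_TYPE_EDGE_FALLING,
		IRQ_TYPE_EDGE_RISING,
		IRQ_TYPE_LEVEL_LOW,
		IRQ_TYPE_LEVEL_HIGH,
	};
	/* Strip the "direct EOI" flag before indexing the table. */
	uint32_t sense = cell & ~(uint32_t)IRQ_TYPE_MPIC_DIRECT;

	assert(sense < 4);	/* added here; not present in the driver */
	return map[sense];
}

/* Hypothetical helper: Linux trigger type -> vecpri sense/polarity bits. */
unsigned int sketch_type_to_vecpri(unsigned int type)
{
	switch (type & IRQ_TYPE_SENSE_MASK) {
	case IRQ_TYPE_EDGE_RISING:
		return EHV_PIC_VECPRI_SENSE_EDGE |
		       EHV_PIC_VECPRI_POLARITY_POSITIVE;
	case IRQ_TYPE_EDGE_FALLING:
	case IRQ_TYPE_EDGE_BOTH:
		return EHV_PIC_VECPRI_SENSE_EDGE |
		       EHV_PIC_VECPRI_POLARITY_NEGATIVE;
	case IRQ_TYPE_LEVEL_HIGH:
		return EHV_PIC_VECPRI_SENSE_LEVEL |
		       EHV_PIC_VECPRI_POLARITY_POSITIVE;
	case IRQ_TYPE_LEVEL_LOW:
	default:
		return EHV_PIC_VECPRI_SENSE_LEVEL |
		       EHV_PIC_VECPRI_POLARITY_NEGATIVE;
	}
}
```

With these tables, a sense value of 0 to 3 from the device tree comes back as
the same number in the vecpri sense/polarity bits, with or without the
IRQ_TYPE_MPIC_DIRECT flag set in the cell.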
arch/powerpc/include/asm/ehv_pic.h | 40 +++++
arch/powerpc/platforms/Kconfig | 4 +
arch/powerpc/sysdev/Makefile | 1 +
arch/powerpc/sysdev/ehv_pic.c | 302 ++++++++++++++++++++++++++++++++++++
4 files changed, 347 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/include/asm/ehv_pic.h
create mode 100644 arch/powerpc/sysdev/ehv_pic.c
diff --git a/arch/powerpc/include/asm/ehv_pic.h b/arch/powerpc/include/asm/ehv_pic.h
new file mode 100644
index 0000000..a9e1f4f
--- /dev/null
+++ b/arch/powerpc/include/asm/ehv_pic.h
@@ -0,0 +1,40 @@
+/*
+ * EHV_PIC private definitions and structure.
+ *
+ * Copyright 2008-2010 Freescale Semiconductor, Inc.
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+#ifndef __EHV_PIC_H__
+#define __EHV_PIC_H__
+
+#include <linux/irq.h>
+
+#define NR_EHV_PIC_INTS 1024
+
+#define EHV_PIC_INFO(name) EHV_PIC_##name
+
+#define EHV_PIC_VECPRI_POLARITY_NEGATIVE 0
+#define EHV_PIC_VECPRI_POLARITY_POSITIVE 1
+#define EHV_PIC_VECPRI_SENSE_EDGE 0
+#define EHV_PIC_VECPRI_SENSE_LEVEL 0x2
+#define EHV_PIC_VECPRI_POLARITY_MASK 0x1
+#define EHV_PIC_VECPRI_SENSE_MASK 0x2
+
+struct ehv_pic {
+ /* The remapper for this EHV_PIC */
+ struct irq_host *irqhost;
+
+ /* The "linux" controller struct */
+ struct irq_chip hc_irq;
+
+ /* core int flag */
+ int coreint_flag;
+};
+
+void ehv_pic_init(void);
+unsigned int ehv_pic_get_irq(void);
+
+#endif /* __EHV_PIC_H__ */
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index f7b0772..9ac7eba 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -74,6 +74,10 @@ config MPIC
bool
default n
+config PPC_EPAPR_HV_PIC
+ bool
+ default n
+
config MPIC_WEIRD
bool
default n
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index 1e0c933..663bcb8 100644
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -4,6 +4,7 @@ ccflags-$(CONFIG_PPC64) := -mno-minimal-toc
mpic-msi-obj-$(CONFIG_PCI_MSI) += mpic_msi.o mpic_u3msi.o mpic_pasemi_msi.o
obj-$(CONFIG_MPIC) += mpic.o $(mpic-msi-obj-y)
+obj-$(CONFIG_PPC_EPAPR_HV_PIC) += ehv_pic.o
fsl-msi-obj-$(CONFIG_PCI_MSI) += fsl_msi.o
obj-$(CONFIG_PPC_MSI_BITMAP) += msi_bitmap.o
diff --git a/arch/powerpc/sysdev/ehv_pic.c b/arch/powerpc/sysdev/ehv_pic.c
new file mode 100644
index 0000000..af1a5df
--- /dev/null
+++ b/arch/powerpc/sysdev/ehv_pic.c
@@ -0,0 +1,302 @@
+/*
+ * Driver for ePAPR Embedded Hypervisor PIC
+ *
+ * Copyright 2008-2011 Freescale Semiconductor, Inc.
+ *
+ * Author: Ashish Kalra <[email protected]>
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/irq.h>
+#include <linux/smp.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/of.h>
+
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/smp.h>
+#include <asm/machdep.h>
+#include <asm/ehv_pic.h>
+#include <asm/fsl_hcalls.h>
+
+#include "../../../kernel/irq/settings.h"
+
+static struct ehv_pic *global_ehv_pic;
+static DEFINE_SPINLOCK(ehv_pic_lock);
+
+static u32 hwirq_intspec[NR_EHV_PIC_INTS];
+static u32 __iomem *mpic_percpu_base_vaddr;
+
+#define IRQ_TYPE_MPIC_DIRECT 4
+#define MPIC_EOI 0x00B0
+
+/*
+ * Linux descriptor level callbacks
+ */
+
+void ehv_pic_unmask_irq(struct irq_data *d)
+{
+ unsigned int src = virq_to_hw(d->irq);
+
+ ev_int_set_mask(src, 0);
+}
+
+void ehv_pic_mask_irq(struct irq_data *d)
+{
+ unsigned int src = virq_to_hw(d->irq);
+
+ ev_int_set_mask(src, 1);
+}
+
+void ehv_pic_end_irq(struct irq_data *d)
+{
+ unsigned int src = virq_to_hw(d->irq);
+
+ ev_int_eoi(src);
+}
+
+void ehv_pic_direct_end_irq(struct irq_data *d)
+{
+ out_be32(mpic_percpu_base_vaddr + MPIC_EOI / 4, 0);
+}
+
+int ehv_pic_set_affinity(struct irq_data *d, const struct cpumask *dest,
+ bool force)
+{
+ unsigned int src = virq_to_hw(d->irq);
+ unsigned int config, prio, cpu_dest;
+ int cpuid = irq_choose_cpu(dest);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ehv_pic_lock, flags);
+ ev_int_get_config(src, &config, &prio, &cpu_dest);
+ ev_int_set_config(src, config, prio, cpuid);
+ spin_unlock_irqrestore(&ehv_pic_lock, flags);
+
+ return 0;
+}
+
+static unsigned int ehv_pic_type_to_vecpri(unsigned int type)
+{
+ /* Now convert sense value */
+
+ switch (type & IRQ_TYPE_SENSE_MASK) {
+ case IRQ_TYPE_EDGE_RISING:
+ return EHV_PIC_INFO(VECPRI_SENSE_EDGE) |
+ EHV_PIC_INFO(VECPRI_POLARITY_POSITIVE);
+
+ case IRQ_TYPE_EDGE_FALLING:
+ case IRQ_TYPE_EDGE_BOTH:
+ return EHV_PIC_INFO(VECPRI_SENSE_EDGE) |
+ EHV_PIC_INFO(VECPRI_POLARITY_NEGATIVE);
+
+ case IRQ_TYPE_LEVEL_HIGH:
+ return EHV_PIC_INFO(VECPRI_SENSE_LEVEL) |
+ EHV_PIC_INFO(VECPRI_POLARITY_POSITIVE);
+
+ case IRQ_TYPE_LEVEL_LOW:
+ default:
+ return EHV_PIC_INFO(VECPRI_SENSE_LEVEL) |
+ EHV_PIC_INFO(VECPRI_POLARITY_NEGATIVE);
+ }
+}
+
+int ehv_pic_set_irq_type(struct irq_data *d, unsigned int flow_type)
+{
+ unsigned int src = virq_to_hw(d->irq);
+ struct irq_desc *desc = irq_to_desc(d->irq);
+ unsigned int vecpri, vold, vnew, prio, cpu_dest;
+ unsigned long flags;
+
+ if (flow_type == IRQ_TYPE_NONE)
+ flow_type = IRQ_TYPE_LEVEL_LOW;
+
+ irq_settings_clr_level(desc);
+ irq_settings_set_trigger_mask(desc, flow_type);
+ if (flow_type & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW))
+ irq_settings_set_level(desc);
+
+ vecpri = ehv_pic_type_to_vecpri(flow_type);
+
+ spin_lock_irqsave(&ehv_pic_lock, flags);
+ ev_int_get_config(src, &vold, &prio, &cpu_dest);
+ vnew = vold & ~(EHV_PIC_INFO(VECPRI_POLARITY_MASK) |
+ EHV_PIC_INFO(VECPRI_SENSE_MASK));
+ vnew |= vecpri;
+
+ /*
+ * TODO: Add a specific interface call for the platform to set
+ * individual interrupt priorities.
+ * The platform currently uses a static/default priority for all ints.
+ */
+
+ prio = 8;
+
+ ev_int_set_config(src, vnew, prio, cpu_dest);
+
+ spin_unlock_irqrestore(&ehv_pic_lock, flags);
+ return 0;
+}
+
+static struct irq_chip ehv_pic_irq_chip = {
+ .irq_mask = ehv_pic_mask_irq,
+ .irq_unmask = ehv_pic_unmask_irq,
+ .irq_eoi = ehv_pic_end_irq,
+ .irq_set_type = ehv_pic_set_irq_type,
+};
+
+static struct irq_chip ehv_pic_direct_eoi_irq_chip = {
+ .irq_mask = ehv_pic_mask_irq,
+ .irq_unmask = ehv_pic_unmask_irq,
+ .irq_eoi = ehv_pic_direct_end_irq,
+ .irq_set_type = ehv_pic_set_irq_type,
+};
+
+/* Return an interrupt vector or NO_IRQ if no interrupt is pending. */
+unsigned int ehv_pic_get_irq(void)
+{
+ int irq;
+
+ BUG_ON(global_ehv_pic == NULL);
+
+ if (global_ehv_pic->coreint_flag)
+ irq = mfspr(SPRN_EPR); /* if core int mode */
+ else
+ ev_int_iack(0, &irq); /* legacy mode */
+
+ if (irq == 0xFFFF) /* 0xFFFF --> no irq is pending */
+ return NO_IRQ;
+
+ /*
+ * This also sets up revmap[] in the slow path the first time;
+ * subsequent calls use the fast path by indexing revmap.
+ */
+ return irq_linear_revmap(global_ehv_pic->irqhost, irq);
+}
+
+static int ehv_pic_host_match(struct irq_host *h, struct device_node *node)
+{
+ /* Exact match, unless ehv_pic node is NULL */
+ return h->of_node == NULL || h->of_node == node;
+}
+
+static int ehv_pic_host_map(struct irq_host *h, unsigned int virq,
+ irq_hw_number_t hw)
+{
+ struct ehv_pic *ehv_pic = h->host_data;
+ struct irq_chip *chip;
+
+ /* Default chip */
+ chip = &ehv_pic->hc_irq;
+
+ if (mpic_percpu_base_vaddr)
+ if (hwirq_intspec[hw] & IRQ_TYPE_MPIC_DIRECT)
+ chip = &ehv_pic_direct_eoi_irq_chip;
+
+ irq_set_chip_data(virq, chip);
+ /*
+ * Use handle_fasteoi_irq as our irq handler. It only calls the
+ * eoi callback, which suits the MPIC controller: the MPIC sets
+ * ISR/IPR automatically and clears the highest-priority
+ * active interrupt in ISR/IPR when we do
+ * a specific eoi.
+ */
+ irq_set_chip_and_handler(virq, chip, handle_fasteoi_irq);
+
+ /* Set default irq type */
+ irq_set_irq_type(virq, IRQ_TYPE_NONE);
+
+ return 0;
+}
+
+static int ehv_pic_host_xlate(struct irq_host *h, struct device_node *ct,
+ const u32 *intspec, unsigned int intsize,
+ irq_hw_number_t *out_hwirq, unsigned int *out_flags)
+
+{
+ /*
+ * The interrupt specifiers in the guest device tree encode one
+ * of four possible sense/level values. These firmware
+ * encodings need to be translated to the corresponding
+ * Linux IRQ types.
+ */
+
+ static unsigned char map_of_senses_to_linux_irqtype[4] = {
+ IRQ_TYPE_EDGE_FALLING,
+ IRQ_TYPE_EDGE_RISING,
+ IRQ_TYPE_LEVEL_LOW,
+ IRQ_TYPE_LEVEL_HIGH,
+ };
+
+ *out_hwirq = intspec[0];
+ if (intsize > 1) {
+ hwirq_intspec[intspec[0]] = intspec[1];
+ *out_flags = map_of_senses_to_linux_irqtype[intspec[1] &
+ ~IRQ_TYPE_MPIC_DIRECT];
+ } else {
+ *out_flags = IRQ_TYPE_NONE;
+ }
+
+ return 0;
+}
+
+static struct irq_host_ops ehv_pic_host_ops = {
+ .match = ehv_pic_host_match,
+ .map = ehv_pic_host_map,
+ .xlate = ehv_pic_host_xlate,
+};
+
+void __init ehv_pic_init(void)
+{
+ struct device_node *np, *np2;
+ struct ehv_pic *ehv_pic;
+ int coreint_flag = 1;
+
+ np = of_find_compatible_node(NULL, NULL, "epapr,hv-pic");
+ if (!np) {
+ pr_err("ehv_pic_init: could not find epapr,hv-pic node\n");
+ return;
+ }
+
+ if (!of_find_property(np, "has-external-proxy", NULL))
+ coreint_flag = 0;
+
+ ehv_pic = kzalloc(sizeof(struct ehv_pic), GFP_KERNEL);
+ if (!ehv_pic) {
+ of_node_put(np);
+ return;
+ }
+
+ ehv_pic->irqhost = irq_alloc_host(np, IRQ_HOST_MAP_LINEAR,
+ NR_EHV_PIC_INTS, &ehv_pic_host_ops, 0);
+
+ if (!ehv_pic->irqhost) {
+ of_node_put(np);
+ return;
+ }
+
+ np2 = of_find_compatible_node(NULL, NULL, "fsl,hv-mpic-per-cpu");
+ if (np2) {
+ mpic_percpu_base_vaddr = of_iomap(np2, 0);
+ if (!mpic_percpu_base_vaddr)
+ pr_err("ehv_pic_init: of_iomap failed\n");
+
+ of_node_put(np2);
+ }
+
+ ehv_pic->irqhost->host_data = ehv_pic;
+ ehv_pic->hc_irq = ehv_pic_irq_chip;
+ ehv_pic->hc_irq.irq_set_affinity = ehv_pic_set_affinity;
+ ehv_pic->coreint_flag = coreint_flag;
+
+ global_ehv_pic = ehv_pic;
+ irq_set_default_host(global_ehv_pic->irqhost);
+}
--
1.7.3.4
Add functions to restart and halt the current partition when running under
the Freescale hypervisor. These functions should be assigned to various
function pointers of the ppc_md structure during the .probe() function for
the board:
ppc_md.restart = fsl_hv_restart;
ppc_md.power_off = fsl_hv_halt;
ppc_md.halt = fsl_hv_halt;
Signed-off-by: Timur Tabi <[email protected]>
---
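A minimal user-space sketch of the wiring described above, for illustration
only. The struct, the *_stub hypercall wrappers, and sketch_hook_hv_callbacks()
are hypothetical stand-ins; the real code uses the kernel's ppc_md and the
fh_partition_restart()/fh_partition_stop() hcalls, and assigns the pointers in
the board's .probe() function. Both hcalls are passed -1, as in this patch, to
act on the current partition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of the kernel's ppc_md. */
struct machdep_calls_sketch {
	void (*restart)(char *cmd);
	void (*power_off)(void);
	void (*halt)(void);
};

enum { HCALL_NONE, HCALL_PARTITION_RESTART, HCALL_PARTITION_STOP };

/* Record the last "hypercall" instead of executing a real "sc 1". */
int last_hcall = HCALL_NONE;
unsigned int last_partition;

unsigned int fh_partition_restart_stub(unsigned int partition)
{
	last_hcall = HCALL_PARTITION_RESTART;
	last_partition = partition;
	return 0;
}

unsigned int fh_partition_stop_stub(unsigned int partition)
{
	last_hcall = HCALL_PARTITION_STOP;
	last_partition = partition;
	return 0;
}

/* Same shape as the functions added by this patch. */
void fsl_hv_restart(char *cmd)
{
	(void)cmd;
	fh_partition_restart_stub((unsigned int)-1);
}

void fsl_hv_halt(void)
{
	fh_partition_stop_stub((unsigned int)-1);
}

/* What a board's probe function would do when running under the hypervisor. */
void sketch_hook_hv_callbacks(struct machdep_calls_sketch *md)
{
	assert(md != NULL);
	md->restart = fsl_hv_restart;
	md->power_off = fsl_hv_halt;
	md->halt = fsl_hv_halt;
}
```

After sketch_hook_hv_callbacks() runs, calling md->restart() issues the restart
hcall, and both md->power_off() and md->halt() issue the stop hcall.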
arch/powerpc/sysdev/fsl_soc.c | 27 +++++++++++++++++++++++++++
arch/powerpc/sysdev/fsl_soc.h | 3 +++
2 files changed, 30 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/sysdev/fsl_soc.c b/arch/powerpc/sysdev/fsl_soc.c
index 19e5015..265313e 100644
--- a/arch/powerpc/sysdev/fsl_soc.c
+++ b/arch/powerpc/sysdev/fsl_soc.c
@@ -41,6 +41,7 @@
#include <sysdev/fsl_soc.h>
#include <mm/mmu_decl.h>
#include <asm/cpm2.h>
+#include <asm/fsl_hcalls.h> /* For the Freescale hypervisor */
extern void init_fcc_ioports(struct fs_platform_info*);
extern void init_fec_ioports(struct fs_platform_info*);
@@ -252,3 +253,29 @@ void fsl_rstcr_restart(char *cmd)
struct platform_diu_data_ops diu_ops;
EXPORT_SYMBOL(diu_ops);
#endif
+
+/*
+ * Restart the current partition
+ *
+ * This function should be assigned to the ppc_md.restart function pointer,
+ * to initiate a partition restart when we're running under the Freescale
+ * hypervisor.
+ */
+void fsl_hv_restart(char *cmd)
+{
+ pr_info("hv restart\n");
+ fh_partition_restart(-1);
+}
+
+/*
+ * Halt the current partition
+ *
+ * This function should be assigned to the ppc_md.power_off and ppc_md.halt
+ * function pointers, to shut down the partition when we're running under
+ * the Freescale hypervisor.
+ */
+void fsl_hv_halt(void)
+{
+ pr_info("hv exit\n");
+ fh_partition_stop(-1);
+}
diff --git a/arch/powerpc/sysdev/fsl_soc.h b/arch/powerpc/sysdev/fsl_soc.h
index 5360948..2ece02b 100644
--- a/arch/powerpc/sysdev/fsl_soc.h
+++ b/arch/powerpc/sysdev/fsl_soc.h
@@ -36,5 +36,8 @@ struct platform_diu_data_ops {
extern struct platform_diu_data_ops diu_ops;
#endif
+void fsl_hv_restart(char *cmd);
+void fsl_hv_halt(void);
+
#endif
#endif
--
1.7.3.4
Add support for the ePAPR-compliant Freescale hypervisor (aka "Topaz") on the
Freescale P3041DS, P4080DS, and P5020DS reference boards.
Signed-off-by: Timur Tabi <[email protected]>
---
arch/powerpc/platforms/85xx/Kconfig | 3 +++
arch/powerpc/platforms/85xx/corenet_ds.c | 7 +++++++
arch/powerpc/platforms/85xx/p3041_ds.c | 16 +++++++++++++++-
arch/powerpc/platforms/85xx/p4080_ds.c | 29 ++++++++++++++++-------------
arch/powerpc/platforms/85xx/p5020_ds.c | 16 +++++++++++++++-
5 files changed, 56 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index b6976e1..f1e9db6 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -163,6 +163,7 @@ config P3041_DS
select SWIOTLB
select MPC8xxx_GPIO
select HAS_RAPIDIO
+ select PPC_EPAPR_HV_PIC
help
This option enables support for the P3041 DS board
@@ -174,6 +175,7 @@ config P4080_DS
select SWIOTLB
select MPC8xxx_GPIO
select HAS_RAPIDIO
+ select PPC_EPAPR_HV_PIC
help
This option enables support for the P4080 DS board
@@ -188,6 +190,7 @@ config P5020_DS
select SWIOTLB
select MPC8xxx_GPIO
select HAS_RAPIDIO
+ select PPC_EPAPR_HV_PIC
help
This option enables support for the P5020 DS board
diff --git a/arch/powerpc/platforms/85xx/corenet_ds.c b/arch/powerpc/platforms/85xx/corenet_ds.c
index 2ab338c..07a5e67 100644
--- a/arch/powerpc/platforms/85xx/corenet_ds.c
+++ b/arch/powerpc/platforms/85xx/corenet_ds.c
@@ -116,6 +116,13 @@ static const struct of_device_id of_device_ids[] __devinitconst = {
{
.compatible = "fsl,rapidio-delta",
},
+ /* The following two are for the Freescale hypervisor */
+ {
+ .name = "hypervisor",
+ },
+ {
+ .name = "handles",
+ },
{}
};
diff --git a/arch/powerpc/platforms/85xx/p3041_ds.c b/arch/powerpc/platforms/85xx/p3041_ds.c
index 0ed52e1..e2cfb6b 100644
--- a/arch/powerpc/platforms/85xx/p3041_ds.c
+++ b/arch/powerpc/platforms/85xx/p3041_ds.c
@@ -30,6 +30,7 @@
#include <linux/of_platform.h>
#include <sysdev/fsl_soc.h>
#include <sysdev/fsl_pci.h>
+#include <asm/ehv_pic.h>
#include "corenet_ds.h"
@@ -40,7 +41,20 @@ static int __init p3041_ds_probe(void)
{
unsigned long root = of_get_flat_dt_root();
- return of_flat_dt_is_compatible(root, "fsl,P3041DS");
+ if (of_flat_dt_is_compatible(root, "fsl,P3041DS"))
+ return 1;
+
+ /* Check if we're running under the Freescale hypervisor */
+ if (of_flat_dt_is_compatible(root, "fsl,P3041DS-hv")) {
+ ppc_md.init_IRQ = ehv_pic_init;
+ ppc_md.get_irq = ehv_pic_get_irq;
+ ppc_md.restart = fsl_hv_restart;
+ ppc_md.power_off = fsl_hv_halt;
+ ppc_md.halt = fsl_hv_halt;
+ return 1;
+ }
+
+ return 0;
}
define_machine(p3041_ds) {
diff --git a/arch/powerpc/platforms/85xx/p4080_ds.c b/arch/powerpc/platforms/85xx/p4080_ds.c
index 8417046..af074ff 100644
--- a/arch/powerpc/platforms/85xx/p4080_ds.c
+++ b/arch/powerpc/platforms/85xx/p4080_ds.c
@@ -29,13 +29,10 @@
#include <linux/of_platform.h>
#include <sysdev/fsl_soc.h>
#include <sysdev/fsl_pci.h>
+#include <asm/ehv_pic.h>
#include "corenet_ds.h"
-#ifdef CONFIG_PCI
-static int primary_phb_addr;
-#endif
-
/*
* Called very early, device-tree isn't unflattened
*/
@@ -43,17 +40,20 @@ static int __init p4080_ds_probe(void)
{
unsigned long root = of_get_flat_dt_root();
- if (of_flat_dt_is_compatible(root, "fsl,P4080DS")) {
-#ifdef CONFIG_PCI
- /* treat PCIe1 as primary,
- * shouldn't matter as we have no ISA on the board
- */
- primary_phb_addr = 0x0000;
-#endif
+ if (of_flat_dt_is_compatible(root, "fsl,P4080DS"))
+ return 1;
+
+ /* Check if we're running under the Freescale hypervisor */
+ if (of_flat_dt_is_compatible(root, "fsl,P4080DS-hv")) {
+ ppc_md.init_IRQ = ehv_pic_init;
+ ppc_md.get_irq = ehv_pic_get_irq;
+ ppc_md.restart = fsl_hv_restart;
+ ppc_md.power_off = fsl_hv_halt;
+ ppc_md.halt = fsl_hv_halt;
return 1;
- } else {
- return 0;
}
+
+ return 0;
}
define_machine(p4080_ds) {
@@ -71,4 +71,7 @@ define_machine(p4080_ds) {
};
machine_device_initcall(p4080_ds, corenet_ds_publish_devices);
+
+#ifdef CONFIG_SWIOTLB
machine_arch_initcall(p4080_ds, swiotlb_setup_bus_notifier);
+#endif
diff --git a/arch/powerpc/platforms/85xx/p5020_ds.c b/arch/powerpc/platforms/85xx/p5020_ds.c
index 7467b71..94348c9 100644
--- a/arch/powerpc/platforms/85xx/p5020_ds.c
+++ b/arch/powerpc/platforms/85xx/p5020_ds.c
@@ -30,6 +30,7 @@
#include <linux/of_platform.h>
#include <sysdev/fsl_soc.h>
#include <sysdev/fsl_pci.h>
+#include <asm/ehv_pic.h>
#include "corenet_ds.h"
@@ -40,7 +41,20 @@ static int __init p5020_ds_probe(void)
{
unsigned long root = of_get_flat_dt_root();
- return of_flat_dt_is_compatible(root, "fsl,P5020DS");
+ if (of_flat_dt_is_compatible(root, "fsl,P5020DS"))
+ return 1;
+
+ /* Check if we're running under the Freescale hypervisor */
+ if (of_flat_dt_is_compatible(root, "fsl,P5020DS-hv")) {
+ ppc_md.init_IRQ = ehv_pic_init;
+ ppc_md.get_irq = ehv_pic_get_irq;
+ ppc_md.restart = fsl_hv_restart;
+ ppc_md.power_off = fsl_hv_halt;
+ ppc_md.halt = fsl_hv_halt;
+ return 1;
+ }
+
+ return 0;
}
define_machine(p5020_ds) {
--
1.7.3.4
The ePAPR embedded hypervisor specification provides an API for "byte
channels", which are serial-like virtual devices for sending and receiving
streams of bytes. This driver provides Linux kernel support for byte
channels via three distinct interfaces:
1) An early-console (udbg) driver. This provides early console output
through a byte channel. The byte channel handle must be specified in a
Kconfig option.
2) A normal console driver. Output is sent to the byte channel designated
for stdout in the device tree. The console driver is for handling kernel
printk calls.
3) A tty driver, which is used to handle user-space input and output. The
byte channel used for the console is designated as the default tty.
Signed-off-by: Timur Tabi <[email protected]>
---
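One detail worth illustrating for reviewers: the driver keeps a tx_irq_enabled
flag because enable_irq() and disable_irq() nest. The user-space sketch below
models that nesting with a simple depth counter. struct irq_sketch,
struct bc_sketch, and the *_sketch functions are hypothetical stand-ins, not
kernel APIs; only the flag logic mirrors the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's per-IRQ disable depth. */
struct irq_sketch {
	int depth;	/* 0 == enabled; >0 == disabled that many times */
	int unbalanced;	/* enable calls made while already enabled */
};

void enable_irq_sketch(struct irq_sketch *irq)
{
	if (irq->depth == 0)
		irq->unbalanced++;	/* the kernel would warn here */
	else
		irq->depth--;
}

void disable_irq_sketch(struct irq_sketch *irq)
{
	irq->depth++;
}

/* Hypothetical cut-down version of struct ehv_bc_data. */
struct bc_sketch {
	struct irq_sketch tx_irq;
	int tx_irq_enabled;	/* true == TX interrupt is enabled */
};

/* Only touch the IRQ when the flag says the state actually changes. */
void enable_tx_interrupt(struct bc_sketch *bc)
{
	assert(bc != NULL);
	if (!bc->tx_irq_enabled) {
		enable_irq_sketch(&bc->tx_irq);
		bc->tx_irq_enabled = 1;
	}
}

void disable_tx_interrupt(struct bc_sketch *bc)
{
	assert(bc != NULL);
	if (bc->tx_irq_enabled) {
		disable_irq_sketch(&bc->tx_irq);
		bc->tx_irq_enabled = 0;
	}
}
```

With the flag, two back-to-back enable calls leave the depth at zero without an
unbalanced enable, and two back-to-back disable calls raise the depth only once,
so a single later enable is enough to turn the interrupt back on.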
arch/powerpc/include/asm/udbg.h | 1 +
arch/powerpc/kernel/udbg.c | 2 +
drivers/tty/Kconfig | 33 ++
drivers/tty/Makefile | 1 +
drivers/tty/ehv_bytechan.c | 872 +++++++++++++++++++++++++++++++++++++++
5 files changed, 909 insertions(+), 0 deletions(-)
create mode 100644 drivers/tty/ehv_bytechan.c
diff --git a/arch/powerpc/include/asm/udbg.h b/arch/powerpc/include/asm/udbg.h
index 11ae699..bb9f6b1 100644
--- a/arch/powerpc/include/asm/udbg.h
+++ b/arch/powerpc/include/asm/udbg.h
@@ -52,6 +52,7 @@ extern void __init udbg_init_44x_as1(void);
extern void __init udbg_init_40x_realmode(void);
extern void __init udbg_init_cpm(void);
extern void __init udbg_init_usbgecko(void);
+extern void __init udbg_init_ehv_bc(void);
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_UDBG_H */
diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
index e39cad8..d117368 100644
--- a/arch/powerpc/kernel/udbg.c
+++ b/arch/powerpc/kernel/udbg.c
@@ -62,6 +62,8 @@ void __init udbg_early_init(void)
udbg_init_cpm();
#elif defined(CONFIG_PPC_EARLY_DEBUG_USBGECKO)
udbg_init_usbgecko();
+#elif defined(CONFIG_PPC_EARLY_DEBUG_EHV_BC)
+ udbg_init_ehv_bc();
#endif
#ifdef CONFIG_PPC_EARLY_DEBUG
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index 3fd7199..9fe0212 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -319,3 +319,36 @@ config N_GSM
This line discipline provides support for the GSM MUX protocol and
presents the mux as a set of 61 individual tty devices.
+config PPC_EPAPR_HV_BYTECHAN
+ tristate "ePAPR hypervisor byte channel driver"
+ depends on PPC
+ help
+ This driver creates /dev entries for each ePAPR hypervisor byte
+ channel, thereby allowing applications to communicate with byte
+ channels as if they were serial ports.
+
+config PPC_EARLY_DEBUG_EHV_BC
+ bool "Early console (udbg) support for ePAPR hypervisors"
+ depends on PPC_EPAPR_HV_BYTECHAN
+ help
+ Select this option to enable early console (a.k.a. "udbg") support
+ via an ePAPR byte channel. You also need to choose the byte channel
+ handle below.
+
+config PPC_EARLY_DEBUG_EHV_BC_HANDLE
+ int "Byte channel handle for early console (udbg)"
+ depends on PPC_EARLY_DEBUG_EHV_BC
+ default 0
+ help
+ If you want early console (udbg) output through a byte channel,
+ specify the handle of the byte channel to use.
+
+ For this to work, the byte channel driver must be compiled
+ in-kernel, not as a module.
+
+ Note that only one early console driver can be enabled, so don't
+ enable any others if you enable this one.
+
+ If the number you specify is not a valid byte channel handle, then
+ there simply will be no early console output. This is true also
+ if you don't boot under a hypervisor at all.
diff --git a/drivers/tty/Makefile b/drivers/tty/Makefile
index 690522f..4afebd2 100644
--- a/drivers/tty/Makefile
+++ b/drivers/tty/Makefile
@@ -24,5 +24,6 @@ obj-$(CONFIG_ROCKETPORT) += rocket.o
obj-$(CONFIG_SYNCLINK_GT) += synclink_gt.o
obj-$(CONFIG_SYNCLINKMP) += synclinkmp.o
obj-$(CONFIG_SYNCLINK) += synclink.o
+obj-$(CONFIG_PPC_EPAPR_HV_BYTECHAN) += ehv_bytechan.o
obj-y += ipwireless/
diff --git a/drivers/tty/ehv_bytechan.c b/drivers/tty/ehv_bytechan.c
new file mode 100644
index 0000000..b51fdb2
--- /dev/null
+++ b/drivers/tty/ehv_bytechan.c
@@ -0,0 +1,872 @@
+/* ePAPR hypervisor byte channel device driver
+ *
+ * Copyright 2009-2011 Freescale Semiconductor, Inc.
+ *
+ * Author: Timur Tabi <[email protected]>
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ *
+ * This driver supports three distinct interfaces, all of which are related to
+ * ePAPR hypervisor byte channels.
+ *
+ * 1) An early-console (udbg) driver. This provides early console output
+ * through a byte channel. The byte channel handle must be specified in a
+ * Kconfig option.
+ *
+ * 2) A normal console driver. Output is sent to the byte channel designated
+ * for stdout in the device tree. The console driver is for handling kernel
+ * printk calls.
+ *
+ * 3) A tty driver, which is used to handle user-space input and output. The
+ * byte channel used for the console is designated as the default tty.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <asm/epapr_hcalls.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/cdev.h>
+#include <linux/console.h>
+#include <linux/tty.h>
+#include <linux/tty_flip.h>
+#include <linux/circ_buf.h>
+#include <asm/udbg.h>
+
+/* The size of the transmit circular buffer. This must be a power of two. */
+#define BUF_SIZE 2048
+
+/* Per-byte channel private data */
+struct ehv_bc_data {
+ struct device *dev;
+ struct tty_port port;
+ struct tty_struct *ttys;
+ uint32_t handle;
+ unsigned int rx_irq;
+ unsigned int tx_irq;
+
+ spinlock_t lock; /* lock for transmit buffer */
+ unsigned char buf[BUF_SIZE]; /* transmit circular buffer */
+ unsigned int head; /* circular buffer head */
+ unsigned int tail; /* circular buffer tail */
+
+ int tx_irq_enabled; /* true == TX interrupt is enabled */
+};
+
+/* Array of byte channel objects */
+static struct ehv_bc_data *bcs;
+
+/* Byte channel handle for stdout (and stdin), taken from device tree */
+static unsigned int stdout_bc;
+
+/* Virtual IRQ for the byte channel handle for stdin, taken from device tree */
+static unsigned int stdout_irq;
+
+/**************************** SUPPORT FUNCTIONS ****************************/
+
+/*
+ * Enable the transmit interrupt
+ *
+ * Unlike a serial device, byte channels have no mechanism for disabling their
+ * own receive or transmit interrupts. To emulate that feature, we toggle
+ * the IRQ in the kernel.
+ *
+ * We cannot just blindly call enable_irq() or disable_irq(), because these
+ * calls are reference counted. This means that we cannot call enable_irq()
+ * if interrupts are already enabled. This can happen in two situations:
+ *
+ * 1. The tty layer makes two back-to-back calls to ehv_bc_tty_write()
+ * 2. A transmit interrupt occurs while executing ehv_bc_tx_dequeue()
+ *
+ * To work around this, we keep a flag to tell us if the IRQ is enabled or not.
+ */
+static void enable_tx_interrupt(struct ehv_bc_data *bc)
+{
+ if (!bc->tx_irq_enabled) {
+ enable_irq(bc->tx_irq);
+ bc->tx_irq_enabled = 1;
+ }
+}
+
+static void disable_tx_interrupt(struct ehv_bc_data *bc)
+{
+ if (bc->tx_irq_enabled) {
+ disable_irq_nosync(bc->tx_irq);
+ bc->tx_irq_enabled = 0;
+ }
+}
+
+/*
+ * find the byte channel handle to use for the console
+ *
+ * The byte channel to be used for the console is specified via a
+ * "stdout-path" property in the /chosen node.
+ *
+ * For compatibility with legacy device trees, we also check the "stdout" alias.
+ */
+static int find_console_handle(void)
+{
+ struct device_node *np, *np2;
+ const char *sprop = NULL;
+ const uint32_t *iprop;
+
+ np = of_find_node_by_path("/chosen");
+ if (np)
+ sprop = of_get_property(np, "stdout-path", NULL);
+
+ if (!np || !sprop) {
+ of_node_put(np);
+ np = of_find_node_by_name(NULL, "aliases");
+ if (np)
+ sprop = of_get_property(np, "stdout", NULL);
+ }
+
+ if (!sprop) {
+ of_node_put(np);
+ return 0;
+ }
+
+ /* We don't care what the aliased node is actually called. We only
+ * care if it's compatible with "epapr,hv-byte-channel", because that
+ * indicates that it's a byte channel node. We use a temporary
+ * variable, 'np2', because we can't release 'np' until we're done with
+ * 'sprop'.
+ */
+ np2 = of_find_node_by_path(sprop);
+ of_node_put(np);
+ np = np2;
+ if (!np) {
+ pr_warning("ehv-bc: stdout node '%s' does not exist\n", sprop);
+ return 0;
+ }
+
+ /* Is it a byte channel? */
+ if (!of_device_is_compatible(np, "epapr,hv-byte-channel")) {
+ of_node_put(np);
+ return 0;
+ }
+
+ stdout_irq = irq_of_parse_and_map(np, 0);
+ if (stdout_irq == NO_IRQ) {
+ pr_err("ehv-bc: no 'interrupts' property in %s node\n", sprop);
+ of_node_put(np);
+ return 0;
+ }
+
+ /*
+ * The 'hv-handle' property contains the handle for this byte channel.
+ */
+ iprop = of_get_property(np, "hv-handle", NULL);
+ if (!iprop) {
+ pr_err("ehv-bc: no 'hv-handle' property in %s node\n",
+ np->name);
+ of_node_put(np);
+ return 0;
+ }
+ stdout_bc = be32_to_cpu(*iprop);
+
+ of_node_put(np);
+ return 1;
+}
+
+/*************************** EARLY CONSOLE DRIVER ***************************/
+
+#ifdef CONFIG_PPC_EARLY_DEBUG_EHV_BC
+
+/*
+ * send a byte to a byte channel, wait if necessary
+ *
+ * This function sends a byte to a byte channel, and it waits and
+ * retries if the byte channel is full. It returns if the character
+ * has been sent, or if some error has occurred.
+ *
+ */
+static void byte_channel_spin_send(const char data)
+{
+ int ret, count;
+
+ do {
+ count = 1;
+ ret = ev_byte_channel_send(CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE,
+ &count, &data);
+ } while (ret == EV_EAGAIN);
+}
+
+/*
+ * The udbg subsystem calls this function to display a single character.
+ * We convert LF to CR/LF.
+ */
+static void ehv_bc_udbg_putc(char c)
+{
+ if (c == '\n')
+ byte_channel_spin_send('\r');
+
+ byte_channel_spin_send(c);
+}
+
+/*
+ * early console initialization
+ *
+ * PowerPC kernels support an early printk console, also known as udbg.
+ * This function must be called via the ppc_md.init_early function pointer.
+ * At this point, the device tree has been unflattened, so we can obtain the
+ * byte channel handle for stdout.
+ *
+ * We only support displaying of characters (putc). We do not support
+ * keyboard input.
+ */
+void __init udbg_init_ehv_bc(void)
+{
+ unsigned int rx_count, tx_count;
+ unsigned int ret;
+
+ /* Check if we're running as a guest of a hypervisor */
+ if (!(mfmsr() & MSR_GS))
+ return;
+
+ /* Verify the byte channel handle */
+ ret = ev_byte_channel_poll(CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE,
+ &rx_count, &tx_count);
+ if (ret)
+ return;
+
+ udbg_putc = ehv_bc_udbg_putc;
+ register_early_udbg_console();
+
+ udbg_printf("ehv-bc: early console using byte channel handle %u\n",
+ CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE);
+}
+
+#endif
+
+/****************************** CONSOLE DRIVER ******************************/
+
+static struct tty_driver *ehv_bc_driver;
+
+/*
+ * Byte channel console sending worker function.
+ *
+ * For consoles, if the output buffer is full, we should just spin until it
+ * clears.
+ */
+static int ehv_bc_console_byte_channel_send(unsigned int handle, const char *s,
+ unsigned int count)
+{
+ unsigned int len;
+ int ret = 0;
+
+ while (count) {
+ len = min_t(unsigned int, count, EV_BYTE_CHANNEL_MAX_BYTES);
+ do {
+ ret = ev_byte_channel_send(handle, &len, s);
+ } while (ret == EV_EAGAIN);
+ count -= len;
+ s += len;
+ }
+
+ return ret;
+}
+
+/*
+ * write a string to the console
+ *
+ * This function gets called to write a string from the kernel, typically from
+ * a printk(). This function spins until all data is written.
+ *
+ * We copy the data to a temporary buffer because we need to insert a \r in
+ * front of every \n. It's more efficient to copy the data to the buffer than
+ * it is to make multiple hcalls for each character or each newline.
+ */
+static void ehv_bc_console_write(struct console *co, const char *s,
+ unsigned int count)
+{
+ unsigned int handle = (unsigned int)co->data;
+ char s2[EV_BYTE_CHANNEL_MAX_BYTES];
+ unsigned int i, j = 0;
+ char c;
+
+ for (i = 0; i < count; i++) {
+ c = *s++;
+
+ if (c == '\n')
+ s2[j++] = '\r';
+
+ s2[j++] = c;
+ if (j >= (EV_BYTE_CHANNEL_MAX_BYTES - 1)) {
+ if (ehv_bc_console_byte_channel_send(handle, s2, j))
+ return;
+ j = 0;
+ }
+ }
+
+ if (j)
+ ehv_bc_console_byte_channel_send(handle, s2, j);
+}
+
+/*
+ * When /dev/console is opened, the kernel iterates the console list looking
+ * for one with ->device and then calls that method. On success, it expects
+ * the passed-in int* to contain the minor number to use.
+ */
+static struct tty_driver *ehv_bc_console_device(struct console *co, int *index)
+{
+ *index = co->index;
+
+ return ehv_bc_driver;
+}
+
+static struct console ehv_bc_console = {
+ .name = "ttyEHV",
+ .write = ehv_bc_console_write,
+ .device = ehv_bc_console_device,
+ .flags = CON_PRINTBUFFER | CON_ENABLED,
+};
+
+/*
+ * Console initialization
+ *
+ * This is the first function that is called after the device tree is
+ * available, so here is where we determine the byte channel handle and IRQ for
+ * stdout/stdin, even though that information is used by the tty and character
+ * drivers.
+ */
+static int __init ehv_bc_console_init(void)
+{
+ if (!find_console_handle()) {
+ pr_debug("ehv-bc: stdout is not a byte channel\n");
+ return -ENODEV;
+ }
+
+#ifdef CONFIG_PPC_EARLY_DEBUG_EHV_BC
+ /* Print a friendly warning if the user chose the wrong byte channel
+ * handle for udbg.
+ */
+ if (stdout_bc != CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE)
+ pr_warning("ehv-bc: udbg handle %u is not the stdout handle\n",
+ CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE);
+#endif
+
+ ehv_bc_console.data = (void *)stdout_bc;
+
+ /* add_preferred_console() must be called before register_console(),
+ otherwise it won't work. However, we don't want to enumerate all the
+ byte channels here, either, since we only care about one. */
+
+ add_preferred_console(ehv_bc_console.name, ehv_bc_console.index, NULL);
+ register_console(&ehv_bc_console);
+
+ pr_info("ehv-bc: registered console driver for byte channel %u\n",
+ stdout_bc);
+
+ return 0;
+}
+console_initcall(ehv_bc_console_init);
+
+/******************************** TTY DRIVER ********************************/
+
+/*
+ * byte channel receive interrupt handler
+ *
+ * This ISR is called whenever data is available on a byte channel.
+ */
+static irqreturn_t ehv_bc_tty_rx_isr(int irq, void *data)
+{
+ struct ehv_bc_data *bc = data;
+ struct tty_struct *ttys = bc->ttys;
+ unsigned int rx_count, tx_count, len;
+ int count;
+ char buffer[EV_BYTE_CHANNEL_MAX_BYTES];
+ int ret;
+
+ /* Find out how much data needs to be read, and then ask the TTY layer
+ * if it can handle that much. We want to ensure that every byte we
+ * read from the byte channel will be accepted by the TTY layer.
+ */
+ ev_byte_channel_poll(bc->handle, &rx_count, &tx_count);
+ count = tty_buffer_request_room(ttys, rx_count);
+
+ /* 'count' is the maximum amount of data the TTY layer can accept at
+ * this time. However, during testing, I was never able to get 'count'
+ * to be less than 'rx_count'. I'm not sure whether I'm calling it
+ * correctly.
+ */
+
+ while (count > 0) {
+ len = min_t(unsigned int, count, sizeof(buffer));
+
+ /* Read some data from the byte channel. This function will
+ * never return more than EV_BYTE_CHANNEL_MAX_BYTES bytes.
+ */
+ ev_byte_channel_receive(bc->handle, &len, buffer);
+
+ /* 'len' is now the amount of data that's been received. 'len'
+ * can't be zero, and most likely it's equal to one.
+ */
+
+ /* Pass the received data to the tty layer. Note that this
+ * function calls tty_buffer_request_room(), so I'm not sure if
+ * we should have also called tty_buffer_request_room().
+ */
+ ret = tty_insert_flip_string(ttys, buffer, len);
+
+ /* 'ret' is the number of bytes that the TTY layer accepted.
+ * If it's not equal to 'len', then it means the buffer is
+ * full, which should never happen. If it does happen, we can
+ * exit gracefully, but we drop the last 'len - ret' characters
+ * that we read from the byte channel.
+ */
+ if (ret != len)
+ break;
+
+ count -= len;
+ }
+
+ /* Tell the tty layer that we're done. */
+ tty_flip_buffer_push(ttys);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * dequeue the transmit buffer to the hypervisor
+ *
+ * This function, which can be called in interrupt context, dequeues as much
+ * data as possible from the transmit buffer to the byte channel.
+ */
+static void ehv_bc_tx_dequeue(struct ehv_bc_data *bc)
+{
+ unsigned int count;
+ unsigned int len, ret;
+ unsigned long flags;
+
+ do {
+ spin_lock_irqsave(&bc->lock, flags);
+ len = min_t(unsigned int,
+ CIRC_CNT_TO_END(bc->head, bc->tail, BUF_SIZE),
+ EV_BYTE_CHANNEL_MAX_BYTES);
+
+ ret = ev_byte_channel_send(bc->handle, &len, bc->buf + bc->tail);
+
+ /* 'len' is valid only if the return code is 0 or EV_EAGAIN */
+ if (!ret || (ret == EV_EAGAIN))
+ bc->tail = (bc->tail + len) & (BUF_SIZE - 1);
+
+ count = CIRC_CNT(bc->head, bc->tail, BUF_SIZE);
+ spin_unlock_irqrestore(&bc->lock, flags);
+ } while (count && !ret);
+
+ spin_lock_irqsave(&bc->lock, flags);
+ if (CIRC_CNT(bc->head, bc->tail, BUF_SIZE))
+ /*
+ * If we haven't emptied the buffer, then enable the TX IRQ.
+ * We'll get an interrupt when there's more room in the
+ * hypervisor's output buffer.
+ */
+ enable_tx_interrupt(bc);
+ else
+ disable_tx_interrupt(bc);
+ spin_unlock_irqrestore(&bc->lock, flags);
+}
+
+/*
+ * byte channel transmit interrupt handler
+ *
+ * This ISR is called whenever space becomes available for transmitting
+ * characters on a byte channel.
+ */
+static irqreturn_t ehv_bc_tty_tx_isr(int irq, void *data)
+{
+ struct ehv_bc_data *bc = data;
+
+ ehv_bc_tx_dequeue(bc);
+ tty_wakeup(bc->ttys);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * This function is called when the tty layer has data for us to send. We store
+ * the data first in a circular buffer, and then dequeue as much of that data
+ * as possible.
+ *
+ * We don't need to worry about whether there is enough room in the buffer for
+ * all the data. The purpose of ehv_bc_tty_write_room() is to tell the tty
+ * layer how much data it can safely send to us. We guarantee that
+ * ehv_bc_tty_write_room() will never lie, so the tty layer will never send us
+ * too much data.
+ */
+static int ehv_bc_tty_write(struct tty_struct *ttys, const unsigned char *s,
+ int count)
+{
+ struct ehv_bc_data *bc = ttys->driver_data;
+ unsigned long flags;
+ unsigned int len;
+ unsigned int written = 0;
+
+ while (1) {
+ spin_lock_irqsave(&bc->lock, flags);
+ len = CIRC_SPACE_TO_END(bc->head, bc->tail, BUF_SIZE);
+ if (count < len)
+ len = count;
+ if (len) {
+ memcpy(bc->buf + bc->head, s, len);
+ bc->head = (bc->head + len) & (BUF_SIZE - 1);
+ }
+ spin_unlock_irqrestore(&bc->lock, flags);
+ if (!len)
+ break;
+
+ s += len;
+ count -= len;
+ written += len;
+ }
+
+ ehv_bc_tx_dequeue(bc);
+
+ return written;
+}
+
+/* This function can be called multiple times for a given tty_struct, which is
+ * why we initialize bc->ttys in ehv_bc_tty_port_activate() instead.
+ *
+ * For some reason, the tty layer will still call this function even if the
+ * device was not registered (i.e. tty_register_device() was not called). So
+ * we need to check for that.
+ */
+static int ehv_bc_tty_open(struct tty_struct *ttys, struct file *filp)
+{
+ struct ehv_bc_data *bc = &bcs[ttys->index];
+
+ if (!bc->dev)
+ return -ENODEV;
+
+ return tty_port_open(&bc->port, ttys, filp);
+}
+
+/* Amazingly, if ehv_bc_tty_open() returns an error code, the tty layer will
+ * still call this function to close the tty device. So we can't assume that
+ * the tty port has been initialized.
+ */
+static void ehv_bc_tty_close(struct tty_struct *ttys, struct file *filp)
+{
+ struct ehv_bc_data *bc = &bcs[ttys->index];
+
+ if (bc->dev)
+ tty_port_close(&bc->port, ttys, filp);
+}
+
+/*
+ * return the amount of space in the output buffer
+ *
+ * This is actually a contract between the driver and the tty layer outlining
+ * how much write room the driver can guarantee will be sent OR BUFFERED. This
+ * driver MUST honor the return value.
+ */
+static int ehv_bc_tty_write_room(struct tty_struct *ttys)
+{
+ struct ehv_bc_data *bc = ttys->driver_data;
+ unsigned long flags;
+ int count;
+
+ spin_lock_irqsave(&bc->lock, flags);
+ count = CIRC_SPACE(bc->head, bc->tail, BUF_SIZE);
+ spin_unlock_irqrestore(&bc->lock, flags);
+
+ return count;
+}
+
+/*
+ * Stop sending data to the tty layer
+ *
+ * This function is called when the tty layer's input buffers are getting full,
+ * so the driver should stop sending it data. The easiest way to do this is to
+ * disable the RX IRQ, which will prevent ehv_bc_tty_rx_isr() from being
+ * called.
+ *
+ * The hypervisor will continue to queue up any incoming data. If there is any
+ * data in the queue when the RX interrupt is enabled, we'll immediately get an
+ * RX interrupt.
+ */
+static void ehv_bc_tty_throttle(struct tty_struct *ttys)
+{
+ struct ehv_bc_data *bc = ttys->driver_data;
+
+ disable_irq(bc->rx_irq);
+}
+
+/*
+ * Resume sending data to the tty layer
+ *
+ * This function is called after previously calling ehv_bc_tty_throttle(). The
+ * tty layer's input buffers now have more room, so the driver can resume
+ * sending it data.
+ */
+static void ehv_bc_tty_unthrottle(struct tty_struct *ttys)
+{
+ struct ehv_bc_data *bc = ttys->driver_data;
+
+ /* If there is any data in the queue when the RX interrupt is enabled,
+ * we'll immediately get an RX interrupt.
+ */
+ enable_irq(bc->rx_irq);
+}
+
+/*
+ * TTY driver operations
+ *
+ * If we could ask the hypervisor how much data is still in the TX buffer, or
+ * at least how big the TX buffers are, then we could implement the
+ * .wait_until_sent and .chars_in_buffer functions.
+ */
+static const struct tty_operations ehv_bc_ops = {
+ .open = ehv_bc_tty_open,
+ .close = ehv_bc_tty_close,
+ .write = ehv_bc_tty_write,
+ .write_room = ehv_bc_tty_write_room,
+ .throttle = ehv_bc_tty_throttle,
+ .unthrottle = ehv_bc_tty_unthrottle,
+};
+
+/*
+ * initialize the TTY port
+ *
+ * This function will only be called once, no matter how many times
+ * ehv_bc_tty_open() is called. That's why we register the ISR here, and also
+ * why we initialize tty_struct-related variables here.
+ */
+static int ehv_bc_tty_port_activate(struct tty_port *port,
+ struct tty_struct *ttys)
+{
+ struct ehv_bc_data *bc = container_of(port, struct ehv_bc_data, port);
+ int ret;
+
+ bc->ttys = ttys;
+ ttys->driver_data = bc;
+
+ ret = request_irq(bc->rx_irq, ehv_bc_tty_rx_isr, 0, "ehv-bc", bc);
+ if (ret < 0) {
+ dev_err(bc->dev, "could not request rx irq %u (ret=%i)\n",
+ bc->rx_irq, ret);
+ return ret;
+ }
+
+ /* request_irq also enables the IRQ */
+ bc->tx_irq_enabled = 1;
+
+ ret = request_irq(bc->tx_irq, ehv_bc_tty_tx_isr, 0, "ehv-bc", bc);
+ if (ret < 0) {
+ dev_err(bc->dev, "could not request tx irq %u (ret=%i)\n",
+ bc->tx_irq, ret);
+ free_irq(bc->rx_irq, bc);
+ return ret;
+ }
+
+ /* The TX IRQ is enabled only when we can't write all the data to the
+ * byte channel at once, so by default it's disabled.
+ */
+ disable_tx_interrupt(bc);
+
+ return 0;
+}
+
+static void ehv_bc_tty_port_shutdown(struct tty_port *port)
+{
+ struct ehv_bc_data *bc = container_of(port, struct ehv_bc_data, port);
+
+ free_irq(bc->tx_irq, bc);
+ free_irq(bc->rx_irq, bc);
+ bc->ttys = NULL;
+}
+
+static const struct tty_port_operations ehv_bc_tty_port_ops = {
+ .activate = ehv_bc_tty_port_activate,
+ .shutdown = ehv_bc_tty_port_shutdown,
+};
+
+static int __devinit ehv_bc_tty_probe(struct platform_device *pdev)
+{
+ struct device_node *np = pdev->dev.of_node;
+ struct ehv_bc_data *bc;
+ const uint32_t *iprop;
+ unsigned int handle;
+ int ret;
+ static unsigned int index = 1;
+ unsigned int i;
+
+ iprop = of_get_property(np, "hv-handle", NULL);
+ if (!iprop) {
+ dev_err(&pdev->dev, "no 'hv-handle' property in %s node\n",
+ np->name);
+ return -ENODEV;
+ }
+
+ /* We already told the console layer that the index for the console
+ * device is zero, so we need to make sure that we use that index when
+ * we probe the console byte channel node.
+ */
+ handle = be32_to_cpu(*iprop);
+ i = (handle == stdout_bc) ? 0 : index++;
+ bc = &bcs[i];
+
+ bc->handle = handle;
+ bc->head = 0;
+ bc->tail = 0;
+ spin_lock_init(&bc->lock);
+
+ bc->rx_irq = irq_of_parse_and_map(np, 0);
+ bc->tx_irq = irq_of_parse_and_map(np, 1);
+ if ((bc->rx_irq == NO_IRQ) || (bc->tx_irq == NO_IRQ)) {
+ dev_err(&pdev->dev, "no 'interrupts' property in %s node\n",
+ np->name);
+ ret = -ENODEV;
+ goto error;
+ }
+
+ bc->dev = tty_register_device(ehv_bc_driver, i, &pdev->dev);
+ if (IS_ERR(bc->dev)) {
+ ret = PTR_ERR(bc->dev);
+ dev_err(&pdev->dev, "could not register tty (ret=%i)\n", ret);
+ goto error;
+ }
+
+ tty_port_init(&bc->port);
+ bc->port.ops = &ehv_bc_tty_port_ops;
+
+ dev_set_drvdata(&pdev->dev, bc);
+
+ dev_info(&pdev->dev, "registered /dev/%s%u for byte channel %u\n",
+ ehv_bc_driver->name, i, bc->handle);
+
+ return 0;
+
+error:
+ irq_dispose_mapping(bc->tx_irq);
+ irq_dispose_mapping(bc->rx_irq);
+
+ memset(bc, 0, sizeof(struct ehv_bc_data));
+ return ret;
+}
+
+static int ehv_bc_tty_remove(struct platform_device *pdev)
+{
+ struct ehv_bc_data *bc = dev_get_drvdata(&pdev->dev);
+
+ tty_unregister_device(ehv_bc_driver, bc - bcs);
+
+ irq_dispose_mapping(bc->tx_irq);
+ irq_dispose_mapping(bc->rx_irq);
+
+ return 0;
+}
+
+static const struct of_device_id ehv_bc_tty_of_ids[] = {
+ { .compatible = "epapr,hv-byte-channel" },
+ {}
+};
+
+static struct platform_driver ehv_bc_tty_driver = {
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = "ehv-bc",
+ .of_match_table = ehv_bc_tty_of_ids,
+ },
+ .probe = ehv_bc_tty_probe,
+ .remove = ehv_bc_tty_remove,
+};
+
+/**
+ * ehv_bc_init - ePAPR hypervisor byte channel driver initialization
+ *
+ * This function is called when this module is loaded.
+ */
+static int __init ehv_bc_init(void)
+{
+ struct device_node *np;
+ unsigned int count = 0; /* Number of elements in bcs[] */
+ int ret;
+
+ pr_info("ePAPR hypervisor byte channel driver\n");
+
+ /* Count the number of byte channels */
+ for_each_compatible_node(np, NULL, "epapr,hv-byte-channel")
+ count++;
+
+ if (!count)
+ return -ENODEV;
+
+ /* The array index of an element in bcs[] is the same as the tty index
+ * for that element. If you know the address of an element in the
+ * array, then you can use pointer math (e.g. "bc - bcs") to get its
+ * tty index.
+ */
+ bcs = kzalloc(count * sizeof(struct ehv_bc_data), GFP_KERNEL);
+ if (!bcs)
+ return -ENOMEM;
+
+ ehv_bc_driver = alloc_tty_driver(count);
+ if (!ehv_bc_driver) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ ehv_bc_driver->owner = THIS_MODULE;
+ ehv_bc_driver->driver_name = "ehv-bc";
+ ehv_bc_driver->name = ehv_bc_console.name;
+ ehv_bc_driver->type = TTY_DRIVER_TYPE_CONSOLE;
+ ehv_bc_driver->subtype = SYSTEM_TYPE_CONSOLE;
+ ehv_bc_driver->init_termios = tty_std_termios;
+ ehv_bc_driver->flags = TTY_DRIVER_REAL_RAW | TTY_DRIVER_DYNAMIC_DEV;
+ tty_set_operations(ehv_bc_driver, &ehv_bc_ops);
+
+ ret = tty_register_driver(ehv_bc_driver);
+ if (ret) {
+ pr_err("ehv-bc: could not register tty driver (ret=%i)\n", ret);
+ goto error;
+ }
+
+ ret = platform_driver_register(&ehv_bc_tty_driver);
+ if (ret) {
+ pr_err("ehv-bc: could not register platform driver (ret=%i)\n",
+ ret);
+ goto error;
+ }
+
+ return 0;
+
+error:
+ if (ehv_bc_driver) {
+ tty_unregister_driver(ehv_bc_driver);
+ put_tty_driver(ehv_bc_driver);
+ }
+
+ kfree(bcs);
+
+ return ret;
+}
+
+
+/**
+ * ehv_bc_exit - ePAPR hypervisor byte channel driver termination
+ *
+ * This function is called when this driver is unloaded.
+ */
+static void __exit ehv_bc_exit(void)
+{
+ tty_unregister_driver(ehv_bc_driver);
+ put_tty_driver(ehv_bc_driver);
+ kfree(bcs);
+}
+
+module_init(ehv_bc_init);
+module_exit(ehv_bc_exit);
+
+MODULE_AUTHOR("Timur Tabi <[email protected]>");
+MODULE_DESCRIPTION("ePAPR hypervisor byte channel driver");
+MODULE_LICENSE("GPL v2");
--
1.7.3.4
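A note on the transmit path in the byte channel driver above:
ehv_bc_tty_write() and ehv_bc_tx_dequeue() depend on BUF_SIZE being a power
of two, so that head and tail can wrap with a mask instead of a modulo. The
following is a minimal user-space sketch of that arithmetic, not the driver
code itself. The names struct ring, ring_count(), ring_space(),
ring_space_to_end(), ring_put() and ring_get() are made up for illustration,
the buffer is shrunk to 8 bytes so the wrap-around is easy to follow, and the
helpers are written on the assumption that they compute the same values as
the kernel's CIRC_CNT, CIRC_SPACE and CIRC_SPACE_TO_END macros. Locking is
left out entirely; the driver takes bc->lock around the equivalent steps.

```c
#include <assert.h>
#include <string.h>

/* Illustrative size only; the driver uses 2048. Must be a power of two. */
#define BUF_SIZE 8
static_assert((BUF_SIZE & (BUF_SIZE - 1)) == 0, "BUF_SIZE must be 2^n");

/* Hypothetical stand-in for the buf/head/tail fields of struct ehv_bc_data */
struct ring {
	unsigned char buf[BUF_SIZE];
	unsigned int head;	/* next slot to be written */
	unsigned int tail;	/* next slot to be read */
};

/* Number of queued bytes; unsigned wrap plus the mask handles head < tail */
static unsigned int ring_count(const struct ring *r)
{
	return (r->head - r->tail) & (BUF_SIZE - 1);
}

/* Free bytes; one slot always stays empty so head == tail means "empty" */
static unsigned int ring_space(const struct ring *r)
{
	return (r->tail - (r->head + 1)) & (BUF_SIZE - 1);
}

/* Free bytes that are contiguous starting at head (no wrap needed) */
static unsigned int ring_space_to_end(const struct ring *r)
{
	unsigned int end = BUF_SIZE - 1 - r->head;
	unsigned int n = (end + r->tail) & (BUF_SIZE - 1);

	return n <= end ? n : end + 1;
}

/* Queue up to 'len' bytes in contiguous chunks; returns the number accepted */
static unsigned int ring_put(struct ring *r, const unsigned char *s,
			     unsigned int len)
{
	unsigned int written = 0;

	for (;;) {
		unsigned int n = ring_space_to_end(r);

		if (len - written < n)
			n = len - written;
		if (!n)
			break;

		memcpy(r->buf + r->head, s + written, n);
		r->head = (r->head + n) & (BUF_SIZE - 1);
		written += n;
	}

	return written;
}

/* Dequeue up to 'len' bytes; returns the number copied to 'out' */
static unsigned int ring_get(struct ring *r, unsigned char *out,
			     unsigned int len)
{
	unsigned int n = 0;

	while (n < len && ring_count(r)) {
		out[n++] = r->buf[r->tail];
		r->tail = (r->tail + 1) & (BUF_SIZE - 1);
	}

	return n;
}
```

With an 8-byte buffer, queuing ten bytes into an empty ring accepts only
seven, because one slot is always left free so that head == tail
unambiguously means "empty". For the same reason, a write_room-style query
built on this arithmetic can report at most BUF_SIZE - 1 bytes.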
The Freescale hypervisor management driver provides several services to
drivers and applications related to the Freescale hypervisor:
1. An ioctl interface for querying and managing partitions
2. A file interface for reading incoming doorbells
3. An interrupt handler for shutting down the partition upon receiving the
shutdown doorbell from a manager partition
4. An interface for receiving callbacks when a managed partition shuts down.
Signed-off-by: Timur Tabi <[email protected]>
---
drivers/misc/Kconfig | 7 +
drivers/misc/Makefile | 1 +
drivers/misc/fsl_hypervisor.c | 941 ++++++++++++++++++++++++++++++++++++++++
include/linux/fsl_hypervisor.h | 203 +++++++++
4 files changed, 1152 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/fsl_hypervisor.c
create mode 100644 include/linux/fsl_hypervisor.h
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 4e007c6..4aa6032 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -216,6 +216,13 @@ config ENCLOSURE_SERVICES
driver (SCSI/ATA) which supports enclosures
or a SCSI enclosure device (SES) to use these services.
+config FSL_HV_MANAGER
+ tristate "Freescale hypervisor management driver"
+ depends on FSL_SOC
+ help
+ This driver allows applications to communicate with the Freescale
+ Hypervisor.
+
config SGI_XP
tristate "Support communication between SGI SSIs"
depends on NET
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index f546860..2c23e34 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_SENSORS_BH1770) += bh1770glc.o
obj-$(CONFIG_SENSORS_APDS990X) += apds990x.o
obj-$(CONFIG_SGI_IOC4) += ioc4.o
obj-$(CONFIG_ENCLOSURE_SERVICES) += enclosure.o
+obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
obj-$(CONFIG_KGDB_TESTS) += kgdbts.o
obj-$(CONFIG_SGI_XP) += sgi-xp/
obj-$(CONFIG_SGI_GRU) += sgi-gru/
diff --git a/drivers/misc/fsl_hypervisor.c b/drivers/misc/fsl_hypervisor.c
new file mode 100644
index 0000000..a03aa7b
--- /dev/null
+++ b/drivers/misc/fsl_hypervisor.c
@@ -0,0 +1,941 @@
+/** @file
+ * Freescale Hypervisor Management Driver
+ *
+ * This driver contains functions to support the Freescale hypervisor.
+ */
+/* Copyright (C) 2008-2010 Freescale Semiconductor, Inc.
+ * Author: Timur Tabi <[email protected]>
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/poll.h>
+#include <linux/of.h>
+#include <linux/reboot.h>
+#include <linux/uaccess.h>
+#include <linux/notifier.h>
+
+#include <linux/io.h>
+#include <asm/fsl_hcalls.h>
+
+#include <linux/fsl_hypervisor.h>
+
+static BLOCKING_NOTIFIER_HEAD(failover_subscribers);
+
+/**
+ * ioctl_restart: ioctl interface for FSL_HV_IOCTL_PARTITION_RESTART
+ *
+ * Restart a running partition
+ */
+static long ioctl_restart(struct fsl_hv_ioctl_restart __user *p)
+{
+ struct fsl_hv_ioctl_restart param;
+
+ /* Get the parameters from the user */
+ if (copy_from_user(&param, p, sizeof(struct fsl_hv_ioctl_restart)))
+ return -EFAULT;
+
+ param.ret = fh_partition_restart(param.partition);
+
+ if (copy_to_user(&p->ret, &param.ret, sizeof(__u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+/**
+ * ioctl_status: ioctl interface for FSL_HV_IOCTL_PARTITION_STATUS
+ *
+ * Query the status of a partition
+ */
+static long ioctl_status(struct fsl_hv_ioctl_status __user *p)
+{
+ struct fsl_hv_ioctl_status param;
+ u32 status;
+
+ /* Get the parameters from the user */
+ if (copy_from_user(&param, p, sizeof(struct fsl_hv_ioctl_status)))
+ return -EFAULT;
+
+ param.ret = fh_partition_get_status(param.partition, &status);
+ if (!param.ret)
+ param.status = status;
+
+ if (copy_to_user(p, &param, sizeof(struct fsl_hv_ioctl_status)))
+ return -EFAULT;
+
+ return 0;
+}
+
+/**
+ * ioctl_start: ioctl interface for FSL_HV_IOCTL_PARTITION_START
+ *
+ * Start a stopped partition.
+ */
+static long ioctl_start(struct fsl_hv_ioctl_start __user *p)
+{
+ struct fsl_hv_ioctl_start param;
+
+ /* Get the parameters from the user */
+ if (copy_from_user(&param, p, sizeof(struct fsl_hv_ioctl_start)))
+ return -EFAULT;
+
+ param.ret = fh_partition_start(param.partition, param.entry_point,
+ param.load);
+
+ if (copy_to_user(&p->ret, &param.ret, sizeof(__u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+/**
+ * ioctl_stop: ioctl interface for FSL_HV_IOCTL_PARTITION_STOP
+ *
+ * Stop a running partition
+ */
+static long ioctl_stop(struct fsl_hv_ioctl_stop __user *p)
+{
+ struct fsl_hv_ioctl_stop param;
+
+ /* Get the parameters from the user */
+ if (copy_from_user(&param, p, sizeof(struct fsl_hv_ioctl_stop)))
+ return -EFAULT;
+
+ param.ret = fh_partition_stop(param.partition);
+
+ if (copy_to_user(&p->ret, &param.ret, sizeof(__u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+/**
+ * ioctl_memcpy: ioctl interface for FSL_HV_IOCTL_MEMCPY
+ *
+ * The FH_MEMCPY hypercall takes an array of address/address/size structures
+ * to represent the data being copied. As a convenience to the user, this
+ * ioctl takes a user-created buffer and a pointer to a guest physically
+ * contiguous buffer in the remote partition, and creates the
+ * address/address/size array for the hypercall.
+ */
+static long ioctl_memcpy(struct fsl_hv_ioctl_memcpy __user *p)
+{
+ struct fsl_hv_ioctl_memcpy param;
+
+ struct page **pages = NULL;
+ void *sg_list_unaligned = NULL;
+ struct fh_sg_list *sg_list = NULL;
+
+ unsigned int nr_pages;
+ unsigned long lb_offset; /* Offset within a page of the local buffer */
+
+ unsigned int i;
+ long ret = 0;
+ phys_addr_t remote_paddr; /* The next address in the remote buffer */
+ uint32_t count; /* The number of bytes left to copy */
+
+ /* Get the parameters from the user */
+ if (copy_from_user(&param, p, sizeof(struct fsl_hv_ioctl_memcpy)))
+ return -EFAULT;
+
+ /* One partition must be local, the other must be remote. In other
+ words, if source and target are both -1, or are both not -1, then
+ return an error. */
+ if ((param.source == -1) == (param.target == -1))
+ return -EINVAL;
+
+ /*
+ * The array of pages returned by get_user_pages() covers only
+ * page-aligned memory. Since the user buffer is probably not
+ * page-aligned, we need to handle the discrepancy.
+ *
+ * We calculate the offset within a page of the S/G list, and make
+ * adjustments accordingly. This will result in a page list that looks
+ * like this:
+ *
+ * ---- <-- first page starts before the buffer
+ * | |
+ * |////|-> ----
+ * |////| | |
+ * ---- | |
+ * | |
+ * ---- | |
+ * |////| | |
+ * |////| | |
+ * |////| | |
+ * ---- | |
+ * | |
+ * ---- | |
+ * |////| | |
+ * |////| | |
+ * |////| | |
+ * ---- | |
+ * | |
+ * ---- | |
+ * |////| | |
+ * |////|-> ----
+ * | | <-- last page ends after the buffer
+ * ----
+ *
+ * The distance between the start of the first page and the start of the
+ * buffer is lb_offset. The hashed (///) areas are the parts of the
+ * page list that contain the actual buffer.
+ *
+ * The advantage of this approach is that the number of pages is
+ * equal to the number of entries in the S/G list that we give to the
+ * hypervisor.
+ */
+ lb_offset = param.local_vaddr & (PAGE_SIZE - 1);
+ nr_pages = (param.count + lb_offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+ /* Allocate the buffers we need */
+
+ /* pages is an array of struct page pointers that's initialized by
+ get_user_pages() */
+ pages = kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL);
+ if (!pages) {
+ pr_debug("fsl-hv: could not allocate page list\n");
+ return -ENOMEM;
+ }
+
+ /* sg_list is the list of fh_sg_list objects that we pass to the
+ hypervisor */
+ sg_list_unaligned = kmalloc(nr_pages * sizeof(struct fh_sg_list) +
+ sizeof(struct fh_sg_list) - 1, GFP_KERNEL);
+ if (!sg_list_unaligned) {
+ pr_debug("fsl-hv: could not allocate S/G list\n");
+ /* Go through the exit path so that 'pages' is freed */
+ ret = -ENOMEM;
+ goto exit;
+ }
+ sg_list = PTR_ALIGN(sg_list_unaligned, sizeof(struct fh_sg_list));
+
+ /* Get the physical addresses of the source buffer */
+ down_read(&current->mm->mmap_sem);
+ ret = get_user_pages(current, current->mm,
+ param.local_vaddr - lb_offset, nr_pages,
+ (param.source == -1) ? READ : WRITE,
+ 0, pages, NULL);
+ up_read(&current->mm->mmap_sem);
+
+ if (ret != nr_pages) {
+ /* get_user_pages() failed */
+ pr_debug("fsl-hv: could not lock source buffer\n");
+ ret = -EACCES;
+ goto exit;
+ }
+
+ /* reset ret here */
+ ret = 0;
+
+ /* Build the fh_sg_list[] array. The first page is special
+ because it's misaligned. */
+ if (param.source == -1) {
+ sg_list[0].source = page_to_phys(pages[0]) + lb_offset;
+ sg_list[0].target = param.remote_paddr;
+ } else {
+ sg_list[0].source = param.remote_paddr;
+ sg_list[0].target = page_to_phys(pages[0]) + lb_offset;
+ }
+ sg_list[0].size = min_t(uint64_t, param.count, PAGE_SIZE - lb_offset);
+
+ remote_paddr = param.remote_paddr + sg_list[0].size;
+ count = param.count - sg_list[0].size;
+
+ for (i = 1; i < nr_pages; i++) {
+ if (param.source == -1) {
+ /* local to remote */
+ sg_list[i].source = page_to_phys(pages[i]);
+ sg_list[i].target = remote_paddr;
+ } else {
+ /* remote to local */
+ sg_list[i].source = remote_paddr;
+ sg_list[i].target = page_to_phys(pages[i]);
+ }
+ sg_list[i].size = min_t(uint64_t, count, PAGE_SIZE);
+
+ remote_paddr += sg_list[i].size;
+ count -= sg_list[i].size;
+ }
+
+ param.ret = fh_partition_memcpy(param.source, param.target,
+ virt_to_phys(sg_list), nr_pages);
+
+exit:
+ if (pages) {
+ for (i = 0; i < nr_pages; i++)
+ if (pages[i])
+ page_cache_release(pages[i]);
+ }
+
+ kfree(sg_list_unaligned);
+ kfree(pages);
+
+ if (!ret)
+ if (copy_to_user(&p->ret, &param.ret, sizeof(__u32)))
+ return -EFAULT;
+
+ return ret;
+}
+
+/**
+ * ioctl_doorbell: ioctl interface for FSL_HV_IOCTL_DOORBELL
+ *
+ * Ring a doorbell
+ */
+static long ioctl_doorbell(struct fsl_hv_ioctl_doorbell __user *p)
+{
+ struct fsl_hv_ioctl_doorbell param;
+
+ /* Get the parameters from the user */
+ if (copy_from_user(&param, p, sizeof(struct fsl_hv_ioctl_doorbell)))
+ return -EFAULT;
+
+ param.ret = ev_doorbell_send(param.doorbell);
+
+ if (copy_to_user(&p->ret, &param.ret, sizeof(__u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static char *strdup_from_user(const char __user *ustr, size_t max)
+{
+ size_t len;
+ char *str;
+
+ len = strnlen_user(ustr, max);
+ if (len > max)
+ return ERR_PTR(-ENAMETOOLONG);
+
+ str = kmalloc(len, GFP_KERNEL);
+ if (!str)
+ return ERR_PTR(-ENOMEM);
+
+ if (copy_from_user(str, ustr, len))
+ return ERR_PTR(-EFAULT);
+
+ return str;
+}
+
+static long ioctl_dtprop(struct fsl_hv_ioctl_prop __user *p, int set)
+{
+ struct fsl_hv_ioctl_prop param;
+ char __user *upath, *upropname;
+ void __user *upropval;
+ char *path = NULL, *propname = NULL;
+ void *propval = NULL;
+ int ret = 0;
+
+ /* Get the parameters from the user */
+ if (copy_from_user(&param, p, sizeof(struct fsl_hv_ioctl_prop)))
+ return -EFAULT;
+
+ upath = (char __user *)(uintptr_t)param.path;
+ upropname = (char __user *)(uintptr_t)param.propname;
+ upropval = (void __user *)(uintptr_t)param.propval;
+
+ path = strdup_from_user(upath, FH_DTPROP_MAX_PATHLEN);
+ if (IS_ERR(path)) {
+ ret = PTR_ERR(path);
+ goto out;
+ }
+
+ propname = strdup_from_user(upropname, FH_DTPROP_MAX_PATHLEN);
+ if (IS_ERR(propname)) {
+ ret = PTR_ERR(propname);
+ goto out;
+ }
+
+ if (param.proplen > FH_DTPROP_MAX_PROPLEN) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ propval = kmalloc(param.proplen, GFP_KERNEL);
+ if (!propval) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (set) {
+ if (copy_from_user(propval, upropval, param.proplen)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ param.ret = fh_partition_set_dtprop(param.handle,
+ virt_to_phys(path),
+ virt_to_phys(propname),
+ virt_to_phys(propval),
+ param.proplen);
+ } else {
+ param.ret = fh_partition_get_dtprop(param.handle,
+ virt_to_phys(path),
+ virt_to_phys(propname),
+ virt_to_phys(propval),
+ &param.proplen);
+
+ if (param.ret == 0) {
+ if (copy_to_user(upropval, propval, param.proplen) ||
+ put_user(param.proplen, &p->proplen)) {
+ ret = -EFAULT;
+ goto out;
+ }
+ }
+ }
+
+ if (put_user(param.ret, &p->ret))
+ ret = -EFAULT;
+
+out:
+ kfree(path);
+ kfree(propval);
+ kfree(propname);
+
+ return ret;
+}
+
+/**
+ * fsl_hv_ioctl: ioctl main entry point
+ */
+static long fsl_hv_ioctl(struct file *file, unsigned int cmd,
+ unsigned long argaddr)
+{
+ union fsl_hv_ioctl_param __user *arg =
+ (union fsl_hv_ioctl_param __user *)argaddr;
+ long ret;
+
+ /* Make sure the application is calling the right driver. */
+ if (_IOC_TYPE(cmd) != 0) {
+ pr_debug("fsl-hv: ioctl type %u should be 0\n", _IOC_TYPE(cmd));
+ return -EINVAL;
+ }
+
+ /* Make sure the application set the direction flag correctly. */
+ if (_IOC_DIR(cmd) != (_IOC_READ | _IOC_WRITE)) {
+ pr_debug("fsl-hv: ioctl direction should be _IOWR\n");
+ return -EINVAL;
+ }
+
+ /* Make sure the application is passing the right structure to us. */
+ if (_IOC_SIZE(cmd) < sizeof(union fsl_hv_ioctl_param)) {
+ pr_debug("fsl-hv: ioctl size %u is too small (should be %zu)\n",
+ _IOC_SIZE(cmd), sizeof(union fsl_hv_ioctl_param));
+ return -EINVAL;
+ }
+
+ switch (_IOC_NR(cmd)) {
+ case FSL_HV_IOCTL_PARTITION_RESTART:
+ ret = ioctl_restart(&arg->restart);
+ break;
+ case FSL_HV_IOCTL_PARTITION_GET_STATUS:
+ ret = ioctl_status(&arg->status);
+ break;
+ case FSL_HV_IOCTL_PARTITION_START:
+ ret = ioctl_start(&arg->start);
+ break;
+ case FSL_HV_IOCTL_PARTITION_STOP:
+ ret = ioctl_stop(&arg->stop);
+ break;
+ case FSL_HV_IOCTL_MEMCPY:
+ ret = ioctl_memcpy(&arg->memcpy);
+ break;
+ case FSL_HV_IOCTL_DOORBELL:
+ ret = ioctl_doorbell(&arg->doorbell);
+ break;
+ case FSL_HV_IOCTL_GETPROP:
+ ret = ioctl_dtprop(&arg->prop, 0);
+ break;
+ case FSL_HV_IOCTL_SETPROP:
+ ret = ioctl_dtprop(&arg->prop, 1);
+ break;
+ default:
+ pr_debug("fsl-hv: unknown ioctl %u\n", cmd);
+ ret = -ENOIOCTLCMD;
+ break;
+ }
+
+ return ret;
+}
+
+/* Linked list of processes that have us open */
+struct list_head db_list;
+
+/* spinlock for db_list */
+static DEFINE_SPINLOCK(db_list_lock);
+
+/* The size of the doorbell event queue. This must be a power of two. */
+#define QSIZE 16
+
+/* Returns the next head/tail pointer, wrapping around the queue if necessary */
+#define nextp(x) (((x) + 1) & (QSIZE - 1))
+
+/* Per-open data structure */
+struct doorbell_queue {
+ struct list_head list;
+ spinlock_t lock;
+ wait_queue_head_t wait;
+ unsigned int head;
+ unsigned int tail;
+ uint32_t q[QSIZE];
+};
+
+/* Linked list of ISRs that we registered */
+struct list_head isr_list;
+
+/* Per-ISR data structure */
+struct doorbell_isr {
+ struct list_head list;
+ unsigned int irq;
+ uint32_t doorbell; /* The doorbell handle */
+ uint32_t partition; /* The partition handle, if used */
+ struct work_struct work;
+};
+
+/**
+ * fsl_hv_isr - interrupt handler for all doorbells
+ * @param irq - the IRQ (a.k.a. receive handle)
+ *
+ * We use the same interrupt handler for all doorbells. Whenever a doorbell
+ * is rung, and we receive an interrupt, we just put the handle for that
+ * doorbell (passed to us as *data) into all of the queues.
+ *
+ */
+static irqreturn_t fsl_hv_isr(int irq, void *data)
+{
+ struct doorbell_queue *dbq;
+ unsigned long flags;
+
+ /* Prevent another core from modifying db_list */
+ spin_lock_irqsave(&db_list_lock, flags);
+
+ list_for_each_entry(dbq, &db_list, list) {
+ if (dbq->head != nextp(dbq->tail)) {
+ dbq->q[dbq->tail] = (uint32_t) (uintptr_t) data;
+ /* This memory barrier eliminates the need to grab
+ * the spinlock for dbq.
+ */
+ smp_wmb();
+ dbq->tail = nextp(dbq->tail);
+ wake_up_interruptible(&dbq->wait);
+ }
+ }
+
+ spin_unlock_irqrestore(&db_list_lock, flags);
+
+ return IRQ_HANDLED;
+}
+
+/**
+ * fsl_hv_state_change_work_func -- state change worker function
+ *
+ * The state change notification arrives in an interrupt, but we can't call
+ * blocking_notifier_call_chain() in an interrupt handler. We could call
+ * atomic_notifier_call_chain(), but that would require the clients' call-back
+ * function to run in interrupt context. Since we don't want to impose that
+ * restriction on the clients, we create a work queue to process the
+ * notification in kernel context.
+ */
+static void fsl_hv_state_change_work_func(struct work_struct *work)
+{
+ struct doorbell_isr *dbisr =
+ container_of(work, struct doorbell_isr, work);
+
+ blocking_notifier_call_chain(&failover_subscribers, dbisr->partition,
+ NULL);
+}
+
+/**
+ * fsl_hv_state_change_isr - interrupt handler for state-change doorbells
+ */
+static irqreturn_t fsl_hv_state_change_isr(int irq, void *data)
+{
+ unsigned int status;
+ struct doorbell_isr *dbisr = data;
+ int ret;
+
+ /* Determine the new state, and if it's stopped, notify the clients. */
+ ret = fh_partition_get_status(dbisr->partition, &status);
+ if (!ret && (status == FH_PARTITION_STOPPED))
+ schedule_work(&dbisr->work);
+
+ /* Call the normal handler */
+ return fsl_hv_isr(irq, (void *) (uintptr_t) dbisr->doorbell);
+}
+
+/**
+ * fsl_hv_poll - returns a bitmask indicating whether a read will block
+ *
+ * @return unsigned int
+ */
+static unsigned int fsl_hv_poll(struct file *filp, struct poll_table_struct *p)
+{
+ struct doorbell_queue *dbq = filp->private_data;
+ unsigned long flags;
+ unsigned int mask;
+
+ spin_lock_irqsave(&dbq->lock, flags);
+
+ poll_wait(filp, &dbq->wait, p);
+ mask = (dbq->head == dbq->tail) ? 0 : (POLLIN | POLLRDNORM);
+
+ spin_unlock_irqrestore(&dbq->lock, flags);
+
+ return mask;
+}
+
+/**
+ * fsl_hv_read - return the handles for any incoming doorbells
+ *
+ * If there are doorbell handles in the queue for this open instance, then
+ * return them to the caller as an array of 32-bit integers. Otherwise,
+ * block until there is at least one handle to return.
+ */
+static ssize_t fsl_hv_read(struct file *filp, char __user *buf, size_t len,
+ loff_t *off)
+{
+ struct doorbell_queue *dbq = filp->private_data;
+ uint32_t __user *p = (uint32_t __user *) buf; /* for put_user() */
+ unsigned long flags;
+ ssize_t count = 0;
+
+ /* Make sure we stop when the user buffer is full. */
+ while (len >= sizeof(uint32_t)) {
+ uint32_t dbell; /* Local copy of doorbell queue data */
+
+ spin_lock_irqsave(&dbq->lock, flags);
+
+ /* If the queue is empty, then either we're done or we need
+ * to block. If the application specified O_NONBLOCK, then
+ * we return the appropriate error code.
+ */
+ if (dbq->head == dbq->tail) {
+ spin_unlock_irqrestore(&dbq->lock, flags);
+ if (count)
+ break;
+ if (filp->f_flags & O_NONBLOCK)
+ return -EAGAIN;
+ if (wait_event_interruptible(dbq->wait,
+ dbq->head != dbq->tail))
+ return -ERESTARTSYS;
+ continue;
+ }
+
+ /* Even though we have an smp_wmb() in the ISR, the core
+ * might speculatively execute the "dbell = ..." below while
+ * it's evaluating the if-statement above. In that case, the
+ * value put into dbell could be stale if the core accepts the
+ * speculation. To prevent that, we need a read memory barrier
+ * here as well.
+ */
+ smp_rmb();
+
+ /* Copy the data to a temporary local buffer, because
+ * we can't call copy_to_user() from inside a spinlock
+ */
+ dbell = dbq->q[dbq->head];
+ dbq->head = nextp(dbq->head);
+
+ spin_unlock_irqrestore(&dbq->lock, flags);
+
+ if (put_user(dbell, p))
+ return -EFAULT;
+ p++;
+ count += sizeof(uint32_t);
+ len -= sizeof(uint32_t);
+ }
+
+ return count;
+}
+
+/**
+ * fsl_hv_open - open the driver
+ *
+ * Open the driver and prepare for reading doorbells.
+ *
+ * Every time an application opens the driver, we create a doorbell queue
+ * for that file handle. This queue is used for any incoming doorbells.
+ */
+static int fsl_hv_open(struct inode *inode, struct file *filp)
+{
+ struct doorbell_queue *dbq;
+ unsigned long flags;
+ int ret = 0;
+
+ dbq = kzalloc(sizeof(struct doorbell_queue), GFP_KERNEL);
+ if (!dbq) {
+ pr_err("fsl-hv: out of memory\n");
+ return -ENOMEM;
+ }
+
+ spin_lock_init(&dbq->lock);
+ init_waitqueue_head(&dbq->wait);
+
+ spin_lock_irqsave(&db_list_lock, flags);
+ list_add(&dbq->list, &db_list);
+ spin_unlock_irqrestore(&db_list_lock, flags);
+
+ filp->private_data = dbq;
+
+ return ret;
+}
+
+/**
+ * fsl_hv_close - close the driver
+ */
+static int fsl_hv_close(struct inode *inode, struct file *filp)
+{
+ struct doorbell_queue *dbq = filp->private_data;
+ unsigned long flags;
+
+ int ret = 0;
+
+ spin_lock_irqsave(&db_list_lock, flags);
+ list_del(&dbq->list);
+ spin_unlock_irqrestore(&db_list_lock, flags);
+
+ kfree(dbq);
+
+ return ret;
+}
+
+static const struct file_operations fsl_hv_fops = {
+ .owner = THIS_MODULE,
+ .open = fsl_hv_open,
+ .release = fsl_hv_close,
+ .poll = fsl_hv_poll,
+ .read = fsl_hv_read,
+ .unlocked_ioctl = fsl_hv_ioctl,
+};
+
+static struct miscdevice fsl_hv_misc_dev = {
+ MISC_DYNAMIC_MINOR,
+ "fsl-hv",
+ &fsl_hv_fops
+};
+
+static DECLARE_WORK(power_off, (work_func_t) kernel_power_off);
+
+static irqreturn_t fsl_hv_shutdown_isr(int irq, void *data)
+{
+ schedule_work(&power_off);
+
+ /* We should never get here */
+ return IRQ_NONE;
+}
+
+/**
+ * get_parent_handle -- returns the handle of the parent of the given node
+ *
+ * The handle is the value of the 'reg' property
+ */
+static int get_parent_handle(struct device_node *np)
+{
+ struct device_node *parent;
+ const uint32_t *prop;
+ int len;
+
+ parent = of_get_parent(np);
+ if (!parent)
+ /* It's not really possible for this to fail */
+ return -ENODEV;
+
+ prop = of_get_property(parent, "reg", &len);
+ of_node_put(parent);
+
+ if (!prop || (len != sizeof(uint32_t)))
+ /* This can happen only if the node is malformed */
+ return -ENODEV;
+
+ return *prop;
+}
+
+/**
+ * fsl_hv_failover_register -- register a callback for failover events
+ *
+ * This function is called by device drivers to register their callback
+ * functions for fail-over events.
+ */
+int fsl_hv_failover_register(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&failover_subscribers, nb);
+}
+EXPORT_SYMBOL(fsl_hv_failover_register);
+
+/**
+ * fsl_hv_failover_unregister -- unregister a callback for failover events
+ */
+int fsl_hv_failover_unregister(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&failover_subscribers, nb);
+}
+EXPORT_SYMBOL(fsl_hv_failover_unregister);
+
+/**
+ * has_fsl_hypervisor - return TRUE if we're running under FSL hypervisor
+ *
+ * This function checks to see if we're running under the Freescale
+ * hypervisor, and returns zero if we're not, or non-zero if we are.
+ *
+ * First, it checks if MSR[GS]==1, which means we're running under some
+ * hypervisor. Then it checks if there is a hypervisor node in the device
+ * tree. Currently, that means there needs to be a node in the root called
+ * "hypervisor" and which has a property named "fsl,hv-version".
+ */
+static int has_fsl_hypervisor(void)
+{
+ struct device_node *node;
+ int ret;
+
+ if (!(mfmsr() & MSR_GS))
+ return 0;
+
+ node = of_find_node_by_path("/hypervisor");
+ if (!node)
+ return 0;
+
+ ret = of_find_property(node, "fsl,hv-version", NULL) != NULL;
+
+ of_node_put(node);
+
+ return ret;
+}
+
+/**
+ * fsl_hypervisor_init: Freescale hypervisor management driver init
+ *
+ * This function is called when this module is loaded.
+ *
+ * Register ourselves as a miscellaneous driver. This will register the
+ * fops structure and create the right sysfs entries for udev.
+ */
+static int __init fsl_hypervisor_init(void)
+{
+ struct device_node *np;
+ struct doorbell_isr *dbisr, *n;
+ int ret;
+
+ pr_info("Freescale hypervisor management driver\n");
+
+ if (!has_fsl_hypervisor()) {
+ pr_info("fsl-hv: no hypervisor found\n");
+ return -ENODEV;
+ }
+
+ ret = misc_register(&fsl_hv_misc_dev);
+ if (ret) {
+ pr_err("fsl-hv: cannot register device\n");
+ return ret;
+ }
+
+ INIT_LIST_HEAD(&db_list);
+ INIT_LIST_HEAD(&isr_list);
+
+ for_each_compatible_node(np, NULL, "epapr,hv-receive-doorbell") {
+ unsigned int irq;
+ const uint32_t *handle;
+
+ handle = of_get_property(np, "interrupts", NULL);
+ irq = irq_of_parse_and_map(np, 0);
+ if (!handle || (irq == NO_IRQ)) {
+ pr_err("fsl-hv: no 'interrupts' property in %s node\n",
+ np->full_name);
+ continue;
+ }
+
+ dbisr = kzalloc(sizeof(*dbisr), GFP_KERNEL);
+ if (!dbisr)
+ goto out_of_memory;
+
+ dbisr->irq = irq;
+ dbisr->doorbell = *handle;
+ INIT_WORK(&dbisr->work, fsl_hv_state_change_work_func);
+
+ if (of_device_is_compatible(np, "fsl,hv-shutdown-doorbell")) {
+ /* The shutdown doorbell gets its own ISR */
+ ret = request_irq(irq, fsl_hv_shutdown_isr, 0,
+ np->name, dbisr);
+ } else if (of_device_is_compatible(np,
+ "fsl,hv-state-change-doorbell")) {
+ /* The state change doorbell triggers a notification if
+ * the state of the managed partition changes to
+ * "stopped". We need a separate interrupt handler for
+ * that, and we also need to know the handle of the
+ * target partition, not just the handle of the
+ * doorbell.
+ */
+ dbisr->partition = ret = get_parent_handle(np);
+ if (ret < 0) {
+ pr_err("fsl-hv: node %s has missing or "
+ "malformed parent\n", np->full_name);
+ kfree(dbisr);
+ continue;
+ }
+ ret = request_irq(irq, fsl_hv_state_change_isr, 0,
+ np->name, dbisr);
+ } else
+ ret = request_irq(irq, fsl_hv_isr, 0, np->name, dbisr);
+
+ if (ret < 0) {
+ pr_err("fsl-hv: could not request irq %u for node %s\n",
+ irq, np->full_name);
+ kfree(dbisr);
+ continue;
+ }
+
+ list_add(&dbisr->list, &isr_list);
+
+ pr_info("fsl-hv: registered handler for doorbell %u\n",
+ *handle);
+ }
+
+ return 0;
+
+out_of_memory:
+ list_for_each_entry_safe(dbisr, n, &isr_list, list) {
+ free_irq(dbisr->irq, dbisr);
+ list_del(&dbisr->list);
+ kfree(dbisr);
+ }
+
+ misc_deregister(&fsl_hv_misc_dev);
+
+ return -ENOMEM;
+}
+
+/**
+ * fsl_hypervisor_exit: Freescale hypervisor management driver termination
+ *
+ * This function is called when this driver is unloaded.
+ */
+static void __exit fsl_hypervisor_exit(void)
+{
+ struct doorbell_isr *dbisr, *n;
+
+ list_for_each_entry_safe(dbisr, n, &isr_list, list) {
+ free_irq(dbisr->irq, dbisr);
+ list_del(&dbisr->list);
+ kfree(dbisr);
+ }
+
+ misc_deregister(&fsl_hv_misc_dev);
+}
+
+module_init(fsl_hypervisor_init);
+module_exit(fsl_hypervisor_exit);
+
+MODULE_AUTHOR("Timur Tabi <[email protected]>");
+MODULE_DESCRIPTION("Freescale hypervisor management driver");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/fsl_hypervisor.h b/include/linux/fsl_hypervisor.h
new file mode 100644
index 0000000..63740a2
--- /dev/null
+++ b/include/linux/fsl_hypervisor.h
@@ -0,0 +1,203 @@
+/*
+ * Freescale hypervisor ioctl interface
+ *
+ * Copyright (C) 2008-2011 Freescale Semiconductor, Inc.
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ *
+ * This file is used by the Freescale hypervisor management driver. It can
+ * also be included by applications that need to communicate with the driver
+ * via the ioctl interface.
+ */
+
+#ifndef FSL_HYPERVISOR_H
+#define FSL_HYPERVISOR_H
+
+#include <linux/types.h>
+
+/**
+ * Freescale hypervisor ioctl parameter
+ */
+union fsl_hv_ioctl_param {
+
+ /**
+ * @ret: Return value.
+ *
+ * This is always the first word of any structure.
+ */
+ __u32 ret;
+
+ /**
+ * struct fsl_hv_ioctl_restart: restart a partition
+ * @ret: return error code from the hypervisor
+ * @partition: the ID of the partition to restart, or -1 for the
+ * calling partition
+ *
+ * Used by FSL_HV_IOCTL_PARTITION_RESTART
+ */
+ struct fsl_hv_ioctl_restart {
+ __u32 ret;
+ __u32 partition;
+ } restart;
+
+ /**
+ * struct fsl_hv_ioctl_status: get a partition's status
+ * @ret: return error code from the hypervisor
+ * @partition: the ID of the partition to query, or -1 for the
+ * calling partition
+ * @status: The returned status of the partition
+ *
+ * Used by FSL_HV_IOCTL_PARTITION_GET_STATUS
+ *
+ * Values of 'status':
+ * 0 = Stopped
+ * 1 = Running
+ * 2 = Starting
+ * 3 = Stopping
+ */
+ struct fsl_hv_ioctl_status {
+ __u32 ret;
+ __u32 partition;
+ __u32 status;
+ } status;
+
+ /**
+ * struct fsl_hv_ioctl_start: start a partition
+ * @ret: return error code from the hypervisor
+ * @partition: the ID of the partition to control
+ * @entry_point: The offset within the guest IMA to start execution
+ * @load: If non-zero, reload the partition's images before starting
+ *
+ * Used by FSL_HV_IOCTL_PARTITION_START
+ */
+ struct fsl_hv_ioctl_start {
+ __u32 ret;
+ __u32 partition;
+ __u32 entry_point;
+ __u32 load;
+ } start;
+
+ /**
+ * struct fsl_hv_ioctl_stop: stop a partition
+ * @ret: return error code from the hypervisor
+ * @partition: the ID of the partition to stop, or -1 for the calling
+ * partition
+ *
+ * Used by FSL_HV_IOCTL_PARTITION_STOP
+ */
+ struct fsl_hv_ioctl_stop {
+ __u32 ret;
+ __u32 partition;
+ } stop;
+
+ /**
+ * struct fsl_hv_ioctl_memcpy: copy memory between partitions
+ * @ret: return error code from the hypervisor
+ * @source: the partition ID of the source partition, or -1 for this
+ * partition
+ * @target: the partition ID of the target partition, or -1 for this
+ * partition
+ * @local_vaddr: user-space virtual address of a buffer in the local
+ * partition
+ * @remote_paddr: guest physical address of a buffer in the
+ * remote partition
+ * @count: the number of bytes to copy. Both the local and remote
+ * buffers must be at least 'count' bytes long
+ *
+ * Used by FSL_HV_IOCTL_MEMCPY
+ *
+ * The 'local' partition is the partition that calls this ioctl. The
+ * 'remote' partition is a different partition. The data is copied from
+ * the 'source' partition to the 'target' partition.
+ *
+ * The buffer in the remote partition must be guest physically
+ * contiguous.
+ *
+ * This ioctl does not support copying memory between two remote
+ * partitions or within the same partition, so either 'source' or
+ * 'target' (but not both) must be -1. In other words, either
+ *
+ * source == local and target == remote
+ * or
+ * source == remote and target == local
+ */
+ struct fsl_hv_ioctl_memcpy {
+ __u32 ret;
+ __u32 source;
+ __u32 target;
+ __u64 local_vaddr;
+ __u64 remote_paddr;
+ __u64 count;
+ } memcpy;
+
+ /**
+ * struct fsl_hv_ioctl_doorbell: ring a doorbell
+ * @ret: return error code from the hypervisor
+ * @doorbell: the handle of the doorbell to ring
+ *
+ * Used by FSL_HV_IOCTL_DOORBELL
+ */
+ struct fsl_hv_ioctl_doorbell {
+ __u32 ret;
+ __u32 doorbell;
+ } doorbell;
+
+ /**
+ * struct fsl_hv_ioctl_prop: get/set a device tree property
+ * @ret: return error code from the hypervisor
+ * @handle: handle of partition whose tree to access
+ * @path: virtual address of path name of node to access
+ * @propname: virtual address of name of property to access
+ * @propval: virtual address of property data buffer
+ * @proplen: Size of property data buffer
+ *
+ * Used by FSL_HV_IOCTL_GETPROP and FSL_HV_IOCTL_SETPROP
+ */
+ struct fsl_hv_ioctl_prop {
+ __u32 ret;
+ __u32 handle;
+ __u64 path;
+ __u64 propname;
+ __u64 propval;
+ __u32 proplen;
+ } prop;
+};
+
+/*
+ * ioctl commands.
+ */
+enum {
+ FSL_HV_IOCTL_PARTITION_RESTART = 1, /* Restart a partition */
+ FSL_HV_IOCTL_PARTITION_GET_STATUS = 2, /* Get a partition's status */
+ FSL_HV_IOCTL_PARTITION_START = 3, /* Start a partition */
+ FSL_HV_IOCTL_PARTITION_STOP = 4, /* Stop this or another partition */
+ FSL_HV_IOCTL_MEMCPY = 5, /* Copy data from one partition to another */
+ FSL_HV_IOCTL_DOORBELL = 6, /* Ring a doorbell */
+
+ /* Get a property from another guest's device tree */
+ FSL_HV_IOCTL_GETPROP = 7,
+
+ /* Set a property in another guest's device tree */
+ FSL_HV_IOCTL_SETPROP = 8,
+};
+
+#ifdef __KERNEL__
+
+/**
+ * fsl_hv_failover_register -- register a callback for failover events
+ *
+ * This function is called by device drivers to register their callback
+ * functions for fail-over events.
+ */
+int fsl_hv_failover_register(struct notifier_block *nb);
+
+/**
+ * fsl_hv_failover_unregister -- unregister a callback for failover events
+ */
+int fsl_hv_failover_unregister(struct notifier_block *nb);
+
+#endif
+
+#endif
--
1.7.3.4
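
For readers following the memcpy ioctl in the patch above: the local buffer is split into one scatter-gather entry per page, and only the first entry is shortened by the buffer's offset into its page. Below is a minimal userspace sketch of that arithmetic. The names (sketch_sg, sketch_build_sg, SKETCH_PAGE_SIZE) and the 4 KiB page size are assumptions made for illustration, and physical page addresses are replaced by offsets from the start of the first page. It is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical stand-in for one scatter-gather entry. */
struct sketch_sg {
	uint64_t local_off;	/* chunk offset from the start of the first page */
	uint64_t remote;	/* remote guest physical address of the chunk */
	uint64_t size;		/* number of bytes in the chunk */
};

static uint64_t sketch_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Number of pages touched by the range [vaddr, vaddr + count). */
static unsigned int sketch_nr_pages(uint64_t vaddr, uint64_t count)
{
	uint64_t lb_offset = vaddr & (SKETCH_PAGE_SIZE - 1);

	return (unsigned int)((count + lb_offset + SKETCH_PAGE_SIZE - 1) /
			      SKETCH_PAGE_SIZE);
}

/*
 * Fill sg[] with one entry per page. Returns the number of entries used,
 * or 0 if the range is empty or does not fit in 'max' entries.
 */
static unsigned int sketch_build_sg(uint64_t vaddr, uint64_t remote_paddr,
				    uint64_t count, struct sketch_sg *sg,
				    unsigned int max)
{
	uint64_t lb_offset = vaddr & (SKETCH_PAGE_SIZE - 1);
	unsigned int n = sketch_nr_pages(vaddr, count);
	unsigned int i;

	assert(sg != NULL);
	if (n == 0 || n > max)
		return 0;

	/* The first chunk starts mid-page and ends at the page boundary. */
	sg[0].local_off = lb_offset;
	sg[0].remote = remote_paddr;
	sg[0].size = sketch_min(count, SKETCH_PAGE_SIZE - lb_offset);
	remote_paddr += sg[0].size;
	count -= sg[0].size;

	/* Every later chunk starts page-aligned on the local side. */
	for (i = 1; i < n; i++) {
		sg[i].local_off = (uint64_t)i * SKETCH_PAGE_SIZE;
		sg[i].remote = remote_paddr;
		sg[i].size = sketch_min(count, SKETCH_PAGE_SIZE);
		remote_paddr += sg[i].size;
		count -= sg[i].size;
	}

	return n;
}
```

The remote address advances by the size of each chunk, which is why the remote buffer has to be guest physically contiguous while the local pages do not.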
Timur Tabi wrote:
> have hypervisor extensions (e.g. the P4080 which has an e500mc core).
Oops, this email got munged. The first paragraph should say:
This patchset adds support for running Linux under the Freescale hypervisor,
which is an ePAPR-compliant hypervisor that runs on our PowerPC SOCs that
have hypervisor extensions (e.g. the P4080 which has an e500mc core).
--
Timur Tabi
Linux kernel developer at Freescale
On Thu, May 19, 2011 at 08:54:31AM -0500, Timur Tabi wrote:
> +/*
> + * The udbg subsystem calls this function to display a single character.
> + * We convert CR to a CR/LF.
> + */
> +static void ehv_bc_udbg_putc(char c)
> +{
> + if (c == '\n')
> + byte_channel_spin_send('\r');
> +
> + byte_channel_spin_send(c);
> +}
Why do this conversion in the driver? Shouldn't that be something that
userspace worries about?
thanks,
greg k-h
Greg KH wrote:
> Why do this conversion in the driver? Shouldn't that be something that
> userspace worries about?
The udbg interface is a very early kernel printk interface. I don't know what
the "u" stands for, but "dbg" is for "debug". The udbg interface is removed
once a normal console driver kicks in.
This is why I need to specify the byte channel handle via Kconfig. This code is
used so early that not even the device tree is available.
All of the udbg_putc functions do this.
--
Timur Tabi
Linux kernel developer at Freescale
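
To make the conversion under discussion concrete: the quoted putc emits a '\r' before every '\n' and passes all other characters through. The sketch below reproduces that behaviour in userspace. The sketch_sink buffer is a hypothetical stand-in for the byte channel, and none of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output buffer standing in for the byte channel. */
struct sketch_sink {
	char buf[64];
	size_t len;
};

/* Append one raw character to the sink. */
static void sketch_send(struct sketch_sink *s, char c)
{
	assert(s->len < sizeof(s->buf));
	s->buf[s->len++] = c;
}

/* Early-console style putc: a newline goes out as CR followed by LF. */
static void sketch_putc(struct sketch_sink *s, char c)
{
	if (c == '\n')
		sketch_send(s, '\r');

	sketch_send(s, c);
}

/* Write a NUL-terminated string through sketch_putc(). */
static void sketch_puts(struct sketch_sink *s, const char *str)
{
	while (*str)
		sketch_putc(s, *str++);
}
```

Doing this at the putc level means the output reads correctly on a raw terminal even before any line discipline exists to perform the translation.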
> + struct tty_struct *ttys;
ttys are refcounted and you have a refcounted pointer for free in your
tty_port that is maintained by the tty_port logic, as well as it
providing ref counted, properly locked handling for the reference.
> +/******************************** TTY DRIVER ********************************/
> +
> +/*
> + * byte channel receive interrupt handler
> + *
> + * This ISR is called whenever data is available on a byte channel.
> + */
> +static irqreturn_t ehv_bc_tty_rx_isr(int irq, void *data)
> +{
> + struct ehv_bc_data *bc = data;
> + struct tty_struct *ttys = bc->ttys;
ttys = tty_port_tty_get(&bc->port);
stuff
if (ttys != NULL)
tty stuff
tty_kref_put(ttys);
> + ev_byte_channel_poll(bc->handle, &rx_count, &tx_count);
> + count = tty_buffer_request_room(ttys, rx_count);
> +
> + /* 'count' is the maximum amount of data the TTY layer can accept at
> + * this time. However, during testing, I was never able to get 'count'
> + * to be less than 'rx_count'. I'm not sure whether I'm calling it
> + * correctly.
It will try hard to fulfill your request until 64K is queued. Before that
point your only expected failure is when the system kmalloc for
GFP_ATOMIC fails, which is an extreme situation.
> + /* Pass the received data to the tty layer. Note that this
> + * function calls tty_buffer_request_room(), so I'm not sure if
> + * we should have also called tty_buffer_request_room().
> + */
> + ret = tty_insert_flip_string(ttys, buffer, len);
You only need to request_room in advance if you can't handle the case
where the insert_flip_string returns less than you stuffed down it.
> + len = min_t(unsigned int,
> + CIRC_CNT_TO_END(bc->head, bc->tail, BUF_SIZE),
> + EV_BYTE_CHANNEL_MAX_BYTES);
The kfifo API is probably faster and cleaner. Much of tty still uses
CIRC_* because they predate the new APIs.
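
For reference, the quoted min_t() expression picks the largest run of queued bytes that is contiguous in the ring and also fits in one byte channel call. A standalone sketch of that calculation is below. The helper names, the 32-byte ring and the 16-byte per-call limit are assumptions for illustration; the two counting helpers follow the usual semantics of the kernel's CIRC_CNT() and CIRC_CNT_TO_END() macros.

```c
#include <assert.h>

#define SKETCH_BUF_SIZE  32u	/* ring size, must be a power of two */
#define SKETCH_MAX_BYTES 16u	/* assumed per-call limit of the channel */

/* Bytes queued between tail (consumer) and head (producer). */
static unsigned int sketch_cnt(unsigned int head, unsigned int tail)
{
	return (head - tail) & (SKETCH_BUF_SIZE - 1);
}

/* Bytes readable starting at tail without wrapping past the end. */
static unsigned int sketch_cnt_to_end(unsigned int head, unsigned int tail)
{
	unsigned int end, n;

	assert(head < SKETCH_BUF_SIZE && tail < SKETCH_BUF_SIZE);
	end = SKETCH_BUF_SIZE - tail;
	n = (head + end) & (SKETCH_BUF_SIZE - 1);

	return n < end ? n : end;
}

/* Length of the next chunk to hand to the transport. */
static unsigned int sketch_next_chunk(unsigned int head, unsigned int tail)
{
	unsigned int n = sketch_cnt_to_end(head, tail);

	return n < SKETCH_MAX_BYTES ? n : SKETCH_MAX_BYTES;
}
```

When the queued data wraps around the end of the ring, it goes out in at least two chunks: one up to the end of the buffer, then the remainder from index 0.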
> + * This ISR is called whenever space becomes available for transmitting
> + * characters on a byte channel.
> + */
> +static irqreturn_t ehv_bc_tty_tx_isr(int irq, void *data)
> +{
> + struct ehv_bc_data *bc = data;
> +
> + ehv_bc_tx_dequeue(bc);
> + tty_wakeup(bc->ttys);
Again tty krefs/locking
> +/* This function can be called multiple times for a given tty_struct, which is
> + * why we initialize bc->ttys in ehv_bc_tty_port_activate() instead.
> + *
> + * For some reason, the tty layer will still call this function even if the
> + * device was not registered (i.e. tty_register_device() was not called). So
> + * we need to check for that.
[Because register_device is optional and some legacy drivers still don't
use it]
You really also need a hangup method so vhangup() does the right thing
and you can securely do logins etc and sessions on your console. As
you've got no hardware entangled in this and you already use tty_port
helpers the hangup helper will do the work for you.
I guess the only other thing to consider is whether you want to implement
a SYSRQ interface on your console ?
On Thu, 19 May 2011 07:22:25 -0700
Greg KH <[email protected]> wrote:
> On Thu, May 19, 2011 at 08:54:31AM -0500, Timur Tabi wrote:
> > +/*
> > + * The udbg subsystem calls this function to display a single character.
> > + * We convert CR to a CR/LF.
> > + */
> > +static void ehv_bc_udbg_putc(char c)
> > +{
> > + if (c == '\n')
> > + byte_channel_spin_send('\r');
> > +
> > + byte_channel_spin_send(c);
> > +}
>
> Why do this conversion in the driver? Shouldn't that be something that
> userspace worries about?
udbg is a bit before the land of userspace so it needs to do whatever
adaption the firmware/happyvisor interface wants.
On Thursday 19 May 2011, Timur Tabi wrote:
>
> The ePAPR embedded hypervisor specification provides an API for "byte
> channels", which are serial-like virtual devices for sending and receiving
> streams of bytes.
Why is this using a full tty driver instead of the hvc framework that most
other hypervisor consoles use?
Arnd
Arnd Bergmann wrote:
> Why is this using a full tty driver instead of the hvc framework that most
> other hypervisor consoles use?
Because HVC uses the same interface for consoles and tty, and that resulted in
dropped characters if the client driver returns EAGAIN because its output
buffer is full. I posted a patch that "fixes" this, but it was rejected.
Here's the original patch:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-August/085136.html
And here's the thread discussing our concerns:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-September/thread.html
(search for "fix dropping of characters when output byte channel is full")
--
Timur Tabi
Linux kernel developer at Freescale
Alan Cox wrote:
> ttys = tty_port_tty_get(&bc->port);
> stuff
> if (ttys != NULL)
> tty stuff
> tty_kref_put(ttys);
Under what circumstances can ttys be NULL? I currently only use this code in
the RX and TX interrupt handlers, which are both enabled in the
tty_port_operations.activate() function.
Is this right for the TX handler:
static irqreturn_t ehv_bc_tty_tx_isr(int irq, void *data)
{
struct ehv_bc_data *bc = data;
struct tty_struct *ttys = tty_port_tty_get(&bc->port);
ehv_bc_tx_dequeue(bc);
if (ttys) {
tty_wakeup(ttys);
tty_kref_put(ttys);
}
return IRQ_HANDLED;
}
I just want to make sure that testing for NULL is really necessary in my
interrupt handlers.
>> > + len = min_t(unsigned int,
>> > + CIRC_CNT_TO_END(bc->head, bc->tail, BUF_SIZE),
>> > + EV_BYTE_CHANNEL_MAX_BYTES);
> The kfifo API is probably faster and cleaner. Much of tty still uses
> CIRC_* because they predate the new APIs.
Ok, I'll change it.
> You really also need a hangup method so vhangup() does the right thing
> and you can securely do logins etc and sessions on your console. As
> you've got no hardware entangled in this and you already use tty_port
> helpers the hangup helper will do the work for you.
Ok.
>
> I guess the only other thing to consider is whether you want to implement
> a SYSRQ interface on your console ?
I don't think byte channels can support SYSRQ, but I'll look into it.
--
Timur Tabi
Linux kernel developer at Freescale
> Under what circumstances can ttys be NULL? I currently only use this code in
> the RX and TX interrupt handlers, which are both enabled in the
> tty_port_operations.activate() function.
When you add hangup support.
>
> Is this right for the TX handler:
>
> static irqreturn_t ehv_bc_tty_tx_isr(int irq, void *data)
> {
> struct ehv_bc_data *bc = data;
> struct tty_struct *ttys = tty_port_tty_get(&bc->port);
>
> ehv_bc_tx_dequeue(bc);
> if (ttys) {
> tty_wakeup(ttys);
> tty_kref_put(ttys);
> }
>
> return IRQ_HANDLED;
Yes.
EV_BYTE_CHANNEL_MAX_BYTES);
> > The kfifo API is probably faster and cleaner. Much of tty still uses
> > CIRC_* because they predate the new APIs.
>
> Ok, I'll change it.
I flag that one up as a general comment - don't feel you need to change
it if CIRC_* works in your case.
> > I guess the only other thing to consider is whether you want to implement
> > a SYSRQ interface on your console ?
>
> I don't think byte channels can support SYSRQ, but I'll look into it.
What some drivers do in this case is nominate some obscure ctrl sequence
to mean 'sysrq' unless doubled (eg ctrl-^ etc)
Depends if the functionality is useful in your environment or not
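
One way to read the "unless doubled" suggestion is as a two-state filter on the receive path: the escape byte arms the filter, the next byte is taken as a sysrq key, and a second escape byte is delivered as a literal. The sketch below illustrates that idea only. The escape value (ctrl-^, 0x1e), the type names and the function name are all assumptions; a real driver would pass the key to the kernel's sysrq handling rather than return an enum.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ESC 0x1e	/* ctrl-^; the choice of escape byte is an assumption */

enum sketch_action {
	SKETCH_NONE,	/* byte consumed, nothing to deliver yet */
	SKETCH_DATA,	/* deliver *out as ordinary input */
	SKETCH_SYSRQ,	/* treat *out as a sysrq command key */
};

/* Per-channel receive state: set after an escape byte has been seen. */
struct sketch_rx {
	int escaped;
};

static enum sketch_action sketch_rx_byte(struct sketch_rx *st,
					 unsigned char c, unsigned char *out)
{
	assert(st != NULL && out != NULL);

	if (st->escaped) {
		st->escaped = 0;
		*out = c;
		/* A doubled escape means "send one literal escape byte". */
		return c == SKETCH_ESC ? SKETCH_DATA : SKETCH_SYSRQ;
	}

	if (c == SKETCH_ESC) {
		st->escaped = 1;
		return SKETCH_NONE;
	}

	*out = c;
	return SKETCH_DATA;
}
```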
Alan Cox wrote:
>>> > > The kfifo API is probably faster and cleaner. Much of tty still uses
>>> > > CIRC_* because they predate the new APIs.
>> >
>> > Ok, I'll change it.
> I flag that one up as a general comment - don't feel you need to change
> it if CIRC_* works in your case.
CIRC_* does work for me, so I'll keep it as-is.
>>> > > I guess the only other thing to consider is whether you want to implement
>>> > > a SYSRQ interface on your console ?
>> >
>> > I don't think byte channels can support SYSRQ, but I'll look into it.
> What some drivers do in this case is nominate some obscure ctrl sequence
> to mean 'sysrq' unless doubled (eg ctrl-^ etc)
Ok, I can do that.
> Depends if the functionality is useful in your environment or not
It is, but I'd like to add it later so that I can make the 2.6.40 window (if it
isn't already too late).
--
Timur Tabi
Linux kernel developer at Freescale
> Ok, I can do that.
>
> > Depends if the functionality is useful in your environment or not
>
> It is, but I'd like to add it later so that I can make the 2.6.40 window (if it
> isn't already too late).
Seems sensible.
Alan
On Thu, May 19, 2011 at 10:54:03AM -0500, Timur Tabi wrote:
> > Depends if the functionality is useful in your environment or not
>
> It is, but I'd like to add it later so that I can make the 2.6.40 window (if it
> isn't already too late).
It's too late, it needed to be in linux-next _before_ the window opened.
sorry,
greg k-h
Alan Cox wrote:
>> + /* Pass the received data to the tty layer. Note that this
>> + * function calls tty_buffer_request_room(), so I'm not sure if
>> + * we should have also called tty_buffer_request_room().
>> + */
>> + ret = tty_insert_flip_string(ttys, buffer, len);
> You only need to request_room in advance if you can't handle the case
> where the insert_flip_string returns less than you stuffed down it.
If tty_insert_flip_string() returns less than I stuffed down it, the characters
it didn't accept will be dropped. That's because once I receive them, I have
nowhere else to put them. I suppose I could implement a receive FIFO, but that
seems overkill. If calling tty_buffer_request_room() ensures that
tty_insert_flip_string() always accepts all the characters, I would rather do that.
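
To make the "ask for room first" idea concrete, here is a userspace sketch.
Everything in it is a mock: mock_request_room() and mock_insert_string() only
stand in for tty_buffer_request_room() and tty_insert_flip_string(), and
mock_channel_receive() stands in for whatever call pulls bytes from the byte
channel. The sketch assumes the room reported by the request call is still
available when the insert happens; that holds for the mock, but it would need
to be confirmed against the real tty layer.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_FLIP_SIZE 16

/* Mock of the tty flip buffer: fixed capacity, fills up linearly. */
struct mock_flip {
	unsigned char data[MOCK_FLIP_SIZE];
	size_t used;
};

/* Mock of the byte channel: bytes the sender has queued for us. */
struct mock_channel {
	const unsigned char *pending;
	size_t count;
};

/* Stand-in for tty_buffer_request_room(): how many of 'want' bytes fit. */
static inline size_t mock_request_room(const struct mock_flip *fb, size_t want)
{
	size_t room = MOCK_FLIP_SIZE - fb->used;

	return want < room ? want : room;
}

/* Stand-in for tty_insert_flip_string(): copy what fits, return the count. */
static inline size_t mock_insert_string(struct mock_flip *fb,
					const unsigned char *buf, size_t len)
{
	size_t n = mock_request_room(fb, len);

	memcpy(fb->data + fb->used, buf, n);
	fb->used += n;
	return n;
}

/* Pull at most 'max' bytes out of the channel into 'buf'. */
static inline size_t mock_channel_receive(struct mock_channel *ch,
					  unsigned char *buf, size_t max)
{
	size_t n = max < ch->count ? max : ch->count;

	memcpy(buf, ch->pending, n);
	ch->pending += n;
	ch->count -= n;
	return n;
}

/*
 * Receive path: size the read by the room available, so no byte is taken
 * from the channel unless there is somewhere to put it. Bytes that do not
 * fit stay queued in the channel instead of being dropped.
 */
static inline size_t rx_drain(struct mock_channel *ch, struct mock_flip *fb)
{
	unsigned char buffer[MOCK_FLIP_SIZE];
	size_t len = mock_request_room(fb, ch->count);

	len = mock_channel_receive(ch, buffer, len);
	return mock_insert_string(fb, buffer, len);
}
```

With this ordering no receive FIFO is needed on the driver side, because the
channel itself holds whatever did not fit until the next call.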
--
Timur Tabi
Linux kernel developer at Freescale
Greg KH wrote:
> It's too late, it needed to be in linux-next _before_ the window opened.
>
> sorry,
Curses! Foiled again!
Well, then I'd like to get this patchset fixed up and approved soon after the
window closes, so that there's no excuse for missing 2.6.41.
--
Timur Tabi
Linux kernel developer at Freescale
Alan Cox wrote:
> You really also need a hangup method so vhangup() does the right thing
> and you can securely do logins etc and sessions on your console. As
> you've got no hardware entangled in this and you already use tty_port
> helpers the hangup helper will do the work for you.
So all I need is this?
static void ehv_bc_tty_hangup(struct tty_struct *ttys)
{
	struct ehv_bc_data *bc = ttys->driver_data;

	tty_port_hangup(&bc->port);
}
I've noticed that some drivers flush their transmit buffers before calling
tty_port_hangup(), but some others don't. Should I do this too? I don't know
if hangup should be as quick as possible.
--
Timur Tabi
Linux kernel developer at Freescale
On Thu, 19 May 2011 11:31:32 -0500
Timur Tabi <[email protected]> wrote:
> Alan Cox wrote:
> > You really also need a hangup method so vhangup() does the right thing
> > and you can securely do logins etc and sessions on your console. As
> > you've got no hardware entangled in this and you already use tty_port
> > helpers the hangup helper will do the work for you.
>
> So all I need is this?
>
> static void ehv_bc_tty_hangup(struct tty_struct *ttys)
> {
> struct ehv_bc_data *bc = ttys->driver_data;
>
> tty_port_hangup(&bc->port);
> }
>
> I've noticed that some drivers flush their transmit buffers before calling
> tty_port_hangup(), but some others don't. Should I do this too? I don't know
> if hangup should be as quick as possible.
Doesn't matter too much. If you can flush it quickly then do so.
On Thu, 19 May 2011 11:05:49 -0500
Timur Tabi <[email protected]> wrote:
> Alan Cox wrote:
> >> + /* Pass the received data to the tty layer. Note that this
> >> + * function calls tty_buffer_request_room(), so I'm not sure if
> >> + * we should have also called tty_buffer_request_room().
> >> + */
> >> + ret = tty_insert_flip_string(ttys, buffer, len);
>
> > You only need to request_room in advance if you can't handle the case
> > where the insert_flip_string returns less than you stuffed down it.
>
> If tty_insert_flip_string() returns less than I stuffed down it, the characters
> it didn't accept will be dropped. That's because once I receive them, I have
> nowhere else to put them. I suppose I could implement a receive FIFO, but that
> seems overkill. If calling tty_buffer_request_room() ensures that
> tty_insert_flip_string() always accepts all the characters, I would rather do that.
I was answering the question in the comment in the code...
On May 19, 2011, at 8:54 AM, Timur Tabi wrote:
> have hypervisor extensions (e.g. the P4080 which has an e500mc core).
>
> I think it makes sense for this patchset to go through Kumar Gala's -next
> branch, but I still need ACKs from various people on the parts that are
> not e500-specific.
>
> 1. powerpc: make irq_choose_cpu() available to all PIC drivers
> 2. powerpc: introduce ePAPR embedded hypervisor hcall interface
> 3. powerpc: introduce the ePAPR embedded hypervisor vmpic driver
> 4. powerpc: add Freescale hypervisor partition control functions
> 5. powerpc/85xx: add board support for the Freescale hypervisor
> 6. tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver
> 7. drivers/misc: introduce Freescale hypervisor management driver
>
> Ben Herrenschmidt, please review/ack parts 1-3.
>
> Greg Kroah-Hartman, please review/ack part 6.
>
> Andrew Morton, please review/ack part 7.
>
> Thank you very much for looking at this patchset. I hope to have it included
> in 2.6.40.
Applied to 'test' branch. (grabbed 'v2' of tty patch). Fixed merge conflicts.
- k
On Fri, May 20, 2011 at 3:29 PM, Kumar Gala <[email protected]> wrote:
> Applied to 'test' branch. (grabbed 'v2' of tty patch). Fixed merge conflicts.
I don't think you pushed this branch to git.kernel.org
http://git.kernel.org/?p=linux/kernel/git/galak/powerpc.git;a=shortlog;h=refs/heads/test
--
Timur Tabi
Linux kernel developer at Freescale
On May 23, 2011, at 4:09 PM, Tabi Timur-B04825 wrote:
> On Fri, May 20, 2011 at 3:29 PM, Kumar Gala <[email protected]> wrote:
>
>> Applied to 'test' branch. (grabbed 'v2' of tty patch). Fixed merge conflicts.
>
> I don't think you pushed this branch to git.kernel.org
>
> http://git.kernel.org/?p=linux/kernel/git/galak/powerpc.git;a=shortlog;h=refs/heads/test
>
Tree pushed.
- k
On May 19, 2011, at 8:54 AM, Timur Tabi wrote:
> From: Stuart Yoder <[email protected]>
>
> Move irq_choose_cpu() into arch/powerpc/kernel/irq.c so that it can be used
> by other PIC drivers. The function is not MPIC-specific.
>
> Signed-off-by: Stuart Yoder <[email protected]>
> Signed-off-by: Timur Tabi <[email protected]>
> ---
> arch/powerpc/include/asm/irq.h | 2 ++
> arch/powerpc/kernel/irq.c | 35 +++++++++++++++++++++++++++++++++++
> arch/powerpc/sysdev/mpic.c | 36 ------------------------------------
> 3 files changed, 37 insertions(+), 36 deletions(-)
applied to next
- k
On May 19, 2011, at 8:54 AM, Timur Tabi wrote:
> ePAPR hypervisors provide operating system services via a "hypercall"
> interface. The following steps need to be performed to make an hcall:
>
> 1. Load r11 with the hcall number
> 2. Load specific other registers with parameters
> 3. Issue instruction "sc 1"
> 4. The return code is in r3
> 5. Other returned parameters are in other registers.
>
> To provide this service to the kernel, these steps are wrapped in inline
> assembly functions. Standard ePAPR hcalls are in epapr_hcalls.h, and Freescale
> extensions are in fsl_hcalls.h.
>
> Signed-off-by: Timur Tabi <[email protected]>
> ---
> arch/powerpc/include/asm/epapr_hcalls.h | 502 +++++++++++++++++++++++
> arch/powerpc/include/asm/fsl_hcalls.h | 655 +++++++++++++++++++++++++++++++
> 2 files changed, 1157 insertions(+), 0 deletions(-)
> create mode 100644 arch/powerpc/include/asm/epapr_hcalls.h
> create mode 100644 arch/powerpc/include/asm/fsl_hcalls.h
applied to next
- k
On May 19, 2011, at 8:54 AM, Timur Tabi wrote:
> From: Ashish Kalra <[email protected]>
>
> The Freescale ePAPR reference hypervisor provides interrupt controller services
> via a hypercall interface, instead of emulating the MPIC controller. This is
> called the VMPIC.
>
> The ePAPR "virtual interrupt controller" provides interrupt controller services
> for external interrupts. External interrupts received by a partition can come
> from two sources:
>
> - Hardware interrupts - hardware interrupts come from external
> interrupt lines or on-chip I/O devices.
> - Virtual interrupts - virtual interrupts are generated by the hypervisor
> as part of some hypervisor service or hypervisor-created virtual device.
>
> Both types of interrupts are processed using the same programming model and
> same set of hypercalls.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Timur Tabi <[email protected]>
> ---
> arch/powerpc/include/asm/ehv_pic.h | 40 +++++
> arch/powerpc/platforms/Kconfig | 4 +
> arch/powerpc/sysdev/Makefile | 1 +
> arch/powerpc/sysdev/ehv_pic.c | 302 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 347 insertions(+), 0 deletions(-)
> create mode 100644 arch/powerpc/include/asm/ehv_pic.h
> create mode 100644 arch/powerpc/sysdev/ehv_pic.c
applied to next
- k
On May 19, 2011, at 8:54 AM, Timur Tabi wrote:
> Add functions to restart and halt the current partition when running under
> the Freescale hypervisor. These functions should be assigned to various
> function pointers of the ppc_md structure during the .probe() function for
> the board:
>
> ppc_md.restart = fsl_hv_restart;
> ppc_md.power_off = fsl_hv_halt;
> ppc_md.halt = fsl_hv_halt;
>
> Signed-off-by: Timur Tabi <[email protected]>
> ---
> arch/powerpc/sysdev/fsl_soc.c | 27 +++++++++++++++++++++++++++
> arch/powerpc/sysdev/fsl_soc.h | 3 +++
> 2 files changed, 30 insertions(+), 0 deletions(-)
applied to next
- k
On May 19, 2011, at 8:54 AM, Timur Tabi wrote:
> Add support for the ePAPR-compliant Freescale hypervisor (aka "Topaz") on the
> Freescale P3041DS, P4080DS, and P5020DS reference boards.
>
> Signed-off-by: Timur Tabi <[email protected]>
> ---
> arch/powerpc/platforms/85xx/Kconfig | 3 +++
> arch/powerpc/platforms/85xx/corenet_ds.c | 7 +++++++
> arch/powerpc/platforms/85xx/p3041_ds.c | 16 +++++++++++++++-
> arch/powerpc/platforms/85xx/p4080_ds.c | 29 ++++++++++++++++-------------
> arch/powerpc/platforms/85xx/p5020_ds.c | 16 +++++++++++++++-
> 5 files changed, 56 insertions(+), 15 deletions(-)
applied to next
- k