2008-11-18 09:03:57

by Bryan Wu

Subject: [PATCH 0/5] Blackfin SMP like patchset


Hi folks,

We have provided SMP-like functionality for our dual-core Blackfin
processor, the BF561, for almost a year. After a long period of
development, debugging and internal review, we'd like to post the patches
to LKML for review by other maintainers.

Our wiki page describes these SMP-like patches:
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like

In this patchset, we split the big patch into 5 parts:

1. SMP code related to BF561 processor and machine
2. SMP for Blackfin header file and machine common code
3. SMP for Blackfin CPLB code
4. SMP for Blackfin kernel and memory management code
5. other Blackfin misc code related to SMP
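
As a rough illustration of part 1: the atomic helpers in
arch/blackfin/mach-bf561/atomic.S serialize every operation through one
global lock word taken with the testset instruction. The C sketch below
mimics that idea, with a C11 atomic_flag standing in for the lock word.
The names get_core_lock and put_core_lock mirror the assembly labels; the
sketch_* functions are hypothetical, and the interrupt masking and cache
flushing done by the real code are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single lock word (_corelock in the patch). */
static atomic_flag corelock = ATOMIC_FLAG_INIT;

/* Spin until test-and-set succeeds, like the testset retry loop. */
static void get_core_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&corelock,
						 memory_order_acquire))
		; /* busy-wait; the assembly issues an ssync between retries */
}

/* Release the lock word so the other core can proceed. */
static void put_core_lock(void)
{
	atomic_flag_clear_explicit(&corelock, memory_order_release);
}

/* Add a signed value and return the new value (cf. ___raw_atomic_update_asm). */
static int32_t sketch_atomic_update(volatile int32_t *ptr, int32_t value)
{
	int32_t result;

	assert(ptr != NULL);
	get_core_lock();
	result = *ptr + value;
	*ptr = result;
	put_core_lock();
	return result;
}

/* Set the mask bits and return the old value (cf. ___raw_atomic_set_asm). */
static uint32_t sketch_atomic_set(volatile uint32_t *ptr, uint32_t mask)
{
	uint32_t old;

	assert(ptr != NULL);
	get_core_lock();
	old = *ptr;
	*ptr = old | mask;
	put_core_lock();
	return old;
}

/* Store new_val only if *ptr == expected; return the previous value
 * (cf. ___raw_cmpxchg_4_asm). */
static uint32_t sketch_cmpxchg(volatile uint32_t *ptr, uint32_t expected,
			       uint32_t new_val)
{
	uint32_t old;

	assert(ptr != NULL);
	get_core_lock();
	old = *ptr;
	if (old == expected)
		*ptr = new_val;
	put_core_lock();
	return old;
}
```

Because every helper funnels through the same lock, the read-modify-write
sequences cannot interleave between the two cores, at the cost of
serializing unrelated atomic operations.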

Please review. We are queueing these patches for the 2.6.29 merge
window.

Thanks
-Bryan


2008-11-18 09:04:25

by Bryan Wu

Subject: [PATCH 1/5] Blackfin arch: SMP supporting patchset: BF561 related code

From: Graf Yang <[email protected]>

The dual-core Blackfin BF561 processor can support SMP-like features.
https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like

This patch adds the SMP extensions to the BF561-specific kernel code.
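
One detail of this patch that may be easier to follow in C: when
__ARCH_SYNC_CORE_DCACHE is defined, the lock word keeps "CPU fingerprints"
in its high nibble. The unlock path sets the bit for the releasing core,
and the lock path checks whether any other core left a bit there before
deciding to invalidate the local D-cache. The sketch below models only
that bookkeeping; the function and macro names are hypothetical, and the
cache invalidation itself is reduced to a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FINGERPRINT_SHIFT 28u          /* fingerprints live in bits 31:28 */
#define PAYLOAD_MASK      0x0FFFFFFFu  /* low 28 bits hold the lock state */

/* Unlock side (cf. _end_lock_coherent): record the releasing core. */
static uint32_t mark_released_by(uint32_t word, unsigned int cpu)
{
	assert(cpu < 4);
	return word | (1u << (FINGERPRINT_SHIFT + cpu));
}

/*
 * Lock side (cf. _start_lock_coherent): clear all fingerprints and report
 * whether a core other than 'cpu' had left one. A true result is where the
 * real code would invalidate the current core's D-cache.
 */
static bool take_fingerprints(uint32_t *word, unsigned int cpu)
{
	uint32_t others;

	assert(cpu < 4);
	others = (*word >> FINGERPRINT_SHIFT) & ~(1u << cpu);
	*word &= PAYLOAD_MASK;
	return others != 0;
}
```

If the same core takes the lock twice in a row, take_fingerprints()
returns false and the expensive D-cache resync is skipped.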

Signed-off-by: Graf Yang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
---
arch/blackfin/mach-bf561/Kconfig | 6 +-
arch/blackfin/mach-bf561/Makefile | 1 +
arch/blackfin/mach-bf561/atomic.S | 954 ++++++++++++++++++++++
arch/blackfin/mach-bf561/include/mach/blackfin.h | 4 +
arch/blackfin/mach-bf561/include/mach/defBF561.h | 3 +
arch/blackfin/mach-bf561/include/mach/mem_map.h | 120 +++
arch/blackfin/mach-bf561/include/mach/smp.h | 22 +
arch/blackfin/mach-bf561/secondary.S | 215 +++++
arch/blackfin/mach-bf561/smp.c | 182 ++++
9 files changed, 1504 insertions(+), 3 deletions(-)
create mode 100644 arch/blackfin/mach-bf561/atomic.S
create mode 100644 arch/blackfin/mach-bf561/include/mach/smp.h
create mode 100644 arch/blackfin/mach-bf561/secondary.S
create mode 100644 arch/blackfin/mach-bf561/smp.c

diff --git a/arch/blackfin/mach-bf561/Kconfig b/arch/blackfin/mach-bf561/Kconfig
index 3f48954..5d56438 100644
--- a/arch/blackfin/mach-bf561/Kconfig
+++ b/arch/blackfin/mach-bf561/Kconfig
@@ -4,9 +4,9 @@ source "arch/blackfin/mach-bf561/boards/Kconfig"

menu "BF561 Specific Configuration"

-comment "Core B Support"
+if (!SMP)

-menu "Core B Support"
+comment "Core B Support"

config BF561_COREB
bool "Enable Core B support"
@@ -25,7 +25,7 @@ config BF561_COREB_RESET
0 is set, and will reset PC to 0xff600000 when
COREB_SRAM_INIT is cleared.

-endmenu
+endif

comment "Interrupt Priority Assignment"

diff --git a/arch/blackfin/mach-bf561/Makefile b/arch/blackfin/mach-bf561/Makefile
index f39235a..c37f00c 100644
--- a/arch/blackfin/mach-bf561/Makefile
+++ b/arch/blackfin/mach-bf561/Makefile
@@ -7,3 +7,4 @@ extra-y := head.o
obj-y := ints-priority.o dma.o

obj-$(CONFIG_BF561_COREB) += coreb.o
+obj-$(CONFIG_SMP) += smp.o secondary.o atomic.o
diff --git a/arch/blackfin/mach-bf561/atomic.S b/arch/blackfin/mach-bf561/atomic.S
new file mode 100644
index 0000000..d5c4fd8
--- /dev/null
+++ b/arch/blackfin/mach-bf561/atomic.S
@@ -0,0 +1,954 @@
+/*
+ * File: arch/blackfin/mach-bf561/atomic.S
+ * Author: Philippe Gerum <[email protected]>
+ *
+ * Copyright 2007 Analog Devices Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see the file COPYING, or write
+ * to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/linkage.h>
+#include <asm/blackfin.h>
+#include <asm/cache.h>
+#include <asm/asm-offsets.h>
+#include <asm/rwlock.h>
+#include <asm/cplb.h>
+
+.text
+
+.macro coreslot_loadaddr
+ p0.l = _corelock;
+ p0.h = _corelock;
+.endm
+
+.macro do_ssync
+#if defined(ANOMALY_05000312) || defined(ANOMALY_05000244)
+ cli r2;
+#ifdef ANOMALY_05000244
+ nop;
+ nop;
+#endif
+ ssync;
+ sti r2;
+#else
+ ssync;
+#endif
+.endm
+
+.macro do_csync
+#if defined(ANOMALY_05000312) || defined(ANOMALY_05000244)
+ cli r2;
+#ifdef ANOMALY_05000244
+ nop;
+ nop;
+#endif
+ csync;
+ sti r2;
+#else
+ csync;
+#endif
+.endm
+
+.macro idlelize
+ do_ssync;
+.endm
+
+.align 2
+
+/*
+ * r0 = address of atomic data to flush and invalidate (32bit).
+ *
+ * Clear interrupts and return the old mask.
+ * We assume that no atomic data can span cachelines.
+ *
+ * Clobbers: r2:0, p0
+ */
+_get_core_lock:
+
+ r1 = -L1_CACHE_BYTES;
+ r1 = r0 & r1;
+ cli r0;
+ coreslot_loadaddr;
+.Lretry_corelock:
+ testset (p0);
+ if cc jump .Ldone_corelock;
+ idlelize;
+ jump .Lretry_corelock
+.Ldone_corelock:
+ p0 = r1;
+ do_csync;
+ flushinv[p0];
+ do_ssync;
+ rts;
+
+/*
+ * r0 = address of atomic data in uncacheable memory region (32bit).
+ *
+ * Clear interrupts and return the old mask.
+ *
+ * Clobbers: r0, p0
+ */
+_get_core_lock_noflush:
+
+ cli r0;
+ coreslot_loadaddr;
+.Lretry_corelock_noflush:
+ testset (p0);
+ if cc jump .Ldone_corelock_noflush;
+ idlelize;
+ jump .Lretry_corelock_noflush
+.Ldone_corelock_noflush:
+ rts;
+
+/*
+ * r0 = interrupt mask to restore.
+ * r1 = address of atomic data to flush and invalidate (32bit).
+ *
+ * Interrupts are masked on entry (see _get_core_lock).
+ * Clobbers: r2:0, p0
+ */
+_put_core_lock:
+
+ /* Write-through cache assumed, so no flush needed here. */
+ coreslot_loadaddr;
+ r1 = 0;
+ [p0] = r1;
+ do_ssync;
+ sti r0;
+ rts;
+
+#ifdef __ARCH_SYNC_CORE_DCACHE
+
+ENTRY(___raw_smp_mark_barrier_asm)
+
+ [--sp] = rets;
+ [--sp] = ( r7:5 );
+ [--sp] = r0;
+ [--sp] = p1;
+ [--sp] = p0;
+ call _get_core_lock_noflush;
+
+ /*
+ * Calculate current core mask
+ */
+ GET_CPUID(p1, r7);
+ r6 = 1;
+ r6 <<= r7;
+
+ /*
+ * Set bit of other cores in barrier mask. Don't change current core bit.
+ */
+ p1.l = _barrier_mask;
+ p1.h = _barrier_mask;
+ r7 = [p1];
+ r5 = r7 & r6;
+ r7 = ~r6;
+ cc = r5 == 0;
+ if cc jump 1f;
+ r7 = r7 | r6;
+1:
+ [p1] = r7;
+ do_ssync;
+
+ call _put_core_lock;
+ p0 = [sp++];
+ p1 = [sp++];
+ r0 = [sp++];
+ ( r7:5 ) = [sp++];
+ rets = [sp++];
+ rts;
+
+ENTRY(___raw_smp_check_barrier_asm)
+
+ [--sp] = rets;
+ [--sp] = ( r7:5 );
+ [--sp] = r0;
+ [--sp] = p1;
+ [--sp] = p0;
+ call _get_core_lock_noflush;
+
+ /*
+ * Calculate current core mask
+ */
+ GET_CPUID(p1, r7);
+ r6 = 1;
+ r6 <<= r7;
+
+ /*
+ * Clear current core bit in barrier mask if it is set.
+ */
+ p1.l = _barrier_mask;
+ p1.h = _barrier_mask;
+ r7 = [p1];
+ r5 = r7 & r6;
+ cc = r5 == 0;
+ if cc jump 1f;
+ r6 = ~r6;
+ r7 = r7 & r6;
+ [p1] = r7;
+ do_ssync;
+
+ call _put_core_lock;
+
+ /*
+ * Invalidate the entire D-cache of current core.
+ */
+ sp += -12;
+ call _resync_core_dcache
+ sp += 12;
+ jump 2f;
+1:
+ call _put_core_lock;
+2:
+ p0 = [sp++];
+ p1 = [sp++];
+ r0 = [sp++];
+ ( r7:5 ) = [sp++];
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = irqflags
+ * r1 = address of atomic data
+ *
+ * Clobbers: r2:0, p1:0
+ */
+_start_lock_coherent:
+
+ [--sp] = rets;
+ [--sp] = ( r7:6 );
+ r7 = r0;
+ p1 = r1;
+
+ /*
+ * Determine whether the atomic data was previously
+ * owned by another CPU (=r6).
+ */
+ GET_CPUID(p0, r2);
+ r1 = 1;
+ r1 <<= r2;
+ r2 = ~r1;
+
+ r1 = [p1];
+ r1 >>= 28; /* CPU fingerprints are stored in the high nibble. */
+ r6 = r1 & r2;
+ r1 = [p1];
+ r1 <<= 4;
+ r1 >>= 4;
+ [p1] = r1;
+
+ /*
+ * Release the core lock now, but keep IRQs disabled while we are
+ * performing the remaining housekeeping chores for the current CPU.
+ */
+ coreslot_loadaddr;
+ r1 = 0;
+ [p0] = r1;
+
+ /*
+ * If another CPU has owned the same atomic section before us,
+ * then our D-cached copy of the shared data protected by the
+ * current spin/write_lock may be obsolete.
+ */
+ cc = r6 == 0;
+ if cc jump .Lcache_synced
+
+ /*
+ * Invalidate the entire D-cache of the current core.
+ */
+ sp += -12;
+ call _resync_core_dcache
+ sp += 12;
+
+.Lcache_synced:
+ do_ssync;
+ sti r7;
+ ( r7:6 ) = [sp++];
+ rets = [sp++];
+ rts
+
+/*
+ * r0 = irqflags
+ * r1 = address of atomic data
+ *
+ * Clobbers: r2:0, p1:0
+ */
+_end_lock_coherent:
+
+ p1 = r1;
+ GET_CPUID(p0, r2);
+ r2 += 28;
+ r1 = 1;
+ r1 <<= r2;
+ r2 = [p1];
+ r2 = r1 | r2;
+ [p1] = r2;
+ r1 = p1;
+ jump _put_core_lock;
+
+#endif /* __ARCH_SYNC_CORE_DCACHE */
+
+/*
+ * r0 = &spinlock->lock
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_spin_is_locked_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+ call _get_core_lock;
+ r3 = [p1];
+ cc = bittst( r3, 0 );
+ r3 = cc;
+ r1 = p1;
+ call _put_core_lock;
+ rets = [sp++];
+ r0 = r3;
+ rts;
+
+/*
+ * r0 = &spinlock->lock
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_spin_lock_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+.Lretry_spinlock:
+ call _get_core_lock;
+ r1 = p1;
+ r2 = [p1];
+ cc = bittst( r2, 0 );
+ if cc jump .Lbusy_spinlock
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ r3 = p1;
+ bitset ( r2, 0 ); /* Raise the lock bit. */
+ [p1] = r2;
+ call _start_lock_coherent
+#else
+ r2 = 1;
+ [p1] = r2;
+ call _put_core_lock;
+#endif
+ rets = [sp++];
+ rts;
+
+.Lbusy_spinlock:
+ /* We don't touch the atomic area if busy, so that flush
+ will behave like nop in _put_core_lock. */
+ call _put_core_lock;
+ idlelize;
+ r0 = p1;
+ jump .Lretry_spinlock
+
+/*
+ * r0 = &spinlock->lock
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_spin_trylock_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+ call _get_core_lock;
+ r1 = p1;
+ r3 = [p1];
+ cc = bittst( r3, 0 );
+ if cc jump .Lfailed_trylock
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ bitset ( r3, 0 ); /* Raise the lock bit. */
+ [p1] = r3;
+ call _start_lock_coherent
+#else
+ r2 = 1;
+ [p1] = r2;
+ call _put_core_lock;
+#endif
+ r0 = 1;
+ rets = [sp++];
+ rts;
+.Lfailed_trylock:
+ call _put_core_lock;
+ r0 = 0;
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = &spinlock->lock
+ *
+ * Clobbers: r2:0, p1:0
+ */
+
+ENTRY(___raw_spin_unlock_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+ call _get_core_lock;
+ r2 = [p1];
+ bitclr ( r2, 0 );
+ [p1] = r2;
+ r1 = p1;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ call _end_lock_coherent
+#else
+ call _put_core_lock;
+#endif
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = &rwlock->lock
+ *
+ * Clobbers: r2:0, p1:0
+ */
+ENTRY(___raw_read_lock_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+ call _get_core_lock;
+.Lrdlock_try:
+ r1 = [p1];
+ r1 += -1;
+ [p1] = r1;
+ cc = r1 < 0;
+ if cc jump .Lrdlock_failed
+ r1 = p1;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ call _start_lock_coherent
+#else
+ call _put_core_lock;
+#endif
+ rets = [sp++];
+ rts;
+
+.Lrdlock_failed:
+ r1 += 1;
+ [p1] = r1;
+.Lrdlock_wait:
+ r1 = p1;
+ call _put_core_lock;
+ idlelize;
+ r0 = p1;
+ call _get_core_lock;
+ r1 = [p1];
+ cc = r1 < 2;
+ if cc jump .Lrdlock_wait;
+ jump .Lrdlock_try
+
+/*
+ * r0 = &rwlock->lock
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_read_trylock_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+ call _get_core_lock;
+ r1 = [p1];
+ cc = r1 <= 0;
+ if cc jump .Lfailed_tryrdlock;
+ r1 += -1;
+ [p1] = r1;
+ r1 = p1;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ call _start_lock_coherent
+#else
+ call _put_core_lock;
+#endif
+ rets = [sp++];
+ r0 = 1;
+ rts;
+.Lfailed_tryrdlock:
+ r1 = p1;
+ call _put_core_lock;
+ rets = [sp++];
+ r0 = 0;
+ rts;
+
+/*
+ * r0 = &rwlock->lock
+ *
+ * Note: Processing controlled by a reader lock should not have
+ * any side-effect on cache issues with the other core, so we
+ * just release the core lock and exit (no _end_lock_coherent).
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_read_unlock_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+ call _get_core_lock;
+ r1 = [p1];
+ r1 += 1;
+ [p1] = r1;
+ r1 = p1;
+ call _put_core_lock;
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = &rwlock->lock
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_write_lock_asm)
+
+ p1 = r0;
+ r3.l = lo(RW_LOCK_BIAS);
+ r3.h = hi(RW_LOCK_BIAS);
+ [--sp] = rets;
+ call _get_core_lock;
+.Lwrlock_try:
+ r1 = [p1];
+ r1 = r1 - r3;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ r2 = r1;
+ r2 <<= 4;
+ r2 >>= 4;
+ cc = r2 == 0;
+#else
+ cc = r1 == 0;
+#endif
+ if !cc jump .Lwrlock_wait
+ [p1] = r1;
+ r1 = p1;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ call _start_lock_coherent
+#else
+ call _put_core_lock;
+#endif
+ rets = [sp++];
+ rts;
+
+.Lwrlock_wait:
+ r1 = p1;
+ call _put_core_lock;
+ idlelize;
+ r0 = p1;
+ call _get_core_lock;
+ r1 = [p1];
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ r1 <<= 4;
+ r1 >>= 4;
+#endif
+ cc = r1 == r3;
+ if !cc jump .Lwrlock_wait;
+ jump .Lwrlock_try
+
+/*
+ * r0 = &rwlock->lock
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_write_trylock_asm)
+
+ p1 = r0;
+ [--sp] = rets;
+ call _get_core_lock;
+ r1 = [p1];
+ r2.l = lo(RW_LOCK_BIAS);
+ r2.h = hi(RW_LOCK_BIAS);
+ cc = r1 == r2;
+ if !cc jump .Lfailed_trywrlock;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ r1 >>= 28;
+ r1 <<= 28;
+#else
+ r1 = 0;
+#endif
+ [p1] = r1;
+ r1 = p1;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ call _start_lock_coherent
+#else
+ call _put_core_lock;
+#endif
+ rets = [sp++];
+ r0 = 1;
+ rts;
+
+.Lfailed_trywrlock:
+ r1 = p1;
+ call _put_core_lock;
+ rets = [sp++];
+ r0 = 0;
+ rts;
+
+/*
+ * r0 = &rwlock->lock
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_write_unlock_asm)
+
+ p1 = r0;
+ r3.l = lo(RW_LOCK_BIAS);
+ r3.h = hi(RW_LOCK_BIAS);
+ [--sp] = rets;
+ call _get_core_lock;
+ r1 = [p1];
+ r1 = r1 + r3;
+ [p1] = r1;
+ r1 = p1;
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ call _end_lock_coherent
+#else
+ call _put_core_lock;
+#endif
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = value
+ *
+ * Add a signed value to a 32bit word and return the new value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_atomic_update_asm)
+
+ p1 = r0;
+ r3 = r1;
+ [--sp] = rets;
+ call _get_core_lock;
+ r2 = [p1];
+ r3 = r3 + r2;
+ [p1] = r3;
+ r1 = p1;
+ call _put_core_lock;
+ r0 = r3;
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = mask
+ *
+ * Clear the mask bits from a 32bit word and return the old 32bit value
+ * atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_atomic_clear_asm)
+
+ p1 = r0;
+ r3 = ~r1;
+ [--sp] = rets;
+ call _get_core_lock;
+ r2 = [p1];
+ r3 = r2 & r3;
+ [p1] = r3;
+ r3 = r2;
+ r1 = p1;
+ call _put_core_lock;
+ r0 = r3;
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = mask
+ *
+ * Set the mask bits into a 32bit word and return the old 32bit value
+ * atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_atomic_set_asm)
+
+ p1 = r0;
+ r3 = r1;
+ [--sp] = rets;
+ call _get_core_lock;
+ r2 = [p1];
+ r3 = r2 | r3;
+ [p1] = r3;
+ r3 = r2;
+ r1 = p1;
+ call _put_core_lock;
+ r0 = r3;
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = mask
+ *
+ * XOR the mask bits with a 32bit word and return the old 32bit value
+ * atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_atomic_xor_asm)
+
+ p1 = r0;
+ r3 = r1;
+ [--sp] = rets;
+ call _get_core_lock;
+ r2 = [p1];
+ r3 = r2 ^ r3;
+ [p1] = r3;
+ r3 = r2;
+ r1 = p1;
+ call _put_core_lock;
+ r0 = r3;
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = mask
+ *
+ * Perform a logical AND between the mask bits and a 32bit word, and
+ * return the masked value. We need this on this architecture in
+ * order to invalidate the local cache before testing.
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_atomic_test_asm)
+
+ p1 = r0;
+ r3 = r1;
+ r1 = -L1_CACHE_BYTES;
+ r1 = r0 & r1;
+ p0 = r1;
+ flushinv[p0];
+ do_ssync;
+ r0 = [p1];
+ r0 = r0 & r3;
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = value
+ *
+ * Swap *ptr with value and return the old 32bit value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+#define __do_xchg(src, dst) \
+ p1 = r0; \
+ r3 = r1; \
+ [--sp] = rets; \
+ call _get_core_lock; \
+ r2 = src; \
+ dst = r3; \
+ r3 = r2; \
+ r1 = p1; \
+ call _put_core_lock; \
+ r0 = r3; \
+ rets = [sp++]; \
+ rts;
+
+ENTRY(___raw_xchg_1_asm)
+
+ __do_xchg(b[p1] (z), b[p1])
+
+ENTRY(___raw_xchg_2_asm)
+
+ __do_xchg(w[p1] (z), w[p1])
+
+ENTRY(___raw_xchg_4_asm)
+
+ __do_xchg([p1], [p1])
+
+/*
+ * r0 = ptr
+ * r1 = new
+ * r2 = old
+ *
+ * Swap *ptr with new if *ptr == old and return the previous *ptr
+ * value atomically.
+ *
+ * Clobbers: r3:0, p1:0
+ */
+#define __do_cmpxchg(src, dst) \
+ [--sp] = rets; \
+ [--sp] = r4; \
+ p1 = r0; \
+ r3 = r1; \
+ r4 = r2; \
+ call _get_core_lock; \
+ r2 = src; \
+ cc = r2 == r4; \
+ if !cc jump 1f; \
+ dst = r3; \
+ 1: r3 = r2; \
+ r1 = p1; \
+ call _put_core_lock; \
+ r0 = r3; \
+ r4 = [sp++]; \
+ rets = [sp++]; \
+ rts;
+
+ENTRY(___raw_cmpxchg_1_asm)
+
+ __do_cmpxchg(b[p1] (z), b[p1])
+
+ENTRY(___raw_cmpxchg_2_asm)
+
+ __do_cmpxchg(w[p1] (z), w[p1])
+
+ENTRY(___raw_cmpxchg_4_asm)
+
+ __do_cmpxchg([p1], [p1])
+
+/*
+ * r0 = ptr
+ * r1 = bitnr
+ *
+ * Set a bit in a 32bit word and return the old 32bit value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_bit_set_asm)
+
+ r2 = r1;
+ r1 = 1;
+ r1 <<= r2;
+ jump ___raw_atomic_set_asm
+
+/*
+ * r0 = ptr
+ * r1 = bitnr
+ *
+ * Clear a bit in a 32bit word and return the old 32bit value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_bit_clear_asm)
+
+ r2 = r1;
+ r1 = 1;
+ r1 <<= r2;
+ jump ___raw_atomic_clear_asm
+
+/*
+ * r0 = ptr
+ * r1 = bitnr
+ *
+ * Toggle a bit in a 32bit word and return the old 32bit value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_bit_toggle_asm)
+
+ r2 = r1;
+ r1 = 1;
+ r1 <<= r2;
+ jump ___raw_atomic_xor_asm
+
+/*
+ * r0 = ptr
+ * r1 = bitnr
+ *
+ * Test-and-set a bit in a 32bit word and return the old bit value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_bit_test_set_asm)
+
+ [--sp] = rets;
+ [--sp] = r1;
+ call ___raw_bit_set_asm
+ r1 = [sp++];
+ r2 = 1;
+ r2 <<= r1;
+ r0 = r0 & r2;
+ cc = r0 == 0;
+ if cc jump 1f
+ r0 = 1;
+1:
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = bitnr
+ *
+ * Test-and-clear a bit in a 32bit word and return the old bit value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_bit_test_clear_asm)
+
+ [--sp] = rets;
+ [--sp] = r1;
+ call ___raw_bit_clear_asm
+ r1 = [sp++];
+ r2 = 1;
+ r2 <<= r1;
+ r0 = r0 & r2;
+ cc = r0 == 0;
+ if cc jump 1f
+ r0 = 1;
+1:
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = bitnr
+ *
+ * Test-and-toggle a bit in a 32bit word,
+ * and return the old bit value atomically.
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_bit_test_toggle_asm)
+
+ [--sp] = rets;
+ [--sp] = r1;
+ call ___raw_bit_toggle_asm
+ r1 = [sp++];
+ r2 = 1;
+ r2 <<= r1;
+ r0 = r0 & r2;
+ cc = r0 == 0;
+ if cc jump 1f
+ r0 = 1;
+1:
+ rets = [sp++];
+ rts;
+
+/*
+ * r0 = ptr
+ * r1 = bitnr
+ *
+ * Test a bit in a 32bit word and return its value.
+ * We need this on this architecture in order to invalidate
+ * the local cache before testing.
+ *
+ * Clobbers: r3:0, p1:0
+ */
+ENTRY(___raw_bit_test_asm)
+
+ r2 = r1;
+ r1 = 1;
+ r1 <<= r2;
+ jump ___raw_atomic_test_asm
+
+/*
+ * r0 = ptr
+ *
+ * Fetch and return an uncached 32bit value.
+ *
+ * Clobbers: r2:0, p1:0
+ */
+ENTRY(___raw_uncached_fetch_asm)
+
+ p1 = r0;
+ r1 = -L1_CACHE_BYTES;
+ r1 = r0 & r1;
+ p0 = r1;
+ flushinv[p0];
+ do_ssync;
+ r0 = [p1];
+ rts;
diff --git a/arch/blackfin/mach-bf561/include/mach/blackfin.h b/arch/blackfin/mach-bf561/include/mach/blackfin.h
index 0ea8666..f79f662 100644
--- a/arch/blackfin/mach-bf561/include/mach/blackfin.h
+++ b/arch/blackfin/mach-bf561/include/mach/blackfin.h
@@ -66,8 +66,12 @@

#define bfin_read_SIC_IMASK(x) bfin_read32(SICA_IMASK0 + (x << 2))
#define bfin_write_SIC_IMASK(x, val) bfin_write32((SICA_IMASK0 + (x << 2)), val)
+#define bfin_read_SICB_IMASK(x) bfin_read32(SICB_IMASK0 + (x << 2))
+#define bfin_write_SICB_IMASK(x, val) bfin_write32((SICB_IMASK0 + (x << 2)), val)
#define bfin_read_SIC_ISR(x) bfin_read32(SICA_ISR0 + (x << 2))
#define bfin_write_SIC_ISR(x, val) bfin_write32((SICA_ISR0 + (x << 2)), val)
+#define bfin_read_SICB_ISR(x) bfin_read32(SICB_ISR0 + (x << 2))
+#define bfin_write_SICB_ISR(x, val) bfin_write32((SICB_ISR0 + (x << 2)), val)

#define BFIN_UART_NR_PORTS 1

diff --git a/arch/blackfin/mach-bf561/include/mach/defBF561.h b/arch/blackfin/mach-bf561/include/mach/defBF561.h
index 4eca202..d7c5097 100644
--- a/arch/blackfin/mach-bf561/include/mach/defBF561.h
+++ b/arch/blackfin/mach-bf561/include/mach/defBF561.h
@@ -912,6 +912,9 @@
#define ACTIVE_PLLDISABLED 0x0004 /* Processor In Active Mode With PLL Disabled */
#define PLL_LOCKED 0x0020 /* PLL_LOCKCNT Has Been Reached */

+/* SICA_SYSCR Masks */
+#define COREB_SRAM_INIT 0x0020
+
/* SWRST Mask */
#define SYSTEM_RESET 0x0007 /* Initiates a system software reset */
#define DOUBLE_FAULT_A 0x0008 /* Core A Double Fault Causes Reset */
diff --git a/arch/blackfin/mach-bf561/include/mach/mem_map.h b/arch/blackfin/mach-bf561/include/mach/mem_map.h
index f1d4c06..488c3bd 100644
--- a/arch/blackfin/mach-bf561/include/mach/mem_map.h
+++ b/arch/blackfin/mach-bf561/include/mach/mem_map.h
@@ -85,4 +85,124 @@
#define L1_SCRATCH_START COREA_L1_SCRATCH_START
#define L1_SCRATCH_LENGTH 0x1000

+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_SMP
+
+#define get_l1_scratch_start_cpu(cpu) \
+ ({ unsigned long __addr; \
+ __addr = (cpu) ? COREB_L1_SCRATCH_START : COREA_L1_SCRATCH_START;\
+ __addr; })
+
+#define get_l1_code_start_cpu(cpu) \
+ ({ unsigned long __addr; \
+ __addr = (cpu) ? COREB_L1_CODE_START : COREA_L1_CODE_START; \
+ __addr; })
+
+#define get_l1_data_a_start_cpu(cpu) \
+ ({ unsigned long __addr; \
+ __addr = (cpu) ? COREB_L1_DATA_A_START : COREA_L1_DATA_A_START;\
+ __addr; })
+
+#define get_l1_data_b_start_cpu(cpu) \
+ ({ unsigned long __addr; \
+ __addr = (cpu) ? COREB_L1_DATA_B_START : COREA_L1_DATA_B_START;\
+ __addr; })
+
+#define get_l1_scratch_start() get_l1_scratch_start_cpu(blackfin_core_id())
+#define get_l1_code_start() get_l1_code_start_cpu(blackfin_core_id())
+#define get_l1_data_a_start() get_l1_data_a_start_cpu(blackfin_core_id())
+#define get_l1_data_b_start() get_l1_data_b_start_cpu(blackfin_core_id())
+
+#else /* !CONFIG_SMP */
+#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
+#define get_l1_code_start_cpu(cpu) L1_CODE_START
+#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
+#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
+#define get_l1_scratch_start() L1_SCRATCH_START
+#define get_l1_code_start() L1_CODE_START
+#define get_l1_data_a_start() L1_DATA_A_START
+#define get_l1_data_b_start() L1_DATA_B_START
+#endif /* !CONFIG_SMP */
+
+#else /* __ASSEMBLY__ */
+
+/*
+ * The following macros both return the address of the PDA for the
+ * current core.
+ *
+ * In its first safe (and hairy) form, the macro neither clobbers any
+ * register aside from the output Preg, nor uses the stack, since it
+ * could be called with an invalid stack pointer, or the current stack
+ * space being uncovered by any CPLB (e.g. early exception handling).
+ *
+ * The constraints on the second form are a bit relaxed, and the code
+ * is allowed to use the specified Dreg for determining the PDA
+ * address to be returned into Preg.
+ */
+#ifdef CONFIG_SMP
+#define GET_PDA_SAFE(preg) \
+ preg.l = lo(DSPID); \
+ preg.h = hi(DSPID); \
+ preg = [preg]; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ preg = preg << 2; \
+ if cc jump 2f; \
+ cc = preg == 0x0; \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda; \
+ if !cc jump 3f; \
+1: \
+ /* preg = 0x0; */ \
+ cc = !cc; /* restore cc to 0 */ \
+ jump 4f; \
+2: \
+ cc = preg == 0x0; \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda; \
+ if cc jump 4f; \
+ /* preg = 0x1000000; */ \
+ cc = !cc; /* restore cc to 1 */ \
+3: \
+ preg = [preg]; \
+4:
+
+#define GET_PDA(preg, dreg) \
+ preg.l = lo(DSPID); \
+ preg.h = hi(DSPID); \
+ dreg = [preg]; \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda; \
+ cc = bittst(dreg, 0); \
+ if !cc jump 1f; \
+ preg = [preg]; \
+1: \
+
+#define GET_CPUID(preg, dreg) \
+ preg.l = lo(DSPID); \
+ preg.h = hi(DSPID); \
+ dreg = [preg]; \
+ dreg = ROT dreg BY -1; \
+ dreg = CC;
+
+#else
+#define GET_PDA_SAFE(preg) \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda;
+
+#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
+#endif /* CONFIG_SMP */
+
+#endif /* __ASSEMBLY__ */
+
#endif /* _MEM_MAP_533_H_ */
diff --git a/arch/blackfin/mach-bf561/include/mach/smp.h b/arch/blackfin/mach-bf561/include/mach/smp.h
new file mode 100644
index 0000000..f9e65eb
--- /dev/null
+++ b/arch/blackfin/mach-bf561/include/mach/smp.h
@@ -0,0 +1,22 @@
+#ifndef _MACH_BF561_SMP
+#define _MACH_BF561_SMP
+
+struct task_struct;
+
+void platform_init_cpus(void);
+
+void platform_prepare_cpus(unsigned int max_cpus);
+
+int platform_boot_secondary(unsigned int cpu, struct task_struct *idle);
+
+void platform_secondary_init(unsigned int cpu);
+
+void platform_request_ipi(int (*handler)(int, void *));
+
+void platform_send_ipi(cpumask_t callmap);
+
+void platform_send_ipi_cpu(unsigned int cpu);
+
+void platform_clear_ipi(unsigned int cpu);
+
+#endif /* !_MACH_BF561_SMP */
diff --git a/arch/blackfin/mach-bf561/secondary.S b/arch/blackfin/mach-bf561/secondary.S
new file mode 100644
index 0000000..3532404
--- /dev/null
+++ b/arch/blackfin/mach-bf561/secondary.S
@@ -0,0 +1,215 @@
+/*
+ * File: arch/blackfin/mach-bf561/secondary.S
+ * Based on: arch/blackfin/mach-bf561/head.S
+ * Author: Philippe Gerum <[email protected]>
+ *
+ * Copyright 2007 Analog Devices Inc.
+ *
+ * Description: BF561 coreB bootstrap file
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see the file COPYING, or write
+ * to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/linkage.h>
+#include <linux/init.h>
+#include <asm/blackfin.h>
+#include <asm/asm-offsets.h>
+
+__INIT
+
+/* Lay the initial stack into the L1 scratch area of Core B */
+#define INITIAL_STACK 0xFF701000
+
+ENTRY(_coreb_trampoline_start)
+ /* Set the SYSCFG register */
+ R0 = 0x36;
+ SYSCFG = R0; /*Enable Cycle Counter and Nesting Of Interrupts(3rd Bit)*/
+ R0 = 0;
+
+ /*Clear Out All the data and pointer Registers*/
+ R1 = R0;
+ R2 = R0;
+ R3 = R0;
+ R4 = R0;
+ R5 = R0;
+ R6 = R0;
+ R7 = R0;
+
+ P0 = R0;
+ P1 = R0;
+ P2 = R0;
+ P3 = R0;
+ P4 = R0;
+ P5 = R0;
+
+ LC0 = r0;
+ LC1 = r0;
+ L0 = r0;
+ L1 = r0;
+ L2 = r0;
+ L3 = r0;
+
+ /* Clear Out All the DAG Registers*/
+ B0 = r0;
+ B1 = r0;
+ B2 = r0;
+ B3 = r0;
+
+ I0 = r0;
+ I1 = r0;
+ I2 = r0;
+ I3 = r0;
+
+ M0 = r0;
+ M1 = r0;
+ M2 = r0;
+ M3 = r0;
+
+ /* Turn off the icache */
+ p0.l = LO(IMEM_CONTROL);
+ p0.h = HI(IMEM_CONTROL);
+ R1 = [p0];
+ R0 = ~ENICPLB;
+ R0 = R0 & R1;
+
+ /* Anomaly 05000125 */
+#ifdef ANOMALY_05000125
+ CLI R2;
+ SSYNC;
+#endif
+ [p0] = R0;
+ SSYNC;
+#ifdef ANOMALY_05000125
+ STI R2;
+#endif
+
+ /* Turn off the dcache */
+ p0.l = LO(DMEM_CONTROL);
+ p0.h = HI(DMEM_CONTROL);
+ R1 = [p0];
+ R0 = ~ENDCPLB;
+ R0 = R0 & R1;
+
+ /* Anomaly 05000125 */
+#ifdef ANOMALY_05000125
+ CLI R2;
+ SSYNC;
+#endif
+ [p0] = R0;
+ SSYNC;
+#ifdef ANOMALY_05000125
+ STI R2;
+#endif
+
+ /* in case of double faults, save a few things */
+ p0.l = _init_retx_coreb;
+ p0.h = _init_retx_coreb;
+ R0 = RETX;
+ [P0] = R0;
+
+#ifdef CONFIG_DEBUG_DOUBLEFAULT
+ /* Only save these if we are storing them,
+ * This happens here, since L1 gets clobbered
+ * below
+ */
+ GET_PDA(p0, r0);
+ r7 = [p0 + PDA_RETX];
+ p1.l = _init_saved_retx_coreb;
+ p1.h = _init_saved_retx_coreb;
+ [p1] = r7;
+
+ r7 = [p0 + PDA_DCPLB];
+ p1.l = _init_saved_dcplb_fault_addr_coreb;
+ p1.h = _init_saved_dcplb_fault_addr_coreb;
+ [p1] = r7;
+
+ r7 = [p0 + PDA_ICPLB];
+ p1.l = _init_saved_icplb_fault_addr_coreb;
+ p1.h = _init_saved_icplb_fault_addr_coreb;
+ [p1] = r7;
+
+ r7 = [p0 + PDA_SEQSTAT];
+ p1.l = _init_saved_seqstat_coreb;
+ p1.h = _init_saved_seqstat_coreb;
+ [p1] = r7;
+#endif
+
+ /* Initialize stack pointer */
+ sp.l = lo(INITIAL_STACK);
+ sp.h = hi(INITIAL_STACK);
+ fp = sp;
+ usp = sp;
+
+ /* This section keeps the processor in supervisor mode
+ * during core B startup. Branches to the idle task.
+ */
+
+ /* EVT15 = _coreb_start */
+
+ p0.l = lo(EVT15);
+ p0.h = hi(EVT15);
+ p1.l = _coreb_start;
+ p1.h = _coreb_start;
+ [p0] = p1;
+ csync;
+
+ p0.l = lo(IMASK);
+ p0.h = hi(IMASK);
+ p1.l = IMASK_IVG15;
+ p1.h = 0x0;
+ [p0] = p1;
+ csync;
+
+ raise 15;
+ p0.l = .LWAIT_HERE;
+ p0.h = .LWAIT_HERE;
+ reti = p0;
+#if defined(ANOMALY_05000281)
+ nop; nop; nop;
+#endif
+ rti;
+
+.LWAIT_HERE:
+ jump .LWAIT_HERE;
+
+.align 4
+ENTRY(_coreb_trampoline_end)
+
+ENTRY(_coreb_start)
+ [--sp] = reti;
+
+ p0.l = lo(WDOGB_CTL);
+ p0.h = hi(WDOGB_CTL);
+ r0 = 0xAD6(z);
+ w[p0] = r0; /* Clear the watchdog. */
+ ssync;
+
+ /*
+ * switch to IDLE stack.
+ */
+ p0.l = _secondary_stack;
+ p0.h = _secondary_stack;
+ sp = [p0];
+ usp = sp;
+ fp = sp;
+ sp += -12;
+ call _init_pda
+ sp += 12;
+ call _secondary_start_kernel;
+.L_exit:
+ jump.s .L_exit;
+
+__FINIT
diff --git a/arch/blackfin/mach-bf561/smp.c b/arch/blackfin/mach-bf561/smp.c
new file mode 100644
index 0000000..ba21051
--- /dev/null
+++ b/arch/blackfin/mach-bf561/smp.c
@@ -0,0 +1,182 @@
+/*
+ * File: arch/blackfin/mach-bf561/smp.c
+ * Author: Philippe Gerum <[email protected]>
+ *
+ * Copyright 2007 Analog Devices Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see the file COPYING, or write
+ * to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/delay.h>
+#include <asm/smp.h>
+#include <asm/dma.h>
+
+#define COREB_SRAM_BASE 0xff600000
+#define COREB_SRAM_SIZE 0x4000
+
+extern char coreb_trampoline_start, coreb_trampoline_end;
+
+static DEFINE_SPINLOCK(boot_lock);
+
+static cpumask_t cpu_callin_map;
+
+/*
+ * platform_init_cpus() - Tell the world about how many cores we
+ * have. This is called while setting up the architecture support
+ * (setup_arch()), so don't be too demanding here with respect to
+ * available kernel services.
+ */
+
+void __init platform_init_cpus(void)
+{
+ cpu_set(0, cpu_possible_map); /* CoreA */
+ cpu_set(1, cpu_possible_map); /* CoreB */
+}
+
+void __init platform_prepare_cpus(unsigned int max_cpus)
+{
+ int len;
+
+ len = &coreb_trampoline_end - &coreb_trampoline_start + 1;
+
+ if (len > COREB_SRAM_SIZE) {
+ /* Paranoid. */
+ printk(KERN_ERR "Bootstrap code size (%d) > CoreB SRAM (%d).\n",
+ len, COREB_SRAM_SIZE);
+ return;
+ }
+
+ dma_memcpy((void *)COREB_SRAM_BASE, &coreb_trampoline_start, len);
+
+ /* Both cores ought to be present on a bf561! */
+ cpu_set(0, cpu_present_map); /* CoreA */
+ cpu_set(1, cpu_present_map); /* CoreB */
+
+ printk(KERN_INFO "CoreB bootstrap code to SRAM %p via DMA.\n", (void *)COREB_SRAM_BASE);
+}
+
+int __init setup_profiling_timer(unsigned int multiplier) /* not supported */
+{
+ return -EINVAL;
+}
+
+void __cpuinit platform_secondary_init(unsigned int cpu)
+{
+ local_irq_disable();
+
+ /* Clone setup for peripheral interrupt sources from CoreA. */
+ bfin_write_SICB_IMASK0(bfin_read_SICA_IMASK0());
+ bfin_write_SICB_IMASK1(bfin_read_SICA_IMASK1());
+ SSYNC();
+
+ /* Clone setup for IARs from CoreA. */
+ bfin_write_SICB_IAR0(bfin_read_SICA_IAR0());
+ bfin_write_SICB_IAR1(bfin_read_SICA_IAR1());
+ bfin_write_SICB_IAR2(bfin_read_SICA_IAR2());
+ bfin_write_SICB_IAR3(bfin_read_SICA_IAR3());
+ bfin_write_SICB_IAR4(bfin_read_SICA_IAR4());
+ bfin_write_SICB_IAR5(bfin_read_SICA_IAR5());
+ bfin_write_SICB_IAR6(bfin_read_SICA_IAR6());
+ bfin_write_SICB_IAR7(bfin_read_SICA_IAR7());
+ SSYNC();
+
+ local_irq_enable();
+
+ /* Calibrate loops per jiffy value. */
+ calibrate_delay();
+
+ /* Store CPU-private information to the cpu_data array. */
+ bfin_setup_cpudata(cpu);
+
+ /* We are done with local CPU inits, unblock the boot CPU. */
+ cpu_set(cpu, cpu_callin_map);
+ spin_lock(&boot_lock);
+ spin_unlock(&boot_lock);
+}
+
+int __cpuinit platform_boot_secondary(unsigned int cpu, struct task_struct *idle)
+{
+ unsigned long timeout;
+
+ if ((bfin_read_SICA_SYSCR() & COREB_SRAM_INIT) == 0)
+ return -EBUSY; /* CoreB already running?! */
+
+ printk(KERN_INFO "Booting Core B.\n");
+
+ spin_lock(&boot_lock);
+
+ /* Kick CoreB, which should start execution from COREB_SRAM_BASE. */
+ SSYNC();
+ bfin_write_SICA_SYSCR(bfin_read_SICA_SYSCR() & ~COREB_SRAM_INIT);
+ SSYNC();
+
+ timeout = jiffies + 1 * HZ;
+ while (time_before(jiffies, timeout)) {
+ if (cpu_isset(cpu, cpu_callin_map))
+ break;
+ udelay(100);
+ barrier();
+ }
+
+ spin_unlock(&boot_lock);
+
+ return cpu_isset(cpu, cpu_callin_map) ? 0 : -ENOSYS;
+}
+
+void platform_request_ipi(irq_handler_t handler)
+{
+ int ret;
+
+ ret = request_irq(IRQ_SUPPLE_0, handler, IRQF_DISABLED,
+ "SMP interrupt", handler);
+ if (ret)
+ panic("Cannot request supplemental interrupt 0 for IPI service\n");
+}
+
+void platform_send_ipi(cpumask_t callmap)
+{
+ unsigned int cpu;
+
+ for_each_cpu_mask(cpu, callmap) {
+ if (likely(cpu < 2)) {
+ SSYNC();
+ bfin_write_SICB_SYSCR(bfin_read_SICB_SYSCR() | (1 << (6 + cpu)));
+ SSYNC();
+ }
+ }
+}
+
+void platform_send_ipi_cpu(unsigned int cpu)
+{
+
+ if (likely(cpu < 2)) {
+ SSYNC();
+ bfin_write_SICB_SYSCR(bfin_read_SICB_SYSCR() | (1 << (6 + cpu)));
+ SSYNC();
+ }
+}
+
+void platform_clear_ipi(unsigned int cpu)
+{
+ if (likely(cpu < 2)) {
+ SSYNC();
+ bfin_write_SICB_SYSCR(bfin_read_SICB_SYSCR() | (1 << (10 + cpu)));
+ SSYNC();
+ }
+}
--
1.5.6.3
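
Note on the IPI bits: platform_send_ipi_cpu() and platform_clear_ipi() in smp.c above each OR a single bit into SICB_SYSCR, at position 6 + cpu to raise the supplemental interrupt and at 10 + cpu to clear it. The bit arithmetic can be sketched in plain user-space C as below. The helper names and the NR_CORES constant are hypothetical and not part of the patch; only the offsets 6 and 10 and the cpu < 2 check are taken from the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the offsets mirror the shifts used in smp.c. */
#define NR_CORES        2
#define IPI_RAISE_SHIFT 6
#define IPI_CLEAR_SHIFT 10

/* Bit that smp.c ORs into SICB_SYSCR to raise the IPI for a core. */
static uint32_t ipi_raise_bit(unsigned int cpu)
{
	return cpu < NR_CORES ? (1u << (IPI_RAISE_SHIFT + cpu)) : 0;
}

/* Bit that smp.c ORs into SICB_SYSCR to clear the IPI for a core. */
static uint32_t ipi_clear_bit(unsigned int cpu)
{
	return cpu < NR_CORES ? (1u << (IPI_CLEAR_SHIFT + cpu)) : 0;
}

/* Small usage example: CoreA is cpu 0, CoreB is cpu 1. */
static void self_check(void)
{
	assert(ipi_raise_bit(0) == 0x040);
	assert(ipi_raise_bit(1) == 0x080);
	assert(ipi_clear_bit(0) == 0x400);
	assert(ipi_clear_bit(1) == 0x800);
}
```

An out-of-range cpu yields 0 here, which matches the patch silently skipping the register write when cpu >= 2.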

2008-11-18 09:04:49

by Bryan Wu

Subject: [PATCH 3/5] Blackfin arch: SMP supporting patchset: Blackfin CPLB related code

From: Graf Yang <[email protected]>

The Blackfin dual-core BF561 processor can support SMP-like features.
https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like

This patch extends the Blackfin CPLB related code for SMP.

Signed-off-by: Graf Yang <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
---
arch/blackfin/include/asm/cplb-mpu.h | 15 ++--
arch/blackfin/include/asm/cplb.h | 21 +++---
arch/blackfin/include/asm/cplbinit.h | 57 ++++++++++++---
arch/blackfin/include/asm/mmu_context.h | 27 +++++--
arch/blackfin/kernel/cplb-mpu/cacheinit.c | 4 +-
arch/blackfin/kernel/cplb-mpu/cplbinfo.c | 43 +++++++----
arch/blackfin/kernel/cplb-mpu/cplbinit.c | 43 ++++++------
arch/blackfin/kernel/cplb-mpu/cplbmgr.c | 102 ++++++++++++++-------------
arch/blackfin/kernel/cplb-nompu/cacheinit.c | 9 ++-
arch/blackfin/kernel/cplb-nompu/cplbinfo.c | 55 +++++++++------
arch/blackfin/kernel/cplb-nompu/cplbinit.c | 89 +++++++++---------------
arch/blackfin/kernel/cplb-nompu/cplbmgr.S | 29 ++++----
12 files changed, 275 insertions(+), 219 deletions(-)
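
Note on the per-CPU replacement state: in cplb-mpu/cplbmgr.c below, the CPLB tables and the round-robin counters become arrays indexed by CPU, so each core evicts entries from its own table only. What follows is a minimal user-space model of evict_one_icplb() under that scheme, not the kernel code itself. FIRST_SWITCHED, the value 4, the helper fill_switched() and the shortened names tbl and rr_index are assumptions made for illustration; the eviction logic follows the patch.

```c
#include <assert.h>

#define NR_CPUS        2
#define MAX_CPLBS      16
#define CPLB_VALID     0x1UL	/* model value for the valid bit */
#define FIRST_SWITCHED 4	/* assumed index of the first switched slot */

struct cplb_entry {
	unsigned long data, addr;
};

static struct cplb_entry tbl[NR_CPUS][MAX_CPLBS];
static int rr_index[NR_CPUS];

/* Return a free slot if one exists, else the next round-robin victim. */
static int evict_one(unsigned int cpu)
{
	int i;

	for (i = FIRST_SWITCHED; i < MAX_CPLBS; i++)
		if ((tbl[cpu][i].data & CPLB_VALID) == 0)
			return i;
	i = FIRST_SWITCHED + rr_index[cpu];
	if (i >= MAX_CPLBS) {
		i -= MAX_CPLBS - FIRST_SWITCHED;
		rr_index[cpu] -= MAX_CPLBS - FIRST_SWITCHED;
	}
	rr_index[cpu]++;
	return i;
}

/* Hypothetical helper: mark every switched slot of one CPU as valid. */
static void fill_switched(unsigned int cpu)
{
	int i;

	for (i = FIRST_SWITCHED; i < MAX_CPLBS; i++)
		tbl[cpu][i].data = CPLB_VALID;
}

/* Usage example: filling CPU0's table leaves CPU1's state untouched. */
static void self_check(void)
{
	fill_switched(0);
	assert(evict_one(0) == FIRST_SWITCHED);
	assert(evict_one(1) == FIRST_SWITCHED);
	assert(rr_index[1] == 0);
}
```

Because CPU1 still has free slots in the usage example, its counter does not advance while CPU0 cycles through its victims, which is the isolation the [NR_CPUS] dimension is meant to give.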

diff --git a/arch/blackfin/include/asm/cplb-mpu.h b/arch/blackfin/include/asm/cplb-mpu.h
index 75c67b9..80680ad 100644
--- a/arch/blackfin/include/asm/cplb-mpu.h
+++ b/arch/blackfin/include/asm/cplb-mpu.h
@@ -28,6 +28,7 @@
*/
#ifndef __ASM_BFIN_CPLB_MPU_H
#define __ASM_BFIN_CPLB_MPU_H
+#include <linux/threads.h>

struct cplb_entry {
unsigned long data, addr;
@@ -39,22 +40,22 @@ struct mem_region {
unsigned long icplb_data;
};

-extern struct cplb_entry dcplb_tbl[MAX_CPLBS];
-extern struct cplb_entry icplb_tbl[MAX_CPLBS];
+extern struct cplb_entry dcplb_tbl[NR_CPUS][MAX_CPLBS];
+extern struct cplb_entry icplb_tbl[NR_CPUS][MAX_CPLBS];
extern int first_switched_icplb;
extern int first_mask_dcplb;
extern int first_switched_dcplb;

-extern int nr_dcplb_miss, nr_icplb_miss, nr_icplb_supv_miss, nr_dcplb_prot;
-extern int nr_cplb_flush;
+extern int nr_dcplb_miss[], nr_icplb_miss[], nr_icplb_supv_miss[];
+extern int nr_dcplb_prot[], nr_cplb_flush[];

extern int page_mask_order;
extern int page_mask_nelts;

-extern unsigned long *current_rwx_mask;
+extern unsigned long *current_rwx_mask[NR_CPUS];

-extern void flush_switched_cplbs(void);
-extern void set_mask_dcplbs(unsigned long *);
+extern void flush_switched_cplbs(unsigned int);
+extern void set_mask_dcplbs(unsigned long *, unsigned int);

extern void __noreturn panic_cplb_error(int seqstat, struct pt_regs *);

diff --git a/arch/blackfin/include/asm/cplb.h b/arch/blackfin/include/asm/cplb.h
index 9e8b403..5f7545d 100644
--- a/arch/blackfin/include/asm/cplb.h
+++ b/arch/blackfin/include/asm/cplb.h
@@ -30,7 +30,6 @@
#ifndef _CPLB_H
#define _CPLB_H

-#include <asm/blackfin.h>
#include <mach/anomaly.h>

#define SDRAM_IGENERIC (CPLB_L1_CHBL | CPLB_USER_RD | CPLB_VALID | CPLB_PORTPRIO)
@@ -55,13 +54,24 @@
#endif

#define L1_DMEMORY (CPLB_LOCK | CPLB_COMMON)
+
+#ifdef CONFIG_SMP
+#define L2_ATTR (INITIAL_T | I_CPLB | D_CPLB)
+#define L2_IMEMORY (CPLB_COMMON | CPLB_LOCK)
+#define L2_DMEMORY (CPLB_COMMON | CPLB_LOCK)
+
+#else
#ifdef CONFIG_BFIN_L2_CACHEABLE
#define L2_IMEMORY (SDRAM_IGENERIC)
#define L2_DMEMORY (SDRAM_DGENERIC)
#else
#define L2_IMEMORY (CPLB_COMMON)
#define L2_DMEMORY (CPLB_COMMON)
-#endif
+#endif /* CONFIG_BFIN_L2_CACHEABLE */
+
+#define L2_ATTR (INITIAL_T | SWITCH_T | I_CPLB | D_CPLB)
+#endif /* CONFIG_SMP */
+
#define SDRAM_DNON_CHBL (CPLB_COMMON)
#define SDRAM_EBIU (CPLB_COMMON)
#define SDRAM_OOPS (CPLB_VALID | ANOMALY_05000158_WORKAROUND | CPLB_LOCK | CPLB_DIRTY)
@@ -71,14 +81,7 @@
#define SIZE_1M 0x00100000 /* 1M */
#define SIZE_4M 0x00400000 /* 4M */

-#ifdef CONFIG_MPU
#define MAX_CPLBS 16
-#else
-#define MAX_CPLBS (16 * 2)
-#endif
-
-#define ASYNC_MEMORY_CPLB_COVERAGE ((ASYNC_BANK0_SIZE + ASYNC_BANK1_SIZE + \
- ASYNC_BANK2_SIZE + ASYNC_BANK3_SIZE) / SIZE_4M)

#define CPLB_ENABLE_ICACHE_P 0
#define CPLB_ENABLE_DCACHE_P 1
diff --git a/arch/blackfin/include/asm/cplbinit.h b/arch/blackfin/include/asm/cplbinit.h
index f845b41..6bfc257 100644
--- a/arch/blackfin/include/asm/cplbinit.h
+++ b/arch/blackfin/include/asm/cplbinit.h
@@ -36,6 +36,8 @@
#ifdef CONFIG_MPU

#include <asm/cplb-mpu.h>
+extern void bfin_icache_init(struct cplb_entry *icplb_tbl);
+extern void bfin_dcache_init(struct cplb_entry *dcplb_tbl);

#else

@@ -46,8 +48,40 @@

#define IN_KERNEL 1

-enum
-{ZERO_P, L1I_MEM, L1D_MEM, SDRAM_KERN , SDRAM_RAM_MTD, SDRAM_DMAZ, RES_MEM, ASYNC_MEM, L2_MEM};
+#define ASYNC_MEMORY_CPLB_COVERAGE ((ASYNC_BANK0_SIZE + ASYNC_BANK1_SIZE + \
+ ASYNC_BANK2_SIZE + ASYNC_BANK3_SIZE) / SIZE_4M)
+
+#define CPLB_MEM CONFIG_MAX_MEM_SIZE
+
+/*
+* Number of required data CPLB switchtable entries
+* MEMSIZE / 4 (we mostly install 4M page size CPLBs)
+* approx 16 for smaller 1MB page size CPLBs for alignment purposes
+* 1 for L1 Data Memory
+* possibly 1 for L2 Data Memory
+* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
+* 1 for ASYNC Memory
+*/
+#define MAX_SWITCH_D_CPLBS (((CPLB_MEM / 4) + 16 + 1 + 1 + 1 \
+ + ASYNC_MEMORY_CPLB_COVERAGE) * 2)
+
+/*
+* Number of required instruction CPLB switchtable entries
+* MEMSIZE / 4 (we mostly install 4M page size CPLBs)
+* approx 12 for smaller 1MB page size CPLBs for alignment purposes
+* 1 for L1 Instruction Memory
+* possibly 1 for L2 Instruction Memory
+* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
+*/
+#define MAX_SWITCH_I_CPLBS (((CPLB_MEM / 4) + 12 + 1 + 1 + 1) * 2)
+
+/* Number of CPLB table entries, used for cplb-nompu. */
+#define CPLB_TBL_ENTRIES (16 * 4)
+
+enum {
+ ZERO_P, L1I_MEM, L1D_MEM, L2_MEM, SDRAM_KERN, SDRAM_RAM_MTD, SDRAM_DMAZ,
+ RES_MEM, ASYNC_MEM, OCB_ROM
+};

struct cplb_desc {
u32 start; /* start address */
@@ -66,8 +100,8 @@ struct cplb_tab {
u16 size;
};

-extern u_long icplb_table[];
-extern u_long dcplb_table[];
+extern u_long icplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];
+extern u_long dcplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];

/* Till here we are discussing about the static memory management model.
* However, the operating envoronments commonly define more CPLB
@@ -78,15 +112,18 @@ extern u_long dcplb_table[];
* This is how Page descriptor Table is implemented in uClinux/Blackfin.
*/

-extern u_long ipdt_table[];
-extern u_long dpdt_table[];
+extern u_long ipdt_tables[NR_CPUS][MAX_SWITCH_I_CPLBS+1];
+extern u_long dpdt_tables[NR_CPUS][MAX_SWITCH_D_CPLBS+1];
#ifdef CONFIG_CPLB_INFO
-extern u_long ipdt_swapcount_table[];
-extern u_long dpdt_swapcount_table[];
+extern u_long ipdt_swapcount_tables[NR_CPUS][MAX_SWITCH_I_CPLBS];
+extern u_long dpdt_swapcount_tables[NR_CPUS][MAX_SWITCH_D_CPLBS];
#endif
+extern void bfin_icache_init(u_long icplbs[]);
+extern void bfin_dcache_init(u_long dcplbs[]);

#endif /* CONFIG_MPU */

-extern void generate_cplb_tables(void);
-
+#if defined(CONFIG_BFIN_DCACHE) || defined(CONFIG_BFIN_ICACHE)
+extern void generate_cplb_tables_cpu(unsigned int cpu);
+#endif
#endif
diff --git a/arch/blackfin/include/asm/mmu_context.h b/arch/blackfin/include/asm/mmu_context.h
index 35593dd..944e29f 100644
--- a/arch/blackfin/include/asm/mmu_context.h
+++ b/arch/blackfin/include/asm/mmu_context.h
@@ -37,6 +37,10 @@
#include <asm/pgalloc.h>
#include <asm/cplbinit.h>

+/* Note: L1 stacks are CPU-private things, so we bluntly disable this
+ feature in SMP mode, and use the per-CPU scratch SRAM bank only to
+ store the PDA instead. */
+
extern void *current_l1_stack_save;
extern int nr_l1stack_tasks;
extern void *l1_stack_base;
@@ -88,12 +92,15 @@ activate_l1stack(struct mm_struct *mm, unsigned long sp_base)
static inline void switch_mm(struct mm_struct *prev_mm, struct mm_struct *next_mm,
struct task_struct *tsk)
{
+#ifdef CONFIG_MPU
+ unsigned int cpu = smp_processor_id();
+#endif
if (prev_mm == next_mm)
return;
#ifdef CONFIG_MPU
- if (prev_mm->context.page_rwx_mask == current_rwx_mask) {
- flush_switched_cplbs();
- set_mask_dcplbs(next_mm->context.page_rwx_mask);
+ if (prev_mm->context.page_rwx_mask == current_rwx_mask[cpu]) {
+ flush_switched_cplbs(cpu);
+ set_mask_dcplbs(next_mm->context.page_rwx_mask, cpu);
}
#endif

@@ -138,9 +145,10 @@ static inline void protect_page(struct mm_struct *mm, unsigned long addr,

static inline void update_protections(struct mm_struct *mm)
{
- if (mm->context.page_rwx_mask == current_rwx_mask) {
- flush_switched_cplbs();
- set_mask_dcplbs(mm->context.page_rwx_mask);
+ unsigned int cpu = smp_processor_id();
+ if (mm->context.page_rwx_mask == current_rwx_mask[cpu]) {
+ flush_switched_cplbs(cpu);
+ set_mask_dcplbs(mm->context.page_rwx_mask, cpu);
}
}
#endif
@@ -165,6 +173,9 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
static inline void destroy_context(struct mm_struct *mm)
{
struct sram_list_struct *tmp;
+#ifdef CONFIG_MPU
+ unsigned int cpu = smp_processor_id();
+#endif

#ifdef CONFIG_APP_STACK_L1
if (current_l1_stack_save == mm->context.l1_stack_save)
@@ -179,8 +190,8 @@ static inline void destroy_context(struct mm_struct *mm)
kfree(tmp);
}
#ifdef CONFIG_MPU
- if (current_rwx_mask == mm->context.page_rwx_mask)
- current_rwx_mask = NULL;
+ if (current_rwx_mask[cpu] == mm->context.page_rwx_mask)
+ current_rwx_mask[cpu] = NULL;
free_pages((unsigned long)mm->context.page_rwx_mask, page_mask_order);
#endif
}
diff --git a/arch/blackfin/kernel/cplb-mpu/cacheinit.c b/arch/blackfin/kernel/cplb-mpu/cacheinit.c
index a8b712a..c6ff947 100644
--- a/arch/blackfin/kernel/cplb-mpu/cacheinit.c
+++ b/arch/blackfin/kernel/cplb-mpu/cacheinit.c
@@ -25,7 +25,7 @@
#include <asm/cplbinit.h>

#if defined(CONFIG_BFIN_ICACHE)
-void __init bfin_icache_init(void)
+void __cpuinit bfin_icache_init(struct cplb_entry *icplb_tbl)
{
unsigned long ctrl;
int i;
@@ -43,7 +43,7 @@ void __init bfin_icache_init(void)
#endif

#if defined(CONFIG_BFIN_DCACHE)
-void __init bfin_dcache_init(void)
+void __cpuinit bfin_dcache_init(struct cplb_entry *dcplb_tbl)
{
unsigned long ctrl;
int i;
diff --git a/arch/blackfin/kernel/cplb-mpu/cplbinfo.c b/arch/blackfin/kernel/cplb-mpu/cplbinfo.c
index 822beef..00cb2cf 100644
--- a/arch/blackfin/kernel/cplb-mpu/cplbinfo.c
+++ b/arch/blackfin/kernel/cplb-mpu/cplbinfo.c
@@ -66,32 +66,32 @@ static char *cplb_print_entry(char *buf, struct cplb_entry *tbl, int switched)
return buf;
}

-int cplbinfo_proc_output(char *buf)
+int cplbinfo_proc_output(char *buf, void *data)
{
char *p;
+ unsigned int cpu = (unsigned int)data;

p = buf;

- p += sprintf(p, "------------------ CPLB Information ------------------\n\n");
-
+ p += sprintf(p, "------------- CPLB Information on CPU%u --------------\n\n", cpu);
if (bfin_read_IMEM_CONTROL() & ENICPLB) {
p += sprintf(p, "Instruction CPLB entry:\n");
- p = cplb_print_entry(p, icplb_tbl, first_switched_icplb);
+ p = cplb_print_entry(p, icplb_tbl[cpu], first_switched_icplb);
} else
p += sprintf(p, "Instruction CPLB is disabled.\n\n");

if (1 || bfin_read_DMEM_CONTROL() & ENDCPLB) {
p += sprintf(p, "Data CPLB entry:\n");
- p = cplb_print_entry(p, dcplb_tbl, first_switched_dcplb);
+ p = cplb_print_entry(p, dcplb_tbl[cpu], first_switched_dcplb);
} else
p += sprintf(p, "Data CPLB is disabled.\n");

p += sprintf(p, "ICPLB miss: %d\nICPLB supervisor miss: %d\n",
- nr_icplb_miss, nr_icplb_supv_miss);
+ nr_icplb_miss[cpu], nr_icplb_supv_miss[cpu]);
p += sprintf(p, "DCPLB miss: %d\nDCPLB protection fault:%d\n",
- nr_dcplb_miss, nr_dcplb_prot);
+ nr_dcplb_miss[cpu], nr_dcplb_prot[cpu]);
p += sprintf(p, "CPLB flushes: %d\n",
- nr_cplb_flush);
+ nr_cplb_flush[cpu]);

return p - buf;
}
@@ -101,7 +101,7 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,
{
int len;

- len = cplbinfo_proc_output(page);
+ len = cplbinfo_proc_output(page, data);
if (len <= off + count)
*eof = 1;
*start = page + off;
@@ -115,20 +115,33 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,

static int __init cplbinfo_init(void)
{
- struct proc_dir_entry *entry;
+ struct proc_dir_entry *parent, *entry;
+ unsigned int cpu;
+ unsigned char str[10];
+
+ parent = proc_mkdir("cplbinfo", NULL);

- entry = create_proc_entry("cplbinfo", 0, NULL);
- if (!entry)
- return -ENOMEM;
+ for_each_online_cpu(cpu) {
+ sprintf(str, "cpu%u", cpu);
+ entry = create_proc_entry(str, 0, parent);
+ if (!entry)
+ return -ENOMEM;

- entry->read_proc = cplbinfo_read_proc;
- entry->data = NULL;
+ entry->read_proc = cplbinfo_read_proc;
+ entry->data = (void *)cpu;
+ }

return 0;
}

static void __exit cplbinfo_exit(void)
{
+ unsigned int cpu;
+ unsigned char str[20];
+ for_each_online_cpu(cpu) {
+ sprintf(str, "cplbinfo/cpu%u", cpu);
+ remove_proc_entry(str, NULL);
+ }
remove_proc_entry("cplbinfo", NULL);
}

diff --git a/arch/blackfin/kernel/cplb-mpu/cplbinit.c b/arch/blackfin/kernel/cplb-mpu/cplbinit.c
index 55af729..269d2a3 100644
--- a/arch/blackfin/kernel/cplb-mpu/cplbinit.c
+++ b/arch/blackfin/kernel/cplb-mpu/cplbinit.c
@@ -30,13 +30,13 @@
# error the MPU will not function safely while Anomaly 05000263 applies
#endif

-struct cplb_entry icplb_tbl[MAX_CPLBS];
-struct cplb_entry dcplb_tbl[MAX_CPLBS];
+struct cplb_entry icplb_tbl[NR_CPUS][MAX_CPLBS];
+struct cplb_entry dcplb_tbl[NR_CPUS][MAX_CPLBS];

int first_switched_icplb, first_switched_dcplb;
int first_mask_dcplb;

-void __init generate_cplb_tables(void)
+void __init generate_cplb_tables_cpu(unsigned int cpu)
{
int i_d, i_i;
unsigned long addr;
@@ -55,15 +55,16 @@ void __init generate_cplb_tables(void)
d_cache |= CPLB_L1_AOW | CPLB_WT;
#endif
#endif
+
i_d = i_i = 0;

/* Set up the zero page. */
- dcplb_tbl[i_d].addr = 0;
- dcplb_tbl[i_d++].data = SDRAM_OOPS | PAGE_SIZE_1KB;
+ dcplb_tbl[cpu][i_d].addr = 0;
+ dcplb_tbl[cpu][i_d++].data = SDRAM_OOPS | PAGE_SIZE_1KB;

#if 0
- icplb_tbl[i_i].addr = 0;
- icplb_tbl[i_i++].data = i_cache | CPLB_USER_RD | PAGE_SIZE_4KB;
+ icplb_tbl[cpu][i_i].addr = 0;
+ icplb_tbl[cpu][i_i++].data = i_cache | CPLB_USER_RD | PAGE_SIZE_4KB;
#endif

/* Cover kernel memory with 4M pages. */
@@ -72,28 +73,28 @@ void __init generate_cplb_tables(void)
i_data = i_cache | CPLB_VALID | CPLB_PORTPRIO | PAGE_SIZE_4MB;

for (; addr < memory_start; addr += 4 * 1024 * 1024) {
- dcplb_tbl[i_d].addr = addr;
- dcplb_tbl[i_d++].data = d_data;
- icplb_tbl[i_i].addr = addr;
- icplb_tbl[i_i++].data = i_data | (addr == 0 ? CPLB_USER_RD : 0);
+ dcplb_tbl[cpu][i_d].addr = addr;
+ dcplb_tbl[cpu][i_d++].data = d_data;
+ icplb_tbl[cpu][i_i].addr = addr;
+ icplb_tbl[cpu][i_i++].data = i_data | (addr == 0 ? CPLB_USER_RD : 0);
}

/* Cover L1 memory. One 4M area for code and data each is enough. */
#if L1_DATA_A_LENGTH > 0 || L1_DATA_B_LENGTH > 0
- dcplb_tbl[i_d].addr = L1_DATA_A_START;
- dcplb_tbl[i_d++].data = L1_DMEMORY | PAGE_SIZE_4MB;
+ dcplb_tbl[cpu][i_d].addr = get_l1_data_a_start_cpu(cpu);
+ dcplb_tbl[cpu][i_d++].data = L1_DMEMORY | PAGE_SIZE_4MB;
#endif
#if L1_CODE_LENGTH > 0
- icplb_tbl[i_i].addr = L1_CODE_START;
- icplb_tbl[i_i++].data = L1_IMEMORY | PAGE_SIZE_4MB;
+ icplb_tbl[cpu][i_i].addr = get_l1_code_start_cpu(cpu);
+ icplb_tbl[cpu][i_i++].data = L1_IMEMORY | PAGE_SIZE_4MB;
#endif

/* Cover L2 memory */
#if L2_LENGTH > 0
- dcplb_tbl[i_d].addr = L2_START;
- dcplb_tbl[i_d++].data = L2_DMEMORY | PAGE_SIZE_1MB;
- icplb_tbl[i_i].addr = L2_START;
- icplb_tbl[i_i++].data = L2_IMEMORY | PAGE_SIZE_1MB;
+ dcplb_tbl[cpu][i_d].addr = L2_START;
+ dcplb_tbl[cpu][i_d++].data = L2_DMEMORY | PAGE_SIZE_1MB;
+ icplb_tbl[cpu][i_i].addr = L2_START;
+ icplb_tbl[cpu][i_i++].data = L2_IMEMORY | PAGE_SIZE_1MB;
#endif

first_mask_dcplb = i_d;
@@ -101,7 +102,7 @@ void __init generate_cplb_tables(void)
first_switched_icplb = i_i;

while (i_d < MAX_CPLBS)
- dcplb_tbl[i_d++].data = 0;
+ dcplb_tbl[cpu][i_d++].data = 0;
while (i_i < MAX_CPLBS)
- icplb_tbl[i_i++].data = 0;
+ icplb_tbl[cpu][i_i++].data = 0;
}
diff --git a/arch/blackfin/kernel/cplb-mpu/cplbmgr.c b/arch/blackfin/kernel/cplb-mpu/cplbmgr.c
index baa52e2..76bd991 100644
--- a/arch/blackfin/kernel/cplb-mpu/cplbmgr.c
+++ b/arch/blackfin/kernel/cplb-mpu/cplbmgr.c
@@ -30,10 +30,11 @@

int page_mask_nelts;
int page_mask_order;
-unsigned long *current_rwx_mask;
+unsigned long *current_rwx_mask[NR_CPUS];

-int nr_dcplb_miss, nr_icplb_miss, nr_icplb_supv_miss, nr_dcplb_prot;
-int nr_cplb_flush;
+int nr_dcplb_miss[NR_CPUS], nr_icplb_miss[NR_CPUS];
+int nr_icplb_supv_miss[NR_CPUS], nr_dcplb_prot[NR_CPUS];
+int nr_cplb_flush[NR_CPUS];

static inline void disable_dcplb(void)
{
@@ -98,42 +99,42 @@ static inline int write_permitted(int status, unsigned long data)
}

/* Counters to implement round-robin replacement. */
-static int icplb_rr_index, dcplb_rr_index;
+static int icplb_rr_index[NR_CPUS], dcplb_rr_index[NR_CPUS];

/*
* Find an ICPLB entry to be evicted and return its index.
*/
-static int evict_one_icplb(void)
+static int evict_one_icplb(unsigned int cpu)
{
int i;
for (i = first_switched_icplb; i < MAX_CPLBS; i++)
- if ((icplb_tbl[i].data & CPLB_VALID) == 0)
+ if ((icplb_tbl[cpu][i].data & CPLB_VALID) == 0)
return i;
- i = first_switched_icplb + icplb_rr_index;
+ i = first_switched_icplb + icplb_rr_index[cpu];
if (i >= MAX_CPLBS) {
i -= MAX_CPLBS - first_switched_icplb;
- icplb_rr_index -= MAX_CPLBS - first_switched_icplb;
+ icplb_rr_index[cpu] -= MAX_CPLBS - first_switched_icplb;
}
- icplb_rr_index++;
+ icplb_rr_index[cpu]++;
return i;
}

-static int evict_one_dcplb(void)
+static int evict_one_dcplb(unsigned int cpu)
{
int i;
for (i = first_switched_dcplb; i < MAX_CPLBS; i++)
- if ((dcplb_tbl[i].data & CPLB_VALID) == 0)
+ if ((dcplb_tbl[cpu][i].data & CPLB_VALID) == 0)
return i;
- i = first_switched_dcplb + dcplb_rr_index;
+ i = first_switched_dcplb + dcplb_rr_index[cpu];
if (i >= MAX_CPLBS) {
i -= MAX_CPLBS - first_switched_dcplb;
- dcplb_rr_index -= MAX_CPLBS - first_switched_dcplb;
+ dcplb_rr_index[cpu] -= MAX_CPLBS - first_switched_dcplb;
}
- dcplb_rr_index++;
+ dcplb_rr_index[cpu]++;
return i;
}

-static noinline int dcplb_miss(void)
+static noinline int dcplb_miss(unsigned int cpu)
{
unsigned long addr = bfin_read_DCPLB_FAULT_ADDR();
int status = bfin_read_DCPLB_STATUS();
@@ -141,7 +142,7 @@ static noinline int dcplb_miss(void)
int idx;
unsigned long d_data;

- nr_dcplb_miss++;
+ nr_dcplb_miss[cpu]++;

d_data = CPLB_SUPV_WR | CPLB_VALID | CPLB_DIRTY | PAGE_SIZE_4KB;
#ifdef CONFIG_BFIN_DCACHE
@@ -168,25 +169,25 @@ static noinline int dcplb_miss(void)
} else if (addr >= _ramend) {
d_data |= CPLB_USER_RD | CPLB_USER_WR;
} else {
- mask = current_rwx_mask;
+ mask = current_rwx_mask[cpu];
if (mask) {
int page = addr >> PAGE_SHIFT;
- int offs = page >> 5;
+ int idx = page >> 5;
int bit = 1 << (page & 31);

- if (mask[offs] & bit)
+ if (mask[idx] & bit)
d_data |= CPLB_USER_RD;

mask += page_mask_nelts;
- if (mask[offs] & bit)
+ if (mask[idx] & bit)
d_data |= CPLB_USER_WR;
}
}
- idx = evict_one_dcplb();
+ idx = evict_one_dcplb(cpu);

addr &= PAGE_MASK;
- dcplb_tbl[idx].addr = addr;
- dcplb_tbl[idx].data = d_data;
+ dcplb_tbl[cpu][idx].addr = addr;
+ dcplb_tbl[cpu][idx].data = d_data;

disable_dcplb();
bfin_write32(DCPLB_DATA0 + idx * 4, d_data);
@@ -196,21 +197,21 @@ static noinline int dcplb_miss(void)
return 0;
}

-static noinline int icplb_miss(void)
+static noinline int icplb_miss(unsigned int cpu)
{
unsigned long addr = bfin_read_ICPLB_FAULT_ADDR();
int status = bfin_read_ICPLB_STATUS();
int idx;
unsigned long i_data;

- nr_icplb_miss++;
+ nr_icplb_miss[cpu]++;

/* If inside the uncached DMA region, fault. */
if (addr >= _ramend - DMA_UNCACHED_REGION && addr < _ramend)
return CPLB_PROT_VIOL;

if (status & FAULT_USERSUPV)
- nr_icplb_supv_miss++;
+ nr_icplb_supv_miss[cpu]++;

/*
* First, try to find a CPLB that matches this address. If we
@@ -218,8 +219,8 @@ static noinline int icplb_miss(void)
* that the instruction crosses a page boundary.
*/
for (idx = first_switched_icplb; idx < MAX_CPLBS; idx++) {
- if (icplb_tbl[idx].data & CPLB_VALID) {
- unsigned long this_addr = icplb_tbl[idx].addr;
+ if (icplb_tbl[cpu][idx].data & CPLB_VALID) {
+ unsigned long this_addr = icplb_tbl[cpu][idx].addr;
if (this_addr <= addr && this_addr + PAGE_SIZE > addr) {
addr += PAGE_SIZE;
break;
@@ -257,23 +258,23 @@ static noinline int icplb_miss(void)
* Otherwise, check the x bitmap of the current process.
*/
if (!(status & FAULT_USERSUPV)) {
- unsigned long *mask = current_rwx_mask;
+ unsigned long *mask = current_rwx_mask[cpu];

if (mask) {
int page = addr >> PAGE_SHIFT;
- int offs = page >> 5;
+ int idx = page >> 5;
int bit = 1 << (page & 31);

mask += 2 * page_mask_nelts;
- if (mask[offs] & bit)
+ if (mask[idx] & bit)
i_data |= CPLB_USER_RD;
}
}
}
- idx = evict_one_icplb();
+ idx = evict_one_icplb(cpu);
addr &= PAGE_MASK;
- icplb_tbl[idx].addr = addr;
- icplb_tbl[idx].data = i_data;
+ icplb_tbl[cpu][idx].addr = addr;
+ icplb_tbl[cpu][idx].data = i_data;

disable_icplb();
bfin_write32(ICPLB_DATA0 + idx * 4, i_data);
@@ -283,19 +284,19 @@ static noinline int icplb_miss(void)
return 0;
}

-static noinline int dcplb_protection_fault(void)
+static noinline int dcplb_protection_fault(unsigned int cpu)
{
int status = bfin_read_DCPLB_STATUS();

- nr_dcplb_prot++;
+ nr_dcplb_prot[cpu]++;

if (status & FAULT_RW) {
int idx = faulting_cplb_index(status);
- unsigned long data = dcplb_tbl[idx].data;
+ unsigned long data = dcplb_tbl[cpu][idx].data;
if (!(data & CPLB_WT) && !(data & CPLB_DIRTY) &&
write_permitted(status, data)) {
data |= CPLB_DIRTY;
- dcplb_tbl[idx].data = data;
+ dcplb_tbl[cpu][idx].data = data;
bfin_write32(DCPLB_DATA0 + idx * 4, data);
return 0;
}
@@ -306,36 +307,37 @@ static noinline int dcplb_protection_fault(void)
int cplb_hdr(int seqstat, struct pt_regs *regs)
{
int cause = seqstat & 0x3f;
+ unsigned int cpu = smp_processor_id();
switch (cause) {
case 0x23:
- return dcplb_protection_fault();
+ return dcplb_protection_fault(cpu);
case 0x2C:
- return icplb_miss();
+ return icplb_miss(cpu);
case 0x26:
- return dcplb_miss();
+ return dcplb_miss(cpu);
default:
return 1;
}
}

-void flush_switched_cplbs(void)
+void flush_switched_cplbs(unsigned int cpu)
{
int i;
unsigned long flags;

- nr_cplb_flush++;
+ nr_cplb_flush[cpu]++;

local_irq_save(flags);
disable_icplb();
for (i = first_switched_icplb; i < MAX_CPLBS; i++) {
- icplb_tbl[i].data = 0;
+ icplb_tbl[cpu][i].data = 0;
bfin_write32(ICPLB_DATA0 + i * 4, 0);
}
enable_icplb();

disable_dcplb();
for (i = first_switched_dcplb; i < MAX_CPLBS; i++) {
- dcplb_tbl[i].data = 0;
+ dcplb_tbl[cpu][i].data = 0;
bfin_write32(DCPLB_DATA0 + i * 4, 0);
}
enable_dcplb();
@@ -343,7 +345,7 @@ void flush_switched_cplbs(void)

}

-void set_mask_dcplbs(unsigned long *masks)
+void set_mask_dcplbs(unsigned long *masks, unsigned int cpu)
{
int i;
unsigned long addr = (unsigned long)masks;
@@ -351,12 +353,12 @@ void set_mask_dcplbs(unsigned long *masks)
unsigned long flags;

if (!masks) {
- current_rwx_mask = masks;
+ current_rwx_mask[cpu] = masks;
return;
}

local_irq_save(flags);
- current_rwx_mask = masks;
+ current_rwx_mask[cpu] = masks;

d_data = CPLB_SUPV_WR | CPLB_VALID | CPLB_DIRTY | PAGE_SIZE_4KB;
#ifdef CONFIG_BFIN_DCACHE
@@ -368,8 +370,8 @@ void set_mask_dcplbs(unsigned long *masks)

disable_dcplb();
for (i = first_mask_dcplb; i < first_switched_dcplb; i++) {
- dcplb_tbl[i].addr = addr;
- dcplb_tbl[i].data = d_data;
+ dcplb_tbl[cpu][i].addr = addr;
+ dcplb_tbl[cpu][i].data = d_data;
bfin_write32(DCPLB_DATA0 + i * 4, d_data);
bfin_write32(DCPLB_ADDR0 + i * 4, addr);
addr += PAGE_SIZE;
diff --git a/arch/blackfin/kernel/cplb-nompu/cacheinit.c b/arch/blackfin/kernel/cplb-nompu/cacheinit.c
index bd08315..3a385ae 100644
--- a/arch/blackfin/kernel/cplb-nompu/cacheinit.c
+++ b/arch/blackfin/kernel/cplb-nompu/cacheinit.c
@@ -25,9 +25,9 @@
#include <asm/cplbinit.h>

#if defined(CONFIG_BFIN_ICACHE)
-void __init bfin_icache_init(void)
+void __cpuinit bfin_icache_init(u_long icplb[])
{
- unsigned long *table = icplb_table;
+ unsigned long *table = icplb;
unsigned long ctrl;
int i;

@@ -47,9 +47,9 @@ void __init bfin_icache_init(void)
#endif

#if defined(CONFIG_BFIN_DCACHE)
-void __init bfin_dcache_init(void)
+void __cpuinit bfin_dcache_init(u_long dcplb[])
{
- unsigned long *table = dcplb_table;
+ unsigned long *table = dcplb;
unsigned long ctrl;
int i;

@@ -64,6 +64,7 @@ void __init bfin_dcache_init(void)
ctrl = bfin_read_DMEM_CONTROL();
ctrl |= DMEM_CNTR;
bfin_write_DMEM_CONTROL(ctrl);
+
SSYNC();
}
#endif
diff --git a/arch/blackfin/kernel/cplb-nompu/cplbinfo.c b/arch/blackfin/kernel/cplb-nompu/cplbinfo.c
index 1e74f0b..3f00809 100644
--- a/arch/blackfin/kernel/cplb-nompu/cplbinfo.c
+++ b/arch/blackfin/kernel/cplb-nompu/cplbinfo.c
@@ -68,22 +68,22 @@ static int cplb_find_entry(unsigned long *cplb_addr,
return -1;
}

-static char *cplb_print_entry(char *buf, int type)
+static char *cplb_print_entry(char *buf, int type, unsigned int cpu)
{
- unsigned long *p_addr = dpdt_table;
- unsigned long *p_data = dpdt_table + 1;
- unsigned long *p_icount = dpdt_swapcount_table;
- unsigned long *p_ocount = dpdt_swapcount_table + 1;
+ unsigned long *p_addr = dpdt_tables[cpu];
+ unsigned long *p_data = dpdt_tables[cpu] + 1;
+ unsigned long *p_icount = dpdt_swapcount_tables[cpu];
+ unsigned long *p_ocount = dpdt_swapcount_tables[cpu] + 1;
unsigned long *cplb_addr = (unsigned long *)DCPLB_ADDR0;
unsigned long *cplb_data = (unsigned long *)DCPLB_DATA0;
int entry = 0, used_cplb = 0;

if (type == CPLB_I) {
buf += sprintf(buf, "Instruction CPLB entry:\n");
- p_addr = ipdt_table;
- p_data = ipdt_table + 1;
- p_icount = ipdt_swapcount_table;
- p_ocount = ipdt_swapcount_table + 1;
+ p_addr = ipdt_tables[cpu];
+ p_data = ipdt_tables[cpu] + 1;
+ p_icount = ipdt_swapcount_tables[cpu];
+ p_ocount = ipdt_swapcount_tables[cpu] + 1;
cplb_addr = (unsigned long *)ICPLB_ADDR0;
cplb_data = (unsigned long *)ICPLB_DATA0;
} else
@@ -134,24 +134,24 @@ static char *cplb_print_entry(char *buf, int type)
return buf;
}

-static int cplbinfo_proc_output(char *buf)
+static int cplbinfo_proc_output(char *buf, void *data)
{
+ unsigned int cpu = (unsigned int)data;
char *p;

p = buf;

- p += sprintf(p, "------------------ CPLB Information ------------------\n\n");
+ p += sprintf(p, "------------- CPLB Information on CPU%u --------------\n\n", cpu);

if (bfin_read_IMEM_CONTROL() & ENICPLB)
- p = cplb_print_entry(p, CPLB_I);
+ p = cplb_print_entry(p, CPLB_I, cpu);
else
p += sprintf(p, "Instruction CPLB is disabled.\n\n");

if (bfin_read_DMEM_CONTROL() & ENDCPLB)
- p = cplb_print_entry(p, CPLB_D);
+ p = cplb_print_entry(p, CPLB_D, cpu);
else
p += sprintf(p, "Data CPLB is disabled.\n");
-
return p - buf;
}

@@ -160,7 +160,7 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,
{
int len;

- len = cplbinfo_proc_output(page);
+ len = cplbinfo_proc_output(page, data);
if (len <= off + count)
*eof = 1;
*start = page + off;
@@ -174,20 +174,33 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,

static int __init cplbinfo_init(void)
{
- struct proc_dir_entry *entry;
+ struct proc_dir_entry *parent, *entry;
+ unsigned int cpu;
+ unsigned char str[10];
+
+ parent = proc_mkdir("cplbinfo", NULL);

- entry = create_proc_entry("cplbinfo", 0, NULL);
- if (!entry)
- return -ENOMEM;
+ for_each_online_cpu(cpu) {
+ sprintf(str, "cpu%u", cpu);
+ entry = create_proc_entry(str, 0, parent);
+ if (!entry)
+ return -ENOMEM;

- entry->read_proc = cplbinfo_read_proc;
- entry->data = NULL;
+ entry->read_proc = cplbinfo_read_proc;
+ entry->data = (void *)cpu;
+ }

return 0;
}

static void __exit cplbinfo_exit(void)
{
+ unsigned int cpu;
+ unsigned char str[20];
+ for_each_online_cpu(cpu) {
+ sprintf(str, "cplbinfo/cpu%u", cpu);
+ remove_proc_entry(str, NULL);
+ }
remove_proc_entry("cplbinfo", NULL);
}

diff --git a/arch/blackfin/kernel/cplb-nompu/cplbinit.c b/arch/blackfin/kernel/cplb-nompu/cplbinit.c
index 2debc90..8966c70 100644
--- a/arch/blackfin/kernel/cplb-nompu/cplbinit.c
+++ b/arch/blackfin/kernel/cplb-nompu/cplbinit.c
@@ -27,46 +27,20 @@
#include <asm/cplb.h>
#include <asm/cplbinit.h>

-#define CPLB_MEM CONFIG_MAX_MEM_SIZE
-
-/*
-* Number of required data CPLB switchtable entries
-* MEMSIZE / 4 (we mostly install 4M page size CPLBs
-* approx 16 for smaller 1MB page size CPLBs for allignment purposes
-* 1 for L1 Data Memory
-* possibly 1 for L2 Data Memory
-* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
-* 1 for ASYNC Memory
-*/
-#define MAX_SWITCH_D_CPLBS (((CPLB_MEM / 4) + 16 + 1 + 1 + 1 \
- + ASYNC_MEMORY_CPLB_COVERAGE) * 2)
-
-/*
-* Number of required instruction CPLB switchtable entries
-* MEMSIZE / 4 (we mostly install 4M page size CPLBs
-* approx 12 for smaller 1MB page size CPLBs for allignment purposes
-* 1 for L1 Instruction Memory
-* possibly 1 for L2 Instruction Memory
-* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
-*/
-#define MAX_SWITCH_I_CPLBS (((CPLB_MEM / 4) + 12 + 1 + 1 + 1) * 2)
-
-
-u_long icplb_table[MAX_CPLBS + 1];
-u_long dcplb_table[MAX_CPLBS + 1];
+u_long icplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];
+u_long dcplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];

#ifdef CONFIG_CPLB_SWITCH_TAB_L1
-# define PDT_ATTR __attribute__((l1_data))
+#define PDT_ATTR __attribute__((l1_data))
#else
-# define PDT_ATTR
+#define PDT_ATTR
#endif

-u_long ipdt_table[MAX_SWITCH_I_CPLBS + 1] PDT_ATTR;
-u_long dpdt_table[MAX_SWITCH_D_CPLBS + 1] PDT_ATTR;
-
+u_long ipdt_tables[NR_CPUS][MAX_SWITCH_I_CPLBS+1] PDT_ATTR;
+u_long dpdt_tables[NR_CPUS][MAX_SWITCH_D_CPLBS+1] PDT_ATTR;
#ifdef CONFIG_CPLB_INFO
-u_long ipdt_swapcount_table[MAX_SWITCH_I_CPLBS] PDT_ATTR;
-u_long dpdt_swapcount_table[MAX_SWITCH_D_CPLBS] PDT_ATTR;
+u_long ipdt_swapcount_tables[NR_CPUS][MAX_SWITCH_I_CPLBS] PDT_ATTR;
+u_long dpdt_swapcount_tables[NR_CPUS][MAX_SWITCH_D_CPLBS] PDT_ATTR;
#endif

struct s_cplb {
@@ -93,8 +67,8 @@ static struct cplb_desc cplb_data[] = {
.name = "Zero Pointer Guard Page",
},
{
- .start = L1_CODE_START,
- .end = L1_CODE_START + L1_CODE_LENGTH,
+ .start = 0, /* dynamic */
+ .end = 0, /* dynamic */
.psize = SIZE_4M,
.attr = INITIAL_T | SWITCH_T | I_CPLB,
.i_conf = L1_IMEMORY,
@@ -103,8 +77,8 @@ static struct cplb_desc cplb_data[] = {
.name = "L1 I-Memory",
},
{
- .start = L1_DATA_A_START,
- .end = L1_DATA_B_START + L1_DATA_B_LENGTH,
+ .start = 0, /* dynamic */
+ .end = 0, /* dynamic */
.psize = SIZE_4M,
.attr = INITIAL_T | SWITCH_T | D_CPLB,
.i_conf = 0,
@@ -117,6 +91,16 @@ static struct cplb_desc cplb_data[] = {
.name = "L1 D-Memory",
},
{
+ .start = L2_START,
+ .end = L2_START + L2_LENGTH,
+ .psize = SIZE_1M,
+ .attr = L2_ATTR,
+ .i_conf = L2_IMEMORY,
+ .d_conf = L2_DMEMORY,
+ .valid = (L2_LENGTH > 0),
+ .name = "L2 Memory",
+ },
+ {
.start = 0,
.end = 0, /* dynamic */
.psize = 0,
@@ -165,16 +149,6 @@ static struct cplb_desc cplb_data[] = {
.name = "Asynchronous Memory Banks",
},
{
- .start = L2_START,
- .end = L2_START + L2_LENGTH,
- .psize = SIZE_1M,
- .attr = SWITCH_T | I_CPLB | D_CPLB,
- .i_conf = L2_IMEMORY,
- .d_conf = L2_DMEMORY,
- .valid = (L2_LENGTH > 0),
- .name = "L2 Memory",
- },
- {
.start = BOOT_ROM_START,
.end = BOOT_ROM_START + BOOT_ROM_LENGTH,
.psize = SIZE_1M,
@@ -310,7 +284,7 @@ __fill_data_cplbtab(struct cplb_tab *t, int i, u32 a_start, u32 a_end)
}
}

-void __init generate_cplb_tables(void)
+void __init generate_cplb_tables_cpu(unsigned int cpu)
{

u16 i, j, process;
@@ -322,8 +296,8 @@ void __init generate_cplb_tables(void)

printk(KERN_INFO "NOMPU: setting up cplb tables for global access\n");

- cplb.init_i.size = MAX_CPLBS;
- cplb.init_d.size = MAX_CPLBS;
+ cplb.init_i.size = CPLB_TBL_ENTRIES;
+ cplb.init_d.size = CPLB_TBL_ENTRIES;
cplb.switch_i.size = MAX_SWITCH_I_CPLBS;
cplb.switch_d.size = MAX_SWITCH_D_CPLBS;

@@ -332,11 +306,15 @@ void __init generate_cplb_tables(void)
cplb.switch_i.pos = 0;
cplb.switch_d.pos = 0;

- cplb.init_i.tab = icplb_table;
- cplb.init_d.tab = dcplb_table;
- cplb.switch_i.tab = ipdt_table;
- cplb.switch_d.tab = dpdt_table;
+ cplb.init_i.tab = icplb_tables[cpu];
+ cplb.init_d.tab = dcplb_tables[cpu];
+ cplb.switch_i.tab = ipdt_tables[cpu];
+ cplb.switch_d.tab = dpdt_tables[cpu];

+ cplb_data[L1I_MEM].start = get_l1_code_start_cpu(cpu);
+ cplb_data[L1I_MEM].end = cplb_data[L1I_MEM].start + L1_CODE_LENGTH;
+ cplb_data[L1D_MEM].start = get_l1_data_a_start_cpu(cpu);
+ cplb_data[L1D_MEM].end = get_l1_data_b_start_cpu(cpu) + L1_DATA_B_LENGTH;
cplb_data[SDRAM_KERN].end = memory_end;

#ifdef CONFIG_MTD_UCLINUX
@@ -459,6 +437,5 @@ void __init generate_cplb_tables(void)
cplb.switch_d.tab[cplb.switch_d.pos] = -1;

}
-
#endif

diff --git a/arch/blackfin/kernel/cplb-nompu/cplbmgr.S b/arch/blackfin/kernel/cplb-nompu/cplbmgr.S
index f5cf3ac..985f3fc 100644
--- a/arch/blackfin/kernel/cplb-nompu/cplbmgr.S
+++ b/arch/blackfin/kernel/cplb-nompu/cplbmgr.S
@@ -52,6 +52,7 @@
#include <linux/linkage.h>
#include <asm/blackfin.h>
#include <asm/cplb.h>
+#include <asm/asm-offsets.h>

#ifdef CONFIG_EXCPT_IRQ_SYSC_L1
.section .l1.text
@@ -164,10 +165,9 @@ ENTRY(_cplb_mgr)
.Lifound_victim:
#ifdef CONFIG_CPLB_INFO
R7 = [P0 - 0x104];
- P2.L = _ipdt_table;
- P2.H = _ipdt_table;
- P3.L = _ipdt_swapcount_table;
- P3.H = _ipdt_swapcount_table;
+ GET_PDA(P2, R2);
+ P3 = [P2 + PDA_IPDT_SWAPCOUNT];
+ P2 = [P2 + PDA_IPDT];
P3 += -4;
.Licount:
R2 = [P2]; /* address from config table */
@@ -208,11 +208,10 @@ ENTRY(_cplb_mgr)
* range.
*/

- P2.L = _ipdt_table;
- P2.H = _ipdt_table;
+ GET_PDA(P3, R0);
+ P2 = [P3 + PDA_IPDT];
#ifdef CONFIG_CPLB_INFO
- P3.L = _ipdt_swapcount_table;
- P3.H = _ipdt_swapcount_table;
+ P3 = [P3 + PDA_IPDT_SWAPCOUNT];
P3 += -8;
#endif
P0.L = _page_size_table;
@@ -469,10 +468,9 @@ ENTRY(_cplb_mgr)

#ifdef CONFIG_CPLB_INFO
R7 = [P0 - 0x104];
- P2.L = _dpdt_table;
- P2.H = _dpdt_table;
- P3.L = _dpdt_swapcount_table;
- P3.H = _dpdt_swapcount_table;
+ GET_PDA(P2, R2);
+ P3 = [P2 + PDA_DPDT_SWAPCOUNT];
+ P2 = [P2 + PDA_DPDT];
P3 += -4;
.Ldicount:
R2 = [P2];
@@ -541,11 +539,10 @@ ENTRY(_cplb_mgr)

R0 = I0; /* Our faulting address */

- P2.L = _dpdt_table;
- P2.H = _dpdt_table;
+ GET_PDA(P3, R1);
+ P2 = [P3 + PDA_DPDT];
#ifdef CONFIG_CPLB_INFO
- P3.L = _dpdt_swapcount_table;
- P3.H = _dpdt_swapcount_table;
+ P3 = [P3 + PDA_DPDT_SWAPCOUNT];
P3 += -8;
#endif

--
1.5.6.3

2008-11-18 09:05:16

by Bryan Wu

Subject: [PATCH 5/5] Blackfin arch: SMP supporting patchset: some other misc code

From: Graf Yang <[email protected]>

The Blackfin dual-core BF561 processor can support SMP-like features.
https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like

This patch extends SMP support to the remaining miscellaneous code.

Signed-off-by: Graf Yang <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
---
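Note for reviewers (not part of the commit): the get_l1_*_start_cpu()
macros added to the single-core mem_map.h headers accept a cpu argument
and ignore it, so common code such as generate_cplb_tables_cpu() can be
written once for both UP and SMP builds. The sketch below illustrates that
idea only. Every sketch_* name and every address in it is a placeholder
chosen for illustration; none of them is taken from the patch.

```c
#include <stdint.h>

/* Placeholder values for illustration only; the real ones live in each
 * part's mach/mem_map.h. */
#define SKETCH_COREA_L1_CODE_START 0xFFA00000u
#define SKETCH_COREB_L1_CODE_START 0xFF600000u
#define SKETCH_L1_CODE_LENGTH      0x4000u

struct sketch_range {
	uint32_t start;
	uint32_t end;
};

/* Single-core shape, as in the macros this patch adds: the cpu argument
 * is accepted but has no effect on the result. */
#define sketch_up_l1_code_start_cpu(cpu) SKETCH_COREA_L1_CODE_START

/* Hypothetical dual-core shape: each core has its own L1 base. */
static inline uint32_t sketch_smp_l1_code_start_cpu(unsigned int cpu)
{
	return cpu ? SKETCH_COREB_L1_CODE_START : SKETCH_COREA_L1_CODE_START;
}

/* Mirrors how the CPLB code derives the L1 I-memory range:
 * end = start + length, with start looked up per CPU. */
static inline struct sketch_range sketch_l1i_range(uint32_t start)
{
	struct sketch_range r = { start, start + SKETCH_L1_CODE_LENGTH };
	return r;
}
```

A caller would pass sketch_smp_l1_code_start_cpu(cpu) or
sketch_up_l1_code_start_cpu(cpu) to sketch_l1i_range() without caring
which kind of part it is built for.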
arch/blackfin/Kconfig | 32 +++++++++++++++++++++-
arch/blackfin/kernel/vmlinux.lds.S | 4 +-
arch/blackfin/mach-bf518/include/mach/mem_map.h | 15 ++++++++++
arch/blackfin/mach-bf527/include/mach/mem_map.h | 15 ++++++++++
arch/blackfin/mach-bf533/include/mach/mem_map.h | 15 ++++++++++
arch/blackfin/mach-bf537/include/mach/mem_map.h | 15 ++++++++++
arch/blackfin/mach-bf538/include/mach/mem_map.h | 15 ++++++++++
arch/blackfin/mach-bf548/include/mach/mem_map.h | 15 ++++++++++
8 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index 004c06c..7fc8a51 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -200,6 +200,32 @@ config BF561

endchoice

+config SMP
+ depends on BF561
+ bool "Symmetric multi-processing support"
+ ---help---
+ This enables support for systems with more than one CPU,
+ like the dual core BF561. If you have a system with only one
+ CPU, say N. If you have a system with more than one CPU, say Y.
+
+ If you don't know what to do here, say N.
+
+config NR_CPUS
+ int
+ depends on SMP
+ default 2 if BF561
+
+config IRQ_PER_CPU
+ bool
+ depends on SMP
+ default y
+
+config TICK_SOURCE_SYSTMR0
+ bool
+ select BFIN_GPTIMERS
+ depends on SMP
+ default y
+
config BF_REV_MIN
int
default 0 if (BF51x || BF52x || BF54x)
@@ -502,6 +528,7 @@ source kernel/Kconfig.hz

config GENERIC_TIME
bool "Generic time"
+ depends on !SMP
default y

config GENERIC_CLOCKEVENTS
@@ -576,6 +603,7 @@ endmenu


menu "Blackfin Kernel Optimizations"
+ depends on !SMP

comment "Memory Optimizations"

@@ -738,7 +766,6 @@ config BFIN_INS_LOWOVERHEAD

endmenu

-
choice
prompt "Kernel executes from"
help
@@ -804,7 +831,8 @@ config BFIN_ICACHE_LOCK
choice
prompt "Policy"
depends on BFIN_DCACHE
- default BFIN_WB
+ default BFIN_WB if !SMP
+ default BFIN_WT if SMP
config BFIN_WB
bool "Write back"
help
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index 7d12c66..2a48535 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -109,7 +109,7 @@ SECTIONS
#endif

DATA_DATA
- *(.data.*)
+ *(.data)
CONSTRUCTORS

/* make sure the init_task is aligned to the
@@ -161,6 +161,7 @@ SECTIONS
*(.con_initcall.init)
___con_initcall_end = .;
}
+ PERCPU(4)
SECURITY_INIT
.init.ramfs :
{
@@ -236,7 +237,6 @@ SECTIONS
. = ALIGN(4);
__ebss_l2 = .;
}
-
/* Force trailing alignment of our init section so that when we
* free our init memory, we don't leave behind a partial page.
*/
diff --git a/arch/blackfin/mach-bf518/include/mach/mem_map.h b/arch/blackfin/mach-bf518/include/mach/mem_map.h
index 10f678f..ac95d33 100644
--- a/arch/blackfin/mach-bf518/include/mach/mem_map.h
+++ b/arch/blackfin/mach-bf518/include/mach/mem_map.h
@@ -99,4 +99,19 @@
#define L1_SCRATCH_START 0xFFB00000
#define L1_SCRATCH_LENGTH 0x1000

+#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
+#define get_l1_code_start_cpu(cpu) L1_CODE_START
+#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
+#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
+#define get_l1_scratch_start() L1_SCRATCH_START
+#define get_l1_code_start() L1_CODE_START
+#define get_l1_data_a_start() L1_DATA_A_START
+#define get_l1_data_b_start() L1_DATA_B_START
+
+#define GET_PDA_SAFE(preg) \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda;
+
+#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
+
#endif /* _MEM_MAP_518_H_ */
diff --git a/arch/blackfin/mach-bf527/include/mach/mem_map.h b/arch/blackfin/mach-bf527/include/mach/mem_map.h
index ef46dc9..bd7fe0f 100644
--- a/arch/blackfin/mach-bf527/include/mach/mem_map.h
+++ b/arch/blackfin/mach-bf527/include/mach/mem_map.h
@@ -99,4 +99,19 @@
#define L1_SCRATCH_START 0xFFB00000
#define L1_SCRATCH_LENGTH 0x1000

+#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
+#define get_l1_code_start_cpu(cpu) L1_CODE_START
+#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
+#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
+#define get_l1_scratch_start() L1_SCRATCH_START
+#define get_l1_code_start() L1_CODE_START
+#define get_l1_data_a_start() L1_DATA_A_START
+#define get_l1_data_b_start() L1_DATA_B_START
+
+#define GET_PDA_SAFE(preg) \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda;
+
+#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
+
#endif /* _MEM_MAP_527_H_ */
diff --git a/arch/blackfin/mach-bf533/include/mach/mem_map.h b/arch/blackfin/mach-bf533/include/mach/mem_map.h
index 581fc6e..d5eaef2 100644
--- a/arch/blackfin/mach-bf533/include/mach/mem_map.h
+++ b/arch/blackfin/mach-bf533/include/mach/mem_map.h
@@ -168,4 +168,19 @@
#define L1_SCRATCH_START 0xFFB00000
#define L1_SCRATCH_LENGTH 0x1000

+#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
+#define get_l1_code_start_cpu(cpu) L1_CODE_START
+#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
+#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
+#define get_l1_scratch_start() L1_SCRATCH_START
+#define get_l1_code_start() L1_CODE_START
+#define get_l1_data_a_start() L1_DATA_A_START
+#define get_l1_data_b_start() L1_DATA_B_START
+
+#define GET_PDA_SAFE(preg) \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda;
+
+#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
+
#endif /* _MEM_MAP_533_H_ */
diff --git a/arch/blackfin/mach-bf537/include/mach/mem_map.h b/arch/blackfin/mach-bf537/include/mach/mem_map.h
index 5078b66..be4de76 100644
--- a/arch/blackfin/mach-bf537/include/mach/mem_map.h
+++ b/arch/blackfin/mach-bf537/include/mach/mem_map.h
@@ -176,4 +176,19 @@
#define L1_SCRATCH_START 0xFFB00000
#define L1_SCRATCH_LENGTH 0x1000

+#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
+#define get_l1_code_start_cpu(cpu) L1_CODE_START
+#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
+#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
+#define get_l1_scratch_start() L1_SCRATCH_START
+#define get_l1_code_start() L1_CODE_START
+#define get_l1_data_a_start() L1_DATA_A_START
+#define get_l1_data_b_start() L1_DATA_B_START
+
+#define GET_PDA_SAFE(preg) \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda;
+
+#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
+
#endif /* _MEM_MAP_537_H_ */
diff --git a/arch/blackfin/mach-bf538/include/mach/mem_map.h b/arch/blackfin/mach-bf538/include/mach/mem_map.h
index d65d430..c134057 100644
--- a/arch/blackfin/mach-bf538/include/mach/mem_map.h
+++ b/arch/blackfin/mach-bf538/include/mach/mem_map.h
@@ -104,4 +104,19 @@
#define L1_SCRATCH_START 0xFFB00000
#define L1_SCRATCH_LENGTH 0x1000

+#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
+#define get_l1_code_start_cpu(cpu) L1_CODE_START
+#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
+#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
+#define get_l1_scratch_start() L1_SCRATCH_START
+#define get_l1_code_start() L1_CODE_START
+#define get_l1_data_a_start() L1_DATA_A_START
+#define get_l1_data_b_start() L1_DATA_B_START
+
+#define GET_PDA_SAFE(preg) \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda;
+
+#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
+
#endif /* _MEM_MAP_538_H_ */
diff --git a/arch/blackfin/mach-bf548/include/mach/mem_map.h b/arch/blackfin/mach-bf548/include/mach/mem_map.h
index a222842..361eb0e 100644
--- a/arch/blackfin/mach-bf548/include/mach/mem_map.h
+++ b/arch/blackfin/mach-bf548/include/mach/mem_map.h
@@ -108,4 +108,19 @@
#define L1_SCRATCH_START 0xFFB00000
#define L1_SCRATCH_LENGTH 0x1000

+#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
+#define get_l1_code_start_cpu(cpu) L1_CODE_START
+#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
+#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
+#define get_l1_scratch_start() L1_SCRATCH_START
+#define get_l1_code_start() L1_CODE_START
+#define get_l1_data_a_start() L1_DATA_A_START
+#define get_l1_data_b_start() L1_DATA_B_START
+
+#define GET_PDA_SAFE(preg) \
+ preg.l = _cpu_pda; \
+ preg.h = _cpu_pda;
+
+#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
+
#endif/* _MEM_MAP_548_H_ */
--
1.5.6.3

2008-11-18 09:05:49

by Bryan Wu

Subject: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

From: Graf Yang <[email protected]>

The Blackfin dual-core BF561 processor can support SMP-like features.
https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like

This patch extends SMP support to the Blackfin header files
and the machine common code.

Signed-off-by: Graf Yang <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
---
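Note for reviewers (not part of the commit): in the SMP branch of
atomic.h, add, sub, inc and dec all funnel into one update helper, with
subtraction passed as a negated addend, and atomic_add_unless() is a
compare-and-exchange retry loop. The sketch below shows that shape in
portable C11. It is an assumption-laden illustration: the sketch_* names
are hypothetical, <stdatomic.h> stands in for the Blackfin assembly
helpers, and the "returns the new value" behaviour of the update helper is
inferred from how atomic_add_return() uses it.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for atomic_t; the patch wraps a volatile int
 * that is updated by assembly helpers. */
typedef struct {
	atomic_int counter;
} sketch_atomic_t;

/* Stand-in for __raw_atomic_update_asm(): add value, return the new
 * counter value (assumed from its use in atomic_add_return()). */
static inline int sketch_atomic_update(sketch_atomic_t *v, int value)
{
	return atomic_fetch_add(&v->counter, value) + value;
}

static inline int sketch_atomic_add_return(int i, sketch_atomic_t *v)
{
	return sketch_atomic_update(v, i);
}

/* Subtraction reuses the same helper with a negated addend. */
static inline int sketch_atomic_sub_return(int i, sketch_atomic_t *v)
{
	return sketch_atomic_update(v, -i);
}

/* Same logic as the atomic_add_unless() macro: add a unless the counter
 * equals u; return nonzero if the add was performed. */
static inline int sketch_atomic_add_unless(sketch_atomic_t *v, int a, int u)
{
	int c = atomic_load(&v->counter);

	while (c != u) {
		/* On failure, c is refreshed with the current counter value. */
		if (atomic_compare_exchange_weak(&v->counter, &c, c + a))
			break;
	}
	return c != u;
}
```

With the counter at 6, sketch_atomic_add_unless(&v, 1, 6) leaves it
untouched and returns 0, which is the behaviour atomic_inc_not_zero()
relies on when u is 0.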
arch/blackfin/include/asm/atomic.h | 124 ++++++--
arch/blackfin/include/asm/bfin-global.h | 5 +-
arch/blackfin/include/asm/bitops.h | 185 ++++++++----
arch/blackfin/include/asm/cache.h | 29 ++
arch/blackfin/include/asm/cacheflush.h | 20 +-
arch/blackfin/include/asm/context.S | 6 +-
arch/blackfin/include/asm/cpu.h | 42 +++
arch/blackfin/include/asm/l1layout.h | 3 +-
arch/blackfin/include/asm/mutex-dec.h | 112 +++++++
arch/blackfin/include/asm/mutex.h | 63 ++++
arch/blackfin/include/asm/pda.h | 70 ++++
arch/blackfin/include/asm/percpu.h | 12 +-
arch/blackfin/include/asm/processor.h | 7 +-
arch/blackfin/include/asm/rwlock.h | 6 +
arch/blackfin/include/asm/smp.h | 42 +++
arch/blackfin/include/asm/spinlock.h | 87 +++++-
arch/blackfin/include/asm/spinlock_types.h | 22 ++
arch/blackfin/include/asm/system.h | 116 ++++++--
arch/blackfin/mach-common/Makefile | 1 +
arch/blackfin/mach-common/cache.S | 36 ++
arch/blackfin/mach-common/entry.S | 92 +++---
arch/blackfin/mach-common/head.S | 29 +-
arch/blackfin/mach-common/ints-priority.c | 41 +++-
arch/blackfin/mach-common/smp.c | 476 ++++++++++++++++++++++++++++
arch/blackfin/oprofile/common.c | 2 +-
25 files changed, 1437 insertions(+), 191 deletions(-)
create mode 100644 arch/blackfin/include/asm/cpu.h
create mode 100644 arch/blackfin/include/asm/mutex-dec.h
create mode 100644 arch/blackfin/include/asm/pda.h
create mode 100644 arch/blackfin/include/asm/rwlock.h
create mode 100644 arch/blackfin/include/asm/smp.h
create mode 100644 arch/blackfin/include/asm/spinlock_types.h
create mode 100644 arch/blackfin/mach-common/smp.c

diff --git a/arch/blackfin/include/asm/atomic.h b/arch/blackfin/include/asm/atomic.h
index 7cf5087..8af0542 100644
--- a/arch/blackfin/include/asm/atomic.h
+++ b/arch/blackfin/include/asm/atomic.h
@@ -13,15 +13,83 @@
* Tony Kou ([email protected]) Lineo Inc. 2001
*/

-typedef struct {
- int counter;
-} atomic_t;
-#define ATOMIC_INIT(i) { (i) }
+typedef struct { volatile int counter; } atomic_t;

-#define atomic_read(v) ((v)->counter)
+#define ATOMIC_INIT(i) { (i) }
#define atomic_set(v, i) (((v)->counter) = i)

-static __inline__ void atomic_add(int i, atomic_t * v)
+#ifdef CONFIG_SMP
+
+#define atomic_read(v) __raw_uncached_fetch_asm(&(v)->counter)
+
+asmlinkage int __raw_uncached_fetch_asm(const volatile int *ptr);
+
+asmlinkage int __raw_atomic_update_asm(volatile int *ptr, int value);
+
+asmlinkage int __raw_atomic_clear_asm(volatile int *ptr, int value);
+
+asmlinkage int __raw_atomic_set_asm(volatile int *ptr, int value);
+
+asmlinkage int __raw_atomic_xor_asm(volatile int *ptr, int value);
+
+asmlinkage int __raw_atomic_test_asm(const volatile int *ptr, int value);
+
+static inline void atomic_add(int i, atomic_t *v)
+{
+ __raw_atomic_update_asm(&v->counter, i);
+}
+
+static inline void atomic_sub(int i, atomic_t *v)
+{
+ __raw_atomic_update_asm(&v->counter, -i);
+}
+
+static inline int atomic_add_return(int i, atomic_t *v)
+{
+ return __raw_atomic_update_asm(&v->counter, i);
+}
+
+static inline int atomic_sub_return(int i, atomic_t *v)
+{
+ return __raw_atomic_update_asm(&v->counter, -i);
+}
+
+static inline void atomic_inc(volatile atomic_t *v)
+{
+ __raw_atomic_update_asm(&v->counter, 1);
+}
+
+static inline void atomic_dec(volatile atomic_t *v)
+{
+ __raw_atomic_update_asm(&v->counter, -1);
+}
+
+static inline void atomic_clear_mask(int mask, atomic_t *v)
+{
+ __raw_atomic_clear_asm(&v->counter, mask);
+}
+
+static inline void atomic_set_mask(int mask, atomic_t *v)
+{
+ __raw_atomic_set_asm(&v->counter, mask);
+}
+
+static inline int atomic_test_mask(int mask, atomic_t *v)
+{
+ return __raw_atomic_test_asm(&v->counter, mask);
+}
+
+/* Atomic operations are already serializing */
+#define smp_mb__before_atomic_dec() barrier()
+#define smp_mb__after_atomic_dec() barrier()
+#define smp_mb__before_atomic_inc() barrier()
+#define smp_mb__after_atomic_inc() barrier()
+
+#else /* !CONFIG_SMP */
+
+#define atomic_read(v) ((v)->counter)
+
+static inline void atomic_add(int i, atomic_t *v)
{
long flags;

@@ -30,7 +98,7 @@ static __inline__ void atomic_add(int i, atomic_t * v)
local_irq_restore(flags);
}

-static __inline__ void atomic_sub(int i, atomic_t * v)
+static inline void atomic_sub(int i, atomic_t *v)
{
long flags;

@@ -40,7 +108,7 @@ static __inline__ void atomic_sub(int i, atomic_t * v)

}

-static inline int atomic_add_return(int i, atomic_t * v)
+static inline int atomic_add_return(int i, atomic_t *v)
{
int __temp = 0;
long flags;
@@ -54,8 +122,7 @@ static inline int atomic_add_return(int i, atomic_t * v)
return __temp;
}

-#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
-static inline int atomic_sub_return(int i, atomic_t * v)
+static inline int atomic_sub_return(int i, atomic_t *v)
{
int __temp = 0;
long flags;
@@ -68,7 +135,7 @@ static inline int atomic_sub_return(int i, atomic_t * v)
return __temp;
}

-static __inline__ void atomic_inc(volatile atomic_t * v)
+static inline void atomic_inc(volatile atomic_t *v)
{
long flags;

@@ -77,20 +144,7 @@ static __inline__ void atomic_inc(volatile atomic_t * v)
local_irq_restore(flags);
}

-#define atomic_cmpxchg(v, o, n) ((int)cmpxchg(&((v)->counter), (o), (n)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
-
-#define atomic_add_unless(v, a, u) \
-({ \
- int c, old; \
- c = atomic_read(v); \
- while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
- c = old; \
- c != (u); \
-})
-#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
-
-static __inline__ void atomic_dec(volatile atomic_t * v)
+static inline void atomic_dec(volatile atomic_t *v)
{
long flags;

@@ -99,7 +153,7 @@ static __inline__ void atomic_dec(volatile atomic_t * v)
local_irq_restore(flags);
}

-static __inline__ void atomic_clear_mask(unsigned int mask, atomic_t * v)
+static inline void atomic_clear_mask(unsigned int mask, atomic_t *v)
{
long flags;

@@ -108,7 +162,7 @@ static __inline__ void atomic_clear_mask(unsigned int mask, atomic_t * v)
local_irq_restore(flags);
}

-static __inline__ void atomic_set_mask(unsigned int mask, atomic_t * v)
+static inline void atomic_set_mask(unsigned int mask, atomic_t *v)
{
long flags;

@@ -123,9 +177,25 @@ static __inline__ void atomic_set_mask(unsigned int mask, atomic_t * v)
#define smp_mb__before_atomic_inc() barrier()
#define smp_mb__after_atomic_inc() barrier()

+#endif /* !CONFIG_SMP */
+
+#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
#define atomic_dec_return(v) atomic_sub_return(1,(v))
#define atomic_inc_return(v) atomic_add_return(1,(v))

+#define atomic_cmpxchg(v, o, n) ((int)cmpxchg(&((v)->counter), (o), (n)))
+#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+
+#define atomic_add_unless(v, a, u) \
+({ \
+ int c, old; \
+ c = atomic_read(v); \
+ while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
+ c = old; \
+ c != (u); \
+})
+#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
+
/*
* atomic_inc_and_test - increment and test
* @v: pointer of type atomic_t
diff --git a/arch/blackfin/include/asm/bfin-global.h b/arch/blackfin/include/asm/bfin-global.h
index 7729566..1dd0805 100644
--- a/arch/blackfin/include/asm/bfin-global.h
+++ b/arch/blackfin/include/asm/bfin-global.h
@@ -47,6 +47,9 @@
# define DMA_UNCACHED_REGION (0)
#endif

+extern void bfin_setup_caches(unsigned int cpu);
+extern void bfin_setup_cpudata(unsigned int cpu);
+
extern unsigned long get_cclk(void);
extern unsigned long get_sclk(void);
extern unsigned long sclk_to_usecs(unsigned long sclk);
@@ -58,8 +61,6 @@ extern void dump_bfin_trace_buffer(void);

/* init functions only */
extern int init_arch_irq(void);
-extern void bfin_icache_init(void);
-extern void bfin_dcache_init(void);
extern void init_exception_vectors(void);
extern void program_IAR(void);

diff --git a/arch/blackfin/include/asm/bitops.h b/arch/blackfin/include/asm/bitops.h
index b39a175..5872fb6 100644
--- a/arch/blackfin/include/asm/bitops.h
+++ b/arch/blackfin/include/asm/bitops.h
@@ -7,7 +7,6 @@

#include <linux/compiler.h>
#include <asm/byteorder.h> /* swab32 */
-#include <asm/system.h> /* save_flags */

#ifdef __KERNEL__

@@ -20,36 +19,71 @@
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/ffz.h>

-static __inline__ void set_bit(int nr, volatile unsigned long *addr)
+#ifdef CONFIG_SMP
+
+#include <linux/linkage.h>
+
+asmlinkage int __raw_bit_set_asm(volatile unsigned long *addr, int nr);
+
+asmlinkage int __raw_bit_clear_asm(volatile unsigned long *addr, int nr);
+
+asmlinkage int __raw_bit_toggle_asm(volatile unsigned long *addr, int nr);
+
+asmlinkage int __raw_bit_test_set_asm(volatile unsigned long *addr, int nr);
+
+asmlinkage int __raw_bit_test_clear_asm(volatile unsigned long *addr, int nr);
+
+asmlinkage int __raw_bit_test_toggle_asm(volatile unsigned long *addr, int nr);
+
+asmlinkage int __raw_bit_test_asm(const volatile unsigned long *addr, int nr);
+
+static inline void set_bit(int nr, volatile unsigned long *addr)
{
- int *a = (int *)addr;
- int mask;
- unsigned long flags;
+ volatile unsigned long *a = addr + (nr >> 5);
+ __raw_bit_set_asm(a, nr & 0x1f);
+}

- a += nr >> 5;
- mask = 1 << (nr & 0x1f);
- local_irq_save(flags);
- *a |= mask;
- local_irq_restore(flags);
+static inline void clear_bit(int nr, volatile unsigned long *addr)
+{
+ volatile unsigned long *a = addr + (nr >> 5);
+ __raw_bit_clear_asm(a, nr & 0x1f);
}

-static __inline__ void __set_bit(int nr, volatile unsigned long *addr)
+static inline void change_bit(int nr, volatile unsigned long *addr)
{
- int *a = (int *)addr;
- int mask;
+ volatile unsigned long *a = addr + (nr >> 5);
+ __raw_bit_toggle_asm(a, nr & 0x1f);
+}

- a += nr >> 5;
- mask = 1 << (nr & 0x1f);
- *a |= mask;
+static inline int test_bit(int nr, const volatile unsigned long *addr)
+{
+ volatile const unsigned long *a = addr + (nr >> 5);
+ return __raw_bit_test_asm(a, nr & 0x1f) != 0;
}

-/*
- * clear_bit() doesn't provide any barrier for the compiler.
- */
-#define smp_mb__before_clear_bit() barrier()
-#define smp_mb__after_clear_bit() barrier()
+static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
+{
+ volatile unsigned long *a = addr + (nr >> 5);
+ return __raw_bit_test_set_asm(a, nr & 0x1f);
+}

-static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
+{
+ volatile unsigned long *a = addr + (nr >> 5);
+ return __raw_bit_test_clear_asm(a, nr & 0x1f);
+}
+
+static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
+{
+ volatile unsigned long *a = addr + (nr >> 5);
+ return __raw_bit_test_toggle_asm(a, nr & 0x1f);
+}
+
+#else /* !CONFIG_SMP */
+
+#include <asm/system.h> /* save_flags */
+
+static inline void set_bit(int nr, volatile unsigned long *addr)
{
int *a = (int *)addr;
int mask;
@@ -57,21 +91,23 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
a += nr >> 5;
mask = 1 << (nr & 0x1f);
local_irq_save(flags);
- *a &= ~mask;
+ *a |= mask;
local_irq_restore(flags);
}

-static __inline__ void __clear_bit(int nr, volatile unsigned long *addr)
+static inline void clear_bit(int nr, volatile unsigned long *addr)
{
int *a = (int *)addr;
int mask;
-
+ unsigned long flags;
a += nr >> 5;
mask = 1 << (nr & 0x1f);
+ local_irq_save(flags);
*a &= ~mask;
+ local_irq_restore(flags);
}

-static __inline__ void change_bit(int nr, volatile unsigned long *addr)
+static inline void change_bit(int nr, volatile unsigned long *addr)
{
int mask, flags;
unsigned long *ADDR = (unsigned long *)addr;
@@ -83,17 +119,7 @@ static __inline__ void change_bit(int nr, volatile unsigned long *addr)
local_irq_restore(flags);
}

-static __inline__ void __change_bit(int nr, volatile unsigned long *addr)
-{
- int mask;
- unsigned long *ADDR = (unsigned long *)addr;
-
- ADDR += nr >> 5;
- mask = 1 << (nr & 31);
- *ADDR ^= mask;
-}
-
-static __inline__ int test_and_set_bit(int nr, void *addr)
+static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
{
int mask, retval;
volatile unsigned int *a = (volatile unsigned int *)addr;
@@ -109,19 +135,23 @@ static __inline__ int test_and_set_bit(int nr, void *addr)
return retval;
}

-static __inline__ int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
{
int mask, retval;
volatile unsigned int *a = (volatile unsigned int *)addr;
+ unsigned long flags;

a += nr >> 5;
mask = 1 << (nr & 0x1f);
+ local_irq_save(flags);
retval = (mask & *a) != 0;
- *a |= mask;
+ *a &= ~mask;
+ local_irq_restore(flags);
+
return retval;
}

-static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
{
int mask, retval;
volatile unsigned int *a = (volatile unsigned int *)addr;
@@ -131,13 +161,59 @@ static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr)
mask = 1 << (nr & 0x1f);
local_irq_save(flags);
retval = (mask & *a) != 0;
- *a &= ~mask;
+ *a ^= mask;
local_irq_restore(flags);
-
return retval;
}

-static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+/*
+ * This routine doesn't need to go through raw atomic ops in UP
+ * context.
+ */
+#define test_bit(nr,addr) \
+(__builtin_constant_p(nr) ? \
+ __constant_test_bit((nr), (addr)) : \
+ __test_bit((nr), (addr)))
+
+#endif /* CONFIG_SMP */
+
+/*
+ * clear_bit() doesn't provide any barrier for the compiler.
+ */
+#define smp_mb__before_clear_bit() barrier()
+#define smp_mb__after_clear_bit() barrier()
+
+static inline void __set_bit(int nr, volatile unsigned long *addr)
+{
+ int *a = (int *)addr;
+ int mask;
+
+ a += nr >> 5;
+ mask = 1 << (nr & 0x1f);
+ *a |= mask;
+}
+
+static inline void __clear_bit(int nr, volatile unsigned long *addr)
+{
+ int *a = (int *)addr;
+ int mask;
+
+ a += nr >> 5;
+ mask = 1 << (nr & 0x1f);
+ *a &= ~mask;
+}
+
+static inline void __change_bit(int nr, volatile unsigned long *addr)
+{
+ int mask;
+ unsigned long *ADDR = (unsigned long *)addr;
+
+ ADDR += nr >> 5;
+ mask = 1 << (nr & 31);
+ *ADDR ^= mask;
+}
+
+static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
{
int mask, retval;
volatile unsigned int *a = (volatile unsigned int *)addr;
@@ -145,26 +221,23 @@ static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
a += nr >> 5;
mask = 1 << (nr & 0x1f);
retval = (mask & *a) != 0;
- *a &= ~mask;
+ *a |= mask;
return retval;
}

-static __inline__ int test_and_change_bit(int nr, volatile unsigned long *addr)
+static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
{
int mask, retval;
volatile unsigned int *a = (volatile unsigned int *)addr;
- unsigned long flags;

a += nr >> 5;
mask = 1 << (nr & 0x1f);
- local_irq_save(flags);
retval = (mask & *a) != 0;
- *a ^= mask;
- local_irq_restore(flags);
+ *a &= ~mask;
return retval;
}

-static __inline__ int __test_and_change_bit(int nr,
+static inline int __test_and_change_bit(int nr,
volatile unsigned long *addr)
{
int mask, retval;
@@ -177,16 +250,13 @@ static __inline__ int __test_and_change_bit(int nr,
return retval;
}

-/*
- * This routine doesn't need to be atomic.
- */
-static __inline__ int __constant_test_bit(int nr, const void *addr)
+static inline int __constant_test_bit(int nr, const void *addr)
{
return ((1UL << (nr & 31)) &
(((const volatile unsigned int *)addr)[nr >> 5])) != 0;
}

-static __inline__ int __test_bit(int nr, const void *addr)
+static inline int __test_bit(int nr, const void *addr)
{
int *a = (int *)addr;
int mask;
@@ -196,11 +266,6 @@ static __inline__ int __test_bit(int nr, const void *addr)
return ((mask & *a) != 0);
}

-#define test_bit(nr,addr) \
-(__builtin_constant_p(nr) ? \
- __constant_test_bit((nr),(addr)) : \
- __test_bit((nr),(addr)))
-
#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/hweight.h>
#include <asm-generic/bitops/lock.h>
diff --git a/arch/blackfin/include/asm/cache.h b/arch/blackfin/include/asm/cache.h
index 023d721..8663781 100644
--- a/arch/blackfin/include/asm/cache.h
+++ b/arch/blackfin/include/asm/cache.h
@@ -12,6 +12,11 @@
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
#define SMP_CACHE_BYTES L1_CACHE_BYTES

+#ifdef CONFIG_SMP
+#define __cacheline_aligned
+#else
+#define ____cacheline_aligned
+
/*
* Put cacheline_aliged data to L1 data memory
*/
@@ -21,9 +26,33 @@
__section__(".data_l1.cacheline_aligned")))
#endif

+#endif
+
/*
* largest L1 which this arch supports
*/
#define L1_CACHE_SHIFT_MAX 5

+#if defined(CONFIG_SMP) && \
+ !defined(CONFIG_BFIN_CACHE_COHERENT) && \
+ defined(CONFIG_BFIN_DCACHE)
+#define __ARCH_SYNC_CORE_DCACHE
+#ifndef __ASSEMBLY__
+asmlinkage void __raw_smp_mark_barrier_asm(void);
+asmlinkage void __raw_smp_check_barrier_asm(void);
+
+static inline void smp_mark_barrier(void)
+{
+ __raw_smp_mark_barrier_asm();
+}
+static inline void smp_check_barrier(void)
+{
+ __raw_smp_check_barrier_asm();
+}
+
+void resync_core_dcache(void);
+#endif
+#endif
+
+
#endif
diff --git a/arch/blackfin/include/asm/cacheflush.h b/arch/blackfin/include/asm/cacheflush.h
index 4403415..1b040f5 100644
--- a/arch/blackfin/include/asm/cacheflush.h
+++ b/arch/blackfin/include/asm/cacheflush.h
@@ -35,6 +35,7 @@ extern void blackfin_icache_flush_range(unsigned long start_address, unsigned lo
extern void blackfin_dcache_flush_range(unsigned long start_address, unsigned long end_address);
extern void blackfin_dcache_invalidate_range(unsigned long start_address, unsigned long end_address);
extern void blackfin_dflush_page(void *page);
+extern void blackfin_invalidate_entire_dcache(void);

#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)
@@ -44,12 +45,20 @@ extern void blackfin_dflush_page(void *page);
#define flush_cache_vmap(start, end) do { } while (0)
#define flush_cache_vunmap(start, end) do { } while (0)

+#ifdef CONFIG_SMP
+#define flush_icache_range_others(start, end) \
+ smp_icache_flush_range_others((start), (end))
+#else
+#define flush_icache_range_others(start, end) do { } while (0)
+#endif
+
static inline void flush_icache_range(unsigned start, unsigned end)
{
#if defined(CONFIG_BFIN_DCACHE) && defined(CONFIG_BFIN_ICACHE)

# if defined(CONFIG_BFIN_WT)
blackfin_icache_flush_range((start), (end));
+ flush_icache_range_others(start, end);
# else
blackfin_icache_dcache_flush_range((start), (end));
# endif
@@ -58,6 +67,7 @@ static inline void flush_icache_range(unsigned start, unsigned end)

# if defined(CONFIG_BFIN_ICACHE)
blackfin_icache_flush_range((start), (end));
+ flush_icache_range_others(start, end);
# endif
# if defined(CONFIG_BFIN_DCACHE)
blackfin_dcache_flush_range((start), (end));
@@ -66,10 +76,12 @@ static inline void flush_icache_range(unsigned start, unsigned end)
#endif
}

-#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
-do { memcpy(dst, src, len); \
- flush_icache_range ((unsigned) (dst), (unsigned) (dst) + (len)); \
+#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
+do { memcpy(dst, src, len); \
+ flush_icache_range((unsigned) (dst), (unsigned) (dst) + (len)); \
+ flush_icache_range_others((unsigned long) (dst), (unsigned long) (dst) + (len));\
} while (0)
+
#define copy_from_user_page(vma, page, vaddr, dst, src, len) memcpy(dst, src, len)

#if defined(CONFIG_BFIN_DCACHE)
@@ -82,7 +94,7 @@ do { memcpy(dst, src, len); \
# define flush_dcache_page(page) blackfin_dflush_page(page_address(page))
#else
# define flush_dcache_range(start,end) do { } while (0)
-# define flush_dcache_page(page) do { } while (0)
+# define flush_dcache_page(page) do { } while (0)
#endif

extern unsigned long reserved_mem_dcache_on;
diff --git a/arch/blackfin/include/asm/context.S b/arch/blackfin/include/asm/context.S
index c0e630e..40d20b4 100644
--- a/arch/blackfin/include/asm/context.S
+++ b/arch/blackfin/include/asm/context.S
@@ -303,9 +303,14 @@
RETI = [sp++];
RETS = [sp++];

+#ifdef CONFIG_SMP
+ GET_PDA(p0, r0);
+ r0 = [p0 + PDA_IRQFLAGS];
+#else
p0.h = _irq_flags;
p0.l = _irq_flags;
r0 = [p0];
+#endif
sti r0;

sp += 4; /* Skip Reserved */
@@ -352,4 +357,3 @@
SYSCFG = [sp++];
csync;
.endm
-
diff --git a/arch/blackfin/include/asm/cpu.h b/arch/blackfin/include/asm/cpu.h
new file mode 100644
index 0000000..9b7aefe
--- /dev/null
+++ b/arch/blackfin/include/asm/cpu.h
@@ -0,0 +1,42 @@
+/*
+ * File: arch/blackfin/include/asm/cpu.h
+ * Author: Philippe Gerum <[email protected]>
+ *
+ * Copyright 2007 Analog Devices Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see the file COPYING, or write
+ * to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef __ASM_BLACKFIN_CPU_H
+#define __ASM_BLACKFIN_CPU_H
+
+#include <linux/percpu.h>
+
+struct task_struct;
+
+struct blackfin_cpudata {
+ struct cpu cpu;
+ struct task_struct *idle;
+ unsigned long cclk;
+ unsigned int imemctl;
+ unsigned int dmemctl;
+ unsigned long loops_per_jiffy;
+ unsigned long dcache_invld_count;
+};
+
+DECLARE_PER_CPU(struct blackfin_cpudata, cpu_data);
+
+#endif
diff --git a/arch/blackfin/include/asm/l1layout.h b/arch/blackfin/include/asm/l1layout.h
index c13ded7..06bb37f 100644
--- a/arch/blackfin/include/asm/l1layout.h
+++ b/arch/blackfin/include/asm/l1layout.h
@@ -24,7 +24,8 @@ struct l1_scratch_task_info
};

/* A pointer to the structure in memory. */
-#define L1_SCRATCH_TASK_INFO ((struct l1_scratch_task_info *)L1_SCRATCH_START)
+#define L1_SCRATCH_TASK_INFO ((struct l1_scratch_task_info *)\
+ get_l1_scratch_start())

#endif

diff --git a/arch/blackfin/include/asm/mutex-dec.h b/arch/blackfin/include/asm/mutex-dec.h
new file mode 100644
index 0000000..0134151
--- /dev/null
+++ b/arch/blackfin/include/asm/mutex-dec.h
@@ -0,0 +1,112 @@
+/*
+ * arch/blackfin/include/asm/mutex-dec.h
+ *
+ * Generic implementation of the mutex fastpath, based on atomic
+ * decrement/increment.
+ */
+#ifndef _ASM_GENERIC_MUTEX_DEC_H
+#define _ASM_GENERIC_MUTEX_DEC_H
+
+/**
+ * __mutex_fastpath_lock - try to take the lock by moving the count
+ * from 1 to a 0 value
+ * @count: pointer of type atomic_t
+ * @fail_fn: function to call if the original value was not 1
+ *
+ * Change the count from 1 to a value lower than 1, and call <fail_fn> if
+ * it wasn't 1 originally. This function MUST leave the value lower than
+ * 1 even when the "1" assertion wasn't true.
+ */
+static inline void
+__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+ if (unlikely(atomic_dec_return(count) < 0))
+ fail_fn(count);
+ else
+ smp_mb();
+}
+
+/**
+ * __mutex_fastpath_lock_retval - try to take the lock by moving the count
+ * from 1 to a 0 value
+ * @count: pointer of type atomic_t
+ * @fail_fn: function to call if the original value was not 1
+ *
+ * Change the count from 1 to a value lower than 1, and call <fail_fn> if
+ * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
+ * or anything the slow path function returns.
+ */
+static inline int
+__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+ if (unlikely(atomic_dec_return(count) < 0))
+ return fail_fn(count);
+ else {
+ smp_mb();
+ return 0;
+ }
+}
+
+/**
+ * __mutex_fastpath_unlock - try to promote the count from 0 to 1
+ * @count: pointer of type atomic_t
+ * @fail_fn: function to call if the original value was not 0
+ *
+ * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
+ * In the failure case, this function is allowed to either set the value to
+ * 1, or to set it to a value lower than 1.
+ *
+ * If the implementation sets it to a value of lower than 1, then the
+ * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
+ * to return 0 otherwise.
+ */
+static inline void
+__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+ smp_mb();
+ if (unlikely(atomic_inc_return(count) <= 0))
+ fail_fn(count);
+}
+
+#define __mutex_slowpath_needs_to_unlock() 1
+
+/**
+ * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
+ *
+ * @count: pointer of type atomic_t
+ * @fail_fn: fallback function
+ *
+ * Change the count from 1 to a value lower than 1, and return 0 (failure)
+ * if it wasn't 1 originally, or return 1 (success) otherwise. This function
+ * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
+ * Additionally, if the value was < 0 originally, this function must not leave
+ * it to 0 on failure.
+ *
+ * If the architecture has no effective trylock variant, it should call the
+ * <fail_fn> spinlock-based trylock variant unconditionally.
+ */
+static inline int
+__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+ /*
+ * We have two variants here. The cmpxchg based one is the best one
+ * because it never induces a false contention state. It is included
+ * here because architectures using the inc/dec algorithms over the
+ * xchg ones are much more likely to support cmpxchg natively.
+ *
+ * If not we fall back to the spinlock based variant - that is
+ * just as efficient (and simpler) as a 'destructive' probing of
+ * the mutex state would be.
+ */
+#ifdef __HAVE_ARCH_CMPXCHG
+ if (likely(atomic_cmpxchg(count, 1, 0) == 1)) {
+ smp_mb();
+ return 1;
+ }
+ return 0;
+#else
+ return fail_fn(count);
+#endif
+}
+
+#endif
diff --git a/arch/blackfin/include/asm/mutex.h b/arch/blackfin/include/asm/mutex.h
index 458c1f7..5d39925 100644
--- a/arch/blackfin/include/asm/mutex.h
+++ b/arch/blackfin/include/asm/mutex.h
@@ -6,4 +6,67 @@
* implementation. (see asm-generic/mutex-xchg.h for details)
*/

+#ifndef _ASM_MUTEX_H
+#define _ASM_MUTEX_H
+
+#ifndef CONFIG_SMP
#include <asm-generic/mutex-dec.h>
+#else
+
+static inline void
+__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+ if (unlikely(atomic_dec_return(count) < 0))
+ fail_fn(count);
+ else
+ smp_mb();
+}
+
+static inline int
+__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+ if (unlikely(atomic_dec_return(count) < 0))
+ return fail_fn(count);
+ else {
+ smp_mb();
+ return 0;
+ }
+}
+
+static inline void
+__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+ smp_mb();
+ if (unlikely(atomic_inc_return(count) <= 0))
+ fail_fn(count);
+}
+
+#define __mutex_slowpath_needs_to_unlock() 1
+
+static inline int
+__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+ /*
+ * We have two variants here. The cmpxchg based one is the best one
+ * because it never induces a false contention state. It is included
+ * here because architectures using the inc/dec algorithms over the
+ * xchg ones are much more likely to support cmpxchg natively.
+ *
+ * If not we fall back to the spinlock based variant - that is
+ * just as efficient (and simpler) as a 'destructive' probing of
+ * the mutex state would be.
+ */
+#ifdef __HAVE_ARCH_CMPXCHG
+ if (likely(atomic_cmpxchg(count, 1, 0) == 1)) {
+ smp_mb();
+ return 1;
+ }
+ return 0;
+#else
+ return fail_fn(count);
+#endif
+}
+
+#endif
+
+#endif
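The count convention behind these fastpath helpers (1 = unlocked, 0 = locked, negative = locked with possible waiters) can be sketched in userspace. This is a minimal illustration only, assuming C11 <stdatomic.h> in place of the kernel's atomic_t. The sketch_* names are hypothetical and not part of this patch, and the default sequentially consistent ordering stands in for the explicit smp_mb() calls.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's atomic_t counter:
 * 1 = unlocked, 0 = locked, negative = locked with possible waiters. */
typedef struct { atomic_int counter; } sketch_mutex_count;

/* Mirrors the shape of __mutex_fastpath_lock(): decrement, then check
 * whether the result went negative. Returns 1 if the fastpath took the
 * lock, 0 if the real code would call fail_fn (the slowpath). */
static inline int sketch_fastpath_lock(sketch_mutex_count *c)
{
	/* atomic_fetch_sub returns the old value; old - 1 is the new count. */
	int newval = atomic_fetch_sub(&c->counter, 1) - 1;

	return newval >= 0;
}

/* Mirrors the shape of __mutex_fastpath_unlock(): increment, then check
 * whether the result is still <= 0. Returns 1 if no slowpath is needed,
 * 0 if the real code would call fail_fn to wake waiters. */
static inline int sketch_fastpath_unlock(sketch_mutex_count *c)
{
	int newval = atomic_fetch_add(&c->counter, 1) + 1;

	return newval > 0;
}
```

Note that the sketch leaves the counter wherever the arithmetic puts it; in the kernel, the slowpath is responsible for repairing the count after a failed fastpath.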
diff --git a/arch/blackfin/include/asm/pda.h b/arch/blackfin/include/asm/pda.h
new file mode 100644
index 0000000..a24d130
--- /dev/null
+++ b/arch/blackfin/include/asm/pda.h
@@ -0,0 +1,70 @@
+/*
+ * File: arch/blackfin/include/asm/pda.h
+ * Author: Philippe Gerum <[email protected]>
+ *
+ * Copyright 2007 Analog Devices Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see the file COPYING, or write
+ * to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _ASM_BLACKFIN_PDA_H
+#define _ASM_BLACKFIN_PDA_H
+
+#include <asm/mem_map.h>
+
+#ifndef __ASSEMBLY__
+
+struct blackfin_pda { /* Per-processor Data Area */
+ struct blackfin_pda *next;
+
+ unsigned long syscfg;
+#ifdef CONFIG_SMP
+ unsigned long imask; /* Current IMASK value */
+#endif
+
+ unsigned long *ipdt; /* Start of switchable I-CPLB table */
+ unsigned long *ipdt_swapcount; /* Number of swaps in ipdt */
+ unsigned long *dpdt; /* Start of switchable D-CPLB table */
+ unsigned long *dpdt_swapcount; /* Number of swaps in dpdt */
+
+ /*
+ * Single instructions can have multiple faults, which
+ * need to be handled by traps.c, in irq5. We store
+ * the exception cause to ensure we don't miss a
+ * double fault condition
+ */
+ unsigned long ex_iptr;
+ unsigned long ex_optr;
+ unsigned long ex_buf[4];
+ unsigned long ex_imask; /* Saved imask from exception */
+ unsigned long *ex_stack; /* Exception stack space */
+
+#ifdef ANOMALY_05000261
+ unsigned long last_cplb_fault_retx;
+#endif
+ unsigned long dcplb_fault_addr;
+ unsigned long icplb_fault_addr;
+ unsigned long retx;
+ unsigned long seqstat;
+};
+
+extern struct blackfin_pda cpu_pda[];
+
+void reserve_pda(void);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_BLACKFIN_PDA_H */
diff --git a/arch/blackfin/include/asm/percpu.h b/arch/blackfin/include/asm/percpu.h
index 78dd61f..797c0c1 100644
--- a/arch/blackfin/include/asm/percpu.h
+++ b/arch/blackfin/include/asm/percpu.h
@@ -3,4 +3,14 @@

#include <asm-generic/percpu.h>

-#endif /* __ARCH_BLACKFIN_PERCPU__ */
+#ifdef CONFIG_MODULES
+#define PERCPU_MODULE_RESERVE 8192
+#else
+#define PERCPU_MODULE_RESERVE 0
+#endif
+
+#define PERCPU_ENOUGH_ROOM \
+ (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
+ PERCPU_MODULE_RESERVE)
+
+#endif /* __ARCH_BLACKFIN_PERCPU__ */
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index e3e9b41..30703c7 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -106,7 +106,8 @@ unsigned long get_wchan(struct task_struct *p);
eip; })
#define KSTK_ESP(tsk) ((tsk) == current ? rdusp() : (tsk)->thread.usp)

-#define cpu_relax() barrier()
+#define cpu_relax() smp_mb()
+

/* Get the Silicon Revision of the chip */
static inline uint32_t __pure bfin_revid(void)
@@ -137,7 +138,11 @@ static inline uint32_t __pure bfin_revid(void)
static inline uint16_t __pure bfin_cpuid(void)
{
return (bfin_read_CHIPID() & CHIPID_FAMILY) >> 12;
+}

+static inline uint32_t __pure bfin_dspid(void)
+{
+ return bfin_read_DSPID();
}

static inline uint32_t __pure bfin_compiled_revid(void)
diff --git a/arch/blackfin/include/asm/rwlock.h b/arch/blackfin/include/asm/rwlock.h
new file mode 100644
index 0000000..4a724b3
--- /dev/null
+++ b/arch/blackfin/include/asm/rwlock.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_BLACKFIN_RWLOCK_H
+#define _ASM_BLACKFIN_RWLOCK_H
+
+#define RW_LOCK_BIAS 0x01000000
+
+#endif
diff --git a/arch/blackfin/include/asm/smp.h b/arch/blackfin/include/asm/smp.h
new file mode 100644
index 0000000..233cb8c
--- /dev/null
+++ b/arch/blackfin/include/asm/smp.h
@@ -0,0 +1,42 @@
+/*
+ * File: arch/blackfin/include/asm/smp.h
+ * Author: Philippe Gerum <[email protected]>
+ *
+ * Copyright 2007 Analog Devices Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see the file COPYING, or write
+ * to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef __ASM_BLACKFIN_SMP_H
+#define __ASM_BLACKFIN_SMP_H
+
+#include <linux/kernel.h>
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <linux/cache.h>
+#include <asm/blackfin.h>
+#include <mach/smp.h>
+
+#define raw_smp_processor_id() blackfin_core_id()
+
+struct corelock_slot {
+ int lock;
+};
+
+void smp_icache_flush_range_others(unsigned long start,
+ unsigned long end);
+
+#endif /* !__ASM_BLACKFIN_SMP_H */
diff --git a/arch/blackfin/include/asm/spinlock.h b/arch/blackfin/include/asm/spinlock.h
index 64e908a..0249ac3 100644
--- a/arch/blackfin/include/asm/spinlock.h
+++ b/arch/blackfin/include/asm/spinlock.h
@@ -1,6 +1,89 @@
#ifndef __BFIN_SPINLOCK_H
#define __BFIN_SPINLOCK_H

-#error blackfin architecture does not support SMP spin lock yet
+#include <asm/atomic.h>

-#endif
+asmlinkage int __raw_spin_is_locked_asm(volatile int *ptr);
+asmlinkage void __raw_spin_lock_asm(volatile int *ptr);
+asmlinkage int __raw_spin_trylock_asm(volatile int *ptr);
+asmlinkage void __raw_spin_unlock_asm(volatile int *ptr);
+asmlinkage void __raw_read_lock_asm(volatile int *ptr);
+asmlinkage int __raw_read_trylock_asm(volatile int *ptr);
+asmlinkage void __raw_read_unlock_asm(volatile int *ptr);
+asmlinkage void __raw_write_lock_asm(volatile int *ptr);
+asmlinkage int __raw_write_trylock_asm(volatile int *ptr);
+asmlinkage void __raw_write_unlock_asm(volatile int *ptr);
+
+static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
+{
+ return __raw_spin_is_locked_asm(&lock->lock);
+}
+
+static inline void __raw_spin_lock(raw_spinlock_t *lock)
+{
+ __raw_spin_lock_asm(&lock->lock);
+}
+
+#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
+
+static inline int __raw_spin_trylock(raw_spinlock_t *lock)
+{
+ return __raw_spin_trylock_asm(&lock->lock);
+}
+
+static inline void __raw_spin_unlock(raw_spinlock_t *lock)
+{
+ __raw_spin_unlock_asm(&lock->lock);
+}
+
+static inline void __raw_spin_unlock_wait(raw_spinlock_t *lock)
+{
+ while (__raw_spin_is_locked(lock))
+ cpu_relax();
+}
+
+static inline int __raw_read_can_lock(raw_rwlock_t *rw)
+{
+ return __raw_uncached_fetch_asm(&rw->lock) > 0;
+}
+
+static inline int __raw_write_can_lock(raw_rwlock_t *rw)
+{
+ return __raw_uncached_fetch_asm(&rw->lock) == RW_LOCK_BIAS;
+}
+
+static inline void __raw_read_lock(raw_rwlock_t *rw)
+{
+ __raw_read_lock_asm(&rw->lock);
+}
+
+static inline int __raw_read_trylock(raw_rwlock_t *rw)
+{
+ return __raw_read_trylock_asm(&rw->lock);
+}
+
+static inline void __raw_read_unlock(raw_rwlock_t *rw)
+{
+ __raw_read_unlock_asm(&rw->lock);
+}
+
+static inline void __raw_write_lock(raw_rwlock_t *rw)
+{
+ __raw_write_lock_asm(&rw->lock);
+}
+
+static inline int __raw_write_trylock(raw_rwlock_t *rw)
+{
+ return __raw_write_trylock_asm(&rw->lock);
+}
+
+static inline void __raw_write_unlock(raw_rwlock_t *rw)
+{
+ __raw_write_unlock_asm(&rw->lock);
+}
+
+#define _raw_spin_relax(lock) cpu_relax()
+#define _raw_read_relax(lock) cpu_relax()
+#define _raw_write_relax(lock) cpu_relax()
+
+#endif /* !__BFIN_SPINLOCK_H */
diff --git a/arch/blackfin/include/asm/spinlock_types.h b/arch/blackfin/include/asm/spinlock_types.h
new file mode 100644
index 0000000..b1e3c4c
--- /dev/null
+++ b/arch/blackfin/include/asm/spinlock_types.h
@@ -0,0 +1,22 @@
+#ifndef __ASM_SPINLOCK_TYPES_H
+#define __ASM_SPINLOCK_TYPES_H
+
+#ifndef __LINUX_SPINLOCK_TYPES_H
+# error "please don't include this file directly"
+#endif
+
+#include <asm/rwlock.h>
+
+typedef struct {
+ volatile unsigned int lock;
+} raw_spinlock_t;
+
+#define __RAW_SPIN_LOCK_UNLOCKED { 0 }
+
+typedef struct {
+ volatile unsigned int lock;
+} raw_rwlock_t;
+
+#define __RAW_RW_LOCK_UNLOCKED { RW_LOCK_BIAS }
+
+#endif
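For reviewers unfamiliar with the RW_LOCK_BIAS convention: the lock word starts at the bias, each reader subtracts 1, and a writer subtracts the whole bias. The real lock routines are the *_asm functions declared in spinlock.h and are not reproduced here. What follows is only a userspace sketch, assuming C11 <stdatomic.h> and the conventional biased-counter scheme that RW_LOCK_BIAS and the two can_lock helpers imply. All sketch_* names are hypothetical.

```c
#include <stdatomic.h>

#define SKETCH_RW_LOCK_BIAS 0x01000000

/* A reader takes one unit; it succeeds while the count stays >= 0. */
static inline int sketch_read_trylock(atomic_int *lock)
{
	if (atomic_fetch_sub(lock, 1) - 1 >= 0)
		return 1;
	atomic_fetch_add(lock, 1);	/* undo: a writer holds the lock */
	return 0;
}

static inline void sketch_read_unlock(atomic_int *lock)
{
	atomic_fetch_add(lock, 1);
}

/* A writer takes the whole bias; it succeeds only if the count was
 * exactly the bias, i.e. no readers and no other writer. */
static inline int sketch_write_trylock(atomic_int *lock)
{
	int newval = atomic_fetch_sub(lock, SKETCH_RW_LOCK_BIAS)
		     - SKETCH_RW_LOCK_BIAS;

	if (newval == 0)
		return 1;
	atomic_fetch_add(lock, SKETCH_RW_LOCK_BIAS);	/* undo */
	return 0;
}

static inline void sketch_write_unlock(atomic_int *lock)
{
	atomic_fetch_add(lock, SKETCH_RW_LOCK_BIAS);
}

/* Same predicates as __raw_read_can_lock() / __raw_write_can_lock(). */
static inline int sketch_read_can_lock(atomic_int *lock)
{
	return atomic_load(lock) > 0;
}

static inline int sketch_write_can_lock(atomic_int *lock)
{
	return atomic_load(lock) == SKETCH_RW_LOCK_BIAS;
}
```

With this encoding a single word answers both questions: any positive value admits another reader, and only the untouched bias admits a writer.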
diff --git a/arch/blackfin/include/asm/system.h b/arch/blackfin/include/asm/system.h
index 8f1627d..6b368fa 100644
--- a/arch/blackfin/include/asm/system.h
+++ b/arch/blackfin/include/asm/system.h
@@ -37,20 +37,16 @@
#include <linux/linkage.h>
#include <linux/compiler.h>
#include <mach/anomaly.h>
+#include <asm/pda.h>
+#include <asm/processor.h>
+
+/* Forward decl needed due to cdef inter dependencies */
+static inline uint32_t __pure bfin_dspid(void);
+#define blackfin_core_id() (bfin_dspid() & 0xff)

/*
* Interrupt configuring macros.
*/
-
-extern unsigned long irq_flags;
-
-#define local_irq_enable() \
- __asm__ __volatile__( \
- "sti %0;" \
- : \
- : "d" (irq_flags) \
- )
-
#define local_irq_disable() \
do { \
int __tmp_dummy; \
@@ -66,6 +62,18 @@ extern unsigned long irq_flags;
# define NOP_PAD_ANOMALY_05000244
#endif

+#ifdef CONFIG_SMP
+# define irq_flags cpu_pda[blackfin_core_id()].imask
+#else
+extern unsigned long irq_flags;
+#endif
+
+#define local_irq_enable() \
+ __asm__ __volatile__( \
+ "sti %0;" \
+ : \
+ : "d" (irq_flags) \
+ )
#define idle_with_irq_disabled() \
__asm__ __volatile__( \
NOP_PAD_ANOMALY_05000244 \
@@ -129,22 +137,85 @@ extern unsigned long irq_flags;
#define rmb() asm volatile ("" : : :"memory")
#define wmb() asm volatile ("" : : :"memory")
#define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
-
#define read_barrier_depends() do { } while(0)

#ifdef CONFIG_SMP
-#define smp_mb() mb()
-#define smp_rmb() rmb()
-#define smp_wmb() wmb()
-#define smp_read_barrier_depends() read_barrier_depends()
+asmlinkage unsigned long __raw_xchg_1_asm(volatile void *ptr, unsigned long value);
+asmlinkage unsigned long __raw_xchg_2_asm(volatile void *ptr, unsigned long value);
+asmlinkage unsigned long __raw_xchg_4_asm(volatile void *ptr, unsigned long value);
+asmlinkage unsigned long __raw_cmpxchg_1_asm(volatile void *ptr,
+ unsigned long new, unsigned long old);
+asmlinkage unsigned long __raw_cmpxchg_2_asm(volatile void *ptr,
+ unsigned long new, unsigned long old);
+asmlinkage unsigned long __raw_cmpxchg_4_asm(volatile void *ptr,
+ unsigned long new, unsigned long old);
+
+#ifdef __ARCH_SYNC_CORE_DCACHE
+# define smp_mb() do { barrier(); smp_check_barrier(); smp_mark_barrier(); } while (0)
+# define smp_rmb() do { barrier(); smp_check_barrier(); } while (0)
+# define smp_wmb() do { barrier(); smp_mark_barrier(); } while (0)
#else
+# define smp_mb() barrier()
+# define smp_rmb() barrier()
+# define smp_wmb() barrier()
+#endif
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
+ int size)
+{
+ unsigned long tmp;
+
+ switch (size) {
+ case 1:
+ tmp = __raw_xchg_1_asm(ptr, x);
+ break;
+ case 2:
+ tmp = __raw_xchg_2_asm(ptr, x);
+ break;
+ case 4:
+ tmp = __raw_xchg_4_asm(ptr, x);
+ break;
+ }
+
+ return tmp;
+}
+
+/*
+ * Atomic compare and exchange. Compare OLD with MEM, if identical,
+ * store NEW in MEM. Return the initial value in MEM. Success is
+ * indicated by comparing RETURN with OLD.
+ */
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+ unsigned long new, int size)
+{
+ unsigned long tmp;
+
+ switch (size) {
+ case 1:
+ tmp = __raw_cmpxchg_1_asm(ptr, new, old);
+ break;
+ case 2:
+ tmp = __raw_cmpxchg_2_asm(ptr, new, old);
+ break;
+ case 4:
+ tmp = __raw_cmpxchg_4_asm(ptr, new, old);
+ break;
+ }
+
+ return tmp;
+}
+#define cmpxchg(ptr, o, n) \
+ ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
+ (unsigned long)(n), sizeof(*(ptr))))
+
+#define smp_read_barrier_depends() smp_check_barrier()
+
+#else /* !CONFIG_SMP */
+
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#define smp_read_barrier_depends() do { } while(0)
-#endif
-
-#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))

struct __xchg_dummy {
unsigned long a[100];
@@ -194,9 +265,12 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
(unsigned long)(n), sizeof(*(ptr))))
#define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))

-#ifndef CONFIG_SMP
#include <asm-generic/cmpxchg.h>
-#endif
+
+#endif /* !CONFIG_SMP */
+
+#define xchg(ptr, x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))))
+#define tas(ptr) ((void)xchg((ptr), 1))

#define prepare_to_switch() do { } while(0)

@@ -218,4 +292,4 @@ do { \
(last) = resume (prev, next); \
} while (0)

-#endif /* _BLACKFIN_SYSTEM_H */
+#endif /* _BLACKFIN_SYSTEM_H */
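The contract stated in the __cmpxchg() comment (compare OLD with MEM, store NEW if identical, return the initial value in MEM) can be checked against a small userspace model. This is a sketch under stated assumptions: it uses C11 <stdatomic.h> rather than the __raw_cmpxchg_*_asm routines, handles only one operand size, and the sketch_* names are hypothetical.

```c
#include <stdatomic.h>

/* Returns the value found in *ptr before the operation. The store of
 * new_val happened if and only if the returned value equals old. */
static inline unsigned long sketch_cmpxchg(_Atomic unsigned long *ptr,
					   unsigned long old,
					   unsigned long new_val)
{
	unsigned long expected = old;

	/* On failure, C11 writes the current contents of *ptr back into
	 * expected; on success, expected still holds old, which is also
	 * the initial value. Either way it is the value to return. */
	atomic_compare_exchange_strong(ptr, &expected, new_val);
	return expected;
}

/* Unconditional exchange: store x and return the previous contents,
 * which is what xchg() provides. */
static inline unsigned long sketch_xchg(_Atomic unsigned long *ptr,
					unsigned long x)
{
	return atomic_exchange(ptr, x);
}
```

A caller detects success the same way __mutex_fastpath_trylock() does, by comparing the return value with the old value it passed in.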
diff --git a/arch/blackfin/mach-common/Makefile b/arch/blackfin/mach-common/Makefile
index e6ed57c..9388b4a 100644
--- a/arch/blackfin/mach-common/Makefile
+++ b/arch/blackfin/mach-common/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_BFIN_ICACHE_LOCK) += lock.o
obj-$(CONFIG_PM) += pm.o dpmc_modes.o
obj-$(CONFIG_CPU_FREQ) += cpufreq.o
obj-$(CONFIG_CPU_VOLTAGE) += dpmc.o
+obj-$(CONFIG_SMP) += smp.o
diff --git a/arch/blackfin/mach-common/cache.S b/arch/blackfin/mach-common/cache.S
index 3c98dac..1187512 100644
--- a/arch/blackfin/mach-common/cache.S
+++ b/arch/blackfin/mach-common/cache.S
@@ -97,3 +97,39 @@ ENTRY(_blackfin_dflush_page)
P1 = 1 << (PAGE_SHIFT - L1_CACHE_SHIFT);
jump .Ldfr;
ENDPROC(_blackfin_dflush_page)
+
+/* Invalidate the Entire Data cache by
+ * clearing DMC[1:0] bits
+ */
+ENTRY(_blackfin_invalidate_entire_dcache)
+ [--SP] = ( R7:5);
+
+ P0.L = LO(DMEM_CONTROL);
+ P0.H = HI(DMEM_CONTROL);
+ R7 = [P0];
+ R5 = R7; /* Save DMEM_CONTROL */
+
+ /* Clear the DMC[1:0] bits; all valid bits in the data
+ * cache are then set to the invalid state.
+ */
+ BITCLR(R7,DMC0_P);
+ BITCLR(R7,DMC1_P);
+ CLI R6;
+ SSYNC; /* SSYNC required before writing to DMEM_CONTROL. */
+ .align 8;
+ [P0] = R7;
+ SSYNC;
+ STI R6;
+
+ /* Restore the original data cache configuration */
+
+ CLI R6;
+ SSYNC; /* SSYNC required before writing to DMEM_CONTROL. */
+ .align 8;
+ [P0] = R5;
+ SSYNC;
+ STI R6;
+
+ ( R7:5) = [SP++];
+ RTS;
+ENDPROC(_blackfin_invalidate_entire_dcache)
diff --git a/arch/blackfin/mach-common/entry.S b/arch/blackfin/mach-common/entry.S
index c6ae844..5531f49 100644
--- a/arch/blackfin/mach-common/entry.S
+++ b/arch/blackfin/mach-common/entry.S
@@ -36,6 +36,7 @@
#include <linux/init.h>
#include <linux/linkage.h>
#include <linux/unistd.h>
+#include <linux/threads.h>
#include <asm/blackfin.h>
#include <asm/errno.h>
#include <asm/fixed_code.h>
@@ -75,11 +76,11 @@ ENTRY(_ex_workaround_261)
* handle it.
*/
P4 = R7; /* Store EXCAUSE */
- p5.l = _last_cplb_fault_retx;
- p5.h = _last_cplb_fault_retx;
- r7 = [p5];
+
+ GET_PDA(p5, r7);
+ r7 = [p5 + PDA_LFRETX];
r6 = retx;
- [p5] = r6;
+ [p5 + PDA_LFRETX] = r6;
cc = r6 == r7;
if !cc jump _bfin_return_from_exception;
/* fall through */
@@ -324,7 +325,9 @@ ENTRY(_ex_trap_c)
[p4] = p5;
csync;

+ GET_PDA(p5, r6);
#ifndef CONFIG_DEBUG_DOUBLEFAULT
+
/*
* Save these registers, as they are only valid in exception context
* (where we are now - as soon as we defer to IRQ5, they can change)
@@ -335,29 +338,25 @@ ENTRY(_ex_trap_c)
p4.l = lo(DCPLB_FAULT_ADDR);
p4.h = hi(DCPLB_FAULT_ADDR);
r7 = [p4];
- p5.h = _saved_dcplb_fault_addr;
- p5.l = _saved_dcplb_fault_addr;
- [p5] = r7;
+ [p5 + PDA_DCPLB] = r7;

- r7 = [p4 + (ICPLB_FAULT_ADDR - DCPLB_FAULT_ADDR)];
- p5.h = _saved_icplb_fault_addr;
- p5.l = _saved_icplb_fault_addr;
- [p5] = r7;
+ p4.l = lo(ICPLB_FAULT_ADDR);
+ p4.h = hi(ICPLB_FAULT_ADDR);
+ r6 = [p4];
+ [p5 + PDA_ICPLB] = r6;

r6 = retx;
- p4.l = _saved_retx;
- p4.h = _saved_retx;
- [p4] = r6;
+ [p5 + PDA_RETX] = r6;
#endif
r6 = SYSCFG;
- [p4 + 4] = r6;
+ [p5 + PDA_SYSCFG] = r6;
BITCLR(r6, 0);
SYSCFG = r6;

/* Disable all interrupts, but make sure level 5 is enabled so
* we can switch to that level. Save the old mask. */
cli r6;
- [p4 + 8] = r6;
+ [p5 + PDA_EXIMASK] = r6;

p4.l = lo(SAFE_USER_INSTRUCTION);
p4.h = hi(SAFE_USER_INSTRUCTION);
@@ -424,17 +423,16 @@ ENDPROC(_double_fault)
ENTRY(_exception_to_level5)
SAVE_ALL_SYS

- p4.l = _saved_retx;
- p4.h = _saved_retx;
- r6 = [p4];
+ GET_PDA(p4, r7); /* Fetch current PDA */
+ r6 = [p4 + PDA_RETX];
[sp + PT_PC] = r6;

- r6 = [p4 + 4];
+ r6 = [p4 + PDA_SYSCFG];
[sp + PT_SYSCFG] = r6;

/* Restore interrupt mask. We haven't pushed RETI, so this
* doesn't enable interrupts until we return from this handler. */
- r6 = [p4 + 8];
+ r6 = [p4 + PDA_EXIMASK];
sti r6;

/* Restore the hardware error vector. */
@@ -478,8 +476,8 @@ ENTRY(_trap) /* Exception: 4th entry into system event table(supervisor mode)*/
* scratch register (for want of a better option).
*/
EX_SCRATCH_REG = sp;
- sp.l = _exception_stack_top;
- sp.h = _exception_stack_top;
+ GET_PDA_SAFE(sp);
+ sp = [sp + PDA_EXSTACK];
/* Try to deal with syscalls quickly. */
[--sp] = ASTAT;
[--sp] = (R7:6,P5:4);
@@ -501,27 +499,22 @@ ENTRY(_trap) /* Exception: 4th entry into system event table(supervisor mode)*/
* but they are not very interesting, so don't save them
*/

+ GET_PDA(p5, r7);
p4.l = lo(DCPLB_FAULT_ADDR);
p4.h = hi(DCPLB_FAULT_ADDR);
r7 = [p4];
- p5.h = _saved_dcplb_fault_addr;
- p5.l = _saved_dcplb_fault_addr;
- [p5] = r7;
+ [p5 + PDA_DCPLB] = r7;

- r7 = [p4 + (ICPLB_FAULT_ADDR - DCPLB_FAULT_ADDR)];
- p5.h = _saved_icplb_fault_addr;
- p5.l = _saved_icplb_fault_addr;
- [p5] = r7;
+ p4.l = lo(ICPLB_FAULT_ADDR);
+ p4.h = hi(ICPLB_FAULT_ADDR);
+ r7 = [p4];
+ [p5 + PDA_ICPLB] = r7;

- p4.l = _saved_retx;
- p4.h = _saved_retx;
r6 = retx;
- [p4] = r6;
+ [p5 + PDA_RETX] = r6;

r7 = SEQSTAT; /* reason code is in bit 5:0 */
- p4.l = _saved_seqstat;
- p4.h = _saved_seqstat;
- [p4] = r7;
+ [p5 + PDA_SEQSTAT] = r7;
#else
r7 = SEQSTAT; /* reason code is in bit 5:0 */
#endif
@@ -546,11 +539,11 @@ ENTRY(_kernel_execve)
p0 = sp;
r3 = SIZEOF_PTREGS / 4;
r4 = 0(x);
-0:
+.Lclear_regs:
[p0++] = r4;
r3 += -1;
cc = r3 == 0;
- if !cc jump 0b (bp);
+ if !cc jump .Lclear_regs (bp);

p0 = sp;
sp += -16;
@@ -558,7 +551,7 @@ ENTRY(_kernel_execve)
call _do_execve;
SP += 16;
cc = r0 == 0;
- if ! cc jump 1f;
+ if ! cc jump .Lexecve_failed;
/* Success. Copy our temporary pt_regs to the top of the kernel
* stack and do a normal exception return.
*/
@@ -574,12 +567,12 @@ ENTRY(_kernel_execve)
p0 = fp;
r4 = [p0--];
r3 = SIZEOF_PTREGS / 4;
-0:
+.Lcopy_regs:
r4 = [p0--];
[p1--] = r4;
r3 += -1;
cc = r3 == 0;
- if ! cc jump 0b (bp);
+ if ! cc jump .Lcopy_regs (bp);

r0 = (KERNEL_STACK_SIZE - SIZEOF_PTREGS) (z);
p1 = r0;
@@ -591,7 +584,7 @@ ENTRY(_kernel_execve)

RESTORE_CONTEXT;
rti;
-1:
+.Lexecve_failed:
unlink;
rts;
ENDPROC(_kernel_execve)
@@ -925,9 +918,14 @@ _schedule_and_signal_from_int:
p1 = rets;
[sp + PT_RESERVED] = p1;

+#ifdef CONFIG_SMP
+ GET_PDA(p0, r0); /* Fetch current PDA (can't migrate to other CPU here) */
+ r0 = [p0 + PDA_IRQFLAGS];
+#else
p0.l = _irq_flags;
p0.h = _irq_flags;
r0 = [p0];
+#endif
sti r0;

r0 = sp;
@@ -1539,12 +1537,6 @@ ENTRY(_sys_call_table)
.endr
END(_sys_call_table)

-#if ANOMALY_05000261
-/* Used by the assembly entry point to work around an anomaly. */
-_last_cplb_fault_retx:
- .long 0;
-#endif
-
#ifdef CONFIG_EXCEPTION_L1_SCRATCH
/* .section .l1.bss.scratch */
.set _exception_stack_top, L1_SCRATCH_START + L1_SCRATCH_LENGTH
@@ -1554,8 +1546,8 @@ _last_cplb_fault_retx:
#else
.bss
#endif
-_exception_stack:
- .rept 1024
+ENTRY(_exception_stack)
+ .rept 1024 * NR_CPUS
.long 0
.endr
_exception_stack_top:
diff --git a/arch/blackfin/mach-common/head.S b/arch/blackfin/mach-common/head.S
index c1dcaeb..a621ae4 100644
--- a/arch/blackfin/mach-common/head.S
+++ b/arch/blackfin/mach-common/head.S
@@ -13,6 +13,7 @@
#include <asm/blackfin.h>
#include <asm/thread_info.h>
#include <asm/trace.h>
+#include <asm/asm-offsets.h>

__INIT

@@ -111,33 +112,26 @@ ENTRY(__start)
* This happens here, since L1 gets clobbered
* below
*/
- p0.l = _saved_retx;
- p0.h = _saved_retx;
+ GET_PDA(p0, r0);
+ r7 = [p0 + PDA_RETX];
p1.l = _init_saved_retx;
p1.h = _init_saved_retx;
- r0 = [p0];
- [p1] = r0;
+ [p1] = r7;

- p0.l = _saved_dcplb_fault_addr;
- p0.h = _saved_dcplb_fault_addr;
+ r7 = [p0 + PDA_DCPLB];
p1.l = _init_saved_dcplb_fault_addr;
p1.h = _init_saved_dcplb_fault_addr;
- r0 = [p0];
- [p1] = r0;
+ [p1] = r7;

- p0.l = _saved_icplb_fault_addr;
- p0.h = _saved_icplb_fault_addr;
+ r7 = [p0 + PDA_ICPLB];
p1.l = _init_saved_icplb_fault_addr;
p1.h = _init_saved_icplb_fault_addr;
- r0 = [p0];
- [p1] = r0;
+ [p1] = r7;

- p0.l = _saved_seqstat;
- p0.h = _saved_seqstat;
+ r7 = [p0 + PDA_SEQSTAT];
p1.l = _init_saved_seqstat;
p1.h = _init_saved_seqstat;
- r0 = [p0];
- [p1] = r0;
+ [p1] = r7;
#endif

/* Initialize stack pointer */
@@ -255,6 +249,9 @@ ENTRY(_real_start)
sp = sp + p1;
usp = sp;
fp = sp;
+ sp += -12;
+ call _init_pda;
+ sp += 12;
jump.l _start_kernel;
ENDPROC(_real_start)

diff --git a/arch/blackfin/mach-common/ints-priority.c b/arch/blackfin/mach-common/ints-priority.c
index d45d0c5..eb8dfcf 100644
--- a/arch/blackfin/mach-common/ints-priority.c
+++ b/arch/blackfin/mach-common/ints-priority.c
@@ -55,6 +55,7 @@
* -
*/

+#ifndef CONFIG_SMP
/* Initialize this to an actual value to force it into the .data
* section so that we know it is properly initialized at entry into
* the kernel but before bss is initialized to zero (which is where
@@ -63,6 +64,7 @@
*/
unsigned long irq_flags = 0x1f;
EXPORT_SYMBOL(irq_flags);
+#endif

/* The number of spurious interrupts */
atomic_t num_spurious;
@@ -163,6 +165,10 @@ static void bfin_internal_mask_irq(unsigned int irq)
mask_bit = SIC_SYSIRQ(irq) % 32;
bfin_write_SIC_IMASK(mask_bank, bfin_read_SIC_IMASK(mask_bank) &
~(1 << mask_bit));
+#ifdef CONFIG_SMP
+ bfin_write_SICB_IMASK(mask_bank, bfin_read_SICB_IMASK(mask_bank) &
+ ~(1 << mask_bit));
+#endif
#endif
}

@@ -177,6 +183,10 @@ static void bfin_internal_unmask_irq(unsigned int irq)
mask_bit = SIC_SYSIRQ(irq) % 32;
bfin_write_SIC_IMASK(mask_bank, bfin_read_SIC_IMASK(mask_bank) |
(1 << mask_bit));
+#ifdef CONFIG_SMP
+ bfin_write_SICB_IMASK(mask_bank, bfin_read_SICB_IMASK(mask_bank) |
+ (1 << mask_bit));
+#endif
#endif
}

@@ -896,7 +906,7 @@ static struct irq_chip bfin_gpio_irqchip = {
#endif
};

-void __init init_exception_vectors(void)
+void __cpuinit init_exception_vectors(void)
{
/* cannot program in software:
* evt0 - emulation (jtag)
@@ -935,6 +945,10 @@ int __init init_arch_irq(void)
# ifdef CONFIG_BF54x
bfin_write_SIC_IMASK2(SIC_UNMASK_ALL);
# endif
+# ifdef CONFIG_SMP
+ bfin_write_SICB_IMASK0(SIC_UNMASK_ALL);
+ bfin_write_SICB_IMASK1(SIC_UNMASK_ALL);
+# endif
#else
bfin_write_SIC_IMASK(SIC_UNMASK_ALL);
#endif
@@ -995,6 +1009,17 @@ int __init init_arch_irq(void)

break;
#endif
+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+ case IRQ_TIMER0:
+ set_irq_handler(irq, handle_percpu_irq);
+ break;
+#endif
+#ifdef CONFIG_SMP
+ case IRQ_SUPPLE_0:
+ case IRQ_SUPPLE_1:
+ set_irq_handler(irq, handle_percpu_irq);
+ break;
+#endif
default:
set_irq_handler(irq, handle_simple_irq);
break;
@@ -1029,7 +1054,7 @@ int __init init_arch_irq(void)
search_IAR();

/* Enable interrupts IVG7-15 */
- irq_flags = irq_flags | IMASK_IVG15 |
+ irq_flags |= IMASK_IVG15 |
IMASK_IVG14 | IMASK_IVG13 | IMASK_IVG12 | IMASK_IVG11 |
IMASK_IVG10 | IMASK_IVG9 | IMASK_IVG8 | IMASK_IVG7 | IMASK_IVGHW;

@@ -1070,8 +1095,16 @@ void do_irq(int vec, struct pt_regs *fp)
|| defined(BF538_FAMILY) || defined(CONFIG_BF51x)
unsigned long sic_status[3];

- sic_status[0] = bfin_read_SIC_ISR0() & bfin_read_SIC_IMASK0();
- sic_status[1] = bfin_read_SIC_ISR1() & bfin_read_SIC_IMASK1();
+ if (smp_processor_id()) {
+#ifdef CONFIG_SMP
+ /* This will be optimized out in UP mode. */
+ sic_status[0] = bfin_read_SICB_ISR0() & bfin_read_SICB_IMASK0();
+ sic_status[1] = bfin_read_SICB_ISR1() & bfin_read_SICB_IMASK1();
+#endif
+ } else {
+ sic_status[0] = bfin_read_SIC_ISR0() & bfin_read_SIC_IMASK0();
+ sic_status[1] = bfin_read_SIC_ISR1() & bfin_read_SIC_IMASK1();
+ }
#ifdef CONFIG_BF54x
sic_status[2] = bfin_read_SIC_ISR2() & bfin_read_SIC_IMASK2();
#endif
diff --git a/arch/blackfin/mach-common/smp.c b/arch/blackfin/mach-common/smp.c
new file mode 100644
index 0000000..7aeeced
--- /dev/null
+++ b/arch/blackfin/mach-common/smp.c
@@ -0,0 +1,476 @@
+/*
+ * File: arch/blackfin/mach-common/smp.c
+ * Author: Philippe Gerum <[email protected]>
+ * IPI management based on arch/arm/kernel/smp.c.
+ *
+ * Copyright 2007 Analog Devices Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see the file COPYING, or write
+ * to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <linux/cache.h>
+#include <linux/profile.h>
+#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/cpu.h>
+#include <linux/smp.h>
+#include <linux/seq_file.h>
+#include <linux/irq.h>
+#include <asm/atomic.h>
+#include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/processor.h>
+#include <asm/ptrace.h>
+#include <asm/cpu.h>
+#include <linux/err.h>
+
+struct corelock_slot corelock __attribute__ ((__section__(".l2.bss")));
+
+void __cpuinitdata *init_retx_coreb, *init_saved_retx_coreb,
+ *init_saved_seqstat_coreb, *init_saved_icplb_fault_addr_coreb,
+ *init_saved_dcplb_fault_addr_coreb;
+
+cpumask_t cpu_possible_map;
+EXPORT_SYMBOL(cpu_possible_map);
+
+cpumask_t cpu_online_map;
+EXPORT_SYMBOL(cpu_online_map);
+
+#define BFIN_IPI_RESCHEDULE 0
+#define BFIN_IPI_CALL_FUNC 1
+#define BFIN_IPI_CPU_STOP 2
+
+struct blackfin_flush_data {
+ unsigned long start;
+ unsigned long end;
+};
+
+void *secondary_stack;
+
+
+struct smp_call_struct {
+ void (*func)(void *info);
+ void *info;
+ int wait;
+ cpumask_t pending;
+ cpumask_t waitmask;
+};
+
+static struct blackfin_flush_data smp_flush_data;
+
+static DEFINE_SPINLOCK(stop_lock);
+
+struct ipi_message {
+ struct list_head list;
+ unsigned long type;
+ struct smp_call_struct call_struct;
+};
+
+struct ipi_message_queue {
+ struct list_head head;
+ spinlock_t lock;
+ unsigned long count;
+};
+
+static DEFINE_PER_CPU(struct ipi_message_queue, ipi_msg_queue);
+
+static void ipi_cpu_stop(unsigned int cpu)
+{
+ spin_lock(&stop_lock);
+ printk(KERN_CRIT "CPU%u: stopping\n", cpu);
+ dump_stack();
+ spin_unlock(&stop_lock);
+
+ cpu_clear(cpu, cpu_online_map);
+
+ local_irq_disable();
+
+ while (1)
+ SSYNC();
+}
+
+static void ipi_flush_icache(void *info)
+{
+ struct blackfin_flush_data *fdata = info;
+
+ /* Invalidate the memory holding the bounds of the flushed region. */
+ blackfin_dcache_invalidate_range((unsigned long)fdata,
+ (unsigned long)fdata + sizeof(*fdata));
+
+ blackfin_icache_flush_range(fdata->start, fdata->end);
+}
+
+static void ipi_call_function(unsigned int cpu, struct ipi_message *msg)
+{
+ int wait;
+ void (*func)(void *info);
+ void *info;
+ func = msg->call_struct.func;
+ info = msg->call_struct.info;
+ wait = msg->call_struct.wait;
+ cpu_clear(cpu, msg->call_struct.pending);
+ func(info);
+ if (wait)
+ cpu_clear(cpu, msg->call_struct.waitmask);
+ else
+ kfree(msg);
+}
+
+static irqreturn_t ipi_handler(int irq, void *dev_instance)
+{
+ struct ipi_message *msg, *mg;
+ struct ipi_message_queue *msg_queue;
+ unsigned int cpu = smp_processor_id();
+
+ platform_clear_ipi(cpu);
+
+ msg_queue = &__get_cpu_var(ipi_msg_queue);
+ msg_queue->count++;
+
+ spin_lock(&msg_queue->lock);
+ list_for_each_entry_safe(msg, mg, &msg_queue->head, list) {
+ list_del(&msg->list);
+ switch (msg->type) {
+ case BFIN_IPI_RESCHEDULE:
+ /* That's the easiest one; leave it to
+ * return_from_int. */
+ kfree(msg);
+ break;
+ case BFIN_IPI_CALL_FUNC:
+ ipi_call_function(cpu, msg);
+ break;
+ case BFIN_IPI_CPU_STOP:
+ ipi_cpu_stop(cpu);
+ kfree(msg);
+ break;
+ default:
+ printk(KERN_CRIT "CPU%u: Unknown IPI message "
+ "0x%lx\n", cpu, msg->type);
+ kfree(msg);
+ break;
+ }
+ }
+ spin_unlock(&msg_queue->lock);
+ return IRQ_HANDLED;
+}
+
+static void ipi_queue_init(void)
+{
+ unsigned int cpu;
+ struct ipi_message_queue *msg_queue;
+ for_each_possible_cpu(cpu) {
+ msg_queue = &per_cpu(ipi_msg_queue, cpu);
+ INIT_LIST_HEAD(&msg_queue->head);
+ spin_lock_init(&msg_queue->lock);
+ msg_queue->count = 0;
+ }
+}
+
+int smp_call_function(void (*func)(void *info), void *info, int wait)
+{
+ unsigned int cpu;
+ cpumask_t callmap;
+ unsigned long flags;
+ struct ipi_message_queue *msg_queue;
+ struct ipi_message *msg;
+
+ callmap = cpu_online_map;
+ cpu_clear(smp_processor_id(), callmap);
+ if (cpus_empty(callmap))
+ return 0;
+
+ msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
+ INIT_LIST_HEAD(&msg->list);
+ msg->call_struct.func = func;
+ msg->call_struct.info = info;
+ msg->call_struct.wait = wait;
+ msg->call_struct.pending = callmap;
+ msg->call_struct.waitmask = callmap;
+ msg->type = BFIN_IPI_CALL_FUNC;
+
+ for_each_cpu_mask(cpu, callmap) {
+ msg_queue = &per_cpu(ipi_msg_queue, cpu);
+ spin_lock_irqsave(&msg_queue->lock, flags);
+ list_add(&msg->list, &msg_queue->head);
+ spin_unlock_irqrestore(&msg_queue->lock, flags);
+ platform_send_ipi_cpu(cpu);
+ }
+ if (wait) {
+ while (!cpus_empty(msg->call_struct.waitmask))
+ blackfin_dcache_invalidate_range(
+ (unsigned long)(&msg->call_struct.waitmask),
+ (unsigned long)(&msg->call_struct.waitmask));
+ kfree(msg);
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(smp_call_function);
+
+int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
+ int wait)
+{
+ unsigned int cpu = cpuid;
+ cpumask_t callmap;
+ unsigned long flags;
+ struct ipi_message_queue *msg_queue;
+ struct ipi_message *msg;
+
+ if (cpu_is_offline(cpu))
+ return 0;
+ cpus_clear(callmap);
+ cpu_set(cpu, callmap);
+
+ msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
+ INIT_LIST_HEAD(&msg->list);
+ msg->call_struct.func = func;
+ msg->call_struct.info = info;
+ msg->call_struct.wait = wait;
+ msg->call_struct.pending = callmap;
+ msg->call_struct.waitmask = callmap;
+ msg->type = BFIN_IPI_CALL_FUNC;
+
+ msg_queue = &per_cpu(ipi_msg_queue, cpu);
+ spin_lock_irqsave(&msg_queue->lock, flags);
+ list_add(&msg->list, &msg_queue->head);
+ spin_unlock_irqrestore(&msg_queue->lock, flags);
+ platform_send_ipi_cpu(cpu);
+
+ if (wait) {
+ while (!cpus_empty(msg->call_struct.waitmask))
+ blackfin_dcache_invalidate_range(
+ (unsigned long)(&msg->call_struct.waitmask),
+ (unsigned long)(&msg->call_struct.waitmask));
+ kfree(msg);
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(smp_call_function_single);
+
+void smp_send_reschedule(int cpu)
+{
+ unsigned long flags;
+ struct ipi_message_queue *msg_queue;
+ struct ipi_message *msg;
+
+ if (cpu_is_offline(cpu))
+ return;
+
+ msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
+ memset(msg, 0, sizeof(*msg));
+ INIT_LIST_HEAD(&msg->list);
+ msg->type = BFIN_IPI_RESCHEDULE;
+
+ msg_queue = &per_cpu(ipi_msg_queue, cpu);
+ spin_lock_irqsave(&msg_queue->lock, flags);
+ list_add(&msg->list, &msg_queue->head);
+ spin_unlock_irqrestore(&msg_queue->lock, flags);
+ platform_send_ipi_cpu(cpu);
+
+ return;
+}
+
+void smp_send_stop(void)
+{
+ unsigned int cpu;
+ cpumask_t callmap;
+ unsigned long flags;
+ struct ipi_message_queue *msg_queue;
+ struct ipi_message *msg;
+
+ callmap = cpu_online_map;
+ cpu_clear(smp_processor_id(), callmap);
+ if (cpus_empty(callmap))
+ return;
+
+ msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
+ memset(msg, 0, sizeof(*msg));
+ INIT_LIST_HEAD(&msg->list);
+ msg->type = BFIN_IPI_CPU_STOP;
+
+ for_each_cpu_mask(cpu, callmap) {
+ msg_queue = &per_cpu(ipi_msg_queue, cpu);
+ spin_lock_irqsave(&msg_queue->lock, flags);
+ list_add(&msg->list, &msg_queue->head);
+ spin_unlock_irqrestore(&msg_queue->lock, flags);
+ platform_send_ipi_cpu(cpu);
+ }
+ return;
+}
+
+int __cpuinit __cpu_up(unsigned int cpu)
+{
+ struct task_struct *idle;
+ int ret;
+
+ idle = fork_idle(cpu);
+ if (IS_ERR(idle)) {
+ printk(KERN_ERR "CPU%u: fork() failed\n", cpu);
+ return PTR_ERR(idle);
+ }
+
+ secondary_stack = task_stack_page(idle) + THREAD_SIZE;
+ smp_wmb();
+
+ ret = platform_boot_secondary(cpu, idle);
+
+ if (ret) {
+ cpu_clear(cpu, cpu_present_map);
+ printk(KERN_CRIT "CPU%u: processor failed to boot (%d)\n", cpu, ret);
+ free_task(idle);
+ } else
+ cpu_set(cpu, cpu_online_map);
+
+ secondary_stack = NULL;
+
+ return ret;
+}
+
+static void __cpuinit setup_secondary(unsigned int cpu)
+{
+#ifndef CONFIG_TICK_SOURCE_SYSTMR0
+ struct irq_desc *timer_desc;
+#endif
+ unsigned long ilat;
+
+ bfin_write_IMASK(0);
+ CSYNC();
+ ilat = bfin_read_ILAT();
+ CSYNC();
+ bfin_write_ILAT(ilat);
+ CSYNC();
+
+ /* Reserve the PDA space for the secondary CPU. */
+ reserve_pda();
+
+ /* Enable interrupt levels IVG7-15. IARs have been already
+ * programmed by the boot CPU. */
+ irq_flags |= IMASK_IVG15 |
+ IMASK_IVG14 | IMASK_IVG13 | IMASK_IVG12 | IMASK_IVG11 |
+ IMASK_IVG10 | IMASK_IVG9 | IMASK_IVG8 | IMASK_IVG7 | IMASK_IVGHW;
+
+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+ /* Power down the core timer, just to play safe. */
+ bfin_write_TCNTL(0);
+
+ /* system timer0 has been setup by CoreA. */
+#else
+ timer_desc = irq_desc + IRQ_CORETMR;
+ setup_core_timer();
+ timer_desc->chip->enable(IRQ_CORETMR);
+#endif
+}
+
+void __cpuinit secondary_start_kernel(void)
+{
+ unsigned int cpu = smp_processor_id();
+ struct mm_struct *mm = &init_mm;
+
+ if (_bfin_swrst & SWRST_DBL_FAULT_B) {
+ printk(KERN_EMERG "CoreB Recovering from DOUBLE FAULT event\n");
+#ifdef CONFIG_DEBUG_DOUBLEFAULT
+ printk(KERN_EMERG " While handling exception (EXCAUSE = 0x%x) at %pF\n",
+ (int)init_saved_seqstat_coreb & SEQSTAT_EXCAUSE, init_saved_retx_coreb);
+ printk(KERN_NOTICE " DCPLB_FAULT_ADDR: %pF\n", init_saved_dcplb_fault_addr_coreb);
+ printk(KERN_NOTICE " ICPLB_FAULT_ADDR: %pF\n", init_saved_icplb_fault_addr_coreb);
+#endif
+ printk(KERN_NOTICE " The instruction at %pF caused a double exception\n",
+ init_retx_coreb);
+ }
+
+ /*
+ * We want the D-cache to be enabled early, in case the atomic
+ * support code emulates cache coherence (see
+ * __ARCH_SYNC_CORE_DCACHE).
+ */
+ init_exception_vectors();
+
+ bfin_setup_caches(cpu);
+
+ local_irq_disable();
+
+ /* Attach the new idle task to the global mm. */
+ atomic_inc(&mm->mm_users);
+ atomic_inc(&mm->mm_count);
+ current->active_mm = mm;
+ BUG_ON(current->mm); /* Can't be, but better be safe than sorry. */
+
+ preempt_disable();
+
+ setup_secondary(cpu);
+
+ local_irq_enable();
+
+ platform_secondary_init(cpu);
+
+ cpu_idle();
+}
+
+void __init smp_prepare_boot_cpu(void)
+{
+}
+
+void __init smp_prepare_cpus(unsigned int max_cpus)
+{
+ platform_prepare_cpus(max_cpus);
+ ipi_queue_init();
+ platform_request_ipi(&ipi_handler);
+}
+
+void __init smp_cpus_done(unsigned int max_cpus)
+{
+ unsigned long bogosum = 0;
+ unsigned int cpu;
+
+ for_each_online_cpu(cpu)
+ bogosum += per_cpu(cpu_data, cpu).loops_per_jiffy;
+
+ printk(KERN_INFO "SMP: Total of %d processors activated "
+ "(%lu.%02lu BogoMIPS).\n",
+ num_online_cpus(),
+ bogosum / (500000/HZ),
+ (bogosum / (5000/HZ)) % 100);
+}
+
+void smp_icache_flush_range_others(unsigned long start, unsigned long end)
+{
+ smp_flush_data.start = start;
+ smp_flush_data.end = end;
+
+ if (smp_call_function(&ipi_flush_icache, &smp_flush_data, 1))
+ printk(KERN_WARNING "SMP: failed to run I-cache flush request on other CPUs\n");
+}
+EXPORT_SYMBOL_GPL(smp_icache_flush_range_others);
+
+#ifdef __ARCH_SYNC_CORE_DCACHE
+unsigned long barrier_mask __attribute__ ((__section__(".l2.bss")));
+
+void resync_core_dcache(void)
+{
+ unsigned int cpu = get_cpu();
+ blackfin_invalidate_entire_dcache();
+ ++per_cpu(cpu_data, cpu).dcache_invld_count;
+ put_cpu();
+}
+EXPORT_SYMBOL(resync_core_dcache);
+#endif
diff --git a/arch/blackfin/oprofile/common.c b/arch/blackfin/oprofile/common.c
index 0f6d303..f34795a 100644
--- a/arch/blackfin/oprofile/common.c
+++ b/arch/blackfin/oprofile/common.c
@@ -130,7 +130,7 @@ int __init oprofile_arch_init(struct oprofile_operations *ops)

mutex_init(&pfmon_lock);

- dspid = bfin_read_DSPID();
+ dspid = bfin_dspid();

printk(KERN_INFO "Oprofile got the cpu id is 0x%x. \n", dspid);

--
1.5.6.3

2008-11-18 09:06:20

by Bryan Wu

Subject: [PATCH 4/5] Blackfin arch: SMP supporting patchset: Blackfin kernel and memory management code

From: Graf Yang <[email protected]>

The Blackfin dual-core BF561 processor can support SMP-like features.
https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like

In this patch, we extend the Blackfin kernel and memory management code to support SMP.
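
For example, address checks that used fixed L1 macros (L1_SCRATCH_START and friends) now go through per-core accessors such as get_l1_scratch_start(). A minimal sketch of that kind of check follows. The sketch_* names, the two-entry base table and the base/length values are illustrative assumptions, not the kernel's actual definitions, which live in the mach headers:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real ones come from the mach headers. */
#define SKETCH_NR_CPUS		2
#define SKETCH_SCRATCH_LENGTH	0x1000UL

static const unsigned long sketch_scratch_base[SKETCH_NR_CPUS] = {
	0xFFB00000UL,	/* assumed scratchpad base for core 0 */
	0xFF700000UL,	/* assumed scratchpad base for core 1 */
};

/* Hypothetical stand-in for get_l1_scratch_start() on a given core. */
static unsigned long sketch_l1_scratch_start(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return sketch_scratch_base[cpu];
}

/*
 * Same shape as the scratchpad test in _access_ok(): the range
 * [addr, addr + size) must lie entirely inside the scratchpad of
 * the core doing the check.
 */
static bool sketch_in_l1_scratch(unsigned int cpu, unsigned long addr,
				 unsigned long size)
{
	unsigned long start = sketch_l1_scratch_start(cpu);

	return addr >= start &&
	       addr + size <= start + SKETCH_SCRATCH_LENGTH;
}
```

With a per-core base, an address that is valid scratchpad on one core is rejected when checked on the other, which a single compile-time constant could not express.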

Signed-off-by: Graf Yang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
---
arch/blackfin/kernel/asm-offsets.c | 29 +++
arch/blackfin/kernel/bfin_ksyms.c | 34 ++++
arch/blackfin/kernel/entry.S | 1 +
arch/blackfin/kernel/irqchip.c | 24 ++--
arch/blackfin/kernel/kgdb.c | 4 +-
arch/blackfin/kernel/module.c | 13 ++-
arch/blackfin/kernel/process.c | 23 ++-
arch/blackfin/kernel/ptrace.c | 8 +-
arch/blackfin/kernel/reboot.c | 24 ++-
arch/blackfin/kernel/setup.c | 163 ++++++++++++------
arch/blackfin/kernel/time.c | 114 +++++++++----
arch/blackfin/kernel/traps.c | 56 +++----
arch/blackfin/mm/init.c | 60 +++++--
arch/blackfin/mm/sram-alloc.c | 336 +++++++++++++++++++++---------------
14 files changed, 580 insertions(+), 309 deletions(-)

diff --git a/arch/blackfin/kernel/asm-offsets.c b/arch/blackfin/kernel/asm-offsets.c
index 9bb85dd..b5df945 100644
--- a/arch/blackfin/kernel/asm-offsets.c
+++ b/arch/blackfin/kernel/asm-offsets.c
@@ -56,6 +56,9 @@ int main(void)
/* offsets into the thread struct */
DEFINE(THREAD_KSP, offsetof(struct thread_struct, ksp));
DEFINE(THREAD_USP, offsetof(struct thread_struct, usp));
+ DEFINE(THREAD_SR, offsetof(struct thread_struct, seqstat));
+ DEFINE(PT_SR, offsetof(struct thread_struct, seqstat));
+ DEFINE(THREAD_ESP0, offsetof(struct thread_struct, esp0));
DEFINE(THREAD_PC, offsetof(struct thread_struct, pc));
DEFINE(KERNEL_STACK_SIZE, THREAD_SIZE);

@@ -128,5 +131,31 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(SIGTRAP, SIGTRAP);

+ /* PDA management (in L1 scratchpad) */
+ DEFINE(PDA_SYSCFG, offsetof(struct blackfin_pda, syscfg));
+#ifdef CONFIG_SMP
+ DEFINE(PDA_IRQFLAGS, offsetof(struct blackfin_pda, imask));
+#endif
+ DEFINE(PDA_IPDT, offsetof(struct blackfin_pda, ipdt));
+ DEFINE(PDA_IPDT_SWAPCOUNT, offsetof(struct blackfin_pda, ipdt_swapcount));
+ DEFINE(PDA_DPDT, offsetof(struct blackfin_pda, dpdt));
+ DEFINE(PDA_DPDT_SWAPCOUNT, offsetof(struct blackfin_pda, dpdt_swapcount));
+ DEFINE(PDA_EXIPTR, offsetof(struct blackfin_pda, ex_iptr));
+ DEFINE(PDA_EXOPTR, offsetof(struct blackfin_pda, ex_optr));
+ DEFINE(PDA_EXBUF, offsetof(struct blackfin_pda, ex_buf));
+ DEFINE(PDA_EXIMASK, offsetof(struct blackfin_pda, ex_imask));
+ DEFINE(PDA_EXSTACK, offsetof(struct blackfin_pda, ex_stack));
+#ifdef ANOMALY_05000261
+ DEFINE(PDA_LFRETX, offsetof(struct blackfin_pda, last_cplb_fault_retx));
+#endif
+ DEFINE(PDA_DCPLB, offsetof(struct blackfin_pda, dcplb_fault_addr));
+ DEFINE(PDA_ICPLB, offsetof(struct blackfin_pda, icplb_fault_addr));
+ DEFINE(PDA_RETX, offsetof(struct blackfin_pda, retx));
+ DEFINE(PDA_SEQSTAT, offsetof(struct blackfin_pda, seqstat));
+#ifdef CONFIG_SMP
+ /* Inter-core lock (in L2 SRAM) */
+ DEFINE(SIZEOF_CORELOCK, sizeof(struct corelock_slot));
+#endif
+
return 0;
}
diff --git a/arch/blackfin/kernel/bfin_ksyms.c b/arch/blackfin/kernel/bfin_ksyms.c
index b66f1d4..763c315 100644
--- a/arch/blackfin/kernel/bfin_ksyms.c
+++ b/arch/blackfin/kernel/bfin_ksyms.c
@@ -68,3 +68,37 @@ EXPORT_SYMBOL(insw_8);
EXPORT_SYMBOL(outsl);
EXPORT_SYMBOL(insl);
EXPORT_SYMBOL(insl_16);
+
+#ifdef CONFIG_SMP
+EXPORT_SYMBOL(__raw_atomic_update_asm);
+EXPORT_SYMBOL(__raw_atomic_clear_asm);
+EXPORT_SYMBOL(__raw_atomic_set_asm);
+EXPORT_SYMBOL(__raw_atomic_xor_asm);
+EXPORT_SYMBOL(__raw_atomic_test_asm);
+EXPORT_SYMBOL(__raw_xchg_1_asm);
+EXPORT_SYMBOL(__raw_xchg_2_asm);
+EXPORT_SYMBOL(__raw_xchg_4_asm);
+EXPORT_SYMBOL(__raw_cmpxchg_1_asm);
+EXPORT_SYMBOL(__raw_cmpxchg_2_asm);
+EXPORT_SYMBOL(__raw_cmpxchg_4_asm);
+EXPORT_SYMBOL(__raw_spin_is_locked_asm);
+EXPORT_SYMBOL(__raw_spin_lock_asm);
+EXPORT_SYMBOL(__raw_spin_trylock_asm);
+EXPORT_SYMBOL(__raw_spin_unlock_asm);
+EXPORT_SYMBOL(__raw_read_lock_asm);
+EXPORT_SYMBOL(__raw_read_trylock_asm);
+EXPORT_SYMBOL(__raw_read_unlock_asm);
+EXPORT_SYMBOL(__raw_write_lock_asm);
+EXPORT_SYMBOL(__raw_write_trylock_asm);
+EXPORT_SYMBOL(__raw_write_unlock_asm);
+EXPORT_SYMBOL(__raw_bit_set_asm);
+EXPORT_SYMBOL(__raw_bit_clear_asm);
+EXPORT_SYMBOL(__raw_bit_toggle_asm);
+EXPORT_SYMBOL(__raw_bit_test_asm);
+EXPORT_SYMBOL(__raw_bit_test_set_asm);
+EXPORT_SYMBOL(__raw_bit_test_clear_asm);
+EXPORT_SYMBOL(__raw_bit_test_toggle_asm);
+EXPORT_SYMBOL(__raw_uncached_fetch_asm);
+EXPORT_SYMBOL(__raw_smp_mark_barrier_asm);
+EXPORT_SYMBOL(__raw_smp_check_barrier_asm);
+#endif
diff --git a/arch/blackfin/kernel/entry.S b/arch/blackfin/kernel/entry.S
index faea88e..c0c3fe8 100644
--- a/arch/blackfin/kernel/entry.S
+++ b/arch/blackfin/kernel/entry.S
@@ -30,6 +30,7 @@
#include <linux/linkage.h>
#include <asm/thread_info.h>
#include <asm/errno.h>
+#include <asm/blackfin.h>
#include <asm/asm-offsets.h>

#include <asm/context.S>
diff --git a/arch/blackfin/kernel/irqchip.c b/arch/blackfin/kernel/irqchip.c
index 07402f5..9eebb78 100644
--- a/arch/blackfin/kernel/irqchip.c
+++ b/arch/blackfin/kernel/irqchip.c
@@ -36,7 +36,7 @@
#include <linux/irq.h>
#include <asm/trace.h>

-static unsigned long irq_err_count;
+static atomic_t irq_err_count;
static spinlock_t irq_controller_lock;

/*
@@ -48,7 +48,7 @@ void dummy_mask_unmask_irq(unsigned int irq)

void ack_bad_irq(unsigned int irq)
{
- irq_err_count += 1;
+ atomic_inc(&irq_err_count);
printk(KERN_ERR "IRQ: spurious interrupt %d\n", irq);
}
EXPORT_SYMBOL(ack_bad_irq);
@@ -72,7 +72,7 @@ static struct irq_desc bad_irq_desc = {

int show_interrupts(struct seq_file *p, void *v)
{
- int i = *(loff_t *) v;
+ int i = *(loff_t *) v, j;
struct irqaction *action;
unsigned long flags;

@@ -80,19 +80,20 @@ int show_interrupts(struct seq_file *p, void *v)
spin_lock_irqsave(&irq_desc[i].lock, flags);
action = irq_desc[i].action;
if (!action)
- goto unlock;
-
- seq_printf(p, "%3d: %10u ", i, kstat_irqs(i));
+ goto skip;
+ seq_printf(p, "%3d: ", i);
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]);
+ seq_printf(p, " %8s", irq_desc[i].chip->name);
seq_printf(p, " %s", action->name);
for (action = action->next; action; action = action->next)
- seq_printf(p, ", %s", action->name);
+ seq_printf(p, " %s", action->name);

seq_putc(p, '\n');
- unlock:
+ skip:
spin_unlock_irqrestore(&irq_desc[i].lock, flags);
- } else if (i == NR_IRQS) {
- seq_printf(p, "Err: %10lu\n", irq_err_count);
- }
+ } else if (i == NR_IRQS)
+ seq_printf(p, "Err: %10u\n", atomic_read(&irq_err_count));
return 0;
}

@@ -101,7 +102,6 @@ int show_interrupts(struct seq_file *p, void *v)
* come via this function. Instead, they should provide their
* own 'handler'
*/
-
#ifdef CONFIG_DO_IRQ_L1
__attribute__((l1_text))
#endif
diff --git a/arch/blackfin/kernel/kgdb.c b/arch/blackfin/kernel/kgdb.c
index b795a20..ab40221 100644
--- a/arch/blackfin/kernel/kgdb.c
+++ b/arch/blackfin/kernel/kgdb.c
@@ -363,12 +363,12 @@ void kgdb_passive_cpu_callback(void *info)

void kgdb_roundup_cpus(unsigned long flags)
{
- smp_call_function(kgdb_passive_cpu_callback, NULL, 0, 0);
+ smp_call_function(kgdb_passive_cpu_callback, NULL, 0);
}

void kgdb_roundup_cpu(int cpu, unsigned long flags)
{
- smp_call_function_single(cpu, kgdb_passive_cpu_callback, NULL, 0, 0);
+ smp_call_function_single(cpu, kgdb_passive_cpu_callback, NULL, 0);
}
#endif

diff --git a/arch/blackfin/kernel/module.c b/arch/blackfin/kernel/module.c
index e1bebc8..2e14cad 100644
--- a/arch/blackfin/kernel/module.c
+++ b/arch/blackfin/kernel/module.c
@@ -343,7 +343,13 @@ apply_relocate_add(Elf_Shdr * sechdrs, const char *strtab,
pr_debug("location is %x, value is %x type is %d \n",
(unsigned int) location32, value,
ELF32_R_TYPE(rel[i].r_info));
-
+#ifdef CONFIG_SMP
+ if ((unsigned long)location16 >= COREB_L1_DATA_A_START) {
+ printk(KERN_ERR "module %s: cannot relocate in L1: %u (SMP kernel)\n",
+ mod->name, ELF32_R_TYPE(rel[i].r_info));
+ return -ENOEXEC;
+ }
+#endif
switch (ELF32_R_TYPE(rel[i].r_info)) {

case R_pcrel24:
@@ -436,6 +442,7 @@ module_finalize(const Elf_Ehdr * hdr,
{
unsigned int i, strindex = 0, symindex = 0;
char *secstrings;
+ long err = 0;

secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;

@@ -460,8 +467,10 @@ module_finalize(const Elf_Ehdr * hdr,
(strcmp(".rela.l1.text", secstrings + sechdrs[i].sh_name) == 0) ||
((strcmp(".rela.text", secstrings + sechdrs[i].sh_name) == 0) &&
(hdr->e_flags & (EF_BFIN_CODE_IN_L1|EF_BFIN_CODE_IN_L2))))) {
- apply_relocate_add((Elf_Shdr *) sechdrs, strtab,
+ err = apply_relocate_add((Elf_Shdr *) sechdrs, strtab,
symindex, i, mod);
+ if (err < 0)
+ return -ENOEXEC;
}
}
return 0;
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 326e301..4359ea2 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -171,6 +171,13 @@ asmlinkage int bfin_clone(struct pt_regs *regs)
unsigned long clone_flags;
unsigned long newsp;

+#ifdef __ARCH_SYNC_CORE_DCACHE
+ if (current->rt.nr_cpus_allowed == num_possible_cpus()) {
+ current->cpus_allowed = cpumask_of_cpu(smp_processor_id());
+ current->rt.nr_cpus_allowed = 1;
+ }
+#endif
+
/* syscall2 puts clone_flags in r0 and usp in r1 */
clone_flags = regs->r0;
newsp = regs->r1;
@@ -338,22 +345,22 @@ int _access_ok(unsigned long addr, unsigned long size)
if (addr >= (unsigned long)__init_begin &&
addr + size <= (unsigned long)__init_end)
return 1;
- if (addr >= L1_SCRATCH_START
- && addr + size <= L1_SCRATCH_START + L1_SCRATCH_LENGTH)
+ if (addr >= get_l1_scratch_start()
+ && addr + size <= get_l1_scratch_start() + L1_SCRATCH_LENGTH)
return 1;
#if L1_CODE_LENGTH != 0
- if (addr >= L1_CODE_START + (_etext_l1 - _stext_l1)
- && addr + size <= L1_CODE_START + L1_CODE_LENGTH)
+ if (addr >= get_l1_code_start() + (_etext_l1 - _stext_l1)
+ && addr + size <= get_l1_code_start() + L1_CODE_LENGTH)
return 1;
#endif
#if L1_DATA_A_LENGTH != 0
- if (addr >= L1_DATA_A_START + (_ebss_l1 - _sdata_l1)
- && addr + size <= L1_DATA_A_START + L1_DATA_A_LENGTH)
+ if (addr >= get_l1_data_a_start() + (_ebss_l1 - _sdata_l1)
+ && addr + size <= get_l1_data_a_start() + L1_DATA_A_LENGTH)
return 1;
#endif
#if L1_DATA_B_LENGTH != 0
- if (addr >= L1_DATA_B_START + (_ebss_b_l1 - _sdata_b_l1)
- && addr + size <= L1_DATA_B_START + L1_DATA_B_LENGTH)
+ if (addr >= get_l1_data_b_start() + (_ebss_b_l1 - _sdata_b_l1)
+ && addr + size <= get_l1_data_b_start() + L1_DATA_B_LENGTH)
return 1;
#endif
#if L2_LENGTH != 0
diff --git a/arch/blackfin/kernel/ptrace.c b/arch/blackfin/kernel/ptrace.c
index 140bf00..4de44f3 100644
--- a/arch/blackfin/kernel/ptrace.c
+++ b/arch/blackfin/kernel/ptrace.c
@@ -220,8 +220,8 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
break;
pr_debug("ptrace: user address is valid\n");

- if (L1_CODE_LENGTH != 0 && addr >= L1_CODE_START
- && addr + sizeof(tmp) <= L1_CODE_START + L1_CODE_LENGTH) {
+ if (L1_CODE_LENGTH != 0 && addr >= get_l1_code_start()
+ && addr + sizeof(tmp) <= get_l1_code_start() + L1_CODE_LENGTH) {
safe_dma_memcpy (&tmp, (const void *)(addr), sizeof(tmp));
copied = sizeof(tmp);

@@ -300,8 +300,8 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
break;
pr_debug("ptrace: user address is valid\n");

- if (L1_CODE_LENGTH != 0 && addr >= L1_CODE_START
- && addr + sizeof(data) <= L1_CODE_START + L1_CODE_LENGTH) {
+ if (L1_CODE_LENGTH != 0 && addr >= get_l1_code_start()
+ && addr + sizeof(data) <= get_l1_code_start() + L1_CODE_LENGTH) {
safe_dma_memcpy ((void *)(addr), &data, sizeof(data));
copied = sizeof(data);

diff --git a/arch/blackfin/kernel/reboot.c b/arch/blackfin/kernel/reboot.c
index ae97ca4..eeee8cb 100644
--- a/arch/blackfin/kernel/reboot.c
+++ b/arch/blackfin/kernel/reboot.c
@@ -21,7 +21,7 @@
* the core reset.
*/
__attribute__((l1_text))
-static void bfin_reset(void)
+static void _bfin_reset(void)
{
/* Wait for completion of "system" events such as cache line
* line fills so that we avoid infinite stalls later on as
@@ -66,6 +66,18 @@ static void bfin_reset(void)
}
}

+static void bfin_reset(void)
+{
+ if (ANOMALY_05000353 || ANOMALY_05000386)
+ _bfin_reset();
+ else
+ /* the bootrom checks to see how it was reset and will
+ * automatically perform a software reset for us when
+ * it starts executing boot
+ */
+ asm("raise 1;");
+}
+
__attribute__((weak))
void native_machine_restart(char *cmd)
{
@@ -75,14 +87,10 @@ void machine_restart(char *cmd)
{
native_machine_restart(cmd);
local_irq_disable();
- if (ANOMALY_05000353 || ANOMALY_05000386)
- bfin_reset();
+ if (smp_processor_id())
+ smp_call_function((void *)bfin_reset, 0, 1);
else
- /* the bootrom checks to see how it was reset and will
- * automatically perform a software reset for us when
- * it starts executing boot
- */
- asm("raise 1;");
+ bfin_reset();
}

__attribute__((weak))
diff --git a/arch/blackfin/kernel/setup.c b/arch/blackfin/kernel/setup.c
index 71a9a8c..c644d23 100644
--- a/arch/blackfin/kernel/setup.c
+++ b/arch/blackfin/kernel/setup.c
@@ -26,11 +26,10 @@
#include <asm/blackfin.h>
#include <asm/cplbinit.h>
#include <asm/div64.h>
+#include <asm/cpu.h>
#include <asm/fixed_code.h>
#include <asm/early_printk.h>

-static DEFINE_PER_CPU(struct cpu, cpu_devices);
-
u16 _bfin_swrst;
EXPORT_SYMBOL(_bfin_swrst);

@@ -79,29 +78,76 @@ static struct change_member *change_point[2*BFIN_MEMMAP_MAX] __initdata;
static struct bfin_memmap_entry *overlap_list[BFIN_MEMMAP_MAX] __initdata;
static struct bfin_memmap_entry new_map[BFIN_MEMMAP_MAX] __initdata;

-void __init bfin_cache_init(void)
-{
+DEFINE_PER_CPU(struct blackfin_cpudata, cpu_data);
+
#if defined(CONFIG_BFIN_DCACHE) || defined(CONFIG_BFIN_ICACHE)
- generate_cplb_tables();
+void __init generate_cplb_tables(void)
+{
+ unsigned int cpu;
+
+ /* Generate per-CPU I&D CPLB tables */
+ for (cpu = 0; cpu < num_possible_cpus(); ++cpu)
+ generate_cplb_tables_cpu(cpu);
+}
#endif

+void __cpuinit bfin_setup_caches(unsigned int cpu)
+{
#ifdef CONFIG_BFIN_ICACHE
- bfin_icache_init();
- printk(KERN_INFO "Instruction Cache Enabled\n");
+#ifdef CONFIG_MPU
+ bfin_icache_init(icplb_tbl[cpu]);
+#else
+ bfin_icache_init(icplb_tables[cpu]);
+#endif
#endif

#ifdef CONFIG_BFIN_DCACHE
- bfin_dcache_init();
- printk(KERN_INFO "Data Cache Enabled"
+#ifdef CONFIG_MPU
+ bfin_dcache_init(dcplb_tbl[cpu]);
+#else
+ bfin_dcache_init(dcplb_tables[cpu]);
+#endif
+#endif
+
+ /*
+ * In cache coherence emulation mode, we need to have the
+ * D-cache enabled before running any atomic operation which
+ * might involve cache invalidation (i.e. spinlock, rwlock).
+ * So printk's are deferred until then.
+ */
+#ifdef CONFIG_BFIN_ICACHE
+ printk(KERN_INFO "Instruction Cache Enabled for CPU%u\n", cpu);
+#endif
+#ifdef CONFIG_BFIN_DCACHE
+ printk(KERN_INFO "Data Cache Enabled for CPU%u"
# if defined CONFIG_BFIN_WB
" (write-back)"
# elif defined CONFIG_BFIN_WT
" (write-through)"
# endif
- "\n");
+ "\n", cpu);
#endif
}

+void __cpuinit bfin_setup_cpudata(unsigned int cpu)
+{
+ struct blackfin_cpudata *cpudata = &per_cpu(cpu_data, cpu);
+
+ cpudata->idle = current;
+ cpudata->loops_per_jiffy = loops_per_jiffy;
+ cpudata->cclk = get_cclk();
+ cpudata->imemctl = bfin_read_IMEM_CONTROL();
+ cpudata->dmemctl = bfin_read_DMEM_CONTROL();
+}
+
+void __init bfin_cache_init(void)
+{
+#if defined(CONFIG_BFIN_DCACHE) || defined(CONFIG_BFIN_ICACHE)
+ generate_cplb_tables();
+#endif
+ bfin_setup_caches(0);
+}
+
void __init bfin_relocate_l1_mem(void)
{
unsigned long l1_code_length;
@@ -230,7 +276,7 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
/* record all known change-points (starting and ending addresses),
omitting those that are for empty memory regions */
chgidx = 0;
- for (i = 0; i < old_nr; i++) {
+ for (i = 0; i < old_nr; i++) {
if (map[i].size != 0) {
change_point[chgidx]->addr = map[i].addr;
change_point[chgidx++]->pentry = &map[i];
@@ -238,13 +284,13 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
change_point[chgidx++]->pentry = &map[i];
}
}
- chg_nr = chgidx; /* true number of change-points */
+ chg_nr = chgidx; /* true number of change-points */

/* sort change-point list by memory addresses (low -> high) */
still_changing = 1;
- while (still_changing) {
+ while (still_changing) {
still_changing = 0;
- for (i = 1; i < chg_nr; i++) {
+ for (i = 1; i < chg_nr; i++) {
/* if <current_addr> > <last_addr>, swap */
/* or, if current=<start_addr> & last=<end_addr>, swap */
if ((change_point[i]->addr < change_point[i-1]->addr) ||
@@ -261,10 +307,10 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
}

/* create a new memmap, removing overlaps */
- overlap_entries = 0; /* number of entries in the overlap table */
- new_entry = 0; /* index for creating new memmap entries */
- last_type = 0; /* start with undefined memory type */
- last_addr = 0; /* start with 0 as last starting address */
+ overlap_entries = 0; /* number of entries in the overlap table */
+ new_entry = 0; /* index for creating new memmap entries */
+ last_type = 0; /* start with undefined memory type */
+ last_addr = 0; /* start with 0 as last starting address */
/* loop through change-points, determining affect on the new memmap */
for (chgidx = 0; chgidx < chg_nr; chgidx++) {
/* keep track of all overlapping memmap entries */
@@ -286,14 +332,14 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
if (overlap_list[i]->type > current_type)
current_type = overlap_list[i]->type;
/* continue building up new memmap based on this information */
- if (current_type != last_type) {
+ if (current_type != last_type) {
if (last_type != 0) {
new_map[new_entry].size =
change_point[chgidx]->addr - last_addr;
/* move forward only if the new size was non-zero */
if (new_map[new_entry].size != 0)
if (++new_entry >= BFIN_MEMMAP_MAX)
- break; /* no more space left for new entries */
+ break; /* no more space left for new entries */
}
if (current_type != 0) {
new_map[new_entry].addr = change_point[chgidx]->addr;
@@ -303,9 +349,9 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
last_type = current_type;
}
}
- new_nr = new_entry; /* retain count for new entries */
+ new_nr = new_entry; /* retain count for new entries */

- /* copy new mapping into original location */
+ /* copy new mapping into original location */
memcpy(map, new_map, new_nr*sizeof(struct bfin_memmap_entry));
*pnr_map = new_nr;

@@ -361,7 +407,6 @@ static __init int parse_memmap(char *arg)
* - "memmap=XXX[KkmM][@][$]XXX[KkmM]" defines a memory region
* @ from <start> to <start>+<mem>, type RAM
* $ from <start> to <start>+<mem>, type RESERVED
- *
*/
static __init void parse_cmdline_early(char *cmdline_p)
{
@@ -383,12 +428,10 @@ static __init void parse_cmdline_early(char *cmdline_p)
if (*to != ' ') {
if (*to == '$'
|| *(to + 1) == '$')
- reserved_mem_dcache_on =
- 1;
+ reserved_mem_dcache_on = 1;
if (*to == '#'
|| *(to + 1) == '#')
- reserved_mem_icache_on =
- 1;
+ reserved_mem_icache_on = 1;
}
}
} else if (!memcmp(to, "earlyprintk=", 12)) {
@@ -417,9 +460,8 @@ static __init void parse_cmdline_early(char *cmdline_p)
* [_ramend - DMA_UNCACHED_REGION,
* _ramend]: uncached DMA region
* [_ramend, physical_mem_end]: memory not managed by kernel
- *
*/
-static __init void memory_setup(void)
+static __init void memory_setup(void)
{
#ifdef CONFIG_MTD_UCLINUX
unsigned long mtd_phys = 0;
@@ -436,7 +478,7 @@ static __init void memory_setup(void)
memory_end = _ramend - DMA_UNCACHED_REGION;

#ifdef CONFIG_MPU
- /* Round up to multiple of 4MB. */
+ /* Round up to multiple of 4MB */
memory_start = (_ramstart + 0x3fffff) & ~0x3fffff;
#else
memory_start = PAGE_ALIGN(_ramstart);
@@ -616,7 +658,7 @@ static __init void setup_bootmem_allocator(void)
end_pfn = memory_end >> PAGE_SHIFT;

/*
- * give all the memory to the bootmap allocator, tell it to put the
+ * give all the memory to the bootmap allocator, tell it to put the
* boot mem_map at the start of memory.
*/
bootmap_size = init_bootmem_node(NODE_DATA(0),
@@ -791,7 +833,11 @@ void __init setup_arch(char **cmdline_p)
bfin_write_SWRST(_bfin_swrst | DOUBLE_FAULT);
#endif

+#ifdef CONFIG_SMP
+ if (_bfin_swrst & SWRST_DBL_FAULT_A) {
+#else
if (_bfin_swrst & RESET_DOUBLE) {
+#endif
printk(KERN_EMERG "Recovering from DOUBLE FAULT event\n");
#ifdef CONFIG_DEBUG_DOUBLEFAULT
/* We assume the crashing kernel, and the current symbol table match */
@@ -835,7 +881,7 @@ void __init setup_arch(char **cmdline_p)
printk(KERN_INFO "Blackfin Linux support by http://blackfin.uclinux.org/\n");

printk(KERN_INFO "Processor Speed: %lu MHz core clock and %lu MHz System Clock\n",
- cclk / 1000000, sclk / 1000000);
+ cclk / 1000000, sclk / 1000000);

if (ANOMALY_05000273 && (cclk >> 1) <= sclk)
printk("\n\n\nANOMALY_05000273: CCLK must be >= 2*SCLK !!!\n\n\n");
@@ -867,18 +913,21 @@ void __init setup_arch(char **cmdline_p)
BUG_ON((char *)&safe_user_instruction - (char *)&fixed_code_start
!= SAFE_USER_INSTRUCTION - FIXED_CODE_START);

+#ifdef CONFIG_SMP
+ platform_init_cpus();
+#endif
init_exception_vectors();
- bfin_cache_init();
+ bfin_cache_init(); /* Initialize caches for the boot CPU */
}

static int __init topology_init(void)
{
- int cpu;
+ unsigned int cpu;
+ /* Record CPU-private information for the boot processor. */
+ bfin_setup_cpudata(0);

for_each_possible_cpu(cpu) {
- struct cpu *c = &per_cpu(cpu_devices, cpu);
-
- register_cpu(c, cpu);
+ register_cpu(&per_cpu(cpu_data, cpu).cpu, cpu);
}

return 0;
@@ -983,15 +1032,15 @@ static int show_cpuinfo(struct seq_file *m, void *v)
char *cpu, *mmu, *fpu, *vendor, *cache;
uint32_t revid;

- u_long cclk = 0, sclk = 0;
+ u_long sclk = 0;
u_int icache_size = BFIN_ICACHESIZE / 1024, dcache_size = 0, dsup_banks = 0;
+ struct blackfin_cpudata *cpudata = &per_cpu(cpu_data, *(unsigned int *)v);

cpu = CPU;
mmu = "none";
fpu = "none";
revid = bfin_revid();

- cclk = get_cclk();
sclk = get_sclk();

switch (bfin_read_CHIPID() & CHIPID_MANUFACTURE) {
@@ -1003,10 +1052,8 @@ static int show_cpuinfo(struct seq_file *m, void *v)
break;
}

- seq_printf(m, "processor\t: %d\n"
- "vendor_id\t: %s\n",
- *(unsigned int *)v,
- vendor);
+ seq_printf(m, "processor\t: %d\n" "vendor_id\t: %s\n",
+ *(unsigned int *)v, vendor);

if (CPUID == bfin_cpuid())
seq_printf(m, "cpu family\t: 0x%04x\n", CPUID);
@@ -1016,7 +1063,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)

seq_printf(m, "model name\t: ADSP-%s %lu(MHz CCLK) %lu(MHz SCLK) (%s)\n"
"stepping\t: %d\n",
- cpu, cclk/1000000, sclk/1000000,
+ cpu, cpudata->cclk/1000000, sclk/1000000,
#ifdef CONFIG_MPU
"mpu on",
#else
@@ -1025,16 +1072,16 @@ static int show_cpuinfo(struct seq_file *m, void *v)
revid);

seq_printf(m, "cpu MHz\t\t: %lu.%03lu/%lu.%03lu\n",
- cclk/1000000, cclk%1000000,
+ cpudata->cclk/1000000, cpudata->cclk%1000000,
sclk/1000000, sclk%1000000);
seq_printf(m, "bogomips\t: %lu.%02lu\n"
"Calibration\t: %lu loops\n",
- (loops_per_jiffy * HZ) / 500000,
- ((loops_per_jiffy * HZ) / 5000) % 100,
- (loops_per_jiffy * HZ));
+ (cpudata->loops_per_jiffy * HZ) / 500000,
+ ((cpudata->loops_per_jiffy * HZ) / 5000) % 100,
+ (cpudata->loops_per_jiffy * HZ));

/* Check Cache configutation */
- switch (bfin_read_DMEM_CONTROL() & (1 << DMC0_P | 1 << DMC1_P)) {
+ switch (cpudata->dmemctl & (1 << DMC0_P | 1 << DMC1_P)) {
case ACACHE_BSRAM:
cache = "dbank-A/B\t: cache/sram";
dcache_size = 16;
@@ -1058,10 +1105,10 @@ static int show_cpuinfo(struct seq_file *m, void *v)
}

/* Is it turned on? */
- if ((bfin_read_DMEM_CONTROL() & (ENDCPLB | DMC_ENABLE)) != (ENDCPLB | DMC_ENABLE))
+ if ((cpudata->dmemctl & (ENDCPLB | DMC_ENABLE)) != (ENDCPLB | DMC_ENABLE))
dcache_size = 0;

- if ((bfin_read_IMEM_CONTROL() & (IMC | ENICPLB)) != (IMC | ENICPLB))
+ if ((cpudata->imemctl & (IMC | ENICPLB)) != (IMC | ENICPLB))
icache_size = 0;

seq_printf(m, "cache size\t: %d KB(L1 icache) "
@@ -1086,8 +1133,13 @@ static int show_cpuinfo(struct seq_file *m, void *v)
"dcache setup\t: %d Super-banks/%d Sub-banks/%d Ways, %d Lines/Way\n",
dsup_banks, BFIN_DSUBBANKS, BFIN_DWAYS,
BFIN_DLINES);
+#ifdef __ARCH_SYNC_CORE_DCACHE
+ seq_printf(m,
+ "SMP Dcache Flushes\t: %lu\n\n",
+ per_cpu(cpu_data, *(unsigned int *)v).dcache_invld_count);
+#endif
#ifdef CONFIG_BFIN_ICACHE_LOCK
- switch ((bfin_read_IMEM_CONTROL() >> 3) & WAYALL_L) {
+ switch ((cpudata->imemctl >> 3) & WAYALL_L) {
case WAY0_L:
seq_printf(m, "Way0 Locked-Down\n");
break;
@@ -1137,6 +1189,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
seq_printf(m, "No Ways are locked\n");
}
#endif
+ if (*(unsigned int *)v != NR_CPUS-1)
+ return 0;
+
+#if L2_LENGTH
+ seq_printf(m, "L2 SRAM\t\t: %dKB\n", L2_LENGTH/0x400);
+#endif
seq_printf(m, "board name\t: %s\n", bfin_board_name);
seq_printf(m, "board memory\t: %ld kB (0x%p -> 0x%p)\n",
physical_mem_end >> 10, (void *)0, (void *)physical_mem_end);
@@ -1144,6 +1202,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
((int)memory_end - (int)_stext) >> 10,
_stext,
(void *)memory_end);
+ seq_printf(m, "\n");

return 0;
}
diff --git a/arch/blackfin/kernel/time.c b/arch/blackfin/kernel/time.c
index eb23523..06de2ce 100644
--- a/arch/blackfin/kernel/time.c
+++ b/arch/blackfin/kernel/time.c
@@ -34,9 +34,11 @@
#include <linux/interrupt.h>
#include <linux/time.h>
#include <linux/irq.h>
+#include <linux/delay.h>

#include <asm/blackfin.h>
#include <asm/time.h>
+#include <asm/gptimers.h>

/* This is an NTP setting */
#define TICK_SIZE (tick_nsec / 1000)
@@ -46,11 +48,14 @@ static unsigned long gettimeoffset(void);

static struct irqaction bfin_timer_irq = {
.name = "BFIN Timer Tick",
+#ifdef CONFIG_IRQ_PER_CPU
+ .flags = IRQF_DISABLED | IRQF_PERCPU,
+#else
.flags = IRQF_DISABLED
+#endif
};

-static void
-time_sched_init(irq_handler_t timer_routine)
+void setup_core_timer(void)
{
u32 tcount;

@@ -71,12 +76,41 @@ time_sched_init(irq_handler_t timer_routine)
CSYNC();

bfin_write_TCNTL(7);
+}
+
+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+void setup_system_timer0(void)
+{
+ /* Power down the core timer, just to be safe. */
+ bfin_write_TCNTL(0);
+
+ disable_gptimers(TIMER0bit);
+ set_gptimer_status(0, TIMER_STATUS_TRUN0);
+ while (get_gptimer_status(0) & TIMER_STATUS_TRUN0)
+ udelay(10);
+
+ set_gptimer_config(0, 0x59); /* IRQ enable, periodic, PWM_OUT, SCLKed, OUT PAD disabled */
+ set_gptimer_period(TIMER0_id, get_sclk() / HZ);
+ set_gptimer_pwidth(TIMER0_id, 1);
+ SSYNC();
+ enable_gptimers(TIMER0bit);
+}
+#endif

+static void
+time_sched_init(irqreturn_t(*timer_routine) (int, void *))
+{
+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+ setup_system_timer0();
+#else
+ setup_core_timer();
+#endif
bfin_timer_irq.handler = (irq_handler_t)timer_routine;
- /* call setup_irq instead of request_irq because request_irq calls
- * kmalloc which has not been initialized yet
- */
+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+ setup_irq(IRQ_TIMER0, &bfin_timer_irq);
+#else
setup_irq(IRQ_CORETMR, &bfin_timer_irq);
+#endif
}

/*
@@ -87,17 +121,23 @@ static unsigned long gettimeoffset(void)
unsigned long offset;
unsigned long clocks_per_jiffy;

+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+ clocks_per_jiffy = bfin_read_TIMER0_PERIOD();
+ offset = bfin_read_TIMER0_COUNTER() /
+ (((clocks_per_jiffy + 1) * HZ) / USEC_PER_SEC);
+
+ if ((get_gptimer_status(0) & TIMER_STATUS_TIMIL0) && offset < (100000 / HZ / 2))
+ offset += (USEC_PER_SEC / HZ);
+#else
clocks_per_jiffy = bfin_read_TPERIOD();
- offset =
- (clocks_per_jiffy -
- bfin_read_TCOUNT()) / (((clocks_per_jiffy + 1) * HZ) /
- USEC_PER_SEC);
+ offset = (clocks_per_jiffy - bfin_read_TCOUNT()) /
+ (((clocks_per_jiffy + 1) * HZ) / USEC_PER_SEC);

/* Check if we just wrapped the counters and maybe missed a tick */
if ((bfin_read_ILAT() & (1 << IRQ_CORETMR))
- && (offset < (100000 / HZ / 2)))
+ && (offset < (100000 / HZ / 2)))
offset += (USEC_PER_SEC / HZ);
-
+#endif
return offset;
}

@@ -120,34 +160,38 @@ irqreturn_t timer_interrupt(int irq, void *dummy)
static long last_rtc_update;

write_seqlock(&xtime_lock);
-
- do_timer(1);
-
- profile_tick(CPU_PROFILING);
-
- /*
- * If we have an externally synchronized Linux clock, then update
- * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
- * called as close as possible to 500 ms before the new second starts.
- */
-
- if (ntp_synced() &&
- xtime.tv_sec > last_rtc_update + 660 &&
- (xtime.tv_nsec / NSEC_PER_USEC) >=
- 500000 - ((unsigned)TICK_SIZE) / 2
- && (xtime.tv_nsec / NSEC_PER_USEC) <=
- 500000 + ((unsigned)TICK_SIZE) / 2) {
- if (set_rtc_mmss(xtime.tv_sec) == 0)
- last_rtc_update = xtime.tv_sec;
- else
- /* Do it again in 60s. */
- last_rtc_update = xtime.tv_sec - 600;
+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+ if (get_gptimer_status(0) & TIMER_STATUS_TIMIL0) {
+#endif
+ do_timer(1);
+
+
+ /*
+ * If we have an externally synchronized Linux clock, then update
+ * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
+ * called as close as possible to 500 ms before the new second starts.
+ */
+
+ if (ntp_synced() &&
+ xtime.tv_sec > last_rtc_update + 660 &&
+ (xtime.tv_nsec / NSEC_PER_USEC) >=
+ 500000 - ((unsigned)TICK_SIZE) / 2
+ && (xtime.tv_nsec / NSEC_PER_USEC) <=
+ 500000 + ((unsigned)TICK_SIZE) / 2) {
+ if (set_rtc_mmss(xtime.tv_sec) == 0)
+ last_rtc_update = xtime.tv_sec;
+ else
+ /* Do it again in 60s. */
+ last_rtc_update = xtime.tv_sec - 600;
+ }
+#ifdef CONFIG_TICK_SOURCE_SYSTMR0
+ set_gptimer_status(0, TIMER_STATUS_TIMIL0);
}
+#endif
write_sequnlock(&xtime_lock);

-#ifndef CONFIG_SMP
update_process_times(user_mode(get_irq_regs()));
-#endif
+ profile_tick(CPU_PROFILING);

return IRQ_HANDLED;
}
diff --git a/arch/blackfin/kernel/traps.c b/arch/blackfin/kernel/traps.c
index bef025b..af7cc43 100644
--- a/arch/blackfin/kernel/traps.c
+++ b/arch/blackfin/kernel/traps.c
@@ -75,16 +75,6 @@ void __init trap_init(void)
CSYNC();
}

-/*
- * Used to save the RETX, SEQSTAT, I/D CPLB FAULT ADDR
- * values across the transition from exception to IRQ5.
- * We put these in L1, so they are going to be in a valid
- * location during exception context
- */
-__attribute__((l1_data))
-unsigned long saved_retx, saved_seqstat,
- saved_icplb_fault_addr, saved_dcplb_fault_addr;
-
static void decode_address(char *buf, unsigned long address)
{
#ifdef CONFIG_DEBUG_VERBOSE
@@ -211,18 +201,18 @@ asmlinkage void double_fault_c(struct pt_regs *fp)
printk(KERN_EMERG "\n" KERN_EMERG "Double Fault\n");
#ifdef CONFIG_DEBUG_DOUBLEFAULT_PRINT
if (((long)fp->seqstat & SEQSTAT_EXCAUSE) == VEC_UNCOV) {
+ unsigned int cpu = smp_processor_id();
char buf[150];
- decode_address(buf, saved_retx);
+ decode_address(buf, cpu_pda[cpu].retx);
printk(KERN_EMERG "While handling exception (EXCAUSE = 0x%x) at %s:\n",
- (int)saved_seqstat & SEQSTAT_EXCAUSE, buf);
- decode_address(buf, saved_dcplb_fault_addr);
+ (unsigned int)cpu_pda[cpu].seqstat & SEQSTAT_EXCAUSE, buf);
+ decode_address(buf, cpu_pda[cpu].dcplb_fault_addr);
printk(KERN_NOTICE " DCPLB_FAULT_ADDR: %s\n", buf);
- decode_address(buf, saved_icplb_fault_addr);
+ decode_address(buf, cpu_pda[cpu].icplb_fault_addr);
printk(KERN_NOTICE " ICPLB_FAULT_ADDR: %s\n", buf);

decode_address(buf, fp->retx);
- printk(KERN_NOTICE "The instruction at %s caused a double exception\n",
- buf);
+ printk(KERN_NOTICE "The instruction at %s caused a double exception\n", buf);
} else
#endif
{
@@ -240,6 +230,9 @@ asmlinkage void trap_c(struct pt_regs *fp)
#ifdef CONFIG_DEBUG_BFIN_HWTRACE_ON
int j;
#endif
+#ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
+ unsigned int cpu = smp_processor_id();
+#endif
int sig = 0;
siginfo_t info;
unsigned long trapnr = fp->seqstat & SEQSTAT_EXCAUSE;
@@ -417,7 +410,7 @@ asmlinkage void trap_c(struct pt_regs *fp)
info.si_code = ILL_CPLB_MULHIT;
sig = SIGSEGV;
#ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
- if (saved_dcplb_fault_addr < FIXED_CODE_START)
+ if (cpu_pda[cpu].dcplb_fault_addr < FIXED_CODE_START)
verbose_printk(KERN_NOTICE "NULL pointer access\n");
else
#endif
@@ -471,7 +464,7 @@ asmlinkage void trap_c(struct pt_regs *fp)
info.si_code = ILL_CPLB_MULHIT;
sig = SIGSEGV;
#ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
- if (saved_icplb_fault_addr < FIXED_CODE_START)
+ if (cpu_pda[cpu].icplb_fault_addr < FIXED_CODE_START)
verbose_printk(KERN_NOTICE "Jump to NULL address\n");
else
#endif
@@ -960,6 +953,7 @@ void dump_bfin_process(struct pt_regs *fp)
else
verbose_printk(KERN_NOTICE "COMM= invalid\n");

+ printk(KERN_NOTICE "CPU = %d\n", current_thread_info()->cpu);
if (!((unsigned long)current->mm & 0x3) && (unsigned long)current->mm >= FIXED_CODE_START)
verbose_printk(KERN_NOTICE "TEXT = 0x%p-0x%p DATA = 0x%p-0x%p\n"
KERN_NOTICE " BSS = 0x%p-0x%p USER-STACK = 0x%p\n"
@@ -1053,6 +1047,7 @@ void show_regs(struct pt_regs *fp)
struct irqaction *action;
unsigned int i;
unsigned long flags;
+ unsigned int cpu = smp_processor_id();

verbose_printk(KERN_NOTICE "\n" KERN_NOTICE "SEQUENCER STATUS:\t\t%s\n", print_tainted());
verbose_printk(KERN_NOTICE " SEQSTAT: %08lx IPEND: %04lx SYSCFG: %04lx\n",
@@ -1112,9 +1107,9 @@ unlock:

if (((long)fp->seqstat & SEQSTAT_EXCAUSE) &&
(((long)fp->seqstat & SEQSTAT_EXCAUSE) != VEC_HWERR)) {
- decode_address(buf, saved_dcplb_fault_addr);
+ decode_address(buf, cpu_pda[cpu].dcplb_fault_addr);
verbose_printk(KERN_NOTICE "DCPLB_FAULT_ADDR: %s\n", buf);
- decode_address(buf, saved_icplb_fault_addr);
+ decode_address(buf, cpu_pda[cpu].icplb_fault_addr);
verbose_printk(KERN_NOTICE "ICPLB_FAULT_ADDR: %s\n", buf);
}

@@ -1153,20 +1148,21 @@ unlock:
asmlinkage int sys_bfin_spinlock(int *spinlock)__attribute__((l1_text));
#endif

-asmlinkage int sys_bfin_spinlock(int *spinlock)
+static DEFINE_SPINLOCK(bfin_spinlock_lock);
+
+asmlinkage int sys_bfin_spinlock(int *p)
{
- int ret = 0;
- int tmp = 0;
+ int ret, tmp = 0;

- local_irq_disable();
- ret = get_user(tmp, spinlock);
- if (ret == 0) {
- if (tmp)
+ spin_lock(&bfin_spinlock_lock); /* This also disables kernel preemption. */
+ ret = get_user(tmp, p);
+ if (likely(ret == 0)) {
+ if (unlikely(tmp))
ret = 1;
- tmp = 1;
- put_user(tmp, spinlock);
+ else
+ put_user(1, p);
}
- local_irq_enable();
+ spin_unlock(&bfin_spinlock_lock);
return ret;
}

diff --git a/arch/blackfin/mm/init.c b/arch/blackfin/mm/init.c
index bc240ab..57d306b 100644
--- a/arch/blackfin/mm/init.c
+++ b/arch/blackfin/mm/init.c
@@ -31,7 +31,8 @@
#include <linux/bootmem.h>
#include <linux/uaccess.h>
#include <asm/bfin-global.h>
-#include <asm/l1layout.h>
+#include <asm/pda.h>
+#include <asm/cplbinit.h>
#include "blackfin_sram.h"

/*
@@ -53,6 +54,11 @@ static unsigned long empty_bad_page;

unsigned long empty_zero_page;

+extern unsigned long exception_stack[NR_CPUS][1024];
+
+struct blackfin_pda cpu_pda[NR_CPUS];
+EXPORT_SYMBOL(cpu_pda);
+
/*
* paging_init() continues the virtual memory environment setup which
* was begun by the code in arch/head.S.
@@ -98,6 +104,42 @@ void __init paging_init(void)
}
}

+asmlinkage void init_pda(void)
+{
+ unsigned int cpu = raw_smp_processor_id();
+
+ /* Initialize the PDA fields holding references to other parts
+ of the memory. The content of such memory is still
+ undefined at the time of the call; we are only setting up
+ valid pointers to it. */
+ memset(&cpu_pda[cpu], 0, sizeof(cpu_pda[cpu]));
+
+ cpu_pda[0].next = &cpu_pda[1];
+ cpu_pda[1].next = &cpu_pda[0];
+
+ cpu_pda[cpu].ex_stack = exception_stack[cpu + 1];
+
+#ifdef CONFIG_MPU
+#else
+ cpu_pda[cpu].ipdt = ipdt_tables[cpu];
+ cpu_pda[cpu].dpdt = dpdt_tables[cpu];
+#ifdef CONFIG_CPLB_INFO
+ cpu_pda[cpu].ipdt_swapcount = ipdt_swapcount_tables[cpu];
+ cpu_pda[cpu].dpdt_swapcount = dpdt_swapcount_tables[cpu];
+#endif
+#endif
+
+#ifdef CONFIG_SMP
+ cpu_pda[cpu].imask = 0x1f;
+#endif
+}
+
+void __cpuinit reserve_pda(void)
+{
+ printk(KERN_INFO "PDA for CPU%u reserved at %p\n", smp_processor_id(),
+ &cpu_pda[smp_processor_id()]);
+}
+
void __init mem_init(void)
{
unsigned int codek = 0, datak = 0, initk = 0;
@@ -141,21 +183,13 @@ void __init mem_init(void)

static int __init sram_init(void)
{
- unsigned long tmp;
-
/* Initialize the blackfin L1 Memory. */
bfin_sram_init();

- /* Allocate this once; never free it. We assume this gives us a
- pointer to the start of L1 scratchpad memory; panic if it
- doesn't. */
- tmp = (unsigned long)l1sram_alloc(sizeof(struct l1_scratch_task_info));
- if (tmp != (unsigned long)L1_SCRATCH_TASK_INFO) {
- printk(KERN_EMERG "mem_init(): Did not get the right address from l1sram_alloc: %08lx != %08lx\n",
- tmp, (unsigned long)L1_SCRATCH_TASK_INFO);
- panic("No L1, time to give up\n");
- }
-
+ /* Reserve the PDA space for the boot CPU right after we
+ * have initialized the scratch memory allocator.
+ */
+ reserve_pda();
return 0;
}
pure_initcall(sram_init);
diff --git a/arch/blackfin/mm/sram-alloc.c b/arch/blackfin/mm/sram-alloc.c
index cc6f336..8f82b4c 100644
--- a/arch/blackfin/mm/sram-alloc.c
+++ b/arch/blackfin/mm/sram-alloc.c
@@ -41,8 +41,10 @@
#include <asm/blackfin.h>
#include "blackfin_sram.h"

-static spinlock_t l1sram_lock, l1_data_sram_lock, l1_inst_sram_lock;
-static spinlock_t l2_sram_lock;
+static DEFINE_PER_CPU(spinlock_t, l1sram_lock) ____cacheline_aligned_in_smp;
+static DEFINE_PER_CPU(spinlock_t, l1_data_sram_lock) ____cacheline_aligned_in_smp;
+static DEFINE_PER_CPU(spinlock_t, l1_inst_sram_lock) ____cacheline_aligned_in_smp;
+static spinlock_t l2_sram_lock ____cacheline_aligned_in_smp;

/* the data structure for L1 scratchpad and DATA SRAM */
struct sram_piece {
@@ -52,18 +54,22 @@ struct sram_piece {
struct sram_piece *next;
};

-static struct sram_piece free_l1_ssram_head, used_l1_ssram_head;
+static DEFINE_PER_CPU(struct sram_piece, free_l1_ssram_head);
+static DEFINE_PER_CPU(struct sram_piece, used_l1_ssram_head);

#if L1_DATA_A_LENGTH != 0
-static struct sram_piece free_l1_data_A_sram_head, used_l1_data_A_sram_head;
+static DEFINE_PER_CPU(struct sram_piece, free_l1_data_A_sram_head);
+static DEFINE_PER_CPU(struct sram_piece, used_l1_data_A_sram_head);
#endif

#if L1_DATA_B_LENGTH != 0
-static struct sram_piece free_l1_data_B_sram_head, used_l1_data_B_sram_head;
+static DEFINE_PER_CPU(struct sram_piece, free_l1_data_B_sram_head);
+static DEFINE_PER_CPU(struct sram_piece, used_l1_data_B_sram_head);
#endif

#if L1_CODE_LENGTH != 0
-static struct sram_piece free_l1_inst_sram_head, used_l1_inst_sram_head;
+static DEFINE_PER_CPU(struct sram_piece, free_l1_inst_sram_head);
+static DEFINE_PER_CPU(struct sram_piece, used_l1_inst_sram_head);
#endif

#if L2_LENGTH != 0
@@ -75,102 +81,115 @@ static struct kmem_cache *sram_piece_cache;
/* L1 Scratchpad SRAM initialization function */
static void __init l1sram_init(void)
{
- free_l1_ssram_head.next =
- kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
- if (!free_l1_ssram_head.next) {
- printk(KERN_INFO "Failed to initialize Scratchpad data SRAM\n");
- return;
+ unsigned int cpu;
+ for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
+ per_cpu(free_l1_ssram_head, cpu).next =
+ kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
+ if (!per_cpu(free_l1_ssram_head, cpu).next) {
+ printk(KERN_INFO "Failed to initialize Scratchpad data SRAM.\n");
+ return;
+ }
+
+ per_cpu(free_l1_ssram_head, cpu).next->paddr = (void *)get_l1_scratch_start_cpu(cpu);
+ per_cpu(free_l1_ssram_head, cpu).next->size = L1_SCRATCH_LENGTH;
+ per_cpu(free_l1_ssram_head, cpu).next->pid = 0;
+ per_cpu(free_l1_ssram_head, cpu).next->next = NULL;
+
+ per_cpu(used_l1_ssram_head, cpu).next = NULL;
+
+ /* mutex initialize */
+ spin_lock_init(&per_cpu(l1sram_lock, cpu));
+ printk(KERN_INFO "Blackfin Scratchpad data SRAM: %d KB\n",
+ L1_SCRATCH_LENGTH >> 10);
}
-
- free_l1_ssram_head.next->paddr = (void *)L1_SCRATCH_START;
- free_l1_ssram_head.next->size = L1_SCRATCH_LENGTH;
- free_l1_ssram_head.next->pid = 0;
- free_l1_ssram_head.next->next = NULL;
-
- used_l1_ssram_head.next = NULL;
-
- /* mutex initialize */
- spin_lock_init(&l1sram_lock);
-
- printk(KERN_INFO "Blackfin Scratchpad data SRAM: %d KB\n",
- L1_SCRATCH_LENGTH >> 10);
}

static void __init l1_data_sram_init(void)
{
+ unsigned int cpu;
#if L1_DATA_A_LENGTH != 0
- free_l1_data_A_sram_head.next =
- kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
- if (!free_l1_data_A_sram_head.next) {
- printk(KERN_INFO "Failed to initialize L1 Data A SRAM\n");
- return;
+ for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
+ per_cpu(free_l1_data_A_sram_head, cpu).next =
+ kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
+ if (!per_cpu(free_l1_data_A_sram_head, cpu).next) {
+ printk(KERN_INFO "Failed to initialize L1 Data A SRAM.\n");
+ return;
+ }
+
+ per_cpu(free_l1_data_A_sram_head, cpu).next->paddr =
+ (void *)get_l1_data_a_start_cpu(cpu) + (_ebss_l1 - _sdata_l1);
+ per_cpu(free_l1_data_A_sram_head, cpu).next->size =
+ L1_DATA_A_LENGTH - (_ebss_l1 - _sdata_l1);
+ per_cpu(free_l1_data_A_sram_head, cpu).next->pid = 0;
+ per_cpu(free_l1_data_A_sram_head, cpu).next->next = NULL;
+
+ per_cpu(used_l1_data_A_sram_head, cpu).next = NULL;
+
+ printk(KERN_INFO "Blackfin L1 Data A SRAM: %d KB (%d KB free)\n",
+ L1_DATA_A_LENGTH >> 10,
+ per_cpu(free_l1_data_A_sram_head, cpu).next->size >> 10);
}
-
- free_l1_data_A_sram_head.next->paddr =
- (void *)L1_DATA_A_START + (_ebss_l1 - _sdata_l1);
- free_l1_data_A_sram_head.next->size =
- L1_DATA_A_LENGTH - (_ebss_l1 - _sdata_l1);
- free_l1_data_A_sram_head.next->pid = 0;
- free_l1_data_A_sram_head.next->next = NULL;
-
- used_l1_data_A_sram_head.next = NULL;
-
- printk(KERN_INFO "Blackfin L1 Data A SRAM: %d KB (%d KB free)\n",
- L1_DATA_A_LENGTH >> 10,
- free_l1_data_A_sram_head.next->size >> 10);
#endif
#if L1_DATA_B_LENGTH != 0
- free_l1_data_B_sram_head.next =
- kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
- if (!free_l1_data_B_sram_head.next) {
- printk(KERN_INFO "Failed to initialize L1 Data B SRAM\n");
- return;
+ for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
+ per_cpu(free_l1_data_B_sram_head, cpu).next =
+ kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
+ if (!per_cpu(free_l1_data_B_sram_head, cpu).next) {
+ printk(KERN_INFO "Failed to initialize L1 Data B SRAM.\n");
+ return;
+ }
+
+ per_cpu(free_l1_data_B_sram_head, cpu).next->paddr =
+ (void *)get_l1_data_b_start_cpu(cpu) + (_ebss_b_l1 - _sdata_b_l1);
+ per_cpu(free_l1_data_B_sram_head, cpu).next->size =
+ L1_DATA_B_LENGTH - (_ebss_b_l1 - _sdata_b_l1);
+ per_cpu(free_l1_data_B_sram_head, cpu).next->pid = 0;
+ per_cpu(free_l1_data_B_sram_head, cpu).next->next = NULL;
+
+ per_cpu(used_l1_data_B_sram_head, cpu).next = NULL;
+
+ printk(KERN_INFO "Blackfin L1 Data B SRAM: %d KB (%d KB free)\n",
+ L1_DATA_B_LENGTH >> 10,
+ per_cpu(free_l1_data_B_sram_head, cpu).next->size >> 10);
+ /* the shared data SRAM lock is initialized below, once per CPU */
}
-
- free_l1_data_B_sram_head.next->paddr =
- (void *)L1_DATA_B_START + (_ebss_b_l1 - _sdata_b_l1);
- free_l1_data_B_sram_head.next->size =
- L1_DATA_B_LENGTH - (_ebss_b_l1 - _sdata_b_l1);
- free_l1_data_B_sram_head.next->pid = 0;
- free_l1_data_B_sram_head.next->next = NULL;
-
- used_l1_data_B_sram_head.next = NULL;
-
- printk(KERN_INFO "Blackfin L1 Data B SRAM: %d KB (%d KB free)\n",
- L1_DATA_B_LENGTH >> 10,
- free_l1_data_B_sram_head.next->size >> 10);
#endif

- /* mutex initialize */
- spin_lock_init(&l1_data_sram_lock);
+#if L1_DATA_A_LENGTH != 0 || L1_DATA_B_LENGTH != 0
+ for (cpu = 0; cpu < num_possible_cpus(); ++cpu)
+ spin_lock_init(&per_cpu(l1_data_sram_lock, cpu));
+#endif
}

static void __init l1_inst_sram_init(void)
{
#if L1_CODE_LENGTH != 0
- free_l1_inst_sram_head.next =
- kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
- if (!free_l1_inst_sram_head.next) {
- printk(KERN_INFO "Failed to initialize L1 Instruction SRAM\n");
- return;
+ unsigned int cpu;
+ for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
+ per_cpu(free_l1_inst_sram_head, cpu).next =
+ kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
+ if (!per_cpu(free_l1_inst_sram_head, cpu).next) {
+ printk(KERN_INFO "Failed to initialize L1 Instruction SRAM\n");
+ return;
+ }
+
+ per_cpu(free_l1_inst_sram_head, cpu).next->paddr =
+ (void *)get_l1_code_start_cpu(cpu) + (_etext_l1 - _stext_l1);
+ per_cpu(free_l1_inst_sram_head, cpu).next->size =
+ L1_CODE_LENGTH - (_etext_l1 - _stext_l1);
+ per_cpu(free_l1_inst_sram_head, cpu).next->pid = 0;
+ per_cpu(free_l1_inst_sram_head, cpu).next->next = NULL;
+
+ per_cpu(used_l1_inst_sram_head, cpu).next = NULL;
+
+ printk(KERN_INFO "Blackfin L1 Instruction SRAM: %d KB (%d KB free)\n",
+ L1_CODE_LENGTH >> 10,
+ per_cpu(free_l1_inst_sram_head, cpu).next->size >> 10);
+
+ /* mutex initialize */
+ spin_lock_init(&per_cpu(l1_inst_sram_lock, cpu));
}
-
- free_l1_inst_sram_head.next->paddr =
- (void *)L1_CODE_START + (_etext_l1 - _stext_l1);
- free_l1_inst_sram_head.next->size =
- L1_CODE_LENGTH - (_etext_l1 - _stext_l1);
- free_l1_inst_sram_head.next->pid = 0;
- free_l1_inst_sram_head.next->next = NULL;
-
- used_l1_inst_sram_head.next = NULL;
-
- printk(KERN_INFO "Blackfin L1 Instruction SRAM: %d KB (%d KB free)\n",
- L1_CODE_LENGTH >> 10,
- free_l1_inst_sram_head.next->size >> 10);
#endif
-
- /* mutex initialize */
- spin_lock_init(&l1_inst_sram_lock);
}

static void __init l2_sram_init(void)
@@ -179,7 +198,7 @@ static void __init l2_sram_init(void)
free_l2_sram_head.next =
kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
if (!free_l2_sram_head.next) {
- printk(KERN_INFO "Failed to initialize L2 SRAM\n");
+ printk(KERN_INFO "Failed to initialize L2 SRAM.\n");
return;
}

@@ -200,6 +219,7 @@ static void __init l2_sram_init(void)
/* mutex initialize */
spin_lock_init(&l2_sram_lock);
}
+
void __init bfin_sram_init(void)
{
sram_piece_cache = kmem_cache_create("sram_piece_cache",
@@ -353,20 +373,20 @@ int sram_free(const void *addr)
{

#if L1_CODE_LENGTH != 0
- if (addr >= (void *)L1_CODE_START
- && addr < (void *)(L1_CODE_START + L1_CODE_LENGTH))
+ if (addr >= (void *)get_l1_code_start()
+ && addr < (void *)(get_l1_code_start() + L1_CODE_LENGTH))
return l1_inst_sram_free(addr);
else
#endif
#if L1_DATA_A_LENGTH != 0
- if (addr >= (void *)L1_DATA_A_START
- && addr < (void *)(L1_DATA_A_START + L1_DATA_A_LENGTH))
+ if (addr >= (void *)get_l1_data_a_start()
+ && addr < (void *)(get_l1_data_a_start() + L1_DATA_A_LENGTH))
return l1_data_A_sram_free(addr);
else
#endif
#if L1_DATA_B_LENGTH != 0
- if (addr >= (void *)L1_DATA_B_START
- && addr < (void *)(L1_DATA_B_START + L1_DATA_B_LENGTH))
+ if (addr >= (void *)get_l1_data_b_start()
+ && addr < (void *)(get_l1_data_b_start() + L1_DATA_B_LENGTH))
return l1_data_B_sram_free(addr);
else
#endif
@@ -384,17 +404,20 @@ void *l1_data_A_sram_alloc(size_t size)
{
unsigned long flags;
void *addr = NULL;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1_data_sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);

#if L1_DATA_A_LENGTH != 0
- addr = _sram_alloc(size, &free_l1_data_A_sram_head,
- &used_l1_data_A_sram_head);
+ addr = _sram_alloc(size, &per_cpu(free_l1_data_A_sram_head, cpu),
+ &per_cpu(used_l1_data_A_sram_head, cpu));
#endif

/* add mutex operation */
- spin_unlock_irqrestore(&l1_data_sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
+ put_cpu();

pr_debug("Allocated address in l1_data_A_sram_alloc is 0x%lx+0x%lx\n",
(long unsigned int)addr, size);
@@ -407,19 +430,22 @@ int l1_data_A_sram_free(const void *addr)
{
unsigned long flags;
int ret;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1_data_sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);

#if L1_DATA_A_LENGTH != 0
- ret = _sram_free(addr, &free_l1_data_A_sram_head,
- &used_l1_data_A_sram_head);
+ ret = _sram_free(addr, &per_cpu(free_l1_data_A_sram_head, cpu),
+ &per_cpu(used_l1_data_A_sram_head, cpu));
#else
ret = -1;
#endif

/* add mutex operation */
- spin_unlock_irqrestore(&l1_data_sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
+ put_cpu();

return ret;
}
@@ -430,15 +456,18 @@ void *l1_data_B_sram_alloc(size_t size)
#if L1_DATA_B_LENGTH != 0
unsigned long flags;
void *addr;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1_data_sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);

- addr = _sram_alloc(size, &free_l1_data_B_sram_head,
- &used_l1_data_B_sram_head);
+ addr = _sram_alloc(size, &per_cpu(free_l1_data_B_sram_head, cpu),
+ &per_cpu(used_l1_data_B_sram_head, cpu));

/* add mutex operation */
- spin_unlock_irqrestore(&l1_data_sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
+ put_cpu();

pr_debug("Allocated address in l1_data_B_sram_alloc is 0x%lx+0x%lx\n",
(long unsigned int)addr, size);
@@ -455,15 +484,18 @@ int l1_data_B_sram_free(const void *addr)
#if L1_DATA_B_LENGTH != 0
unsigned long flags;
int ret;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1_data_sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);

- ret = _sram_free(addr, &free_l1_data_B_sram_head,
- &used_l1_data_B_sram_head);
+ ret = _sram_free(addr, &per_cpu(free_l1_data_B_sram_head, cpu),
+ &per_cpu(used_l1_data_B_sram_head, cpu));

/* add mutex operation */
- spin_unlock_irqrestore(&l1_data_sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
+ put_cpu();

return ret;
#else
@@ -509,15 +541,18 @@ void *l1_inst_sram_alloc(size_t size)
#if L1_CODE_LENGTH != 0
unsigned long flags;
void *addr;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1_inst_sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1_inst_sram_lock, cpu), flags);

- addr = _sram_alloc(size, &free_l1_inst_sram_head,
- &used_l1_inst_sram_head);
+ addr = _sram_alloc(size, &per_cpu(free_l1_inst_sram_head, cpu),
+ &per_cpu(used_l1_inst_sram_head, cpu));

/* add mutex operation */
- spin_unlock_irqrestore(&l1_inst_sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1_inst_sram_lock, cpu), flags);
+ put_cpu();

pr_debug("Allocated address in l1_inst_sram_alloc is 0x%lx+0x%lx\n",
(long unsigned int)addr, size);
@@ -534,15 +569,18 @@ int l1_inst_sram_free(const void *addr)
#if L1_CODE_LENGTH != 0
unsigned long flags;
int ret;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1_inst_sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1_inst_sram_lock, cpu), flags);

- ret = _sram_free(addr, &free_l1_inst_sram_head,
- &used_l1_inst_sram_head);
+ ret = _sram_free(addr, &per_cpu(free_l1_inst_sram_head, cpu),
+ &per_cpu(used_l1_inst_sram_head, cpu));

/* add mutex operation */
- spin_unlock_irqrestore(&l1_inst_sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1_inst_sram_lock, cpu), flags);
+ put_cpu();

return ret;
#else
@@ -556,15 +594,18 @@ void *l1sram_alloc(size_t size)
{
unsigned long flags;
void *addr;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1sram_lock, cpu), flags);

- addr = _sram_alloc(size, &free_l1_ssram_head,
- &used_l1_ssram_head);
+ addr = _sram_alloc(size, &per_cpu(free_l1_ssram_head, cpu),
+ &per_cpu(used_l1_ssram_head, cpu));

/* add mutex operation */
- spin_unlock_irqrestore(&l1sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1sram_lock, cpu), flags);
+ put_cpu();

return addr;
}
@@ -574,15 +615,18 @@ void *l1sram_alloc_max(size_t *psize)
{
unsigned long flags;
void *addr;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1sram_lock, cpu), flags);

- addr = _sram_alloc_max(&free_l1_ssram_head,
- &used_l1_ssram_head, psize);
+ addr = _sram_alloc_max(&per_cpu(free_l1_ssram_head, cpu),
+ &per_cpu(used_l1_ssram_head, cpu), psize);

/* add mutex operation */
- spin_unlock_irqrestore(&l1sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1sram_lock, cpu), flags);
+ put_cpu();

return addr;
}
@@ -592,15 +636,18 @@ int l1sram_free(const void *addr)
{
unsigned long flags;
int ret;
+ unsigned int cpu;

+ cpu = get_cpu();
/* add mutex operation */
- spin_lock_irqsave(&l1sram_lock, flags);
+ spin_lock_irqsave(&per_cpu(l1sram_lock, cpu), flags);

- ret = _sram_free(addr, &free_l1_ssram_head,
- &used_l1_ssram_head);
+ ret = _sram_free(addr, &per_cpu(free_l1_ssram_head, cpu),
+ &per_cpu(used_l1_ssram_head, cpu));

/* add mutex operation */
- spin_unlock_irqrestore(&l1sram_lock, flags);
+ spin_unlock_irqrestore(&per_cpu(l1sram_lock, cpu), flags);
+ put_cpu();

return ret;
}
@@ -761,33 +808,36 @@ static int sram_proc_read(char *buf, char **start, off_t offset, int count,
int *eof, void *data)
{
int len = 0;
+ unsigned int cpu;

- if (_sram_proc_read(buf, &len, count, "Scratchpad",
- &free_l1_ssram_head, &used_l1_ssram_head))
- goto not_done;
+ for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
+ if (_sram_proc_read(buf, &len, count, "Scratchpad",
+ &per_cpu(free_l1_ssram_head, cpu), &per_cpu(used_l1_ssram_head, cpu)))
+ goto not_done;
#if L1_DATA_A_LENGTH != 0
- if (_sram_proc_read(buf, &len, count, "L1 Data A",
- &free_l1_data_A_sram_head,
- &used_l1_data_A_sram_head))
- goto not_done;
+ if (_sram_proc_read(buf, &len, count, "L1 Data A",
+ &per_cpu(free_l1_data_A_sram_head, cpu),
+ &per_cpu(used_l1_data_A_sram_head, cpu)))
+ goto not_done;
#endif
#if L1_DATA_B_LENGTH != 0
- if (_sram_proc_read(buf, &len, count, "L1 Data B",
- &free_l1_data_B_sram_head,
- &used_l1_data_B_sram_head))
- goto not_done;
+ if (_sram_proc_read(buf, &len, count, "L1 Data B",
+ &per_cpu(free_l1_data_B_sram_head, cpu),
+ &per_cpu(used_l1_data_B_sram_head, cpu)))
+ goto not_done;
#endif
#if L1_CODE_LENGTH != 0
- if (_sram_proc_read(buf, &len, count, "L1 Instruction",
- &free_l1_inst_sram_head, &used_l1_inst_sram_head))
- goto not_done;
+ if (_sram_proc_read(buf, &len, count, "L1 Instruction",
+ &per_cpu(free_l1_inst_sram_head, cpu),
+ &per_cpu(used_l1_inst_sram_head, cpu)))
+ goto not_done;
#endif
+ }
#if L2_LENGTH != 0
- if (_sram_proc_read(buf, &len, count, "L2",
- &free_l2_sram_head, &used_l2_sram_head))
+ if (_sram_proc_read(buf, &len, count, "L2", &free_l2_sram_head,
+ &used_l2_sram_head))
goto not_done;
#endif
-
*eof = 1;
not_done:
return len;
--
1.5.6.3

2008-11-19 06:57:19

by Andrew Morton

Subject: Re: [PATCH 0/5] Blackfin SMP like patchset

On Tue, 18 Nov 2008 17:05:03 +0800 Bryan Wu <[email protected]> wrote:

>
> Hi folks,
>
> We provide the SMP like functions for our Blackfin dual core processor
> BF561 for almost 1 year. And after a long time developing, debugging and
> internal review, we'd like to post them to LKML for other maintainer
> review.
>
> Please find our wiki page about this SMP like patches:
> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like

Would prefer that changelogs be self-contained, please. Kernel
changelogs are for ever, and I doubt if that page will be there in 20
years time.

Particularly when that page must be read to learn fundamental things such as

The SMP support in certain Blackfin processors is describe as `SMP
Like' rather than just `SMP' due to the lack of hardware cache
coherency. A true SMP system would have support for cache coherency
in hardware.

On all `SMP Like' setups, cache coherency is maintained via
software mechanisms

Interesting!

2008-11-19 06:57:34

by Andrew Morton

Subject: Re: [PATCH 1/5] Blackfin arch: SMP supporting patchset: BF561 related code

On Tue, 18 Nov 2008 17:05:04 +0800 Bryan Wu <[email protected]> wrote:

> From: Graf Yang <[email protected]>
>
> Blackfin dual core BF561 processor can support SMP like features.
> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> In this patch, we provide SMP extend to BF561 kernel code
>
>
> ...
>
> --- a/arch/blackfin/mach-bf561/include/mach/mem_map.h
> +++ b/arch/blackfin/mach-bf561/include/mach/mem_map.h
> @@ -85,4 +85,124 @@
> #define L1_SCRATCH_START COREA_L1_SCRATCH_START
> #define L1_SCRATCH_LENGTH 0x1000
>
> +#ifndef __ASSEMBLY__
> +
> +#ifdef CONFIG_SMP
> +
> +#define get_l1_scratch_start_cpu(cpu) \
> + ({ unsigned long __addr; \
> + __addr = (cpu) ? COREB_L1_SCRATCH_START : COREA_L1_SCRATCH_START;\
> + __addr; })
> +
> +#define get_l1_code_start_cpu(cpu) \
> + ({ unsigned long __addr; \
> + __addr = (cpu) ? COREB_L1_CODE_START : COREA_L1_CODE_START; \
> + __addr; })
> +
> +#define get_l1_data_a_start_cpu(cpu) \
> + ({ unsigned long __addr; \
> + __addr = (cpu) ? COREB_L1_DATA_A_START : COREA_L1_DATA_A_START;\
> + __addr; })
> +
> +#define get_l1_data_b_start_cpu(cpu) \
> + ({ unsigned long __addr; \
> + __addr = (cpu) ? COREB_L1_DATA_B_START : COREA_L1_DATA_B_START;\
> + __addr; })
> +
> +#define get_l1_scratch_start() get_l1_scratch_start_cpu(blackfin_core_id())
> +#define get_l1_code_start() get_l1_code_start_cpu(blackfin_core_id())
> +#define get_l1_data_a_start() get_l1_data_a_start_cpu(blackfin_core_id())
> +#define get_l1_data_b_start() get_l1_data_b_start_cpu(blackfin_core_id())
> +
> +#else /* !CONFIG_SMP */
> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> +#define get_l1_scratch_start() L1_SCRATCH_START
> +#define get_l1_code_start() L1_CODE_START
> +#define get_l1_data_a_start() L1_DATA_A_START
> +#define get_l1_data_b_start() L1_DATA_B_START
> +#endif /* !CONFIG_SMP */

grumble. These didn't need to be implemented as macros and hence
shouldn't have been.

Example:

int cpu = smp_processor_id();
get_l1_scratch_start_cpu(cpu);

that code should generate unused variable warnings on CONFIG_SMP=n. If
it doesn't, you got lucky, because it _should_.

Also

int cpu = smp_processor_id();
get_l1_scratch_start_cpu(pcu);

will happily compile and run with CONFIG_SMP=n.


macros=bad,bad,bad.
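
To illustrate the point above, here is a minimal sketch of one of these helpers written as a static inline function instead of a macro. It is not taken from the patch: the two base addresses are illustrative placeholders, and CONFIG_SMP is simply left undefined here to mimic a CONFIG_SMP=n build. Because the argument is a real function parameter in both configurations, `cpu` counts as used, and a typo such as `pcu` would fail to compile either way.

```c
/*
 * Illustrative base addresses only (assumption); the real values live in
 * the BF561 mem_map.h and are not reproduced here.
 */
#define COREA_L1_SCRATCH_START	0xffb00000UL
#define COREB_L1_SCRATCH_START	0xff700000UL
#define L1_SCRATCH_START	COREA_L1_SCRATCH_START

#ifdef CONFIG_SMP
/* SMP build: pick the scratchpad base of the requested core. */
static inline unsigned long get_l1_scratch_start_cpu(unsigned int cpu)
{
	return cpu ? COREB_L1_SCRATCH_START : COREA_L1_SCRATCH_START;
}
#else
/* UP build: the argument is still parsed and type-checked, then ignored. */
static inline unsigned long get_l1_scratch_start_cpu(unsigned int cpu)
{
	(void)cpu;
	return L1_SCRATCH_START;
}
#endif
```

The other three helpers (code, data A, data B) would presumably follow the same shape.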

>
> ...
>
> --- /dev/null
> +++ b/arch/blackfin/mach-bf561/smp.c
> @@ -0,0 +1,182 @@
> +/*
> + * File: arch/blackfin/mach-bf561/smp.c
> + * Author: Philippe Gerum <[email protected]>
> + *
> + * Copyright 2007 Analog Devices Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see the file COPYING, or write
> + * to the Free Software Foundation, Inc.,
> + * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/delay.h>
> +#include <asm/smp.h>
> +#include <asm/dma.h>
> +
> +#define COREB_SRAM_BASE 0xff600000
> +#define COREB_SRAM_SIZE 0x4000
> +
> +extern char coreb_trampoline_start, coreb_trampoline_end;

OK, these are defined in .S and we do often put declarations for such
things in .c rather than in .h. But I think it's better to put them in
.h anyway, to avoid possibly duplicated declarations in the future.

> +static DEFINE_SPINLOCK(boot_lock);
> +
> +static cpumask_t cpu_callin_map;
> +
>
> ...
>
> +void __cpuinit platform_secondary_init(unsigned int cpu)
> +{
> + local_irq_disable();
> +
> + /* Clone setup for peripheral interrupt sources from CoreA. */
> + bfin_write_SICB_IMASK0(bfin_read_SICA_IMASK0());
> + bfin_write_SICB_IMASK1(bfin_read_SICA_IMASK1());
> + SSYNC();
> +
> + /* Clone setup for IARs from CoreA. */
> + bfin_write_SICB_IAR0(bfin_read_SICA_IAR0());
> + bfin_write_SICB_IAR1(bfin_read_SICA_IAR1());
> + bfin_write_SICB_IAR2(bfin_read_SICA_IAR2());
> + bfin_write_SICB_IAR3(bfin_read_SICA_IAR3());
> + bfin_write_SICB_IAR4(bfin_read_SICA_IAR4());
> + bfin_write_SICB_IAR5(bfin_read_SICA_IAR5());
> + bfin_write_SICB_IAR6(bfin_read_SICA_IAR6());
> + bfin_write_SICB_IAR7(bfin_read_SICA_IAR7());
> + SSYNC();
> +
> + local_irq_enable();
> +
> + /* Calibrate loops per jiffy value. */
> + calibrate_delay();
> +
> + /* Store CPU-private information to the cpu_data array. */
> + bfin_setup_cpudata(cpu);
> +
> + /* We are done with local CPU inits, unblock the boot CPU. */
> + cpu_set(cpu, cpu_callin_map);
> + spin_lock(&boot_lock);
> + spin_unlock(&boot_lock);

Is this spin_lock()+spin_unlock() supposed to block until the secondary
CPU is running? If so, I don't think it works.

> +}
> +
>
> ...
>

2008-11-19 06:57:50

by Andrew Morton

Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

On Tue, 18 Nov 2008 17:05:05 +0800 Bryan Wu <[email protected]> wrote:

> From: Graf Yang <[email protected]>
>
> Blackfin dual core BF561 processor can support SMP like features.
> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> In this patch, we provide SMP extend to Blackfin header files
> and machine common code
>
>
> ...
>
> +#define atomic_add_unless(v, a, u) \
> +({ \
> + int c, old; \
> + c = atomic_read(v); \
> + while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
> + c = old; \
> + c != (u); \
> +})

The macro references its args multiple times and will do weird or
inefficient things when called with expressions which have
side-effects, or which do slow things.
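
As a sketch of the alternative, the same loop can be written as a static inline function so that each argument is evaluated exactly once. The `atomic_t`, `atomic_read()` and `atomic_cmpxchg()` definitions below are userspace stand-ins (assumptions, built on the GCC/Clang `__sync_val_compare_and_swap` builtin) so that the example compiles on its own; they are not the Blackfin implementations from the patch.

```c
/* Userspace stand-ins for the kernel primitives (assumption, not patch code). */
typedef struct { volatile int counter; } atomic_t;

static inline int atomic_read(const atomic_t *v)
{
	return v->counter;
}

/* Returns the value that was stored in *v before the attempt. */
static inline int atomic_cmpxchg(atomic_t *v, int old_val, int new_val)
{
	return __sync_val_compare_and_swap(&v->counter, old_val, new_val);
}

/*
 * Add 'a' to *v unless *v equals 'u'.
 * Returns non-zero if the addition was performed.
 */
static inline int atomic_add_unless(atomic_t *v, int a, int u)
{
	int c, old;

	c = atomic_read(v);
	while (c != u && (old = atomic_cmpxchg(v, c, c + a)) != c)
		c = old;	/* lost a race: retry with the value we saw */
	return c != u;
}
```

With this form, a call such as atomic_add_unless(p, next_delta(), 0), where next_delta() is a hypothetical function with side effects, calls that function once rather than on every loop iteration.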

>
> ...
>
> +#include <asm/system.h> /* save_flags */
> +
> +static inline void set_bit(int nr, volatile unsigned long *addr)
> {
> int *a = (int *)addr;
> int mask;
> @@ -57,21 +91,23 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
> a += nr >> 5;
> mask = 1 << (nr & 0x1f);
> local_irq_save(flags);
> - *a &= ~mask;
> + *a |= mask;

I think you just broke clear_bit(). Maybe I'm misreading the diff.

> local_irq_restore(flags);
> }
>
>
> ...
>
> +#define smp_mb__before_clear_bit() barrier()
> +#define smp_mb__after_clear_bit() barrier()
> +
> +static inline void __set_bit(int nr, volatile unsigned long *addr)
> +{
> + int *a = (int *)addr;
> + int mask;
> +
> + a += nr >> 5;
> + mask = 1 << (nr & 0x1f);
> + *a |= mask;
> +}
> +
> +static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +{
> + int *a = (int *)addr;
> + int mask;
> +
> + a += nr >> 5;
> + mask = 1 << (nr & 0x1f);
> + *a &= ~mask;
> +}
> +
> +static inline void __change_bit(int nr, volatile unsigned long *addr)
> +{
> + int mask;
> + unsigned long *ADDR = (unsigned long *)addr;
> +
> + ADDR += nr >> 5;
> + mask = 1 << (nr & 31);
> + *ADDR ^= mask;
> +}

I'm surprised there isn't any generic code which can be used for the above.

>
> ...
>

Gad what a lot of code. I don't think I have time to read it all, sorry.

2008-11-19 07:05:34

by Nick Piggin

Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

On Wednesday 19 November 2008 17:56, Andrew Morton wrote:
> On Tue, 18 Nov 2008 17:05:05 +0800 Bryan Wu <[email protected]> wrote:

> > +#define smp_mb__before_clear_bit() barrier()
> > +#define smp_mb__after_clear_bit() barrier()
> > +
> > +static inline void __set_bit(int nr, volatile unsigned long *addr)
> > +{
> > + int *a = (int *)addr;
> > + int mask;
> > +
> > + a += nr >> 5;
> > + mask = 1 << (nr & 0x1f);
> > + *a |= mask;
> > +}
> > +
> > +static inline void __clear_bit(int nr, volatile unsigned long *addr)
> > +{
> > + int *a = (int *)addr;
> > + int mask;
> > +
> > + a += nr >> 5;
> > + mask = 1 << (nr & 0x1f);
> > + *a &= ~mask;
> > +}
> > +
> > +static inline void __change_bit(int nr, volatile unsigned long *addr)
> > +{
> > + int mask;
> > + unsigned long *ADDR = (unsigned long *)addr;
> > +
> > + ADDR += nr >> 5;
> > + mask = 1 << (nr & 31);
> > + *ADDR ^= mask;
> > +}
>
> I'm surprised there isn't any generic code which can be used for the above.

include/asm-generic/bitops/non-atomic.h
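
For readers without a kernel tree at hand, the helpers in that header look roughly like the sketch below. This is a userspace approximation, not a copy of the header: the function names are changed to avoid reserved identifiers, and BITS_PER_LONG is derived from sizeof rather than taken from the kernel configuration. The patch's versions index 32-bit words with `nr >> 5`, which should be equivalent wherever long is 32 bits wide.

```c
#include <limits.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BIT_MASK(nr)	(1UL << ((nr) % BITS_PER_LONG))
#define BIT_WORD(nr)	((nr) / BITS_PER_LONG)

/* Non-atomic: the caller must prevent concurrent updates to the same word. */
static inline void nonatomic_set_bit(unsigned int nr, volatile unsigned long *addr)
{
	addr[BIT_WORD(nr)] |= BIT_MASK(nr);
}

static inline void nonatomic_clear_bit(unsigned int nr, volatile unsigned long *addr)
{
	addr[BIT_WORD(nr)] &= ~BIT_MASK(nr);
}

static inline void nonatomic_change_bit(unsigned int nr, volatile unsigned long *addr)
{
	addr[BIT_WORD(nr)] ^= BIT_MASK(nr);
}
```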


> > ...
>
> Gad what a lot of code. I don't think I have time to read it all, sorry.

:) I don't know who is expected to. Cc'ing linux-arch for something
like this might attract some helpful comments.

2008-11-19 07:27:30

by Bryan Wu

Subject: Re: [PATCH 0/5] Blackfin SMP like patchset

On Wed, Nov 19, 2008 at 2:56 PM, Andrew Morton
<[email protected]> wrote:
> On Tue, 18 Nov 2008 17:05:03 +0800 Bryan Wu <[email protected]> wrote:
>
>>
>> Hi folks,
>>
>> We provide the SMP like functions for our Blackfin dual core processor
>> BF561 for almost 1 year. And after a long time developing, debugging and
>> internal review, we'd like to post them to LKML for other maintainer
>> review.
>>
>> Please find our wiki page about this SMP like patches:
>> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> Would prefer that changelogs be self-contained, please. Kernel
> changelogs are for ever, and I doubt if that page will be there in 20
> years time.
>

I guess Graf started this wiki only recently, although the patch has
existed for a long time.
Graf also gave a presentation about this SMP work on the BF561 at the AKA 2008
Linux kernel developer conference. If I find the link to that presentation,
I will post it.

> Particularly when that page must be read to learn fundamental things such as
>
> The SMP support in certain Blackfin processors is describe as `SMP
> Like' rather than just `SMP' due to the lack of hardware cache
> coherency. A true SMP system would have support for cache coherency
> in hardware.
>
> On all `SMP Like' setups, cache coherency is maintained via
> software mechanisms
>
> Interesting!
>

Exactly, true SMP means hardware cache coherency. But the BF561 dual-core
processor was designed almost 8 years ago, so we have to work around
this in software. Fortunately, the BF561 provides an L2 memory shared by
both CoreA and CoreB, and we use some tricks in this L2 memory and in
our Scratchpad memory.

'SMP Like' is a software-aided SMP solution for the Blackfin dual-core BF561 processor.
Please enjoy -:)

-Bryan

2008-11-19 07:28:30

by Bryan Wu

Subject: Re: [PATCH 0/5] Blackfin SMP like patchset

Sorry for forgetting linux-arch. Posting again.

-Bryan

On Wed, Nov 19, 2008 at 3:27 PM, Bryan Wu <[email protected]> wrote:
> On Wed, Nov 19, 2008 at 2:56 PM, Andrew Morton
> <[email protected]> wrote:
>> On Tue, 18 Nov 2008 17:05:03 +0800 Bryan Wu <[email protected]> wrote:
>>
>>>
>>> Hi folks,
>>>
>>> We provide the SMP like functions for our Blackfin dual core processor
>>> BF561 for almost 1 year. And after a long time developing, debugging and
>>> internal review, we'd like to post them to LKML for other maintainer
>>> review.
>>>
>>> Please find our wiki page about this SMP like patches:
>>> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>>
>> Would prefer that changelogs be self-contained, please. Kernel
>> changelogs are for ever, and I doubt if that page will be there in 20
>> years time.
>>
>
> I guess Graf started this wiki recently although the patch exists for
> a long time.
> And Graf gave a presentation about this SMP on BF561 in AKA 2008 Linux kernel
> developer conference. If I found the link of this presentation, I will
> post it again.
>
>> Particularly when that page must be read to learn fundamental things such as
>>
>> The SMP support in certain Blackfin processors is describe as `SMP
>> Like' rather than just `SMP' due to the lack of hardware cache
>> coherency. A true SMP system would have support for cache coherency
>> in hardware.
>>
>> On all `SMP Like' setups, cache coherency is maintained via
>> software mechanisms
>>
>> Interesting!
>>
>
> Exactly, SMP means hardware cache coherency. But BF561 dual core
> processor was designed almost 8 years ago.
> we have to do some workaround in software side. Fortunately, BF561
> provides a L2 memory shared by both CoreA and CoreB.
> We did some trick in this L2 memory and our Scratchpad memory.
>
> 'SMP Like' is software aided SMP solution on Blackfin dual core BF561 processor.
> Please enjoy -:)
>
> -Bryan
>

2008-11-19 07:39:33

by Bryan Wu

Subject: Re: [PATCH 1/5] Blackfin arch: SMP supporting patchset: BF561 related code

On Wed, Nov 19, 2008 at 2:56 PM, Andrew Morton
<[email protected]> wrote:
> On Tue, 18 Nov 2008 17:05:04 +0800 Bryan Wu <[email protected]> wrote:
>
>> From: Graf Yang <[email protected]>
>>
>> Blackfin dual core BF561 processor can support SMP like features.
>> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>>
>> In this patch, we provide SMP extend to BF561 kernel code
>>
>>
>> ...
>>
>> --- a/arch/blackfin/mach-bf561/include/mach/mem_map.h
>> +++ b/arch/blackfin/mach-bf561/include/mach/mem_map.h
>> @@ -85,4 +85,124 @@
>> #define L1_SCRATCH_START COREA_L1_SCRATCH_START
>> #define L1_SCRATCH_LENGTH 0x1000
>>
>> +#ifndef __ASSEMBLY__
>> +
>> +#ifdef CONFIG_SMP
>> +
>> +#define get_l1_scratch_start_cpu(cpu) \
>> + ({ unsigned long __addr; \
>> + __addr = (cpu) ? COREB_L1_SCRATCH_START : COREA_L1_SCRATCH_START;\
>> + __addr; })
>> +
>> +#define get_l1_code_start_cpu(cpu) \
>> + ({ unsigned long __addr; \
>> + __addr = (cpu) ? COREB_L1_CODE_START : COREA_L1_CODE_START; \
>> + __addr; })
>> +
>> +#define get_l1_data_a_start_cpu(cpu) \
>> + ({ unsigned long __addr; \
>> + __addr = (cpu) ? COREB_L1_DATA_A_START : COREA_L1_DATA_A_START;\
>> + __addr; })
>> +
>> +#define get_l1_data_b_start_cpu(cpu) \
>> + ({ unsigned long __addr; \
>> + __addr = (cpu) ? COREB_L1_DATA_B_START : COREA_L1_DATA_B_START;\
>> + __addr; })
>> +
>> +#define get_l1_scratch_start() get_l1_scratch_start_cpu(blackfin_core_id())
>> +#define get_l1_code_start() get_l1_code_start_cpu(blackfin_core_id())
>> +#define get_l1_data_a_start() get_l1_data_a_start_cpu(blackfin_core_id())
>> +#define get_l1_data_b_start() get_l1_data_b_start_cpu(blackfin_core_id())
>> +
>> +#else /* !CONFIG_SMP */
>> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
>> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
>> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
>> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
>> +#define get_l1_scratch_start() L1_SCRATCH_START
>> +#define get_l1_code_start() L1_CODE_START
>> +#define get_l1_data_a_start() L1_DATA_A_START
>> +#define get_l1_data_b_start() L1_DATA_B_START
>> +#endif /* !CONFIG_SMP */
>
> grumble. These didn't need to be implemented as macros and hence
> shouldn't have been.
>
> Example:
>
> int cpu = smp_processor_id();
> get_l1_scratch_start_cpu(cpu);
>
> that code should generate unused variable warnings on CONFIG_SMP=n. If
> it doesn't, you got lucky, because it _should_.
>
> Also
>
> int cpu = smp_processor_id();
> get_l1_scratch_start_cpu(pcu);
>
> will happily compile and run with CONFIG_SMP=n.
>
>
> macros=bad,bad,bad.
>

Yes, I also prefer inline functions rather than macros here.
Right, Graf?

>>
>> ...
>>
>> --- /dev/null
>> +++ b/arch/blackfin/mach-bf561/smp.c
>> @@ -0,0 +1,182 @@
>> +/*
>> + * File: arch/blackfin/mach-bf561/smp.c
>> + * Author: Philippe Gerum <[email protected]>
>> + *
>> + * Copyright 2007 Analog Devices Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see the file COPYING, or write
>> + * to the Free Software Foundation, Inc.,
>> + * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/sched.h>
>> +#include <linux/delay.h>
>> +#include <asm/smp.h>
>> +#include <asm/dma.h>
>> +
>> +#define COREB_SRAM_BASE 0xff600000
>> +#define COREB_SRAM_SIZE 0x4000
>> +
>> +extern char coreb_trampoline_start, coreb_trampoline_end;
>
> OK, these are defined in .S and we do often put declarations for such
> things in .c rather than in .h. But I think it's better to put them in
> .h anyway, to avoid possibly duplicated declarations in the future.
>

Oh, I asked Graf to run checkpatch.pl to find such issues before I
sent out this patch.
Should this issue be caught by checkpatch.pl?


>> +static DEFINE_SPINLOCK(boot_lock);
>> +
>> +static cpumask_t cpu_callin_map;
>> +
>>
>> ...
>>
>> +void __cpuinit platform_secondary_init(unsigned int cpu)
>> +{
>> + local_irq_disable();
>> +
>> + /* Clone setup for peripheral interrupt sources from CoreA. */
>> + bfin_write_SICB_IMASK0(bfin_read_SICA_IMASK0());
>> + bfin_write_SICB_IMASK1(bfin_read_SICA_IMASK1());
>> + SSYNC();
>> +
>> + /* Clone setup for IARs from CoreA. */
>> + bfin_write_SICB_IAR0(bfin_read_SICA_IAR0());
>> + bfin_write_SICB_IAR1(bfin_read_SICA_IAR1());
>> + bfin_write_SICB_IAR2(bfin_read_SICA_IAR2());
>> + bfin_write_SICB_IAR3(bfin_read_SICA_IAR3());
>> + bfin_write_SICB_IAR4(bfin_read_SICA_IAR4());
>> + bfin_write_SICB_IAR5(bfin_read_SICA_IAR5());
>> + bfin_write_SICB_IAR6(bfin_read_SICA_IAR6());
>> + bfin_write_SICB_IAR7(bfin_read_SICA_IAR7());
>> + SSYNC();
>> +
>> + local_irq_enable();
>> +
>> + /* Calibrate loops per jiffy value. */
>> + calibrate_delay();
>> +
>> + /* Store CPU-private information to the cpu_data array. */
>> + bfin_setup_cpudata(cpu);
>> +
>> + /* We are done with local CPU inits, unblock the boot CPU. */
>> + cpu_set(cpu, cpu_callin_map);
>> + spin_lock(&boot_lock);
>> + spin_unlock(&boot_lock);
>
> Is this spin_lock()+spin_unlock() supposed to block until the secondary
> CPU is running? If so, I don't think it works.
>

We can remove these two spin_lock()+spin_unlock() lines and it still works.
But we may add some operations between spin_lock() and spin_unlock()
here in the future, so we'd like to keep them.

P.S. Also forwarding this patch to linux-arch.

Thanks
-Bryan

>> +}
>> +
>>
>> ...
>>
>
>

2008-11-19 07:42:25

by Bryan Wu

Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

On Wed, Nov 19, 2008 at 2:56 PM, Andrew Morton
<[email protected]> wrote:
> On Tue, 18 Nov 2008 17:05:05 +0800 Bryan Wu <[email protected]> wrote:
>
>> From: Graf Yang <[email protected]>
>>
>> Blackfin dual core BF561 processor can support SMP like features.
>> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>>
>> In this patch, we provide SMP extend to Blackfin header files
>> and machine common code
>>
>>
>> ...
>>
>> +#define atomic_add_unless(v, a, u) \
>> +({ \
>> + int c, old; \
>> + c = atomic_read(v); \
>> + while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
>> + c = old; \
>> + c != (u); \
>> +})
>
> The macro references its args multiple times and will do weird or
> inefficient things when called with expressions which have
> side-effects, or which do slow things.
>

Right, I think we can replace them with inline functions.


>>
>> ...
>>
>> +#include <asm/system.h> /* save_flags */
>> +
>> +static inline void set_bit(int nr, volatile unsigned long *addr)
>> {
>> int *a = (int *)addr;
>> int mask;
>> @@ -57,21 +91,23 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
>> a += nr >> 5;
>> mask = 1 << (nr & 0x1f);
>> local_irq_save(flags);
>> - *a &= ~mask;
>> + *a |= mask;
>
> I think you just broke clear_bit(). Maybe I'm misreading the diff.
>
>> local_irq_restore(flags);
>> }
>>
>>
>> ...
>>
>> +#define smp_mb__before_clear_bit() barrier()
>> +#define smp_mb__after_clear_bit() barrier()
>> +
>> +static inline void __set_bit(int nr, volatile unsigned long *addr)
>> +{
>> + int *a = (int *)addr;
>> + int mask;
>> +
>> + a += nr >> 5;
>> + mask = 1 << (nr & 0x1f);
>> + *a |= mask;
>> +}
>> +
>> +static inline void __clear_bit(int nr, volatile unsigned long *addr)
>> +{
>> + int *a = (int *)addr;
>> + int mask;
>> +
>> + a += nr >> 5;
>> + mask = 1 << (nr & 0x1f);
>> + *a &= ~mask;
>> +}
>> +
>> +static inline void __change_bit(int nr, volatile unsigned long *addr)
>> +{
>> + int mask;
>> + unsigned long *ADDR = (unsigned long *)addr;
>> +
>> + ADDR += nr >> 5;
>> + mask = 1 << (nr & 31);
>> + *ADDR ^= mask;
>> +}
>
> I'm surprised there isn't any generic code which can be used for the above.
>

As Nick said, include/asm-generic/bitops/non-atomic.h is the generic code.
We will try it.

>>
>> ...
>>
>
> Gad what a lot of code. I don't think I have time to read it all, sorry.
>

Thanks a lot for the review, I will forward this patchset to linux-arch.
-Bryan

2008-11-19 07:44:24

by Bryan Wu

Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

Posting this patch to linux-arch; maybe more people will be interested in it.

-Bryan

On Tue, Nov 18, 2008 at 5:05 PM, Bryan Wu <[email protected]> wrote:
> From: Graf Yang <[email protected]>
>
> Blackfin dual core BF561 processor can support SMP like features.
> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> In this patch, we provide SMP extend to Blackfin header files
> and machine common code
>
> Signed-off-by: Graf Yang <[email protected]>
> Signed-off-by: Bryan Wu <[email protected]>
> ---
> arch/blackfin/include/asm/atomic.h | 124 ++++++--
> arch/blackfin/include/asm/bfin-global.h | 5 +-
> arch/blackfin/include/asm/bitops.h | 185 ++++++++----
> arch/blackfin/include/asm/cache.h | 29 ++
> arch/blackfin/include/asm/cacheflush.h | 20 +-
> arch/blackfin/include/asm/context.S | 6 +-
> arch/blackfin/include/asm/cpu.h | 42 +++
> arch/blackfin/include/asm/l1layout.h | 3 +-
> arch/blackfin/include/asm/mutex-dec.h | 112 +++++++
> arch/blackfin/include/asm/mutex.h | 63 ++++
> arch/blackfin/include/asm/pda.h | 70 ++++
> arch/blackfin/include/asm/percpu.h | 12 +-
> arch/blackfin/include/asm/processor.h | 7 +-
> arch/blackfin/include/asm/rwlock.h | 6 +
> arch/blackfin/include/asm/smp.h | 42 +++
> arch/blackfin/include/asm/spinlock.h | 87 +++++-
> arch/blackfin/include/asm/spinlock_types.h | 22 ++
> arch/blackfin/include/asm/system.h | 116 ++++++--
> arch/blackfin/mach-common/Makefile | 1 +
> arch/blackfin/mach-common/cache.S | 36 ++
> arch/blackfin/mach-common/entry.S | 92 +++---
> arch/blackfin/mach-common/head.S | 29 +-
> arch/blackfin/mach-common/ints-priority.c | 41 +++-
> arch/blackfin/mach-common/smp.c | 476 ++++++++++++++++++++++++++++
> arch/blackfin/oprofile/common.c | 2 +-
> 25 files changed, 1437 insertions(+), 191 deletions(-)
> create mode 100644 arch/blackfin/include/asm/cpu.h
> create mode 100644 arch/blackfin/include/asm/mutex-dec.h
> create mode 100644 arch/blackfin/include/asm/pda.h
> create mode 100644 arch/blackfin/include/asm/rwlock.h
> create mode 100644 arch/blackfin/include/asm/smp.h
> create mode 100644 arch/blackfin/include/asm/spinlock_types.h
> create mode 100644 arch/blackfin/mach-common/smp.c
>
> diff --git a/arch/blackfin/include/asm/atomic.h b/arch/blackfin/include/asm/atomic.h
> index 7cf5087..8af0542 100644
> --- a/arch/blackfin/include/asm/atomic.h
> +++ b/arch/blackfin/include/asm/atomic.h
> @@ -13,15 +13,83 @@
> * Tony Kou ([email protected]) Lineo Inc. 2001
> */
>
> -typedef struct {
> - int counter;
> -} atomic_t;
> -#define ATOMIC_INIT(i) { (i) }
> +typedef struct { volatile int counter; } atomic_t;
>
> -#define atomic_read(v) ((v)->counter)
> +#define ATOMIC_INIT(i) { (i) }
> #define atomic_set(v, i) (((v)->counter) = i)
>
> -static __inline__ void atomic_add(int i, atomic_t * v)
> +#ifdef CONFIG_SMP
> +
> +#define atomic_read(v) __raw_uncached_fetch_asm(&(v)->counter)
> +
> +asmlinkage int __raw_uncached_fetch_asm(const volatile int *ptr);
> +
> +asmlinkage int __raw_atomic_update_asm(volatile int *ptr, int value);
> +
> +asmlinkage int __raw_atomic_clear_asm(volatile int *ptr, int value);
> +
> +asmlinkage int __raw_atomic_set_asm(volatile int *ptr, int value);
> +
> +asmlinkage int __raw_atomic_xor_asm(volatile int *ptr, int value);
> +
> +asmlinkage int __raw_atomic_test_asm(const volatile int *ptr, int value);
> +
> +static inline void atomic_add(int i, atomic_t *v)
> +{
> + __raw_atomic_update_asm(&v->counter, i);
> +}
> +
> +static inline void atomic_sub(int i, atomic_t *v)
> +{
> + __raw_atomic_update_asm(&v->counter, -i);
> +}
> +
> +static inline int atomic_add_return(int i, atomic_t *v)
> +{
> + return __raw_atomic_update_asm(&v->counter, i);
> +}
> +
> +static inline int atomic_sub_return(int i, atomic_t *v)
> +{
> + return __raw_atomic_update_asm(&v->counter, -i);
> +}
> +
> +static inline void atomic_inc(volatile atomic_t *v)
> +{
> + __raw_atomic_update_asm(&v->counter, 1);
> +}
> +
> +static inline void atomic_dec(volatile atomic_t *v)
> +{
> + __raw_atomic_update_asm(&v->counter, -1);
> +}
> +
> +static inline void atomic_clear_mask(int mask, atomic_t *v)
> +{
> + __raw_atomic_clear_asm(&v->counter, mask);
> +}
> +
> +static inline void atomic_set_mask(int mask, atomic_t *v)
> +{
> + __raw_atomic_set_asm(&v->counter, mask);
> +}
> +
> +static inline int atomic_test_mask(int mask, atomic_t *v)
> +{
> + return __raw_atomic_test_asm(&v->counter, mask);
> +}
> +
> +/* Atomic operations are already serializing */
> +#define smp_mb__before_atomic_dec() barrier()
> +#define smp_mb__after_atomic_dec() barrier()
> +#define smp_mb__before_atomic_inc() barrier()
> +#define smp_mb__after_atomic_inc() barrier()
> +
> +#else /* !CONFIG_SMP */
> +
> +#define atomic_read(v) ((v)->counter)
> +
> +static inline void atomic_add(int i, atomic_t *v)
> {
> long flags;
>
> @@ -30,7 +98,7 @@ static __inline__ void atomic_add(int i, atomic_t * v)
> local_irq_restore(flags);
> }
>
> -static __inline__ void atomic_sub(int i, atomic_t * v)
> +static inline void atomic_sub(int i, atomic_t *v)
> {
> long flags;
>
> @@ -40,7 +108,7 @@ static __inline__ void atomic_sub(int i, atomic_t * v)
>
> }
>
> -static inline int atomic_add_return(int i, atomic_t * v)
> +static inline int atomic_add_return(int i, atomic_t *v)
> {
> int __temp = 0;
> long flags;
> @@ -54,8 +122,7 @@ static inline int atomic_add_return(int i, atomic_t * v)
> return __temp;
> }
>
> -#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
> -static inline int atomic_sub_return(int i, atomic_t * v)
> +static inline int atomic_sub_return(int i, atomic_t *v)
> {
> int __temp = 0;
> long flags;
> @@ -68,7 +135,7 @@ static inline int atomic_sub_return(int i, atomic_t * v)
> return __temp;
> }
>
> -static __inline__ void atomic_inc(volatile atomic_t * v)
> +static inline void atomic_inc(volatile atomic_t *v)
> {
> long flags;
>
> @@ -77,20 +144,7 @@ static __inline__ void atomic_inc(volatile atomic_t * v)
> local_irq_restore(flags);
> }
>
> -#define atomic_cmpxchg(v, o, n) ((int)cmpxchg(&((v)->counter), (o), (n)))
> -#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
> -
> -#define atomic_add_unless(v, a, u) \
> -({ \
> - int c, old; \
> - c = atomic_read(v); \
> - while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
> - c = old; \
> - c != (u); \
> -})
> -#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
> -
> -static __inline__ void atomic_dec(volatile atomic_t * v)
> +static inline void atomic_dec(volatile atomic_t *v)
> {
> long flags;
>
> @@ -99,7 +153,7 @@ static __inline__ void atomic_dec(volatile atomic_t * v)
> local_irq_restore(flags);
> }
>
> -static __inline__ void atomic_clear_mask(unsigned int mask, atomic_t * v)
> +static inline void atomic_clear_mask(unsigned int mask, atomic_t *v)
> {
> long flags;
>
> @@ -108,7 +162,7 @@ static __inline__ void atomic_clear_mask(unsigned int mask, atomic_t * v)
> local_irq_restore(flags);
> }
>
> -static __inline__ void atomic_set_mask(unsigned int mask, atomic_t * v)
> +static inline void atomic_set_mask(unsigned int mask, atomic_t *v)
> {
> long flags;
>
> @@ -123,9 +177,25 @@ static __inline__ void atomic_set_mask(unsigned int mask, atomic_t * v)
> #define smp_mb__before_atomic_inc() barrier()
> #define smp_mb__after_atomic_inc() barrier()
>
> +#endif /* !CONFIG_SMP */
> +
> +#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
> #define atomic_dec_return(v) atomic_sub_return(1,(v))
> #define atomic_inc_return(v) atomic_add_return(1,(v))
>
> +#define atomic_cmpxchg(v, o, n) ((int)cmpxchg(&((v)->counter), (o), (n)))
> +#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
> +
> +#define atomic_add_unless(v, a, u) \
> +({ \
> + int c, old; \
> + c = atomic_read(v); \
> + while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
> + c = old; \
> + c != (u); \
> +})
> +#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
> +
> /*
> * atomic_inc_and_test - increment and test
> * @v: pointer of type atomic_t
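
For readers following the atomic_add_unless() move above: the macro is a plain compare-and-swap retry loop. Below is a minimal user-space sketch of the same loop, assuming C11 <stdatomic.h> in place of the kernel's cmpxchg(). The demo_* names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } demo_atomic_t;

/*
 * Add 'a' to v->counter unless the counter currently equals 'u'.
 * Returns true if the addition happened, which mirrors the final
 * "c != (u)" expression of the quoted macro.
 */
static bool demo_atomic_add_unless(demo_atomic_t *v, int a, int u)
{
    assert(v != NULL);

    int c = atomic_load(&v->counter);

    while (c != u) {
        /* On failure, 'c' is refreshed with the value currently stored. */
        if (atomic_compare_exchange_weak(&v->counter, &c, c + a))
            return true;
    }
    return false;
}

/* Same shape as atomic_inc_not_zero(): increment unless already zero. */
static bool demo_atomic_inc_not_zero(demo_atomic_t *v)
{
    return demo_atomic_add_unless(v, 1, 0);
}
```

The loop only retries when another CPU changed the counter between the read and the swap, so it works the same way on the SMP and UP paths as long as atomic_read() and atomic_cmpxchg() are themselves coherent.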
> diff --git a/arch/blackfin/include/asm/bfin-global.h b/arch/blackfin/include/asm/bfin-global.h
> index 7729566..1dd0805 100644
> --- a/arch/blackfin/include/asm/bfin-global.h
> +++ b/arch/blackfin/include/asm/bfin-global.h
> @@ -47,6 +47,9 @@
> # define DMA_UNCACHED_REGION (0)
> #endif
>
> +extern void bfin_setup_caches(unsigned int cpu);
> +extern void bfin_setup_cpudata(unsigned int cpu);
> +
> extern unsigned long get_cclk(void);
> extern unsigned long get_sclk(void);
> extern unsigned long sclk_to_usecs(unsigned long sclk);
> @@ -58,8 +61,6 @@ extern void dump_bfin_trace_buffer(void);
>
> /* init functions only */
> extern int init_arch_irq(void);
> -extern void bfin_icache_init(void);
> -extern void bfin_dcache_init(void);
> extern void init_exception_vectors(void);
> extern void program_IAR(void);
>
> diff --git a/arch/blackfin/include/asm/bitops.h b/arch/blackfin/include/asm/bitops.h
> index b39a175..5872fb6 100644
> --- a/arch/blackfin/include/asm/bitops.h
> +++ b/arch/blackfin/include/asm/bitops.h
> @@ -7,7 +7,6 @@
>
> #include <linux/compiler.h>
> #include <asm/byteorder.h> /* swab32 */
> -#include <asm/system.h> /* save_flags */
>
> #ifdef __KERNEL__
>
> @@ -20,36 +19,71 @@
> #include <asm-generic/bitops/sched.h>
> #include <asm-generic/bitops/ffz.h>
>
> -static __inline__ void set_bit(int nr, volatile unsigned long *addr)
> +#ifdef CONFIG_SMP
> +
> +#include <linux/linkage.h>
> +
> +asmlinkage int __raw_bit_set_asm(volatile unsigned long *addr, int nr);
> +
> +asmlinkage int __raw_bit_clear_asm(volatile unsigned long *addr, int nr);
> +
> +asmlinkage int __raw_bit_toggle_asm(volatile unsigned long *addr, int nr);
> +
> +asmlinkage int __raw_bit_test_set_asm(volatile unsigned long *addr, int nr);
> +
> +asmlinkage int __raw_bit_test_clear_asm(volatile unsigned long *addr, int nr);
> +
> +asmlinkage int __raw_bit_test_toggle_asm(volatile unsigned long *addr, int nr);
> +
> +asmlinkage int __raw_bit_test_asm(const volatile unsigned long *addr, int nr);
> +
> +static inline void set_bit(int nr, volatile unsigned long *addr)
> {
> - int *a = (int *)addr;
> - int mask;
> - unsigned long flags;
> + volatile unsigned long *a = addr + (nr >> 5);
> + __raw_bit_set_asm(a, nr & 0x1f);
> +}
>
> - a += nr >> 5;
> - mask = 1 << (nr & 0x1f);
> - local_irq_save(flags);
> - *a |= mask;
> - local_irq_restore(flags);
> +static inline void clear_bit(int nr, volatile unsigned long *addr)
> +{
> + volatile unsigned long *a = addr + (nr >> 5);
> + __raw_bit_clear_asm(a, nr & 0x1f);
> }
>
> -static __inline__ void __set_bit(int nr, volatile unsigned long *addr)
> +static inline void change_bit(int nr, volatile unsigned long *addr)
> {
> - int *a = (int *)addr;
> - int mask;
> + volatile unsigned long *a = addr + (nr >> 5);
> + __raw_bit_toggle_asm(a, nr & 0x1f);
> +}
>
> - a += nr >> 5;
> - mask = 1 << (nr & 0x1f);
> - *a |= mask;
> +static inline int test_bit(int nr, const volatile unsigned long *addr)
> +{
> + volatile const unsigned long *a = addr + (nr >> 5);
> + return __raw_bit_test_asm(a, nr & 0x1f) != 0;
> }
>
> -/*
> - * clear_bit() doesn't provide any barrier for the compiler.
> - */
> -#define smp_mb__before_clear_bit() barrier()
> -#define smp_mb__after_clear_bit() barrier()
> +static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
> +{
> + volatile unsigned long *a = addr + (nr >> 5);
> + return __raw_bit_test_set_asm(a, nr & 0x1f);
> +}
>
> -static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
> +static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
> +{
> + volatile unsigned long *a = addr + (nr >> 5);
> + return __raw_bit_test_clear_asm(a, nr & 0x1f);
> +}
> +
> +static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
> +{
> + volatile unsigned long *a = addr + (nr >> 5);
> + return __raw_bit_test_toggle_asm(a, nr & 0x1f);
> +}
> +
> +#else /* !CONFIG_SMP */
> +
> +#include <asm/system.h> /* save_flags */
> +
> +static inline void set_bit(int nr, volatile unsigned long *addr)
> {
> int *a = (int *)addr;
> int mask;
> @@ -57,21 +91,23 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
> a += nr >> 5;
> mask = 1 << (nr & 0x1f);
> local_irq_save(flags);
> - *a &= ~mask;
> + *a |= mask;
> local_irq_restore(flags);
> }
>
> -static __inline__ void __clear_bit(int nr, volatile unsigned long *addr)
> +static inline void clear_bit(int nr, volatile unsigned long *addr)
> {
> int *a = (int *)addr;
> int mask;
> -
> + unsigned long flags;
> a += nr >> 5;
> mask = 1 << (nr & 0x1f);
> + local_irq_save(flags);
> *a &= ~mask;
> + local_irq_restore(flags);
> }
>
> -static __inline__ void change_bit(int nr, volatile unsigned long *addr)
> +static inline void change_bit(int nr, volatile unsigned long *addr)
> {
> int mask, flags;
> unsigned long *ADDR = (unsigned long *)addr;
> @@ -83,17 +119,7 @@ static __inline__ void change_bit(int nr, volatile unsigned long *addr)
> local_irq_restore(flags);
> }
>
> -static __inline__ void __change_bit(int nr, volatile unsigned long *addr)
> -{
> - int mask;
> - unsigned long *ADDR = (unsigned long *)addr;
> -
> - ADDR += nr >> 5;
> - mask = 1 << (nr & 31);
> - *ADDR ^= mask;
> -}
> -
> -static __inline__ int test_and_set_bit(int nr, void *addr)
> +static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
> {
> int mask, retval;
> volatile unsigned int *a = (volatile unsigned int *)addr;
> @@ -109,19 +135,23 @@ static __inline__ int test_and_set_bit(int nr, void *addr)
> return retval;
> }
>
> -static __inline__ int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
> {
> int mask, retval;
> volatile unsigned int *a = (volatile unsigned int *)addr;
> + unsigned long flags;
>
> a += nr >> 5;
> mask = 1 << (nr & 0x1f);
> + local_irq_save(flags);
> retval = (mask & *a) != 0;
> - *a |= mask;
> + *a &= ~mask;
> + local_irq_restore(flags);
> +
> return retval;
> }
>
> -static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
> {
> int mask, retval;
> volatile unsigned int *a = (volatile unsigned int *)addr;
> @@ -131,13 +161,59 @@ static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr)
> mask = 1 << (nr & 0x1f);
> local_irq_save(flags);
> retval = (mask & *a) != 0;
> - *a &= ~mask;
> + *a ^= mask;
> local_irq_restore(flags);
> -
> return retval;
> }
>
> -static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +/*
> + * This routine doesn't need to go through raw atomic ops in UP
> + * context.
> + */
> +#define test_bit(nr,addr) \
> +(__builtin_constant_p(nr) ? \
> + __constant_test_bit((nr), (addr)) : \
> + __test_bit((nr), (addr)))
> +
> +#endif /* CONFIG_SMP */
> +
> +/*
> + * clear_bit() doesn't provide any barrier for the compiler.
> + */
> +#define smp_mb__before_clear_bit() barrier()
> +#define smp_mb__after_clear_bit() barrier()
> +
> +static inline void __set_bit(int nr, volatile unsigned long *addr)
> +{
> + int *a = (int *)addr;
> + int mask;
> +
> + a += nr >> 5;
> + mask = 1 << (nr & 0x1f);
> + *a |= mask;
> +}
> +
> +static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +{
> + int *a = (int *)addr;
> + int mask;
> +
> + a += nr >> 5;
> + mask = 1 << (nr & 0x1f);
> + *a &= ~mask;
> +}
> +
> +static inline void __change_bit(int nr, volatile unsigned long *addr)
> +{
> + int mask;
> + unsigned long *ADDR = (unsigned long *)addr;
> +
> + ADDR += nr >> 5;
> + mask = 1 << (nr & 31);
> + *ADDR ^= mask;
> +}
> +
> +static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> {
> int mask, retval;
> volatile unsigned int *a = (volatile unsigned int *)addr;
> @@ -145,26 +221,23 @@ static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> a += nr >> 5;
> mask = 1 << (nr & 0x1f);
> retval = (mask & *a) != 0;
> - *a &= ~mask;
> + *a |= mask;
> return retval;
> }
>
> -static __inline__ int test_and_change_bit(int nr, volatile unsigned long *addr)
> +static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> {
> int mask, retval;
> volatile unsigned int *a = (volatile unsigned int *)addr;
> - unsigned long flags;
>
> a += nr >> 5;
> mask = 1 << (nr & 0x1f);
> - local_irq_save(flags);
> retval = (mask & *a) != 0;
> - *a ^= mask;
> - local_irq_restore(flags);
> + *a &= ~mask;
> return retval;
> }
>
> -static __inline__ int __test_and_change_bit(int nr,
> +static inline int __test_and_change_bit(int nr,
> volatile unsigned long *addr)
> {
> int mask, retval;
> @@ -177,16 +250,13 @@ static __inline__ int __test_and_change_bit(int nr,
> return retval;
> }
>
> -/*
> - * This routine doesn't need to be atomic.
> - */
> -static __inline__ int __constant_test_bit(int nr, const void *addr)
> +static inline int __constant_test_bit(int nr, const void *addr)
> {
> return ((1UL << (nr & 31)) &
> (((const volatile unsigned int *)addr)[nr >> 5])) != 0;
> }
>
> -static __inline__ int __test_bit(int nr, const void *addr)
> +static inline int __test_bit(int nr, const void *addr)
> {
> int *a = (int *)addr;
> int mask;
> @@ -196,11 +266,6 @@ static __inline__ int __test_bit(int nr, const void *addr)
> return ((mask & *a) != 0);
> }
>
> -#define test_bit(nr,addr) \
> -(__builtin_constant_p(nr) ? \
> - __constant_test_bit((nr),(addr)) : \
> - __test_bit((nr),(addr)))
> -
> #include <asm-generic/bitops/find.h>
> #include <asm-generic/bitops/hweight.h>
> #include <asm-generic/bitops/lock.h>
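
A note on the indexing used throughout the bitops hunks above: "nr >> 5" selects the 32-bit word and "nr & 0x1f" selects the bit inside it, which relies on unsigned long being 32 bits wide on Blackfin. A minimal non-atomic sketch of that arithmetic, using uint32_t and hypothetical demo_* names rather than the kernel helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Non-atomic test-and-set with the same shape as __test_and_set_bit():
 * returns the old value of the bit and leaves the bit set.
 */
static int demo_test_and_set_bit(unsigned int nr, uint32_t *addr)
{
    assert(addr != NULL);

    uint32_t *word = addr + (nr >> 5);           /* word index: nr / 32 */
    uint32_t mask = UINT32_C(1) << (nr & 0x1f);  /* bit index:  nr % 32 */
    int old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/* Read a single bit without modifying the bitmap. */
static int demo_test_bit(unsigned int nr, const uint32_t *addr)
{
    assert(addr != NULL);

    return (int)((addr[nr >> 5] >> (nr & 0x1f)) & 1u);
}
```

For example, bit 37 lands in word 1 at bit position 5. The SMP variants in the patch do the word selection in C and pass only the in-word bit number to the assembly helpers.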
> diff --git a/arch/blackfin/include/asm/cache.h b/arch/blackfin/include/asm/cache.h
> index 023d721..8663781 100644
> --- a/arch/blackfin/include/asm/cache.h
> +++ b/arch/blackfin/include/asm/cache.h
> @@ -12,6 +12,11 @@
> #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
> #define SMP_CACHE_BYTES L1_CACHE_BYTES
>
> +#ifdef CONFIG_SMP
> +#define __cacheline_aligned
> +#else
> +#define ____cacheline_aligned
> +
> /*
> * Put cacheline_aliged data to L1 data memory
> */
> @@ -21,9 +26,33 @@
> __section__(".data_l1.cacheline_aligned")))
> #endif
>
> +#endif
> +
> /*
> * largest L1 which this arch supports
> */
> #define L1_CACHE_SHIFT_MAX 5
>
> +#if defined(CONFIG_SMP) && \
> + !defined(CONFIG_BFIN_CACHE_COHERENT) && \
> + defined(CONFIG_BFIN_DCACHE)
> +#define __ARCH_SYNC_CORE_DCACHE
> +#ifndef __ASSEMBLY__
> +asmlinkage void __raw_smp_mark_barrier_asm(void);
> +asmlinkage void __raw_smp_check_barrier_asm(void);
> +
> +static inline void smp_mark_barrier(void)
> +{
> + __raw_smp_mark_barrier_asm();
> +}
> +static inline void smp_check_barrier(void)
> +{
> + __raw_smp_check_barrier_asm();
> +}
> +
> +void resync_core_dcache(void);
> +#endif
> +#endif
> +
> +
> #endif
> diff --git a/arch/blackfin/include/asm/cacheflush.h b/arch/blackfin/include/asm/cacheflush.h
> index 4403415..1b040f5 100644
> --- a/arch/blackfin/include/asm/cacheflush.h
> +++ b/arch/blackfin/include/asm/cacheflush.h
> @@ -35,6 +35,7 @@ extern void blackfin_icache_flush_range(unsigned long start_address, unsigned lo
> extern void blackfin_dcache_flush_range(unsigned long start_address, unsigned long end_address);
> extern void blackfin_dcache_invalidate_range(unsigned long start_address, unsigned long end_address);
> extern void blackfin_dflush_page(void *page);
> +extern void blackfin_invalidate_entire_dcache(void);
>
> #define flush_dcache_mmap_lock(mapping) do { } while (0)
> #define flush_dcache_mmap_unlock(mapping) do { } while (0)
> @@ -44,12 +45,20 @@ extern void blackfin_dflush_page(void *page);
> #define flush_cache_vmap(start, end) do { } while (0)
> #define flush_cache_vunmap(start, end) do { } while (0)
>
> +#ifdef CONFIG_SMP
> +#define flush_icache_range_others(start, end) \
> + smp_icache_flush_range_others((start), (end))
> +#else
> +#define flush_icache_range_others(start, end) do { } while (0)
> +#endif
> +
> static inline void flush_icache_range(unsigned start, unsigned end)
> {
> #if defined(CONFIG_BFIN_DCACHE) && defined(CONFIG_BFIN_ICACHE)
>
> # if defined(CONFIG_BFIN_WT)
> blackfin_icache_flush_range((start), (end));
> + flush_icache_range_others(start, end);
> # else
> blackfin_icache_dcache_flush_range((start), (end));
> # endif
> @@ -58,6 +67,7 @@ static inline void flush_icache_range(unsigned start, unsigned end)
>
> # if defined(CONFIG_BFIN_ICACHE)
> blackfin_icache_flush_range((start), (end));
> + flush_icache_range_others(start, end);
> # endif
> # if defined(CONFIG_BFIN_DCACHE)
> blackfin_dcache_flush_range((start), (end));
> @@ -66,10 +76,12 @@ static inline void flush_icache_range(unsigned start, unsigned end)
> #endif
> }
>
> -#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
> -do { memcpy(dst, src, len); \
> - flush_icache_range ((unsigned) (dst), (unsigned) (dst) + (len)); \
> +#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
> +do { memcpy(dst, src, len); \
> + flush_icache_range((unsigned) (dst), (unsigned) (dst) + (len)); \
> + flush_icache_range_others((unsigned long) (dst), (unsigned long) (dst) + (len));\
> } while (0)
> +
> #define copy_from_user_page(vma, page, vaddr, dst, src, len) memcpy(dst, src, len)
>
> #if defined(CONFIG_BFIN_DCACHE)
> @@ -82,7 +94,7 @@ do { memcpy(dst, src, len); \
> # define flush_dcache_page(page) blackfin_dflush_page(page_address(page))
> #else
> # define flush_dcache_range(start,end) do { } while (0)
> -# define flush_dcache_page(page) do { } while (0)
> +# define flush_dcache_page(page) do { } while (0)
> #endif
>
> extern unsigned long reserved_mem_dcache_on;
> diff --git a/arch/blackfin/include/asm/context.S b/arch/blackfin/include/asm/context.S
> index c0e630e..40d20b4 100644
> --- a/arch/blackfin/include/asm/context.S
> +++ b/arch/blackfin/include/asm/context.S
> @@ -303,9 +303,14 @@
> RETI = [sp++];
> RETS = [sp++];
>
> +#ifdef CONFIG_SMP
> + GET_PDA(p0, r0);
> + r0 = [p0 + PDA_IRQFLAGS];
> +#else
> p0.h = _irq_flags;
> p0.l = _irq_flags;
> r0 = [p0];
> +#endif
> sti r0;
>
> sp += 4; /* Skip Reserved */
> @@ -352,4 +357,3 @@
> SYSCFG = [sp++];
> csync;
> .endm
> -
> diff --git a/arch/blackfin/include/asm/cpu.h b/arch/blackfin/include/asm/cpu.h
> new file mode 100644
> index 0000000..9b7aefe
> --- /dev/null
> +++ b/arch/blackfin/include/asm/cpu.h
> @@ -0,0 +1,42 @@
> +/*
> + * File: arch/blackfin/include/asm/cpu.h
> + * Author: Philippe Gerum <[email protected]>
> + *
> + * Copyright 2007 Analog Devices Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see the file COPYING, or write
> + * to the Free Software Foundation, Inc.,
> + * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef __ASM_BLACKFIN_CPU_H
> +#define __ASM_BLACKFIN_CPU_H
> +
> +#include <linux/percpu.h>
> +
> +struct task_struct;
> +
> +struct blackfin_cpudata {
> + struct cpu cpu;
> + struct task_struct *idle;
> + unsigned long cclk;
> + unsigned int imemctl;
> + unsigned int dmemctl;
> + unsigned long loops_per_jiffy;
> + unsigned long dcache_invld_count;
> +};
> +
> +DECLARE_PER_CPU(struct blackfin_cpudata, cpu_data);
> +
> +#endif
> diff --git a/arch/blackfin/include/asm/l1layout.h b/arch/blackfin/include/asm/l1layout.h
> index c13ded7..06bb37f 100644
> --- a/arch/blackfin/include/asm/l1layout.h
> +++ b/arch/blackfin/include/asm/l1layout.h
> @@ -24,7 +24,8 @@ struct l1_scratch_task_info
> };
>
> /* A pointer to the structure in memory. */
> -#define L1_SCRATCH_TASK_INFO ((struct l1_scratch_task_info *)L1_SCRATCH_START)
> +#define L1_SCRATCH_TASK_INFO ((struct l1_scratch_task_info *)\
> + get_l1_scratch_start())
>
> #endif
>
> diff --git a/arch/blackfin/include/asm/mutex-dec.h b/arch/blackfin/include/asm/mutex-dec.h
> new file mode 100644
> index 0000000..0134151
> --- /dev/null
> +++ b/arch/blackfin/include/asm/mutex-dec.h
> @@ -0,0 +1,112 @@
> +/*
> + * include/asm-generic/mutex-dec.h
> + *
> + * Generic implementation of the mutex fastpath, based on atomic
> + * decrement/increment.
> + */
> +#ifndef _ASM_GENERIC_MUTEX_DEC_H
> +#define _ASM_GENERIC_MUTEX_DEC_H
> +
> +/**
> + * __mutex_fastpath_lock - try to take the lock by moving the count
> + * from 1 to a 0 value
> + * @count: pointer of type atomic_t
> + * @fail_fn: function to call if the original value was not 1
> + *
> + * Change the count from 1 to a value lower than 1, and call <fail_fn> if
> + * it wasn't 1 originally. This function MUST leave the value lower than
> + * 1 even when the "1" assertion wasn't true.
> + */
> +static inline void
> +__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
> +{
> + if (unlikely(atomic_dec_return(count) < 0))
> + fail_fn(count);
> + else
> + smp_mb();
> +}
> +
> +/**
> + * __mutex_fastpath_lock_retval - try to take the lock by moving the count
> + * from 1 to a 0 value
> + * @count: pointer of type atomic_t
> + * @fail_fn: function to call if the original value was not 1
> + *
> + * Change the count from 1 to a value lower than 1, and call <fail_fn> if
> + * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
> + * or anything the slow path function returns.
> + */
> +static inline int
> +__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
> +{
> + if (unlikely(atomic_dec_return(count) < 0))
> + return fail_fn(count);
> + else {
> + smp_mb();
> + return 0;
> + }
> +}
> +
> +/**
> + * __mutex_fastpath_unlock - try to promote the count from 0 to 1
> + * @count: pointer of type atomic_t
> + * @fail_fn: function to call if the original value was not 0
> + *
> + * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
> + * In the failure case, this function is allowed to either set the value to
> + * 1, or to set it to a value lower than 1.
> + *
> + * If the implementation sets it to a value of lower than 1, then the
> + * __mutex_slowpath_needs_to_unlock() macro needs to return 1; it needs
> + * to return 0 otherwise.
> + */
> +static inline void
> +__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
> +{
> + smp_mb();
> + if (unlikely(atomic_inc_return(count) <= 0))
> + fail_fn(count);
> +}
> +
> +#define __mutex_slowpath_needs_to_unlock() 1
> +
> +/**
> + * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
> + *
> + * @count: pointer of type atomic_t
> + * @fail_fn: fallback function
> + *
> + * Change the count from 1 to a value lower than 1, and return 0 (failure)
> + * if it wasn't 1 originally, or return 1 (success) otherwise. This function
> + * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
> + * Additionally, if the value was < 0 originally, this function must not leave
> + * it to 0 on failure.
> + *
> + * If the architecture has no effective trylock variant, it should call the
> + * <fail_fn> spinlock-based trylock variant unconditionally.
> + */
> +static inline int
> +__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
> +{
> + /*
> + * We have two variants here. The cmpxchg based one is the best one
> + * because it never induces a false contention state. It is included
> + * here because architectures using the inc/dec algorithms over the
> + * xchg ones are much more likely to support cmpxchg natively.
> + *
> + * If not we fall back to the spinlock based variant - that is
> + * just as efficient (and simpler) as a 'destructive' probing of
> + * the mutex state would be.
> + */
> +#ifdef __HAVE_ARCH_CMPXCHG
> + if (likely(atomic_cmpxchg(count, 1, 0) == 1)) {
> + smp_mb();
> + return 1;
> + }
> + return 0;
> +#else
> + return fail_fn(count);
> +#endif
> +}
> +
> +#endif
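
The counting protocol described in the comments above (1 = unlocked, 0 = locked, negative = locked and possibly contended) can be sketched in user space as follows. This is only an illustration under stated assumptions: C11 sequentially consistent atomics stand in for atomic_dec_return()/atomic_inc_return() plus smp_mb(), the slow paths are stubs that only count invocations, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* count: 1 = unlocked, 0 = locked, < 0 = locked with possible waiters */
typedef struct { atomic_int count; } demo_mutex_t;

/* Stub slow paths: they only record that they were taken. */
static int demo_slowpath_calls;

static void demo_lock_slowpath(demo_mutex_t *m)
{
    (void)m;
    demo_slowpath_calls++;
}

static void demo_unlock_slowpath(demo_mutex_t *m)
{
    (void)m;
    demo_slowpath_calls++;
}

static void demo_fastpath_lock(demo_mutex_t *m)
{
    assert(m != NULL);
    /* atomic_fetch_sub() returns the old value; old - 1 is the new count. */
    if (atomic_fetch_sub(&m->count, 1) - 1 < 0)
        demo_lock_slowpath(m);
}

static void demo_fastpath_unlock(demo_mutex_t *m)
{
    assert(m != NULL);
    /* A new count <= 0 means someone else decremented in the meantime. */
    if (atomic_fetch_add(&m->count, 1) + 1 <= 0)
        demo_unlock_slowpath(m);
}

/* cmpxchg-style trylock: succeeds only on the 1 -> 0 transition. */
static int demo_fastpath_trylock(demo_mutex_t *m)
{
    int expected = 1;

    assert(m != NULL);
    return atomic_compare_exchange_strong(&m->count, &expected, 0);
}
```

Note that the trylock never moves the count below 0, which is the "no false contention" property the quoted comment refers to.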
> diff --git a/arch/blackfin/include/asm/mutex.h b/arch/blackfin/include/asm/mutex.h
> index 458c1f7..5d39925 100644
> --- a/arch/blackfin/include/asm/mutex.h
> +++ b/arch/blackfin/include/asm/mutex.h
> @@ -6,4 +6,67 @@
> * implementation. (see asm-generic/mutex-xchg.h for details)
> */
>
> +#ifndef _ASM_MUTEX_H
> +#define _ASM_MUTEX_H
> +
> +#ifndef CONFIG_SMP
> #include <asm-generic/mutex-dec.h>
> +#else
> +
> +static inline void
> +__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
> +{
> + if (unlikely(atomic_dec_return(count) < 0))
> + fail_fn(count);
> + else
> + smp_mb();
> +}
> +
> +static inline int
> +__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
> +{
> + if (unlikely(atomic_dec_return(count) < 0))
> + return fail_fn(count);
> + else {
> + smp_mb();
> + return 0;
> + }
> +}
> +
> +static inline void
> +__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
> +{
> + smp_mb();
> + if (unlikely(atomic_inc_return(count) <= 0))
> + fail_fn(count);
> +}
> +
> +#define __mutex_slowpath_needs_to_unlock() 1
> +
> +static inline int
> +__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
> +{
> + /*
> + * We have two variants here. The cmpxchg based one is the best one
> + * because it never induces a false contention state. It is included
> + * here because architectures using the inc/dec algorithms over the
> + * xchg ones are much more likely to support cmpxchg natively.
> + *
> + * If not we fall back to the spinlock based variant - that is
> + * just as efficient (and simpler) as a 'destructive' probing of
> + * the mutex state would be.
> + */
> +#ifdef __HAVE_ARCH_CMPXCHG
> + if (likely(atomic_cmpxchg(count, 1, 0) == 1)) {
> + smp_mb();
> + return 1;
> + }
> + return 0;
> +#else
> + return fail_fn(count);
> +#endif
> +}
> +
> +#endif
> +
> +#endif
> diff --git a/arch/blackfin/include/asm/pda.h b/arch/blackfin/include/asm/pda.h
> new file mode 100644
> index 0000000..a24d130
> --- /dev/null
> +++ b/arch/blackfin/include/asm/pda.h
> @@ -0,0 +1,70 @@
> +/*
> + * File: arch/blackfin/include/asm/pda.h
> + * Author: Philippe Gerum <[email protected]>
> + *
> + * Copyright 2007 Analog Devices Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see the file COPYING, or write
> + * to the Free Software Foundation, Inc.,
> + * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _ASM_BLACKFIN_PDA_H
> +#define _ASM_BLACKFIN_PDA_H
> +
> +#include <asm/mem_map.h>
> +
> +#ifndef __ASSEMBLY__
> +
> +struct blackfin_pda { /* Per-processor Data Area */
> + struct blackfin_pda *next;
> +
> + unsigned long syscfg;
> +#ifdef CONFIG_SMP
> + unsigned long imask; /* Current IMASK value */
> +#endif
> +
> + unsigned long *ipdt; /* Start of switchable I-CPLB table */
> + unsigned long *ipdt_swapcount; /* Number of swaps in ipdt */
> + unsigned long *dpdt; /* Start of switchable D-CPLB table */
> + unsigned long *dpdt_swapcount; /* Number of swaps in dpdt */
> +
> + /*
> + * Single instructions can have multiple faults, which
> + * need to be handled by traps.c, in irq5. We store
> + * the exception cause to ensure we don't miss a
> + * double fault condition
> + */
> + unsigned long ex_iptr;
> + unsigned long ex_optr;
> + unsigned long ex_buf[4];
> + unsigned long ex_imask; /* Saved imask from exception */
> + unsigned long *ex_stack; /* Exception stack space */
> +
> +#ifdef ANOMALY_05000261
> + unsigned long last_cplb_fault_retx;
> +#endif
> + unsigned long dcplb_fault_addr;
> + unsigned long icplb_fault_addr;
> + unsigned long retx;
> + unsigned long seqstat;
> +};
> +
> +extern struct blackfin_pda cpu_pda[];
> +
> +void reserve_pda(void);
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* _ASM_BLACKFIN_PDA_H */
> diff --git a/arch/blackfin/include/asm/percpu.h b/arch/blackfin/include/asm/percpu.h
> index 78dd61f..797c0c1 100644
> --- a/arch/blackfin/include/asm/percpu.h
> +++ b/arch/blackfin/include/asm/percpu.h
> @@ -3,4 +3,14 @@
>
> #include <asm-generic/percpu.h>
>
> -#endif /* __ARCH_BLACKFIN_PERCPU__ */
> +#ifdef CONFIG_MODULES
> +#define PERCPU_MODULE_RESERVE 8192
> +#else
> +#define PERCPU_MODULE_RESERVE 0
> +#endif
> +
> +#define PERCPU_ENOUGH_ROOM \
> + (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
> + PERCPU_MODULE_RESERVE)
> +
> +#endif /* __ARCH_BLACKFIN_PERCPU__ */
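
To make the PERCPU_ENOUGH_ROOM arithmetic above concrete: the per-CPU section size is rounded up to a cache-line multiple and the module reserve is added on top. A small sketch, assuming a 32-byte SMP_CACHE_BYTES (consistent with the L1_CACHE_SHIFT_MAX of 5 in the quoted cache.h, but an assumption nonetheless) and hypothetical demo_* names:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SMP_CACHE_BYTES 32u    /* assumed: 1 << 5 */
#define DEMO_MODULE_RESERVE  8192u  /* value used when CONFIG_MODULES is set */

/* Round x up to the next multiple of a; a must be a power of two. */
static size_t demo_align(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/*
 * percpu_bytes plays the role of __per_cpu_end - __per_cpu_start;
 * 'modules' is nonzero when the module reserve applies.
 */
static size_t demo_percpu_enough_room(size_t percpu_bytes, int modules)
{
    return demo_align(percpu_bytes, DEMO_SMP_CACHE_BYTES) +
           (modules ? DEMO_MODULE_RESERVE : 0u);
}
```

With these numbers, a 1000-byte per-CPU section with modules enabled needs 1024 + 8192 = 9216 bytes per CPU.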
> diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
> index e3e9b41..30703c7 100644
> --- a/arch/blackfin/include/asm/processor.h
> +++ b/arch/blackfin/include/asm/processor.h
> @@ -106,7 +106,8 @@ unsigned long get_wchan(struct task_struct *p);
> eip; })
> #define KSTK_ESP(tsk) ((tsk) == current ? rdusp() : (tsk)->thread.usp)
>
> -#define cpu_relax() barrier()
> +#define cpu_relax() smp_mb()
> +
>
> /* Get the Silicon Revision of the chip */
> static inline uint32_t __pure bfin_revid(void)
> @@ -137,7 +138,11 @@ static inline uint32_t __pure bfin_revid(void)
> static inline uint16_t __pure bfin_cpuid(void)
> {
> return (bfin_read_CHIPID() & CHIPID_FAMILY) >> 12;
> +}
>
> +static inline uint32_t __pure bfin_dspid(void)
> +{
> + return bfin_read_DSPID();
> }
>
> static inline uint32_t __pure bfin_compiled_revid(void)
> diff --git a/arch/blackfin/include/asm/rwlock.h b/arch/blackfin/include/asm/rwlock.h
> new file mode 100644
> index 0000000..4a724b3
> --- /dev/null
> +++ b/arch/blackfin/include/asm/rwlock.h
> @@ -0,0 +1,6 @@
> +#ifndef _ASM_BLACKFIN_RWLOCK_H
> +#define _ASM_BLACKFIN_RWLOCK_H
> +
> +#define RW_LOCK_BIAS 0x01000000
> +
> +#endif
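
RW_LOCK_BIAS is used by the __raw_read_can_lock()/__raw_write_can_lock() helpers in the spinlock.h hunk: a positive counter means readers may enter, and a counter equal to the bias means the lock is completely free. The assembly is not quoted here, so the following is only a sketch of the conventional bias-counter scheme those two checks suggest, written with C11 atomics and hypothetical demo_* names. It should not be read as a description of the Blackfin assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_RW_LOCK_BIAS 0x01000000

/* lock == BIAS: free; 0 < lock < BIAS: readers inside; lock == 0: writer */
typedef struct { atomic_int lock; } demo_rwlock_t;

static int demo_read_can_lock(demo_rwlock_t *rw)
{
    return atomic_load(&rw->lock) > 0;
}

static int demo_write_can_lock(demo_rwlock_t *rw)
{
    return atomic_load(&rw->lock) == DEMO_RW_LOCK_BIAS;
}

static int demo_read_trylock(demo_rwlock_t *rw)
{
    assert(rw != NULL);
    /* Each reader takes one unit; a negative result means a writer is in. */
    if (atomic_fetch_sub(&rw->lock, 1) - 1 >= 0)
        return 1;
    atomic_fetch_add(&rw->lock, 1);  /* undo the failed attempt */
    return 0;
}

static void demo_read_unlock(demo_rwlock_t *rw)
{
    atomic_fetch_add(&rw->lock, 1);
}

static int demo_write_trylock(demo_rwlock_t *rw)
{
    assert(rw != NULL);
    /* A writer takes the whole bias; it succeeds only if nobody else is in. */
    if (atomic_fetch_sub(&rw->lock, DEMO_RW_LOCK_BIAS) - DEMO_RW_LOCK_BIAS == 0)
        return 1;
    atomic_fetch_add(&rw->lock, DEMO_RW_LOCK_BIAS);  /* undo */
    return 0;
}

static void demo_write_unlock(demo_rwlock_t *rw)
{
    atomic_fetch_add(&rw->lock, DEMO_RW_LOCK_BIAS);
}
```

The large bias leaves room for roughly 16 million concurrent readers before the counter could be confused with a writer's claim.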
> diff --git a/arch/blackfin/include/asm/smp.h b/arch/blackfin/include/asm/smp.h
> new file mode 100644
> index 0000000..233cb8c
> --- /dev/null
> +++ b/arch/blackfin/include/asm/smp.h
> @@ -0,0 +1,42 @@
> +/*
> + * File: arch/blackfin/include/asm/smp.h
> + * Author: Philippe Gerum <[email protected]>
> + *
> + * Copyright 2007 Analog Devices Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see the file COPYING, or write
> + * to the Free Software Foundation, Inc.,
> + * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef __ASM_BLACKFIN_SMP_H
> +#define __ASM_BLACKFIN_SMP_H
> +
> +#include <linux/kernel.h>
> +#include <linux/threads.h>
> +#include <linux/cpumask.h>
> +#include <linux/cache.h>
> +#include <asm/blackfin.h>
> +#include <mach/smp.h>
> +
> +#define raw_smp_processor_id() blackfin_core_id()
> +
> +struct corelock_slot {
> + int lock;
> +};
> +
> +void smp_icache_flush_range_others(unsigned long start,
> + unsigned long end);
> +
> +#endif /* !__ASM_BLACKFIN_SMP_H */
> diff --git a/arch/blackfin/include/asm/spinlock.h b/arch/blackfin/include/asm/spinlock.h
> index 64e908a..0249ac3 100644
> --- a/arch/blackfin/include/asm/spinlock.h
> +++ b/arch/blackfin/include/asm/spinlock.h
> @@ -1,6 +1,89 @@
> #ifndef __BFIN_SPINLOCK_H
> #define __BFIN_SPINLOCK_H
>
> -#error blackfin architecture does not support SMP spin lock yet
> +#include <asm/atomic.h>
>
> -#endif
> +asmlinkage int __raw_spin_is_locked_asm(volatile int *ptr);
> +asmlinkage void __raw_spin_lock_asm(volatile int *ptr);
> +asmlinkage int __raw_spin_trylock_asm(volatile int *ptr);
> +asmlinkage void __raw_spin_unlock_asm(volatile int *ptr);
> +asmlinkage void __raw_read_lock_asm(volatile int *ptr);
> +asmlinkage int __raw_read_trylock_asm(volatile int *ptr);
> +asmlinkage void __raw_read_unlock_asm(volatile int *ptr);
> +asmlinkage void __raw_write_lock_asm(volatile int *ptr);
> +asmlinkage int __raw_write_trylock_asm(volatile int *ptr);
> +asmlinkage void __raw_write_unlock_asm(volatile int *ptr);
> +
> +static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
> +{
> + return __raw_spin_is_locked_asm(&lock->lock);
> +}
> +
> +static inline void __raw_spin_lock(raw_spinlock_t *lock)
> +{
> + __raw_spin_lock_asm(&lock->lock);
> +}
> +
> +#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
> +
> +static inline int __raw_spin_trylock(raw_spinlock_t *lock)
> +{
> + return __raw_spin_trylock_asm(&lock->lock);
> +}
> +
> +static inline void __raw_spin_unlock(raw_spinlock_t *lock)
> +{
> + __raw_spin_unlock_asm(&lock->lock);
> +}
> +
> +static inline void __raw_spin_unlock_wait(raw_spinlock_t *lock)
> +{
> + while (__raw_spin_is_locked(lock))
> + cpu_relax();
> +}
> +
> +static inline int __raw_read_can_lock(raw_rwlock_t *rw)
> +{
> + return __raw_uncached_fetch_asm(&rw->lock) > 0;
> +}
> +
> +static inline int __raw_write_can_lock(raw_rwlock_t *rw)
> +{
> + return __raw_uncached_fetch_asm(&rw->lock) == RW_LOCK_BIAS;
> +}
> +
> +static inline void __raw_read_lock(raw_rwlock_t *rw)
> +{
> + __raw_read_lock_asm(&rw->lock);
> +}
> +
> +static inline int __raw_read_trylock(raw_rwlock_t *rw)
> +{
> + return __raw_read_trylock_asm(&rw->lock);
> +}
> +
> +static inline void __raw_read_unlock(raw_rwlock_t *rw)
> +{
> + __raw_read_unlock_asm(&rw->lock);
> +}
> +
> +static inline void __raw_write_lock(raw_rwlock_t *rw)
> +{
> + __raw_write_lock_asm(&rw->lock);
> +}
> +
> +static inline int __raw_write_trylock(raw_rwlock_t *rw)
> +{
> + return __raw_write_trylock_asm(&rw->lock);
> +}
> +
> +static inline void __raw_write_unlock(raw_rwlock_t *rw)
> +{
> + __raw_write_unlock_asm(&rw->lock);
> +}
> +
> +#define _raw_spin_relax(lock) cpu_relax()
> +#define _raw_read_relax(lock) cpu_relax()
> +#define _raw_write_relax(lock) cpu_relax()
> +
> +#endif /* !__BFIN_SPINLOCK_H */
> diff --git a/arch/blackfin/include/asm/spinlock_types.h b/arch/blackfin/include/asm/spinlock_types.h
> new file mode 100644
> index 0000000..b1e3c4c
> --- /dev/null
> +++ b/arch/blackfin/include/asm/spinlock_types.h
> @@ -0,0 +1,22 @@
> +#ifndef __ASM_SPINLOCK_TYPES_H
> +#define __ASM_SPINLOCK_TYPES_H
> +
> +#ifndef __LINUX_SPINLOCK_TYPES_H
> +# error "please don't include this file directly"
> +#endif
> +
> +#include <asm/rwlock.h>
> +
> +typedef struct {
> + volatile unsigned int lock;
> +} raw_spinlock_t;
> +
> +#define __RAW_SPIN_LOCK_UNLOCKED { 0 }
> +
> +typedef struct {
> + volatile unsigned int lock;
> +} raw_rwlock_t;
> +
> +#define __RAW_RW_LOCK_UNLOCKED { RW_LOCK_BIAS }
> +
> +#endif
> diff --git a/arch/blackfin/include/asm/system.h b/arch/blackfin/include/asm/system.h
> index 8f1627d..6b368fa 100644
> --- a/arch/blackfin/include/asm/system.h
> +++ b/arch/blackfin/include/asm/system.h
> @@ -37,20 +37,16 @@
> #include <linux/linkage.h>
> #include <linux/compiler.h>
> #include <mach/anomaly.h>
> +#include <asm/pda.h>
> +#include <asm/processor.h>
> +
> +/* Forward decl needed due to cdef inter dependencies */
> +static inline uint32_t __pure bfin_dspid(void);
> +#define blackfin_core_id() (bfin_dspid() & 0xff)
>
> /*
> * Interrupt configuring macros.
> */
> -
> -extern unsigned long irq_flags;
> -
> -#define local_irq_enable() \
> - __asm__ __volatile__( \
> - "sti %0;" \
> - : \
> - : "d" (irq_flags) \
> - )
> -
> #define local_irq_disable() \
> do { \
> int __tmp_dummy; \
> @@ -66,6 +62,18 @@ extern unsigned long irq_flags;
> # define NOP_PAD_ANOMALY_05000244
> #endif
>
> +#ifdef CONFIG_SMP
> +# define irq_flags cpu_pda[blackfin_core_id()].imask
> +#else
> +extern unsigned long irq_flags;
> +#endif
> +
> +#define local_irq_enable() \
> + __asm__ __volatile__( \
> + "sti %0;" \
> + : \
> + : "d" (irq_flags) \
> + )
> #define idle_with_irq_disabled() \
> __asm__ __volatile__( \
> NOP_PAD_ANOMALY_05000244 \
> @@ -129,22 +137,85 @@ extern unsigned long irq_flags;
> #define rmb() asm volatile ("" : : :"memory")
> #define wmb() asm volatile ("" : : :"memory")
> #define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
> -
> #define read_barrier_depends() do { } while(0)
>
> #ifdef CONFIG_SMP
> -#define smp_mb() mb()
> -#define smp_rmb() rmb()
> -#define smp_wmb() wmb()
> -#define smp_read_barrier_depends() read_barrier_depends()
> +asmlinkage unsigned long __raw_xchg_1_asm(volatile void *ptr, unsigned long value);
> +asmlinkage unsigned long __raw_xchg_2_asm(volatile void *ptr, unsigned long value);
> +asmlinkage unsigned long __raw_xchg_4_asm(volatile void *ptr, unsigned long value);
> +asmlinkage unsigned long __raw_cmpxchg_1_asm(volatile void *ptr,
> + unsigned long new, unsigned long old);
> +asmlinkage unsigned long __raw_cmpxchg_2_asm(volatile void *ptr,
> + unsigned long new, unsigned long old);
> +asmlinkage unsigned long __raw_cmpxchg_4_asm(volatile void *ptr,
> + unsigned long new, unsigned long old);
> +
> +#ifdef __ARCH_SYNC_CORE_DCACHE
> +# define smp_mb() do { barrier(); smp_check_barrier(); smp_mark_barrier(); } while (0)
> +# define smp_rmb() do { barrier(); smp_check_barrier(); } while (0)
> +# define smp_wmb() do { barrier(); smp_mark_barrier(); } while (0)
> #else
> +# define smp_mb() barrier()
> +# define smp_rmb() barrier()
> +# define smp_wmb() barrier()
> +#endif
> +
> +static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
> + int size)
> +{
> + unsigned long tmp;
> +
> + switch (size) {
> + case 1:
> + tmp = __raw_xchg_1_asm(ptr, x);
> + break;
> + case 2:
> + tmp = __raw_xchg_2_asm(ptr, x);
> + break;
> + case 4:
> + tmp = __raw_xchg_4_asm(ptr, x);
> + break;
> + }
> +
> + return tmp;
> +}
> +
> +/*
> + * Atomic compare and exchange. Compare OLD with MEM, if identical,
> + * store NEW in MEM. Return the initial value in MEM. Success is
> + * indicated by comparing RETURN with OLD.
> + */
> +static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
> + unsigned long new, int size)
> +{
> + unsigned long tmp;
> +
> + switch (size) {
> + case 1:
> + tmp = __raw_cmpxchg_1_asm(ptr, new, old);
> + break;
> + case 2:
> + tmp = __raw_cmpxchg_2_asm(ptr, new, old);
> + break;
> + case 4:
> + tmp = __raw_cmpxchg_4_asm(ptr, new, old);
> + break;
> + }
> +
> + return tmp;
> +}
> +#define cmpxchg(ptr, o, n) \
> + ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
> + (unsigned long)(n), sizeof(*(ptr))))
> +
> +#define smp_read_barrier_depends() smp_check_barrier()
> +
> +#else /* !CONFIG_SMP */
> +
> #define smp_mb() barrier()
> #define smp_rmb() barrier()
> #define smp_wmb() barrier()
> #define smp_read_barrier_depends() do { } while(0)
> -#endif
> -
> -#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
>
> struct __xchg_dummy {
> unsigned long a[100];
> @@ -194,9 +265,12 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
> (unsigned long)(n), sizeof(*(ptr))))
> #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
>
> -#ifndef CONFIG_SMP
> #include <asm-generic/cmpxchg.h>
> -#endif
> +
> +#endif /* !CONFIG_SMP */
> +
> +#define xchg(ptr, x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))))
> +#define tas(ptr) ((void)xchg((ptr), 1))
>
> #define prepare_to_switch() do { } while(0)
>
> @@ -218,4 +292,4 @@ do { \
> (last) = resume (prev, next); \
> } while (0)
>
> -#endif /* _BLACKFIN_SYSTEM_H */
> +#endif /* _BLACKFIN_SYSTEM_H */
> diff --git a/arch/blackfin/mach-common/Makefile b/arch/blackfin/mach-common/Makefile
> index e6ed57c..9388b4a 100644
> --- a/arch/blackfin/mach-common/Makefile
> +++ b/arch/blackfin/mach-common/Makefile
> @@ -10,3 +10,4 @@ obj-$(CONFIG_BFIN_ICACHE_LOCK) += lock.o
> obj-$(CONFIG_PM) += pm.o dpmc_modes.o
> obj-$(CONFIG_CPU_FREQ) += cpufreq.o
> obj-$(CONFIG_CPU_VOLTAGE) += dpmc.o
> +obj-$(CONFIG_SMP) += smp.o
> diff --git a/arch/blackfin/mach-common/cache.S b/arch/blackfin/mach-common/cache.S
> index 3c98dac..1187512 100644
> --- a/arch/blackfin/mach-common/cache.S
> +++ b/arch/blackfin/mach-common/cache.S
> @@ -97,3 +97,39 @@ ENTRY(_blackfin_dflush_page)
> P1 = 1 << (PAGE_SHIFT - L1_CACHE_SHIFT);
> jump .Ldfr;
> ENDPROC(_blackfin_dflush_page)
> +
> +/* Invalidate the Entire Data cache by
> + * clearing DMC[1:0] bits
> + */
> +ENTRY(_blackfin_invalidate_entire_dcache)
> + [--SP] = ( R7:5);
> +
> + P0.L = LO(DMEM_CONTROL);
> + P0.H = HI(DMEM_CONTROL);
> + R7 = [P0];
> + R5 = R7; /* Save DMEM_CNTR */
> +
> + /* Clear the DMC[1:0] bits, All valid bits in the data
> + * cache are set to the invalid state
> + */
> + BITCLR(R7,DMC0_P);
> + BITCLR(R7,DMC1_P);
> + CLI R6;
> + SSYNC; /* SSYNC required before writing to DMEM_CONTROL. */
> + .align 8;
> + [P0] = R7;
> + SSYNC;
> + STI R6;
> +
> + /* Configures the data cache again */
> +
> + CLI R6;
> + SSYNC; /* SSYNC required before writing to DMEM_CONTROL. */
> + .align 8;
> + [P0] = R5;
> + SSYNC;
> + STI R6;
> +
> + ( R7:5) = [SP++];
> + RTS;
> +ENDPROC(_blackfin_invalidate_entire_dcache)
> diff --git a/arch/blackfin/mach-common/entry.S b/arch/blackfin/mach-common/entry.S
> index c6ae844..5531f49 100644
> --- a/arch/blackfin/mach-common/entry.S
> +++ b/arch/blackfin/mach-common/entry.S
> @@ -36,6 +36,7 @@
> #include <linux/init.h>
> #include <linux/linkage.h>
> #include <linux/unistd.h>
> +#include <linux/threads.h>
> #include <asm/blackfin.h>
> #include <asm/errno.h>
> #include <asm/fixed_code.h>
> @@ -75,11 +76,11 @@ ENTRY(_ex_workaround_261)
> * handle it.
> */
> P4 = R7; /* Store EXCAUSE */
> - p5.l = _last_cplb_fault_retx;
> - p5.h = _last_cplb_fault_retx;
> - r7 = [p5];
> +
> + GET_PDA(p5, r7);
> + r7 = [p5 + PDA_LFRETX];
> r6 = retx;
> - [p5] = r6;
> + [p5 + PDA_LFRETX] = r6;
> cc = r6 == r7;
> if !cc jump _bfin_return_from_exception;
> /* fall through */
> @@ -324,7 +325,9 @@ ENTRY(_ex_trap_c)
> [p4] = p5;
> csync;
>
> + GET_PDA(p5, r6);
> #ifndef CONFIG_DEBUG_DOUBLEFAULT
> +
> /*
> * Save these registers, as they are only valid in exception context
> * (where we are now - as soon as we defer to IRQ5, they can change)
> @@ -335,29 +338,25 @@ ENTRY(_ex_trap_c)
> p4.l = lo(DCPLB_FAULT_ADDR);
> p4.h = hi(DCPLB_FAULT_ADDR);
> r7 = [p4];
> - p5.h = _saved_dcplb_fault_addr;
> - p5.l = _saved_dcplb_fault_addr;
> - [p5] = r7;
> + [p5 + PDA_DCPLB] = r7;
>
> - r7 = [p4 + (ICPLB_FAULT_ADDR - DCPLB_FAULT_ADDR)];
> - p5.h = _saved_icplb_fault_addr;
> - p5.l = _saved_icplb_fault_addr;
> - [p5] = r7;
> + p4.l = lo(ICPLB_FAULT_ADDR);
> + p4.h = hi(ICPLB_FAULT_ADDR);
> + r6 = [p4];
> + [p5 + PDA_ICPLB] = r6;
>
> r6 = retx;
> - p4.l = _saved_retx;
> - p4.h = _saved_retx;
> - [p4] = r6;
> + [p5 + PDA_RETX] = r6;
> #endif
> r6 = SYSCFG;
> - [p4 + 4] = r6;
> + [p5 + PDA_SYSCFG] = r6;
> BITCLR(r6, 0);
> SYSCFG = r6;
>
> /* Disable all interrupts, but make sure level 5 is enabled so
> * we can switch to that level. Save the old mask. */
> cli r6;
> - [p4 + 8] = r6;
> + [p5 + PDA_EXIMASK] = r6;
>
> p4.l = lo(SAFE_USER_INSTRUCTION);
> p4.h = hi(SAFE_USER_INSTRUCTION);
> @@ -424,17 +423,16 @@ ENDPROC(_double_fault)
> ENTRY(_exception_to_level5)
> SAVE_ALL_SYS
>
> - p4.l = _saved_retx;
> - p4.h = _saved_retx;
> - r6 = [p4];
> + GET_PDA(p4, r7); /* Fetch current PDA */
> + r6 = [p4 + PDA_RETX];
> [sp + PT_PC] = r6;
>
> - r6 = [p4 + 4];
> + r6 = [p4 + PDA_SYSCFG];
> [sp + PT_SYSCFG] = r6;
>
> /* Restore interrupt mask. We haven't pushed RETI, so this
> * doesn't enable interrupts until we return from this handler. */
> - r6 = [p4 + 8];
> + r6 = [p4 + PDA_EXIMASK];
> sti r6;
>
> /* Restore the hardware error vector. */
> @@ -478,8 +476,8 @@ ENTRY(_trap) /* Exception: 4th entry into system event table(supervisor mode)*/
> * scratch register (for want of a better option).
> */
> EX_SCRATCH_REG = sp;
> - sp.l = _exception_stack_top;
> - sp.h = _exception_stack_top;
> + GET_PDA_SAFE(sp);
> + sp = [sp + PDA_EXSTACK]
> /* Try to deal with syscalls quickly. */
> [--sp] = ASTAT;
> [--sp] = (R7:6,P5:4);
> @@ -501,27 +499,22 @@ ENTRY(_trap) /* Exception: 4th entry into system event table(supervisor mode)*/
> * but they are not very interesting, so don't save them
> */
>
> + GET_PDA(p5, r7);
> p4.l = lo(DCPLB_FAULT_ADDR);
> p4.h = hi(DCPLB_FAULT_ADDR);
> r7 = [p4];
> - p5.h = _saved_dcplb_fault_addr;
> - p5.l = _saved_dcplb_fault_addr;
> - [p5] = r7;
> + [p5 + PDA_DCPLB] = r7;
>
> - r7 = [p4 + (ICPLB_FAULT_ADDR - DCPLB_FAULT_ADDR)];
> - p5.h = _saved_icplb_fault_addr;
> - p5.l = _saved_icplb_fault_addr;
> - [p5] = r7;
> + p4.l = lo(ICPLB_FAULT_ADDR);
> + p4.h = hi(ICPLB_FAULT_ADDR);
> + r7 = [p4];
> + [p5 + PDA_ICPLB] = r7;
>
> - p4.l = _saved_retx;
> - p4.h = _saved_retx;
> r6 = retx;
> - [p4] = r6;
> + [p5 + PDA_RETX] = r6;
>
> r7 = SEQSTAT; /* reason code is in bit 5:0 */
> - p4.l = _saved_seqstat;
> - p4.h = _saved_seqstat;
> - [p4] = r7;
> + [p5 + PDA_SEQSTAT] = r7;
> #else
> r7 = SEQSTAT; /* reason code is in bit 5:0 */
> #endif
> @@ -546,11 +539,11 @@ ENTRY(_kernel_execve)
> p0 = sp;
> r3 = SIZEOF_PTREGS / 4;
> r4 = 0(x);
> -0:
> +.Lclear_regs:
> [p0++] = r4;
> r3 += -1;
> cc = r3 == 0;
> - if !cc jump 0b (bp);
> + if !cc jump .Lclear_regs (bp);
>
> p0 = sp;
> sp += -16;
> @@ -558,7 +551,7 @@ ENTRY(_kernel_execve)
> call _do_execve;
> SP += 16;
> cc = r0 == 0;
> - if ! cc jump 1f;
> + if ! cc jump .Lexecve_failed;
> /* Success. Copy our temporary pt_regs to the top of the kernel
> * stack and do a normal exception return.
> */
> @@ -574,12 +567,12 @@ ENTRY(_kernel_execve)
> p0 = fp;
> r4 = [p0--];
> r3 = SIZEOF_PTREGS / 4;
> -0:
> +.Lcopy_regs:
> r4 = [p0--];
> [p1--] = r4;
> r3 += -1;
> cc = r3 == 0;
> - if ! cc jump 0b (bp);
> + if ! cc jump .Lcopy_regs (bp);
>
> r0 = (KERNEL_STACK_SIZE - SIZEOF_PTREGS) (z);
> p1 = r0;
> @@ -591,7 +584,7 @@ ENTRY(_kernel_execve)
>
> RESTORE_CONTEXT;
> rti;
> -1:
> +.Lexecve_failed:
> unlink;
> rts;
> ENDPROC(_kernel_execve)
> @@ -925,9 +918,14 @@ _schedule_and_signal_from_int:
> p1 = rets;
> [sp + PT_RESERVED] = p1;
>
> +#ifdef CONFIG_SMP
> + GET_PDA(p0, r0); /* Fetch current PDA (can't migrate to other CPU here) */
> + r0 = [p0 + PDA_IRQFLAGS];
> +#else
> p0.l = _irq_flags;
> p0.h = _irq_flags;
> r0 = [p0];
> +#endif
> sti r0;
>
> r0 = sp;
> @@ -1539,12 +1537,6 @@ ENTRY(_sys_call_table)
> .endr
> END(_sys_call_table)
>
> -#if ANOMALY_05000261
> -/* Used by the assembly entry point to work around an anomaly. */
> -_last_cplb_fault_retx:
> - .long 0;
> -#endif
> -
> #ifdef CONFIG_EXCEPTION_L1_SCRATCH
> /* .section .l1.bss.scratch */
> .set _exception_stack_top, L1_SCRATCH_START + L1_SCRATCH_LENGTH
> @@ -1554,8 +1546,8 @@ _last_cplb_fault_retx:
> #else
> .bss
> #endif
> -_exception_stack:
> - .rept 1024
> +ENTRY(_exception_stack)
> + .rept 1024 * NR_CPUS
> .long 0
> .endr
> _exception_stack_top:
> diff --git a/arch/blackfin/mach-common/head.S b/arch/blackfin/mach-common/head.S
> index c1dcaeb..a621ae4 100644
> --- a/arch/blackfin/mach-common/head.S
> +++ b/arch/blackfin/mach-common/head.S
> @@ -13,6 +13,7 @@
> #include <asm/blackfin.h>
> #include <asm/thread_info.h>
> #include <asm/trace.h>
> +#include <asm/asm-offsets.h>
>
> __INIT
>
> @@ -111,33 +112,26 @@ ENTRY(__start)
> * This happens here, since L1 gets clobbered
> * below
> */
> - p0.l = _saved_retx;
> - p0.h = _saved_retx;
> + GET_PDA(p0, r0);
> + r7 = [p0 + PDA_RETX];
> p1.l = _init_saved_retx;
> p1.h = _init_saved_retx;
> - r0 = [p0];
> - [p1] = r0;
> + [p1] = r7;
>
> - p0.l = _saved_dcplb_fault_addr;
> - p0.h = _saved_dcplb_fault_addr;
> + r7 = [p0 + PDA_DCPLB];
> p1.l = _init_saved_dcplb_fault_addr;
> p1.h = _init_saved_dcplb_fault_addr;
> - r0 = [p0];
> - [p1] = r0;
> + [p1] = r7;
>
> - p0.l = _saved_icplb_fault_addr;
> - p0.h = _saved_icplb_fault_addr;
> + r7 = [p0 + PDA_ICPLB];
> p1.l = _init_saved_icplb_fault_addr;
> p1.h = _init_saved_icplb_fault_addr;
> - r0 = [p0];
> - [p1] = r0;
> + [p1] = r7;
>
> - p0.l = _saved_seqstat;
> - p0.h = _saved_seqstat;
> + r7 = [p0 + PDA_SEQSTAT];
> p1.l = _init_saved_seqstat;
> p1.h = _init_saved_seqstat;
> - r0 = [p0];
> - [p1] = r0;
> + [p1] = r7;
> #endif
>
> /* Initialize stack pointer */
> @@ -255,6 +249,9 @@ ENTRY(_real_start)
> sp = sp + p1;
> usp = sp;
> fp = sp;
> + sp += -12;
> + call _init_pda
> + sp += 12;
> jump.l _start_kernel;
> ENDPROC(_real_start)
>
> diff --git a/arch/blackfin/mach-common/ints-priority.c b/arch/blackfin/mach-common/ints-priority.c
> index d45d0c5..eb8dfcf 100644
> --- a/arch/blackfin/mach-common/ints-priority.c
> +++ b/arch/blackfin/mach-common/ints-priority.c
> @@ -55,6 +55,7 @@
> * -
> */
>
> +#ifndef CONFIG_SMP
> /* Initialize this to an actual value to force it into the .data
> * section so that we know it is properly initialized at entry into
> * the kernel but before bss is initialized to zero (which is where
> @@ -63,6 +64,7 @@
> */
> unsigned long irq_flags = 0x1f;
> EXPORT_SYMBOL(irq_flags);
> +#endif
>
> /* The number of spurious interrupts */
> atomic_t num_spurious;
> @@ -163,6 +165,10 @@ static void bfin_internal_mask_irq(unsigned int irq)
> mask_bit = SIC_SYSIRQ(irq) % 32;
> bfin_write_SIC_IMASK(mask_bank, bfin_read_SIC_IMASK(mask_bank) &
> ~(1 << mask_bit));
> +#ifdef CONFIG_SMP
> + bfin_write_SICB_IMASK(mask_bank, bfin_read_SICB_IMASK(mask_bank) &
> + ~(1 << mask_bit));
> +#endif
> #endif
> }
>
> @@ -177,6 +183,10 @@ static void bfin_internal_unmask_irq(unsigned int irq)
> mask_bit = SIC_SYSIRQ(irq) % 32;
> bfin_write_SIC_IMASK(mask_bank, bfin_read_SIC_IMASK(mask_bank) |
> (1 << mask_bit));
> +#ifdef CONFIG_SMP
> + bfin_write_SICB_IMASK(mask_bank, bfin_read_SICB_IMASK(mask_bank) |
> + (1 << mask_bit));
> +#endif
> #endif
> }
>
> @@ -896,7 +906,7 @@ static struct irq_chip bfin_gpio_irqchip = {
> #endif
> };
>
> -void __init init_exception_vectors(void)
> +void __cpuinit init_exception_vectors(void)
> {
> /* cannot program in software:
> * evt0 - emulation (jtag)
> @@ -935,6 +945,10 @@ int __init init_arch_irq(void)
> # ifdef CONFIG_BF54x
> bfin_write_SIC_IMASK2(SIC_UNMASK_ALL);
> # endif
> +# ifdef CONFIG_SMP
> + bfin_write_SICB_IMASK0(SIC_UNMASK_ALL);
> + bfin_write_SICB_IMASK1(SIC_UNMASK_ALL);
> +# endif
> #else
> bfin_write_SIC_IMASK(SIC_UNMASK_ALL);
> #endif
> @@ -995,6 +1009,17 @@ int __init init_arch_irq(void)
>
> break;
> #endif
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> + case IRQ_TIMER0:
> + set_irq_handler(irq, handle_percpu_irq);
> + break;
> +#endif
> +#ifdef CONFIG_SMP
> + case IRQ_SUPPLE_0:
> + case IRQ_SUPPLE_1:
> + set_irq_handler(irq, handle_percpu_irq);
> + break;
> +#endif
> default:
> set_irq_handler(irq, handle_simple_irq);
> break;
> @@ -1029,7 +1054,7 @@ int __init init_arch_irq(void)
> search_IAR();
>
> /* Enable interrupts IVG7-15 */
> - irq_flags = irq_flags | IMASK_IVG15 |
> + irq_flags |= IMASK_IVG15 |
> IMASK_IVG14 | IMASK_IVG13 | IMASK_IVG12 | IMASK_IVG11 |
> IMASK_IVG10 | IMASK_IVG9 | IMASK_IVG8 | IMASK_IVG7 | IMASK_IVGHW;
>
> @@ -1070,8 +1095,16 @@ void do_irq(int vec, struct pt_regs *fp)
> || defined(BF538_FAMILY) || defined(CONFIG_BF51x)
> unsigned long sic_status[3];
>
> - sic_status[0] = bfin_read_SIC_ISR0() & bfin_read_SIC_IMASK0();
> - sic_status[1] = bfin_read_SIC_ISR1() & bfin_read_SIC_IMASK1();
> + if (smp_processor_id()) {
> +#ifdef CONFIG_SMP
> + /* This will be optimized out in UP mode. */
> + sic_status[0] = bfin_read_SICB_ISR0() & bfin_read_SICB_IMASK0();
> + sic_status[1] = bfin_read_SICB_ISR1() & bfin_read_SICB_IMASK1();
> +#endif
> + } else {
> + sic_status[0] = bfin_read_SIC_ISR0() & bfin_read_SIC_IMASK0();
> + sic_status[1] = bfin_read_SIC_ISR1() & bfin_read_SIC_IMASK1();
> + }
> #ifdef CONFIG_BF54x
> sic_status[2] = bfin_read_SIC_ISR2() & bfin_read_SIC_IMASK2();
> #endif
> diff --git a/arch/blackfin/mach-common/smp.c b/arch/blackfin/mach-common/smp.c
> new file mode 100644
> index 0000000..7aeeced
> --- /dev/null
> +++ b/arch/blackfin/mach-common/smp.c
> @@ -0,0 +1,476 @@
> +/*
> + * File: arch/blackfin/mach-common/smp.c
> + * Author: Philippe Gerum <[email protected]>
> + * IPI management based on arch/arm/kernel/smp.c.
> + *
> + * Copyright 2007 Analog Devices Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see the file COPYING, or write
> + * to the Free Software Foundation, Inc.,
> + * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <linux/module.h>
> +#include <linux/delay.h>
> +#include <linux/init.h>
> +#include <linux/spinlock.h>
> +#include <linux/sched.h>
> +#include <linux/interrupt.h>
> +#include <linux/cache.h>
> +#include <linux/profile.h>
> +#include <linux/errno.h>
> +#include <linux/mm.h>
> +#include <linux/cpu.h>
> +#include <linux/smp.h>
> +#include <linux/seq_file.h>
> +#include <linux/irq.h>
> +#include <asm/atomic.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/pgtable.h>
> +#include <asm/pgalloc.h>
> +#include <asm/processor.h>
> +#include <asm/ptrace.h>
> +#include <asm/cpu.h>
> +#include <linux/err.h>
> +
> +struct corelock_slot corelock __attribute__ ((__section__(".l2.bss")));
> +
> +void __cpuinitdata *init_retx_coreb, *init_saved_retx_coreb,
> + *init_saved_seqstat_coreb, *init_saved_icplb_fault_addr_coreb,
> + *init_saved_dcplb_fault_addr_coreb;
> +
> +cpumask_t cpu_possible_map;
> +EXPORT_SYMBOL(cpu_possible_map);
> +
> +cpumask_t cpu_online_map;
> +EXPORT_SYMBOL(cpu_online_map);
> +
> +#define BFIN_IPI_RESCHEDULE 0
> +#define BFIN_IPI_CALL_FUNC 1
> +#define BFIN_IPI_CPU_STOP 2
> +
> +struct blackfin_flush_data {
> + unsigned long start;
> + unsigned long end;
> +};
> +
> +void *secondary_stack;
> +
> +
> +struct smp_call_struct {
> + void (*func)(void *info);
> + void *info;
> + int wait;
> + cpumask_t pending;
> + cpumask_t waitmask;
> +};
> +
> +static struct blackfin_flush_data smp_flush_data;
> +
> +static DEFINE_SPINLOCK(stop_lock);
> +
> +struct ipi_message {
> + struct list_head list;
> + unsigned long type;
> + struct smp_call_struct call_struct;
> +};
> +
> +struct ipi_message_queue {
> + struct list_head head;
> + spinlock_t lock;
> + unsigned long count;
> +};
> +
> +static DEFINE_PER_CPU(struct ipi_message_queue, ipi_msg_queue);
> +
> +static void ipi_cpu_stop(unsigned int cpu)
> +{
> + spin_lock(&stop_lock);
> + printk(KERN_CRIT "CPU%u: stopping\n", cpu);
> + dump_stack();
> + spin_unlock(&stop_lock);
> +
> + cpu_clear(cpu, cpu_online_map);
> +
> + local_irq_disable();
> +
> + while (1)
> + SSYNC();
> +}
> +
> +static void ipi_flush_icache(void *info)
> +{
> + struct blackfin_flush_data *fdata = info;
> +
> + /* Invalidate the memory holding the bounds of the flushed region. */
> + blackfin_dcache_invalidate_range((unsigned long)fdata,
> + (unsigned long)fdata + sizeof(*fdata));
> +
> + blackfin_icache_flush_range(fdata->start, fdata->end);
> +}
> +
> +static void ipi_call_function(unsigned int cpu, struct ipi_message *msg)
> +{
> + int wait;
> + void (*func)(void *info);
> + void *info;
> + func = msg->call_struct.func;
> + info = msg->call_struct.info;
> + wait = msg->call_struct.wait;
> + cpu_clear(cpu, msg->call_struct.pending);
> + func(info);
> + if (wait)
> + cpu_clear(cpu, msg->call_struct.waitmask);
> + else
> + kfree(msg);
> +}
> +
> +static irqreturn_t ipi_handler(int irq, void *dev_instance)
> +{
> + struct ipi_message *msg, *mg;
> + struct ipi_message_queue *msg_queue;
> + unsigned int cpu = smp_processor_id();
> +
> + platform_clear_ipi(cpu);
> +
> + msg_queue = &__get_cpu_var(ipi_msg_queue);
> + msg_queue->count++;
> +
> + spin_lock(&msg_queue->lock);
> + list_for_each_entry_safe(msg, mg, &msg_queue->head, list) {
> + list_del(&msg->list);
> + switch (msg->type) {
> + case BFIN_IPI_RESCHEDULE:
> + /* That's the easiest one; leave it to
> + * return_from_int. */
> + kfree(msg);
> + break;
> + case BFIN_IPI_CALL_FUNC:
> + ipi_call_function(cpu, msg);
> + break;
> + case BFIN_IPI_CPU_STOP:
> + ipi_cpu_stop(cpu);
> + kfree(msg);
> + break;
> + default:
> + printk(KERN_CRIT "CPU%u: Unknown IPI message "
> + "0x%lx\n", cpu, msg->type);
> + kfree(msg);
> + break;
> + }
> + }
> + spin_unlock(&msg_queue->lock);
> + return IRQ_HANDLED;
> +}
> +
> +static void ipi_queue_init(void)
> +{
> + unsigned int cpu;
> + struct ipi_message_queue *msg_queue;
> + for_each_possible_cpu(cpu) {
> + msg_queue = &per_cpu(ipi_msg_queue, cpu);
> + INIT_LIST_HEAD(&msg_queue->head);
> + spin_lock_init(&msg_queue->lock);
> + msg_queue->count = 0;
> + }
> +}
> +
> +int smp_call_function(void (*func)(void *info), void *info, int wait)
> +{
> + unsigned int cpu;
> + cpumask_t callmap;
> + unsigned long flags;
> + struct ipi_message_queue *msg_queue;
> + struct ipi_message *msg;
> +
> + callmap = cpu_online_map;
> + cpu_clear(smp_processor_id(), callmap);
> + if (cpus_empty(callmap))
> + return 0;
> +
> + msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
> + INIT_LIST_HEAD(&msg->list);
> + msg->call_struct.func = func;
> + msg->call_struct.info = info;
> + msg->call_struct.wait = wait;
> + msg->call_struct.pending = callmap;
> + msg->call_struct.waitmask = callmap;
> + msg->type = BFIN_IPI_CALL_FUNC;
> +
> + for_each_cpu_mask(cpu, callmap) {
> + msg_queue = &per_cpu(ipi_msg_queue, cpu);
> + spin_lock_irqsave(&msg_queue->lock, flags);
> + list_add(&msg->list, &msg_queue->head);
> + spin_unlock_irqrestore(&msg_queue->lock, flags);
> + platform_send_ipi_cpu(cpu);
> + }
> + if (wait) {
> + while (!cpus_empty(msg->call_struct.waitmask))
> + blackfin_dcache_invalidate_range(
> + (unsigned long)(&msg->call_struct.waitmask),
> + (unsigned long)(&msg->call_struct.waitmask));
> + kfree(msg);
> + }
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(smp_call_function);
> +
> +int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
> + int wait)
> +{
> + unsigned int cpu = cpuid;
> + cpumask_t callmap;
> + unsigned long flags;
> + struct ipi_message_queue *msg_queue;
> + struct ipi_message *msg;
> +
> + if (cpu_is_offline(cpu))
> + return 0;
> + cpus_clear(callmap);
> + cpu_set(cpu, callmap);
> +
> + msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
> + INIT_LIST_HEAD(&msg->list);
> + msg->call_struct.func = func;
> + msg->call_struct.info = info;
> + msg->call_struct.wait = wait;
> + msg->call_struct.pending = callmap;
> + msg->call_struct.waitmask = callmap;
> + msg->type = BFIN_IPI_CALL_FUNC;
> +
> + msg_queue = &per_cpu(ipi_msg_queue, cpu);
> + spin_lock_irqsave(&msg_queue->lock, flags);
> + list_add(&msg->list, &msg_queue->head);
> + spin_unlock_irqrestore(&msg_queue->lock, flags);
> + platform_send_ipi_cpu(cpu);
> +
> + if (wait) {
> + while (!cpus_empty(msg->call_struct.waitmask))
> + blackfin_dcache_invalidate_range(
> + (unsigned long)(&msg->call_struct.waitmask),
> + (unsigned long)(&msg->call_struct.waitmask));
> + kfree(msg);
> + }
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(smp_call_function_single);
> +
> +void smp_send_reschedule(int cpu)
> +{
> + unsigned long flags;
> + struct ipi_message_queue *msg_queue;
> + struct ipi_message *msg;
> +
> + if (cpu_is_offline(cpu))
> + return;
> +
> + msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
> + memset(msg, 0, sizeof(*msg));
> + INIT_LIST_HEAD(&msg->list);
> + msg->type = BFIN_IPI_RESCHEDULE;
> +
> + msg_queue = &per_cpu(ipi_msg_queue, cpu);
> + spin_lock_irqsave(&msg_queue->lock, flags);
> + list_add(&msg->list, &msg_queue->head);
> + spin_unlock_irqrestore(&msg_queue->lock, flags);
> + platform_send_ipi_cpu(cpu);
> +
> + return;
> +}
> +
> +void smp_send_stop(void)
> +{
> + unsigned int cpu;
> + cpumask_t callmap;
> + unsigned long flags;
> + struct ipi_message_queue *msg_queue;
> + struct ipi_message *msg;
> +
> + callmap = cpu_online_map;
> + cpu_clear(smp_processor_id(), callmap);
> + if (cpus_empty(callmap))
> + return;
> +
> + msg = kmalloc(sizeof(*msg), GFP_ATOMIC);
> + memset(msg, 0, sizeof(*msg));
> + INIT_LIST_HEAD(&msg->list);
> + msg->type = BFIN_IPI_CPU_STOP;
> +
> + for_each_cpu_mask(cpu, callmap) {
> + msg_queue = &per_cpu(ipi_msg_queue, cpu);
> + spin_lock_irqsave(&msg_queue->lock, flags);
> + list_add(&msg->list, &msg_queue->head);
> + spin_unlock_irqrestore(&msg_queue->lock, flags);
> + platform_send_ipi_cpu(cpu);
> + }
> + return;
> +}
> +
> +int __cpuinit __cpu_up(unsigned int cpu)
> +{
> + struct task_struct *idle;
> + int ret;
> +
> + idle = fork_idle(cpu);
> + if (IS_ERR(idle)) {
> + printk(KERN_ERR "CPU%u: fork() failed\n", cpu);
> + return PTR_ERR(idle);
> + }
> +
> + secondary_stack = task_stack_page(idle) + THREAD_SIZE;
> + smp_wmb();
> +
> + ret = platform_boot_secondary(cpu, idle);
> +
> + if (ret) {
> + cpu_clear(cpu, cpu_present_map);
> + printk(KERN_CRIT "CPU%u: processor failed to boot (%d)\n", cpu, ret);
> + free_task(idle);
> + } else
> + cpu_set(cpu, cpu_online_map);
> +
> + secondary_stack = NULL;
> +
> + return ret;
> +}
> +
> +static void __cpuinit setup_secondary(unsigned int cpu)
> +{
> +#ifndef CONFIG_TICK_SOURCE_SYSTMR0
> + struct irq_desc *timer_desc;
> +#endif
> + unsigned long ilat;
> +
> + bfin_write_IMASK(0);
> + CSYNC();
> + ilat = bfin_read_ILAT();
> + CSYNC();
> + bfin_write_ILAT(ilat);
> + CSYNC();
> +
> + /* Reserve the PDA space for the secondary CPU. */
> + reserve_pda();
> +
> + /* Enable interrupt levels IVG7-15. IARs have been already
> + * programmed by the boot CPU. */
> + irq_flags |= IMASK_IVG15 |
> + IMASK_IVG14 | IMASK_IVG13 | IMASK_IVG12 | IMASK_IVG11 |
> + IMASK_IVG10 | IMASK_IVG9 | IMASK_IVG8 | IMASK_IVG7 | IMASK_IVGHW;
> +
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> + /* Power down the core timer, just to play safe. */
> + bfin_write_TCNTL(0);
> +
> + /* system timer0 has been setup by CoreA. */
> +#else
> + timer_desc = irq_desc + IRQ_CORETMR;
> + setup_core_timer();
> + timer_desc->chip->enable(IRQ_CORETMR);
> +#endif
> +}
> +
> +void __cpuinit secondary_start_kernel(void)
> +{
> + unsigned int cpu = smp_processor_id();
> + struct mm_struct *mm = &init_mm;
> +
> + if (_bfin_swrst & SWRST_DBL_FAULT_B) {
> + printk(KERN_EMERG "CoreB Recovering from DOUBLE FAULT event\n");
> +#ifdef CONFIG_DEBUG_DOUBLEFAULT
> + printk(KERN_EMERG " While handling exception (EXCAUSE = 0x%x) at %pF\n",
> + (int)init_saved_seqstat_coreb & SEQSTAT_EXCAUSE, init_saved_retx_coreb);
> + printk(KERN_NOTICE " DCPLB_FAULT_ADDR: %pF\n", init_saved_dcplb_fault_addr_coreb);
> + printk(KERN_NOTICE " ICPLB_FAULT_ADDR: %pF\n", init_saved_icplb_fault_addr_coreb);
> +#endif
> + printk(KERN_NOTICE " The instruction at %pF caused a double exception\n",
> + init_retx_coreb);
> + }
> +
> + /*
> + * We want the D-cache to be enabled early, in case the atomic
> + * support code emulates cache coherence (see
> + * __ARCH_SYNC_CORE_DCACHE).
> + */
> + init_exception_vectors();
> +
> + bfin_setup_caches(cpu);
> +
> + local_irq_disable();
> +
> + /* Attach the new idle task to the global mm. */
> + atomic_inc(&mm->mm_users);
> + atomic_inc(&mm->mm_count);
> + current->active_mm = mm;
> + BUG_ON(current->mm); /* Can't be, but better be safe than sorry. */
> +
> + preempt_disable();
> +
> + setup_secondary(cpu);
> +
> + local_irq_enable();
> +
> + platform_secondary_init(cpu);
> +
> + cpu_idle();
> +}
> +
> +void __init smp_prepare_boot_cpu(void)
> +{
> +}
> +
> +void __init smp_prepare_cpus(unsigned int max_cpus)
> +{
> + platform_prepare_cpus(max_cpus);
> + ipi_queue_init();
> + platform_request_ipi(&ipi_handler);
> +}
> +
> +void __init smp_cpus_done(unsigned int max_cpus)
> +{
> + unsigned long bogosum = 0;
> + unsigned int cpu;
> +
> + for_each_online_cpu(cpu)
> + bogosum += per_cpu(cpu_data, cpu).loops_per_jiffy;
> +
> + printk(KERN_INFO "SMP: Total of %d processors activated "
> + "(%lu.%02lu BogoMIPS).\n",
> + num_online_cpus(),
> + bogosum / (500000/HZ),
> + (bogosum / (5000/HZ)) % 100);
> +}
> +
> +void smp_icache_flush_range_others(unsigned long start, unsigned long end)
> +{
> + smp_flush_data.start = start;
> + smp_flush_data.end = end;
> +
> + if (smp_call_function(&ipi_flush_icache, &smp_flush_data, 1))
> + printk(KERN_WARNING "SMP: failed to run I-cache flush request on other CPUs\n");
> +}
> +EXPORT_SYMBOL_GPL(smp_icache_flush_range_others);
> +
> +#ifdef __ARCH_SYNC_CORE_DCACHE
> +unsigned long barrier_mask __attribute__ ((__section__(".l2.bss")));
> +
> +void resync_core_dcache(void)
> +{
> + unsigned int cpu = get_cpu();
> + blackfin_invalidate_entire_dcache();
> + ++per_cpu(cpu_data, cpu).dcache_invld_count;
> + put_cpu();
> +}
> +EXPORT_SYMBOL(resync_core_dcache);
> +#endif
> diff --git a/arch/blackfin/oprofile/common.c b/arch/blackfin/oprofile/common.c
> index 0f6d303..f34795a 100644
> --- a/arch/blackfin/oprofile/common.c
> +++ b/arch/blackfin/oprofile/common.c
> @@ -130,7 +130,7 @@ int __init oprofile_arch_init(struct oprofile_operations *ops)
>
> mutex_init(&pfmon_lock);
>
> - dspid = bfin_read_DSPID();
> + dspid = bfin_dspid();
>
> printk(KERN_INFO "Oprofile got the cpu id is 0x%x. \n", dspid);
>
> --
> 1.5.6.3
>

2008-11-19 07:44:59

by Bryan Wu

[permalink] [raw]
Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

On Wed, Nov 19, 2008 at 3:05 PM, Nick Piggin <[email protected]> wrote:
> On Wednesday 19 November 2008 17:56, Andrew Morton wrote:
>> On Tue, 18 Nov 2008 17:05:05 +0800 Bryan Wu <[email protected]> wrote:
>
>> > +#define smp_mb__before_clear_bit() barrier()
>> > +#define smp_mb__after_clear_bit() barrier()
>> > +
>> > +static inline void __set_bit(int nr, volatile unsigned long *addr)
>> > +{
>> > + int *a = (int *)addr;
>> > + int mask;
>> > +
>> > + a += nr >> 5;
>> > + mask = 1 << (nr & 0x1f);
>> > + *a |= mask;
>> > +}
>> > +
>> > +static inline void __clear_bit(int nr, volatile unsigned long *addr)
>> > +{
>> > + int *a = (int *)addr;
>> > + int mask;
>> > +
>> > + a += nr >> 5;
>> > + mask = 1 << (nr & 0x1f);
>> > + *a &= ~mask;
>> > +}
>> > +
>> > +static inline void __change_bit(int nr, volatile unsigned long *addr)
>> > +{
>> > + int mask;
>> > + unsigned long *ADDR = (unsigned long *)addr;
>> > +
>> > + ADDR += nr >> 5;
>> > + mask = 1 << (nr & 31);
>> > + *ADDR ^= mask;
>> > +}
>>
>> I'm surprised there isn't any generic code which can be used for the above.
>
> include/asm-generic/bitops/non-atomic.h
>
>
>> > ...
>>
>> Gad what a lot of code. I don't think I have time to read it all, sorry.
>
> :) I don't know who is expected to. Cc'ing linux-arch for something
> like this might attract some helpful comments.

Yeah, I posted it to linux-arch.
Thanks
-Bryan

2008-11-19 07:46:20

by Bryan Wu

[permalink] [raw]
Subject: Re: [PATCH 3/5] Blackfin arch: SMP supporting patchset: Blackfin CPLB related code

Cc'ing linux-arch.
-Bryan

On Tue, Nov 18, 2008 at 5:05 PM, Bryan Wu <[email protected]> wrote:
> From: Graf Yang <[email protected]>
>
> Blackfin dual core BF561 processor can support SMP like features.
> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> In this patch, we provide SMP extend to Blackfin CPLB related code
>
> Signed-off-by: Graf Yang <[email protected]>
> Signed-off-by: Bryan Wu <[email protected]>
> ---
> arch/blackfin/include/asm/cplb-mpu.h | 15 ++--
> arch/blackfin/include/asm/cplb.h | 21 +++---
> arch/blackfin/include/asm/cplbinit.h | 57 ++++++++++++---
> arch/blackfin/include/asm/mmu_context.h | 27 +++++--
> arch/blackfin/kernel/cplb-mpu/cacheinit.c | 4 +-
> arch/blackfin/kernel/cplb-mpu/cplbinfo.c | 43 +++++++----
> arch/blackfin/kernel/cplb-mpu/cplbinit.c | 43 ++++++------
> arch/blackfin/kernel/cplb-mpu/cplbmgr.c | 102 ++++++++++++++-------------
> arch/blackfin/kernel/cplb-nompu/cacheinit.c | 9 ++-
> arch/blackfin/kernel/cplb-nompu/cplbinfo.c | 55 +++++++++------
> arch/blackfin/kernel/cplb-nompu/cplbinit.c | 89 +++++++++---------------
> arch/blackfin/kernel/cplb-nompu/cplbmgr.S | 29 ++++----
> 12 files changed, 275 insertions(+), 219 deletions(-)
>
> diff --git a/arch/blackfin/include/asm/cplb-mpu.h b/arch/blackfin/include/asm/cplb-mpu.h
> index 75c67b9..80680ad 100644
> --- a/arch/blackfin/include/asm/cplb-mpu.h
> +++ b/arch/blackfin/include/asm/cplb-mpu.h
> @@ -28,6 +28,7 @@
> */
> #ifndef __ASM_BFIN_CPLB_MPU_H
> #define __ASM_BFIN_CPLB_MPU_H
> +#include <linux/threads.h>
>
> struct cplb_entry {
> unsigned long data, addr;
> @@ -39,22 +40,22 @@ struct mem_region {
> unsigned long icplb_data;
> };
>
> -extern struct cplb_entry dcplb_tbl[MAX_CPLBS];
> -extern struct cplb_entry icplb_tbl[MAX_CPLBS];
> +extern struct cplb_entry dcplb_tbl[NR_CPUS][MAX_CPLBS];
> +extern struct cplb_entry icplb_tbl[NR_CPUS][MAX_CPLBS];
> extern int first_switched_icplb;
> extern int first_mask_dcplb;
> extern int first_switched_dcplb;
>
> -extern int nr_dcplb_miss, nr_icplb_miss, nr_icplb_supv_miss, nr_dcplb_prot;
> -extern int nr_cplb_flush;
> +extern int nr_dcplb_miss[], nr_icplb_miss[], nr_icplb_supv_miss[];
> +extern int nr_dcplb_prot[], nr_cplb_flush[];
>
> extern int page_mask_order;
> extern int page_mask_nelts;
>
> -extern unsigned long *current_rwx_mask;
> +extern unsigned long *current_rwx_mask[NR_CPUS];
>
> -extern void flush_switched_cplbs(void);
> -extern void set_mask_dcplbs(unsigned long *);
> +extern void flush_switched_cplbs(unsigned int);
> +extern void set_mask_dcplbs(unsigned long *, unsigned int);
>
> extern void __noreturn panic_cplb_error(int seqstat, struct pt_regs *);
>
> diff --git a/arch/blackfin/include/asm/cplb.h b/arch/blackfin/include/asm/cplb.h
> index 9e8b403..5f7545d 100644
> --- a/arch/blackfin/include/asm/cplb.h
> +++ b/arch/blackfin/include/asm/cplb.h
> @@ -30,7 +30,6 @@
> #ifndef _CPLB_H
> #define _CPLB_H
>
> -#include <asm/blackfin.h>
> #include <mach/anomaly.h>
>
> #define SDRAM_IGENERIC (CPLB_L1_CHBL | CPLB_USER_RD | CPLB_VALID | CPLB_PORTPRIO)
> @@ -55,13 +54,24 @@
> #endif
>
> #define L1_DMEMORY (CPLB_LOCK | CPLB_COMMON)
> +
> +#ifdef CONFIG_SMP
> +#define L2_ATTR (INITIAL_T | I_CPLB | D_CPLB)
> +#define L2_IMEMORY (CPLB_COMMON | CPLB_LOCK)
> +#define L2_DMEMORY (CPLB_COMMON | CPLB_LOCK)
> +
> +#else
> #ifdef CONFIG_BFIN_L2_CACHEABLE
> #define L2_IMEMORY (SDRAM_IGENERIC)
> #define L2_DMEMORY (SDRAM_DGENERIC)
> #else
> #define L2_IMEMORY (CPLB_COMMON)
> #define L2_DMEMORY (CPLB_COMMON)
> -#endif
> +#endif /* CONFIG_BFIN_L2_CACHEABLE */
> +
> +#define L2_ATTR (INITIAL_T | SWITCH_T | I_CPLB | D_CPLB)
> +#endif /* CONFIG_SMP */
> +
> #define SDRAM_DNON_CHBL (CPLB_COMMON)
> #define SDRAM_EBIU (CPLB_COMMON)
> #define SDRAM_OOPS (CPLB_VALID | ANOMALY_05000158_WORKAROUND | CPLB_LOCK | CPLB_DIRTY)
> @@ -71,14 +81,7 @@
> #define SIZE_1M 0x00100000 /* 1M */
> #define SIZE_4M 0x00400000 /* 4M */
>
> -#ifdef CONFIG_MPU
> #define MAX_CPLBS 16
> -#else
> -#define MAX_CPLBS (16 * 2)
> -#endif
> -
> -#define ASYNC_MEMORY_CPLB_COVERAGE ((ASYNC_BANK0_SIZE + ASYNC_BANK1_SIZE + \
> - ASYNC_BANK2_SIZE + ASYNC_BANK3_SIZE) / SIZE_4M)
>
> #define CPLB_ENABLE_ICACHE_P 0
> #define CPLB_ENABLE_DCACHE_P 1
> diff --git a/arch/blackfin/include/asm/cplbinit.h b/arch/blackfin/include/asm/cplbinit.h
> index f845b41..6bfc257 100644
> --- a/arch/blackfin/include/asm/cplbinit.h
> +++ b/arch/blackfin/include/asm/cplbinit.h
> @@ -36,6 +36,8 @@
> #ifdef CONFIG_MPU
>
> #include <asm/cplb-mpu.h>
> +extern void bfin_icache_init(struct cplb_entry *icplb_tbl);
> +extern void bfin_dcache_init(struct cplb_entry *dcplb_tbl);
>
> #else
>
> @@ -46,8 +48,40 @@
>
> #define IN_KERNEL 1
>
> -enum
> -{ZERO_P, L1I_MEM, L1D_MEM, SDRAM_KERN , SDRAM_RAM_MTD, SDRAM_DMAZ, RES_MEM, ASYNC_MEM, L2_MEM};
> +#define ASYNC_MEMORY_CPLB_COVERAGE ((ASYNC_BANK0_SIZE + ASYNC_BANK1_SIZE + \
> + ASYNC_BANK2_SIZE + ASYNC_BANK3_SIZE) / SIZE_4M)
> +
> +#define CPLB_MEM CONFIG_MAX_MEM_SIZE
> +
> +/*
> +* Number of required data CPLB switchtable entries
> +* MEMSIZE / 4 (we mostly install 4M page size CPLBs)
> +* approx 16 for smaller 1MB page size CPLBs for alignment purposes
> +* 1 for L1 Data Memory
> +* possibly 1 for L2 Data Memory
> +* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
> +* 1 for ASYNC Memory
> +*/
> +#define MAX_SWITCH_D_CPLBS (((CPLB_MEM / 4) + 16 + 1 + 1 + 1 \
> + + ASYNC_MEMORY_CPLB_COVERAGE) * 2)
> +
> +/*
> +* Number of required instruction CPLB switchtable entries
> +* MEMSIZE / 4 (we mostly install 4M page size CPLBs)
> +* approx 12 for smaller 1MB page size CPLBs for alignment purposes
> +* 1 for L1 Instruction Memory
> +* possibly 1 for L2 Instruction Memory
> +* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
> +*/
> +#define MAX_SWITCH_I_CPLBS (((CPLB_MEM / 4) + 12 + 1 + 1 + 1) * 2)
> +
> +/* Number of CPLB table entries, used for cplb-nompu. */
> +#define CPLB_TBL_ENTRIES (16 * 4)
> +
> +enum {
> + ZERO_P, L1I_MEM, L1D_MEM, L2_MEM, SDRAM_KERN, SDRAM_RAM_MTD, SDRAM_DMAZ,
> + RES_MEM, ASYNC_MEM, OCB_ROM
> +};
>
> struct cplb_desc {
> u32 start; /* start address */
> @@ -66,8 +100,8 @@ struct cplb_tab {
> u16 size;
> };
>
> -extern u_long icplb_table[];
> -extern u_long dcplb_table[];
> +extern u_long icplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];
> +extern u_long dcplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];
>
> /* Till here we are discussing about the static memory management model.
> * However, the operating envoronments commonly define more CPLB
> @@ -78,15 +112,18 @@ extern u_long dcplb_table[];
> * This is how Page descriptor Table is implemented in uClinux/Blackfin.
> */
>
> -extern u_long ipdt_table[];
> -extern u_long dpdt_table[];
> +extern u_long ipdt_tables[NR_CPUS][MAX_SWITCH_I_CPLBS+1];
> +extern u_long dpdt_tables[NR_CPUS][MAX_SWITCH_D_CPLBS+1];
> #ifdef CONFIG_CPLB_INFO
> -extern u_long ipdt_swapcount_table[];
> -extern u_long dpdt_swapcount_table[];
> +extern u_long ipdt_swapcount_tables[NR_CPUS][MAX_SWITCH_I_CPLBS];
> +extern u_long dpdt_swapcount_tables[NR_CPUS][MAX_SWITCH_D_CPLBS];
> #endif
> +extern void bfin_icache_init(u_long icplbs[]);
> +extern void bfin_dcache_init(u_long dcplbs[]);
>
> #endif /* CONFIG_MPU */
>
> -extern void generate_cplb_tables(void);
> -
> +#if defined(CONFIG_BFIN_DCACHE) || defined(CONFIG_BFIN_ICACHE)
> +extern void generate_cplb_tables_cpu(unsigned int cpu);
> +#endif
> #endif
> diff --git a/arch/blackfin/include/asm/mmu_context.h b/arch/blackfin/include/asm/mmu_context.h
> index 35593dd..944e29f 100644
> --- a/arch/blackfin/include/asm/mmu_context.h
> +++ b/arch/blackfin/include/asm/mmu_context.h
> @@ -37,6 +37,10 @@
> #include <asm/pgalloc.h>
> #include <asm/cplbinit.h>
>
> +/* Note: L1 stacks are CPU-private things, so we bluntly disable this
> + feature in SMP mode, and use the per-CPU scratch SRAM bank only to
> + store the PDA instead. */
> +
> extern void *current_l1_stack_save;
> extern int nr_l1stack_tasks;
> extern void *l1_stack_base;
> @@ -88,12 +92,15 @@ activate_l1stack(struct mm_struct *mm, unsigned long sp_base)
> static inline void switch_mm(struct mm_struct *prev_mm, struct mm_struct *next_mm,
> struct task_struct *tsk)
> {
> +#ifdef CONFIG_MPU
> + unsigned int cpu = smp_processor_id();
> +#endif
> if (prev_mm == next_mm)
> return;
> #ifdef CONFIG_MPU
> - if (prev_mm->context.page_rwx_mask == current_rwx_mask) {
> - flush_switched_cplbs();
> - set_mask_dcplbs(next_mm->context.page_rwx_mask);
> + if (prev_mm->context.page_rwx_mask == current_rwx_mask[cpu]) {
> + flush_switched_cplbs(cpu);
> + set_mask_dcplbs(next_mm->context.page_rwx_mask, cpu);
> }
> #endif
>
> @@ -138,9 +145,10 @@ static inline void protect_page(struct mm_struct *mm, unsigned long addr,
>
> static inline void update_protections(struct mm_struct *mm)
> {
> - if (mm->context.page_rwx_mask == current_rwx_mask) {
> - flush_switched_cplbs();
> - set_mask_dcplbs(mm->context.page_rwx_mask);
> + unsigned int cpu = smp_processor_id();
> + if (mm->context.page_rwx_mask == current_rwx_mask[cpu]) {
> + flush_switched_cplbs(cpu);
> + set_mask_dcplbs(mm->context.page_rwx_mask, cpu);
> }
> }
> #endif
> @@ -165,6 +173,9 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> static inline void destroy_context(struct mm_struct *mm)
> {
> struct sram_list_struct *tmp;
> +#ifdef CONFIG_MPU
> + unsigned int cpu = smp_processor_id();
> +#endif
>
> #ifdef CONFIG_APP_STACK_L1
> if (current_l1_stack_save == mm->context.l1_stack_save)
> @@ -179,8 +190,8 @@ static inline void destroy_context(struct mm_struct *mm)
> kfree(tmp);
> }
> #ifdef CONFIG_MPU
> - if (current_rwx_mask == mm->context.page_rwx_mask)
> - current_rwx_mask = NULL;
> + if (current_rwx_mask[cpu] == mm->context.page_rwx_mask)
> + current_rwx_mask[cpu] = NULL;
> free_pages((unsigned long)mm->context.page_rwx_mask, page_mask_order);
> #endif
> }
> diff --git a/arch/blackfin/kernel/cplb-mpu/cacheinit.c b/arch/blackfin/kernel/cplb-mpu/cacheinit.c
> index a8b712a..c6ff947 100644
> --- a/arch/blackfin/kernel/cplb-mpu/cacheinit.c
> +++ b/arch/blackfin/kernel/cplb-mpu/cacheinit.c
> @@ -25,7 +25,7 @@
> #include <asm/cplbinit.h>
>
> #if defined(CONFIG_BFIN_ICACHE)
> -void __init bfin_icache_init(void)
> +void __cpuinit bfin_icache_init(struct cplb_entry *icplb_tbl)
> {
> unsigned long ctrl;
> int i;
> @@ -43,7 +43,7 @@ void __init bfin_icache_init(void)
> #endif
>
> #if defined(CONFIG_BFIN_DCACHE)
> -void __init bfin_dcache_init(void)
> +void __cpuinit bfin_dcache_init(struct cplb_entry *dcplb_tbl)
> {
> unsigned long ctrl;
> int i;
> diff --git a/arch/blackfin/kernel/cplb-mpu/cplbinfo.c b/arch/blackfin/kernel/cplb-mpu/cplbinfo.c
> index 822beef..00cb2cf 100644
> --- a/arch/blackfin/kernel/cplb-mpu/cplbinfo.c
> +++ b/arch/blackfin/kernel/cplb-mpu/cplbinfo.c
> @@ -66,32 +66,32 @@ static char *cplb_print_entry(char *buf, struct cplb_entry *tbl, int switched)
> return buf;
> }
>
> -int cplbinfo_proc_output(char *buf)
> +int cplbinfo_proc_output(char *buf, void *data)
> {
> char *p;
> + unsigned int cpu = (unsigned int)data;
>
> p = buf;
>
> - p += sprintf(p, "------------------ CPLB Information ------------------\n\n");
> -
> + p += sprintf(p, "------------- CPLB Information on CPU%u --------------\n\n", cpu);
> if (bfin_read_IMEM_CONTROL() & ENICPLB) {
> p += sprintf(p, "Instruction CPLB entry:\n");
> - p = cplb_print_entry(p, icplb_tbl, first_switched_icplb);
> + p = cplb_print_entry(p, icplb_tbl[cpu], first_switched_icplb);
> } else
> p += sprintf(p, "Instruction CPLB is disabled.\n\n");
>
> if (1 || bfin_read_DMEM_CONTROL() & ENDCPLB) {
> p += sprintf(p, "Data CPLB entry:\n");
> - p = cplb_print_entry(p, dcplb_tbl, first_switched_dcplb);
> + p = cplb_print_entry(p, dcplb_tbl[cpu], first_switched_dcplb);
> } else
> p += sprintf(p, "Data CPLB is disabled.\n");
>
> p += sprintf(p, "ICPLB miss: %d\nICPLB supervisor miss: %d\n",
> - nr_icplb_miss, nr_icplb_supv_miss);
> + nr_icplb_miss[cpu], nr_icplb_supv_miss[cpu]);
> p += sprintf(p, "DCPLB miss: %d\nDCPLB protection fault:%d\n",
> - nr_dcplb_miss, nr_dcplb_prot);
> + nr_dcplb_miss[cpu], nr_dcplb_prot[cpu]);
> p += sprintf(p, "CPLB flushes: %d\n",
> - nr_cplb_flush);
> + nr_cplb_flush[cpu]);
>
> return p - buf;
> }
> @@ -101,7 +101,7 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,
> {
> int len;
>
> - len = cplbinfo_proc_output(page);
> + len = cplbinfo_proc_output(page, data);
> if (len <= off + count)
> *eof = 1;
> *start = page + off;
> @@ -115,20 +115,33 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,
>
> static int __init cplbinfo_init(void)
> {
> - struct proc_dir_entry *entry;
> + struct proc_dir_entry *parent, *entry;
> + unsigned int cpu;
> + unsigned char str[10];
> +
> + parent = proc_mkdir("cplbinfo", NULL);
>
> - entry = create_proc_entry("cplbinfo", 0, NULL);
> - if (!entry)
> - return -ENOMEM;
> + for_each_online_cpu(cpu) {
> + sprintf(str, "cpu%u", cpu);
> + entry = create_proc_entry(str, 0, parent);
> + if (!entry)
> + return -ENOMEM;
>
> - entry->read_proc = cplbinfo_read_proc;
> - entry->data = NULL;
> + entry->read_proc = cplbinfo_read_proc;
> + entry->data = (void *)cpu;
> + }
>
> return 0;
> }
>
> static void __exit cplbinfo_exit(void)
> {
> + unsigned int cpu;
> + unsigned char str[20];
> + for_each_online_cpu(cpu) {
> + sprintf(str, "cplbinfo/cpu%u", cpu);
> + remove_proc_entry(str, NULL);
> + }
> remove_proc_entry("cplbinfo", NULL);
> }
>
> diff --git a/arch/blackfin/kernel/cplb-mpu/cplbinit.c b/arch/blackfin/kernel/cplb-mpu/cplbinit.c
> index 55af729..269d2a3 100644
> --- a/arch/blackfin/kernel/cplb-mpu/cplbinit.c
> +++ b/arch/blackfin/kernel/cplb-mpu/cplbinit.c
> @@ -30,13 +30,13 @@
> # error the MPU will not function safely while Anomaly 05000263 applies
> #endif
>
> -struct cplb_entry icplb_tbl[MAX_CPLBS];
> -struct cplb_entry dcplb_tbl[MAX_CPLBS];
> +struct cplb_entry icplb_tbl[NR_CPUS][MAX_CPLBS];
> +struct cplb_entry dcplb_tbl[NR_CPUS][MAX_CPLBS];
>
> int first_switched_icplb, first_switched_dcplb;
> int first_mask_dcplb;
>
> -void __init generate_cplb_tables(void)
> +void __init generate_cplb_tables_cpu(unsigned int cpu)
> {
> int i_d, i_i;
> unsigned long addr;
> @@ -55,15 +55,16 @@ void __init generate_cplb_tables(void)
> d_cache |= CPLB_L1_AOW | CPLB_WT;
> #endif
> #endif
> +
> i_d = i_i = 0;
>
> /* Set up the zero page. */
> - dcplb_tbl[i_d].addr = 0;
> - dcplb_tbl[i_d++].data = SDRAM_OOPS | PAGE_SIZE_1KB;
> + dcplb_tbl[cpu][i_d].addr = 0;
> + dcplb_tbl[cpu][i_d++].data = SDRAM_OOPS | PAGE_SIZE_1KB;
>
> #if 0
> - icplb_tbl[i_i].addr = 0;
> - icplb_tbl[i_i++].data = i_cache | CPLB_USER_RD | PAGE_SIZE_4KB;
> + icplb_tbl[cpu][i_i].addr = 0;
> + icplb_tbl[cpu][i_i++].data = i_cache | CPLB_USER_RD | PAGE_SIZE_4KB;
> #endif
>
> /* Cover kernel memory with 4M pages. */
> @@ -72,28 +73,28 @@ void __init generate_cplb_tables(void)
> i_data = i_cache | CPLB_VALID | CPLB_PORTPRIO | PAGE_SIZE_4MB;
>
> for (; addr < memory_start; addr += 4 * 1024 * 1024) {
> - dcplb_tbl[i_d].addr = addr;
> - dcplb_tbl[i_d++].data = d_data;
> - icplb_tbl[i_i].addr = addr;
> - icplb_tbl[i_i++].data = i_data | (addr == 0 ? CPLB_USER_RD : 0);
> + dcplb_tbl[cpu][i_d].addr = addr;
> + dcplb_tbl[cpu][i_d++].data = d_data;
> + icplb_tbl[cpu][i_i].addr = addr;
> + icplb_tbl[cpu][i_i++].data = i_data | (addr == 0 ? CPLB_USER_RD : 0);
> }
>
> /* Cover L1 memory. One 4M area for code and data each is enough. */
> #if L1_DATA_A_LENGTH > 0 || L1_DATA_B_LENGTH > 0
> - dcplb_tbl[i_d].addr = L1_DATA_A_START;
> - dcplb_tbl[i_d++].data = L1_DMEMORY | PAGE_SIZE_4MB;
> + dcplb_tbl[cpu][i_d].addr = get_l1_data_a_start_cpu(cpu);
> + dcplb_tbl[cpu][i_d++].data = L1_DMEMORY | PAGE_SIZE_4MB;
> #endif
> #if L1_CODE_LENGTH > 0
> - icplb_tbl[i_i].addr = L1_CODE_START;
> - icplb_tbl[i_i++].data = L1_IMEMORY | PAGE_SIZE_4MB;
> + icplb_tbl[cpu][i_i].addr = get_l1_code_start_cpu(cpu);
> + icplb_tbl[cpu][i_i++].data = L1_IMEMORY | PAGE_SIZE_4MB;
> #endif
>
> /* Cover L2 memory */
> #if L2_LENGTH > 0
> - dcplb_tbl[i_d].addr = L2_START;
> - dcplb_tbl[i_d++].data = L2_DMEMORY | PAGE_SIZE_1MB;
> - icplb_tbl[i_i].addr = L2_START;
> - icplb_tbl[i_i++].data = L2_IMEMORY | PAGE_SIZE_1MB;
> + dcplb_tbl[cpu][i_d].addr = L2_START;
> + dcplb_tbl[cpu][i_d++].data = L2_DMEMORY | PAGE_SIZE_1MB;
> + icplb_tbl[cpu][i_i].addr = L2_START;
> + icplb_tbl[cpu][i_i++].data = L2_IMEMORY | PAGE_SIZE_1MB;
> #endif
>
> first_mask_dcplb = i_d;
> @@ -101,7 +102,7 @@ void __init generate_cplb_tables(void)
> first_switched_icplb = i_i;
>
> while (i_d < MAX_CPLBS)
> - dcplb_tbl[i_d++].data = 0;
> + dcplb_tbl[cpu][i_d++].data = 0;
> while (i_i < MAX_CPLBS)
> - icplb_tbl[i_i++].data = 0;
> + icplb_tbl[cpu][i_i++].data = 0;
> }
> diff --git a/arch/blackfin/kernel/cplb-mpu/cplbmgr.c b/arch/blackfin/kernel/cplb-mpu/cplbmgr.c
> index baa52e2..76bd991 100644
> --- a/arch/blackfin/kernel/cplb-mpu/cplbmgr.c
> +++ b/arch/blackfin/kernel/cplb-mpu/cplbmgr.c
> @@ -30,10 +30,11 @@
>
> int page_mask_nelts;
> int page_mask_order;
> -unsigned long *current_rwx_mask;
> +unsigned long *current_rwx_mask[NR_CPUS];
>
> -int nr_dcplb_miss, nr_icplb_miss, nr_icplb_supv_miss, nr_dcplb_prot;
> -int nr_cplb_flush;
> +int nr_dcplb_miss[NR_CPUS], nr_icplb_miss[NR_CPUS];
> +int nr_icplb_supv_miss[NR_CPUS], nr_dcplb_prot[NR_CPUS];
> +int nr_cplb_flush[NR_CPUS];
>
> static inline void disable_dcplb(void)
> {
> @@ -98,42 +99,42 @@ static inline int write_permitted(int status, unsigned long data)
> }
>
> /* Counters to implement round-robin replacement. */
> -static int icplb_rr_index, dcplb_rr_index;
> +static int icplb_rr_index[NR_CPUS], dcplb_rr_index[NR_CPUS];
>
> /*
> * Find an ICPLB entry to be evicted and return its index.
> */
> -static int evict_one_icplb(void)
> +static int evict_one_icplb(unsigned int cpu)
> {
> int i;
> for (i = first_switched_icplb; i < MAX_CPLBS; i++)
> - if ((icplb_tbl[i].data & CPLB_VALID) == 0)
> + if ((icplb_tbl[cpu][i].data & CPLB_VALID) == 0)
> return i;
> - i = first_switched_icplb + icplb_rr_index;
> + i = first_switched_icplb + icplb_rr_index[cpu];
> if (i >= MAX_CPLBS) {
> i -= MAX_CPLBS - first_switched_icplb;
> - icplb_rr_index -= MAX_CPLBS - first_switched_icplb;
> + icplb_rr_index[cpu] -= MAX_CPLBS - first_switched_icplb;
> }
> - icplb_rr_index++;
> + icplb_rr_index[cpu]++;
> return i;
> }
>
> -static int evict_one_dcplb(void)
> +static int evict_one_dcplb(unsigned int cpu)
> {
> int i;
> for (i = first_switched_dcplb; i < MAX_CPLBS; i++)
> - if ((dcplb_tbl[i].data & CPLB_VALID) == 0)
> + if ((dcplb_tbl[cpu][i].data & CPLB_VALID) == 0)
> return i;
> - i = first_switched_dcplb + dcplb_rr_index;
> + i = first_switched_dcplb + dcplb_rr_index[cpu];
> if (i >= MAX_CPLBS) {
> i -= MAX_CPLBS - first_switched_dcplb;
> - dcplb_rr_index -= MAX_CPLBS - first_switched_dcplb;
> + dcplb_rr_index[cpu] -= MAX_CPLBS - first_switched_dcplb;
> }
> - dcplb_rr_index++;
> + dcplb_rr_index[cpu]++;
> return i;
> }
>
> -static noinline int dcplb_miss(void)
> +static noinline int dcplb_miss(unsigned int cpu)
> {
> unsigned long addr = bfin_read_DCPLB_FAULT_ADDR();
> int status = bfin_read_DCPLB_STATUS();
> @@ -141,7 +142,7 @@ static noinline int dcplb_miss(void)
> int idx;
> unsigned long d_data;
>
> - nr_dcplb_miss++;
> + nr_dcplb_miss[cpu]++;
>
> d_data = CPLB_SUPV_WR | CPLB_VALID | CPLB_DIRTY | PAGE_SIZE_4KB;
> #ifdef CONFIG_BFIN_DCACHE
> @@ -168,25 +169,25 @@ static noinline int dcplb_miss(void)
> } else if (addr >= _ramend) {
> d_data |= CPLB_USER_RD | CPLB_USER_WR;
> } else {
> - mask = current_rwx_mask;
> + mask = current_rwx_mask[cpu];
> if (mask) {
> int page = addr >> PAGE_SHIFT;
> - int offs = page >> 5;
> + int idx = page >> 5;
> int bit = 1 << (page & 31);
>
> - if (mask[offs] & bit)
> + if (mask[idx] & bit)
> d_data |= CPLB_USER_RD;
>
> mask += page_mask_nelts;
> - if (mask[offs] & bit)
> + if (mask[idx] & bit)
> d_data |= CPLB_USER_WR;
> }
> }
> - idx = evict_one_dcplb();
> + idx = evict_one_dcplb(cpu);
>
> addr &= PAGE_MASK;
> - dcplb_tbl[idx].addr = addr;
> - dcplb_tbl[idx].data = d_data;
> + dcplb_tbl[cpu][idx].addr = addr;
> + dcplb_tbl[cpu][idx].data = d_data;
>
> disable_dcplb();
> bfin_write32(DCPLB_DATA0 + idx * 4, d_data);
> @@ -196,21 +197,21 @@ static noinline int dcplb_miss(void)
> return 0;
> }
>
> -static noinline int icplb_miss(void)
> +static noinline int icplb_miss(unsigned int cpu)
> {
> unsigned long addr = bfin_read_ICPLB_FAULT_ADDR();
> int status = bfin_read_ICPLB_STATUS();
> int idx;
> unsigned long i_data;
>
> - nr_icplb_miss++;
> + nr_icplb_miss[cpu]++;
>
> /* If inside the uncached DMA region, fault. */
> if (addr >= _ramend - DMA_UNCACHED_REGION && addr < _ramend)
> return CPLB_PROT_VIOL;
>
> if (status & FAULT_USERSUPV)
> - nr_icplb_supv_miss++;
> + nr_icplb_supv_miss[cpu]++;
>
> /*
> * First, try to find a CPLB that matches this address. If we
> @@ -218,8 +219,8 @@ static noinline int icplb_miss(void)
> * that the instruction crosses a page boundary.
> */
> for (idx = first_switched_icplb; idx < MAX_CPLBS; idx++) {
> - if (icplb_tbl[idx].data & CPLB_VALID) {
> - unsigned long this_addr = icplb_tbl[idx].addr;
> + if (icplb_tbl[cpu][idx].data & CPLB_VALID) {
> + unsigned long this_addr = icplb_tbl[cpu][idx].addr;
> if (this_addr <= addr && this_addr + PAGE_SIZE > addr) {
> addr += PAGE_SIZE;
> break;
> @@ -257,23 +258,23 @@ static noinline int icplb_miss(void)
> * Otherwise, check the x bitmap of the current process.
> */
> if (!(status & FAULT_USERSUPV)) {
> - unsigned long *mask = current_rwx_mask;
> + unsigned long *mask = current_rwx_mask[cpu];
>
> if (mask) {
> int page = addr >> PAGE_SHIFT;
> - int offs = page >> 5;
> + int idx = page >> 5;
> int bit = 1 << (page & 31);
>
> mask += 2 * page_mask_nelts;
> - if (mask[offs] & bit)
> + if (mask[idx] & bit)
> i_data |= CPLB_USER_RD;
> }
> }
> }
> - idx = evict_one_icplb();
> + idx = evict_one_icplb(cpu);
> addr &= PAGE_MASK;
> - icplb_tbl[idx].addr = addr;
> - icplb_tbl[idx].data = i_data;
> + icplb_tbl[cpu][idx].addr = addr;
> + icplb_tbl[cpu][idx].data = i_data;
>
> disable_icplb();
> bfin_write32(ICPLB_DATA0 + idx * 4, i_data);
> @@ -283,19 +284,19 @@ static noinline int icplb_miss(void)
> return 0;
> }
>
> -static noinline int dcplb_protection_fault(void)
> +static noinline int dcplb_protection_fault(unsigned int cpu)
> {
> int status = bfin_read_DCPLB_STATUS();
>
> - nr_dcplb_prot++;
> + nr_dcplb_prot[cpu]++;
>
> if (status & FAULT_RW) {
> int idx = faulting_cplb_index(status);
> - unsigned long data = dcplb_tbl[idx].data;
> + unsigned long data = dcplb_tbl[cpu][idx].data;
> if (!(data & CPLB_WT) && !(data & CPLB_DIRTY) &&
> write_permitted(status, data)) {
> data |= CPLB_DIRTY;
> - dcplb_tbl[idx].data = data;
> + dcplb_tbl[cpu][idx].data = data;
> bfin_write32(DCPLB_DATA0 + idx * 4, data);
> return 0;
> }
> @@ -306,36 +307,37 @@ static noinline int dcplb_protection_fault(void)
> int cplb_hdr(int seqstat, struct pt_regs *regs)
> {
> int cause = seqstat & 0x3f;
> + unsigned int cpu = smp_processor_id();
> switch (cause) {
> case 0x23:
> - return dcplb_protection_fault();
> + return dcplb_protection_fault(cpu);
> case 0x2C:
> - return icplb_miss();
> + return icplb_miss(cpu);
> case 0x26:
> - return dcplb_miss();
> + return dcplb_miss(cpu);
> default:
> return 1;
> }
> }
>
> -void flush_switched_cplbs(void)
> +void flush_switched_cplbs(unsigned int cpu)
> {
> int i;
> unsigned long flags;
>
> - nr_cplb_flush++;
> + nr_cplb_flush[cpu]++;
>
> local_irq_save(flags);
> disable_icplb();
> for (i = first_switched_icplb; i < MAX_CPLBS; i++) {
> - icplb_tbl[i].data = 0;
> + icplb_tbl[cpu][i].data = 0;
> bfin_write32(ICPLB_DATA0 + i * 4, 0);
> }
> enable_icplb();
>
> disable_dcplb();
> for (i = first_switched_dcplb; i < MAX_CPLBS; i++) {
> - dcplb_tbl[i].data = 0;
> + dcplb_tbl[cpu][i].data = 0;
> bfin_write32(DCPLB_DATA0 + i * 4, 0);
> }
> enable_dcplb();
> @@ -343,7 +345,7 @@ void flush_switched_cplbs(void)
>
> }
>
> -void set_mask_dcplbs(unsigned long *masks)
> +void set_mask_dcplbs(unsigned long *masks, unsigned int cpu)
> {
> int i;
> unsigned long addr = (unsigned long)masks;
> @@ -351,12 +353,12 @@ void set_mask_dcplbs(unsigned long *masks)
> unsigned long flags;
>
> if (!masks) {
> - current_rwx_mask = masks;
> + current_rwx_mask[cpu] = masks;
> return;
> }
>
> local_irq_save(flags);
> - current_rwx_mask = masks;
> + current_rwx_mask[cpu] = masks;
>
> d_data = CPLB_SUPV_WR | CPLB_VALID | CPLB_DIRTY | PAGE_SIZE_4KB;
> #ifdef CONFIG_BFIN_DCACHE
> @@ -368,8 +370,8 @@ void set_mask_dcplbs(unsigned long *masks)
>
> disable_dcplb();
> for (i = first_mask_dcplb; i < first_switched_dcplb; i++) {
> - dcplb_tbl[i].addr = addr;
> - dcplb_tbl[i].data = d_data;
> + dcplb_tbl[cpu][i].addr = addr;
> + dcplb_tbl[cpu][i].data = d_data;
> bfin_write32(DCPLB_DATA0 + i * 4, d_data);
> bfin_write32(DCPLB_ADDR0 + i * 4, addr);
> addr += PAGE_SIZE;
> diff --git a/arch/blackfin/kernel/cplb-nompu/cacheinit.c b/arch/blackfin/kernel/cplb-nompu/cacheinit.c
> index bd08315..3a385ae 100644
> --- a/arch/blackfin/kernel/cplb-nompu/cacheinit.c
> +++ b/arch/blackfin/kernel/cplb-nompu/cacheinit.c
> @@ -25,9 +25,9 @@
> #include <asm/cplbinit.h>
>
> #if defined(CONFIG_BFIN_ICACHE)
> -void __init bfin_icache_init(void)
> +void __cpuinit bfin_icache_init(u_long icplb[])
> {
> - unsigned long *table = icplb_table;
> + unsigned long *table = icplb;
> unsigned long ctrl;
> int i;
>
> @@ -47,9 +47,9 @@ void __init bfin_icache_init(void)
> #endif
>
> #if defined(CONFIG_BFIN_DCACHE)
> -void __init bfin_dcache_init(void)
> +void __cpuinit bfin_dcache_init(u_long dcplb[])
> {
> - unsigned long *table = dcplb_table;
> + unsigned long *table = dcplb;
> unsigned long ctrl;
> int i;
>
> @@ -64,6 +64,7 @@ void __init bfin_dcache_init(void)
> ctrl = bfin_read_DMEM_CONTROL();
> ctrl |= DMEM_CNTR;
> bfin_write_DMEM_CONTROL(ctrl);
> +
> SSYNC();
> }
> #endif
> diff --git a/arch/blackfin/kernel/cplb-nompu/cplbinfo.c b/arch/blackfin/kernel/cplb-nompu/cplbinfo.c
> index 1e74f0b..3f00809 100644
> --- a/arch/blackfin/kernel/cplb-nompu/cplbinfo.c
> +++ b/arch/blackfin/kernel/cplb-nompu/cplbinfo.c
> @@ -68,22 +68,22 @@ static int cplb_find_entry(unsigned long *cplb_addr,
> return -1;
> }
>
> -static char *cplb_print_entry(char *buf, int type)
> +static char *cplb_print_entry(char *buf, int type, unsigned int cpu)
> {
> - unsigned long *p_addr = dpdt_table;
> - unsigned long *p_data = dpdt_table + 1;
> - unsigned long *p_icount = dpdt_swapcount_table;
> - unsigned long *p_ocount = dpdt_swapcount_table + 1;
> + unsigned long *p_addr = dpdt_tables[cpu];
> + unsigned long *p_data = dpdt_tables[cpu] + 1;
> + unsigned long *p_icount = dpdt_swapcount_tables[cpu];
> + unsigned long *p_ocount = dpdt_swapcount_tables[cpu] + 1;
> unsigned long *cplb_addr = (unsigned long *)DCPLB_ADDR0;
> unsigned long *cplb_data = (unsigned long *)DCPLB_DATA0;
> int entry = 0, used_cplb = 0;
>
> if (type == CPLB_I) {
> buf += sprintf(buf, "Instruction CPLB entry:\n");
> - p_addr = ipdt_table;
> - p_data = ipdt_table + 1;
> - p_icount = ipdt_swapcount_table;
> - p_ocount = ipdt_swapcount_table + 1;
> + p_addr = ipdt_tables[cpu];
> + p_data = ipdt_tables[cpu] + 1;
> + p_icount = ipdt_swapcount_tables[cpu];
> + p_ocount = ipdt_swapcount_tables[cpu] + 1;
> cplb_addr = (unsigned long *)ICPLB_ADDR0;
> cplb_data = (unsigned long *)ICPLB_DATA0;
> } else
> @@ -134,24 +134,24 @@ static char *cplb_print_entry(char *buf, int type)
> return buf;
> }
>
> -static int cplbinfo_proc_output(char *buf)
> +static int cplbinfo_proc_output(char *buf, void *data)
> {
> + unsigned int cpu = (unsigned int)data;
> char *p;
>
> p = buf;
>
> - p += sprintf(p, "------------------ CPLB Information ------------------\n\n");
> + p += sprintf(p, "------------- CPLB Information on CPU%u --------------\n\n", cpu);
>
> if (bfin_read_IMEM_CONTROL() & ENICPLB)
> - p = cplb_print_entry(p, CPLB_I);
> + p = cplb_print_entry(p, CPLB_I, cpu);
> else
> p += sprintf(p, "Instruction CPLB is disabled.\n\n");
>
> if (bfin_read_DMEM_CONTROL() & ENDCPLB)
> - p = cplb_print_entry(p, CPLB_D);
> + p = cplb_print_entry(p, CPLB_D, cpu);
> else
> p += sprintf(p, "Data CPLB is disabled.\n");
> -
> return p - buf;
> }
>
> @@ -160,7 +160,7 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,
> {
> int len;
>
> - len = cplbinfo_proc_output(page);
> + len = cplbinfo_proc_output(page, data);
> if (len <= off + count)
> *eof = 1;
> *start = page + off;
> @@ -174,20 +174,33 @@ static int cplbinfo_read_proc(char *page, char **start, off_t off,
>
> static int __init cplbinfo_init(void)
> {
> - struct proc_dir_entry *entry;
> + struct proc_dir_entry *parent, *entry;
> + unsigned int cpu;
> + unsigned char str[10];
> +
> + parent = proc_mkdir("cplbinfo", NULL);
>
> - entry = create_proc_entry("cplbinfo", 0, NULL);
> - if (!entry)
> - return -ENOMEM;
> + for_each_online_cpu(cpu) {
> + sprintf(str, "cpu%u", cpu);
> + entry = create_proc_entry(str, 0, parent);
> + if (!entry)
> + return -ENOMEM;
>
> - entry->read_proc = cplbinfo_read_proc;
> - entry->data = NULL;
> + entry->read_proc = cplbinfo_read_proc;
> + entry->data = (void *)cpu;
> + }
>
> return 0;
> }
>
> static void __exit cplbinfo_exit(void)
> {
> + unsigned int cpu;
> + unsigned char str[20];
> + for_each_online_cpu(cpu) {
> + sprintf(str, "cplbinfo/cpu%u", cpu);
> + remove_proc_entry(str, NULL);
> + }
> remove_proc_entry("cplbinfo", NULL);
> }
>
> diff --git a/arch/blackfin/kernel/cplb-nompu/cplbinit.c b/arch/blackfin/kernel/cplb-nompu/cplbinit.c
> index 2debc90..8966c70 100644
> --- a/arch/blackfin/kernel/cplb-nompu/cplbinit.c
> +++ b/arch/blackfin/kernel/cplb-nompu/cplbinit.c
> @@ -27,46 +27,20 @@
> #include <asm/cplb.h>
> #include <asm/cplbinit.h>
>
> -#define CPLB_MEM CONFIG_MAX_MEM_SIZE
> -
> -/*
> -* Number of required data CPLB switchtable entries
> -* MEMSIZE / 4 (we mostly install 4M page size CPLBs
> -* approx 16 for smaller 1MB page size CPLBs for allignment purposes
> -* 1 for L1 Data Memory
> -* possibly 1 for L2 Data Memory
> -* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
> -* 1 for ASYNC Memory
> -*/
> -#define MAX_SWITCH_D_CPLBS (((CPLB_MEM / 4) + 16 + 1 + 1 + 1 \
> - + ASYNC_MEMORY_CPLB_COVERAGE) * 2)
> -
> -/*
> -* Number of required instruction CPLB switchtable entries
> -* MEMSIZE / 4 (we mostly install 4M page size CPLBs
> -* approx 12 for smaller 1MB page size CPLBs for allignment purposes
> -* 1 for L1 Instruction Memory
> -* possibly 1 for L2 Instruction Memory
> -* 1 for CONFIG_DEBUG_HUNT_FOR_ZERO
> -*/
> -#define MAX_SWITCH_I_CPLBS (((CPLB_MEM / 4) + 12 + 1 + 1 + 1) * 2)
> -
> -
> -u_long icplb_table[MAX_CPLBS + 1];
> -u_long dcplb_table[MAX_CPLBS + 1];
> +u_long icplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];
> +u_long dcplb_tables[NR_CPUS][CPLB_TBL_ENTRIES+1];
>
> #ifdef CONFIG_CPLB_SWITCH_TAB_L1
> -# define PDT_ATTR __attribute__((l1_data))
> +#define PDT_ATTR __attribute__((l1_data))
> #else
> -# define PDT_ATTR
> +#define PDT_ATTR
> #endif
>
> -u_long ipdt_table[MAX_SWITCH_I_CPLBS + 1] PDT_ATTR;
> -u_long dpdt_table[MAX_SWITCH_D_CPLBS + 1] PDT_ATTR;
> -
> +u_long ipdt_tables[NR_CPUS][MAX_SWITCH_I_CPLBS+1] PDT_ATTR;
> +u_long dpdt_tables[NR_CPUS][MAX_SWITCH_D_CPLBS+1] PDT_ATTR;
> #ifdef CONFIG_CPLB_INFO
> -u_long ipdt_swapcount_table[MAX_SWITCH_I_CPLBS] PDT_ATTR;
> -u_long dpdt_swapcount_table[MAX_SWITCH_D_CPLBS] PDT_ATTR;
> +u_long ipdt_swapcount_tables[NR_CPUS][MAX_SWITCH_I_CPLBS] PDT_ATTR;
> +u_long dpdt_swapcount_tables[NR_CPUS][MAX_SWITCH_D_CPLBS] PDT_ATTR;
> #endif
>
> struct s_cplb {
> @@ -93,8 +67,8 @@ static struct cplb_desc cplb_data[] = {
> .name = "Zero Pointer Guard Page",
> },
> {
> - .start = L1_CODE_START,
> - .end = L1_CODE_START + L1_CODE_LENGTH,
> + .start = 0, /* dynamic */
> + .end = 0, /* dynamic */
> .psize = SIZE_4M,
> .attr = INITIAL_T | SWITCH_T | I_CPLB,
> .i_conf = L1_IMEMORY,
> @@ -103,8 +77,8 @@ static struct cplb_desc cplb_data[] = {
> .name = "L1 I-Memory",
> },
> {
> - .start = L1_DATA_A_START,
> - .end = L1_DATA_B_START + L1_DATA_B_LENGTH,
> + .start = 0, /* dynamic */
> + .end = 0, /* dynamic */
> .psize = SIZE_4M,
> .attr = INITIAL_T | SWITCH_T | D_CPLB,
> .i_conf = 0,
> @@ -117,6 +91,16 @@ static struct cplb_desc cplb_data[] = {
> .name = "L1 D-Memory",
> },
> {
> + .start = L2_START,
> + .end = L2_START + L2_LENGTH,
> + .psize = SIZE_1M,
> + .attr = L2_ATTR,
> + .i_conf = L2_IMEMORY,
> + .d_conf = L2_DMEMORY,
> + .valid = (L2_LENGTH > 0),
> + .name = "L2 Memory",
> + },
> + {
> .start = 0,
> .end = 0, /* dynamic */
> .psize = 0,
> @@ -165,16 +149,6 @@ static struct cplb_desc cplb_data[] = {
> .name = "Asynchronous Memory Banks",
> },
> {
> - .start = L2_START,
> - .end = L2_START + L2_LENGTH,
> - .psize = SIZE_1M,
> - .attr = SWITCH_T | I_CPLB | D_CPLB,
> - .i_conf = L2_IMEMORY,
> - .d_conf = L2_DMEMORY,
> - .valid = (L2_LENGTH > 0),
> - .name = "L2 Memory",
> - },
> - {
> .start = BOOT_ROM_START,
> .end = BOOT_ROM_START + BOOT_ROM_LENGTH,
> .psize = SIZE_1M,
> @@ -310,7 +284,7 @@ __fill_data_cplbtab(struct cplb_tab *t, int i, u32 a_start, u32 a_end)
> }
> }
>
> -void __init generate_cplb_tables(void)
> +void __init generate_cplb_tables_cpu(unsigned int cpu)
> {
>
> u16 i, j, process;
> @@ -322,8 +296,8 @@ void __init generate_cplb_tables(void)
>
> printk(KERN_INFO "NOMPU: setting up cplb tables for global access\n");
>
> - cplb.init_i.size = MAX_CPLBS;
> - cplb.init_d.size = MAX_CPLBS;
> + cplb.init_i.size = CPLB_TBL_ENTRIES;
> + cplb.init_d.size = CPLB_TBL_ENTRIES;
> cplb.switch_i.size = MAX_SWITCH_I_CPLBS;
> cplb.switch_d.size = MAX_SWITCH_D_CPLBS;
>
> @@ -332,11 +306,15 @@ void __init generate_cplb_tables(void)
> cplb.switch_i.pos = 0;
> cplb.switch_d.pos = 0;
>
> - cplb.init_i.tab = icplb_table;
> - cplb.init_d.tab = dcplb_table;
> - cplb.switch_i.tab = ipdt_table;
> - cplb.switch_d.tab = dpdt_table;
> + cplb.init_i.tab = icplb_tables[cpu];
> + cplb.init_d.tab = dcplb_tables[cpu];
> + cplb.switch_i.tab = ipdt_tables[cpu];
> + cplb.switch_d.tab = dpdt_tables[cpu];
>
> + cplb_data[L1I_MEM].start = get_l1_code_start_cpu(cpu);
> + cplb_data[L1I_MEM].end = cplb_data[L1I_MEM].start + L1_CODE_LENGTH;
> + cplb_data[L1D_MEM].start = get_l1_data_a_start_cpu(cpu);
> + cplb_data[L1D_MEM].end = get_l1_data_b_start_cpu(cpu) + L1_DATA_B_LENGTH;
> cplb_data[SDRAM_KERN].end = memory_end;
>
> #ifdef CONFIG_MTD_UCLINUX
> @@ -459,6 +437,5 @@ void __init generate_cplb_tables(void)
> cplb.switch_d.tab[cplb.switch_d.pos] = -1;
>
> }
> -
> #endif
>
> diff --git a/arch/blackfin/kernel/cplb-nompu/cplbmgr.S b/arch/blackfin/kernel/cplb-nompu/cplbmgr.S
> index f5cf3ac..985f3fc 100644
> --- a/arch/blackfin/kernel/cplb-nompu/cplbmgr.S
> +++ b/arch/blackfin/kernel/cplb-nompu/cplbmgr.S
> @@ -52,6 +52,7 @@
> #include <linux/linkage.h>
> #include <asm/blackfin.h>
> #include <asm/cplb.h>
> +#include <asm/asm-offsets.h>
>
> #ifdef CONFIG_EXCPT_IRQ_SYSC_L1
> .section .l1.text
> @@ -164,10 +165,9 @@ ENTRY(_cplb_mgr)
> .Lifound_victim:
> #ifdef CONFIG_CPLB_INFO
> R7 = [P0 - 0x104];
> - P2.L = _ipdt_table;
> - P2.H = _ipdt_table;
> - P3.L = _ipdt_swapcount_table;
> - P3.H = _ipdt_swapcount_table;
> + GET_PDA(P2, R2);
> + P3 = [P2 + PDA_IPDT_SWAPCOUNT];
> + P2 = [P2 + PDA_IPDT];
> P3 += -4;
> .Licount:
> R2 = [P2]; /* address from config table */
> @@ -208,11 +208,10 @@ ENTRY(_cplb_mgr)
> * range.
> */
>
> - P2.L = _ipdt_table;
> - P2.H = _ipdt_table;
> + GET_PDA(P3, R0);
> + P2 = [P3 + PDA_IPDT];
> #ifdef CONFIG_CPLB_INFO
> - P3.L = _ipdt_swapcount_table;
> - P3.H = _ipdt_swapcount_table;
> + P3 = [P3 + PDA_IPDT_SWAPCOUNT];
> P3 += -8;
> #endif
> P0.L = _page_size_table;
> @@ -469,10 +468,9 @@ ENTRY(_cplb_mgr)
>
> #ifdef CONFIG_CPLB_INFO
> R7 = [P0 - 0x104];
> - P2.L = _dpdt_table;
> - P2.H = _dpdt_table;
> - P3.L = _dpdt_swapcount_table;
> - P3.H = _dpdt_swapcount_table;
> + GET_PDA(P2, R2);
> + P3 = [P2 + PDA_DPDT_SWAPCOUNT];
> + P2 = [P2 + PDA_DPDT];
> P3 += -4;
> .Ldicount:
> R2 = [P2];
> @@ -541,11 +539,10 @@ ENTRY(_cplb_mgr)
>
> R0 = I0; /* Our faulting address */
>
> - P2.L = _dpdt_table;
> - P2.H = _dpdt_table;
> + GET_PDA(P3, R1);
> + P2 = [P3 + PDA_DPDT];
> #ifdef CONFIG_CPLB_INFO
> - P3.L = _dpdt_swapcount_table;
> - P3.H = _dpdt_swapcount_table;
> + P3 = [P3 + PDA_DPDT_SWAPCOUNT];
> P3 += -8;
> #endif
>
> --
> 1.5.6.3
>
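
The hunks quoted above replace the single ipdt_table/dpdt_table with
per-CPU tables indexed by cpu, read through p_addr = table and
p_data = table + 1. A minimal sketch of that layout follows, assuming
interleaved (addr, data) words ended by a -1 terminator. The pdt_*
names, MAX_PAIRS, and the lookup helper are hypothetical and are not
part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   2                     /* the BF561 has two cores */
#define MAX_PAIRS 4                     /* hypothetical capacity, tiny for the sketch */
#define TABLE_END ((unsigned long)-1)   /* assumed terminator, as in "tab[pos] = -1" */

/* One row per CPU: interleaved (addr, data) words plus one terminator word. */
static unsigned long pdt_tables[NR_CPUS][MAX_PAIRS * 2 + 1];

/* Mark every CPU's table as empty by writing the terminator at slot 0. */
static void pdt_reset(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		pdt_tables[cpu][0] = TABLE_END;
}

/* Count the (addr, data) pairs stored for one CPU. */
static unsigned int pdt_count(unsigned int cpu)
{
	const unsigned long *p_addr;
	unsigned int n = 0;

	assert(cpu < NR_CPUS);
	p_addr = pdt_tables[cpu];
	while (*p_addr != TABLE_END) {
		n++;
		p_addr += 2;            /* step over one addr/data pair */
	}
	return n;
}

/* Append one pair to a CPU's table; returns 0 on success, -1 if full. */
static int pdt_append(unsigned int cpu, unsigned long addr, unsigned long data)
{
	unsigned int n = pdt_count(cpu);
	unsigned long *slot;

	if (n >= MAX_PAIRS)
		return -1;
	slot = &pdt_tables[cpu][n * 2];
	slot[0] = addr;
	slot[1] = data;
	slot[2] = TABLE_END;            /* keep the table terminated */
	return 0;
}

/* Return the data word stored for addr on the given CPU, or 0 if absent. */
static unsigned long pdt_lookup(unsigned int cpu, unsigned long addr)
{
	const unsigned long *p_addr = pdt_tables[cpu];
	const unsigned long *p_data = pdt_tables[cpu] + 1;

	assert(cpu < NR_CPUS);
	for (; *p_addr != TABLE_END; p_addr += 2, p_data += 2)
		if (*p_addr == addr)
			return *p_data;
	return 0;
}
```

Each core walks only its own row, so an entry added for CPU0 is not
visible to a lookup on CPU1. That is the property the per-CPU split is
meant to give.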

2008-11-19 07:47:11

by Bryan Wu

[permalink] [raw]
Subject: Re: [PATCH 4/5] Blackfin arch: SMP supporting patchset: Blackfin kernel and memory management code

Cc'ing linux-arch.
-Bryan
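
For readers skimming the patch quoted below: show_cpuinfo() there
reads loops_per_jiffy and cclk from a per-CPU record instead of from
globals, and prints bogomips as (loops_per_jiffy * HZ) / 500000 with a
two-digit fraction from ((loops_per_jiffy * HZ) / 5000) % 100. A
minimal sketch of that arithmetic follows. The struct, the array, the
HZ value, and the helper names are assumptions made for illustration,
not the kernel's definitions.

```c
#include <stdio.h>
#include <string.h>

#define NR_CPUS 2       /* the BF561 has two cores */
#define HZ      250     /* hypothetical tick rate for the sketch */

/* Hypothetical stand-in for the per-CPU record used by the patch. */
struct cpudata_sketch {
	unsigned long loops_per_jiffy;
	unsigned long cclk;             /* core clock in Hz */
};

static struct cpudata_sketch cpu_data_sketch[NR_CPUS];

/* Integer part of the bogomips value for one CPU. */
static unsigned long bogomips_int(unsigned int cpu)
{
	return (cpu_data_sketch[cpu].loops_per_jiffy * HZ) / 500000;
}

/* Two-digit fractional part of the bogomips value for one CPU. */
static unsigned long bogomips_frac(unsigned int cpu)
{
	return ((cpu_data_sketch[cpu].loops_per_jiffy * HZ) / 5000) % 100;
}

/* Format the line the way the seq_printf() call in the patch does. */
static int format_bogomips(char *buf, size_t len, unsigned int cpu)
{
	return snprintf(buf, len, "bogomips\t: %lu.%02lu",
			bogomips_int(cpu), bogomips_frac(cpu));
}
```

With loops_per_jiffy = 1196032 and HZ = 250 the product is 299008000,
so the integer part is 598 and the fraction is 01. Storing the value
per CPU lets each core report its own calibration result.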

On Tue, Nov 18, 2008 at 5:05 PM, Bryan Wu <[email protected]> wrote:
> From: Graf Yang <[email protected]>
>
> The Blackfin dual-core BF561 processor can support SMP-like features.
> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> In this patch, we extend the Blackfin kernel and memory management code for SMP.
>
> Signed-off-by: Graf Yang <[email protected]>
> Signed-off-by: Mike Frysinger <[email protected]>
> Signed-off-by: Bryan Wu <[email protected]>
> ---
> arch/blackfin/kernel/asm-offsets.c | 29 +++
> arch/blackfin/kernel/bfin_ksyms.c | 34 ++++
> arch/blackfin/kernel/entry.S | 1 +
> arch/blackfin/kernel/irqchip.c | 24 ++--
> arch/blackfin/kernel/kgdb.c | 4 +-
> arch/blackfin/kernel/module.c | 13 ++-
> arch/blackfin/kernel/process.c | 23 ++-
> arch/blackfin/kernel/ptrace.c | 8 +-
> arch/blackfin/kernel/reboot.c | 24 ++-
> arch/blackfin/kernel/setup.c | 163 ++++++++++++------
> arch/blackfin/kernel/time.c | 114 +++++++++----
> arch/blackfin/kernel/traps.c | 56 +++----
> arch/blackfin/mm/init.c | 60 +++++--
> arch/blackfin/mm/sram-alloc.c | 336 +++++++++++++++++++++---------------
> 14 files changed, 580 insertions(+), 309 deletions(-)
>
> diff --git a/arch/blackfin/kernel/asm-offsets.c b/arch/blackfin/kernel/asm-offsets.c
> index 9bb85dd..b5df945 100644
> --- a/arch/blackfin/kernel/asm-offsets.c
> +++ b/arch/blackfin/kernel/asm-offsets.c
> @@ -56,6 +56,9 @@ int main(void)
> /* offsets into the thread struct */
> DEFINE(THREAD_KSP, offsetof(struct thread_struct, ksp));
> DEFINE(THREAD_USP, offsetof(struct thread_struct, usp));
> + DEFINE(THREAD_SR, offsetof(struct thread_struct, seqstat));
> + DEFINE(PT_SR, offsetof(struct thread_struct, seqstat));
> + DEFINE(THREAD_ESP0, offsetof(struct thread_struct, esp0));
> DEFINE(THREAD_PC, offsetof(struct thread_struct, pc));
> DEFINE(KERNEL_STACK_SIZE, THREAD_SIZE);
>
> @@ -128,5 +131,31 @@ int main(void)
> DEFINE(SIGSEGV, SIGSEGV);
> DEFINE(SIGTRAP, SIGTRAP);
>
> + /* PDA management (in L1 scratchpad) */
> + DEFINE(PDA_SYSCFG, offsetof(struct blackfin_pda, syscfg));
> +#ifdef CONFIG_SMP
> + DEFINE(PDA_IRQFLAGS, offsetof(struct blackfin_pda, imask));
> +#endif
> + DEFINE(PDA_IPDT, offsetof(struct blackfin_pda, ipdt));
> + DEFINE(PDA_IPDT_SWAPCOUNT, offsetof(struct blackfin_pda, ipdt_swapcount));
> + DEFINE(PDA_DPDT, offsetof(struct blackfin_pda, dpdt));
> + DEFINE(PDA_DPDT_SWAPCOUNT, offsetof(struct blackfin_pda, dpdt_swapcount));
> + DEFINE(PDA_EXIPTR, offsetof(struct blackfin_pda, ex_iptr));
> + DEFINE(PDA_EXOPTR, offsetof(struct blackfin_pda, ex_optr));
> + DEFINE(PDA_EXBUF, offsetof(struct blackfin_pda, ex_buf));
> + DEFINE(PDA_EXIMASK, offsetof(struct blackfin_pda, ex_imask));
> + DEFINE(PDA_EXSTACK, offsetof(struct blackfin_pda, ex_stack));
> +#ifdef ANOMALY_05000261
> + DEFINE(PDA_LFRETX, offsetof(struct blackfin_pda, last_cplb_fault_retx));
> +#endif
> + DEFINE(PDA_DCPLB, offsetof(struct blackfin_pda, dcplb_fault_addr));
> + DEFINE(PDA_ICPLB, offsetof(struct blackfin_pda, icplb_fault_addr));
> + DEFINE(PDA_RETX, offsetof(struct blackfin_pda, retx));
> + DEFINE(PDA_SEQSTAT, offsetof(struct blackfin_pda, seqstat));
> +#ifdef CONFIG_SMP
> + /* Inter-core lock (in L2 SRAM) */
> + DEFINE(SIZEOF_CORELOCK, sizeof(struct corelock_slot));
> +#endif
> +
> return 0;
> }
> diff --git a/arch/blackfin/kernel/bfin_ksyms.c b/arch/blackfin/kernel/bfin_ksyms.c
> index b66f1d4..763c315 100644
> --- a/arch/blackfin/kernel/bfin_ksyms.c
> +++ b/arch/blackfin/kernel/bfin_ksyms.c
> @@ -68,3 +68,37 @@ EXPORT_SYMBOL(insw_8);
> EXPORT_SYMBOL(outsl);
> EXPORT_SYMBOL(insl);
> EXPORT_SYMBOL(insl_16);
> +
> +#ifdef CONFIG_SMP
> +EXPORT_SYMBOL(__raw_atomic_update_asm);
> +EXPORT_SYMBOL(__raw_atomic_clear_asm);
> +EXPORT_SYMBOL(__raw_atomic_set_asm);
> +EXPORT_SYMBOL(__raw_atomic_xor_asm);
> +EXPORT_SYMBOL(__raw_atomic_test_asm);
> +EXPORT_SYMBOL(__raw_xchg_1_asm);
> +EXPORT_SYMBOL(__raw_xchg_2_asm);
> +EXPORT_SYMBOL(__raw_xchg_4_asm);
> +EXPORT_SYMBOL(__raw_cmpxchg_1_asm);
> +EXPORT_SYMBOL(__raw_cmpxchg_2_asm);
> +EXPORT_SYMBOL(__raw_cmpxchg_4_asm);
> +EXPORT_SYMBOL(__raw_spin_is_locked_asm);
> +EXPORT_SYMBOL(__raw_spin_lock_asm);
> +EXPORT_SYMBOL(__raw_spin_trylock_asm);
> +EXPORT_SYMBOL(__raw_spin_unlock_asm);
> +EXPORT_SYMBOL(__raw_read_lock_asm);
> +EXPORT_SYMBOL(__raw_read_trylock_asm);
> +EXPORT_SYMBOL(__raw_read_unlock_asm);
> +EXPORT_SYMBOL(__raw_write_lock_asm);
> +EXPORT_SYMBOL(__raw_write_trylock_asm);
> +EXPORT_SYMBOL(__raw_write_unlock_asm);
> +EXPORT_SYMBOL(__raw_bit_set_asm);
> +EXPORT_SYMBOL(__raw_bit_clear_asm);
> +EXPORT_SYMBOL(__raw_bit_toggle_asm);
> +EXPORT_SYMBOL(__raw_bit_test_asm);
> +EXPORT_SYMBOL(__raw_bit_test_set_asm);
> +EXPORT_SYMBOL(__raw_bit_test_clear_asm);
> +EXPORT_SYMBOL(__raw_bit_test_toggle_asm);
> +EXPORT_SYMBOL(__raw_uncached_fetch_asm);
> +EXPORT_SYMBOL(__raw_smp_mark_barrier_asm);
> +EXPORT_SYMBOL(__raw_smp_check_barrier_asm);
> +#endif
> diff --git a/arch/blackfin/kernel/entry.S b/arch/blackfin/kernel/entry.S
> index faea88e..c0c3fe8 100644
> --- a/arch/blackfin/kernel/entry.S
> +++ b/arch/blackfin/kernel/entry.S
> @@ -30,6 +30,7 @@
> #include <linux/linkage.h>
> #include <asm/thread_info.h>
> #include <asm/errno.h>
> +#include <asm/blackfin.h>
> #include <asm/asm-offsets.h>
>
> #include <asm/context.S>
> diff --git a/arch/blackfin/kernel/irqchip.c b/arch/blackfin/kernel/irqchip.c
> index 07402f5..9eebb78 100644
> --- a/arch/blackfin/kernel/irqchip.c
> +++ b/arch/blackfin/kernel/irqchip.c
> @@ -36,7 +36,7 @@
> #include <linux/irq.h>
> #include <asm/trace.h>
>
> -static unsigned long irq_err_count;
> +static atomic_t irq_err_count;
> static spinlock_t irq_controller_lock;
>
> /*
> @@ -48,7 +48,7 @@ void dummy_mask_unmask_irq(unsigned int irq)
>
> void ack_bad_irq(unsigned int irq)
> {
> - irq_err_count += 1;
> + atomic_inc(&irq_err_count);
> printk(KERN_ERR "IRQ: spurious interrupt %d\n", irq);
> }
> EXPORT_SYMBOL(ack_bad_irq);
> @@ -72,7 +72,7 @@ static struct irq_desc bad_irq_desc = {
>
> int show_interrupts(struct seq_file *p, void *v)
> {
> - int i = *(loff_t *) v;
> + int i = *(loff_t *) v, j;
> struct irqaction *action;
> unsigned long flags;
>
> @@ -80,19 +80,20 @@ int show_interrupts(struct seq_file *p, void *v)
> spin_lock_irqsave(&irq_desc[i].lock, flags);
> action = irq_desc[i].action;
> if (!action)
> - goto unlock;
> -
> - seq_printf(p, "%3d: %10u ", i, kstat_irqs(i));
> + goto skip;
> + seq_printf(p, "%3d: ", i);
> + for_each_online_cpu(j)
> + seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]);
> + seq_printf(p, " %8s", irq_desc[i].chip->name);
> seq_printf(p, " %s", action->name);
> for (action = action->next; action; action = action->next)
> - seq_printf(p, ", %s", action->name);
> + seq_printf(p, " %s", action->name);
>
> seq_putc(p, '\n');
> - unlock:
> + skip:
> spin_unlock_irqrestore(&irq_desc[i].lock, flags);
> - } else if (i == NR_IRQS) {
> - seq_printf(p, "Err: %10lu\n", irq_err_count);
> - }
> + } else if (i == NR_IRQS)
> + seq_printf(p, "Err: %10u\n", atomic_read(&irq_err_count));
> return 0;
> }
>
> @@ -101,7 +102,6 @@ int show_interrupts(struct seq_file *p, void *v)
> * come via this function. Instead, they should provide their
> * own 'handler'
> */
> -
> #ifdef CONFIG_DO_IRQ_L1
> __attribute__((l1_text))
> #endif
> diff --git a/arch/blackfin/kernel/kgdb.c b/arch/blackfin/kernel/kgdb.c
> index b795a20..ab40221 100644
> --- a/arch/blackfin/kernel/kgdb.c
> +++ b/arch/blackfin/kernel/kgdb.c
> @@ -363,12 +363,12 @@ void kgdb_passive_cpu_callback(void *info)
>
> void kgdb_roundup_cpus(unsigned long flags)
> {
> - smp_call_function(kgdb_passive_cpu_callback, NULL, 0, 0);
> + smp_call_function(kgdb_passive_cpu_callback, NULL, 0);
> }
>
> void kgdb_roundup_cpu(int cpu, unsigned long flags)
> {
> - smp_call_function_single(cpu, kgdb_passive_cpu_callback, NULL, 0, 0);
> + smp_call_function_single(cpu, kgdb_passive_cpu_callback, NULL, 0);
> }
> #endif
>
> diff --git a/arch/blackfin/kernel/module.c b/arch/blackfin/kernel/module.c
> index e1bebc8..2e14cad 100644
> --- a/arch/blackfin/kernel/module.c
> +++ b/arch/blackfin/kernel/module.c
> @@ -343,7 +343,13 @@ apply_relocate_add(Elf_Shdr * sechdrs, const char *strtab,
> pr_debug("location is %x, value is %x type is %d \n",
> (unsigned int) location32, value,
> ELF32_R_TYPE(rel[i].r_info));
> -
> +#ifdef CONFIG_SMP
> + if ((unsigned long)location16 >= COREB_L1_DATA_A_START) {
> + printk(KERN_ERR "module %s: cannot relocate in L1: %u (SMP kernel)\n",
> + mod->name, ELF32_R_TYPE(rel[i].r_info));
> + return -ENOEXEC;
> + }
> +#endif
> switch (ELF32_R_TYPE(rel[i].r_info)) {
>
> case R_pcrel24:
> @@ -436,6 +442,7 @@ module_finalize(const Elf_Ehdr * hdr,
> {
> unsigned int i, strindex = 0, symindex = 0;
> char *secstrings;
> + long err = 0;
>
> secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
>
> @@ -460,8 +467,10 @@ module_finalize(const Elf_Ehdr * hdr,
> (strcmp(".rela.l1.text", secstrings + sechdrs[i].sh_name) == 0) ||
> ((strcmp(".rela.text", secstrings + sechdrs[i].sh_name) == 0) &&
> (hdr->e_flags & (EF_BFIN_CODE_IN_L1|EF_BFIN_CODE_IN_L2))))) {
> - apply_relocate_add((Elf_Shdr *) sechdrs, strtab,
> + err = apply_relocate_add((Elf_Shdr *) sechdrs, strtab,
> symindex, i, mod);
> + if (err < 0)
> + return -ENOEXEC;
> }
> }
> return 0;
> diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
> index 326e301..4359ea2 100644
> --- a/arch/blackfin/kernel/process.c
> +++ b/arch/blackfin/kernel/process.c
> @@ -171,6 +171,13 @@ asmlinkage int bfin_clone(struct pt_regs *regs)
> unsigned long clone_flags;
> unsigned long newsp;
>
> +#ifdef __ARCH_SYNC_CORE_DCACHE
> + if (current->rt.nr_cpus_allowed == num_possible_cpus()) {
> + current->cpus_allowed = cpumask_of_cpu(smp_processor_id());
> + current->rt.nr_cpus_allowed = 1;
> + }
> +#endif
> +
> /* syscall2 puts clone_flags in r0 and usp in r1 */
> clone_flags = regs->r0;
> newsp = regs->r1;
> @@ -338,22 +345,22 @@ int _access_ok(unsigned long addr, unsigned long size)
> if (addr >= (unsigned long)__init_begin &&
> addr + size <= (unsigned long)__init_end)
> return 1;
> - if (addr >= L1_SCRATCH_START
> - && addr + size <= L1_SCRATCH_START + L1_SCRATCH_LENGTH)
> + if (addr >= get_l1_scratch_start()
> + && addr + size <= get_l1_scratch_start() + L1_SCRATCH_LENGTH)
> return 1;
> #if L1_CODE_LENGTH != 0
> - if (addr >= L1_CODE_START + (_etext_l1 - _stext_l1)
> - && addr + size <= L1_CODE_START + L1_CODE_LENGTH)
> + if (addr >= get_l1_code_start() + (_etext_l1 - _stext_l1)
> + && addr + size <= get_l1_code_start() + L1_CODE_LENGTH)
> return 1;
> #endif
> #if L1_DATA_A_LENGTH != 0
> - if (addr >= L1_DATA_A_START + (_ebss_l1 - _sdata_l1)
> - && addr + size <= L1_DATA_A_START + L1_DATA_A_LENGTH)
> + if (addr >= get_l1_data_a_start() + (_ebss_l1 - _sdata_l1)
> + && addr + size <= get_l1_data_a_start() + L1_DATA_A_LENGTH)
> return 1;
> #endif
> #if L1_DATA_B_LENGTH != 0
> - if (addr >= L1_DATA_B_START + (_ebss_b_l1 - _sdata_b_l1)
> - && addr + size <= L1_DATA_B_START + L1_DATA_B_LENGTH)
> + if (addr >= get_l1_data_b_start() + (_ebss_b_l1 - _sdata_b_l1)
> + && addr + size <= get_l1_data_b_start() + L1_DATA_B_LENGTH)
> return 1;
> #endif
> #if L2_LENGTH != 0
> diff --git a/arch/blackfin/kernel/ptrace.c b/arch/blackfin/kernel/ptrace.c
> index 140bf00..4de44f3 100644
> --- a/arch/blackfin/kernel/ptrace.c
> +++ b/arch/blackfin/kernel/ptrace.c
> @@ -220,8 +220,8 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
> break;
> pr_debug("ptrace: user address is valid\n");
>
> - if (L1_CODE_LENGTH != 0 && addr >= L1_CODE_START
> - && addr + sizeof(tmp) <= L1_CODE_START + L1_CODE_LENGTH) {
> + if (L1_CODE_LENGTH != 0 && addr >= get_l1_code_start()
> + && addr + sizeof(tmp) <= get_l1_code_start() + L1_CODE_LENGTH) {
> safe_dma_memcpy (&tmp, (const void *)(addr), sizeof(tmp));
> copied = sizeof(tmp);
>
> @@ -300,8 +300,8 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
> break;
> pr_debug("ptrace: user address is valid\n");
>
> - if (L1_CODE_LENGTH != 0 && addr >= L1_CODE_START
> - && addr + sizeof(data) <= L1_CODE_START + L1_CODE_LENGTH) {
> + if (L1_CODE_LENGTH != 0 && addr >= get_l1_code_start()
> + && addr + sizeof(data) <= get_l1_code_start() + L1_CODE_LENGTH) {
> safe_dma_memcpy ((void *)(addr), &data, sizeof(data));
> copied = sizeof(data);
>
> diff --git a/arch/blackfin/kernel/reboot.c b/arch/blackfin/kernel/reboot.c
> index ae97ca4..eeee8cb 100644
> --- a/arch/blackfin/kernel/reboot.c
> +++ b/arch/blackfin/kernel/reboot.c
> @@ -21,7 +21,7 @@
> * the core reset.
> */
> __attribute__((l1_text))
> -static void bfin_reset(void)
> +static void _bfin_reset(void)
> {
> /* Wait for completion of "system" events such as cache line
> * line fills so that we avoid infinite stalls later on as
> @@ -66,6 +66,18 @@ static void bfin_reset(void)
> }
> }
>
> +static void bfin_reset(void)
> +{
> + if (ANOMALY_05000353 || ANOMALY_05000386)
> + _bfin_reset();
> + else
> + /* the bootrom checks to see how it was reset and will
> + * automatically perform a software reset for us when
> + * it starts executing boot
> + */
> + asm("raise 1;");
> +}
> +
> __attribute__((weak))
> void native_machine_restart(char *cmd)
> {
> @@ -75,14 +87,10 @@ void machine_restart(char *cmd)
> {
> native_machine_restart(cmd);
> local_irq_disable();
> - if (ANOMALY_05000353 || ANOMALY_05000386)
> - bfin_reset();
> + if (smp_processor_id())
> + smp_call_function((void *)bfin_reset, 0, 1);
> else
> - /* the bootrom checks to see how it was reset and will
> - * automatically perform a software reset for us when
> - * it starts executing boot
> - */
> - asm("raise 1;");
> + bfin_reset();
> }
>
> __attribute__((weak))
> diff --git a/arch/blackfin/kernel/setup.c b/arch/blackfin/kernel/setup.c
> index 71a9a8c..c644d23 100644
> --- a/arch/blackfin/kernel/setup.c
> +++ b/arch/blackfin/kernel/setup.c
> @@ -26,11 +26,10 @@
> #include <asm/blackfin.h>
> #include <asm/cplbinit.h>
> #include <asm/div64.h>
> +#include <asm/cpu.h>
> #include <asm/fixed_code.h>
> #include <asm/early_printk.h>
>
> -static DEFINE_PER_CPU(struct cpu, cpu_devices);
> -
> u16 _bfin_swrst;
> EXPORT_SYMBOL(_bfin_swrst);
>
> @@ -79,29 +78,76 @@ static struct change_member *change_point[2*BFIN_MEMMAP_MAX] __initdata;
> static struct bfin_memmap_entry *overlap_list[BFIN_MEMMAP_MAX] __initdata;
> static struct bfin_memmap_entry new_map[BFIN_MEMMAP_MAX] __initdata;
>
> -void __init bfin_cache_init(void)
> -{
> +DEFINE_PER_CPU(struct blackfin_cpudata, cpu_data);
> +
> #if defined(CONFIG_BFIN_DCACHE) || defined(CONFIG_BFIN_ICACHE)
> - generate_cplb_tables();
> +void __init generate_cplb_tables(void)
> +{
> + unsigned int cpu;
> +
> + /* Generate per-CPU I&D CPLB tables */
> + for (cpu = 0; cpu < num_possible_cpus(); ++cpu)
> + generate_cplb_tables_cpu(cpu);
> +}
> #endif
>
> +void __cpuinit bfin_setup_caches(unsigned int cpu)
> +{
> #ifdef CONFIG_BFIN_ICACHE
> - bfin_icache_init();
> - printk(KERN_INFO "Instruction Cache Enabled\n");
> +#ifdef CONFIG_MPU
> + bfin_icache_init(icplb_tbl[cpu]);
> +#else
> + bfin_icache_init(icplb_tables[cpu]);
> +#endif
> #endif
>
> #ifdef CONFIG_BFIN_DCACHE
> - bfin_dcache_init();
> - printk(KERN_INFO "Data Cache Enabled"
> +#ifdef CONFIG_MPU
> + bfin_dcache_init(dcplb_tbl[cpu]);
> +#else
> + bfin_dcache_init(dcplb_tables[cpu]);
> +#endif
> +#endif
> +
> + /*
> + * In cache coherence emulation mode, we need to have the
> + * D-cache enabled before running any atomic operation which
> + * might involve cache invalidation (e.g. spinlock, rwlock).
> + * So printk's are deferred until then.
> + */
> +#ifdef CONFIG_BFIN_ICACHE
> + printk(KERN_INFO "Instruction Cache Enabled for CPU%u\n", cpu);
> +#endif
> +#ifdef CONFIG_BFIN_DCACHE
> + printk(KERN_INFO "Data Cache Enabled for CPU%u"
> # if defined CONFIG_BFIN_WB
> " (write-back)"
> # elif defined CONFIG_BFIN_WT
> " (write-through)"
> # endif
> - "\n");
> + "\n", cpu);
> #endif
> }
>
> +void __cpuinit bfin_setup_cpudata(unsigned int cpu)
> +{
> + struct blackfin_cpudata *cpudata = &per_cpu(cpu_data, cpu);
> +
> + cpudata->idle = current;
> + cpudata->loops_per_jiffy = loops_per_jiffy;
> + cpudata->cclk = get_cclk();
> + cpudata->imemctl = bfin_read_IMEM_CONTROL();
> + cpudata->dmemctl = bfin_read_DMEM_CONTROL();
> +}
> +
> +void __init bfin_cache_init(void)
> +{
> +#if defined(CONFIG_BFIN_DCACHE) || defined(CONFIG_BFIN_ICACHE)
> + generate_cplb_tables();
> +#endif
> + bfin_setup_caches(0);
> +}
> +
> void __init bfin_relocate_l1_mem(void)
> {
> unsigned long l1_code_length;
> @@ -230,7 +276,7 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
> /* record all known change-points (starting and ending addresses),
> omitting those that are for empty memory regions */
> chgidx = 0;
> - for (i = 0; i < old_nr; i++) {
> + for (i = 0; i < old_nr; i++) {
> if (map[i].size != 0) {
> change_point[chgidx]->addr = map[i].addr;
> change_point[chgidx++]->pentry = &map[i];
> @@ -238,13 +284,13 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
> change_point[chgidx++]->pentry = &map[i];
> }
> }
> - chg_nr = chgidx; /* true number of change-points */
> + chg_nr = chgidx; /* true number of change-points */
>
> /* sort change-point list by memory addresses (low -> high) */
> still_changing = 1;
> - while (still_changing) {
> + while (still_changing) {
> still_changing = 0;
> - for (i = 1; i < chg_nr; i++) {
> + for (i = 1; i < chg_nr; i++) {
> /* if <current_addr> > <last_addr>, swap */
> /* or, if current=<start_addr> & last=<end_addr>, swap */
> if ((change_point[i]->addr < change_point[i-1]->addr) ||
> @@ -261,10 +307,10 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
> }
>
> /* create a new memmap, removing overlaps */
> - overlap_entries = 0; /* number of entries in the overlap table */
> - new_entry = 0; /* index for creating new memmap entries */
> - last_type = 0; /* start with undefined memory type */
> - last_addr = 0; /* start with 0 as last starting address */
> + overlap_entries = 0; /* number of entries in the overlap table */
> + new_entry = 0; /* index for creating new memmap entries */
> + last_type = 0; /* start with undefined memory type */
> + last_addr = 0; /* start with 0 as last starting address */
> /* loop through change-points, determining affect on the new memmap */
> for (chgidx = 0; chgidx < chg_nr; chgidx++) {
> /* keep track of all overlapping memmap entries */
> @@ -286,14 +332,14 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
> if (overlap_list[i]->type > current_type)
> current_type = overlap_list[i]->type;
> /* continue building up new memmap based on this information */
> - if (current_type != last_type) {
> + if (current_type != last_type) {
> if (last_type != 0) {
> new_map[new_entry].size =
> change_point[chgidx]->addr - last_addr;
> /* move forward only if the new size was non-zero */
> if (new_map[new_entry].size != 0)
> if (++new_entry >= BFIN_MEMMAP_MAX)
> - break; /* no more space left for new entries */
> + break; /* no more space left for new entries */
> }
> if (current_type != 0) {
> new_map[new_entry].addr = change_point[chgidx]->addr;
> @@ -303,9 +349,9 @@ static int __init sanitize_memmap(struct bfin_memmap_entry *map, int *pnr_map)
> last_type = current_type;
> }
> }
> - new_nr = new_entry; /* retain count for new entries */
> + new_nr = new_entry; /* retain count for new entries */
>
> - /* copy new mapping into original location */
> + /* copy new mapping into original location */
> memcpy(map, new_map, new_nr*sizeof(struct bfin_memmap_entry));
> *pnr_map = new_nr;
>
> @@ -361,7 +407,6 @@ static __init int parse_memmap(char *arg)
> * - "memmap=XXX[KkmM][@][$]XXX[KkmM]" defines a memory region
> * @ from <start> to <start>+<mem>, type RAM
> * $ from <start> to <start>+<mem>, type RESERVED
> - *
> */
> static __init void parse_cmdline_early(char *cmdline_p)
> {
> @@ -383,12 +428,10 @@ static __init void parse_cmdline_early(char *cmdline_p)
> if (*to != ' ') {
> if (*to == '$'
> || *(to + 1) == '$')
> - reserved_mem_dcache_on =
> - 1;
> + reserved_mem_dcache_on = 1;
> if (*to == '#'
> || *(to + 1) == '#')
> - reserved_mem_icache_on =
> - 1;
> + reserved_mem_icache_on = 1;
> }
> }
> } else if (!memcmp(to, "earlyprintk=", 12)) {
> @@ -417,9 +460,8 @@ static __init void parse_cmdline_early(char *cmdline_p)
> * [_ramend - DMA_UNCACHED_REGION,
> * _ramend]: uncached DMA region
> * [_ramend, physical_mem_end]: memory not managed by kernel
> - *
> */
> -static __init void memory_setup(void)
> +static __init void memory_setup(void)
> {
> #ifdef CONFIG_MTD_UCLINUX
> unsigned long mtd_phys = 0;
> @@ -436,7 +478,7 @@ static __init void memory_setup(void)
> memory_end = _ramend - DMA_UNCACHED_REGION;
>
> #ifdef CONFIG_MPU
> - /* Round up to multiple of 4MB. */
> + /* Round up to multiple of 4MB */
> memory_start = (_ramstart + 0x3fffff) & ~0x3fffff;
> #else
> memory_start = PAGE_ALIGN(_ramstart);
> @@ -616,7 +658,7 @@ static __init void setup_bootmem_allocator(void)
> end_pfn = memory_end >> PAGE_SHIFT;
>
> /*
> - * give all the memory to the bootmap allocator, tell it to put the
> + * give all the memory to the bootmap allocator, tell it to put the
> * boot mem_map at the start of memory.
> */
> bootmap_size = init_bootmem_node(NODE_DATA(0),
> @@ -791,7 +833,11 @@ void __init setup_arch(char **cmdline_p)
> bfin_write_SWRST(_bfin_swrst | DOUBLE_FAULT);
> #endif
>
> +#ifdef CONFIG_SMP
> + if (_bfin_swrst & SWRST_DBL_FAULT_A) {
> +#else
> if (_bfin_swrst & RESET_DOUBLE) {
> +#endif
> printk(KERN_EMERG "Recovering from DOUBLE FAULT event\n");
> #ifdef CONFIG_DEBUG_DOUBLEFAULT
> /* We assume the crashing kernel, and the current symbol table match */
> @@ -835,7 +881,7 @@ void __init setup_arch(char **cmdline_p)
> printk(KERN_INFO "Blackfin Linux support by http://blackfin.uclinux.org/\n");
>
> printk(KERN_INFO "Processor Speed: %lu MHz core clock and %lu MHz System Clock\n",
> - cclk / 1000000, sclk / 1000000);
> + cclk / 1000000, sclk / 1000000);
>
> if (ANOMALY_05000273 && (cclk >> 1) <= sclk)
> printk("\n\n\nANOMALY_05000273: CCLK must be >= 2*SCLK !!!\n\n\n");
> @@ -867,18 +913,21 @@ void __init setup_arch(char **cmdline_p)
> BUG_ON((char *)&safe_user_instruction - (char *)&fixed_code_start
> != SAFE_USER_INSTRUCTION - FIXED_CODE_START);
>
> +#ifdef CONFIG_SMP
> + platform_init_cpus();
> +#endif
> init_exception_vectors();
> - bfin_cache_init();
> + bfin_cache_init(); /* Initialize caches for the boot CPU */
> }
>
> static int __init topology_init(void)
> {
> - int cpu;
> + unsigned int cpu;
> + /* Record CPU-private information for the boot processor. */
> + bfin_setup_cpudata(0);
>
> for_each_possible_cpu(cpu) {
> - struct cpu *c = &per_cpu(cpu_devices, cpu);
> -
> - register_cpu(c, cpu);
> + register_cpu(&per_cpu(cpu_data, cpu).cpu, cpu);
> }
>
> return 0;
> @@ -983,15 +1032,15 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> char *cpu, *mmu, *fpu, *vendor, *cache;
> uint32_t revid;
>
> - u_long cclk = 0, sclk = 0;
> + u_long sclk = 0;
> u_int icache_size = BFIN_ICACHESIZE / 1024, dcache_size = 0, dsup_banks = 0;
> + struct blackfin_cpudata *cpudata = &per_cpu(cpu_data, *(unsigned int *)v);
>
> cpu = CPU;
> mmu = "none";
> fpu = "none";
> revid = bfin_revid();
>
> - cclk = get_cclk();
> sclk = get_sclk();
>
> switch (bfin_read_CHIPID() & CHIPID_MANUFACTURE) {
> @@ -1003,10 +1052,8 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> break;
> }
>
> - seq_printf(m, "processor\t: %d\n"
> - "vendor_id\t: %s\n",
> - *(unsigned int *)v,
> - vendor);
> + seq_printf(m, "processor\t: %d\n" "vendor_id\t: %s\n",
> + *(unsigned int *)v, vendor);
>
> if (CPUID == bfin_cpuid())
> seq_printf(m, "cpu family\t: 0x%04x\n", CPUID);
> @@ -1016,7 +1063,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
>
> seq_printf(m, "model name\t: ADSP-%s %lu(MHz CCLK) %lu(MHz SCLK) (%s)\n"
> "stepping\t: %d\n",
> - cpu, cclk/1000000, sclk/1000000,
> + cpu, cpudata->cclk/1000000, sclk/1000000,
> #ifdef CONFIG_MPU
> "mpu on",
> #else
> @@ -1025,16 +1072,16 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> revid);
>
> seq_printf(m, "cpu MHz\t\t: %lu.%03lu/%lu.%03lu\n",
> - cclk/1000000, cclk%1000000,
> + cpudata->cclk/1000000, cpudata->cclk%1000000,
> sclk/1000000, sclk%1000000);
> seq_printf(m, "bogomips\t: %lu.%02lu\n"
> "Calibration\t: %lu loops\n",
> - (loops_per_jiffy * HZ) / 500000,
> - ((loops_per_jiffy * HZ) / 5000) % 100,
> - (loops_per_jiffy * HZ));
> + (cpudata->loops_per_jiffy * HZ) / 500000,
> + ((cpudata->loops_per_jiffy * HZ) / 5000) % 100,
> + (cpudata->loops_per_jiffy * HZ));
>
> /* Check Cache configutation */
> - switch (bfin_read_DMEM_CONTROL() & (1 << DMC0_P | 1 << DMC1_P)) {
> + switch (cpudata->dmemctl & (1 << DMC0_P | 1 << DMC1_P)) {
> case ACACHE_BSRAM:
> cache = "dbank-A/B\t: cache/sram";
> dcache_size = 16;
> @@ -1058,10 +1105,10 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> }
>
> /* Is it turned on? */
> - if ((bfin_read_DMEM_CONTROL() & (ENDCPLB | DMC_ENABLE)) != (ENDCPLB | DMC_ENABLE))
> + if ((cpudata->dmemctl & (ENDCPLB | DMC_ENABLE)) != (ENDCPLB | DMC_ENABLE))
> dcache_size = 0;
>
> - if ((bfin_read_IMEM_CONTROL() & (IMC | ENICPLB)) != (IMC | ENICPLB))
> + if ((cpudata->imemctl & (IMC | ENICPLB)) != (IMC | ENICPLB))
> icache_size = 0;
>
> seq_printf(m, "cache size\t: %d KB(L1 icache) "
> @@ -1086,8 +1133,13 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> "dcache setup\t: %d Super-banks/%d Sub-banks/%d Ways, %d Lines/Way\n",
> dsup_banks, BFIN_DSUBBANKS, BFIN_DWAYS,
> BFIN_DLINES);
> +#ifdef __ARCH_SYNC_CORE_DCACHE
> + seq_printf(m,
> + "SMP Dcache Flushes\t: %lu\n\n",
> + per_cpu(cpu_data, *(unsigned int *)v).dcache_invld_count);
> +#endif
> #ifdef CONFIG_BFIN_ICACHE_LOCK
> - switch ((bfin_read_IMEM_CONTROL() >> 3) & WAYALL_L) {
> + switch ((cpudata->imemctl >> 3) & WAYALL_L) {
> case WAY0_L:
> seq_printf(m, "Way0 Locked-Down\n");
> break;
> @@ -1137,6 +1189,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> seq_printf(m, "No Ways are locked\n");
> }
> #endif
> + if (*(unsigned int *)v != NR_CPUS-1)
> + return 0;
> +
> +#if L2_LENGTH
> + seq_printf(m, "L2 SRAM\t\t: %dKB\n", L2_LENGTH/0x400);
> +#endif
> seq_printf(m, "board name\t: %s\n", bfin_board_name);
> seq_printf(m, "board memory\t: %ld kB (0x%p -> 0x%p)\n",
> physical_mem_end >> 10, (void *)0, (void *)physical_mem_end);
> @@ -1144,6 +1202,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> ((int)memory_end - (int)_stext) >> 10,
> _stext,
> (void *)memory_end);
> + seq_printf(m, "\n");
>
> return 0;
> }
> diff --git a/arch/blackfin/kernel/time.c b/arch/blackfin/kernel/time.c
> index eb23523..06de2ce 100644
> --- a/arch/blackfin/kernel/time.c
> +++ b/arch/blackfin/kernel/time.c
> @@ -34,9 +34,11 @@
> #include <linux/interrupt.h>
> #include <linux/time.h>
> #include <linux/irq.h>
> +#include <linux/delay.h>
>
> #include <asm/blackfin.h>
> #include <asm/time.h>
> +#include <asm/gptimers.h>
>
> /* This is an NTP setting */
> #define TICK_SIZE (tick_nsec / 1000)
> @@ -46,11 +48,14 @@ static unsigned long gettimeoffset(void);
>
> static struct irqaction bfin_timer_irq = {
> .name = "BFIN Timer Tick",
> +#ifdef CONFIG_IRQ_PER_CPU
> + .flags = IRQF_DISABLED | IRQF_PERCPU,
> +#else
> .flags = IRQF_DISABLED
> +#endif
> };
>
> -static void
> -time_sched_init(irq_handler_t timer_routine)
> +void setup_core_timer(void)
> {
> u32 tcount;
>
> @@ -71,12 +76,41 @@ time_sched_init(irq_handler_t timer_routine)
> CSYNC();
>
> bfin_write_TCNTL(7);
> +}
> +
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> +void setup_system_timer0(void)
> +{
> + /* Power down the core timer, just to play safe. */
> + bfin_write_TCNTL(0);
> +
> + disable_gptimers(TIMER0bit);
> + set_gptimer_status(0, TIMER_STATUS_TRUN0);
> + while (get_gptimer_status(0) & TIMER_STATUS_TRUN0)
> + udelay(10);
> +
> + set_gptimer_config(0, 0x59); /* IRQ enable, periodic, PWM_OUT, SCLKed, OUT PAD disabled */
> + set_gptimer_period(TIMER0_id, get_sclk() / HZ);
> + set_gptimer_pwidth(TIMER0_id, 1);
> + SSYNC();
> + enable_gptimers(TIMER0bit);
> +}
> +#endif
>
> +static void
> +time_sched_init(irqreturn_t(*timer_routine) (int, void *))
> +{
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> + setup_system_timer0();
> +#else
> + setup_core_timer();
> +#endif
> bfin_timer_irq.handler = (irq_handler_t)timer_routine;
> - /* call setup_irq instead of request_irq because request_irq calls
> - * kmalloc which has not been initialized yet
> - */
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> + setup_irq(IRQ_TIMER0, &bfin_timer_irq);
> +#else
> setup_irq(IRQ_CORETMR, &bfin_timer_irq);
> +#endif
> }
>
> /*
> @@ -87,17 +121,23 @@ static unsigned long gettimeoffset(void)
> unsigned long offset;
> unsigned long clocks_per_jiffy;
>
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> + clocks_per_jiffy = bfin_read_TIMER0_PERIOD();
> + offset = bfin_read_TIMER0_COUNTER() / \
> + (((clocks_per_jiffy + 1) * HZ) / USEC_PER_SEC);
> +
> + if ((get_gptimer_status(0) & TIMER_STATUS_TIMIL0) && offset < (100000 / HZ / 2))
> + offset += (USEC_PER_SEC / HZ);
> +#else
> clocks_per_jiffy = bfin_read_TPERIOD();
> - offset =
> - (clocks_per_jiffy -
> - bfin_read_TCOUNT()) / (((clocks_per_jiffy + 1) * HZ) /
> - USEC_PER_SEC);
> + offset = (clocks_per_jiffy - bfin_read_TCOUNT()) / \
> + (((clocks_per_jiffy + 1) * HZ) / USEC_PER_SEC);
>
> /* Check if we just wrapped the counters and maybe missed a tick */
> if ((bfin_read_ILAT() & (1 << IRQ_CORETMR))
> - && (offset < (100000 / HZ / 2)))
> + && (offset < (100000 / HZ / 2)))
> offset += (USEC_PER_SEC / HZ);
> -
> +#endif
> return offset;
> }
>
> @@ -120,34 +160,38 @@ irqreturn_t timer_interrupt(int irq, void *dummy)
> static long last_rtc_update;
>
> write_seqlock(&xtime_lock);
> -
> - do_timer(1);
> -
> - profile_tick(CPU_PROFILING);
> -
> - /*
> - * If we have an externally synchronized Linux clock, then update
> - * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
> - * called as close as possible to 500 ms before the new second starts.
> - */
> -
> - if (ntp_synced() &&
> - xtime.tv_sec > last_rtc_update + 660 &&
> - (xtime.tv_nsec / NSEC_PER_USEC) >=
> - 500000 - ((unsigned)TICK_SIZE) / 2
> - && (xtime.tv_nsec / NSEC_PER_USEC) <=
> - 500000 + ((unsigned)TICK_SIZE) / 2) {
> - if (set_rtc_mmss(xtime.tv_sec) == 0)
> - last_rtc_update = xtime.tv_sec;
> - else
> - /* Do it again in 60s. */
> - last_rtc_update = xtime.tv_sec - 600;
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> + if (get_gptimer_status(0) & TIMER_STATUS_TIMIL0) {
> +#endif
> + do_timer(1);
> +
> +
> + /*
> + * If we have an externally synchronized Linux clock, then update
> + * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
> + * called as close as possible to 500 ms before the new second starts.
> + */
> +
> + if (ntp_synced() &&
> + xtime.tv_sec > last_rtc_update + 660 &&
> + (xtime.tv_nsec / NSEC_PER_USEC) >=
> + 500000 - ((unsigned)TICK_SIZE) / 2
> + && (xtime.tv_nsec / NSEC_PER_USEC) <=
> + 500000 + ((unsigned)TICK_SIZE) / 2) {
> + if (set_rtc_mmss(xtime.tv_sec) == 0)
> + last_rtc_update = xtime.tv_sec;
> + else
> + /* Do it again in 60s. */
> + last_rtc_update = xtime.tv_sec - 600;
> + }
> +#ifdef CONFIG_TICK_SOURCE_SYSTMR0
> + set_gptimer_status(0, TIMER_STATUS_TIMIL0);
> }
> +#endif
> write_sequnlock(&xtime_lock);
>
> -#ifndef CONFIG_SMP
> update_process_times(user_mode(get_irq_regs()));
> -#endif
> + profile_tick(CPU_PROFILING);
>
> return IRQ_HANDLED;
> }
> diff --git a/arch/blackfin/kernel/traps.c b/arch/blackfin/kernel/traps.c
> index bef025b..af7cc43 100644
> --- a/arch/blackfin/kernel/traps.c
> +++ b/arch/blackfin/kernel/traps.c
> @@ -75,16 +75,6 @@ void __init trap_init(void)
> CSYNC();
> }
>
> -/*
> - * Used to save the RETX, SEQSTAT, I/D CPLB FAULT ADDR
> - * values across the transition from exception to IRQ5.
> - * We put these in L1, so they are going to be in a valid
> - * location during exception context
> - */
> -__attribute__((l1_data))
> -unsigned long saved_retx, saved_seqstat,
> - saved_icplb_fault_addr, saved_dcplb_fault_addr;
> -
> static void decode_address(char *buf, unsigned long address)
> {
> #ifdef CONFIG_DEBUG_VERBOSE
> @@ -211,18 +201,18 @@ asmlinkage void double_fault_c(struct pt_regs *fp)
> printk(KERN_EMERG "\n" KERN_EMERG "Double Fault\n");
> #ifdef CONFIG_DEBUG_DOUBLEFAULT_PRINT
> if (((long)fp->seqstat & SEQSTAT_EXCAUSE) == VEC_UNCOV) {
> + unsigned int cpu = smp_processor_id();
> char buf[150];
> - decode_address(buf, saved_retx);
> + decode_address(buf, cpu_pda[cpu].retx);
> printk(KERN_EMERG "While handling exception (EXCAUSE = 0x%x) at %s:\n",
> - (int)saved_seqstat & SEQSTAT_EXCAUSE, buf);
> - decode_address(buf, saved_dcplb_fault_addr);
> + (unsigned int)cpu_pda[cpu].seqstat & SEQSTAT_EXCAUSE, buf);
> + decode_address(buf, cpu_pda[cpu].dcplb_fault_addr);
> printk(KERN_NOTICE " DCPLB_FAULT_ADDR: %s\n", buf);
> - decode_address(buf, saved_icplb_fault_addr);
> + decode_address(buf, cpu_pda[cpu].icplb_fault_addr);
> printk(KERN_NOTICE " ICPLB_FAULT_ADDR: %s\n", buf);
>
> decode_address(buf, fp->retx);
> - printk(KERN_NOTICE "The instruction at %s caused a double exception\n",
> - buf);
> + printk(KERN_NOTICE "The instruction at %s caused a double exception\n", buf);
> } else
> #endif
> {
> @@ -240,6 +230,9 @@ asmlinkage void trap_c(struct pt_regs *fp)
> #ifdef CONFIG_DEBUG_BFIN_HWTRACE_ON
> int j;
> #endif
> +#ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
> + unsigned int cpu = smp_processor_id();
> +#endif
> int sig = 0;
> siginfo_t info;
> unsigned long trapnr = fp->seqstat & SEQSTAT_EXCAUSE;
> @@ -417,7 +410,7 @@ asmlinkage void trap_c(struct pt_regs *fp)
> info.si_code = ILL_CPLB_MULHIT;
> sig = SIGSEGV;
> #ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
> - if (saved_dcplb_fault_addr < FIXED_CODE_START)
> + if (cpu_pda[cpu].dcplb_fault_addr < FIXED_CODE_START)
> verbose_printk(KERN_NOTICE "NULL pointer access\n");
> else
> #endif
> @@ -471,7 +464,7 @@ asmlinkage void trap_c(struct pt_regs *fp)
> info.si_code = ILL_CPLB_MULHIT;
> sig = SIGSEGV;
> #ifdef CONFIG_DEBUG_HUNT_FOR_ZERO
> - if (saved_icplb_fault_addr < FIXED_CODE_START)
> + if (cpu_pda[cpu].icplb_fault_addr < FIXED_CODE_START)
> verbose_printk(KERN_NOTICE "Jump to NULL address\n");
> else
> #endif
> @@ -960,6 +953,7 @@ void dump_bfin_process(struct pt_regs *fp)
> else
> verbose_printk(KERN_NOTICE "COMM= invalid\n");
>
> + printk(KERN_NOTICE "CPU = %d\n", current_thread_info()->cpu);
> if (!((unsigned long)current->mm & 0x3) && (unsigned long)current->mm >= FIXED_CODE_START)
> verbose_printk(KERN_NOTICE "TEXT = 0x%p-0x%p DATA = 0x%p-0x%p\n"
> KERN_NOTICE " BSS = 0x%p-0x%p USER-STACK = 0x%p\n"
> @@ -1053,6 +1047,7 @@ void show_regs(struct pt_regs *fp)
> struct irqaction *action;
> unsigned int i;
> unsigned long flags;
> + unsigned int cpu = smp_processor_id();
>
> verbose_printk(KERN_NOTICE "\n" KERN_NOTICE "SEQUENCER STATUS:\t\t%s\n", print_tainted());
> verbose_printk(KERN_NOTICE " SEQSTAT: %08lx IPEND: %04lx SYSCFG: %04lx\n",
> @@ -1112,9 +1107,9 @@ unlock:
>
> if (((long)fp->seqstat & SEQSTAT_EXCAUSE) &&
> (((long)fp->seqstat & SEQSTAT_EXCAUSE) != VEC_HWERR)) {
> - decode_address(buf, saved_dcplb_fault_addr);
> + decode_address(buf, cpu_pda[cpu].dcplb_fault_addr);
> verbose_printk(KERN_NOTICE "DCPLB_FAULT_ADDR: %s\n", buf);
> - decode_address(buf, saved_icplb_fault_addr);
> + decode_address(buf, cpu_pda[cpu].icplb_fault_addr);
> verbose_printk(KERN_NOTICE "ICPLB_FAULT_ADDR: %s\n", buf);
> }
>
> @@ -1153,20 +1148,21 @@ unlock:
> asmlinkage int sys_bfin_spinlock(int *spinlock)__attribute__((l1_text));
> #endif
>
> -asmlinkage int sys_bfin_spinlock(int *spinlock)
> +static DEFINE_SPINLOCK(bfin_spinlock_lock);
> +
> +asmlinkage int sys_bfin_spinlock(int *p)
> {
> - int ret = 0;
> - int tmp = 0;
> + int ret, tmp = 0;
>
> - local_irq_disable();
> - ret = get_user(tmp, spinlock);
> - if (ret == 0) {
> - if (tmp)
> + spin_lock(&bfin_spinlock_lock); /* This would also hold kernel preemption. */
> + ret = get_user(tmp, p);
> + if (likely(ret == 0)) {
> + if (unlikely(tmp))
> ret = 1;
> - tmp = 1;
> - put_user(tmp, spinlock);
> + else
> + put_user(1, p);
> }
> - local_irq_enable();
> + spin_unlock(&bfin_spinlock_lock);
> return ret;
> }
>
> diff --git a/arch/blackfin/mm/init.c b/arch/blackfin/mm/init.c
> index bc240ab..57d306b 100644
> --- a/arch/blackfin/mm/init.c
> +++ b/arch/blackfin/mm/init.c
> @@ -31,7 +31,8 @@
> #include <linux/bootmem.h>
> #include <linux/uaccess.h>
> #include <asm/bfin-global.h>
> -#include <asm/l1layout.h>
> +#include <asm/pda.h>
> +#include <asm/cplbinit.h>
> #include "blackfin_sram.h"
>
> /*
> @@ -53,6 +54,11 @@ static unsigned long empty_bad_page;
>
> unsigned long empty_zero_page;
>
> +extern unsigned long exception_stack[NR_CPUS][1024];
> +
> +struct blackfin_pda cpu_pda[NR_CPUS];
> +EXPORT_SYMBOL(cpu_pda);
> +
> /*
> * paging_init() continues the virtual memory environment setup which
> * was begun by the code in arch/head.S.
> @@ -98,6 +104,42 @@ void __init paging_init(void)
> }
> }
>
> +asmlinkage void init_pda(void)
> +{
> + unsigned int cpu = raw_smp_processor_id();
> +
> + /* Initialize the PDA fields holding references to other parts
> + of the memory. The content of such memory is still
> + undefined at the time of the call, we are only setting up
> + valid pointers to it. */
> + memset(&cpu_pda[cpu], 0, sizeof(cpu_pda[cpu]));
> +
> + cpu_pda[0].next = &cpu_pda[1];
> + cpu_pda[1].next = &cpu_pda[0];
> +
> + cpu_pda[cpu].ex_stack = exception_stack[cpu + 1];
> +
> +#ifdef CONFIG_MPU
> +#else
> + cpu_pda[cpu].ipdt = ipdt_tables[cpu];
> + cpu_pda[cpu].dpdt = dpdt_tables[cpu];
> +#ifdef CONFIG_CPLB_INFO
> + cpu_pda[cpu].ipdt_swapcount = ipdt_swapcount_tables[cpu];
> + cpu_pda[cpu].dpdt_swapcount = dpdt_swapcount_tables[cpu];
> +#endif
> +#endif
> +
> +#ifdef CONFIG_SMP
> + cpu_pda[cpu].imask = 0x1f;
> +#endif
> +}
> +
> +void __cpuinit reserve_pda(void)
> +{
> + printk(KERN_INFO "PDA for CPU%u reserved at %p\n", smp_processor_id(),
> + &cpu_pda[smp_processor_id()]);
> +}
> +
> void __init mem_init(void)
> {
> unsigned int codek = 0, datak = 0, initk = 0;
> @@ -141,21 +183,13 @@ void __init mem_init(void)
>
> static int __init sram_init(void)
> {
> - unsigned long tmp;
> -
> /* Initialize the blackfin L1 Memory. */
> bfin_sram_init();
>
> - /* Allocate this once; never free it. We assume this gives us a
> - pointer to the start of L1 scratchpad memory; panic if it
> - doesn't. */
> - tmp = (unsigned long)l1sram_alloc(sizeof(struct l1_scratch_task_info));
> - if (tmp != (unsigned long)L1_SCRATCH_TASK_INFO) {
> - printk(KERN_EMERG "mem_init(): Did not get the right address from l1sram_alloc: %08lx != %08lx\n",
> - tmp, (unsigned long)L1_SCRATCH_TASK_INFO);
> - panic("No L1, time to give up\n");
> - }
> -
> + /* Reserve the PDA space for the boot CPU right after we
> + * initialized the scratch memory allocator.
> + */
> + reserve_pda();
> return 0;
> }
> pure_initcall(sram_init);
> diff --git a/arch/blackfin/mm/sram-alloc.c b/arch/blackfin/mm/sram-alloc.c
> index cc6f336..8f82b4c 100644
> --- a/arch/blackfin/mm/sram-alloc.c
> +++ b/arch/blackfin/mm/sram-alloc.c
> @@ -41,8 +41,10 @@
> #include <asm/blackfin.h>
> #include "blackfin_sram.h"
>
> -static spinlock_t l1sram_lock, l1_data_sram_lock, l1_inst_sram_lock;
> -static spinlock_t l2_sram_lock;
> +static DEFINE_PER_CPU(spinlock_t, l1sram_lock) ____cacheline_aligned_in_smp;
> +static DEFINE_PER_CPU(spinlock_t, l1_data_sram_lock) ____cacheline_aligned_in_smp;
> +static DEFINE_PER_CPU(spinlock_t, l1_inst_sram_lock) ____cacheline_aligned_in_smp;
> +static spinlock_t l2_sram_lock ____cacheline_aligned_in_smp;
>
> /* the data structure for L1 scratchpad and DATA SRAM */
> struct sram_piece {
> @@ -52,18 +54,22 @@ struct sram_piece {
> struct sram_piece *next;
> };
>
> -static struct sram_piece free_l1_ssram_head, used_l1_ssram_head;
> +static DEFINE_PER_CPU(struct sram_piece, free_l1_ssram_head);
> +static DEFINE_PER_CPU(struct sram_piece, used_l1_ssram_head);
>
> #if L1_DATA_A_LENGTH != 0
> -static struct sram_piece free_l1_data_A_sram_head, used_l1_data_A_sram_head;
> +static DEFINE_PER_CPU(struct sram_piece, free_l1_data_A_sram_head);
> +static DEFINE_PER_CPU(struct sram_piece, used_l1_data_A_sram_head);
> #endif
>
> #if L1_DATA_B_LENGTH != 0
> -static struct sram_piece free_l1_data_B_sram_head, used_l1_data_B_sram_head;
> +static DEFINE_PER_CPU(struct sram_piece, free_l1_data_B_sram_head);
> +static DEFINE_PER_CPU(struct sram_piece, used_l1_data_B_sram_head);
> #endif
>
> #if L1_CODE_LENGTH != 0
> -static struct sram_piece free_l1_inst_sram_head, used_l1_inst_sram_head;
> +static DEFINE_PER_CPU(struct sram_piece, free_l1_inst_sram_head);
> +static DEFINE_PER_CPU(struct sram_piece, used_l1_inst_sram_head);
> #endif
>
> #if L2_LENGTH != 0
> @@ -75,102 +81,115 @@ static struct kmem_cache *sram_piece_cache;
> /* L1 Scratchpad SRAM initialization function */
> static void __init l1sram_init(void)
> {
> - free_l1_ssram_head.next =
> - kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> - if (!free_l1_ssram_head.next) {
> - printk(KERN_INFO "Failed to initialize Scratchpad data SRAM\n");
> - return;
> + unsigned int cpu;
> + for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
> + per_cpu(free_l1_ssram_head, cpu).next =
> + kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> + if (!per_cpu(free_l1_ssram_head, cpu).next) {
> + printk(KERN_INFO "Fail to initialize Scratchpad data SRAM.\n");
> + return;
> + }
> +
> + per_cpu(free_l1_ssram_head, cpu).next->paddr = (void *)get_l1_scratch_start_cpu(cpu);
> + per_cpu(free_l1_ssram_head, cpu).next->size = L1_SCRATCH_LENGTH;
> + per_cpu(free_l1_ssram_head, cpu).next->pid = 0;
> + per_cpu(free_l1_ssram_head, cpu).next->next = NULL;
> +
> + per_cpu(used_l1_ssram_head, cpu).next = NULL;
> +
> + /* mutex initialize */
> + spin_lock_init(&per_cpu(l1sram_lock, cpu));
> + printk(KERN_INFO "Blackfin Scratchpad data SRAM: %d KB\n",
> + L1_SCRATCH_LENGTH >> 10);
> }
> -
> - free_l1_ssram_head.next->paddr = (void *)L1_SCRATCH_START;
> - free_l1_ssram_head.next->size = L1_SCRATCH_LENGTH;
> - free_l1_ssram_head.next->pid = 0;
> - free_l1_ssram_head.next->next = NULL;
> -
> - used_l1_ssram_head.next = NULL;
> -
> - /* mutex initialize */
> - spin_lock_init(&l1sram_lock);
> -
> - printk(KERN_INFO "Blackfin Scratchpad data SRAM: %d KB\n",
> - L1_SCRATCH_LENGTH >> 10);
> }
>
> static void __init l1_data_sram_init(void)
> {
> + unsigned int cpu;
> #if L1_DATA_A_LENGTH != 0
> - free_l1_data_A_sram_head.next =
> - kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> - if (!free_l1_data_A_sram_head.next) {
> - printk(KERN_INFO "Failed to initialize L1 Data A SRAM\n");
> - return;
> + for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
> + per_cpu(free_l1_data_A_sram_head, cpu).next =
> + kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> + if (!per_cpu(free_l1_data_A_sram_head, cpu).next) {
> + printk(KERN_INFO "Fail to initialize L1 Data A SRAM.\n");
> + return;
> + }
> +
> + per_cpu(free_l1_data_A_sram_head, cpu).next->paddr =
> + (void *)get_l1_data_a_start_cpu(cpu) + (_ebss_l1 - _sdata_l1);
> + per_cpu(free_l1_data_A_sram_head, cpu).next->size =
> + L1_DATA_A_LENGTH - (_ebss_l1 - _sdata_l1);
> + per_cpu(free_l1_data_A_sram_head, cpu).next->pid = 0;
> + per_cpu(free_l1_data_A_sram_head, cpu).next->next = NULL;
> +
> + per_cpu(used_l1_data_A_sram_head, cpu).next = NULL;
> +
> + printk(KERN_INFO "Blackfin L1 Data A SRAM: %d KB (%d KB free)\n",
> + L1_DATA_A_LENGTH >> 10,
> + per_cpu(free_l1_data_A_sram_head, cpu).next->size >> 10);
> }
> -
> - free_l1_data_A_sram_head.next->paddr =
> - (void *)L1_DATA_A_START + (_ebss_l1 - _sdata_l1);
> - free_l1_data_A_sram_head.next->size =
> - L1_DATA_A_LENGTH - (_ebss_l1 - _sdata_l1);
> - free_l1_data_A_sram_head.next->pid = 0;
> - free_l1_data_A_sram_head.next->next = NULL;
> -
> - used_l1_data_A_sram_head.next = NULL;
> -
> - printk(KERN_INFO "Blackfin L1 Data A SRAM: %d KB (%d KB free)\n",
> - L1_DATA_A_LENGTH >> 10,
> - free_l1_data_A_sram_head.next->size >> 10);
> #endif
> #if L1_DATA_B_LENGTH != 0
> - free_l1_data_B_sram_head.next =
> - kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> - if (!free_l1_data_B_sram_head.next) {
> - printk(KERN_INFO "Failed to initialize L1 Data B SRAM\n");
> - return;
> + for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
> + per_cpu(free_l1_data_B_sram_head, cpu).next =
> + kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> + if (!per_cpu(free_l1_data_B_sram_head, cpu).next) {
> + printk(KERN_INFO "Fail to initialize L1 Data B SRAM.\n");
> + return;
> + }
> +
> + per_cpu(free_l1_data_B_sram_head, cpu).next->paddr =
> + (void *)get_l1_data_b_start_cpu(cpu) + (_ebss_b_l1 - _sdata_b_l1);
> + per_cpu(free_l1_data_B_sram_head, cpu).next->size =
> + L1_DATA_B_LENGTH - (_ebss_b_l1 - _sdata_b_l1);
> + per_cpu(free_l1_data_B_sram_head, cpu).next->pid = 0;
> + per_cpu(free_l1_data_B_sram_head, cpu).next->next = NULL;
> +
> + per_cpu(used_l1_data_B_sram_head, cpu).next = NULL;
> +
> + printk(KERN_INFO "Blackfin L1 Data B SRAM: %d KB (%d KB free)\n",
> + L1_DATA_B_LENGTH >> 10,
> + per_cpu(free_l1_data_B_sram_head, cpu).next->size >> 10);
> + /* mutex initialize */
> }
> -
> - free_l1_data_B_sram_head.next->paddr =
> - (void *)L1_DATA_B_START + (_ebss_b_l1 - _sdata_b_l1);
> - free_l1_data_B_sram_head.next->size =
> - L1_DATA_B_LENGTH - (_ebss_b_l1 - _sdata_b_l1);
> - free_l1_data_B_sram_head.next->pid = 0;
> - free_l1_data_B_sram_head.next->next = NULL;
> -
> - used_l1_data_B_sram_head.next = NULL;
> -
> - printk(KERN_INFO "Blackfin L1 Data B SRAM: %d KB (%d KB free)\n",
> - L1_DATA_B_LENGTH >> 10,
> - free_l1_data_B_sram_head.next->size >> 10);
> #endif
>
> - /* mutex initialize */
> - spin_lock_init(&l1_data_sram_lock);
> +#if L1_DATA_A_LENGTH != 0 || L1_DATA_B_LENGTH != 0
> + for (cpu = 0; cpu < num_possible_cpus(); ++cpu)
> + spin_lock_init(&per_cpu(l1_data_sram_lock, cpu));
> +#endif
> }
>
> static void __init l1_inst_sram_init(void)
> {
> #if L1_CODE_LENGTH != 0
> - free_l1_inst_sram_head.next =
> - kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> - if (!free_l1_inst_sram_head.next) {
> - printk(KERN_INFO "Failed to initialize L1 Instruction SRAM\n");
> - return;
> + unsigned int cpu;
> + for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
> + per_cpu(free_l1_inst_sram_head, cpu).next =
> + kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> + if (!per_cpu(free_l1_inst_sram_head, cpu).next) {
> + printk(KERN_INFO "Failed to initialize L1 Instruction SRAM\n");
> + return;
> + }
> +
> + per_cpu(free_l1_inst_sram_head, cpu).next->paddr =
> + (void *)get_l1_code_start_cpu(cpu) + (_etext_l1 - _stext_l1);
> + per_cpu(free_l1_inst_sram_head, cpu).next->size =
> + L1_CODE_LENGTH - (_etext_l1 - _stext_l1);
> + per_cpu(free_l1_inst_sram_head, cpu).next->pid = 0;
> + per_cpu(free_l1_inst_sram_head, cpu).next->next = NULL;
> +
> + per_cpu(used_l1_inst_sram_head, cpu).next = NULL;
> +
> + printk(KERN_INFO "Blackfin L1 Instruction SRAM: %d KB (%d KB free)\n",
> + L1_CODE_LENGTH >> 10,
> + per_cpu(free_l1_inst_sram_head, cpu).next->size >> 10);
> +
> + /* mutex initialize */
> + spin_lock_init(&per_cpu(l1_inst_sram_lock, cpu));
> }
> -
> - free_l1_inst_sram_head.next->paddr =
> - (void *)L1_CODE_START + (_etext_l1 - _stext_l1);
> - free_l1_inst_sram_head.next->size =
> - L1_CODE_LENGTH - (_etext_l1 - _stext_l1);
> - free_l1_inst_sram_head.next->pid = 0;
> - free_l1_inst_sram_head.next->next = NULL;
> -
> - used_l1_inst_sram_head.next = NULL;
> -
> - printk(KERN_INFO "Blackfin L1 Instruction SRAM: %d KB (%d KB free)\n",
> - L1_CODE_LENGTH >> 10,
> - free_l1_inst_sram_head.next->size >> 10);
> #endif
> -
> - /* mutex initialize */
> - spin_lock_init(&l1_inst_sram_lock);
> }
>
> static void __init l2_sram_init(void)
> @@ -179,7 +198,7 @@ static void __init l2_sram_init(void)
> free_l2_sram_head.next =
> kmem_cache_alloc(sram_piece_cache, GFP_KERNEL);
> if (!free_l2_sram_head.next) {
> - printk(KERN_INFO "Failed to initialize L2 SRAM\n");
> + printk(KERN_INFO "Fail to initialize L2 SRAM.\n");
> return;
> }
>
> @@ -200,6 +219,7 @@ static void __init l2_sram_init(void)
> /* mutex initialize */
> spin_lock_init(&l2_sram_lock);
> }
> +
> void __init bfin_sram_init(void)
> {
> sram_piece_cache = kmem_cache_create("sram_piece_cache",
> @@ -353,20 +373,20 @@ int sram_free(const void *addr)
> {
>
> #if L1_CODE_LENGTH != 0
> - if (addr >= (void *)L1_CODE_START
> - && addr < (void *)(L1_CODE_START + L1_CODE_LENGTH))
> + if (addr >= (void *)get_l1_code_start()
> + && addr < (void *)(get_l1_code_start() + L1_CODE_LENGTH))
> return l1_inst_sram_free(addr);
> else
> #endif
> #if L1_DATA_A_LENGTH != 0
> - if (addr >= (void *)L1_DATA_A_START
> - && addr < (void *)(L1_DATA_A_START + L1_DATA_A_LENGTH))
> + if (addr >= (void *)get_l1_data_a_start()
> + && addr < (void *)(get_l1_data_a_start() + L1_DATA_A_LENGTH))
> return l1_data_A_sram_free(addr);
> else
> #endif
> #if L1_DATA_B_LENGTH != 0
> - if (addr >= (void *)L1_DATA_B_START
> - && addr < (void *)(L1_DATA_B_START + L1_DATA_B_LENGTH))
> + if (addr >= (void *)get_l1_data_b_start()
> + && addr < (void *)(get_l1_data_b_start() + L1_DATA_B_LENGTH))
> return l1_data_B_sram_free(addr);
> else
> #endif
> @@ -384,17 +404,20 @@ void *l1_data_A_sram_alloc(size_t size)
> {
> unsigned long flags;
> void *addr = NULL;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1_data_sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);
>
> #if L1_DATA_A_LENGTH != 0
> - addr = _sram_alloc(size, &free_l1_data_A_sram_head,
> - &used_l1_data_A_sram_head);
> + addr = _sram_alloc(size, &per_cpu(free_l1_data_A_sram_head, cpu),
> + &per_cpu(used_l1_data_A_sram_head, cpu));
> #endif
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1_data_sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
> + put_cpu();
>
> pr_debug("Allocated address in l1_data_A_sram_alloc is 0x%lx+0x%lx\n",
> (long unsigned int)addr, size);
> @@ -407,19 +430,22 @@ int l1_data_A_sram_free(const void *addr)
> {
> unsigned long flags;
> int ret;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1_data_sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);
>
> #if L1_DATA_A_LENGTH != 0
> - ret = _sram_free(addr, &free_l1_data_A_sram_head,
> - &used_l1_data_A_sram_head);
> + ret = _sram_free(addr, &per_cpu(free_l1_data_A_sram_head, cpu),
> + &per_cpu(used_l1_data_A_sram_head, cpu));
> #else
> ret = -1;
> #endif
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1_data_sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
> + put_cpu();
>
> return ret;
> }
> @@ -430,15 +456,18 @@ void *l1_data_B_sram_alloc(size_t size)
> #if L1_DATA_B_LENGTH != 0
> unsigned long flags;
> void *addr;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1_data_sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);
>
> - addr = _sram_alloc(size, &free_l1_data_B_sram_head,
> - &used_l1_data_B_sram_head);
> + addr = _sram_alloc(size, &per_cpu(free_l1_data_B_sram_head, cpu),
> + &per_cpu(used_l1_data_B_sram_head, cpu));
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1_data_sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
> + put_cpu();
>
> pr_debug("Allocated address in l1_data_B_sram_alloc is 0x%lx+0x%lx\n",
> (long unsigned int)addr, size);
> @@ -455,15 +484,18 @@ int l1_data_B_sram_free(const void *addr)
> #if L1_DATA_B_LENGTH != 0
> unsigned long flags;
> int ret;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1_data_sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1_data_sram_lock, cpu), flags);
>
> - ret = _sram_free(addr, &free_l1_data_B_sram_head,
> - &used_l1_data_B_sram_head);
> + ret = _sram_free(addr, &per_cpu(free_l1_data_B_sram_head, cpu),
> + &per_cpu(used_l1_data_B_sram_head, cpu));
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1_data_sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1_data_sram_lock, cpu), flags);
> + put_cpu();
>
> return ret;
> #else
> @@ -509,15 +541,18 @@ void *l1_inst_sram_alloc(size_t size)
> #if L1_CODE_LENGTH != 0
> unsigned long flags;
> void *addr;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1_inst_sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1_inst_sram_lock, cpu), flags);
>
> - addr = _sram_alloc(size, &free_l1_inst_sram_head,
> - &used_l1_inst_sram_head);
> + addr = _sram_alloc(size, &per_cpu(free_l1_inst_sram_head, cpu),
> + &per_cpu(used_l1_inst_sram_head, cpu));
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1_inst_sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1_inst_sram_lock, cpu), flags);
> + put_cpu();
>
> pr_debug("Allocated address in l1_inst_sram_alloc is 0x%lx+0x%lx\n",
> (long unsigned int)addr, size);
> @@ -534,15 +569,18 @@ int l1_inst_sram_free(const void *addr)
> #if L1_CODE_LENGTH != 0
> unsigned long flags;
> int ret;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1_inst_sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1_inst_sram_lock, cpu), flags);
>
> - ret = _sram_free(addr, &free_l1_inst_sram_head,
> - &used_l1_inst_sram_head);
> + ret = _sram_free(addr, &per_cpu(free_l1_inst_sram_head, cpu),
> + &per_cpu(used_l1_inst_sram_head, cpu));
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1_inst_sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1_inst_sram_lock, cpu), flags);
> + put_cpu();
>
> return ret;
> #else
> @@ -556,15 +594,18 @@ void *l1sram_alloc(size_t size)
> {
> unsigned long flags;
> void *addr;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1sram_lock, cpu), flags);
>
> - addr = _sram_alloc(size, &free_l1_ssram_head,
> - &used_l1_ssram_head);
> + addr = _sram_alloc(size, &per_cpu(free_l1_ssram_head, cpu),
> + &per_cpu(used_l1_ssram_head, cpu));
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1sram_lock, cpu), flags);
> + put_cpu();
>
> return addr;
> }
> @@ -574,15 +615,18 @@ void *l1sram_alloc_max(size_t *psize)
> {
> unsigned long flags;
> void *addr;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1sram_lock, cpu), flags);
>
> - addr = _sram_alloc_max(&free_l1_ssram_head,
> - &used_l1_ssram_head, psize);
> + addr = _sram_alloc_max(&per_cpu(free_l1_ssram_head, cpu),
> + &per_cpu(used_l1_ssram_head, cpu), psize);
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1sram_lock, cpu), flags);
> + put_cpu();
>
> return addr;
> }
> @@ -592,15 +636,18 @@ int l1sram_free(const void *addr)
> {
> unsigned long flags;
> int ret;
> + unsigned int cpu;
>
> + cpu = get_cpu();
> /* add mutex operation */
> - spin_lock_irqsave(&l1sram_lock, flags);
> + spin_lock_irqsave(&per_cpu(l1sram_lock, cpu), flags);
>
> - ret = _sram_free(addr, &free_l1_ssram_head,
> - &used_l1_ssram_head);
> + ret = _sram_free(addr, &per_cpu(free_l1_ssram_head, cpu),
> + &per_cpu(used_l1_ssram_head, cpu));
>
> /* add mutex operation */
> - spin_unlock_irqrestore(&l1sram_lock, flags);
> + spin_unlock_irqrestore(&per_cpu(l1sram_lock, cpu), flags);
> + put_cpu();
>
> return ret;
> }
> @@ -761,33 +808,36 @@ static int sram_proc_read(char *buf, char **start, off_t offset, int count,
> int *eof, void *data)
> {
> int len = 0;
> + unsigned int cpu;
>
> - if (_sram_proc_read(buf, &len, count, "Scratchpad",
> - &free_l1_ssram_head, &used_l1_ssram_head))
> - goto not_done;
> + for (cpu = 0; cpu < num_possible_cpus(); ++cpu) {
> + if (_sram_proc_read(buf, &len, count, "Scratchpad",
> + &per_cpu(free_l1_ssram_head, cpu), &per_cpu(used_l1_ssram_head, cpu)))
> + goto not_done;
> #if L1_DATA_A_LENGTH != 0
> - if (_sram_proc_read(buf, &len, count, "L1 Data A",
> - &free_l1_data_A_sram_head,
> - &used_l1_data_A_sram_head))
> - goto not_done;
> + if (_sram_proc_read(buf, &len, count, "L1 Data A",
> + &per_cpu(free_l1_data_A_sram_head, cpu),
> + &per_cpu(used_l1_data_A_sram_head, cpu)))
> + goto not_done;
> #endif
> #if L1_DATA_B_LENGTH != 0
> - if (_sram_proc_read(buf, &len, count, "L1 Data B",
> - &free_l1_data_B_sram_head,
> - &used_l1_data_B_sram_head))
> - goto not_done;
> + if (_sram_proc_read(buf, &len, count, "L1 Data B",
> + &per_cpu(free_l1_data_B_sram_head, cpu),
> + &per_cpu(used_l1_data_B_sram_head, cpu)))
> + goto not_done;
> #endif
> #if L1_CODE_LENGTH != 0
> - if (_sram_proc_read(buf, &len, count, "L1 Instruction",
> - &free_l1_inst_sram_head, &used_l1_inst_sram_head))
> - goto not_done;
> + if (_sram_proc_read(buf, &len, count, "L1 Instruction",
> + &per_cpu(free_l1_inst_sram_head, cpu),
> + &per_cpu(used_l1_inst_sram_head, cpu)))
> + goto not_done;
> #endif
> + }
> #if L2_LENGTH != 0
> - if (_sram_proc_read(buf, &len, count, "L2",
> - &free_l2_sram_head, &used_l2_sram_head))
> + if (_sram_proc_read(buf, &len, count, "L2", &free_l2_sram_head,
> + &used_l2_sram_head))
> goto not_done;
> #endif
> -
> *eof = 1;
> not_done:
> return len;
> --
> 1.5.6.3
>
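
The hunks above replace the single global L1 scratchpad allocator state (one lock, one free list, one used list) with per-CPU copies, selected by the CPU number that get_cpu() returns. A minimal userspace sketch of that partitioning idea follows. It is an illustration under stated assumptions, not the kernel code: every demo_* name is hypothetical, a simple bump allocator stands in for the free/used lists, and the locking and preemption control that the patch gets from spin_lock_irqsave() and get_cpu()/put_cpu() are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS   2      /* the patch sets NR_CPUS to 2 for BF561 */
#define DEMO_POOL_SIZE 0x1000 /* same value as L1_SCRATCH_LENGTH */

/* Hypothetical per-CPU pool; the kernel keeps free/used lists instead. */
struct demo_pool {
	unsigned char mem[DEMO_POOL_SIZE];
	size_t used;
};

static struct demo_pool demo_pools[DEMO_NR_CPUS];

/*
 * Allocate from the pool that belongs to `cpu` only. Returns NULL when
 * the CPU number is out of range or that CPU's pool is exhausted.
 */
static void *demo_sram_alloc(unsigned int cpu, size_t size)
{
	struct demo_pool *pool;
	void *addr;

	if (cpu >= DEMO_NR_CPUS || size == 0)
		return NULL;

	pool = &demo_pools[cpu];
	if (size > DEMO_POOL_SIZE - pool->used)
		return NULL;

	addr = &pool->mem[pool->used];
	pool->used += size;
	return addr;
}

/* Exhausting CPU 0's pool must not affect CPU 1's pool. */
void demo_sram_usage(void)
{
	assert(demo_sram_alloc(0, DEMO_POOL_SIZE) != NULL);
	assert(demo_sram_alloc(0, 1) == NULL);
	assert(demo_sram_alloc(1, 16) != NULL);
}
```

Because each CPU only ever touches its own pool, the two cores do not contend for one allocator, which appears to be the motivation for the per_cpu() conversion in the patch.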

2008-11-19 07:47:31

by Bryan Wu

Subject: Re: [PATCH 5/5] Blackfin arch: SMP supporting patchset: some other misc code

Cc: linux-arch
-Bryan

On Tue, Nov 18, 2008 at 5:05 PM, Bryan Wu <[email protected]> wrote:
> From: Graf Yang <[email protected]>
>
> The Blackfin dual-core BF561 processor can support SMP-like features.
> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> This patch extends SMP support to other miscellaneous code.
>
> Signed-off-by: Graf Yang <[email protected]>
> Signed-off-by: Bryan Wu <[email protected]>
> ---
> arch/blackfin/Kconfig | 32 +++++++++++++++++++++-
> arch/blackfin/kernel/vmlinux.lds.S | 4 +-
> arch/blackfin/mach-bf518/include/mach/mem_map.h | 15 ++++++++++
> arch/blackfin/mach-bf527/include/mach/mem_map.h | 15 ++++++++++
> arch/blackfin/mach-bf533/include/mach/mem_map.h | 15 ++++++++++
> arch/blackfin/mach-bf537/include/mach/mem_map.h | 15 ++++++++++
> arch/blackfin/mach-bf538/include/mach/mem_map.h | 15 ++++++++++
> arch/blackfin/mach-bf548/include/mach/mem_map.h | 15 ++++++++++
> 8 files changed, 122 insertions(+), 4 deletions(-)
>
> diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
> index 004c06c..7fc8a51 100644
> --- a/arch/blackfin/Kconfig
> +++ b/arch/blackfin/Kconfig
> @@ -200,6 +200,32 @@ config BF561
>
> endchoice
>
> +config SMP
> + depends on BF561
> + bool "Symmetric multi-processing support"
> + ---help---
> + This enables support for systems with more than one CPU,
> + like the dual core BF561. If you have a system with only one
> + CPU, say N. If you have a system with more than one CPU, say Y.
> +
> + If you don't know what to do here, say N.
> +
> +config NR_CPUS
> + int
> + depends on SMP
> + default 2 if BF561
> +
> +config IRQ_PER_CPU
> + bool
> + depends on SMP
> + default y
> +
> +config TICK_SOURCE_SYSTMR0
> + bool
> + select BFIN_GPTIMERS
> + depends on SMP
> + default y
> +
> config BF_REV_MIN
> int
> default 0 if (BF51x || BF52x || BF54x)
> @@ -502,6 +528,7 @@ source kernel/Kconfig.hz
>
> config GENERIC_TIME
> bool "Generic time"
> + depends on !SMP
> default y
>
> config GENERIC_CLOCKEVENTS
> @@ -576,6 +603,7 @@ endmenu
>
>
> menu "Blackfin Kernel Optimizations"
> + depends on !SMP
>
> comment "Memory Optimizations"
>
> @@ -738,7 +766,6 @@ config BFIN_INS_LOWOVERHEAD
>
> endmenu
>
> -
> choice
> prompt "Kernel executes from"
> help
> @@ -804,7 +831,8 @@ config BFIN_ICACHE_LOCK
> choice
> prompt "Policy"
> depends on BFIN_DCACHE
> - default BFIN_WB
> + default BFIN_WB if !SMP
> + default BFIN_WT if SMP
> config BFIN_WB
> bool "Write back"
> help
> diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
> index 7d12c66..2a48535 100644
> --- a/arch/blackfin/kernel/vmlinux.lds.S
> +++ b/arch/blackfin/kernel/vmlinux.lds.S
> @@ -109,7 +109,7 @@ SECTIONS
> #endif
>
> DATA_DATA
> - *(.data.*)
> + *(.data)
> CONSTRUCTORS
>
> /* make sure the init_task is aligned to the
> @@ -161,6 +161,7 @@ SECTIONS
> *(.con_initcall.init)
> ___con_initcall_end = .;
> }
> + PERCPU(4)
> SECURITY_INIT
> .init.ramfs :
> {
> @@ -236,7 +237,6 @@ SECTIONS
> . = ALIGN(4);
> __ebss_l2 = .;
> }
> -
> /* Force trailing alignment of our init section so that when we
> * free our init memory, we don't leave behind a partial page.
> */
> diff --git a/arch/blackfin/mach-bf518/include/mach/mem_map.h b/arch/blackfin/mach-bf518/include/mach/mem_map.h
> index 10f678f..ac95d33 100644
> --- a/arch/blackfin/mach-bf518/include/mach/mem_map.h
> +++ b/arch/blackfin/mach-bf518/include/mach/mem_map.h
> @@ -99,4 +99,19 @@
> #define L1_SCRATCH_START 0xFFB00000
> #define L1_SCRATCH_LENGTH 0x1000
>
> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> +#define get_l1_scratch_start() L1_SCRATCH_START
> +#define get_l1_code_start() L1_CODE_START
> +#define get_l1_data_a_start() L1_DATA_A_START
> +#define get_l1_data_b_start() L1_DATA_B_START
> +
> +#define GET_PDA_SAFE(preg) \
> + preg.l = _cpu_pda; \
> + preg.h = _cpu_pda;
> +
> +#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
> +
> #endif /* _MEM_MAP_518_H_ */
> diff --git a/arch/blackfin/mach-bf527/include/mach/mem_map.h b/arch/blackfin/mach-bf527/include/mach/mem_map.h
> index ef46dc9..bd7fe0f 100644
> --- a/arch/blackfin/mach-bf527/include/mach/mem_map.h
> +++ b/arch/blackfin/mach-bf527/include/mach/mem_map.h
> @@ -99,4 +99,19 @@
> #define L1_SCRATCH_START 0xFFB00000
> #define L1_SCRATCH_LENGTH 0x1000
>
> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> +#define get_l1_scratch_start() L1_SCRATCH_START
> +#define get_l1_code_start() L1_CODE_START
> +#define get_l1_data_a_start() L1_DATA_A_START
> +#define get_l1_data_b_start() L1_DATA_B_START
> +
> +#define GET_PDA_SAFE(preg) \
> + preg.l = _cpu_pda; \
> + preg.h = _cpu_pda;
> +
> +#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
> +
> #endif /* _MEM_MAP_527_H_ */
> diff --git a/arch/blackfin/mach-bf533/include/mach/mem_map.h b/arch/blackfin/mach-bf533/include/mach/mem_map.h
> index 581fc6e..d5eaef2 100644
> --- a/arch/blackfin/mach-bf533/include/mach/mem_map.h
> +++ b/arch/blackfin/mach-bf533/include/mach/mem_map.h
> @@ -168,4 +168,19 @@
> #define L1_SCRATCH_START 0xFFB00000
> #define L1_SCRATCH_LENGTH 0x1000
>
> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> +#define get_l1_scratch_start() L1_SCRATCH_START
> +#define get_l1_code_start() L1_CODE_START
> +#define get_l1_data_a_start() L1_DATA_A_START
> +#define get_l1_data_b_start() L1_DATA_B_START
> +
> +#define GET_PDA_SAFE(preg) \
> + preg.l = _cpu_pda; \
> + preg.h = _cpu_pda;
> +
> +#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
> +
> #endif /* _MEM_MAP_533_H_ */
> diff --git a/arch/blackfin/mach-bf537/include/mach/mem_map.h b/arch/blackfin/mach-bf537/include/mach/mem_map.h
> index 5078b66..be4de76 100644
> --- a/arch/blackfin/mach-bf537/include/mach/mem_map.h
> +++ b/arch/blackfin/mach-bf537/include/mach/mem_map.h
> @@ -176,4 +176,19 @@
> #define L1_SCRATCH_START 0xFFB00000
> #define L1_SCRATCH_LENGTH 0x1000
>
> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> +#define get_l1_scratch_start() L1_SCRATCH_START
> +#define get_l1_code_start() L1_CODE_START
> +#define get_l1_data_a_start() L1_DATA_A_START
> +#define get_l1_data_b_start() L1_DATA_B_START
> +
> +#define GET_PDA_SAFE(preg) \
> + preg.l = _cpu_pda; \
> + preg.h = _cpu_pda;
> +
> +#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
> +
> #endif /* _MEM_MAP_537_H_ */
> diff --git a/arch/blackfin/mach-bf538/include/mach/mem_map.h b/arch/blackfin/mach-bf538/include/mach/mem_map.h
> index d65d430..c134057 100644
> --- a/arch/blackfin/mach-bf538/include/mach/mem_map.h
> +++ b/arch/blackfin/mach-bf538/include/mach/mem_map.h
> @@ -104,4 +104,19 @@
> #define L1_SCRATCH_START 0xFFB00000
> #define L1_SCRATCH_LENGTH 0x1000
>
> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> +#define get_l1_scratch_start() L1_SCRATCH_START
> +#define get_l1_code_start() L1_CODE_START
> +#define get_l1_data_a_start() L1_DATA_A_START
> +#define get_l1_data_b_start() L1_DATA_B_START
> +
> +#define GET_PDA_SAFE(preg) \
> + preg.l = _cpu_pda; \
> + preg.h = _cpu_pda;
> +
> +#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
> +
> #endif /* _MEM_MAP_538_H_ */
> diff --git a/arch/blackfin/mach-bf548/include/mach/mem_map.h b/arch/blackfin/mach-bf548/include/mach/mem_map.h
> index a222842..361eb0e 100644
> --- a/arch/blackfin/mach-bf548/include/mach/mem_map.h
> +++ b/arch/blackfin/mach-bf548/include/mach/mem_map.h
> @@ -108,4 +108,19 @@
> #define L1_SCRATCH_START 0xFFB00000
> #define L1_SCRATCH_LENGTH 0x1000
>
> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> +#define get_l1_scratch_start() L1_SCRATCH_START
> +#define get_l1_code_start() L1_CODE_START
> +#define get_l1_data_a_start() L1_DATA_A_START
> +#define get_l1_data_b_start() L1_DATA_B_START
> +
> +#define GET_PDA_SAFE(preg) \
> + preg.l = _cpu_pda; \
> + preg.h = _cpu_pda;
> +
> +#define GET_PDA(preg, dreg) GET_PDA_SAFE(preg)
> +
> #endif/* _MEM_MAP_548_H_ */
> --
> 1.5.6.3
>

2008-11-19 07:52:35

by gyang

Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code


On Tue, 2008-11-18 at 22:56 -0800, Andrew Morton wrote:
> On Tue, 18 Nov 2008 17:05:05 +0800 Bryan Wu <[email protected]> wrote:
>
> > From: Graf Yang <[email protected]>
> >
> > The Blackfin dual-core BF561 processor can support SMP-like features.
> > https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
> >
> > This patch extends SMP support to the Blackfin header files
> > and common machine code.
> >
> >
> > ...
> >
> > +#define atomic_add_unless(v, a, u) \
> > +({ \
> > + int c, old; \
> > + c = atomic_read(v); \
> > + while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
> > + c = old; \
> > + c != (u); \
> > +})
>
> The macro references its args multiple times and will do weird or
> inefficient things when called with expressions which have
> side-effects, or which do slow things.
>
> >
> > ...
> >
> > +#include <asm/system.h> /* save_flags */
> > +
> > +static inline void set_bit(int nr, volatile unsigned long *addr)
> > {
> > int *a = (int *)addr;
> > int mask;
> > @@ -57,21 +91,23 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
> > a += nr >> 5;
> > mask = 1 << (nr & 0x1f);
> > local_irq_save(flags);
> > - *a &= ~mask;
> > + *a |= mask;
>
> I think you just broke clear_bit(). Maybe I'm misreading the diff.
OK, we have corrected it in our own tree.
>
> > local_irq_restore(flags);
> > }
> >
> >
> > ...
> >
> > +#define smp_mb__before_clear_bit() barrier()
> > +#define smp_mb__after_clear_bit() barrier()
> > +
> > +static inline void __set_bit(int nr, volatile unsigned long *addr)
> > +{
> > + int *a = (int *)addr;
> > + int mask;
> > +
> > + a += nr >> 5;
> > + mask = 1 << (nr & 0x1f);
> > + *a |= mask;
> > +}
> > +
> > +static inline void __clear_bit(int nr, volatile unsigned long *addr)
> > +{
> > + int *a = (int *)addr;
> > + int mask;
> > +
> > + a += nr >> 5;
> > + mask = 1 << (nr & 0x1f);
> > + *a &= ~mask;
> > +}
> > +
> > +static inline void __change_bit(int nr, volatile unsigned long *addr)
> > +{
> > + int mask;
> > + unsigned long *ADDR = (unsigned long *)addr;
> > +
> > + ADDR += nr >> 5;
> > + mask = 1 << (nr & 31);
> > + *ADDR ^= mask;
> > +}
>
> I'm surprised there isn't any generic code which can be used for the above.
>
> >
> > ...
> >
>
> Gad what a lot of code. I don't think I have time to read it all, sorry.
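
Andrew's objection to atomic_add_unless() is that the macro body mentions `v`, `a` and `u` more than once (`u`, for example, appears in the loop condition and again in the final `c != (u)`), so an argument with side effects runs more than once. One way to avoid this is a static inline function. The sketch below is a userspace approximation that assumes C11 <stdatomic.h> in place of the kernel's atomic_t, atomic_read() and atomic_cmpxchg(); the demo_* names are hypothetical and this is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add `a` to *v unless *v equals `u`. Returns true if the add happened.
 * Being a function, it evaluates each argument exactly once.
 */
static inline bool demo_atomic_add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, `c` is refreshed with the current value of *v. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return true;
	}
	return false;
}

static int demo_calls;

/* Hypothetical argument expression with a side effect. */
static int demo_forbidden_value(void)
{
	demo_calls++;
	return 0;
}

void demo_atomic_usage(void)
{
	atomic_int v = 3;
	bool added = demo_atomic_add_unless(&v, 2, demo_forbidden_value());

	assert(added);
	assert(atomic_load(&v) == 5);
	assert(demo_calls == 1);	/* evaluated once, not once per use */
}
```

With the macro form, the same call would invoke demo_forbidden_value() at least twice, which is the "weird or inefficient" behaviour the review comment warns about.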

2008-11-19 08:11:39

by gyang

Subject: Re: [PATCH 1/5] Blackfin arch: SMP supporting patchset: BF561 related code


On Wed, 2008-11-19 at 15:39 +0800, Bryan Wu wrote:
> On Wed, Nov 19, 2008 at 2:56 PM, Andrew Morton
> <[email protected]> wrote:
> > On Tue, 18 Nov 2008 17:05:04 +0800 Bryan Wu <[email protected]> wrote:
> >
> >> From: Graf Yang <[email protected]>
> >>
> >> The Blackfin dual-core BF561 processor can support SMP-like features.
> >> https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
> >>
> >> This patch extends SMP support to the BF561 kernel code.
> >>
> >>
> >> ...
> >>
> >> --- a/arch/blackfin/mach-bf561/include/mach/mem_map.h
> >> +++ b/arch/blackfin/mach-bf561/include/mach/mem_map.h
> >> @@ -85,4 +85,124 @@
> >> #define L1_SCRATCH_START COREA_L1_SCRATCH_START
> >> #define L1_SCRATCH_LENGTH 0x1000
> >>
> >> +#ifndef __ASSEMBLY__
> >> +
> >> +#ifdef CONFIG_SMP
> >> +
> >> +#define get_l1_scratch_start_cpu(cpu) \
> >> + ({ unsigned long __addr; \
> >> + __addr = (cpu) ? COREB_L1_SCRATCH_START : COREA_L1_SCRATCH_START;\
> >> + __addr; })
> >> +
> >> +#define get_l1_code_start_cpu(cpu) \
> >> + ({ unsigned long __addr; \
> >> + __addr = (cpu) ? COREB_L1_CODE_START : COREA_L1_CODE_START; \
> >> + __addr; })
> >> +
> >> +#define get_l1_data_a_start_cpu(cpu) \
> >> + ({ unsigned long __addr; \
> >> + __addr = (cpu) ? COREB_L1_DATA_A_START : COREA_L1_DATA_A_START;\
> >> + __addr; })
> >> +
> >> +#define get_l1_data_b_start_cpu(cpu) \
> >> + ({ unsigned long __addr; \
> >> + __addr = (cpu) ? COREB_L1_DATA_B_START : COREA_L1_DATA_B_START;\
> >> + __addr; })
> >> +
> >> +#define get_l1_scratch_start() get_l1_scratch_start_cpu(blackfin_core_id())
> >> +#define get_l1_code_start() get_l1_code_start_cpu(blackfin_core_id())
> >> +#define get_l1_data_a_start() get_l1_data_a_start_cpu(blackfin_core_id())
> >> +#define get_l1_data_b_start() get_l1_data_b_start_cpu(blackfin_core_id())
> >> +
> >> +#else /* !CONFIG_SMP */
> >> +#define get_l1_scratch_start_cpu(cpu) L1_SCRATCH_START
> >> +#define get_l1_code_start_cpu(cpu) L1_CODE_START
> >> +#define get_l1_data_a_start_cpu(cpu) L1_DATA_A_START
> >> +#define get_l1_data_b_start_cpu(cpu) L1_DATA_B_START
> >> +#define get_l1_scratch_start() L1_SCRATCH_START
> >> +#define get_l1_code_start() L1_CODE_START
> >> +#define get_l1_data_a_start() L1_DATA_A_START
> >> +#define get_l1_data_b_start() L1_DATA_B_START
> >> +#endif /* !CONFIG_SMP */
> >
> > grumble. These didn't need to be implemented as macros and hence
> > shouldn't have been.
> >
> > Example:
> >
> > int cpu = smp_processor_id();
> > get_l1_scratch_start_cpu(cpu);
> >
> > that code should generate unused variable warnings on CONFIG_SMP=n. If
> > it doesn't, you got lucky, because it _should_.
> >
> > Also
> >
> > int cpu = smp_processor_id();
> > get_l1_scratch_start_cpu(pcu);
> >
> > will happily compile and run with CONFIG_SMP=n.
> >
> >
> > macros=bad,bad,bad.
> >
>
> Yes, I also prefer inline functions rather than macros here.
> Right, Graf?
OK!

>
> >>
> >> ...
> >>
> >> --- /dev/null
> >> +++ b/arch/blackfin/mach-bf561/smp.c
> >> @@ -0,0 +1,182 @@
> >> +/*
> >> + * File: arch/blackfin/mach-bf561/smp.c
> >> + * Author: Philippe Gerum <[email protected]>
> >> + *
> >> + * Copyright 2007 Analog Devices Inc.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License, or
> >> + * (at your option) any later version.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see the file COPYING, or write
> >> + * to the Free Software Foundation, Inc.,
> >> + * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> >> + */
> >> +
> >> +#include <linux/init.h>
> >> +#include <linux/kernel.h>
> >> +#include <linux/sched.h>
> >> +#include <linux/delay.h>
> >> +#include <asm/smp.h>
> >> +#include <asm/dma.h>
> >> +
> >> +#define COREB_SRAM_BASE 0xff600000
> >> +#define COREB_SRAM_SIZE 0x4000
> >> +
> >> +extern char coreb_trampoline_start, coreb_trampoline_end;
> >
> > OK, these are defined in .S and we do often put declarations for such
> > things in .c rather than in .h. But I think it's better to put them in
> > .h anyway, to avoid possibly duplicated declarations in the future.
> >
>
> Oh, I suggested that Graf run checkpatch.pl to find such issues before I
> sent out this patch.
> Should this issue have been caught by checkpatch.pl?
OK, I will remove them.
>
>
> >> +static DEFINE_SPINLOCK(boot_lock);
> >> +
> >> +static cpumask_t cpu_callin_map;
> >> +
> >>
> >> ...
> >>
> >> +void __cpuinit platform_secondary_init(unsigned int cpu)
> >> +{
> >> + local_irq_disable();
> >> +
> >> + /* Clone setup for peripheral interrupt sources from CoreA. */
> >> + bfin_write_SICB_IMASK0(bfin_read_SICA_IMASK0());
> >> + bfin_write_SICB_IMASK1(bfin_read_SICA_IMASK1());
> >> + SSYNC();
> >> +
> >> + /* Clone setup for IARs from CoreA. */
> >> + bfin_write_SICB_IAR0(bfin_read_SICA_IAR0());
> >> + bfin_write_SICB_IAR1(bfin_read_SICA_IAR1());
> >> + bfin_write_SICB_IAR2(bfin_read_SICA_IAR2());
> >> + bfin_write_SICB_IAR3(bfin_read_SICA_IAR3());
> >> + bfin_write_SICB_IAR4(bfin_read_SICA_IAR4());
> >> + bfin_write_SICB_IAR5(bfin_read_SICA_IAR5());
> >> + bfin_write_SICB_IAR6(bfin_read_SICA_IAR6());
> >> + bfin_write_SICB_IAR7(bfin_read_SICA_IAR7());
> >> + SSYNC();
> >> +
> >> + local_irq_enable();
> >> +
> >> + /* Calibrate loops per jiffy value. */
> >> + calibrate_delay();
> >> +
> >> + /* Store CPU-private information to the cpu_data array. */
> >> + bfin_setup_cpudata(cpu);
> >> +
> >> + /* We are done with local CPU inits, unblock the boot CPU. */
> >> + cpu_set(cpu, cpu_callin_map);
> >> + spin_lock(&boot_lock);
> >> + spin_unlock(&boot_lock);
> >
> > Is this spin_lock()+spin_unlock() supposed to block until the secondary
> > CPU is running? If so, I don't think it works.
> >
>
> We can remove these two spin_lock()/spin_unlock() lines and it still works.
> But we may add some operations between spin_lock() and spin_unlock()
> here in the future, so we'd like to keep them.
>
> P.S. I am also forwarding this patch to linux-arch.
>
> Thanks
> -Bryan
>
> >> +}
> >> +
> >>
> >> ...
> >>
> >
> >
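
The "macros=bad" point above can be illustrated with static inline functions: even in the uniprocessor variant the argument stays a real, type-checked parameter, so a misspelled `pcu` fails to compile and a correctly spelled `cpu` counts as used. The sketch below is hypothetical. The demo_* names are invented, DEMO_SMP stands in for CONFIG_SMP, and the two addresses are illustrative values rather than something taken from this patch.

```c
#include <assert.h>

/* Illustrative values only; the real constants come from mem_map.h. */
#define DEMO_COREA_L1_SCRATCH_START 0xFFB00000UL
#define DEMO_COREB_L1_SCRATCH_START 0xFF700000UL

#define DEMO_SMP 1	/* stand-in for CONFIG_SMP; remove to build the UP variant */

#ifdef DEMO_SMP
static inline unsigned long demo_get_l1_scratch_start_cpu(unsigned int cpu)
{
	return cpu ? DEMO_COREB_L1_SCRATCH_START : DEMO_COREA_L1_SCRATCH_START;
}
#else
static inline unsigned long demo_get_l1_scratch_start_cpu(unsigned int cpu)
{
	(void)cpu;	/* still a real, type-checked parameter */
	return DEMO_COREA_L1_SCRATCH_START;
}
#endif

void demo_scratch_usage(void)
{
	unsigned int cpu = 1;

	/* Writing `pcu` here would fail to compile in both variants. */
	assert(demo_get_l1_scratch_start_cpu(cpu) == DEMO_COREB_L1_SCRATCH_START);
}
```

The compiler can still inline both variants, so this form should cost nothing at run time compared with the macros.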

2008-11-19 08:20:39

by Bryan Wu

Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

On Wed, Nov 19, 2008 at 3:52 PM, gyang <[email protected]> wrote:
>
> On Tue, 2008-11-18 at 22:56 -0800, Andrew Morton wrote:
>> On Tue, 18 Nov 2008 17:05:05 +0800 Bryan Wu <[email protected]> wrote:
>>
>> > From: Graf Yang <[email protected]>
>> >
>> > The Blackfin dual-core BF561 processor can support SMP-like features.
>> > https://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>> >
>> > This patch extends SMP support to the Blackfin header files
>> > and common machine code.
>> >
>> >
>> > ...
>> >
>> > +#define atomic_add_unless(v, a, u) \
>> > +({ \
>> > + int c, old; \
>> > + c = atomic_read(v); \
>> > + while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
>> > + c = old; \
>> > + c != (u); \
>> > +})
>>
>> The macro references its args multiple times and will do weird or
>> inefficient things when called with expressions which have
>> side-effects, or which do slow things.
>>
>> >
>> > ...
>> >
>> > +#include <asm/system.h> /* save_flags */
>> > +
>> > +static inline void set_bit(int nr, volatile unsigned long *addr)
>> > {
>> > int *a = (int *)addr;
>> > int mask;
>> > @@ -57,21 +91,23 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
>> > a += nr >> 5;
>> > mask = 1 << (nr & 0x1f);
>> > local_irq_save(flags);
>> > - *a &= ~mask;
>> > + *a |= mask;
>>
>> I think you just broke clear_bit(). Maybe I'm misreading the diff.
> OK, We have corrected it on our own tree.

Both the code and the patch are correct. The diff output is confusing, so
we all misread it.

-Bryan

>>
>> > local_irq_restore(flags);
>> > }
>> >
>> >
>> > ...
>> >
>> > +#define smp_mb__before_clear_bit() barrier()
>> > +#define smp_mb__after_clear_bit() barrier()
>> > +
>> > +static inline void __set_bit(int nr, volatile unsigned long *addr)
>> > +{
>> > + int *a = (int *)addr;
>> > + int mask;
>> > +
>> > + a += nr >> 5;
>> > + mask = 1 << (nr & 0x1f);
>> > + *a |= mask;
>> > +}
>> > +
>> > +static inline void __clear_bit(int nr, volatile unsigned long *addr)
>> > +{
>> > + int *a = (int *)addr;
>> > + int mask;
>> > +
>> > + a += nr >> 5;
>> > + mask = 1 << (nr & 0x1f);
>> > + *a &= ~mask;
>> > +}
>> > +
>> > +static inline void __change_bit(int nr, volatile unsigned long *addr)
>> > +{
>> > + int mask;
>> > + unsigned long *ADDR = (unsigned long *)addr;
>> > +
>> > + ADDR += nr >> 5;
>> > + mask = 1 << (nr & 31);
>> > + *ADDR ^= mask;
>> > +}
>>
>> I'm surprised there isn't any generic code which can be used for the above.
>>
>> >
>> > ...
>> >
>>
>> Gad what a lot of code. I don't think I have time to read it all, sorry.
>

2008-11-19 13:51:18

by Mike Frysinger

Subject: Re: [PATCH 0/5] Blackfin SMP like patchset

On Wed, Nov 19, 2008 at 01:56, Andrew Morton wrote:
> On Tue, 18 Nov 2008 17:05:03 +0800 Bryan Wu wrote:
>> We provide the SMP like functions for our Blackfin dual core processor
>> BF561 for almost 1 year. And after a long time developing, debugging and
>> internal review, we'd like to post them to LKML for other maintainer
>> review.
>>
>> Please find our wiki page about this SMP like patches:
>> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
>
> Would prefer that changelogs be self-contained, please. Kernel
> changelogs are for ever, and I doubt if that page will be there in 20
> years time.

that isn't a changelog though, it's documentation. perhaps we should
put together a reduced version of the page for
Documentation/blackfin/smp-like.txt ...
-mike

2008-11-20 13:51:00

by Mike Frysinger

Subject: Re: [PATCH 2/5] Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code

On Wed, Nov 19, 2008 at 02:42, Bryan Wu wrote:
> On Wed, Nov 19, 2008 at 2:56 PM, Andrew Morton wrote:
>> On Tue, 18 Nov 2008 17:05:05 +0800 Bryan Wu wrote:
>>> From: Graf Yang <[email protected]>
>>> +#define smp_mb__before_clear_bit() barrier()
>>> +#define smp_mb__after_clear_bit() barrier()
>>> +
>>> +static inline void __set_bit(int nr, volatile unsigned long *addr)
>>> +{
>>> + int *a = (int *)addr;
>>> + int mask;
>>> +
>>> + a += nr >> 5;
>>> + mask = 1 << (nr & 0x1f);
>>> + *a |= mask;
>>> +}
>>> +
>>> +static inline void __clear_bit(int nr, volatile unsigned long *addr)
>>> +{
>>> + int *a = (int *)addr;
>>> + int mask;
>>> +
>>> + a += nr >> 5;
>>> + mask = 1 << (nr & 0x1f);
>>> + *a &= ~mask;
>>> +}
>>> +
>>> +static inline void __change_bit(int nr, volatile unsigned long *addr)
>>> +{
>>> + int mask;
>>> + unsigned long *ADDR = (unsigned long *)addr;
>>> +
>>> + ADDR += nr >> 5;
>>> + mask = 1 << (nr & 31);
>>> + *ADDR ^= mask;
>>> +}
>>
>> I'm surprised there isn't any generic code which can be used for the above.
>>
>
> As Nick said, include/asm-generic/bitops/non-atomic.h is the generic code.
> We will try it.

the Blackfin ISA provides explicit bit operations. is there something
special about these C-level bit operations that prevents us from using
them?

PRM:
"BITCLR" on page 13-2
"BITSET" on page 13-4
"BITTGL" on page 13-6
-mike
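
For comparison, the generic non-atomic helpers discussed above split a bit number into a word index and a mask within that word. The following userspace sketch is loosely modeled on that approach. It is not a copy of include/asm-generic/bitops/non-atomic.h, the demo_* names are hypothetical, and it says nothing about whether the compiler would emit BITSET, BITCLR or BITTGL for it on Blackfin.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BIT_MASK(nr)  (1UL << ((nr) % DEMO_BITS_PER_LONG))
#define DEMO_BIT_WORD(nr)  ((nr) / DEMO_BITS_PER_LONG)

/* Non-atomic: callers must provide their own exclusion. */
static inline void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[DEMO_BIT_WORD(nr)] |= DEMO_BIT_MASK(nr);
}

static inline void demo_clear_bit(unsigned int nr, unsigned long *addr)
{
	addr[DEMO_BIT_WORD(nr)] &= ~DEMO_BIT_MASK(nr);
}

static inline void demo_change_bit(unsigned int nr, unsigned long *addr)
{
	addr[DEMO_BIT_WORD(nr)] ^= DEMO_BIT_MASK(nr);
}

static inline int demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[DEMO_BIT_WORD(nr)] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

void demo_bitops_usage(void)
{
	unsigned long map[2] = { 0, 0 };

	/* 1UL << 31 is well defined; 1 << 31 on a 32-bit int is not. */
	demo_set_bit(31, map);
	assert(demo_test_bit(31, map));
	demo_change_bit(31, map);
	assert(map[0] == 0);
}
```

One difference from the quoted patch is the unsigned long mask: the patch computes `1 << (nr & 0x1f)` in a signed int, which is undefined behaviour in C when nr is 31 on a 32-bit int.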