2009-01-21 08:47:08

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET linux-2.6-x86:core/percpu] x86: misc clean up and unify x86_32/64 code paths


Hello,

This patchset contains the following 10 patches mostly unifying 32 and
64bit code paths. 0001 and 0007-0010 are from me and 0002-0006 are
from Brian Gerst.

0001-x86-update-canary-handling-during-switch.patch
0002-x86-clean-up-gdt_page-definition.patch
0003-x86-fix-percpu_write-with-64-bit-constants.patch
0004-x86-set-fs-to-__KERNEL_PERCPU-unconditionally-for.patch
0005-x86-merge-mmu_context.h.patch
0006-x86-merge-irq_regs.h.patch
0007-x86-uv-cleanup.patch
0008-x86-prepare-for-tlb-merge.patch
0009-x86-make-x86_32-use-tlb_64.c.patch
0010-x86-rename-tlb_64.c-to-tlb.c.patch

0001-0002 clean up after previous percpu changes. 0003 improves
percpu op slightly. 0004 simplifies x86_32 init path. 0005-0010
unifies mmu_context.h, irq_regs.h and tlb.c, which not only simplifies
the source code but also improves performance by applying
optimizations implemented for either bitness to the other one while
unifying.

The following git tree contains the above changes on top of
core/percpu[1]. Please pull from it if there's no objection.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

diffstat follows.

arch/x86/include/asm/genapic_32.h | 7 -
arch/x86/include/asm/genapic_64.h | 6 -
arch/x86/include/asm/irq_regs.h | 36 +++-
arch/x86/include/asm/irq_regs_32.h | 31 ---
arch/x86/include/asm/irq_regs_64.h | 1 -
arch/x86/include/asm/irq_vectors.h | 36 ++--
arch/x86/include/asm/mach-default/entry_arch.h | 18 ++-
arch/x86/include/asm/mmu_context.h | 63 ++++++-
arch/x86/include/asm/mmu_context_32.h | 55 ------
arch/x86/include/asm/mmu_context_64.h | 52 -----
arch/x86/include/asm/percpu.h | 2 +-
arch/x86/include/asm/system.h | 15 +-
arch/x86/include/asm/uv/uv.h | 33 ++++
arch/x86/include/asm/uv/uv_bau.h | 2 -
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/cpu/common.c | 23 ++-
arch/x86/kernel/entry_32.S | 6 +-
arch/x86/kernel/genx2apic_uv_x.c | 1 +
arch/x86/kernel/head_32.S | 6 +-
arch/x86/kernel/irq_64.c | 3 +
arch/x86/kernel/irqinit_32.c | 11 +-
arch/x86/kernel/smpboot.c | 1 +
arch/x86/kernel/{tlb_64.c => tlb.c} | 48 +++---
arch/x86/kernel/tlb_32.c | 239 ------------------------
arch/x86/kernel/tlb_uv.c | 68 ++++---
25 files changed, 268 insertions(+), 497 deletions(-)
delete mode 100644 arch/x86/include/asm/irq_regs_32.h
delete mode 100644 arch/x86/include/asm/irq_regs_64.h
delete mode 100644 arch/x86/include/asm/mmu_context_32.h
delete mode 100644 arch/x86/include/asm/mmu_context_64.h
create mode 100644 arch/x86/include/asm/uv/uv.h
rename arch/x86/kernel/{tlb_64.c => tlb.c} (90%)
delete mode 100644 arch/x86/kernel/tlb_32.c

--
tejun

[1] 8f5d36ed5bb6e33024619eaee15b7ce2e3d115b3


2009-01-21 08:46:43

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 01/10] x86: update canary handling during switch

Impact: cleanup

In switch_to(), instead of taking offset to irq_stack_union.stack,
make it a proper percpu access using __percpu_arg() and per_cpu_var().

Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/include/asm/system.h | 15 +++++++++------
1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index 52eb748..2fcc70b 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -89,13 +89,15 @@ do { \
#ifdef CONFIG_CC_STACKPROTECTOR
#define __switch_canary \
"movq %P[task_canary](%%rsi),%%r8\n\t" \
- "movq %%r8,%%gs:%P[gs_canary]\n\t"
-#define __switch_canary_param \
- , [task_canary] "i" (offsetof(struct task_struct, stack_canary)) \
- , [gs_canary] "i" (offsetof(union irq_stack_union, stack_canary))
+ "movq %%r8,"__percpu_arg([gs_canary])"\n\t"
+#define __switch_canary_oparam \
+ , [gs_canary] "=m" (per_cpu_var(irq_stack_union.stack_canary))
+#define __switch_canary_iparam \
+ , [task_canary] "i" (offsetof(struct task_struct, stack_canary))
#else /* CC_STACKPROTECTOR */
#define __switch_canary
-#define __switch_canary_param
+#define __switch_canary_oparam
+#define __switch_canary_iparam
#endif /* CC_STACKPROTECTOR */

/* Save restore flags to clear handle leaking NT */
@@ -114,13 +116,14 @@ do { \
"jc ret_from_fork\n\t" \
RESTORE_CONTEXT \
: "=a" (last) \
+ __switch_canary_oparam \
: [next] "S" (next), [prev] "D" (prev), \
[threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
[ti_flags] "i" (offsetof(struct thread_info, flags)), \
[tif_fork] "i" (TIF_FORK), \
[thread_info] "i" (offsetof(struct task_struct, stack)), \
[current_task] "m" (per_cpu_var(current_task)) \
- __switch_canary_param \
+ __switch_canary_iparam \
: "memory", "cc" __EXTRA_CLOBBER)
#endif

--
1.6.0.2

2009-01-21 08:47:36

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 06/10] x86: merge irq_regs.h

From: Brian Gerst <[email protected]>

Impact: cleanup, better irq_regs code generation for x86_64

Make 64-bit use the same optimizations as 32-bit.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/include/asm/irq_regs.h | 36 +++++++++++++++++++++++++++++++-----
arch/x86/include/asm/irq_regs_32.h | 31 -------------------------------
arch/x86/include/asm/irq_regs_64.h | 1 -
arch/x86/kernel/irq_64.c | 3 +++
4 files changed, 34 insertions(+), 37 deletions(-)
delete mode 100644 arch/x86/include/asm/irq_regs_32.h
delete mode 100644 arch/x86/include/asm/irq_regs_64.h

diff --git a/arch/x86/include/asm/irq_regs.h b/arch/x86/include/asm/irq_regs.h
index 89c898a..7784322 100644
--- a/arch/x86/include/asm/irq_regs.h
+++ b/arch/x86/include/asm/irq_regs.h
@@ -1,5 +1,31 @@
-#ifdef CONFIG_X86_32
-# include "irq_regs_32.h"
-#else
-# include "irq_regs_64.h"
-#endif
+/*
+ * Per-cpu current frame pointer - the location of the last exception frame on
+ * the stack, stored in the per-cpu area.
+ *
+ * Jeremy Fitzhardinge <[email protected]>
+ */
+#ifndef _ASM_X86_IRQ_REGS_H
+#define _ASM_X86_IRQ_REGS_H
+
+#include <asm/percpu.h>
+
+#define ARCH_HAS_OWN_IRQ_REGS
+
+DECLARE_PER_CPU(struct pt_regs *, irq_regs);
+
+static inline struct pt_regs *get_irq_regs(void)
+{
+ return percpu_read(irq_regs);
+}
+
+static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
+{
+ struct pt_regs *old_regs;
+
+ old_regs = get_irq_regs();
+ percpu_write(irq_regs, new_regs);
+
+ return old_regs;
+}
+
+#endif /* _ASM_X86_IRQ_REGS_32_H */
diff --git a/arch/x86/include/asm/irq_regs_32.h b/arch/x86/include/asm/irq_regs_32.h
deleted file mode 100644
index d7ed33e..0000000
--- a/arch/x86/include/asm/irq_regs_32.h
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
- * Per-cpu current frame pointer - the location of the last exception frame on
- * the stack, stored in the per-cpu area.
- *
- * Jeremy Fitzhardinge <[email protected]>
- */
-#ifndef _ASM_X86_IRQ_REGS_32_H
-#define _ASM_X86_IRQ_REGS_32_H
-
-#include <asm/percpu.h>
-
-#define ARCH_HAS_OWN_IRQ_REGS
-
-DECLARE_PER_CPU(struct pt_regs *, irq_regs);
-
-static inline struct pt_regs *get_irq_regs(void)
-{
- return percpu_read(irq_regs);
-}
-
-static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
-{
- struct pt_regs *old_regs;
-
- old_regs = get_irq_regs();
- percpu_write(irq_regs, new_regs);
-
- return old_regs;
-}
-
-#endif /* _ASM_X86_IRQ_REGS_32_H */
diff --git a/arch/x86/include/asm/irq_regs_64.h b/arch/x86/include/asm/irq_regs_64.h
deleted file mode 100644
index 3dd9c0b..0000000
--- a/arch/x86/include/asm/irq_regs_64.h
+++ /dev/null
@@ -1 +0,0 @@
-#include <asm-generic/irq_regs.h>
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 1db0524..0b254de 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -22,6 +22,9 @@
DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);

+DEFINE_PER_CPU(struct pt_regs *, irq_regs);
+EXPORT_PER_CPU_SYMBOL(irq_regs);
+
/*
* Probabilistic stack overflow check:
*
--
1.6.0.2

2009-01-21 08:47:58

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 02/10] x86: clean up gdt_page definition

From: Brian Gerst <[email protected]>

Impact: cleanup && more compact percpu area layout with future changes

Move 64-bit GDT to page-aligned section and clean up comment
formatting.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/kernel/cpu/common.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3887fcf..a8f0ded 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -63,23 +63,23 @@ cpumask_t cpu_sibling_setup_map;

static struct cpu_dev *this_cpu __cpuinitdata;

+DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
#ifdef CONFIG_X86_64
-/* We need valid kernel segments for data and code in long mode too
- * IRET will check the segment types kkeil 2000/10/28
- * Also sysret mandates a special GDT layout
- */
-/* The TLS descriptors are currently at a different place compared to i386.
- Hopefully nobody expects them at a fixed place (Wine?) */
-DEFINE_PER_CPU(struct gdt_page, gdt_page) = { .gdt = {
+ /*
+ * We need valid kernel segments for data and code in long mode too
+ * IRET will check the segment types kkeil 2000/10/28
+ * Also sysret mandates a special GDT layout
+ *
+ * The TLS descriptors are currently at a different place compared to i386.
+ * Hopefully nobody expects them at a fixed place (Wine?)
+ */
[GDT_ENTRY_KERNEL32_CS] = { { { 0x0000ffff, 0x00cf9b00 } } },
[GDT_ENTRY_KERNEL_CS] = { { { 0x0000ffff, 0x00af9b00 } } },
[GDT_ENTRY_KERNEL_DS] = { { { 0x0000ffff, 0x00cf9300 } } },
[GDT_ENTRY_DEFAULT_USER32_CS] = { { { 0x0000ffff, 0x00cffb00 } } },
[GDT_ENTRY_DEFAULT_USER_DS] = { { { 0x0000ffff, 0x00cff300 } } },
[GDT_ENTRY_DEFAULT_USER_CS] = { { { 0x0000ffff, 0x00affb00 } } },
-} };
#else
-DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
[GDT_ENTRY_KERNEL_CS] = { { { 0x0000ffff, 0x00cf9a00 } } },
[GDT_ENTRY_KERNEL_DS] = { { { 0x0000ffff, 0x00cf9200 } } },
[GDT_ENTRY_DEFAULT_USER_CS] = { { { 0x0000ffff, 0x00cffa00 } } },
@@ -112,8 +112,8 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {

[GDT_ENTRY_ESPFIX_SS] = { { { 0x00000000, 0x00c09200 } } },
[GDT_ENTRY_PERCPU] = { { { 0x00000000, 0x00000000 } } },
-} };
#endif
+} };
EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);

#ifdef CONFIG_X86_32
--
1.6.0.2

2009-01-21 08:48:23

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 03/10] x86: fix percpu_write with 64-bit constants

From: Brian Gerst <[email protected]>

Impact: slightly better code generation for percpu_to_op()

The processor will sign-extend 32-bit immediate values in 64-bit
operations. Use the 'e' constraint ("32-bit signed integer constant,
or a symbolic reference known to fit that range") for 64-bit constants.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/include/asm/percpu.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ce980db..0b64af4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -75,7 +75,7 @@ do { \
case 8: \
asm(op "q %1,"__percpu_arg(0) \
: "+m" (var) \
- : "r" ((T__)val)); \
+ : "re" ((T__)val)); \
break; \
default: __bad_percpu_size(); \
} \
--
1.6.0.2

2009-01-21 08:48:43

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 04/10] x86: set %fs to __KERNEL_PERCPU unconditionally for x86_32

From: Brian Gerst <[email protected]>

Impact: cleanup

%fs is currently set to __KERNEL_DS at boot, and conditionally
switched to __KERNEL_PERCPU for secondary cpus. Instead, initialize
GDT_ENTRY_PERCPU to the same attributes as GDT_ENTRY_KERNEL_DS and
set %fs to __KERNEL_PERCPU unconditionally.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/head_32.S | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a8f0ded..fbebbce 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -111,7 +111,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
[GDT_ENTRY_APMBIOS_BASE+2] = { { { 0x0000ffff, 0x00409200 } } },

[GDT_ENTRY_ESPFIX_SS] = { { { 0x00000000, 0x00c09200 } } },
- [GDT_ENTRY_PERCPU] = { { { 0x00000000, 0x00000000 } } },
+ [GDT_ENTRY_PERCPU] = { { { 0x0000ffff, 0x00cf9200 } } },
#endif
} };
EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index e835b4e..24c0e5c 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -429,12 +429,14 @@ is386: movl $2,%ecx # set MP
ljmp $(__KERNEL_CS),$1f
1: movl $(__KERNEL_DS),%eax # reload all the segment registers
movl %eax,%ss # after changing gdt.
- movl %eax,%fs # gets reset once there's real percpu

movl $(__USER_DS),%eax # DS/ES contains default USER segment
movl %eax,%ds
movl %eax,%es

+ movl $(__KERNEL_PERCPU), %eax
+ movl %eax,%fs # set this cpu's percpu
+
xorl %eax,%eax # Clear GS and LDT
movl %eax,%gs
lldt %ax
@@ -446,8 +448,6 @@ is386: movl $2,%ecx # set MP
movb $1, ready
cmpb $0,%cl # the first CPU calls start_kernel
je 1f
- movl $(__KERNEL_PERCPU), %eax
- movl %eax,%fs # set this cpu's percpu
movl (stack_start), %esp
1:
#endif /* CONFIG_SMP */
--
1.6.0.2

2009-01-21 08:49:16

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 05/10] x86: merge mmu_context.h

From: Brian Gerst <[email protected]>

Impact: cleanup

tj: * changed cpu to unsigned as was done on mmu_context_64.h as cpu
id is officially unsigned int
* added missing ';' to 32bit version of deactivate_mm()

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/include/asm/mmu_context.h | 63 ++++++++++++++++++++++++++++++--
arch/x86/include/asm/mmu_context_32.h | 55 ----------------------------
arch/x86/include/asm/mmu_context_64.h | 52 ---------------------------
3 files changed, 59 insertions(+), 111 deletions(-)
delete mode 100644 arch/x86/include/asm/mmu_context_32.h
delete mode 100644 arch/x86/include/asm/mmu_context_64.h

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 8aeeb3f..52948df 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -21,11 +21,54 @@ static inline void paravirt_activate_mm(struct mm_struct *prev,
int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
void destroy_context(struct mm_struct *mm);

-#ifdef CONFIG_X86_32
-# include "mmu_context_32.h"
-#else
-# include "mmu_context_64.h"
+
+static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
+{
+#ifdef CONFIG_SMP
+ if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
+ percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
+#endif
+}
+
+static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
+ struct task_struct *tsk)
+{
+ unsigned cpu = smp_processor_id();
+
+ if (likely(prev != next)) {
+ /* stop flush ipis for the previous mm */
+ cpu_clear(cpu, prev->cpu_vm_mask);
+#ifdef CONFIG_SMP
+ percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+ percpu_write(cpu_tlbstate.active_mm, next);
#endif
+ cpu_set(cpu, next->cpu_vm_mask);
+
+ /* Re-load page tables */
+ load_cr3(next->pgd);
+
+ /*
+ * load the LDT, if the LDT is different:
+ */
+ if (unlikely(prev->context.ldt != next->context.ldt))
+ load_LDT_nolock(&next->context);
+ }
+#ifdef CONFIG_SMP
+ else {
+ percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+ BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);
+
+ if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
+ /* We were in lazy tlb mode and leave_mm disabled
+ * tlb flush IPI delivery. We must reload CR3
+ * to make sure to use no freed page tables.
+ */
+ load_cr3(next->pgd);
+ load_LDT_nolock(&next->context);
+ }
+ }
+#endif
+}

#define activate_mm(prev, next) \
do { \
@@ -33,5 +76,17 @@ do { \
switch_mm((prev), (next), NULL); \
} while (0);

+#ifdef CONFIG_X86_32
+#define deactivate_mm(tsk, mm) \
+do { \
+ loadsegment(gs, 0); \
+} while (0)
+#else
+#define deactivate_mm(tsk, mm) \
+do { \
+ load_gs_index(0); \
+ loadsegment(fs, 0); \
+} while (0)
+#endif

#endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mmu_context_32.h b/arch/x86/include/asm/mmu_context_32.h
deleted file mode 100644
index 08b5345..0000000
--- a/arch/x86/include/asm/mmu_context_32.h
+++ /dev/null
@@ -1,55 +0,0 @@
-#ifndef _ASM_X86_MMU_CONTEXT_32_H
-#define _ASM_X86_MMU_CONTEXT_32_H
-
-static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
-{
-#ifdef CONFIG_SMP
- if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
- percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
-#endif
-}
-
-static inline void switch_mm(struct mm_struct *prev,
- struct mm_struct *next,
- struct task_struct *tsk)
-{
- int cpu = smp_processor_id();
-
- if (likely(prev != next)) {
- /* stop flush ipis for the previous mm */
- cpu_clear(cpu, prev->cpu_vm_mask);
-#ifdef CONFIG_SMP
- percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
- percpu_write(cpu_tlbstate.active_mm, next);
-#endif
- cpu_set(cpu, next->cpu_vm_mask);
-
- /* Re-load page tables */
- load_cr3(next->pgd);
-
- /*
- * load the LDT, if the LDT is different:
- */
- if (unlikely(prev->context.ldt != next->context.ldt))
- load_LDT_nolock(&next->context);
- }
-#ifdef CONFIG_SMP
- else {
- percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
- BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);
-
- if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
- /* We were in lazy tlb mode and leave_mm disabled
- * tlb flush IPI delivery. We must reload %cr3.
- */
- load_cr3(next->pgd);
- load_LDT_nolock(&next->context);
- }
- }
-#endif
-}
-
-#define deactivate_mm(tsk, mm) \
- asm("movl %0,%%gs": :"r" (0));
-
-#endif /* _ASM_X86_MMU_CONTEXT_32_H */
diff --git a/arch/x86/include/asm/mmu_context_64.h b/arch/x86/include/asm/mmu_context_64.h
deleted file mode 100644
index c457250..0000000
--- a/arch/x86/include/asm/mmu_context_64.h
+++ /dev/null
@@ -1,52 +0,0 @@
-#ifndef _ASM_X86_MMU_CONTEXT_64_H
-#define _ASM_X86_MMU_CONTEXT_64_H
-
-static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
-{
-#ifdef CONFIG_SMP
- if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
- percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
-#endif
-}
-
-static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
- struct task_struct *tsk)
-{
- unsigned cpu = smp_processor_id();
- if (likely(prev != next)) {
- /* stop flush ipis for the previous mm */
- cpu_clear(cpu, prev->cpu_vm_mask);
-#ifdef CONFIG_SMP
- percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
- percpu_write(cpu_tlbstate.active_mm, next);
-#endif
- cpu_set(cpu, next->cpu_vm_mask);
- load_cr3(next->pgd);
-
- if (unlikely(next->context.ldt != prev->context.ldt))
- load_LDT_nolock(&next->context);
- }
-#ifdef CONFIG_SMP
- else {
- percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
- BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);
-
- if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
- /* We were in lazy tlb mode and leave_mm disabled
- * tlb flush IPI delivery. We must reload CR3
- * to make sure to use no freed page tables.
- */
- load_cr3(next->pgd);
- load_LDT_nolock(&next->context);
- }
- }
-#endif
-}
-
-#define deactivate_mm(tsk, mm) \
-do { \
- load_gs_index(0); \
- asm volatile("movl %0,%%fs"::"r"(0)); \
-} while (0)
-
-#endif /* _ASM_X86_MMU_CONTEXT_64_H */
--
1.6.0.2

2009-01-21 08:49:45

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 10/10] x86: rename tlb_64.c to tlb.c

Impact: file rename

tlb_64.c is now the tlb code for both 32 and 64. Rename it to tlb.c.

Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/tlb.c | 296 ++++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/tlb_64.c | 296 ----------------------------------------------
3 files changed, 297 insertions(+), 297 deletions(-)
create mode 100644 arch/x86/kernel/tlb.c
delete mode 100644 arch/x86/kernel/tlb_64.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a62a15c..0626a88 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -58,7 +58,7 @@ obj-$(CONFIG_PCI) += early-quirks.o
apm-y := apm_32.o
obj-$(CONFIG_APM) += apm.o
obj-$(CONFIG_X86_SMP) += smp.o
-obj-$(CONFIG_X86_SMP) += smpboot.o tsc_sync.o ipi.o tlb_64.o
+obj-$(CONFIG_X86_SMP) += smpboot.o tsc_sync.o ipi.o tlb.o
obj-$(CONFIG_X86_32_SMP) += smpcommon.o
obj-$(CONFIG_X86_64_SMP) += tsc_sync.o smpcommon.o
obj-$(CONFIG_X86_TRAMPOLINE) += trampoline_$(BITS).o
diff --git a/arch/x86/kernel/tlb.c b/arch/x86/kernel/tlb.c
new file mode 100644
index 0000000..b3ca1b9
--- /dev/null
+++ b/arch/x86/kernel/tlb.c
@@ -0,0 +1,296 @@
+#include <linux/init.h>
+
+#include <linux/mm.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+
+#include <asm/tlbflush.h>
+#include <asm/mmu_context.h>
+#include <asm/apic.h>
+#include <asm/uv/uv.h>
+
+DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
+ = { &init_mm, 0, };
+
+#include <mach_ipi.h>
+/*
+ * Smarter SMP flushing macros.
+ * c/o Linus Torvalds.
+ *
+ * These mean you can really definitely utterly forget about
+ * writing to user space from interrupts. (Its not allowed anyway).
+ *
+ * Optimizations Manfred Spraul <[email protected]>
+ *
+ * More scalable flush, from Andi Kleen
+ *
+ * To avoid global state use 8 different call vectors.
+ * Each CPU uses a specific vector to trigger flushes on other
+ * CPUs. Depending on the received vector the target CPUs look into
+ * the right per cpu variable for the flush data.
+ *
+ * With more than 8 CPUs they are hashed to the 8 available
+ * vectors. The limited global vector space forces us to this right now.
+ * In future when interrupts are split into per CPU domains this could be
+ * fixed, at the cost of triggering multiple IPIs in some cases.
+ */
+
+union smp_flush_state {
+ struct {
+ struct mm_struct *flush_mm;
+ unsigned long flush_va;
+ spinlock_t tlbstate_lock;
+ DECLARE_BITMAP(flush_cpumask, NR_CPUS);
+ };
+ char pad[SMP_CACHE_BYTES];
+} ____cacheline_aligned;
+
+/* State is put into the per CPU data section, but padded
+ to a full cache line because other CPUs can access it and we don't
+ want false sharing in the per cpu data segment. */
+static DEFINE_PER_CPU(union smp_flush_state, flush_state);
+
+/*
+ * We cannot call mmdrop() because we are in interrupt context,
+ * instead update mm->cpu_vm_mask.
+ */
+void leave_mm(int cpu)
+{
+ if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
+ BUG();
+ cpu_clear(cpu, percpu_read(cpu_tlbstate.active_mm)->cpu_vm_mask);
+ load_cr3(swapper_pg_dir);
+}
+EXPORT_SYMBOL_GPL(leave_mm);
+
+/*
+ *
+ * The flush IPI assumes that a thread switch happens in this order:
+ * [cpu0: the cpu that switches]
+ * 1) switch_mm() either 1a) or 1b)
+ * 1a) thread switch to a different mm
+ * 1a1) cpu_clear(cpu, old_mm->cpu_vm_mask);
+ * Stop ipi delivery for the old mm. This is not synchronized with
+ * the other cpus, but smp_invalidate_interrupt ignore flush ipis
+ * for the wrong mm, and in the worst case we perform a superfluous
+ * tlb flush.
+ * 1a2) set cpu mmu_state to TLBSTATE_OK
+ * Now the smp_invalidate_interrupt won't call leave_mm if cpu0
+ * was in lazy tlb mode.
+ * 1a3) update cpu active_mm
+ * Now cpu0 accepts tlb flushes for the new mm.
+ * 1a4) cpu_set(cpu, new_mm->cpu_vm_mask);
+ * Now the other cpus will send tlb flush ipis.
+ * 1a4) change cr3.
+ * 1b) thread switch without mm change
+ * cpu active_mm is correct, cpu0 already handles
+ * flush ipis.
+ * 1b1) set cpu mmu_state to TLBSTATE_OK
+ * 1b2) test_and_set the cpu bit in cpu_vm_mask.
+ * Atomically set the bit [other cpus will start sending flush ipis],
+ * and test the bit.
+ * 1b3) if the bit was 0: leave_mm was called, flush the tlb.
+ * 2) switch %%esp, ie current
+ *
+ * The interrupt must handle 2 special cases:
+ * - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
+ * - the cpu performs speculative tlb reads, i.e. even if the cpu only
+ * runs in kernel space, the cpu could load tlb entries for user space
+ * pages.
+ *
+ * The good news is that cpu mmu_state is local to each cpu, no
+ * write/read ordering problems.
+ */
+
+/*
+ * TLB flush IPI:
+ *
+ * 1) Flush the tlb entries if the cpu uses the mm that's being flushed.
+ * 2) Leave the mm if we are in the lazy tlb mode.
+ *
+ * Interrupts are disabled.
+ */
+
+/*
+ * FIXME: use of asmlinkage is not consistent. On x86_64 it's noop
+ * but still used for documentation purpose but the usage is slightly
+ * inconsistent. On x86_32, asmlinkage is regparm(0) but interrupt
+ * entry calls in with the first parameter in %eax. Maybe define
+ * intrlinkage?
+ */
+#ifdef CONFIG_X86_64
+asmlinkage
+#endif
+void smp_invalidate_interrupt(struct pt_regs *regs)
+{
+ unsigned int cpu;
+ unsigned int sender;
+ union smp_flush_state *f;
+
+ cpu = smp_processor_id();
+ /*
+ * orig_rax contains the negated interrupt vector.
+ * Use that to determine where the sender put the data.
+ */
+ sender = ~regs->orig_ax - INVALIDATE_TLB_VECTOR_START;
+ f = &per_cpu(flush_state, sender);
+
+ if (!cpumask_test_cpu(cpu, to_cpumask(f->flush_cpumask)))
+ goto out;
+ /*
+ * This was a BUG() but until someone can quote me the
+ * line from the intel manual that guarantees an IPI to
+ * multiple CPUs is retried _only_ on the erroring CPUs
+ * its staying as a return
+ *
+ * BUG();
+ */
+
+ if (f->flush_mm == percpu_read(cpu_tlbstate.active_mm)) {
+ if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
+ if (f->flush_va == TLB_FLUSH_ALL)
+ local_flush_tlb();
+ else
+ __flush_tlb_one(f->flush_va);
+ } else
+ leave_mm(cpu);
+ }
+out:
+ ack_APIC_irq();
+ smp_mb__before_clear_bit();
+ cpumask_clear_cpu(cpu, to_cpumask(f->flush_cpumask));
+ smp_mb__after_clear_bit();
+ inc_irq_stat(irq_tlb_count);
+}
+
+static void flush_tlb_others_ipi(const struct cpumask *cpumask,
+ struct mm_struct *mm, unsigned long va)
+{
+ unsigned int sender;
+ union smp_flush_state *f;
+
+ /* Caller has disabled preemption */
+ sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
+ f = &per_cpu(flush_state, sender);
+
+ /*
+ * Could avoid this lock when
+ * num_online_cpus() <= NUM_INVALIDATE_TLB_VECTORS, but it is
+ * probably not worth checking this for a cache-hot lock.
+ */
+ spin_lock(&f->tlbstate_lock);
+
+ f->flush_mm = mm;
+ f->flush_va = va;
+ cpumask_andnot(to_cpumask(f->flush_cpumask),
+ cpumask, cpumask_of(smp_processor_id()));
+
+ /*
+ * Make the above memory operations globally visible before
+ * sending the IPI.
+ */
+ smp_mb();
+ /*
+ * We have to send the IPI only to
+ * CPUs affected.
+ */
+ send_IPI_mask(to_cpumask(f->flush_cpumask),
+ INVALIDATE_TLB_VECTOR_START + sender);
+
+ while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
+ cpu_relax();
+
+ f->flush_mm = NULL;
+ f->flush_va = 0;
+ spin_unlock(&f->tlbstate_lock);
+}
+
+void native_flush_tlb_others(const struct cpumask *cpumask,
+ struct mm_struct *mm, unsigned long va)
+{
+ if (is_uv_system()) {
+ unsigned int cpu;
+
+ cpu = get_cpu();
+ cpumask = uv_flush_tlb_others(cpumask, mm, va, cpu);
+ if (cpumask)
+ flush_tlb_others_ipi(cpumask, mm, va);
+ put_cpu();
+ return;
+ }
+ flush_tlb_others_ipi(cpumask, mm, va);
+}
+
+static int __cpuinit init_smp_flush(void)
+{
+ int i;
+
+ for_each_possible_cpu(i)
+ spin_lock_init(&per_cpu(flush_state, i).tlbstate_lock);
+
+ return 0;
+}
+core_initcall(init_smp_flush);
+
+void flush_tlb_current_task(void)
+{
+ struct mm_struct *mm = current->mm;
+
+ preempt_disable();
+
+ local_flush_tlb();
+ if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+ flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
+ preempt_enable();
+}
+
+void flush_tlb_mm(struct mm_struct *mm)
+{
+ preempt_disable();
+
+ if (current->active_mm == mm) {
+ if (current->mm)
+ local_flush_tlb();
+ else
+ leave_mm(smp_processor_id());
+ }
+ if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+ flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
+
+ preempt_enable();
+}
+
+void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
+{
+ struct mm_struct *mm = vma->vm_mm;
+
+ preempt_disable();
+
+ if (current->active_mm == mm) {
+ if (current->mm)
+ __flush_tlb_one(va);
+ else
+ leave_mm(smp_processor_id());
+ }
+
+ if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+ flush_tlb_others(&mm->cpu_vm_mask, mm, va);
+
+ preempt_enable();
+}
+
+static void do_flush_tlb_all(void *info)
+{
+ unsigned long cpu = smp_processor_id();
+
+ __flush_tlb_all();
+ if (percpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
+ leave_mm(cpu);
+}
+
+void flush_tlb_all(void)
+{
+ on_each_cpu(do_flush_tlb_all, NULL, 1);
+}
diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
deleted file mode 100644
index b3ca1b9..0000000
--- a/arch/x86/kernel/tlb_64.c
+++ /dev/null
@@ -1,296 +0,0 @@
-#include <linux/init.h>
-
-#include <linux/mm.h>
-#include <linux/spinlock.h>
-#include <linux/smp.h>
-#include <linux/interrupt.h>
-#include <linux/module.h>
-
-#include <asm/tlbflush.h>
-#include <asm/mmu_context.h>
-#include <asm/apic.h>
-#include <asm/uv/uv.h>
-
-DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
- = { &init_mm, 0, };
-
-#include <mach_ipi.h>
-/*
- * Smarter SMP flushing macros.
- * c/o Linus Torvalds.
- *
- * These mean you can really definitely utterly forget about
- * writing to user space from interrupts. (Its not allowed anyway).
- *
- * Optimizations Manfred Spraul <[email protected]>
- *
- * More scalable flush, from Andi Kleen
- *
- * To avoid global state use 8 different call vectors.
- * Each CPU uses a specific vector to trigger flushes on other
- * CPUs. Depending on the received vector the target CPUs look into
- * the right per cpu variable for the flush data.
- *
- * With more than 8 CPUs they are hashed to the 8 available
- * vectors. The limited global vector space forces us to this right now.
- * In future when interrupts are split into per CPU domains this could be
- * fixed, at the cost of triggering multiple IPIs in some cases.
- */
-
-union smp_flush_state {
- struct {
- struct mm_struct *flush_mm;
- unsigned long flush_va;
- spinlock_t tlbstate_lock;
- DECLARE_BITMAP(flush_cpumask, NR_CPUS);
- };
- char pad[SMP_CACHE_BYTES];
-} ____cacheline_aligned;
-
-/* State is put into the per CPU data section, but padded
- to a full cache line because other CPUs can access it and we don't
- want false sharing in the per cpu data segment. */
-static DEFINE_PER_CPU(union smp_flush_state, flush_state);
-
-/*
- * We cannot call mmdrop() because we are in interrupt context,
- * instead update mm->cpu_vm_mask.
- */
-void leave_mm(int cpu)
-{
- if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
- BUG();
- cpu_clear(cpu, percpu_read(cpu_tlbstate.active_mm)->cpu_vm_mask);
- load_cr3(swapper_pg_dir);
-}
-EXPORT_SYMBOL_GPL(leave_mm);
-
-/*
- *
- * The flush IPI assumes that a thread switch happens in this order:
- * [cpu0: the cpu that switches]
- * 1) switch_mm() either 1a) or 1b)
- * 1a) thread switch to a different mm
- * 1a1) cpu_clear(cpu, old_mm->cpu_vm_mask);
- * Stop ipi delivery for the old mm. This is not synchronized with
- * the other cpus, but smp_invalidate_interrupt ignore flush ipis
- * for the wrong mm, and in the worst case we perform a superfluous
- * tlb flush.
- * 1a2) set cpu mmu_state to TLBSTATE_OK
- * Now the smp_invalidate_interrupt won't call leave_mm if cpu0
- * was in lazy tlb mode.
- * 1a3) update cpu active_mm
- * Now cpu0 accepts tlb flushes for the new mm.
- * 1a4) cpu_set(cpu, new_mm->cpu_vm_mask);
- * Now the other cpus will send tlb flush ipis.
- * 1a4) change cr3.
- * 1b) thread switch without mm change
- * cpu active_mm is correct, cpu0 already handles
- * flush ipis.
- * 1b1) set cpu mmu_state to TLBSTATE_OK
- * 1b2) test_and_set the cpu bit in cpu_vm_mask.
- * Atomically set the bit [other cpus will start sending flush ipis],
- * and test the bit.
- * 1b3) if the bit was 0: leave_mm was called, flush the tlb.
- * 2) switch %%esp, ie current
- *
- * The interrupt must handle 2 special cases:
- * - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
- * - the cpu performs speculative tlb reads, i.e. even if the cpu only
- * runs in kernel space, the cpu could load tlb entries for user space
- * pages.
- *
- * The good news is that cpu mmu_state is local to each cpu, no
- * write/read ordering problems.
- */
-
-/*
- * TLB flush IPI:
- *
- * 1) Flush the tlb entries if the cpu uses the mm that's being flushed.
- * 2) Leave the mm if we are in the lazy tlb mode.
- *
- * Interrupts are disabled.
- */
-
-/*
- * FIXME: use of asmlinkage is not consistent. On x86_64 it's noop
- * but still used for documentation purpose but the usage is slightly
- * inconsistent. On x86_32, asmlinkage is regparm(0) but interrupt
- * entry calls in with the first parameter in %eax. Maybe define
- * intrlinkage?
- */
-#ifdef CONFIG_X86_64
-asmlinkage
-#endif
-void smp_invalidate_interrupt(struct pt_regs *regs)
-{
- unsigned int cpu;
- unsigned int sender;
- union smp_flush_state *f;
-
- cpu = smp_processor_id();
- /*
- * orig_rax contains the negated interrupt vector.
- * Use that to determine where the sender put the data.
- */
- sender = ~regs->orig_ax - INVALIDATE_TLB_VECTOR_START;
- f = &per_cpu(flush_state, sender);
-
- if (!cpumask_test_cpu(cpu, to_cpumask(f->flush_cpumask)))
- goto out;
- /*
- * This was a BUG() but until someone can quote me the
- * line from the intel manual that guarantees an IPI to
- * multiple CPUs is retried _only_ on the erroring CPUs
- * its staying as a return
- *
- * BUG();
- */
-
- if (f->flush_mm == percpu_read(cpu_tlbstate.active_mm)) {
- if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
- if (f->flush_va == TLB_FLUSH_ALL)
- local_flush_tlb();
- else
- __flush_tlb_one(f->flush_va);
- } else
- leave_mm(cpu);
- }
-out:
- ack_APIC_irq();
- smp_mb__before_clear_bit();
- cpumask_clear_cpu(cpu, to_cpumask(f->flush_cpumask));
- smp_mb__after_clear_bit();
- inc_irq_stat(irq_tlb_count);
-}
-
-static void flush_tlb_others_ipi(const struct cpumask *cpumask,
- struct mm_struct *mm, unsigned long va)
-{
- unsigned int sender;
- union smp_flush_state *f;
-
- /* Caller has disabled preemption */
- sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
- f = &per_cpu(flush_state, sender);
-
- /*
- * Could avoid this lock when
- * num_online_cpus() <= NUM_INVALIDATE_TLB_VECTORS, but it is
- * probably not worth checking this for a cache-hot lock.
- */
- spin_lock(&f->tlbstate_lock);
-
- f->flush_mm = mm;
- f->flush_va = va;
- cpumask_andnot(to_cpumask(f->flush_cpumask),
- cpumask, cpumask_of(smp_processor_id()));
-
- /*
- * Make the above memory operations globally visible before
- * sending the IPI.
- */
- smp_mb();
- /*
- * We have to send the IPI only to
- * CPUs affected.
- */
- send_IPI_mask(to_cpumask(f->flush_cpumask),
- INVALIDATE_TLB_VECTOR_START + sender);
-
- while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
- cpu_relax();
-
- f->flush_mm = NULL;
- f->flush_va = 0;
- spin_unlock(&f->tlbstate_lock);
-}
-
-void native_flush_tlb_others(const struct cpumask *cpumask,
- struct mm_struct *mm, unsigned long va)
-{
- if (is_uv_system()) {
- unsigned int cpu;
-
- cpu = get_cpu();
- cpumask = uv_flush_tlb_others(cpumask, mm, va, cpu);
- if (cpumask)
- flush_tlb_others_ipi(cpumask, mm, va);
- put_cpu();
- return;
- }
- flush_tlb_others_ipi(cpumask, mm, va);
-}
-
-static int __cpuinit init_smp_flush(void)
-{
- int i;
-
- for_each_possible_cpu(i)
- spin_lock_init(&per_cpu(flush_state, i).tlbstate_lock);
-
- return 0;
-}
-core_initcall(init_smp_flush);
-
-void flush_tlb_current_task(void)
-{
- struct mm_struct *mm = current->mm;
-
- preempt_disable();
-
- local_flush_tlb();
- if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
- preempt_enable();
-}
-
-void flush_tlb_mm(struct mm_struct *mm)
-{
- preempt_disable();
-
- if (current->active_mm == mm) {
- if (current->mm)
- local_flush_tlb();
- else
- leave_mm(smp_processor_id());
- }
- if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
-
- preempt_enable();
-}
-
-void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
-{
- struct mm_struct *mm = vma->vm_mm;
-
- preempt_disable();
-
- if (current->active_mm == mm) {
- if (current->mm)
- __flush_tlb_one(va);
- else
- leave_mm(smp_processor_id());
- }
-
- if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(&mm->cpu_vm_mask, mm, va);
-
- preempt_enable();
-}
-
-static void do_flush_tlb_all(void *info)
-{
- unsigned long cpu = smp_processor_id();
-
- __flush_tlb_all();
- if (percpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
- leave_mm(cpu);
-}
-
-void flush_tlb_all(void)
-{
- on_each_cpu(do_flush_tlb_all, NULL, 1);
-}
--
1.6.0.2

2009-01-21 08:50:17

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 08/10] x86: prepare for tlb merge

Impact: clean up, ipi vector number reordering for x86_32

Make the following changes to prepare for tlb merge.

* reorder x86_32 ip vectors

* adjust tlb_32.c and tlb_64.c such that their logics coincide exactly
- on spurious invalidate ipi, tlb_32 acks the irq
- tlb_64 now has proper memory barriers around clearing
flush_cpumask (no change in generated code)

* unexport flush_tlb_page from tlb_32.c, there's no user

* use unsigned int for cpu id

* drop unnecessary includes from tlb_64.c

Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/include/asm/irq_vectors.h | 33 ++++++++++++++++-----------------
arch/x86/kernel/tlb_32.c | 10 +++++-----
arch/x86/kernel/tlb_64.c | 18 +++++++-----------
3 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index a16a2ab..4ee8f80 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -49,31 +49,30 @@
* some of the following vectors are 'rare', they are merged
* into a single vector (CALL_FUNCTION_VECTOR) to save vector space.
* TLB, reschedule and local APIC vectors are performance-critical.
- *
- * Vectors 0xf0-0xfa are free (reserved for future Linux use).
*/
#ifdef CONFIG_X86_32

# define SPURIOUS_APIC_VECTOR 0xff
# define ERROR_APIC_VECTOR 0xfe
-# define INVALIDATE_TLB_VECTOR 0xfd
-# define RESCHEDULE_VECTOR 0xfc
-# define CALL_FUNCTION_VECTOR 0xfb
-# define CALL_FUNCTION_SINGLE_VECTOR 0xfa
-# define THERMAL_APIC_VECTOR 0xf0
+# define RESCHEDULE_VECTOR 0xfd
+# define CALL_FUNCTION_VECTOR 0xfc
+# define CALL_FUNCTION_SINGLE_VECTOR 0xfb
+# define THERMAL_APIC_VECTOR 0xfa
+/* 0xf1 - 0xf9 : free */
+# define INVALIDATE_TLB_VECTOR 0xf0

#else

-#define SPURIOUS_APIC_VECTOR 0xff
-#define ERROR_APIC_VECTOR 0xfe
-#define RESCHEDULE_VECTOR 0xfd
-#define CALL_FUNCTION_VECTOR 0xfc
-#define CALL_FUNCTION_SINGLE_VECTOR 0xfb
-#define THERMAL_APIC_VECTOR 0xfa
-#define THRESHOLD_APIC_VECTOR 0xf9
-#define UV_BAU_MESSAGE 0xf8
-#define INVALIDATE_TLB_VECTOR_END 0xf7
-#define INVALIDATE_TLB_VECTOR_START 0xf0 /* f0-f7 used for TLB flush */
+# define SPURIOUS_APIC_VECTOR 0xff
+# define ERROR_APIC_VECTOR 0xfe
+# define RESCHEDULE_VECTOR 0xfd
+# define CALL_FUNCTION_VECTOR 0xfc
+# define CALL_FUNCTION_SINGLE_VECTOR 0xfb
+# define THERMAL_APIC_VECTOR 0xfa
+# define THRESHOLD_APIC_VECTOR 0xf9
+# define UV_BAU_MESSAGE 0xf8
+# define INVALIDATE_TLB_VECTOR_END 0xf7
+# define INVALIDATE_TLB_VECTOR_START 0xf0 /* f0-f7 used for TLB flush */

#define NUM_INVALIDATE_TLB_VECTORS 8

diff --git a/arch/x86/kernel/tlb_32.c b/arch/x86/kernel/tlb_32.c
index abf0808..93fcb05 100644
--- a/arch/x86/kernel/tlb_32.c
+++ b/arch/x86/kernel/tlb_32.c
@@ -84,13 +84,15 @@ EXPORT_SYMBOL_GPL(leave_mm);
*
* 1) Flush the tlb entries if the cpu uses the mm that's being flushed.
* 2) Leave the mm if we are in the lazy tlb mode.
+ *
+ * Interrupts are disabled.
*/

void smp_invalidate_interrupt(struct pt_regs *regs)
{
- unsigned long cpu;
+ unsigned int cpu;

- cpu = get_cpu();
+ cpu = smp_processor_id();

if (!cpumask_test_cpu(cpu, flush_cpumask))
goto out;
@@ -112,12 +114,11 @@ void smp_invalidate_interrupt(struct pt_regs *regs)
} else
leave_mm(cpu);
}
+out:
ack_APIC_irq();
smp_mb__before_clear_bit();
cpumask_clear_cpu(cpu, flush_cpumask);
smp_mb__after_clear_bit();
-out:
- put_cpu_no_resched();
inc_irq_stat(irq_tlb_count);
}

@@ -215,7 +216,6 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
flush_tlb_others(&mm->cpu_vm_mask, mm, va);
preempt_enable();
}
-EXPORT_SYMBOL(flush_tlb_page);

static void do_flush_tlb_all(void *info)
{
diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
index b8bed84..19ac661 100644
--- a/arch/x86/kernel/tlb_64.c
+++ b/arch/x86/kernel/tlb_64.c
@@ -1,20 +1,14 @@
#include <linux/init.h>

#include <linux/mm.h>
-#include <linux/delay.h>
#include <linux/spinlock.h>
#include <linux/smp.h>
-#include <linux/kernel_stat.h>
-#include <linux/mc146818rtc.h>
#include <linux/interrupt.h>
+#include <linux/module.h>

-#include <asm/mtrr.h>
-#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
-#include <asm/proto.h>
-#include <asm/apicdef.h>
-#include <asm/idle.h>
+#include <asm/apic.h>
#include <asm/uv/uv.h>

DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
@@ -121,8 +115,8 @@ EXPORT_SYMBOL_GPL(leave_mm);

asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
{
- int cpu;
- int sender;
+ unsigned int cpu;
+ unsigned int sender;
union smp_flush_state *f;

cpu = smp_processor_id();
@@ -155,14 +149,16 @@ asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
}
out:
ack_APIC_irq();
+ smp_mb__before_clear_bit();
cpumask_clear_cpu(cpu, to_cpumask(f->flush_cpumask));
+ smp_mb__after_clear_bit();
inc_irq_stat(irq_tlb_count);
}

static void flush_tlb_others_ipi(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long va)
{
- int sender;
+ unsigned int sender;
union smp_flush_state *f;

/* Caller has disabled preemption */
--
1.6.0.2

2009-01-21 08:50:51

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 07/10] x86: uv cleanup

Impact: cleanup

Make the following uv related cleanups.

* collect visible uv related definitions and interfaces into uv/uv.h
and use it. this cleans up the messy situation where on 64bit, uv
is defined properly, on 32bit generic it's dummy and on the rest
undefined. after this clean up, uv is defined on 64 and dummy on
32.

* update uv_flush_tlb_others() such that it takes cpumask of
to-be-flushed cpus as argument, instead of that minus self, and
returns yet-to-be-flushed cpumask, instead of modifying the passed
in parameter. this interface change will ease dummy implementation
of uv_flush_tlb_others() and makes uv tlb flush related stuff
defined in tlb_uv proper.

Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/include/asm/genapic_32.h | 7 ----
arch/x86/include/asm/genapic_64.h | 6 ---
arch/x86/include/asm/uv/uv.h | 33 ++++++++++++++++++
arch/x86/include/asm/uv/uv_bau.h | 2 -
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/genx2apic_uv_x.c | 1 +
arch/x86/kernel/smpboot.c | 1 +
arch/x86/kernel/tlb_64.c | 18 ++++------
arch/x86/kernel/tlb_uv.c | 68 +++++++++++++++++++++---------------
9 files changed, 83 insertions(+), 54 deletions(-)
create mode 100644 arch/x86/include/asm/uv/uv.h

diff --git a/arch/x86/include/asm/genapic_32.h b/arch/x86/include/asm/genapic_32.h
index 2c05b73..4334502 100644
--- a/arch/x86/include/asm/genapic_32.h
+++ b/arch/x86/include/asm/genapic_32.h
@@ -138,11 +138,4 @@ struct genapic {
extern struct genapic *genapic;
extern void es7000_update_genapic_to_cluster(void);

-enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC};
-#define get_uv_system_type() UV_NONE
-#define is_uv_system() 0
-#define uv_wakeup_secondary(a, b) 1
-#define uv_system_init() do {} while (0)
-
-
#endif /* _ASM_X86_GENAPIC_32_H */
diff --git a/arch/x86/include/asm/genapic_64.h b/arch/x86/include/asm/genapic_64.h
index adf32fb..7bb092c 100644
--- a/arch/x86/include/asm/genapic_64.h
+++ b/arch/x86/include/asm/genapic_64.h
@@ -51,15 +51,9 @@ extern struct genapic apic_x2apic_phys;
extern int acpi_madt_oem_check(char *, char *);

extern void apic_send_IPI_self(int vector);
-enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC};
-extern enum uv_system_type get_uv_system_type(void);
-extern int is_uv_system(void);

extern struct genapic apic_x2apic_uv_x;
DECLARE_PER_CPU(int, x2apic_extra_bits);
-extern void uv_cpu_init(void);
-extern void uv_system_init(void);
-extern int uv_wakeup_secondary(int phys_apicid, unsigned int start_rip);

extern void setup_apic_routing(void);

diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
new file mode 100644
index 0000000..dce5fe3
--- /dev/null
+++ b/arch/x86/include/asm/uv/uv.h
@@ -0,0 +1,33 @@
+#ifndef _ASM_X86_UV_UV_H
+#define _ASM_X86_UV_UV_H
+
+enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC};
+
+#ifdef CONFIG_X86_64
+
+extern enum uv_system_type get_uv_system_type(void);
+extern int is_uv_system(void);
+extern void uv_cpu_init(void);
+extern void uv_system_init(void);
+extern int uv_wakeup_secondary(int phys_apicid, unsigned int start_rip);
+extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
+ struct mm_struct *mm,
+ unsigned long va,
+ unsigned int cpu);
+
+#else /* X86_64 */
+
+static inline enum uv_system_type get_uv_system_type(void) { return UV_NONE; }
+static inline int is_uv_system(void) { return 0; }
+static inline void uv_cpu_init(void) { }
+static inline void uv_system_init(void) { }
+static inline int uv_wakeup_secondary(int phys_apicid, unsigned int start_rip)
+{ return 1; }
+static inline const struct cpumask *
+uv_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm,
+ unsigned long va, unsigned int cpu)
+{ return cpumask; }
+
+#endif /* X86_64 */
+
+#endif /* _ASM_X86_UV_UV_H */
diff --git a/arch/x86/include/asm/uv/uv_bau.h b/arch/x86/include/asm/uv/uv_bau.h
index 74e6393..9b0e61b 100644
--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -325,8 +325,6 @@ static inline void bau_cpubits_clear(struct bau_local_cpumask *dstp, int nbits)
#define cpubit_isset(cpu, bau_local_cpumask) \
test_bit((cpu), (bau_local_cpumask).bits)

-extern int uv_flush_tlb_others(struct cpumask *,
- struct mm_struct *, unsigned long);
extern void uv_bau_message_intr1(void);
extern void uv_bau_timeout_intr1(void);

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fbebbce..99904f2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -28,6 +28,7 @@
#include <asm/apic.h>
#include <mach_apic.h>
#include <asm/genapic.h>
+#include <asm/uv/uv.h>
#endif

#include <asm/pgtable.h>
diff --git a/arch/x86/kernel/genx2apic_uv_x.c b/arch/x86/kernel/genx2apic_uv_x.c
index b193e08..bfe3624 100644
--- a/arch/x86/kernel/genx2apic_uv_x.c
+++ b/arch/x86/kernel/genx2apic_uv_x.c
@@ -25,6 +25,7 @@
#include <asm/ipi.h>
#include <asm/genapic.h>
#include <asm/pgtable.h>
+#include <asm/uv/uv.h>
#include <asm/uv/uv_mmrs.h>
#include <asm/uv/uv_hub.h>
#include <asm/uv/bios.h>
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 869b988..def770b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -62,6 +62,7 @@
#include <asm/vmi.h>
#include <asm/genapic.h>
#include <asm/setup.h>
+#include <asm/uv/uv.h>
#include <linux/mc146818rtc.h>

#include <mach_apic.h>
diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
index e64a32c..b8bed84 100644
--- a/arch/x86/kernel/tlb_64.c
+++ b/arch/x86/kernel/tlb_64.c
@@ -15,8 +15,7 @@
#include <asm/proto.h>
#include <asm/apicdef.h>
#include <asm/idle.h>
-#include <asm/uv/uv_hub.h>
-#include <asm/uv/uv_bau.h>
+#include <asm/uv/uv.h>

DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
= { &init_mm, 0, };
@@ -206,16 +205,13 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long va)
{
if (is_uv_system()) {
- /* FIXME: could be an percpu_alloc'd thing */
- static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
- struct cpumask *after_uv_flush = &get_cpu_var(flush_tlb_mask);
+ unsigned int cpu;

- cpumask_andnot(after_uv_flush, cpumask,
- cpumask_of(smp_processor_id()));
- if (!uv_flush_tlb_others(after_uv_flush, mm, va))
- flush_tlb_others_ipi(after_uv_flush, mm, va);
-
- put_cpu_var(flush_tlb_uv_cpumask);
+ cpu = get_cpu();
+ cpumask = uv_flush_tlb_others(cpumask, mm, va, cpu);
+ if (cpumask)
+ flush_tlb_others_ipi(cpumask, mm, va);
+ put_cpu();
return;
}
flush_tlb_others_ipi(cpumask, mm, va);
diff --git a/arch/x86/kernel/tlb_uv.c b/arch/x86/kernel/tlb_uv.c
index 690dcf1..aae15dd 100644
--- a/arch/x86/kernel/tlb_uv.c
+++ b/arch/x86/kernel/tlb_uv.c
@@ -11,6 +11,7 @@
#include <linux/kernel.h>

#include <asm/mmu_context.h>
+#include <asm/uv/uv.h>
#include <asm/uv/uv_mmrs.h>
#include <asm/uv/uv_hub.h>
#include <asm/uv/uv_bau.h>
@@ -209,14 +210,15 @@ static int uv_wait_completion(struct bau_desc *bau_desc,
*
* Send a broadcast and wait for a broadcast message to complete.
*
- * The cpumaskp mask contains the cpus the broadcast was sent to.
+ * The flush_mask contains the cpus the broadcast was sent to.
*
- * Returns 1 if all remote flushing was done. The mask is zeroed.
- * Returns 0 if some remote flushing remains to be done. The mask will have
- * some bits still set.
+ * Returns NULL if all remote flushing was done. The mask is zeroed.
+ * Returns @flush_mask if some remote flushing remains to be done. The
+ * mask will have some bits still set.
*/
-int uv_flush_send_and_wait(int cpu, int this_blade, struct bau_desc *bau_desc,
- struct cpumask *cpumaskp)
+const struct cpumask *uv_flush_send_and_wait(int cpu, int this_blade,
+ struct bau_desc *bau_desc,
+ struct cpumask *flush_mask)
{
int completion_status = 0;
int right_shift;
@@ -263,59 +265,69 @@ int uv_flush_send_and_wait(int cpu, int this_blade, struct bau_desc *bau_desc,
* Success, so clear the remote cpu's from the mask so we don't
* use the IPI method of shootdown on them.
*/
- for_each_cpu(bit, cpumaskp) {
+ for_each_cpu(bit, flush_mask) {
blade = uv_cpu_to_blade_id(bit);
if (blade == this_blade)
continue;
- cpumask_clear_cpu(bit, cpumaskp);
+ cpumask_clear_cpu(bit, flush_mask);
}
- if (!cpumask_empty(cpumaskp))
- return 0;
- return 1;
+ if (!cpumask_empty(flush_mask))
+ return flush_mask;
+ return NULL;
}

/**
* uv_flush_tlb_others - globally purge translation cache of a virtual
* address or all TLB's
- * @cpumaskp: mask of all cpu's in which the address is to be removed
+ * @cpumask: mask of all cpu's in which the address is to be removed
* @mm: mm_struct containing virtual address range
* @va: virtual address to be removed (or TLB_FLUSH_ALL for all TLB's on cpu)
+ * @cpu: the current cpu
*
* This is the entry point for initiating any UV global TLB shootdown.
*
* Purges the translation caches of all specified processors of the given
* virtual address, or purges all TLB's on specified processors.
*
- * The caller has derived the cpumaskp from the mm_struct and has subtracted
- * the local cpu from the mask. This function is called only if there
- * are bits set in the mask. (e.g. flush_tlb_page())
+ * The caller has derived the cpumask from the mm_struct. This function
+ * is called only if there are bits set in the mask. (e.g. flush_tlb_page())
*
- * The cpumaskp is converted into a nodemask of the nodes containing
+ * The cpumask is converted into a nodemask of the nodes containing
* the cpus.
*
- * Returns 1 if all remote flushing was done.
- * Returns 0 if some remote flushing remains to be done.
+ * Note that this function should be called with preemption disabled.
+ *
+ * Returns NULL if all remote flushing was done.
+ * Returns pointer to cpumask if some remote flushing remains to be
+ * done. The returned pointer is valid till preemption is re-enabled.
*/
-int uv_flush_tlb_others(struct cpumask *cpumaskp, struct mm_struct *mm,
- unsigned long va)
+const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
+ struct mm_struct *mm,
+ unsigned long va, unsigned int cpu)
{
+ static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
+ struct cpumask *flush_mask = &__get_cpu_var(flush_tlb_mask);
int i;
int bit;
int blade;
- int cpu;
+ int uv_cpu;
int this_blade;
int locals = 0;
struct bau_desc *bau_desc;

- cpu = uv_blade_processor_id();
+ WARN_ON(!in_atomic());
+
+ cpumask_andnot(flush_mask, cpumask, cpumask_of(cpu));
+
+ uv_cpu = uv_blade_processor_id();
this_blade = uv_numa_blade_id();
bau_desc = __get_cpu_var(bau_control).descriptor_base;
- bau_desc += UV_ITEMS_PER_DESCRIPTOR * cpu;
+ bau_desc += UV_ITEMS_PER_DESCRIPTOR * uv_cpu;

bau_nodes_clear(&bau_desc->distribution, UV_DISTRIBUTION_SIZE);

i = 0;
- for_each_cpu(bit, cpumaskp) {
+ for_each_cpu(bit, flush_mask) {
blade = uv_cpu_to_blade_id(bit);
BUG_ON(blade > (UV_DISTRIBUTION_SIZE - 1));
if (blade == this_blade) {
@@ -330,17 +342,17 @@ int uv_flush_tlb_others(struct cpumask *cpumaskp, struct mm_struct *mm,
* no off_node flushing; return status for local node
*/
if (locals)
- return 0;
+ return flush_mask;
else
- return 1;
+ return NULL;
}
__get_cpu_var(ptcstats).requestor++;
__get_cpu_var(ptcstats).ntargeted += i;

bau_desc->payload.address = va;
- bau_desc->payload.sending_cpu = smp_processor_id();
+ bau_desc->payload.sending_cpu = cpu;

- return uv_flush_send_and_wait(cpu, this_blade, bau_desc, cpumaskp);
+ return uv_flush_send_and_wait(uv_cpu, this_blade, bau_desc, flush_mask);
}

/*
--
1.6.0.2

2009-01-21 08:51:21

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 09/10] x86: make x86_32 use tlb_64.c

Impact: less contention when issuing invalidate IPI, cleanup

Make x86_32 use the same tlb code as 64bit. The 64bit code uses
multiple IPI vectors for tlb shootdown to reduce contention. This
patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
code paths.

Note that the usage of asmlinkage is inconsistent for x86_32 and 64
and calls for further cleanup. This has been noted with a FIXME
comment in tlb_64.c.

Signed-off-by: Tejun Heo <[email protected]>
---
arch/x86/include/asm/irq_vectors.h | 7 +-
arch/x86/include/asm/mach-default/entry_arch.h | 18 ++-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/entry_32.S | 6 +-
arch/x86/kernel/irqinit_32.c | 11 +-
arch/x86/kernel/tlb_32.c | 239 ------------------------
arch/x86/kernel/tlb_64.c | 12 +-
7 files changed, 47 insertions(+), 248 deletions(-)
delete mode 100644 arch/x86/kernel/tlb_32.c

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 4ee8f80..9a83a10 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -58,8 +58,11 @@
# define CALL_FUNCTION_VECTOR 0xfc
# define CALL_FUNCTION_SINGLE_VECTOR 0xfb
# define THERMAL_APIC_VECTOR 0xfa
-/* 0xf1 - 0xf9 : free */
-# define INVALIDATE_TLB_VECTOR 0xf0
+/* 0xf8 - 0xf9 : free */
+# define INVALIDATE_TLB_VECTOR_END 0xf7
+# define INVALIDATE_TLB_VECTOR_START 0xf0 /* f0-f7 used for TLB flush */
+
+# define NUM_INVALIDATE_TLB_VECTORS 8

#else

diff --git a/arch/x86/include/asm/mach-default/entry_arch.h b/arch/x86/include/asm/mach-default/entry_arch.h
index 6b1add8..6fa399a 100644
--- a/arch/x86/include/asm/mach-default/entry_arch.h
+++ b/arch/x86/include/asm/mach-default/entry_arch.h
@@ -11,10 +11,26 @@
*/
#ifdef CONFIG_X86_SMP
BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
-BUILD_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR)
BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
+
+BUILD_INTERRUPT3(invalidate_interrupt0,INVALIDATE_TLB_VECTOR_START+0,
+ smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt1,INVALIDATE_TLB_VECTOR_START+1,
+ smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt2,INVALIDATE_TLB_VECTOR_START+2,
+ smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt3,INVALIDATE_TLB_VECTOR_START+3,
+ smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt4,INVALIDATE_TLB_VECTOR_START+4,
+ smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt5,INVALIDATE_TLB_VECTOR_START+5,
+ smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt6,INVALIDATE_TLB_VECTOR_START+6,
+ smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt7,INVALIDATE_TLB_VECTOR_START+7,
+ smp_invalidate_interrupt)
#endif

/*
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index eb07453..a62a15c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -58,7 +58,7 @@ obj-$(CONFIG_PCI) += early-quirks.o
apm-y := apm_32.o
obj-$(CONFIG_APM) += apm.o
obj-$(CONFIG_X86_SMP) += smp.o
-obj-$(CONFIG_X86_SMP) += smpboot.o tsc_sync.o ipi.o tlb_$(BITS).o
+obj-$(CONFIG_X86_SMP) += smpboot.o tsc_sync.o ipi.o tlb_64.o
obj-$(CONFIG_X86_32_SMP) += smpcommon.o
obj-$(CONFIG_X86_64_SMP) += tsc_sync.o smpcommon.o
obj-$(CONFIG_X86_TRAMPOLINE) += trampoline_$(BITS).o
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 4646902..a0b91aa 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -672,7 +672,7 @@ common_interrupt:
ENDPROC(common_interrupt)
CFI_ENDPROC

-#define BUILD_INTERRUPT(name, nr) \
+#define BUILD_INTERRUPT3(name, nr, fn) \
ENTRY(name) \
RING0_INT_FRAME; \
pushl $~(nr); \
@@ -680,11 +680,13 @@ ENTRY(name) \
SAVE_ALL; \
TRACE_IRQS_OFF \
movl %esp,%eax; \
- call smp_##name; \
+ call fn; \
jmp ret_from_intr; \
CFI_ENDPROC; \
ENDPROC(name)

+#define BUILD_INTERRUPT(name, nr) BUILD_INTERRUPT3(name, nr, smp_##name)
+
/* The include is where all of the SMP etc. interrupts come from */
#include "entry_arch.h"

diff --git a/arch/x86/kernel/irqinit_32.c b/arch/x86/kernel/irqinit_32.c
index 1507ad4..bf629ca 100644
--- a/arch/x86/kernel/irqinit_32.c
+++ b/arch/x86/kernel/irqinit_32.c
@@ -149,8 +149,15 @@ void __init native_init_IRQ(void)
*/
alloc_intr_gate(RESCHEDULE_VECTOR, reschedule_interrupt);

- /* IPI for invalidation */
- alloc_intr_gate(INVALIDATE_TLB_VECTOR, invalidate_interrupt);
+ /* IPIs for invalidation */
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+0, invalidate_interrupt0);
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+1, invalidate_interrupt1);
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+2, invalidate_interrupt2);
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+3, invalidate_interrupt3);
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+4, invalidate_interrupt4);
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+5, invalidate_interrupt5);
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+6, invalidate_interrupt6);
+ alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+7, invalidate_interrupt7);

/* IPI for generic function call */
alloc_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
diff --git a/arch/x86/kernel/tlb_32.c b/arch/x86/kernel/tlb_32.c
deleted file mode 100644
index 93fcb05..0000000
--- a/arch/x86/kernel/tlb_32.c
+++ /dev/null
@@ -1,239 +0,0 @@
-#include <linux/spinlock.h>
-#include <linux/cpu.h>
-#include <linux/interrupt.h>
-
-#include <asm/tlbflush.h>
-
-DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
- = { &init_mm, 0, };
-
-/* must come after the send_IPI functions above for inlining */
-#include <mach_ipi.h>
-
-/*
- * Smarter SMP flushing macros.
- * c/o Linus Torvalds.
- *
- * These mean you can really definitely utterly forget about
- * writing to user space from interrupts. (Its not allowed anyway).
- *
- * Optimizations Manfred Spraul <[email protected]>
- */
-
-static cpumask_var_t flush_cpumask;
-static struct mm_struct *flush_mm;
-static unsigned long flush_va;
-static DEFINE_SPINLOCK(tlbstate_lock);
-
-/*
- * We cannot call mmdrop() because we are in interrupt context,
- * instead update mm->cpu_vm_mask.
- *
- * We need to reload %cr3 since the page tables may be going
- * away from under us..
- */
-void leave_mm(int cpu)
-{
- BUG_ON(percpu_read(cpu_tlbstate.state) == TLBSTATE_OK);
- cpu_clear(cpu, percpu_read(cpu_tlbstate.active_mm)->cpu_vm_mask);
- load_cr3(swapper_pg_dir);
-}
-EXPORT_SYMBOL_GPL(leave_mm);
-
-/*
- *
- * The flush IPI assumes that a thread switch happens in this order:
- * [cpu0: the cpu that switches]
- * 1) switch_mm() either 1a) or 1b)
- * 1a) thread switch to a different mm
- * 1a1) cpu_clear(cpu, old_mm->cpu_vm_mask);
- * Stop ipi delivery for the old mm. This is not synchronized with
- * the other cpus, but smp_invalidate_interrupt ignore flush ipis
- * for the wrong mm, and in the worst case we perform a superfluous
- * tlb flush.
- * 1a2) set cpu_tlbstate to TLBSTATE_OK
- * Now the smp_invalidate_interrupt won't call leave_mm if cpu0
- * was in lazy tlb mode.
- * 1a3) update cpu_tlbstate[].active_mm
- * Now cpu0 accepts tlb flushes for the new mm.
- * 1a4) cpu_set(cpu, new_mm->cpu_vm_mask);
- * Now the other cpus will send tlb flush ipis.
- * 1a4) change cr3.
- * 1b) thread switch without mm change
- * cpu_tlbstate[].active_mm is correct, cpu0 already handles
- * flush ipis.
- * 1b1) set cpu_tlbstate to TLBSTATE_OK
- * 1b2) test_and_set the cpu bit in cpu_vm_mask.
- * Atomically set the bit [other cpus will start sending flush ipis],
- * and test the bit.
- * 1b3) if the bit was 0: leave_mm was called, flush the tlb.
- * 2) switch %%esp, ie current
- *
- * The interrupt must handle 2 special cases:
- * - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
- * - the cpu performs speculative tlb reads, i.e. even if the cpu only
- * runs in kernel space, the cpu could load tlb entries for user space
- * pages.
- *
- * The good news is that cpu_tlbstate is local to each cpu, no
- * write/read ordering problems.
- */
-
-/*
- * TLB flush IPI:
- *
- * 1) Flush the tlb entries if the cpu uses the mm that's being flushed.
- * 2) Leave the mm if we are in the lazy tlb mode.
- *
- * Interrupts are disabled.
- */
-
-void smp_invalidate_interrupt(struct pt_regs *regs)
-{
- unsigned int cpu;
-
- cpu = smp_processor_id();
-
- if (!cpumask_test_cpu(cpu, flush_cpumask))
- goto out;
- /*
- * This was a BUG() but until someone can quote me the
- * line from the intel manual that guarantees an IPI to
- * multiple CPUs is retried _only_ on the erroring CPUs
- * its staying as a return
- *
- * BUG();
- */
-
- if (flush_mm == percpu_read(cpu_tlbstate.active_mm)) {
- if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
- if (flush_va == TLB_FLUSH_ALL)
- local_flush_tlb();
- else
- __flush_tlb_one(flush_va);
- } else
- leave_mm(cpu);
- }
-out:
- ack_APIC_irq();
- smp_mb__before_clear_bit();
- cpumask_clear_cpu(cpu, flush_cpumask);
- smp_mb__after_clear_bit();
- inc_irq_stat(irq_tlb_count);
-}
-
-void native_flush_tlb_others(const struct cpumask *cpumask,
- struct mm_struct *mm, unsigned long va)
-{
- /*
- * - mask must exist :)
- */
- BUG_ON(cpumask_empty(cpumask));
- BUG_ON(!mm);
-
- /*
- * i'm not happy about this global shared spinlock in the
- * MM hot path, but we'll see how contended it is.
- * AK: x86-64 has a faster method that could be ported.
- */
- spin_lock(&tlbstate_lock);
-
- cpumask_andnot(flush_cpumask, cpumask, cpumask_of(smp_processor_id()));
-#ifdef CONFIG_HOTPLUG_CPU
- /* If a CPU which we ran on has gone down, OK. */
- cpumask_and(flush_cpumask, flush_cpumask, cpu_online_mask);
- if (unlikely(cpumask_empty(flush_cpumask))) {
- spin_unlock(&tlbstate_lock);
- return;
- }
-#endif
- flush_mm = mm;
- flush_va = va;
-
- /*
- * Make the above memory operations globally visible before
- * sending the IPI.
- */
- smp_mb();
- /*
- * We have to send the IPI only to
- * CPUs affected.
- */
- send_IPI_mask(flush_cpumask, INVALIDATE_TLB_VECTOR);
-
- while (!cpumask_empty(flush_cpumask))
- /* nothing. lockup detection does not belong here */
- cpu_relax();
-
- flush_mm = NULL;
- flush_va = 0;
- spin_unlock(&tlbstate_lock);
-}
-
-void flush_tlb_current_task(void)
-{
- struct mm_struct *mm = current->mm;
-
- preempt_disable();
-
- local_flush_tlb();
- if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
- preempt_enable();
-}
-
-void flush_tlb_mm(struct mm_struct *mm)
-{
-
- preempt_disable();
-
- if (current->active_mm == mm) {
- if (current->mm)
- local_flush_tlb();
- else
- leave_mm(smp_processor_id());
- }
- if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
-
- preempt_enable();
-}
-
-void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
-{
- struct mm_struct *mm = vma->vm_mm;
-
- preempt_disable();
-
- if (current->active_mm == mm) {
- if (current->mm)
- __flush_tlb_one(va);
- else
- leave_mm(smp_processor_id());
- }
-
- if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(&mm->cpu_vm_mask, mm, va);
- preempt_enable();
-}
-
-static void do_flush_tlb_all(void *info)
-{
- unsigned long cpu = smp_processor_id();
-
- __flush_tlb_all();
- if (percpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
- leave_mm(cpu);
-}
-
-void flush_tlb_all(void)
-{
- on_each_cpu(do_flush_tlb_all, NULL, 1);
-}
-
-static int init_flush_cpumask(void)
-{
- alloc_cpumask_var(&flush_cpumask, GFP_KERNEL);
- return 0;
-}
-early_initcall(init_flush_cpumask);
diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
index 19ac661..b3ca1b9 100644
--- a/arch/x86/kernel/tlb_64.c
+++ b/arch/x86/kernel/tlb_64.c
@@ -113,7 +113,17 @@ EXPORT_SYMBOL_GPL(leave_mm);
* Interrupts are disabled.
*/

-asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
+/*
+ * FIXME: use of asmlinkage is not consistent. On x86_64 it's noop
+ * but still used for documentation purpose but the usage is slightly
+ * inconsistent. On x86_32, asmlinkage is regparm(0) but interrupt
+ * entry calls in with the first parameter in %eax. Maybe define
+ * intrlinkage?
+ */
+#ifdef CONFIG_X86_64
+asmlinkage
+#endif
+void smp_invalidate_interrupt(struct pt_regs *regs)
{
unsigned int cpu;
unsigned int sender;
--
1.6.0.2

2009-01-21 09:26:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCHSET linux-2.6-x86:core/percpu] x86: misc clean up and unify x86_32/64 code paths


* Tejun Heo <[email protected]> wrote:

> Hello,
>
> This patchset contains the following 10 patches mostly unifying 32 and
> 64bit code paths. 0001 and 0007-0010 are from me and 0002-0006 are
> from Brian Gerst.
>
> 0001-x86-update-canary-handling-during-switch.patch
> 0002-x86-clean-up-gdt_page-definition.patch
> 0003-x86-fix-percpu_write-with-64-bit-constants.patch
> 0004-x86-set-fs-to-__KERNEL_PERCPU-unconditionally-for.patch
> 0005-x86-merge-mmu_context.h.patch
> 0006-x86-merge-irq_regs.h.patch
> 0007-x86-uv-cleanup.patch
> 0008-x86-prepare-for-tlb-merge.patch
> 0009-x86-make-x86_32-use-tlb_64.c.patch
> 0010-x86-rename-tlb_64.c-to-tlb.c.patch
>
> 0001-0002 clean up after previous percpu changes. 0003 improves
> percpu op slightly. 0004 simplifies x86_32 init path. 0005-0010
> unifies mmu_context.h, irq_regs.h and tlb.c, which not only simplifies
> the source code but also improves performance by applying
> optimizations implemented for either bitness to the other one while
> unifying.
>
> The following git tree contains the above changes on top of
> core/percpu[1]. Please pull from it if there's no objection.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

Pulled into tip/core/percpu, thanks Tejun.

Very nice stuff! The SMP TLB flush code unification is a gem - i never
thought we'd get so far this quick ;-)

Note, i also merged the latest cpumask work into tip/core/percpu, to
resolve tlb_32.c conflicts. I also moved tlb.c from arch/x86/kernel/ to
arch/x86/mm/, where it really belongs conceptually.

-tip testing also stumbled upon a build error caused by the UV cleanups -
see the fix below.

Ingo

-------------->
>From 4ec71fa2d2c3f1040348f2604f4b8ccc833d1c2e Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Wed, 21 Jan 2009 10:24:27 +0100
Subject: [PATCH] x86: uv cleanup, build fix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Fix:

arch/x86/mm/srat_64.c: In function ‘acpi_numa_processor_affinity_init’:
arch/x86/mm/srat_64.c:141: error: implicit declaration of function ‘get_uv_system_type’
arch/x86/mm/srat_64.c:141: error: ‘UV_X2APIC’ undeclared (first use in this function)
arch/x86/mm/srat_64.c:141: error: (Each undeclared identifier is reported only once
arch/x86/mm/srat_64.c:141: error: for each function it appears in.)

A couple of UV definitions were moved to asm/uv/uv.h, but srat_64.c did
not include that header. Add it.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/srat_64.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 09737c8..15df1ba 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -21,6 +21,7 @@
#include <asm/numa.h>
#include <asm/e820.h>
#include <asm/genapic.h>
+#include <asm/uv/uv.h>

int acpi_numa __initdata;

2009-01-21 09:35:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCHSET linux-2.6-x86:core/percpu] x86: misc clean up and unify x86_32/64 code paths


* Ingo Molnar <[email protected]> wrote:

> -tip testing also stumbled upon a build error caused by the UV cleanups
> - see the fix below.

another detail is fixed below.

Ingo

--------------->
>From 3b8f6278445384836f2739462b7964f21fbaa44f Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Wed, 21 Jan 2009 10:32:44 +0100
Subject: [PATCH] x86: make x86_32 use tlb_64.c, build fix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Fix:

arch/x86/mm/tlb.c:47: error: ‘CONFIG_X86_INTERNODE_CACHE_BYTES’ undeclared here (not in a function)

The CONFIG_X86_INTERNODE_CACHE_BYTES symbol is only defined on 64-bit,
because vsmp support is 64-bit only. Define it on 32-bit too - where it
will always be equal to X86_L1_CACHE_BYTES.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/Kconfig.cpu | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 8078955..3f02f66 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -300,7 +300,6 @@ config X86_INTERNODE_CACHE_BYTES
int
default "4096" if X86_VSMP
default X86_L1_CACHE_BYTES if !X86_VSMP
- depends on X86_64

config X86_CMPXCHG
def_bool X86_64 || (X86_32 && !M386)

2009-01-21 10:43:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCHSET linux-2.6-x86:core/percpu] x86: misc clean up and unify x86_32/64 code paths


This one was needed as well.

Ingo

------------------->
>From 61e0178cf01b4acc914e9653b421e925f22be0cd Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Wed, 21 Jan 2009 11:30:07 +0100
Subject: [PATCH] x86: uv cleanup, build fix #2

Fix more build-failure fallout from the UV cleanup - the UV drivers
were not updated to include <asm/uv/uv.h>.

Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/misc/sgi-gru/grufile.c | 1 +
drivers/misc/sgi-xp/xp_main.c | 2 ++
drivers/misc/sgi-xp/xp_uv.c | 1 +
3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index 6509838..50f6357 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -53,6 +53,7 @@
#define IS_UV() 0
#endif

+#include <asm/uv/uv.h>
#include <asm/uv/uv_hub.h>
#include <asm/uv/uv_mmrs.h>

diff --git a/drivers/misc/sgi-xp/xp_main.c b/drivers/misc/sgi-xp/xp_main.c
index 16f8dca..1a553e6 100644
--- a/drivers/misc/sgi-xp/xp_main.c
+++ b/drivers/misc/sgi-xp/xp_main.c
@@ -18,6 +18,8 @@
#include <linux/device.h>
#include "xp.h"

+#include <asm/uv/uv.h>
+
/* define the XP debug device structures to be used with dev_dbg() et al */

struct device_driver xp_dbg_name = {
diff --git a/drivers/misc/sgi-xp/xp_uv.c b/drivers/misc/sgi-xp/xp_uv.c
index d238576..c55cc8a 100644
--- a/drivers/misc/sgi-xp/xp_uv.c
+++ b/drivers/misc/sgi-xp/xp_uv.c
@@ -14,6 +14,7 @@
*/

#include <linux/device.h>
+#include <asm/uv/uv.h>
#include <asm/uv/uv_hub.h>
#if defined CONFIG_X86_64
#include <asm/uv/bios.h>

2009-01-21 11:03:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCHSET linux-2.6-x86:core/percpu] x86: misc clean up and unify x86_32/64 code paths


* Ingo Molnar <[email protected]> wrote:

> This one was needed as well.
>
> Ingo
>
> ------------------->
> From 61e0178cf01b4acc914e9653b421e925f22be0cd Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <[email protected]>
> Date: Wed, 21 Jan 2009 11:30:07 +0100
> Subject: [PATCH] x86: uv cleanup, build fix #2
>
> Fix more build-failure fallout from the UV cleanup - the UV drivers
> were not updated to include <asm/uv/uv.h>.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> drivers/misc/sgi-gru/grufile.c | 1 +
> drivers/misc/sgi-xp/xp_main.c | 2 ++
> drivers/misc/sgi-xp/xp_uv.c | 1 +
> 3 files changed, 4 insertions(+), 0 deletions(-)

i went for this solution instead - it covers more .c files and is the
right thing to do as well.

Ingo

----------------->
>From 5b221278d61e3907a5e4104a844b63bc8bb3d43a Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Wed, 21 Jan 2009 11:30:07 +0100
Subject: [PATCH] x86: uv cleanup, build fix #2

Fix more build-failure fallout from the UV cleanup - the UV drivers
were not updated to include <asm/uv/uv.h>.

Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/misc/sgi-gru/gru.h | 2 ++
drivers/misc/sgi-xp/xp.h | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/misc/sgi-gru/gru.h b/drivers/misc/sgi-gru/gru.h
index f93f03a..1b5f579 100644
--- a/drivers/misc/sgi-gru/gru.h
+++ b/drivers/misc/sgi-gru/gru.h
@@ -19,6 +19,8 @@
#ifndef __GRU_H__
#define __GRU_H__

+#include <asm/uv/uv.h>
+
/*
* GRU architectural definitions
*/
diff --git a/drivers/misc/sgi-xp/xp.h b/drivers/misc/sgi-xp/xp.h
index 7b4cbd5..069ad3a 100644
--- a/drivers/misc/sgi-xp/xp.h
+++ b/drivers/misc/sgi-xp/xp.h
@@ -15,6 +15,8 @@

#include <linux/mutex.h>

+#include <asm/uv/uv.h>
+
#ifdef CONFIG_IA64
#include <asm/system.h>
#include <asm/sn/arch.h> /* defines is_shub1() and is_shub2() */

2009-01-21 11:20:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET linux-2.6-x86:core/percpu] x86: misc clean up and unify x86_32/64 code paths

Ingo Molnar wrote:
> i went for this solution instead - it covers more .c files and is the
> right thing to do as well.

Heh... Looks like I'm the one who should be doing much more build
testing. Sorry about the mess.

Thanks.

--
tejun

2009-01-28 00:18:12

by Cliff Wickman

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86: uv cleanup



Hi Tejun,

I appreciate your consolidation and cleanup of the UV code, relative
to 32 vs 64.

I have a question about the addition of the
WARN_ON(!in_atomic());
in uv_flush_tlb_others().
The patch is
http://marc.info/?l=linux-kernel&m=123252788121855&w=2

I expect this function to always entered preemptable.
Could you explain a bit about your thought behind this WARN_ON?

Thanks.
-Cliff




const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm,
unsigned long va, unsigned int cpu)
{
static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
struct cpumask *flush_mask = &__get_cpu_var(flush_tlb_mask);
int i;
int bit;
int blade;
int uv_cpu;
int this_blade;
int locals = 0;
struct bau_desc *bau_desc;

WARN_ON(!in_atomic());

cpumask_andnot(flush_mask, cpumask, cpumask_of(cpu));

uv_cpu = uv_blade_processor_id();
this_blade = uv_numa_blade_id();

2009-01-28 00:33:42

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86: uv cleanup

Cliff Wickman wrote:
> I appreciate your consolidation and cleanup of the UV code, relative
> to 32 vs 64.
>
> I have a question about the addition of the
> WARN_ON(!in_atomic());
> in uv_flush_tlb_others().
> The patch is
> http://marc.info/?l=linux-kernel&m=123252788121855&w=2
>
> I expect this function to always entered preemptable.
> Could you explain a bit about your thought behind this WARN_ON?

The function needs to return modified cpumask. Before the patch, this
was achieved by the caller disabling preemption and pass modifiable
per-cpu cpumask and allowing uv code to modify it in place. After the
patch, to put uv code in uv proper (so that the generic code doesn't
have to allocate percpu cpumask for it), this is moved into the uv
function. The caller disables preemtion and uses the returned cpumask
which might or might not be the same array it gave to the function and
the array is guaranteed to be valid until preemption is re-enabled.
The meachnism itself isn't different. Only what's done where has
changed to make it more modular. More specifically, when uv is not
needed, the generic code can simply omit the uv related part including
the percpu cpumask allocation.

Thanks.

--
tejun