This patch series adds context tracking support for the Power architecture,
beginning with 64-bit pSeries. The code is ported from x86_64, and in each
patch I list the corresponding x86 patch.
v3:
This version is mainly a rebase against 3.10-rc1. As the common code to
handle exceptions has been pulled into 3.10, there is no longer a dependency
on the tip tree, so patches #2 and #6 from v2 have been merged together.
Li Zhong (5):
powerpc: Syscall hooks for context tracking subsystem
powerpc: Exception hooks for context tracking subsystem
powerpc: Exit user context on notify resume
powerpc: Use the new schedule_user API on userspace preemption
powerpc: select HAVE_CONTEXT_TRACKING for pSeries
arch/powerpc/include/asm/context_tracking.h | 10 +++
arch/powerpc/include/asm/thread_info.h | 7 ++-
arch/powerpc/kernel/entry_64.S | 3 +-
arch/powerpc/kernel/ptrace.c | 5 ++
arch/powerpc/kernel/signal.c | 5 ++
arch/powerpc/kernel/traps.c | 91 ++++++++++++++++++++-------
arch/powerpc/mm/fault.c | 16 ++++-
arch/powerpc/mm/hash_utils_64.c | 38 ++++++++---
arch/powerpc/platforms/pseries/Kconfig | 1 +
9 files changed, 140 insertions(+), 36 deletions(-)
create mode 100644 arch/powerpc/include/asm/context_tracking.h
--
1.7.9.5
These are the syscall slow-path hooks for the context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
commit bf5a3c13b939813d28ce26c01425054c740d6731
TIF_MEMDIE is moved to the upper 16 bits (with value 17), as there appears to
be no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it
should stay in the same low 16 bits as the others in that group, so that the
asm code's andi. against the group mask still works.
Signed-off-by: Li Zhong <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
---
arch/powerpc/include/asm/thread_info.h | 7 +++++--
arch/powerpc/kernel/ptrace.c | 5 +++++
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 8ceea14..ba7b197 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_PERFMON_CTXSW 6 /* perfmon needs ctxsw calls */
#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
#define TIF_SINGLESTEP 8 /* singlestepping active */
-#define TIF_MEMDIE 9 /* is terminating due to OOM killer */
+#define TIF_NOHZ 9 /* in adaptive nohz mode */
#define TIF_SECCOMP 10 /* secure computing */
#define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */
#define TIF_NOERROR 12 /* Force successful syscall return */
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_SYSCALL_TRACEPOINT 15 /* syscall tracepoint instrumentation */
#define TIF_EMULATE_STACK_STORE 16 /* Is an instruction emulation
for stack store? */
+#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
/* as above, but as bit values */
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
@@ -124,8 +125,10 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_UPROBE (1<<TIF_UPROBE)
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_EMULATE_STACK_STORE (1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_NOHZ (1<<TIF_NOHZ)
#define _TIF_SYSCALL_T_OR_A (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
- _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
+ _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
+ _TIF_NOHZ)
#define _TIF_USER_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
_TIF_NOTIFY_RESUME | _TIF_UPROBE)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 3b14d32..98c2fc1 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -32,6 +32,7 @@
#include <trace/syscall.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
+#include <linux/context_tracking.h>
#include <asm/uaccess.h>
#include <asm/page.h>
@@ -1788,6 +1789,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;
+ user_exit();
+
secure_computing_strict(regs->gpr[0]);
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1832,4 +1835,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, step);
+
+ user_enter();
}
--
1.7.9.5
These are the exception hooks for the context tracking subsystem, covering
data access, program check, single step, instruction breakpoint, machine
check, alignment, FP unavailable, AltiVec assist, and unknown exceptions,
whose handlers might use RCU.
This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
commit 6ba3c97a38803883c2eee489505796cb0a727122
But after exception handling moved to generic code, and with the changes in
the following two commits:
56dd9470d7c8734f055da2a6bac553caf4a468eb
context_tracking: Move exception handling to generic code
6c1e0256fad84a843d915414e4b5973b7443d48d
context_tracking: Restore correct previous context state on exception exit
the exception hooks can use the generic code above instead of a
redundant arch implementation.
Signed-off-by: Li Zhong <[email protected]>
---
arch/powerpc/kernel/traps.c | 91 +++++++++++++++++++++++++++++----------
arch/powerpc/mm/fault.c | 16 ++++++-
arch/powerpc/mm/hash_utils_64.c | 38 ++++++++++++----
3 files changed, 112 insertions(+), 33 deletions(-)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 83efa2f..9d3c000 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -35,6 +35,7 @@
#include <linux/kdebug.h>
#include <linux/debugfs.h>
#include <linux/ratelimit.h>
+#include <linux/context_tracking.h>
#include <asm/emulated_ops.h>
#include <asm/pgtable.h>
@@ -668,6 +669,9 @@ int machine_check_generic(struct pt_regs *regs)
void machine_check_exception(struct pt_regs *regs)
{
int recover = 0;
+ enum ctx_state prev_state;
+
+ prev_state = exception_enter();
__get_cpu_var(irq_stat).mce_exceptions++;
@@ -683,7 +687,7 @@ void machine_check_exception(struct pt_regs *regs)
recover = cur_cpu_spec->machine_check(regs);
if (recover > 0)
- return;
+ goto exit;
#if defined(CONFIG_8xx) && defined(CONFIG_PCI)
/* the qspan pci read routines can cause machine checks -- Cort
@@ -693,20 +697,23 @@ void machine_check_exception(struct pt_regs *regs)
* -- BenH
*/
bad_page_fault(regs, regs->dar, SIGBUS);
- return;
+ goto exit;
#endif
if (debugger_fault_handler(regs))
- return;
+ goto exit;
if (check_io_access(regs))
- return;
+ goto exit;
die("Machine check", regs, SIGBUS);
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
panic("Unrecoverable Machine check");
+
+exit:
+ exception_exit(prev_state);
}
void SMIException(struct pt_regs *regs)
@@ -716,20 +723,31 @@ void SMIException(struct pt_regs *regs)
void unknown_exception(struct pt_regs *regs)
{
+ enum ctx_state prev_state;
+ prev_state = exception_enter();
+
printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
regs->nip, regs->msr, regs->trap);
_exception(SIGTRAP, regs, 0, 0);
+
+ exception_exit(prev_state);
}
void instruction_breakpoint_exception(struct pt_regs *regs)
{
+ enum ctx_state prev_state;
+ prev_state = exception_enter();
+
if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5,
5, SIGTRAP) == NOTIFY_STOP)
- return;
+ goto exit;
if (debugger_iabr_match(regs))
- return;
+ goto exit;
_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
+
+exit:
+ exception_exit(prev_state);
}
void RunModeException(struct pt_regs *regs)
@@ -739,15 +757,21 @@ void RunModeException(struct pt_regs *regs)
void __kprobes single_step_exception(struct pt_regs *regs)
{
+ enum ctx_state prev_state;
+ prev_state = exception_enter();
+
clear_single_step(regs);
if (notify_die(DIE_SSTEP, "single_step", regs, 5,
5, SIGTRAP) == NOTIFY_STOP)
- return;
+ goto exit;
if (debugger_sstep(regs))
- return;
+ goto exit;
_exception(SIGTRAP, regs, TRAP_TRACE, regs->nip);
+
+exit:
+ exception_exit(prev_state);
}
/*
@@ -1006,34 +1030,37 @@ int is_valid_bugaddr(unsigned long addr)
void __kprobes program_check_exception(struct pt_regs *regs)
{
unsigned int reason = get_reason(regs);
+ enum ctx_state prev_state;
extern int do_mathemu(struct pt_regs *regs);
+ prev_state = exception_enter();
+
/* We can now get here via a FP Unavailable exception if the core
* has no FPU, in that case the reason flags will be 0 */
if (reason & REASON_FP) {
/* IEEE FP exception */
parse_fpe(regs);
- return;
+ goto exit;
}
if (reason & REASON_TRAP) {
/* Debugger is first in line to stop recursive faults in
* rcu_lock, notify_die, or atomic_notifier_call_chain */
if (debugger_bpt(regs))
- return;
+ goto exit;
/* trap exception */
if (notify_die(DIE_BPT, "breakpoint", regs, 5, 5, SIGTRAP)
== NOTIFY_STOP)
- return;
+ goto exit;
if (!(regs->msr & MSR_PR) && /* not user-mode */
report_bug(regs->nip, regs) == BUG_TRAP_TYPE_WARN) {
regs->nip += 4;
- return;
+ goto exit;
}
_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
- return;
+ goto exit;
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
if (reason & REASON_TM) {
@@ -1049,7 +1076,7 @@ void __kprobes program_check_exception(struct pt_regs *regs)
if (!user_mode(regs) &&
report_bug(regs->nip, regs) == BUG_TRAP_TYPE_WARN) {
regs->nip += 4;
- return;
+ goto exit;
}
/* If usermode caused this, it's done something illegal and
* gets a SIGILL slap on the wrist. We call it an illegal
@@ -1059,7 +1086,7 @@ void __kprobes program_check_exception(struct pt_regs *regs)
*/
if (user_mode(regs)) {
_exception(SIGILL, regs, ILL_ILLOPN, regs->nip);
- return;
+ goto exit;
} else {
printk(KERN_EMERG "Unexpected TM Bad Thing exception "
"at %lx (msr 0x%x)\n", regs->nip, reason);
@@ -1083,16 +1110,16 @@ void __kprobes program_check_exception(struct pt_regs *regs)
switch (do_mathemu(regs)) {
case 0:
emulate_single_step(regs);
- return;
+ goto exit;
case 1: {
int code = 0;
code = __parse_fpscr(current->thread.fpscr.val);
_exception(SIGFPE, regs, code, regs->nip);
- return;
+ goto exit;
}
case -EFAULT:
_exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip);
- return;
+ goto exit;
}
/* fall through on any other errors */
#endif /* CONFIG_MATH_EMULATION */
@@ -1103,10 +1130,10 @@ void __kprobes program_check_exception(struct pt_regs *regs)
case 0:
regs->nip += 4;
emulate_single_step(regs);
- return;
+ goto exit;
case -EFAULT:
_exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip);
- return;
+ goto exit;
}
}
@@ -1114,11 +1141,17 @@ void __kprobes program_check_exception(struct pt_regs *regs)
_exception(SIGILL, regs, ILL_PRVOPC, regs->nip);
else
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+
+exit:
+ exception_exit(prev_state);
}
void alignment_exception(struct pt_regs *regs)
{
int sig, code, fixed = 0;
+ enum ctx_state prev_state;
+
+ prev_state = exception_enter();
/* We restore the interrupt state now */
if (!arch_irq_disabled_regs(regs))
@@ -1131,7 +1164,7 @@ void alignment_exception(struct pt_regs *regs)
if (fixed == 1) {
regs->nip += 4; /* skip over emulated instruction */
emulate_single_step(regs);
- return;
+ goto exit;
}
/* Operand address was bad */
@@ -1146,6 +1179,9 @@ void alignment_exception(struct pt_regs *regs)
_exception(sig, regs, code, regs->dar);
else
bad_page_fault(regs, regs->dar, sig);
+
+exit:
+ exception_exit(prev_state);
}
void StackOverflow(struct pt_regs *regs)
@@ -1174,23 +1210,34 @@ void trace_syscall(struct pt_regs *regs)
void kernel_fp_unavailable_exception(struct pt_regs *regs)
{
+ enum ctx_state prev_state;
+ prev_state = exception_enter();
+
printk(KERN_EMERG "Unrecoverable FP Unavailable Exception "
"%lx at %lx\n", regs->trap, regs->nip);
die("Unrecoverable FP Unavailable Exception", regs, SIGABRT);
+
+ exception_exit(prev_state);
}
void altivec_unavailable_exception(struct pt_regs *regs)
{
+ enum ctx_state prev_state;
+ prev_state = exception_enter();
+
if (user_mode(regs)) {
/* A user program has executed an altivec instruction,
but this kernel doesn't support altivec. */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
- return;
+ goto exit;
}
printk(KERN_EMERG "Unrecoverable VMX/Altivec Unavailable Exception "
"%lx at %lx\n", regs->trap, regs->nip);
die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
+
+exit:
+ exception_exit(prev_state);
}
void vsx_unavailable_exception(struct pt_regs *regs)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 229951f..141835b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -32,6 +32,7 @@
#include <linux/perf_event.h>
#include <linux/magic.h>
#include <linux/ratelimit.h>
+#include <linux/context_tracking.h>
#include <asm/firmware.h>
#include <asm/page.h>
@@ -193,8 +194,8 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
* The return value is 0 if the fault was handled, or the signal
* number if this is a kernel fault that can't be handled here.
*/
-int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
- unsigned long error_code)
+static int __kprobes __do_page_fault(struct pt_regs *regs,
+ unsigned long address, unsigned long error_code)
{
struct vm_area_struct * vma;
struct mm_struct *mm = current->mm;
@@ -475,6 +476,17 @@ bad_area_nosemaphore:
}
+int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
+ unsigned long error_code)
+{
+ int ret;
+ enum ctx_state prev_state;
+ prev_state = exception_enter();
+ ret = __do_page_fault(regs, address, error_code);
+ exception_exit(prev_state);
+ return ret;
+}
+
/*
* bad_page_fault is called when we have a bad access from the kernel.
* It is called from the DSI and ISI handlers in head.S and from some
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 88ac0ee..d92fb26 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -33,6 +33,7 @@
#include <linux/init.h>
#include <linux/signal.h>
#include <linux/memblock.h>
+#include <linux/context_tracking.h>
#include <asm/processor.h>
#include <asm/pgtable.h>
@@ -962,6 +963,9 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
const struct cpumask *tmp;
int rc, user_region = 0, local = 0;
int psize, ssize;
+ enum ctx_state prev_state;
+
+ prev_state = exception_enter();
DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
ea, access, trap);
@@ -973,7 +977,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
mm = current->mm;
if (! mm) {
DBG_LOW(" user region with no mm !\n");
- return 1;
+ rc = 1;
+ goto exit;
}
psize = get_slice_psize(mm, ea);
ssize = user_segment_size(ea);
@@ -992,19 +997,23 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
/* Not a valid range
* Send the problem up to do_page_fault
*/
- return 1;
+ rc = 1;
+ goto exit;
}
DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
/* Bad address. */
if (!vsid) {
DBG_LOW("Bad address!\n");
- return 1;
+ rc = 1;
+ goto exit;
}
/* Get pgdir */
pgdir = mm->pgd;
- if (pgdir == NULL)
- return 1;
+ if (pgdir == NULL) {
+ rc = 1;
+ goto exit;
+ }
/* Check CPU locality */
tmp = cpumask_of(smp_processor_id());
@@ -1027,7 +1036,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugeshift);
if (ptep == NULL || !pte_present(*ptep)) {
DBG_LOW(" no PTE !\n");
- return 1;
+ rc = 1;
+ goto exit;
}
/* Add _PAGE_PRESENT to the required access perm */
@@ -1038,13 +1048,16 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
*/
if (access & ~pte_val(*ptep)) {
DBG_LOW(" no access !\n");
- return 1;
+ rc = 1;
+ goto exit;
}
#ifdef CONFIG_HUGETLB_PAGE
- if (hugeshift)
- return __hash_page_huge(ea, access, vsid, ptep, trap, local,
+ if (hugeshift) {
+ rc = __hash_page_huge(ea, access, vsid, ptep, trap, local,
ssize, hugeshift, psize);
+ goto exit;
+ }
#endif /* CONFIG_HUGETLB_PAGE */
#ifndef CONFIG_PPC_64K_PAGES
@@ -1124,6 +1137,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
pte_val(*(ptep + PTRS_PER_PTE)));
#endif
DBG_LOW(" -> rc=%d\n", rc);
+exit:
+ exception_exit(prev_state);
return rc;
}
EXPORT_SYMBOL_GPL(hash_page);
@@ -1259,6 +1274,9 @@ void flush_hash_range(unsigned long number, int local)
*/
void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
{
+ enum ctx_state prev_state;
+ prev_state = exception_enter();
+
if (user_mode(regs)) {
#ifdef CONFIG_PPC_SUBPAGE_PROT
if (rc == -2)
@@ -1268,6 +1286,8 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
_exception(SIGBUS, regs, BUS_ADRERR, address);
} else
bad_page_fault(regs, address, SIGBUS);
+
+ exception_exit(prev_state);
}
long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
--
1.7.9.5
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
commit 0430499ce9d78691f3985962021b16bf8f8a8048
Signed-off-by: Li Zhong <[email protected]>
---
arch/powerpc/include/asm/context_tracking.h | 10 ++++++++++
arch/powerpc/kernel/entry_64.S | 3 ++-
2 files changed, 12 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/include/asm/context_tracking.h
diff --git a/arch/powerpc/include/asm/context_tracking.h b/arch/powerpc/include/asm/context_tracking.h
new file mode 100644
index 0000000..b6f5a33
--- /dev/null
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -0,0 +1,10 @@
+#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
+#define _ASM_POWERPC_CONTEXT_TRACKING_H
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define SCHEDULE_USER bl .schedule_user
+#else
+#define SCHEDULE_USER bl .schedule
+#endif
+
+#endif
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 915fbb4..d418977 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -33,6 +33,7 @@
#include <asm/irqflags.h>
#include <asm/ftrace.h>
#include <asm/hw_irq.h>
+#include <asm/context_tracking.h>
/*
* System calls.
@@ -634,7 +635,7 @@ _GLOBAL(ret_from_except_lite)
andi. r0,r4,_TIF_NEED_RESCHED
beq 1f
bl .restore_interrupts
- bl .schedule
+ SCHEDULE_USER
b .ret_from_except_lite
1: bl .save_nvgprs
--
1.7.9.5
Enable context tracking support, starting with pSeries.
Signed-off-by: Li Zhong <[email protected]>
---
arch/powerpc/platforms/pseries/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 9a0941b..023b288 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -18,6 +18,7 @@ config PPC_PSERIES
select PPC_PCI_CHOICE if EXPERT
select ZLIB_DEFLATE
select PPC_DOORBELL
+ select HAVE_CONTEXT_TRACKING
default y
config PPC_SPLPAR
--
1.7.9.5
This patch allows RCU usage in do_notify_resume(), e.g. during signal handling.
It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6
Signed-off-by: Li Zhong <[email protected]>
---
arch/powerpc/kernel/signal.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index cf12eae..d63b502 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -13,6 +13,7 @@
#include <linux/signal.h>
#include <linux/uprobes.h>
#include <linux/key.h>
+#include <linux/context_tracking.h>
#include <asm/hw_breakpoint.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -159,6 +160,8 @@ static int do_signal(struct pt_regs *regs)
void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
{
+ user_exit();
+
if (thread_info_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
@@ -169,4 +172,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+
+ user_enter();
}
--
1.7.9.5
On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote:
> This patch series adds context tracking support for the Power architecture,
> beginning with 64-bit pSeries. The code is ported from x86_64, and in each
> patch I list the corresponding x86 patch.
So that's yet another pile of bloat on all syscall entry/exit and
exception entry/exit. What is it used for ? (I haven't followed on
x86_64 side).
Cheers,
Ben.
> v3:
>
> This version is mainly a rebase against 3.10-rc1. As the common code to
> handle exceptions has been pulled into 3.10, there is no longer a dependency
> on the tip tree, so patches #2 and #6 from v2 have been merged together.
>
> Li Zhong (5):
> powerpc: Syscall hooks for context tracking subsystem
> powerpc: Exception hooks for context tracking subsystem
> powerpc: Exit user context on notify resume
> powerpc: Use the new schedule_user API on userspace preemption
> powerpc: select HAVE_CONTEXT_TRACKING for pSeries
>
> arch/powerpc/include/asm/context_tracking.h | 10 +++
> arch/powerpc/include/asm/thread_info.h | 7 ++-
> arch/powerpc/kernel/entry_64.S | 3 +-
> arch/powerpc/kernel/ptrace.c | 5 ++
> arch/powerpc/kernel/signal.c | 5 ++
> arch/powerpc/kernel/traps.c | 91 ++++++++++++++++++++-------
> arch/powerpc/mm/fault.c | 16 ++++-
> arch/powerpc/mm/hash_utils_64.c | 38 ++++++++---
> arch/powerpc/platforms/pseries/Kconfig | 1 +
> 9 files changed, 140 insertions(+), 36 deletions(-)
> create mode 100644 arch/powerpc/include/asm/context_tracking.h
>
On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote:
> int recover = 0;
> + enum ctx_state prev_state;
> +
> + prev_state = exception_enter();
Please make it nicer:
enum ctx_state prev_state = exception_enter();
int recover = 0;
> __get_cpu_var(irq_stat).mce_exceptions++;
>
> @@ -683,7 +687,7 @@ void machine_check_exception(struct pt_regs *regs)
> recover = cur_cpu_spec->machine_check(regs);
>
> if (recover > 0)
> - return;
> + goto exit;
I'm no fan of "exit" as a label as it's otherwise a libc function (I
know there is no relationship here but I just find that annoying).
I usually use "bail"
> #if defined(CONFIG_8xx) && defined(CONFIG_PCI)
> /* the qspan pci read routines can cause machine checks -- Cort
> @@ -693,20 +697,23 @@ void machine_check_exception(struct pt_regs *regs)
> * -- BenH
> */
> bad_page_fault(regs, regs->dar, SIGBUS);
> - return;
> + goto exit;
> #endif
>
> if (debugger_fault_handler(regs))
> - return;
> + goto exit;
>
> if (check_io_access(regs))
> - return;
> + goto exit;
>
> die("Machine check", regs, SIGBUS);
>
> /* Must die if the interrupt is not recoverable */
> if (!(regs->msr & MSR_RI))
> panic("Unrecoverable Machine check");
> +
> +exit:
> + exception_exit(prev_state);
> }
>
> void SMIException(struct pt_regs *regs)
> @@ -716,20 +723,31 @@ void SMIException(struct pt_regs *regs)
>
> void unknown_exception(struct pt_regs *regs)
> {
> + enum ctx_state prev_state;
> + prev_state = exception_enter();
> +
Same cosmetic comment for all other cases
..../...
> #include <asm/firmware.h>
> #include <asm/page.h>
> @@ -193,8 +194,8 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
> * The return value is 0 if the fault was handled, or the signal
> * number if this is a kernel fault that can't be handled here.
> */
> -int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
> - unsigned long error_code)
> +static int __kprobes __do_page_fault(struct pt_regs *regs,
> + unsigned long address, unsigned long error_code)
> {
> struct vm_area_struct * vma;
> struct mm_struct *mm = current->mm;
> @@ -475,6 +476,17 @@ bad_area_nosemaphore:
>
> }
>
> +int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
> + unsigned long error_code)
> +{
> + int ret;
> + enum ctx_state prev_state;
> + prev_state = exception_enter();
> + ret = __do_page_fault(regs, address, error_code);
> + exception_exit(prev_state);
> + return ret;
> +}
Any reason why you didn't use the same wrapping trick in traps.c ? I'd
rather we stay consistent in the way we do things between the two files.
> /*
> * bad_page_fault is called when we have a bad access from the kernel.
> * It is called from the DSI and ISI handlers in head.S and from some
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 88ac0ee..d92fb26 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -33,6 +33,7 @@
> #include <linux/init.h>
> #include <linux/signal.h>
> #include <linux/memblock.h>
> +#include <linux/context_tracking.h>
>
> #include <asm/processor.h>
> #include <asm/pgtable.h>
> @@ -962,6 +963,9 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> const struct cpumask *tmp;
> int rc, user_region = 0, local = 0;
> int psize, ssize;
> + enum ctx_state prev_state;
> +
> + prev_state = exception_enter();
>
> DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> ea, access, trap);
> @@ -973,7 +977,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> mm = current->mm;
> if (! mm) {
> DBG_LOW(" user region with no mm !\n");
> - return 1;
> + rc = 1;
> + goto exit;
> }
> psize = get_slice_psize(mm, ea);
> ssize = user_segment_size(ea);
> @@ -992,19 +997,23 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> /* Not a valid range
> * Send the problem up to do_page_fault
> */
> - return 1;
> + rc = 1;
> + goto exit;
> }
> DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
>
> /* Bad address. */
> if (!vsid) {
> DBG_LOW("Bad address!\n");
> - return 1;
> + rc = 1;
> + goto exit;
> }
> /* Get pgdir */
> pgdir = mm->pgd;
> - if (pgdir == NULL)
> - return 1;
> + if (pgdir == NULL) {
> + rc = 1;
> + goto exit;
> + }
>
> /* Check CPU locality */
> tmp = cpumask_of(smp_processor_id());
> @@ -1027,7 +1036,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugeshift);
> if (ptep == NULL || !pte_present(*ptep)) {
> DBG_LOW(" no PTE !\n");
> - return 1;
> + rc = 1;
> + goto exit;
> }
>
> /* Add _PAGE_PRESENT to the required access perm */
> @@ -1038,13 +1048,16 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> */
> if (access & ~pte_val(*ptep)) {
> DBG_LOW(" no access !\n");
> - return 1;
> + rc = 1;
> + goto exit;
> }
>
> #ifdef CONFIG_HUGETLB_PAGE
> - if (hugeshift)
> - return __hash_page_huge(ea, access, vsid, ptep, trap, local,
> + if (hugeshift) {
> + rc = __hash_page_huge(ea, access, vsid, ptep, trap, local,
> ssize, hugeshift, psize);
> + goto exit;
> + }
> #endif /* CONFIG_HUGETLB_PAGE */
>
> #ifndef CONFIG_PPC_64K_PAGES
> @@ -1124,6 +1137,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> pte_val(*(ptep + PTRS_PER_PTE)));
> #endif
> DBG_LOW(" -> rc=%d\n", rc);
> +exit:
> + exception_exit(prev_state);
> return rc;
> }
> EXPORT_SYMBOL_GPL(hash_page);
> @@ -1259,6 +1274,9 @@ void flush_hash_range(unsigned long number, int local)
> */
> void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
> {
> + enum ctx_state prev_state;
> + prev_state = exception_enter();
> +
> if (user_mode(regs)) {
> #ifdef CONFIG_PPC_SUBPAGE_PROT
> if (rc == -2)
> @@ -1268,6 +1286,8 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
> _exception(SIGBUS, regs, BUS_ADRERR, address);
> } else
> bad_page_fault(regs, address, SIGBUS);
> +
> + exception_exit(prev_state);
> }
So the above comes from the return of hash_page following an error; it's
technically the same exception. I don't know if that matters, however.
Also, some exceptions are handled directly in asm, such as SLB misses;
again, I don't know if that matters, as I'm not familiar with what the
context tracking code is actually doing.
> long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
Cheers,
Ben.
On Mon, 2013-05-13 at 15:51 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote:
> > This patch series adds context tracking support for the Power architecture,
> > beginning with 64-bit pSeries. The code is ported from x86_64, and in each
> > patch I list the corresponding x86 patch.
>
> So that's yet another pile of bloat on all syscall entry/exit and
> exception entry/exit. What is it used for ? (I haven't followed on
> x86_64 side).
To my understanding, it is used to enable the RCU user extended quiescent
state, so that RCU on that CPU doesn't need scheduler ticks. Together
with some other code (already in 3.10), we are able to remove the ticks
in some cases (e.g. only one task running on the CPU, with some other
limitations).
Maybe Paul or Frederic could give a better description.
Thanks, Zhong
>
> Cheers,
> Ben.
>
> > v3:
> >
> > This version is mainly a rebase against 3.10-rc1. As the common code to
> > handle exceptions has been pulled into 3.10, there is no longer a dependency
> > on the tip tree, so patches #2 and #6 from v2 have been merged together.
> >
> > Li Zhong (5):
> > powerpc: Syscall hooks for context tracking subsystem
> > powerpc: Exception hooks for context tracking subsystem
> > powerpc: Exit user context on notify resume
> > powerpc: Use the new schedule_user API on userspace preemption
> > powerpc: select HAVE_CONTEXT_TRACKING for pSeries
> >
> > arch/powerpc/include/asm/context_tracking.h | 10 +++
> > arch/powerpc/include/asm/thread_info.h | 7 ++-
> > arch/powerpc/kernel/entry_64.S | 3 +-
> > arch/powerpc/kernel/ptrace.c | 5 ++
> > arch/powerpc/kernel/signal.c | 5 ++
> > arch/powerpc/kernel/traps.c | 91 ++++++++++++++++++++-------
> > arch/powerpc/mm/fault.c | 16 ++++-
> > arch/powerpc/mm/hash_utils_64.c | 38 ++++++++---
> > arch/powerpc/platforms/pseries/Kconfig | 1 +
> > 9 files changed, 140 insertions(+), 36 deletions(-)
> > create mode 100644 arch/powerpc/include/asm/context_tracking.h
> >
>
>
On Mon, 2013-05-13 at 15:57 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote:
> > int recover = 0;
> > + enum ctx_state prev_state;
> > +
> > + prev_state = exception_enter();
>
> Please make it nicer:
>
> enum ctx_state prev_state = exception_enter();
> int recover = 0;
OK, I'll fix it, and all the others.
> > __get_cpu_var(irq_stat).mce_exceptions++;
> >
> > @@ -683,7 +687,7 @@ void machine_check_exception(struct pt_regs *regs)
> > recover = cur_cpu_spec->machine_check(regs);
> >
> > if (recover > 0)
> > - return;
> > + goto exit;
>
> I'm no fan of "exit" as a label as it's otherwise a libc function (I
> know there is no relationship here but I just find that annoying).
>
> I usually use "bail"
OK, I'll fix it, and all the others.
>
> > #if defined(CONFIG_8xx) && defined(CONFIG_PCI)
> > /* the qspan pci read routines can cause machine checks -- Cort
> > @@ -693,20 +697,23 @@ void machine_check_exception(struct pt_regs *regs)
> > * -- BenH
> > */
> > bad_page_fault(regs, regs->dar, SIGBUS);
> > - return;
> > + goto exit;
> > #endif
> >
> > if (debugger_fault_handler(regs))
> > - return;
> > + goto exit;
> >
> > if (check_io_access(regs))
> > - return;
> > + goto exit;
> >
> > die("Machine check", regs, SIGBUS);
> >
> > /* Must die if the interrupt is not recoverable */
> > if (!(regs->msr & MSR_RI))
> > panic("Unrecoverable Machine check");
> > +
> > +exit:
> > + exception_exit(prev_state);
> > }
> >
> > void SMIException(struct pt_regs *regs)
> > @@ -716,20 +723,31 @@ void SMIException(struct pt_regs *regs)
> >
> > void unknown_exception(struct pt_regs *regs)
> > {
> > + enum ctx_state prev_state;
> > + prev_state = exception_enter();
> > +
>
> Same cosmetic comment for all other cases
>
> ..../...
>
> > #include <asm/firmware.h>
> > #include <asm/page.h>
> > @@ -193,8 +194,8 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
> > * The return value is 0 if the fault was handled, or the signal
> > * number if this is a kernel fault that can't be handled here.
> > */
> > -int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
> > - unsigned long error_code)
> > +static int __kprobes __do_page_fault(struct pt_regs *regs,
> > + unsigned long address, unsigned long error_code)
> > {
> > struct vm_area_struct * vma;
> > struct mm_struct *mm = current->mm;
> > @@ -475,6 +476,17 @@ bad_area_nosemaphore:
> >
> > }
> >
> > +int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
> > + unsigned long error_code)
> > +{
> > + int ret;
> > + enum ctx_state prev_state;
> > + prev_state = exception_enter();
> > + ret = __do_page_fault(regs, address, error_code);
> > + exception_exit(prev_state);
> > + return ret;
> > +}
>
> Any reason why you didn't use the same wrapping trick in traps.c ? I'd
> rather we stay consistent in the way we do things between the two files.
OK, I'll make it consistent with those in traps.c
>
> > /*
> > * bad_page_fault is called when we have a bad access from the kernel.
> > * It is called from the DSI and ISI handlers in head.S and from some
> > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> > index 88ac0ee..d92fb26 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -33,6 +33,7 @@
> > #include <linux/init.h>
> > #include <linux/signal.h>
> > #include <linux/memblock.h>
> > +#include <linux/context_tracking.h>
> >
> > #include <asm/processor.h>
> > #include <asm/pgtable.h>
> > @@ -962,6 +963,9 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > const struct cpumask *tmp;
> > int rc, user_region = 0, local = 0;
> > int psize, ssize;
> > + enum ctx_state prev_state;
> > +
> > + prev_state = exception_enter();
> >
> > DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> > ea, access, trap);
> > @@ -973,7 +977,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > mm = current->mm;
> > if (! mm) {
> > DBG_LOW(" user region with no mm !\n");
> > - return 1;
> > + rc = 1;
> > + goto exit;
> > }
> > psize = get_slice_psize(mm, ea);
> > ssize = user_segment_size(ea);
> > @@ -992,19 +997,23 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > /* Not a valid range
> > * Send the problem up to do_page_fault
> > */
> > - return 1;
> > + rc = 1;
> > + goto exit;
> > }
> > DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
> >
> > /* Bad address. */
> > if (!vsid) {
> > DBG_LOW("Bad address!\n");
> > - return 1;
> > + rc = 1;
> > + goto exit;
> > }
> > /* Get pgdir */
> > pgdir = mm->pgd;
> > - if (pgdir == NULL)
> > - return 1;
> > + if (pgdir == NULL) {
> > + rc = 1;
> > + goto exit;
> > + }
> >
> > /* Check CPU locality */
> > tmp = cpumask_of(smp_processor_id());
> > @@ -1027,7 +1036,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugeshift);
> > if (ptep == NULL || !pte_present(*ptep)) {
> > DBG_LOW(" no PTE !\n");
> > - return 1;
> > + rc = 1;
> > + goto exit;
> > }
> >
> > /* Add _PAGE_PRESENT to the required access perm */
> > @@ -1038,13 +1048,16 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > */
> > if (access & ~pte_val(*ptep)) {
> > DBG_LOW(" no access !\n");
> > - return 1;
> > + rc = 1;
> > + goto exit;
> > }
> >
> > #ifdef CONFIG_HUGETLB_PAGE
> > - if (hugeshift)
> > - return __hash_page_huge(ea, access, vsid, ptep, trap, local,
> > + if (hugeshift) {
> > + rc = __hash_page_huge(ea, access, vsid, ptep, trap, local,
> > ssize, hugeshift, psize);
> > + goto exit;
> > + }
> > #endif /* CONFIG_HUGETLB_PAGE */
> >
> > #ifndef CONFIG_PPC_64K_PAGES
> > @@ -1124,6 +1137,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > pte_val(*(ptep + PTRS_PER_PTE)));
> > #endif
> > DBG_LOW(" -> rc=%d\n", rc);
> > +exit:
> > + exception_exit(prev_state);
> > return rc;
> > }
> > EXPORT_SYMBOL_GPL(hash_page);
> > @@ -1259,6 +1274,9 @@ void flush_hash_range(unsigned long number, int local)
> > */
> > void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
> > {
> > + enum ctx_state prev_state;
> > + prev_state = exception_enter();
> > +
> > if (user_mode(regs)) {
> > #ifdef CONFIG_PPC_SUBPAGE_PROT
> > if (rc == -2)
> > @@ -1268,6 +1286,8 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
> > _exception(SIGBUS, regs, BUS_ADRERR, address);
> > } else
> > bad_page_fault(regs, address, SIGBUS);
> > +
> > + exception_exit(prev_state);
> > }
>
> So the above comes from the return of hash_page() following an error; it's
> technically the same exception. I don't know if that matters, however.
> Also, some exceptions are handled directly in asm, such as SLB misses;
> again, I don't know if that matters, as I'm not familiar with what the
> context tracking code is actually doing.
Yes, the above and hash_page() are two C functions for the same exception,
and the exception hooks enable RCU usage in that C code. But for asm code,
I think we can assume that there is no RCU usage there, so we don't need
to wrap it in the hooks.
Thanks, Zhong
>
> > long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
>
> Cheers,
> Ben.
>
>
On Mon, 2013-05-13 at 16:03 +0800, Li Zhong wrote:
>
> To my understanding, it is used to enable RCU user extended quiescent
> state, so RCU on that cpu doesn't need scheduler ticks. And together
> with some other code (already in 3.10), we are able to remove the ticks
> in some cases (e.g. only 1 task running on the cpu, with some other
> limitations).
Ok, sounds interesting. Once you fix the little cosmetic issue, I don't
see any reason not to merge them as it's basically wiring up an existing
feature (in that regard the patches are pretty straightforward) and I
assume the overhead is only there when you enable it.
Cheers,
Ben.
On Mon, 2013-05-13 at 16:44 +0800, Li Zhong wrote:
> Yes, the above and hash_page() are two C functions for the same exception,
> and the exception hooks enable RCU usage in that C code. But for asm code,
> I think we can assume that there is no RCU usage there, so we don't need
> to wrap it in the hooks.
hash_page() won't start any new RCU usage, at least not in its current
incarnation; the only thing I can see it ever doing would be to take some
RCU read locks one day (it doesn't today).
low_hash_fault() is a different beast. It will typically kill things, thus
involving sending signals etc... RCU might well be involved.
Cheers,
Ben.
On Mon, 2013-05-13 at 18:59 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-13 at 16:03 +0800, Li Zhong wrote:
> >
> > To my understanding, it is used to enable RCU user extended quiescent
> > state, so RCU on that cpu doesn't need scheduler ticks. And together
> > with some other code (already in 3.10), we are able to remove the ticks
> > in some cases (e.g. only 1 task running on the cpu, with some other
> > limitations).
>
> Ok, sounds interesting. Once you fix the little cosmetic issue, I don't
> see any reason not to merge them as it's basically wiring up an existing
> feature (in that regard the patches are pretty straightforward) and I
> assume the overhead is only there when you enable it.
Thanks for your support, Ben.
Yes, there should be no overhead if not enabled.
Thanks, Zhong
>
> Cheers,
> Ben.
>
>
On Mon, 2013-05-13 at 19:06 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-13 at 16:44 +0800, Li Zhong wrote:
> > Yes, the above and hash_page() are two C functions for the same exception,
> > and the exception hooks enable RCU usage in that C code. But for asm code,
> > I think we can assume that there is no RCU usage there, so we don't need
> > to wrap it in the hooks.
>
> hash_page() won't start any new RCU usage, at least not in its current
> incarnation; the only thing I can see it ever doing would be to take some
> RCU read locks one day (it doesn't today).
Seems I added the hooks because of the hcall entry/exit trace
points ...
Thanks, Zhong
>
> low_hash_fault() is a different beast. It will typically kill things, thus
> involving sending signals etc... RCU might well be involved.
>
> Cheers,
> Ben.
>
>
On Mon, 2013-05-13 at 17:46 +0800, Li Zhong wrote:
> > hash_page() won't start any new RCU usage, at least not in its current
> > incarnation; the only thing I can see it ever doing would be to take some
> > RCU read locks one day (it doesn't today).
>
> Seems I added the hooks because of the hcall entry/exit trace
> points ...
Oh I see, and trace points use RCU ... ok, leave it there then.
Cheers,
Ben.
On Mon, May 13, 2013 at 04:03:13PM +0800, Li Zhong wrote:
> On Mon, 2013-05-13 at 15:51 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote:
> > > These patches try to support context tracking for Power arch, beginning with
> > > 64-bit pSeries. The codes are ported from that of the x86_64, and in each
> > > patch, I listed the corresponding patch for x86.
> >
> > So that's yet another pile of bloat on all syscall entry/exit and
> > exception entry/exit. What is it used for ? (I haven't followed on
> > x86_64 side).
>
> To my understanding, it is used to enable RCU user extended quiescent
> state, so RCU on that cpu doesn't need scheduler ticks. And together
> with some other code (already in 3.10), we are able to remove the ticks
> in some cases (e.g. only 1 task running on the cpu, with some other
> limitations).
>
> Maybe Paul, or Frederic could give some better descriptions.
That's pretty much it. It helps the RCU and cputime accounting infrastructure
know when we enter/exit userspace. This way we can:
* Consider the CPU as idle from an RCU point of view when we run in userspace,
so RCU won't need the tick to stay alive.
* Account cputime (utime/stime/...) without using the tick. OK, powerpc already
has CONFIG_VIRT_CPU_ACCOUNTING_NATIVE, but making it work with full dynticks
would require some tweaks.
All in all, this is to support full dynticks.
Thanks.
On Mon, May 13, 2013 at 06:59:23PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-13 at 16:03 +0800, Li Zhong wrote:
> >
> > To my understanding, it is used to enable RCU user extended quiescent
> > state, so RCU on that cpu doesn't need scheduler ticks. And together
> > with some other code (already in 3.10), we are able to remove the ticks
> > in some cases (e.g. only 1 task running on the cpu, with some other
> > limitations).
>
> Ok, sounds interesting. Once you fix the little cosmetic issue, I don't
> see any reason not to merge them as it's basically wiring up an existing
> feature (in that regard the patches are pretty straightforward) and I
> assume the overhead is only there when you enable it.
Yeah, when CONFIG_RCU_USER_QS=n and CONFIG_VIRT_CPU_ACCOUNTING_GEN=n, this
should be zero overhead. And those configs are only needed for full dynticks.
Thanks.
>
> Cheers,
> Ben.
>
>