2015-05-13 16:11:06

by Torsten Duwe

Subject: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

In order to avoid an endless recursion, functions that may get
called from the data access handler must not call into tracing
functions, which may cause data access faults ;-)

Advancing from my previous approach that lavishly compiled whole
subdirs without the profiling switches, this is more fine-grained
(but probably still incomplete). This patch is necessary albeit not
sufficient for FTRACE_WITH_REGS on ppc64.
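
For reference, "notrace" below is the kernel's standard marker from
include/linux/compiler.h. A minimal sketch of what it expands to and how
it is applied (illustration only, not part of the diff):

/*
 * "notrace" wraps gcc's no_instrument_function attribute: the compiler
 * emits no profiling call for the annotated function, so ftrace can
 * never be entered from it.
 */
#define notrace __attribute__((no_instrument_function))

/* Anything reachable from the data access handler gets the marker,
 * e.g.: */
notrace int hash_page(unsigned long ea, unsigned long access,
		      unsigned long trap);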

Patch is against 3.17; fuzz+force should do no harm.

Signed-off-by: Torsten Duwe <[email protected]>

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bf44ae9..29ed85f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -717,7 +717,7 @@ static inline void __switch_to_tm(struct task_struct *prev)
* don't know which of the checkpointed state and the transactional
* state to use.
*/
-void restore_tm_state(struct pt_regs *regs)
+notrace void restore_tm_state(struct pt_regs *regs)
{
unsigned long msr_diff;

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index daee7f4..a8e5b8c 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -846,7 +846,7 @@ void early_init_mmu_secondary(void)
/*
* Called by asm hashtable.S for doing lazy icache flush
*/
-unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
+notrace unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
{
struct page *page;

@@ -867,7 +867,7 @@ unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
}

#ifdef CONFIG_PPC_MM_SLICES
-unsigned int get_paca_psize(unsigned long addr)
+notrace unsigned int get_paca_psize(unsigned long addr)
{
u64 lpsizes;
unsigned char *hpsizes;
@@ -896,7 +896,7 @@ unsigned int get_paca_psize(unsigned long addr)
* For now this makes the whole process use 4k pages.
*/
#ifdef CONFIG_PPC_64K_PAGES
-void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
+notrace void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
{
if (get_slice_psize(mm, addr) == MMU_PAGE_4K)
return;
@@ -919,7 +919,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
* Result is 0: full permissions, _PAGE_RW: read-only,
* _PAGE_USER or _PAGE_USER|_PAGE_RW: no access.
*/
-static int subpage_protection(struct mm_struct *mm, unsigned long ea)
+static notrace int subpage_protection(struct mm_struct *mm, unsigned long ea)
{
struct subpage_prot_table *spt = &mm->context.spt;
u32 spp = 0;
@@ -967,7 +967,7 @@ void hash_failure_debug(unsigned long ea, unsigned long access,
trap, vsid, ssize, psize, lpsize, pte);
}

-static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
+static notrace void check_paca_psize(unsigned long ea, struct mm_struct *mm,
int psize, bool user_region)
{
if (user_region) {
@@ -989,7 +989,7 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
* -1 - critical hash insertion error
* -2 - access not permitted by subpage protection mechanism
*/
-int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
+notrace int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
{
enum ctx_state prev_state = exception_enter();
pgd_t *pgdir;
@@ -1269,7 +1269,7 @@ out_exit:
/* WARNING: This is called from hash_low_64.S, if you change this prototype,
* do not forget to update the assembly call site !
*/
-void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
+notrace void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
int local)
{
unsigned long hash, index, shift, hidx, slot;
@@ -1343,7 +1343,7 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
exception_exit(prev_state);
}

-long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
+notrace long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
unsigned long pa, unsigned long rflags,
unsigned long vflags, int psize, int ssize)
{
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index a5bcf93..ea911b2 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -18,7 +18,7 @@ extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
unsigned long pa, unsigned long rlags,
unsigned long vflags, int psize, int ssize);

-int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
+notrace int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
pte_t *ptep, unsigned long trap, int local, int ssize,
unsigned int shift, unsigned int mmu_psize)
{
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 7e70ae9..4fe2338 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -905,7 +905,7 @@ static int __init hugetlbpage_init(void)
#endif
module_init(hugetlbpage_init);

-void flush_dcache_icache_hugepage(struct page *page)
+notrace void flush_dcache_icache_hugepage(struct page *page)
{
int i;
void *start;
@@ -936,7 +936,7 @@ void flush_dcache_icache_hugepage(struct page *page)
* we can follow the address down to the the page and take a ref on it.
*/

-pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift)
+notrace pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift)
{
pgd_t pgd, *pgdp;
pud_t pud, *pudp;
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index e0f7a18..05f66b1 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -400,7 +400,7 @@ void flush_dcache_page(struct page *page)
}
EXPORT_SYMBOL(flush_dcache_page);

-void flush_dcache_icache_page(struct page *page)
+notrace void flush_dcache_icache_page(struct page *page)
{
#ifdef CONFIG_HUGETLB_PAGE
if (PageCompound(page)) {
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index c8d709a..72bd991 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -444,7 +444,7 @@ static void page_table_free_rcu(void *table)
}
}

-void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
+notrace void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
{
unsigned long pgf = (unsigned long)table;

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 0399a67..5ccc3c8 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -94,7 +94,7 @@ static inline void create_shadowed_slbe(unsigned long ea, int ssize,
: "memory" );
}

-static void __slb_flush_and_rebolt(void)
+static notrace void __slb_flush_and_rebolt(void)
{
/* If you change this make sure you change SLB_NUM_BOLTED
* and PR KVM appropriately too. */
@@ -134,7 +134,7 @@ static void __slb_flush_and_rebolt(void)
: "memory");
}

-void slb_flush_and_rebolt(void)
+notrace void slb_flush_and_rebolt(void)
{

WARN_ON(!irqs_disabled());
@@ -149,7 +149,7 @@ void slb_flush_and_rebolt(void)
get_paca()->slb_cache_ptr = 0;
}

-void slb_vmalloc_update(void)
+notrace void slb_vmalloc_update(void)
{
unsigned long vflags;

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b0c75cc..c1efc4c 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -74,8 +74,8 @@ static void slice_print_mask(const char *label, struct slice_mask mask) {}

#endif

-static struct slice_mask slice_range_to_mask(unsigned long start,
- unsigned long len)
+static notrace struct slice_mask slice_range_to_mask(unsigned long start,
+ unsigned long len)
{
unsigned long end = start + len - 1;
struct slice_mask ret = { 0, 0 };
@@ -564,7 +564,7 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp,
current->mm->context.user_psize, 1);
}

-unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
+notrace unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
{
unsigned char *hpsizes;
int index, mask_index;
@@ -676,7 +676,7 @@ void slice_set_psize(struct mm_struct *mm, unsigned long address,
#endif
}

-void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
+notrace void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
unsigned long len, unsigned int psize)
{
struct slice_mask mask = slice_range_to_mask(start, len);


2015-05-15 01:34:57

by Michael Ellerman

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Wed, 2015-05-13 at 18:11 +0200, Torsten Duwe wrote:
> In order to avoid an endless recursion, functions that may get
> called from the data access handler must not call into tracing
> functions, which may cause data access faults ;-)
>
> Advancing from my previous approach that lavishly compiled whole
> subdirs without the profiling switches, this is more fine-grained
> (but probably still incomplete). This patch is necessary albeit not
> sufficient for FTRACE_WITH_REGS on ppc64.

There's got to be a better solution than this. The chance that you've correctly
annotated every function is basically 0, and the chance that we correctly add
it to every new or modified function in the future is also 0.

I don't mean that as a criticism of you, but rather the technique. For starters
I don't see any annotations in 32-bit code, or in the BookE code etc.

Can you give us more details on what goes wrong without these annotations?

cheers

2015-05-15 08:45:49

by Torsten Duwe

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Fri, May 15, 2015 at 11:34:47AM +1000, Michael Ellerman wrote:
> On Wed, 2015-05-13 at 18:11 +0200, Torsten Duwe wrote:
> > In order to avoid an endless recursion, functions that may get
> > called from the data access handler must not call into tracing
> > functions, which may cause data access faults ;-)
> >
> > Advancing from my previous approach that lavishly compiled whole
> > subdirs without the profiling switches, this is more fine-grained
> > (but probably still incomplete). This patch is necessary albeit not
> > sufficient for FTRACE_WITH_REGS on ppc64.
>
> There's got to be a better solution than this. The chance that you've correctly
> annotated every function is basically 0, and the chance that we correctly add

Well, I used automated static code analysis to find these, so from that
point the chance of finding all the relevant functions is significantly > 0.

> it to every new or modified function in the future is also 0.

Yes, this worries me, too. This may lead to very obscure and confusing breakage :-(

> I don't mean that as a criticism of you, but rather the technique. For starters

No problem, I don't take this personally. I'd also prefer a more elegant solution,
but the problem seems to stem from this particular hardware.

> I don't see any annotations in 32-bit code, or in the BookE code etc.

Exactly, for a start. 32-bit & friends would be another run, for all hardware
where the MMU does not fully autoload. What does sparc do, btw?

> Can you give us more details on what goes wrong without these annotations?

E.g. ftrace tries to note that a function has been called. The memory location
in the tracing framework that is to record this does not yet have an HTAB entry
-> data access fault. Should any of the functions involved in the HTAB handling
be profiled themselves, ftrace will try to record that call into some RAM
location, which might likewise lack an entry, and so on...

I've seen this lead to an endless recursion, unless, as I wrote and patched
before, tracing is disabled in all the relevant source dirs. That looked like
overkill to me, hence the machine-aided approach to find the exact set of
functions affected.
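
Spelled out as a call chain (illustration only, not actual code):

/* traced_function() was compiled with profiling calls enabled */
void traced_function(void)
{
	/* compiler-inserted: call into ftrace to record this hit     */
	/*  -> the record is a store into a trace buffer              */
	/*   -> the buffer page has no HTAB entry: data access fault  */
	/*    -> hash_page() runs to service the fault                */
	/*     -> hash_page() is itself profiled, ftrace records      */
	/*        *that* call, faults again, ... endless recursion    */
}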

Can you think of a better approach?

Torsten

2015-05-16 08:05:55

by Torsten Duwe

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Fri, May 15, 2015 at 10:45:42AM +0200, Torsten Duwe wrote:
> On Fri, May 15, 2015 at 11:34:47AM +1000, Michael Ellerman wrote:
> >
> > There's got to be a better solution than this.
>
> Can you think of a better approach?

Maybe a per-thread variable to lock out a recursion into tracing?
Thanks for your doubt.
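
In C terms, roughly this (hypothetical sketch; trace_recursion does exist
in task_struct, but these helpers and the bit chosen are illustrative):

#include <linux/sched.h>

static inline bool ftrace_ppc_recursion_enter(void)
{
	if (current->trace_recursion & (1UL << TRACE_FTRACE_BIT))
		return false;	/* already inside the tracer: bail out */
	current->trace_recursion |= 1UL << TRACE_FTRACE_BIT;
	return true;
}

static inline void ftrace_ppc_recursion_exit(void)
{
	current->trace_recursion &= ~(1UL << TRACE_FTRACE_BIT);
}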

Torsten

2015-05-18 12:30:06

by Jiri Kosina

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Sat, 16 May 2015, Torsten Duwe wrote:

> > > There's got to be a better solution than this.
> >
> > Can you think of a better approach?
>
> Maybe a per-thread variable to lock out a recursion into tracing?
> Thanks for your doubt.

ftrace already handles recursion protection by itself (depending on the
per-ftrace-ops FTRACE_OPS_FL_RECURSION_SAFE flag).
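
To illustrate: a callback declares itself recursion-safe when its
ftrace_ops is registered. A minimal sketch of the generic API (not
ppc-specific code; my_callback is a made-up example):

#include <linux/ftrace.h>

static void notrace my_callback(unsigned long ip, unsigned long parent_ip,
				struct ftrace_ops *op, struct pt_regs *regs)
{
	/* with RECURSION_SAFE set, this body must tolerate re-entry
	 * on its own; ftrace adds no guard around it */
}

static struct ftrace_ops my_ops = {
	.func	= my_callback,
	.flags	= FTRACE_OPS_FL_RECURSION_SAFE,
};

/* registered somewhere with: register_ftrace_function(&my_ops); */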

It's however not really well-defined what to do when recursion happens.
Therefore the __notrace__ annotation, which just completely avoids such a
situation by making tracing impossible, looks like the saner general solution
to me.

--
Jiri Kosina
SUSE Labs

2015-05-19 03:27:12

by Michael Ellerman

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Mon, 2015-05-18 at 14:29 +0200, Jiri Kosina wrote:
> On Sat, 16 May 2015, Torsten Duwe wrote:
>
> > > > There's got to be a better solution than this.
> > >
> > > Can you think of a better approach?
> >
> > Maybe a per-thread variable to lock out a recursion into tracing?
> > Thanks for your doubt.
>
> ftrace already handles recursion protection by itself (depending on the
> per-ftrace-ops FTRACE_OPS_FL_RECURSION_SAFE flag).

OK, so I wonder why that's not working for us?

> It's however not really well-defined what to do when recursion happens.
> Therefore the __notrace__ annotation, which just completely avoids such a
> situation by making tracing impossible, looks like the saner general solution
> to me.

I disagree. Correctly annotating all functions that might ever be called, now
and for all time, is a maintenance nightmare and is never going to work in the
long term.

cheers

2015-05-19 09:52:53

by Jiri Kosina

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Tue, 19 May 2015, Michael Ellerman wrote:

> > ftrace already handles recursion protection by itself (depending on the
> > per-ftrace-ops FTRACE_OPS_FL_RECURSION_SAFE flag).
>
> OK, so I wonder why that's not working for us?

The situation where a traced function recurses into itself is different from
the situation where the tracing core infrastructure recurses into itself
while performing tracing.

> > It's however not really well-defined what to do when recursion happens.
> > Therefore the __notrace__ annotation, which just completely avoids such a
> > situation by making tracing impossible, looks like the saner general
> > solution to me.
>
> I disagree. Correctly annotating all functions that might ever be called,
> now and for all time, is a maintenance nightmare and is never going to work
> in the long term.

All the functions called by ftrace must be marked notrace; there is no
way around it.

--
Jiri Kosina
SUSE Labs

2015-05-20 09:03:31

by Torsten Duwe

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Tue, May 19, 2015 at 01:27:07PM +1000, Michael Ellerman wrote:
> On Mon, 2015-05-18 at 14:29 +0200, Jiri Kosina wrote:
> >
> > ftrace already handles recursion protection by itself (depending on the
> > per-ftrace-ops FTRACE_OPS_FL_RECURSION_SAFE flag).
>
> OK, so I wonder why that's not working for us?

IIRC a data access fault happens just before that flag is looked at ;-)

I'm now thinking about a hybrid solution: mark the most critical functions
"notrace", especially those directly involved with MMU loading, and add
a per-thread flag to catch the not-so-obvious cases.

Torsten

2015-05-26 14:35:06

by Torsten Duwe

Subject: Re: [PATCH] ppc64 ftrace: mark data_access callees "notrace" (pt.1)

On Wed, May 20, 2015 at 11:03:25AM +0200, Torsten Duwe wrote:
> On Tue, May 19, 2015 at 01:27:07PM +1000, Michael Ellerman wrote:
> > On Mon, 2015-05-18 at 14:29 +0200, Jiri Kosina wrote:
> > >
> > > ftrace already handles recursion protection by itself (depending on the
> > > per-ftrace-ops FTRACE_OPS_FL_RECURSION_SAFE flag).
> >
> > OK, so I wonder why that's not working for us?
>
> IIRC a data access fault happens just before that flag is looked at ;-)
>
> I'm now thinking about a hybrid solution: mark the most critical functions
> "notrace", especially those directly involved with MMU loading, and add
> a per-thread flag to catch the not-so-obvious cases.

I realised that trace_recursion in the "current" task struct is not so far
away, and accessing it should not fault, right? So that part of the solution
would look like the patch below, on top of my previous ftrace patch set.
It would impact performance further, so I'd keep the "notrace" on certain
hot functions, like hash_page. What do you think?

Torsten

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ae10752 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -72,6 +72,7 @@ int main(void)
DEFINE(THREAD, offsetof(struct task_struct, thread));
DEFINE(MM, offsetof(struct task_struct, mm));
DEFINE(MMCONTEXTID, offsetof(struct mm_struct, context.id));
+ DEFINE(TASK_TRACEREC, offsetof(struct task_struct, trace_recursion));
#ifdef CONFIG_PPC64
DEFINE(AUDITCONTEXT, offsetof(struct task_struct, audit_context));
DEFINE(SIGSEGV, SIGSEGV);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index a4132ef..7842092 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1202,7 +1202,13 @@ _GLOBAL(ftrace_caller)
SAVE_8GPRS(16,r1)
SAVE_8GPRS(24,r1)

-
+ ld r3, PACACURRENT(r13)
+ ld r4, TASK_TRACEREC(r3)
+ andi. r5, r4, 0x0010 // ( 1 << TRACE_FTRACE_BIT )
+ ori r4, r4, 0x0010
+ std r4, TASK_TRACEREC(r3)
+ bne 3f // ftrace in progress - avoid recursion!
+
LOAD_REG_IMMEDIATE(r3,function_trace_op)
ld r5,0(r3)

@@ -1224,9 +1230,14 @@ ftrace_call:
bl ftrace_stub
nop

+ ld r3, PACACURRENT(r13)
+ ld r4, TASK_TRACEREC(r3)
+ andi. r4, r4, 0xffef // ~( 1 << TRACE_FTRACE_BIT )
+ std r4, TASK_TRACEREC(r3)
+
ld r3, _NIP(r1)
mtlr r3
-
+3:
REST_8GPRS(0,r1)
REST_8GPRS(8,r1)
REST_8GPRS(16,r1)

2015-06-03 13:03:16

by Torsten Duwe

Subject: [PATCH 0/4] ppc64 ftrace implementation

On Tue, May 19, 2015 at 11:52:47AM +0200, Jiri Kosina wrote:
> On Tue, 19 May 2015, Michael Ellerman wrote:
>
> > > ftrace already handles recursion protection by itself (depending on the
> > > per-ftrace-ops FTRACE_OPS_FL_RECURSION_SAFE flag).
> >
> > OK, so I wonder why that's not working for us?
>
> The situation where a traced function recurses into itself is different from
> the situation where the tracing core infrastructure recurses into itself
> while performing tracing.

I have used this inspiration to add a catch-all parachute for ftrace_caller,
see my last reply. It reappears here as patch 4/4. Expect a noticeable
performance impact compared to the selective "notrace" annotation discussed
here. That should still be done in a second step, especially for the
hardware-assistance functions I mentioned.

Torsten

2015-06-03 13:09:20

by Torsten Duwe

Subject: Re: [PATCH 1/4] ppc64 ftrace implementation

Implement ftrace on ppc64

Signed-off-by: Torsten Duwe <[email protected]>

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index e366187..6111191 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -46,6 +46,8 @@
extern void _mcount(void);

#ifdef CONFIG_DYNAMIC_FTRACE
+# define FTRACE_ADDR ((unsigned long)ftrace_caller+8)
+# define FTRACE_REGS_ADDR FTRACE_ADDR
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
/* reloction of mcount call site is the same as the address */
@@ -58,6 +60,9 @@ struct dyn_arch_ftrace {
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* __ASSEMBLY__ */

+#ifdef CONFIG_DYNAMIC_FTRACE
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#endif
#endif

#if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_PPC64) && !defined(__ASSEMBLY__)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index d180caf..a4132ef 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1152,32 +1152,107 @@ _GLOBAL(enter_prom)

#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_DYNAMIC_FTRACE
-_GLOBAL(mcount)
+
+#define TOCSAVE 24
+
_GLOBAL(_mcount)
- blr
+ nop // REQUIRED for ftrace, to calculate local/global entry diff
+.localentry _mcount,.-_mcount
+ mflr r0
+ mtctr r0
+
+ LOAD_REG_ADDR_PIC(r12,ftrace_trace_function)
+ ld r12,0(r12)
+ LOAD_REG_ADDR_PIC(r0,ftrace_stub)
+ cmpd r0,r12
+ ld r0,LRSAVE(r1)
+ bne- 2f
+
+ mtlr r0
+ bctr
+
+2: /* here we have (*ftrace_trace_function)() in r12,
+ "selfpc" in CTR
+ and "frompc" in r0 */
+
+ mtlr r0
+ bctr
+
+_GLOBAL(ftrace_caller)
+ mr r0,r2 // global (module) call: save module TOC
+ b 1f
+.localentry ftrace_caller,.-ftrace_caller
+ mr r0,r2 // local call: callee's TOC == our TOC
+ b 2f
+
+1: addis r2,r12,(.TOC.-0b)@ha
+ addi r2,r2,(.TOC.-0b)@l
+
+2: // Here we have our proper TOC ptr in R2,
+ // and the one we need to restore on return in r0.
+
+ ld r12, 16(r1) // get caller's address
+
+ stdu r1,-SWITCH_FRAME_SIZE(r1)
+
+ std r12, _LINK(r1)
+ SAVE_8GPRS(0,r1)
+ std r0,TOCSAVE(r1)
+ SAVE_8GPRS(8,r1)
+ SAVE_8GPRS(16,r1)
+ SAVE_8GPRS(24,r1)
+
+
+ LOAD_REG_IMMEDIATE(r3,function_trace_op)
+ ld r5,0(r3)
+
+ mflr r3
+ std r3, _NIP(r1)
+ std r3, 16(r1)
+ subi r3, r3, MCOUNT_INSN_SIZE
+ mfmsr r4
+ std r4, _MSR(r1)
+ mfctr r4
+ std r4, _CTR(r1)
+ mfxer r4
+ std r4, _XER(r1)
+ mr r4, r12
+ addi r6, r1, STACK_FRAME_OVERHEAD

-_GLOBAL_TOC(ftrace_caller)
- /* Taken from output of objdump from lib64/glibc */
- mflr r3
- ld r11, 0(r1)
- stdu r1, -112(r1)
- std r3, 128(r1)
- ld r4, 16(r11)
- subi r3, r3, MCOUNT_INSN_SIZE
.globl ftrace_call
ftrace_call:
bl ftrace_stub
nop
+
+ ld r3, _NIP(r1)
+ mtlr r3
+
+ REST_8GPRS(0,r1)
+ REST_8GPRS(8,r1)
+ REST_8GPRS(16,r1)
+ REST_8GPRS(24,r1)
+
+ addi r1, r1, SWITCH_FRAME_SIZE
+
+ ld r12, 16(r1) // get caller's address
+ mr r2,r0 // restore callee's TOC
+ mflr r0 // move this LR to CTR
+ mtctr r0
+ mr r0,r12 // restore callee's lr at _mcount site
+ mtlr r0
+ bctr // jump after _mcount site
+
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
.globl ftrace_graph_call
ftrace_graph_call:
b ftrace_graph_stub
_GLOBAL(ftrace_graph_stub)
#endif
- ld r0, 128(r1)
- mtlr r0
- addi r1, r1, 112
+
_GLOBAL(ftrace_stub)
+ nop
+ nop
+.localentry ftrace_stub,.-ftrace_stub
blr
#else
_GLOBAL_TOC(_mcount)
@@ -1211,12 +1286,12 @@ _GLOBAL(ftrace_stub)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
_GLOBAL(ftrace_graph_caller)
/* load r4 with local address */
- ld r4, 128(r1)
+ ld r4, LRSAVE+SWITCH_FRAME_SIZE(r1)
subi r4, r4, MCOUNT_INSN_SIZE

/* Grab the LR out of the caller stack frame */
- ld r11, 112(r1)
- ld r3, 16(r11)
+ ld r11, SWITCH_FRAME_SIZE(r1)
+ ld r3, LRSAVE(r11)

bl prepare_ftrace_return
nop
@@ -1228,10 +1303,7 @@ _GLOBAL(ftrace_graph_caller)
ld r11, 112(r1)
std r3, 16(r11)

- ld r0, 128(r1)
- mtlr r0
- addi r1, r1, 112
- blr
+ b ftrace_graph_stub

_GLOBAL(return_to_handler)
/* need to save return values */
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index 44d4d8e..349d07c 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -61,8 +61,11 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
return -EFAULT;

/* Make sure it is what we expect it to be */
- if (replaced != old)
+ if (replaced != old) {
+ printk(KERN_ERR "%p: replaced (%#x) != old (%#x)\n",
+ (void *)ip, replaced, old);
return -EINVAL;
+ }

/* replace the text with the new text */
if (patch_instruction((unsigned int *)ip, new))
@@ -106,14 +109,16 @@ static int
__ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
{
- unsigned int op;
+ unsigned int op, op0, op1, pop;
unsigned long entry, ptr;
unsigned long ip = rec->ip;
void *tramp;

/* read where this goes */
- if (probe_kernel_read(&op, (void *)ip, sizeof(int)))
+ if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
+ printk(KERN_ERR "Fetching opcode failed.\n");
return -EFAULT;
+ }

/* Make sure that that this is still a 24bit jump */
if (!is_bl_op(op)) {
@@ -158,10 +163,42 @@ __ftrace_make_nop(struct module *mod,
*
* Use a b +8 to jump over the load.
*/
- op = 0x48000008; /* b +8 */

- if (patch_instruction((unsigned int *)ip, op))
+ pop = 0x48000008; /* b +8 */
+
+ /*
+ * Check what is in the next instruction. We can see ld r2,40(r1), but
+ * on first pass after boot we will see mflr r0.
+ */
+ if (probe_kernel_read(&op, (void *)(ip+4), MCOUNT_INSN_SIZE)) {
+ printk(KERN_ERR "Fetching op failed.\n");
+ return -EFAULT;
+ }
+
+ if (op != 0xe8410028) { /* ld r2,STACK_OFFSET(r1) */
+
+ if (probe_kernel_read(&op0, (void *)(ip-8), MCOUNT_INSN_SIZE)) {
+ printk(KERN_ERR "Fetching op0 failed.\n");
+ return -EFAULT;
+ }
+
+ if (probe_kernel_read(&op1, (void *)(ip-4), MCOUNT_INSN_SIZE)) {
+ printk(KERN_ERR "Fetching op1 failed.\n");
+ return -EFAULT;
+ }
+
+ if (op0 != 0x7c0802a6 || op1 != 0xf8010010) { /* mflr r0 ; std r0,LRSAVE(r1) */
+ printk(KERN_ERR "Unexpected instructions around bl when enabling dynamic ftrace! (%08x,%08x,bl,%08x)\n", op0, op1, op);
+ return -EINVAL;
+ }
+
+ pop = PPC_INST_NOP; /* When using -mprofile-kernel there is no load to jump over */
+ }
+
+ if (patch_instruction((unsigned int *)ip, pop)) {
+ printk(KERN_ERR "Patching NOP failed.\n");
return -EPERM;
+ }

return 0;
}
@@ -287,6 +324,13 @@ int ftrace_make_nop(struct module *mod,

#ifdef CONFIG_MODULES
#ifdef CONFIG_PPC64
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
+ unsigned long addr)
+{
+ return ftrace_make_call(rec, addr);
+}
+#endif
static int
__ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
@@ -306,11 +350,18 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
* The load offset is different depending on the ABI. For simplicity
* just mask it out when doing the compare.
*/
+#if 0 // -pg, no -mprofile-kernel
if ((op[0] != 0x48000008) || ((op[1] & 0xffff0000) != 0xe8410000)) {
- pr_err("Unexpected call sequence: %x %x\n", op[0], op[1]);
+ pr_err("Unexpected call sequence at %p: %x %x\n", ip, op[0], op[1]);
return -EINVAL;
}
-
+#else
+ /* look for patched "NOP" on ppc64 with -mprofile-kernel */
+ if ((op[0] != 0x60000000) ) {
+ pr_err("Unexpected call at %p: %x\n", ip, op[0]);
+ return -EINVAL;
+ }
+#endif
/* If we never set up a trampoline to ftrace_caller, then bail */
if (!rec->arch.mod->arch.tramp) {
pr_err("No ftrace trampoline\n");
@@ -330,7 +381,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)

return 0;
}
-#else
+#else /* !CONFIG_PPC64: */
static int
__ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 6838451..1428ad8 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -138,12 +138,22 @@ static u32 ppc64_stub_insns[] = {
0x4e800420 /* bctr */
};

+/* In case of _mcount calls or dynamic ftracing, do not save the
+ current callee's TOC (in R2) again into the original caller's stack
+ frame during this trampoline hop. The stack frame already holds
+ that of the original caller. _mcount and ftrace_caller will take
+ care of this TOC value themselves.
+*/
+#define SQUASH_TOC_SAVE_INSN(trampoline_addr) \
+ ((struct ppc64_stub_entry*)(trampoline_addr))-> \
+ jump[2] = PPC_INST_NOP;
+
#ifdef CONFIG_DYNAMIC_FTRACE

static u32 ppc64_stub_mask[] = {
0xffff0000,
0xffff0000,
- 0xffffffff,
+ 0x00000000,
0xffffffff,
#if !defined(_CALL_ELF) || _CALL_ELF != 2
0xffffffff,
@@ -170,6 +180,9 @@ bool is_module_trampoline(u32 *p)
if ((insna & mask) != (insnb & mask))
return false;
}
+ if (insns[2] != ppc64_stub_insns[2] &&
+ insns[2] != PPC_INST_NOP )
+ return false;

return true;
}
@@ -475,6 +488,14 @@ static unsigned long stub_for_addr(Elf64_Shdr *sechdrs,
static int restore_r2(u32 *instruction, struct module *me)
{
if (*instruction != PPC_INST_NOP) {
+
+ /* -mprofile-kernel sequence starting with mflr r0; std r0, LRSAVE(r1) */
+ if (instruction[-3] == 0x7c0802a6 && instruction[-2] == 0xf8010010) {
+ /* Nothing to be done here, it's a _mcount call location
+ and r2 will have to be restored in the _mcount function */
+ return 2;
+ };
+
pr_err("%s: Expect noop after relocate, got %08x\n",
me->name, *instruction);
return 0;
@@ -490,7 +511,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
unsigned int relsec,
struct module *me)
{
- unsigned int i;
+ unsigned int i, r2;
Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
Elf64_Sym *sym;
unsigned long *location;
@@ -603,8 +624,11 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
value = stub_for_addr(sechdrs, value, me);
if (!value)
return -ENOENT;
- if (!restore_r2((u32 *)location + 1, me))
+ if (!(r2 = restore_r2((u32 *)location + 1, me)))
return -ENOEXEC;
+ /* Squash the TOC saver for profiler calls */
+ if (!strcmp("_mcount", strtab+sym->st_name))
+ SQUASH_TOC_SAVE_INSN(value);
} else
value += local_entry_offset(sym);

@@ -665,6 +689,9 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
me->arch.tramp = stub_for_addr(sechdrs,
(unsigned long)ftrace_caller,
me);
+ /* ftrace_caller will take care of the TOC;
+ do not clobber original caller's value. */
+ SQUASH_TOC_SAVE_INSN(me->arch.tramp);
#endif

return 0;

2015-06-03 13:11:01

by Torsten Duwe

Subject: Re: [PATCH 2/4] ppc64 ftrace configuration


Add Kconfig variables and Makefile magic for ftrace with -mprofile-kernel

Signed-off-by: Torsten Duwe <[email protected]>

diff --git a/Makefile b/Makefile
index 3d16bcc..bbd5e87 100644
--- a/Makefile
+++ b/Makefile
@@ -733,7 +733,10 @@ export CC_FLAGS_FTRACE
ifdef CONFIG_HAVE_FENTRY
CC_USING_FENTRY := $(call cc-option, -mfentry -DCC_USING_FENTRY)
endif
-KBUILD_CFLAGS += $(CC_FLAGS_FTRACE) $(CC_USING_FENTRY)
+ifdef CONFIG_HAVE_MPROFILE_KERNEL
+CC_USING_MPROFILE_KERNEL := $(call cc-option, -mprofile-kernel -DCC_USING_MPROFILE_KERNEL)
+endif
+KBUILD_CFLAGS += $(CC_FLAGS_FTRACE) $(CC_USING_FENTRY) $(CC_USING_MPROFILE_KERNEL)
KBUILD_AFLAGS += $(CC_USING_FENTRY)
ifdef CONFIG_DYNAMIC_FTRACE
ifdef CONFIG_HAVE_C_RECORDMCOUNT
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 22b0940..566f204 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -94,8 +94,10 @@ config PPC
select OF_RESERVED_MEM
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE
+ select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_MPROFILE_KERNEL
select SYSCTL_EXCEPTION_TRACE
select ARCH_WANT_OPTIONAL_GPIOLIB
select VIRT_TO_BUS if !PPC64
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a5da09c..dd53f3d 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -52,6 +52,11 @@ config HAVE_FENTRY
help
Arch supports the gcc options -pg with -mfentry

+config HAVE_MPROFILE_KERNEL
+ bool
+ help
+ Arch supports the gcc options -pg with -mprofile-kernel
+
config HAVE_C_RECORDMCOUNT
bool
help

2015-06-03 13:15:33

by Torsten Duwe

Subject: Re: [PATCH 3/4] ppc64 ftrace: spare early boot and low level code


Using -mprofile-kernel on early boot code not only confuses the checker
but is also useless, as the infrastructure is not yet in place. Handle it
like -pg: remove it likewise from time.o and from ftrace itself.

Signed-off-by: Torsten Duwe <[email protected]>

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 502cf69..fb33fc5 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -17,14 +17,14 @@ endif

ifdef CONFIG_FUNCTION_TRACER
# Do not trace early boot code
-CFLAGS_REMOVE_cputable.o = -pg -mno-sched-epilog
-CFLAGS_REMOVE_prom_init.o = -pg -mno-sched-epilog
-CFLAGS_REMOVE_btext.o = -pg -mno-sched-epilog
-CFLAGS_REMOVE_prom.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_cputable.o = -pg -mno-sched-epilog -mprofile-kernel
+CFLAGS_REMOVE_prom_init.o = -pg -mno-sched-epilog -mprofile-kernel
+CFLAGS_REMOVE_btext.o = -pg -mno-sched-epilog -mprofile-kernel
+CFLAGS_REMOVE_prom.o = -pg -mno-sched-epilog -mprofile-kernel
# do not trace tracer code
-CFLAGS_REMOVE_ftrace.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_ftrace.o = -pg -mno-sched-epilog -mprofile-kernel
# timers used by tracing
-CFLAGS_REMOVE_time.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_time.o = -pg -mno-sched-epilog -mprofile-kernel
endif

obj-y := cputable.o ptrace.o syscalls.o \

2015-06-03 13:22:23

by Torsten Duwe

Subject: Re: [PATCH 4/4] ppc64 ftrace recursion protection

As suggested by you and Jikos, a flag in task_struct's trace_recursion
is used to block a tracer function from recursing into itself, especially
on a data access fault. This should catch all functions called by the
fault handlers which are not yet annotated notrace.

Signed-off-by: Torsten Duwe <[email protected]>

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ae10752 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -72,6 +72,7 @@ int main(void)
DEFINE(THREAD, offsetof(struct task_struct, thread));
DEFINE(MM, offsetof(struct task_struct, mm));
DEFINE(MMCONTEXTID, offsetof(struct mm_struct, context.id));
+ DEFINE(TASK_TRACEREC, offsetof(struct task_struct, trace_recursion));
#ifdef CONFIG_PPC64
DEFINE(AUDITCONTEXT, offsetof(struct task_struct, audit_context));
DEFINE(SIGSEGV, SIGSEGV);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index a4132ef..4768104 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1202,7 +1202,13 @@ _GLOBAL(ftrace_caller)
SAVE_8GPRS(16,r1)
SAVE_8GPRS(24,r1)

-
+ ld r3, PACACURRENT(r13)
+ ld r4, TASK_TRACEREC(r3)
+ andi. r5, r4, 0x0010 // ( 1 << TRACE_FTRACE_BIT )
+ ori r4, r4, 0x0010
+ std r4, TASK_TRACEREC(r3)
+ bne 3f // ftrace in progress - avoid recursion!
+
LOAD_REG_IMMEDIATE(r3,function_trace_op)
ld r5,0(r3)

@@ -1224,9 +1230,14 @@ ftrace_call:
bl ftrace_stub
nop

+ ld r3, PACACURRENT(r13)
+ ld r4, TASK_TRACEREC(r3)
+ andi. r4, r4, 0xffef // ~( 1 << TRACE_FTRACE_BIT )
+ std r4, TASK_TRACEREC(r3)
+
ld r3, _NIP(r1)
mtlr r3
-
+3:
REST_8GPRS(0,r1)
REST_8GPRS(8,r1)
REST_8GPRS(16,r1)

2015-06-08 15:30:43

by Steven Rostedt

Subject: Re: [PATCH 1/4] ppc64 ftrace implementation


Why are your patches all as replies?

On Wed, 3 Jun 2015 15:08:29 +0200
Torsten Duwe <[email protected]> wrote:

> Implement ftrace on ppc64

What do you mean "Implement ftrace on ppc64"? It's already implemented.

I'm not even going to bother looking at this patch because I have no
idea what its purpose is.

-- Steve

>
> Signed-off-by: Torsten Duwe <[email protected]>
>

2015-06-08 15:57:07

by Torsten Duwe

Subject: Re: [PATCH 1/4] ppc64 ftrace implementation

On Mon, Jun 08, 2015 at 11:30:32AM -0400, Steven Rostedt wrote:
>
> Why are your patches all as replies?

Because I cared so much about the threading of the series that the
"Re:" escaped me.

> > Implement ftrace on ppc64
>
> What do you mean "Implement ftrace on ppc64"? It's already implemented.

Sorry, too deep in the details. It's DYNAMIC_FTRACE_WITH_REGS, actually.
Using gcc's -mprofile-kernel to call into ftrace...
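
For reference, the call site -mprofile-kernel emits is what the opcode
checks in patch 1/4 look for. The constants below are taken from that
patch; the macro names are mine:

/* instruction words around a ppc64 profiling call site */
#define PPC_INST_MFLR_R0	0x7c0802a6	/* mflr r0        */
#define PPC_INST_STD_R0_16R1	0xf8010010	/* std  r0,16(r1) */
#define PPC_INST_LD_R2_40R1	0xe8410028	/* ld   r2,40(r1) */

/* With -pg, "bl _mcount" is followed by the TOC reload ld r2,40(r1);
 * with -mprofile-kernel it is instead preceded by mflr r0 and the LR
 * save std r0,16(r1), and there is no load to jump over -- which is
 * why __ftrace_make_nop() patches in a plain NOP instead of "b +8". */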

> I'm not even going to bother looking at this patch because I have no
> idea what its purpose is.

Understandable ;-)

So: Implement DYNAMIC_FTRACE_WITH_REGS, in order to prepare for live patching.
Other obvious things I forgot to mention?

Torsten