Changelog V1 --> V2:
Adopted commetns from Steven
* remove all reference to tsk->comm since it is unnecessary for non-sched
trace points
* reduce arguments for __get_user_pages trace point and update mm/gup.c
accordingly
* Added Ralf's acked-by for patch 4/7.
There is not content change for the trace points in arch specific mm/gup.c.
Some background about why I think this might be useful.
When I was profiling some hugetlb related program, I got page-faults event
doubled when hugetlb is enabled. When I looked into the code, I found page-faults
come from two places, do_page_fault and gup. So, I tried to figure out which
play a role (or both) in my use case. But I can't find existing finer tracing
event for sub page-faults in current mainline kernel.
So, I added the gup trace points support to have finer tracing events for
page-faults. The below events are added:
__get_user_pages
__get_user_pages_fast
fixup_user_fault
Both __get_user_pages and fixup_user_fault call handle_mm_fault.
Just added trace points to raw version __get_user_pages since all variants
will call it finally to do real work.
Although __get_user_pages_fast doesn't call handle_mm_fault, it might be useful
to have it to distinguish between slow and fast version.
Yang Shi (7):
trace/events: Add gup trace events
mm/gup: add gup trace points
x86: mm/gup: add gup trace points
mips: mm/gup: add gup trace points
s390: mm/gup: add gup trace points
sh: mm/gup: add gup trace points
sparc64: mm/gup: add gup trace points
arch/mips/mm/gup.c | 7 +++++++
arch/s390/mm/gup.c | 7 +++++++
arch/sh/mm/gup.c | 8 ++++++++
arch/sparc/mm/gup.c | 8 ++++++++
arch/x86/mm/gup.c | 7 +++++++
include/trace/events/gup.h | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
mm/gup.c | 8 ++++++++
7 files changed, 116 insertions(+)
page-faults events record the invoke to handle_mm_fault, but the invoke
may come from do_page_fault or gup. In some use cases, the finer event count
mey be needed, so add trace events support for:
__get_user_pages
__get_user_pages_fast
fixup_user_fault
Signed-off-by: Yang Shi <[email protected]>
---
include/trace/events/gup.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)
create mode 100644 include/trace/events/gup.h
diff --git a/include/trace/events/gup.h b/include/trace/events/gup.h
new file mode 100644
index 0000000..03a4674
--- /dev/null
+++ b/include/trace/events/gup.h
@@ -0,0 +1,71 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM gup
+
+#if !defined(_TRACE_GUP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_GUP_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(gup_fixup_user_fault,
+
+ TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
+ unsigned long address, unsigned int fault_flags),
+
+ TP_ARGS(tsk, mm, address, fault_flags),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, address )
+ ),
+
+ TP_fast_assign(
+ __entry->address = address;
+ ),
+
+ TP_printk("address=%lx", __entry->address)
+);
+
+TRACE_EVENT(gup_get_user_pages,
+
+ TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
+ unsigned long start, unsigned long nr_pages),
+
+ TP_ARGS(tsk, mm, start, nr_pages),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, start )
+ __field( unsigned long, nr_pages )
+ ),
+
+ TP_fast_assign(
+ __entry->start = start;
+ __entry->nr_pages = nr_pages;
+ ),
+
+ TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
+);
+
+TRACE_EVENT(gup_get_user_pages_fast,
+
+ TP_PROTO(unsigned long start, int nr_pages, int write,
+ struct page **pages),
+
+ TP_ARGS(start, nr_pages, write, pages),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, start )
+ __field( unsigned long, nr_pages )
+ ),
+
+ TP_fast_assign(
+ __entry->start = start;
+ __entry->nr_pages = nr_pages;
+ ),
+
+ TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
+);
+
+#endif /* _TRACE_GUP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.0.2
For slow version, just add trace point for raw __get_user_pages since all
slow variants call it to do the real work finally.
Signed-off-by: Yang Shi <[email protected]>
---
mm/gup.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/gup.c b/mm/gup.c
index deafa2c..10245a4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -13,6 +13,9 @@
#include <linux/rwsem.h>
#include <linux/hugetlb.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
@@ -462,6 +465,8 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (!nr_pages)
return 0;
+ trace_gup_get_user_pages(tsk, mm, start, nr_pages);
+
VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
/*
@@ -599,6 +604,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
if (!(vm_flags & vma->vm_flags))
return -EFAULT;
+ trace_gup_fixup_user_fault(tsk, mm, address, fault_flags);
ret = handle_mm_fault(mm, vma, address, fault_flags);
if (ret & VM_FAULT_ERROR) {
if (ret & VM_FAULT_OOM)
@@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
start, len)))
return 0;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/*
* Disable interrupts. We use the nested form as we can already have
* interrupts disabled by get_futex_key.
--
2.0.2
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Yang Shi <[email protected]>
---
arch/x86/mm/gup.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index ae9a37b..ed6cca9 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -10,6 +10,9 @@
#include <linux/highmem.h>
#include <linux/swap.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>>
+
#include <asm/pgtable.h>
static inline pte_t gup_get_pte(pte_t *ptep)
@@ -270,6 +273,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
(void __user *)start, len)))
return 0;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/*
* XXX: batch / limit 'nr', to avoid large irq off latency
* needs some instrumenting to determine the common sizes used by
@@ -342,6 +347,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
goto slow_irqon;
#endif
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/*
* XXX: batch / limit 'nr', to avoid large irq off latency
* needs some instrumenting to determine the common sizes used by
--
2.0.2
Cc: [email protected]
Acked-by: Ralf Baechle <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
arch/mips/mm/gup.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 349995d..3c5b8c8 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -12,6 +12,9 @@
#include <linux/swap.h>
#include <linux/hugetlb.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
#include <asm/cpu-features.h>
#include <asm/pgtable.h>
@@ -211,6 +214,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
(void __user *)start, len)))
return 0;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/*
* XXX: batch / limit 'nr', to avoid large irq off latency
* needs some instrumenting to determine the common sizes used by
@@ -277,6 +282,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
if (end < start || cpu_has_dc_aliases)
goto slow_irqon;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/* XXX: batch / limit 'nr' */
local_irq_disable();
pgdp = pgd_offset(mm, addr);
--
2.0.2
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: [email protected]
Signed-off-by: Yang Shi <[email protected]>
---
arch/s390/mm/gup.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 12bbf0e..ac25e28 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -10,6 +10,10 @@
#include <linux/vmstat.h>
#include <linux/pagemap.h>
#include <linux/rwsem.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
#include <asm/pgtable.h>
/*
@@ -188,6 +192,9 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
end = start + len;
if ((end <= start) || (end > TASK_SIZE))
return 0;
+
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/*
* local_irq_save() doesn't prevent pagetable teardown, but does
* prevent the pagetables from being freed on s390.
--
2.0.2
Cc: [email protected]
Signed-off-by: Yang Shi <[email protected]>
---
arch/sh/mm/gup.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
index e7af6a6..6df3e97 100644
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -12,6 +12,10 @@
#include <linux/mm.h>
#include <linux/vmstat.h>
#include <linux/highmem.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
#include <asm/pgtable.h>
static inline pte_t gup_get_pte(pte_t *ptep)
@@ -178,6 +182,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
(void __user *)start, len)))
return 0;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/*
* This doesn't prevent pagetable teardown, but does prevent
* the pagetables and pages from being freed.
@@ -231,6 +237,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
if (end < start)
goto slow_irqon;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
local_irq_disable();
pgdp = pgd_offset(mm, addr);
do {
--
2.0.2
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Yang Shi <[email protected]>
---
The context depends on the below patch:
https://www.mail-archive.com/[email protected]/msg1028752.html
arch/sparc/mm/gup.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index cf4fb47..6dcfc4d 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -10,6 +10,10 @@
#include <linux/vmstat.h>
#include <linux/pagemap.h>
#include <linux/rwsem.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
#include <asm/pgtable.h>
/*
@@ -177,6 +181,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
(void __user *)start, len)))
return 0;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
local_irq_save(flags);
pgdp = pgd_offset(mm, addr);
do {
@@ -209,6 +215,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
if (end < start)
goto slow_irqon;
+ trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
/*
* XXX: batch / limit 'nr', to avoid large irq off latency
* needs some instrumenting to determine the common sizes used by
--
2.0.2
On 12/02/2015 02:53 PM, Yang Shi wrote:
> diff --git a/mm/gup.c b/mm/gup.c
> index deafa2c..10245a4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -13,6 +13,9 @@
> #include <linux/rwsem.h>
> #include <linux/hugetlb.h>
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/gup.h>
> +
> #include <asm/pgtable.h>
> #include <asm/tlbflush.h>
This needs to be _the_ last thing that gets #included. Otherwise, you
risk colliding with any other trace header that gets implicitly included
below.
> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> start, len)))
> return 0;
>
> + trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
> +
> /*
> * Disable interrupts. We use the nested form as we can already have
> * interrupts disabled by get_futex_key.
It would be _really_ nice to be able to see return values from the
various gup calls as well. Is that feasible?
On 12/2/2015 3:36 PM, Dave Hansen wrote:
> On 12/02/2015 02:53 PM, Yang Shi wrote:
>> diff --git a/mm/gup.c b/mm/gup.c
>> index deafa2c..10245a4 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -13,6 +13,9 @@
>> #include <linux/rwsem.h>
>> #include <linux/hugetlb.h>
>>
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/gup.h>
>> +
>> #include <asm/pgtable.h>
>> #include <asm/tlbflush.h>
>
> This needs to be _the_ last thing that gets #included. Otherwise, you
> risk colliding with any other trace header that gets implicitly included
> below.
Thanks for the suggestion, will move it to the last.
>
>> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>> start, len)))
>> return 0;
>>
>> + trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
>> +
>> /*
>> * Disable interrupts. We use the nested form as we can already have
>> * interrupts disabled by get_futex_key.
>
> It would be _really_ nice to be able to see return values from the
> various gup calls as well. Is that feasible?
I think it should be feasible. kmem_cache_alloc trace event could show
return value. I'm supposed gup trace events should be able to do the
same thing.
Regards,
Yang
>
On 12/2/2015 4:11 PM, Shi, Yang wrote:
> On 12/2/2015 3:36 PM, Dave Hansen wrote:
>> On 12/02/2015 02:53 PM, Yang Shi wrote:
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index deafa2c..10245a4 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -13,6 +13,9 @@
>>> #include <linux/rwsem.h>
>>> #include <linux/hugetlb.h>
>>>
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/gup.h>
>>> +
>>> #include <asm/pgtable.h>
>>> #include <asm/tlbflush.h>
>>
>> This needs to be _the_ last thing that gets #included. Otherwise, you
>> risk colliding with any other trace header that gets implicitly included
>> below.
>
> Thanks for the suggestion, will move it to the last.
>
>>
>>> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start,
>>> int nr_pages, int write,
>>> start, len)))
>>> return 0;
>>>
>>> + trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
>>> +
>>> /*
>>> * Disable interrupts. We use the nested form as we can
>>> already have
>>> * interrupts disabled by get_futex_key.
>>
>> It would be _really_ nice to be able to see return values from the
>> various gup calls as well. Is that feasible?
>
> I think it should be feasible. kmem_cache_alloc trace event could show
> return value. I'm supposed gup trace events should be able to do the
> same thing.
Just did a quick test, it is definitely feasible. Please check the below
test log:
trace-cmd-200 [000] 99.221486: gup_get_user_pages:
start=8000000ff0 nr_pages=1 ret=1
trace-cmd-200 [000] 99.223215: gup_get_user_pages:
start=8000000fdb nr_pages=1 ret=1
trace-cmd-200 [000] 99.223298: gup_get_user_pages:
start=8000000ed0 nr_pages=1 ret=1
nr_pages is the number of pages requested by the gup, ret is the return
value.
If nobody has objection, I will add it into V3.
Regards,
Yang
>
> Regards,
> Yang
>
>>
>
On Wed, 2 Dec 2015 14:53:27 -0800
Yang Shi <[email protected]> wrote:
> page-faults events record the invoke to handle_mm_fault, but the invoke
> may come from do_page_fault or gup. In some use cases, the finer event count
> mey be needed, so add trace events support for:
>
> __get_user_pages
> __get_user_pages_fast
> fixup_user_fault
>
> Signed-off-by: Yang Shi <[email protected]>
> ---
> include/trace/events/gup.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 71 insertions(+)
> create mode 100644 include/trace/events/gup.h
>
> diff --git a/include/trace/events/gup.h b/include/trace/events/gup.h
> new file mode 100644
> index 0000000..03a4674
> --- /dev/null
> +++ b/include/trace/events/gup.h
> @@ -0,0 +1,71 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM gup
> +
> +#if !defined(_TRACE_GUP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_GUP_H
> +
> +#include <linux/types.h>
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(gup_fixup_user_fault,
> +
> + TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
> + unsigned long address, unsigned int fault_flags),
> +
> + TP_ARGS(tsk, mm, address, fault_flags),
Arges added and not used by TP_fast_assign(), this will slow down the
code while tracing is enabled, as they need to be added to the trace
function call.
> +
> + TP_STRUCT__entry(
> + __field( unsigned long, address )
> + ),
> +
> + TP_fast_assign(
> + __entry->address = address;
> + ),
> +
> + TP_printk("address=%lx", __entry->address)
> +);
> +
> +TRACE_EVENT(gup_get_user_pages,
> +
> + TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
> + unsigned long start, unsigned long nr_pages),
> +
> + TP_ARGS(tsk, mm, start, nr_pages),
Here too but this is worse. See below.
> +
> + TP_STRUCT__entry(
> + __field( unsigned long, start )
> + __field( unsigned long, nr_pages )
> + ),
> +
> + TP_fast_assign(
> + __entry->start = start;
> + __entry->nr_pages = nr_pages;
> + ),
> +
> + TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
> +);
> +
> +TRACE_EVENT(gup_get_user_pages_fast,
> +
> + TP_PROTO(unsigned long start, int nr_pages, int write,
> + struct page **pages),
> +
> + TP_ARGS(start, nr_pages, write, pages),
This and the above "gup_get_user_pages" have the same entry field,
assign and printk. They should be combined into a DECLARE_EVENT_CLASS()
and two DEFINE_EVENT()s. That will save on size as the
DECLARE_EVENT_CLASS() is the biggest part of each TRACE_EVENT().
-- Steve
> +
> + TP_STRUCT__entry(
> + __field( unsigned long, start )
> + __field( unsigned long, nr_pages )
> + ),
> +
> + TP_fast_assign(
> + __entry->start = start;
> + __entry->nr_pages = nr_pages;
> + ),
> +
> + TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
> +);
> +
> +#endif /* _TRACE_GUP_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
On Wed, 2 Dec 2015 15:36:50 -0800
Dave Hansen <[email protected]> wrote:
> On 12/02/2015 02:53 PM, Yang Shi wrote:
> > diff --git a/mm/gup.c b/mm/gup.c
> > index deafa2c..10245a4 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -13,6 +13,9 @@
> > #include <linux/rwsem.h>
> > #include <linux/hugetlb.h>
> >
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/gup.h>
> > +
> > #include <asm/pgtable.h>
> > #include <asm/tlbflush.h>
>
> This needs to be _the_ last thing that gets #included. Otherwise, you
> risk colliding with any other trace header that gets implicitly included
> below.
Agreed.
>
> > @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > start, len)))
> > return 0;
> >
> > + trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
> > +
> > /*
> > * Disable interrupts. We use the nested form as we can already have
> > * interrupts disabled by get_futex_key.
>
> It would be _really_ nice to be able to see return values from the
> various gup calls as well. Is that feasible?
Only if you rewrite the functions to have a single return code path
that we can add a tracepoint too. Or have a wrapper function that gets
called directly that calls these functions internally and the tracepoint
can trap the return value.
I can probably make function_graph tracer give return values, although
it will give a return value for void functions as well. And it may give
long long returns for int returns that may have bogus data in the
higher bits.
-- Steve
On 12/2/2015 8:13 PM, Steven Rostedt wrote:
> On Wed, 2 Dec 2015 15:36:50 -0800
> Dave Hansen <[email protected]> wrote:
>
>> On 12/02/2015 02:53 PM, Yang Shi wrote:
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index deafa2c..10245a4 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -13,6 +13,9 @@
>>> #include <linux/rwsem.h>
>>> #include <linux/hugetlb.h>
>>>
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/gup.h>
>>> +
>>> #include <asm/pgtable.h>
>>> #include <asm/tlbflush.h>
>>
>> This needs to be _the_ last thing that gets #included. Otherwise, you
>> risk colliding with any other trace header that gets implicitly included
>> below.
>
> Agreed.
>
>>
>>> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>> start, len)))
>>> return 0;
>>>
>>> + trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
>>> +
>>> /*
>>> * Disable interrupts. We use the nested form as we can already have
>>> * interrupts disabled by get_futex_key.
>>
>> It would be _really_ nice to be able to see return values from the
>> various gup calls as well. Is that feasible?
>
> Only if you rewrite the functions to have a single return code path
> that we can add a tracepoint too. Or have a wrapper function that gets
Yes. My preliminary test just could cover the success case. gup could
return errno from different a few code path.
> called directly that calls these functions internally and the tracepoint
> can trap the return value.
This will incur more changes in other subsystems (futex, kvm, etc), I'm
not sure if it is worth making such changes to get return value.
> I can probably make function_graph tracer give return values, although
> it will give a return value for void functions as well. And it may give
> long long returns for int returns that may have bogus data in the
> higher bits.
If the return value requirement is not limited to gup, the approach
sounds more reasonable.
Thanks,
Yang
>
> -- Steve
>
On 12/2/2015 8:07 PM, Steven Rostedt wrote:
> On Wed, 2 Dec 2015 14:53:27 -0800
> Yang Shi <[email protected]> wrote:
>
>> page-faults events record the invoke to handle_mm_fault, but the invoke
>> may come from do_page_fault or gup. In some use cases, the finer event count
>> mey be needed, so add trace events support for:
>>
>> __get_user_pages
>> __get_user_pages_fast
>> fixup_user_fault
>>
>> Signed-off-by: Yang Shi <[email protected]>
>> ---
>> include/trace/events/gup.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 71 insertions(+)
>> create mode 100644 include/trace/events/gup.h
>>
>> diff --git a/include/trace/events/gup.h b/include/trace/events/gup.h
>> new file mode 100644
>> index 0000000..03a4674
>> --- /dev/null
>> +++ b/include/trace/events/gup.h
>> @@ -0,0 +1,71 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM gup
>> +
>> +#if !defined(_TRACE_GUP_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_GUP_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/tracepoint.h>
>> +
>> +TRACE_EVENT(gup_fixup_user_fault,
>> +
>> + TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
>> + unsigned long address, unsigned int fault_flags),
>> +
>> + TP_ARGS(tsk, mm, address, fault_flags),
>
> Arges added and not used by TP_fast_assign(), this will slow down the
> code while tracing is enabled, as they need to be added to the trace
> function call.
>
>> +
>> + TP_STRUCT__entry(
>> + __field( unsigned long, address )
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->address = address;
>> + ),
>> +
>> + TP_printk("address=%lx", __entry->address)
>> +);
>> +
>> +TRACE_EVENT(gup_get_user_pages,
>> +
>> + TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
>> + unsigned long start, unsigned long nr_pages),
>> +
>> + TP_ARGS(tsk, mm, start, nr_pages),
>
> Here too but this is worse. See below.
>
>> +
>> + TP_STRUCT__entry(
>> + __field( unsigned long, start )
>> + __field( unsigned long, nr_pages )
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->start = start;
>> + __entry->nr_pages = nr_pages;
>> + ),
>> +
>> + TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
>> +);
>> +
>> +TRACE_EVENT(gup_get_user_pages_fast,
>> +
>> + TP_PROTO(unsigned long start, int nr_pages, int write,
>> + struct page **pages),
>> +
>> + TP_ARGS(start, nr_pages, write, pages),
>
> This and the above "gup_get_user_pages" have the same entry field,
> assign and printk. They should be combined into a DECLARE_EVENT_CLASS()
> and two DEFINE_EVENT()s. That will save on size as the
> DECLARE_EVENT_CLASS() is the biggest part of each TRACE_EVENT().
Thanks for the suggestion, will fix them in V3.
Regards,
Yang
>
> -- Steve
>
>
>> +
>> + TP_STRUCT__entry(
>> + __field( unsigned long, start )
>> + __field( unsigned long, nr_pages )
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->start = start;
>> + __entry->nr_pages = nr_pages;
>> + ),
>> +
>> + TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
>> +);
>> +
>> +#endif /* _TRACE_GUP_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>
On Thu, 03 Dec 2015 10:36:18 -0800
"Shi, Yang" <[email protected]> wrote:
> > called directly that calls these functions internally and the tracepoint
> > can trap the return value.
>
> This will incur more changes in other subsystems (futex, kvm, etc), I'm
> not sure if it is worth making such changes to get return value.
No, it wouldn't require any changes outside of this.
-long __get_user_pages(..)
+static long __get_user_pages_internal(..)
{
[..]
}
+
+long __get_user_pages(..)
+{
+ long ret;
+ ret = __get_user_pages_internal(..);
+ trace_get_user_pages(.., ret)
+}
>
> > I can probably make function_graph tracer give return values, although
> > it will give a return value for void functions as well. And it may give
> > long long returns for int returns that may have bogus data in the
> > higher bits.
>
> If the return value requirement is not limited to gup, the approach
> sounds more reasonable.
>
Others have asked about it. Maybe I should do it.
-- Steve
On 12/3/2015 11:06 AM, Steven Rostedt wrote:
> On Thu, 03 Dec 2015 10:36:18 -0800
> "Shi, Yang" <[email protected]> wrote:
>
>>> called directly that calls these functions internally and the tracepoint
>>> can trap the return value.
>>
>> This will incur more changes in other subsystems (futex, kvm, etc), I'm
>> not sure if it is worth making such changes to get return value.
>
> No, it wouldn't require any changes outside of this.
>
> -long __get_user_pages(..)
> +static long __get_user_pages_internal(..)
> {
> [..]
> }
> +
> +long __get_user_pages(..)
> +{
> + long ret;
> + ret = __get_user_pages_internal(..);
> + trace_get_user_pages(.., ret)
> +}
Thanks for this. I just checked the fast version, it looks it just has
single return path, so this should be just needed by slow version.
>
>>
>>> I can probably make function_graph tracer give return values, although
>>> it will give a return value for void functions as well. And it may give
>>> long long returns for int returns that may have bogus data in the
>>> higher bits.
>>
>> If the return value requirement is not limited to gup, the approach
>> sounds more reasonable.
>>
>
> Others have asked about it. Maybe I should do it.
If you are going to add return value in common trace code, I won't do
the gup specific one in V3.
Thanks,
Yang
>
> -- Steve
>